Python3.5 dictview support #6787

Merged
merged 1 commit into from Aug 10, 2016

Conversation

Projects
None yet
5 participants
Member

story645 commented Jul 18, 2016

Currently the following doesn't work

test = {'ab':2, 'bc':4, 'cd':4, 'dw':2, 'ef':1}
plt.plot(test.keys(), test.values())

because keys & values are MappingView objects rather than lists and so they don't convert to arrays cleanly. So I added acbook.asSequence decorator that's really dumb and traverses over the argument list and converts any MappingView objects to lists. I chose this route (in discussion with/based on suggestions by @mdboom and @tacaswell) 'cause the .asSequence decorator can then be expanded to support other types. It's a decorator 'cause this is an issue in like I don't know how many functions-> I currently applied it just to plot, scatter, and bar-will do whole set based on discussion of PR.

So this patch now says this:

plt.plot(test.keys(), test.values())

yields:
index

Side note: this PR led to a patch on Python main better documenting dictview type introspection.

Owner

efiring commented Jul 18, 2016

What is the rationale for supporting this? It seems a bit odd because the order in which the categories will be given is unpredictable. Why not leave this step (unpacking the dictionary) to the user?

Member

story645 commented Jul 18, 2016

dict.keys() and dict.views() were supported under Py2.7 and they were technically equally unpredictable then...which really, that's why, to maintain behavior supported under Python2.7...

and because a lot of newbie users will get tripped up with views and converting views to lists and so from a tutorial/teaching matplotlib point of view it makes more sense to obfuscate that step. The better example though ('cause plot is silly) is bar where the user probably doesn't care about order on a first pass:
index

And 'cause OrderedDict is subject to all the same issues, and there the user has presorted their data.

Owner

efiring commented Jul 18, 2016

OK, thank you for the explanation. I'm not convinced this is a good idea. I'm concerned about bloat in code and documentation. As for the newbie aspect, sooner or later newbies will need to learn that the formerly straightforward dict methods now return special objects. I don't see how shielding them briefly via adding this conversion ability to mpl really helps anyone. In python 2.7, we don't support dict.keys(), we support the list returned by dict.keys(). We are under no obligation to support what dict.keys() now returns.

Member

story645 commented Jul 18, 2016 edited

Except matplotlib support all sorts of types out of the box: numpy arrays, pandas series objects and dateframes (some of these are on the pandas side and some are on the mpl side), and generators. #6712 is one such example. The only requirement as far as matplotlib is concerned is that np.array will convert it properly. There's no real purity of philosophy there. (And really, this patch should probably be upstream in numpy...but that's a different kettle of fish)

The argument for this is that dicts are one of the most commonly used datatypes around, 'specially for categorical data, and this code is specifically part of adding support for categorical data. It gets incredibly tedious to type:

plt.bar(list(test.keys()), list(test.values()))

and there's no value added to the user having to jump through the extra hoop. And honestly, from a teaching perspective it's a 10 minute tangent and a potential source of failure/thing that in my opinion will needlessly trip up users.

Contributor

anntzer commented Jul 18, 2016

As you mention yourself, the only requirement on matplotlib's side is that numpy can convert it to an array. Thus, this patch belongs to numpy, not here.

Member

story645 commented Jul 18, 2016 edited

That was sort of the discussion with @tacaswell and @mdboom about whether the "must convert to array" issue can be sort of factored out, (which this decorator kinda does since it can be expanded to support other types-and therefore possibly be used to factor out other cruft in the code used to ensure/convert types) since that wasn't so much a design decision as an accident of implementation choice.

The issue with pushing it upstream to numpy is that they seem to be having the same discussions going on here (numpy/numpy#5718)

Contributor

anntzer commented Jul 18, 2016

whether the "must convert to array" issue can be sort of factored out

It's already factored out in np.asarray.

since that wasn't so much a design decision as an accident of implementation choice.

Uh, matplotlib uses numpy internally like, everywhere, so an object is plottable iff it is convertible to an array? I guess you could say that using numpy is an accident of implementation choice but that seems a bit extreme to me.

The issue with pushing it upstream to numpy is that they seem to be having the same discussions going on here (numpy/numpy#5718)

I don't see how this is an "issue"? If you truly believe that dictviews should be array-likes then you should just push for numpy to fix this. In fact I think things would just be even more confusing for newbs intermediate level users if matplotlib starts accepting a subtly different set of types than np.asarray.

A view I could subscribe to would be to say "we support all iterable" and just first call list() on everything (although again I think this really belongs to numpy, but at least the rule is clear, not "we support whatever numpy supports plus whatever types we decided were important to add").

Member

story645 commented Jul 18, 2016 edited

we support whatever numpy supports plus whatever types we decided were important to add

But we already do that, and arguably one of the main points of adding categorical support was to add support for datatypes numpy doesn't really handle 'cause numpy kinda doesn't do categorical data...like dicts aren't esoteric, but one of the most commonly used datatypes around.

Uh, matplotlib uses numpy internally like, everywhere, so an object is plottable iff it is convertible to an array?

Except there are conversion routines and other pieces of code (like the labeled data decorator) that scaffold this.

A view I could subscribe to would be to say "we support all iterable"

But that's the issue at hand->dict keys/views are iterable. To intelligently cast everything to list, there'd still have to be object introspection on the args, at which point a decorator that looks at the args still makes sense.

It's already factored out in np.asarray.

except for the types that np.assarray don't support, hence this patch...this is a circular argument.

intermediate level users if matplotlib starts accepting a subtly different set of types than np.asarray

Why? Plenty of them don't care. This year I've been working a ton with people who are using matplotlib who aren't even using numpy (and are kinda scared to)-their data is all in dicts and lists (and as far as they know, dictkeys and dictviews are basically lists 'cause all they're doing is iterating over them...)

Contributor

anntzer commented Jul 18, 2016

A view I could subscribe to would be to say "we support all iterable"

But that's the issue at hand->dict keys/views are iterable. To intelligently cast everything to list, there'd still have to be object introspection on the args, at which point a decorator that looks at the args still makes sense.

No, we could just replace all call to np.asarray(...) by calls to np.asarray(list(...)) and let TypeErrors propagate (and yes, at that point this may as well be factored out, so that we can at least avoid the conversion back and forth when someone is actually passing an array in).

Anyways, it's clear that this discussion is going nowhere, so let's see what others have to say.

Owner

tacaswell commented Jul 18, 2016

Everyone please take a deep breath, we are all on the same side here 😄

I think this is a good idea to support dict views to simplify the spelling of a common use case. There is almost no ambiguity to the 'right' thing to do here is so we should just do it.

Almost everyone agrees that this should be handled upstream by numpy, however that issue is from 2015, they are at least a backlogged as we are, and even if a PR got merged today, we still support older versions of numpy so we need to handle this case ourselves anyway.

I suggest implementing this in the data decorator. We only want to apply this to things that are 'data' like inputs which are already marked up in that decorator. The (views -> array) transformation is in a similar category to the (string -> array) transformation as they both provide a nicer way to spell something that we could make the user do.

Uh, matplotlib uses numpy internally like, everywhere, so an object is plottable iff it is convertible to an array? I guess you could say that using numpy is an accident of implementation choice but that seems a bit extreme to me.

This is not quite correct, we do a fair bit of work to avoid converting arrays (mostly to allow units to pass through and) unless we really have to.

There is also a general concept (which has been floating around for years) to abstract out our data sources. Doing things like this is a good way to explore the API and what the consequences are without committing to major code changes.

Member

story645 commented Jul 19, 2016

We only want to apply this to things that are 'data' like inputs which are already marked up in that decorator.

Kinda, sorta, it's got a really deep nesting if there's a data kwarg (... I kinda really wanna refactor this...) but for the most part just iterates over the args in all sorts of combinations and passes through to the _replacer method.

tacaswell added this to the 2.1 (next point release) milestone Jul 20, 2016

@tacaswell tacaswell and 2 others commented on an outdated diff Jul 22, 2016

lib/matplotlib/cbook.py
@@ -2445,6 +2445,14 @@ def safe_first_element(obj):
return next(iter(obj))
+def asSequence(data):
@tacaswell

tacaswell Jul 22, 2016

Owner

Can you call this something like view_as_sequence? named as it is it is a bit ambiguous as to what it does (and more importantly does not do).

@story645

story645 Jul 22, 2016

Member

Hmm...while I agree, is the plan to limit it strictly to views? and then it might be unclear why non-views are being passed in.

@efiring

efiring Jul 22, 2016

Owner

Naming is hard. A problem here is that where this is used, there is no clue that the "view" in question is the obscure "MappingView", not a "view" in the numpy sense, which is much more common. "convert_dict_view"? "convert_MappingView"?

@tacaswell

tacaswell Jul 22, 2016

Owner

How about ensure_sequence ?

On Thu, Jul 21, 2016 at 9:45 PM Eric Firing notifications@github.com
wrote:

In lib/matplotlib/cbook.py
#6787 (comment):

@@ -2445,6 +2445,14 @@ def safe_first_element(obj):
return next(iter(obj))

+def asSequence(data):

Naming is hard. A problem here is that where this is used, there is no
clue that the "view" in question is the obscure "MappingView", not a "view"
in the numpy sense, which is much more common. "convert_dict_view"?
"convert_MappingView"?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/matplotlib/matplotlib/pull/6787/files/efc53d5d021a853757495099716d6c34166179b0#r71816398,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAMMhVT7zHoSEmkgfeLd_V8rlePZDhzgks5qYCCcgaJpZM4JOX_W
.

@efiring

efiring Jul 22, 2016

Owner

I was thinking about exactly that, but didn't suggest it because at least at present that's not at all what the function is doing. In addition, I suspect this needs to happily pass scalars.

@story645

story645 Jul 22, 2016 edited

Member

I like that name....hm..
what about some expansion of the form

if isnstance(data, [collections.Sequence, ndarray,...]):
     return data
if isinstance(data, numeric types): #since strings are sequences
    return data 
if isinstance(data, collections.abc.MappingView )
    return list(data)
MPLRuntimeWarning( "{} may be unsupported".format(type(data))
return data

And yes, will go figure out the proper way to raise an mpl version and I figure test failures will tell me what types of native types are expected to be supported. And how to support pandas without having to import it...

@efiring

efiring Jul 22, 2016

Owner

This looks tricky to get right. It seems contrary to the spirit of duck-typing we have historically favored. Objects can be array-like, and work fine in mpl, without being subclasses of anything we know about. I don't think we want to start issuing warnings in such cases. Nor can we do all of our usual homogenization at this very early stage; we have to preserve those pesky units, and decisions about missing values need to be handled by each function on an argument by argument basis.

@story645

story645 Jul 22, 2016 edited

Member

Hmm...even with duck typing, there's some basic something they need to have though that generators and views don't...granted, I'm also fine with the less work don't change anything and just find a good name approach...

Owner

tacaswell commented Jul 22, 2016

Other than a 🚲 🏠 y naming question, 👍

@story645 story645 renames: asSequence->sanitize_sequence, unpack_labeled_data->_preproc…
…ess_data
1a9640f
Member

story645 commented Jul 27, 2016 edited

per @tacaswell's suggestion, renamed asSequence to sanitize_sequence and unpack_labeled_data to _preprocess_data

Owner

tacaswell commented Aug 3, 2016

I am 👍 on this, but would like others to check the renaming ( @efiring @mdboom )

Owner

efiring commented Aug 3, 2016

Fine with me.

Owner

mdboom commented Aug 10, 2016

Fine with me on the renaming, too.

@mdboom mdboom merged commit 9f0d770 into matplotlib:master Aug 10, 2016

3 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.001%) to 70.372%
Details

mdboom removed the needs_review label Aug 10, 2016

story645 deleted the story645:dict3 branch Mar 7, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment