Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python3.5 dictview support #6787

Merged
merged 1 commit into from Aug 10, 2016
Merged

Python3.5 dictview support #6787

merged 1 commit into from Aug 10, 2016

Conversation

story645
Copy link
Member

Currently the following doesn't work

test = {'ab':2, 'bc':4, 'cd':4, 'dw':2, 'ef':1}
plt.plot(test.keys(), test.values())

because keys & values are MappingView objects rather than lists and so they don't convert to arrays cleanly. So I added acbook.asSequence decorator that's really dumb and traverses over the argument list and converts any MappingView objects to lists. I chose this route (in discussion with/based on suggestions by @mdboom and @tacaswell) 'cause the .asSequence decorator can then be expanded to support other types. It's a decorator 'cause this is an issue in like I don't know how many functions-> I currently applied it just to plot, scatter, and bar-will do whole set based on discussion of PR.

So this patch now says this:

plt.plot(test.keys(), test.values())

yields:
index

Side note: this PR led to a patch on Python main better documenting dictview type introspection.

@efiring
Copy link
Member

efiring commented Jul 18, 2016

What is the rationale for supporting this? It seems a bit odd because the order in which the categories will be given is unpredictable. Why not leave this step (unpacking the dictionary) to the user?

@story645
Copy link
Member Author

dict.keys() and dict.views() were supported under Py2.7 and they were technically equally unpredictable then...which really, that's why, to maintain behavior supported under Python2.7...

and because a lot of newbie users will get tripped up with views and converting views to lists and so from a tutorial/teaching matplotlib point of view it makes more sense to obfuscate that step. The better example though ('cause plot is silly) is bar where the user probably doesn't care about order on a first pass:
index

And 'cause OrderedDict is subject to all the same issues, and there the user has presorted their data.

@efiring
Copy link
Member

efiring commented Jul 18, 2016

OK, thank you for the explanation. I'm not convinced this is a good idea. I'm concerned about bloat in code and documentation. As for the newbie aspect, sooner or later newbies will need to learn that the formerly straightforward dict methods now return special objects. I don't see how shielding them briefly via adding this conversion ability to mpl really helps anyone. In python 2.7, we don't support dict.keys(), we support the list returned by dict.keys(). We are under no obligation to support what dict.keys() now returns.

@story645
Copy link
Member Author

story645 commented Jul 18, 2016

Except matplotlib support all sorts of types out of the box: numpy arrays, pandas series objects and dateframes (some of these are on the pandas side and some are on the mpl side), and generators. #6712 is one such example. The only requirement as far as matplotlib is concerned is that np.array will convert it properly. There's no real purity of philosophy there. (And really, this patch should probably be upstream in numpy...but that's a different kettle of fish)

The argument for this is that dicts are one of the most commonly used datatypes around, 'specially for categorical data, and this code is specifically part of adding support for categorical data. It gets incredibly tedious to type:

plt.bar(list(test.keys()), list(test.values()))

and there's no value added to the user having to jump through the extra hoop. And honestly, from a teaching perspective it's a 10 minute tangent and a potential source of failure/thing that in my opinion will needlessly trip up users.

@anntzer
Copy link
Contributor

anntzer commented Jul 18, 2016

As you mention yourself, the only requirement on matplotlib's side is that numpy can convert it to an array. Thus, this patch belongs to numpy, not here.

@story645
Copy link
Member Author

story645 commented Jul 18, 2016

That was sort of the discussion with @tacaswell and @mdboom about whether the "must convert to array" issue can be sort of factored out, (which this decorator kinda does since it can be expanded to support other types-and therefore possibly be used to factor out other cruft in the code used to ensure/convert types) since that wasn't so much a design decision as an accident of implementation choice.

The issue with pushing it upstream to numpy is that they seem to be having the same discussions going on here (numpy/numpy#5718)

@anntzer
Copy link
Contributor

anntzer commented Jul 18, 2016

whether the "must convert to array" issue can be sort of factored out

It's already factored out in np.asarray.

since that wasn't so much a design decision as an accident of implementation choice.

Uh, matplotlib uses numpy internally like, everywhere, so an object is plottable iff it is convertible to an array? I guess you could say that using numpy is an accident of implementation choice but that seems a bit extreme to me.

The issue with pushing it upstream to numpy is that they seem to be having the same discussions going on here (numpy/numpy#5718)

I don't see how this is an "issue"? If you truly believe that dictviews should be array-likes then you should just push for numpy to fix this. In fact I think things would just be even more confusing for newbs intermediate level users if matplotlib starts accepting a subtly different set of types than np.asarray.

A view I could subscribe to would be to say "we support all iterable" and just first call list() on everything (although again I think this really belongs to numpy, but at least the rule is clear, not "we support whatever numpy supports plus whatever types we decided were important to add").

@story645
Copy link
Member Author

story645 commented Jul 18, 2016

we support whatever numpy supports plus whatever types we decided were important to add

But we already do that, and arguably one of the main points of adding categorical support was to add support for datatypes numpy doesn't really handle 'cause numpy kinda doesn't do categorical data...like dicts aren't esoteric, but one of the most commonly used datatypes around.

Uh, matplotlib uses numpy internally like, everywhere, so an object is plottable iff it is convertible to an array?

Except there are conversion routines and other pieces of code (like the labeled data decorator) that scaffold this.

A view I could subscribe to would be to say "we support all iterable"

But that's the issue at hand->dict keys/views are iterable. To intelligently cast everything to list, there'd still have to be object introspection on the args, at which point a decorator that looks at the args still makes sense.

It's already factored out in np.asarray.

except for the types that np.assarray don't support, hence this patch...this is a circular argument.

intermediate level users if matplotlib starts accepting a subtly different set of types than np.asarray

Why? Plenty of them don't care. This year I've been working a ton with people who are using matplotlib who aren't even using numpy (and are kinda scared to)-their data is all in dicts and lists (and as far as they know, dictkeys and dictviews are basically lists 'cause all they're doing is iterating over them...)

@anntzer
Copy link
Contributor

anntzer commented Jul 18, 2016

A view I could subscribe to would be to say "we support all iterable"

But that's the issue at hand->dict keys/views are iterable. To intelligently cast everything to list, there'd still have to be object introspection on the args, at which point a decorator that looks at the args still makes sense.

No, we could just replace all call to np.asarray(...) by calls to np.asarray(list(...)) and let TypeErrors propagate (and yes, at that point this may as well be factored out, so that we can at least avoid the conversion back and forth when someone is actually passing an array in).

Anyways, it's clear that this discussion is going nowhere, so let's see what others have to say.

@tacaswell
Copy link
Member

Everyone please take a deep breath, we are all on the same side here 😄

I think this is a good idea to support dict views to simplify the spelling of a common use case. There is almost no ambiguity to the 'right' thing to do here is so we should just do it.

Almost everyone agrees that this should be handled upstream by numpy, however that issue is from 2015, they are at least a backlogged as we are, and even if a PR got merged today, we still support older versions of numpy so we need to handle this case ourselves anyway.

I suggest implementing this in the data decorator. We only want to apply this to things that are 'data' like inputs which are already marked up in that decorator. The (views -> array) transformation is in a similar category to the (string -> array) transformation as they both provide a nicer way to spell something that we could make the user do.

Uh, matplotlib uses numpy internally like, everywhere, so an object is plottable iff it is convertible to an array? I guess you could say that using numpy is an accident of implementation choice but that seems a bit extreme to me.

This is not quite correct, we do a fair bit of work to avoid converting arrays (mostly to allow units to pass through and) unless we really have to.

There is also a general concept (which has been floating around for years) to abstract out our data sources. Doing things like this is a good way to explore the API and what the consequences are without committing to major code changes.

@story645
Copy link
Member Author

We only want to apply this to things that are 'data' like inputs which are already marked up in that decorator.

Kinda, sorta, it's got a really deep nesting if there's a data kwarg (... I kinda really wanna refactor this...) but for the most part just iterates over the args in all sorts of combinations and passes through to the _replacer method.

@tacaswell tacaswell added this to the 2.1 (next point release) milestone Jul 20, 2016
@@ -2445,6 +2445,14 @@ def safe_first_element(obj):
return next(iter(obj))


def asSequence(data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you call this something like view_as_sequence? named as it is it is a bit ambiguous as to what it does (and more importantly does not do).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...while I agree, is the plan to limit it strictly to views? and then it might be unclear why non-views are being passed in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is hard. A problem here is that where this is used, there is no clue that the "view" in question is the obscure "MappingView", not a "view" in the numpy sense, which is much more common. "convert_dict_view"? "convert_MappingView"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about ensure_sequence ?

On Thu, Jul 21, 2016 at 9:45 PM Eric Firing notifications@github.com
wrote:

In lib/matplotlib/cbook.py
#6787 (comment):

@@ -2445,6 +2445,14 @@ def safe_first_element(obj):
return next(iter(obj))

+def asSequence(data):

Naming is hard. A problem here is that where this is used, there is no
clue that the "view" in question is the obscure "MappingView", not a "view"
in the numpy sense, which is much more common. "convert_dict_view"?
"convert_MappingView"?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/matplotlib/matplotlib/pull/6787/files/efc53d5d021a853757495099716d6c34166179b0#r71816398,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAMMhVT7zHoSEmkgfeLd_V8rlePZDhzgks5qYCCcgaJpZM4JOX_W
.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about exactly that, but didn't suggest it because at least at present that's not at all what the function is doing. In addition, I suspect this needs to happily pass scalars.

Copy link
Member Author

@story645 story645 Jul 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that name....hm..
what about some expansion of the form

if isnstance(data, [collections.Sequence, ndarray,...]):
     return data
if isinstance(data, numeric types): #since strings are sequences
    return data 
if isinstance(data, collections.abc.MappingView )
    return list(data)
MPLRuntimeWarning( "{} may be unsupported".format(type(data))
return data

And yes, will go figure out the proper way to raise an mpl version and I figure test failures will tell me what types of native types are expected to be supported. And how to support pandas without having to import it...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks tricky to get right. It seems contrary to the spirit of duck-typing we have historically favored. Objects can be array-like, and work fine in mpl, without being subclasses of anything we know about. I don't think we want to start issuing warnings in such cases. Nor can we do all of our usual homogenization at this very early stage; we have to preserve those pesky units, and decisions about missing values need to be handled by each function on an argument by argument basis.

Copy link
Member Author

@story645 story645 Jul 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...even with duck typing, there's some basic something they need to have though that generators and views don't...granted, I'm also fine with the less work don't change anything and just find a good name approach...

@tacaswell
Copy link
Member

Other than a 🚲 🏠 y naming question, 👍

@story645
Copy link
Member Author

story645 commented Jul 27, 2016

per @tacaswell's suggestion, renamed asSequence to sanitize_sequence and unpack_labeled_data to _preprocess_data

@tacaswell
Copy link
Member

I am 👍 on this, but would like others to check the renaming ( @efiring @mdboom )

@efiring
Copy link
Member

efiring commented Aug 3, 2016

Fine with me.

@mdboom
Copy link
Member

mdboom commented Aug 10, 2016

Fine with me on the renaming, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants