New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python3.5 dictview support #6787
Conversation
What is the rationale for supporting this? It seems a bit odd because the order in which the categories will be given is unpredictable. Why not leave this step (unpacking the dictionary) to the user? |
OK, thank you for the explanation. I'm not convinced this is a good idea. I'm concerned about bloat in code and documentation. As for the newbie aspect, sooner or later newbies will need to learn that the formerly straightforward dict methods now return special objects. I don't see how shielding them briefly via adding this conversion ability to mpl really helps anyone. In python 2.7, we don't support dict.keys(), we support the list returned by dict.keys(). We are under no obligation to support what dict.keys() now returns. |
Except matplotlib support all sorts of types out of the box: numpy arrays, pandas series objects and dateframes (some of these are on the pandas side and some are on the mpl side), and generators. #6712 is one such example. The only requirement as far as matplotlib is concerned is that np.array will convert it properly. There's no real purity of philosophy there. (And really, this patch should probably be upstream in numpy...but that's a different kettle of fish) The argument for this is that dicts are one of the most commonly used datatypes around, 'specially for categorical data, and this code is specifically part of adding support for categorical data. It gets incredibly tedious to type:
and there's no value added to the user having to jump through the extra hoop. And honestly, from a teaching perspective it's a 10 minute tangent and a potential source of failure/thing that in my opinion will needlessly trip up users. |
As you mention yourself, the only requirement on matplotlib's side is that numpy can convert it to an array. Thus, this patch belongs to numpy, not here. |
That was sort of the discussion with @tacaswell and @mdboom about whether the "must convert to array" issue can be sort of factored out, (which this decorator kinda does since it can be expanded to support other types-and therefore possibly be used to factor out other cruft in the code used to ensure/convert types) since that wasn't so much a design decision as an accident of implementation choice. The issue with pushing it upstream to numpy is that they seem to be having the same discussions going on here (numpy/numpy#5718) |
It's already factored out in
Uh, matplotlib uses numpy internally like, everywhere, so an object is plottable iff it is convertible to an array? I guess you could say that using numpy is an accident of implementation choice but that seems a bit extreme to me.
I don't see how this is an "issue"? If you truly believe that dictviews should be array-likes then you should just push for numpy to fix this. In fact I think things would just be even more confusing for A view I could subscribe to would be to say "we support all iterable" and just first call |
But we already do that, and arguably one of the main points of adding categorical support was to add support for datatypes numpy doesn't really handle 'cause numpy kinda doesn't do categorical data...like dicts aren't esoteric, but one of the most commonly used datatypes around.
Except there are conversion routines and other pieces of code (like the labeled data decorator) that scaffold this.
But that's the issue at hand->dict keys/views are iterable. To intelligently cast everything to list, there'd still have to be object introspection on the args, at which point a decorator that looks at the args still makes sense.
except for the types that np.assarray don't support, hence this patch...this is a circular argument.
Why? Plenty of them don't care. This year I've been working a ton with people who are using matplotlib who aren't even using numpy (and are kinda scared to)-their data is all in dicts and lists (and as far as they know, dictkeys and dictviews are basically lists 'cause all they're doing is iterating over them...) |
No, we could just replace all call to Anyways, it's clear that this discussion is going nowhere, so let's see what others have to say. |
Everyone please take a deep breath, we are all on the same side here 😄 I think this is a good idea to support dict views to simplify the spelling of a common use case. There is almost no ambiguity to the 'right' thing to do here is so we should just do it. Almost everyone agrees that this should be handled upstream by numpy, however that issue is from 2015, they are at least a backlogged as we are, and even if a PR got merged today, we still support older versions of numpy so we need to handle this case ourselves anyway. I suggest implementing this in the data decorator. We only want to apply this to things that are 'data' like inputs which are already marked up in that decorator. The (views -> array) transformation is in a similar category to the (string -> array) transformation as they both provide a nicer way to spell something that we could make the user do.
This is not quite correct, we do a fair bit of work to avoid converting arrays (mostly to allow units to pass through and) unless we really have to. There is also a general concept (which has been floating around for years) to abstract out our data sources. Doing things like this is a good way to explore the API and what the consequences are without committing to major code changes. |
Kinda, sorta, it's got a really deep nesting if there's a data kwarg (... I kinda really wanna refactor this...) but for the most part just iterates over the args in all sorts of combinations and passes through to the |
@@ -2445,6 +2445,14 @@ def safe_first_element(obj): | |||
return next(iter(obj)) | |||
|
|||
|
|||
def asSequence(data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you call this something like view_as_sequence
? named as it is it is a bit ambiguous as to what it does (and more importantly does not do).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm...while I agree, is the plan to limit it strictly to views? and then it might be unclear why non-views are being passed in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming is hard. A problem here is that where this is used, there is no clue that the "view" in question is the obscure "MappingView", not a "view" in the numpy sense, which is much more common. "convert_dict_view"? "convert_MappingView"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about ensure_sequence ?
On Thu, Jul 21, 2016 at 9:45 PM Eric Firing notifications@github.com
wrote:
In lib/matplotlib/cbook.py
#6787 (comment):@@ -2445,6 +2445,14 @@ def safe_first_element(obj):
return next(iter(obj))+def asSequence(data):
Naming is hard. A problem here is that where this is used, there is no
clue that the "view" in question is the obscure "MappingView", not a "view"
in the numpy sense, which is much more common. "convert_dict_view"?
"convert_MappingView"?—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/matplotlib/matplotlib/pull/6787/files/efc53d5d021a853757495099716d6c34166179b0#r71816398,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAMMhVT7zHoSEmkgfeLd_V8rlePZDhzgks5qYCCcgaJpZM4JOX_W
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about exactly that, but didn't suggest it because at least at present that's not at all what the function is doing. In addition, I suspect this needs to happily pass scalars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that name....hm..
what about some expansion of the form
if isnstance(data, [collections.Sequence, ndarray,...]):
return data
if isinstance(data, numeric types): #since strings are sequences
return data
if isinstance(data, collections.abc.MappingView )
return list(data)
MPLRuntimeWarning( "{} may be unsupported".format(type(data))
return data
And yes, will go figure out the proper way to raise an mpl version and I figure test failures will tell me what types of native types are expected to be supported. And how to support pandas without having to import it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks tricky to get right. It seems contrary to the spirit of duck-typing we have historically favored. Objects can be array-like, and work fine in mpl, without being subclasses of anything we know about. I don't think we want to start issuing warnings in such cases. Nor can we do all of our usual homogenization at this very early stage; we have to preserve those pesky units, and decisions about missing values need to be handled by each function on an argument by argument basis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm...even with duck typing, there's some basic something they need to have though that generators and views don't...granted, I'm also fine with the less work don't change anything and just find a good name approach...
Other than a 🚲 🏠 y naming question, 👍 |
per @tacaswell's suggestion, renamed asSequence to sanitize_sequence and unpack_labeled_data to _preprocess_data |
Fine with me. |
Fine with me on the renaming, too. |
Currently the following doesn't work
because keys & values are MappingView objects rather than lists and so they don't convert to arrays cleanly. So I added a
cbook.asSequence
decorator that's really dumb and traverses over the argument list and converts any MappingView objects to lists. I chose this route (in discussion with/based on suggestions by @mdboom and @tacaswell) 'cause the .asSequence decorator can then be expanded to support other types. It's a decorator 'cause this is an issue in like I don't know how many functions-> I currently applied it just to plot, scatter, and bar-will do whole set based on discussion of PR.So this patch now says this:
yields:
Side note: this PR led to a patch on Python main better documenting dictview type introspection.