Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enh mappable remapper #4490

Closed
wants to merge 7 commits into from

Conversation

tacaswell
Copy link
Member

This came out of the discussion at pandas-dev/pandas#10129.

I think this will make the lives of pandas users much better (along with buckets and buckets of documentation).

Draft of decorators to automatically map DataFrame columns to
base objects for plotting.
@tacaswell tacaswell added this to the proposed next point release milestone Jun 2, 2015
@tacaswell
Copy link
Member Author

@jenshnielsen You are a hero for sorting out all of the doc related formatting issues.

@tacaswell
Copy link
Member Author

@matplotlib/developers I would like to talk about this in my scipy talk, any feed back would be appreciated.



def apply_args_mapping(ax, func, data, map_targets, *args, **kwargs):
args = tuple(data[k] for k in map_targets) + args
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you using the values attribute for the kwargs cases but not for the args cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oversight, but I am not fully sure which way is better. .values gets us a numpy array full-stop, but just [] gets use a Series which more-or-less behave like a numpy array.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not using .values also means that these decorators will work with dicts of numpy arrays. It might be better to use np.asarray.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Omitting .values also means it will work with a numpy structured array; but I'm not sure if there would be any point in this.
Maybe use np.asanyarray in case something might yield a masked array?
I presume with a DataFrame, missing data will end up as NaN after conversion, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is always NaN internally in pandas and they don't do masking/sparse at all

@tacaswell
Copy link
Member Author

These decorators + #4488 should make it easy to wrap non-pandas aware functions to play nicely with their new pipe method.

Use `np.asarray` to ensure that the contents of the mapping are suitable
to pass into mpl functions.  This allows these functions to work with
both DataFrames, with dict-likes of lists/arrays and (presumably) with
xray objects.
@pelson
Copy link
Member

pelson commented Jun 11, 2015

Any chance of some tests?

@tacaswell
Copy link
Member Author

So requests for tests means people like this idea ? ;)

On Thu, Jun 11, 2015 at 11:49 AM Phil Elson notifications@github.com
wrote:

Any chance of some tests?


Reply to this email directly or view it on GitHub
#4490 (comment)
.

@OceanWolf
Copy link
Contributor

I have no idea, as I don't use pandas.

The only concern I have comes from making mpl more difficult to contribute to, it adds an extra layer for developers to test...

@tacaswell
Copy link
Member Author

These should be stand-alone helper functions, I do not see these getting
deeply integrated into the rest of the library.

On Thu, Jun 11, 2015 at 12:03 PM OceanWolf notifications@github.com wrote:

I have no idea, as I don't use pandas.

The only concern I have comes from making mpl more difficult to contribute
to, it adds an extra layer for developers to test...


Reply to this email directly or view it on GitHub
#4490 (comment)
.

@pelson
Copy link
Member

pelson commented Jun 11, 2015

So requests for tests means people like this idea ? ;)

Ha! Caught me. I've just finished reading that thread 30mins down...
I'm not convinced... honestly, I think the right place for this is in its own package.
I guess in its own context, it will either fly or die - that is no bad thing.

I would like to talk about this in my scipy talk, any feed back would be appreciated.

My interest is piqued though - I've not spent 30 seconds thinking about it, but I'm not convinced that the best solution was found in the discussion. A scipy discussion in and of itself?

@tacaswell
Copy link
Member Author

Had a chat with @ellisonbg about how to do this, he is suggesting an API more like:

plt.plot('a', 'b', data=df)

which is what other plotting packages which take labeled data use. The idea is that we can implement a decorator that goes through the args and tries to replace any strings with the colums of the data. This has the advantage of not requiring a hard pandas dependency and will work with things like hdf5/netcdf4 objects or dicts of arrays.

I am not a huge fan of this is it just feels wrong that the (second) most important thing goes last. It also runs the risk of not doing this right due the constraints of not using the fact that we know the input is a datafram and providing a broken API is worse than not providing one.

attn @efiring

@efiring
Copy link
Member

efiring commented Jul 11, 2015

The @ellisonbg suggestion looks good to me, in the sense that I think it provides an interface that will make sense to users, and be easy and natural to use. I like the generality of allowing data to be anything that acts like a dict of arrays. Your "feels wrong" sensation is coming from having the data kwarg? It doesn't bother me. How might the API be broken if the input is a dataframe? I don't see that this API prevents one from having code that special-cases particular data sources, but it would be nicer not to have to do that. In particular, I don't want to see pandas become a dependency of mpl.

@tacaswell
Copy link
Member Author

Closing this as cute, but not useful.

@tacaswell tacaswell closed this Aug 6, 2015
@tacaswell tacaswell deleted the enh_mappable_remapper branch August 6, 2015 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants