ENH: allow plotting libraries to easily hook into pandas #5489

Closed
jtratner opened this issue Nov 10, 2013 · 19 comments

@jtratner
Contributor

At PyData I was introduced to prettyplotlib, a great library by @olgabot that wraps matplotlib, and it was used to good effect in @twiecki's presentation on PyMC3.

I want to incorporate this into pandas - it would be a great improvement and a light dep for the library.

@olgabot said she was interested in working on a PR to incorporate it, so I'm going to track it from this issue.

@jankatins
Contributor

prettyplotlib also has a dependency on brewer2mpl.

I'm not so sure about this: the library changes mpl.rcParams directly (some of them on import, so simply importing prettyplotlib will change other, unrelated matplotlib plots!), which means you can no longer use your own style.

So, an example which I think won't work anymore is this:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
ts = ts.cumsum()

# Plots with the normal style (and, if this is implemented, with "nicer" styling)
ts.plot()
plt.show()

# Currently draws an xkcd-style plot, but this would probably break if the plot
# function set other mpl.rcParams *after* the `xkcd()` context manager has set
# the mpl.rcParams for the xkcd version
with plt.xkcd():
    ts.plot()
    plt.show()

There are also other libraries like prettyplotlib out there that wrap matplotlib to produce nicer plots:

[EDIT: It would be nice if they each had a way to use their own styling system without picking up styles from whatever system you choose.]

There are also functions around that restyle already existing matplotlib plots or set some "nicer" defaults:

@jtratner
Contributor Author

Okay, that's a very good point, let's keep it under an optional argument (that's disabled by default). We should be guarding mpl imports altogether so that matplotlib is not imported unless (and until) the user calls for plotting. Bottom line, I'd like pandas to have better plotting output because the default styles are not great and most people (including me) want to be able to not think about it.

Other notes:

  • I suggest prettyplotlib because a number of people were using it at PyData, it looks nice, and the developer is both very engaged and was willing to do the work to integrate it.
  • Yhat's ggplot is pretty much impossible because it's a completely different syntax and setup from matplotlib.
  • Seaborn looks interesting; however, it requires both moss and pyhusl, so it's not as if there are fewer deps.
  • If it turns out to be too difficult to incorporate prettyplotlib directly, the stylers from climactic look like they could be a good option.

@jankatins
Contributor

If you "just" want nice plotting:

import prettyplotlib
ts.plot()
plt.show()

(which shows that just importing prettyplotlib will change plotting for unrelated plots, IMO a bug: olgabot/prettyplotlib#15)

What could work is something like this (untested, it may not even compile...):

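import matplotlib as mpl

prettyplotlib_rcParams = {}
# Hack: importing prettyplotlib changes mpl.rcParams, so import it inside rc_context(),
# which restores the previous rcParams on exit, and keep a copy of its settings.
# Only works if prettyplotlib hasn't been imported yet (Python caches imports).
# A prettyplotlib.rc_context() would be nicer -> see bug report
with mpl.rc_context():
    import prettyplotlib
    prettyplotlib_rcParams.update(mpl.rcParams)
....

def _get_plotting_styles():
    name = [get the options system to spit out which styles should be used]
    if name in ["default", "prettyplotlib"]:
        return prettyplotlib_rcParams
    else:
        return {}

def plot(...):
    styles = _get_plotting_styles()
    # The styles apply only inside this block; global rcParams are restored afterwards
    with mpl.rc_context(rc=styles):
        [current plotting...]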

@TomAugspurger
Contributor

Alternatively, a setting in pd.set_option to declare globally that you want to use prettyplotlib's styling (or potentially others) may be useful.

I've been struggling with what pandas should be doing to support all the awesome plotting libraries sprouting up. (I'd add Bokeh and Vincent to the ones @JanSchulz listed). Providing the data structure and letting each library build off that is one thing. But the style of df.plot() is the tricky issue. I don't think something like df.plot(..., backend='prettyplotlib') would be ideal.

@jorisvandenbossche
Member

@jtratner How do you see this as different from the current pd.options.display.mpl_style == "default"?
Maybe this style could be improved, or you could add a "prettyplotlib" style, so that setting the option enables prettyplotlib integration.
(As an aside, "default" is a strange name, since this is not actually the default...)

Regarding yhat's ggplot, this is not impossible: apart from the new syntax (which emulates matplotlib calls), it just uses some rcParams calls to define its style (https://github.com/yhat/ggplot/blob/master/ggplot/themes/theme_gray.py). The "default" style could be updated to reflect those improvements, or another "ggplot" style could be added based on these rcParams.

There is also mpltools, where you can do something like style.use('style1'); this will be integrated into matplotlib in the future (matplotlib/matplotlib#2236).
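For reference, in current matplotlib this style mechanism is available as `matplotlib.style` (`plt.style.use()` / `plt.style.context()`). A small sketch, assuming a matplotlib version that ships the built-in "ggplot" stylesheet, of styling a pandas plot only inside a `with` block:

```python
import matplotlib as mpl

mpl.use("Agg")  # headless backend so the sketch runs anywhere

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

ts = pd.Series(np.random.randn(200), index=pd.date_range("2000-01-01", periods=200)).cumsum()
grid_before = mpl.rcParams["axes.grid"]

# Apply matplotlib's built-in "ggplot" stylesheet only inside this block.
with plt.style.context("ggplot"):
    grid_inside = mpl.rcParams["axes.grid"]
    fig, ax = plt.subplots()
    ts.plot(ax=ax)

# On exit the previous rcParams are back in place.
grid_after = mpl.rcParams["axes.grid"]
```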

@olgabot

olgabot commented Nov 12, 2013

Hello, prettyplotlib developer here. I could also adjust the library so it does everything programmatically and doesn't touch rcParams, so it won't affect other plots. I agree that it's misleading to have secret changes happening in the background, so thanks for opening that issue. The pandas/tools/plotting.py file is pretty gnarly, so I'm still wrapping my head around it.

If there's a toggle like pd.options.display.mpl_style == "prettyplotlib", then it sounds like the plotting.py file should have a master dict or something similar that maps plot, hist, bar and so on to the different options.


Olga Botvinnik
PhD Program in Bioinformatics and Systems Biology
Gene Yeo Laboratory | Sanford Consortium for Regenerative Medicine
University of California, San Diego
olgabotvinnik.com
blog.olgabotvinnik.com
github.com/olgabot
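A rough sketch of such a master dict, assuming hypothetical names (`STYLE_DEFAULTS`, `plot_kwargs()`) and illustrative values that are not prettyplotlib's actual settings: each style maps plot kinds to default keyword arguments, and anything the user passes explicitly wins.

```python
# Hypothetical per-style, per-plot-kind defaults; none of these names are pandas API.
STYLE_DEFAULTS = {
    "prettyplotlib": {
        "line": {"linewidth": 1.75},
        "bar": {"edgecolor": "white", "alpha": 0.9},
        "hist": {"edgecolor": "white", "bins": 20},
    },
    "default": {},
}


def plot_kwargs(style, kind, user_kwargs=None):
    """Merge a style's defaults for one plot kind with user kwargs (user wins)."""
    merged = dict(STYLE_DEFAULTS.get(style, {}).get(kind, {}))
    merged.update(user_kwargs or {})
    return merged


print(plot_kwargs("prettyplotlib", "hist", {"bins": 50}))
# {'edgecolor': 'white', 'bins': 50}
```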


@jankatins
Contributor

@jorisvandenbossche the theming in ggplot's PR yhat/ggpy#75 will use more than mpl.rcParams to emulate ggplot2 theming: some things are not possible with mpl styles alone. See the last commit in that PR.

@olgabot
Copy link

olgabot commented Nov 12, 2013

Either way, with ggplot2 or prettyplotlib styling, I think the pandas default should not be the matplotlib default. This would definitely increase the "wow" factor for first-time users too.



@ghost assigned jtratner Nov 13, 2013
@ghost

ghost commented Nov 13, 2013

I merged #3112 originally since the default mpl color theme is so horrible. However, it's off by default because I considered changing another library's settings as a side effect of import pandas unacceptable, more so because the user may have her own customizations in matplotlibrc, and clobbering them without being asked is poor manners.

It may be possible to save/modify rcParams/plot/restore to isolate pandas from other matplotlib clients, in which case I'm all for "pretty by default".
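A sketch of that save/modify/plot/restore idea as a decorator, assuming a hypothetical pandas-internal plotting function that tweaks rcParams; the `try`/`finally` restores the snapshot even if plotting raises. `mpl.rc_context()` also saves and restores rcParams and is the simpler choice where it is available.

```python
import functools

import matplotlib as mpl

mpl.use("Agg")  # headless backend so the sketch runs anywhere

import pandas as pd


def isolate_rcparams(func):
    """Snapshot rcParams, run func, then restore the snapshot (even on error)."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        saved = mpl.rcParams.copy()  # save
        try:
            return func(*args, **kwargs)  # modify rcParams + plot
        finally:
            mpl.rcParams.update(saved)  # restore

    return wrapper


@isolate_rcparams
def pandas_style_plot(series):
    # Hypothetical internal plotting step that tweaks global rcParams first.
    mpl.rcParams["axes.facecolor"] = "#eeeeee"
    mpl.rcParams["lines.linewidth"] = 2.0
    return series.plot()


facecolor_before = mpl.rcParams["axes.facecolor"]
linewidth_before = mpl.rcParams["lines.linewidth"]
ax = pandas_style_plot(pd.Series([1, 3, 2]))
# Other matplotlib clients see the user's settings again at this point.
```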

Note that the mpl_style option allows devs to define other styles, which you can leverage to supplement the available styles. There's also an optional callback mechanism, triggered when an option is set, that you may find useful.
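To illustrate the callback idea, here is a self-contained sketch of an options registry whose callback recomputes the active rcParams whenever the style option changes. The registry functions, the `MPL_STYLES` dict and the "pastel" style are hypothetical; this is not pandas' actual config implementation.

```python
# Hypothetical, minimal options registry with set-time callbacks.
_options = {}
_callbacks = {}


def register_option(key, default, cb=None):
    _options[key] = default
    if cb is not None:
        _callbacks[key] = cb


def set_option(key, value):
    if key not in _options:
        raise KeyError(f"unknown option: {key}")
    _options[key] = value
    cb = _callbacks.get(key)
    if cb is not None:
        cb(key)  # notify the listener that this option changed


def get_option(key):
    return _options[key]


# Extra styles a developer could register alongside the built-in ones.
MPL_STYLES = {"default": {"axes.grid": True}, "pastel": {"axes.facecolor": "#f5f5f5"}}
active_rc = {}


def _on_mpl_style(key):
    # Recompute the rcParams overrides whenever the style option changes.
    active_rc.clear()
    active_rc.update(MPL_STYLES.get(get_option(key), {}))


register_option("display.mpl_style", None, cb=_on_mpl_style)
set_option("display.mpl_style", "pastel")
print(active_rc)  # {'axes.facecolor': '#f5f5f5'}
```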

pandas' deps have exploded in the last 6-9 months, and support for multiple deps providing similar functionality (HTML parsing, Excel IO, serialization) is increasingly being added. That has costs in complexity, bug surface and the test matrix, but since these costs can take a long time to mount, there isn't much incentive for restraint except by admonition (apologies).

Would it be possible to integrate new styles by using the existing style mechanism rather than introducing more deps?

@jtratner
Contributor Author

@y-p I agree with you here (and that's why I changed the title of this issue today). I want to make it easy to plug into pandas plotting. At the least, we should probably enable mpl_style by default.

What I'd like to come out of this is a more abstract plotting layer, maybe changing pandas/tools/plotting into a set of static methods on a class that NDFrame then looks up (and that plotting libraries could subclass). Alternatively, we could just allow monkey patching.
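A minimal sketch of the subclassable-backend idea, with hypothetical names throughout (`PlotBackend`, `set_plot_backend()`, and a `Frame` stand-in for NDFrame); the methods return strings instead of drawing so the dispatch is easy to see.

```python
class PlotBackend:
    """Hypothetical default backend: one static method per plot kind."""

    @staticmethod
    def line(data, **kwargs):
        return f"matplotlib line plot of {len(data)} points"

    @staticmethod
    def hist(data, **kwargs):
        return f"matplotlib histogram of {len(data)} points"


_backend = PlotBackend


def set_plot_backend(backend_cls):
    """Let a plotting library install its own PlotBackend subclass."""
    global _backend
    _backend = backend_cls


class Frame:
    """Stand-in for NDFrame: .plot() looks the backend up at call time."""

    def __init__(self, data):
        self.data = list(data)

    def plot(self, kind="line", **kwargs):
        return getattr(_backend, kind)(self.data, **kwargs)


class PrettyBackend(PlotBackend):
    # A library overrides only what it wants; hist falls back to the default.
    @staticmethod
    def line(data, **kwargs):
        return f"pretty line plot of {len(data)} points"


f = Frame([1, 2, 3])
set_plot_backend(PrettyBackend)
print(f.plot())             # pretty line plot of 3 points
print(f.plot(kind="hist"))  # matplotlib histogram of 3 points
```

Looking the backend up at call time, instead of binding it when the class is defined, is what lets a library swap it in without monkey patching NDFrame itself.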

@olgabot would it be enough to just have an init method that changed the default color scheme? (maybe monkey patching mpl_style). Or do you actually need to use prettyplotlib's methods?

@jankatins
Contributor

To make all mpl changes side-effect free:

import matplotlib as mpl

with mpl.rc_context():  # rcParams changed inside are restored on exit
    mpl.rcParams[...] = ...
    [or importing prettyplotlib or whatever...]
    [plotting commands as before]

There is also rc_context(rc=dict) to change some mpl.rcParams directly; that dict could come from the options system...

@jtratner
Contributor Author

I think this may be getting off topic for this post, but it's very reasonable to have plotting libraries provide functions that globally change matplotlib (like some init function that you call explicitly, or something).
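A sketch of that opt-in pattern, assuming a hypothetical plotting-library module: importing it changes nothing, and global matplotlib defaults change only when the user calls `init()` explicitly.

```python
import matplotlib as mpl

# Hypothetical overrides a plotting library might want to apply.
_LIBRARY_RC = {"axes.grid": True, "lines.linewidth": 2.0}


def init():
    """Explicit opt-in: change matplotlib's global defaults for all later plots."""
    mpl.rcParams.update(_LIBRARY_RC)


# Importing this module has no side effects; rcParams change only once the
# user calls init() themselves, e.g. at the top of a notebook.
```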

@olgabot

olgabot commented Nov 14, 2013

The mpltools library does exactly that: it changes the rcParams (among other things) to make all following plots look the same. So if pandas plotting is completely decoupled from rcParams, pandas plots would be unaffected by it. Any thoughts? Should pandas plotting respond to matplotlib rcParams changes, or should it stick to its own plotting?

FYI @y-p I'm having trouble installing matplotlib on my OS X Mavericks machine, which is why I haven't fixed your prettyplotlib issue yet.



@ghost

ghost commented Nov 15, 2013

If the ggplot.py people get their community-building act together, I can totally see it obsoleting pandas' plotting code. It's there because nothing else was available to fill the gap at the time. Now that pandas is ubiquitous, a library built on top of it (rather than a new pandas dep) is the right way forward IMO, and the ggplot2 approach has already proven itself many times over.

@ghost

ghost commented Jan 24, 2014

@jtratner, it looks like the libraries have decided for themselves: they are consuming pandas data objects rather than aiming to become a dependency. I think that's great.

Can we close?

@jtratner
Contributor Author

Fine with me, though it would be nice if plot() looked better by default... not a big deal though.

@ghost

ghost commented Jan 25, 2014

ok, better how?

@olgabot

olgabot commented Jan 25, 2014

Right now not all parameters can be specified to matplotlib through an rcParams file (matplotlib/matplotlib#374 and matplotlib/matplotlib#2637). They're working on better ways to specify all parameters for functions, and I think it'll be much easier to have nice pandas default plots then. For example, I'd love to replace my entire prettyplotlib library with just a stylesheet that specifies everything for you.
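For comparison, a sketch of the stylesheet approach with matplotlib's `style` module, assuming a hypothetical `pretty.mplstyle` file written to a temporary directory. A stylesheet can only carry settings that rcParams expose, which is exactly the limitation described above.

```python
import os
import tempfile

import matplotlib as mpl

mpl.use("Agg")  # headless backend so the sketch runs anywhere

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stylesheet: plain "key: value" lines using rcParams names.
STYLE_TEXT = """\
axes.facecolor: eeeeee
axes.grid: True
lines.linewidth: 2.0
"""

lw_before = mpl.rcParams["lines.linewidth"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pretty.mplstyle")
    with open(path, "w") as fh:
        fh.write(STYLE_TEXT)

    # style.context() accepts a path to a stylesheet as well as a built-in name,
    # and restores the previous rcParams when the block exits.
    with plt.style.context(path):
        fig, ax = plt.subplots()
        pd.Series([1, 3, 2, 4]).plot(ax=ax)
        line_width = ax.get_lines()[0].get_linewidth()
```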



@ghost

ghost commented Jan 25, 2014

@olgabot, we did add an option to activate mpl styles, which is a substantial improvement on the mpl defaults (no retina-searing blue, for one). It could surely be improved and we'd love to have your contribution.

I thought prettyplotlib does substantially more than just color schemes, and we did end up saying no to the really sophisticated plots you offered to add to pandas (with some redirected to Seaborn?).

Maybe the scope of the issue is unclear, but regarding making ggplot/seaborn/prettyplotlib into pandas deps: they're already built on top of pandas and the integration level seems reasonable. So I think we can close.

There's no shame in being a substrate. :)

On making pandas' defaults prettier, we can open a new issue with a more focused scope.

@ghost closed this as completed Jan 25, 2014