API: engine kw to .plot to enable selectable backends #14130

jreback · 2016-08-31T15:58:03Z

We have had some conversations in the past regarding a .plot(..., engine=) keyword in #8018.
This would allow pandas to redirect the plotting to a user selectable back-end, keeping matplotlib as the default.

See chartpy for a way to selectively enable matplotlib, bokeh and plotly, and altair for a way to do it generically.

Missing from here is when to re-direct to seaborn.

So this issue is for discussion:

- should we do this?
- implementation method and minimalist dependencies to actually do this (ideally pandas would add NO dependencies itself, just import the library for a particular engine, raising if it's not available)
- maybe we should spin off much of the current pandas plotting code into a separate repo (pandas-plot)?
- and this actually should be the default (rather than matplotlib), which is of course the dependency. This might allow simply removing the vast majority of the custom plotting code.


jreback · 2016-08-31T16:00:00Z

@jorisvandenbossche
@sinhrks
@wesm
@TomAugspurger
@shoyer
cc @ellisonbg
cc @tacaswell
cc @mdboom
cc @mwaskom
cc @pzwang
cc @bryevdv

jorisvandenbossche · 2016-09-01T09:00:50Z

The way that chartpy does this is by having multiple implementations of their plotting method for each engine (like pandas now has for matplotlib, but less extensive).

So, AFAIK, to make something like this work, we would either have to implement the other engines in pandas as well (which means more plotting-related code, probably not something we want), or expect each engine to implement some kind of plot_dataframe function that handles the different chart types and that pandas can delegate to. And I am not sure this is something the different engines would want to do.

tacaswell · 2016-09-02T01:21:04Z

With mpl we have been working to better support pandas input natively to all of our plotting routines (the data kwarg, automatic index extraction, automatic label extraction).

It is not too hard now to write dataframe-aware functions that do mostly sensible things with matplotlib. I have a suspicion that if you started from scratch with mpl 1.5+, the mpl version of the pandas plotting code would be much shorter and clearer.

My suggestion would be to pull the current pandas plotting code out into its own project and refactor it into functions that look like

def some_chart_type(df, optional_backend_independent_input=None, *, backend_dependent_keyword=None, **other_backend_dependent_kwargs):
    ...

and use that as a reference implementation of the plotting API that backends need to expose to pandas for use in the plot accessor.
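As a rough sketch of what "a reference implementation of the plotting API that backends need to expose" could mean in practice, assuming one plain function per chart kind: pandas could keep the list of chart kinds it exposes and check which of them a backend provides. REQUIRED_KINDS, missing_kinds and toy_backend are hypothetical names, not pandas API.

```python
from types import SimpleNamespace

# The chart kinds pandas currently exposes on .plot (the list quoted in this thread).
REQUIRED_KINDS = ("area", "bar", "barh", "box", "density",
                  "hexbin", "hist", "line", "pie", "scatter")


def missing_kinds(backend):
    """Return the reference chart kinds that `backend` does not expose as callables."""
    return [kind for kind in REQUIRED_KINDS
            if not callable(getattr(backend, kind, None))]


# A toy backend that only implements two chart types.
toy_backend = SimpleNamespace(
    line=lambda df, x=None, y=None, **kwargs: ("line", x, y),
    bar=lambda df, x=None, y=None, **kwargs: ("bar", x, y),
)

print(missing_kinds(toy_backend))
# -> ['area', 'barh', 'box', 'density', 'hexbin', 'hist', 'pie', 'scatter']
```

Because the check only uses getattr, a backend could equally be a module of top-level functions.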

dhirschfeld · 2017-09-26T23:27:52Z

This may also be of interest to @santosjorge

TomAugspurger · 2017-09-28T15:59:10Z

Some quick thoughts follow. Curious to hear others'.

Pandas Plotting

Goal: define a system for multiple backends (matplotlib, Bokeh, Plotly, Altair
etc.) to take over DataFrame.plot.

Note that libraries can already achieve this end, to an extent, with
DataFrame.pipe(func, **kwargs). func gets the DataFrame as the first
argument and all kwargs. It's completely up to func what happens then. This
is about the main .plot method, which is built around chart types.

Overview of the implementation

DataFrame implements .plot as an AccessorProperty. This makes .plot
into a namespace with various plot methods. Currently, we define

['area', 'bar', 'barh', 'box', 'density', 'hexbin', 'hist', 'line',
 'pie', 'scatter']

(scatter and hexbin are DataFrame-only; the rest are also defined on Series.plot).
For backwards compatibility, plot is also callable, and is equivalent to .plot.line.
These methods call matplotlib axes plotting methods.
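A stdlib-only sketch of the "namespace that is also callable" idea, where calling the accessor dispatches on kind and defaults to line. PlotAccessor, Accessor and Frame are hypothetical stand-ins, not pandas's actual classes.

```python
class PlotAccessor:
    """A namespace of plot methods that is also callable (hypothetical stand-in)."""

    def __init__(self, data):
        self._data = data

    def __call__(self, *args, kind="line", **kwargs):
        # df.plot(...) dispatches on `kind`; the default keeps df.plot() == df.plot.line().
        return getattr(self, kind)(*args, **kwargs)

    def line(self, x=None, y=None, **kwargs):
        return ("line", x, y)

    def scatter(self, x, y, **kwargs):
        return ("scatter", x, y)


class Accessor:
    """Descriptor that builds a fresh accessor bound to the object on each access."""

    def __init__(self, accessor_cls):
        self._accessor_cls = accessor_cls

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class itself (e.g. for docs): return the accessor class.
            return self._accessor_cls
        return self._accessor_cls(obj)


class Frame:
    """Stand-in for DataFrame."""

    plot = Accessor(PlotAccessor)
```

With this, Frame().plot() and Frame().plot.line() go through the same method, and Frame().plot(kind="scatter", ...) reaches scatter.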

User API

A user-configurable

pandas.options.plotting.backend = {'matplotlib', 'bokeh', 'altair', 'plotly', ... }

would be the main entry point for users. Users would set this globally

pd.options.plotting.backend = 'bokeh'

Or use a context manager

with pd.option_context('plotting.backend', 'bokeh'):
    df.plot(...)
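For reference, the functions pandas actually provides for this are pd.set_option, pd.get_option and pd.option_context (singular "option"). The sketch below mimics their save-and-restore semantics with a plain dict and contextlib; _options and the helper functions here are hypothetical, not pandas internals.

```python
from contextlib import contextmanager

# Hypothetical global option store; pandas keeps its own registry behind pd.options.
_options = {"plotting.backend": "matplotlib"}


def set_option(key, value):
    if key not in _options:
        raise KeyError(f"unknown option {key!r}")
    _options[key] = value


def get_option(key):
    return _options[key]


@contextmanager
def option_context(key, value):
    """Temporarily set an option, restoring the previous value on exit (even on error)."""
    old = get_option(key)
    set_option(key, value)
    try:
        yield
    finally:
        set_option(key, old)


with option_context("plotting.backend", "bokeh"):
    inside = get_option("plotting.backend")
after = get_option("plotting.backend")
```

The try/finally is what keeps a failing plot call inside the with block from leaving the global backend switched.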

Backend API

Now for the tough part.

Changes to Pandas

We'll refactor the current FramePlotMethods to MatplotlibFramePlotMethods.
We'll make the actual FramePlotMethods a simple shell that

1. looks up the currently active backend
2. calls the appropriate method on the active backend

So

class FramePlotMethods:
    def line(self, x=None, y=None, **kwds):
        backend = self.get_backend()
        # _data is the DataFrame calling .plot.line
        return backend.line(self._data, x=x, y=y, **kwds)

At that point, things are entirely up to the backend. The various backends would
implement their own FramePlotMethods (probably inherit from a base class in
pandas that raises NotImplementedError with a nice error message saying that
this method isn't available with this backend).
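A minimal sketch of that shell-plus-base-class split, assuming backends are kept in a plain dict keyed by name. BasePlotMethods, ToyBackend and _BACKENDS are hypothetical names; the point is that the DataFrame-side class only looks up and delegates, while unimplemented chart types fail with a readable error.

```python
class BasePlotMethods:
    """Base class a backend could subclass; unimplemented kinds fail loudly."""

    backend_name = "unnamed"

    def _not_available(self, kind):
        raise NotImplementedError(
            f"{kind!r} plots are not available with the "
            f"{self.backend_name!r} plotting backend"
        )

    def line(self, data, x=None, y=None, **kwargs):
        self._not_available("line")

    def scatter(self, data, x=None, y=None, **kwargs):
        self._not_available("scatter")


class ToyBackend(BasePlotMethods):
    """A backend that only overrides line."""

    backend_name = "toy"

    def line(self, data, x=None, y=None, **kwargs):
        return f"toy line of {y} against {x}"


# Hypothetical registry and "currently active backend" setting.
_BACKENDS = {"toy": ToyBackend()}
_current_backend = "toy"


class FramePlotMethods:
    """Thin shell on the DataFrame side: look up the backend, then delegate."""

    def __init__(self, data):
        self._data = data

    def get_backend(self):
        return _BACKENDS[_current_backend]

    def line(self, x=None, y=None, **kwargs):
        return self.get_backend().line(self._data, x=x, y=y, **kwargs)

    def scatter(self, x=None, y=None, **kwargs):
        return self.get_backend().scatter(self._data, x=x, y=y, **kwargs)
```

Calling FramePlotMethods(df).scatter(...) with this toy backend raises NotImplementedError naming both the chart kind and the backend.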

Challenges

API consistency

How much should pandas care that backends accept similar keywords, behave
similarly, etc.? I'm not sure. For the most part, we've simply adopted
matplotlib's terminology for everything. That's probably not appropriate for
everyone. Certain methods do have "recommended" (somewhere between required
and optional) keyword arguments. For example .line takes an x and y. It'd
be nice if backends could agree on those.

Global State

Matplotlib has the notion of a "currently active figure", and some plotting
methods will add to that. Is there any difference between

with pd.option_context('plotting.backend', 'bokeh'):
    df.plot()

with pd.option_context('plotting.backend', 'matplotlib'):
    df.plot()

# Any difference here?
with pd.option_context('plotting.backend', 'bokeh'):
    df.plot()

I don't think so (aside from the extra matplotlib plot; the bokeh plots would be
identical). It's completely up to the backend how to handle global state between
calls.

Fortunately for us, pandas messed this up terribly at some point, so that
Series.plot goes onto the currently active axes, while DataFrame.plot
creates a new one. Users are used to diverging behavior in this area I guess :)

registration

I've been trying to improve pandas import time recently. Part of that involved
removing a

try:
    import matplotlib
except ImportError:
    pass

Pandas doesn't want to try / except each of the backends known to have an
implementation. Do we require users to import bokeh.pandas, which calls a
register_backend? That seems not great from the user's standpoint, but maybe
necessary?
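If pandas went with explicit registration, the pandas side could be as small as a dict plus a lookup that explains what went wrong. register_backend and get_backend here are hypothetical functions, sketched under the assumption that a backend object only needs to expose chart methods such as line.

```python
from types import SimpleNamespace

_registry = {}


def register_backend(name, implementation):
    """Called by a plotting library (e.g. from an opt-in module) to make itself known."""
    if not callable(getattr(implementation, "line", None)):
        raise TypeError(f"backend {name!r} must provide at least a 'line' method")
    _registry[name] = implementation


def get_backend(name):
    try:
        return _registry[name]
    except KeyError:
        raise ValueError(
            f"plotting backend {name!r} is not registered; "
            "did you import the module that registers it?"
        ) from None


# What an opt-in module (something like bokeh.pandas) might run at import time:
register_backend("toy", SimpleNamespace(line=lambda data, **kwargs: "toy line"))
```

The downside raised above is visible here: nothing is registered until the user imports the backend's opt-in module.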

TomAugspurger · 2017-09-28T16:00:24Z

Agreed with @tacaswell here that the current implementation should be moved to the plugin system I outlined above. That would be a good test case for what other backends would need.

mwaskom · 2017-09-28T16:04:07Z

Missing from here is when to re-direct to seaborn

Personally I don't think it really makes sense to consider seaborn a "backend" for pandas plotting. Seaborn seems higher in the stack than pandas, relative to the other backends. Are there particular plotting functions you had in mind for delegating to?

TomAugspurger · 2017-09-28T16:13:53Z

Personally I don't think it really makes sense to consider seaborn a "backend" for pandas plotting.

Agreed for the most part. We could implement df.plot(x, y, hue, ...), as an alternative to FacetGrid(x, y, hue, data, ...), but I'm not sure how worthwhile that would be.

That brings up another point, we would want to allow backends to implement additional methods, e.g. regplot. We can probably support that with some getattribute magic on FramePlotMethods
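One way to sketch the "additional methods" idea, assuming the backend is any object whose chart functions take the data first. Python's __getattr__ is only consulted when normal attribute lookup fails, so core methods stay on the pandas class and anything else (a hypothetical regplot, say) is forwarded to the backend. This uses __getattr__ rather than __getattribute__ so the common path is untouched; all names are hypothetical.

```python
from types import SimpleNamespace


class FramePlotMethods:
    """Core chart types are defined here; anything else is looked up on the backend."""

    def __init__(self, data, backend):
        self._data = data
        self._backend = backend

    def line(self, **kwargs):
        return self._backend.line(self._data, **kwargs)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for non-core methods.
        if name.startswith("_"):
            raise AttributeError(name)
        method = getattr(self._backend, name, None)
        if method is None:
            raise AttributeError(f"the current plotting backend has no {name!r} method")
        # Bind the data as the first argument, like the core methods do.
        return lambda *args, **kwargs: method(self._data, *args, **kwargs)


backend = SimpleNamespace(
    line=lambda data, **kwargs: f"line of {data}",
    regplot=lambda data, x, y: f"regplot of {y} on {x} for {data}",  # backend-specific extra
)
plot = FramePlotMethods("df", backend)
```

The underscore guard keeps lookups such as _backend from recursing if they happen before __init__ has run (for example during copying).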

shoyer · 2017-09-28T16:19:05Z

@TomAugspurger great summary! I agree with pretty much everything you write.

Pandas doesn't want to try / except each of the backends known to have an
implementation. Do we require users to import bokeh.pandas, which calls a
register_backend? That seems not great from the user's standpoint, but maybe
necessary?

There are basically three options:

1. pandas tries importing other packages
2. other packages import pandas, and register a plotting method
3. pandas is aware of other packages, so it can define a lazy importing stub. The actual implementation can be somewhere else.

1 is off the table for the reason you mention, and 2 is not attractive for the same reason (matplotlib doesn't want to import pandas, either, and needing to explicitly write import matplotlib.pandas is annoying).

My suggestion is that we do some variant of option 3. Some backends, e.g., matplotlib, might remain bundled in pandas for now, but in general it would be nice for backends to be decoupled. So let's define a protocol of some sort based on the value of pandas.options.plotting.backend.

For example, we could try importing the module given by the string value of the backend, and then call backend._pandas_plot_(pandas_obj) as the equivalent of pandas_obj.plot. If the backend doesn't want a hard dependency on pandas, it can put its PandasPlotMethods subclass in a separate module that is imported inside its _pandas_plot_ function.
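A rough sketch of option 3 using importlib, under the assumption that the option value is an importable module name and that the module exposes the _pandas_plot_ hook proposed above. plot_accessor and toy_plot_backend are hypothetical; the fake module is inserted into sys.modules only to keep the example self-contained.

```python
import importlib
import sys
import types


def _get_plot_backend(name):
    """Import the module named by the option value and return its plotting hook."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        raise ValueError(f"could not import plotting backend {name!r}; is it installed?") from None
    hook = getattr(module, "_pandas_plot_", None)
    if hook is None:
        raise ValueError(f"module {name!r} does not implement '_pandas_plot_'")
    return hook


def plot_accessor(pandas_obj, backend_name):
    # Equivalent of pandas_obj.plot: the backend returns its own accessor object,
    # so it can defer importing any pandas-specific code until this point.
    return _get_plot_backend(backend_name)(pandas_obj)


# Usage with a stand-in backend module registered in sys.modules (illustration only).
fake = types.ModuleType("toy_plot_backend")
fake._pandas_plot_ = lambda obj: f"toy accessor for {obj}"
sys.modules["toy_plot_backend"] = fake
```

Nothing is imported until .plot is actually used, which is what keeps pandas import time unaffected.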

@mwaskom Agreed, I don't see Seaborn as a "backend" (and I don't think Tom does either, based on his post).

ellisonbg · 2017-09-29T03:07:40Z

Honestly, I would probably prefer to have pandas plotting retired, unless there are particular plots that other libraries (Altair, seaborn, bokeh, Matplotlib) can't make. If there are still some special things that these other libraries can't do, then it would probably be easier to just implement those things in those other libraries. But it totally depends on your philosophy about breaking APIs. I tend to lean towards breaking things to innovate, but I understand that not all libraries can do that...


-- Brian E. Granger Associate Professor of Physics and Data Science Cal Poly State University, San Luis Obispo @ellisonbg on Twitter and GitHub bgranger@calpoly.edu and ellisonbg@gmail.com

tacaswell · 2017-09-29T04:41:35Z

The two biggest attractions of dataframe plot accessors are a) discoverability and b) easy swapping of backends (if you really want them to be interchangeable, you need someone (pandas) to own the API).

mwaskom · 2017-09-29T18:39:45Z

If there are still some special things that these other libraries can't do, then it would probably be easier to just implement those things in those other libraries.

FWIW I have been working on adding a few more "basic" plots to seaborn (mwaskom/seaborn#1285), which would help fill the "higher-level, matplotlib-based" hole that would otherwise open up if pandas dropped plotting altogether.

ellisonbg · 2017-09-29T18:46:06Z

nice!


bryevdv · 2017-09-29T22:02:32Z

Re: retiring pandas plotting, I definitely disagree. I am personally excited and engaged by innovation, but some conversations last week at Strata reminded me that one tool, or style, will never suit all users and use cases. Some people prioritize absolute immediacy, simple expectations, and lack of friction. They just want to do df.plot() and get on with their day. That's why I definitely support this idea.

Regarding the decoupling: I think pandas should own the API, and if people want to do something beyond that, they should look to using the native plotting APIs. That comports with my observation that people who most want df.plot want it because it is frictionless and has simple, clear expectations. The value in multiple backends is that they all do the same general things, but offer a path to exploiting specific benefits inherent in the returned objects. Along those lines, (and without knowing anything yet about actual Pandas internals) I'd propose a decoupling along these lines as a starting point for discussion:

_registered = {}

def _register_mpl():
    # same pattern as _register_bokeh, pointing at matplotlib's registration function
    ...

def _register_bokeh():
    try:
        # Bokeh defines where its "real" register func lives, commits to keeping it there
        from bokeh.compat import register_pandas
        return register_pandas()
    except Exception:
        return None

# defined after the loader functions so the names exist when the dict is built
_registerers = {
   "mpl"   : _register_mpl,
   "bokeh" : _register_bokeh,
}

def plot(self, *args, backend_name="mpl", **kw):
    # maybe backend_name comes from global settings, or whatever, just an illustration

    if backend_name not in _registerers:
        raise RuntimeError("Unknown backend %r" % backend_name)

    # load each backend once and cache it
    if backend_name not in _registered:
        _registered[backend_name] = _registerers[backend_name]()
    backend = _registered[backend_name]
    if backend is None:
        raise RuntimeError("Error loading %r backend" % backend_name)

    return backend.plot(self, *args, **kw)

Pandas would own all of that. It requires a commitment from known backends to maintain the "real" registration function (that lives in the respective projects) in a stable place so that register_foo always functions, and that the object returned implements the "Pandas plotting API", but otherwise puts the burden of defining that function and how it does what it does, on the individual projects.

ellisonbg · 2017-10-03T18:10:14Z

In general, it is brittle and painful to standardize architecture and extensibility around Python APIs. We have seen this many, many times in building different parts of Jupyter. The right way to do this is to build a declarative formal JSON schema that serves as the common exchange format and contract between pandas and different libraries which render visualizations. I would advocate for using Vega-Lite as that JSON schema, but that point is much less important than the bigger idea of using a JSON schema for this. Some of the benefits of this approach:

- The pandas and renderer Python APIs are free to evolve as needed, while keeping the JSON schema fixed.
- Natural serialization format.
- Opens the door for other languages to interoperate.
- Can build generic tools which transform that JSON data to other JSON data (vega, vega-lite, bokeh, plotly).
- In principle, with Jupyter's MIME-type based rendering, you could even build a completely frontend-based renderer for that data.

ping @rgbkrk who is an advocate of "JSON schema all the things"

bryevdv · 2017-10-03T18:57:17Z

The right way to do this is to build a declarative formal JSON schema that serves as the common exchange format and contract between pandas and different libraries which render visualizations.

As we have found out and finally rectified after a long time with Bokeh, "JSON for everything" is inordinately slow for many use cases. I'm definitely not personally interested in expending very-limited bandwidth on a JSON-only solution. WRT difficulties around standardizing APIs, I am not certain the specific issues from Jupyter's history generalize everywhere.

pzwang · 2017-10-03T20:49:36Z

@ellisonbg you bring up an interesting point, that if the Pandas devs want to "own plotting", then outputting a JSON-based visualization spec would be the most flexible and accurate approach to doing that. However, the roundtrip through JSON-land is nontrivial - not merely from a logical mapping perspective, but also from the perspective of performance. In the most common case, directly calling matplotlib on a large dataframe is extremely fast. Similarly, there's no reason why Datashader or Bokeh server can't also be similarly fast on large dataframes. However, round-tripping those datasets through an encode/decode process to JSON would be quite painful. (And that's not even considering the use cases of e.g. GeoPandas, with tons of shape geometry data.)

My understanding is that the Pandas devs already have a plotting API on the plot object, namely, the ['area', 'bar', 'barh', 'box', 'density', 'hexbin', 'hist', 'line', 'pie', 'scatter'] methods, which defines their expectations of the API that the plotting backends must adhere to. At that point, it's on the viz library developers to properly implement those functions.

wesm · 2017-10-03T20:57:49Z

However, round-tripping those datasets through an encode/decode process to JSON would be quite painful. (And that's not even considering the use cases of e.g. GeoPandas, with tons of shape geometry data.)

I spoke with @bryevdv about this at some length during the Strata conference. There would be a great benefit to standardizing on a flexible binary zero-copy protocol for moving data (and column types) from pandas to JS libraries. Apache Arrow is the obvious candidate for this task, as we can already emit Arrow binary streams from Python and receive them in JavaScript (though what's been implemented on the JS side as far as glue with other frameworks is very limited at the moment). We have some other invested parties who may be able to assist with some of the development work to make this easy for us to do (@trxcllnt, @lmeyerov, and others)

The Arrow metadata is designed to accommodate user-defined types, so we could conceivably (with a bit of elbow grease) embed the geo data in an Arrow table column and send that as a first-class citizen.

I am not sure what all would be required from here to make this work seamlessly, but to have a list of requirements and next steps would be useful and give the community a chance to get to work.

ellisonbg · 2017-10-03T20:58:22Z

Sorry I wasn't clear - I would keep the data separate and only specify the visual encodings, marks, etc in the JSON. The actual data transfer could be done with either arrow or full pandas data frames. The rendering libraries could deal with the combination of JSON viz spec + DataFrame
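To illustrate the "JSON spec for encodings, data kept separate" split, here is a Vega-Lite-style spec (a mark plus encoding channels with field and type) for a line chart, with a hand-rolled check against the DataFrame's column names. The column names date and price are made up, and check_spec is a hypothetical stand-in for validation against a real JSON Schema.

```python
import json

# Only marks and encodings, no data: the data travels separately (a DataFrame
# in-process, or e.g. an Arrow stream) and the renderer combines the two.
line_spec = {
    "mark": "line",
    "encoding": {
        "x": {"field": "date", "type": "temporal"},
        "y": {"field": "price", "type": "quantitative"},
    },
}

_ALLOWED_TYPES = {"quantitative", "temporal", "ordinal", "nominal"}


def check_spec(spec, columns):
    """Hand-rolled structural check against the columns of the accompanying data."""
    errors = []
    if not isinstance(spec.get("mark"), str):
        errors.append("'mark' must be a string")
    for channel, enc in spec.get("encoding", {}).items():
        if enc.get("field") not in columns:
            errors.append(f"{channel}: unknown field {enc.get('field')!r}")
        if enc.get("type") not in _ALLOWED_TYPES:
            errors.append(f"{channel}: bad type {enc.get('type')!r}")
    return errors


payload = json.dumps(line_spec)  # the spec serializes trivially; the data does not have to
```

Note that the "unknown field" check needs the data's columns, which is the kind of check a schema alone cannot do.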


rgbkrk · 2017-10-03T20:59:12Z

To both Brians -- I may surprise you. I'd generally say that I'm a fan of format specifications. Versioned, specified in a manner that someone could write an implementation of the spec and convert it to other formats. At least in Jupyter, we need to be able to work solely with JSON because the notebook format and messaging spec are JSON-based protocols (*). My primary care for specifications is to be able to support more than one language, which means being able to plot in non-Python environments. I'd be more than happy if there were an agreed-upon binary format (even with arrow we can work with it on a web-based frontend or server-side with node).

(*) Caveat: kernels can send arbitrary binary blobs with messages, they're not well specced for use on the protocol though (they do get used by ipywidgets, since they intercept messages and send their own).

Pandas luckily can return a standardized table schema thanks to @TomAugspurger and others, so I'm pretty happy in this regard for having something interoperable that isn't tied to a particular visualization. It ticks some basic boxes for the small cases that JSON formats are totally fine for.

trxcllnt · 2017-10-03T21:31:58Z

@rgbkrk heads up, our Arrow lib is now part of the official Apache/Arrow project. The package name on npm will stay the same, intending to release 0.1.3 in the next few days 🎉

rgbkrk · 2017-10-03T21:34:40Z

niiiiice

TomAugspurger · 2017-10-04T17:34:21Z

I've started a (super hacky) version of this over at master...TomAugspurger:plotting-plugin

For engine authors, there's a base class where you can override the various .line, .scatter, etc, and a function register_engine to make pandas aware of your implementation.

rs2 · 2017-10-04T20:44:17Z

@TomAugspurger:
Given there is DataFrame.pipe(func, **kwargs), are you suggesting that each plotting library implements callbacks to produce line, bar, scatter etc. plots for a given dataframe and a set of kwargs? This could be optimal for a number of reasons:

- Keep the majority of plotting code outside of pandas;
- Core developers of each plotting library are probably more familiar with best practices for translating kwargs into plots;
- pandas developers don't have to worry about changes in plotting libraries' APIs (e.g. there have been significant changes between bokeh 0.11.1 and 0.12.9).

Challenges:

Get core developers' time to actually implement the callbacks.

TomAugspurger · 2017-10-04T20:53:15Z

@rs2 yeah that sounds about right (I wouldn't call them callbacks though, and it won't be using .pipe).

Any library wishing to take over .plot can subclass the BasePlotMethods class in master...TomAugspurger:plotting-plugin and register their implementation with pandas. When the user configures pandas to use that backend, pandas will ensure that the .plot call is passed through to the correct implementation.

tacaswell · 2017-10-07T16:41:49Z

Could pandas also provide some helper functions for down-selecting the data frames to just the columns of interest / doing aggregations?

I think it would also make sense for the API to provide a semantic set of inputs (e.g. x and y_list to plot) to guide the implementations.

Can this be py3 only so we can use required keyword arguments?
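A small sketch of both suggestions, assuming a plain dict of columns stands in for a DataFrame: a helper that down-selects the needed columns, and a backend entry point whose x and y_list are required keyword-only arguments (Python 3 syntax after the bare *). select_columns and line are hypothetical names.

```python
def select_columns(data, columns):
    """Down-select a column mapping to just the columns a chart needs."""
    missing = [c for c in columns if c not in data]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return {c: data[c] for c in columns}


def line(data, *, x, y_list):
    """Backend entry point with required keyword-only arguments (Python 3 only)."""
    subset = select_columns(data, [x, *y_list])
    # A real backend would draw here; this returns the (x, y) pairs it would plot.
    return [(subset[x], subset[y]) for y in y_list]


data = {"t": [0, 1, 2], "a": [1, 2, 3], "b": [3, 2, 1], "c": [0, 0, 0]}
pairs = line(data, x="t", y_list=["a", "b"])
```

Both line(data, "t", ["a"]) and line(data, x="t") fail with TypeError, so a misspelled or forgotten argument is caught at the call rather than inside the backend.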

@ellisonbg I don't see a big difference between a Python API with fixed kwargs and a JSON schema which embeds the function name as one of the keys. If you need it in JSON format, it should be up to the plotting library to do that translation and export as JSON if required.

ellisonbg · 2017-10-07T17:21:10Z

@tacaswell - great question. A few things we have observed in building things like this:

1. Imperative Python APIs tend to be leaky and end up with edge cases that are implicitly defined by behavior of a particular implementation or documentation (or lack thereof).
2. Because Python doesn't have things such as public/private, interfaces and static typing (unless you are willing to jump to python 3 only!), the contracts made by Python APIs are hard to enforce and end up relying on those informal contracts of documentation and behavior. This is very different from a JSON schema which is formal and can be validated at run time (structure and types).
3. A Python API is a stronger constraint on both Pandas and rendering libraries from the versioning perspective. A JSON schema allows both Pandas and the underlying renderers to evolve their public API to best serve users, while still following the contract of the JSON schema. A well versioned JSON schema allows a renderer to support multiple versions of that schema in a single Python package.
4. Serialization. With a JSON schema, serialization is a solved problem and users can instantly serialize their visualizations to files and send them over the network, all while allowing different renderers to be used as needed. It also makes it possible to build frontend based renderers with ease.
5. It is easy to build a set of declarative tests and examples that renderers should be able to run. With a little bit of work, tests can even be autogenerated to make it easy to follow the JSON schema.

However, if there isn't support for a JSON schema based approach, I would love to see this be Python 3 only so at least the Python API can be strongly typed and use required kwargs.

tacaswell · 2017-10-11T05:49:50Z

It is not clear to me that JSON schema based communication will be any better. The hard part is still the semantics, names, and intentions. We can end up with implementation-specific behavior that leaks across a boxplot JSON schema just as easily as across a normal function call. The weirdest bits of leaky API tend to be when the meaning of a value in one parameter depends on the value of another, which is something that JSON cannot help us with.
using explicit kwargs at least gets us checks that the keys are spelled correctly. It is not clear that we want these APIs to be deeply nested (that is the top level inputs to these should not be dicts). The places where I would expect the most pain would be things like "the user asked us to aggregate on a column which is not in the dataframe" which I do not think schemas can help us with (and maybe should be a basic check that pandas does for us). I am not sure I am ready to throw the duck-typing baby out with the bath water....
The renderers are going to have to register with pandas; keeping around different versions and registering the correct ones based on the version of pandas installed is not hard to manage (I have managed this at my day job, it is not so bad), or they can provide a version key with the registration function to pandas.
This is very specific to JSON-based plotting libraries; what you register with pandas looks something like

def do_boxplot(data, **kwargs):
    json = build_my_json_of_boxplot_and_validate(kwargs)
    return data, json

For those of us with native Python plotting libraries (well, fine, me ;) ), it seems natural to restrict the JSON-related things to the JSON libraries. It also lets you deal with any schema differences between different JSON-based plotting libraries in Python and gives libraries a chance to do any data preprocessing before exporting.

It is not clear to me how you could do better than smoke tests without a human in the loop.

Fundamentally I think the two approaches are functionally equivalent (I'm less worried about static typing because I render in the same process in Python, so I get nice error messages rather than it rendering who knows where in a browser that happily eats all exceptions 😈 ). Expressing the API with a schema is reasonable (and auto-generating the pandas side of the API?), but I am not convinced that the value add is worth the effort over just writing the API to begin with.

If this goes the JSON route mpl will just write functions that look like

import json

def do_boxplot(data, json_kwargs):
    json_kwargs = json.loads(json_kwargs)  # ok, I may be being pedantic here
    return really_do_boxplot(data, **json_kwargs)

but it seems odd to me to run an API for python libraries to talk to each other through JSON.

I am 100% on board with this being python3 only 👍

I should also be clear, I very much like JSON / JSON schema in general, I am just not convinced that it is the right thing to do in this case.

@TomAugspurger Have you considered using a SimpleNamespace containing functions that look like sig(data, *, k1, k2, ...) ? Seems nicer from the implementer side to not have to subclass something from pandas and keeps us from ever seeing any of the pandas internals of how the plot accessor is implemented.

flavianh · 2019-07-27T07:09:58Z

@jreback @datapythonista I can't get it to work:

pd.set_option('plotting.backend', 'plotly.plotly')
df.plot(x='created_at', y='updated_at', kind='scatter')

gives me

---------------------------------------------------------------------------
PlotlyError                               Traceback (most recent call last)
<ipython-input-32-3180a7c770d0> in <module>
      1 pd.set_option('plotting.backend', 'plotly.plotly')
----> 2 open_pulls_df.plot(x='created_at', y='updated_at', kind='scatter')

~/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pandas/plotting/_core.py in __call__(self, *args, **kwargs)
    736         if kind in self._dataframe_kinds:
    737             if isinstance(data, ABCDataFrame):
--> 738                 return plot_backend.plot(data, x=x, y=y, kind=kind, **kwargs)
    739             else:
    740                 raise ValueError(

~/.pyenv/versions/3.7.3/lib/python3.7/site-packages/chart_studio/plotly/plotly.py in plot(figure_or_data, validate, **plot_options)
    223     """
    224     import plotly.tools
--> 225     figure = plotly.tools.return_figure_from_figure_or_data(figure_or_data, validate)
    226     for entry in figure['data']:
    227         if ('type' in entry) and (entry['type'] == 'scattergl'):

~/.pyenv/versions/3.7.3/lib/python3.7/site-packages/plotly/tools.py in return_figure_from_figure_or_data(figure_or_data, validate_figure)
   1130         validated = True
   1131     else:
-> 1132         raise exceptions.PlotlyError("The `figure_or_data` positional "
   1133                                      "argument must be "
   1134                                      "`dict`-like, `list`-like, or an instance of plotly.graph_objs.Figure")

PlotlyError: The `figure_or_data` positional argument must be `dict`-like, `list`-like, or an instance of plotly.graph_objs.Figure

and

pd.set_option('plotting.backend', 'plotly')
df.plot(x='created_at', y='updated_at', kind='scatter')

gives me

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-36-63d506018c44> in <module>
      1 pd.set_option('plotting.backend', 'plotly')
----> 2 open_pulls_df.plot(x='created_at', y='updated_at', kind='scatter')

~/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pandas/plotting/_core.py in __call__(self, *args, **kwargs)
    736         if kind in self._dataframe_kinds:
    737             if isinstance(data, ABCDataFrame):
--> 738                 return plot_backend.plot(data, x=x, y=y, kind=kind, **kwargs)
    739             else:
    740                 raise ValueError(

AttributeError: module 'plotly' has no attribute 'plot'

There are no docs on how this works, so I'm stuck. Tested with pandas 0.25.0 against plotly 3.10.0 and plotly 4.0.0.

datapythonista · 2019-07-27T07:23:22Z

Thanks @flavianh for reporting. Unfortunately it's not feasible to plot with arbitrary libraries; we can only plot with libraries that implement our interface. Plotly has plans to work on it, but I think development hasn't started yet; I guess it will take a few months. There is work being done in hvplot and altair to make these libraries compatible with the new API, but that is not available yet.
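To make "libraries that implement our interface" concrete: the tracebacks above show pandas 0.25 calling `plot_backend.plot(data, x=x, y=y, kind=kind, **kwargs)` on the module named by the `plotting.backend` option. The following is a minimal stdlib-only sketch under that assumption. `load_backend`, `toy_backend` and `bare_backend` are hypothetical names, and the loader is a simplified stand-in, not pandas' actual code. It shows why a module with no module-level `plot` (like the top-level `plotly` package in the AttributeError traceback) is rejected, while a module exposing one is accepted. `'plotly.plotly'` does expose a `plot`, but it expects a figure rather than a DataFrame, so it fails later, inside plotly.

```python
import importlib
import sys
import types


def load_backend(name):
    """Hypothetical, simplified loader: import a module by name and check that
    it exposes a module-level ``plot`` callable."""
    module = importlib.import_module(name)
    if not callable(getattr(module, "plot", None)):
        raise ValueError(
            f"{name!r} is importable but has no module-level 'plot' function, "
            "so it does not implement the plotting backend interface"
        )
    return module


def _plot(data, x=None, y=None, kind="line", **kwargs):
    # Same call shape as in the traceback: plot(data, x=x, y=y, kind=kind, **kwargs).
    # A real backend would build and return a figure; this toy returns a summary.
    return {"kind": kind, "x": x, "y": y, "options": kwargs}


# Toy backend module, registered in sys.modules so import_module can find it.
toy = types.ModuleType("toy_backend")
toy.plot = _plot
sys.modules["toy_backend"] = toy

# A module without ``plot``, standing in for a library that was never adapted.
sys.modules["bare_backend"] = types.ModuleType("bare_backend")
```

Here `load_backend("toy_backend").plot(...)` succeeds, while `load_backend("bare_backend")` raises an error that names the missing function instead of a bare AttributeError.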

AFAIK the only library you can use at the moment is the latest version of https://github.com/PatrikHlobil/Pandas-Bokeh. I don't think it is as mature as the matplotlib plotting we provide, and I wouldn't use it in production code, but it should be helpful for interactive plots in a notebook.

It would be very useful if you could open a pull request to clarify all this in the documentation you were following, so we don't mislead other users the way we misled you. Thank you in advance!

flavianh · 2019-08-05T06:26:52Z

I think I read the changelog, which mentions this very issue. I may have skimmed the issue and thought I could try plotly. I looked around in the documentation and it seems quite clear, especially here: "plotting.backend : str The plotting backend to use. The default value is “matplotlib”, the backend provided with pandas. Other backends can be specified by providing the name of the module that implements the backend. [default: matplotlib] [currently: matplotlib]".

jreback added the Visualization (plotting), API Design, and Needs Discussion (requires discussion from core team before further action) labels Aug 31, 2016

jreback added this to the 0.20.0 milestone Aug 31, 2016

jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017

TomAugspurger mentioned this issue Oct 4, 2017

Feature: implement Bokeh dataframe plotability #6962

Closed

philippjfr mentioned this issue Nov 21, 2017

Streaming dataframe visualization python-streamz/streamz#126

Open

jbednar mentioned this issue Jan 8, 2018

Add HoloViews based plotting API python-streamz/streamz#129

Closed

philippjfr mentioned this issue May 30, 2018

HoloViews based plotting API pydata/xarray#2199

Closed

datapythonista mentioned this issue May 15, 2019

PLOT: Split matplotlib specific code from pandas plotting #26414

Merged


datapythonista mentioned this issue Jun 10, 2019

PLOT: Add option to specify the plotting backend #26753

Merged


jreback modified the milestones: Contributions Welcome, 0.25.0 Jun 21, 2019

datapythonista closed this as completed in #26753 Jun 21, 2019
