ENH: plot method accessors #9321

Merged
merged 1 commit into from
Sep 11, 2015

shoyer
Member

@shoyer shoyer commented Jan 21, 2015

Fixes #9124

This PR adds plotting sub-methods like df.plot.scatter() as an alternative to using df.plot(kind='scatter').

I've added meaningful function signatures and documentation for a few of these methods, but I would greatly appreciate help to fill in the rest -- this is a lot of documentation to assemble/reconstruct! The entire point of this PR, of course, is to have better introspection and docstrings.

Todo list:

  • Basic docstrings/signatures
    • area
    • line
    • bar
    • barh
    • box
    • hexbin
    • hist
    • kde/density
    • pie
    • scatter
  • Write tests for the methods
  • Fix groupby plots (tests currently failing)
  • Plotting docs
  • API docs
  • Release notes
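
For readers unfamiliar with the pattern, here is a minimal pure-Python sketch of how an accessor can be both callable and a namespace of per-kind methods. `ToyPlotMethods` and `ToyFrame` are hypothetical stand-ins, not the pandas classes, and the "plot" is only a returned description rather than a matplotlib figure.

```python
class ToyPlotMethods:
    """Toy plotting accessor: callable, and a namespace of per-kind methods."""

    def __init__(self, data):
        self._data = data

    def __call__(self, kind="line", **kwds):
        # Stand-in for real plotting: describe what would be drawn.
        return {"kind": kind, "columns": sorted(self._data), "options": kwds}

    def line(self, **kwds):
        # Each sub-method simply delegates to __call__ with a fixed kind.
        return self(kind="line", **kwds)

    def scatter(self, x, y, **kwds):
        # Kind-specific arguments are explicit, so they appear in help().
        return self(kind="scatter", x=x, y=y, **kwds)


class ToyFrame:
    """Hypothetical container exposing the accessor as ``.plot``."""

    def __init__(self, data):
        self._data = data

    @property
    def plot(self):
        return ToyPlotMethods(self._data)


df = ToyFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
via_method = df.plot.scatter("x", "y", title="demo")
via_kind = df.plot(kind="scatter", x="x", y="y", title="demo")
```

Under these assumptions both spellings produce the same result; the sub-method only adds a more specific signature and docstring.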

@TomAugspurger
Contributor

I'll do a PR against your branch for the rest of the docstrings this Saturday unless someone else gets there first.

@TomAugspurger
Contributor

Thanks for doing this!

@sinhrks
Member

sinhrks commented Feb 2, 2015

The current approach is nice for docstrings, and thus for users, but I feel it makes consistent maintenance a little difficult, because maintainers have to take care of all the separate functions. I think it would be better to have a table that organizes which options are available for each plotting function. Maybe on the wiki?

I'm willing to do it, but it will take a while, as each function has options not listed in the signature, for example:
https://github.com/pydata/pandas/blob/master/pandas/tools/plotting.py#L1428

@shoyer
Member Author

shoyer commented Feb 2, 2015

@sinhrks Agreed, this is going to be tricky. For a first draft (this PR), I think use of **kwargs for some options is unavoidable (every function will have **kwargs in the signature). Eventually, we can try to stop using it but practically that may not be possible.

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
@TomAugspurger
Contributor

I hope to get the docstrings done tonight / tomorrow.

@TomAugspurger
Contributor

This is a rough list of which kinds accept which kws. It should be conservative; I think we can prune a bit further.

I'm going to try to come up with a hierarchy later, to factor out common kws.

line

  • data
  • x
  • y
  • use_index
  • ax
  • subplots
  • sharex
  • sharey
  • layout
  • grid
  • title
  • legend
  • logx
  • logy
  • loglog
  • xticks
  • yticks
  • rot
  • xlim
  • ylim
  • fontsize
  • secondary_y
  • mark_right
  • figsize
  • style
  • colormap
  • position
  • table
  • yerr
  • xerr
  • stacked
  • sort_columns
  • kwds

bar / barh

  • data
  • x
  • y
  • use_index
  • ax
  • subplots
  • sharex
  • sharey
  • layout
  • grid
  • title
  • legend
  • logx
  • logy
  • loglog
  • xticks
  • yticks
  • rot
  • xlim
  • ylim
  • fontsize
  • secondary_y
  • mark_right
  • figsize
  • style
  • colormap
  • position
  • table
  • yerr
  • xerr
  • stacked
  • sort_columns
  • kwds

hist

  • column
  • by
  • x
  • y
  • use_index
  • ax
  • subplots
  • sharex
  • sharey
  • layout
  • grid
  • title
  • legend
  • logx
  • logy
  • loglog
  • xticks
  • yticks
  • rot
  • xlim
  • ylim
  • fontsize
  • secondary_y
  • mark_right
  • figsize
  • style
  • colormap
  • table
  • stacked
  • sort_columns
  • kwds

box

  • ax
  • subplots
  • sharex
  • sharey
  • layout
  • grid
  • title
  • legend
  • logy : isn't computed correctly.
  • rot
  • xlim
  • ylim
  • fontsize
  • figsize
  • table
  • kwds

kde / density

  • ax
  • subplots
  • sharex
  • sharey
  • layout
  • grid
  • title
  • legend
  • logx
  • logy
  • loglog
  • xticks
  • yticks
  • rot
  • xlim
  • ylim
  • fontsize
  • secondary_y
  • mark_right
  • figsize
  • style
  • colormap
  • colorbar
  • table
  • kwds

area

  • x
  • y
  • use_index
  • ax
  • subplots
  • sharex
  • sharey
  • layout
  • grid
  • title
  • legend
  • logx
  • logy
  • loglog
  • xticks
  • yticks
  • rot
  • xlim
  • ylim
  • fontsize
  • secondary_y
  • mark_right
  • figsize
  • style
  • table
  • stacked
  • sort_columns
  • kwds

pie

  • y
  • ax
  • subplots
  • layout
  • grid
  • title
  • legend
  • rot
  • xlim
  • ylim
  • fontsize
  • figsize
  • colormap
  • table
  • kwds

scatter

  • x
  • y
  • ax
  • grid
  • title
  • legend
  • logx
  • logy
  • loglog
  • xticks
  • yticks
  • rot
  • xlim
  • ylim
  • fontsize
  • secondary_y
  • mark_right
  • figsize
  • style
  • colormap
  • colorbar
  • position
  • table
  • yerr
  • xerr
  • kwds

hexbin

  • x
  • y
  • ax
  • grid
  • title
  • legend
  • logx
  • logy
  • loglog
  • xticks
  • yticks
  • rot
  • xlim
  • ylim
  • fontsize
  • secondary_y
  • mark_right
  • figsize
  • style
  • colormap
  • colorbar
  • position
  • table
  • yerr
  • xerr
  • gridsize
  • C
  • reduce_C_function
  • kwds
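
One possible way to start factoring out common kws is a plain set intersection over the per-kind lists. The sketch below copies the lists above for three kinds only (box, pie, scatter); the `KEYWORDS` dict and the variable names are illustrative and are not part of pandas.

```python
# Keyword lists copied from the lists above, for three of the kinds.
KEYWORDS = {
    "box": {
        "ax", "subplots", "sharex", "sharey", "layout", "grid", "title",
        "legend", "logy", "rot", "xlim", "ylim", "fontsize", "figsize",
        "table", "kwds",
    },
    "pie": {
        "y", "ax", "subplots", "layout", "grid", "title", "legend", "rot",
        "xlim", "ylim", "fontsize", "figsize", "colormap", "table", "kwds",
    },
    "scatter": {
        "x", "y", "ax", "grid", "title", "legend", "logx", "logy", "loglog",
        "xticks", "yticks", "rot", "xlim", "ylim", "fontsize", "secondary_y",
        "mark_right", "figsize", "style", "colormap", "colorbar", "position",
        "table", "yerr", "xerr", "kwds",
    },
}

# Keywords accepted by every listed kind.
common = set.intersection(*KEYWORDS.values())

# What remains for each kind once the shared keywords are factored out.
specific = {kind: kws - common for kind, kws in KEYWORDS.items()}

print("common:", sorted(common))
for kind in sorted(specific):
    print(kind, "only:", sorted(specific[kind]))
```

Extending `KEYWORDS` to the remaining kinds would shrink `common` further, which gives a rough picture of what a shared base signature could contain.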

@TomAugspurger
Contributor

Unfortunately, the __call__ signature is included in the docstring of the .plot method.

In [9]: df.plot?
Type:        FramePlotMethods
String form: <pandas.tools.plotting.FramePlotMethods object at 0x1087b9860>
File:        /Users/tom/miniconda3/envs/py3/lib/python3.4/site-packages/pandas/pandas/tools/plotting.py
Definition:  df.plot(self, x=None, y=None, kind='line', ax=None, subplots=False, sharex=True, sharey=False, layout=None, figsize=None, use_index=True, title=None, grid=None, legend=True, style=None, logx=False, logy=False, loglog=False, xticks=None, yticks=None, xlim=None, ylim=None, rot=None, fontsize=None, colormap=None, table=False, yerr=None, xerr=None, secondary_y=False, sort_columns=False, **kwds)
Docstring:
DataFrameFrame plotting accessor

Examples
--------
>>> df.plot.line()
>>> df.plot.scatter('x', 'y')
>>> df.plot.hexbin()

The plotting methods can also be accessed by directly calling the
accessor with the ``kind`` argument:
``df.plot(kind='line')`` is equivalent to ``df.plot.line()``
Call def:    df.plot(x=None, y=None, kind='line', ax=None, subplots=False, sharex=True, sharey=False, layout=None, figsize=None, use_index=True, title=None, grid=None, legend=True, style=None, logx=False, logy=False, loglog=False, xticks=None, yticks=None, xlim=None, ylim=None, rot=None, fontsize=None, colormap=None, table=False, yerr=None, xerr=None, secondary_y=False, sort_columns=False, **kwds)

I'm not sure this is avoidable.

@TomAugspurger
Contributor

I did some work on this a while back. I'll see how it looks and maybe open up a pull against your branch.

@jreback
Contributor

jreback commented May 9, 2015

@shoyer nice start....for 0.17.0!

@shoyer
Member Author

shoyer commented May 11, 2015

@jreback @TomAugspurger Yes, let's get this in for 0.17. Even if it's just our best guess for the full docstrings, we can fill in the long tail of options later.

@shoyer
Member Author

shoyer commented May 29, 2015

OK, I have been updating this. To keep the complexity under control, so far all I've done is add the kind-specific arguments to each plotting method. So at least it's clear which arguments are relevant for each kind, and these can be used positionally.

I would rather get this out in its somewhat incomplete form and add the long tail of generic keyword arguments (as documented by @TomAugspurger above) later. That should be pretty straightforward at this point, if someone can figure out a way to consolidate the docstring and function signature generation without going insane.

Here's what the new API docs look like (edit: updated to latest function signatures):

[Image: screen shot 2015-05-29 at 2 43 32 pm]

It looks pretty good, except we get "alias of FramePlotMethods" as the API documentation for DataFrame.plot, without any docstring or generated page. @jorisvandenbossche do you know how to fix this? I can dig in if necessary.

@@ -134,6 +135,9 @@ These include:
* :ref:`'hexbin' <visualization.hexbin>` for hexagonal bin plots
* :ref:`'pie' <visualization.pie>` for pie plots

New in version 0.17, you can create these other plots using the method

Contributor

add a ``.. versionadded::`` directive

Member Author

I didn't realize that works on paragraphs

@shoyer
Member Author

shoyer commented Jun 24, 2015

@jorisvandenbossche any ideas for fixing the doc generation? Even just a pointer to the relevant Sphinx docs would be helpful... my blind experiments did not have much luck.

To alleviate this issue, we have added a new, optional plotting interface, which exposes each kind of plot as a method of the ``.plot`` attribute. Instead of writing ``series.plot(kind=<kind>, ...)``, you can now also use ``series.plot.<kind>(...)``:

.. ipython::
:verbatim:

Member

You've got some tabs instead of spaces here

@jorisvandenbossche
Member

@shoyer I took a look at this, and I am afraid I don't directly see a way to solve this.

The problem is that DataFrame/Series.plot is no longer a method, but a class.
So when you do df.plot?, you also get the class docstring.
This fact alone is already a bit annoying, as it will not show the signature the way df.plot? does now.

To get the docstring page built for Series.plot, you have to use another template (see how it is done for cat/str/dt):

.. autosummary::
       :toctree: generated/
       :template: autosummary/accessor.rst

       Series.plot

But that does not solve the issue above of the wrong docstring/wrong signature. The wrong docstring can of course easily be fixed by changing the docstring of the PlotMethods class. But the help from df.plot? will still look like that of a class.
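
The introspection behaviour described here can be reproduced with a toy callable object, with no pandas involved. `ToyPlotMethods` is a hypothetical stand-in; the sketch only shows that the docstring seen on the instance is the class docstring, while the signature reported by `inspect` comes from `__call__`.

```python
import inspect


class ToyPlotMethods:
    """Toy plotting accessor (class docstring)."""

    def __call__(self, x=None, y=None, kind="line", **kwds):
        """Make a plot of the given kind (``__call__`` docstring)."""
        return kind


plot = ToyPlotMethods()

# Attribute lookup on the instance finds the class docstring,
# not the docstring written on __call__.
instance_doc = plot.__doc__

# inspect.signature() on a callable instance reports __call__ without self.
call_signature = str(inspect.signature(plot))

print(instance_doc)
print(call_signature)
```

So a useful class docstring fixes what the help text says, but the object still presents itself as a class instance rather than as a method.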

@shoyer
Member Author

shoyer commented Jun 25, 2015

Well, in the worst case scenario we could generate the rst on the API docs for .plot manually? That wouldn't be too bad

@shoyer
Member Author

shoyer commented Sep 5, 2015

This still needs a major rebase, but I have finally succeeded in generating the right API docs!


Here's what the relevant sections look like on the main API page:

For DataFrame:
[Image: screen shot 2015-09-05 at 12 14 23 pm]

For Series:
[Image: screen shot 2015-09-05 at 12 17 52 pm]


Here's the documentation page for DataFrame.plot:
[Image: screen shot 2015-09-05 at 12 14 40 pm]


And here's an example of the doc pages for one of the submethods. Note that pandas.DataFrame.plot here is a link to the main DataFrame.plot page:
[Image: screen shot 2015-09-05 at 12 15 01 pm]

@jorisvandenbossche
Member

Nice hacks to get the autosummary working! :-) Glad you found a solution

@@ -473,8 +473,8 @@ def test_plot(self):
ax = _check_plot_works(self.ts.plot, style='.', loglog=True)
self._check_ax_scales(ax, xaxis='log', yaxis='log')

_check_plot_works(self.ts[:10].plot, kind='bar')
_check_plot_works(self.ts.plot, kind='area', stacked=False)
_check_plot_works(self.ts[:10].plot.bar)

Contributor

maybe have a few duplicates where you use the plot accessor AND the function call (with kind). You may have done this, just checking. I mean run the tests twice (once with the accessor, once with the function)

Member Author

Done -- added explicit tests for all plot methods.
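
The idea of exercising both routes can be sketched without matplotlib. Everything here is hypothetical (`ToyPlotMethods`, `check_both_routes`, and the short `KINDS` list); it only illustrates checking that the accessor method and the `kind=` call agree for each kind.

```python
KINDS = ["line", "bar", "barh", "area"]


class ToyPlotMethods:
    """Toy accessor whose 'plot' is just a hashable description."""

    def __call__(self, kind="line", **kwds):
        return (kind, tuple(sorted(kwds.items())))


def _make_method(kind):
    # Build a sub-method that delegates to __call__ with a fixed kind.
    def method(self, **kwds):
        return self(kind=kind, **kwds)

    method.__name__ = kind
    return method


for _kind in KINDS:
    setattr(ToyPlotMethods, _kind, _make_method(_kind))


def check_both_routes(plot, kind, **kwds):
    """Run the same plot through the accessor method and through kind=."""
    via_accessor = getattr(plot, kind)(**kwds)
    via_kind = plot(kind=kind, **kwds)
    assert via_accessor == via_kind
    return via_accessor


results = [check_both_routes(ToyPlotMethods(), kind, stacked=False) for kind in KINDS]
```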

@jreback
Contributor

jreback commented Sep 6, 2015

lgtm.

@sinhrks @TomAugspurger @jorisvandenbossche

needs a rebase / squash

Should the existing docs in visualization.rst change to the new accessor methods? That can go in a follow-up PR (pls make an issue).

-------
axes : matplotlib.AxesSubplot or np.array of them
"""
return self(kind='line', **kwds)

Contributor

You can get argument completion on newer Python versions if you (programmatically on import) add a __signature__ object: see this PR in mpl, which does it for decorators: https://github.com/matplotlib/matplotlib/pull/4829/files#diff-9ffb0d1db51a67ee4f37528e00715b3cR1597

There is also a PR in IPython which would backport this to 2.7 and which mpl will use if IPython is installed.
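
A minimal sketch of the `__signature__` idea using only the standard library `inspect` module (Python 3). The function `line` and the `advertised` list are hypothetical; the keyword names are borrowed from the lists earlier in this thread, and this is not how the linked matplotlib PR or pandas implements it.

```python
import inspect


def line(**kwds):
    """Toy plot method that, by default, only advertises ``**kwds``."""
    return kwds


# Hypothetical subset of keywords to advertise to introspection tools.
advertised = ["x", "y", "ax", "title"]

parameters = [
    inspect.Parameter(name, inspect.Parameter.KEYWORD_ONLY, default=None)
    for name in advertised
]
parameters.append(inspect.Parameter("kwds", inspect.Parameter.VAR_KEYWORD))

# inspect.signature() prefers an explicit __signature__ when one is set.
line.__signature__ = inspect.Signature(parameters)

shown = str(inspect.signature(line))
print(shown)
```

The function body is unchanged, so the advertised names are documentation only; nothing validates them at call time.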

Member Author

Good to know. That's probably worth adding once we figure out all the other valid keyword arguments...

@jorisvandenbossche
Member

For the existing visualization.rst docs, I think it is OK to leave that for now

(also for later: the question is which of the two we want to advertise the most)

@shoyer
Member Author

shoyer commented Sep 10, 2015

Did the big rebase/squash -- let's see if things pass on Travis. Also added two followup issues for documentation. Assuming tests pass, this is ready for a final review.

@@ -116,6 +116,35 @@ Releasing of the GIL could benefit an application that uses threads for user int
.. _dask: https://dask.readthedocs.org/en/latest/
.. _QT: https://wiki.python.org/moin/PyQt

.. _whatsnew_0170.plot:

Contributor

I would add a pointer to this section in the highlights.

Member Author

done

@jreback
Contributor

jreback commented Sep 10, 2015

lgtm.

deprecate .hist? (and point to .plot.hist)
@jorisvandenbossche @sinhrks @TomAugspurger

@@ -672,12 +672,34 @@ the Categorical back to a numpy array, so levels and order information is not pr
Plotting
~~~~~~~~

``Series.plot`` is both a callable method and a namespace attribute for
specific plotting methods of form ``Series.plot.<kind>``.

Contributor

"of form" -> "of the form". Same for DataFrame.plot below.

@jorisvandenbossche
Member

I don't think a deprecation of hist is directly needed at the moment (they also don't have fully compatible signatures). But could certainly add a pointer in the hist docstring.

@TomAugspurger
Contributor

👏

@shoyer
Member Author

shoyer commented Sep 10, 2015

Added another followup issue for deprecating DataFrame.hist: #11053

I think this would be a good idea if we can sort out the API differences.

Fixes pandas-dev#9124

This PR adds plotting sub-methods like `df.plot.scatter()` as an alternative
to using `df.plot(kind='scatter')`.

I've added meaningful function signatures and documentation for a few of these
methods, but I would greatly appreciate help to fill in the rest -- this is a
lot of documentation to write! The entire point of this PR, of course, is to
have better introspection and docstrings.
@jreback
Contributor

jreback commented Sep 10, 2015

ok, unless anyone has anything else, @shoyer merge on green.

shoyer added a commit that referenced this pull request Sep 11, 2015
@shoyer shoyer merged commit 52f4b75 into pandas-dev:master Sep 11, 2015
@shoyer shoyer deleted the PlotMethods branch September 11, 2015 04:42
@jreback
Contributor

jreback commented Sep 11, 2015

I don't think the image in the whatsnew got committed: _static/whatsnew_plot_submethods.png
Can you push it up?

@shoyer
Member Author

shoyer commented Sep 11, 2015

Oops. Just pushed to master: eca9db9

@jreback
Contributor

jreback commented Sep 11, 2015

awesome thanks!

Successfully merging this pull request may close these issues.

API: dedicated plot methods like plot.scatter as an alternative to using kind