Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: feat(python): add plotting backend #13149

Closed
wants to merge 5 commits into from

Conversation

MarcoGorelli
Copy link
Collaborator

@MarcoGorelli MarcoGorelli commented Dec 20, 2023

This has been on the back of my mind for several months, finally couldn't ignore it further

The general idea is:

  • Polars does not implement plotting logic itself. Rather:
    • if you use any plotting functions or arguments as documented here, then you
      should expect consistent behaviour supported backends. Any extra keyword arguments
      you specify are passed directly to the backend itself, and so their behaviour is
      backend-dependent.
    • Polars plots are only intended as convenience methods for ease of use during
      interactive development.
  • In order for a plotting backend to be supported, it must:
    • implement a top-level plot method which follows a certain standardised signature
      This already exists to some extent thanks to some fantastic work by @datapythonista a few years ago. So, Polars can just "piggy-back" off it, and document the expectations

Timeline

If there's appetite for this, I'd suggest proceeding in a direction that's something like:

  • start here with just adding scatter and line, as they're probably the most used quick plots in interactive development
  • make the backend kwarg configurable via pl.Config
  • very slowly and carefully consider adding extra plotting functions. The general principle of adding them would be:
    • is the function already supported in plotly and hvplot backends?
    • would the function be easy to support in hypothetical seaborn and altair backends?
    • only add keyword arguments which work (or could work) consistently for the above backends
    • never implement plotting logic in Polars itself
  • (stretch goal / pipe dream) could we define a DataFrame Plotting API Standard, which would be expected to work for any dataframe library implementing the DataFrame Interchange Protocol and any plotting library implementing a standardised plot function?

Demo

image

image

Where's matplotlib?

matplotlib is the default plotting backend in pandas.plotting, so they never implemented the top-level plot function. So, Polars can't (yet) use them as a plotting backend. But, if we define the rules and expectations, then a matplotlib-backed backend should be feasible, if they were interested! Either way, to re-iterate - no plotting code, whether matplotlib or anything else, would live in Polars itself

Necessary precursors

The following would need to go in first:

Nice to have precursors

So, there's time to update this and make it mergeable, just wanted to start getting feedback. Looks like people are generally supportive anyway, thanks!

@reswqa
Copy link
Collaborator

reswqa commented Dec 20, 2023

Great! I don't know much about plotting, but it's incredible to me that we can support it with such a small amount of code. 👍

Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

170 lines of code. That's maintainable. :D


Polars plotting methods are only intended as convenience methods for ease of
use during interactive development. For more complex plots, it is recommended
to use plotting libraries directly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea. Can we make explicit here that we don't guarantee anything here. It is meant for EDA and we don't want a major version bump if we change anything here.

@stinodego
Copy link
Member

I wholeheartedly support this initiative 👍

In order for a plotting backend to be supported, it must implement a top-level plot method which follows a certain standardised signature

As far as I can tell, this is the Achilles' heel of the approach. There would have to be some standardization effort across plotting libraries to be able to support them as backends. I can already see our issue tracker explode when one of our supported backends changes their API away from whatever we expect.

An alternative approach would be to create adapters for various plotting backends, but that would increase our maintenance burden - though it doesn't have to be too bad.

If we can get some popular backends to conform to our requirements then I am all in favor of the current approach!

@MarcoGorelli
Copy link
Collaborator Author

MarcoGorelli commented Dec 20, 2023

Thanks all for your feedback, I wasn't expecting such a positive response

There would have to be some standardization effort across plotting libraries to be able to support them as backends

It's kinda already happened though, because of pandas, see: https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html#plotting-backends

So as long these libraries keep supporting pandas, and as long as the Polars signatures are strict subsets of the pandas ones, it should be safe!

I'd still like to see a Plotting DataFrame API Standard though, both to future-proof things and to spur more implementations - but I'd still suggest a 2-phase approach:

  • first, just implement a strict subset of what pandas does, and check that it works consistently across supported backends
  • then, codify this into a standard, and slowly expand it with inputs from both dataframe and plotting libraries. I already have buy-in for placing the standard in the data-apis org

EDIT

I've got a 3.5 hour train journey on Sunday to visit relatives, so...maybe that'll be enough time to actually draft out a Plotting DataFrame API Standard and tag relevant people for comments. Will see if I can do this 🤞 I like the idea of bringing so many projects together for such a collaboration 🤝

@ritchie46
Copy link
Member

An alternative approach would be to create adapters for various plotting backends, but that would increase our maintenance burden - though it doesn't have to be too bad.

Yes, I can foresee having some indirection to ensure proper mappings backend: {"plotly": {"plot_def": "plot}}. That would allow our API parameters to follow our standards. In any case that doesn't have to be a blocker for now. We tag this experimental and we learn.

:toctree: api/
:template: autosummary/accessor_method.rst

# DataFrame.plot.line
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this throws

Warning, treated as error:
[autosummary] failed to import polars.DataFrame.plot.line.
Possible hints:
* ImportError: 
* ModuleNotFoundError: No module named 'polars.DataFrame'
* AttributeError: 'property' object has no attribute 'line'
make: *** [Makefile:27: html] Error 2

if I uncomment it, can't make sense of why though

@MarcoGorelli
Copy link
Collaborator Author

170 lines of code. That's maintainable. :D

I think it can (and probably should!) be made simpler - updated version: #13238

@MarcoGorelli
Copy link
Collaborator Author

closing in favour of #13238, thanks all for your inputs!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants