Add KDE functionality to hist and hist2d plots #33

Open
lukelbd opened this issue Sep 5, 2019 · 3 comments

Comments

@lukelbd
Collaborator

lukelbd commented Sep 5, 2019

I'd like to add KDE (kernel density estimation) functionality for the 1D and 2D histogram plotting functions, hist, hist2d, and maybe hexbin. Users can then optionally add marginal distribution panels with panel_axes.

Currently, the only matplotlib plotting function that supports KDE is violinplot, but the result often looks bad -- the "violins" do not smoothly taper to zero-width tails like in seaborn. Instead they cut off abruptly at the distribution minimum and maximum. So we shouldn't try to use the existing KDE engine -- we should implement a new one, similar to seaborn's, and use it to power hist, hist2d, and violinplot. This may involve writing a new violinplot from scratch.
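
To make the tapering point concrete, here is a minimal sketch of a 1D Gaussian KDE evaluated on a grid that extends past the data range. It is not proplot or seaborn code: the function name `gaussian_kde_1d` and the `bw`, `cut`, and `num` parameters are assumptions made for illustration (`cut` is similar in spirit to the `cut` parameter of seaborn's violinplot), and the bandwidth default is Scott's rule of thumb.

```python
import numpy as np


def gaussian_kde_1d(data, bw=None, cut=3.0, num=200):
    """Evaluate a Gaussian KDE on a grid padded beyond the data range.

    Hypothetical helper for illustration only, not an existing API.
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    if bw is None:
        # Scott's rule of thumb for a 1D Gaussian kernel.
        bw = data.std(ddof=1) * n ** (-1 / 5)
    # Pad the grid by `cut` bandwidths on each side so the curve can decay
    # toward zero instead of stopping at the sample minimum/maximum.
    grid = np.linspace(data.min() - cut * bw, data.max() + cut * bw, num)
    # Sum one Gaussian bump per sample, then normalize to unit area.
    z = (grid[:, None] - data[None, :]) / bw
    density = np.exp(-0.5 * z**2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))
    return grid, density


rng = np.random.default_rng(0)
sample = rng.normal(size=500)
grid, density = gaussian_kde_1d(sample)
```

With `cut=0` the grid would stop at the sample extremes, which is the abrupt cutoff described above. A larger `cut` trades a little extra plot range for tails that visibly reach zero.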

lukelbd added the feature label Sep 6, 2019
lukelbd changed the title from "Add KDE functionality to histograms, box/violin plots, hexbin-style plots" to "Add KDE functionality to hist and hist2d plots" Sep 6, 2019
@bugsuse

bugsuse commented Apr 19, 2021

It's a very nice feature! I can hardly wait!

lukelbd added this to the Version 0.8 milestone Jul 21, 2021
lukelbd modified the milestones: Version 0.8 (removed), Version 0.9 (added) Aug 19, 2021
@Jhsmit

Jhsmit commented Aug 22, 2021

I made some custom KDE graphs for a publication recently:

[image]

This type of graph is called a 'raincloud plot': https://wellcomeopenresearch.org/articles/4-63 (the paper has a Python implementation, but it is based on seaborn and therefore has the same problems)
The code for the KDE part of the graph is here: https://github.com/Jhsmit/PyHDX-paper/blob/master/biorxiv_v2/functions/rainbows.py

Feel free to use the code in proplot if you find it useful (although some parts are from joyplot). I'm using scipy's kde function, which mostly works fine, but it can be slow when you have a lot of data points, especially in the 2D case.
I might try to find time to make a PR myself; I'm a big fan of proplot. I started using it for my last publication, and its subplot layout and sizing options really made my life a lot easier :) (paper / code)
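
For readers unfamiliar with the scipy approach mentioned above, a minimal sketch of a 2D estimate with `scipy.stats.gaussian_kde` might look like the following. The synthetic data, grid limits, and 50x50 grid size are assumptions chosen for illustration and are not taken from the linked code.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# gaussian_kde expects an array of shape (n_dims, n_points).
xy = rng.normal(size=(2, 2000))

kde = gaussian_kde(xy)  # bandwidth defaults to Scott's rule

# Evaluate on a deliberately coarse grid. The work grows with
# n_points * n_grid_points, which is why large 2D cases get slow.
xs = np.linspace(-4, 4, 50)
ys = np.linspace(-4, 4, 50)
xx, yy = np.meshgrid(xs, ys)
points = np.vstack([xx.ravel(), yy.ravel()])
density = kde(points).reshape(xx.shape)
```

The resulting `density` array can be passed straight to a contour or pcolor-style call along with `xs` and `ys`.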

I'll try to provide some feedback to proplot if that helps you.
PS: perhaps you could connect your repository to Zenodo so that the project can be cited.

@lukelbd
Collaborator Author

lukelbd commented Aug 23, 2021

Thanks for the code! This is a good base for adding KDE functionality -- I probably won't have time to work on this until later this year, but I'm happy to accept PRs if you feel inclined or want it sooner. Proplot's source code recently underwent some major improvements, so it should be much easier to contribute.

We probably want to add the following:

  1. Add a shared helper function at the top of axes/plot.py that handles KDE for the various plotting functions. Users should be able to pass keyword arguments through to the KDE algorithm from the plotting functions.
  2. Add a violinplot option to plot a left- or right-half violin (like in your example), maybe with the argument side='left' or side='right' (or side='top' or side='bottom' for horizontal violins), with side='both' being the default.
  3. Rewrite violinplot to use your KDE method rather than matplotlib's. It would probably just call fill_between or fill_betweenx, and then you can add outlines to the violins like you would for any other patch. It would still be able to add error bars/boxes using the shared PlotAxes._apply_bar method.
  4. Add a raincloudplot method (with the shorthand raincloud, consistent with other plotting commands) as a thin wrapper that calls boxplot, violinplot, and scatter. It would call boxplot and violinplot with reduced default widths and a default of side='left' or side='top' for the violins.
  5. Make violinplot have no colormap gradations by default, but let users add them by passing cmap='name' to violinplot or raincloudplot (it should also accept vmin and vmax arguments, with the defaults set to the minimum and maximum of all the distributions). To implement colormap gradations, violinplot will set the facecolor of the patch to 'none' (i.e., completely transparent) so that an imshow can be drawn underneath the patch border and "clipped" by the border coordinates, as you've done in your code.
  6. Add kdeplot and kdeplot2d commands (with shorthands kde and kde2d, consistent with other functions) that show KDEs using lines and contours (respectively). They should be thin wrappers around plot and contour/contourf, similar to how hist and hist2d are thin wrappers around bar and pcolor.
  7. Add the ability to pass kde=True to hist and hist2d, which will draw the kde and kde2d results on top of the histograms. This is analogous to the current ability to pass linewidth=N to contourf, where proplot adds an additional contour plot on top of the filled contours. KDE-algorithm or KDE-styling keywords could be passed to hist and hist2d with kde_kw={key: value, ...}, analogous to various other arguments ending in _kw.
  8. Update the user guide with lots of examples! By the time all of these features are added we'd probably need a "Statistical plotting" section separate from the current "1d plotting" and "2d plotting" sections.
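
A rough sketch of how items 1, 2, 3, and 7 could fit together, using plain matplotlib and numpy rather than proplot internals. Everything named here (`_kde_1d`, `half_violin`, `hist_with_kde`, and the `bw`/`cut`/`num` keys) is hypothetical and only illustrates the proposed `side` and `kde_kw` arguments. It handles vertical violins only and is not an implementation of the final API.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

_KDE_KEYS = ("bw", "cut", "num")  # hypothetical algorithm keywords


def _kde_1d(data, bw=None, cut=3.0, num=200):
    """Hypothetical shared helper (item 1): Gaussian KDE on a padded grid."""
    data = np.asarray(data, dtype=float)
    if bw is None:
        bw = data.std(ddof=1) * data.size ** (-1 / 5)  # Scott's rule
    grid = np.linspace(data.min() - cut * bw, data.max() + cut * bw, num)
    z = (grid[:, None] - data[None, :]) / bw
    norm = data.size * bw * np.sqrt(2 * np.pi)
    return grid, np.exp(-0.5 * z**2).sum(axis=1) / norm


def half_violin(ax, data, position, side="both", width=0.8, kde_kw=None, **kwargs):
    """Draw a vertical violin at x=position with fill_betweenx (items 2 and 3)."""
    if side not in ("left", "right", "both"):
        raise ValueError(f"Invalid side {side!r}.")
    grid, density = _kde_1d(data, **(kde_kw or {}))
    half = 0.5 * width * density / density.max()  # scale to the requested width
    flat = np.full_like(half, position)  # straight edge for a half violin
    left = position - half if side in ("left", "both") else flat
    right = position + half if side in ("right", "both") else flat
    return ax.fill_betweenx(grid, left, right, **kwargs)


def hist_with_kde(ax, data, kde=False, kde_kw=None, **kwargs):
    """Density histogram with an optional KDE line drawn on top (item 7)."""
    result = ax.hist(data, density=True, **kwargs)
    if kde:
        kde_kw = dict(kde_kw or {})
        # Split algorithm keywords from styling keywords.
        algo_kw = {k: kde_kw.pop(k) for k in _KDE_KEYS if k in kde_kw}
        grid, density = _kde_1d(data, **algo_kw)
        ax.plot(grid, density, **kde_kw)  # remaining keys style the line
    return result


rng = np.random.default_rng(1)
samples = [rng.normal(loc, 1.0, size=300) for loc in (0.0, 1.5)]
fig, (ax1, ax2) = plt.subplots(1, 2)
violins = [half_violin(ax1, s, i, side="left", alpha=0.6) for i, s in enumerate(samples)]
hist_with_kde(ax2, samples[0], kde=True, kde_kw={"bw": 0.3, "color": "k"})
```

Because `fill_betweenx` returns a single collection per violin, outlines can be set with the usual `edgecolor` and `linewidth` keywords. A raincloud-style wrapper (item 4) would then only need to combine `half_violin` with a narrow boxplot and a scatter of the raw points.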

And glad you find proplot useful :) It's already published on Zenodo, but that probably wasn't clear -- there was just a Zenodo badge on the GitHub home page. I've now added a link to the readthedocs homepage.
