Add UpSet plot function to figure_factory #4204

rickymagner · 2023-05-12T03:52:37Z

This PR is heavily inspired by this one, building on this forum post, which gave a minimal implementation of UpSet plots using the figure_factory. UpSet plots generalize the data usually shown in Venn diagrams more naturally: they scale to more sets, and differences in bar sizes are easier to see than differences in circle sizes. This PR builds on code introduced in the other PR, but vastly extends functionality, adds the characteristic "marginal" side plot, and includes "full" documentation.

I know in the previous PR, it was stated resources are limited for figure_factory PRs, so I hope by trying to make this as "complete" as possible, it'll be much easier to get merged. I'd be happy to discuss the code with any reviewers, and try to provide more details on the code below. As this includes both new features and corresponding documentation, I have both checklists here. I greatly appreciate any feedback on trying to get this PR compliant and improving the code.

Documentation PR

Code PR

I have read through the contributing notes and understand the structure of the package. In particular, if my PR modifies code of plotly.graph_objects, my modifications concern the codegen files and not generated files.
I have added tests (if submitting a new feature or correcting a bug) or modified existing tests.
For a new feature, I have added documentation examples in an existing or new tutorial notebook (please see the doc checklist as well).
I have added a CHANGELOG entry if fixing/changing/adding anything substantial.
For a new feature or a change in behaviour, I have updated the relevant docstrings in the code to describe the feature or behaviour (please see the doc checklist as well).

Notes on the Code

To make it easier to review, I'll provide a brief description of the code layout. The code is somewhat similar to the create_quiver method in the figure_factory. The main create_upset method creates an instance of the _Upset class using the user inputs. Aside from a few utilities for preprocessing, most of the plot-generating methods are contained in this class. This structure makes it a little easier for the major conceptual steps in generating the plot to share data freely through the class attributes.

The _Upset class performs the following steps:

Validate that certain user inputs belong to an explicit set of allowable entries (e.g. sort_by is either Counts or Intersections).
Perform some data manipulation to collect the appropriate subset/intersection counts. This includes inferring whether the data was provided in wide or condensed format, and handling the logic for splitting across color and x.
Create the "primary plot" which sits at the top of the final output, typically a bar chart (though the user can modify this choice when considering a distribution of counts over x values).
Add a "switchboard plot" (i.e. a carefully crafted scatter plot) below the primary plot, representing which intersection corresponds to the figure above it.
Add a "marginal plot" on the left, representing the counts of each of the subset/tag categories.
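To make the counting step above concrete, here is a minimal sketch using only the standard library. It is not the code from this PR: the function name `count_intersections` and the row layout (one dict of 0/1 flags per row) are assumptions for illustration. It shows how one pass over the data can produce both the per-intersection counts (the bars of the primary plot) and the per-subset totals (the marginal plot).

```python
from collections import Counter


def count_intersections(rows, subsets):
    """Count rows per exact membership pattern and per subset.

    rows: iterable of dicts mapping subset name -> 0/1 (an assumed "wide" layout).
    subsets: ordered list of subset names to consider.
    Returns (intersections, marginals) as two Counters.
    """
    intersections = Counter()
    marginals = Counter()
    for row in rows:
        # The tuple of subsets this row belongs to identifies its intersection.
        members = tuple(s for s in subsets if row.get(s))
        intersections[members] += 1
        for s in members:
            marginals[s] += 1
    return intersections, marginals


rows = [
    {"A": 1, "B": 0, "C": 0},
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 1, "C": 0},
    {"A": 0, "B": 1, "C": 1},
]
intersections, marginals = count_intersections(rows, ["A", "B", "C"])

# Order the bars by descending count, analogous to sorting by counts.
ordered = sorted(intersections.items(), key=lambda kv: -kv[1])
```

Each key in `intersections` also says which dots to fill in the switchboard column under that bar.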

Any feedback is greatly appreciated!

A Preview

As motivation, here's a nice example plot generated in one line with a well-formatted DataFrame:

ff.create_upset(df, color='color', title='My UpSet Plot')
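The DataFrame itself is not shown here. As a rough illustration of the two input layouts mentioned above, the sketch below converts a "condensed" frame (one column of lists/tuples naming the subsets each row belongs to) into a "wide" frame (one 0/1 column per subset). The column names and the helper `condensed_to_wide` are hypothetical and are not taken from the PR.

```python
import pandas as pd

# Hypothetical "condensed" input: one column holding the subsets of each row.
condensed = pd.DataFrame(
    {
        "item": ["a", "b", "c", "d"],
        "tags": [("A",), ("A", "B"), ("B", "C"), ()],
    }
)


def condensed_to_wide(df, tags_col):
    """Expand a column of lists/tuples into one 0/1 indicator column per subset."""
    subsets = sorted({tag for tags in df[tags_col] for tag in tags})
    return pd.DataFrame(
        {s: [int(s in tags) for tags in df[tags_col]] for s in subsets},
        index=df.index,
    )


wide = condensed_to_wide(condensed, "tags")
```

In the wide layout each row is a membership pattern, so counting identical rows gives the intersection sizes.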


rickymagner · 2023-05-12T15:40:03Z

It seems some tests are failing on older versions of Python because of a (presumably) older version of pandas which doesn't have value_counts for DataFrames yet. Is there a way to have the tests require a certain version of pandas, or otherwise skip those older versions?

Also, if anyone has some insight in getting the notebook test to pass, that'd be great. It seems to be currently failing because it doesn't like the permalink attribute. I just copied and modified the notebook attributes from another example.

alexcjohnson · 2023-05-26T13:43:53Z

Thanks for the PR @rickymagner !

re: Pandas: looks like our Python 3.6 and 3.7 "optional" jobs still run against Pandas 0.24:

plotly.py/packages/python/plotly/test_requirements/requirements_36_optional.txt

Line 3 in bb5c2e2

pandas==0.24.2

plotly.py/packages/python/plotly/test_requirements/requirements_37_optional.txt

Line 3 in bb5c2e2

pandas==0.24.2

We do want to be flexible in the pandas versions we support, though 0.24 is pretty ancient. Looks like 1.1.5 is the last version that keeps Python 3.6 support and that's 2.5 years old so I'd be comfortable bumping the version in the above two files to 1.1.5 at this point. Is that new enough to support value_counts? If not, we'll need to include fallback code to mimic value_counts using older methods.
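For reference, a fallback along those lines could look like the sketch below. The helper name `value_counts_compat` is hypothetical and not part of the PR. As far as I know, `DataFrame.value_counts` was added in pandas 1.1.0, so 1.1.5 should already have it; on older versions, `groupby(...).size()` yields the same counts.

```python
import pandas as pd


def value_counts_compat(df, columns):
    """Count unique row combinations over `columns`, with a fallback for older pandas."""
    if hasattr(pd.DataFrame, "value_counts"):
        return df.value_counts(subset=columns)
    # Older pandas: group by the columns and count rows per group,
    # then sort descending to mimic the default order of value_counts.
    return df.groupby(columns).size().sort_values(ascending=False)


df = pd.DataFrame({"A": [1, 1, 0, 1], "B": [0, 1, 1, 1]})
counts = value_counts_compat(df, ["A", "B"])
fallback_counts = df.groupby(["A", "B"]).size()
```

Both paths return a Series indexed by the column combinations, so downstream code can treat them the same way. The Series name may differ between pandas versions.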

alexcjohnson · 2023-05-26T14:12:39Z

@rickymagner sorry for the contradictory notes but... thinking about this a bit more, I'd like us not to add more figure factories to plotly.py. As mentioned in #3833 (comment) further extensions like this would be better in a separate package - either one package to collect all sorts of new figure factories, or a package just for upset plots.

The challenge for us of adding figure factories here is it confuses people about plotly.py vs other ways to make Plotly charts, such as direct usage of plotly.js.

rickymagner · 2023-06-08T13:59:58Z

Thanks for getting back on this. If anything changes in the future and you'd like to discuss merging this into the FF package, let me know!

rickymagner added 11 commits May 10, 2023 11:36

Adding stable base for UpSet plots w/ single or multiple group color modes.

b989178

Added functionality to allow user to specify column of lists/tuples for subset inclusion.

e06813b

Padded intersection counts with zeros when color groups had some missing subsets.

cef0fb8

Added more useful hover data for switchboard.

df25318

Refactored plot args, updated hovers, etc

a6f04a7

Added main docstring, fixed some functionality for grouping, etc

c654b80

Changed margins to fix title issue; added webdocs

537ec39

Added some simple tests and removed some debugging code

41f9c40

Fixed inheritence for test

dbfab5d

Changed order for subset labels in test

1af712f

Updated permalink in notebook doc

38f7c8f

rickymagner added 3 commits May 12, 2023 12:13

Updated CHANGELOG

fd2420d

Fixed some bugs in scaling/labeling margin plot

4e0f60f

Merge branch 'master' into rm_add_upset_plot

c8b74b5

alexcjohnson closed this Jun 3, 2023

alexcjohnson mentioned this pull request Jun 3, 2023

Venturellac dev ff upset plot #3833

Closed
