
add_stacked_bars feature #137

Merged
merged 7 commits into from Jul 31, 2021

@jnothman (Owner) commented Jul 8, 2021

Resolves #136, thanks @amitfenn and @yollct for the inspiration!

TODO:

  • implementation
  • example
  • support Mapping {label: color} for specification of colors
  • docstring
  • test
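The TODO item about accepting a Mapping {label: color} amounts to a normalization step. Whatever form the caller passes, the plotting code ends up needing exactly one color per stacked label. Below is a minimal sketch of such a step, assuming colors arrive either as a Mapping or as a plain list. `resolve_colors` is a hypothetical helper for illustration, not the function this PR adds.

```python
from collections.abc import Mapping


def resolve_colors(labels, colors):
    """Return a {label: color} dict for the given stack labels.

    Hypothetical helper: `colors` may be a Mapping such as
    {"male": "tab:blue", "female": "tab:orange"}, or a sequence of colors
    assigned to `labels` in order.
    """
    labels = list(labels)
    if isinstance(colors, Mapping):
        # Mapping form: every label must have an explicit color.
        missing = [label for label in labels if label not in colors]
        if missing:
            raise KeyError(f"no color given for labels: {missing}")
        return {label: colors[label] for label in labels}
    if isinstance(colors, str):
        # A bare string would otherwise be split into single characters.
        raise TypeError("expected a mapping or a sequence of colors, got a str")
    # Sequence form: pair colors with labels positionally.
    colors = list(colors)
    if len(colors) < len(labels):
        raise ValueError(f"{len(labels)} labels but only {len(colors)} colors")
    return dict(zip(labels, colors))


print(resolve_colors(["female", "male"], {"male": "tab:blue", "female": "tab:orange"}))
# {'female': 'tab:orange', 'male': 'tab:blue'}
```

The result is ordered by `labels` rather than by the mapping, so the legend order stays under the plotting code's control however the user wrote the dict.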

@jnothman changed the title from "add_stacked_bars; needs test and docstring" to "add_stacked_bars feature" on Jul 8, 2021
@jnothman (Owner, Author) commented Jul 19, 2021

This should now be working and documented. It took more effort than I anticipated to get the order of the stacked bars and the legend to match my (Latin alphabet-biased) expectations in both vertical and horizontal orientations. Writing extensive automated tests, although tricky, was very valuable! (Currently the legend order is fixed to be lexicographic, but I've got a TODO to make it more configurable in the future...)
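One way to see why the ordering is fiddly: vertical bars grow upward from zero while a legend is read top to bottom, so stacking labels in lexicographic order would put the first legend entry at the bottom of the bar. The sketch below assumes a legend in lexicographic order (as the comment says) and the convention that each bar should read in the same direction as the legend. `stack_segments` is a hypothetical helper, and this is not necessarily how the PR resolves it.

```python
def stack_segments(counts, orientation="vertical"):
    """Return (label, start, end) tuples for one stacked bar, in drawing order.

    Hypothetical convention: the legend lists labels lexicographically from
    top to bottom, and the bar should read in the same order as the legend.
    """
    labels = sorted(counts)
    if orientation == "vertical":
        # Segments are drawn from the baseline upwards, so the first legend
        # entry must be drawn last to end up on top.
        labels.reverse()
    elif orientation != "horizontal":
        raise ValueError(f"unknown orientation: {orientation!r}")
    # Horizontal bars grow left to right, which already matches reading order.
    segments = []
    start = 0
    for label in labels:
        end = start + counts[label]
        segments.append((label, start, end))
        start = end
    return segments


print(stack_segments({"b": 2, "a": 3, "c": 1}, "vertical"))
# [('c', 0, 1), ('b', 1, 3), ('a', 3, 6)]
print(stack_segments({"b": 2, "a": 3, "c": 1}, "horizontal"))
# [('a', 0, 3), ('b', 3, 5), ('c', 5, 6)]
```

Each `start` is the offset one would pass as `bottom=` to Matplotlib's `Axes.bar` (or `left=` to `Axes.barh`) when drawing that segment.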

Maybe you'd like to try it out with the following, @amitfenn:

pip install https://github.com/jnothman/UpSetPlot/archive/stacked-bars.zip

I'd appreciate feedback on the API's friendliness and on the docstring.

@yollct commented Jul 21, 2021

Hi Joel,
I had a look at the new API, which is really cool. But I wonder if it is possible to plot directly with counts, which doesn't depend on the 'by' column.

Also, I got the error below when trying your example (the Titanic data). It could be resolved by wrapping data.index.names in list(), i.e. changing the failing line to:

idx = np.flatnonzero(data.index.to_frame()[list(data.index.names)].values)

The error:

TypeError: Argument 'obj' has incorrect type (expected list, got FrozenList)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-c3cd63a5bdd2> in <module>
      6 upset = UpSet(df, intersection_plot_elements=0)  # disable the default bar chart
      7 upset.add_stacked_bars(by="Sex", title="Count by gender", elements=10)
----> 8 upset.plot()

~/miniconda3/lib/python3.7/site-packages/upsetplot/plotting.py in plot(self, fig)
    816         matrix_ax = self._reorient(fig.add_subplot)(specs['matrix'],
    817                                                     sharey=shading_ax)
--> 818         self.plot_matrix(matrix_ax)
    819         totals_ax = self._reorient(fig.add_subplot)(specs['totals'],
    820                                                     sharey=matrix_ax)

~/miniconda3/lib/python3.7/site-packages/upsetplot/plotting.py in plot_matrix(self, ax)
    658         n_cats = data.index.nlevels
    659 
--> 660         idx = np.flatnonzero(data.index.to_frame()[data.index.names].values)
    661         c = np.array([self._other_dots_color] * len(data) * n_cats, dtype='O')
    662         c[idx] = self._facecolor

~/miniconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3443 
   3444         # Do we have a (boolean) 1d indexer?
-> 3445         if com.is_bool_indexer(key):
   3446             return self._getitem_bool_array(key)
   3447 

~/miniconda3/lib/python3.7/site-packages/pandas/core/common.py in is_bool_indexer(key)
    144     elif isinstance(key, list):
    145         # check if np.array(key).dtype would be bool
--> 146         return len(key) > 0 and lib.is_bool_list(key)
    147 
    148     return False

TypeError: Argument 'obj' has incorrect type (expected list, got FrozenList)

Best
Chit Tong

@jnothman (Owner, Author) commented

Thanks @yollct. This error is caused by the recent release of pandas 1.3.0. Downgrade pandas, or upgrade to 1.3.1 when it is released in the coming days. See #141.
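The traceback suggests the mechanism. `data.index.names` is a pandas `FrozenList`, which subclasses `list`, and pandas 1.3.0 passes it to a compiled helper whose argument is declared as `list`. Cython's argument check for builtin types such as `list` appears to require the exact type, so the subclass is rejected even though it behaves like a list; wrapping it in `list()` (the workaround suggested above) yields an exact `list`. The pure-Python sketch below imitates that check. `FrozenNames` and `exact_list_only` are hypothetical stand-ins, not pandas code.

```python
class FrozenNames(list):
    """Stand-in for pandas' FrozenList: a list subclass that refuses mutation."""

    def __setitem__(self, key, value):
        raise TypeError(f"'{type(self).__name__}' does not support mutable operations.")


def exact_list_only(obj):
    """Mimic a Cython argument typed as `list`: subclasses are rejected."""
    if type(obj) is not list:
        raise TypeError(
            "Argument 'obj' has incorrect type "
            f"(expected list, got {type(obj).__name__})"
        )
    return len(obj) > 0


names = FrozenNames(["A", "B"])  # hypothetical index level names
try:
    exact_list_only(names)
except TypeError as exc:
    print(exc)  # Argument 'obj' has incorrect type (expected list, got FrozenNames)

print(exact_list_only(list(names)))  # True: list() produces an exact list
```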

> But I wonder if it is possible to plot directly with counts, which doesn't depend on the 'by' column.

I don't understand this comment, or perhaps my documentation of 'by' was deficient. In the following diagram, 'by' would be the column that labels each sample as one of Attribute 1, 2, ... 8.

[Image: upset_plot1]

If 'by' is specified and 'sum_over' is not, then counts are plotted. Does that handle your use case appropriately? Does the description make sense?

        by : str
            Column name within the dataframe for color coding the stacked bars,
            containing discrete or categorical values.
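To make the count/sum distinction concrete: each bar corresponds to one intersection, each segment to one value of the 'by' column, and a segment's size is the number of samples unless 'sum_over' names a column whose values are summed instead. Below is a minimal pure-Python sketch of that aggregation, assuming samples are given as (memberships, row) pairs with made-up toy values. `stacked_bar_totals` and that record layout are hypothetical; UpSetPlot itself operates on pandas data.

```python
from collections import defaultdict


def stacked_bar_totals(records, by, sum_over=None):
    """Return {intersection: {by_value: total}} for a stacked bar chart.

    Hypothetical layout: each record is (memberships, row), where memberships
    is a frozenset of the categories a sample belongs to and row maps column
    names to values.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for memberships, row in records:
        # Without sum_over each sample counts once; otherwise add its value.
        weight = 1 if sum_over is None else row[sum_over]
        totals[memberships][row[by]] += weight
    # Plain dicts are easier to print and compare.
    return {key: dict(inner) for key, inner in totals.items()}


# Toy data, made up for illustration.
records = [
    (frozenset({"A"}), {"Sex": "female", "fare": 10.0}),
    (frozenset({"A"}), {"Sex": "male", "fare": 5.0}),
    (frozenset({"A", "B"}), {"Sex": "female", "fare": 7.5}),
    (frozenset({"A"}), {"Sex": "female", "fare": 2.5}),
]
print(stacked_bar_totals(records, by="Sex"))  # counts per segment
print(stacked_bar_totals(records, by="Sex", sum_over="fare"))  # summed fares
```

With only `by` given, the {"A"} bar splits into 2 female and 1 male; adding `sum_over="fare"` keeps the same segments but sizes them by total fare instead.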

@jnothman (Owner, Author) commented

I'll merge this, but you should feel free to open further issues as you try it out, @yollct!

@jnothman merged commit 8532c99 into master on Jul 31, 2021

Linked issue: Feature request: Stacked bar plots to query attributes.
2 participants