Skip to content

[query] Refactor ggplot to use pandas, support alpha on histograms and bar charts#11317

Merged
danking merged 13 commits into
hail-is:mainfrom
johnc1231:johnc-plot-alpha
Feb 7, 2022
Merged

[query] Refactor ggplot to use pandas, support alpha on histograms and bar charts#11317
danking merged 13 commits into
hail-is:mainfrom
johnc1231:johnc-plot-alpha

Conversation

@johnc1231
Copy link
Copy Markdown
Contributor

This PR:

  1. Refactors listify to return pandas dataframes, and every geom to take in pandas dataframes. This significantly simplifies the code and should also speed things up.
  2. Refactors the apply_to_fig method of most of the geoms to rely on a specification dict and then just loop over that. This simplifies adding a new argument / aesthethic. In the future, it may be best to just make all of the geoms take **kwargs so that arguments added to that dict will immediately be used for plotting, as right now I have to add something to geom_bar, GeomBar.__init__, and that dictionary for it to start showing up in plots.
  3. Adds identity as a bar position, which means to plot bars on top of each other. This is useful along with the alpha argument added to bars and histograms, which sets transparency of points.

@johnc1231
Copy link
Copy Markdown
Contributor Author

I'm still figuring out what the abstractions should be here. This refactoring has resulted in the apply_to_figure method of most geoms looking very similar, but still with little differences. Reluctant to commit to a very stringent class hierarchy yet and didn't want to further complicate this PR, but other than hline and vline, feels close to a world where each geom just overrides plot_one_group and the general looping over groups is handled by the parent class.

Copy link
Copy Markdown
Member

@patrick-schultz patrick-schultz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just an obsoleted comment.

Comment thread hail/python/hail/ggplot/stats.py Outdated
for s, i in zip(listified_agg_result, numbering):
numbered_result.append(s.annotate(group=i))
listified_agg_result = numbered_result
df_agg_result["group"] = [numberer[tuple(x)] for _, x in subsetted_to_discrete.iterrows()]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can leave this for future improvement, but I suspect you could do this more easily and faster with pandas groupby.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to do this in future improvement, since I'm currently in the midst of refactoring this a bit due to a bug report about running out of colors to plot with.

@danking danking merged commit b2129af into hail-is:main Feb 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants