Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[query] Refactor ggplot to use pandas, support alpha on histograms and bar charts #11317

Merged
merged 13 commits into from
Feb 7, 2022

Conversation

johnc1231
Copy link
Contributor

This PR:

  1. Refactors listify to return pandas dataframes, and every geom to take in pandas dataframes. This significantly simplifies the code and should also speed things up.
  2. Refactors the apply_to_fig method of most of the geoms to rely on a specification dict and then just loop over that. This simplifies adding a new argument / aesthethic. In the future, it may be best to just make all of the geoms take **kwargs so that arguments added to that dict will immediately be used for plotting, as right now I have to add something to geom_bar, GeomBar.__init__, and that dictionary for it to start showing up in plots.
  3. Adds identity as a bar position, which means to plot bars on top of each other. This is useful along with the alpha argument added to bars and histograms, which sets transparency of points.

@johnc1231
Copy link
Contributor Author

I'm still figuring out what the abstractions should be here. This refactoring has resulted in the apply_to_figure method of most geoms looking very similar, but still with little differences. Reluctant to commit to a very stringent class hierarchy yet and didn't want to further complicate this PR, but other than hline and vline, feels close to a world where each geom just overrides plot_one_group and the general looping over groups is handled by the parent class.

Copy link
Collaborator

@patrick-schultz patrick-schultz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just an obsoleted comment.

hail/python/hail/ggplot/stats.py Outdated Show resolved Hide resolved
for s, i in zip(listified_agg_result, numbering):
numbered_result.append(s.annotate(group=i))
listified_agg_result = numbered_result
df_agg_result["group"] = [numberer[tuple(x)] for _, x in subsetted_to_discrete.iterrows()]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can leave this for future improvement, but I suspect you could do this more easily and faster with pandas groupby.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to do this in future improvement, since I'm currently in the midst of refactoring this a bit due to a bug report about running out of colors to plot with.

@danking danking merged commit b2129af into hail-is:main Feb 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants