Skip to content

[query/ggplot] add support for MatrixTables #12293

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 25, 2022

Conversation

iris-garden
Copy link
Contributor

Currently, ggplot expects a Table to be passed in as its data, and errors if passed a MatrixTable. With this change, the following code:

import hail as hl
mt = hl.utils.range_matrix_table(10, 10)
mt = mt.annotate_entries(entry_idx = mt.row_idx + mt.col_idx)
mt.show()

Which produces a MatrixTable that looks like this:

+---------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
| row_idx | 0.entry_idx | 1.entry_idx | 2.entry_idx | 3.entry_idx | 4.entry_idx | 5.entry_idx | 6.entry_idx | 7.entry_idx |
+---------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
|   int32 |       int32 |       int32 |       int32 |       int32 |       int32 |       int32 |       int32 |       int32 |
+---------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
|       0 |           0 |           1 |           2 |           3 |           4 |           5 |           6 |           7 |
|       1 |           1 |           2 |           3 |           4 |           5 |           6 |           7 |           8 |
|       2 |           2 |           3 |           4 |           5 |           6 |           7 |           8 |           9 |
|       3 |           3 |           4 |           5 |           6 |           7 |           8 |           9 |          10 |
|       4 |           4 |           5 |           6 |           7 |           8 |           9 |          10 |          11 |
|       5 |           5 |           6 |           7 |           8 |           9 |          10 |          11 |          12 |
|       6 |           6 |           7 |           8 |           9 |          10 |          11 |          12 |          13 |
|       7 |           7 |           8 |           9 |          10 |          11 |          12 |          13 |          14 |
|       8 |           8 |           9 |          10 |          11 |          12 |          13 |          14 |          15 |
|       9 |           9 |          10 |          11 |          12 |          13 |          14 |          15 |          16 |
+---------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+-------------+
showing the first 8 of 10 columns

Can be plugged into ggplot like so:

from hail.ggplot import *
fig = ggplot(mt, aes(x=mt.row_idx, y=mt.entry_idx)) + geom_point()
fig.show()

To produce this plot:

Screen Shot 2022-10-07 at 15 30 50

This is accomplished using Expr._to_table to transform the MatrixTable such that the fields used to generate the plot can be straightforwardly selected from it.

Copy link
Contributor

@tpoterba tpoterba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome! a couple test comments

expected = [(0,0),(0,1),(0,2),(1,1),(1,2),(1,3),(2,2),(2,3),(2,4)]
for idx, (x, y) in enumerate(zip(data.x, data.y)):
assert(expected[idx][0] == x)
assert(expected[idx][1] == y)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we add a test that we get the right output for non-entry-indexed fields? mt.row_idx and mt.row_idx * 2 for instance (I want to make sure we get 3 points, not 9)?

Also, a sneaky way this testing pattern can be violated is if the lengths of expected isn't the same as data.x or data.y. Let's add an assertion that the lengths of all three agree.

@iris-garden iris-garden force-pushed the query/ggplot/matrix-tables branch from ff54240 to 21f0d29 Compare October 11, 2022 14:00
@iris-garden iris-garden force-pushed the query/ggplot/matrix-tables branch from 21f0d29 to d7fe153 Compare October 25, 2022 18:12
CHANGELOG: Added support for `hail.MatrixTable`s to `hail.ggplot`.
@danking danking merged commit 178feda into hail-is:main Oct 25, 2022
@iris-garden iris-garden deleted the query/ggplot/matrix-tables branch October 26, 2022 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants