-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[query/ggplot] Updates faceting to respect legend grouping #12497
Conversation
10531f3
to
dec969f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, a few thoughts on tightening the code up
|
||
Returns | ||
------- | ||
:class:`FigureAttribute` | ||
The faceter. | ||
|
||
""" | ||
return FacetWrap(facets) | ||
return FacetWrap(facets, nrow, ncol, scales) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer that we error if both nrow and ncol are non-missing. I think its better to make our users be explicit rather than implicitly work but potential confuse them down the road.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an error!
@@ -49,14 +55,43 @@ def get_expr_to_group_by(self): | |||
|
|||
class FacetWrap(Faceter): | |||
|
|||
def __init__(self, facets): | |||
def __init__(self, facets, nrow, ncol, scales): | |||
self.nrow = nrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pursuant to the above, I'd probably assert mutual exclusivity of nrow and ncol here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an error!
hail/python/hail/ggplot/facets.py
Outdated
ncol = int(math.ceil(math.sqrt(num_facet_values))) | ||
nrow = int(math.ceil(num_facet_values / ncol)) | ||
def _calc(self, num_facet_values, n): | ||
return int(math.ceil(num_facet_values / n)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use integer math for this:
(num_facet_values + n - 1) // n
I prefer a more descriptive name than _calc. Maybe _n_partitions(self, items, n_splits)
? This could be in hailtop/utils.py, it's a general purpose thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced!
hail/python/hail/ggplot/facets.py
Outdated
return (self.nrow, self._calc(num_facet_values, self.nrow)) | ||
else: | ||
ncol = int(math.ceil(math.sqrt(num_facet_values))) | ||
return (self._calc(num_facet_values, ncol), ncol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This biases towards more columns rather than more rows. Seems fine. I'm not sure if I should prefer wide rectangles or long rectangles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's kinda what I figured haha 🤷♀️
hail/python/hail/ggplot/facets.py
Outdated
"shared_xaxes": False, | ||
"shared_yaxes": False, | ||
} | ||
}).get(self.scales, base) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should error if you give an invalid scales parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an error!
hail/python/hail/ggplot/geoms.py
Outdated
for aes_name, legend_group in legends.items(): | ||
if len(legend_group) > 1: | ||
number_of_displayed_legends += 1 | ||
traces = [*traces, [fig_so_far, df, facet_row, facet_col, values, trace_category]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused about trace_category. Can you explain to me a bit more about what the goal is here? In particular, I'm generally suspicious of iterating through a group of values and arbitrarily taking the last one that is not None (what if there are multiple not None category
s?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trace_category
is what will be used to label this group on the legend iff we are making an interactive plot, so in my example above, if we only had the color
aesthetic and not the shape
one, trace_category
would end up being "bad"
in one iteration, and "good"
in the other. Because there's only one shape
category and it was created by default instead of using a category from a user-provided aesthetic mapping, the value of category
when we're looking at the shape
aesthetic in the iteration is None
. Since trace_category
will only actually be used if we are making an interactive plot, and since we only support interactivity when there's one or fewer aesthetic mappings that will appear in the legend with multiple values, we can assume that if we have a category
that is None
, we have exactly one category
that is None
.
Agreed that this is suspicious haha, but hopefully we can ditch plotly and the way we currently do grouped legends soonish 😸
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so, check that I understand this correctly:
grouped_data
is List[DataFrame]
where the DataFrame contains one column for each Hail field needed to render this geometry. I think each DataFrame represents a different "grouping" of the data. In particular, I think this comes from Stat.listify
. We should add a type to that method. I think it returns List[DataFrame]
. I also think there's a bug in StatNone. It should return [pd.DataFrame({})]
.
IIUC, there's a "grouping" for each "visually manifested" scale. "visually manifested" means it's a discrete scale and not part of the tooltip. I think this should be called data_by_visible_scale
or something like that.
Alright, so, IIUC, we're trying to collect all the possible values of these visually manifested scales. From that information, we can determine if we need a legend and, furthermore, if we need a static legend.
I think it's also the case that color_legend
is the description of the legend itself for color
s. Same for shape_legend and shape.
When we're looping through the data_by_visible_scale
, it's not obvious that we don't get conflicting values for category
across different data frames. Let's be concrete. In your example above there are four facets ({good, bad} cross {minor, major}) with two scales: color & shape. Color is perfectly correlated with {good, bad}. data_by_visible_scale
will have a data frame for each combination of color and shape that appears in that particular facet. OK, so after the first iteration of this loop, legends should be this:
{
"shape": {"shape": ["circle"]},
"color": {"color": ["blue"]}
}
Then the next iteration, unless I'm totally misunderstanding here, would blow away these values and replace them with:
{
"shape": {"shape": ["circle", "square"]},
"color": {"color": ["red"]}
}
So clearly I'm misunderstanding something. I think it must be my understanding of color_legend
/ category, what is that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call, added return type annotations for listify
and fixed the return value for StatNone
!
for the example in the PR description, this is what legends
looks like:
### `apply_to_fig` call ###
#### iteration of `grouped_data` loop ####
{'color': {'bad': 'rgb(0, 219, 207)'}, 'shape': {'no': 'square'}}
#### iteration of `grouped_data` loop ####
{'color': {'bad': 'rgb(0, 219, 207)'}, 'shape': {'no': 'square', 'yes': 'circle'}}
### `apply_to_fig` call ###
#### iteration of `grouped_data` loop ####
{'color': {'bad': 'rgb(0, 219, 207)'}, 'shape': {'yes': 'circle'}}
### `apply_to_fig` call ###
#### iteration of `grouped_data` loop ####
{'color': {'good': 'rgb(255, 0, 0)'}, 'shape': {'no': 'square'}}
#### iteration of `grouped_data` loop ####
{'color': {'good': 'rgb(255, 0, 0)'}, 'shape': {'no': 'square', 'yes': 'circle'}}
### `apply_to_fig` call ###
#### iteration of `grouped_data` loop ####
{'color': {'good': 'rgb(255, 0, 0)'}, 'shape': {'yes': 'circle'}}
so each aesthetic (shape, color) has a key in legends, which is mapped to a dict
mapping the possible values for that aesthetic (the "categories" extracted from the corresponding column of the hail Table
or Expression
) to their value for that aesthetic.
in the None
case i mentioned above, we get something like this:
{'color': {None: 'black'}, 'shape': {'no': 'circle', 'yes': 'square'}}
due to .... some other code somewhere 😭 (maybe somewhere in ScaleDiscrete
?) ... the value for the category name when we don't actually have more than one category is None
does that actually address your question? sorry, i got a bit lost along the way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think I kinda see what's happening now. I have a suspicion we could find a slightly simpler, if less efficient, solution by passing over the data twice: once to compute the legend and once to add the trace. No need to act on that now. Let's keep it in mind as we look to Altair though.
hail/python/hail/ggplot/geoms.py
Outdated
count = 0 | ||
for legend_group in legends.values(): | ||
count += 1 if len(legend_group) > 1 else 0 | ||
requires_static = count > 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit of a nit but I think this whole bit is easier to read as:
has_displayed_legend = any(
len(legend_group) > 1 for legend_group in legends.values()
)
requries_static = is_faceted or has_displayed_legend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original code is actually checking that there are at least two legend_group
s with len > 1
(if we only have one, we can still support interactivity), so any
wouldn't work here, as far as I can tell?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heh. This kind of makes the argument for me that this is a bit hard to follow. How about this:
non_empty_legend_groups = (g for g in legends.values() if len(g) > 1)
has_at_least_two_legend_groups = len(non_empty_legend_groups) >= 2
requries_static = is_faceted or has_at_least_two_legend_groups
The first line is a generator for the kinds of groups you describe. The second line counts the number of them.
Let me know if I'm picking nits here! Maybe I'm being obtuse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's fair! updated based on your suggestion 😸
hail/python/hail/ggplot/geoms.py
Outdated
for aes_name, legend_group in legends.items(): | ||
if len(legend_group) > 1: | ||
number_of_displayed_legends += 1 | ||
traces = [*traces, [fig_so_far, df, facet_row, facet_col, values, trace_category]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use traces.append(...)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced!
|
||
if requires_static: | ||
for trace in traces: | ||
self._add_trace(*trace[:-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit too clever 😉 . What's the harm in sending the trace_category into _add_trace
in the static case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, I'm using whether it's passed in to _add_trace
to decide whether the trace should have an interactive legend entry. Should I pass a flag to it instead, or do something else?
yaxis=dict(linecolor="black", showticklabels=True), | ||
**{ | ||
f"{var}axis{idx}": {"linecolor": "black", "showticklabels": True} | ||
for idx in range(2, n_facet_rows + n_facet_cols + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is probably worthy of a short comment explaining the sillyness of the axis keyword argument names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like:
# xaxis stands in for xaxis1; xaxis2 and beyond are as expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment!
b5af6a5
to
8352869
Compare
7656e08
to
9dc7a11
Compare
hail/python/hail/ggplot/facets.py
Outdated
@@ -23,40 +27,85 @@ def vars(*args): | |||
return hl.struct(**{f"var_{i}": arg for i, arg in enumerate(args)}) | |||
|
|||
|
|||
def facet_wrap(facets): | |||
def facet_wrap(facets: StructExpression, nrow: int = None, ncol: int = None, scales: str = "fixed") -> "FacetWrap": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, one more thing, let's make nrow, ncol, and scales keyword only. We generally prefer keyword only arguments unless there's a strong usability argument. In this case, I'd argue it's actually more confusing that facet_wrap(..., 3)
is allowed, there's no obvious reason that rows would come before cols.
def facet_wrap(facets: StructExpression, nrow: int = None, ncol: int = None, scales: str = "fixed") -> "FacetWrap": | |
def facet_wrap(facets: StructExpression, *, nrow: int = None, ncol: int = None, scales: str = "fixed") -> "FacetWrap": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated!
hail/python/hail/ggplot/geoms.py
Outdated
count = 0 | ||
for legend_group in legends.values(): | ||
count += 1 if len(legend_group) > 1 else 0 | ||
requires_static = count > 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heh. This kind of makes the argument for me that this is a bit hard to follow. How about this:
non_empty_legend_groups = (g for g in legends.values() if len(g) > 1)
has_at_least_two_legend_groups = len(non_empty_legend_groups) >= 2
requries_static = is_faceted or has_at_least_two_legend_groups
The first line is a generator for the kinds of groups you describe. The second line counts the number of them.
Let me know if I'm picking nits here! Maybe I'm being obtuse.
hail/python/hail/ggplot/geoms.py
Outdated
for aes_name, legend_group in legends.items(): | ||
if len(legend_group) > 1: | ||
number_of_displayed_legends += 1 | ||
traces = [*traces, [fig_so_far, df, facet_row, facet_col, values, trace_category]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so, check that I understand this correctly:
grouped_data
is List[DataFrame]
where the DataFrame contains one column for each Hail field needed to render this geometry. I think each DataFrame represents a different "grouping" of the data. In particular, I think this comes from Stat.listify
. We should add a type to that method. I think it returns List[DataFrame]
. I also think there's a bug in StatNone. It should return [pd.DataFrame({})]
.
IIUC, there's a "grouping" for each "visually manifested" scale. "visually manifested" means it's a discrete scale and not part of the tooltip. I think this should be called data_by_visible_scale
or something like that.
Alright, so, IIUC, we're trying to collect all the possible values of these visually manifested scales. From that information, we can determine if we need a legend and, furthermore, if we need a static legend.
I think it's also the case that color_legend
is the description of the legend itself for color
s. Same for shape_legend and shape.
When we're looping through the data_by_visible_scale
, it's not obvious that we don't get conflicting values for category
across different data frames. Let's be concrete. In your example above there are four facets ({good, bad} cross {minor, major}) with two scales: color & shape. Color is perfectly correlated with {good, bad}. data_by_visible_scale
will have a data frame for each combination of color and shape that appears in that particular facet. OK, so after the first iteration of this loop, legends should be this:
{
"shape": {"shape": ["circle"]},
"color": {"color": ["blue"]}
}
Then the next iteration, unless I'm totally misunderstanding here, would blow away these values and replace them with:
{
"shape": {"shape": ["circle", "square"]},
"color": {"color": ["red"]}
}
So clearly I'm misunderstanding something. I think it must be my understanding of color_legend
/ category, what is that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above
0d2e20d
to
86d9df2
Compare
86d9df2
to
9c706ee
Compare
hail/python/hail/ggplot/geoms.py
Outdated
@@ -44,7 +44,7 @@ def _update_legend_trace_args(self, trace_args, legend_cache): | |||
trace_args["showlegend"] = False | |||
else: | |||
trace_args["showlegend"] = True | |||
legend_cache.add(trace_args["name"]) | |||
legend_cache[trace_args["name"]] = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On 197, you **prev
where prev is a value from the legend_cache. **True
will error. Should this be assigning {}
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't cause problems on 197 because GeomPoint.apply_to_fig
never calls _update_legend_trace_args
. i changed legend_cache
to a dict
instead of a set
because i needed to put the whole legend_group
in there rather than just the aes_name
, but since all the other geoms still function the same as before, this was just the easiest way i could think of to have it basically still be a set
(just the set of the dict
's keys, while ignoring the values)
that said, i think it's still a good idea to update it, so that the type of legend_cache
is more consistent!
def listify(self, agg_result): | ||
return pd.DataFrame({}) | ||
def listify(self, agg_result) -> List[DataFrame]: | ||
return [pd.DataFrame({})] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks!
9c706ee
to
76de8fd
Compare
CHANGELOG: Added support for `scales`, `nrow`, and `ncol` arguments, as well as grouped legends, to `hail.ggplot.facet_wrap`. This change brings the `facet_wrap` interface more in line with [the `ggplot2` implementation](https://ggplot2-book.org/facet.html#facet-wrap) by defaulting to show axes (with tick marks) for all facets, as well as providing `scales`, `nrow` and `ncol` arguments that the user can use to specify whether the scales should be the same across facets and how many rows or columns to use. It also updates the faceting code to avoid duplicating legend entries when [grouped legends](hail-is#12254) are used, and to disable interactivity accordingly.
This change brings the
facet_wrap
interface more in line with theggplot2
implementation by defaulting to show axes (with tick marks) for all facets, as well as providingscales
,nrow
andncol
arguments that the user can use to specify whether the scales should be the same across facets and how many rows or columns to use.It also updates the faceting code to avoid duplicating legend entries when grouped legends are used, and to disable interactivity accordingly.
With this update, running this code:
Produces the following plot: