-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[query] freeze when necessary to avoid hashing issues #12265
Conversation
bump for review! I fixed the lint issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
womp, looked earlier but didn't review. Looks fine, don't see a way around the dict hack with less complexity than the hack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question
hail/python/hail/expr/types.py
Outdated
def _convert_from_json(self, x): | ||
return frozendict({self.key_type._convert_from_json_na(elt['key']): self.value_type._convert_from_json_na(elt['value']) for elt in x}) | ||
def _convert_from_json(self, x, _should_freeze: bool = False): | ||
return frozendict({self.key_type._convert_from_json_na(elt['key'], _should_freeze=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, can this return a normal dict if we don't need to freeze?
hail/python/hail/expr/types.py
Outdated
def _convert_from_json(self, x): | ||
return frozenset({self.element_type._convert_from_json_na(elt) for elt in x}) | ||
def _convert_from_json(self, x, _should_freeze: bool = False): | ||
return frozenset({self.element_type._convert_from_json_na(elt, _should_freeze=True) for elt in x}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
I added some types and also fixed a bug where we would fail to return the decoded version of an ndarray in one branch of the if. |
CHANGELOG: Fix long-standing bug wherein `hl.agg.collect_as_set` and `hl.agg.counter` error when applied to types which, in Python, are unhashable. For example, `hl.agg.counter(t.list_of_genes)` will not error when `t.list_of_genes` is a list. Instead, the counter dictionary will use `FrozenList` keys from the `frozenlist` package.
CHANGELOG: Fix long-standing bug wherein
hl.agg.collect_as_set
andhl.agg.counter
error when applied to types which, in Python, are unhashable. For example,hl.agg.counter(t.list_of_genes)
will not error whent.list_of_genes
is a list. Instead, the counter dictionary will useFrozenList
keys from thefrozenlist
package.Hey @iris-garden ! I figured this was good reviewing practice for you and also a chance to see how we convert data to/from JSON and to/from the JVM (by way of this "encoded" representation which is a binary one). The details of that are not super important to this PR, but you might take a peek to understand the change.
The main issue here is that in Python, you can't write:
Because sets must contain "hashable" data. Python lists are not hashable because they're mutable. This is transitively a problem. For example, the following also fails with the same error because the list inside the tuple is mutable thus the tuple is not (safely) hashable.
Hail's internal language is fully immutable, so every type can be placed inside a set (or used as the keys of a dict). When we convert from Hail's internal representation to Python, we cannot use mutable types in hashable positions. Unfortunately, we also need to maintain backwards compatibility with the way the code currently works. You can see this pretty clearly in the difference between
hl.agg.collect
andhl.agg.collect_as_set
:collected_ls
should be[[1, 2, 3]]
whereascollected_as_set_ls
necessarily uses hashable types:{frozenlist([1, 2, 3])}
.Things are particularly subtle with dictionaries whose keys must always be hashable and whose values need only be hashable if the dictionary itself must be hashable. For example:
in_list
will be[{frozenlist(["hello"]): ["goodbye"]}]
(because we should be backwards compatible with the expectation that lists in value-position are mutable) butin_set
will be{{frozenlist(["hello"]): frozenlist(["goodbye"])}}
(because the dictionary is inside a set and therefore it, itself, must be hashable).Aside: I also do an ugly subclass to put a type parameter on the
FrozenList
class fromfrozenlist
, a Python library.cc: @tpoterba , dang this was trickier than I hoped.