-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[hail] Promote localize_entries to public & tested #5247
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs comments
hail/python/hail/matrixtable.py
Outdated
---------- | ||
entries_array_field_name : :obj:`str` | ||
The name of the table field containing the array of entry structs | ||
for the given row |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: periods at the end of parameter descriptions
hail/python/hail/matrixtable.py
Outdated
def localize_entries(self, | ||
entries_array_field_name=None, | ||
columns_array_field_name=None) -> 'Table': | ||
"""Represent this matrix as a table of entry-rows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very confusing sentence. How about:
"Convert the matrix table to a table with entries localized as an array of structs."
array([[0, 0, 0], | ||
[0, 1, 2], | ||
[0, 2, 4]]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a note that filtered entries are represented as a missing struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ditched the struct on the second to last line so this would blow up without latest numpy, I assume, b/c you can't represent a missing value in a numpy ndarray. I think this is doctested, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean add a "Notes" section to the docs, and describe that the array of entries always contains Ncols elements (structs), with filtered entries appearing as missing structs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the second comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still want a notes section with a description of the type!
hail/python/hail/matrixtable.py
Outdated
if entries_array_field_name is None: | ||
t = t.drop(entries) | ||
if columns_array_field_name is None: | ||
t = t.drop_globals(cols) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
drop_globals isn't a thing. Just drop
hail/python/hail/matrixtable.py
Outdated
+---------+---------+-------+ | ||
| int32 | int32 | int32 | | ||
+---------+---------+-------+ | ||
| 0 | 2 | 0 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are the col indices in reverse order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doctest also flagged this. I must've had a buggy version of hail locally when I ran this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two docs requests
+---------+---------+-------+ | ||
|
||
>>> t = mt.localize_entries('entry_structs', 'columns') | ||
>>> t = t.select(entries = t.entry_structs.map(lambda entry: entry.x)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a .describe()
here - that would help people understand what this is doing
hail/python/hail/matrixtable.py
Outdated
|
||
Warning | ||
------- | ||
This operation may increase the size of a partition. Use with care on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait, this isn't true, is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"may" is the operative word here, I'm trying to future proof against changes in representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move describe, add notes section documenting types.
|
||
>>> t = mt.localize_entries('entry_structs', 'columns') | ||
>>> t = t.select(entries = t.entry_structs.map(lambda entry: entry.x)) | ||
>>> t.describe() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this describe should be above the select -- to best demonstrate the localized schema
array([[0, 0, 0], | ||
[0, 1, 2], | ||
[0, 2, 4]]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still want a notes section with a description of the type!
Parameters | ||
---------- | ||
entries_array_field_name : :obj:`str` | ||
The name of the table field containing the array of entry structs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, you added it in the parameter description. Can we move the last sentence of both of these up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm generally a bit allergic to the proliferation of Notes that we have in the docs. Seems like the parameters or returns section is the right place to put this information, right?
responded, I feel like this should be in parameters and returns, what's the argument for a Notes?
Numpy style guide indicates that the notes section is the appropriate place for detail on the algorithm. In particular, the types of the column and entry fields produced aren't directly related to their names, and having them in the parameter description is unintuitive. This style is also inconsistent with the rest of our documentation. In reading the above link, I've realized that all our docs sections are out of order - the Parameters section should come first, then Returns, then Notes, then Examples. We were hoodwinked by the Sphinx example page. I also strongly reject a warning about a possible future scaling limitation of the function that might never even exist in the life of 0.2. |
------- | ||
:class:`.Table` | ||
A table whose fields are the row fields of this matrix table plus | ||
one field named ``entries_array_field_name``. The global fields of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these parameter references should be single backticks to italicize (numpy style)
Issue to track using the numpy style docs: #5304 |
I capitulate, but I still don't like the added visual noise of a separate section.
@tpoterba @cseed I think this should be public, it's been useful more than once, and it's reasonable to do on certain datasets (currently all datasets, but maybe not when we have 10 million columns).