[hail] Promote localize_entries to public & tested #5247

Merged
danking merged 11 commits into hail-is:master on Feb 11, 2019

Collaborator

@danking danking commented Feb 4, 2019

@tpoterba @cseed I think this should be public. It's been useful more than once, and it's reasonable to use on certain datasets (currently all datasets, but maybe not once we have 10 million columns).

Collaborator

@tpoterba tpoterba left a comment

docs comments

----------
entries_array_field_name : :obj:`str`
The name of the table field containing the array of entry structs
for the given row
Collaborator

@tpoterba tpoterba Feb 4, 2019

style: periods at the end of parameter descriptions

def localize_entries(self,
entries_array_field_name=None,
columns_array_field_name=None) -> 'Table':
"""Represent this matrix as a table of entry-rows.
Collaborator

@tpoterba tpoterba Feb 4, 2019

This is a very confusing sentence. How about:

"Convert the matrix table to a table with entries localized as an array of structs."

array([[0, 0, 0],
[0, 1, 2],
[0, 2, 4]])
Collaborator

@tpoterba tpoterba Feb 4, 2019

add a note that filtered entries are represented as a missing struct?

Collaborator Author

@danking danking Feb 5, 2019

I ditched the struct on the second to last line so this would blow up without latest numpy, I assume, b/c you can't represent a missing value in a numpy ndarray. I think this is doctested, no?

Collaborator

@tpoterba tpoterba Feb 5, 2019

I mean add a "Notes" section to the docs, and describe that the array of entries always contains Ncols elements (structs), with filtered entries appearing as missing structs.
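To make the layout described above concrete, here is a minimal plain-Python sketch. It does not use Hail: `localize_entries_model`, the dict-based structs, and the field names `entries` and `cols` are hypothetical stand-ins, and `None` plays the role of a missing struct. The entry value `x = row * col` is chosen only to mirror the 3x3 array in the docs excerpt. The point is that each row's array has one slot per column, with filtered entries kept as missing rather than dropped.

```python
def localize_entries_model(row_keys, col_keys, entries):
    """Toy model (not Hail): turn a sparse {(row, col): struct} mapping
    into one record per row holding an array with one slot per column."""
    rows = [
        # entries.get returns None for a filtered (absent) entry, so the
        # array length always equals the number of columns.
        {"row_idx": r, "entries": [entries.get((r, c)) for c in col_keys]}
        for r in row_keys
    ]
    # Column structs move to a single global array, in column order.
    globals_ = {"cols": [{"col_idx": c} for c in col_keys]}
    return rows, globals_


row_keys = [0, 1, 2]
col_keys = [0, 1, 2]
entries = {(r, c): {"x": r * c} for r in row_keys for c in col_keys}
del entries[(1, 1)]  # simulate filtering a single entry

rows, globals_ = localize_entries_model(row_keys, col_keys, entries)
for row in rows:
    print(row["row_idx"], row["entries"])
```

In this model the row with `row_idx == 1` still has three slots, and the middle one is `None`, which is the behavior the requested Notes section would document.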

Collaborator

@tpoterba tpoterba Feb 5, 2019

This is the second comment.

Collaborator

@tpoterba tpoterba Feb 5, 2019

I still want a notes section with a description of the type!

if entries_array_field_name is None:
t = t.drop(entries)
if columns_array_field_name is None:
t = t.drop_globals(cols)
Collaborator

@tpoterba tpoterba Feb 4, 2019

drop_globals isn't a thing. Just use drop.

+---------+---------+-------+
| int32 | int32 | int32 |
+---------+---------+-------+
| 0 | 2 | 0 |
Collaborator

@tpoterba tpoterba Feb 4, 2019

why are the col indices in reverse order?

Collaborator Author

@danking danking Feb 5, 2019

doctest also flagged this. I must've had a buggy version of hail locally when I ran this.

Collaborator

@tpoterba tpoterba left a comment

two docs requests

+---------+---------+-------+
>>> t = mt.localize_entries('entry_structs', 'columns')
>>> t = t.select(entries = t.entry_structs.map(lambda entry: entry.x))
Collaborator

@tpoterba tpoterba Feb 5, 2019

add a .describe() here - that would help people understand what this is doing

Warning
-------
This operation may increase the size of a partition. Use with care on
Collaborator

@tpoterba tpoterba Feb 5, 2019

wait, this isn't true, is it?

Collaborator Author

@danking danking Feb 5, 2019

"may" is the operative word here, I'm trying to future proof against changes in representation.

Collaborator

@tpoterba tpoterba left a comment

move describe, add notes section documenting types.

>>> t = mt.localize_entries('entry_structs', 'columns')
>>> t = t.select(entries = t.entry_structs.map(lambda entry: entry.x))
>>> t.describe()
Collaborator

@tpoterba tpoterba Feb 5, 2019

this describe should be above the select -- to best demonstrate the localized schema

array([[0, 0, 0],
[0, 1, 2],
[0, 2, 4]])
Collaborator

@tpoterba tpoterba Feb 5, 2019

I still want a notes section with a description of the type!

Parameters
----------
entries_array_field_name : :obj:`str`
The name of the table field containing the array of entry structs
Collaborator

@tpoterba tpoterba Feb 5, 2019

oh, you added it in the parameter description. Can we move the last sentence of both of these up?

Collaborator Author

@danking danking Feb 5, 2019

I think I'm generally a bit allergic to the proliferation of Notes that we have in the docs. Seems like the parameters or returns section is the right place to put this information, right?

@danking danking dismissed tpoterba’s stale review Feb 5, 2019

responded, I feel like this should be in parameters and returns, what's the argument for a Notes?

Collaborator

@tpoterba tpoterba commented Feb 7, 2019

> what's the argument for a Notes

The NumPy style guide indicates that the Notes section is the appropriate place for detail on the algorithm. In particular, the types of the column and entry fields produced aren't directly related to their names, and having them in the parameter descriptions is unintuitive. That placement is also inconsistent with the rest of our documentation.

In reading the above link, I've realized that all our docs sections are out of order - the Parameters section should come first, then Returns, then Notes, then Examples. We were hoodwinked by the Sphinx example page.

I also strongly reject a warning about a possible future scaling limitation of the function that might never even exist in the life of 0.2.
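As a rough illustration of the section order described above (Parameters, then Returns, then Notes, then Examples), here is a sketch of a numpy-style docstring on a stub function. `localize_entries_stub` and `section_order` are hypothetical names, the description of `columns_array_field_name` is illustrative wording rather than the PR's text, and the stub does nothing. Parameter references in the body use single backticks, per the numpy convention raised in this review.

```python
def localize_entries_stub(entries_array_field_name=None,
                          columns_array_field_name=None):
    """Convert the matrix table to a table with entries localized as an array of structs.

    Parameters
    ----------
    entries_array_field_name : str
        The name of the table field containing the array of entry structs.
    columns_array_field_name : str
        Illustrative wording: the name of the field holding the column structs.

    Returns
    -------
    Table
        A table whose fields are the row fields of this matrix table plus
        one field named `entries_array_field_name`.

    Notes
    -----
    The entries array always contains one element per column; filtered
    entries appear as missing structs.

    Examples
    --------
    >>> t = mt.localize_entries('entry_structs', 'columns')  # doctest: +SKIP
    """
    raise NotImplementedError("documentation sketch only")


def section_order(doc):
    """Return the numpy-style section headers of `doc` in order of appearance."""
    lines = [line.strip() for line in doc.splitlines()]
    # A section header is a non-empty line followed by a line made only of dashes.
    return [
        title
        for title, underline in zip(lines, lines[1:])
        if title and underline and set(underline) == {"-"}
    ]


print(section_order(localize_entries_stub.__doc__))
# prints ['Parameters', 'Returns', 'Notes', 'Examples']
```

A check like `section_order` could in principle be run across a codebase to find docstrings whose sections are out of order, though that is a suggestion here and not something this PR does.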

-------
:class:`.Table`
A table whose fields are the row fields of this matrix table plus
one field named ``entries_array_field_name``. The global fields of
Collaborator

@tpoterba tpoterba Feb 7, 2019

these parameter references should be single backticks to italicize (numpy style)

Collaborator Author

@danking danking commented Feb 11, 2019

Issue to track using the numpy style docs: #5304

@danking danking dismissed tpoterba’s stale review Feb 11, 2019

I capitulate, but I still don't like the added visual noise of a separate section.

@danking danking merged commit 3e00028 into hail-is:master Feb 11, 2019
1 check passed
@danking danking deleted the localize-entries branch Dec 18, 2019