Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hail] Fix memory leak in BM and make cache size configurable #9501

Merged
merged 8 commits into from Sep 25, 2020

Conversation

danking
Copy link
Contributor

@danking danking commented Sep 24, 2020

CHANGELOG: Remove memory leak in BlockMatrix.to_matrix_table_row_major and BlockMatrix.to_table_row_major. Also, make the cache size used in both methods configurable.

The ref variable was holding entire blocks in memory for no reason. It was a
vestiage of debugging. Moreover, the configurable cache permits users to fine tune
memory usage.

CHANGELOG: Remove memory leak in `BlockMatrix.to_matrix_table_row_major` and `BlockMatrix.to_table_row_major`. Also, make the cache size used in both methods configurable.

The `ref` variable was holding entire blocks in memory for no reason. It was a
vestiage of debugging. Moreover, the configurable cache permits users to fine tune
memory usage.
Copy link
Contributor

@johnc1231 johnc1231 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs fixes, otherwise good.

matrix in memory. This value must be at least large enough to hold
one row of th matrix in memory. If this value is exactly the size of
one row, then a partition makes a network request for every row of
every block. Larger values reduce the number of network requests. If
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by the wording. Is:

If this value is exactly the size of
            one row, then a partition makes a network request for every row of
            every block.

saying the same thing as:

If memory permits, setting this value to the size of one row permits
            one network request per block per partition.

If so, why are we saying it twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the verbiage. Latter should have said one partition.

maximum_cache_memory_in_bytes : int or None
The amount of memory to reserve, per partition, to cache rows of the
matrix in memory. This value must be at least large enough to hold
one row of th matrix in memory. If this value is exactly the size of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

th => the

"""Returns a matrix table with row key of `row_idx` and col key `col_idx`, whose
entries are structs of a single field `element`.

Parameters
----------
n_partitions : int or None
Number of partitions of the matrix table.
maximum_cache_memory_in_bytes : int or None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as above, as this is the same paragraph.

@johnc1231
Copy link
Contributor

KING tests are failing

Copy link
Contributor

@johnc1231 johnc1231 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are failing

@danking danking merged commit d766d78 into hail-is:main Sep 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants