Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check memory requirements #33

Merged
merged 11 commits into from
Jun 29, 2021
Merged

Check memory requirements #33

merged 11 commits into from
Jun 29, 2021

Conversation

rogerkuou
Copy link
Contributor

Add memory usage estimation function for cgc.coclustering_numpy.

Need to check:

  • Is there an OS dependency to spin up the `memory profiler?

@rogerkuou rogerkuou requested a review from fnattino June 23, 2021 16:57
Copy link
Contributor

@fnattino fnattino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rogerkuou , great work thanks! Left few comments, let me know if anything is unclear!

cgc/utils.py Outdated
import logging


def mem_estimate_numpy(m, n, k, l, out_unit=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add coclustering in the name of the function (mem_estimate_coclustering_numpy?) - it would be nice to add later on a similar one for the triclustering

cgc/utils.py Outdated
:param m: Number of rows of matrix Z
:param n: Number of columns of matrix Z
:param k: Number of row clusters
:param l: Number of column clusters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lint tests fail because l is not allowed as a variable for PEP8 (...)
What about using more descriptive names (n_rows, nclusters_row, ...)?

cgc/utils.py Outdated
:return: Size of the array in bytes
"""

return math.prod(shape) * nbytes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By chance I have tried to run this in an environment with Python 3.7, and it turns out that math.prod is a new feature of Python 3.8. Can we use np.prod instead?

@@ -0,0 +1,218 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I have tried to run it on my laptop, I also see the issue with the first run being a bit "off". I would actually keep a single set of matrix sizes/cluster parameters (right now the first set of parameters is repeated twice), and add one line suggesting to run the cell twice if the result for the first set of parameters is a bit off, probably due to some initialisations within memory_profiler..

Minor point: I have also noticed the sys package is used but not imported.

@fnattino
Copy link
Contributor

Uh, by the way, we should first merge with development before master..

@rogerkuou rogerkuou changed the base branch from master to development June 28, 2021 16:23
@rogerkuou
Copy link
Contributor Author

rogerkuou commented Jun 28, 2021

Hi @fnattino, thx for the review! I have made changes according to your suggestion.

As you can see the tests for Python 3.7 failed. It's due to math.product which is not supported by Python 3.7. Shall I change the application and keep Python 3.7 compatible?

@fnattino
Copy link
Contributor

Hi @rogerkuou , thanks! Indeed, I have notice the issue. Let's maybe use numpy.prod, which I think should do the work?

@rogerkuou rogerkuou merged commit 88fc584 into development Jun 29, 2021
@rogerkuou rogerkuou deleted the check_memory_requirements branch January 17, 2022 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants