-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check memory requirements #33
Conversation
Development
Development
Development
Development
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rogerkuou , great work thanks! Left few comments, let me know if anything is unclear!
cgc/utils.py
Outdated
import logging | ||
|
||
|
||
def mem_estimate_numpy(m, n, k, l, out_unit=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add coclustering
in the name of the function (mem_estimate_coclustering_numpy
?) - it would be nice to add later on a similar one for the triclustering
cgc/utils.py
Outdated
:param m: Number of rows of matrix Z | ||
:param n: Number of columns of matrix Z | ||
:param k: Number of row clusters | ||
:param l: Number of column clusters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lint tests fail because l
is not allowed as a variable for PEP8 (...)
What about using more descriptive names (n_rows
, nclusters_row
, ...)?
cgc/utils.py
Outdated
:return: Size of the array in bytes | ||
""" | ||
|
||
return math.prod(shape) * nbytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By chance I have tried to run this in an environment with Python 3.7, and it turns out that math.prod
is a new feature of Python 3.8. Can we use np.prod
instead?
examples/memory_estimation.ipynb
Outdated
@@ -0,0 +1,218 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I have tried to run it on my laptop, I also see the issue with the first run being a bit "off". I would actually keep a single set of matrix sizes/cluster parameters (right now the first set of parameters is repeated twice), and add one line suggesting to run the cell twice if the result for the first set of parameters is a bit off, probably due to some initialisations within memory_profiler..
Minor point: I have also noticed the sys
package is used but not imported.
Uh, by the way, we should first merge with |
Hi @fnattino, thx for the review! I have made changes according to your suggestion. As you can see the tests for Python 3.7 failed. It's due to |
Hi @rogerkuou , thanks! Indeed, I have notice the issue. Let's maybe use |
Add memory usage estimation function for
cgc.coclustering_numpy
.Need to check: