Very Large Memory Consumption for Even A Small Dataset #50
Comments
Hmm, interesting. I would have suspected some memory overhead, but not nearly this much! Thank you for the carefully written issue. Sadly, though, I am not able to reproduce this on my machine. Could you please describe your hardware / setup? Just to confirm: this is CPU / host memory you are talking about, correct? Not GPU? FYI: there is some buffering going on with "prefetch_batches: int = 300", but for 8x8 images this would mean 8*8 (bytes per image) * 4 (splits) * 128 (batch size) * 300 (prefetch batches) bytes, or about 10 MB... so it is not this.
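(Spelled out, that estimate is:)

```python
# Back-of-the-envelope prefetch buffer size: 8x8 single-byte pixels,
# 4 dataset splits, batch size 128, prefetch_batches = 300.
bytes_per_image = 8 * 8
buffer_bytes = bytes_per_image * 4 * 128 * 300
print(f"{buffer_bytes / 1e6:.1f} MB")  # -> 9.8 MB
```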
Hi Luke, thank you very much for your quick response. I hope the following details can be helpful: I am using a GPU.
More Details: I understand there are some prefetch batches, but, as you calculated, the prefetched data should be very small. Again, thank you for your reply. I am also still investigating this issue and will let you know once I find something.
Thanks for the info and for being such an early tester! Just to confirm: that is NOT GPU memory, but an explosion in host (CPU) memory? If you are observing GPU memory, see https://jax.readthedocs.io/en/latest/gpu_memory_allocation.html for flags on how to turn that off. Do you still see memory increases if you turn off the GPU? E.g. something like:
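(For instance, one version of that, hiding the GPU via an environment variable before JAX or TF initialize:)

```python
# Illustrative sketch: hide the GPU so everything runs on the CPU.
# Set the environment variable before importing jax / tensorflow.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import jax
print(jax.devices())  # should now list only CPU devices
```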
Hello, thank you so much! Your comment is really very helpful, especially the JAX GPU memory allocation link.
In my case, it was GPU memory that exploded.
I finally managed to reduce the GPU memory usage (from 10 GB+ to ~700 MB when running pes.py) based on the suggestions in the link above. What I did was:
```python
tf.config.experimental.set_visible_devices([], "GPU")
```
The link https://jax.readthedocs.io/en/latest/gpu_memory_allocation.html provides several options for avoiding GPU OOM errors, so there could be other solutions.
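(For anyone else hitting this, a minimal sketch of how the pieces fit together; the XLA preallocation flag is one of the options from that page, and the right combination may differ per setup:)

```python
# Sketch of the combined workaround (assumptions: the XLA preallocation
# flag from the linked JAX page; adjust to your own setup).
import os
# Stop JAX from preallocating most of the GPU memory up front.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

import tensorflow as tf
# Keep TensorFlow, which is only used for the input pipeline, off the GPU.
tf.config.experimental.set_visible_devices([], "GPU")
```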
Ahh, TF also tries to grab the GPU. That is annoying. I should fix that on my end. Going to make an issue. Thanks for posting your solution here!
Original issue:
Dataset: fashion_mnist
Dataset Size: 36.42 MB (https://www.tensorflow.org/datasets/catalog/fashion_mnist)
Reproduce the Issue:
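(The rough shape of the loading code in question; the module path, splits, and argument names below are illustrative guesses based on this thread, not the exact call:)

```python
# Hypothetical reproduction sketch: module path, splits, and argument
# names are guesses based on this thread, not the exact code.
from learned_optimization.tasks.datasets import base

datasets = base.preload_tfds_image_classification_datasets(
    "fashion_mnist",
    splits=("train[0:80%]", "train[80%:90%]", "train[90%:]", "test"),
    batch_size=128,
    image_size=(8, 8),
)
```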
Issue Description:
As you can see, the original Fashion-MNIST dataset is very small. However, when I run the above code, the memory usage becomes extremely high, 10 GB+.
In my case, the issue occurs when the program reaches this line in the function preload_tfds_image_classification_datasets. Here is the code of make_python_iter:
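(Roughly, a helper like this turns the tf.data pipeline into an iterator of NumPy batches; this is a guess at the general shape of the code, not the library's actual source:)

```python
# Guess at the general shape of such a helper, not the library's source:
# convert a tf.data pipeline into an iterator of NumPy batches.
from typing import Any, Iterator

import tensorflow as tf
import tensorflow_datasets as tfds

def make_python_iter(dataset: tf.data.Dataset,
                     prefetch_batches: int = 300) -> Iterator[Any]:
  # tf.data keeps up to `prefetch_batches` batches buffered in host memory.
  dataset = dataset.prefetch(prefetch_batches)
  # tfds.as_numpy converts the tf.data elements to NumPy arrays.
  return iter(tfds.as_numpy(dataset))
```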
Could you please suggest a way to reduce the huge memory usage? Do you have any idea why it requires so much memory, and does anybody else see this issue too?
Thank you very much; I am looking forward to your comments.