
OOM in train.py #71

Closed
jiangzhiwei2018 opened this issue Apr 2, 2023 · 0 comments
jiangzhiwei2018 commented Apr 2, 2023

My env:
OS: Windows 10
python version: 3.8
pytorch version: 1.13.1
numpy version: 1.23.5
GPU: RTX3090TI
RAM: 32GB

I suspect a memory leak when running train.py for training on a single GPU.
System memory usage reaches about 95% after a period of time and keeps rising.
(screenshot of system memory usage omitted)

Afterwards, I ran a memory analysis with memory_profiler and found that memory appears to be over-allocated during the data-loading phase.
(screenshot of memory_profiler output omitted)
Maybe this can provide some hints toward a solution.
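
For reference, here is a minimal sketch of the kind of profiling script I used to isolate the data-loading step. TensorDataset is only a stand-in; in my actual run I iterated over the training DataLoader built from Vimeo7Dataset, and the function name and batch count below are my own choices, not from the repo.

```python
# Minimal sketch of how I isolated the data-loading step with memory_profiler.
# TensorDataset is only a stand-in here; the real run iterated over the
# training DataLoader built from Vimeo7Dataset.
import torch
from torch.utils.data import DataLoader, TensorDataset
from memory_profiler import profile


@profile
def iterate_loader(loader, n_batches=100):
    # memory_profiler prints a line-by-line report; steady growth in reported
    # memory across this loop points at the data-loading path.
    for i, batch in enumerate(loader):
        if i >= n_batches:
            break


if __name__ == "__main__":
    dataset = TensorDataset(torch.randn(1000, 3, 64, 64))  # stand-in data
    loader = DataLoader(dataset, batch_size=16, num_workers=0)
    iterate_loader(loader)
```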

I used the same training data (Vimeo90K) and didn't make any major changes to the Vimeo7Dataset class, apart from a small edit to make it fit my own cache_keys.pkl (modified code snippet omitted; an illustrative sketch is below).
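
Purely to illustrate what I mean (this is not the actual change, and the pickle structure here is an assumption), the edit amounts to reading the clip keys from my own cache_keys.pkl, roughly like this:

```python
# Illustrative sketch only -- not the actual lines I changed, and the pickle
# structure is an assumption. It shows the kind of edit: making the dataset
# read clip keys from my own cache_keys.pkl.
import pickle

cache_keys_path = "cache_keys.pkl"  # path to my own cache file
with open(cache_keys_path, "rb") as f:
    cache_keys = pickle.load(f)     # assumed: a list of Vimeo90K clip keys
print(f"loaded {len(cache_keys)} keys")
```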
