Memory Leak problem #2
Comments
Hello, we are trying to reproduce your problem. How much memory does your machine have, and after how many epochs does the problem occur?
Thanks for your reply. The machine has 256 GB of memory, and the problem appears during the first epoch. I can observe in
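One way to put numbers on this growth is to log the training process's resident set size (RSS) every N iterations. Below is a minimal stdlib-only sketch. It assumes a Linux host, where it reads /proc/self/statm, and on other Unix systems it falls back to resource.getrusage, which reports only the peak. `current_rss_mb` and `RSSLogger` are hypothetical names and are not part of PersFormer. DataLoader workers run as separate processes, so their memory is not included in the figure each rank logs.

```python
import os
import resource
import sys


def current_rss_mb():
    """Return this process's resident set size in MiB.

    On Linux this reads the current value from /proc/self/statm. Elsewhere it
    falls back to the *peak* RSS from getrusage, which never decreases.
    """
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])  # 2nd field: resident pages
        return resident_pages * os.sysconf("SC_PAGE_SIZE") / 2**20
    except (OSError, ValueError, IndexError):
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is in bytes on macOS and in KiB on Linux.
        return peak / 2**20 if sys.platform == "darwin" else peak / 2**10


class RSSLogger:
    """Hypothetical helper: record and print RSS every `every` iterations."""

    def __init__(self, every=100):
        self.every = every
        self.history = []  # list of (iteration, rss_mib) pairs

    def step(self, iteration):
        if iteration % self.every == 0:
            rss = current_rss_mb()
            self.history.append((iteration, rss))
            print(f"iter {iteration}: RSS {rss:.1f} MiB", flush=True)


if __name__ == "__main__":
    logger = RSSLogger(every=2)
    for it in range(6):  # stand-in for the training loop
        logger.step(it)
```

In a distributed run, each rank would call `logger.step(i)` inside its own training loop. A steady upward trend across iterations, as opposed to a one-off jump after the dataset loads, points to per-iteration accumulation.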
Could you provide more information about your machine, such as your PyTorch, CUDA, and Python versions? We did not run into a memory leak when training on four RTX 3090 GPUs with 128 GB of memory.
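PyTorch ships a fuller report via `python -m torch.utils.collect_env`. The smaller sketch below gathers only the fields asked for in this thread, plus the gcc version that comes up later. `collect_env` here is a hypothetical helper, not a PyTorch or PersFormer API. It avoids `capture_output` so that it also runs on the Python 3.6 environments mentioned below, and any field it cannot determine is reported as None.

```python
import platform
import shutil
import subprocess


def collect_env():
    """Gather Python/PyTorch/CUDA/gcc versions (None for anything unavailable)."""
    info = {"python": platform.python_version()}

    # PyTorch version and the CUDA version it was built against.
    try:
        import torch
        info["pytorch"] = torch.__version__
        info["cuda"] = torch.version.cuda  # None for CPU-only builds
        info["gpus"] = torch.cuda.device_count()
    except ImportError:
        info["pytorch"] = info["cuda"] = info["gpus"] = None

    # First line of `gcc --version`, relevant when compiling C++/CUDA extensions.
    gcc = shutil.which("gcc")
    info["gcc"] = None
    if gcc is not None:
        out = subprocess.run([gcc, "--version"], stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE, universal_newlines=True)
        lines = out.stdout.splitlines()
        info["gcc"] = lines[0] if lines else None
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```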
My Python, CUDA, and PyTorch versions are 3.8.13, 11.1, and 1.8.1, respectively. I will train the model on other machines later and see whether the problem persists.
It seems the same problem also occurs on other machines with Python 3.6.13.
So far everything works fine in our tests. We will test on other machines later to see whether we can reproduce your problem.
Maybe you could pull the latest code and try again to see whether the problem still exists.
Thanks. I pulled the latest code, but the problem still exists. Maybe it is caused by some unexpected environment issue, so I will close this issue. If anyone else encounters it in the future, we can reopen it.
@liuzili97 I have run into the same problem, and my environment settings are the same as yours. Have you solved it?
No, I haven't.
I have the same issue as well! It consumes almost 99% of my system memory and crashes even before training starts (right after the dataset is loaded). I reported it in a separate issue here: #33
Thanks for your great work. Could you tell me the output of gcc --version in your environment on the 4× RTX 3090 machine? My server has 8 RTX 3090 GPUs, CUDA 11.1, PyTorch 1.8.0, and gcc version 10.3.0 (Ubuntu 10.3.0-1ubuntu1~20.10), but I run into a problem I can't solve at the INSTALL.md step that runs "cd models/nms/" followed by "python setup.py install".
We're using gcc 6 on our 4× RTX 3090 machine.
Best
Chonghao Sima
From: casialixiaodong <***@***.***>
Sent: Wednesday, August 3, 2022 5:49:25 PM
To: OpenPerceptionX/PersFormer_3DLane <***@***.***>
Cc: Sima, Chonghao <***@***.***>; Comment <***@***.***>
Subject: Re: [OpenPerceptionX/PersFormer_3DLane] Memory Leak problem (Issue #2)
Thanks for your work! I am getting a memory leak during training.
I strictly followed the installation instructions in docs/INSTALL.md and trained the model with the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node 4 main_persformer.py --mod=persformer --batch_size=2 --nepochs=40
However, memory consumption (system RAM, not CUDA memory) gradually increases during training and eventually takes up all available memory.
Have you encountered this problem?
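One commonly reported cause of host memory that grows steadily during PyTorch training, and which nobody in this thread has confirmed for PersFormer, is covered in the torch.utils.data documentation. Forked DataLoader workers that read Python objects held by the parent process update their reference counts, which triggers copy-on-write, so each worker gradually duplicates those objects. The docs suggest holding such data in NumPy, Pandas, or PyArrow objects instead of lists and dicts. With four ranks, each running its own workers, this duplication multiplies. The sketch below shows the general idea under that assumption. `PackedLaneDataset` is hypothetical, not the project's dataset class, and real lane annotations would need more than one float sequence per sample.

```python
import numpy as np


class PackedLaneDataset:
    """Hypothetical dataset storing all samples in two flat NumPy arrays.

    Rather than a Python list of per-sample lists (many refcounted objects
    that forked workers end up copying), sample i occupies
    values[offsets[i]:offsets[i + 1]].
    """

    def __init__(self, samples):
        lengths = np.array([len(s) for s in samples], dtype=np.int64)
        # offsets[0] = 0 and offsets[i + 1] = total length of samples 0..i.
        self.offsets = np.zeros(len(samples) + 1, dtype=np.int64)
        np.cumsum(lengths, out=self.offsets[1:])
        if samples:
            self.values = np.concatenate(
                [np.asarray(s, dtype=np.float32) for s in samples])
        else:
            self.values = np.zeros(0, dtype=np.float32)

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, idx):
        start, end = self.offsets[idx], self.offsets[idx + 1]
        return self.values[start:end]  # a view into the shared flat array


if __name__ == "__main__":
    ds = PackedLaneDataset([[1.0, 2.0], [], [3.0, 4.0, 5.0]])
    print(len(ds), ds[2])
```

The trade-off is that samples must be converted to numeric arrays up front. In exchange, the parent process holds only a handful of Python objects no matter how many samples the dataset contains.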