Reproducing issues: broken pipe & CUDA out of memory errors #1
Comments
Hi @velocityCavalry, I do not have any idea why the broken pipe error occurs. The OOM issue may be due to a lack of GPU memory. We conducted our experiments on 8 × Tesla V100 GPUs with 16 GB of memory each. You can reduce GPU memory usage by using …
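(Reducing the per-GPU batch size is usually the quickest way to cut memory. A rough back-of-the-envelope sketch of why halving it helps, with hypothetical numbers for a BERT-base-style encoder, not BPR's actual footprint:)

```python
# Rough per-GPU activation memory estimate for a BERT-base-style encoder.
# All numbers here are illustrative assumptions, not measured BPR values.
def activation_mb(batch_size, seq_len=256, hidden=768, layers=12, bytes_per_float=4):
    # one fp32 hidden-state tensor per transformer layer
    return batch_size * seq_len * hidden * layers * bytes_per_float / 2**20

# Halving the batch size roughly halves activation memory:
print(activation_mb(32))  # 288.0 (MB)
print(activation_mb(16))  # 144.0 (MB)
```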
Thanks for the reply! I do specify the flags by …
It seems that the OOM error happens in the validation step, which involves copying the computed passage representations between GPUs using the …
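(This would explain the memory spike: after a cross-GPU gather, every rank holds a copy of every other rank's tensors. A simplified pure-Python simulation of that effect; real code would use something like torch.distributed.all_gather, which is not shown here:)

```python
# Simplified simulation of an all_gather: each rank receives the
# full list of per-rank chunks, so per-rank memory scales with world size.
def all_gather_sim(per_rank_chunks):
    world_size = len(per_rank_chunks)
    # every rank ends up with its own copy of all chunks
    return [list(per_rank_chunks) for _ in range(world_size)]

chunks = [[0.1] * 4 for _ in range(7)]  # 7 GPUs, 4 vectors each
gathered = all_gather_sim(chunks)
# per-rank storage grows from 4 vectors to 7 * 4 = 28:
print(sum(len(c) for c in gathered[0]))  # 28
```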
I am closing this issue as there has been no activity.
Hi,
I was trying to train BPR by running
However, there are a lot of errors. For example, after the validation sanity check, there is a broken pipe error in multiprocessing/connections.py; the output is listed below.

Furthermore, I encountered CUDA out-of-memory errors. The trimmed output is attached (each line is repeated 3 times because 3 of the 7 GPUs I am using hit OOM errors):
Sorry for putting all these outputs here!
I installed BPR with

pip install -r requirements.txt

and built the passage database successfully. The GPUs I am using are 7 GeForce RTX 2080 Ti cards. Thanks for any help!