
CUDA out of memory #5

Open
Arman-IMRSV opened this issue Apr 9, 2021 · 6 comments

Comments

@Arman-IMRSV

Hello. I am trying to reproduce the paper results. I am currently running the code on 2 Tesla V100 GPUs, each with 16GB of memory, but I am still getting an out-of-memory error. I also tried decreasing MAX_TRANSCRIPT_WORD to 1000, but it did not help. Could you please let me know what hardware and GPUs are required to run it?

@Arman-IMRSV
Author

@xrc10

@ilyaivensky

Same story here: running with 4 Quadro RTX 6000 GPUs, each with 24GB of memory.

@xrc10
Contributor

xrc10 commented Jun 24, 2021

We used a V100 GPU with 32GB of memory. Unfortunately, I haven't tried it with other GPUs. Can you also try decreasing MAX_SENT_LEN and MAX_SENT_NUM to smaller values to see if the OOM error still occurs?
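
A minimal sketch of how one might check whether lowering those values actually shrinks the per-batch footprint, assuming the training loop is PyTorch-based; `model`, `train_loader`, `optimizer`, and `loss_fn` below are placeholders, not identifiers from this repo:

```python
import torch

def train_one_epoch(model, train_loader, optimizer, loss_fn, device="cuda"):
    """Log peak GPU memory per batch (placeholder loop, not the repo's own code)."""
    model.train()
    for step, (inputs, targets) in enumerate(train_loader):
        # Reset the peak-memory counter so each step is measured independently.
        torch.cuda.reset_peak_memory_stats(device)
        inputs, targets = inputs.to(device), targets.to(device)

        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()

        peak_gb = torch.cuda.max_memory_allocated(device) / 1024 ** 3
        print(f"step {step}: peak GPU memory {peak_gb:.2f} GB")
```

If the peak stays near 16GB even with small MAX_SENT_LEN / MAX_SENT_NUM, the model weights and optimizer states, rather than the batch itself, are likely what exceeds the card.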

@Arman-IMRSV
Author

Thanks @xrc10 for the response. I had already tried decreasing those parameters, but it didn't help.

@omelnikov

omelnikov commented Aug 14, 2021

Hi @Arman-IMRSV, good observations! Could you clarify which parameter values you tried, and what you decreased them from and to? Judging by the difference in GPU memory sizes, the new settings need to produce batches roughly half the byte size of those used by the authors. Note that the GPU reserves some memory for its own tasks, so not all 24GB is available for training batches.

Also, did you use the same training set? Sentence lengths vary across corpora, and the byte size of a batch can be estimated from its average character length. I'm also curious whether you investigated the batch that caused the memory crash: was it the first batch, and how large was it in bytes? You might also try a lower tensor precision than the one used in the paper.

Try exploring the memory-crashing batch in greater detail. I hope it works out, but do share what you discover; it helps others reproduce the results with fewer glitches on different hardware.
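
On the lower-precision suggestion, below is a minimal sketch of mixed-precision training with torch.cuda.amp that also logs the batch that triggers the OOM, assuming a standard PyTorch loop; `model`, `train_loader`, `optimizer`, and `loss_fn` are again placeholders, not identifiers from this repository:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

def train_amp(model, train_loader, optimizer, loss_fn, device="cuda"):
    """Placeholder loop: fp16 autocast plus a report on the batch that runs out of memory."""
    scaler = GradScaler()
    model.train()
    for step, (inputs, targets) in enumerate(train_loader):
        try:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            with autocast():  # run the forward pass in float16 where it is safe to do so
                loss = loss_fn(model(inputs), targets)
            scaler.scale(loss).backward()  # scale the loss to avoid fp16 gradient underflow
            scaler.step(optimizer)
            scaler.update()
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise
            # Report which batch crashed and roughly how many bytes of input it held.
            mb = inputs.element_size() * inputs.nelement() / 1024 ** 2
            print(f"OOM at step {step}: input shape {tuple(inputs.shape)}, ~{mb:.1f} MB of input tensors")
            raise
```

Whether autocast alone is enough depends on where the memory goes: it roughly halves activation memory, but optimizer states stay in fp32 unless the training code is changed further.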

@shonaviso

Hi @Arman-IMRSV,
I am facing the above issue while evaluating. Is it the same for you?
