Searching ITM is stuck at epoch 10 #8
Comments
And after I killed the program manually,
And the output of
And after I killed the two processes on GPU 1 and 3 by
It seems that one process goes wrong (maybe out of memory, but with no hints) and the others are waiting.
It really looks like facebookresearch/fairseq#708 (comment).
I have just noticed the prerequisites in the README and checked the RAM size of my server.
Is there any way to reduce the memory cost without reducing the batch size?
And I wonder why ITM requires so much memory.
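(Editorial aside: one common trick to cut per-step memory while keeping the effective batch size is gradient accumulation: split each batch into micro-batches, accumulate their weighted gradients, and apply one optimizer step. A minimal numpy sketch under that assumption; the linear model and all names here are illustrative, not MMnas code:)

```python
import numpy as np

# Gradient accumulation sketch: 4 micro-batches of 8 reproduce the
# gradient of one full batch of 32, so activation memory per forward
# pass drops by 4x while the update itself is unchanged.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
y = rng.normal(size=(32,))
w = np.zeros(4)

def grad(w, Xb, yb):
    # gradient of mean squared error over one (micro-)batch
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

g_full = grad(w, X, y)          # full-batch gradient

g_acc = np.zeros(4)             # accumulated over micro-batches
for i in range(0, 32, 8):
    # weight each micro-batch gradient by its share of the full batch
    g_acc += grad(w, X[i:i+8], y[i:i+8]) * (8 / 32)

assert np.allclose(g_full, g_acc)
```

The trade-off is extra wall-clock time (more forward/backward passes per update), not a different optimization trajectory.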
Sorry for the late reply. ITM indeed needs a lot of memory for a deep model like MMnas. If memory is not sufficient, you could try reducing the hidden dimension from 512 to 256. The reason for the large memory footprint is that we need to forward the positive samples along with their negative samples through the network, which makes ITM more memory-consuming than the other tasks.
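The memory multiplier described above can be sketched as follows; the function and the constants are illustrative assumptions, not the actual MMnas code:

```python
# Why ITM is memory-hungry: each positive pair is forwarded together
# with its sampled negatives, so the effective batch hitting the
# network (and hence the activation memory) is a multiple of the
# nominal batch size.
HIDDEN = 512   # README default; the reply suggests trying 256
BATCH = 32     # hypothetical nominal batch size
NUM_NEG = 2    # hypothetical: e.g. one negative image + one negative caption

def effective_batch(batch, num_neg):
    # positives and their negatives are stacked into one forward pass
    return batch * (1 + num_neg)

print(effective_batch(BATCH, NUM_NEG))  # → 96, i.e. 3x the activations
```

Under these assumptions, halving the hidden dimension roughly halves activation memory without touching the batch size, which is why the 512 → 256 suggestion helps.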
@MIL-VLG Got it! Thanks!
run search_itm.py
is stuck at epoch 10. No errors occur and the program does not terminate itself. The last output is as follows:
And the output of
nvidia-smi
has looked like this the whole time since the program got stuck. I have noticed that epoch 10 is the
NEG_START_EPOCH
, but I have no idea what is wrong there.
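For context on why that epoch matters: a setting like NEG_START_EPOCH typically gates when negative mining kicks in, and the extra forward passes it triggers are a plausible point for one rank to run out of memory while the other ranks block on the next collective (matching the symptom above). A hedged sketch; the function name and the gating logic are assumptions, not the actual search_itm.py code:

```python
# Sketch of epoch-gated hard-negative mining: before NEG_START_EPOCH
# negatives can be drawn cheaply at random; from that epoch on they are
# mined, which adds forward passes and memory pressure.
NEG_START_EPOCH = 10  # value referenced in the thread

def use_hard_negatives(epoch):
    # the first mined-negative epoch is exactly NEG_START_EPOCH,
    # which is where the reported hang begins
    return epoch >= NEG_START_EPOCH

print([e for e in range(8, 12) if use_hard_negatives(e)])  # → [10, 11]
```

If that is what is happening here, the hang at epoch 10 would be the remaining ranks waiting forever on a collective that the failed rank never reaches.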