error in training #40
Comments
Hi @nikky4D, thanks for opening this issue. Is this the full stack trace? I haven't encountered anything like this before, so I'm wondering whether this could be a mismatch in Python / PyTorch / CUDA versions. Do your versions match those in Also, could you provide more details on what hardware you are using?
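One way to gather the version details asked about above is a small script like the following. This is only a sketch: the helper name `collect_versions` is made up for illustration, and it assumes PyTorch may or may not be importable in the environment where it runs.

```python
import platform


def collect_versions():
    """Return the Python, PyTorch, and CUDA versions visible to this process."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
        "cudnn": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed here; report only the Python version.
        return info
    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        info["cudnn"] = torch.backends.cudnn.version()
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```

Pasting the printed output into the issue makes it easy to compare against the versions the codebase was tested with.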
For the stack trace: That is all I was given. I was running 10 epochs so this is at the final epoch.
I see, that could be the issue here. Our codebase was only tested with the versions in
You may be correct. I'm using 4 workers with a batch size of 32 on a 2080 Ti. Everything runs well until the last epoch, where, after calculating the evaluation metrics, I get that error. It appears to affect only the saving of the checkpoint for the epoch directly preceding the error, as I can't load that checkpoint. I can load the other checkpoints. Anyway, I'll close this until I find a solution that works.
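The thread does not establish why the last checkpoint was unreadable. As a general precaution against half-written checkpoint files (not a confirmed fix for this error), a checkpoint can be written to a temporary file and renamed into place only once the write has finished. The sketch below uses a hypothetical helper, `save_atomically`, and falls back to `pickle` so it runs without PyTorch; in a training loop one would pass `save_fn=torch.save`, which accepts a file-like object.

```python
import os
import pickle
import tempfile


def save_atomically(obj, path, save_fn=None):
    """Write obj to a temp file next to path, then rename it into place."""
    if save_fn is None:
        # Stand-in serializer so the sketch runs without PyTorch;
        # pass save_fn=torch.save in a real training loop.
        def save_fn(o, f):
            pickle.dump(o, f)

    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            save_fn(obj, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, drop the partial temp file and keep the old checkpoint.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern, a crash during saving leaves the previous checkpoint at `path` untouched rather than a truncated file.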
* Added support for Multilingual Dataset Wrapper and Multilingual MSCoco
* Removed temp file
* Delete model_loader.py
* Added default value to model_cache_dir params
* Added model_cache_dir option to test
* Delete multilingual_mscoco-old.py
* Converted Multilingual MS-COCO into own dataset
* Made Multilingual COCO independent from wrapper
* Delete multilingual_dataset.py
* Fixed broken import

Co-authored-by: Romain Beaumont <romain.rom1@gmail.com>
Hi, I encountered this error during training and I'm not sure what it means:
Does anyone have any idea what this means?