Unable to use Neuron Cores while fine-tuning BERT on Trainium #3
Comments
Thank you for reporting! When did you create your environment? It seems there is an error with the new AMI. Can you try the previous one?
I created the environment yesterday, using this AMI: huggingface-neuron-2023-06-26T09-27-02.137Z-692efe1a-8d5c-4033-bcbc-5d99f2d4ae6a. I can try the previous one.
Trying huggingface-neuron-2023-04-20T11-02-28.279Z-692efe1a-8d5c-4033-bcbc-5d99f2d4ae6a
Ok, that AMI works, thanks for your quick response! I had to revert my PR (#2) to make it work on the previous AMI. I am also trying to train a T5 model. Do you know if this AMI can be used to train T5?
Thank you! We are working on fixing that ASAP! |
Hey!
I am trying to follow this guide: https://huggingface.co/docs/optimum-neuron/tutorials/fine_tune_bert and fine-tune BERT on a trn1.2xlarge instance. I set up the datasets as described in the tutorial and then ran the training script, but NeuronCore usage is still at 0%. This matters to me because the expected training time is close to 5 hours.
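For anyone debugging the same symptom, a minimal sketch of how I checked device visibility and utilization on the instance. This assumes the AWS Neuron SDK tools (`neuron-ls`, `neuron-top`) are installed on the AMI; `train.py` and the launch line are placeholders standing in for the tutorial's training script, not the exact commands from the guide:

```shell
# List the Neuron devices the instance exposes; a trn1.2xlarge should
# show one Trainium chip with 2 NeuronCores. If nothing is listed, the
# driver/runtime on the AMI is likely the problem.
neuron-ls

# Watch per-core utilization live in a second terminal while the
# training script runs. Sustained 0% here suggests the workload never
# reached the NeuronCores (e.g. it is running on CPU).
neuron-top

# Launch the training script across both NeuronCores via torchrun, as
# the tutorial does. "train.py" is a placeholder for the actual script;
# running it as a plain single process instead can leave the cores idle.
torchrun --nproc_per_node=2 train.py
```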
cc: @philschmid