
Unable to use Neuron Cores while fine-tuning BERT on Trainium #3

Closed

DhruvaBansal00 opened this issue Jun 28, 2023 · 5 comments

@DhruvaBansal00 (Contributor) commented Jun 28, 2023

Hey!

I am trying to follow this guide: https://huggingface.co/docs/optimum-neuron/tutorials/fine_tune_bert to fine-tune BERT on a trn1.2xlarge instance. I set up the datasets as described in the blog and ran the training script, but Neuron core usage stays at 0%. This matters because my expected training time is close to 5 hours.

[Two screenshots, Jun 28 2023: Neuron core utilization at 0% during training]
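For reference, here is the sanity check I would expect to pass (a minimal sketch, assuming torch-neuronx is installed as in the guide; it registers Trainium NeuronCores as XLA devices):

```python
# Sketch: verify that PyTorch tensors land on a NeuronCore via the XLA backend.
# Assumes torch-neuronx and its torch-xla dependency are installed (as on the HF AMI).
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()            # should resolve to an 'xla' device backed by a NeuronCore
t = torch.randn(2, 2, device=device)
print(device, t.device)             # a CPU device here means training will not touch the cores
xm.mark_step()                      # flush the lazy XLA graph so the op actually executes
```

While the training script runs, `neuron-top` (from the aws-neuron-tools package) shows live per-core utilization.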

cc: @philschmid

@philschmid (Owner)
Thank you for reporting this. When did you create your environment? It seems there is an error with the new AMI. Can you try the previous one?

@DhruvaBansal00 (Contributor, Author)
I created the environment yesterday, using this AMI: huggingface-neuron-2023-06-26T09-27-02.137Z-692efe1a-8d5c-4033-bcbc-5d99f2d4ae6a. I can try the previous one.
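In case it helps compare the two images, here is a quick way to inspect the Neuron stack an AMI ships (a sketch; `neuron-ls` comes with aws-neuron-tools, and the apt line assumes an Ubuntu-based image):

```bash
# Sketch: inspect the Neuron driver, runtime, and Python packages on the instance.
neuron-ls                                           # Neuron devices visible to the driver
pip show torch-neuronx torch-xla                    # Python-side Neuron/XLA package versions
apt list --installed 2>/dev/null | grep -i neuron   # driver/runtime package versions (Ubuntu)
```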

@DhruvaBansal00 (Contributor, Author)
Trying the previous AMI: huggingface-neuron-2023-04-20T11-02-28.279Z-692efe1a-8d5c-4033-bcbc-5d99f2d4ae6a.

@DhruvaBansal00 (Contributor, Author) commented Jun 28, 2023

OK, that AMI works, thanks for your quick response!

I had to revert my PR (#2) to make it work on the previous AMI.

I am also trying to train a T5 model. Do you know if this AMI can be used to train T5?

@philschmid (Owner)
Thank you! We are working on fixing that ASAP!
