OutOfMemoryError when trying to fine-tune llama3.1 #25
Hi Phil, thanks for the great repo and examples!

Everything worked well when I experimented with llama3-70b following your guide, but now I'm stuck fine-tuning llama3.1-70b.

I followed all the steps from the https://www.philschmid.de/sagemaker-train-deploy-llama3 article, then managed to fix some incompatible package versions and start the training process. But at the "Loading checkpoint shards" step I'm getting an error.

I've tried to work around this problem, but with no success. Maybe you could point out what I'm missing.

Comments

What instance are you trying to use?

I've tried using

Yes, could you share what you needed to change?

@korneevm, could you share your changes, please?
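For context on why the 70B checkpoint can OOM at the "Loading checkpoint shards" step: the raw weights alone often exceed a single GPU's memory. A back-of-the-envelope sketch (approximate numbers for illustration, not taken from this thread; the exact parameter count and per-parameter byte costs are assumptions):

```python
# Rough GPU memory estimate for loading a 70B-parameter model.
# Illustrative only: real usage also includes optimizer state,
# activations, and framework overhead.

PARAMS = 70e9  # approximate parameter count for a 70B model


def weights_gib(params: float, bytes_per_param: float) -> float:
    """Memory needed for the raw weights alone, in GiB."""
    return params * bytes_per_param / 1024**3


bf16 = weights_gib(PARAMS, 2)    # bf16/fp16: 2 bytes per parameter
int4 = weights_gib(PARAMS, 0.5)  # 4-bit quantized (QLoRA-style): 0.5 bytes

print(f"bf16 weights:  ~{bf16:.0f} GiB")  # ~130 GiB: exceeds one 80 GiB GPU
print(f"4-bit weights: ~{int4:.0f} GiB")  # ~33 GiB: fits a single 80 GiB GPU
```

So a full-precision load of a 70B checkpoint generally needs either a multi-GPU instance with the model sharded across devices, or quantized loading to shrink the per-parameter footprint.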