52B example sharding error #41
Comments
Thanks for reporting this, and sorry for the poor experience. Unfortunately, we don't have access to a v4-384 right now, so we don't have integration tests running and I can't get you timing data. The issue was caused by adding KV layernorm to improve numerical stability. I have put together a fix here. Can you let me know if you're unblocked?
Hi - I will close this out in a week (6/5) if I don't hear back.
Hi, I was able to run with the provided fix. Thanks!
Change-Id: I2f50ac40c89f2f16a0601e75f608b7cb4428643a
Hi,
I was trying to run the 1x v4-384 52B model example following MaxText/configs/1xv4-384.sh on a v4-384 slice and hit the following error:
It looks like this has to do with the sharding spec being incompatible with the tensor shape? Below are the commands I used to set up (main branch, jax-0.4.10) and run the experiment. Any ideas on what went wrong here?
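For anyone hitting a similar message, here is a minimal sketch of what "sharding spec incompatible with tensor shape" can mean in general. It is not taken from MaxText and does not reproduce the error in this issue. It is plain Python that checks two conditions: the spec must not name more dimensions than the tensor has, and each sharded dimension must divide evenly by the number of shards along it. The function name, the mesh axis names, and all numbers are hypothetical.

```python
from math import prod


def find_sharding_mismatches(shape, spec, mesh_axis_sizes):
    """Return (dim_index, dim_size, shard_count) for dims that cannot be split evenly.

    shape: tuple of ints, the tensor shape.
    spec: one entry per leading dim: None (replicated), a mesh axis name,
          or a tuple of mesh axis names.
    mesh_axis_sizes: dict mapping mesh axis name to its size.
    """
    if len(spec) > len(shape):
        raise ValueError(
            f"spec has {len(spec)} entries but tensor has rank {len(shape)}"
        )
    mismatches = []
    for i, (dim, axes) in enumerate(zip(shape, spec)):
        if axes is None:
            continue  # replicated dimension, nothing to check
        if isinstance(axes, str):
            axes = (axes,)
        shards = prod(mesh_axis_sizes[a] for a in axes)
        if dim % shards != 0:
            mismatches.append((i, dim, shards))
    return mismatches


# Hypothetical mesh and shapes, for illustration only.
mesh = {"data": 4, "tensor": 8}
print(find_sharding_mismatches((1024, 12), (None, "tensor"), mesh))
print(find_sharding_mismatches((1024, 16), ("data", "tensor"), mesh))
```

The first call reports dimension 1 (size 12 cannot be split 8 ways) and the second reports nothing. A spec with more entries than the tensor's rank raises `ValueError` instead, which is the kind of mismatch that can appear when a new parameter is added with a different rank than the spec expects.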