-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stricter FP32 tests #614
Stricter FP32 tests #614
Conversation
allok = allok & check_tensor(tensors1[13], tensors2[13], L * C, "ln2b", 2.5e-3f); | ||
allok = allok & check_tensor(tensors1[14], tensors2[14], C, "lnfw", 0.12f); // hmm bit higher | ||
allok = allok & check_tensor(tensors1[15], tensors2[15], C, "lnfb", 2e-2f); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we turn this entire thing into a loop?
have a small struct {size, name, threshold}, which also would make it easier to match thresholds and tensor names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree we could do that but i'd do it in a separate PR
0.7367, | ||
0.4008, | ||
0.1874 | ||
5.270009, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally, we'd export these from pytorch when we generate the reference file instead of having to copy them manually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hopefully it doesn't change that often? since we're not touching the reference imp that frequently - but if it does then it's worth investing into that
@gordicaleksa - this appears to be causing a failure when running:
It may or may not be seen on your environment but am able to see it here. @karpathy - I think this is what was causing the issue. I ran my tests one commit back from this in your repo and it passes consistently. If I checkout this commit, then the failures start. Not 100 percent sure but it does make some sense since this is the test that's failing and there aren't many other changes to this file recently? Can you confirm since you were seeing the failure consistently too? Thank you. |
It's certainly this PR - sad our CI didn't catch this! See #615 for a fix. |
Regarding grad tensors: back when Andrej hardcoded the thresholds we had a bug in PyTorch that led to a bigger discrepancy between our PT vs C code - now that that's fixed we can be really strict and use
1e-6f
here.