
Stricter FP32 tests #614

Merged 2 commits into karpathy:master on Jun 18, 2024
Conversation

gordicaleksa (Contributor)

  • Stricter FP32 logit accuracy
  • Much stricter FP32 loss accuracy
  • Much stricter FP32 grad tensor accuracy (and somewhat stricter 16-bit accuracy)
  • Copied over new expected loss values from PyTorch (they're a bit different, and I took all 6 decimal places, not just 4)
  • Also adapted our test logic to round the loss to 6 decimal places (a sketch follows below)
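
As an illustration of that last bullet, here is a minimal sketch of what a loss check rounded to 6 decimal places can look like. The names (losses, expected_losses, allok) and the step count are assumptions for illustration, not the exact code from test_gpt2.c:

// Sketch: compare the measured losses against the PyTorch reference,
// after rounding both to 6 decimal places. Needs <math.h> for roundf.
// `losses`, `expected_losses`, and the step count 10 are illustrative.
int losses_ok = 1;
for (int step = 0; step < 10; step++) {
    float a = roundf(losses[step] * 1e6f) / 1e6f;
    float b = roundf(expected_losses[step] * 1e6f) / 1e6f;
    if (a != b) {
        printf("LOSS MISMATCH AT STEP %d: %f vs %f\n", step, a, b);
        losses_ok = 0;
    }
}
allok = allok & losses_ok;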

Regarding grad tensors: back when Andrej hardcoded the thresholds, we had a bug in the PyTorch reference that led to a bigger discrepancy between our PyTorch and C code. Now that that's fixed, we can be really strict and use 1e-6f here.

allok = allok & check_tensor(tensors1[13], tensors2[13], L * C, "ln2b", 2.5e-3f);
allok = allok & check_tensor(tensors1[14], tensors2[14], C, "lnfw", 0.12f); // hmm bit higher
allok = allok & check_tensor(tensors1[15], tensors2[15], C, "lnfb", 2e-2f);

Contributor
Can we turn this entire thing into a loop? Have a small struct {size, name, threshold}, which would also make it easier to match thresholds to tensor names?
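
For illustration, a minimal sketch of that suggestion, assuming check_tensor keeps its current (tensor, tensor, size, name, threshold) signature; the entries shown are just the three from the excerpt above:

// Sketch: table-driven tensor checks instead of one hardcoded line each.
// Keeps each tensor's size, name, and threshold next to each other.
typedef struct {
    int idx;           // index into tensors1 / tensors2
    int size;          // number of elements to compare
    const char* name;  // label used in error messages
    float threshold;   // max tolerated absolute difference
} TensorCheck;

TensorCheck checks[] = {
    {13, L * C, "ln2b", 2.5e-3f},
    {14, C,     "lnfw", 0.12f},
    {15, C,     "lnfb", 2e-2f},
    // ... remaining tensors ...
};
for (size_t i = 0; i < sizeof(checks) / sizeof(checks[0]); i++) {
    allok = allok & check_tensor(tensors1[checks[i].idx], tensors2[checks[i].idx],
                                 checks[i].size, checks[i].name, checks[i].threshold);
}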

gordicaleksa (Contributor, Author)
Agree we could do that, but I'd do it in a separate PR.

Diff excerpt from the expected loss values (old 4-decimal entries replaced by new 6-decimal entries from PyTorch):

-    0.7367,
-    0.4008,
-    0.1874
+    5.270009,
Contributor
Ideally, we'd export these from PyTorch when we generate the reference file, instead of having to copy them over manually.

gordicaleksa (Contributor, Author) · Jun 18, 2024
Hopefully they don't change that often, since we're not touching the reference implementation that frequently - but if they do, then it's worth investing in that.
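
As a hypothetical sketch of that idea on the C side: if the Python script that generates the reference file also wrote the expected losses into it, the test could read them back instead of hardcoding them. The file name, the layout, and the assumption that the losses sit at a known offset are all illustrative, not the repo's actual format:

// Sketch (assumed layout): read n expected losses that the Python
// reference script would have appended to the debug-state file.
#include <stdio.h>

int read_expected_losses(const char* path, float* out, int n) {
    FILE* f = fopen(path, "rb");
    if (f == NULL) { return 0; }
    // ... seek past the existing debug-state contents here ...
    int ok = fread(out, sizeof(float), n, f) == (size_t)n;
    fclose(f);
    return ok;
}

// usage (file name is illustrative):
// float expected_losses[10];
// if (!read_expected_losses("gpt2_124M_debug_state.bin", expected_losses, 10)) { /* fail */ }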

karpathy merged commit 6ecc52e into karpathy:master on Jun 18, 2024 · 10 checks passed
rosslwheeler (Contributor) commented Jun 19, 2024

@gordicaleksa - this appears to be causing a failure when running:

make testgpt2_cu USE_CUDNN=1 && ./testgpt2_cu

It may not reproduce in your environment, but I am able to see it here.

@karpathy - I think this is what was causing the issue. I ran my tests one commit back from this in your repo and they pass consistently; if I check out this commit, the failures start. I'm not 100 percent sure, but it makes some sense, since this is the test that's failing and there haven't been many other recent changes to this file.

Can you confirm, since you were seeing the failure consistently too? Thank you.

gordicaleksa (Contributor, Author)
It's certainly this PR - sad our CI didn't catch this! See #615 for a fix.
