Rounding mode mismatch with the hardware #111
Comments
For 1., the output shows:
If I'm not mistaken, that means the functional model expects 4117 and the RTL got 4116. If we're rounding to the nearest even, then 4117 doesn't seem right? |
For 2. I suspect this is an issue related to the TB rather than fault because it's correctly identifying the error in the single run case, but perhaps it's an issue with how fault is generating the TBs for multiple tests. Perhaps it's the fact that all the tests are being run in the same |
My educated guess for 1 is that the rounding depends on how many extra precision bits you have. In the test, we have
If we have more than 1 bit used for rounding, then clearly it should round up, hence 0x4117. If we only have 1 bit for rounding, then we shouldn't round up, since 0x4117 is odd. |
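To make the argument above concrete, here is a minimal sketch of round to nearest even on an integer bit pattern. The helper name `round_nearest_even` and the extra-bit values are hypothetical illustrations, not the values from the failing test (those are not shown in this thread). Only the 0x4116/0x4117 patterns come from the discussion.

```python
def round_nearest_even(value: int, extra_bits: int) -> int:
    """Drop the low `extra_bits` bits of `value`, rounding to nearest even."""
    if extra_bits == 0:
        return value
    kept = value >> extra_bits
    remainder = value & ((1 << extra_bits) - 1)
    half = 1 << (extra_bits - 1)
    # Round up when strictly above the halfway point, or on an exact tie
    # when the kept part is odd (so the result becomes even).
    if remainder > half or (remainder == half and (kept & 1)):
        kept += 1
    return kept


# Hypothetical inputs: the kept bits are 0x4116 in every case.
one_bit_tie = (0x4116 << 1) | 0b1        # 1 extra bit, set: an exact tie
three_bit_above = (0x4116 << 3) | 0b101  # 3 extra bits, above the halfway point
three_bit_tie = (0x4116 << 3) | 0b100    # 3 extra bits, an exact tie

print(hex(round_nearest_even(one_bit_tie, 1)))      # 0x4116 (tie, stays even)
print(hex(round_nearest_even(three_bit_above, 3)))  # 0x4117 (rounds up)
print(hex(round_nearest_even(three_bit_tie, 3)))    # 0x4116 (tie, stays even)
```

With a single rounding bit the only nonzero case is a tie, so an even kept value never moves. With more bits the remainder can sit strictly above the halfway point, which forces the round up to the odd value.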
Ah, I forgot we were considering FP numbers, I was interpreting as an integer |
Curiously on kiwi, running |
Do the tests depend at all on Python floating-point implementation, and thus could Python/C++ version/lib differences on kiwi vs. other machines explain the difference? |
I believe they use the |
Could the issue be related to how the multiply is promoting the inputs (concatenating 0s to the end) and truncating the output (see use of
|
@leonardt Yes we believe this is the issue. We're still investigating the issue now. There are more problems with the IP stub. |
Update: We have located the bug, which is due to the extra 3 bits in the initialization. However, there is another bug in the |
Okay, update on issue 2, weirdly enough, running |
I forgot about the extra precision bits. I think the appropriate solution is to modify the coreir stub to make it use the exact correct bits. If we want higher precision we can specify that at the peak/magma level |
The vanilla |
For me locally, it seems to reliably catch the failure when running |
@Kuree, are you saying there is also a bug in the IP itself? Currently coreir instantiates the verilog stub with an extra 3 bits of fractional precision, so I know that is likely part of the issue. I will work on a coreir fix for this |
@rdaly525 Yes I believe so. You can check it out and read the lines I pointed out above. In the short run we can write a |
I have a coreir branch 'float-fix' that updates the CoreIR instantiating the multiply. @Kuree, does this fix the issue? |
Also if we do actually want the multiply to have 3 extra bits of precision, we need to update the lassen description to concat 3 bits to the inputs, then put it into the appropriate BFloat[8,7+3] multiply operator, then update the tests to match these semantics. |
I don't think we need the extra 3 bits of precision. You have to round somewhere anyway and it's difficult to match other tools. |
okay, even stranger, I just had a run with |
This bug is deterministic since it's caused by improper instantiation of the IP. |
weird, I seem to be getting consistent failures (2 in a row at least) with It's strange that I did see the pass locally earlier, but now I can't seem to reproduce it. |
I could reproduce the same problem inside the buildkite docker environment last night, so I doubt it has anything to do with buildkite. In the interest of saving time, we can put |
Oh I found something interesting in the buildkite logs: If we search the log for
Notice that in the previous test ( Now the output of the |
Hmm, can you force |
This is from Nikhil's email:
I think the bug is in the way the multiplier is instantiated. Since CW_mult does not support a 7 bit mantissa (minimum is 10 bits), we had to instantiate with a 10 bit mantissa. So CW rounds to nearest even for that precision, and not for the 7 bit precision we need. I have pasted verilog code below that should round it to nearest even for our precision (note that CW outputs to int_out, not out). Can you please try with this added to the RTL? I have set the CW rounding mode to truncate now.
CW_fp_mult #(.sig_width(frac_bits+3), .exp_width(exp_bits), .ieee_compliance(0)) mul1 (.a({in0,3'h0}),.b({in1,3'h0}),.rnd('h1),.z({int_out,result_x}),.status());
always @(*) begin
endmodule
If we are going to go with the above code, let's explicitly write this in lassen so we can easily verify the high-risk rounding code using our functional testbench. @Kuree or @nikhilbhagdikar, can you do a lassen PR using the multiply operator of FPVector[8,7+3,RNE,False] in sim.py, then use normal BFloat16 for the testbench? |
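The point about rounding at 10 bits of precision versus 7 can be illustrated with plain integers. The following is a rough sketch, not a model of the CW IP: the mantissa values, the 5 additional product bits, and all function names are hypothetical, and mantissa overflow into the exponent is ignored.

```python
def round_nearest_even(value: int, drop: int) -> int:
    """Drop the low `drop` bits of `value`, rounding to nearest even."""
    if drop == 0:
        return value
    kept = value >> drop
    remainder = value & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if remainder > half or (remainder == half and (kept & 1)):
        kept += 1
    return kept


EXTRA_10 = 3     # bits a 10 bit mantissa carries beyond the 7 we want
EXTRA_EXACT = 5  # assumed further bits in the exact product (hypothetical)


def direct(exact: int) -> int:
    """Round the exact product straight to 7 bits."""
    return round_nearest_even(exact, EXTRA_10 + EXTRA_EXACT)


def round_then_truncate(exact: int) -> int:
    """Round to 10 bits, then simply drop the 3 extra bits."""
    return round_nearest_even(exact, EXTRA_EXACT) >> EXTRA_10


def round_twice(exact: int) -> int:
    """Round to 10 bits, then round again to 7 bits (double rounding)."""
    return round_nearest_even(round_nearest_even(exact, EXTRA_EXACT), EXTRA_10)


# Hypothetical exact products, written as 7 kept bits | 3 bits | 5 bits.
a = ((0x16 << 3) | 0b101) << 5
b = (((0x17 << 3) | 0b011) << 5) | 0b11000

print(hex(direct(a)), hex(round_then_truncate(a)))  # 0x17 0x16
print(hex(direct(b)), hex(round_twice(b)))          # 0x17 0x18
```

In the first case, rounding at 10 bits and truncating loses the round up that 7 bit rounding would apply. In the second, rounding at 10 bits manufactures a tie that a second rounding then resolves upward, so even re-rounding a 10 bit RNE result is not equivalent to rounding once at 7 bits.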
There are a couple of ways to address the issue, but what I did for now is to have the We could add this to fault, but this was a faster workaround because otherwise we would need to get the change merged into fault and released. The thing is, fault creates the test bench file through Python's file I/O interface, see https://github.com/leonardt/fault/blob/master/fault/system_verilog_target.py#L272. We could add logic after this to touch the created file, but it seems to me that writing the file should already result in a new timestamp for every test. Maybe there's something weird going on with the way |
I'll run with |
@rdaly525 What's the plan for the RTL? Are you going to hard-code the verilog fix in? (I will verify the fix later today.) |
In my opinion, it would be easier to verify the fix directly in peak by running functional tests and comparing the semantics of the code that nikhil wrote vs BFloat16. |
Weird, I added a
so it seems that even if I force remove that file before generating a new |
@leonardt. If you remove |
I'll try that first to see if that avoids the error, so we know the issue, then we can investigate how we can get ncsim to force recompile our |
Ah, I saw this in the output:
so maybe it has to do with whether a full test run takes under 1 second, so I'm going to try adding a |
Adding but this makes the tests run quite a bit slower (because most of them take under a second). Not sure what the best workaround is here. We could benchmark nuking |
Issue seems to be resolved using fault's timestamp-edit branch (leonardt/fault#116), see https://buildkite.com/stanford-aha/lassen/builds/131#b4859b6e-ad64-47da-8104-98a5b23ef0ad |
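For readers following the timestamp hypothesis, here is one possible way to avoid sleeping between tests: rewrite the file, then push its modification time forward if it still falls in the same second as the previous version. This is only a sketch under the assumption, raised in this thread but not confirmed here, that the simulator decides whether to recompile from a modification time with one second granularity. The helper `write_with_fresh_mtime` is hypothetical and is not necessarily what the timestamp-edit branch does.

```python
import os


def write_with_fresh_mtime(path: str, contents: str) -> float:
    """Write `contents` to `path` and make sure its mtime lands in a later
    whole second than the previous version of the file, if there was one."""
    previous = os.stat(path).st_mtime if os.path.exists(path) else None
    with open(path, "w") as f:
        f.write(contents)
    new_mtime = os.stat(path).st_mtime
    if previous is not None and int(new_mtime) <= int(previous):
        # Same second as the old file: bump the timestamp by hand.
        bumped = int(previous) + 1
        os.utime(path, (bumped, bumped))
    return os.stat(path).st_mtime
```

The trade-off is that the file's timestamp can end up slightly in the future when many tests run within the same second, which is usually harmless for a generated test bench but worth keeping in mind if other tools compare timestamps.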
Moved the discussion to #120, since this issue also covers another bug. |
Two problems:
1. The rounding mode, which is round to nearest even, does not match between the functional model and the RTL. See: https://buildkite.com/stanford-aha/lassen/builds/109#712c5179-b555-4ddb-8fd3-6da54ebba32b The functional model is using mpfr, which I believe should also be correct.
2. When using -v with pytest, fault doesn't catch the error, yet running individually does. There is something wrong with either the test bench setup or fault. See the successful build: https://buildkite.com/stanford-aha/lassen/builds/108#ef4d8b94-f985-4e9e-9873-cdf64ba87be3
@leonardt can you take a closer look?