-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fp multiply should not implicitly add extra fractional bits of precision #753
Conversation
Remove |
Fixed |
This is from Nikhil's email:
He said that using 7-bit won't be synthesized properly, so we have to use 10-bit here. |
Okay, so that verilog code should perfectly emulate BFloat16? If thats the case, I am okay with hardcoding it and thinking of this as a target-specific implementation of Floating point multiply. @Kuree, could you run your multiply tests with that code as the RTL? |
@rdaly525 if I understand correctly, this issue manifests when the matissa is less than seven. How hard is it to extend the coreir backend to recognize this case and insert the requisite extra verilog? The other option is try describing it in peak, which I can work on, but this seems to be an issue with mapping a specific set of generator parameters to an implementation. i.e. there might be another FP mult implementation that supports this parameter set generically, so ideally we shouldn't have to change the Peak code to handle this (this is an implementation detail rather than a spec issue). |
I agree that this is technically a technology-specific thing that should be hidden from peak. From a pragmatic validation point of view, it seems easier to validate it in Peak (functionally) rather than in generated verilog. Perhaps validating it using peak first, then I can implement the appropriate backend in CoreIR. |
Even if we are going to implement the fix in peak, we still need to modify the CoreIR backend to use truncation mode. May add a generator flag to indicate which mode to use? |
Ill implement the fix in CoreIR, but could one of you verify it using peak? |
Sure, I can help with that |
@leonardt, can you review? |
Testing this change using lassen on kiwi |
This fixes
|
(also some required changes in pycoreir/magma which I'll pull in upstream) |
pycoreir PR leonardt/pycoreir#95 |
double free seems to occur even via CLI interface, here's the command, input file attached
|
Double free issue was due to the environment setup on kiwi (conflicts with loading older versions of coreir and recompiling libraries, etc...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix works for me locally on kiwi, let's get StanfordAHA/lassen#118 passing and then this should be good to merge.
It looks like buildkite for lassen is passing with the new RTL, see https://buildkite.com/stanford-aha/lassen/builds/156 Travis is failing because the new verilog causes verilator warnings
we can either ignore these warnings or try to fix them in the output RTL, I'll see if I can localize the specific lines |
Here are the specific lines causing the warnings: // %Warning-PINCONNECTEMPTY: WrappedPE.v:15: Cell pin connected by name with empty reference: status
15 CW_fp_mult #(.sig_width(frac_bits+3), .exp_width(exp_bits), .ieee_compliance(0)) mul1 (.a({in0,3'h0}),.b({in1,3'h0}),.rnd('h1),.z({int_out,results_x}),.status()); // %Warning-IMPLICIT: WrappedPE.v:41: Signal definition not found, creating implicitly: out
// %Warning-WIDTH: WrappedPE.v:41: Output port connection 'z' expects 16 bits on the pin connection, but pin connection's VARREF 'out' generates 1 bits.
41 CW_fp_add #(.sig_width(frac_bits), .exp_width(exp_bits), .ieee_compliance(ieee_compliance)) add (.a(a),.b(b),.rnd(rnd),.z(out),.status(status)); // %Warning-WIDTH: WrappedPE.v:24: Operator ADD expects 8 bits on the RHS, but RHS's SEL generates 1 bits.
24 exp = exp + frac[frac_bits]; // %Warning-UNDRIVEN: WrappedPE.v:37: Signal is not driven: z
37 output [exp_bits+frac_bits:0] z, |
src/libs/float_CW.cpp
Outdated
}; | ||
vjson["definition"] = "" | ||
"wire [7:0] status;\n" | ||
"CW_fp_add #(.sig_width(frac_bits), .exp_width(exp_bits), .ieee_compliance(ieee_compliance)) add (.a(a),.b(b),.rnd(rnd),.z(out),.status(status));"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think .z(out)
should be .z(z)
or the output name should be updated.
@leonardt, latest commit has the syntax fixes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest lassen build for https://github.com/StanfordAHA/lassen/pull/118/files#diff-eaf80413ec19809da1e06ef30ab67e56 is passing on travis and buildkite
Added a new library "float_CW" which defines the appropriate interfaces for the CW IP. Loading this library will also load implementations for float.add and float.mul. float.mul contains the verilog code that nikhil gave (in this thread). This verilog code needs to be tested explicitly