fp multiply should not implicitly add extra fractional bits of precision #753

Merged: 10 commits merged on Jun 11, 2019

rdaly525
Owner

@rdaly525 rdaly525 commented Jun 5, 2019

Added a new library, "float_CW", which defines the appropriate interfaces for the CW IP. Loading this library also loads implementations of float.add and float.mul. float.mul contains the Verilog code that Nikhil provided (in this thread). This Verilog code still needs to be tested explicitly.

@Kuree
Contributor

Kuree commented Jun 5, 2019

Remove result_x?

@rdaly525
Owner Author

rdaly525 commented Jun 5, 2019

Fixed

@Kuree
Contributor

Kuree commented Jun 5, 2019

This is from Nikhil's email:

I think the bug is in the way the multiplier is instantiated. Since CW_mult does not support a 7 bit mantissa (minimum is 10 bits), we had to instantiate with a 10 bit mantissa. So CW rounds to nearest even for that precision, and not for the 7 bit precision we need. I have pasted verilog code below that should round it to nearest even for our precision (note that CW outputs to int_out, not out). Can you please try with this added to the RTL? I have set the CW rounding mode to truncate now.

module mul #(parameter exp_bits=1, parameter frac_bits=1) (
  input [exp_bits+frac_bits:0] in0,
  input [exp_bits+frac_bits:0] in1,
  output [exp_bits+frac_bits:0] out
);
wire [exp_bits+frac_bits:0] int_out;
wire [2:0] results_x; // three extra low-order fraction bits produced by CW_fp_mult
reg sign;
reg [exp_bits-1:0] exp;
reg [frac_bits:0] frac; // one extra bit to hold the carry out of the increment

// CW computes with a (frac_bits+3)-bit fraction; rounding mode is set to truncate
CW_fp_mult #(.sig_width(frac_bits+3), .exp_width(exp_bits), .ieee_compliance(0)) mul1 (.a({in0,3'h0}),.b({in1,3'h0}),.rnd('h1),.z({int_out,results_x}),.status());

always @(*) begin
  sign = int_out[exp_bits+frac_bits];
  exp  = int_out[exp_bits+frac_bits-1:frac_bits];
  frac = {1'b0,int_out[frac_bits-1:0]};
  // Round to nearest even at frac_bits precision using the three extra bits
  if ((results_x[2]&(results_x[1] | results_x[0])) | (int_out[0] & results_x[2])) begin
    frac = frac + 1'd1;
    // Propagate the fraction carry into the exponent unless it is already all ones
    if (~&exp) begin
      exp = exp + frac[frac_bits];
    end
  end
end
assign out = {sign, exp, frac[frac_bits-1:0]};

endmodule

He said that a 7-bit mantissa won't be synthesized properly, so we have to use a 10-bit mantissa here.
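To make the rounding step in the Verilog above easier to reason about, here is a minimal Python sketch of the same decision. It assumes the wide multiplier hands back the truncated result word (`int_out`) plus three extra low-order bits (`extra`, playing the role of `results_x`). The function name and the BFloat16 default widths are illustrative assumptions, not part of the PR.

```python
def round_nearest_even(int_out: int, extra: int,
                       exp_bits: int = 8, frac_bits: int = 7) -> int:
    """Round a truncated float word to nearest even using 3 extra bits.

    int_out: sign | exponent | fraction, (exp_bits + frac_bits + 1) bits wide.
    extra:   the 3 bits produced just below int_out's least significant bit.
    Hypothetical helper mirroring the always-block in the Verilog.
    """
    guard = (extra >> 2) & 1              # first bit below the kept fraction
    sticky = int((extra & 0b011) != 0)    # any lower extra bit set
    lsb = int_out & 1                     # parity of the kept fraction

    exp_max = (1 << exp_bits) - 1
    frac_mask = (1 << frac_bits) - 1
    sign = (int_out >> (exp_bits + frac_bits)) & 1
    exp = (int_out >> frac_bits) & exp_max
    frac = int_out & frac_mask

    # Round up when above the halfway point, or exactly halfway with an odd LSB.
    if guard and (sticky or lsb):
        frac += 1
        carry = frac >> frac_bits         # fraction overflowed into the next bit
        if exp != exp_max:                # same guard as `if (~&exp)`
            exp += carry
        frac &= frac_mask

    return (sign << (exp_bits + frac_bits)) | (exp << frac_bits) | frac
```

One thing that may be worth checking against the RTL: like the Verilog, this sketch only sees three extra bits. If the multiplier truncates anything below them, a product slightly above a tie would look like an exact tie here.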

@rdaly525
Owner Author

rdaly525 commented Jun 5, 2019

Okay, so that Verilog code should perfectly emulate BFloat16? If that's the case, I am okay with hardcoding it and treating it as a target-specific implementation of floating-point multiply. @Kuree, could you run your multiply tests with that code as the RTL?

@leonardt
Collaborator

leonardt commented Jun 5, 2019

@rdaly525 if I understand correctly, this issue manifests when the mantissa is less than seven. How hard is it to extend the coreir backend to recognize this case and insert the requisite extra Verilog? The other option is to try describing it in peak, which I can work on, but this seems to be an issue with mapping a specific set of generator parameters to an implementation, i.e., there might be another FP mult implementation that supports this parameter set generically, so ideally we shouldn't have to change the Peak code to handle this (this is an implementation detail rather than a spec issue).

@rdaly525
Owner Author

rdaly525 commented Jun 6, 2019

I agree that this is technically a technology-specific detail that should be hidden from Peak. From a pragmatic validation point of view, though, it seems easier to validate it in Peak (functionally) than in generated Verilog. Perhaps we validate it using Peak first, and then I can implement the appropriate backend in CoreIR.
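For the functional validation discussed here, a small software reference model can serve as the oracle. The following is a sketch in plain Python, not Peak's API. It multiplies two BFloat16 bit patterns and rounds the product to nearest even. All function names are hypothetical. It assumes finite, normal inputs whose product stays within the normal float32 range, so it ignores NaN, infinity, overflow, and the flush-to-zero behavior implied by `ieee_compliance(0)`.

```python
import struct


def bf16_to_float(bits: int) -> float:
    """Interpret a 16-bit BFloat16 pattern as the upper half of a float32."""
    return struct.unpack(">f", struct.pack(">I", (bits & 0xFFFF) << 16))[0]


def float_to_bf16_rne(x: float) -> int:
    """Round a float32-representable value to BFloat16, ties to even."""
    u = struct.unpack(">I", struct.pack(">f", x))[0]
    lsb = (u >> 16) & 1
    # Adding 0x7FFF plus the kept LSB carries into bit 16 exactly when the
    # discarded half is above a tie, or equal to a tie with an odd LSB.
    u += 0x7FFF + lsb
    return (u >> 16) & 0xFFFF


def bf16_mul_ref(a: int, b: int) -> int:
    """Reference BFloat16 multiply on raw bit patterns.

    The product of two 8-bit significands needs at most 16 bits, so it is
    exact in float32 (within the assumed range) before the final rounding.
    """
    return float_to_bf16_rne(bf16_to_float(a) * bf16_to_float(b))


# 1.5 * 1.5 = 2.25, which is exactly representable in BFloat16
print(hex(bf16_mul_ref(0x3FC0, 0x3FC0)))
```

Comparing the RTL or Peak output against `bf16_mul_ref` over random normal inputs would exercise both tie cases (round up to even and stay at even), which are the cases the 10-bit rounding got wrong.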

@Kuree
Contributor

Kuree commented Jun 6, 2019

Even if we implement the fix in peak, we still need to modify the CoreIR backend to use truncation mode. Maybe add a generator flag to indicate which mode to use?

@rdaly525
Owner Author

rdaly525 commented Jun 6, 2019

I'll implement the fix in CoreIR, but could one of you verify it using peak?

@leonardt
Collaborator

leonardt commented Jun 6, 2019

Sure, I can help with that

@rdaly525
Owner Author

@leonardt, can you review?

@leonardt
Collaborator

Testing this change using lassen on kiwi

@leonardt
Collaborator

This fixes pytest tests/test_pe.py -k test_fp_mul, but I get a double free error in coreir which seems troublesome

(env) lenny@kiwi:~/lassen$ pytest tests/test_pe.py -k test_fp_mul
============================================================================================================================== test session starts ==============================================================================================================================
platform linux -- Python 3.7.3, pytest-4.5.0, py-1.8.0, pluggy-0.11.0
rootdir: /home/lenny/lassen
collecting ... *** Error in `/home/lenny/pycoreir/coreir/coreir': double free or corruption (fasttop): 0x00000000026a9630 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f2f8a5ea7e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f2f8a5f337a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f2f8a5f753c]
/home/lenny/pycoreir/coreir/coreir(_ZNSsD1Ev+0x64)[0xa6cca4]
/lib/x86_64-linux-gnu/libc.so.6(+0x39ff8)[0x7f2f8a5acff8]
/lib/x86_64-linux-gnu/libc.so.6(+0x3a045)[0x7f2f8a5ad045]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf7)[0x7f2f8a593837]
/home/lenny/pycoreir/coreir/coreir(_start+0x29)[0x76a5f9]
======= Memory map: ========
00400000-00c29000 r-xp 00000000 00:2d 61504434                           /home/lenny/pycoreir/coreir/coreir
00e28000-00e33000 r--p 00828000 00:2d 61504434                           /home/lenny/pycoreir/coreir/coreir
00e33000-00e34000 rw-p 00833000 00:2d 61504434                           /home/lenny/pycoreir/coreir/coreir
00e34000-00e39000 rw-p 00000000 00:00 0
02536000-02e70000 rw-p 00000000 00:00 0                                  [heap]
7f2f84000000-7f2f84021000 rw-p 00000000 00:00 0
7f2f84021000-7f2f88000000 ---p 00000000 00:00 0
7f2f88291000-7f2f882a8000 r-xp 00000000 08:01 3146120                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7f2f882a8000-7f2f884a7000 ---p 00017000 08:01 3146120                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7f2f884a7000-7f2f884a8000 r--p 00016000 08:01 3146120                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7f2f884a8000-7f2f884a9000 rw-p 00017000 08:01 3146120                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7f2f884a9000-7f2f88d9d000 r-xp 00000000 00:2d 61504431                   /home/lenny/pycoreir/coreir/libcoreir-commonlib.so
7f2f88d9d000-7f2f88f9d000 ---p 008f4000 00:2d 61504431                   /home/lenny/pycoreir/coreir/libcoreir-commonlib.so
7f2f88f9d000-7f2f88fa9000 r--p 008f4000 00:2d 61504431                   /home/lenny/pycoreir/coreir/libcoreir-commonlib.so
7f2f88fa9000-7f2f88fcd000 rw-p 00900000 00:2d 61504431                   /home/lenny/pycoreir/coreir/libcoreir-commonlib.so
7f2f88fcd000-7f2f88fd2000 rw-p 00000000 00:00 0
7f2f88fd2000-7f2f89861000 r-xp 00000000 00:2d 61504432                   /home/lenny/pycoreir/coreir/libcoreir-float.so
7f2f89861000-7f2f89a61000 ---p 0088f000 00:2d 61504432                   /home/lenny/pycoreir/coreir/libcoreir-float.so
7f2f89a61000-7f2f89a6d000 r--p 0088f000 00:2d 61504432                   /home/lenny/pycoreir/coreir/libcoreir-float.so
7f2f89a6d000-7f2f89a91000 rw-p 0089b000 00:2d 61504432                   /home/lenny/pycoreir/coreir/libcoreir-float.so
7f2f89a91000-7f2f89a96000 rw-p 00000000 00:00 0
7f2f89a96000-7f2f8a33e000 r-xp 00000000 00:2d 61504435                   /home/lenny/pycoreir/coreir/libcoreir-float_CW.so
7f2f8a33e000-7f2f8a53e000 ---p 008a8000 00:2d 61504435                   /home/lenny/pycoreir/coreir/libcoreir-float_CW.so
7f2f8a53e000-7f2f8a54a000 r--p 008a8000 00:2d 61504435                   /home/lenny/pycoreir/coreir/libcoreir-float_CW.so
7f2f8a54a000-7f2f8a56e000 rw-p 008b4000 00:2d 61504435                   /home/lenny/pycoreir/coreir/libcoreir-float_CW.so
7f2f8a56e000-7f2f8a573000 rw-p 00000000 00:00 0
7f2f8a573000-7f2f8a733000 r-xp 00000000 08:01 3145856                    /lib/x86_64-linux-gnu/libc-2.23.so
7f2f8a733000-7f2f8a933000 ---p 001c0000 08:01 3145856                    /lib/x86_64-linux-gnu/libc-2.23.so
7f2f8a933000-7f2f8a937000 r--p 001c0000 08:01 3145856                    /lib/x86_64-linux-gnu/libc-2.23.so
7f2f8a937000-7f2f8a939000 rw-p 001c4000 08:01 3145856                    /lib/x86_64-linux-gnu/libc-2.23.so
7f2f8a939000-7f2f8a93d000 rw-p 00000000 00:00 0
7f2f8a93d000-7f2f8aa45000 r-xp 00000000 08:01 3145805                    /lib/x86_64-linux-gnu/libm-2.23.so
7f2f8aa45000-7f2f8ac44000 ---p 00108000 08:01 3145805                    /lib/x86_64-linux-gnu/libm-2.23.so
7f2f8ac44000-7f2f8ac45000 r--p 00107000 08:01 3145805                    /lib/x86_64-linux-gnu/libm-2.23.so
7f2f8ac45000-7f2f8ac46000 rw-p 00108000 08:01 3145805                    /lib/x86_64-linux-gnu/libm-2.23.so
7f2f8ac46000-7f2f8ac49000 r-xp 00000000 08:01 3145872                    /lib/x86_64-linux-gnu/libdl-2.23.so
7f2f8ac49000-7f2f8ae48000 ---p 00003000 08:01 3145872                    /lib/x86_64-linux-gnu/libdl-2.23.so
7f2f8ae48000-7f2f8ae49000 r--p 00002000 08:01 3145872                    /lib/x86_64-linux-gnu/libdl-2.23.so
7f2f8ae49000-7f2f8ae4a000 rw-p 00003000 08:01 3145872                    /lib/x86_64-linux-gnu/libdl-2.23.so
7f2f8ae4a000-7f2f8ae70000 r-xp 00000000 08:01 3145842                    /lib/x86_64-linux-gnu/ld-2.23.so
7f2f8b034000-7f2f8b038000 rw-p 00000000 00:00 0
7f2f8b06d000-7f2f8b06f000 rw-p 00000000 00:00 0
7f2f8b06f000-7f2f8b070000 r--p 00025000 08:01 3145842                    /lib/x86_64-linux-gnu/ld-2.23.so
7f2f8b070000-7f2f8b071000 rw-p 00026000 08:01 3145842                    /lib/x86_64-linux-gnu/ld-2.23.so
7f2f8b071000-7f2f8b072000 rw-p 00000000 00:00 0
7ffc8225c000-7ffc8227e000 rw-p 00000000 00:00 0                          [stack]
7ffc82343000-7ffc82346000 r--p 00000000 00:00 0                          [vvar]
7ffc82346000-7ffc82348000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
collected 551 items / 550 deselected / 1 selected

tests/test_pe.py .                                                                                                                                                                                                                                                        [100%]

=================================================================================================================== 1 passed, 550 deselected in 8.37 seconds ====================================================================================================================

@leonardt
Collaborator

(also some required changes in pycoreir/magma which I'll pull in upstream)

@leonardt
Collaborator

pycoreir PR leonardt/pycoreir#95

@leonardt
Collaborator

The double free seems to occur even via the CLI. Here's the command; the input file is attached:

coreir -l corebit,coreir,float_CW,commonlib,float,global -i WrappedPE.json -o WrappedPE.v

WrappedPE.json.txt

@leonardt
Collaborator

Double free issue was due to the environment setup on kiwi (conflicts with loading older versions of coreir and recompiling libraries, etc...)

Collaborator

@leonardt leonardt left a comment

Fix works for me locally on kiwi, let's get StanfordAHA/lassen#118 passing and then this should be good to merge.

@leonardt
Collaborator

It looks like buildkite for lassen is passing with the new RTL, see https://buildkite.com/stanford-aha/lassen/builds/156

Travis is failing because the new verilog causes verilator warnings

----------------------------- Captured stderr call -----------------------------
%Warning-PINCONNECTEMPTY: WrappedPE.v:15: Cell pin connected by name with empty reference: status
%Warning-PINCONNECTEMPTY: Use "/* verilator lint_off PINCONNECTEMPTY */" and lint_on around source to disable this message.
%Warning-IMPLICIT: WrappedPE.v:41: Signal definition not found, creating implicitly: out
%Warning-WIDTH: WrappedPE.v:41: Output port connection 'z' expects 16 bits on the pin connection, but pin connection's VARREF 'out' generates 1 bits.
%Warning-WIDTH: WrappedPE.v:24: Operator ADD expects 8 bits on the RHS, but RHS's SEL generates 1 bits.
%Warning-UNDRIVEN: WrappedPE.v:37: Signal is not driven: z
%Error: Exiting due to 5 warning(s)

We can either ignore these warnings or try to fix them in the output RTL. I'll see if I can localize the specific lines.
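Localizing the warnings can be partly automated. The sketch below groups Verilator warnings by file and line so each one can be matched against the generated RTL. It assumes the `%Warning-KIND: file:line: message` shape seen in the captured stderr above. The function and variable names are hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   %Warning-WIDTH: WrappedPE.v:24: Operator ADD expects 8 bits ...
WARNING_RE = re.compile(
    r"^%Warning-(?P<kind>\w+): (?P<file>[^:]+):(?P<line>\d+): (?P<msg>.*)$"
)


def group_warnings(stderr_text: str) -> dict:
    """Map (file, line) to the list of (kind, message) warnings reported there."""
    by_location = defaultdict(list)
    for raw in stderr_text.splitlines():
        match = WARNING_RE.match(raw.strip())
        if match is None:
            # Hint lines (e.g. the lint_off suggestion) and the final
            # %Error summary carry no file:line and are skipped.
            continue
        key = (match["file"], int(match["line"]))
        by_location[key].append((match["kind"], match["msg"]))
    return dict(by_location)


sample = """\
%Warning-PINCONNECTEMPTY: WrappedPE.v:15: Cell pin connected by name with empty reference: status
%Warning-PINCONNECTEMPTY: Use "/* verilator lint_off PINCONNECTEMPTY */" and lint_on around source to disable this message.
%Warning-IMPLICIT: WrappedPE.v:41: Signal definition not found, creating implicitly: out
%Warning-WIDTH: WrappedPE.v:41: Output port connection 'z' expects 16 bits on the pin connection, but pin connection's VARREF 'out' generates 1 bits.
%Warning-WIDTH: WrappedPE.v:24: Operator ADD expects 8 bits on the RHS, but RHS's SEL generates 1 bits.
%Warning-UNDRIVEN: WrappedPE.v:37: Signal is not driven: z
%Error: Exiting due to 5 warning(s)
"""

for (path, line), items in sorted(group_warnings(sample).items()):
    print(path, line, [kind for kind, _ in items])
```

Grouping by location makes it easier to see that line 41 carries two related warnings (IMPLICIT and WIDTH), which points at a single cause on that line.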

@leonardt
Collaborator

Here are the specific lines causing the warnings:

// %Warning-PINCONNECTEMPTY: WrappedPE.v:15: Cell pin connected by name with empty reference: status
   15 CW_fp_mult #(.sig_width(frac_bits+3), .exp_width(exp_bits), .ieee_compliance(0)) mul1 (.a({in0,3'h0}),.b({in1,3'h0}),.rnd('h1),.z({int_out,results_x}),.status());
// %Warning-IMPLICIT: WrappedPE.v:41: Signal definition not found, creating implicitly: out
// %Warning-WIDTH: WrappedPE.v:41: Output port connection 'z' expects 16 bits on the pin connection, but pin connection's VARREF 'out' generates 1 bits.
   41 CW_fp_add #(.sig_width(frac_bits), .exp_width(exp_bits), .ieee_compliance(ieee_compliance)) add (.a(a),.b(b),.rnd(rnd),.z(out),.status(status));
// %Warning-WIDTH: WrappedPE.v:24: Operator ADD expects 8 bits on the RHS, but RHS's SEL generates 1 bits.
   24       exp = exp + frac[frac_bits];
// %Warning-UNDRIVEN: WrappedPE.v:37: Signal is not driven: z
   37   output [exp_bits+frac_bits:0] z,

};
vjson["definition"] = ""
"wire [7:0] status;\n"
"CW_fp_add #(.sig_width(frac_bits), .exp_width(exp_bits), .ieee_compliance(ieee_compliance)) add (.a(a),.b(b),.rnd(rnd),.z(out),.status(status));";
Collaborator

I think .z(out) should be .z(z) or the output name should be updated.

@rdaly525
Copy link
Owner Author

@leonardt, latest commit has the syntax fixes

Collaborator

@leonardt leonardt left a comment

Latest lassen build for https://github.com/StanfordAHA/lassen/pull/118/files#diff-eaf80413ec19809da1e06ef30ab67e56 is passing on travis and buildkite

@rdaly525 rdaly525 merged commit cc6ca33 into master Jun 11, 2019
@rdaly525 rdaly525 deleted the float-fix branch June 11, 2019 15:30