fp multiply should not implicitly add extra fractional bits of precision #753

rdaly525 · 2019-06-05T18:07:02Z

Added a new library, "float_CW", which defines the appropriate interfaces for the CW IP. Loading this library also loads implementations for float.add and float.mul. float.mul contains the Verilog code that Nikhil gave (in this thread). This Verilog code needs to be tested explicitly.

Kuree · 2019-06-05T18:30:33Z

Remove result_x?

rdaly525 · 2019-06-05T20:09:11Z

Fixed

Kuree · 2019-06-05T20:10:27Z

This is from Nikhil's email:

I think the bug is in the way the multiplier is instantiated. Since CW_mult does not support a 7 bit mantissa (minimum is 10 bits), we had to instantiate with a 10 bit mantissa. So CW rounds to nearest even for that precision, and not for the 7 bit precision we need. I have pasted verilog code below that should round it to nearest even for our precision (note that CW outputs to int_out, not out). Can you please try with this added to the RTL? I have set the CW rounding mode to truncate now.

module mul #(parameter exp_bits=1, parameter frac_bits=1) (
  input [exp_bits+frac_bits:0] in0,
  input [exp_bits+frac_bits:0] in1,
  output [exp_bits+frac_bits:0] out
);
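wire [exp_bits+frac_bits:0] int_out;
// Three extra low-order bits of the wider CW result, used below for rounding
wire [2:0] results_x;
reg sign;
reg [exp_bits-1:0] exp;
reg [frac_bits:0] frac;

// CW is instantiated with 3 extra mantissa bits; rnd='h1 is the truncate mode mentioned above
CW_fp_mult #(.sig_width(frac_bits+3), .exp_width(exp_bits), .ieee_compliance(0)) mul1 (.a({in0,3'h0}),.b({in1,3'h0}),.rnd('h1),.z({int_out,results_x}),.status());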

always @(*) begin
  sign = int_out[exp_bits+frac_bits];
  exp  = int_out[exp_bits+frac_bits-1:frac_bits];
  frac = {1'b0,int_out[frac_bits-1:0]};
  if ((results_x[2]&(results_x[1] | results_x[0])) | (int_out[0] & results_x[2])) begin
    frac = frac + 1'd1;
    if (~&exp) begin
      exp = exp + frac[frac_bits];
    end
  end
end
assign out = {sign, exp, frac[frac_bits-1:0]};

endmodule

He said that a 7-bit mantissa won't synthesize properly, so we have to instantiate CW with a 10-bit mantissa here.
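To make the fix-up step easier to reason about, here is a minimal Python sketch that mirrors the always block in the Verilog above, bit for bit. The function name `round_fixup` and its argument layout are hypothetical (they are not part of CoreIR, peak, or the CW IP); it only models the post-processing of `int_out` and `results_x`, not the CW multiplier itself.

```python
def round_fixup(int_out: int, results_x: int, exp_bits: int = 8, frac_bits: int = 7) -> int:
    """Model of the Verilog fix-up: round int_out using the 3 extra bits in results_x."""
    exp_mask = (1 << exp_bits) - 1
    frac_mask = (1 << frac_bits) - 1

    sign = (int_out >> (exp_bits + frac_bits)) & 1
    exp = (int_out >> frac_bits) & exp_mask
    frac = int_out & frac_mask  # held in frac_bits + 1 bits, top bit starts at 0

    x2 = (results_x >> 2) & 1   # first bit below the kept mantissa
    x1 = (results_x >> 1) & 1
    x0 = results_x & 1
    lsb = int_out & 1

    # Same condition as the Verilog: above half, or exactly half with an odd LSB.
    if (x2 & (x1 | x0)) | (lsb & x2):
        frac += 1
        if exp != exp_mask:               # Verilog: if (~&exp)
            exp += frac >> frac_bits      # carry out of the mantissa bumps the exponent

    return (sign << (exp_bits + frac_bits)) | (exp << frac_bits) | (frac & frac_mask)
```

For example, with the default 8/7 split, `round_fixup(0x3FC1, 0b100)` is a tie with an odd LSB and rounds up to `0x3FC2`, while `round_fixup(0x3F90, 0b100)` is a tie with an even LSB and stays at `0x3F90`.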

rdaly525 · 2019-06-05T20:23:17Z

Okay, so that verilog code should perfectly emulate BFloat16? If that's the case, I am okay with hardcoding it and thinking of this as a target-specific implementation of floating-point multiply. @Kuree, could you run your multiply tests with that code as the RTL?

leonardt · 2019-06-05T23:26:22Z

@rdaly525 if I understand correctly, this issue manifests when the mantissa is less than seven. How hard is it to extend the coreir backend to recognize this case and insert the requisite extra verilog? The other option is to try describing it in peak, which I can work on, but this seems to be an issue with mapping a specific set of generator parameters to an implementation. That is, there might be another FP mult implementation that supports this parameter set generically, so ideally we shouldn't have to change the Peak code to handle this (this is an implementation detail rather than a spec issue).

rdaly525 · 2019-06-06T00:01:35Z

I agree that this is technically a technology-specific thing that should be hidden from peak. From a pragmatic validation point of view, though, it seems easier to validate it functionally in Peak rather than in generated verilog. Perhaps we validate it using peak first, and then I implement the appropriate backend in CoreIR.
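For the functional side of that validation, one possible reference model is sketched below. It assumes BFloat16 is the upper 16 bits of an IEEE float32, and that inputs and the product are finite, normal, and in range (no NaN, infinity, subnormal, or overflow handling). The helper names are hypothetical and are not peak or lassen APIs. Because two 8-bit significands multiply to at most 16 bits, the product is exact in float32, so a single round-to-nearest-even down to 16 bits gives the correctly rounded result.

```python
import struct


def bf16_to_float(bits: int) -> float:
    """Decode a 16-bit BFloat16 pattern by widening it to float32."""
    return struct.unpack(">f", struct.pack(">I", (bits & 0xFFFF) << 16))[0]


def float_to_bf16_rne(value: float) -> int:
    """Round a float32-representable value to BFloat16, ties to even."""
    f32 = struct.unpack(">I", struct.pack(">f", value))[0]
    lsb = (f32 >> 16) & 1            # LSB of the part that is kept
    rounded = f32 + 0x7FFF + lsb     # adds half an ulp, with ties going to even
    return (rounded >> 16) & 0xFFFF


def bf16_mul_reference(a_bits: int, b_bits: int) -> int:
    """Reference BFloat16 multiply for finite, in-range operands."""
    return float_to_bf16_rne(bf16_to_float(a_bits) * bf16_to_float(b_bits))
```

A test could then compare the RTL or Peak result against `bf16_mul_reference` on tie cases such as `0x3F88 * 0x3F88` (expected `0x3F90`) and `0x3F81 * 0x3FC0` (expected `0x3FC2`), which are the inputs most sensitive to the rounding mode.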

Kuree · 2019-06-06T00:06:22Z

Even if we are going to implement the fix in peak, we still need to modify the CoreIR backend to use truncation mode. Maybe add a generator flag to indicate which mode to use?

rdaly525 · 2019-06-06T01:55:41Z

I'll implement the fix in CoreIR, but could one of you verify it using peak?

leonardt · 2019-06-06T01:56:18Z

Sure, I can help with that

rdaly525 · 2019-06-10T19:36:54Z

@leonardt, can you review?

leonardt · 2019-06-10T22:27:26Z

Testing this change using lassen on kiwi

leonardt · 2019-06-10T22:51:30Z

This fixes pytest tests/test_pe.py -k test_fp_mul, but I get a double free error in coreir, which seems troublesome:

(env) lenny@kiwi:~/lassen$ pytest tests/test_pe.py -k test_fp_mul
============================================================================================================================== test session starts ==============================================================================================================================
platform linux -- Python 3.7.3, pytest-4.5.0, py-1.8.0, pluggy-0.11.0
rootdir: /home/lenny/lassen
collecting ... *** Error in `/home/lenny/pycoreir/coreir/coreir': double free or corruption (fasttop): 0x00000000026a9630 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f2f8a5ea7e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f2f8a5f337a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f2f8a5f753c]
/home/lenny/pycoreir/coreir/coreir(_ZNSsD1Ev+0x64)[0xa6cca4]
/lib/x86_64-linux-gnu/libc.so.6(+0x39ff8)[0x7f2f8a5acff8]
/lib/x86_64-linux-gnu/libc.so.6(+0x3a045)[0x7f2f8a5ad045]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf7)[0x7f2f8a593837]
/home/lenny/pycoreir/coreir/coreir(_start+0x29)[0x76a5f9]
======= Memory map: ========
00400000-00c29000 r-xp 00000000 00:2d 61504434                           /home/lenny/pycoreir/coreir/coreir
00e28000-00e33000 r--p 00828000 00:2d 61504434                           /home/lenny/pycoreir/coreir/coreir
00e33000-00e34000 rw-p 00833000 00:2d 61504434                           /home/lenny/pycoreir/coreir/coreir
00e34000-00e39000 rw-p 00000000 00:00 0
02536000-02e70000 rw-p 00000000 00:00 0                                  [heap]
7f2f84000000-7f2f84021000 rw-p 00000000 00:00 0
7f2f84021000-7f2f88000000 ---p 00000000 00:00 0
7f2f88291000-7f2f882a8000 r-xp 00000000 08:01 3146120                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7f2f882a8000-7f2f884a7000 ---p 00017000 08:01 3146120                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7f2f884a7000-7f2f884a8000 r--p 00016000 08:01 3146120                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7f2f884a8000-7f2f884a9000 rw-p 00017000 08:01 3146120                    /lib/x86_64-linux-gnu/libgcc_s.so.1
7f2f884a9000-7f2f88d9d000 r-xp 00000000 00:2d 61504431                   /home/lenny/pycoreir/coreir/libcoreir-commonlib.so
7f2f88d9d000-7f2f88f9d000 ---p 008f4000 00:2d 61504431                   /home/lenny/pycoreir/coreir/libcoreir-commonlib.so
7f2f88f9d000-7f2f88fa9000 r--p 008f4000 00:2d 61504431                   /home/lenny/pycoreir/coreir/libcoreir-commonlib.so
7f2f88fa9000-7f2f88fcd000 rw-p 00900000 00:2d 61504431                   /home/lenny/pycoreir/coreir/libcoreir-commonlib.so
7f2f88fcd000-7f2f88fd2000 rw-p 00000000 00:00 0
7f2f88fd2000-7f2f89861000 r-xp 00000000 00:2d 61504432                   /home/lenny/pycoreir/coreir/libcoreir-float.so
7f2f89861000-7f2f89a61000 ---p 0088f000 00:2d 61504432                   /home/lenny/pycoreir/coreir/libcoreir-float.so
7f2f89a61000-7f2f89a6d000 r--p 0088f000 00:2d 61504432                   /home/lenny/pycoreir/coreir/libcoreir-float.so
7f2f89a6d000-7f2f89a91000 rw-p 0089b000 00:2d 61504432                   /home/lenny/pycoreir/coreir/libcoreir-float.so
7f2f89a91000-7f2f89a96000 rw-p 00000000 00:00 0
7f2f89a96000-7f2f8a33e000 r-xp 00000000 00:2d 61504435                   /home/lenny/pycoreir/coreir/libcoreir-float_CW.so
7f2f8a33e000-7f2f8a53e000 ---p 008a8000 00:2d 61504435                   /home/lenny/pycoreir/coreir/libcoreir-float_CW.so
7f2f8a53e000-7f2f8a54a000 r--p 008a8000 00:2d 61504435                   /home/lenny/pycoreir/coreir/libcoreir-float_CW.so
7f2f8a54a000-7f2f8a56e000 rw-p 008b4000 00:2d 61504435                   /home/lenny/pycoreir/coreir/libcoreir-float_CW.so
7f2f8a56e000-7f2f8a573000 rw-p 00000000 00:00 0
7f2f8a573000-7f2f8a733000 r-xp 00000000 08:01 3145856                    /lib/x86_64-linux-gnu/libc-2.23.so
7f2f8a733000-7f2f8a933000 ---p 001c0000 08:01 3145856                    /lib/x86_64-linux-gnu/libc-2.23.so
7f2f8a933000-7f2f8a937000 r--p 001c0000 08:01 3145856                    /lib/x86_64-linux-gnu/libc-2.23.so
7f2f8a937000-7f2f8a939000 rw-p 001c4000 08:01 3145856                    /lib/x86_64-linux-gnu/libc-2.23.so
7f2f8a939000-7f2f8a93d000 rw-p 00000000 00:00 0
7f2f8a93d000-7f2f8aa45000 r-xp 00000000 08:01 3145805                    /lib/x86_64-linux-gnu/libm-2.23.so
7f2f8aa45000-7f2f8ac44000 ---p 00108000 08:01 3145805                    /lib/x86_64-linux-gnu/libm-2.23.so
7f2f8ac44000-7f2f8ac45000 r--p 00107000 08:01 3145805                    /lib/x86_64-linux-gnu/libm-2.23.so
7f2f8ac45000-7f2f8ac46000 rw-p 00108000 08:01 3145805                    /lib/x86_64-linux-gnu/libm-2.23.so
7f2f8ac46000-7f2f8ac49000 r-xp 00000000 08:01 3145872                    /lib/x86_64-linux-gnu/libdl-2.23.so
7f2f8ac49000-7f2f8ae48000 ---p 00003000 08:01 3145872                    /lib/x86_64-linux-gnu/libdl-2.23.so
7f2f8ae48000-7f2f8ae49000 r--p 00002000 08:01 3145872                    /lib/x86_64-linux-gnu/libdl-2.23.so
7f2f8ae49000-7f2f8ae4a000 rw-p 00003000 08:01 3145872                    /lib/x86_64-linux-gnu/libdl-2.23.so
7f2f8ae4a000-7f2f8ae70000 r-xp 00000000 08:01 3145842                    /lib/x86_64-linux-gnu/ld-2.23.so
7f2f8b034000-7f2f8b038000 rw-p 00000000 00:00 0
7f2f8b06d000-7f2f8b06f000 rw-p 00000000 00:00 0
7f2f8b06f000-7f2f8b070000 r--p 00025000 08:01 3145842                    /lib/x86_64-linux-gnu/ld-2.23.so
7f2f8b070000-7f2f8b071000 rw-p 00026000 08:01 3145842                    /lib/x86_64-linux-gnu/ld-2.23.so
7f2f8b071000-7f2f8b072000 rw-p 00000000 00:00 0
7ffc8225c000-7ffc8227e000 rw-p 00000000 00:00 0                          [stack]
7ffc82343000-7ffc82346000 r--p 00000000 00:00 0                          [vvar]
7ffc82346000-7ffc82348000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
collected 551 items / 550 deselected / 1 selected

tests/test_pe.py .                                                                                                                                                                                                                                                        [100%]

=================================================================================================================== 1 passed, 550 deselected in 8.37 seconds ====================================================================================================================

leonardt · 2019-06-10T22:51:58Z

(also some required changes in pycoreir/magma which I'll pull in upstream)

leonardt · 2019-06-10T22:53:19Z

pycoreir PR leonardt/pycoreir#95

leonardt · 2019-06-10T22:54:20Z

magma PR https://github.com/phanrahan/magma/compare/coreir-libs-opts

leonardt · 2019-06-10T23:18:00Z

The double free seems to occur even via the CLI. Here's the command (input file attached):

coreir -l corebit,coreir,float_CW,commonlib,float,global -i WrappedPE.json -o WrappedPE.v

WrappedPE.json.txt

leonardt · 2019-06-11T00:05:14Z

The double free issue was due to the environment setup on kiwi (conflicts between loading older versions of coreir and recompiling libraries, etc.).

leonardt

Fix works for me locally on kiwi, let's get StanfordAHA/lassen#118 passing and then this should be good to merge.

leonardt · 2019-06-11T00:56:28Z

It looks like buildkite for lassen is passing with the new RTL, see https://buildkite.com/stanford-aha/lassen/builds/156

Travis is failing because the new verilog causes verilator warnings:

----------------------------- Captured stderr call -----------------------------
%Warning-PINCONNECTEMPTY: WrappedPE.v:15: Cell pin connected by name with empty reference: status
%Warning-PINCONNECTEMPTY: Use "/* verilator lint_off PINCONNECTEMPTY */" and lint_on around source to disable this message.
%Warning-IMPLICIT: WrappedPE.v:41: Signal definition not found, creating implicitly: out
%Warning-WIDTH: WrappedPE.v:41: Output port connection 'z' expects 16 bits on the pin connection, but pin connection's VARREF 'out' generates 1 bits.
%Warning-WIDTH: WrappedPE.v:24: Operator ADD expects 8 bits on the RHS, but RHS's SEL generates 1 bits.
%Warning-UNDRIVEN: WrappedPE.v:37: Signal is not driven: z
%Error: Exiting due to 5 warning(s)

We can either ignore these warnings or try to fix them in the output RTL. I'll see if I can localize the specific lines.
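One way to localize them is to parse the captured stderr and pull out the file and line of each warning. The sketch below is a hypothetical helper (not part of lassen, fault, or verilator), and it assumes the `%Warning-KIND: file:line: message` layout seen in the log above; hint lines without a `file:line` prefix are skipped.

```python
import re
from typing import List, NamedTuple


class LintWarning(NamedTuple):
    kind: str
    file: str
    line: int
    message: str


# Matches e.g. "%Warning-WIDTH: WrappedPE.v:24: Operator ADD expects ..."
WARNING_RE = re.compile(
    r"^%Warning-(?P<kind>\w+): (?P<file>[^:\s]+):(?P<line>\d+): (?P<msg>.*)$"
)


def parse_warnings(log: str) -> List[LintWarning]:
    """Return every warning in the log that carries a file:line location."""
    found = []
    for raw in log.splitlines():
        match = WARNING_RE.match(raw.strip())
        if match:
            found.append(
                LintWarning(
                    match["kind"], match["file"], int(match["line"]), match["msg"]
                )
            )
    return found
```

Feeding it the stderr text and then printing `source_lines[w.line - 1]` for each result shows the offending lines next to their warnings.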

leonardt · 2019-06-11T01:01:57Z

Here are the specific lines causing the warnings:

// %Warning-PINCONNECTEMPTY: WrappedPE.v:15: Cell pin connected by name with empty reference: status
   15 CW_fp_mult #(.sig_width(frac_bits+3), .exp_width(exp_bits), .ieee_compliance(0)) mul1 (.a({in0,3'h0}),.b({in1,3'h0}),.rnd('h1),.z({int_out,results_x}),.status());

// %Warning-IMPLICIT: WrappedPE.v:41: Signal definition not found, creating implicitly: out
// %Warning-WIDTH: WrappedPE.v:41: Output port connection 'z' expects 16 bits on the pin connection, but pin connection's VARREF 'out' generates 1 bits.

   41 CW_fp_add #(.sig_width(frac_bits), .exp_width(exp_bits), .ieee_compliance(ieee_compliance)) add (.a(a),.b(b),.rnd(rnd),.z(out),.status(status));

// %Warning-WIDTH: WrappedPE.v:24: Operator ADD expects 8 bits on the RHS, but RHS's SEL generates 1 bits.
   24       exp = exp + frac[frac_bits];

// %Warning-UNDRIVEN: WrappedPE.v:37: Signal is not driven: z
   37   output [exp_bits+frac_bits:0] z,

leonardt · 2019-06-11T01:04:38Z

src/libs/float_CW.cpp

+    };
+    vjson["definition"] = ""
+    "wire [7:0] status;\n"
+    "CW_fp_add #(.sig_width(frac_bits), .exp_width(exp_bits), .ieee_compliance(ieee_compliance)) add (.a(a),.b(b),.rnd(rnd),.z(out),.status(status));";


I think .z(out) should be .z(z) or the output name should be updated.

rdaly525 · 2019-06-11T04:44:29Z

@leonardt, latest commit has the syntax fixes

leonardt

Latest lassen build for https://github.com/StanfordAHA/lassen/pull/118/files#diff-eaf80413ec19809da1e06ef30ab67e56 is passing on travis and buildkite

fp multiply should not implicitly add extra fractional bits of precision

7d2234d

rdaly525 requested a review from leonardt June 5, 2019 18:07

rdaly525 mentioned this pull request Jun 5, 2019

Rounding mode mismatch with the hardware StanfordAHA/lassen#111

Closed

removed stray wire from verilog impl of fpmul

ff38a2a

rdaly525 added 7 commits June 5, 2019 20:21

started float change

120f280

Merge branch 'prim-api' into float-fix

1c8a357

used verilog for mult

52df584

added new lib for float CW

e43b278

Merge branch 'master' into float-fix

89a3d63

verilog issues in cw. fixed name generation for constructor api

c331966

fixed verilog syntax

49000e7

leonardt mentioned this pull request Jun 11, 2019

Rnd mode StanfordAHA/lassen#118

Merged

leonardt approved these changes Jun 11, 2019

leonardt requested changes Jun 11, 2019

fixed verilator syntax bugs

a2700da

leonardt approved these changes Jun 11, 2019

rdaly525 merged commit cc6ca33 into master Jun 11, 2019

rdaly525 deleted the float-fix branch June 11, 2019 15:30
