Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug Report] Wrong result of +0 minus +0 #93

Open
zhangkanqi opened this issue May 30, 2024 · 9 comments
Open

[Bug Report] Wrong result of +0 minus +0 #93

zhangkanqi opened this issue May 30, 2024 · 9 comments

Comments

@zhangkanqi
Copy link

Hi,
when executing +0 minus +0, the result in rsd is -0. However, i think it should be +0.

Is there something wrong in the implementation? Could you help me confirm this problem?

@lpha-z
Copy link

lpha-z commented Jun 1, 2024

Thank you for finding this. Such a result is a bug. For any finite floating-point number (meaning "non-NaN") x, x - x should be +0.0. (Thus, you are right.)

IEEE754-2019 6.3 The sign bit
When the sum of two operands with opposite signs (or the difference of two operands with like signs) is exactly zero, the sign of that sum (or difference) shall be +0 under all rounding-direction attributes except roundTowardNegative;

Unfortunately, I couldn't reproduce the problem on my end. I used a test code in Processor/Src/Verification/TestCode/Asm/FP/code.s. I rewrite L13-14 to set both a1 and a2 to zero. The result of the subtraction, which would be in fa4 (f14), was confirmed to be 0x00000000 (+0.0). The only 0x80000000 (-0.0) was in fs6 (f21), which resulted from the FNMSUB instruction and was correct.

To better understand, could you please provide more details? Specifically, the following information would be helpful:

  • Sorce code and compiler version, options, etc. (or an assembly code)
  • Kanata log
  • Register.csv

We want to resolve this issue and greatly appreciate your assistance. Thank you!

@zhangkanqi
Copy link
Author

Thanks for your reply.

Here are the all files you will need for debugging: https://wormhole.app/zkkA4#yJpgCuoEpgaozwhhZ-2R0w

@lpha-z
Copy link

lpha-z commented Jun 1, 2024

I confirmed the test_spike.log file contains the following:

core   0: 0x000020b0 (0x040fa407) flw     fs0, 64(t6)
core   0: 3 0x000020b0 (0x040fa407) f8  0x3abff4ac mem 0x80003040
(omit)
core   0: 0x000020e8 (0x0b0fab07) flw     fs6, 176(t6)
core   0: 3 0x000020e8 (0x0b0fab07) f22 0xd54ce604 mem 0x800030b0
(omit)
core   0: 0x00003420 (0x096406d3) fsub.s  fa3, fs0, fs6
core   0: 3 0x00003420 (0x096406d3) c1_fflags 0x00000001 f13 0x554ce604

Since std::bit_cast<unsigned>( std::bit_cast<float>(0x3abff4ac) - std::bit_cast<float>(0xd54ce604) ) == 0x554ce604, this is a correct behavior.
Could you please provide the hex file of the program where the wrong behavior occurs?
Thank you.

@zhangkanqi
Copy link
Author

Sorry, I lost the hex file of this bug. I manually restored the instruction sequence this afternoon according to spike's commit log. Here are the final hex:
zero_minus_zero.hex.zip

For reproduce, you need to modify PC_GOAL to 32'h000030dc and cancle define RSD_NARROW_PC in Processor/Src/Memory/MemoryMapTypes.sv.

Moreover, here is the spike commit log: spike_zero_minus_zero.log

In detail, there is a mismatch on ft8 between spike(0x00000000, as shown in the following figure) and rsd(0x80000000).
image

@lpha-z
Copy link

lpha-z commented Jun 2, 2024

Thank you for your cooperation. I'll try reproducing.

@lpha-z
Copy link

lpha-z commented Jun 2, 2024

It reproduced. Thank you restoring the hex file!

The root of this bug is wrong behavior of fmul.s.
First, the fmul.s ft1, ft8, fs6 is executed and should produce -0.0 because ft8 is +0.0 and fs6 is a negative number. However, in bd7c5c1, it wrongly returns +0.0 (#92).
Then, the fmsub.s ft8, ft1, ft6, ft8 with ft6 is a negative number and ft8 is +0.0 is executed. When ft1 is -0.0, it should returns +0.0 because it is a same-number subtraction. When ft1 is +0.0, it should returns -0.0 because it is a (-0.0) - (+0.0)-like subtraction. So, the behavior of fmsub.s is correct.

In summary, the fmsub.s implementation is correct and the wrong implementation of fmul.s (#92) leads to fmsub.s instruction returning -0.0.

Please do not close this issue and keep it open until it is sure that #92 is resolved and this issue no longer occurs.

@zhangkanqi
Copy link
Author

The meaning of fmsub.s ft8, ft1, ft6, ft8 is ft8=ft1×ft6−ft8. In rsd, ft1=+0.0, ft6 is negative, ft8=+0.0. so it means (+0.0)*negative-(+0.0).
However, when computing (+0.0)*negative, the result in rsd is +0.0 or -0.0?
Does #92 only happens on fmul not on ordinary multiplication?

@lpha-z
Copy link

lpha-z commented Jun 2, 2024

Yes, the problem #92 only happens on fmul.s.

The current fmul.s implementation on RSD wrongly returns fma(a,b,+0.0). It is not that the multiplier produces a wrong-sign result, but the last + (+0.0) causes the problem.

@zhangkanqi
Copy link
Author

I see. Thanks for your patient reply!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants