Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hps_accel: max pool fails golden tests #451

Closed
alanvgreen opened this issue Feb 3, 2022 · 14 comments
Closed

hps_accel: max pool fails golden tests #451

alanvgreen opened this issue Feb 3, 2022 · 14 comments

Comments

@alanvgreen
Copy link
Collaborator

alanvgreen commented Feb 3, 2022

hps_accel is exhibiting strange behavior.

To reproduce:

  1. Use code at PR WIP: new tests #450
  2. Build and run on NX/17
  3. From the menu, select 1 (TfLM models), 1 (HPS models), 3 (presence model), g (golden tests) - tests fail
  4. Build and run in simulator (make load PLATFORM=sim) - tests pass.
@alanvgreen
Copy link
Collaborator Author

Commenting out this line in the Makefile causes golden tests to pass on hardware

#DEFINES += ACCEL_MAX_POOL

@alanvgreen
Copy link
Collaborator Author

@kgugala could you look into this, please

@alanvgreen
Copy link
Collaborator Author

I tried building the design with Radiant, but Radiant (using both LSE and Synplify Pro) tries to use too many DSP resources:

   Number of Logical DSP Functions:
      Number of Pre-Adders (9+9):    0 out of 48 (0%)
      Number of Multipliers (18x18): 26 out of 24 (108%)
         Number of 9X9:       32 (1 18x18 = 2   9x9)
         Number of 18x18:      2 (1 18x18 = 1 18x18)
         Number of 18x36:      2 (2 18x18 = 1 18x36)
         Number of 36x36:      1 (4 18x18 = 1 36x36)

If looks like the systolic array (32 9x9) and post processor (1 36x36) multipliers are correctly synthesized, but the VexRiscV multiplier incorrectly uses 2 18x18 and 2 18x36 instead of 4 18x18 multipliers.

@alanvgreen
Copy link
Collaborator Author

To be clear (I miswrote earlier): tests fail on hardware and pass in verilator.

@acomodi
Copy link
Contributor

acomodi commented Feb 4, 2022

@alanvgreen There is a possibility to set the maximum DSP usage in LSE. For instance the following sets the DSP usage to 30%

prj_set_strategy_value -strategy Strategy1 lse_dsp_util=30.00

I did try higher values, but it turns out that still the total number of DSPs was exceeding the limit. Of course reducing the DSP usage will necessarily yield higher logic resources utilization, so it is possible that the pnr tool could have a hard time finding a valid solution.

(In the GUI, this option is found under Project -> Active Strategy -> LSE settings)

@alanvgreen
Copy link
Collaborator Author

I have confirmed - via printf() - that the right values go into the CFU and the wrong values come out.

@alanvgreen
Copy link
Collaborator Author

@acomodi Thanks for the tip! I was able to get Radiant to produce a bitstream, however the bitstream didn't work - the FPGA seems unresponsive. I'm not inclined to spend any more time on it right now.

@tcal-x
Copy link
Collaborator

tcal-x commented Feb 4, 2022

Findings from @acomodi (copied from chat) :
Hi all, analyzing a the issue, it seems that the failure could be related to some memory error. In fact, also by running the project menu test a I get the same failure on the test pool

Running test pool 03
Testing Pool Pool 03
FAIL - 4 differences, first at word 1597

  • diff 0 == 1597
  • diff 1 == 1917
  • diff 2 == 2237
  • diff 3 == 2557
    actual:
    0x88808080, 0x80808080, 0x80808088, 0x80808080,
    0x80808080, 0x93808080, 0x808f8080, 0x80808080,
    0x80808080, 0x93808080, 0x808c8080, 0x80808080,
    0x80808380, 0x97808080, 0x808c8080, 0x80808080,
    expected:
    0x1a808080, 0x80808080, 0x80808088, 0x80808080,
    0x80808080, 0x93808080, 0x808f8080, 0x80808080,
    0x80808080, 0x93808080, 0x808c8080, 0x80808080,
    0x80808380, 0x97808080, 0x808c8080, 0x80808080,

I have modified the test in software to output also all the memory locations where the difference is spotted and it seems there is a repeating pattern. Every 320 words starting from 1597, there is a difference

@tcal-x
Copy link
Collaborator

tcal-x commented Feb 4, 2022

I then ran the Verilator simulation with post-Yosys Verilog, and found that it failed the same 4 times as hardware. Pre-Yosys simulation (using the Verilog generated by nMigen) did not show the failures.

@tcal-x
Copy link
Collaborator

tcal-x commented Feb 4, 2022

Looking at a window of inputs around the first failure, I noted that the failure was the only time that there were positive bytes (each 32b word is 4 signed bytes). So, I checked if there were any cases of positive bytes in the entire input that resulted in correct output. There was not. Every time there were positive bytes, the result was incorrect.

Alan will try a workaround to add 128 to each byte operand so that they can be compared with an unsigned comparator.

I'll try to find the root cause of why the signed comparisons are not working correctly after Yosys processing, and submit a reduced testcase to Yosys if appropriate.

@alanvgreen
Copy link
Collaborator Author

The unsigned comparison also fails on hardware but passes in software. Interestingly the failures are now different. The unit test case (3-a on the menu) now passes, but golden tests fail.

@tcal-x
Copy link
Collaborator

tcal-x commented Feb 6, 2022

When using the unsigned comparison workaround, I noticed that the remaining errors disappeared if I added -noccu2 to the synth_nexus command. I filed YosysHQ/yosys#3187 with a very reduced test case and a script showing that formal equivalance fails.

@tcal-x
Copy link
Collaborator

tcal-x commented Feb 6, 2022

There is a fix in Yosys overnight in response to my issue: https://github.com/YosysHQ/yosys/pull/3188/files.

It appears to fix the version that uses unsigned comparisons.

@tcal-x
Copy link
Collaborator

tcal-x commented Feb 6, 2022

The Yosys fix also fixes the original MaxPool gateware. Closing this issue.

@tcal-x tcal-x closed this as completed Feb 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants