Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unaligned instruction fetch and data addresses #75

Closed
AntonBabushkin opened this issue Jun 11, 2019 · 33 comments
Closed

Unaligned instruction fetch and data addresses #75

AntonBabushkin opened this issue Jun 11, 2019 · 33 comments
Assignees
Labels

Comments

@AntonBabushkin
Copy link

Hi,
I've got an issue while trying to execute a code compiled by riscv64-unknown-elf-gcc-8.2.0-2019.02.0-x86_64-linux-centos6 (from https://www.sifive.com/boards/) on the ibex core.

Listing:

25e: 028b2783 lw a5,40(s6)

Message:

316725ns: Illegal instruction (core 0) at PC 0x0000025e: 0x028b028b

Could you please suggest what might be wrong here.
Thanks.

@imphil
Copy link
Contributor

imphil commented Jun 11, 2019

Ibex is a 32 bit core, you need to use a RV32IMC toolchain to compile software for it. (riscv32-unknown-elf-gcc)

@AntonBabushkin
Copy link
Author

AntonBabushkin commented Jun 12, 2019

Hi Phillip,
Sure, the binary was compiled in 32-bit mode (32-bit ABI: option -mabi=ilp32 used).
I'll attach here a test case for issue reproducing.
Thanks.

@AntonBabushkin
Copy link
Author

Hi Phillip,
Please find a test case here which reproduces the issue.
The message from ibex core is:

1575ns: Illegal instruction (core 0) at PC 0x00000262: 0x028b028b

@vogelpi
Copy link
Contributor

vogelpi commented Jun 12, 2019

Hi @AntonBabushkin ,
as pointed out by Philipp, you need to use a proper 32-bit compiler for Ibex. Using a 64-bit compiler with the -mabi=ilp32 does not help. If I am not misunderstood, this argument just makes sure that the program will use 32-bit data types (meaning an unsigned long will have a size of 4 Bytes as in 32bit systems and not 8 Bytes as in 64-bit systems). But the generated instructions are still RV64 and not RV32.
Also, if you look at the ISA manual, and search for the instruction that generates the exception, you also see that there is no instruction defined with that opcode for RV32 (bits[6:0] = 7'b000_1011).
Best regards,
Pirmin

@luismarques
Copy link
Contributor

luismarques commented Jun 12, 2019 via email

@AntonBabushkin
Copy link
Author

AntonBabushkin commented Jun 12, 2019

Hi Pirmin,
Thanks for your comment.
Yes, in my test I've set both -march=rv32imc and -mabi=ilp32, sorry for not mentioning that from the very beginning.
As a side note: produced binary perfectly works with picorv32 and ri5cy core, which makes me think there might be something wrong in my setup with ibex core specifically.
I'll also try using gnu-mcu-eclipse (8.2.0-2.2-20190521-0004) which is a 32-bit compiler as far as I can tell and will update you on the outcome.
Thanks.

@vogelpi
Copy link
Contributor

vogelpi commented Jun 12, 2019

Thanks @AntonBabushkin for the update and the useful information!
We have to investigate this and will get back to you.
Best,
Pirmin

@AntonBabushkin
Copy link
Author

Hi guys,
I've tried using gnu-mcu-eclipse toolchain which if used with options -march=rv32imc and -mabi=ilp32 will produce 32-bit binary acc to this page.
The outcome remains the same:
Listing file:

262: 028b2783 lw a5,40(s6)

Simulation message:

1575ns: Illegal instruction (core 0) at PC 0x00000262: 0x028b028b

Thanks.

@ikarageo
Copy link
Contributor

Hi. Is there a chance that you use the core with .RV32E (1), ? This is the embedded ISA (RV32E) variant which has only 16 registers. In this sense, s6 register from lw a5,40(s6) does not exist.

Best,
Ioannis

@AntonBabushkin
Copy link
Author

Hi Ioannis,
The ibex core is configured to RV32IMC ISA, please see /ibex_issue_75/simulation/ibex_cores.sv L36-37 from previously attached test case.
image
Thanks.

@vogelpi
Copy link
Contributor

vogelpi commented Jun 12, 2019

@AntonBabushkin , I would like to look at the wave forms you have provided. But we don't have ncsim at hand (is this the tool that you used?). Would you mind to generate and share a .vcd. I think there should be a tool in ncsim called shm2vcd for this purpose.

@AntonBabushkin
Copy link
Author

Hi Pirmin,
Sure, please find an extracted first 200ns of sim time converted to VCD here.
Yes, the simulation/sim.sh control script should be used together with ncsim simulator. Which simulator are you using, is it vcs? I think it shouldn't be hard to convert it to vcs control script.
I will be off for next 3 days having a limited access to emails, please excuse in advance a delays with reply.
Thanks.

@vogelpi vogelpi self-assigned this Jun 14, 2019
@vogelpi
Copy link
Contributor

vogelpi commented Jun 14, 2019

Hi Anton,
thanks for providing the VCD. We are using Verilator at the moment.

I am pretty sure I was able to track down the bug: The illegal instruction exception is triggered because the decoder gets indeed an illegal instruction from the IF stage. The actual problem is in the instruction fetch FIFO ibex_fetch_fifo.sv. In the particular case, the core wants to fetch from address 0x262 which is unaligned. For unaligned accesses, the fetch FIFO does actually two fetches and combines the data obtained internally. The first fetch is used to get the lower two bytes of the actual instruction (returned in the higher two bytes from memory) and the second fetch is used to get the higher two bytes (returned in the lower two bytes from memory). Now the actual problem is that the first of these two fetches is issued to address 0x262 (it should go to 0x260) and the second one to 0x264. As a result, the instruction resulting from the combination of these two fetches is wrong (0x028B_028B instead of 0x028B_2783). Both the lower and higher two bytes are the same as the higher two bytes of the fetch at 0x262 are the same as the lower two bytes fetched at 0x264.

The same should also happen when doing unaligned loads and stores as the load-store unit ibex_lsu.sv implements unaligned accesses in a similar way. This would probably be even harder to detect as it does not generate an illegal instruction exception.

The reason why no one ever ran into this problem so far is as follows: Most memories only support word-aligned accesses, the 2 LSBs of the address are simply ignored. Thus, the first access effectively returns the data at 0x260 instead of 0x262. In contrast, the memory in your simulation supports unaligned accesses, which together with the bug in the fetch FIFO triggers the wrong behavior.

We need to create a fix for this. But since this will take a while, I suggest to do the following for now. In ibex_dut.sv, on Line 77, replace

.instr_addr_i ( instr_addr[RAM_ADDR_WIDTH-1:0] ),

by

.instr_addr_i ( {instr_addr[RAM_ADDR_WIDTH-1:2], 2'b00}),

Similarly, on Line 82, replace

.data_addr_i ( data_addr ),

by

.data_addr_i ( {data_addr[RAM_ADDR_WIDTH-1:2], 2'b00}),

Please let me know, if this solves your problem or not.
Have a nice weekend,
Pirmin

@AntonBabushkin
Copy link
Author

Hi Pirmin,
Thank you so much for issue analysis and fitting suggestions.
I understand that acc to your investigation due to presence of a bug in the ibex_fetch_fifo.sv the core is not able to properly fetch half-word (16-bit) aligned instructions.

I've tried to implement a workaround on my end by:

  1. Aligning all sections in linker script to 4 bytes
  2. Compiling for RC32IM instead of IMC ISA to bypass compression extension and avoid half-word (16-bit) compressed instructions generation

So that produced binary will have only 32-bit instruction aligned on 32-bit address.
As for data access performed by LSU pointed-out by you - I don't have a good workaround, except of making all my variables of int type which is true at least for Simple Test (part of my SW test).
Also I've switched off compiler optimization by setting optimization level to O0.

But still I have faced 2 issues that I'd like to report back to you:

  1. "Branch decision X" assertion failure (please see simulation/irun.log):

[1220ns] SW tests execution started
ncsim: *E,ASRTST (../ibex-master/rtl/ibex_id_stage.sv,691): (time 4810 NS) Assertion tb.dut_top.riscv_dut.dut.rv32imc.uut.ibex_core.id_stage_i.__assert_1 has failed
Branch decision is X
[9530ns] SW tests execution started

  1. Improper SW execution due to no return from function (please see sw_tests/firmware_rv32im.lss and waves_extract_2.vcd.tar.gz):
    image
    It seems that ret operation isn't executed by the CPU.

I can't tell whether #⁠1 and #⁠2 are linked or not, so I thought that might be appropriate to report both.

An updates test case can be found here.
A VCD file for first 10200ns of simulation time can be found here.
I'll try to recreate a setup using Verilator to ease issue reproducibility on your end, but it might take some time since I am new to it.

Thank you in advance for your help in issue resolution.
Have a great weekend too!

Best regards,
Anton

@AntonBabushkin
Copy link
Author

Hi Pirmin,
I've create a setup using Verilator for your convenience. It can be found here.
You can run the simulation bu launching sim.sh script in the verilator_sim directory.
After simulation launching the sim.log and trace.vcd files will be created for debugging.

Surprisingly Verilator exhibits different behavior. The core reports:

11125: Illegal instruction (core 0) at PC 0x000000c4: 0x00010513
11155: Illegal instruction (core 0) at PC 0x00000084: 0x00010413

From listing file (sw_tests/firmware_rv32im.lss):

c4: fcd42823 sw a3,-48(s0)
...
84: 0001c137 lui sp,0x1c

Also by observing the testbench signals I can tell that my SW Tests are not executed properly.
Could you please take a look.
Thanks.

@imphil
Copy link
Contributor

imphil commented Jun 16, 2019

@AntonBabushkin please open a new issue for non-related issues to keep the discussion focused and actionable. Could you do that?

@imphil imphil added Component:RTL RTL issue Type:Bug Bugs labels Jun 16, 2019
@AntonBabushkin
Copy link
Author

AntonBabushkin commented Jun 17, 2019

Hi Philipp,
Thank for your comments.
Unfortunately due to my limited knowledge of the ibex core internal organization it's hard for me to judge whether those issues are related or not. To me it all sounds like "instruction decoding issues".
Please note that I am always referring to the exact same test case, it was just 1st adopted incorporating workarounds kindly proposed by Pirmin and 2nd Verilator support was added to the setup to easy reproducibility on your end - that's why observed behavior have changed a little bit. However to my believe they are still a part of a same "instruction decoding issue", I do acknowledge that I might be missing something here.
Yes, for sure I amble to split issue into a multiple issues, I just need your hand in dividing it into smaller relevant non-related trackable and actionable pieces. So could you please kindly suggest how to proceed. What are a distinct non-related things involved here to your opinion?
Thanks.

@vogelpi
Copy link
Contributor

vogelpi commented Jun 18, 2019

Hi @AntonBabushkin ,

thanks a lot for reporting these issues in the first place, for providing additional information, and also for generating a Verilator environment! This eases our work a lot!

I was traveling but will now look into your latest reports. Once I get a better understanding, I let you know if it is related to the original issue or not. If it is not related, we better create a new issue and continue the discussion there as suggested by @imphil .

Best,
Pirmin

@AntonBabushkin
Copy link
Author

Hi Pirmin,
Thank you for your willingness to help, it's much appreciated.
Sure, please let me know if some of the reported observations are not correlated and I'll gladly move them into appropriate issues.
Thanks.

Best regards,
Anton

@vogelpi
Copy link
Contributor

vogelpi commented Jun 19, 2019

I had a look at both simulations. The first two observations are definitely related to this issue here.

As you noted, your workaround for unaligned accesses in the LSU is not a good one. Together with @luismarques , I had a look at the disassembly of your binary. There are are still unaligned loads and stores. Maybe you linked in some other code that is not under full control of yours? Looking at the waves, one can see that there are still many unaligned loads and stores happening. One of them also leads to an unknown branch decision. This happens as due to the a previous load to an unaligned address, the memory returns only a partially initialized value. This is exactly what I meant by

The same should also happen when doing unaligned loads and stores as the load-store unit ibex_lsu.sv implements unaligned accesses in a similar way. This would probably be even harder to detect as it does not generate an illegal instruction exception.

But luckily there is the assertion failing. The only reliable fix seems to be what I suggested. Please use it while I am working on a proper fix.

Regarding the second issue (ret operation is not executed), I did not further investigate. There are many unaligned loads and stores happening before that. And I strongly believe that these mess up the processor state and the execution. I suggest to check if this still happens when you actually enable the suggested fix.

@vogelpi
Copy link
Contributor

vogelpi commented Jun 19, 2019

As for the illegal instruction exception occurring in the Verilator simulation, I am not 100% sure what the issue is. It is still there if I enable my fix. And the reason it is there is that the software at some point overwrites the instruction memory with something that is not a valid instruction (instruction 0xfce7a623 at address 0x958 writes 0x8 to address 0xC4). Why it does not happen with SimVision I do not know. It could be that it does not happen there because the processor state is messed up already (due to unaligned loads and stores).

@AntonBabushkin Could you please:

  1. Apply my fix suggested above.
  2. Rerun the simulation in SimVision
  3. Then check if the problem still occurs
  4. If yes, create a new issue here.

Best regards,
Pirmin

@AntonBabushkin
Copy link
Author

AntonBabushkin commented Jun 19, 2019

Hi Pirmin,
Thank you a lot for the analysis.
I've tried to implement the fix proposed by you:
image
and as far as I can tell it doesn't change the behavior much. Please see two ncsim log files attached.
irun_w_fix.log (with fix)
irun_wo_fix.log (without fix)
as can be seen the behavior is the same.
I've also recorded a VCD for your reference that is also attached to this issue.
waves_extract_w_fix.vcd.tar.gz

Also I'd like to report that when I removed all 16-bit variables from my code my SW test passes and no warnings or errors are reported by the ibex core.
So I think we can conclude that I've faced an issue with:

  1. non 32-bits aligned instructions (that is cause, as you mentioned by the bug in ibex_fetch_fifo.sv)
  2. non-32-bits aligned data access, which is most likely caused by a bug in the LSU

Please feel free to rename the caption of this issue accordingly, or let me know if you want me to do it.
Also kindly let me know if you want me to rename this issue to item #⁠1 from the list and to create new issue to cover item #⁠2.

I'd be also quite interested to know when one can expect those issues to be resolved.

Again, thank you so much for your help in supporting me with that.

Best regards,
Anton

@vogelpi vogelpi changed the title lw instruction not supported? Unaligned instruction fetch and data addresses Jun 20, 2019
@vogelpi
Copy link
Contributor

vogelpi commented Jun 20, 2019

Hi Anton,
thanks for checking. I thought you did not test the fix as it was commented out in your code. Would mind to record a longer VCD with the fix enabled, please? The one you uploaded only includes the first 2us. The first error appears at roughly 4.8us (according to your logs).

I renamed this issue and created the fix. It is in PR #82 and waiting to get reviewed.

@AntonBabushkin
Copy link
Author

Hi Pirmin,
Sorry for a misleading print-screen, the fix was actually enabled during VCD recording. And yes, it contains only first 2us, so please find attached a 12us long version of it (also log file can be found in the zipped folder).
waves_extract_w_fix.vcd.tar.gz
Thanks.

@vogelpi
Copy link
Contributor

vogelpi commented Jun 20, 2019

Hi Anton,
thanks for sending me the full VCD. It looks like the core reads uninitialized memory several times, and as far as I can see, mostly around function calls. It is very hard to debug this though. If I understood correctly, the very same binary works on other cores and Ibex works fine when using the rv32 compiler. Would you mind to send me the assembly of the 32-bit binary working correctly, please? This would allow me to compare the two side-by-side and with the waves.
Thank you.

@AntonBabushkin
Copy link
Author

Hi Pirmin,
I've just tried locally the PR #82 (by replacing ibex_load_store_unit.sv taken from commit e50d51c and ibex_prefetch_buffer.sv taken from commit 3e44b15 files in my test case) but unfortunately I still got exactly same behavior.
The message from ibex core is:

ncsim: *E,ASRTST (../ibex-master/rtl/ibex_id_stage.sv,691): (time 4810 NS) Assertion tb.dut_top.riscv_dut.dut.rv32imc.uut.ibex_core.id_stage_i.__assert_1 has failed
Branch decision is X

Please find the VCD file attached.
waves_extract_pr82.tar.gz

Yes, the same binary works just fine with picoRV32 and RI5CY cores.
You can find it in the previously attached test case under ibex_issue_75/sw_tests folder, the filename is firmware_rv32im.memh.

I also would like to comment on the

Ibex works fine when using the rv32 compiler

statement to avoid any misunderstanding.
The ibex core does not work properly even with rv32 compiler. It works properly only if:

  1. C extension is disabled AND
  2. only int (32-bits) variables are used in the C-code

Thanks.

Best regards,
Anton

@AntonBabushkin
Copy link
Author

AntonBabushkin commented Jun 20, 2019

Hi Pirmin,
I've create a setup for your convenience instantiating both ibex and ri5cy CPUs in the same DUT handler by Verilator.
The test case can be found here: ibex_issue_75_ibex_plus_ri5cy.zip.
Both cores are executing the same binary.
The ri5cy works perfectly fine, while ibex core unfortunately fails (in a same way as before).
You can run the simulation by execution sim.sh script under ibex_issue_75/verilator_sim.
The ibex core doesn't incorporate PR #82 fix.
UPDATE: I've just tried to re-run the simulation with fix, and unfortunately it doesn't help.
Thanks.

Best regards,
Anton

@vogelpi
Copy link
Contributor

vogelpi commented Jun 21, 2019

Hi Anton,
thanks for reporting back, clarifying and for providing the new setup. I am still investigating.

As a side note: I knew that the fix in PR #82 does not solve the problem completely. But it at least solves the first part of this issue. So it is needed.

Anyway, I let you know when I find out more.
Best,
Pirmin

@vogelpi
Copy link
Contributor

vogelpi commented Jun 24, 2019

Hi @AntonBabushkin ,
I could finally track down the bug you were facing. There is an issue in the latch-based register file (ibex_register_file.sv). I do not yet fully understand the bug. It seems that under certain circumstances, write operations to the register file are not properly handled. In the VCD file you provided, this happens at 4150ns (see first marker from the left in screenshot below). The core should write 0x0000_1000 in wdata_a_i to Reg 15. Unfortunately, the simulation did not record the actual content of the register file. Anyway, upon the next read from Reg 15 (second marker at 4190ns), the core gets back 0x0001_FFC0 in rdata_a_o. Interestingly, this is the same value that the core actually writes to Reg 15 in this very same cycle. This value is then used to compute the address of a load (marker 3). Since the address is wrong, the memory returns uninitialized data X. The core continues to write this value back to address 0xFFB4 in memory (marker 4). Finally, the core reads this address again inside/before the cordic_rotation_kernel function. Since the value returned is again X, the assertion about the branch decision being X fails in SimVision (Verilator does not support assertions).

reg_file

To get around the bug, please use the flip-flop based register file for now (ibex_register_file_ff.sv instead of ibex_register_file.sv). With this one, the wrong behavior in the register file does not occur when I rerun your environment in Verilator.

As said, I do not yet fully understand the bug. I am not sure if we messed up something here when cleaning up the code, or if it is a simulation bug we are facing. But I know that the latch-based register file has been used successfully for many tapeouts. Thus, I don't think the original latch-based register file is fundamentally wrong. I will report back once I know more.

Please let me know, if you can run the simulation with the flip-flop-based regsiter file (ibex_register_file_ff.sv).

Best regards,
Pirmin

@AntonBabushkin
Copy link
Author

Hi Pirmin,
Thank you so much for your reply.
Usage of ff-based register file (ibex_register_file_ff.sv) solves the problem indeed.

Best regards,
Anton

@vogelpi
Copy link
Contributor

vogelpi commented Jun 25, 2019

Thanks for reporting back @AntonBabushkin . Together with @luismarques and @gautschimi , I could nail down the bug. The latch-based register file itself is fine but the implementation of the clock gate in prim_clock_gating.sv is not compliant with the latch-based register file.

For SimVision, you can try to use cluster_clock_gating.sv instead (you have it in your simulation directory). This clock gate uses a latch inside to only pass the enable through while the input clock signal is low. You also might need to change the configuration of the simulator and use the -delay_mode_zero and -delay_trigger flags.

Verilator does not support this as it is not an event-based but cycle-accurate simulator.

We have to adapt our documentation and possibly the file names to make sure no one is running into this problem again. I will now create a new issue to track this work.

One more times, thank you for reporting the bug and for your valuable input!

@vogelpi
Copy link
Contributor

vogelpi commented Jun 25, 2019

The first problem discussed in this issue related to misaligned instruction and data addresses has been resolved in #82 . The second problem related to the clock gate for the latch-based register file is now tracked in #93 .

@vogelpi vogelpi closed this as completed Jun 25, 2019
@gautschimi
Copy link
Contributor

I think the flags -delay_mode_zero and -delay_trigger do not help. the register_file really expects a proper clock gate with a latch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants