-
Notifications
You must be signed in to change notification settings - Fork 689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
load_unit: Fix #356 uncached loads served from WB #368
Conversation
I have checked in a modified system containing a testbench with UART software. This appears to run in VCS, I would imagine it would be OK in modelsim as well. I have deleted the old cache because the definitions were conflicting (and it didn't work anyway ?). I haven't been able to find any difference in behaviour yet (other than reduced performance), so perhaps my theory about why it goes wrong is bogus, or else a more complicated interaction. To use, just check out recursively and run 'make simv', |
I have added a simulation of the SD-Card bridge to my branch above. This was the peripheral that was giving the trouble previously. I have added code that initializes that peripheral also. A strange thing happens, a spurious 2-cycle read of the current address (highlighted in red on the attached scrrenshot) is added part way through my list of programmed instructions. This peripheral's interface is different in that it uses axi2mem instead of axi2apb. Interrupts are never enabled in this simulation. You need an installation of Vivado setup with the usual environment variables for this simulation to run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have discovered that the miss unit can reorder uncachable transactions, the following might help:
@@ -388,6 +388,9 @@ module wt_dcache_missunit #(
end else begin
state_d = DRAIN;
end
// check for non-cached read whilst write buffer is non-empty, if so drain but still allow more stores
end else if ((|miss_req_masked_d) && (!miss_is_write) && miss_nc_i[miss_port_idx] && !wbuffer_empty_i) begin
state_d = DRAIN;
// we've got a miss to handle
end else if (|miss_req_masked_d) begin
// this is a write miss, just pass through (but check whether write collides with MSHR)
Looking at the AXI cache interface the cache seems to do the right thing (see attached picture). It is issuing exactly one load of size 3 (64 bit) ( How is the SD Card module connected? My new working assumption would be that the AXI bus is somehow smaller (32 bit) and the downsizer module is somehow translating the 64bit access into two 32bit accesses to (incorrect) address Can you comment on how you connected the SD module? |
As you can see from |
I think I might now understand what the problem is: I guess the core is making speculative reads to the IO region. The @msfschaffner took a look at your patch and concluded that this is not the final fix, that is the reason why I am trying to understand what is happening here. Can you please point me to the binary file. I can't reverse engineer this from the |
The binary is checked in here: https://github.com/jrrk2/ariane/blob/uncached_loads_bug/fpga/src/bootrom/bootrom.elf Normally you can just recreate with: make -C fpga/src/bootrom all The disassembly with interspersed source is here: https://github.com/jrrk2/ariane/blob/uncached_loads_bug/fpga/src/bootrom/bootrom.dis |
Thanks for your support. I've pushed a commit which should fix the idempotency issue you were seeing. I've declared the whole NC region as non-idempotent in
Please let me know how this works for you and whether it magically solves the P.S.: Somewhere you mentioned that the "wide-fetch" patch did not work for you. Which patch did you exactly mean? |
The wide fetch patch I meant was the generalisation of the front end to receive 64 bits from the icache into the realigner instead of 32. But perhaps this is not your current thinking. The test case that appears to fail was a 32-bit instruction sandwiched between two 16-bit instructions. But it would take a while to reproduce this. |
Yeah that was exactly what I meant, thanks for clarifying. That's the patch that we have on the |
I can confirm the latest patch is OK on FPGA to merge to master by the end of tomorrow I think. the dev branch doesn’t boot Linux if I remember rightly. |
This is a tentative fix for #356. I would very much appreciate your feedback @jrrk as I don't have an isolated test case for it.
What the fix does is basically checking whether the write buffer is empty in case we are performing an uncached load. If the WB should not be empty the load is killed and retried once the WB gets empty.