Fix loadmem bug in firesim_tsi.cc #1401
Conversation
Force-pushed from 6bca50d to bc67e5f.
Does this have an impact on performance?
No. This only affects initial binary loading, and only for the tail case: the last 8 bytes of a section in an ELF. Even so, it doesn't actually change the number of host-DRAM requests; it only aligns the chunked requests properly.
We've had other issues with handling the last bytes of loadmem in the past. In lieu of actually testing the infrastructure better, could we check in a few binaries with ELF sizes that trigger this problem?
I think the right way to do this is to add a self-test to the driver. One of the test vectors should check this case; it is sufficient to check that a loadmem write whose tail hits this alignment reads back correctly.
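A minimal sketch of such a self-test, assuming hypothetical loadmem_write / loadmem_read entry points (modeled here with plain host memory so the sketch stands alone; the real driver hooks would replace them):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for the driver's loadmem entry points, backed
// by host memory only so this sketch compiles on its own.
static uint8_t fake_dram[4096];
static void loadmem_write(uint64_t a, const void* s, size_t n) { std::memcpy(fake_dram + a, s, n); }
static void loadmem_read(uint64_t a, void* d, size_t n) { std::memcpy(d, fake_dram + a, n); }

int main() {
  // (0x100 + 6) % 8 == 6: one of the alignments that triggered the bug.
  const uint64_t base = 0x100;
  uint8_t wbuf[6], rbuf[6];
  for (size_t i = 0; i < sizeof(wbuf); i++)
    wbuf[i] = static_cast<uint8_t>(0xA0 + i);
  loadmem_write(base, wbuf, sizeof(wbuf));
  loadmem_read(base, rbuf, sizeof(rbuf));
  // If a later chunk clobbered an earlier beat, the read-back differs.
  assert(std::memcmp(wbuf, rbuf, sizeof(wbuf)) == 0);
  return 0;
}
```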
That's a great idea so long as you pull in the address somewhere sane. Could you do that here?
Force-pushed from bc67e5f to 6b45c10.
It turns out this is actually not so easy to do safely. The problem is that FESVR has no guaranteed view of how its accessible memory map relates to the LoadMem widget's address space, so the only way to add a FESVR-side self-test of the loadmem interface is to bake in assumptions about that relationship. I don't think this should be done. I think there are more pressing issues with the current setup: I believe an ELF file with sections mapped to invalid memory regions can cause a bus hang, or, even worse, generate writes to host-DRAM regions allocated to unrelated bridges without any indication. This ought to be fixed by moving the loadmem widget behind the AXI4 Xbar in FPGATop (instead of in front of it). In any case, I'd like this PR to get merged soon, as it's blocking a chisel/rc/scala bump. I've improved this fix to pull hKey.dataBits from the generated header instead of hardcoding it. Making the loadmem setup more robust should be a longer-term goal.
Nvm, my better fix is no longer applicable now that the driver has been split apart more. Hardcoding chunk_align to 8 is the best temporary solution, but the longer-term goal IMO should be to more tightly couple the loadmem path to fesvr.
chunk_align has to match the loadmem-DRAM interface width. Hardcode this for now.
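A minimal sketch of what that hardcoding looks like, assuming the driver overrides fesvr's htif_t chunk hooks (chunk_align() is a standard htif_t virtual; the exact class name here is assumed from the file name):

```cpp
#include <cstddef>

// Sketch only: chunk_align() must match the loadmem-DRAM interface
// width (8 bytes for a 64-bit bus). A smaller value lets fesvr emit
// sub-width tail chunks, which the hardware aligns down, clobbering
// earlier beats.
size_t firesim_tsi_t::chunk_align() { return 8; }
```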
Force-pushed from 6b45c10 to 64ea141.
LGTM. You should also make an issue so we can track this there as well.
I'd like to see the small comment change so this is more trackable; otherwise, ship it.
Co-authored-by: David Biancolin <david.biancolin@sifive.com>
The default chunk_align (32b) was smaller than the default host-DRAM interface width (64b), which caused drops on some loadmem_writes. Fix this for now by setting chunk_align to 8.

This error would manifest only when the write satisfied (addr + len) % 8 == {5|6|7}. For example, writing 0x6 bytes to 0x100 would result in a sequence of writes: a 4-byte write to 0x100 followed by a 2-byte write to 0x104. The second write would generate an 8-byte write to 0x100, clobbering the first write.
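To make the failure concrete, here is a small standalone sketch (not FireSim code; the loop is a simplified model of how a fesvr-style loader splits a write into chunk_align-sized pieces):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Print the chunks a len-byte write at addr is split into, and the
// 8-byte-aligned beat address each chunk lands on in hardware.
static void show_chunks(uint64_t addr, size_t len, size_t chunk_align) {
  std::printf("chunk_align = %zu:\n", chunk_align);
  for (size_t off = 0; off < len; off += chunk_align) {
    size_t n = std::min(chunk_align, len - off);
    uint64_t a = addr + off;
    // An 8-byte-wide loadmem port aligns each beat down to a multiple
    // of 8, so a sub-width chunk at 0x104 lands back on 0x100.
    std::printf("  write %zu bytes @ 0x%llx (beat addr 0x%llx)\n",
                n, (unsigned long long)a, (unsigned long long)(a & ~7ULL));
  }
}

int main() {
  show_chunks(0x100, 6, 4); // buggy default: second beat clobbers the first
  show_chunks(0x100, 6, 8); // chunk_align = 8: a single aligned beat
  return 0;
}
```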
UI / API Impact
None