Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reporting traps (faults) to GDB in SE mode #166

Merged
merged 3 commits into from Aug 17, 2023

Conversation

janvrany
Copy link
Contributor

@janvrany janvrany commented Aug 8, 2023

This addresses #123

@shingarov shingarov self-requested a review August 8, 2023 15:57
Copy link
Contributor

@shingarov shingarov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it should be as @janvrany says, and empirically, it happens to be so on every physical RISC-V machine I have around here.

Still, there may be some reason why we are both wrong; @gabemblack what was your rationale for adding lines 167, 168?

@BobbyRBruce BobbyRBruce self-assigned this Aug 8, 2023
@BobbyRBruce BobbyRBruce added sim-se gem5's Syscall Emulation Mode gdb The GNU Debugger arch-power The POWER ISA arch-riscv The RISC-V ISA labels Aug 8, 2023
@BobbyRBruce BobbyRBruce linked an issue Aug 8, 2023 that may be closed by this pull request
@janvrany
Copy link
Contributor Author

janvrany commented Aug 9, 2023

Looking at failed test, I do not really understand what went wrong... Has anyone any idea?

@powerjg
Copy link
Contributor

powerjg commented Aug 10, 2023

I think the VM running the tests died. Can you rebase and force push to update the branch? This will kick off the tests again.

@shingarov
Copy link
Contributor

Well I did a CI re-run yesterday and it didn't make any difference. So yeah, very strange.

@powerjg
Copy link
Contributor

powerjg commented Aug 11, 2023

Ah, it's hitting a timeout. I'm not totally sure why... I would try running the test that it failed on timing-cpu_2-cores_no_cache_DualChannelDDR4_2400_arm_boot_test_to-tick-ALL-x86_64-opt

@mkjost0, do you have any suggestions on diagnosing this?

@janvrany
Copy link
Contributor Author

@powerjg I did run

./build/ALL/gem5.debug -d /tmp/gem5out532w5i9e -re --silent-redirect tests/gem5/arm_boot_tests/configs/arm_boot_exit_run.py --cpu timing --num-cpus 2 --mem-system no_cache --dram-class DualChannelDDR4_2400 --resource-directory tests/gem5/resources --tick-exit 10000000000

on my machine and it terminated successfully after ~20 secs.

@powerjg
Copy link
Contributor

powerjg commented Aug 14, 2023

Let's see if it works this time. Maybe we got unlucky and ran on a really slow CI core or something.

@janvrany
Copy link
Contributor Author

@powerjg I think it got stuck again. Is there a way I can run a single testcase

timing-cpu_2-cores_no_cache_DualChannelDDR4_2400_arm_boot_test_to-tick-ALL-x86_64-opt

on my machine? (I realized that what I run above was problably the previous one)

@mkjost0
Copy link
Contributor

mkjost0 commented Aug 14, 2023

The smallest unit of test we can run is by SuiteUID, which you can run by using ./main.py run --skip-build --uid SuiteUID:tests/gem5/arm_boot_tests/test_linux_boot.py:timing-cpu_2-cores_no_cache_DualChannelDDR4_2400_arm_boot_test_to-tick-ALL-x86_64-opt -vvv However, it looks like the command you ran above should accomplish the same thing as running this command.

@janvrany
Copy link
Contributor Author

@mkjost0 Thanks. I tried that but still, "works for me".

I'm not sure what else to try locally, so let me just force-push one commit at time, starting with no-change commit, to see what happens.

@janvrany janvrany force-pushed the pr/fix-se-traps branch 2 times, most recently from d9ca5f2 to 1576435 Compare August 15, 2023 14:22
@janvrany
Copy link
Contributor Author

OK, so it looks like commit 7e64e7a is causing the hangup.

In theory that change may cause infinite loop (and therefore CI timeout) for RISC-V SE workloads if that workload depends on (incorrect) behavior of PC being increased before reporting trap.

However:

  • regardless of PC value, invokeSE() is going to panic() so it should not matter and
  • the that that hangs is ARM, so change in RISC-V should not affect it.

So, I do not understand that's going on here and out of ideas how to debug it.

@janvrany
Copy link
Contributor Author

Ah! It is not ARM test that hangs, it is indeed RISC-V test that hangs - only the CI output is truncated! Now I can debug it.

Note to myself: this is what fails:

./build/ALL/gem5.opt --debug-flags=Exec ./tests/gem5/asmtest/configs/simple_binary_run.py rv64samt-ps-sysclone_d atomic riscv --num-cores 4 --resource-directory ./tests/gem5/resources

@powerjg
Copy link
Contributor

powerjg commented Aug 16, 2023

Maybe we should make this PR be just the power change. I wonder if the gdb issue on RISC-V may be a different problem/solultion.

Comment on lines -167 to -168
inst->advancePC(pc_state);
tc->pcState(pc_state);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think move the code section to SyscallFault::invokeSE(ThreadContext *tc, const StaticInstPtr &inst) will pass to test CI

@janvrany
Copy link
Contributor Author

@rogerchang23424 Thanks, this is exactly what I did.

janvrany added a commit to janvrany/MachineArithmetic that referenced this pull request Aug 17, 2023
This commit moves all smalltalk sources to `src` subdirectory.
This is needed in order to use (shared) smalltalk-makefiles,
see discussion in PR 166 [1]. Moreover, this is how it is done
in other packages (ArchC, Tinyrossa and so on).

[1]: gem5/gem5#166
Due to inverted logic in POWER fault handlers, unimplemented opcode and
trap faults did not report trap to GDB (if connected). This commit fixes
the problem.

While at it, I opted to use `if (! ...) { panic(...) }` rather than
`panic_if(...)`. I find it easier to understand in this case.

Change-Id: I6cd5dfd5f6546b8541d685e877afef21540d6824
On RISC-V when trap occurs the contents of PC register contains the
address of instruction that caused that trap (as opposed to the address
of instruction following it in instruction stream). Therefore this commit
does not advance the PC before reporting trap in SE mode.

Change-Id: I83f3766cff276312cefcf1b4ac6e78a6569846b9
This commit add code to report illegal instruction and breakpoint traps
to GDB (if connected). This merely follows what POWER does.
@janvrany
Copy link
Contributor Author

OK, I did what @rogerchang23424 suggested and now all checks pass. I think this is ready for review now.

@powerjg powerjg merged commit 22c52f4 into gem5:develop Aug 17, 2023
5 checks passed
@shingarov
Copy link
Contributor

Wait, what?

janvrany added a commit to janvrany/MachineArithmetic that referenced this pull request 16 hours ago

What does this have to do with janvrany/MachineArithmetic@81b31f1 ??

@janvrany janvrany deleted the pr/fix-se-traps branch August 18, 2023 09:17
@janvrany
Copy link
Contributor Author

What does this have to do with janvrany/MachineArithmetic@81b31f1 ??

Ah, sorry! Indeed this is completely unrelated, I just accidentally pasted wrong URL into commit message of MA. Fixed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-power The POWER ISA arch-riscv The RISC-V ISA gdb The GNU Debugger sim-se gem5's Syscall Emulation Mode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reporting traps (faults) to GDB in SE mode is broken
6 participants