Crash in netif_rx_internal
#471
Closing this as this is not related to MPTCP. I will report it to netdev ML if I manage to reproduce it. |
I had probably the same issue, just after having sent a ping (in v6 I suppose), this time with a decoded stacktrace, still on top of
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/7537561906/job/20516659466 EDIT: we have |
Yet another one, on top of net-next + net:
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/7545349968/job/20540751697 |
Note that Eric suggests this is probably an issue on x86's side. We had another stack trace:
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/7550415246/job/20556002186 |
Because it is impacting our CI, I suggest reopening it for the moment. |
I managed to reproduce it manually by:
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 7898d62fce0b..52320cb95d31 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -852,6 +852,7 @@ done
mptcp_lib_result_code "${ret}" "ping tests"
stop_if_error "Could not even run ping tests"
+exit ${final_ret}
[ -n "$tc_loss" ] && tc -net "$ns2" qdisc add dev ns2eth3 root netem loss random $tc_loss delay ${tc_delay}ms
echo -n "INFO: Using loss of $tc_loss "
Sometimes, it is "quick" (~10 attempts), but sometimes it takes more than 100 attempts. I started a Git bisect, but I can still reproduce it on a v6.4 kernel, for example. The Cirrus CI (KVM) never complained about that, so maybe it is an issue with TCG, which is used instead of KVM? Or maybe an issue with QEmu? I tried to upgrade it to v8 (currently on v6.2), but virtme sets QEmu options that are no longer supported... |
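For anyone wanting to automate the "attempts" mentioned above, here is a minimal sketch of a retry loop. `run_until_failure` is a hypothetical helper, not something from the selftests, and it assumes each attempt is a command that exits with a non-zero status when the crash happens (for example, a wrapper that boots the VM and runs the modified mptcp_connect.sh).

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel selftests).
# Usage: run_until_failure MAX_ATTEMPTS COMMAND [ARGS...]
# Re-runs COMMAND until it fails or MAX_ATTEMPTS is reached.
# Returns 1 and prints the attempt number on the first failure,
# returns 0 if every attempt succeeded.
run_until_failure() {
	max="$1"
	shift
	i=1
	while [ "$i" -le "$max" ]; do
		# The command's own output is discarded; only its exit status matters.
		if ! "$@" >/dev/null 2>&1; then
			echo "failed at attempt ${i}"
			return 1
		fi
		i=$((i + 1))
	done
	echo "no failure after ${max} attempts"
	return 0
}
```

Something like `run_until_failure 200 ./boot-and-test.sh` would then stop at the first failing attempt; `boot-and-test.sh` is only a placeholder name for whatever launches one attempt.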
After a few long Now... surprisingly, this commit is 8e791f7 ("x86/kprobes: Drop removed INT3 handling code"): a modification in I guess the best is to report this to the author of the patch.
|
Note that I just managed to reproduce it on top of the export branch (export/20240119T055335), after having done a ping in IPv4 this time:
What was being done in userspace: Full version
Here is the stripped version to ease the reading:
Kernel config: |
The issue has been reported to the x86 ML: Lore |
INT3 is used not only for software breakpoint, but also self modifying code on x86 in the kernel. For example, jump_label, function tracer etc. Those may not handle INT3 after removing it but not waiting for synchronizing CPUs enough. Since such 'ghost' INT3 is not handled by anyone because they think it has been removed already.

Recheck there is INT3 on the exception address and if not, ignore it.

Note that previously kprobes does the same thing by itself, but that is not a good location to do that because INT3 is commonly used. Do it at the common place so that it can handle all 'ghost' INT3.

Reported-by: Matthieu Baerts <matttbe@kernel.org>
Closes: https://lore.kernel.org/all/06cb540e-34ff-4dcd-b936-19d4d14378c9@kernel.org/
Closes: #471
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fixes: 8e791f7 ("x86/kprobes: Drop removed INT3 handling code")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Short summary of the discussion we had on lore:
Next steps are:
|
This is helpful for two things:

- There is a bug in the QEmu version we use, causing some issues when KVM is not used, see [1] [2].
- BPF selftests need a more recent version of the compiler. This is needed for [3].

Link: multipath-tcp/mptcp_net-next#471 [1]
Link: https://lore.kernel.org/all/06cb540e-34ff-4dcd-b936-19d4d14378c9@kernel.org/T/ [2]
Link: multipath-tcp/mptcp_net-next#406
Co-developed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reopening: we still have the issue without the workaround (kernel patch) with QEmu 8.0.4, which is installed in the Docker image used by the CI to execute the tests. Next steps: try to identify the fix on QEmu's side and have it backported (or upgrade QEmu manually?) |
Note that we just had the issue with the kernel patch as a workaround:
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/7918253003/job/21616220869 |
The issue has been fixed in QEmu v8.1.0, but the fix has not been backported to earlier versions. And it looks like there will not be any new v8.0 releases. The fixes on QEmu's side:
There are some conflicts when backporting them to v8.0.4, but nothing blocking. I resolved the conflicts and pushed these 3 commits to this branch: https://gitlab.com/matttbe/qemu/-/commits/lp-2051965/
Thanks to the Canonical Server devs, Ubuntu 23.10 (and maybe 22.04 too) will get a new version with the fixes. Once it is available, we can revert the kernel patch acting as a workaround, and close this issue.
For more details: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/2051965 |
QEmu 8.0.4+dfsg-1ubuntu3.23.10.3 in Ubuntu 23.10 now includes a fix to avoid the kernel panic. I then reverted the workaround from our tree. Note that the workaround is also no longer needed since all our CIs are now using KVM support (#474).
New patches for t/upstream-net and t/upstream:
Tests are now in progress: |
A kernel panic has been detected by the CI (no debug kconfig).
Click to expand but probably ignore this one, no debug info
It looks like it is not related to MPTCP. Due to a global timeout, the trace has not been decoded and the vmlinux file has not been saved. Anyway, logging it here, just in case. I just relaunched the job, hoping to be able to reproduce it (no issues on my side).