panic under heavy network load #19
Comments
@dch thanks for all of the details! Btw, was it GENERIC kernel you tested on? |
On Thu, 23 Mar 2023, at 10:54, dsl wrote:
> @dch thanks for all of the details! Btw, was it GENERIC kernel you tested on?
yes.
|
I've been seeing this a lot (like every 5-10 minutes) after 0d574d8 with https://reviews.freebsd.org/D40094 |
sorry it took a while but 718bdb6 is the culprit. Reverting this & we're all ok again. |
Correct, I thought that was clear from the original title & updated comment.
|
@dch Thanks for the summary, that's how I understood the issue. I assume its root cause is different channels accessing bus_dma resources concurrently. You won't see those panics with only one channel up and running. Just FYI, I'm trying to isolate channels within their own tasks and limit access to shared resources as much as possible. |
@dch I've prepared a lot of changes in the https://github.com/mcusim/freebsd-src/tree/dpaa2 branch. Could you try it? A GENERIC kernel ran for me under high network load for ~14 hours before I stopped the test myself. Btw, I've also discovered that the kernel panics with "undefined instruction" when the Ten64's SoC is heated up to 80-90C ( |
It should be fixed on CURRENT with https://cgit.freebsd.org/src/commit/?id=58983e4b0253ad38a3e1ef2166fedd3133fdb552 merged in. |
so far LGTM on 15.0-CURRENT - a 3h test (albeit on 1G ifaces only) is stable. thanks @dsalychev |
I'm on
It's able to link up when plugged in via loopback, but not when I plug in to Ten64. I haven't reported it yet, because I still haven't tested it working under Linux. |
using e04c4b4 this is still stable. thanks! |
Good to know :) Thanks for testing! |
netlink(4) calls back into the driver during detach and it attempts to start an internal synchronized op recursively, causing an interruptible hang. Fix it by failing the ioctl if the VI has been marked as DOOMED by cxgbe_detach.
Here's the stack for the hang for reference.
#6 begin_synchronized_op
#7 cxgbe_media_status
#8 ifmedia_ioctl
#9 cxgbe_ioctl
#10 if_ioctl
#11 get_operstate_ether
#12 get_operstate
#13 dump_iface
#14 rtnl_handle_ifevent
#15 rtnl_handle_ifnet_event
#16 rt_ifmsg
#17 if_unroute
#18 if_down
#19 if_detach_internal
#20 if_detach
#21 ether_ifdetach
#22 cxgbe_vi_detach
#23 cxgbe_detach
#24 DEVICE_DETACH
MFC after: 3 days
Sponsored by: Chelsio Communications
this only reproduces when more than usual cross-dpaa interface traffic is present.
I can trigger it using iperf3 reliably. This is using normal CURRENT, not fork.
while true; vmstat -i | grep dpaa2_io; sleep 1; end
top -SjwHPz -mcpu
at moment of crash (tmux over mosh)
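The `while true; ...; end` loop above is fish syntax. For anyone reproducing this from /bin/sh, a rough equivalent might look like the sketch below. The function name `watch_dpaa2_irqs` and its optional iteration count are made up for this example and are not part of the original report; the `vmstat -i | grep dpaa2_io` pipeline is the one from the report.

```shell
#!/bin/sh
# Sketch only: POSIX sh version of the fish monitoring loop from the report.
# watch_dpaa2_irqs and its COUNT argument are hypothetical helpers added here.
# COUNT = 0 (the default) loops until interrupted, like the original loop.
watch_dpaa2_irqs() {
    count="${1:-0}"
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        # Same pipeline as in the report: show only the dpaa2_io interrupt rows.
        vmstat -i | grep dpaa2_io
        sleep 1
        i=$((i + 1))
    done
}
```

Calling `watch_dpaa2_irqs` with no argument samples once per second until interrupted; `watch_dpaa2_irqs 30` stops after 30 samples, which may be handier when capturing output into a file next to the iperf3 run.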