Crash in kernel mode with 10 Gbps IXGBE driver on amd64 #136
Comments
I have more information about this problem (I work with @devmusings). It works fine if I use the latest ixgbe driver from Intel at http://sourceforge.net/projects/e1000/files/ixgbe%20stable/3.22.3/ . The crash is always in ixgbe_xmit_frame_ring. So the problem is ixgbe-related: it is fixed in Intel's latest release of the driver but is still present in the most recent kernel. A bug report should probably be filed, since the bug could come from one of the modifications made in the kernel version of the driver, but the problem is hard to describe because I could not find its actual source. For now, if someone hits the same problem, just use Intel's driver.

Also, it appears only with some packet generators. With a loop configuration using Netmap everything works, but it fails when a Tilera generates the packets. The generated packets are the same and come from nearly the same program.

Here are the last kernel messages recovered with kdump. The first two lines are the output of the last packet passing through Print() and then a click_chatter I added. The last packet looks fine.

[ 1838.399421] chatter: 60 | 90e2ba46 f2e067c6 697351ff 08004510 002e0000 40004011
If this makes you feel any better (or worse), my team had similar experiences with that driver and we always use the Intel version now. Sorry I didn't see this issue earlier, since I probably could have saved you some time.
I spoke too quickly. It may be a different problem, but it doesn't work if I use multiqueue. This seems to be because, even with single-threaded Click, packet_notifier_hook() in fromdevice.cc can be called concurrently, as there are multiple interrupts coming from the card on multiple CPUs. Adding a big lock fixes the problem. I will double-check that, think about a better solution (I'd say an atomic increment on the queue head), and come back with a patch.
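To illustrate the "atomic increment on the queue head" idea mentioned above, here is a minimal user-space sketch. It is not Click's FromDevice code and not the patch that was eventually merged: SlotQueue, push() and run_concurrent_producers() are hypothetical names, std::thread stands in for interrupt handlers running on several CPUs, and the buffer is fill-once (no wraparound, no consumer side). The point is only that each concurrent caller reserves a distinct slot with a single fetch_add, so no big lock is needed around the enqueue.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a receive queue; NOT Click's actual data structure.
class SlotQueue {
public:
    explicit SlotQueue(std::size_t capacity) : slots_(capacity, -1), head_(0) {
        assert(capacity > 0);
    }

    // Reserve a slot with one atomic increment, then fill it.
    // Two concurrent callers can never obtain the same index.
    // Returns false once the queue is full.
    bool push(int packet_id) {
        std::size_t idx = head_.fetch_add(1, std::memory_order_relaxed);
        if (idx >= slots_.size())
            return false;
        slots_[idx] = packet_id;
        return true;
    }

    // Number of filled slots; meaningful once all producers have finished.
    std::size_t size() const {
        std::size_t h = head_.load();
        return h < slots_.size() ? h : slots_.size();
    }

    int at(std::size_t i) const { return slots_[i]; }

private:
    std::vector<int> slots_;
    std::atomic<std::size_t> head_;
};

// Simulates the hook firing on several CPUs at once (assumption: one thread
// per "CPU", each enqueuing per_producer distinct packet ids).
std::size_t run_concurrent_producers(SlotQueue &q, int producers, int per_producer) {
    std::vector<std::thread> threads;
    for (int p = 0; p < producers; ++p) {
        threads.emplace_back([&q, p, per_producer] {
            for (int i = 0; i < per_producer; ++i)
                q.push(p * per_producer + i);
        });
    }
    for (auto &t : threads)
        t.join();
    return q.size();
}
```

A real queue would also need a safe hand-off to the consumer (a slot is reserved before it is filled), which this sketch deliberately leaves out.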
This was solved by #182
Hello,
I tried to use Click in linuxmodule mode with various kernels between 2.6.32 and 3.16 and 10 Gbps IXGBE cards, and it always crashes with a page fault in ixgbe_xmit_frame_ring() after sending a few packets (only 144 in my last test, at only 2 packets per second). Our server is an Intel-based amd64 platform running Debian jessie (except for the 3.16 kernel attempt, which used sid).

The relevant part of the kernel panic stack trace is as follows:
Investigating ToDevice::queue_packet() with the help of added click_chatter() calls and early returns, I think the problem occurs in the call to dev->netdev_ops->ndo_start_xmit(skb1, dev).

I can reproduce the problem even with the trimmed-down configuration that follows:
For completeness, I had to take the following steps to compile kernel-mode Click (most of them suggested in the other issues):
- Merge /usr/src/linux-headers-VERSION-amd64 and /usr/src/linux-headers-VERSION-common.
- Copy /boot/config-VERSION and /boot/System.map-VERSION to the source directory as .config and System.map, respectively.
- Copy /usr/src/linux-headers-VERSION-merged/include/generated/autoconf.h to /usr/src/linux-headers-VERSION-merged/include/linux/autoconf.h.
- Add #undef DEPRECATED in include/click/handler.hh (as it is defined in linux/printk.h, which is apparently included by handler.hh).
- Run ./configure --disable-userlevel --enable-linux-module --with-linux=/usr/src/linux-headers-VERSION-merged (I first tried other options such as multithread, but the problem also occurs with this simpler configuration).