pty04 sporadically fails #674
Comments
This looks like an issue I thought I had fixed by limiting the MTU to the internal chunk size. The read is probably sleeping while it waits for an "unthrottle" that the PTY never sends when SLIP is attached. I debated whether this counts as a kernel bug and decided not to risk breaking anything in the kernel by implementing it (TTY drivers for physical ports probably send it; the PTY driver just does not). I should probably use asynchronous reads instead, to avoid sleeping in the kernel. |
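For illustration only, here is a minimal sketch of one way to stop a read from sleeping indefinitely: guard it with `poll()` and a timeout. This is not truly asynchronous I/O and it is not the code from any of the patches; `read_timeout` is a hypothetical helper name, and reporting a timeout as `ETIMEDOUT` is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the LTP sources): wait up to timeout_ms
 * for fd to become readable, then issue a single read().
 * Returns the read() result, or -1 with errno == ETIMEDOUT on timeout.
 */
static ssize_t read_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	assert(buf != NULL && timeout_ms >= 0);

	/* Retry if a signal interrupts poll(); note this restarts the timeout. */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -1;              /* poll() failed, errno already set */
	if (ret == 0) {
		errno = ETIMEDOUT;      /* nothing arrived in time */
		return -1;
	}

	/* Data, hang-up or error is pending, so read() will not sleep. */
	return read(fd, buf, len);
}
```

With something like this the caller can report a timeout itself instead of hanging until the test library kills it.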
IMHO before merging 3c8f84e it was on every second run (detected with |
If I run 10 x
and
and timeouts similar to what Jan reported. This looks like data corruption happening somewhere in the kernel, but I don't see how. |
It seems I have fixed the timeouts by retrying the reads. Even though the reads are blocking and the read size equals the packet size, read() can apparently return before the full packet has arrived, which is what I should have assumed from the start. However, I also seem to be getting SLIP packets on SLCAN-bound sockets, or vice versa.
It doesn't appear that netdevice indexes or PTS numbers are reused. Perhaps some line discipline structure is reused? Or it has something to do with binding to raw sockets. At any rate, I will send another patch to the mailing list. |
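The read retrying mentioned above can be sketched roughly as follows. This is a generic short-read loop, not the actual patch; `read_full` is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the LTP sources): keep calling read()
 * until len bytes have arrived, EOF is reached, or a real error occurs.
 * Returns the number of bytes read (short only at EOF) or -1 on error.
 */
static ssize_t read_full(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;

	assert(buf != NULL);

	while (done < len) {
		ssize_t n = read(fd, p + done, len - done);

		if (n > 0)
			done += (size_t)n;  /* partial packet: ask for the rest */
		else if (n == 0)
			break;              /* EOF: return the short count */
		else if (errno != EINTR)
			return -1;          /* real error, errno set by read() */
		/* EINTR: just retry */
	}

	return (ssize_t)done;
}
```

A blocking read() only promises at least one byte, so a caller that needs a whole packet has to loop like this even when the requested size matches the packet size.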
I sent another patch which, amongst other things, increases the timeout. This seems to fix it in the simple case. It seems the kernel stalls, possibly while trying to allocate memory. Sometimes it even returns ENOBUFS and prints a stack trace for the memory failure, followed by the OOM killer:
|
@richiejp Well, previous issue has not been fixed, so feel free to reopen. But IMHO that needs kernel fix and at least original report for |
I sent in yet another patch to fix a bug in Also it appears the test fails on PowerVM with this:
Possibly that is a packet from another interface, or something else. |
Yup, it was a packet from another interface. By default, raw packet sockets receive packets from all interfaces. I have sent another patch to the ML. |
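For reference, restricting an `AF_PACKET` socket to a single interface is done by binding it with a `struct sockaddr_ll` that carries the interface index. The sketch below shows the general technique, not the patch itself; `fill_ll_addr` and `bind_to_ifindex` are hypothetical helper names.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build the link-layer address used for bind(). */
static void fill_ll_addr(struct sockaddr_ll *addr, int ifindex,
			 unsigned short proto)
{
	assert(addr != NULL);

	memset(addr, 0, sizeof(*addr));
	addr->sll_family = AF_PACKET;
	addr->sll_protocol = htons(proto);  /* e.g. ETH_P_ALL */
	addr->sll_ifindex = ifindex;        /* only this netdevice */
}

/*
 * Hypothetical helper: bind an already opened packet socket to one
 * interface so that it stops receiving packets from all the others.
 * Returns the bind() result (0 on success, -1 on error).
 */
static int bind_to_ifindex(int sk, int ifindex, unsigned short proto)
{
	struct sockaddr_ll addr;

	fill_ll_addr(&addr, ifindex, proto);
	return bind(sk, (struct sockaddr *)&addr, sizeof(addr));
}
```

The interface index would typically come from `if_nametoindex()` or the `SIOCGIFINDEX` ioctl. Packets queued between `socket()` and `bind()` can still come from other interfaces, so a test may also want to check the source index on receive.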
Still sporadically fails |
Just hit it too with 5.7.0-b23c477 on x86_64
|
Any update on this issue? |
Nope, sorry. It is on my backlog, but low priority. Are you using SLIP or SLCAN or is the issue just that the tests are randomly failing? |
It is a random failure in our daily LTP test. |
On 5.10 RC I keep hitting a soft lockup inside slc_bump (slcan) if I run the test repeatedly. The stack trace points to different locations within slc_bump each time, and there is no obvious infinite loop or anything like that. So this is probably a kernel issue. Will investigate further. |
It appears that sometimes the test sends a lot of data and the kernel requires a long time to process it. The test should work in a reasonably reliable way so I have sent a patch to the mailing list to fix that: I don't think there is an issue with the kernel other than being slow which is probably not a real world issue with SLCAN. |
This should be fixed in e40bcd5 |
pty04 is running sporadically into TBROKs:
5.6.10-3dce3c1 x86_64 (J:4234886)
5.6.10-55754d7 aarch64 (J:4234757)