Linux AX25: address already in use error after first connection #352

Closed
Tyler-2 opened this issue May 16, 2022 · 24 comments

Comments

@Tyler-2

Tyler-2 commented May 16, 2022

I've got Direwolf running as a software TNC with the -p option, and I run kissattach to connect the Linux AX.25 stack to that port.

This works, and when I connect in Pat, my emails are sent and received, and Pat disconnects.

Afterwards, any attempts to connect again are met with:

Remote accepted 4BR6SH6JZATY
Transmitting [Re: Test] [offset 0]
FF
>FQ
2022/05/16 02:25:09 Disconnected.
2022/05/16 02:25:24 Connecting to N4POW-10 (ax25)...
2022/05/16 02:25:24 Unable to establish connection to remote: address already in use
Active AX.25 sockets
Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
N4POW-10   K1MLN-0    ax0     LISTENING    006/001  0       0     
N4POW-10   K1MLN-0    ???     LISTENING    002/004  0       0     
N4POW-10   K1MLN-0    ???     LISTENING    006/000  0       0     
N4POW-10   K1MLN-0    ???     LISTENING    005/002  0       0     
N4POW-10   K1MLN-0    ???     LISTENING    005/002  0       0     
N4POW-10   K1MLN-0    ???     LISTENING    005/002  0       0     
N4POW-10   K1MLN-0    ???     LISTENING    006/003  0       0     
N4POW-10   K1MLN-0    ???     LISTENING    006/000  0       0     
N4POW-10   K1MLN-0    ???     LISTENING    005/002  0       0     
$ sudo ifconfig ax0
ax0: flags=67<UP,BROADCAST,RUNNING>  mtu 255
        ax25 K1MLN-9  txqueuelen 10  (AMPR AX.25)
        RX packets 81  bytes 6678 (6.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 49  bytes 1397 (1.3 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

I have no similar problems when using simply axcall. I can connect and reconnect all day, even after Pat gets into this bad state. But Pat will not work.

Restarting Pat doesn't resolve it - I have to restart everything all the way up to Direwolf.

Pat Version 0.12.1-2+b1
Direwolf Version 1.6
$ uname -rv
5.17.5-surface #3 SMP PREEMPT Sat May 7 16:37:05 UTC 2022
(A custom kernel for Surface Laptops running Debian)
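The socket listing above shows one entry bound to ax0 and several whose Device column is `???`. Those appear to be sessions left behind after a disconnect. A minimal sketch for counting them, assuming the listing comes from net-tools `netstat --ax25` and keeps the column layout shown above (the function name is made up for illustration):

```sh
#!/bin/sh
# Count AX.25 sockets that are no longer attached to a device.
# Reads a `netstat --ax25`-style listing on stdin and counts the rows whose
# third column (Device) is "???". The column position is an assumption
# based on the listing in this issue.
count_orphaned_ax25() {
    awk '$3 == "???" { n++ } END { print n + 0 }'
}

# Usage (assumes netstat from net-tools with AX.25 support):
#   netstat --ax25 | count_orphaned_ax25
```

A non-zero count right after Pat disconnects would suggest the stale-session state described here.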

@martinhpedersen
Member

I've been trying to reproduce this today running two Pat instances (P2P) (develop builds) with older kernel versions:

  • Linux node-a 5.10.103-v7+ #1529 SMP Tue Mar 8 12:21:37 GMT 2022 armv7l GNU/Linux
  • Linux node-b 4.9.0-18-686-pae #1 SMP Debian 4.9.303-1 (2022-03-07) i686 GNU/Linux

The connection teardown works as intended, with no dangling connections. This leads me to believe that this is either a bug in recent Linux kernels or has already been resolved in recent commits.

Can you please try building the develop branch and see if you're able to reproduce the issue?

Thanks!

@dranch

dranch commented Jun 2, 2022

This specific kernel bug (and others) has finally been getting some review. Please see https://www.spinics.net/lists/linux-hams/ for some of the archives. Specifically, here is the thread on the stale AX.25 session issue: https://www.spinics.net/lists/linux-hams/msg04952.html . Once a viable fix is available that aligns with the other fixes in flight, it will take some time (months or quarters) until a standard Linux distribution publishes it for end-user consumption.

@programmin1

Same here with latest Pat and Linux 5.15.0-37-generic #39-Ubuntu.
I have to unplug and plug it in again and use my window picker to call up AX25 on the device again:
https://github.com/programmin1/Pat-Window/

@dranch

dranch commented Jun 13, 2022

Not sure what you mean by "unplug/plug", but an ugly workaround is to do:
ifconfig ax0 down
killall kissattach
and then bring the stack back up again. This issue doesn't always happen, and it's not clear why it does or doesn't occur, but I can say that it does NOT happen to me on u20.04 running 5.13.0-40-generic. Anyway, several fixes have made it into the kernel, but I don't know when they will make it to specific distro kernel packages.
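
The workaround above could be wrapped in a small script. This is only a sketch under assumptions: `/tmp/kisstnc` is the symlink Direwolf creates when started with -p, and `radio` is a hypothetical port name that would have to match an entry in /etc/ax25/axports. By default it only prints the commands (dry run).

```sh
#!/bin/sh
# Sketch of the reset workaround. RUN defaults to "echo", so nothing is
# changed unless you set RUN=sudo (or RUN= when already root).
RUN="${RUN-echo}"
AX_IFACE="${AX_IFACE:-ax0}"
KISS_TTY="${KISS_TTY:-/tmp/kisstnc}"  # assumption: Direwolf -p pseudo terminal symlink
AX_PORT="${AX_PORT:-radio}"           # hypothetical port name from /etc/ax25/axports

reset_ax25() {
    $RUN ifconfig "$AX_IFACE" down           # take the AX.25 interface down
    $RUN killall kissattach                  # drop the KISS attachment
    $RUN kissattach "$KISS_TTY" "$AX_PORT"   # bring the stack back up
}

# Usage:
#   reset_ax25            # dry run, prints the three commands
#   RUN=sudo reset_ax25   # actually run them
```

As noted earlier in this thread, in some cases everything up to and including Direwolf had to be restarted, so this may not be enough on every system.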

@n7nix

n7nix commented Jun 13, 2022

The symptom described by Tyler-2 is the symptom fixed for kernel 4.2.0 in June of 2016. See submission here: AX.25: Close socket connection on session completion. @Tyler-2 which Linux kernel are you using?

@Tyler-2
Author

Tyler-2 commented Jun 13, 2022

Linux 5.17.9-surface #1 SMP PREEMPT Sat May 21 02:11:54 UTC 2022
from
https://github.com/linux-surface/linux-surface

I don't know how to check the status of that patch in this custom kernel but naturally I assume it's in there.

@nx2i

nx2i commented Aug 24, 2022

I am seeing the same behavior. I am running pat v0.13 with Linux 5.4.0-124-generic #140-Ubuntu SMP Thu Aug 4 02:23:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux. This is the current version of Linux Mint. The direwolf version is 1.7.

@martinhpedersen
Member

As this is most likely a kernel bug, I'm tagging this issue as wontfix. Keeping it open for now, so we can track progress and discuss this further.

It would be great if someone could dig into this and investigate why the bug still exists in Linux >= 5.4 if the patch was merged in 4.2.0. Maybe a new bug was introduced? 🤔 I guess a good place to start would be to verify that the patch did in fact fix the issue.

I saw similar issues back when I was running Debian Jessie (kernel 3.16). When I upgraded to Debian Stretch (kernel 4.9) the issue was resolved IIRC. Maybe a similar bug was introduced in a later kernel release? 🤷

Anyway, the local RMS packet node in my area has been QRT for a couple of years now, so it's difficult for me to investigate further right now 😞.

@martinhpedersen changed the title from '"Unable to establish connection to remote: address already in use" after first connection' to 'Linux AX25: Unable to establish connection to remote: address already in use after first connection' on Sep 17, 2022
@martinhpedersen changed the title from 'Linux AX25: Unable to establish connection to remote: address already in use after first connection' to 'Linux AX25: address already in use error after first connection' on Sep 17, 2022
@jlrgraham

For what it's worth, I did a custom build of Linux 5.19.9 on Debian Bullseye and this issue no longer seems to be present. It was present for me on 5.10.0-18-amd64 (the 5.10.140 dpkg in Bullseye).

@xssfox

xssfox commented Oct 31, 2022

Stumbled upon this issue while researching. I suspect the changes in https://github.com/torvalds/linux/blob/v5.19/net/ax25/af_ax25.c might resolve the issue entirely, so 5.19 kernels might be OK now. I have yet to test.

@xssfox

xssfox commented Oct 31, 2022

Just upgraded to 5.19 kernel on Ubuntu and this appears to be resolved :)

@nx2i

nx2i commented Nov 23, 2022

I am still stuck with this issue on Linux Mint. Curiously, I have the same version of Linux running on 2 different machines:
john@john-Latitude-D630:~$ uname -a
Linux john-Latitude-D630 5.4.0-132-generic #148-Ubuntu SMP Mon Oct 17 16:02:06 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux; and,
uname -a
Linux Mintbox 5.4.0-132-generic #148-Ubuntu SMP Mon Oct 17 16:02:06 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Extensive testing shows the first machine always shows the "address already in use" error, while the second machine does not.
Given the comments above, I struggle to understand this.
John

@dranch

dranch commented Nov 23, 2022

There have been fixes in very new versions of the Linux kernel but I don't think they will be backported to the old 5.4.0 series. It's not clear what version of Mint you're running but does Mint have the HWE (hardware enablement) kernel like what Ubuntu offers? This should get you a newer kernel. For example, read this: https://forums.linuxmint.com/viewtopic.php?t=367736

@nx2i

nx2i commented Nov 23, 2022

they are the same mint release:
cat /etc/os-release
NAME="Linux Mint"
VERSION="20.3 (Una)"

@dranch

dranch commented Nov 23, 2022

OK, but on my u20.04 system, I'm running the following HWE kernel:

5.15.0-53-generic #59~20.04.1-Ubuntu SMP Thu Oct 20 15:10:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

@TheSkorm

From looking at the Linux kernel changes, you are probably going to have a bad time with anything older than 5.19.
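
A quick way to compare a running kernel against that threshold is sketched below. The 5.19 cut-off is taken from this thread, not from a kernel changelog, and distro kernels may carry backported fixes, so treat the result as a rough hint. The function name is made up for illustration, and `sort -V` assumes GNU coreutils.

```sh
#!/bin/sh
# Succeed if kernel release $1 is at least version $2.
# The distro suffix (e.g. "-132-generic") is stripped before comparing.
kernel_at_least() {
    have="${1%%-*}"
    want="$2"
    # With version sort, the smaller of the two comes first; the check
    # passes when that smaller value is the wanted minimum.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Usage:
#   if kernel_at_least "$(uname -r)" 5.19; then
#       echo "kernel is 5.19 or newer"
#   else
#       echo "kernel is older than 5.19, possibly affected"
#   fi
```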

@nx2i

nx2i commented Nov 23, 2022

OK before I update the kernel, can someone please explain how the same kernel and OS-release result in different behaviors on the 2 different machines. I am having a hard time seeing the logic in this.

@TheSkorm

I believe the issue was related to race conditions with the timers used for managing AX25 connections - so different CPUs and different workloads would have different results. On my hardware I could only get the issue to occur occasionally.

@nx2i

nx2i commented Nov 24, 2022

I could upgrade to Mint 21, but that only has kernel 5.15. How do I get upgraded to the 5.19 version?

@richcannings
Contributor

richcannings commented Dec 20, 2022

Workaround: I ran into this issue. I now use the AGWPE branch of Pat on Linux and I have no problems. The AGWPE feature does not require kernel AX.25 support, and communicates directly with Direwolf. Install info is at:

#367 (comment)

Note that I had an error installing, so I used:

GOPROXY=direct GOSUMDB=off go install github.com/la5nta/pat@feature/agwpe-engine

@martinhpedersen
Member

Closing this one now. It's a kernel issue. AGWPE support has been added, so affected users might want to try that 🙂

@martinhpedersen closed this as not planned on Jun 18, 2023
@dranch

dranch commented Jun 18, 2023

Very new kernels (6.x) should have this issue fixed now, though I'm not exactly clear which specific modern kernels or older kernels have had backported fixes committed.

@martinhpedersen
Member

Thanks! That is certainly good news 😊 Debian Bookworm (current stable) has 6.1 🥳
