Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Listening asyncio UNIX socket isn't removed on close #111246

Closed
CendioOssman opened this issue Oct 24, 2023 · 14 comments · Fixed by #111483
Closed

Listening asyncio UNIX socket isn't removed on close #111246

CendioOssman opened this issue Oct 24, 2023 · 14 comments · Fixed by #111483
Labels
topic-asyncio type-bug An unexpected behavior, bug, or error

Comments

@CendioOssman
Copy link
Contributor

CendioOssman commented Oct 24, 2023

Bug report

Bug description:

After creating a UNIX socket server with loop.create_unix_server() and then closing it using Server.close(), then the UNIX socket file is still left on disk.

Although asyncio itself can handle this on the next startup since #72585, it's still a bit rude (and confusing) to leave stale things around on the file system.

Callers can make sure to pair Server.close() with an os.unlink(). But it would be a lot neater if Python could automatically do this, as basically everyone should do it anyway.

CPython versions tested on:

3.9

Operating systems tested on:

Linux

Linked PRs

@CendioOssman CendioOssman added the type-bug An unexpected behavior, bug, or error label Oct 24, 2023
@gvanrossum
Copy link
Member

@CendioOssman I'm pretty ignorant when it comes to such matters. For regular IP sockets this doesn't seem to be a problem (after closing one, the kernel sees to its proper disposal), so this seems unique to UNIX domain sockets. Can you point to some well-respected document explaining why it's best practice to explicitly remove a UNIX domain socket after closing it? I would also be concerned about removing the socket too soon, when multiple processes are listening on the same address (this is possible for IP sockets, I don't know if it works for UNIX sockets).

@CendioOssman
Copy link
Contributor Author

@CendioOssman I'm pretty ignorant when it comes to such matters. For regular IP sockets this doesn't seem to be a problem (after closing one, the kernel sees to its proper disposal), so this seems unique to UNIX domain sockets. Can you point to some well-respected document explaining why it's best practice to explicitly remove a UNIX domain socket after closing it?

I'm not sure if there is a forum where such a document could appear. This is more a de facto situation than de jure. There is this in the unix(7) man page though:

Binding to a socket with a filename creates a socket in the filesystem that must be deleted by the caller when it is no longer needed (using unlink(2)).

Some examples of very major software doing this, though:

Xorg: https://gitlab.freedesktop.org/xorg/lib/libxtrans/-/blob/master/Xtranslcl.c#L1309
Wayland: https://gitlab.freedesktop.org/wayland/wayland/-/blob/main/src/wayland-server.c#L1166
wlroots: https://gitlab.freedesktop.org/wlroots/wlroots/-/blob/master/xwayland/sockets.c#L154
PulseAudio: https://gitlab.freedesktop.org/pulseaudio/pulseaudio/-/blob/master/src/pulsecore/socket-server.c#L516

An example of a major software not doing it is dbus-daemon, but it normally uses abstract sockets or uses temporary directories, so it might not have considered much of a need.

I would also be concerned about removing the socket too soon, when multiple processes are listening on the same address (this is possible for IP sockets, I don't know if it works for UNIX sockets).

I'm not sure, to be honest, but this thread suggests that it isn't:

https://stackoverflow.com/questions/15716302/so-reuseaddr-and-af-unix

It also contains another quote as to the responsibility of deleting the file:

The system does not delete the file. It should therefore be deleted by the process when it is no longer required.

@CendioOssman
Copy link
Contributor Author

A technical note here is that it is pretty easy to steal UNIX socket addresses, unlike TCP/IP addresses. You just unlink() the existing file and do your own bind().

None of the above examples consider this scenario, but there might be some odd cases that consider this normal and sane behaviour.

For some handling of this, Python could do a stat() of the file before unlink() and make sure the inode number is still the expected one. That isn't completely race free, but is hopefully such a short race that it isn't a problem in practice.

A second technical issue if the server was created using the sock argument. The path of the socket isn't as readily available at that point. But getsockname() should be able to fetch it.

@gvanrossum
Copy link
Member

Okay, you've convinced me. Can you submit a PR? This would be new behavior in 3.13, require tests and docs, and a brief "what's new" entry.

@CendioOssman
Copy link
Contributor Author

I can have a go. I've never run the tests for CPython though. I hope this is a good guide:

https://devguide.python.org/testing/run-write-tests/index.html

@gvanrossum
Copy link
Member

Yeah, that’s the ultimate guide. I often run

python-m test test_asyncio

or even

python-m test test_asyncio.

The further flags -v and -m are also handy.

CendioOssman added a commit to CendioOssman/cpython that referenced this issue Oct 30, 2023
Try to clean up the socket file we create so we don't add unused noise
to the file system.
CendioOssman added a commit to CendioOssman/cpython that referenced this issue Oct 30, 2023
We only want to clean up *our* socket, so try to determine if we still
own this address or if something else has replaced it.
CendioOssman added a commit to CendioOssman/cpython that referenced this issue Oct 30, 2023
Allow the implicit cleanup to be disabled in case there are some odd
corner cases where this causes issues.
@CendioOssman
Copy link
Contributor Author

I've now opened #111483. Please have a look and see what you think.

CendioOssman added a commit to CendioOssman/cpython that referenced this issue Oct 30, 2023
Try to clean up the socket file we create so we don't add unused noise
to the file system.
CendioOssman added a commit to CendioOssman/cpython that referenced this issue Oct 30, 2023
We only want to clean up *our* socket, so try to determine if we still
own this address or if something else has replaced it.
CendioOssman added a commit to CendioOssman/cpython that referenced this issue Oct 30, 2023
Allow the implicit cleanup to be disabled in case there are some odd
corner cases where this causes issues.
gvanrossum pushed a commit that referenced this issue Nov 8, 2023
Try to clean up the socket file we create so we don't add unused noise to the file system.
@ronf
Copy link

ronf commented Dec 23, 2023

Following up on my mention above, I'm a bit worried that this change will break backward compatibility with earlier versions of Python. Any code out there which creates UNIX domain sockets and manually cleans them up (which should be quite common) will now get an error back when trying to remove/unlink the socket. Unless the code intentionally ignores the failure or uses some other form of cleanup like TemporaryDirectory which doesn't reference the socket name explicitly, this change will lead to failures.

ronf added a commit to ronf/asyncssh that referenced this issue Dec 23, 2023
This commit adds guards around code which cleans up UNIX domain
sockets, to protect against a change proposed at
python/cpython#111246
which would cause the socket to clean itself up on close.
@gvanrossum
Copy link
Member

Thanks for the feedback. It reminds me of https://xkcd.com/1172/. In the issue you mention only test were affected -- I agree it's annoying, but the fix is straightforward and can be backported with no problem. I think in this case it's better to Do The Right Thing (TM), i.e. remove the socket, which ideally this API should have done from the start, rather than forever being stuck in a slightly buggy state, where the API leaves behind random junk.

@ronf
Copy link

ronf commented Dec 23, 2023

Thanks, Guido. I agree that it's much cleaner to automatically remove the socket from the filesystem on close, and it would have been great if it had done so from the start. This change will eventually make things better, at least once people no longer need to support Python 3.12 and earlier.

Is there any mechanism in the changelog to call out potentially breaking changes that need attention on upgrade? While in my case it only broke test code, any real code using UNIX domain sockets probably would likely have triggered the same kind of error my test code did. As you said, it's not hard to fix, but it will require a code change in the caller in most cases if the code was previously doing proper cleanup.

@gvanrossum
Copy link
Member

Is there any mechanism in the changelog to call out potentially breaking changes that need attention on upgrade?

I'm not aware of something like that. It might be a useful idea though -- you could start a thread on discuss.python.org about it. (Better wait until January, folks are checking out for the holidays.)

@ronf
Copy link

ronf commented Dec 24, 2023

Thanks - will do. Happy holidays!

@CendioOssman
Copy link
Contributor Author

For reference, we ended up with the following to support all Python versions:

            server.close()
            try:
                os.unlink(addr)
            except FileNotFoundError:
                pass
            except OSError as e:
                log.error('Error removing socket %s: %s' % (addr, e))    

If the os.unlink() call is before the close() call, you wouldn't need to do any special handling. It would give you a slightly different race, though.

@ronf
Copy link

ronf commented Dec 27, 2023

Yeah - I did something similar here in asyncssh, to avoid the application code triggering a failure if it finds that the UNIX domain socket is already removed.

aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
Try to clean up the socket file we create so we don't add unused noise to the file system.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-asyncio type-bug An unexpected behavior, bug, or error
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants