Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

select.kqueue uses invalid fd after fork #110395

Closed
sorcio opened this issue Oct 5, 2023 · 9 comments
Closed

select.kqueue uses invalid fd after fork #110395

sorcio opened this issue Oct 5, 2023 · 9 comments
Labels
3.12 bugs and security fixes 3.13 new features, bugs and security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@sorcio
Copy link
Contributor

sorcio commented Oct 5, 2023

Bug report

Bug description:

select.kqueue is not aware of forks, and will try to use its file descriptor number after fork. Since kqueues are not inherited1 in children, the fd will be invalid, or it can refer to a file opened by someone else.

Reproducer for a case where select.kqueue will close the fd opened with open() on its destructor:

import os
import select

def repro():
    kq = select.kqueue()
    pid = os.fork()
    if pid == 0:
        f = open("/dev/null", "wb")
        print(f"{f.fileno()=} {kq.fileno()=}")
        del kq
        f.write(b"x")
        f.close()

repro()
Reproducer with asyncio

import asyncio
import gc
import os

def asyncio_repro():
    loop = asyncio.new_event_loop()
    loop.run_until_complete(asyncio.sleep(0))
    if os.fork() == 0:
        del loop
        with open("/dev/null", "wb") as f:
            gc.collect()
            f.write(b"x")

asyncio_repro()

This will fail with OSError: [Errno 9] Bad file descriptor when operating on f, because its fd was coincidentally closed by the loop destructor. Dropping the reference after fork does not help; it actually makes the problem worse, because the loop becomes cyclic garbage and the kqueue can be closed at a later, less predictable time.

In the asyncio example I need a bit of setup: the first loop object needs to be open at fork time2. The bug will be observable if, in the child, the loop's kqueue is closed/destructed after a different fd is opened.

I encountered this because I got test_asyncio.test_unix_events.TestFork failures while working on something unrelated (and it's been a pain to debug), not in production code. I guess it can still happen in real-world code though, because there is no proper way to dispose of a select.kqueue object in a forked process, and it's hard to debug a random EBADF that triggers only on Mac/BSD in otherwise-correct code.

I'm willing to work on a fix, if one is desired, but I'll need some guidance on the strategy. I thought select.kqueue objects can be invalidated after fork, but that would add some (small) tracking overhead so I don't know if it's acceptable.

CPython versions tested on:

3.11, 3.12, CPython main branch

Operating systems tested on:

macOS, Other

Linked PRs

Footnotes

  1. the queue itself is not available in children, and the OS will automatically close any fd referring to a kqueue after fork.

  2. this can also happen if event loop policy holds a reference to a loop, and later is dropped (maybe to create a new loop in the child). Or a Runner context is still active at fork time and is exited in the child.

@sorcio sorcio added the type-bug An unexpected behavior, bug, or error label Oct 5, 2023
@AlexWaygood AlexWaygood added the stdlib Python modules in the Lib dir label Oct 5, 2023
@Nau56
Copy link

Nau56 commented Oct 5, 2023

import os
import select

def safe_kqueue():
kq = select.kqueue()
pid = os.fork()
if pid == 0: # This is the child process
try:
kq.close() # Close the kqueue explicitly in the child
except Exception as e:
print(f"Error in child process: {e}")
os._exit(1) # Terminate the child process if there's an error
# Now you can do other work in the child process
os._exit(0) # Exit the child process gracefully

# This is the parent process
# You can continue using kq here in the parent process
return kq

kq = safe_kqueue()

Continue using kq in the parent process

@sorcio
Copy link
Contributor Author

sorcio commented Oct 5, 2023

@Nau56 unfortunately, calling kq.close() in the child process is not safe, because the kqueue object has an invalid fd (which is the issue reported here). In the best case it will raise EBADF; in the worst case, it will succeed, but it will have closed the wrong fd. If run as the very first operation it could be risky-but-okay, but you still need some way to know which kqueues to close, and ensure the cleanup is run first.

@Nau56

This comment was marked as spam.

@sorcio
Copy link
Contributor Author

sorcio commented Oct 8, 2023

Gave it a try in #110517.

Personally I find it a bit unsavory to call posix.register_at_fork() but I was not able to find a better solution myself.

@vstinner
Copy link
Member

Since kqueues are not inherited in children, the fd will be invalid, or it can refer to a file opened by someone else.

Oh, that's bad. But should Python really attempt to be nice to every crazy use cases? Can't users take care of that themselves?

@gpshead: Should we be extra nice to users executing non-trivial code after fork()? What's the trend in Linux, do you we also do that?

@gpshead
Copy link
Member

gpshead commented Oct 31, 2023

I think this is something that can be fixed and the proposed PR makes sense. There are still people using non-threaded forking-server-stack style applications. This would apply to those, on platforms with kqueue (FreeBSD, macOS?, etc).

@gpshead
Copy link
Member

gpshead commented Oct 31, 2023

As for calling the Python register_at_fork function, while my initial thought was why not just use pthread_atfork directly... using the Python function makes more sense: It gives you the module state and it only happens when post-fork process will be executing further Python code.

@gpshead gpshead added 3.13 new features, bugs and security fixes 3.12 bugs and security fixes labels Oct 31, 2023
@sorcio
Copy link
Contributor Author

sorcio commented Nov 1, 2023

But should Python really attempt to be nice to every crazy use cases? Can't users take care of that themselves?

If you have a process with an open kqueue and it forks-without-exec you have no way to take care of that in user code, though. That was my rationale for the fix. Whether it is a "crazy" use case I can't comment. But forking-without-exec is still supported.

my initial thought was why not just use pthread_atfork directly... using the Python function makes more sense: It gives you [...]

Yeah, I went through the same thought process. It helped that there is no precedent for using pthread_atfork in the current code. The alternative is to somehow add this to the interpreter's after-fork handlers, but it seemed too invasive. I mentioned other options in the PR.

gpshead pushed a commit that referenced this issue Nov 4, 2023
Invalidate open select.kqueue instances after fork as the fd will be invalid in the child.
gpshead pushed a commit to gpshead/cpython that referenced this issue Nov 4, 2023
…110517)

Invalidate open select.kqueue instances after fork as the fd will be invalid in the child.
(cherry picked from commit a6c1c04)

Co-authored-by: Davide Rizzo <sorcio@gmail.com>
gpshead pushed a commit to gpshead/cpython that referenced this issue Nov 4, 2023
…110517)

Invalidate open select.kqueue instances after fork as the fd will be invalid in the child.
(cherry picked from commit a6c1c04)

Co-authored-by: Davide Rizzo <sorcio@gmail.com>
gpshead added a commit to gpshead/cpython that referenced this issue Nov 7, 2023
based on review from Victor Stinner.
gpshead added a commit that referenced this issue Nov 7, 2023
based on review from Victor Stinner.  I already made this edit in the 3.12 backport PR.
hugovk pushed a commit to hugovk/cpython that referenced this issue Nov 8, 2023
based on review from Victor Stinner.  I already made this edit in the 3.12 backport PR.
gpshead added a commit that referenced this issue Nov 11, 2023
…1745)

* [3.12] gh-110395: invalidate open kqueues after fork (GH-110517)

Invalidate open select.kqueue instances after fork as the fd will be invalid in the child.
(cherry picked from commit a6c1c04)

Co-authored-by: Davide Rizzo <sorcio@gmail.com>

* move assert to after the child dying

this is in `main` via https://github.com/python/cpython/pull/111816/files
@gpshead
Copy link
Member

gpshead commented Nov 11, 2023

thanks!

@gpshead gpshead closed this as completed Nov 11, 2023
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
Invalidate open select.kqueue instances after fork as the fd will be invalid in the child.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
based on review from Victor Stinner.  I already made this edit in the 3.12 backport PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes 3.13 new features, bugs and security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants