segfault in a docker container on host with ipv6 disabled when using getaddrinfo from socket.py #100795
Comments
FWIW (as further discussed on IRC), even disabling it in kernel by start doesn't avoid |
Since fail2ban now works around that, and in order to provide a pure-Python test case without external references:
python3 -c "`printf "import socket\nfor n in ['localhost', socket.gethostname()]:\n\ttry:\n\t\tprint(n, socket.getaddrinfo(n, None, socket.AF_INET6, 0, socket.IPPROTO_TCP))\n\texcept:\n\t\tprint(n, 'NA')"`"
# or:
python3 -c "`printf "import socket, sys\nfor n in sys.argv[1].split(' '):\n\ttry:\n\t\tprint(n, socket.getaddrinfo(n, None, socket.AF_INET6, 0, socket.IPPROTO_TCP))\n\texcept:\n\t\tprint(n, 'NA')"`" "localhost `hostname`"
import socket
for n in ['localhost', socket.gethostname()]:
    try:
        print(n, socket.getaddrinfo(n, None, socket.AF_INET6, 0, socket.IPPROTO_TCP))
    except:
        print(n, 'NA')
(Not tested with IPv6 disabled, but it does almost the same as the initial example.)
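A slightly more explicit variant of the test case above, as a minimal sketch: it catches only socket.gaierror and keeps the error detail instead of printing 'NA'. The helper name probe_ipv6 is hypothetical and not part of fail2ban or CPython.

```python
import socket


def probe_ipv6(names):
    """Try an IPv6-only lookup for each name and record the outcome."""
    results = []
    for name in names:
        try:
            infos = socket.getaddrinfo(
                name, None, socket.AF_INET6, 0, socket.IPPROTO_TCP
            )
            results.append((name, "ok", infos))
        except socket.gaierror as exc:
            # A clean resolver error is the expected non-crashing outcome
            # when no IPv6 address is available for the name.
            results.append((name, "gaierror", exc))
    return results
```

Calling probe_ipv6(['localhost', socket.gethostname()]) mirrors the one-liners above. A segfault would still kill the interpreter before anything is returned, which is exactly what the test is meant to reveal.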
I am unable to reproduce this on Ubuntu 22.04, Linux kernel 5.15, x86_64:
If you have an environment where you can reproduce this, compile your own Python using configure --with-pydebug, reproduce the crash under that build, and provide the stack trace from the simplified code. Note that the stack trace from your fail2ban issue suggests the crash happens within the glibc library itself (CentOS 9 presumably uses glibc 2.34, as does RHEL 9).
The crash is listed as coming from https://github.com/bminor/glibc/blob/glibc-2.34/sysdeps/posix/getaddrinfo.c#L933 which does not make sense code-wise (though debug information for an optimized library has questionable accuracy). Regardless, that is squarely within the guts of glibc, supposedly accessing memory that it allocated itself earlier and has already successfully accessed. This suggests you've got something else going on with your system. Unless someone else can reproduce this on a system other than yours, I don't think there is a Python issue. You can see what Python's socket.getaddrinfo() code does here: https://github.com/python/cpython/blob/3.9/Modules/socketmodule.c#L6501
Although I agree about After failure it'd go to err in line 6583: cpython/Modules/socketmodule.c Lines 6578 to 6584 in 5ef90ee
After that, res0 could be freed in line 6617, which is unexpected: you can't free it after a failed getaddrinfo call, even if it happens to become non-NULL for some reason inside the call: cpython/Modules/socketmodule.c Lines 6613 to 6618 in 5ef90ee
Because there are other error exits where res0 should be freed, the fix may look like this (it would implicitly avoid freeaddrinfo after a failed call):
  error = getaddrinfo(hptr, pptr, &hints, &res0);
  ...
  if (error) {
+     res0 = NULL;
      set_gaierror(error);
      goto err;
  }
There are other places with a similar condition, so I'll provide a pull request fixing all of those cases.
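The rule this fix is aiming at (never hand freeaddrinfo a pointer that came back from a failed getaddrinfo call) can be sketched with ctypes. This is only an illustration under assumptions, not the CPython patch: it assumes a POSIX system where ctypes.CDLL(None) exposes libc's getaddrinfo and freeaddrinfo, and the helper name lookup_status is hypothetical.

```python
import ctypes

# Assumption: POSIX platform where the process's own symbols include libc.
libc = ctypes.CDLL(None)
libc.getaddrinfo.argtypes = [
    ctypes.c_char_p,                  # node
    ctypes.c_char_p,                  # service
    ctypes.c_void_p,                  # hints (NULL in this sketch)
    ctypes.POINTER(ctypes.c_void_p),  # res, the output pointer
]
libc.getaddrinfo.restype = ctypes.c_int
libc.freeaddrinfo.argtypes = [ctypes.c_void_p]
libc.freeaddrinfo.restype = None


def lookup_status(node, service=None):
    """Call getaddrinfo and release the result only if the call succeeded."""
    res = ctypes.c_void_p()  # starts out as NULL
    error = libc.getaddrinfo(node, service, None, ctypes.byref(res))
    if error != 0:
        # Failure: treat res as indeterminate and never pass it on
        # to freeaddrinfo.
        return error
    # Success: the list now belongs to the caller, so free it exactly once.
    libc.freeaddrinfo(res)
    return 0
```

Passing neither a node nor a service, as in lookup_status(None), makes the call fail without any network lookup, which exercises the error branch.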
fixes segfault pythongh-100795 - avoid unexpected `freeaddrinfo` if `res` becomes not NULL during invocation of `getaddrinfo` if it fails
Please ping me if you need more tests. |
Well, the fix in #101010 is pretty simple. It would be nice if you could check it out (or apply the patch), build Python from source, and test it, so one could confirm the fix works (though I'm sure it does).
Since the patch branch is now rebased onto main, I saved the original patch in the fix-gh-100795--3.9-based branch in case someone (@ptempier) wants to test it against 3.9 or use it for a later backport.
It would be very unfortunate if any
Q: What does this?
glibc in particular looks like it should never be causing this problem: the output pointer parameter is only assigned to on a success return value. https://github.com/bminor/glibc/blob/glibc-2.34/sysdeps/posix/getaddrinfo.c#L2470 So it'll never overwrite the NULL and trigger logic leading to an errant freeaddrinfo().
This exact behavior and expectation is underspecified: https://pubs.opengroup.org/onlinepubs/009619199/getad.htm omits direct info about the output pointer parameter on error via "Upon successful return of getaddrinfo(), the location to which res points refers to a linked list of ..." without saying if the output parameter may have been modified and still need freeing upon unsuccessful error return.
Examining other language VM projects that use libc
So I'm not convinced the PR is actually good, it seems just as likely to lead to a memory leak on all
…#101220) Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
…rinfo` (pythonGH-101220) (cherry picked from commit 5f08fe4) Co-authored-by: Sergey G. Brester <github@sebres.de> Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
… `getaddrinfo` (python#101220)" This reverts commit 5f08fe4.
looking over socketmodule.c, our code is inconsistent with all of the uses of
@ptempier if you've got a simple way for any of us to reproduce this (not running fail2ban), that'd be helpful. Via code inspection I don't think it should matter one way or another, but explicitly overwriting a returned pointer with NULL has an antipattern code smell to it. |
No, I found only 2 (both are in the PR). Other code paths call freeaddrinfo() on many errors, but not right after the getaddrinfo() call itself (only if it was successful).
Antipattern? The out pointer may or may not contain any value after a failed getaddrinfo() call. But you definitely cannot free anything after an error, because that job needs to be done inside the invoked function.
By the way, see https://pubs.opengroup.org/onlinepubs/9699919799/functions/freeaddrinfo.html cpython/Modules/socketmodule.c Lines 1093 to 1095 in 5ef90ee
Also note https://stackoverflow.com/a/24498271 (I guess one could find more; it is simply the first one I found on the subject). Anyway, it is a common pattern in C not to call free on a function's output pointer if the function fails, unless its documentation explicitly says otherwise.
When getaddrinfo returns an error, the output pointer is in an unknown state. Don't call freeaddrinfo on it. See the issue for discussion and details with links to reasoning. _Most_ libc getaddrinfo implementations never modify the output pointer unless they are returning success. Co-authored-by: Sergey G. Brester <github@sebres.de> Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
TL;DR - okay I'm redoing your PR. Worst case it leads to a memory leak on errors on some unknown implementation (not glibc 2.34), at the cost of preventing a double free on some other unknown libc getaddrinfo implementation. I doubt it'll ever come up again, no well written getaddrinfo implementation should fill in the output pointer until it knows it is returning success. As far as I can tell, the bug leading to the crash of the original reporter is likely elsewhere in their system and this isn't the root cause.
Nothing I have found specifies this with any authority. It cannot be inferred from a statement about "upon success this pointer is valid". That statement does not logically imply "upon returning an error the pointer is invalid". An inverse statement like that cannot be assumed logically valid. The example code in the standard doesn't clarify it either, because the error paths call exit, which is a situation in which people normally don't bother to free anything, so I won't draw any conclusion from that; it's another omission of direct information. Stack Overflow, while useful, isn't authoritative information. I agree a common nice pattern is to not leave cleanup to the caller when you return an error, but C leaves everything up to code authors, so it must be explicitly specified one way or another. It wasn't, so we're left making assumptions one way or the other and looking within implementations.
In this case one'd already notice such leaks (considering all code pieces doing
Sure, but isn't that common practice in C libraries? E.g. the general responsibility of a library implies that.
Partially agree, but I'd still expect that if it differs from the common pattern, it must be explicitly documented that way in the library specification (which we don't see).
When getaddrinfo returns an error, the output pointer is in an unknown state. Don't call freeaddrinfo on it. See the issue for discussion and details with links to reasoning. _Most_ libc getaddrinfo implementations never modify the output pointer unless they are returning success. (cherry picked from commit b724ac2) Co-authored-by: Gregory P. Smith <greg@krypto.org> Co-authored-by: Sergey G. Brester <github@sebres.de> Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
agreed, a potential for an unlikely leak is better than an unlikely heisenbug. |
Bug report
I did open an issue with fail2ban, but apparently the issue is in python itself.
fail2ban/fail2ban#3438
fail2ban-python -c 'from fail2ban.server.ipdns import DNSUtils; print(DNSUtils.dnsToIp("fail2ban_01"))'
fail2ban_01 is the container name and its hostname
I am sorry, I am not a Python dev, so at this moment the best I can provide as a minimal test is the backtrace.
It will show exactly which function was called, with which parameter, and which file the function was taken from.
Your environment
OS of the vm centos stream 9
OS of the container centos stream 9
ipv6 disabled with net.ipv6.conf.all.disable_ipv6 = 1
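For anyone trying to match this environment, a minimal sketch of checking the setting from Python. It assumes Linux, where the sysctl key net.ipv6.conf.all.disable_ipv6 is exposed as a file under /proc/sys; the helper name ipv6_disabled is hypothetical.

```python
from pathlib import Path

# Assumption: Linux, where sysctl keys map to files under /proc/sys.
SYSCTL_PATH = Path("/proc/sys/net/ipv6/conf/all/disable_ipv6")


def ipv6_disabled(path=SYSCTL_PATH):
    """Return True or False for the sysctl value, or None if unreadable."""
    try:
        return path.read_text().strip() == "1"
    except OSError:
        # The file is missing or unreadable, e.g. on a non-Linux system.
        return None
```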
Linked PRs
freeaddrinfo after failed getaddrinfo #101010
freeaddrinfo after failed getaddrinfo #101220
freeaddrinfo after failed getaddrinfo (GH-101220) #101236
freeaddrinfo after failed getaddrinfo (GH-101220) #101237
freeaddrinfo after failed `geta… #101238