Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

segfault in a docker container on host with ipv6 disabled when using getaddrinfo from socket.py #100795

Closed
ptempier opened this issue Jan 6, 2023 · 15 comments
Assignees
Labels
pending The issue will be closed if no feedback is provided type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@ptempier
Copy link

ptempier commented Jan 6, 2023

Bug report

I did open an issue with fail2ban, but apparently the issue is in python itself.
fail2ban/fail2ban#3438

fail2ban-python -c 'from fail2ban.server.ipdns import DNSUtils; print(DNSUtils.dnsToIp("fail2ban_01"))'

fail2ban_01 is the container name and its hostname

I am sorry , i am not a python dev, so at this moment, the best i can provide as a minimal test is the backtrace.
It will show exactly which function was called, with which paramater, from which file thr function was taken from.

Your environment

  • CPython versions tested on: python3.9-3.9.16-1.el9.x86_64
  • Operating system and architecture: x86_64 vm on a dell server
  • Container inside that vm, run with docker 20.10.22

OS of the vm centos stream 9
OS of the container centos stream 9

ipv6 disabled with net.ipv6.conf.all.disable_ipv6 = 1

Linked PRs

@ptempier ptempier added the type-bug An unexpected behavior, bug, or error label Jan 6, 2023
@AlexWaygood AlexWaygood added type-crash A hard crash of the interpreter, possibly with a core dump and removed type-bug An unexpected behavior, bug, or error labels Jan 6, 2023
@sebres
Copy link
Contributor

sebres commented Jan 9, 2023

FWIW (as further discussed on IRC), even disabling it in kernel by start doesn't avoid socket.getaddrinfo from segfaulting.

@sebres
Copy link
Contributor

sebres commented Jan 10, 2023

Since fail2ban will circumvent that now and in order to provide pure python test case without external refs:

  • bash:
python3 -c "`printf "import socket\nfor n in ['localhost', socket.gethostname()]:\n\ttry:\n\t\tprint(n, socket.getaddrinfo(n, None, socket.AF_INET6, 0, socket.IPPROTO_TCP))\n\texcept:\n\t\tprint(n, 'NA')"`"
# or:
python3 -c "`printf "import socket, sys\nfor n in sys.argv[1].split(' '):\n\ttry:\n\t\tprint(n, socket.getaddrinfo(n, None, socket.AF_INET6, 0, socket.IPPROTO_TCP))\n\texcept:\n\t\tprint(n, 'NA')"`" "localhost `hostname`"
  • python:
import socket
for n in ['localhost', socket.gethostname()]:
        try:
                print(n, socket.getaddrinfo(n, None, socket.AF_INET6, 0, socket.IPPROTO_TCP))
        except:
                print(n, 'NA')

(not tested with disabled IPv6, but doing almost the same than initial example)

@gpshead
Copy link
Member

gpshead commented Jan 13, 2023

I am unable to reproduce this on Ubuntu 22.04, Linux kernel 5.15, x86_64:

greg@bluesteel:~$ sudo sysctl net.ipv6.conf.all.disable_ipv6=1
[sudo] password for greg: 
net.ipv6.conf.all.disable_ipv6 = 1
greg@bluesteel:~$ ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: enp8s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether a8:a1:59:50:93:61 brd ff:ff:ff:ff:ff:ff
    inet 192.168.86.60/24 metric 100 brd 192.168.86.255 scope global dynamic enp8s0
       valid_lft 11999sec preferred_lft 11999sec
greg@bluesteel:~$ python3
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> for n in ['localhost', socket.gethostname()]:
...         try:
...                 print(n, socket.getaddrinfo(n, None, socket.AF_INET6, 0, socket.IPPROTO_TCP))
...         except:
...                 print(n, 'NA')
... 
localhost NA
bluesteel NA
>>> 
greg@bluesteel:~$ oss/b/python 
Python 3.12.0a3+ (heads/multiprocessing/deprecate-fork-default-dirty:fc64626d03, Dec 31 2022, 06:4) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> for n in ['localhost', socket.gethostname()]:
...         try:
...                 print(n, socket.getaddrinfo(n, None, socket.AF_INET6, 0, socket.IPPROTO_TCP))
...         except:
...                 print(n, 'NA')
... 
localhost NA
bluesteel NA
>>> 

@gpshead
Copy link
Member

gpshead commented Jan 13, 2023

If you have an environment where you can reproduce this, compile your own python using configure --with-pydebug and reproduce it under that and provide the stack trace from the simplified code. Note that the stack trace from your fail2ban issue suggests the crash happens within glibc library itself: (CentOS 9 presumably uses glibc 2.34 as does RHEL 9)

Program received signal SIGSEGV, Segmentation fault.
0x00007f29c19f24a3 in gaih_inet (name=<optimized out>, name@entry=0x7f29c1683d40 "fail2ban_01", service=service@entry=0x0, req=req@entry=0x7ffd53933320, pai=pai@entry=0x7ffd53932d88, naddrs=naddrs@entry=0x7ffd53932d84, tmpbuf=tmpbuf@entry=0x7ffd53932e80)
    at ../sysdeps/posix/getaddrinfo.c:933
933	      if (at->family == AF_UNSPEC)
(gdb) 
(gdb) backtrace
#0  0x00007f29c19f24a3 in gaih_inet (name=<optimized out>, name@entry=0x7f29c1683d40 "fail2ban_01", service=service@entry=0x0, req=req@entry=0x7ffd53933320, pai=pai@entry=0x7ffd53932d88, naddrs=naddrs@entry=0x7ffd53932d84, tmpbuf=tmpbuf@entry=0x7ffd53932e80)
    at ../sysdeps/posix/getaddrinfo.c:933
#1  0x00007f29c19f29c9 in __GI_getaddrinfo (name=<optimized out>, name@entry=0x7f29c1683d40 "fail2ban_01", service=<optimized out>, service@entry=0x0, hints=hints@entry=0x7ffd53933320, pai=pai@entry=0x7ffd53933318) at ../sysdeps/posix/getaddrinfo.c:2240
#2  0x00007f29c148ceab in socket_getaddrinfo (self=<optimized out>, args=<optimized out>, kwargs=<optimized out>) at /usr/src/debug/python3.9-3.9.16-1.el9.x86_64/Modules/socketmodule.c:6578
...

the crash is listed as coming from https://github.com/bminor/glibc/blob/glibc-2.34/sysdeps/posix/getaddrinfo.c#L933 which does not make sense code wise... (but debug information on optimized library will have questionable accuracy). regardless that is squarely within the guts of glibc supposedly accessing memory that it allocated itself earlier and has already successfully accessed.

This suggests you've got something else going on with your system.

Unless someone else can reproduce this on a system other than yours, I don't think there is a Python issue.

You can see what Python's socket.getaddrinfo() code does here: https://github.com/python/cpython/blob/3.9/Modules/socketmodule.c#L6501

@gpshead gpshead added the pending The issue will be closed if no feedback is provided label Jan 13, 2023
@sebres
Copy link
Contributor

sebres commented Jan 13, 2023

Although I agree about gaih_inet and getaddrinfo (I also don't see something wrong there), I see it in python's socket.getaddrinfo() probably.
Indirectly the issue may be an aftereffect of inappropriate memory free (e. g. double freeing), which may happen previously.
And it'd be hard reproducible for example due to threaded/timing issue.

After failure it'd go to err in line 6583:

cpython/Modules/socketmodule.c

Lines 6578 to 6584 in 5ef90ee

error = getaddrinfo(hptr, pptr, &hints, &res0);
Py_END_ALLOW_THREADS
RELEASE_GETADDRINFO_LOCK /* see comment in setipaddr() */
if (error) {
set_gaierror(error);
goto err;
}

And hereafter it could cause free of res0 in line 6617, which is unexpected, because you can't free it after failed invocation of getaddrinfo, also if occasionally it becomes not NULL for some reason inside the call:

cpython/Modules/socketmodule.c

Lines 6613 to 6618 in 5ef90ee

err:
Py_XDECREF(all);
Py_XDECREF(idna);
if (res0)
freeaddrinfo(res0);
return (PyObject *)NULL;

Because there are other exits on error code pieces where res0 should be freed, the fix may look like this (that'd avoid freeaddrinfo implicitely after failed getaddrinfo only):

     error = getaddrinfo(hptr, pptr, &hints, &res0);
...
     if (error) {
+        res0 = NULL;
         set_gaierror(error);
         goto err;
     }

There are yet another places with similar condition, so I'll provide a pull request fixing all that cases.

sebres added a commit to sebres/cpython that referenced this issue Jan 13, 2023
fixes segfault pythongh-100795 - avoid unexpected `freeaddrinfo` if `res` becomes not NULL during invocation of `getaddrinfo` if it fails
@ptempier
Copy link
Author

Please ping me if you need more tests.

@sebres
Copy link
Contributor

sebres commented Jan 16, 2023

Please ping me if you need more tests.

Well, the fix in #101010 is pretty simple, it'd be nice if you could checkout it (or apply the patch) and build python from source and test it (so one could confirm the fix works, but I'm sure it does).

@sebres
Copy link
Contributor

sebres commented Jan 21, 2023

Since patch branch is rebased to main now, if someone (@ptempier) wanted to test it against 3.9 or for later backport, I saved original patch in fix-gh-100795--3.9-based branch.

arhadthedev pushed a commit to arhadthedev/cpython that referenced this issue Jan 21, 2023
fixes segfault pythongh-100795 - avoid unexpected `freeaddrinfo` if `res` becomes not NULL during invocation of `getaddrinfo` if it fails
@gpshead
Copy link
Member

gpshead commented Jan 22, 2023

It would be very unfortunate if any getaddrinfo implementation modifies the res output parameter before knowing if they've encountered and error and internally cleaning up intermediates and freeing things before an error return without also setting their modified output back to NULL.

Q: What does this?

glibc in particular looks like it should never be causing this problem, the output pointer parameter is only assigned to on a success return value. https://github.com/bminor/glibc/blob/glibc-2.34/sysdeps/posix/getaddrinfo.c#L2470

So it'll never overwrite the NULL and trigger logic leading to an errant freeaddrinfo()

This exact behavior and expectation is underspecified - https://pubs.opengroup.org/onlinepubs/009619199/getad.htm omits direct info about the output pointer parameter on error via "Upon successful return of getaddrinfo(), the location to which res points refers to a linked list of ..." without saying if the output parameter may have been modified and still need freeing upon unsuccessful error return.

Examining other language VM projects that use libc getaddrinfo, I see some that do the same as the proposed PR: never touching the output pointer or calling freeaddrinfo if an error was returned. But others do call freeaddrinfo when res is not NULL regardless of error such as https://cs.github.com/openjdk/jdk11u/blob/50b3039415616831fdd96c36df0a42f00fea60ea/src/java.base/unix/native/libnet/Inet6AddressImpl.c#L248. That code did not used to, it was changed to do it in 2017 in openjdk/jdk11u@c8776e9.

So I'm not convinced the PR is actually good, it seems just as likely to lead to a memory leak on all getaddrinfo error returns. But well behaved implementations will work either way because they don't populate the pointer on error.

kumaraditya303 pushed a commit that referenced this issue Jan 22, 2023
…#101220)

Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jan 22, 2023
…rinfo` (pythonGH-101220)

(cherry picked from commit 5f08fe4)

Co-authored-by: Sergey G. Brester <github@sebres.de>
Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jan 22, 2023
…rinfo` (pythonGH-101220)

(cherry picked from commit 5f08fe4)

Co-authored-by: Sergey G. Brester <github@sebres.de>
Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
kumaraditya303 added a commit to kumaraditya303/cpython that referenced this issue Jan 22, 2023
@kumaraditya303 kumaraditya303 removed the pending The issue will be closed if no feedback is provided label Jan 22, 2023
@gpshead
Copy link
Member

gpshead commented Jan 22, 2023

looking over socketmodule.c, our code is inconsistent with all of the uses of getaddrinfo(), many of them do not check res and call freeaddrinfo() on error. The two methods the PR tried to change do. But I'm not convinced that is a problem.

Please ping me if you need more tests.

@ptempier if you've got a simple way for any of us to reproduce this (not running fail2ban), that'd be helpful. Via code inspection I don't think it should matter one way or another, but explicitly overwriting a returned pointer with NULL has an antipattern code smell to it.

@gpshead gpshead added the pending The issue will be closed if no feedback is provided label Jan 22, 2023
kumaraditya303 added a commit that referenced this issue Jan 22, 2023
…ddrinfo` (#101220)" (#101238)

Revert "gh-100795: avoid unexpected `freeaddrinfo` after failed `getaddrinfo` (#101220)"

This reverts commit 5f08fe4.
@sebres
Copy link
Contributor

sebres commented Jan 22, 2023

uses of getaddrinfo(), many of them do not check res and call freeaddrinfo() on error.

No, I found only 2 (both are in PR). Another code pieces calling freeaddrinfo() on many errors, but not after call of getaddrinfo() (only if it was successful).

explicitly overwriting a returned pointer with NULL has an antipattern code smell to it.

Antipattern? The out pointer may or not may contain any value after failed attempt of getaddrinfo(). But you definitely cannot free something (after error), because this is the job needs to be done inside invoked function.
This is default pattern in C (no matter python related or not)… more or less the matter of responsibility.

@sebres
Copy link
Contributor

sebres commented Jan 22, 2023

By the way, see https://pubs.opengroup.org/onlinepubs/9699919799/functions/freeaddrinfo.html
Also the example in specification shows clearly and explicitly that freeaddrinfo() being executed only if getaddrinfo() succeeds.
In the same way many code pieces in python library do return -1 if getaddrinfo() fails, excepting 2 cases from PR. And one of them is even mentioned in the PR:

cpython/Modules/socketmodule.c

Lines 1093 to 1095 in 5ef90ee

if (error) {
set_gaierror(error);
return -1;

Also note https://stackoverflow.com/a/24498271 (and I guess one'd find more, just it simply first I found about the subject):
you are calling freeaddrinfo() even if getaddrinfo() fails. Don't do that.

Anyway this is a common pattern in C not to call free for an output pointer from a function if it fails, unless it is not explicitly specified in its documentation.

gpshead added a commit to gpshead/cpython that referenced this issue Jan 23, 2023
When getaddrinfo returns an error, the output pointer is in an unknown state
Don't call freeaddrinfo on it.  See the issue for discussion and details with
links to reasoning.  _Most_ libc getaddrinfo implementations never modify the
output pointer unless they are returning success.

Co-authored-by: Sergey G. Brester <github@sebres.de>
Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
@gpshead
Copy link
Member

gpshead commented Jan 23, 2023

TL;DR - okay I'm redoing your PR.

Worst case it leads to a memory leak on errors on some unknown implementation (not glibc 2.34), at the cost of preventing a double free on some other unknown libc getaddrinfo implementation. I doubt it'll ever come up again, no well written getaddrinfo implementation should fill in the output pointer until it knows it is returning success.

As far as I can tell, the bug leading to the crash of the original reporter is likely elsewhere in their system and this isn't the root cause.


The out pointer may or not may contain any value after failed attempt of getaddrinfo().

Nothing I have found specifies this with any authority. It cannot be inferred from a statement about "upon success this pointer is valid". That statement does not logically imply "upon returning an error the pointer is invalid". A contrapositive statement like that cannot be assumed logically valid.

The example code in the standard doesn't clarify it either because the error paths are calling exit which is situation people normally don't bother to free anything in so I won't draw any conclusion from that, it's another omission of direct information. Stack overflow, while useful, isn't authoritative information.

I agree a common nice pattern is to not leave cleanup to the caller on when you return an error, but C leaves everything up to code authors so it must be explicitly specified one way or another. It wasn't, so we're left making assumptions one way or the other and looking within implementations.

@gpshead gpshead self-assigned this Jan 23, 2023
@sebres
Copy link
Contributor

sebres commented Jan 23, 2023

Worst case it leads to a memory leak on errors on some unknown implementation

In this case one'd already notice such leaks (considering all code pieces doing return after failed invocation).
And having an appropriate tests (todo), finding of leaks is basically a lot easier (e. g. mit purify, valgrind, sanitizer, mem-debug build) than an heisenbug like that.
Let alone the leak (which is rare due to error case) is several times better than a segfault followed by crash.

A contrapositive statement like that cannot be assumed logically valid.

Sure, but is not that a common practice in C libraries? E. g. general responsibility of libraries implies that.

It wasn't, so we're left making assumptions one way or the other and looking within implementations.

Partially agree, just I'd still expect if it is different to common pattern, it must be explicitly documented this way in library specification (what we don't see).

gpshead added a commit that referenced this issue Jan 23, 2023
When getaddrinfo returns an error, the output pointer is in an unknown state
Don't call freeaddrinfo on it.  See the issue for discussion and details with
links to reasoning.  _Most_ libc getaddrinfo implementations never modify the
output pointer unless they are returning success.

Co-authored-by: Sergey G. Brester <github@sebres.de>
Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jan 23, 2023
When getaddrinfo returns an error, the output pointer is in an unknown state
Don't call freeaddrinfo on it.  See the issue for discussion and details with
links to reasoning.  _Most_ libc getaddrinfo implementations never modify the
output pointer unless they are returning success.

(cherry picked from commit b724ac2)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Sergey G. Brester <github@sebres.de>
Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jan 23, 2023
When getaddrinfo returns an error, the output pointer is in an unknown state
Don't call freeaddrinfo on it.  See the issue for discussion and details with
links to reasoning.  _Most_ libc getaddrinfo implementations never modify the
output pointer unless they are returning success.

(cherry picked from commit b724ac2)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Sergey G. Brester <github@sebres.de>
Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
@gpshead
Copy link
Member

gpshead commented Jan 23, 2023

agreed, a potential for an unlikely leak is better than an unlikely heisenbug.

@gpshead gpshead closed this as completed Jan 23, 2023
miss-islington added a commit that referenced this issue Jan 23, 2023
When getaddrinfo returns an error, the output pointer is in an unknown state
Don't call freeaddrinfo on it.  See the issue for discussion and details with
links to reasoning.  _Most_ libc getaddrinfo implementations never modify the
output pointer unless they are returning success.

(cherry picked from commit b724ac2)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Sergey G. Brester <github@sebres.de>
Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
miss-islington added a commit that referenced this issue Jan 23, 2023
When getaddrinfo returns an error, the output pointer is in an unknown state
Don't call freeaddrinfo on it.  See the issue for discussion and details with
links to reasoning.  _Most_ libc getaddrinfo implementations never modify the
output pointer unless they are returning success.

(cherry picked from commit b724ac2)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Sergey G. Brester <github@sebres.de>
Co-authored-by: Oleg Iarygin <dralife@yandex.ru>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pending The issue will be closed if no feedback is provided type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

No branches or pull requests

5 participants