socket.getfqdn() UnicodeDecodeError depending on LANG variable #93251

Open
cpina opened this issue May 26, 2022 · 2 comments
Labels: 3.9 (only security fixes), 3.10 (only security fixes), topic-unicode, type-bug (An unexpected behavior, bug, or error)


cpina commented May 26, 2022

Bug report

This code:

import locale
import socket

locale.setlocale(locale.LC_ALL, '')

socket.getfqdn()

raises an exception when run like this:

LANG=ru_RU.CP1251 /opt/Python-3.9.2/bin/python3 bug.py

Note the LANG value. I haven't checked which other LANG values make this work or fail.

⚠️ To exercise the problematic code path (see the comments for details), the hostname must not be resolvable: not listed in /etc/hosts, and not resolvable via DNS or any other method configured by the hosts setting in /etc/nsswitch.conf. To reproduce the problem on Linux, the hostname can be changed via sudo hostname something-that-does-not-exist.

Traceback (most recent call last):
  File "/root/t/prova.py", line 7, in <module>
    socket.getfqdn()
  File "/opt/Python-3.9.2/lib/python3.9/socket.py", line 791, in getfqdn
    hostname, aliases, ipaddrs = gethostbyaddr(name)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcd in position 0: invalid continuation byte
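
Until the decoding problem itself is addressed, a caller could guard the call. This is a minimal sketch, assuming that falling back to the unresolved name is acceptable (which is what socket.getfqdn() already does when the lookup fails with an OSError). safe_getfqdn is a hypothetical helper name, not part of the socket module:

```python
import socket


def safe_getfqdn(name=""):
    """Hypothetical wrapper around socket.getfqdn() that tolerates the
    UnicodeDecodeError shown in the traceback above."""
    try:
        return socket.getfqdn(name)
    except UnicodeDecodeError:
        # Assumption: returning the unresolved name is good enough, mirroring
        # what getfqdn() does when the lookup raises OSError.
        name = name.strip()
        if not name or name == "0.0.0.0":
            name = socket.gethostname()
        return name


if __name__ == "__main__":
    print(safe_getfqdn())
```

This only hides the symptom in calling code; the localized error message is still lost.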

Your environment

Tested this on Debian 11 (bullseye) with the following Python interpreters:

  • Packaged Python 3.9.2
  • Compiled from source Python 3.9.2
  • Compiled from source Python 3.9.13
  • Compiled from source Python 3.10.4

I've encountered this bug in two independent Debian installations (with different locale settings) and in a CI system (also Debian based but unrelated settings).

Only tested on x64 systems.

@cpina cpina added the type-bug An unexpected behavior, bug, or error label May 26, 2022

cpina commented May 26, 2022

In case it helps, the error is raised right after execution hits this line:

errmsg = "invalid continuation byte";

That line is in the function unicode_decode_utf8 in Objects/unicodeobject.c.

Backtrace:

#0  unicode_decode_utf8 (s=0x555555a2e8e0 "����������� ��� ��� ������", size=26, error_handler=_Py_ERROR_UNKNOWN, errors=0x0, consumed=0x0)
    at Objects/unicodeobject.c:5069
#1  0x00005555556348c4 in PyUnicode_DecodeUTF8Stateful (s=0x555555a2e8e0 "����������� ��� ��� ������", size=26, errors=0x0, consumed=0x0)
    at Objects/unicodeobject.c:5141
#2  0x0000555555629dae in PyUnicode_FromStringAndSize (u=0x555555a2e8e0 "����������� ��� ��� ������", size=26) at Objects/unicodeobject.c:2267
#3  0x00005555556a0064 in do_mkvalue (p_format=0x7fffffff73b8, p_va=0x7fffffff73a0, flags=1) at Python/modsupport.c:423
#4  0x000055555569f5cd in do_mktuple (p_format=0x7fffffff73b8, p_va=0x7fffffff73a0, endchar=41 ')', n=2, flags=1) at Python/modsupport.c:264
#5  0x000055555569f737 in do_mkvalue (p_format=0x7fffffff73b8, p_va=0x7fffffff73a0, flags=1) at Python/modsupport.c:289
#6  0x00005555556a06ac in va_build_value (format=0x7ffff79bf942 "(is)", va=0x7fffffff73f0, flags=1) at Python/modsupport.c:562
#7  0x00005555556a05b0 in _Py_BuildValue_SizeT (format=0x7ffff79bf942 "(is)") at Python/modsupport.c:530
#8  0x00007ffff79b3a91 in set_gaierror (error=-2) at /root/python/Python-3.9.2/Modules/socketmodule.c:680
#9  0x00007ffff79b43b2 in setipaddr (name=0x7ffff7b6bb90 "reprotest-capture-hostname", addr_ret=0x7fffffffb600, addr_ret_size=128, af=0)
    at /root/python/Python-3.9.2/Modules/socketmodule.c:1211
#10 0x00007ffff79bada7 in socket_gethostbyaddr (self=0x7ffff79de220, args=0x7ffff7b64940) at /root/python/Python-3.9.2/Modules/socketmodule.c:5822

Ignore the line numbers: I had added some debug output to some of the files.

I wonder (though I cannot reproduce it outside Python) whether the handling of the result in set_gaierror is what causes the error, depending on the locale settings.


cpina commented May 26, 2022

If it helps, gai_strerror is called (in set_gaierror) and might return a localised error message:

root@reprotest-capture-hostname:~/t# cat bug.py 

import locale
import socket

locale.setlocale(locale.LC_ALL, '')

print('test')

socket.getfqdn()
root@reprotest-capture-hostname:~/t# ./a.out 
test
gai_strerror: Name or service not known
root@reprotest-capture-hostname:~/t# LANG=ru_RU.CP1251 ./a.out 
test
gai_strerror: ����������� ��� ��� ������
root@reprotest-capture-hostname:~/t# 
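
The source of a.out is not shown above. A rough Python equivalent can be sketched with ctypes, assuming a platform whose C library exports gai_strerror and can be loaded with ctypes.CDLL(None) (for example glibc on Linux). localized_gai_message is a hypothetical helper name:

```python
import ctypes
import locale
import socket


def localized_gai_message(error=socket.EAI_NONAME):
    """Return gai_strerror(error) as raw bytes plus a best-effort decoding.

    Assumption: the C library is reachable via ctypes.CDLL(None) and
    exports gai_strerror, as glibc does on Linux.
    """
    libc = ctypes.CDLL(None)
    libc.gai_strerror.argtypes = [ctypes.c_int]
    libc.gai_strerror.restype = ctypes.c_char_p
    raw = libc.gai_strerror(error)
    # Decode with the locale's encoding instead of assuming UTF-8;
    # undecodable bytes are replaced rather than raising.
    encoding = locale.getpreferredencoding(False)
    return raw, raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    locale.setlocale(locale.LC_ALL, "")
    raw, text = localized_gai_message()
    print("gai_strerror (raw bytes):", raw)
    print("gai_strerror (decoded):  ", text)
```

Run with LANG=ru_RU.CP1251, the raw bytes would be expected to be in CP1251 rather than UTF-8, provided that locale is installed.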

In set_gaierror there is:

    v = Py_BuildValue("(is)", error, gai_strerror(error));

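With the Russian locale (and, I suspect, other locales), it seems that Py_BuildValue cannot create the PyUnicode object: the "s" format decodes the C string as UTF-8 (through PyUnicode_FromStringAndSize, as in the backtrace above), but gai_strerror returns the message in the locale's encoding, so the decoding fails with the error in the original post.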
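
The decoding failure can be illustrated in pure Python. This is a sketch under one assumption: the message text below is a guess at the localized string, chosen because it is 26 bytes long in CP1251 and starts with byte 0xcd, which matches the backtrace and the traceback. The actual glibc translation was not confirmed in this thread. decode_error_text is a hypothetical helper:

```python
# Assumption: this is the Russian text that gai_strerror returned; it is
# inferred from size=26 and the leading byte 0xcd, not taken from glibc.
message = "Неизвестное имя или служба"
raw = message.encode("cp1251")


def decode_error_text(data):
    """Return the UnicodeDecodeError text from decoding data as UTF-8,
    or None if the bytes happen to be valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


utf8_error = decode_error_text(raw)
locale_text = raw.decode("cp1251")

print(len(raw), hex(raw[0]))
print(utf8_error)
print(locale_text)
```

Decoding the same bytes with the locale's encoding succeeds, which suggests the message would need to be decoded with the locale encoding rather than UTF-8.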

Hopefully this helps to find the error.

@AA-Turner AA-Turner added topic-unicode 3.10 only security fixes 3.9 only security fixes labels May 26, 2022