
python3: UnicodeDecodeError on calling socket.gethostbyaddr("hostname") #1311

Closed
FlorianFieber opened this issue May 27, 2015 · 18 comments

Comments

@FlorianFieber
Contributor

I've stumbled over a UnicodeDecodeError which traces back to socket.getfqdn() calling socket.gethostbyaddr("hostname").

Python3:

spam@egg:~$ python3
Python 3.4.3 (default, May 26 2015, 12:25:24) 
[GCC 4.8.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.getfqdn()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/socket.py", line 463, in getfqdn
    hostname, aliases, ipaddrs = gethostbyaddr(name)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte
>>> socket.gethostbyaddr("egg")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte

Whereas it works in Python2:

spam@egg:~$ python2
Python 2.7.9 (default, May  1 2015, 23:21:05) 
[GCC 4.8.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.getfqdn()
'spam.egg'
>>> socket.gethostbyaddr("egg")
('spam.egg', ['\xc0\xa8\x01\x01'], ['192.168.1.1'])
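The alias in the Python 2 output is not a hostname at all: its four bytes are the packed (network-order) form of 192.168.1.1, and the first of them is the 0xc0 named in the Python 3 traceback. A minimal Python 3 check, using only the values printed above (the helper `decode_error_message` is a hypothetical name for illustration):

```python
import socket

# Alias printed by Python 2 above, written as a Python 3 bytes literal.
alias = b'\xc0\xa8\x01\x01'

# inet_aton packs a dotted-quad IPv4 address into 4 network-order bytes.
packed = socket.inet_aton('192.168.1.1')

def decode_error_message(data):
    """Return the strict UTF-8 UnicodeDecodeError text, or None if it decodes."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as exc:
        return str(exc)
    return None

print(alias == packed)               # True
print(socket.inet_ntoa(alias))       # 192.168.1.1
print(decode_error_message(alias))   # 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte
```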
@commodo
Contributor

commodo commented May 27, 2015

Will look into it;
by default Python 3 uses Unicode strings, whereas Python 2 uses ASCII as the default encoding;
it could be a bug in Python 3.
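A quick way to see the difference: Python 3 has to decode the C library's `char *` results into `str`, while Python 2's `str` is a byte string that simply carries the raw bytes. The sketch below (not CPython's internal code path; `try_decode` is a hypothetical helper) suggests that neither UTF-8 nor ASCII accepts the alias bytes from the report, so the contrast is really bytes versus text rather than one codec versus another:

```python
# Alias bytes from the Python 2 output in the report above.
raw = b'\xc0\xa8\x01\x01'

def try_decode(data, encoding):
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

print(try_decode(raw, 'utf-8'))  # None: 0xc0 is never a valid UTF-8 start byte
print(try_decode(raw, 'ascii'))  # None: ASCII only covers bytes 0x00-0x7f
# Python 2's str is a byte string, so it can hold these bytes without decoding.
print(raw)                       # b'\xc0\xa8\x01\x01'
```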

@commodo
Contributor

commodo commented Jun 3, 2015

@FlorianFieber
Just retried with the latest trunk.
It seems to work now.
I wanted to update my trunk build before investigating deeper.
Most likely something was fixed/changed under the hood (DNS stuff in OpenWRT), because Python3 was unchanged.

Would you mind retrying?
I know this may be a cheap fix, in the sense that this issue might still reside somewhere in Python3.
In my case, I tested on x86 (both 32- and 64-bit versions, to make sure it's not a 32/64-bit issue).

If this reappears I'll dig deeper into it, but for this one, it seems like it was either a python3 bug, or something was not agreeing with python3.

@FlorianFieber
Contributor Author

I happened to build and flash the latest trunk with NLS (full language support) for mips yesterday. The problem persists:

spam@egg:~# python3 -c "import socket;print(socket.getfqdn())"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.4/socket.py", line 463, in getfqdn
    hostname, aliases, ipaddrs = gethostbyaddr(name)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start byte

@commodo
Contributor

commodo commented Jun 4, 2015

Ok.
Will check further.

Thanks

@commodo
Contributor

commodo commented Jun 11, 2015

Right; so this is a heartbeat message (i.e. to let people know I haven't forgotten about this).
Time constraints are (well...) limiting how much time I can allocate to this.

The last time I checked this (and could not reproduce it) was on my home network.
I managed to reproduce it on another network; it seems important that gethostbyaddr() actually returns a result, otherwise getfqdn() just prints the system's hostname.
Not sure when a fix for this will be found.
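That behaviour matches how getfqdn() treats lookup failures. The sketch below is a simplified model of getfqdn() in CPython's Lib/socket.py, not a verbatim copy; the `lookup` parameter and the two fake resolvers are hypothetical additions so it runs offline. Only OSError (which includes socket.herror) is caught, so a failed reverse lookup falls back to the plain name, while a UnicodeDecodeError raised while building the result propagates to the caller:

```python
import socket

def getfqdn_sketch(name, lookup):
    """Simplified model of socket.getfqdn(); `lookup` stands in for gethostbyaddr()."""
    name = name.strip()
    if not name or name == '0.0.0.0':
        name = socket.gethostname()
    try:
        hostname, aliases, ipaddrs = lookup(name)
    except OSError:
        # Lookup failed (e.g. no reverse entry): fall back to the plain name.
        return name
    # Return the first dotted name, trying the canonical hostname first.
    for candidate in [hostname] + aliases:
        if '.' in candidate:
            return candidate
    return hostname

def no_answer(name):
    # Hypothetical resolver that finds nothing; herror is a subclass of OSError.
    raise socket.herror(1, 'Unknown host')

def bad_alias(name):
    # Hypothetical resolver whose alias decoding fails, as in the report above.
    return ('spam.egg', [b'\xc0\xa8\x01\x01'.decode('utf-8')], ['192.168.1.1'])

print(getfqdn_sketch('egg', no_answer))  # egg
try:
    getfqdn_sketch('egg', bad_alias)
except UnicodeDecodeError as exc:
    print('not caught by getfqdn:', exc)
```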

@ziadsawalha

I was able to work around this by making sure my /etc/resolv.conf had a valid nameserver in it (see bug).

I changed my default /etc/resolv.conf from:

search lan
nameserver 127.0.0.1

to

search lan
nameserver 192.168.1.55  #< this is my ISP connection's DNS address (modified for this post)
nameserver 127.0.0.1

I'm not sure what was invalid about 127.0.0.1, as it responds when queried.

@commodo
Contributor

commodo commented May 29, 2017

@FlorianFieber
can you retry? :p
[ I know it's been 2 years ]

this looks a lot like an issue that's more related to Python3 or libc
I tried it real quick now, and it looks to be running OK [or at least consistently]
Python & Python3 have gone through a few upgrades, musl as well [I don't remember if it was uClibc that was being used in the initial report]

root@LEDE:/#  python3 -c "import socket;print(socket.getfqdn('google.com'))"
bud02s23-in-f206.1e100.net
root@LEDE:/#  python -c "import socket;print(socket.getfqdn('google.com'))"
bud02s23-in-f206.1e100.net
root@LEDE:/#  python -c "import socket;print(socket.getfqdn())"
LEDE.lan
root@LEDE:/#  python3 -c "import socket;print(socket.getfqdn())"
LEDE.lan
root@LEDE:/#  python3 -c "import socket;print(socket.getfqdn('8.8.8.8'))"
google-public-dns-a.google.com
root@LEDE:/# ^C
root@LEDE:/#  python -c "import socket;print(socket.getfqdn('8.8.8.8'))"
google-public-dns-a.google.com

@FlorianFieber
Contributor Author

I'm sorry for the late reply. Sadly, I don't have access to a flashable OpenWRT router right now to test it.

@commodo
Contributor

commodo commented Sep 4, 2017

no worries :)

@CarlEdman

CarlEdman commented Dec 23, 2017

Hi guys, just posting to confirm that Florian is not the only one experiencing this problem. Under OpenWrt Chaos Calmer 15.05.1, gethostbyaddr() (and as a consequence getfqdn() and much more) in the standard Python 3.4 from opkg crashes with a UnicodeDecodeError for almost any address:

% python3.4 -c "import _socket; print(_socket.gethostbyaddr('google.com'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 2: invalid start byte

The sole exception I found was for localhost:
% python3.4 -c "import _socket; print(_socket.gethostbyaddr('localhost'))"
('localhost', [], ['127.0.0.1'])

If I had to guess, the reason is that the series of bytes 127, 0, 0, 1 is valid ASCII (and hence valid UTF-8), while an IP address containing any byte with the high bit set may not be.
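The guess can be checked offline, assuming (as the alias in the first report suggests) that the raw address bytes end up in the alias list that Python 3 decodes as UTF-8. The hypothetical helper below packs an IPv4 address and tries a strict UTF-8 decode. Note that a high-bit byte is not automatically fatal: 195.169.x.x packs to 0xc3 0xa9, which happens to be valid UTF-8.

```python
import socket

def packed_is_utf8(ip):
    """True if the 4-byte packed form of an IPv4 address is valid UTF-8."""
    try:
        socket.inet_aton(ip).decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False

for ip in ['127.0.0.1', '8.8.8.8', '192.168.1.1', '195.169.1.1']:
    print(ip, packed_is_utf8(ip))
# 127.0.0.1 True
# 8.8.8.8 True
# 192.168.1.1 False
# 195.169.1.1 True
```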

@poranje
Contributor

poranje commented Dec 23, 2017 via email

@commodo
Contributor

commodo commented Dec 27, 2017

@CarlEdman
I think the issue may have been fixed in Python/Python3 here:
https://bugs.python.org/issue26227
I'm not completely sure it's the same issue as here; I could try.

I don't know if Chaos Calmer is still supported, and I can't find a build infrastructure that would build another package.
Technically, CC is using Python3 ver 3.4.3 ; ver 3.4.7 is the more recent one, and I could probably backport the patch from the Python bugs-site.

If this issue is still of interest for CC, I could try to see about fixing it.
My preference would be to upgrade to 17.01 if possible.

@commodo
Contributor

commodo commented Dec 27, 2017

> Technically, CC is using Python3 ver 3.4.3 ; ver 3.4.7 is the more recent one, and I could probably backport the patch from the Python bugs-site.

3.4.7 is the most recent in the 3.4.x series.

3.6.4 is the most recent one overall.

@CarlEdman

@commodo Thanks for the response! I'm planning to upgrade to a new version one of these days anyways. If it requires more than minimal work on your part, please don't bother to fix my issue until I can confirm it happens on a current version of OpenWRT (hopefully after the eagerly awaited LEDE merger; that is still happening, right?).

@commodo
Contributor

commodo commented Jan 1, 2018

First of all: Happy New Year :)

I tried upgrading 3.4.3 to 3.4.7 and applying the patch from https://bugs.python.org/issue26227
It did not help; it still fails like

The next thing I could try is to upgrade to 3.6.4, but I'm not sure about upgrading it officially in the 15.05 release.
On LEDE trunk with 3.6.x [and 3.5.x] it seems to work [the issue does not reproduce], but that could also be due to a newer kernel. 15.05 is on kernel 3.18 [I think].

@commodo
Contributor

commodo commented Jan 2, 2018

@CarlEdman
Python 3.6.4 behaves the same.

root@OpenWrt:/# python3 -V
Python 3.6.4
root@OpenWrt:/# python3 -c "import socket;print(socket.getfqdn())"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.6/socket.py", line 673, in getfqdn
    hostname, aliases, ipaddrs = gethostbyaddr(name)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfd in position 0: invalid start byte
root@OpenWrt:/# uname -a
Linux OpenWrt 3.18.84 #6 SMP Tue Jan 2 16:59:13 EET 2018 x86_64 GNU/Linux

I just noticed that OpenWrt 15.05 uses uClibc rather than musl.
At this point, I am not sure whether it's the libc or the kernel version.

I guess at this point, a full system upgrade is required to make this work.

@CarlEdman

Just a quick update in case anybody is interested. I just bought a new router (Linksys WRT3200ACM, $119 refurb from Amazon) which actually has functional WiFi under OpenWRT and installed the latest stable firmware (LEDE Reboot 17.01.4 r3560-79f57e422d / LuCI lede-17.01 branch (git-18.061.17832-d092772) ) and python 3.6.0.

And now the same code works just fine:

# python3 -c "import _socket;print(_socket.gethostbyaddr('google.com'))"
('iad23s58-in-f14.1e100.net', ['iad23s58-in-f14.1e100.net'], ['172.217.7.238'])

@commodo
Contributor

commodo commented Mar 17, 2018

ah; 3 years later here :)
shall we close this and reopen it if needed? :)
@FlorianFieber @hnyman ?


6 participants