UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: code point not in range(0x110000) #38

Open
michelmno opened this issue Jun 29, 2017 · 2 comments · May be fixed by #85

Comments


michelmno commented Jun 29, 2017

UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: code point not in range(0x110000)

Reported while building datrie on the ppc64 architecture (big-endian) on openSUSE; see (1).
The ppc64le build (little-endian) does not fail.

(1) https://build.opensuse.org/package/live_build_log/openSUSE:Factory:PowerPC/python-datrie/standard/ppc64

=== extract 
[   43s] __________________________________ test_keys ________
[   43s]
[   43s]     def test_keys():
[   43s]         trie = _trie()
[   43s]         state = datrie.State(trie)
[   43s]         it = datrie.Iterator(state)
[   43s]
[   43s]         keys = []
[   43s]         while it.next():
[   43s] >           keys.append(it.key())
[   43s]
[   43s] tests/test_iteration.py:85:
[   43s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[   43s] src/datrie.pyx:942: in datrie._TrieIterator.key (src/datrie.c:17947)
[   43s]     cpdef unicode key(self):
[   43s] src/datrie.pyx:945: in datrie._TrieIterator.key (src/datrie.c:17845)
[   43s]     return unicode_from_alpha_char(key)
[   43s] src/datrie.pyx:1111: in datrie.unicode_from_alpha_char (src/datrie.c:19975)
[   43s]     return c_str[:length].decode('utf_32_le')
[   43s] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[   43s]
[   43s] input = <read-only buffer ptr 0x100070a81b0, size 16 at 0x3fffa166fb30>
[   43s] errors = 'strict'
[   43s]
[   43s]     def decode(input, errors='strict'):
[   43s] >       return codecs.utf_32_le_decode(input, errors, True)
[   43s] E       UnicodeDecodeError: 'utf32' codec can't decode bytes in position 0-3: code point not in range(0x110000)
[   43s]
[   43s] /usr/lib64/python2.7/encodings/utf_32_le.py:11: UnicodeDecodeError
===
Contributor

fpytloun commented Apr 28, 2018

Hello,

We are seeing the same issue in Debian when building on big-endian platforms (it fails on mips, ppc, sparc, etc.):
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=897094
https://buildd.debian.org/status/package.php?p=python-datrie&suite=sid


Aniket-Pradhan commented Nov 4, 2020

Hey!

Same issue in Fedora on the s390x architecture (a big-endian platform).

A simple fix would be to check whether the machine is little-endian or big-endian and decode the string accordingly. Something like:

import sys

if sys.byteorder == "little":
    return c_str[:length].decode('utf_32_le')
else:
    return c_str[:length].decode('utf_32_be')

EDIT: The build works fine with this fix: https://koji.fedoraproject.org/koji/taskinfo?taskID=54903600
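
For anyone who wants to reproduce the failure without a big-endian machine, here is a minimal standalone sketch of the idea above. It assumes the key buffer holds 32-bit code points in the machine's native byte order, which is what the traceback suggests but is not confirmed here. The helper names `native_utf32_codec` and `decode_alpha_chars` are made up for illustration and are not part of datrie.

```python
import struct
import sys


def native_utf32_codec():
    """Return the UTF-32 codec name matching this machine's byte order."""
    return "utf_32_le" if sys.byteorder == "little" else "utf_32_be"


def decode_alpha_chars(raw, length):
    """Decode `length` bytes of native-order 32-bit code points.

    Hypothetical stand-in for the decode step in unicode_from_alpha_char.
    """
    return raw[:length].decode(native_utf32_codec())


# Simulate a buffer of 32-bit code points written in native byte order
# (assumption: this mirrors what the C library hands back).
text = "key"
raw = struct.pack("=%dI" % len(text), *(ord(ch) for ch in text))

# Decoding with the native byte order round-trips the text.
decoded = decode_alpha_chars(raw, len(raw))

# Decoding with the opposite byte order raises UnicodeDecodeError,
# because the byte-swapped values are not valid code points.
wrong_codec = "utf_32_be" if sys.byteorder == "little" else "utf_32_le"
try:
    raw.decode(wrong_codec)
    failed = False
except UnicodeDecodeError:
    failed = True
```

On a little-endian machine the hard-coded 'utf_32_le' happens to match the native order, which would explain why ppc64le builds pass while ppc64, mips, sparc and s390x fail.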
