unicode errors #117

Closed
wants to merge 2 commits into from

Conversation

@s2hc-johan
Contributor

@s2hc-johan s2hc-johan commented Sep 18, 2015

Decode the output if we're running on Python 2.

@@ -131,7 +132,10 @@ def make_json(self, posts, descriptions, previewimage, output_path, lang):
            recent_posts.append(entry)
        data = json.dumps(recent_posts, indent=2, sort_keys=True)
        with io.open(output_path, "w+", encoding="utf8") as outf:
            outf.write(data)
            if sys.version_info[0] != 3:

@Kwpolska

Kwpolska Sep 18, 2015
Member

This is not right.

On Python 2, json.dumps() might return Unicode in some cases. It doesn’t by default, but we could use ensure_ascii=False to change that:

data = json.dumps(recent_posts, ensure_ascii=False, indent=2, sort_keys=True)
with io.open(output_path, "w+", encoding="utf-8") as outf:
    try:
        outf.write(data.decode('utf-8'))
    except AttributeError:
        outf.write(data)

@s2hc-johan

s2hc-johan Sep 19, 2015
Author Contributor

Absolutely, we can do it like that. I don't know why the first commit is wrong, though. Isn't JSON UTF-8 by design? In Python 2, ".decode('utf-8')" works on both string and unicode objects.

@Kwpolska

Kwpolska Sep 19, 2015
Member

It does not work properly.

>>> u"ą".decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0105' in position 0: ordinal not in range(128)

Please switch it to the solution I recommended.
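
For context, the traceback above shows Python 2 first encoding the unicode object with the default ASCII codec and only then decoding the result as UTF-8. A minimal sketch of that two-step behaviour, written for Python 3; `py2_style_decode` is a hypothetical helper for illustration, not part of Python or of this project:

```python
def py2_style_decode(text, encoding="utf-8"):
    """Emulate calling .decode() on a Python 2 unicode object.

    Hypothetical helper for illustration only: Python 2 implicitly encoded
    the text with the ASCII codec before decoding it again.
    """
    implicit_bytes = text.encode("ascii")  # fails for any non-ASCII character
    return implicit_bytes.decode(encoding)


# ASCII-only text (such as escaped JSON) survives the round trip.
ascii_result = py2_style_decode('["\\u0105"]')

# Non-ASCII text fails in the implicit encode step, as in the traceback.
try:
    py2_style_decode("ą")
    error_name = None
except UnicodeEncodeError as exc:
    error_name = type(exc).__name__
```

This is why the call only appears to work when the input happens to be pure ASCII.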

@Kwpolska

Kwpolska Sep 19, 2015
Member

Actually, in this case we need

except (AttributeError, UnicodeEncodeError, UnicodeDecodeError):
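
Putting the suggestion and the broader except clause together might look roughly like the sketch below. `write_json` and the demo path are hypothetical names chosen for illustration, and this is not the code that was committed; it assumes the serialized data is either text already or UTF-8 encoded bytes.

```python
import io
import json
import os
import tempfile


def write_json(obj, output_path):
    """Serialize obj to output_path as UTF-8 JSON (hypothetical helper)."""
    data = json.dumps(obj, ensure_ascii=False, indent=2, sort_keys=True)
    with io.open(output_path, "w+", encoding="utf-8") as outf:
        try:
            # Python 2 byte string: decode to unicode before writing.
            outf.write(data.decode("utf-8"))
        except (AttributeError, UnicodeEncodeError, UnicodeDecodeError):
            # Python 3 str has no .decode(), and a Python 2 unicode object
            # can fail the implicit ASCII step. Assume data is already text.
            outf.write(data)


# Usage: write one entry with a non-ASCII title and read it back.
demo_path = os.path.join(tempfile.mkdtemp(), "recent.json")
write_json([{"title": u"ą"}], demo_path)
with io.open(demo_path, encoding="utf-8") as inf:
    written = inf.read()
```

On Python 3 the AttributeError branch is always taken, so the try block only matters for Python 2.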

@s2hc-johan

s2hc-johan Sep 19, 2015
Author Contributor

Yeah, catching more exceptions makes it clearer.

We don't decode random unicode, we decode output from json.dumps:

>>> import json
>>> json.dumps([u"ą"]).decode('utf-8')
u'["\\u0105"]'
>>>
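
The REPL output above depends on json.dumps escaping non-ASCII characters by default (ensure_ascii=True), which leaves a pure-ASCII result. A short Python 3 sketch comparing the two settings; the variable names are only illustrative:

```python
import json

# Default: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(["ą"])

# ensure_ascii=False keeps the character itself in the output.
literal = json.dumps(["ą"], ensure_ascii=False)

print(escaped)  # ["\u0105"]
print(literal)  # ["ą"]

# Both forms parse back to the same Python value.
same_value = json.loads(escaped) == json.loads(literal)
```

Either form is valid JSON; the difference is only in how the file reads to a human and which characters the writer has to handle.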

@Kwpolska

Kwpolska Sep 19, 2015
Member

Just do it with ensure_ascii=False. More modern.

@ralsina
Member

@ralsina ralsina commented May 2, 2018

We no longer care about Python 2.

@ralsina ralsina closed this May 2, 2018