HTML to MD conversion fails on Unicode chars #300

Open
brsma opened this Issue Apr 20, 2012 · 1 comment

brsma commented Apr 20, 2012

When importing a URL or an HTML file, all characters outside the ASCII range are dropped during the conversion.

Desired behaviour: respect the source encoding and keep the full Unicode range of characters.

brsma commented Apr 25, 2012

Found the culprit: the bundled (obsolete) version of readability.py encodes its output as ASCII and silently ignores every character it cannot represent:

print Document(file.read(), debug=options.verbose).summary().encode('ascii','ignore')

Ouch.
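
For anyone hitting the same thing, here is a minimal sketch of why that call loses characters and what a lossless alternative might look like. It is written for Python 3 and uses a made-up sample string; it does not touch readability.py itself, and switching the output to UTF-8 is only my assumption about a reasonable fix, not a confirmed patch.

```python
# Hypothetical sample text standing in for the converted document summary.
text = "Über naïve café"

# Same call as in the quoted line: characters outside ASCII are
# dropped without any error or warning.
lossy = text.encode("ascii", "ignore")

# Possible alternative (an assumption, not the project's actual fix):
# encode as UTF-8 so every character survives the round trip.
lossless = text.encode("utf-8")

print(lossy)                     # b'ber nave caf'
print(lossless.decode("utf-8"))  # Über naïve café
```

The 'ignore' error handler is what makes the loss silent; with the default 'strict' handler the same call would raise UnicodeEncodeError instead.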