Unicode/string parsing error #93
Comments
You cannot parse a Unicode string that contains an encoding declaration, see here. In Harvard's HTML the encoding is specified in the first line. You can just encode the text before passing it to extruct. String encodings are very confusing; this article helped me understand the basics :)
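A minimal sketch of the "encode before parsing" workaround described above, using only the standard library. The helper name `to_parser_bytes` and the sample `page` are hypothetical and do not come from the thread. The sketch assumes the parser accepts bytes input, which the workaround implies.

```python
def to_parser_bytes(html, encoding="utf-8"):
    """Return `html` as bytes (hypothetical helper, not part of extruct).

    str input is encoded with `encoding`; bytes input is returned unchanged.
    """
    if isinstance(html, bytes):
        return html
    return html.encode(encoding)


# Illustrative document whose first line carries an encoding declaration.
page = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "<html><head><title>Caf\u00e9</title></head><body></body></html>"
)

# Bytes rather than str: this is what would be handed to the parser.
payload = to_parser_bytes(page)
```

With requests, the same idea would presumably look like `extruct.extract(to_parser_bytes(r.text))`. Passing `r.content`, which is already bytes, may be an alternative, but neither call is confirmed by the thread.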
Thank you @Kebniss, worked perfectly! Your help (and your addition to my must-read list) is much appreciated!
I think it's possible to make extruct work in such cases, and handling them should be the library's responsibility.
Example document:
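One way a library could handle such input itself, sketched here purely as an illustration and not as how extruct actually does it, is to drop a leading XML declaration from a Unicode string before parsing. The name `strip_xml_declaration` and the sample text are hypothetical.

```python
import re

# Matches optional leading whitespace followed by an XML declaration,
# e.g. <?xml version="1.0" encoding="utf-8"?>, at the start of the text.
_XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>", re.IGNORECASE)


def strip_xml_declaration(text):
    """Remove a leading XML declaration from a str, if present (hypothetical helper)."""
    return _XML_DECL.sub("", text, count=1)


sample = '<?xml version="1.0" encoding="utf-8"?>\n<html><body>Hello</body></html>'

# The declaration is gone; the rest of the markup is left as it was.
cleaned = strip_xml_declaration(sample)
```

Since the text is already decoded, the declared encoding carries no further information for the parser, which is why removing it is a plausible approach. Encoding to bytes, as suggested earlier in the thread, remains the simpler route.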
Not about Unicode, but I got an issue when parsing from
I'm trying to parse structured metadata from this URL. I first executed this code on the example URL https://www.optimizesmart.com/how-to-use-open-graph-protocol/:
It works just fine. However, this block of code:
returns this error
Any idea what is going on here? It seems like an lxml.etree parsing error. Can I somehow modify r.text to fix this error? Any help is appreciated...