Crashes when parsing data which is not valid UTF-8 #48
Comments
Also, the data could be valid UTF-8, but if a multi-byte character straddles the boundary between two 1024-byte blocks, neither block would decode correctly on its own.
This issue sounds similar to something I've worked with before. The solution is to use Python's https://docs.python.org/2.7/library/codecs.html#codecs.getincrementaldecoder. Rather than attempting to decode each received byte block individually, each block should be fed into an incremental decoder instance, and "final=True" should only be passed with the final block (such as on close/EOF). This allows byte blocks to be partially decoded as they are received.
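A minimal sketch of the incremental-decoder approach described above, not the project's actual fix. The function name `decode_blocks` and the choice of `errors="replace"` are assumptions made for illustration; `errors="replace"` substitutes U+FFFD for invalid bytes instead of raising, which is one way to avoid the crash in the issue title.

```python
import codecs


def decode_blocks(blocks, encoding="utf-8"):
    """Yield text decoded from an iterable of byte blocks.

    Hypothetical helper: a character split across two blocks is held
    back by the decoder until its remaining bytes arrive.
    """
    # errors="replace" is an assumption; invalid bytes become U+FFFD.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    for block in blocks:
        text = decoder.decode(block)  # final defaults to False
        if text:
            yield text
    # Flush at EOF: any dangling partial character is reported here.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


data = u"h\u00e9llo".encode("utf-8")  # the accented letter takes two bytes
blocks = [data[:2], data[2:]]         # split in the middle of that letter

# Decoding the first block on its own fails...
try:
    blocks[0].decode("utf-8")
    naive_failed = False
except UnicodeDecodeError:
    naive_failed = True

# ...while the incremental decoder reassembles the character.
result = u"".join(decode_blocks(blocks))
```

`codecs.getincrementaldecoder` exists in both Python 2.7 and Python 3, so the same sketch should apply to either.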
Yeah, sounds great :) Another option would be to not decode, and dump on stdout straight using
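The no-decode option could look something like the sketch below. It is an assumption about what was meant, not the code the commenter posted: `dump_raw` is a hypothetical helper, and it relies on `sys.stdout.buffer` being the binary stream on Python 3, falling back to `sys.stdout` itself where that attribute is absent (as on Python 2, where `sys.stdout.write` accepts byte strings).

```python
import sys


def dump_raw(blocks, out=None):
    """Write byte blocks to a binary stream without decoding them.

    Hypothetical helper; `out` defaults to the process's stdout.
    """
    if out is None:
        # Python 3 exposes the underlying binary stream as .buffer;
        # Python 2 has no such attribute, so fall back to sys.stdout.
        out = getattr(sys.stdout, "buffer", sys.stdout)
    for block in blocks:
        out.write(block)
    out.flush()
```

Since the bytes are never decoded, block boundaries and invalid sequences no longer matter, at the cost of leaving interpretation of the encoding to the terminal.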
Cool, I'll push a fix.
AFAIK this won't work on Python 2.
Verified, 953c098 is just as I described, good!
This is a simple but contrived example. But you get the same problem with e.g. Shift-JIS content: