Handle UnicodeDecodeError
#77
Comments
I'd disagree that the checker should handle that, since in most cases UnicodeDecodeErrors mean that the checker script is kind of broken. Seeing a random checker error appear is exactly the right response to that.
The question is would there be any situation where the checker would not raise a BrokenServiceException as a consequence of this? I would let the |
Any service that doesn't really care about Unicode and displays user input may produce invalid Unicode output. The checker error has the advantage that we know something is wrong, whereas a mumbleexception would rely on team complaints.
While I do agree in general, we should probably document this in the |
Unfortunately there is no standardized way to specify thrown exceptions using type annotations. The best way would be putting this into the docstring, which raises the question of whether people actually read that.
How about removing illegal chars with a logger warning? That way the checker won't fail unless some later comparison fails(?)
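To make the proposal concrete, here is a minimal sketch of what "strip and warn" could look like. The function name `decode_lenient` and the logger name are hypothetical and not part of the library; the sketch only relies on Python's built-in `errors="ignore"` decoding mode.

```python
import logging

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("checker_sketch")


def decode_lenient(data: bytes) -> str:
    """Decode UTF-8, dropping undecodable bytes and logging a warning.

    Hypothetical helper: this is not the library's ensure_unicode.
    """
    try:
        # Fast path: the input is valid UTF-8.
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that failed to decode.
        logger.warning("dropping undecodable bytes near offset %d", exc.start)
        # errors="ignore" silently skips every byte the decoder rejects.
        return data.decode("utf-8", errors="ignore")


# 0xff can never appear in valid UTF-8, so it is dropped here.
cleaned = decode_lenient(b"hello \xff world")
```

Note that the caller gets back a string that no longer matches what the service sent, which is the behavior change under discussion.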
If we go this way I would make this a configurable (and off by default) option. Otherwise this is not something one would expect from the function, and it sounds like something that will lead to a weird bug somewhere down the line (which will be hard to find for the checker author, since they would not expect such behavior). Also, stripping bytes which are not valid UTF-8 is neither trivial nor necessarily deterministic. If you have the start byte of a 2-byte sequence followed by two continuation bytes, which one do you strip? If you have the start byte of a 3-byte sequence followed by a valid 2-byte sequence (meaning there are two start bytes following each other), do you strip only the first byte or all three bytes? But imo even having this as a configurable option does not seem very useful: if you don't care about the invalidly encoded bytes, you could simply encode the UTF-8 string you are trying to find and search for bytes in bytes without ever having to deal with UTF-8 decoding issues.
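The bytes-in-bytes approach can be sketched as follows. The helper name `contains_flag`, the flag value, and the sample service output are made up for illustration; only standard `str.encode` and the `in` operator on `bytes` are used.

```python
def contains_flag(haystack: bytes, flag: str) -> bool:
    """Search for a flag in raw bytes without decoding the haystack.

    Hypothetical helper for illustration only.
    """
    # Encoding a Python str to UTF-8 always yields valid bytes,
    # so the comparison happens entirely on the bytes level.
    return flag.encode("utf-8") in haystack


# Made-up service output containing 0xff and 0xfe, which are never valid UTF-8.
raw_output = b"note: \xff\xfe ENOFLAG{example} end"

found = contains_flag(raw_output, "ENOFLAG{example}")

# A strict decode of the same data is what would have failed.
try:
    raw_output.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

The search succeeds even though the surrounding data cannot be decoded, so no stripping policy is needed for this use case.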
Not all string functions exist for bytes, and just unicode-all-the-things makes it easy. But yeah, let's add a warning to the docstring then?
Did you consider removing it? If you know the argument's type it can be replaced in one line (either by |
Any new ideas on this topic? Otherwise I would close this issue and hope for the best.
Is So I would agree that this issue can be closed, but it might be worth considering deprecating the function altogether |
Oh, and by deprecating I mean removing, since basically all checkers were broken by the update to v2 anyways and would need to be reworked for other reasons, during which we could simply fix checkers using |
I agree, I will open a PR to remove |
Opened #108 for this. Closing this issue now.
If we pass illegal UTF-8 into ensure_unicode, teams should be Mumble, but a checker error occurs.
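A minimal reproduction of the failure mode, assuming the function decodes bytes as strict UTF-8. The stand-in `ensure_unicode_sketch` below is hypothetical and written only for this example; the real implementation may differ.

```python
from typing import Union


def ensure_unicode_sketch(value: Union[str, bytes]) -> str:
    """Hypothetical stand-in: return str unchanged, decode bytes as strict UTF-8."""
    if isinstance(value, bytes):
        # The default error mode is "strict", so invalid input raises.
        return value.decode("utf-8")
    return value


# 0xc3 opens a 2-byte sequence, but 0x28 is not a continuation byte.
invalid = b"\xc3\x28"

try:
    ensure_unicode_sketch(invalid)
    outcome = "decoded"
except UnicodeDecodeError as exc:
    # Left uncaught in a checker, this surfaces as an internal checker error
    # rather than as a verdict about the service.
    outcome = f"UnicodeDecodeError: {exc.reason}"
```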