wsgi: Stop replacing invalid UTF-8 on py3
For more context, see #467 and #497.

On py3, urllib.parse.unquote() defaults to decoding via UTF-8 and replacing invalid UTF-8 sequences with "\N{REPLACEMENT CHARACTER}". This causes a few problems:

- Since WSGI requires that bytes be decoded as Latin-1 on py3, we have to do an extra re-encode/decode cycle in encode_dance().
- Applications written for Latin-1 are broken, as there are valid Latin-1 sequences that are mangled because of the replacement.
- Applications written for UTF-8 cannot differentiate between a replacement character that was intentionally sent by the client versus an invalid byte sequence.

Fortunately, unquote() allows us to specify the encoding that should be used. By specifying Latin-1, we can drop encode_dance() entirely and preserve as much information from the wire as we can.
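As a rough illustration of the behavior described above (not the code changed by this commit), the following sketch compares the default unquote() decoding with an explicit Latin-1 decoding. The sample paths and variable names are made up for the example.

```python
from urllib.parse import unquote

# Hypothetical request path: valid UTF-8 ("é" as %C3%A9) plus a lone
# %FF byte, which is not valid UTF-8.
raw_path = "/caf%C3%A9/%FF"

# Default on py3: decode as UTF-8, replacing invalid sequences with U+FFFD.
default_decoded = unquote(raw_path)

# Latin-1 maps every byte 0x00-0xFF to one code point, so nothing is replaced.
latin1_decoded = unquote(raw_path, encoding="latin-1")

# The original wire bytes can be recovered exactly from the Latin-1 string.
wire_bytes = latin1_decoded.encode("latin-1")

# With the default, a replacement character sent on purpose (%EF%BF%BD)
# looks the same as an invalid byte (%FF); with Latin-1 the two differ.
ambiguous = unquote("%EF%BF%BD") == unquote("%FF")
distinct = unquote("%EF%BF%BD", encoding="latin-1") != unquote("%FF", encoding="latin-1")

print(repr(default_decoded))
print(repr(latin1_decoded))
print(wire_bytes)
```

An application that expects UTF-8 can then call wire_bytes.decode("utf-8") itself and decide how to handle the resulting error, rather than receiving an already-substituted string.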
Showing 2 changed files with 19 additions and 17 deletions.