Allow changing difflib._file_template character encoding. #46328
Comments
When passed unicode strings, difflib.HtmlDiff.make_file and make_table |
Oops, please close this. Apparently was fixed in 2.5.1, and I'm just |
After installing 2.5.1, the UnicodeEncodeError is gone, but the charset is |
difflib._file_template is still hard-coded in py3k SVN. I'm unsure as to whether this is a feature request, a behaviour issue or not an issue at all, can someone please advise, thanks. |
I believe that charset is the standard default for html, which would make this a feature request. |
In 3.2, it is line 1629: That charset was only standard for Western European documents limited to that charset. Now, even such limited-char docs often use 'utf-8' (python.org does). The result of putting an incorrect charset designation in an html file is that the browser will not display the file correctly. For instance, I tried an input sequence containing line 'c\u3333', which displays in IDLE as 'c㌳'. The string from HtmlDiff.make_file() must be written to a file opened with encoding='utf-8', not the above or equivalent. Firefox then reads the three bytes of the utf-8 encoding as three separate characters and displays 'c' followed by three garbled characters instead of 'c㌳'. To check:
>>> 'c㌳'.encode().decode(encoding='Latin-1')
'cã\x8c³'
To me the clear implication of "returns a string which is a complete HTML file containing a table showing line by line differences with inter-line and intra-line changes highlighted." is that the resulting file will display correctly. The current template charset prevents that; changing it to 'utf-8' results in a file that displays correctly (tested). So the current behavior, and the code that causes it, is to me clearly a bug. I would like to fix it before 2.7.4 comes out. |
After thinking about it more, the real problem is that the charset setting must match the chars used and how they are encoded, so no one setting is right for all uses. An alternative to changing the default in existing versions is to at least document what it is and explain how to work around it with .replace -- for instance output.replace('ISO-8859-1', 'utf-8'). I agree that adding a parameter (charset=xxx) is a new feature. |
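A minimal sketch of the .replace workaround described above, assuming the generated page is then written to disk as UTF-8. The sample lines and the file name diff.html are made up for illustration. On Python versions whose template no longer declares 'ISO-8859-1', the replace call simply does nothing.

```python
import difflib
import pathlib
import tempfile

# Hypothetical sample input; the first line holds a non-Latin-1 character.
old_lines = ["c\u3333", "unchanged", "old text"]
new_lines = ["c\u3333", "unchanged", "new text"]

html = difflib.HtmlDiff().make_file(old_lines, new_lines)

# Make the declared charset match the encoding used when writing the file.
# This is a no-op if the template does not declare ISO-8859-1.
fixed = html.replace("ISO-8859-1", "utf-8")

with tempfile.TemporaryDirectory() as tmp:
    out_path = pathlib.Path(tmp) / "diff.html"  # illustrative file name
    out_path.write_text(fixed, encoding="utf-8")
    round_tripped = out_path.read_text(encoding="utf-8")
```

The point is only that the declared charset and the encoding passed when writing the file have to agree; any other matching pair would work the same way.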
I haven't looked at the code, but if an HTML page is generated it should probably be updated to use HTML5 and <meta charset="utf-8">. |
Attaching two patches: bpo-2052.diff adds a "charset" keyword argument to HtmlDiff.make_file(). issue2052_html5.diff also adds a "charset" keyword argument to HtmlDiff.make_file() and updates the markup of HtmlDiff() to HTML5. I tested it with Firefox 29 and Chrome 34. |
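As a rough sketch of how the proposed keyword could be used, assuming a Python version in which HtmlDiff.make_file() accepts charset (3.5 or later, to my knowledge). The sample lines are invented for illustration.

```python
import difflib

# Hypothetical sample input; the first line holds a non-Latin-1 character.
old_lines = ["c\u3333", "old text"]
new_lines = ["c\u3333", "new text"]

# charset is passed by keyword; it sets the charset declared in the page.
html = difflib.HtmlDiff().make_file(old_lines, new_lines, charset="utf-8")

# Encode with the same charset the page declares before writing it out.
data = html.encode("utf-8")
```

Passing the same name to both make_file() and encode() keeps the declaration and the actual bytes in step, which is the mismatch this issue is about.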
Attaching a new version of issue2052_html5.diff. Changes:
|
Maybe updating the markup to HTML5 should be a different issue. issue2052_html5_v2.diff not only adds charset in HTML5 format, it totally changes the template. This is definitely a separate issue. |
Here is an updated patch. Thanks for the review, Serhiy. I will open a new issue for the HTML 5 part of the patch. |
LGTM |
New changeset e058423d3ca4 by Berker Peksag in branch 'default': |
Thanks Serhiy. |