fix: add encoding="utf-8" to save_html() for Windows compatibility #103
Merged
cpsievert merged 7 commits on Apr 3, 2026
`save_html()` opens the output file with `open(file, "w")` without specifying `encoding="utf-8"`. On Windows, Python defaults to the system locale encoding (e.g. cp1252), which cannot represent many Unicode characters that commonly appear in HTML content (e.g. U+2212 MINUS SIGN from matplotlib SVGs). Since the rendered HTML already includes `<meta charset="utf-8">`, explicitly writing with UTF-8 encoding is the correct behavior on all platforms.

Closes posit-dev#102
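A minimal, standard-library-only sketch of the failure: cp1252 has no mapping for U+2212, so encoding an HTML string that contains it fails, while UTF-8 succeeds. Passing `encoding="cp1252"` explicitly stands in for the Windows locale default so the failure reproduces on any platform; the HTML fragment itself is made up for illustration.

```python
import os
import tempfile

# Illustrative HTML containing U+2212 MINUS SIGN, similar to what a
# matplotlib SVG tick label might include (this exact markup is made up).
html = '<html><head><meta charset="utf-8"></head><body><svg><text>\u22121.0</text></svg></body></html>'


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding` without errors."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(can_encode(html, "cp1252"))  # False: cp1252 has no U+2212
print(can_encode(html, "utf-8"))   # True

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.html")
    # Explicit cp1252 mimics the Windows locale default on any OS.
    try:
        with open(path, "w", encoding="cp1252") as f:
            f.write(html)
    except UnicodeEncodeError as exc:
        print("cp1252 write failed:", exc.reason)
    # Explicit UTF-8 matches the document's <meta charset> and round-trips.
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    with open(path, encoding="utf-8") as f:
        assert f.read() == html
```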
Pull request overview
This PR fixes Windows `UnicodeEncodeError` cases by ensuring `HTMLDocument.save_html()` writes HTML files using UTF-8, aligning file output with the document's `<meta charset="utf-8">` and making behavior consistent across platforms.
Changes:
- Write HTML output with `open(..., encoding="utf-8")` in `HTMLDocument.save_html()`.
- Update the `saved_html()` test helper to read using UTF-8.
- Add a new test to verify non-ASCII characters are preserved when saving HTML.
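The write side and the read side of the change can be sketched as follows. `save_html_sketch()` is a hypothetical stand-in for `HTMLDocument.save_html()` (the real method's signature and behavior are not shown in this PR), and `saved_html()` only approximates the test helper's role; both names and signatures are assumptions.

```python
import os
import tempfile


def save_html_sketch(html: str, file: str) -> str:
    """Hypothetical stand-in for HTMLDocument.save_html(): always write
    UTF-8, regardless of the platform's locale encoding."""
    with open(file, "w", encoding="utf-8") as f:
        f.write(html)
    return file


def saved_html(path: str) -> str:
    """Hypothetical test helper: read the file back as UTF-8. The `with`
    block closes the handle right away, so an enclosing TemporaryDirectory
    can be deleted cleanly (Windows will not remove files still open)."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


content = '<meta charset="utf-8"><p>\u2212 caf\u00e9 \u4e2d\u6587</p>'
with tempfile.TemporaryDirectory() as tmp:
    out = save_html_sketch(content, os.path.join(tmp, "doc.html"))
    # Non-ASCII characters survive the round trip.
    assert saved_html(out) == content
```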
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| htmltools/_core.py | Forces UTF-8 when writing saved HTML to avoid Windows locale encoding failures. |
| tests/test_html_document.py | Adjusts test helper to read as UTF-8 and adds a new Unicode preservation test. |
- Use a context manager in the `saved_html()` helper to prevent file handle leaks that can cause flaky `TemporaryDirectory` cleanup on Windows/PyPy.
- Make `test_save_html_utf8_encoding` deterministic by monkeypatching `builtins.open` to verify that `encoding="utf-8"` is explicitly passed, ensuring the test catches regressions even on UTF-8-default systems.
Remove `call` from type annotation `list[call]` since `unittest.mock.call` is a runtime object, not a valid type expression for pyright.
Use cast(Dict[str, Any], x) instead of a plain type annotation to satisfy pyright's reportUnknownVariableType check.
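A sketch of the deterministic test these commits describe, assuming a hypothetical `save_html_sketch()` in place of the real method. It uses `unittest.mock.patch` with `wraps=` rather than pytest's `monkeypatch` fixture so it runs standalone: the real `open()` still writes the file while the mock records the call, and `cast(Dict[str, Any], ...)` gives the recorded keyword arguments an explicit type instead of annotating with `call`.

```python
import builtins
import os
import tempfile
from typing import Any, Dict, cast
from unittest import mock


def save_html_sketch(html: str, file: str) -> None:
    """Hypothetical stand-in for HTMLDocument.save_html()."""
    with open(file, "w", encoding="utf-8") as f:
        f.write(html)


def test_save_html_passes_utf8_encoding() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "index.html")
        # The mock forwards calls to the real open(), so the file is still
        # written, and it records the arguments it was called with.
        with mock.patch("builtins.open", wraps=builtins.open) as spy:
            save_html_sketch("\u2212", path)
        # Checking the argument itself (not the file contents) makes the test
        # fail on a missing encoding even where the default is already UTF-8.
        # cast() declares a concrete type for pyright; `call` is not a type.
        kwargs = cast(Dict[str, Any], spy.call_args.kwargs)
        assert kwargs.get("encoding") == "utf-8"


test_save_html_passes_utf8_encoding()
```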
Collaborator
Thanks @jooyoungseo! I appreciate you taking the time to create the PR as well as addressing the … If you could also update the …
Run black on files modified in this PR and add CHANGELOG entry for the save_html() UTF-8 encoding fix as requested by reviewer.
Black's latest version reformats parenthesized tuple assignments and trailing commas differently, causing the `black --check` CI step to fail.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Without an explicit `target-version`, black auto-detects from the runtime Python, producing different formatting on 3.9 vs. 3.10+. This caused CI failures only on the Python 3.9 matrix.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Add `encoding="utf-8"` to the `open()` call in `HTMLDocument.save_html()` to fix `UnicodeEncodeError` on Windows, where the default encoding (e.g. cp1252) cannot represent Unicode characters like U+2212 (MINUS SIGN)
- Since the rendered HTML already includes `<meta charset="utf-8">`, explicitly writing with UTF-8 encoding is the correct behavior on all platforms

Closes #102
Test plan
- `test_save_html_utf8_encoding` test verifies Unicode content is preserved