fix: add encoding="utf-8" to save_html() for Windows compatibility #103

Merged
cpsievert merged 7 commits into posit-dev:main from jooyoungseo:fix/save-html-utf8-encoding
Apr 3, 2026

@jooyoungseo
Contributor

Summary

  • Adds encoding="utf-8" to the open() call in HTMLDocument.save_html() to fix a UnicodeEncodeError on Windows, where the default encoding (e.g. cp1252) cannot represent Unicode characters such as U+2212 (MINUS SIGN)
  • Since the rendered HTML already includes <meta charset="utf-8">, explicitly writing with UTF-8 encoding is the correct behavior on all platforms
  • Adds a test to verify non-ASCII characters (U+2212 MINUS SIGN, U+00B0 DEGREE SIGN) are correctly written to HTML files

Closes #102
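To see why the default encoding matters, here is a minimal, platform-independent sketch that passes the cp1252 codec explicitly, reproducing what an unqualified open() does under a Windows cp1252 locale. The helper name `can_encode` is hypothetical and not part of htmltools.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


minus = "\u2212"   # MINUS SIGN, common in matplotlib SVG tick labels
degree = "\u00b0"  # DEGREE SIGN

print(can_encode(minus, "cp1252"))   # False: cp1252 has no byte for U+2212
print(can_encode(degree, "cp1252"))  # True: U+00B0 is byte 0xB0 in cp1252
print(can_encode(minus, "utf-8"))    # True: UTF-8 can encode any code point
```

Note that U+00B0 happens to survive cp1252, so it is characters outside the code page, such as U+2212, that trigger the error; writing UTF-8 handles both.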

Test plan

  • Existing test suite passes (77/77)
  • New test_save_html_utf8_encoding test verifies Unicode content is preserved
  • Manual verification on Windows with cp1252 locale

save_html() opens the output file with open(file, "w") without specifying
encoding="utf-8". On Windows, Python defaults to the system locale encoding
(e.g. cp1252), which cannot represent many Unicode characters that commonly
appear in HTML content (e.g. U+2212 MINUS SIGN from matplotlib SVGs).

Since the rendered HTML already includes <meta charset="utf-8">, explicitly
writing with UTF-8 encoding is the correct behavior on all platforms.

Closes posit-dev#102
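A minimal sketch of the pattern this commit applies, assuming a simplified stand-in: `write_html` below is a hypothetical helper, not htmltools' actual `save_html()` (which also handles rendering and dependencies). Passing encoding="utf-8" explicitly makes the bytes on disk match the `<meta charset="utf-8">` declaration regardless of the platform locale.

```python
import tempfile
from pathlib import Path


def write_html(html: str, path: Path) -> None:
    # Hypothetical stand-in for save_html(): always write UTF-8,
    # never the locale-dependent default (e.g. cp1252 on Windows).
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


page = '<html><head><meta charset="utf-8"></head><body>\u22121 \u00b0C</body></html>'

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "index.html"
    write_html(page, out)
    raw = out.read_bytes()  # read before the temporary directory is removed

# U+2212 is stored as the three UTF-8 bytes E2 88 92.
print(b"\xe2\x88\x92" in raw)       # True
print(raw.decode("utf-8") == page)  # True
```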
Copilot AI review requested due to automatic review settings March 10, 2026 19:52

Copilot AI left a comment


Pull request overview

This PR fixes Windows UnicodeEncodeError cases by ensuring HTMLDocument.save_html() writes HTML files using UTF-8, aligning file output with the document’s <meta charset="utf-8"> and making behavior consistent across platforms.

Changes:

  • Write HTML output with open(..., encoding="utf-8") in HTMLDocument.save_html().
  • Update the saved_html() test helper to read using UTF-8.
  • Add a new test to verify non-ASCII characters are preserved when saving HTML.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • htmltools/_core.py: Forces UTF-8 when writing saved HTML to avoid Windows locale encoding failures.
  • tests/test_html_document.py: Adjusts the test helper to read as UTF-8 and adds a new Unicode preservation test.


Two review comment threads on tests/test_html_document.py (marked Outdated)
- Use context manager in saved_html() helper to prevent file handle
  leaks that can cause flaky TemporaryDirectory cleanup on Windows/PyPy
- Make test_save_html_utf8_encoding deterministic by monkeypatching
  builtins.open to verify encoding="utf-8" is explicitly passed,
  ensuring the test catches regressions even on UTF-8-default systems
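The deterministic test described in this commit can be sketched with `unittest.mock.patch`, wrapping the real `open` so the file is still written while each call's arguments are recorded. This is an illustrative sketch under assumptions: `write_html` and `check_utf8_write` are hypothetical stand-ins, and the PR's actual test may use pytest's `monkeypatch` fixture and a different structure.

```python
import builtins
import tempfile
from pathlib import Path
from unittest import mock

PAGE = "<p>\u2212 40 \u00b0F</p>"


def write_html(html: str, path: Path) -> None:
    # Hypothetical stand-in for HTMLDocument.save_html().
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


def check_utf8_write() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "index.html"
        # wraps=builtins.open keeps real file I/O while recording each call.
        with mock.patch("builtins.open", wraps=builtins.open) as spy:
            write_html(PAGE, out)
        # Assert on the encoding argument itself, so the test fails on a
        # regression even on systems whose default encoding is already UTF-8.
        write_calls = [c for c in spy.call_args_list if c.args and c.args[0] == out]
        assert write_calls, "write_html() did not open the output file"
        assert write_calls[0].kwargs.get("encoding") == "utf-8"
        # Read back inside a context manager so the handle is closed before
        # TemporaryDirectory cleanup (open handles block deletion on Windows).
        with open(out, encoding="utf-8") as f:
            return f.read()


print(check_utf8_write() == PAGE)  # True
```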
Remove `call` from type annotation `list[call]` since `unittest.mock.call`
is a runtime object, not a valid type expression for pyright.
Use cast(Dict[str, Any], x) instead of a plain type annotation to
satisfy pyright's reportUnknownVariableType check.
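The two pyright fixes can be illustrated with a short, hypothetical snippet: the recorded calls are left unannotated instead of typed as `list[call]`, and `typing.cast` tells the checker that a call's keyword arguments are a `Dict[str, Any]`. The `spy` mock here is an assumption for illustration; `cast` has no runtime effect and only informs the type checker.

```python
from typing import Any, Dict, cast
from unittest import mock

# Hypothetical spy standing in for a patched open(); it records calls only.
spy = mock.MagicMock()
spy("index.html", encoding="utf-8")

# No `list[call]` annotation: mock.call is a runtime helper, not a type.
calls = spy.call_args_list

# cast() returns its argument unchanged at runtime; it only gives pyright
# a concrete type so reportUnknownVariableType is not triggered.
kwargs = cast(Dict[str, Any], calls[0].kwargs)
print(kwargs["encoding"])  # utf-8
```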
@cpsievert
Collaborator

Thanks @jooyoungseo! I appreciate you taking the time to create the PR as well as addressing the pyright issues. It seems some formatting changes still need to be applied with black.

If you could also update the CHANGELOG.md, that would be much appreciated, thanks!

jooyoungseo and others added 3 commits March 11, 2026 14:10
Run black on files modified in this PR and add CHANGELOG entry for
the save_html() UTF-8 encoding fix as requested by reviewer.
Black's latest version reformats parenthesized tuple assignments and
trailing commas differently, causing the `black --check` CI step to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without an explicit target-version, black auto-detects from the runtime
Python, producing different formatting on 3.9 vs 3.10+. This caused CI
failures only on the Python 3.9 matrix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cpsievert cpsievert merged commit 18dad64 into posit-dev:main Apr 3, 2026

Development

Successfully merging this pull request may close these issues.

save_html() fails on Windows with UnicodeEncodeError for non-ASCII characters

3 participants