Description
HTMLDocument.save_html() opens the output file with open(file, "w") without specifying encoding="utf-8". On Windows, unless Python's UTF-8 mode is enabled, text-mode open() defaults to the system locale encoding (e.g. cp1252), which cannot represent many Unicode characters that commonly appear in HTML content.
For example, matplotlib generates SVG with U+2212 (MINUS SIGN −) in axis tick labels, which causes:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2212' in position 74151: character maps to <undefined>
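The codec limitation can be checked on any platform, without htmltools or Windows. The snippet below is a minimal sketch: `can_encode` is a hypothetical helper written for this illustration, and the sample string mirrors the one used in the reproduction.

```python
# U+2212 (MINUS SIGN) is what matplotlib emits in tick labels.
# U+00B0 (DEGREE SIGN) is included for contrast: cp1252 can represent it.
text = "Temperature: \u22125\u00b0C"


def can_encode(s: str, codec: str) -> bool:
    """Return True if every character of s is representable in codec."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


cp1252_ok = can_encode(text, "cp1252")  # fails on U+2212
utf8_ok = can_encode(text, "utf-8")     # UTF-8 covers all of Unicode
print(cp1252_ok, utf8_ok)
```

Only the minus sign is the problem here; the degree sign alone encodes fine in cp1252, which is why the failure tends to surface only with certain content.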
Location
https://github.com/posit-dev/py-htmltools/blob/main/htmltools/_core.py#L786
with open(file, "w") as f:
    f.write(rendered["html"])
Suggested fix
with open(file, "w", encoding="utf-8") as f:
    f.write(rendered["html"])
Since HTML files are expected to be UTF-8 (and the rendered HTML includes <meta charset="utf-8">), explicitly writing with UTF-8 encoding is the correct behavior on all platforms.
Precedent
Other Python libraries have fixed the same issue:
Reproduction
On Windows with cp1252 locale:
from htmltools import HTMLDocument, HTML
doc = HTMLDocument(HTML("<p>Temperature: −5°C</p>"), lang="en")
doc.save_html("test.html") # UnicodeEncodeError
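For anyone without a Windows machine, the same failure can be simulated by passing encoding="cp1252" explicitly. This is a sketch under assumptions: `write_html` is a hypothetical stand-in for the write step inside save_html(), and the `html` string stands in for rendered["html"]; htmltools itself is not imported.

```python
import os
import tempfile

# Stand-in for rendered["html"]; contains U+2212 (MINUS SIGN).
html = '<meta charset="utf-8"><p>Temperature: \u22125\u00b0C</p>'


def write_html(path: str, text: str, encoding: str) -> None:
    """Write text to path using an explicit encoding."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.html")

    # Simulates the Windows default: the write raises on U+2212.
    try:
        write_html(path, html, "cp1252")
        failed = False
    except UnicodeEncodeError:
        failed = True

    # Simulates the suggested fix: the write succeeds.
    write_html(path, html, "utf-8")
    with open(path, "rb") as f:
        data = f.read()

# The bytes on disk decode back to the original string as UTF-8,
# matching the <meta charset="utf-8"> declaration in the document.
round_trip_ok = data.decode("utf-8") == html
print(failed, round_trip_ok)
```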
Environment
- htmltools version: 0.6.0
- Python 3.12
- Windows 11, system encoding cp1252