save_html() fails on Windows with UnicodeEncodeError for non-ASCII characters #102

@jooyoungseo

Description

HTMLDocument.save_html() opens the output file with open(file, "w") without specifying encoding="utf-8". On Windows, Python defaults to the system locale encoding (e.g. cp1252), which cannot represent many Unicode characters that commonly appear in HTML content.

For example, matplotlib generates SVG with U+2212 (MINUS SIGN) in axis tick labels, which causes:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2212' in position 74151: character maps to <undefined>
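The failure does not depend on htmltools or on Windows itself. It can be sketched at the codec level on any platform. The variable names below are illustrative only:

```python
# U+2212 (MINUS SIGN) has no mapping in cp1252, so encoding it fails
# on any platform once that codec is selected.
minus = "\u2212"

try:
    minus.encode("cp1252")
    encodable_in_cp1252 = True
except UnicodeEncodeError:
    encodable_in_cp1252 = False

# UTF-8 can represent every Unicode code point, including this one.
utf8_bytes = minus.encode("utf-8")

print(encodable_in_cp1252, utf8_bytes)
```

This is the same 'charmap' error that open(file, "w") surfaces when the locale encoding is cp1252.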

Location

https://github.com/posit-dev/py-htmltools/blob/main/htmltools/_core.py#L786

with open(file, "w") as f:
    f.write(rendered["html"])

Suggested fix

with open(file, "w", encoding="utf-8") as f:
    f.write(rendered["html"])

Since HTML files are expected to be UTF-8 (and the rendered HTML includes <meta charset="utf-8">), explicitly writing with UTF-8 encoding is the correct behavior on all platforms.
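A minimal sketch of the suggested behavior, kept independent of htmltools so it can be run anywhere. The helper name write_html_utf8 is hypothetical and not part of the library:

```python
import os
import tempfile


def write_html_utf8(path: str, html: str) -> None:
    """Hypothetical helper: write an HTML string as UTF-8, ignoring the locale."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


# Sample content containing U+2212 (MINUS SIGN) and U+00B0 (DEGREE SIGN).
html = '<meta charset="utf-8"><p>Temperature: \u22125\u00b0C</p>'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.html")
    write_html_utf8(path, html)
    # Read the raw bytes back to check what was actually written to disk.
    with open(path, "rb") as f:
        raw = f.read()

round_tripped = raw.decode("utf-8")
```

With an explicit encoding, the bytes on disk match the charset declared in the document regardless of the system locale.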

Precedent

Other Python libraries have fixed the same issue:

Reproduction

On Windows with cp1252 locale:

from htmltools import HTMLDocument, HTML

doc = HTMLDocument(HTML("<p>Temperature: −5°C</p>"), lang="en")
doc.save_html("test.html")  # UnicodeEncodeError

Environment

  • htmltools version: 0.6.0
  • Python 3.12
  • Windows 11, system encoding cp1252
