pygettext.py crashes with UnicodeEncodeError when writing .pot files containing emoji #139873

@mtfreitasf

Description

Bug report

Bug description:

Summary:
pygettext.py fails with a UnicodeEncodeError when the generated .pot file contains characters that the system's default encoding cannot represent, such as emoji (e.g. ✅). This happens on Windows systems whose default (locale) encoding is cp1252, a single-byte code page that cannot represent emoji and many other Unicode symbols.

Steps to Reproduce:

  • Add a translatable string with an emoji to a Python file:
    _("Operation complete ✅")
  • Run pygettext.py to generate the .pot file:
    python pygettext.py -d messages -o messages.pot your_script.py
  • Observe the crash:
    UnicodeEncodeError: 'charmap' codec can't encode character '\u2705' in position 1: character maps to <undefined>
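
The failure does not depend on pygettext itself, so it can be reproduced on any platform by passing cp1252 explicitly, which is the codec open() picks up from the locale on the reporter's system. A minimal sketch, assuming a hypothetical `try_write` helper and an illustrative msgid line (not copied from pygettext's real output):

```python
import os
import tempfile
from typing import Optional

# A line of the kind pygettext writes for the string in the steps above
# (illustrative only, not copied from pygettext's real output).
POT_LINE = 'msgid "Operation complete \u2705"\n'


def try_write(text: str, encoding: str) -> Optional[UnicodeEncodeError]:
    """Write text to a throwaway .pot file; return the encode error, if any."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "messages.pot")
        try:
            # Passing the codec explicitly mimics what open(path, 'w') picks
            # up from the locale; on a Windows pt_BR system that is cp1252.
            with open(path, "w", encoding=encoding) as fp:
                fp.write(text)
        except UnicodeEncodeError as exc:
            return exc
    return None


print(try_write(POT_LINE, "cp1252"))  # a 'charmap' UnicodeEncodeError like the one above
print(try_write(POT_LINE, "utf-8"))   # None
```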

Cause:
The .pot file is opened using:
fp = open(output_file, 'w')

Without an encoding argument, open() uses the locale's preferred encoding (cp1252 on this Windows pt_BR system), which cannot encode emoji or most other characters outside the Western European range.
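
This also explains why the bug can go unnoticed in a pt_BR project: cp1252 covers the accented Latin letters Portuguese needs, so only characters outside it, such as emoji, trigger the error. A small sketch that makes the distinction visible (the `unencodable` helper is hypothetical):

```python
def unencodable(text: str, encoding: str = "cp1252") -> list[str]:
    """Return the characters in text that the given codec cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


# Portuguese accents fit in cp1252; the emoji does not.
print(unencodable("Operação concluída"))  # []
print([f"U+{ord(c):04X}" for c in unencodable("Operation complete \u2705")])  # ['U+2705']
```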

Suggested Fix:
Update the file-writing line in pygettext.py to explicitly use UTF-8:
fp = open(output_file, 'w', encoding='utf-8')

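UTF-8 can represent every Unicode character, so the output no longer depends on the system locale. This also follows current Python practice of passing an explicit encoding when opening text files.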
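
In context, the suggested change amounts to always opening the output with UTF-8 and declaring the same charset in the .pot header. The sketch below is a simplified, hypothetical stand-in for pygettext's output step (the `write_pot` helper and its header are assumptions, not pygettext's actual code):

```python
import os
import tempfile


def write_pot(path: str, msgids: list[str]) -> None:
    """Write a minimal .pot file, always encoded as UTF-8 (hypothetical helper)."""
    # The charset declared in the header should match the bytes on disk.
    header = (
        'msgid ""\n'
        'msgstr ""\n'
        '"Content-Type: text/plain; charset=UTF-8\\n"\n'
        '"Content-Transfer-Encoding: 8bit\\n"\n'
        "\n"
    )
    # Explicit encoding: the locale's default codec is never consulted.
    with open(path, "w", encoding="utf-8") as fp:
        fp.write(header)
        for msgid in msgids:
            # Escape backslashes and double quotes as PO string syntax requires.
            escaped = msgid.replace("\\", "\\\\").replace('"', '\\"')
            fp.write(f'msgid "{escaped}"\nmsgstr ""\n\n')


with tempfile.TemporaryDirectory() as tmp:
    pot = os.path.join(tmp, "messages.pot")
    write_pot(pot, ["Operation complete \u2705"])
    with open(pot, "rb") as f:
        data = f.read()

print("\u2705".encode("utf-8") in data)  # True
```

Because the encoding is fixed in code, the same bytes are produced on Windows, Linux, and macOS regardless of locale.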

Environment:

  • Python version: 3.14
  • OS: Windows 10
  • Locale: pt_BR
  • Default encoding: cp1252
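
The values above that matter for this bug can be collected from the interpreter itself. A small sketch (the `encoding_report` helper is hypothetical):

```python
import locale
import platform
import sys


def encoding_report() -> dict[str, str]:
    """Collect the environment details relevant to this bug (hypothetical helper)."""
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        # The codec open() falls back to when no encoding= is passed.
        "default_encoding": locale.getpreferredencoding(False),
        "utf8_mode": str(bool(sys.flags.utf8_mode)),
    }


for key, value in encoding_report().items():
    print(f"{key}: {value}")
```

On the setup described here, default_encoding would presumably read cp1252 and utf8_mode False. Running the interpreter in UTF-8 mode (python -X utf8, or PYTHONUTF8=1, per PEP 540) makes open() default to UTF-8, which may serve as a stopgap until pygettext passes the encoding explicitly.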

Related Pull Request:
This issue appears related to PR #132244, which discusses encoding behavior in pygettext.py. However, the PR focuses on testing and does not yet address the default encoding used when writing .pot files.

CPython versions tested on:

3.14

Operating systems tested on:

Windows

Labels

  • triaged: The issue has been accepted as valid by a triager.
  • type-bug: An unexpected behavior, bug, or error