Description
Bug report
Bug description:
Summary:
pygettext.py fails with a UnicodeEncodeError when generating .pot files that contain Unicode characters such as emoji (e.g. ✅). This happens on Windows systems where the default encoding is cp1252, which cannot represent many Unicode symbols.
Steps to Reproduce:
- Add a translatable string with an emoji to a Python file:
  _("Operation complete ✅")
- Run pygettext.py to generate the .pot file:
  python pygettext.py -d messages -o messages.pot your_script.py
- Observe the crash:
  UnicodeEncodeError: 'charmap' codec can't encode character '\u2705' in position 1: character maps to <undefined>
Cause:
The .pot file is opened using:
fp = open(output_file, 'w')
Without an encoding argument, open() uses the locale's preferred encoding (cp1252 on this Windows system), which cannot encode emoji or most other characters outside Western European text.
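The encoding mismatch can be checked without running pygettext.py at all. The following is a minimal sketch, not code from pygettext.py: `can_encode` is a hypothetical helper, and cp1252 is passed explicitly so the result is the same on any platform.

```python
import locale

# The string from the reproduction steps, with U+2705 (the check mark emoji).
text = "Operation complete \u2705"


def can_encode(value, encoding):
    """Hypothetical helper: True if `value` is representable in `encoding`."""
    try:
        value.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# The encoding that open() falls back to when no encoding is given
# (cp1252 in the environment described in this report).
print(locale.getpreferredencoding(False))

print(can_encode(text, "cp1252"))  # False
print(can_encode(text, "utf-8"))   # True
```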
Suggested Fix:
Update the file-writing line in pygettext.py to explicitly use UTF-8:
fp = open(output_file, 'w', encoding='utf-8')
UTF-8 can encode any Unicode character, so the output no longer depends on the system locale.
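As a rough illustration of the suggested change, the sketch below writes a .pot-style entry with an explicit encoding and reads the bytes back. `write_pot` and the temporary file are assumptions made for the example; they are not part of pygettext.py.

```python
import os
import tempfile

ENTRY = 'msgid "Operation complete \u2705"\nmsgstr ""\n'


def write_pot(path, content):
    """Hypothetical helper: write `content` as UTF-8 regardless of locale."""
    # newline="" keeps line endings identical on every platform.
    with open(path, "w", encoding="utf-8", newline="") as fp:
        fp.write(content)


# Use a temporary file so the example leaves nothing behind.
fd, pot_path = tempfile.mkstemp(suffix=".pot")
os.close(fd)
try:
    write_pot(pot_path, ENTRY)
    with open(pot_path, "rb") as fp:
        raw = fp.read()
finally:
    os.remove(pot_path)

# U+2705 is stored as the three UTF-8 bytes E2 9C 85.
print(b"\xe2\x9c\x85" in raw)          # True
print(raw.decode("utf-8") == ENTRY)    # True
```

Passing the encoding explicitly means the same bytes are produced on Windows, Linux, and macOS, whatever the active code page is.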
Environment:
- Python version: 3.14
- OS: Windows 10
- Locale: pt_BR
- Default encoding: cp1252
Related Pull Request:
This issue appears related to PR #132244, which discusses encoding behavior in pygettext.py. However, the PR focuses on testing and does not yet address the default encoding used when writing .pot files.
CPython versions tested on:
3.14
Operating systems tested on:
Windows