
Fix a Windows issue where Python codepage would be reverted from unicode to cp1252#26972

Merged
juj merged 1 commit into
emscripten-core:main from
juj:fix_windows_python_cp1252
May 21, 2026

Conversation

@juj
Collaborator

@juj juj commented May 16, 2026

Fix a Windows issue where Python codepage would be reverted from unicode to cp1252, if stdout/stderr was being redirected to a file.

To fix this issue, pass the -X utf8 command line parameter whenever Python's -E flag is being used.

Fixes test other.test_wasm_sourcemap_relative_paths on Windows when build is driven by Buildbot CI. See buildbot/buildbot#9047 for related info.
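For illustration, here is a minimal sketch of the flag combination described above. It is not the code from this PR: the helper name `child_stdout_settings` is hypothetical, and it spawns a child interpreter with stdout redirected to a pipe rather than going through Emscripten's launcher scripts.

```python
import subprocess
import sys

def child_stdout_settings(flags):
    """Run a child interpreter with stdout redirected to a pipe and
    return (utf8_mode, stdout_encoding) as reported by the child."""
    probe = ("import sys; "
             "print(int(sys.flags.utf8_mode), sys.stdout.encoding)")
    result = subprocess.run([sys.executable, *flags, "-c", probe],
                            capture_output=True, check=True)
    mode, encoding = result.stdout.decode("ascii").split()
    return int(mode), encoding.lower()

# -E ignores all PYTHON* environment variables; -X utf8 enables UTF-8 mode
# from the command line, so it still applies when -E is present.
with_fix = child_stdout_settings(["-E", "-X", "utf8"])
print(with_fix)
```

With `-X utf8`, the child should report UTF-8 mode as enabled and a UTF-8 stdout regardless of the console codepage or whether the stream is redirected.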

@juj juj enabled auto-merge (squash) May 16, 2026 21:15
@sbc100
Collaborator

sbc100 commented May 17, 2026

Is this because you are setting PYTHONUTF8 and -E is then ignoring it?

@juj
Collaborator Author

juj commented May 17, 2026

No. I do not set PYTHONUTF8=1 on my CI.

I first tried setting PYTHONUTF8=1 as the fix, but it did nothing (which is expected, since -E is being passed).
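A small sketch of why the environment variable has no effect here, assuming a stock CPython: the hypothetical helper `child_flags` sets PYTHONUTF8=1 for a child interpreter and reports whether the child honoured it, with and without -E.

```python
import os
import subprocess
import sys

def child_flags(flags):
    """Run a child interpreter with PYTHONUTF8=1 in its environment and
    return (utf8_mode, ignore_environment) as reported by the child."""
    env = dict(os.environ, PYTHONUTF8="1")
    probe = ("import sys; "
             "print(int(sys.flags.utf8_mode), "
             "int(sys.flags.ignore_environment))")
    result = subprocess.run([sys.executable, *flags, "-c", probe],
                            env=env, capture_output=True, check=True)
    mode, ignore_env = result.stdout.decode("ascii").split()
    return int(mode), int(ignore_env)

honoured = child_flags([])     # PYTHONUTF8=1 is read, UTF-8 mode turns on
ignored = child_flags(["-E"])  # PYTHON* variables are skipped entirely
print(honoured, ignored)
```

With -E, the UTF-8 mode the child ends up in is whatever the interpreter defaults to on that platform and version, so the sketch only checks that the environment was ignored.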

@sbc100
Collaborator

sbc100 commented May 17, 2026

Is there any reason somebody might want to write something other than utf-8 to stdout/stderr?

What kind of output are we generating that is non-ASCII? i.e. which test fails?

I'm a little worried that we could break some other use case here because -X utf8 also ignores the system encoding. As well as affecting stdout/stderr, it apparently also affects sys.getfilesystemencoding() and locale.getpreferredencoding().

On the other hand, we go out of our way to always write files explicitly in utf-8 in almost all cases, so maybe this is fine?

There is one place where we specifically do something different: expand_response_file in tools/response_file.py. However, looking at that function, it looks like the attached comment regarding locale.getpreferredencoding might be out-of-date?

@sbc100
Collaborator

sbc100 commented May 17, 2026

Yup, it looks like the comment in response_file.py #15426 was maybe never accurate?

@juj
Collaborator Author

juj commented May 18, 2026

Is there any reason somebody might want to write something other than utf-8 to stdout/stderr?

I don't know the answer to that. Currently I don't know of a use case here.

What kind of output are we generating that is non-ASCII? i.e. which test fails?

Test other.test_wasm_sourcemap_relative_paths fails on Windows when build is driven by Buildbot CI. It attempts to print the name of a file during the test to stdout:

test('A ä☃ö Z.cpp')

which in this test is 'A ä☃ö Z.cpp'.

I'm a little worried that we could break some other use case here because -X utf8 also ignores the system encoding.

If the system encoding is CP437 or CP1252, then any attempt to print() a character that is not part of that encoding makes Python's print() function throw. That would raise an exception anywhere in Emscripten's internals that happens to print a Unicode character, whether it comes from a filename or from the contents of a source file.

See e.g. http://clbri.com:8010/api/v2/logs/444660/raw_inline where it happened in the test.
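The failure mode can be reproduced without a Windows console. This is only a sketch: it uses an in-memory `io.TextIOWrapper` as a stand-in for a redirected stdout with a given codepage, and the helper name `try_print` is made up for the example.

```python
import io

name = "A ä☃ö Z.cpp"  # the filename used by the failing test

def try_print(encoding):
    """Print `name` to an in-memory text stream that uses `encoding`,
    standing in for a stdout redirected under that codepage."""
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        print(name, file=stream)
        stream.flush()
    except UnicodeEncodeError as exc:
        # Report which character could not be encoded.
        return f"failed on {exc.object[exc.start:exc.end]!r}"
    return "ok"

results = {enc: try_print(enc) for enc in ("cp1252", "cp437", "utf-8")}
print(results)
```

Both legacy codepages can encode ä and ö, so it is the ☃ that trips the encoder in this sketch.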

Yup, it looks like the comment in response_file.py #15426 was maybe never accurate?

Not sure which comment?

IIUC, the encoding of response files is somewhat orthogonal to the encoding of the stdout/stderr streams?

I think the options we have here are to either run as -X utf8, or alternatively monkeypatch Python's print() so that it doesn't throw when attempting to print a character that cannot be encoded in the current stdout/stderr codepage.
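As a sketch of the second option, and swapping in a different technique than literally monkeypatching print(): a text stream can be given a lossy error handler so that unencodable characters are escaped instead of raising. The in-memory stream below is a stand-in for a cp1252 stdout, not anything from the Emscripten codebase.

```python
import io

name = "A ä☃ö Z.cpp"

# Stand-in for a redirected stdout that uses the cp1252 codepage.
# newline="\n" keeps the output identical on every platform.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp1252",
                          errors="backslashreplace", newline="\n")

# With errors="backslashreplace" the snowman is written as an escape
# sequence rather than raising UnicodeEncodeError.
print(name, file=stream)
stream.flush()
written = raw.getvalue()
print(written)

# On a real process the equivalent would be, assuming sys.stdout is a
# regular io.TextIOWrapper:
#   sys.stdout.reconfigure(errors="backslashreplace")
```

The tradeoff is that the output is no longer faithful: the character survives only as an escape, whereas -X utf8 preserves it.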

@juj
Collaborator Author

juj commented May 21, 2026

Ping, more thoughts here?

Collaborator

@sbc100 sbc100 left a comment

I don't love how we don't really understand why -E has this effect on the codepage, but lgtm if this fixes your proximate issue.

Is there some way we can write a test for this so that we don't break it when we move to executable (.exe) entry points?

@juj juj merged commit a2bfe33 into emscripten-core:main May 21, 2026
30 checks passed
@juj
Collaborator Author

juj commented May 21, 2026

Is there some way we can write a test for this

The test other.test_wasm_sourcemap_relative_paths does currently test this, since it prints ☃ during the test.

For more coverage, creating tests that have e.g. ☃ as part of a filename, or ☃ as part of a C++ file, which gets printed out by clang as part of diagnostics warning/error output, or ☃ in a JS file, which gets printed out as a warning/error print, could improve the coverage here.
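A rough sketch of what a dedicated check could look like, assuming it is acceptable to spawn a bare interpreter rather than the real Emscripten entry points. The helper `print_snowman_to_file` is hypothetical and is not part of the existing test suite; it redirects the child's stdout to a file, which is the condition under which the codepage issue showed up.

```python
import os
import subprocess
import sys
import tempfile

def print_snowman_to_file(flags):
    """Run a child interpreter that prints U+2603 with stdout redirected
    to a file, and return the raw bytes that ended up in the file."""
    # The escape is expanded by the child, so the command line stays ASCII.
    probe = r"print('\u2603', end='')"
    fd, path = tempfile.mkstemp(suffix=".log")
    try:
        with os.fdopen(fd, "wb") as log:
            subprocess.run([sys.executable, *flags, "-c", probe],
                           stdout=log, check=True)
        with open(path, "rb") as log:
            return log.read()
    finally:
        os.remove(path)

captured = print_snowman_to_file(["-E", "-X", "utf8"])
print(captured)
```

If the child fell back to a legacy codepage, `check=True` would surface the resulting UnicodeEncodeError as a failed subprocess instead of a silent mismatch.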

@sbc100
Collaborator

sbc100 commented May 21, 2026

Is there some way we can write a test for this

The test other.test_wasm_sourcemap_relative_paths does currently test this, since it prints ☃ during the test.

For more coverage, creating tests that have e.g. ☃ as part of a filename, or ☃ as part of a C++ file, which gets printed out by clang as part of diagnostics warning/error output, or ☃ in a JS file, which gets printed out as a warning/error print, could improve the coverage here.

But will that test fail on the emscripten CI Windows bot? Or the emscripten-release Windows bot? Or does it require some kind of extra setup that your Windows bot has?
