Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tool to print code page information on Windows #1918

Merged
merged 2 commits into from Feb 26, 2021

Conversation

bradking
Copy link
Contributor

Since #1915, ninja does not always expect build.ninja to be encoded in the system's ANSI code page. The expected encoding now depends on how ninja is built and the version of Windows on which it is running.

Add a -t wincodepage tool that generators can use to ask ninja what encoding it expects.

Issue: #1195

@bradking
Copy link
Contributor Author

Cc: @jhasse @KyleFromKitware

@bradking
Copy link
Contributor Author

With ninja built by CMake, running in Windows Console on Windows 10 1903+:

>ninja -t wincodepage
ANSI code page: 65001
Console code page: 437

With ninja built by CMake, running in mintty on Windows 10 1903+:

$ ninja -t wincodepage
ANSI code page: 65001
Console code page: 65001

With ninja built by configure.py, running in Windows Console on Windows 10 1903+:

>ninja -t wincodepage
ANSI code page: 1252
Console code page: 437

With ninja built by configure.py, running in mintty on Windows 10 1903+:

$ ninja -t wincodepage
ANSI code page: 1252
Console code page: 65001

With ninja built by either CMake or configure.py, running in Windows Console on Windows 7:

>ninja -t wincodepage
ANSI code page: 1252
Console code page: 437

@KyleFromKitware
Copy link

KyleFromKitware commented Feb 26, 2021

This output is not machine readable, and a detailed description of the output is not provided in the documentation.

+1 for the overall concept though.

@bradking
Copy link
Contributor Author

This output is not machine readable

Sure it is. It's a key: value syntax, with one entry per line. I'm open to alternative suggestions, but the format should remain very simple (e.g. no JSON).

@KyleFromKitware
Copy link

You're quite right... I guess that didn't occur to me. It's so human-readable that it didn't occur to me that it didn't occur to me that it might be machine-readable too.

+1

Group all the paragraphs together in the definition list entry.
Since commit 00459e2 (Use UTF-8 on Windows 10 Version 1903, fix ninja-build#1195,
2021-02-17), `ninja` does not always expect `build.ninja` to be encoded
in the system's ANSI code page.  The expected encoding now depends on
how `ninja` is built and the version of Windows on which it is running.

Add a `-t wincodepage` tool that generators can use to ask `ninja`
what encoding it expects.

Issue: ninja-build#1195
@bradking
Copy link
Contributor Author

I updated the branch to specify the tool's output format in the documentation. I also fixed formatting of the missingdeps tool's documentation in a separate commit.

@jhasse jhasse merged commit 1a80150 into ninja-build:master Feb 26, 2021
@jhasse
Copy link
Collaborator

jhasse commented Feb 26, 2021

Thanks!

@bradking bradking deleted the windows-code-page branch February 26, 2021 16:26
@KyleFromKitware
Copy link

We are now working on using this on our end - please see https://gitlab.kitware.com/cmake/cmake/-/merge_requests/5860.

@tristanlabelle
Copy link

tristanlabelle commented Feb 26, 2021

@bradking , @jhasse Code pages are super tricky. I have some feedback on the design:

The concept of supporting multiple codepages is something we should move away in favor of using utf-8 as much as possible. For this reason, I think a better design would be to have ninja report whether it supports utf-8 as a boolean, yes or no. If yes, then its build configuration files should be utf-8. If no, then they should be ANSI, which is a system-global setting that can be retrieved by cmake. This avoids having implicit support for ninja reporting that it operates in a third codepage that is neither the system ANSI nor utf-8.

The console code page is an extremely tricky piece of legacy and reporting it can only cause misinterpretations. What the console code page controls is how the console decodes the stdout of a program attached to it. When a program's output is redirected, there is no reason to honor the console code page and the Windows console team advocates always using utf-8 in this case for interoperability between the programs involved. This gets more complicated by the fact that not all programs honor this rule, and ninja pipes the output of other program to its own. Ninja can decide in which encoding to write its own output text, but it has no control over the encoding used by the programs it calls, which it pipes back to its own stdout. This is prone to leading to a mixed encoding stream.

Moreover, the console code page from GetConsoleOutputCP() is a property of the console window to which Ninja is attached, not of the Ninja process. It could be different depending on the context of a specific Ninja invocation, so it is generally incorrect to ask it in one invocation and cache the information for later invocations.

My suggestion is to treat the stdout of ninja as utf-8 at all times (even when that might not be true), which I think is what cmake already does. This might cause corruption in some ill-behaved programs, but is the only reasonable way to solve code page issues in the long run.

Summary:

  • ANSI code page: change to utf-8 boolean
  • Console code page: remove

@bradking
Copy link
Contributor Author

bradking commented Mar 1, 2021

@MrTrillian thanks. I included the console code page for use in debugging encoding problems when running ninja in a given console. It's useful information. Having the full ANSI code page is also useful to debug encoding problems. The change on CMake's side will treat the result as a boolean == 65001.

@bradking
Copy link
Contributor Author

bradking commented Mar 1, 2021

#1920 clarifies the documentation.

@tristanlabelle
Copy link

@bradking If you're in a cmd.exe prompt and about to run ninja, chcp will give you the same results as starting ninja and having ninja report GetConsoleCP()/GetConsoleOutputCP() since it'll be sharing the same console. Similarly, if you're in a process like cmake running in a console and about to call CreateProcess("ninja"), calling GetConsoleCP()/GetConsoleOutputCP() yourself will give the same results as asking ninja to do it since it'll be sharing the same console. Given this, there is no value in having ninja report the console code page, and it seems prone to be misused or misunderstood.

For the ANSI code page, cmake might know to only interpret it as == 65001, but other developers seeing ninja report this information might think that they have to handle the full gamut of codepages in case ninja ever reports one of them as its ANSI codepage. In my opinion, the additional information is more problematic than useful in this case.

@bradking
Copy link
Contributor Author

bradking commented Mar 1, 2021

@MrTrillian if ninja.exe is invoked in some arbitrary context, not manually in a console window, then the console window created under the hood for the process is not something a human can access to run chcp. Showing it with -t wincodepage could be an important debugging tool.

other developers seeing ninja report this information might think that they have to handle the full gamut of codepages

Not if they read the documentation updated as in #1920.

@tristanlabelle
Copy link

@bradking In that case, the console window created for ninja will have its console code page initialized to the OEM code page, which the starting process could have retrieved using GetOEMCP() since that's a system-wide setting. But why would the starting process even care about this? There is nothing useful it can do with that information (if ninja's output is redirected, it shouldn't try to honor its console code page).

Relying on documentation is more brittle than transmitting targeted info.

@bradking
Copy link
Contributor Author

bradking commented Mar 1, 2021

#1921 minimizes information in the output as proposed above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants