Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Codepage issue on Windows 8.1 #8263

Closed
ad8e opened this issue Jan 27, 2021 · 28 comments
Closed

Codepage issue on Windows 8.1 #8263

ad8e opened this issue Jan 27, 2021 · 28 comments
Labels
bug OS:windows Winodows OS specific issues

Comments

@ad8e
Copy link

ad8e commented Jan 27, 2021

Describe the bug
I receive a build failure when building harfbuzz through vcpkg. vcpkg maintainers believe it is a meson upstream bug with codepages. The downstream bug is microsoft/vcpkg#15916. I have no ability to file a sane bug report, since I know nothing. However, I have logs.

To Reproduce
I can only reproduce this when building harfbuzz with vcpkg, using ./vcpkg install harfbuzz. I don't receive this error when building harfbuzz with the project's native meson instructions without vcpkg. These builds are on the same system, but my native build uses mingw-w64's gcc, while the vcpkg build uses Visual Studio. The compiler isn't yet invoked when the build error occurs, so I don't believe this difference of compilers is the cause.

Expected behavior
I expected it to build cleanly. This is the log:

[3/66] "C:\Users\adm\Downloads\vcpkg\downloads\tools\python\python-3.9.0-x64\python.exe" "C:\Users\adm\Downloads\vcpkg\downloads\tools\meson\meson-f5871f434a5b768ad9fcafa797a6db0286421842\meson.py" "--internal" "depscan" "src/harfbuzz.dll.p/harfbuzz.dat" src/harfbuzz.dll.p/depscan.dd ../src/f4d3aa1127-a820a32874.clean/src/hb-aat-layout.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-aat-map.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-blob.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-buffer-serialize.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-buffer.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-common.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-draw.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-face.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-fallback-shape.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-font.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-map.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-number.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-cff1-table.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-cff2-table.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-color.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-face.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-font.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-layout.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-map.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-math.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-meta.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-metrics.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-name.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-arabic.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-default.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-hangul.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-hebrew.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-indic-table.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-indic.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-khmer.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-myanmar.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-thai.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-use-table.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-use.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-vowel-constraints.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-fallback.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-normalize.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-tag.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-var.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-set.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-shape-plan.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-shape.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-shaper.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-static.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-style.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ucd.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-unicode.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ft.cc
FAILED: src/harfbuzz.dll.p/depscan.dd 
"C:\Users\adm\Downloads\vcpkg\downloads\tools\python\python-3.9.0-x64\python.exe" "C:\Users\adm\Downloads\vcpkg\downloads\tools\meson\meson-f5871f434a5b768ad9fcafa797a6db0286421842\meson.py" "--internal" "depscan" "src/harfbuzz.dll.p/harfbuzz.dat" src/harfbuzz.dll.p/depscan.dd ../src/f4d3aa1127-a820a32874.clean/src/hb-aat-layout.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-aat-map.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-blob.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-buffer-serialize.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-buffer.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-common.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-draw.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-face.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-fallback-shape.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-font.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-map.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-number.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-cff1-table.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-cff2-table.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-color.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-face.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-font.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-layout.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-map.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-math.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-meta.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-metrics.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-name.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-arabic.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-default.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-hangul.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-hebrew.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-indic-table.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-indic.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-khmer.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-myanmar.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-thai.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-use-table.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-use.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-complex-vowel-constraints.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-fallback.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape-normalize.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-shape.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-tag.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ot-var.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-set.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-shape-plan.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-shape.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-shaper.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-static.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-style.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ucd.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-unicode.cc ../src/f4d3aa1127-a820a32874.clean/src/hb-ft.cc
Traceback (most recent call last):
  File "C:\Users\adm\Downloads\vcpkg\downloads\tools\meson\meson-f5871f434a5b768ad9fcafa797a6db0286421842\meson.py", line 29, in <module>
    sys.exit(mesonmain.main())
  File "C:\Users\adm\Downloads\vcpkg\downloads\tools\meson\meson-f5871f434a5b768ad9fcafa797a6db0286421842\mesonbuild\mesonmain.py", line 229, in main
    return run(sys.argv[1:], launcher)
  File "C:\Users\adm\Downloads\vcpkg\downloads\tools\meson\meson-f5871f434a5b768ad9fcafa797a6db0286421842\mesonbuild\mesonmain.py", line 218, in run
    return run_script_command(args[1], args[2:])
  File "C:\Users\adm\Downloads\vcpkg\downloads\tools\meson\meson-f5871f434a5b768ad9fcafa797a6db0286421842\mesonbuild\mesonmain.py", line 166, in run_script_command
    return module.run(script_args)
  File "C:\Users\adm\Downloads\vcpkg\downloads\tools\meson\meson-f5871f434a5b768ad9fcafa797a6db0286421842\mesonbuild\scripts\depscan.py", line 91, in run
    return scanner.scan()
  File "C:\Users\adm\Downloads\vcpkg\downloads\tools\meson\meson-f5871f434a5b768ad9fcafa797a6db0286421842\mesonbuild\scripts\depscan.py", line 64, in scan
    self.scan_file(s)
  File "C:\Users\adm\Downloads\vcpkg\downloads\tools\meson\meson-f5871f434a5b768ad9fcafa797a6db0286421842\mesonbuild\scripts\depscan.py", line 37, in scan_file
    for line in pathlib.Path(fname).read_text().split('\n'):
  File "pathlib.py", line 1256, in read_text
  File "encodings\cp1252.py", line 23, in decode
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 36961: character maps to <undefined>

Attached: src\harfbuzz.dll.p\harfbuzz.dat
Attached: package-x64-windows-rel-out.log, probably no need to look at this

system parameters

  • native build
  • Windows 8.1
  • Python 3.9.1
  • meson 0.56.2
  • ninja 1.10.2
@dcbaker dcbaker added bug OS:windows Winodows OS specific issues labels Jan 27, 2021
@dcbaker
Copy link
Member

dcbaker commented Jan 27, 2021

Yes, that looks like our bug. We should probably be using utf-8 or ascii for the encoding. What do you think @jpakkane (since this is your code)?

I don't have access to windows 8 anymore, If we wrote a patch would you be able to test it?

@ad8e
Copy link
Author

ad8e commented Jan 27, 2021

I can make a best effort, run code/patches you send me, and rummage around trying things. However, I am totally new to meson and vcpkg.

@dcbaker
Copy link
Member

dcbaker commented Jan 27, 2021

I think this is likely a meson + harfbuzz problem, looking at the output. It would be interesting to see if you can reproduce this building harfbuzz with meson outside vcpkg.

@ad8e
Copy link
Author

ad8e commented Jan 28, 2021

I just tried building harfbuzz-master with meson outside vcpkg, and it was successful (no errors). I tried building the harfbuzz tarball that vcpkg downloads, and also succeeded with meson build --wrap-mode=default -Dfontconfig=disabled -Dtests=disabled -Ddocs=disabled. (There are a lot of steps skipped here, involving freetype.) So I couldn't reproduce the error outside vcpkg.

@kusma
Copy link
Contributor

kusma commented Jun 22, 2021

We're currently triggering this same issue in Mesa on Windows, so maybe it's worth tossing out a patch that passes an encoding-argument to read_text() is worth a go.

@kusma
Copy link
Contributor

kusma commented Jun 22, 2021

Some more digging shows that this Python change is what caused the regression in Mesa.

@mensinda
Copy link
Member

@kusma Could you also report this to upstream Python, since this is a Python bug apparently?

Also, we are using Path.read_text() all over the place, so we likely need to patch pathlib for Windows to reliably work around this...

@kusma
Copy link
Contributor

kusma commented Jun 22, 2021

I already did: https://bugs.python.org/issue44487

@mensinda
Copy link
Member

Seems like with #7295 we should really consider patching pathlib (although I am of the opinion that this really should be the job of the python devs)...

@kusma
Copy link
Contributor

kusma commented Jun 22, 2021

That ticket only seem superficially connected to this one as far as I can tell; #7295 seems to be about the path, whereas this is about the encoding of the content of the file.

@mensinda
Copy link
Member

Yeah, but the solution for Meson is the same: Patch the pathlib module to work around upstream Python bugs.

@kusma
Copy link
Contributor

kusma commented Jun 22, 2021

Yeah, fair enough.

@nirbheek
Copy link
Member

fwiw, I would recommend not waiting for Python to fix this issue. The best way forward IMHO is:

  1. Monkey-patch pathlib in Meson to do the right thing
  2. Submit a patch upstream to do the right thing
  3. Wait (this may take a long time)

@mensinda
Copy link
Member

Already working on it :)

@jpakkane
Copy link
Member

Rather than monkeypatching, it would be better to change the call sites. The thing about monkeypatching is that it always comes back to bite you in the ass.

@kusma
Copy link
Contributor

kusma commented Jun 22, 2021

Here's a minimal reproduction:

lib.cpp:

extern const char *utf8_encoded[] = {
    "", "", "",
    "", "", "", "",
    "", "", "", "",
    "", "", "", ""
};

meson.build:

project('windows-encoding-issue', ['cpp'],
  default_options : ['cpp_std=vc++latest'])

test_lib = static_library(
  'test-lib',
  files('lib.cpp'),
  build_by_default : true,
)

@mensinda
Copy link
Member

@jpakkane The problem with changing the call sites is that there are just so many, and that every .resolve() call has to be wrapped inside a try: except: blocks. It is also far too easy to miss passing an explicit encoding to read_text() (both when writing and in the code review).

With monkey patching, we have a dirty hack (just for Windows) in one location and don't have to worry about the remaining code base. Not saying that I like the idea, but that it is better (IMHO) than fixing 66 (for .resolve()) and 25 (for .read_text()) call sites all over the project.

@nirbheek
Copy link
Member

Rather than monkeypatching, it would be better to change the call sites. The thing about monkeypatching is that it always comes back to bite you in the ass.

I agree, if it's not too difficult to find all the call-sites. I thought the problem in this case was that not just that pathlib.resolve() was broken, but that other pathlib functions are also affected, and that it would be difficult to figure out which ones might decide to call pathlib.resolve() internally?

@kusma
Copy link
Contributor

kusma commented Jun 22, 2021

It seems that reproduction case, which is taken from the mesa sources, fails even with Meson 0.57.2 and with Python 3.9.5. So It's a bit unclear to me if the python change I blamed is the actual culprit.

Still, we're seeing this passing on CI with recent-ish images, so something changed. Unfortunately, I think we've lost the logs for when the current base-image was built, so I don't think it's easy to inspect the details closely. There's a lot of conditions in should_use_dyndeps_for_target, and I guess if any of those conditions failed, we'd end up dodging the issue.

Maybe this is about the VS version instead of all the other things I've suspected so far? 26ffd4f mentions "Currently only the preview version of Visual Studio is supported". And since this triggers for me, it seems a new non-preview Visual Studio version has been released, and it might support this. So if our build-image all of a sudden picked up a new Visual Studio version, it could be that we started triggering a previously latent bug.

@mensinda
Copy link
Member

I agree, if it's not too difficult to find all the call-sites.

Even then I would argue that we have way too many and no automatic way to ensure that they are correctly maintained...

@mensinda
Copy link
Member

Looks like this issue at least really isn't a python bug, but a major python annoyance:

On Windows, UTF-8 is not the default charmap, but something else based on the locale:
https://dev.to/methane/python-use-utf-8-mode-on-windows-212i
https://www.python.org/dev/peps/pep-0540/

This results in your error when meson reads files on Windows.

@mensinda
Copy link
Member

@kusma Setting PYTHONUTF8=1 should fix the error in the meantime.

@methane
Copy link

methane commented Jun 22, 2021

Python 3.10 has PYTHONWARNDEFAULTENCODING environment variable. You can find missing encoding="utf-8" by it.
If you don't want to add encoding="utf-8" everywhere, you can consider PYTHONUTF8 environment variable.

@kusma
Copy link
Contributor

kusma commented Jun 22, 2021

Nice, setting PYTHONUTF8=1 does indeed work around the issue. Thanks :)

@mensinda
Copy link
Member

Unfortunately, PYTHONUTF8 is a Python 3.7 feature, so we can't depend on it since we are still stuck on 3.6

@kusma
Copy link
Contributor

kusma commented Jun 22, 2021

Yeah, but we can use it on CI for Mesa until a real fix has landed :)

Just my two cents; I think just patching this particular call to read_text() to take an UTF-8 encoding is the right thing to do; it's the only case where we know that opening it as CP1252 is a bad idea. Windows doesn't really have a UTF-8 locale (or at least not all Windows versions do), so CP1252 might be a reasonable default for a lot of things. But C++ sources are not CP1252 (these days, anyway).

@mensinda
Copy link
Member

We mostly use .read_text() for source files and JSON files. Since both types of files should be UTF-8, assuming that Meson only interacts with UTF-8 should be reasonable.

@mensinda
Copy link
Member

Should be fixed/worked around with #8918.

jpakkane added a commit that referenced this issue Jun 29, 2021
pathlib: Patch pathlib to work around some bugs (fixes #8263 #7295)
nirbheek pushed a commit that referenced this issue Jul 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug OS:windows Winodows OS specific issues
Projects
None yet
Development

No branches or pull requests

7 participants