ld does not work with non-ASCII file path #649

Open
CyanoHao opened this issue Jun 24, 2023 · 6 comments · Fixed by #661

Comments

@CyanoHao
Contributor

These are upstream bugs (two, as far as I know), and it seems they can be fixed with a patch for binutils.

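1st bug: slash conversion in ld breaks wide filenames.

(screenshot: bug1-slash)

This is caused by a wrong variable name in FILE *_bfd_real_fopen(const char *filename, const char *modes) (binutils-2.39/bfd/bfdio.c). The loop tests the multibyte string filename but writes to the wide string partPath at the same index, and the two indices only line up for pure-ASCII paths.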

   /* Convert any UNIX style path separators into the DOS i.e. backslash separator.  */
   for (ix = 0; ix < partPathLen; ix++)
     if (IS_UNIX_DIR_SEPARATOR(filename[ix]))
       partPath[ix] = '\\';

It should be

   for (ix = 0; ix < partPathLen; ix++)
     if (IS_UNIX_DIR_SEPARATOR(partPath[ix]))
       partPath[ix] = L'\\';  // prefix `L` is optional
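
To make the index mismatch concrete, here is a minimal, portable sketch of the two loops. It is not the binutils code: the function names convert_slashes_buggy and convert_slashes_fixed are hypothetical, the separator test is reduced to a plain '/' comparison instead of IS_UNIX_DIR_SEPARATOR, and the loops are additionally bounded by the wide length (an assumption made here only to keep the sketch memory-safe; the patch itself keeps partPathLen as the bound).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for the original loop: the separator test reads the
   multibyte string, but the write goes to the wide string at the same index. */
void convert_slashes_buggy(const char *bytes, wchar_t *wide, size_t wide_len)
{
  assert(bytes != NULL && wide != NULL);
  size_t byte_len = strlen(bytes) + 1;
  /* The extra "ix < wide_len" bound is an addition for this sketch only. */
  for (size_t ix = 0; ix < byte_len && ix < wide_len; ix++)
    if (bytes[ix] == '/')
      wide[ix] = L'\\';
}

/* Hypothetical stand-in for the corrected loop: test and write use the same
   wide string, so the index always refers to the same character. */
void convert_slashes_fixed(wchar_t *wide, size_t wide_len)
{
  assert(wide != NULL);
  for (size_t ix = 0; ix < wide_len; ix++)
    if (wide[ix] == L'/')
      wide[ix] = L'\\';
}
```

Take the path 编/a.o as an example. Encoded as UTF-8 it is the bytes E7 BC 96 2F 61 2E 6F, so the '/' sits at byte index 3, while in the wide string it sits at index 1. The buggy variant therefore leaves the real separator untouched and overwrites the '.' at wide index 3; the fixed variant replaces only the separator.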

2nd bug: ld gets the wrong active code page from ___lc_codepage_func() (MSVCRT only).

(screenshot: bug2-codepage)

This is caused by a subtle problem in FILE *_bfd_real_fopen(const char *filename, const char *modes) (binutils-2.39/bfd/bfdio.c). In short, the MSVCRT version of UINT ___lc_codepage_func(void), which ld calls to determine the current code page, returns neither the system code page nor the active code page. Instead, it returns the default code page for the Windows display language. (The UCRT version is okay.)

Changing ___lc_codepage_func() to CP_ACP seems to fix this bug, but I’m not sure whether some corner cases would break.

Here is a simple program to test ___lc_codepage_func().

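#include <iostream>
#include <locale.h>  // setlocale(); mingw-w64 is expected to declare ___lc_codepage_func() here too

int main() {
  setlocale(LC_CTYPE, "");  // switch from the "C" locale to the user's default locale
  std::cout << ___lc_codepage_func() << std::endl;  // code page as seen by the CRT
}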

Results with release 13.1.0-rt_v11-rev1, x86-64 POSIX SEH:

  1. Windows display language: English (UK); system and active code page: 936 (Simplified Chinese).
     MSVCRT   UCRT    Expected
     1252     936     936
  2. Windows display language: English (UK); system code page: 936; active code page: 65001 (with application manifest).
     MSVCRT   UCRT    Expected
     1252     65001   65001
  3. Windows display language: English (UK); system and active code page: 65001 (check “Beta: Use Unicode UTF-8 for worldwide language support”).
     MSVCRT   UCRT    Expected
     1252     65001   65001
  4. Windows display language: Simplified Chinese; system and active code page: 65001.
     MSVCRT   UCRT    Expected
     936      65001   65001

(MSVCRT result for cases 1 and 2: screenshot codepage-msvcrt)

(UCRT result for cases 1 and 2: screenshot codepage-ucrt)

Patch

diff --unified --recursive --text binutils-2.39.orig/bfd/bfdio.c binutils-2.39/bfd/bfdio.c
--- binutils-2.39.orig/bfd/bfdio.c      2022-07-08 17:46:47.000000000 +0800
+++ binutils-2.39/bfd/bfdio.c   2023-06-24 19:56:02.752090800 +0800
@@ -122,7 +122,7 @@
    const wchar_t  prefix[] = L"\\\\?\\";
    const size_t   partPathLen = strlen (filename) + 1;
 #ifdef __MINGW32__
-   const unsigned int cp = ___lc_codepage_func();
+   const unsigned int cp = CP_ACP;
 #else
    const unsigned int cp = CP_UTF8;
 #endif
@@ -138,8 +138,8 @@

    /* Convert any UNIX style path separators into the DOS i.e. backslash separator.  */
    for (ix = 0; ix < partPathLen; ix++)
-     if (IS_UNIX_DIR_SEPARATOR(filename[ix]))
-       partPath[ix] = '\\';
+     if (IS_UNIX_DIR_SEPARATOR(partPath[ix]))
+       partPath[ix] = L'\\';

    /* Getting the full path from the provided partial path.
       1) Get the length.

By the way, if someone would like to fix this upstream, a minor problem in the same function could be fixed as well:

   wchar_t *  fullPath = calloc (fullPathWSize + sizeof(prefix) + 1, sizeof(wchar_t));

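A length of fullPathWSize + (sizeof(prefix) / sizeof(wchar_t) - 1) + 1 is sufficient, because sizeof(prefix) is a byte count that also includes the terminating null, whereas calloc here expects a number of wchar_t elements.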
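
The difference between the two counts can be checked with a small sketch. The helper names alloc_count_original and alloc_count_suggested are hypothetical and exist only for this illustration; the prefix literal and the two expressions are copied from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Same prefix as in _bfd_real_fopen: four characters plus the terminating null. */
static const wchar_t prefix[] = L"\\\\?\\";

/* Element count requested by the existing calloc call: sizeof(prefix) is a
   byte count that also includes the terminating null. */
size_t alloc_count_original(size_t fullPathWSize)
{
  return fullPathWSize + sizeof(prefix) + 1;
}

/* Element count suggested above: prefix length in characters, without its null. */
size_t alloc_count_suggested(size_t fullPathWSize)
{
  size_t prefix_chars = sizeof(prefix) / sizeof(wchar_t) - 1;
  assert(prefix_chars == wcslen(prefix));  /* sanity check: both give 4 */
  return fullPathWSize + prefix_chars + 1;
}
```

With a 2-byte wchar_t, as on Windows, the existing call requests fullPathWSize + 11 elements, while the suggested expression requests fullPathWSize + 5. The over-allocation is harmless, which is why this is only a minor cleanup.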

@niXman
Owner

niXman commented Sep 13, 2023

@CyanoHao could you please provide the PR for the develop branch?

@niXman
Owner

niXman commented Oct 29, 2023

@CyanoHao do you want me to release a new build with this patch before closing this issue?

@niXman niXman reopened this Oct 29, 2023
@xuchengpeng

@CyanoHao do you want me to release a new build with this patch before closing this issue?

Same problem here. Please release a new build, thanks.

@anbangli

@CyanoHao
Please help solve the following problem.

If the MinGW-w64 installation directory contains Chinese characters, compilation fails.
For example, suppose MinGW-w64 is installed in the directory "C:\编译器MinGW64", which contains Chinese characters, and one compiles the program "C:\myprogs\hello.cpp" with the following command:

C:\编译器MinGW64\bin\g++.exe "C:\myprogs\hello.cpp" -o "C:\编译器hello.exe" -Wall -Wextra -pipe -I"C:\编译器MinGW64\include" -I"C:\编译器MinGW64\x86_64-w64-mingw32\include" -I"C:\编译器MinGW64\lib\gcc\x86_64-w64-mingw32\13.2.0\include" -I"C:\编译器MinGW64\lib\gcc\x86_64-w64-mingw32\13.2.0\include\c++" -L"C:\编译器MinGW64\lib" -L"C:\编译器MinGW64\x86_64-w64-mingw32\lib" -static-libstdc++ -static-libgcc

The compilation fails with the following output:

C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o: No space left on device
C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/crtbegin.o: No space left on device
C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lstdc++: No space left on device
C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lgcc: No space left on device
C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lgcc_eh: No space left on device
C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lgcc: No space left on device
C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lgcc_eh: No space left on device
C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find C:/MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/13.2.0/crtend.o: No space left on device
collect2.exe: error: ld returned 1 exit status

It seems that the Chinese characters "编译器" are dropped at some internal stage.

@anbangli

I also tested v12.2 and v11.2.

Compile command with v12.2:
C:\编译器MinGW64\bin\g++.exe "C:\myprogs\hello.cpp" -o "C:\编译器hello.exe" -Wextra -g3 -pipe -I"C:\编译器MinGW64\include" -I"C:\编译器MinGW64\x86_64-w64-mingw32\include" -I"C:\编译器MinGW64\lib\gcc\x86_64-w64-mingw32\12.2.0\include" -I"C:\编译器MinGW64\lib\gcc\x86_64-w64-mingw32\12.2.0\include\c++" -L"C:\编译器MinGW64\lib" -L"C:\编译器MinGW64\x86_64-w64-mingw32\lib" -g3

Output message:
C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/lib/../lib/crt2.o: No such file or directory
C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/crtbegin.o: No such file or directory
C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lstdc++: No such file or directory
C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lgcc: No such file or directory
C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find -lgcc: No such file or directory
C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/../../../../x86_64-w64-mingw32/bin/ld.exe: cannot find C:/编译器MinGW64/bin/../lib/gcc/x86_64-w64-mingw32/12.2.0/crtend.o: No such file or directory
collect2.exe: error: ld returned 1 exit status

Here the Chinese characters "编译器" are preserved in the messages, but they still seem to be dropped at some internal stage.

Compile command with v11.2:
c:\编译器MinGW64>C:\编译器MinGW64\bin\g++.exe "C:\我的程序\测试hello.cpp" -o "C:\我的程序\测试hello.exe" -Wextra -g3 -pipe -I"C:\编译器MinGW64\include" -I"C:\ 编译器MinGW64\x86_64-w64-mingw32\include" -I"C:\编译器MinGW64\lib\gcc\x86_64-w64-mingw32\11.2.0\include" -I"C:\编译器MinGW64\lib\gcc\x86_64-w64-mingw32\11.2.0\include\c++" -L"C:\编译器MinGW64\lib" -L"C:\编译器MinGW64\x86_64-w64-mingw32\lib" -g3

v11.2 works correctly, whether or not my source code contains errors.

@CyanoHao
Contributor Author

@anbangli
The root cause in these two situations (either the path of the user code or the path of gcc itself contains non-ASCII characters) is the same: the object paths passed to ld (user objects, crt2.o, etc.) get corrupted.

It seems okay now (tested with x86_64-posix-seh-ucrt build).
