
Calling GetName() on a System.Reflection.Assembly loaded from a byte stream throws an exception under the Japanese locale #20968

Open
MerlinVR opened this issue Mar 23, 2021 · 20 comments

@MerlinVR

Steps to Reproduce

  1. Set the system locale to Japanese and restart the system to apply it. To be clear, I mean the system locale setting described here, not just having the Japanese IME installed: (https://www.java.com/en/download/help/locale.html)
  2. Create a directory whose path includes Japanese characters at some level, and place the executable somewhere under it. In my tests the exact path to the compiled exe was C:\Users\userドキュメント\Unity Projects\MonoTest\MonoTest\bin\Debug\MonoTest.exe
  3. In a C# console project, compile and emit an assembly at runtime (using Roslyn in this case; see the attached example project). Load the assembly via System.Reflection.Assembly.Load(), then call GetName() on the loaded assembly.
  4. Compile the executable (in this case with VS2017).
  5. Make sure the Mono bin directory is on the system PATH.
  6. Open a terminal (I used PowerShell).
  7. IMPORTANT: cd into the directory that contains the executable, in my case: cd "C:\Users\userドキュメント\Unity Projects\MonoTest\MonoTest\bin\Debug"
  8. Run the built exe with mono, in my case: mono "C:\Users\userドキュメント\Unity Projects\MonoTest\MonoTest\bin\Debug\MonoTest.exe"
  9. An exception is thrown inside the icall System.Reflection.RuntimeAssembly.get_code_base()

To run the example, install the dependencies listed in packages.config via NuGet and compile a Debug build in VS2017; other build environments should presumably work as well. The program is contained in Program.cs. The built exe is also included in the bin directory.

MonoTest.zip

Current Behavior

Throws System.ExecutionEngineException: String conversion error: Illegal byte sequence encounted in the input. when GetName() is called on the loaded assembly. The full stack trace is included below.

Expected Behavior

Prints the assembly name without throwing. The exact expected output of the program to the console is:

Hello!
MyTestAssembly, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null
Success!

I also tested running the exe directly so that it uses the .NET runtime, and it works as expected.

Details

Mono actually works as expected if the working directory does not contain Japanese characters. Running the program from the Users root directory prints the expected output without errors.

Only when the working directory contains Japanese characters does the runtime-created assembly, which has no connection to the file system, somehow break.

This seemed odd, since other assemblies that aren't loaded at runtime work fine. So I tried saving the generated assembly as a .dll in the same directory as the exe and loading it from the file directly with Assembly.LoadFrom(), passing the path of the saved file. This threw the same exception, so simply loading the assembly from disk did not avoid the problem. Then why can most other system assemblies load properly and have GetName() called on them? At least in part, it seems to be because most of those assemblies are stored in directories without Japanese characters. When I saved the generated assembly to a directory outside the project whose path contains no Japanese characters, it loaded without issue.

The reason I don't just use mono MonoTest.exe here is that mono fails for no apparent reason, and I cannot find a value for MONO_EXTERNAL_ENCODINGS that works: utf8, utf32, and every naming variation of shift-jis throw the same error, and utf16 hangs the program indefinitely.

On which platforms did you notice this

[ ] macOS
[ ] Linux
[x] Windows

Version Used:
6.12.0. This issue was originally noticed in Unity 2018.4.20f1 and then reproduced in the latest stable version of Mono.

Stacktrace

Unhandled Exception:
System.ExecutionEngineException: String conversion error: Illegal byte sequence encounted in the input.
  at (wrapper managed-to-native) System.Reflection.RuntimeAssembly.get_code_base(System.Reflection.Assembly,bool)
  at System.Reflection.RuntimeAssembly.GetCodeBase (System.Reflection.Assembly a, System.Boolean escaped) [0x00000] in <32116eccb94d4ed685ca661d98e36637>:0
  at System.Reflection.RuntimeAssembly.get_CodeBase () [0x00000] in <32116eccb94d4ed685ca661d98e36637>:0
  at System.Reflection.AssemblyName.Create (System.Reflection.Assembly assembly, System.Boolean fillCodebase) [0x00010] in <32116eccb94d4ed685ca661d98e36637>:0
  at System.Reflection.RuntimeAssembly.GetName (System.Boolean copiedName) [0x0000e] in <32116eccb94d4ed685ca661d98e36637>:0
  at System.Reflection.Assembly.GetName () [0x00000] in <32116eccb94d4ed685ca661d98e36637>:0
  at MonoTest.Program.Main (System.String[] args) [0x00177] in <b10c1804a8eb477aa9eef64e391e55b9>:0
[ERROR] FATAL UNHANDLED EXCEPTION: System.ExecutionEngineException: String conversion error: Illegal byte sequence encounted in the input.
  at (wrapper managed-to-native) System.Reflection.RuntimeAssembly.get_code_base(System.Reflection.Assembly,bool)
  at System.Reflection.RuntimeAssembly.GetCodeBase (System.Reflection.Assembly a, System.Boolean escaped) [0x00000] in <32116eccb94d4ed685ca661d98e36637>:0
  at System.Reflection.RuntimeAssembly.get_CodeBase () [0x00000] in <32116eccb94d4ed685ca661d98e36637>:0
  at System.Reflection.AssemblyName.Create (System.Reflection.Assembly assembly, System.Boolean fillCodebase) [0x00010] in <32116eccb94d4ed685ca661d98e36637>:0
  at System.Reflection.RuntimeAssembly.GetName (System.Boolean copiedName) [0x0000e] in <32116eccb94d4ed685ca661d98e36637>:0
  at System.Reflection.Assembly.GetName () [0x00000] in <32116eccb94d4ed685ca661d98e36637>:0
  at MonoTest.Program.Main (System.String[] args) [0x00177] in <b10c1804a8eb477aa9eef64e391e55b9>:0
@CoffeeFlux
Contributor

CoffeeFlux commented Apr 5, 2021

Just to make sure I'm clear: would something like the following snippet fail on JP locale?

            var a = Assembly.LoadFrom("/path/to/some/お/TestAssembly.dll");
            Console.WriteLine(a.GetName());

Does the assembly have to be emitted from managed code, or can I just use a random assembly to try and load?

@CoffeeFlux CoffeeFlux self-assigned this Apr 5, 2021
@MerlinVR
Author

MerlinVR commented Apr 5, 2021

Yes, it should, presuming お is enough to cause the issue. I was only able to avoid the error by saving the assembly outside the path with Japanese characters and loading it from there. In my case the path that worked was just C:\Users\merli\

@CoffeeFlux
Contributor

Sorry, I edited in a second question there: does the assembly have to be emitted from managed code, or can I just use a random assembly to try and load?

@MerlinVR
Author

MerlinVR commented Apr 5, 2021

I didn't test whether it breaks on assemblies not emitted by managed code; I can check that.

@MerlinVR
Author

MerlinVR commented Apr 5, 2021

It seems to happen only with assemblies emitted from managed code. I tried System.Collections.Immutable and it works. I also tried a random dll with few dependencies that the project didn't already need (DotZLib), and it did not break either.

@MerlinVR
Author

MerlinVR commented Apr 5, 2021

Is there anything that might differ between the two? The dll is emitted by the latest version of Roslyn. I also ran into the same issue with an assembly created by Harmony, which uses Mono.Cecil to create it: https://github.com/pardeike/Harmony/blob/master/Harmony/Internal/HarmonySharedState.cs#L78.

@CoffeeFlux
Contributor

Try a non-system assembly? There are a few differences between system and user-created assemblies.

@CoffeeFlux
Contributor

Oh you edited in that you tried that, apologies.

Could you try compiling the same assembly (some simple hello world should be fine) with Roslyn and a standalone project, and then try loading both, to verify this holds true? If you still see the failure, upload them both here and I can take a look.

@MerlinVR
Author

MerlinVR commented Apr 5, 2021

I also just tried Newtonsoft.Json, and it loaded correctly. I'll try the Roslyn vs. standalone project comparison now.

@MerlinVR
Author

MerlinVR commented Apr 5, 2021

I can no longer reproduce it with LoadFrom() on the dll. I don't have the code I initially used to test LoadFrom; I was probably running an old executable that still called .Load() on the emitted bytes. Apologies for the misleading test. Calling Assembly.Load() on a byte[] read from a package dll via File.ReadAllBytes() fails, whereas calling Assembly.LoadFrom() on the dll path directly works. Tested with Newtonsoft.Json, a VS2017-compiled dll, and a Roslyn 3.9-compiled dll.

@CoffeeFlux
Contributor

Ah okay, gotcha. So something like the following should trigger it?

            var path = "/some/path/おはよう/TestAssembly.dll";
            var bytes = File.ReadAllBytes(path);
            var asm = Assembly.Load(bytes);
            Console.WriteLine(asm.GetName());

@CoffeeFlux
Contributor

This might also be Windows-only, unless I'm missing something. Testing on my macOS machine:

ryan@kenshin:~/Downloads/MonoTestおはよう/bin/Debug$ locale
LANG="ja_JP.eucjp"
LC_COLLATE="ja_JP.eucjp"
LC_CTYPE="ja_JP.eucjp"
LC_MESSAGES="ja_JP.eucjp"
LC_MONETARY="ja_JP.eucjp"
LC_NUMERIC="ja_JP.eucjp"
LC_TIME="ja_JP.eucjp"
LC_ALL="ja_JP.eucjp"
ryan@kenshin:~/Downloads/MonoTestおはよう/bin/Debug$ mono MonoTest.exe
Hello!
MyTestAssembly, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null
Success!

If you confirm that snippet should work, I'll switch the locale on my Windows VM and give that a try.

@CoffeeFlux
Copy link
Contributor

CoffeeFlux commented Apr 6, 2021

On second thought, maybe this only happens with Shift-JIS? Will try in a bit.

Edit: nope :(

ryan@kenshin:~/Downloads/MonoLocaleTestおはよう/bin/Debug$ locale
LANG="ja_JP.SJIS"
LC_COLLATE="ja_JP.SJIS"
LC_CTYPE="ja_JP.SJIS"
LC_MESSAGES="ja_JP.SJIS"
LC_MONETARY="ja_JP.SJIS"
LC_NUMERIC="ja_JP.SJIS"
LC_TIME="ja_JP.SJIS"
LC_ALL="ja_JP.SJIS"
ryan@kenshin:~/Downloads/MonoLocaleTestおはよう/bin/Debug$ mono MonoTest.exe
Hello!
MyTestAssembly, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null
Success!

@MerlinVR
Copy link
Author

MerlinVR commented Apr 6, 2021

Sorry for the late response. I'm on a Windows machine and don't have a macOS machine available to test. It would make some sense if it's only reproducible on Windows, since AFAIK macOS usually uses UTF-8 for file system paths while Windows uses UTF-16. I attached a screenshot of the locale reported by Windows for me, though I'm not sure how helpful it is.

@SamMonoRT SamMonoRT assigned fanyang-mono and unassigned CoffeeFlux Jun 23, 2021
@compasslg

Hi, is there any solution to this yet?

@SamMonoRT
Copy link
Contributor

This isn't reproducible locally. Please specify the Mono version so we can try to repro locally.

@CoffeeFlux
Copy link
Contributor

CoffeeFlux commented Nov 9, 2021 via email

@CoffeeFlux
Copy link
Contributor

Still happens on nightly:

PS C:\Users\Ryan\Downloads\MonoTestあ\bin\Debug> mono C:\Users\Ryan\Downloads\MonoTestあ\bin\Debug\MonoTest.exe
Hello!

Unhandled Exception:
System.ExecutionEngineException: String conversion error: Illegal byte sequence encounted in the input.
  at (wrapper managed-to-native) System.Reflection.RuntimeAssembly.get_code_base(System.Reflection.Assembly,bool)
  at System.Reflection.RuntimeAssembly.GetCodeBase (System.Reflection.Assembly a, System.Boolean escaped) [0x00000] in <04f81ef9cc5f4e468813800c70fd83b0>:0
  at System.Reflection.RuntimeAssembly.get_CodeBase () [0x00000] in <04f81ef9cc5f4e468813800c70fd83b0>:0
  at System.Reflection.AssemblyName.Create (System.Reflection.Assembly assembly, System.Boolean fillCodebase) [0x00010] in <04f81ef9cc5f4e468813800c70fd83b0>:0
  at System.Reflection.RuntimeAssembly.GetName (System.Boolean copiedName) [0x0000e] in <04f81ef9cc5f4e468813800c70fd83b0>:0
  at System.Reflection.Assembly.GetName () [0x00000] in <04f81ef9cc5f4e468813800c70fd83b0>:0
  at MonoTest.Program.Main (System.String[] args) [0x00177] in <b10c1804a8eb477aa9eef64e391e55b9>:0
[ERROR] FATAL UNHANDLED EXCEPTION: System.ExecutionEngineException: String conversion error: Illegal byte sequence encounted in the input.
  at (wrapper managed-to-native) System.Reflection.RuntimeAssembly.get_code_base(System.Reflection.Assembly,bool)
  at System.Reflection.RuntimeAssembly.GetCodeBase (System.Reflection.Assembly a, System.Boolean escaped) [0x00000] in <04f81ef9cc5f4e468813800c70fd83b0>:0
  at System.Reflection.RuntimeAssembly.get_CodeBase () [0x00000] in <04f81ef9cc5f4e468813800c70fd83b0>:0
  at System.Reflection.AssemblyName.Create (System.Reflection.Assembly assembly, System.Boolean fillCodebase) [0x00010] in <04f81ef9cc5f4e468813800c70fd83b0>:0
  at System.Reflection.RuntimeAssembly.GetName (System.Boolean copiedName) [0x0000e] in <04f81ef9cc5f4e468813800c70fd83b0>:0
  at System.Reflection.Assembly.GetName () [0x00000] in <04f81ef9cc5f4e468813800c70fd83b0>:0
  at MonoTest.Program.Main (System.String[] args) [0x00177] in <b10c1804a8eb477aa9eef64e391e55b9>:0

@constfold
Copy link

constfold commented Jan 16, 2022

Failed to reproduce on Windows 11 with the Chinese locale and Mono 6.12.0, but I hit the same issue while using Unity. Strange.

D:\>systeminfo
System Locale:     zh-cn;Chinese (China)
Input Locale:      zh-cn;Chinese (China)
D:\>mono -V
Mono JIT compiler version 6.12.0 (Visual Studio built mono)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
        TLS:           __thread
        SIGSEGV:       normal
        Notification:  Thread + polling
        Architecture:  amd64
        Disabled:      none
        Misc:          softdebug
        Interpreter:   yes
        LLVM:          supported, not enabled.
        Suspend:       preemptive
        GC:            sgen (concurrent by default)

D:\>mono D:\dev\MonoTestあ\bin\Debug\MonoTest.exe
Hello!
MyTestAssembly, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null
Success!

@constfold
Copy link

Successfully reproduced. After a few hours of debugging, I found that the root cause is in g_get_current_dir (this line exactly):

r = getcwd (buffer, s);

On Windows, the result of getcwd is NOT guaranteed to be UTF-8 (it is usually encoded in the system locale's code page). As a result, any function that calls g_get_current_dir gets a malformed string, and the encoding invariant is broken.
In practice, two of the victims are mono_assembly_request_open and mono_assembly_request_load_from (both depend on g_get_current_dir to canonicalize the assembly's filename), along with every managed API that depends on them, Assembly.Load(byte[]) in this case.

Side note:
You can reproduce the same issue easily with

Assembly assembly = Assembly.LoadFrom("./MonoTest.dll");
Console.WriteLine(assembly.EscapedCodeBase); // you will see the malformed encoded string
Console.WriteLine(assembly.GetName()); // Exception on GetName()!

and

> cd MonoTestあ\bin\Debug
> mono MonoTest.exe
