Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split up debugprint of gsharedvt get_call_info from start_call. #10582

Closed
wants to merge 4 commits into from

Conversation

jaykrell
Copy link
Contributor

Add MONO_DEBUG options, besides the ifdefs.
This is because the runtime with it enabled fails compiling BCL, but is still useful on smaller scenarios.

Add function and thread to the printing -- a failing test is very multi-threaded and the output interleaves.

Enable it on checked builds to ensure it doesn't rot again -- but still guarded by debug options mostly.

Add MONO_DEBUG options, besides the ifdefs.
 This is because the runtime with it enabled fails compiling BCL,
 but is still useful on smaller scenarios.
Add function and thread to the printing -- a failing test is very multi-threaded
 and the output interleaves.
Enable it on checked builds to ensure it doesn't rot again -- but still
  guarded by debug options mostly.
@jaykrell
Copy link
Contributor Author

@monojenkins build Windows i386

jaykrell added a commit to jaykrell/mono that referenced this pull request Sep 14, 2018
…rrectly via runtime disassembly.

This fixes: MonoTests.System.Linq.ParallelEnumerableTests.TestGroupBy frequently crashing on FullAOT/LLVM/Linux/amd64 CI.

Race condition because the code varies in what actually runs.
Sometimes System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT is called,
sometimes not.

FullAOT+LLVM for reasons obvious from the fix.

Very sensitive to exact code presumably because the bug depends on whether the first
unbox trampoline has the same size as the unbox trampoline for System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT?
For example, this could be the only unbox trampoline or it could be first.

For example System_Security_Cryptography_ECCurve_get_Oid gets the first unbox trampoline and compiling it with mini or LLVM makes a difference.

Guessing here:
When the first unbox trampoline is 9 bytes in size and
System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
is 6 bytes in size, then System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT's
unbox trampoline, which comes right before System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
is thought to be 9 bytes also, which overlaps System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT.

So the binary search finds that, which goes wrong somehow -- mono_aot_find_jit_info returns NULL (for reasons not debugged).

And then we fail to wrap System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
with a gsharedvt trampoline.

In the normal course when things work, calling System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
goes through both gsharedvt_out_trampoline and gsharedvt_trampoline.

The first turns a byref into a byvalue, the second turns it back into a byref.
When the second is skipped, we end up treating the value as a pointer and SIGSEGV.
This can be seen using mono#10582.

mini amd64 unbox trampolines are always 9 bytes.
LLVM amd64 unbox trampolines hypothetically vary 6 or 9, but are probably always 6.

There maybe other fixes here.

1. The change to use jmp instead of db 0xe9 might be have been more about
consistency than necessacity -- necessary other places.

2. It might be possible to force the assembler to output 0xe9.
jaykrell added a commit to jaykrell/mono that referenced this pull request Sep 14, 2018
…rrectly via runtime disassembly.

This fixes: MonoTests.System.Linq.ParallelEnumerableTests.TestGroupBy frequently crashing on FullAOT/LLVM/Linux/amd64 CI.

mono#10441
mono#9554
mono#9046

Race condition because the code varies in what actually runs.
Sometimes System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT is called,
sometimes not.

FullAOT+LLVM for reasons obvious from the fix.

Very sensitive to exact code presumably because the bug depends on whether the first
unbox trampoline has the same size as the unbox trampoline for System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT?
For example, this could be the only unbox trampoline or it could be first.

For example System_Security_Cryptography_ECCurve_get_Oid gets the first unbox trampoline and compiling it with mini or LLVM makes a difference.

Guessing here:
When the first unbox trampoline is 9 bytes in size and
System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
is 6 bytes in size, then System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT's
unbox trampoline, which comes right before System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
is thought to be 9 bytes also, which overlaps System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT.

So the binary search finds that, which goes wrong somehow -- mono_aot_find_jit_info returns NULL (for reasons not debugged).

And then we fail to wrap System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
with a gsharedvt trampoline.

In the normal course when things work, calling System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
goes through both gsharedvt_out_trampoline and gsharedvt_trampoline.

The first turns a byref into a byvalue, the second turns it back into a byref.
When the second is skipped, we end up treating the value as a pointer and SIGSEGV.
This can be seen using mono#10582.

mini amd64 unbox trampolines are always 9 bytes.
LLVM amd64 unbox trampolines hypothetically vary 6 or 9, but are probably always 6.

There maybe other fixes here.

1. The change to use jmp instead of db 0xe9 might be have been more about
consistency than necessacity -- necessary other places.

2. It might be possible to force the assembler to output 0xe9.
GNU assembler supports jmp.d32 to force a 32bit displacement opcode 0xE9.
A solution portable to other assemblers is not known.
jaykrell added a commit that referenced this pull request Sep 15, 2018
…amd64/FullAOT/LLVM). (#10615)

* mono_aot_get_unbox_trampoline: amd64 compute unbox trampoline size correctly via runtime disassembly.
This fixes: MonoTests.System.Linq.ParallelEnumerableTests.TestGroupBy frequently crashing on FullAOT/LLVM/Linux/amd64 CI.

#10441
#9554
#9046

Race condition because the code varies in what actually runs.
Sometimes System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT is called,
sometimes not.

FullAOT+LLVM for obvious reasons.

Very sensitive to exact code presumably because the bug depends on whether the first
unbox trampoline has the same size as the unbox trampoline for System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT?
For example, this could be the only unbox trampoline or it could be first.

For example System_Security_Cryptography_ECCurve_get_Oid gets the first unbox trampoline and compiling it with mini or LLVM makes a difference.

Guessing here:
When the first unbox trampoline is 9 bytes in size and
System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
is 6 bytes in size, then System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT's
unbox trampoline, which comes right before System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
is thought to be 9 bytes also, which overlaps System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT.

So the binary search finds that, which goes wrong somehow -- mono_aot_find_jit_info returns NULL (for reasons not debugged).

And then we fail to wrap System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
with a gsharedvt trampoline.

In the normal course when things work, calling System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
goes through both gsharedvt_out_trampoline and gsharedvt_trampoline.

The first turns a byref into a byvalue, the second turns it back into a byref.
When the second is skipped, we end up treating the value as a pointer and SIGSEGV.
This can be seen using #10582.

mini amd64 unbox trampolines are always 9 bytes -- 4 byte add, 5 byte jmp
LLVM amd64 unbox trampolines vary 6 or 9 -- 4 byte add, 2 or 5 byte jmp.

There are *many* possible ways to fix this.

1. Runtime disassembly to measure the trampoline size, each trampoline, allowing just two variations. This was written, tested, works. Downside is inhibits breakpoints, at an unlikely time and place, and inhibits execute only memory protection, also unlikely. Looks a little gnarly, but really is not.

2. Get the assembler to output fixed size trampolines.
This is known for GNU assembler -- (disp32) jmp and jmp.d32.
This is not known for Apple assembler, either the older cctools or LLVM, but maybe.
The problem *seemed* to be Linux only, but the underlying causes do not.
If this were looked into more closely and a Linux-specificity actually found, then depending on GNU as is more reasonable.

3. Output object files instead of assembly text. Good and bad idea otherwise. Super non-trivial.

4. Construct the tramp_info at compile time, with compiler/assembler/linker-computed size. This should be possible and otherwise beneficial, but not trivial, and not super beneficial, only mildly.
const static data/code is generally prefer over dynamic.

5. Don't bother with the trampinfo at all. It should not be needed, even for stackwalk, as these are leaf functions (don't change nonvolatiles), so stackwalk is just *rsp++.
Good idea for another day, as it is a simple net reduction in code, pending testing a not easy  to test case -- need to make a function that just jmps to self, point a thread at it, wait till it gets there, suspend it, and stackwalk it. A good exercise, for another time.

6. Pad out trampolines to at least 9 bytes by adding 3x 0xcc.
Sometimes 12 bytes.
And hardcoding size as 9 at runtime -- don't use the actual computed size, as it could be 12, apply that to a 9 byter, and back with the same problem.

That is one incorrect solution in there -- just merely adding 3 bytes of padding seemed at first like a good idea, but would not likely work.

7. Or rather, no need to pad, hardcode to 6 at runtime.

8. Or rather, hardcoding to 5 suffices. Interior bytes of last instruction need not be described, pretend it is of size 1. 2 is no more accurate than 1, when some of them are actually 5.

9. Can hardcode in compiler or runtime.
Both developed, easy.

10. Try current assemblers. Maybe the original reason to switch from 0xe9 to jmp is moot.
But continue to work with old assemblers? (in a test-only scenario?)
Not attempted.

11. Use separate source files to help achieve distance or assembler ignorance of distance.
Not attempted.

12. Move all the trampolines adjacent, with 128 bytes of padding on either end.
Not attempted.

13. Asserting the trampolines are 16-aligned would maybe work. And/or following them with a 16-aligned entity.

Here we option 9, hardcode to 5 in the compiler. It is essentially a one line change.

Removing the trampinfo and testing async stack walk is left as a future change.
This is really the best option but has more significant test burden.

* Increment MONO_AOT_FILE_VERSION to invalidate old bad images.
The badness is borderline -- limited to FullAOT+AMD64+LLVM, which is nonexistant outside test environments. This version should probably be a guid also.
monojenkins pushed a commit to monojenkins/mono that referenced this pull request Sep 15, 2018
…rrectly via runtime disassembly. This fixes: MonoTests.System.Linq.ParallelEnumerableTests.TestGroupBy frequently crashing on FullAOT/LLVM/Linux/amd64 CI.

mono#10441
mono#9554
mono#9046

Race condition because the code varies in what actually runs.
Sometimes System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT is called,
sometimes not.

FullAOT+LLVM for reasons obvious from the fix.

Very sensitive to exact code presumably because the bug depends on whether the first
unbox trampoline has the same size as the unbox trampoline for System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT?
For example, this could be the only unbox trampoline or it could be first.

For example System_Security_Cryptography_ECCurve_get_Oid gets the first unbox trampoline and compiling it with mini or LLVM makes a difference.

Guessing here:
When the first unbox trampoline is 9 bytes in size and
System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
is 6 bytes in size, then System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT's
unbox trampoline, which comes right before System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
is thought to be 9 bytes also, which overlaps System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT.

So the binary search finds that, which goes wrong somehow -- mono_aot_find_jit_info returns NULL (for reasons not debugged).

And then we fail to wrap System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
with a gsharedvt trampoline.

In the normal course when things work, calling System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
goes through both gsharedvt_out_trampoline and gsharedvt_trampoline.

The first turns a byref into a byvalue, the second turns it back into a byref.
When the second is skipped, we end up treating the value as a pointer and SIGSEGV.
This can be seen using mono#10582.

mini amd64 unbox trampolines are always 9 bytes.
LLVM amd64 unbox trampolines hypothetically vary 6 or 9, but are probably always 6.

There maybe other fixes here.

1. The change to use jmp instead of db 0xe9 might be have been more about
consistency than necessacity -- necessary other places.

2. It might be possible to force the assembler to output 0xe9.
GNU assembler supports jmp.d32 to force a 32bit displacement opcode 0xE9.
A solution portable to other assemblers is not known.
jaykrell pushed a commit that referenced this pull request Sep 16, 2018
…By (Linux/amd64/FullAOT/LLVM). (#10633)

mono_aot_get_unbox_trampoline: amd64 hardcode unbox trampoline size to 5 at compile-time.
This fixes: MonoTests.System.Linq.ParallelEnumerableTests.TestGroupBy frequently crashing on FullAOT/LLVM/Linux/amd64 CI.

#10441
#9554
#9046

The actual size varies between 6 and 9 but we can just pretend the last instruction has size 1 (it is 2 or 5; calling it 5 can be fatal, calling it 2 is no better than 1).

Otherwise, a trampoline can overlap the next function, jit info lookup of the next function can fail, then gsharedvt wrapper omited, and crash.

The afflicted function in this case was
System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT.

You can see it with debugprint: #10582.

There are many other possible fixes.
Such as runtime disassembly, hardcode in runtime instead of compiler, produce static trampinfo, omit trampinfo entirely, compute minimum size.

See d589f3b.
@luhenry luhenry closed this Jan 3, 2019
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…amd64/FullAOT/LLVM). (mono/mono#10615)

* mono_aot_get_unbox_trampoline: amd64 compute unbox trampoline size correctly via runtime disassembly.
This fixes: MonoTests.System.Linq.ParallelEnumerableTests.TestGroupBy frequently crashing on FullAOT/LLVM/Linux/amd64 CI.

mono/mono#10441
mono/mono#9554
mono/mono#9046

Race condition because the code varies in what actually runs.
Sometimes System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT is called,
sometimes not.

FullAOT+LLVM for obvious reasons.

Very sensitive to exact code presumably because the bug depends on whether the first
unbox trampoline has the same size as the unbox trampoline for System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT?
For example, this could be the only unbox trampoline or it could be first.

For example System_Security_Cryptography_ECCurve_get_Oid gets the first unbox trampoline and compiling it with mini or LLVM makes a difference.

Guessing here:
When the first unbox trampoline is 9 bytes in size and
System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
is 6 bytes in size, then System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT's
unbox trampoline, which comes right before System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
is thought to be 9 bytes also, which overlaps System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT.

So the binary search finds that, which goes wrong somehow -- mono_aot_find_jit_info returns NULL (for reasons not debugged).

And then we fail to wrap System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
with a gsharedvt trampoline.

In the normal course when things work, calling System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
goes through both gsharedvt_out_trampoline and gsharedvt_trampoline.

The first turns a byref into a byvalue, the second turns it back into a byref.
When the second is skipped, we end up treating the value as a pointer and SIGSEGV.
This can be seen using mono/mono#10582.

mini amd64 unbox trampolines are always 9 bytes -- 4 byte add, 5 byte jmp
LLVM amd64 unbox trampolines vary 6 or 9 -- 4 byte add, 2 or 5 byte jmp.

There are *many* possible ways to fix this.

1. Runtime disassembly to measure the trampoline size, each trampoline, allowing just two variations. This was written, tested, works. Downside is inhibits breakpoints, at an unlikely time and place, and inhibits execute only memory protection, also unlikely. Looks a little gnarly, but really is not.

2. Get the assembler to output fixed size trampolines.
This is known for GNU assembler -- (disp32) jmp and jmp.d32.
This is not known for Apple assembler, either the older cctools or LLVM, but maybe.
The problem *seemed* to be Linux only, but the underlying causes do not.
If this were looked into more closely and a Linux-specificity actually found, then depending on GNU as is more reasonable.

3. Output object files instead of assembly text. Good and bad idea otherwise. Super non-trivial.

4. Construct the tramp_info at compile time, with compiler/assembler/linker-computed size. This should be possible and otherwise beneficial, but not trivial, and not super beneficial, only mildly.
const static data/code is generally prefer over dynamic.

5. Don't bother with the trampinfo at all. It should not be needed, even for stackwalk, as these are leaf functions (don't change nonvolatiles), so stackwalk is just *rsp++.
Good idea for another day, as it is a simple net reduction in code, pending testing a not easy  to test case -- need to make a function that just jmps to self, point a thread at it, wait till it gets there, suspend it, and stackwalk it. A good exercise, for another time.

6. Pad out trampolines to at least 9 bytes by adding 3x 0xcc.
Sometimes 12 bytes.
And hardcoding size as 9 at runtime -- don't use the actual computed size, as it could be 12, apply that to a 9 byter, and back with the same problem.

That is one incorrect solution in there -- just merely adding 3 bytes of padding seemed at first like a good idea, but would not likely work.

7. Or rather, no need to pad, hardcode to 6 at runtime.

8. Or rather, hardcoding to 5 suffices. Interior bytes of last instruction need not be described, pretend it is of size 1. 2 is no more accurate than 1, when some of them are actually 5.

9. Can hardcode in compiler or runtime.
Both developed, easy.

10. Try current assemblers. Maybe the original reason to switch from 0xe9 to jmp is moot.
But continue to work with old assemblers? (in a test-only scenario?)
Not attempted.

11. Use separate source files to help achieve distance or assembler ignorance of distance.
Not attempted.

12. Move all the trampolines adjacent, with 128 bytes of padding on either end.
Not attempted.

13. Asserting the trampolines are 16-aligned would maybe work. And/or following them with a 16-aligned entity.

Here we option 9, hardcode to 5 in the compiler. It is essentially a one line change.

Removing the trampinfo and testing async stack walk is left as a future change.
This is really the best option but has more significant test burden.

* Increment MONO_AOT_FILE_VERSION to invalidate old bad images.
The badness is borderline -- limited to FullAOT+AMD64+LLVM, which is nonexistant outside test environments. This version should probably be a guid also.

Commit migrated from mono/mono@d589f3b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants