Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix MonoTests.System.Linq.ParallelEnumerableTests.TestGroupBy (Linux/…
…amd64/FullAOT/LLVM). (#10615) * mono_aot_get_unbox_trampoline: amd64 compute unbox trampoline size correctly via runtime disassembly. This fixes: MonoTests.System.Linq.ParallelEnumerableTests.TestGroupBy frequently crashing on FullAOT/LLVM/Linux/amd64 CI. #10441 #9554 #9046 Race condition because the code varies in what actually runs. Sometimes System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT is called, sometimes not. FullAOT+LLVM for obvious reasons. Very sensitive to exact code presumably because the bug depends on whether the first unbox trampoline has the same size as the unbox trampoline for System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT? For example, this could be the only unbox trampoline or it could be first. For example System_Security_Cryptography_ECCurve_get_Oid gets the first unbox trampoline and compiling it with mini or LLVM makes a difference. Guessing here: When the first unbox trampoline is 9 bytes in size and System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT is 6 bytes in size, then System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT's unbox trampoline, which comes right before System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT is thought to be 9 bytes also, which overlaps System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT. So the binary search finds that, which goes wrong somehow -- mono_aot_find_jit_info returns NULL (for reasons not debugged). And then we fail to wrap System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT with a gsharedvt trampoline. In the normal course when things work, calling System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT goes through both gsharedvt_out_trampoline and gsharedvt_trampoline. The first turns a byref into a byvalue, the second turns it back into a byref. When the second is skipped, we end up treating the value as a pointer and SIGSEGV. This can be seen using #10582. mini amd64 unbox trampolines are always 9 bytes -- 4 byte add, 5 byte jmp LLVM amd64 unbox trampolines vary 6 or 9 -- 4 byte add, 2 or 5 byte jmp. There are *many* possible ways to fix this. 1. Runtime disassembly to measure the trampoline size, each trampoline, allowing just two variations. This was written, tested, works. Downside is inhibits breakpoints, at an unlikely time and place, and inhibits execute only memory protection, also unlikely. Looks a little gnarly, but really is not. 2. Get the assembler to output fixed size trampolines. This is known for GNU assembler -- (disp32) jmp and jmp.d32. This is not known for Apple assembler, either the older cctools or LLVM, but maybe. The problem *seemed* to be Linux only, but the underlying causes do not. If this were looked into more closely and a Linux-specificity actually found, then depending on GNU as is more reasonable. 3. Output object files instead of assembly text. Good and bad idea otherwise. Super non-trivial. 4. Construct the tramp_info at compile time, with compiler/assembler/linker-computed size. This should be possible and otherwise beneficial, but not trivial, and not super beneficial, only mildly. const static data/code is generally prefer over dynamic. 5. Don't bother with the trampinfo at all. It should not be needed, even for stackwalk, as these are leaf functions (don't change nonvolatiles), so stackwalk is just *rsp++. Good idea for another day, as it is a simple net reduction in code, pending testing a not easy to test case -- need to make a function that just jmps to self, point a thread at it, wait till it gets there, suspend it, and stackwalk it. A good exercise, for another time. 6. Pad out trampolines to at least 9 bytes by adding 3x 0xcc. Sometimes 12 bytes. And hardcoding size as 9 at runtime -- don't use the actual computed size, as it could be 12, apply that to a 9 byter, and back with the same problem. That is one incorrect solution in there -- just merely adding 3 bytes of padding seemed at first like a good idea, but would not likely work. 7. Or rather, no need to pad, hardcode to 6 at runtime. 8. Or rather, hardcoding to 5 suffices. Interior bytes of last instruction need not be described, pretend it is of size 1. 2 is no more accurate than 1, when some of them are actually 5. 9. Can hardcode in compiler or runtime. Both developed, easy. 10. Try current assemblers. Maybe the original reason to switch from 0xe9 to jmp is moot. But continue to work with old assemblers? (in a test-only scenario?) Not attempted. 11. Use separate source files to help achieve distance or assembler ignorance of distance. Not attempted. 12. Move all the trampolines adjacent, with 128 bytes of padding on either end. Not attempted. 13. Asserting the trampolines are 16-aligned would maybe work. And/or following them with a 16-aligned entity. Here we option 9, hardcode to 5 in the compiler. It is essentially a one line change. Removing the trampinfo and testing async stack walk is left as a future change. This is really the best option but has more significant test burden. * Increment MONO_AOT_FILE_VERSION to invalidate old bad images. The badness is borderline -- limited to FullAOT+AMD64+LLVM, which is nonexistant outside test environments. This version should probably be a guid also.
- Loading branch information