Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FullAOT LLVM Linux x64 one flaky failure. #10441

Closed
jaykrell opened this issue Sep 2, 2018 · 3 comments
Assignees
Labels
Projects

Comments

@jaykrell
Copy link
Collaborator

@jaykrell jaykrell commented Sep 2, 2018

FullAOT LLVM Linux x64 has one flaky failure.

Here is how far I've gotten it, reduced:

using System;
using System.Linq;
using System.Collections.Generic;

internal static class AsParallelHelper
{
	internal static ParallelQuery<T> AsReallyParallel<T> (this IEnumerable<T> source)
	{
		return source.AsParallel ().WithExecutionMode (ParallelExecutionMode.ForceParallelism);
	}
}

public class ParallelEnumerableTests
{
	public static void Main(string[] args)
	{
		int num = 100;
		Tuple<int, int>[] source = Enumerable.Range (0, num).Select ((i) => Tuple.Create (i / 10, i)).ToArray ();

		var actual = source.AsReallyParallel ().GroupBy ((e) => e.Item1);
		foreach (var group in actual) { }
	}
}
@jaykrell jaykrell added the flaky issue label Sep 2, 2018
jaykrell added a commit to jaykrell/mono that referenced this issue Sep 3, 2018
jaykrell added a commit to jaykrell/mono that referenced this issue Sep 3, 2018
jaykrell added a commit to jaykrell/mono that referenced this issue Sep 3, 2018
jaykrell added a commit to jaykrell/mono that referenced this issue Sep 3, 2018
jaykrell added a commit to jaykrell/mono that referenced this issue Sep 3, 2018
jaykrell added a commit to jaykrell/mono that referenced this issue Sep 3, 2018
@jaykrell

This comment has been minimized.

Copy link
Collaborator Author

@jaykrell jaykrell commented Sep 6, 2018

Here is the relevant stack, which is elusive -- you have to add excess fields to System.Linq.Parallel.Wrapper to see it. I suspect otherwise gsharedvt is corrupting stack but maybe not.

  at System.Buffer.Memcpy (System.Byte* dest, System.Byte* src, System.Int32 len) <0x7f0e247b87c0 + 0x00040> in <b0ac880de1b74e779047cdc849d8159b#9863251a1fd577ad9f1585c3dc141e1a>:0 
  at System.String.memcpy (System.Byte* dest, System.Byte* src, System.Int32 size) <0x7f0e247904c0 + 0x0002c> in <b0ac880de1b74e779047cdc849d8159b#9863251a1fd577ad9f1585c3dc141e1a>:0 
  at System.Linq.Parallel.WrapperEqualityComparer`1[T].GetHashCode (System.Linq.Parallel.Wrapper`1[T] x) [0x00000] in <b4141d4f0046412bbd28c6d2dc4c5d53>:0 
  at (wrapper unknown) System.Object.gsharedvt_out()
  at System.Linq.Parallel.HashLookup`2[TKey,TValue].GetKeyHashCode (TKey key) [0x00000] in <b4141d4f0046412bbd28c6d2dc4c5d53>:0 
  at (wrapper unknown) System.Object.gsharedvt_in()
  at (wrapper unknown) System.Object.gsharedvt_out()
  at System.Linq.Parallel.HashLookup`2[TKey,TValue].Find (TKey key, System.Boolean add, System.Boolean set, TValue& value) [0x00000] in <b4141d4f0046412bbd28c6d2dc4c5d53>:0 
  at (wrapper unknown) System.Object.gsharedvt_in()
  at (wrapper unknown) System.Object.gsharedvt_out()
  at System.Linq.Parallel.HashLookup`2[TKey,TValue].TryGetValue (TKey key, TValue& value) [0x00000] in <b4141d4f0046412bbd28c6d2dc4c5d53>:0 
  at (wrapper unknown) System.Object.gsharedvt_in()
  at (wrapper unknown) System.Object.gsharedvt_out()
  at System.Linq.Parallel.GroupByIdentityQueryOperatorEnumerator`3[TSource,TGroupKey,TOrderKey].BuildHashLookup () [0x00050] in <b4141d4f0046412bbd28c6d2dc4c5d53>:0 
  at (wrapper unknown) System.Object.gsharedvt_out()
  at System.Linq.Parallel.GroupByQueryOperatorEnumerator`4[TSource,TGroupKey,TElement,TOrderKey].MoveNext (System.Linq.IGrouping`2[TGroupKey,TElement]& currentElement, TOrderKey& currentKey) [0x00019] in <b4141d4f0046412bbd28c6d2dc4c5d53>:0 
  at (wrapper unknown) System.Object.gsharedvt_out()
  at System.Linq.Parallel.PipelineSpoolingTask`2[TInputOutput,TIgnoreKey].SpoolingWork () [0x00042] in <b4141d4f0046412bbd28c6d2dc4c5d53>:0 
  at System.Linq.Parallel.SpoolingTaskBase.Work () [0x0005d] in <b4141d4f0046412bbd28c6d2dc4c5d53>:0 
  at System.Linq.Parallel.QueryTask.BaseWork (System.Object unused) <0x7f0e1f823ab0 + 0x0004c> in <b4141d4f0046412bbd28c6d2dc4c5d53#9863251a1fd577ad9f1585c3dc141e1a>:0 
  at System.Linq.Parallel.QueryTask+<>c.<.cctor>b__10_0 (System.Object o) <0x7f0e1f8bbd90 + 0x0003e> in <b4141d4f0046412bbd28c6d2dc4c5d53#9863251a1fd577ad9f1585c3dc141e1a>:0 
  at System.Threading.Tasks.Task.InnerInvoke () <0x7f0e248e6080 + 0x0005b> in <b0ac880de1b74e779047cdc849d8159b#9863251a1fd577ad9f1585c3dc141e1a>:0 
  at System.Threading.Tasks.Task.Execute () <0x7f0e248e5e80 + 0x00035> in <b0ac880de1b74e779047cdc849d8159b#9863251a1fd577ad9f1585c3dc141e1a>:0 <---
@jaykrell

This comment has been minimized.

Copy link
Collaborator Author

@jaykrell jaykrell commented Sep 7, 2018

The callstack used to appear more readily but that regressed. Maybe it is in a .json file?

jaykrell added a commit to jaykrell/mono that referenced this issue Sep 14, 2018
…rrectly via runtime disassembly.

This fixes: MonoTests.System.Linq.ParallelEnumerableTests.TestGroupBy frequently crashing on FullAOT/LLVM/Linux/amd64 CI.

mono#10441
mono#9554
mono#9046

Race condition because the code varies in what actually runs.
Sometimes System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT is called,
sometimes not.

FullAOT+LLVM for reasons obvious from the fix.

Very sensitive to exact code presumably because the bug depends on whether the first
unbox trampoline has the same size as the unbox trampoline for System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT?
For example, this could be the only unbox trampoline or it could be first.

For example System_Security_Cryptography_ECCurve_get_Oid gets the first unbox trampoline and compiling it with mini or LLVM makes a difference.

Guessing here:
When the first unbox trampoline is 9 bytes in size and
System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
is 6 bytes in size, then System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT's
unbox trampoline, which comes right before System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
is thought to be 9 bytes also, which overlaps System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT.

So the binary search finds that, which goes wrong somehow -- mono_aot_find_jit_info returns NULL (for reasons not debugged).

And then we fail to wrap System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
with a gsharedvt trampoline.

In the normal course when things work, calling System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
goes through both gsharedvt_out_trampoline and gsharedvt_trampoline.

The first turns a byref into a byvalue, the second turns it back into a byref.
When the second is skipped, we end up treating the value as a pointer and SIGSEGV.
This can be seen using mono#10582.

mini amd64 unbox trampolines are always 9 bytes.
LLVM amd64 unbox trampolines hypothetically vary 6 or 9, but are probably always 6.

There maybe other fixes here.

1. The change to use jmp instead of db 0xe9 might be have been more about
consistency than necessacity -- necessary other places.

2. It might be possible to force the assembler to output 0xe9.
GNU assembler supports jmp.d32 to force a 32bit displacement opcode 0xE9.
A solution portable to other assemblers is not known.
jaykrell referenced this issue Sep 14, 2018
…ull aot mode. This requires emitting got accesses using normal assembly instead of emit_symbol_diff (), since clang cannot handle <sym> - . expressions if <sym> is an extern symbol.
@jaykrell jaykrell added this to Bugs Pool in Bugs Week via automation Sep 14, 2018
@jaykrell jaykrell self-assigned this Sep 14, 2018
@jaykrell jaykrell moved this from Bugs Pool to In Progress in Bugs Week Sep 14, 2018
jaykrell added a commit that referenced this issue Sep 15, 2018
…amd64/FullAOT/LLVM). (#10615)

* mono_aot_get_unbox_trampoline: amd64 compute unbox trampoline size correctly via runtime disassembly.
This fixes: MonoTests.System.Linq.ParallelEnumerableTests.TestGroupBy frequently crashing on FullAOT/LLVM/Linux/amd64 CI.

#10441
#9554
#9046

Race condition because the code varies in what actually runs.
Sometimes System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT is called,
sometimes not.

FullAOT+LLVM for obvious reasons.

Very sensitive to exact code presumably because the bug depends on whether the first
unbox trampoline has the same size as the unbox trampoline for System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT?
For example, this could be the only unbox trampoline or it could be first.

For example System_Security_Cryptography_ECCurve_get_Oid gets the first unbox trampoline and compiling it with mini or LLVM makes a difference.

Guessing here:
When the first unbox trampoline is 9 bytes in size and
System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
is 6 bytes in size, then System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT's
unbox trampoline, which comes right before System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
is thought to be 9 bytes also, which overlaps System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT.

So the binary search finds that, which goes wrong somehow -- mono_aot_find_jit_info returns NULL (for reasons not debugged).

And then we fail to wrap System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
with a gsharedvt trampoline.

In the normal course when things work, calling System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
goes through both gsharedvt_out_trampoline and gsharedvt_trampoline.

The first turns a byref into a byvalue, the second turns it back into a byref.
When the second is skipped, we end up treating the value as a pointer and SIGSEGV.
This can be seen using #10582.

mini amd64 unbox trampolines are always 9 bytes -- 4 byte add, 5 byte jmp
LLVM amd64 unbox trampolines vary 6 or 9 -- 4 byte add, 2 or 5 byte jmp.

There are *many* possible ways to fix this.

1. Runtime disassembly to measure the trampoline size, each trampoline, allowing just two variations. This was written, tested, works. Downside is inhibits breakpoints, at an unlikely time and place, and inhibits execute only memory protection, also unlikely. Looks a little gnarly, but really is not.

2. Get the assembler to output fixed size trampolines.
This is known for GNU assembler -- (disp32) jmp and jmp.d32.
This is not known for Apple assembler, either the older cctools or LLVM, but maybe.
The problem *seemed* to be Linux only, but the underlying causes do not.
If this were looked into more closely and a Linux-specificity actually found, then depending on GNU as is more reasonable.

3. Output object files instead of assembly text. Good and bad idea otherwise. Super non-trivial.

4. Construct the tramp_info at compile time, with compiler/assembler/linker-computed size. This should be possible and otherwise beneficial, but not trivial, and not super beneficial, only mildly.
const static data/code is generally prefer over dynamic.

5. Don't bother with the trampinfo at all. It should not be needed, even for stackwalk, as these are leaf functions (don't change nonvolatiles), so stackwalk is just *rsp++.
Good idea for another day, as it is a simple net reduction in code, pending testing a not easy  to test case -- need to make a function that just jmps to self, point a thread at it, wait till it gets there, suspend it, and stackwalk it. A good exercise, for another time.

6. Pad out trampolines to at least 9 bytes by adding 3x 0xcc.
Sometimes 12 bytes.
And hardcoding size as 9 at runtime -- don't use the actual computed size, as it could be 12, apply that to a 9 byter, and back with the same problem.

That is one incorrect solution in there -- just merely adding 3 bytes of padding seemed at first like a good idea, but would not likely work.

7. Or rather, no need to pad, hardcode to 6 at runtime.

8. Or rather, hardcoding to 5 suffices. Interior bytes of last instruction need not be described, pretend it is of size 1. 2 is no more accurate than 1, when some of them are actually 5.

9. Can hardcode in compiler or runtime.
Both developed, easy.

10. Try current assemblers. Maybe the original reason to switch from 0xe9 to jmp is moot.
But continue to work with old assemblers? (in a test-only scenario?)
Not attempted.

11. Use separate source files to help achieve distance or assembler ignorance of distance.
Not attempted.

12. Move all the trampolines adjacent, with 128 bytes of padding on either end.
Not attempted.

13. Asserting the trampolines are 16-aligned would maybe work. And/or following them with a 16-aligned entity.

Here we option 9, hardcode to 5 in the compiler. It is essentially a one line change.

Removing the trampinfo and testing async stack walk is left as a future change.
This is really the best option but has more significant test burden.

* Increment MONO_AOT_FILE_VERSION to invalidate old bad images.
The badness is borderline -- limited to FullAOT+AMD64+LLVM, which is nonexistant outside test environments. This version should probably be a guid also.
monojenkins added a commit to monojenkins/mono that referenced this issue Sep 15, 2018
…rrectly via runtime disassembly. This fixes: MonoTests.System.Linq.ParallelEnumerableTests.TestGroupBy frequently crashing on FullAOT/LLVM/Linux/amd64 CI.

mono#10441
mono#9554
mono#9046

Race condition because the code varies in what actually runs.
Sometimes System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT is called,
sometimes not.

FullAOT+LLVM for reasons obvious from the fix.

Very sensitive to exact code presumably because the bug depends on whether the first
unbox trampoline has the same size as the unbox trampoline for System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT?
For example, this could be the only unbox trampoline or it could be first.

For example System_Security_Cryptography_ECCurve_get_Oid gets the first unbox trampoline and compiling it with mini or LLVM makes a difference.

Guessing here:
When the first unbox trampoline is 9 bytes in size and
System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
is 6 bytes in size, then System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT's
unbox trampoline, which comes right before System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
is thought to be 9 bytes also, which overlaps System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT.

So the binary search finds that, which goes wrong somehow -- mono_aot_find_jit_info returns NULL (for reasons not debugged).

And then we fail to wrap System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
with a gsharedvt trampoline.

In the normal course when things work, calling System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT
goes through both gsharedvt_out_trampoline and gsharedvt_trampoline.

The first turns a byref into a byvalue, the second turns it back into a byref.
When the second is skipped, we end up treating the value as a pointer and SIGSEGV.
This can be seen using mono#10582.

mini amd64 unbox trampolines are always 9 bytes.
LLVM amd64 unbox trampolines hypothetically vary 6 or 9, but are probably always 6.

There maybe other fixes here.

1. The change to use jmp instead of db 0xe9 might be have been more about
consistency than necessacity -- necessary other places.

2. It might be possible to force the assembler to output 0xe9.
GNU assembler supports jmp.d32 to force a 32bit displacement opcode 0xE9.
A solution portable to other assemblers is not known.
@jaykrell jaykrell moved this from In Progress to Done in Bugs Week Sep 15, 2018
jaykrell added a commit that referenced this issue Sep 16, 2018
…By (Linux/amd64/FullAOT/LLVM). (#10633)

mono_aot_get_unbox_trampoline: amd64 hardcode unbox trampoline size to 5 at compile-time.
This fixes: MonoTests.System.Linq.ParallelEnumerableTests.TestGroupBy frequently crashing on FullAOT/LLVM/Linux/amd64 CI.

#10441
#9554
#9046

The actual size varies between 6 and 9 but we can just pretend the last instruction has size 1 (it is 2 or 5; calling it 5 can be fatal, calling it 2 is no better than 1).

Otherwise, a trampoline can overlap the next function, jit info lookup of the next function can fail, then gsharedvt wrapper omited, and crash.

The afflicted function in this case was
System_Linq_Parallel_WrapperEqualityComparer_1_T_GSHAREDVT_GetHashCode_System_Linq_Parallel_Wrapper_1_T_GSHAREDVT.

You can see it with debugprint: #10582.

There are many other possible fixes.
Such as runtime disassembly, hardcode in runtime instead of compiler, produce static trampinfo, omit trampinfo entirely, compute minimum size.

See d589f3b.
@marek-safar marek-safar moved this from Done to Archived in Bugs Week Sep 24, 2018
@jaykrell

This comment has been minimized.

Copy link
Collaborator Author

@jaykrell jaykrell commented Oct 1, 2018

I fixed this recently.

@jaykrell jaykrell closed this Oct 1, 2018
Bugs Week automation moved this from Archived to Done Oct 1, 2018
@marek-safar marek-safar moved this from Done to Archived in Bugs Week Oct 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Bugs Week
Archived
1 participant
You can’t perform that action at this time.