[interp] Move from stack based to fully local var based design #20663

BrzVlad · 2020-12-14T10:15:23Z

Instead of having instructions that push and pop from the stack, every instruction has explicit dreg and sregs.

While the main purpose of this PR is to make it easier to implement more advanced optimizations in the future, it also has noticeable performance implications. The code is simplified because we no longer need to update and save the SP. However, the code for each instruction grows because of the explicit source and destination offsets. This is counteracted by a reduction in the total number of instructions, since ldloc/stloc and moves become redundant and are mostly optimized away, even in the current state of the implementation. Here is the total number of opcodes executed while running the corlib test suite with the interp: https://gist.github.com/BrzVlad/d62f504930b75cba4b870e6dbd947e90.
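To make the trade-off above concrete, here is a minimal sketch of `c = a + b` in both designs. The opcodes, the `slot` union and both loops are hypothetical stand-ins invented for illustration; they are not the real MINT_* opcodes or the actual `interp_exec_method` code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration only, not the real MINT_* set. */
enum { OP_LDLOC, OP_STLOC, OP_ADD_STACK, OP_ADD_LOCALS, OP_RET };

typedef union { int32_t i; void *p; } slot; /* stand-in for stackval */

/* c = a + b, stack based: 4 instructions + ret, 8 code units. */
static const uint16_t stack_prog[] = {
    OP_LDLOC, 0, OP_LDLOC, 1, OP_ADD_STACK, OP_STLOC, 2, OP_RET
};

/* c = a + b, local var based: 1 instruction + ret, 5 code units. */
static const uint16_t local_prog[] = { OP_ADD_LOCALS, 2, 0, 1, OP_RET };

/* Stack based loop: operands are implicit and sp has to be maintained.
 * Returns the number of instructions dispatched. */
static int run_stack(const uint16_t *ip, slot *locals)
{
    slot stack[8];
    slot *sp = stack;
    int executed = 0;
    for (;;) {
        executed++;
        switch (*ip) {
        case OP_LDLOC: assert(sp < stack + 8); *sp++ = locals[ip[1]]; ip += 2; break;
        case OP_STLOC: assert(sp > stack); locals[ip[1]] = *--sp; ip += 2; break;
        case OP_ADD_STACK: assert(sp >= stack + 2); sp--; sp[-1].i += sp[0].i; ip += 1; break;
        case OP_RET: return executed;
        default: assert(0 && "unknown opcode"); return -1;
        }
    }
}

/* Local var based loop: every instruction carries dreg and sregs, no sp. */
static int run_locals(const uint16_t *ip, slot *locals)
{
    int executed = 0;
    for (;;) {
        executed++;
        switch (*ip) {
        case OP_ADD_LOCALS: /* layout: opcode, dreg, sreg1, sreg2 */
            locals[ip[1]].i = locals[ip[2]].i + locals[ip[3]].i;
            ip += 4;
            break;
        case OP_RET: return executed;
        default: assert(0 && "unknown opcode"); return -1;
        }
    }
}
```

In this toy encoding the single local var based add is wider than the stack based add (4 code units instead of 1), but the whole sequence is shorter and dispatches fewer instructions, because the loads and the store disappear.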

vargaz · 2021-01-05T21:16:52Z

@monojenkins build failed

vargaz · 2021-01-05T21:23:14Z

mono/mini/interp/transform.h

+	/*
+	 * The local associated with the value of this stack entry. Every time we push on
+	 * the stack a new new local is created.
+	 */


vargaz · 2021-01-05T21:23:46Z

mono/mini/interp/interp.c

@@ -3472,10 +3472,10 @@ interp_exec_method (InterpFrame *frame, ThreadContext *context, FrameClauseArgs
 			++ip; 
 			MINT_IN_BREAK;
 		MINT_IN_CASE(MINT_DUP_VT) {
-			int i32 = READ32 (ip + 1);


Won't this break some code which used to work?

Even before this change, some opcodes supported value types with 32-bit sizes while others supported only 16-bit sizes, so it's not like we really supported it. This change just doubles down on having all sizes fit into 16 bits. If deemed necessary in the future, we can easily add some slow path opcodes that accept the full size, as we could for the local offsets, which are also only 16 bits.

vargaz · 2021-01-05T22:56:42Z

Looks fine in general.

mono/mini/interp/transform.c

Some opcodes support large vt, while others don't, making it useless. Completely remove support for it, so we have faster code for common scenarios. Later at some point, we could add slow opcodes for these corner cases (also for methods with a lot of locals).

In order to facilitate future optimizations and also reduce the total number of instructions (since ldloc/stloc/mov are mostly redundant with this change), instructions no longer pop from and push to the stack. Instead, they receive explicit offsets for dreg and sregs. The offset is relative to frame->stack.

Call args/ret are handled through a single dreg (this is a special CallArgs dreg). As with other locals, this dreg will be resolved to an offset. However, in addition to the optional return value of the call, the call args will reside one after the other at this offset. (This means the called frame will have its stack base at this offset, and the arguments are directly part of its local space.) A local that holds an arg value to a call will never be referenced again (as a sreg in another instruction), but its offset will be in the param area of the call, and we will never optimize it out, since the call keeps it alive.

Since instructions only operate on stackvals, all IL locals of primitive types now occupy sizeof (stackval), instead of their real size. This enables them to be used directly as the source / destination of an instruction.

Currently, there are two types of locals: normal locals and execution stack locals. While normal locals occupy space on the stack on a first come, first served basis, execution stack locals have a predefined stack offset (relative to the start of the execution stack that we used to have before), which is computed during the initial code generation and maps directly to the IL stack. Their real offset is resolved only after all the normal locals have their offset determined. Execution stack locals will become normal locals in the future, once we have a local offset allocator that takes into account the liveness region of locals.
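The two-step offset resolution described above can be sketched as follows. This is an illustration under stated assumptions rather than the code in transform.c: `local_t`, `stackval_t` and `resolve_offsets` are hypothetical names, and placing the execution stack area immediately after the last normal local is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the interpreter's stackval; the real definition differs. */
typedef union { int64_t l; double f; void *p; } stackval_t;

/* Hypothetical description of a local, invented for this sketch. */
typedef struct {
    int is_exec_stack;   /* nonzero for execution stack locals */
    size_t size;         /* real size of the value, in bytes */
    size_t stack_offset; /* exec stack locals only: predefined offset
                          * relative to the start of the execution stack */
    size_t offset;       /* resolved offset relative to frame->stack */
} local_t;

/* Every local occupies a whole number of stackval slots. */
static size_t align_to_stackval(size_t size)
{
    return (size + sizeof(stackval_t) - 1) / sizeof(stackval_t) * sizeof(stackval_t);
}

/* Resolves all offsets and returns the total space needed by the frame. */
static size_t resolve_offsets(local_t *locals, size_t n)
{
    size_t normal_end = 0;
    assert(locals != NULL || n == 0);

    /* Pass 1: normal locals, first come first served. */
    for (size_t i = 0; i < n; i++) {
        if (locals[i].is_exec_stack)
            continue;
        locals[i].offset = normal_end;
        normal_end += align_to_stackval(locals[i].size);
    }

    /* Pass 2: exec stack locals, resolved only once the normal locals are
     * placed. ASSUMPTION: the execution stack starts right after them. */
    size_t total = normal_end;
    for (size_t i = 0; i < n; i++) {
        if (!locals[i].is_exec_stack)
            continue;
        locals[i].offset = normal_end + locals[i].stack_offset;
        size_t end = locals[i].offset + align_to_stackval(locals[i].size);
        if (end > total)
            total = end;
    }
    return total;
}
```

Note how a 1-byte and a 4-byte IL local each take a full stackval slot, which is what lets them be used directly as sregs and dregs.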

Also move the code over to transform.c so we can add more verbose dumping in the future for method/class/field tokens.

In the previous commit we made the return opcode write the result directly at the bottom of the stack. Previously we used to leave the result at the bottom of the execution stack and manually write it to the execution stack of the parent frame, if we had a parent.
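A minimal sketch of how the call args area and the return value can share a location, assuming the layout described in the earlier commit message (the callee's stack base is the caller's call args offset). `frame_t`, `callee_add`, `caller` and `CALL_ARGS_OFFSET` are hypothetical names, and offsets are counted in stackval slots here for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef union { int32_t i; void *p; } stackval_t; /* stand-in for stackval */

typedef struct { stackval_t *stack; } frame_t;    /* hypothetical frame */

/* Hypothetical callee: its args are its first two locals. "Returning"
 * means writing the result at the bottom of its own stack. */
static void callee_add(frame_t *frame)
{
    assert(frame != NULL && frame->stack != NULL);
    stackval_t *locals = frame->stack;
    locals[0].i = locals[0].i + locals[1].i;
}

/* Hypothetical caller: slot CALL_ARGS_OFFSET is the resolved offset of the
 * call's dreg. The args are stored there one after the other, and the
 * callee's frame simply starts at that offset. */
static int32_t caller(stackval_t *stack_mem, int32_t x, int32_t y)
{
    enum { CALL_ARGS_OFFSET = 2 };
    frame_t frame = { stack_mem };
    stackval_t *locals = frame.stack;

    locals[CALL_ARGS_OFFSET + 0].i = x;
    locals[CALL_ARGS_OFFSET + 1].i = y;

    frame_t child = { frame.stack + CALL_ARGS_OFFSET };
    callee_add(&child);

    /* No copy into the parent frame: the bottom of the callee's stack is
     * already the caller's dreg. */
    return locals[CALL_ARGS_OFFSET].i;
}
```

For example, `caller(mem, 2, 40)` with an 8-slot `mem` buffer reads the sum back from the call's dreg without any explicit copy between frames.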

Replace the previous cumbersome cprop with a new, simpler and more generic pass. It iterates over the instructions of each basic block, one at a time, saving the definition for every destination local. If a local is used as a sreg and its definition is a MOV, then we can forward through the copy. The deadce pass will remove locals that are no longer referenced. We add the INTERP_LOCAL_FLAG_CALL_ARGS flag to locals that represent call args, since deadce would otherwise remove them. In fixup_newbb_stack_locals we generate moves when transitioning from one basic block to another while having a stack state. This serves the same purpose as a phi instruction; in the future we should probably change to SSA and further simplify parts of the implementation. For now, because we don't have our own local offset allocator for execution stack locals, we are not free to extend the liveness of this type of local, which overcomplicates the cprop pass.
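The pass described above can be sketched roughly as below. This is a simplified illustration, not the code in transform.c: `ins_t`, the three opcodes, `cprop_bblock`, `deadce` and `LOCAL_FLAG_CALL_ARGS` are hypothetical stand-ins, the sketch handles a single basic block, and deadce runs only once instead of iterating.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_NOP, OP_MOV, OP_ADD };   /* hypothetical opcodes */
#define MAX_LOCALS 16
#define LOCAL_FLAG_CALL_ARGS 1u    /* stand-in for INTERP_LOCAL_FLAG_CALL_ARGS */

typedef struct { int op; int dreg; int sreg[2]; } ins_t;

static int num_sregs(int op)
{
    return op == OP_MOV ? 1 : op == OP_ADD ? 2 : 0;
}

/* Copy propagation inside one basic block. Returns how many sregs were
 * rewritten to read the source of a MOV instead of its destination. */
static int cprop_bblock(ins_t *ins, int n)
{
    const ins_t *def[MAX_LOCALS];
    int rewritten = 0;
    for (int l = 0; l < MAX_LOCALS; l++)
        def[l] = NULL;

    for (int i = 0; i < n; i++) {
        ins_t *cur = &ins[i];
        for (int s = 0; s < num_sregs(cur->op); s++) {
            assert(cur->sreg[s] >= 0 && cur->sreg[s] < MAX_LOCALS);
            const ins_t *d = def[cur->sreg[s]];
            if (d != NULL && d->op == OP_MOV) {
                cur->sreg[s] = d->sreg[0];   /* forward through the copy */
                rewritten++;
            }
        }
        if (cur->op == OP_NOP)
            continue;
        assert(cur->dreg >= 0 && cur->dreg < MAX_LOCALS);
        /* dreg is overwritten: copies that read it are no longer valid. */
        for (int l = 0; l < MAX_LOCALS; l++)
            if (def[l] != NULL && def[l]->op == OP_MOV && def[l]->sreg[0] == cur->dreg)
                def[l] = NULL;
        def[cur->dreg] = cur;
    }
    return rewritten;
}

/* One round of dead code elimination: drop instructions whose dreg is never
 * read, unless the local is kept alive as a call argument. */
static int deadce(ins_t *ins, int n, const unsigned *local_flags)
{
    int refs[MAX_LOCALS] = { 0 };
    int removed = 0;
    for (int i = 0; i < n; i++)
        for (int s = 0; s < num_sregs(ins[i].op); s++)
            refs[ins[i].sreg[s]]++;
    for (int i = 0; i < n; i++) {
        if (ins[i].op == OP_NOP)
            continue;
        int d = ins[i].dreg;
        if (refs[d] == 0 && !(local_flags[d] & LOCAL_FLAG_CALL_ARGS)) {
            ins[i].op = OP_NOP;
            removed++;
        }
    }
    return removed;
}
```

With `b = mov a; c = mov b; d = add c, b`, the cprop step rewrites the add to read `a` twice, after which both moves are unreferenced. If `d` is a call argument, no instruction ever reads it as a sreg, so without the flag this deadce would remove the add as well.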

newobj_reg_map holds the src->dst writes that are done inside MINT_NEWOBJ_FAST (through the memmove), even when inlining. Once we remove the redundant memmove we would no longer need this newobj_reg_map and cprop would work normally.

This replaces LDLOCA + LDFLD/STFLD with a simple MOV.

We use the address of this label during EH in order to resume execution at it. For whatever reason, with some compilers, if the label is followed by a void return, the address of the label is at the end of the method, so we can't resume at it because it points to foreign code. We return NULL to work around this.

BrzVlad added the do-not-merge label Dec 14, 2020

monojenkins mentioned this pull request Dec 14, 2020

[interp] Move from stack based to fully local var based design dotnet/runtime#46037

Merged

BrzVlad force-pushed the feature-interp-local-machine branch 3 times, most recently from e88f77c to 94c53a2 Compare December 21, 2020 10:12

BrzVlad force-pushed the feature-interp-local-machine branch 5 times, most recently from a5a4777 to 1079395 Compare December 29, 2020 00:03

BrzVlad force-pushed the feature-interp-local-machine branch from 1079395 to 3b7d7c3 Compare January 5, 2021 10:41

BrzVlad removed the do-not-merge label Jan 5, 2021

marek-safar requested a review from vargaz January 5, 2021 20:57

vargaz reviewed Jan 5, 2021

vargaz approved these changes Jan 5, 2021

lambdageek reviewed Jan 6, 2021

mono/mini/interp/transform.c (resolved)

BrzVlad added 9 commits January 7, 2021 01:21

[interp] Fix dumping of instructions

bdd6abb

Also move the code over to transform.c so we can add more verbose dumping in the future for method/class/field tokens.

[interp] Re-enable constant folding

699b6c5

[interp] Track values through ctors

63e4549

newobj_reg_map holds the src->dst writes that are done inside MINT_NEWOBJ_FAST (through the memmove), even when inlining. Once we remove the redundant memmove we would no longer need this newobj_reg_map and cprop would work normally.

[interp] Re-enable ldloca removal optimization for IntPtr

a3104eb

This replaces LDLOCA + LDFLD/STFLD with a simple MOV.

BrzVlad force-pushed the feature-interp-local-machine branch from 3b7d7c3 to 365d2f9 Compare January 6, 2021 23:23

BrzVlad merged commit ed7d3b5 into mono:master Jan 8, 2021
