Skip to content
This repository has been archived by the owner. It is now read-only.

Add On-Stack Replacement for Sulong. #859

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open

Conversation

rmosaner
Copy link
Contributor

@rmosaner rmosaner commented Jul 9, 2018

This PR adds On-Stack Replacement (OSR) for Sulong.

Based on the block based structure of LLVM IR bitcodes, loops are detected and for each loop a new node is created, which conforms to Truffle interface for loop nodes and replaces the basic block which was identified as the header of the loop.

Thus, Graal can replace loops during (interpreted) execution with optimized (compiled) versions, yielding a faster warmup. However, due to additional nodes, peak performance can be negatively affected.

Parameters (with default values) for playing around with OSR:

  • -Dgraal.TruffleOSR=true ... toggle OSR
  • -Dgraal.TruffleOSRCompilationThreshold=100000 ... threshold for loop iterations after which a loop is OSR compiled


import com.oracle.truffle.api.nodes.ControlFlowException;

public class LLVMBreakException extends ControlFlowException{
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make this class final.


@Child private LoopNode loop;

@CompilationFinal private final Integer[] successors;
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect there's a reason why this isn't an int[]. Please document it in a comment.


@Child private LoopNode loop;

@CompilationFinal private final Integer[] successors;
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the entries of this array constant as well? If so, you should change this to @CompilationFinal(dimensions = 1).

}

CompilerDirectives.transferToInterpreter();
throw new IllegalStateException("Must not reach here!");
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? You could make this error message more descriptive by e.g. stating that you did not find a loop successor.

}
} catch (FrameSlotTypeException e) {
CompilerDirectives.transferToInterpreter();
throw new RuntimeException("Error while reading from loop successor frame slot - type mismatch!");
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can include e as cause for the RunTimeException in the call to its constructor.

}
} catch (FrameSlotTypeException e) {
CompilerDirectives.transferToInterpreter();
throw new RuntimeException("Error while reading from loop successor frame slot - type mismatch!");
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already mentioned this above, but a more descriptive error message would come in handy for debugging. In general, since this class is in large parts a duplicate of LLVMDispatchBasicBlockNode it would be preferable to have the common parts in a common baseclass, but it may be hard to accomplish this without any performance regressions. Perhaps you could look into this as future work.


@Override
public boolean executeRepeating(VirtualFrame frame) {
Object ret = null;
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could replace this method body with return bodyNode.executeI1(frame). Instead of a RuntimeException that would also throw the IMHO more fitting UnexpectedResultException.

@@ -109,21 +117,83 @@ public RootCallTarget convert() {
method.getParameters().size(), symbols,
method, liveness, notNullable, dbgInfoHandler);
method.accept(visitor);

LLVMControlFlowGraph cfg = new LLVMControlFlowGraph(method.getBlocks().toArray(new InstructionBlock[method.getBlocks().size()]));
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not necessary to pass a preallocated array to toArray. Its safer to instead use a constant empty array like FunctionDefinition.EMPTY.

FrameSlot[][] nullableAfterBlock, LLVMSourceLocation location, LLVMStatementNode[] copyArgumentsToFrameArray, UniquesRegion uniquesRegion) {

if (cfg.getCFGLoops().size() > 0) {
loopSuccessorSlot = frame.addFrameSlot("__LLVMLoopSuccessor", FrameSlotKind.Int);
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sulong-internal frameslots usually use an identifier in the form "". See e.g. LLVMStack.FRAME_ID or LLVMUserException.FRAME_SLOT_ID. This is by no means an explicitly enforced convention, but I think doing this avoids name conflicts and makes debugging easier.

loopSuccessorSlot = frame.addFrameSlot("__LLVMLoopSuccessor", FrameSlotKind.Int);
}

LLVMStatementNode[] nodeArray = nodes.toArray(new LLVMStatementNode[0]);
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use LLVMStatementNode.NO_STATEMENTS instead of explicitly allocating an array here.


LLVMStatementNode[] nodeArray = nodes.toArray(new LLVMStatementNode[0]);

for (CFGLoop l : cfg.getCFGLoops()) {
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid single-letter variable names if the values are used more often or not obviously in your loop body.

LLVMStatementNode loop = runtime.getNodeFactory().createLoop(loopBody, l.getSuccessorIDs());

// replace header block with loop node
nodes.remove(headerId);
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could make removing and adding an atomic operation by calling nodes.set(headerId, loop) instead.

// add body nodes
int i = 1;
for (CFGBlock b : l.getBody()) {
bodyNodes.add(nodeArray[b.id]);
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you keep nodeArray and nodes in sync in any case, I don't see a need for nodeArray.

@@ -93,11 +98,36 @@ public Object executeGeneric(VirtualFrame frame) {
int backEdgeCounter = 0;
outer: while (basicBlockIndex != LLVMBasicBlockNode.RETURN_FROM_FUNCTION) {
CompilerAsserts.partialEvaluationConstant(basicBlockIndex);
LLVMBasicBlockNode bb = bodyNodes[basicBlockIndex];

if(bodyNodes[basicBlockIndex] instanceof LLVMLoopNode) {
Copy link
Member

@christianhaeubl christianhaeubl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, a loop node should be a terminating instruction - I think that this would fit more nicely into the current structure as all terminating instructions modify the basic block index.

loop.execute(frame);

Integer[] successors = loop.getSuccessors();
for(int i = 0; i < successors.length; i++) {
Copy link
Member

@christianhaeubl christianhaeubl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the iteration should stop at successors.length - 1 and there should be some explicit code to handle the the last successor (see the pattern used for the LLVMSwitchNode below). Otherwise, the graph will look a bit worse.

private final FrameSlot exceptionValueSlot;
private final int headerId;
@Children private final LLVMStatementNode[] bodyNodes;
@CompilationFinal private final Integer[] indexMapping;
Copy link
Member

@christianhaeubl christianhaeubl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason why you are using Integer instead of int? (same applies to loopSuccessors).
With Integer[], you would need @CompilationFinal(dimensions = 1) - otherwise, the values are not compilation final (also applies to loopSuccessors).

@ExplodeLoop
private boolean isInLoop(int bci) {
for(int i : loopSuccessors) {
if(i == bci)
Copy link
Member

@christianhaeubl christianhaeubl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please always use braces


@Child private LoopNode loop;

@CompilationFinal private final Integer[] successors;
Copy link
Member

@christianhaeubl christianhaeubl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see other comment regarding int vs. Integer and @CompilationFinal(dimensions = 1)

@Override
@ExplodeLoop(kind = LoopExplosionKind.MERGE_EXPLODE)
public Object executeGeneric(VirtualFrame frame) {
boolean ret = false; // helper variable to not return at first encounter of headerId
Copy link
Member

@christianhaeubl christianhaeubl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like firstIteration would be more readable and would make the comment unnecessary

CompilerAsserts.compilationConstant(bodyNodes.length);
int basicBlockIndex = headerId;

outer: while (isInLoop(basicBlockIndex) && (basicBlockIndex != headerId || !ret)) { // do-while loop fails at PE
Copy link
Member

@christianhaeubl christianhaeubl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general, comments are more readable if they are on a separate line


ret = true;

if(bodyNodes[indexMapping[basicBlockIndex]] instanceof LLVMLoopNode) {
Copy link
Member

@christianhaeubl christianhaeubl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment in LLVMDispatchBasicBlockNode

}

@ExplodeLoop
private static void executePhis(VirtualFrame frame, LLVMControlFlowNode controlFlowNode, int successorIndex) {
Copy link
Member

@christianhaeubl christianhaeubl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

share the helper functions that are down here with the LLVMDispatchBasicBlockNode (this will not impact PE)

if (active.contains(inner)) {
// catches case that there is a stack overflow because two loop nodes are being called iteratively
// from one another, without one being left beforehand
throw new RuntimeException("Irreducible nestedness!");
Copy link
Member

@christianhaeubl christianhaeubl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw a custom exception and catch it in the build method (applies to all bailouts in here)

private static final int LOOP_HEADER_INITIAL_CAPACITY = 4;

private CFGBlock[] blocks;
private CFGLoop[] cfgLoops = new CFGLoop[LOOP_HEADER_INITIAL_CAPACITY];
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need an array that you resize manually? I think you should use an ArrayList instead, if not to make your code simpler than at least to make it safer.

indexMapping[b.id] = i++;
}

LLVMExpressionNode loopBody = runtime.getNodeFactory().createLoopDispatchNode(frame.findFrameSlot(LLVMUserException.FRAME_SLOT_ID), Collections.unmodifiableList(bodyNodes),
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getSuccessorIDs allocates a new array with the same content on each call. Since, I assume, the contents of the returned arrays are expected to be identical, you should only call the method once and pass the same array to createLoop and createLoopDispatchNode.

calculateSuccessors();
}

Integer[] sucIDs = new Integer[successors.size()];
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the order of elements in sucIDs guaranteed by your loop detection algorithm? These seem like they should be sorted in some way.

List<CFGLoop> loops = new ArrayList<>();

for (CFGLoop l : cfgLoops) {
if (l != null)
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The style gate will probably catch this. Please always use braces to delimit the bodies of conditional branches even if they consist only of a single line.

if (l != null)
loops.add(l);
}
loops.removeIf(l -> l == null);
Copy link
Contributor

@jkreindl jkreindl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement is unnecessary since you already make sure not to add any null elements to the list.


import com.oracle.truffle.api.nodes.ControlFlowException;

public class LLVMBreakException extends ControlFlowException{
Copy link
Member

@christianhaeubl christianhaeubl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this class unused?

Integer[] successors = loop.getSuccessors();
for(int i = 0; i < successors.length; i++) {
try {
if(frame.getInt(loopSuccessorSlot) == successors[i]) {
Copy link
Member

@christianhaeubl christianhaeubl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for accessing the frame, you can use FrameUtil.getIntSafe(...)

@@ -0,0 +1,265 @@
/*
* Copyright (c) 2017, 2018, Oracle and/or its affiliates.
Copy link
Member

@christianhaeubl christianhaeubl Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure that you have the correct copyright year in all files that you added

@rmosaner rmosaner force-pushed the OSR branch 2 times, most recently from e3a1818 to 448838a Compare Jul 11, 2018
* Distinction between low level basic blocks and high level control flow
* structures, like LLVMLoopNode
*/
if(bodyNodes[basicBlockIndex] instanceof LLVMBasicBlockNode) {
Copy link
Contributor

@jkreindl jkreindl Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check will probably have a noticeable impact on performance. It should not be necessary either.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants