JIT: enable inline pinvoke in more cases
An inline pinvoke is a pinvoke where the managed/native transition
overhead is reduced by inlining parts of the transition bookkeeping
around the call site. A normal pinvoke does this bookkeeping in
a stub method that interposes between the managed caller and the
native callee.

Previously the jit would not allow pinvoke calls that reached it through
inlined method bodies to be optimized via inline pinvoke. This sometimes
caused performance surprises for users who wrap DLL imports in managed
methods. See for instance dotnet/coreclr#2373.
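
For illustration, here is a minimal C# sketch of the wrapper pattern in
question (NativeMethods and Clock are hypothetical names; GetTickCount64
is simply a convenient real kernel32 export):

    using System.Runtime.InteropServices;

    internal static class NativeMethods
    {
        // Plain DllImport declaration.
        [DllImport("kernel32.dll")]
        internal static extern ulong GetTickCount64();
    }

    public static class Clock
    {
        // Managed wrapper around the pinvoke. Previously, once Ticks()
        // was inlined into a caller, the pinvoke it contains was no
        // longer eligible for the inline pinvoke optimization.
        public static ulong Ticks() => NativeMethods.GetTickCount64();
    }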

This change lifts that limitation. Pinvokes in inlined method bodies are
now given the same treatment as pinvokes in the root method. The
legality check for inline pinvokes has been streamlined slightly to
remove a redundant check. Inline pinvokes introduced by inlining are
handled by accumulating the root method's unmanaged call count with the
counts from inlinees, and by deferring insertion of the special basic
blocks until after inlining, so that pinvokes are still properly
processed even when the only ones present come from inline instances.

Inline pinvokes are still disallowed in try and handler regions
(catches, filters, and finallies).
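
For example (a hypothetical sketch reusing NativeMethods from above):

    public static class Guarded
    {
        public static ulong TicksOrZero()
        {
            try
            {
                // On x64 this call sits in a try region with a catch, so
                // it keeps the stub-based transition rather than an inline
                // pinvoke; a call placed inside the catch (a handler
                // region) would be excluded on all targets.
                return NativeMethods.GetTickCount64();
            }
            catch (System.DllNotFoundException)
            {
                return 0;
            }
        }
    }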

X87 liveness tracking was updated to handle the implicit references to
the inline frame var. This was a pre-existing issue that can now show up
more frequently. A test case was added that fails with the stock legacy
jit (and also failed with the new pinvoke enhancements before the
liveness fix); both the original failing case and this new case now pass.

Inline pinvokes are now also suppressed in rarely executed blocks, for
instance blocks leading up to throws.
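
A sketch of the size-speed tradeoff (hypothetical shape, reusing
NativeMethods from above; the block leading to the throw is marked
run-rarely by the jit):

    public static class Checked
    {
        public static ulong Ticks(bool initialized)
        {
            if (!initialized)
            {
                // Rarely executed path leading up to a throw: the jit now
                // uses the more compact stub-based pinvoke here instead of
                // the inline expansion.
                NativeMethods.GetTickCount64();
                throw new System.InvalidOperationException("not initialized");
            }

            // Hot path: still eligible for inline pinvoke expansion.
            return NativeMethods.GetTickCount64();
        }
    }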

The inliner has also been changed to preferentially report the inline
reason as force-inline rather than always-inline when both are applicable.

This change adds a new test case showing the variety of situations that
can arise with pinvoke, inlining, and EH.
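
Roughly, the shapes such a test needs to cover look like this
(hypothetical sketch, reusing NativeMethods and Clock from above):

    public static class Shapes
    {
        // Pinvoke directly in the root method: eligible for inline pinvoke.
        public static ulong Direct() => NativeMethods.GetTickCount64();

        // Pinvoke reaching the root through an inlined wrapper: now also eligible.
        public static ulong ViaWrapper() => Clock.Ticks();

        // Pinvoke inside a finally (a handler region): stays stub-based.
        public static ulong InFinally()
        {
            ulong t = 0;
            try { t = 1; }
            finally { t += NativeMethods.GetTickCount64(); }
            return t;
        }
    }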


Commit migrated from dotnet/coreclr@043fe32
AndyAyersMS committed Dec 5, 2016
1 parent 2215cce commit 365d732
Showing 11 changed files with 526 additions and 68 deletions.
7 changes: 4 additions & 3 deletions src/coreclr/src/jit/compiler.h
@@ -2770,9 +2770,10 @@ class Compiler
 
     void impImportNewObjArray(CORINFO_RESOLVED_TOKEN* pResolvedToken, CORINFO_CALL_INFO* pCallInfo);
 
-    bool impCanPInvokeInline(var_types callRetTyp);
-    bool impCanPInvokeInlineCallSite(var_types callRetTyp);
-    void impCheckForPInvokeCall(GenTreePtr call, CORINFO_METHOD_HANDLE methHnd, CORINFO_SIG_INFO* sig, unsigned mflags);
+    bool impCanPInvokeInline(var_types callRetTyp, BasicBlock* block);
+    bool impCanPInvokeInlineCallSite(var_types callRetTyp, BasicBlock* block);
+    void impCheckForPInvokeCall(
+        GenTreePtr call, CORINFO_METHOD_HANDLE methHnd, CORINFO_SIG_INFO* sig, unsigned mflags, BasicBlock* block);
     GenTreePtr impImportIndirectCall(CORINFO_SIG_INFO* sig, IL_OFFSETX ilOffset = BAD_IL_OFFSET);
     void impPopArgsForUnmanagedCall(GenTreePtr call, CORINFO_SIG_INFO* sig);
3 changes: 3 additions & 0 deletions src/coreclr/src/jit/flowgraph.cpp
@@ -21981,6 +21981,9 @@ void Compiler::fgInsertInlineeBlocks(InlineInfo* pInlineInfo)
     compNeedsGSSecurityCookie |= InlineeCompiler->compNeedsGSSecurityCookie;
     compGSReorderStackLayout |= InlineeCompiler->compGSReorderStackLayout;
 
+    // Update unmanaged call count
+    info.compCallUnmanaged += InlineeCompiler->info.compCallUnmanaged;
+
     // Update optMethodFlags
 
 #ifdef DEBUG
166 changes: 115 additions & 51 deletions src/coreclr/src/jit/importer.cpp
@@ -5367,59 +5367,111 @@ GenTreePtr Compiler::impTransformThis(GenTreePtr thisPtr,
         }
     }
 
-bool Compiler::impCanPInvokeInline(var_types callRetTyp)
+//------------------------------------------------------------------------
+// impCanPInvokeInline: examine information from a call to see if the call
+// qualifies as an inline pinvoke.
+//
+// Arguments:
+//    callRetTyp - return type of the call
+//    block      - block containing the call, or for inlinees, block
+//                 containing the call being inlined
+//
+// Return Value:
+//    true if this call qualifies as an inline pinvoke, false otherwise
+//
+// Notes:
+//    Checks basic legality and then a number of ambient conditions
+//    where we could pinvoke but choose not to.
+
+bool Compiler::impCanPInvokeInline(var_types callRetTyp, BasicBlock* block)
 {
-    return impCanPInvokeInlineCallSite(callRetTyp) && getInlinePInvokeEnabled() && (!opts.compDbgCode) &&
+    return impCanPInvokeInlineCallSite(callRetTyp, block) && getInlinePInvokeEnabled() && (!opts.compDbgCode) &&
            (compCodeOpt() != SMALL_CODE) && (!opts.compNoPInvokeInlineCB) // profiler is preventing inline pinvoke
         ;
 }
 
-// Returns false only if the callsite really cannot be inlined. Ignores global variables
-// like debugger, profiler etc.
-bool Compiler::impCanPInvokeInlineCallSite(var_types callRetTyp)
+//------------------------------------------------------------------------
+// impCanPInvokeInlineCallSite: basic legality checks using information
+// from a call to see if the call qualifies as an inline pinvoke.
+//
+// Arguments:
+//    callRetTyp - return type of the call
+//    block      - block containing the call, or for inlinees, block
+//                 containing the call being inlined
+//
+// Return Value:
+//    true if this call can legally qualify as an inline pinvoke, false otherwise
+//
+// Notes:
+//    Inline PInvoke is not legal in these cases:
+//    * Within handler regions (finally/catch/filter)
+//    * Within trees with localloc
+//    * If the call returns a struct
+//
+//    We have to disable pinvoke inlining inside of filters because
+//    in case the main execution (i.e. in the try block) is inside
+//    unmanaged code, we cannot reuse the inlined stub (we still need
+//    the original state until we are in the catch handler).
+//
+//    We disable pinvoke inlining inside handlers since the GSCookie is
+//    in the inlined Frame (see CORINFO_EE_INFO::InlinedCallFrameInfo::offsetOfGSCookie),
+//    but this would not protect framelets/return-address of handlers.
+//
+//    On x64, we disable pinvoke inlining inside of try regions.
+//    Here is the comment from JIT64 explaining why:
+//
+//      [VSWhidbey: 611015] - because the jitted code links in the
+//      Frame (instead of the stub) we rely on the Frame not being
+//      'active' until inside the stub. This normally happens by the
+//      stub setting the return address pointer in the Frame object
+//      inside the stub. On a normal return, the return address
+//      pointer is zeroed out so the Frame can be safely re-used, but
+//      if an exception occurs, nobody zeros out the return address
+//      pointer. Thus if we re-used the Frame object, it would go
+//      'active' as soon as we link it into the Frame chain.
+//
+//      Technically we only need to disable PInvoke inlining if we're
+//      in a handler or if we're in a try body with a catch or
+//      filter/except where other non-handler code in this method
+//      might run and try to re-use the dirty Frame object.
+//
+//    A desktop test case where this seems to matter is
+//    jit\jit64\ebvts\mcpp\sources2\ijw\__clrcall\vector_ctor_dtor.02\deldtor_clr.exe
+
+bool Compiler::impCanPInvokeInlineCallSite(var_types callRetTyp, BasicBlock* block)
 {
-    return
-        // We have to disable pinvoke inlining inside of filters
-        // because in case the main execution (i.e. in the try block) is inside
-        // unmanaged code, we cannot reuse the inlined stub (we still need the
-        // original state until we are in the catch handler)
-        (!bbInFilterILRange(compCurBB)) &&
-
-        // We disable pinvoke inlining inside handlers since the GSCookie is
-        // in the inlined Frame (see CORINFO_EE_INFO::InlinedCallFrameInfo::offsetOfGSCookie),
-        // but this would not protect framelets/return-address of handlers.
-        !compCurBB->hasHndIndex() &&
-
 #ifdef _TARGET_AMD64_
-        // Turns out JIT64 doesn't perform PInvoke inlining inside try regions, here's an excerpt of
-        // the comment from JIT64 explaining why:
-        //
-        //// [VSWhidbey: 611015] - because the jitted code links in the Frame (instead
-        //// of the stub) we rely on the Frame not being 'active' until inside the
-        //// stub. This normally happens by the stub setting the return address
-        //// pointer in the Frame object inside the stub. On a normal return, the
-        //// return address pointer is zeroed out so the Frame can be safely re-used,
-        //// but if an exception occurs, nobody zeros out the return address pointer.
-        //// Thus if we re-used the Frame object, it would go 'active' as soon as we
-        //// link it into the Frame chain.
-        ////
-        //// Technically we only need to disable PInvoke inlining if we're in a
-        //// handler or if we're
-        //// in a try body with a catch or filter/except where other non-handler code
-        //// in this method might run and try to re-use the dirty Frame object.
-        //
-        // Now, because of this, the VM actually assumes that in 64 bit we never PInvoke
-        // inline calls on any EH construct, you can verify that on VM\ExceptionHandling.cpp:203
-        // The method responsible for resuming execution is UpdateObjectRefInResumeContextCallback
-        // you can see how it aligns with JIT64 policy of not inlining PInvoke calls almost right
-        // at the beginning of the body of the method.
-        !compCurBB->hasTryIndex() &&
-#endif
-        (!impLocAllocOnStack()) && (callRetTyp != TYP_STRUCT);
+    const bool inX64Try = block->hasTryIndex();
+#else
+    const bool inX64Try = false;
+#endif // _TARGET_AMD64_
+
+    return !inX64Try && !block->hasHndIndex() && !impLocAllocOnStack() && (callRetTyp != TYP_STRUCT);
 }
 
-void Compiler::impCheckForPInvokeCall(GenTreePtr            call,
-                                      CORINFO_METHOD_HANDLE methHnd,
-                                      CORINFO_SIG_INFO*     sig,
-                                      unsigned              mflags)
+//------------------------------------------------------------------------
+// impCheckForPInvokeCall: examine a call to see if it is a pinvoke and,
+// if so, whether it can be expressed as an inline pinvoke.
+//
+// Arguments:
+//    call    - tree for the call
+//    methHnd - handle for the method being called (may be null)
+//    sig     - signature of the method being called
+//    mflags  - method flags for the method being called
+//    block   - block containing the call, or for inlinees, block
+//              containing the call being inlined
+//
+// Notes:
+//    Sets GTF_CALL_M_PINVOKE on the call for pinvokes.
+//
+//    Also sets GTF_CALL_UNMANAGED on call for inline pinvokes if the
+//    call passes a combination of legality and profitability checks.
+//
+//    If GTF_CALL_UNMANAGED is set, increments info.compCallUnmanaged.
+
+void Compiler::impCheckForPInvokeCall(
+    GenTreePtr call, CORINFO_METHOD_HANDLE methHnd, CORINFO_SIG_INFO* sig, unsigned mflags, BasicBlock* block)
 {
     var_types                callRetTyp = JITtype2varType(sig->retType);
     CorInfoUnmanagedCallConv unmanagedCallConv;
@@ -5466,13 +5518,13 @@ void Compiler::impCheckForPInvokeCall(GenTreePtr call,
     {
 #ifdef _TARGET_X86_
         // CALLI in IL stubs must be inlined
-        assert(impCanPInvokeInlineCallSite(callRetTyp));
+        assert(impCanPInvokeInlineCallSite(callRetTyp, block));
         assert(!info.compCompHnd->pInvokeMarshalingRequired(methHnd, sig));
 #endif // _TARGET_X86_
     }
     else
     {
-        if (!impCanPInvokeInline(callRetTyp))
+        if (!impCanPInvokeInline(callRetTyp, block))
         {
             return;
         }
@@ -5481,15 +5533,21 @@
         {
             return;
         }
+
+        // Size-speed tradeoff: don't use inline pinvoke at rarely
+        // executed call sites. The non-inline version is more
+        // compact.
+        if (block->isRunRarely())
+        {
+            return;
+        }
     }
 
     JITLOG((LL_INFO1000000, "\nInline a CALLI PINVOKE call from method %s", info.compFullName));
 
     call->gtFlags |= GTF_CALL_UNMANAGED;
     info.compCallUnmanaged++;
 
-    assert(!compIsForInlining());
-
     // AMD64 convention is same for native and managed
     if (unmanagedCallConv == CORINFO_UNMANAGED_CALLCONV_C)
     {
@@ -6146,7 +6204,7 @@ bool Compiler::impIsTailCallILPattern(bool tailPrefixed,
            ((nextOpcode == CEE_NOP) || ((nextOpcode == CEE_POP) && (++cntPop == 1)))); // Next opcode = nop or exactly
                                                                                        // one pop seen so far.
 #else
-    nextOpcode = (OPCODE)getU1LittleEndian(codeAddrOfNextOpcode);
+    nextOpcode = (OPCODE)getU1LittleEndian(codeAddrOfNextOpcode);
 #endif
 
     if (isCallPopAndRet)
@@ -6920,9 +6978,15 @@ var_types Compiler::impImportCall(OPCODE opcode,
 
     //--------------------------- Inline NDirect ------------------------------
 
-    if (!compIsForInlining())
+    // For inline cases we technically should look at both the current
+    // block and the call site block (or just the latter if we've
+    // fused the EH trees). However the block-related checks pertain to
+    // EH and we currently won't inline a method with EH. So for
+    // inlinees, just checking the call site block is sufficient.
     {
-        impCheckForPInvokeCall(call, methHnd, sig, mflags);
+        // New lexical block here to avoid compilation errors because of GOTOs.
+        BasicBlock* block = compIsForInlining() ? impInlineInfo->iciBlock : compCurBB;
+        impCheckForPInvokeCall(call, methHnd, sig, mflags, block);
     }
 
     if (call->gtFlags & GTF_CALL_UNMANAGED)
12 changes: 6 additions & 6 deletions src/coreclr/src/jit/inlinepolicy.cpp
@@ -449,16 +449,16 @@ void LegacyPolicy::NoteInt(InlineObservation obs, int value)
 
             // Now that we know size and forceinline state,
             // update candidacy.
-            if (m_CodeSize <= InlineStrategy::ALWAYS_INLINE_SIZE)
-            {
-                // Candidate based on small size
-                SetCandidate(InlineObservation::CALLEE_BELOW_ALWAYS_INLINE_SIZE);
-            }
-            else if (m_IsForceInline)
+            if (m_IsForceInline)
             {
                 // Candidate based on force inline
                 SetCandidate(InlineObservation::CALLEE_IS_FORCE_INLINE);
             }
+            else if (m_CodeSize <= InlineStrategy::ALWAYS_INLINE_SIZE)
+            {
+                // Candidate based on small size
+                SetCandidate(InlineObservation::CALLEE_BELOW_ALWAYS_INLINE_SIZE);
+            }
             else if (m_CodeSize <= m_RootCompiler->m_inlineStrategy->GetMaxInlineILSize())
             {
                 // Candidate, pending profitability evaluation
16 changes: 8 additions & 8 deletions src/coreclr/src/jit/morph.cpp
@@ -16832,14 +16832,6 @@ void Compiler::fgMorph()
 
     fgRemoveEmptyBlocks();
 
-    /* Add any internal blocks/trees we may need */
-
-    fgAddInternal();
-
-#if OPT_BOOL_OPS
-    fgMultipleNots = false;
-#endif
-
 #ifdef DEBUG
     /* Inliner could add basic blocks. Check that the flowgraph data is up-to-date */
     fgDebugCheckBBlist(false, false);
@@ -16858,6 +16850,14 @@
 
     EndPhase(PHASE_MORPH_INLINE);
 
+    /* Add any internal blocks/trees we may need */
+
+    fgAddInternal();
+
+#if OPT_BOOL_OPS
+    fgMultipleNots = false;
+#endif
+
 #ifdef DEBUG
     /* Inliner could add basic blocks. Check that the flowgraph data is up-to-date */
     fgDebugCheckBBlist(false, false);
18 changes: 18 additions & 0 deletions src/coreclr/src/jit/stackfp.cpp
@@ -4140,8 +4140,26 @@ void Compiler::raEnregisterVarsPostPassStackFP()
             {
                 raSetRegLclBirthDeath(tree, lastlife, false);
             }
+
+            // Model implicit use (& hence last use) of frame list root at pinvokes.
+            if (tree->gtOper == GT_CALL)
+            {
+                GenTreeCall* call = tree->AsCall();
+                if (call->IsUnmanaged() && !opts.ShouldUsePInvokeHelpers())
+                {
+                    LclVarDsc* frameVarDsc = &lvaTable[info.compLvFrameListRoot];
+
+                    if (frameVarDsc->lvTracked && ((call->gtCallMoreFlags & GTF_CALL_M_FRAME_VAR_DEATH) != 0))
+                    {
+                        // Frame var dies here
+                        unsigned varIndex = frameVarDsc->lvVarIndex;
+                        VarSetOps::RemoveElemD(this, lastlife, varIndex);
+                    }
+                }
+            }
         }
     }
 
     assert(VarSetOps::Equal(this, lastlife, block->bbLiveOut));
 }
 compCurBB = NULL;