Skip to content
This repository has been archived by the owner. It is now read-only.

IR with double constant doesn't work with OrcJIT build by MSVC #44337

Closed
llvmbot opened this issue Dec 18, 2019 · 23 comments
Closed

IR with double constant doesn't work with OrcJIT build by MSVC #44337

llvmbot opened this issue Dec 18, 2019 · 23 comments

Comments

@llvmbot
Copy link
Collaborator

@llvmbot llvmbot commented Dec 18, 2019

Bugzilla Link 44337
Resolution FIXED
Resolved on Jan 22, 2020 15:01
Version 9.0
OS Windows XP
Depends On #44336
Attachments The IR module, Binary IR module, The x86-64 assembler version of module
Reporter LLVM Bugzilla Contributor
CC @AlexDenisov,@dwblaikie,@lhames

Extended Description

Precondition:
LLVM 9.0.0 built by MSVC 2019.

Steps to reproduce:

  1. Load 'module.bc' to memory from the file.
  2. Add module to LLLazyJIT.
  3. Request address of function 'calculate'.
  4. Call function 'calculate'.

Expected result:
The function should be executed fine.

Actual result:
The OrcJIT failed on an assertion Core.cpp:867:
Assertion failed: I != Symbols.end() && "Symbol not found", file C:\Workspace\llvm-9.0.0.src\lib\ExecutionEngine\Orc\Core.cpp, line 867

Note:
I figured out that the missed symbol is __real@40091eb851eb851f. It appears after IR code is compiled by llc to assembler.

This issue is not observed on Linux and MinGW builds on Windows.

@llvmbot
Copy link
Collaborator Author

@llvmbot llvmbot commented Dec 18, 2019

; ModuleID = 'module.bc'
source_filename = "module.ll"

@​PI = local_unnamed_addr constant double 3.140000e+00
@​cachedValue = private unnamed_addr global double 0.000000e+00

; Function Attrs: norecurse nounwind writeonly
define double @"calculate"(i32) local_unnamed_addr #​0 {
%2 = sitofp i32 %0 to double
%3 = fmul double %2, 3.140000e+00
%4 = fmul double %3, %2
store double %4, double* @​cachedValue, align 8
ret double %4
}

; Function Attrs: norecurse nounwind readonly
define double @"getCachedValue"() local_unnamed_addr #​1 {
%1 = load double, double* @​cachedValue, align 8
ret double %1
}

attributes #​0 = { norecurse nounwind writeonly }
attributes #​1 = { norecurse nounwind readonly }

Loading

@llvmbot
Copy link
Collaborator Author

@llvmbot llvmbot commented Dec 18, 2019

.text
.def	 @​feat.00;
.scl	3;
.type	0;
.endef
.globl	@feat.00

.set @​feat.00, 0
.file "module.ll"
.def calculate;
.scl 2;
.type 32;
.endef
.globl __real@40091eb851eb851f # -- Begin function calculate
.section .rdata,"dr",discard,__real@40091eb851eb851f
.p2align 3
__real@40091eb851eb851f:
.quad 4614253070214989087 # double 3.1400000000000001
.text
.globl calculate
.p2align 4, 0x90
calculate: # @​calculate

%bb.0:

cvtsi2sdl	%ecx, %xmm1
movsd	__real@40091eb851eb851f(%rip), %xmm0 # xmm0 = mem[0],zero
mulsd	%xmm1, %xmm0
mulsd	%xmm1, %xmm0
movsd	%xmm0, .LcachedValue(%rip)
retq
                                    # -- End function
.def	 getCachedValue;
.scl	2;
.type	32;
.endef
.globl	getCachedValue          # -- Begin function getCachedValue
.p2align	4, 0x90

getCachedValue: # @​getCachedValue

%bb.0:

movsd	.LcachedValue(%rip), %xmm0 # xmm0 = mem[0],zero
retq
                                    # -- End function
.section	.rdata,"dr"
.globl	PI                      # @​PI
.p2align	3

PI:
.quad 4614253070214989087 # double 3.1400000000000001

.lcomm	.LcachedValue,8,8       # @​cachedValue

Loading

@llvmbot
Copy link
Collaborator Author

@llvmbot llvmbot commented Dec 20, 2019

Loading

@lhames
Copy link

@lhames lhames commented Dec 20, 2019

Hi Alexander,

Thanks for looking in to this.

The proposed patch is not the right fix: It weakens responsibility checking and dynamically injects new definitions. I'm trying to wrap my head around what has gone wrong so that I can figure out the correct fix.

From the original assertion it looks like the assertion that fired was in a call to JITDylib::resolve that included a symbol that wasn't defined in the JITDylib, but I'm not sure how that could have happened: We should have hit the corresponding assertion in MaterializationResponsibility::notifyResolved before we got to JITDylib::resolve. Would you be able to run this under a debugger and obtain a backtrace for the original assertion?

If this analysis is correct: That the symbol has somehow made it into the responsibility set for the MaterializationResponsibility, but is missing from the JITDylib, then we need to track down where this correspondence broke. They should move in lock-step.

Loading

@llvmbot
Copy link
Collaborator Author

@llvmbot llvmbot commented Dec 22, 2019

Loading

@llvmbot
Copy link
Collaborator Author

@llvmbot llvmbot commented Dec 22, 2019

.text
.file	"module.ll"
.section	.rodata.cst8,"aM",@progbits,8
.p2align	3               # -- Begin function calculate

.LCPI0_0:
.quad 4614253070214989087 # double 3.1400000000000001
.text
.globl calculate
.p2align 4, 0x90
.type calculate,@function
calculate: # @​calculate

%bb.0:

cvtsi2sd	%edi, %xmm1
movsd	.LCPI0_0(%rip), %xmm0   # xmm0 = mem[0],zero
mulsd	%xmm1, %xmm0
mulsd	%xmm1, %xmm0
movsd	%xmm0, .LcachedValue(%rip)
retq

.Lfunc_end0:
.size calculate, .Lfunc_end0-calculate
# -- End function
.globl getCachedValue # -- Begin function getCachedValue
.p2align 4, 0x90
.type getCachedValue,@function
getCachedValue: # @​getCachedValue

%bb.0:

movsd	.LcachedValue(%rip), %xmm0 # xmm0 = mem[0],zero
retq

.Lfunc_end1:
.size getCachedValue, .Lfunc_end1-getCachedValue
# -- End function
.type PI,@object # @​PI
.section .rodata,"a",@progbits
.globl PI
.p2align 3
PI:
.quad 4614253070214989087 # double 3.1400000000000001
.size PI, 8

.type	.LcachedValue,@object   # @​cachedValue
.local	.LcachedValue
.comm	.LcachedValue,8,8
.section	".note.GNU-stack","",@progbits

Loading

@llvmbot
Copy link
Collaborator Author

@llvmbot llvmbot commented Dec 22, 2019

Hi Lang,

Thank you for the fast response.

I had some debug session before I prepared my "hot fix" :)
So per my light analysis, the root cause of the issue is a difference between low-level code generation on MSVC (COFF) and Linux/MinGW.
I have attached the assembler version of the module which compiled by LLC on Linux.
You can see a difference: MSVC version has additional constant __real@40091eb851eb851f(%rip) in function 'calculate'.
This constant wasn't declared in the IR module, but was added during compilation to platform-specific code. Maybe it connected with some rules of compilation for the MSVC target.

So what is happened in code in my understanding:

  1. We requested to materialize function 'calculate' and inserted the function name 'calculate' in the 'Symbols' map.
  2. We have got 'notifyResolved' notification from RTDyldObjectLinkingLayer.cpp with list of resolved symbols.
  3. We expected to see in resolved symbols list only symbols which we requested (which contains in 'Symbols') and assertion failed.
    But because a new constant '__real@40091eb851eb851f' was generated during compilation it also presents in the resolved list which comes from the linking layer (RTDyldObjectLinkingLayer.cpp).

So in my opinion, 'calculation' depends on constant '__real@40091eb851eb851f' and we have to extend our 'Symbol' map.
And you right, my patch injects all additional symbols which come from the linking layer. Because all symbols already have real addresses it is safe.


Best wishes,
Alexander

Loading

@llvmbot
Copy link
Collaborator Author

@llvmbot llvmbot commented Dec 25, 2019

Another way just discard all unexpected symbols, but I don't enough familiar with OrcJIT architecture to say what way is proper.

Loading

@llvmbot
Copy link
Collaborator Author

@llvmbot llvmbot commented Dec 27, 2019

The patch which fixes the issue [updated].
Ignore already resolved symbols.

Loading

@llvmbot
Copy link
Collaborator Author

@llvmbot llvmbot commented Dec 27, 2019

Hi Lang,

Is there any chance that this patch (or "true" fix) will be in LLVM 10?

Loading

@lhames
Copy link

@lhames lhames commented Jan 2, 2020

Hi Alexander,

I'm just back from vacation. I'll try to get you an update on this today.

-- Lang.

Loading

@lhames
Copy link

@lhames lhames commented Jan 3, 2020

Hi Alexander,

I had some debug session before I prepared my "hot fix" :)
So per my light analysis, the root cause of the issue is a difference between low-level code generation on MSVC (COFF) and Linux/MinGW.
I have attached the assembler version of the module which compiled by LLC on Linux. You can see a difference: MSVC version has additional constant
__real@40091eb851eb851f(%rip) in function 'calculate'.
This constant wasn't declared in the IR module, but was added during compilation to platform-specific code. Maybe it connected with some rules of compilation for the MSVC target.

That all sounds right to me. I guess the code generator transformed this into a named constant (probably intended to be non-exported from the final symbol table).

The thing that has me confused is the specific assertion that you saw: It is in JITDylib::resolve, but this is only called from MaterializationResponsibility::notifyResolved which contains what should have been an equivalent assertion: That the symbol appears in the SymbolFlags map of the MaterializationResponsibility. So what I want to know is how we managed to trigger the second assertion without triggering the first? Are you able to debug your LLVM build and tell me whether the loop in MaterializationResponsibility::notifyResolved is running?

That mystery aside, I think the right answer here is to have RTDyldObjectLinkingLayer auto-claim responsibility for new symbols in the object. I've attached a patch to do that (auto-claim-obj-syms.patch). Would you be able to test it out for me? I do not have access to a Windows machine.

-- Lang.

Loading

@lhames
Copy link

@lhames lhames commented Jan 3, 2020

Loading

@llvmbot
Copy link
Collaborator Author

@llvmbot llvmbot commented Jan 9, 2020

We had a winter holidays.
Sure. I'll try your patch.
Could you please also take a look #44336 (issue with an assertion of Export flag)?
Because this bug depends on 44336.

Loading

@llvmbot
Copy link
Collaborator Author

@llvmbot llvmbot commented Jan 9, 2020

What version of LLVM is your patch for?
I use the 9.0.0 release.
Could you please prepare a patch for 9.0.0?

Loading

@llvmbot
Copy link
Collaborator Author

@llvmbot llvmbot commented Jan 9, 2020

The thing that has me confused is the specific assertion that you saw: It is
in JITDylib::resolve, but this is only called from
MaterializationResponsibility::notifyResolved which contains what should
have been an equivalent assertion: That the symbol appears in the
SymbolFlags map of the MaterializationResponsibility. So what I want to know
is how we managed to trigger the second assertion without triggering the
first? Are you able to debug your LLVM build and tell me whether the loop in
MaterializationResponsibility::notifyResolved is running?

Sorry for the confusion. This part of the patch fixes the issue #44336 (The root cause of the issue that the resolved symbol on COFF doesn't have flag JITSymbolFlags::Exported).
Because without such fix I can't debug issue with extra symbols.

Loading

@llvmbot
Copy link
Collaborator Author

@llvmbot llvmbot commented Jan 9, 2020

Loading

@llvmbot
Copy link
Collaborator Author

@llvmbot llvmbot commented Jan 9, 2020

Lang,

I have adapted your patch for LLVM 9.0.0 (please take a look is it right) and have got the error at runtime:
JIT session error: Duplicate definition of symbol '__real@40091eb851eb851f'

Loading

@llvmbot
Copy link
Collaborator Author

@llvmbot llvmbot commented Jan 10, 2020

I think root cause of this error (Duplicate definition of symbol '__real@40091eb851eb851f') is here (return make_error(*KV.first);):

Error JITDylib::defineMaterializing(const SymbolFlagsMap &SymbolFlags) {
return ES.runSessionLocked(& -> Error {
std::vectorSymbolTable::iterator AddedSyms;

for (auto &KV : SymbolFlags) {
  SymbolTable::iterator EntryItr;
  bool Added;

  std::tie(EntryItr, Added) =
      Symbols.insert(std::make_pair(KV.first, SymbolTableEntry(KV.second)));

  if (Added) {
    AddedSyms.push_back(EntryItr);
    EntryItr->second.setState(SymbolState::Materializing);
  } else {
    // Remove any symbols already added.
    for (auto &SI : AddedSyms)
      Symbols.erase(SI);

    // FIXME: Return all duplicates.
    return make_error<DuplicateDefinition>(*KV.first);
  }
}

return Error::success();

});
}

Called from:
Error RTDyldObjectLinkingLayer::onObjLoad(
VModuleKey K, MaterializationResponsibility &R, object::ObjectFile &Obj,
std::unique_ptrRuntimeDyld::LoadedObjectInfo LoadedObjInfo,
std::map<StringRef, JITEvaluatedSymbol> Resolved,
std::set &InternalSymbols) {
SymbolFlagsMap ExtraSymbolsToClaim;
SymbolMap Symbols;
for (auto &KV : Resolved) {
// Scan the symbols and add them to the Symbols map for resolution.

// We never claim internal symbols.
if (InternalSymbols.count(KV.first))
  continue;

auto InternedName = getExecutionSession().intern(KV.first);
auto Flags = KV.second.getFlags();

// Override object flags and claim responsibility for symbols if
// requested.
if (OverrideObjectFlags || AutoClaimObjectSymbols) {
  auto I = R.getSymbols().find(InternedName);

  if (OverrideObjectFlags && I != R.getSymbols().end())
    Flags = I->second;
  else if (AutoClaimObjectSymbols && I == R.getSymbols().end())
    ExtraSymbolsToClaim[InternedName] = Flags;
}

Symbols[InternedName] = JITEvaluatedSymbol(KV.second.getAddress(), Flags);

}

if (!ExtraSymbolsToClaim.empty())
if (auto Err = R.defineMaterializing(ExtraSymbolsToClaim))
return Err;

R.notifyResolved(Symbols);

if (NotifyLoaded)
NotifyLoaded(K, Obj, *LoadedObjInfo);

return Error::success();
}

The second issue is here:
if (OverrideObjectFlags && I != R.getSymbols().end())
Flags = I->second;
else if (AutoClaimObjectSymbols && I == R.getSymbols().end())
ExtraSymbolsToClaim[InternedName] = Flags;
}

Symbols[InternedName] = JITEvaluatedSymbol(KV.second.getAddress(), Flags);

Despite the AutoClaimObjectSymbols flag is true extra symbol will be added to Symbols[InternedName].

Loading

@lhames
Copy link

@lhames lhames commented Jan 21, 2020

Hi Alexander,

Someone has reported a similar bug in http://bugs.#40074 . I have committed a (likely) fix for their issue in 84217ad66115cc31b184374a03c8333e4578996f. Would you be able to test and see if that commit fixes your issue too?

-- Lang.

Loading

@llvmbot
Copy link
Collaborator Author

@llvmbot llvmbot commented Jan 22, 2020

Hi Alexander,

Someone has reported a similar bug in http://bugs.#40074 . I have
committed a (likely) fix for their issue in
84217ad66115cc31b184374a03c8333e4578996f. Would you be able to test and see
if that commit fixes your issue too?

-- Lang.

Hi Lang,

I have ported your patch from #​40074 to LLVM 9.0.0 and tested it in the mine environment.
Your patch is working fine for me.
Also, it fixes bug #​44336.

But are you sure that it is a good idea to pass a parameter by value?
Error defineMaterializing(SymbolFlagsMap SymbolFlags);
If you don't want to modify the map, a constant reference is a better solution, if you want to modify, a non-constant reference or pointer is more applicable here, in my opinion.

One more question for you: is removing a module from OrcJIT will be available in LLVM 10?


Best wishes,
Alexander

Loading

@lhames
Copy link

@lhames lhames commented Jan 22, 2020

I have ported your patch from #​40074 to LLVM 9.0.0 and tested it in the mine environment.
Your patch is working fine for me.
Also, it fixes bug #​44336.

That's great news. :)
Ok -- closing this as fixed by 84217ad6611.

But are you sure that it is a good idea to pass a parameter by value?
Error defineMaterializing(SymbolFlagsMap SymbolFlags);
If you don't want to modify the map, a constant reference is a better solution, if you want to modify, a non-constant reference or pointer is more applicable here, in my opinion.

The JITDylib::defineMaterializing method makes modifications to the map based on what definitions are accepted and rejected, and returns these to MaterializationResponsibilty::defineMaterializing so that it can update the responsibility set. Since these modifications are not interesting to the client, it makes sense to pass by value: It is cheap, and if the client does not need their value any more then no copies are required.

One more question for you: is removing a module from OrcJIT will be available in LLVM 10?

Unfortunately support for removing modules will not make it in to LLVM 10. The best solution for now is to maintain a separate ExecutionSession for code that you want to throw away. I am working on removable module support, and hope to get it in to LLVM 11.

Loading

@lhames
Copy link

@lhames lhames commented Nov 27, 2021

mentioned in issue #44700

Loading

This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants