Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT stub offsets silently truncated to 32 bits in call instruction #5573

Closed
llvmbot opened this issue Oct 15, 2009 · 26 comments
Closed

JIT stub offsets silently truncated to 32 bits in call instruction #5573

llvmbot opened this issue Oct 15, 2009 · 26 comments
Labels
bugzilla

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented Oct 15, 2009

Bugzilla Link 5201
Resolution FIXED
Resolved on Nov 17, 2015 11:45
Version trunk
OS Linux
Attachments Add asserts, against trunk r84189, Add a test for the bug, against trunk r84531
Reporter LLVM Bugzilla Contributor
CC @comex,@echristo,@edwintorok,@rnk

Extended Description

On x86-64 systems, we're observing segfaults that we believe to have tracked down to a codegen problem in the JIT:

Disassembling the buggy function in gdb gives us this:
... asm ...
0x00002aaab33482b1 <_23_u_23___init__45+673>: incq (%r14)
0x00002aaab33482b4 <_23_u_23___init__45+676>: mov 0x18(%rsp),%rbx
0x00002aaab33482b9 <_23_u_23___init__45+681>: mov %r14,(%rbx)
0x00002aaab33482bc <_23_u_23___init__45+684>: mov $0x2dd8b48,%rbx
0x00002aaab33482c6 <_23_u_23___init__45+694>: mov 0x18(%rbx),%rsi
0x00002aaab33482ca <_23_u_23___init__45+698>: mov %r14,%rdi
0x00002aaab33482cd <_23_u_23___init__45+701>: callq 0x2aaaa5c810b0
0x00002aaab33482d2 <_23_u_23___init__45+706>: mov (%r14),%rbx
0x00002aaab33482d5 <_23_u_23___init__45+709>: dec %rbx
... asm ...

(gdb) x 0x2aaaa5c810b0
0x2aaaa5c810b0: Cannot access memory at address 0x2aaaa5c810b0

That callq 0x2aaaa5c810b0 instruction is the problem: it's calling to invalid memory. Looking at the other callq instructions in this function, gdb can't
access any of them. callq addresses in other functions can be accessed by gdb.

What's happening is that now that the JIT memory manager can allocate multiple code slabs, there's no protection against having stub slabs and code slabs more than 32 bits distant from each other. If the memory manager allocates a new code slab far away from the stub, the offset will be truncated to 32 bits, which means that Bad Things Happen.

I'm working on a reduced test case, but in the meantime, I think the attached patch should be applied to trunk so that the offset is no longer silently truncated.

Applying this patch and passing -debug-only=jit will show this output when running part of Unladen Swallow's test suite:

[...tests...]
test_importhooks
test_enumerate
test_getopt
test_codecencodings_cn
JIT: Allocating another slab of memory for function.
python: /usr/local/google/collinwinter/us/trunk/Util/llvm/lib/ExecutionEngine/JIT/JITEmitter.cpp:652:
void*::JITEmitter::getPointerToGlobal(llvm::GlobalValue*, void*, bool): Assertion `(Offset & 0xFFFFFFFF) == Offset && "Offset
too big for 32 bits"' failed.
Stack dump:
0. Running pass 'X86 Machine Code Emitter' on function '@"#u#readline241"

Without this, you'd see a segfault instead.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Oct 20, 2009

Adding MAP_32BIT to llvm::sys::Memory::AllocateRWX()'s mmap flags makes the test case pass, but does not fix the problems we're seeing with running Unladen Swallow's full test suite on x86-64. So much for a quick hack.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 2, 2009

This failure also appears on PowerPC:
http://icedtea.classpath.org/bugzilla/show_bug.cgi?id=399

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 2, 2009

I suspect this is quite an old issue that has only come to light because of recent changes to the JIT memory manager. Prior to r76902, all code was generated into a single buffer which, while large, wasn't large enough for jumps to stubs not to fit in the relevant branch instruction. From r76902, however, code is allocated in small (64k?) slabs, which can be anywhere in memory. Once you fill a slab it's only a matter of time before you hit this error.

@rnk
Copy link
Collaborator

rnk commented Nov 2, 2009

Yeah, I wrote the change, and if you read the code, you can see that it tries to allocate all the code and stubs contiguously. If it can't get the address it wants from mmap, it tries again without specifying an address, and then it's possible to end up anywhere. That approach is horribly buggy. What we really need is some kind of heap for code, but there's no good way to implement that in a cross-platform manner. One terrible thing we could do is to linearly probe the address space. How do other JITs handle this?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 3, 2009

On PowerPC a short jump can only go 5Mb in each direction, so trying to keep the slabs "close enough" might be a losing battle.

I don't know a lot about the internals of LLVM, so this may be rubbish, but my thoughts at the moment would be to move from one stub per destination to N stubs per destination. When performing the relocation on the jump, you walk the list of stubs for that destination, picking the first one that's close enough, and generating a new stub in the current slab if none are suitable.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 3, 2009

I'd actually go even farther. I think allocating far-call-stubs inside the calling function's block (i.e. don't bother re-using stubs across functions) will save some complexity, allow us to free stubs relatively easily, and not waste much space.

I think we can change where the stubs are allocated without rearchitecting anything, since JITEmitter::finishFunction() already checks if it needs to retry after relocating, but I could have missed something.

We might run into problems sorting out the current assumption that far-call-stubs, lazy-compilation-stubs, and dlsym-stubs are all the same thing, but fixing those will help the overall design anyway.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 6, 2009

I'm going to look into un-sharing far-call-stubs.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 6, 2009

Awesome :)

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 10, 2009

Using far-call stubs allocated with the calling function has the problem that the stub gets returned from address-of operations, and may outlive the call to the function entirely. The function is destroyed then, the call stub gets invalidated, and calling it may cause a crash. :( The real fix here is to distinguish between address-of operations and call operations, and only use far-call stubs for calls, but that appears to be a hairy yak.

As a fix for this problem, nlewycky proposed making all calls into far calls, even if they turn out to be nearby, eliminating the need for far calls entirely. This will waste some code space, but make the code correct. Are there any problems we haven't seen with solving the problem this way?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 11, 2009

@llvmbot
Copy link
Collaborator Author

llvmbot commented Nov 20, 2009

r88984 fixes this problem for x86-64 by generating far calls for all calls in the large code model. Unfortunately, PPC and ARM don't have a large code model, so making the same change there affects their static compilation. There's a couple things we could do on PPC and ARM:

  1. Go back to 16MB code allocation by default, which would mask this problem in most cases, but wouldn't fix it.
  2. Invent separate code models for PPC and ARM. I'm skeptical of this because there are no external ABI docs to justify or drive any choices we make.
  3. Figure out more generally how to have multiple far-call-stubs for a single target. This probably requires distinguishing address-of and call uses of Functions in JITEmitter::getPointerToGlobal(), which will take more widespread changes.
  4. Thread the actual target address into the code generator. This would be a pretty general win, but would take even more changes.
  5. ????
  6. Profit!

Since I'm not working on PPC or ARM systems, I favor option 1 to undo the regression from Reid's change. When someone else gets enthusiastic about those platforms, they can fix them more generally.

@edwintorok
Copy link
Contributor

edwintorok commented Apr 4, 2010

*** Bug llvm/llvm-bugzilla-archive#6758 has been marked as a duplicate of this bug. ***

@edwintorok
Copy link
Contributor

edwintorok commented Apr 4, 2010

r88984 fixes this problem for x86-64 by generating far calls for all calls in
the large code model. Unfortunately, PPC and ARM don't have a large code model,
so making the same change there affects their static compilation. There's a
couple things we could do on PPC and ARM:

  1. Go back to 16MB code allocation by default, which would mask this problem
    in most cases, but wouldn't fix it.

It should also allocated stubs together with code, otherwise they will be too far apart for relocations (see llvm/llvm-bugzilla-archive#6758 ).

Problem is that stub emission can't be guaranteed to be "close enough" to the code, at least not for PPC where the possible relocation range is very small.

  1. Thread the actual target address into the code generator. This would be a
    pretty general win, but would take even more changes.

Sounds good, we could emit direct calls when constants are small enough, and indirect when not.
It would need some interface between the code generator and the JIT to query the address.

  1. ????
  1. Always emit indirect calls for ALL calls in the JIT.

The disadvantage is that even for local functions we need to use this, since we don't know if they will be close enough to call or not (and the "far call stubs" are broken, they are often too far to call too on PowerPC. Even when the function would be close-enough to call).

Attached is a patch that implements this.

  1. Profit!

Since I'm not working on PPC or ARM systems, I favor option 1 to undo the
regression from Reid's change. When someone else gets enthusiastic about those
platforms, they can fix them more generally.

@edwintorok
Copy link
Contributor

edwintorok commented Apr 4, 2010

PowerPC JIT workaround
Workaround: always emit indirect calls in JIT mode for PowerPC.

@echristo
Copy link
Contributor

echristo commented Apr 6, 2010

Seems somewhat reasonable, I'm probably missing something but I'm not seeing SetJITMode ever called?

@edwintorok
Copy link
Contributor

edwintorok commented Apr 6, 2010

Seems somewhat reasonable, I'm probably missing something but I'm not seeing
SetJITMode ever called?

It is called in PPCTargetMachine.cpp:
bool PPCTargetMachine::addCodeEmitter(PassManagerBase &PM,
CodeGenOpt::Level OptLevel,
JITCodeEmitter &JCE) {
// The JIT should use the static relocation model in ppc32 mode, PIC in ppc64.
// FIXME: This should be moved to TargetJITInfo!!
if (Subtarget.isPPC64()) {
// We use PIC codegen in ppc64 mode, because otherwise we'd have to use many
// instructions to materialize arbitrary global variable + function +
// constant pool addresses.
setRelocationModel(Reloc::PIC_);
// Temporary workaround for the inability of PPC64 JIT to handle jump
// tables.
DisableJumpTables = true;
} else {
setRelocationModel(Reloc::Static);
}

// Inform the subtarget that we are in JIT mode. FIXME: does this break macho
// writing?
Subtarget.SetJITMode();

@echristo
Copy link
Contributor

echristo commented Apr 6, 2010

Aha. For some reason I thought I'd see it set in the diff.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Apr 23, 2010

Created an attachment (id=4631) [details]
PowerPC JIT workaround

Workaround: always emit indirect calls in JIT mode for PowerPC.

With the patch applied on LLVM 2.7 makes the PowerPC JIT fail to resolve
llvm.memset.* intrinsics: bug 6902 (contains testcase and additional info)

@llvmbot
Copy link
Collaborator Author

llvmbot commented May 19, 2010

On PowerPC a short jump can only go 5Mb in each direction, so trying to keep
the slabs "close enough" might be a losing battle.

I have hit this bug using the ARM JIT.
On ARM a short jump can only go 32Mb in each direction, while it happens less frequently on ARM than on PowerPC it still can happen.
This bug looks plausible to fix for ARM by generating a long branch to any address within the 4Gb addresspace by emitting the following when outside the +-32Mb scope:
mov lr, pc
mov pc, #stubaddress

@edwintorok
Copy link
Contributor

edwintorok commented Aug 4, 2010

new patch for PowerPC JIT workaround
New patch, that passes llvm/llvm-bugzilla-archive#6902 too.

@llvmbot
Copy link
Collaborator Author

llvmbot commented Aug 9, 2010

Török's PPC workaround got commited to:
http://llvm.org/viewvc/llvm-project?view=rev&revision=110246

ARM got a new -arm-long-calls option to force calls to be indirect
that got commited to:
http://llvm.org/viewvc/llvm-project?view=rev&revision=101303
A possible fix would be to enable this new option by default on ARM.

@edwintorok
Copy link
Contributor

edwintorok commented Aug 9, 2010

Török's PPC workaround got commited to:
http://llvm.org/viewvc/llvm-project?view=rev&revision=110246

Can you test if it works with OpenJDK too?

@llvmbot
Copy link
Collaborator Author

llvmbot commented Sep 10, 2010

pr5201-ppcfix-llvm2.8-rc1-openjdk-fail.log

Török's PPC workaround got commited to:
http://llvm.org/viewvc/llvm-project?view=rev&revision=110246

Can you test if it works with OpenJDK too?

Unfortunately:
PPC JIT still asserts with reallocation out of range errors when I test using LLVM 2.8 rc1.

@echristo
Copy link
Contributor

echristo commented Oct 16, 2015

Is this still affecting anyone?

@rnk
Copy link
Collaborator

rnk commented Nov 17, 2015

The old JIT has been deleted, and MCJIT doesn't suffer from this problem. It sometimes has a different class of issues where it fatal errors during relocation resolution, but we already have issues open for that.

@edwintorok
Copy link
Contributor

edwintorok commented Nov 27, 2021

mentioned in issue llvm/llvm-bugzilla-archive#6758

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 3, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla
Projects
None yet
Development

No branches or pull requests

4 participants