Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ppc64le] RuntimeDyldELF/TOC problem (segfaults) #6606

Open
ghost opened this issue Dec 30, 2020 · 5 comments
Open

[ppc64le] RuntimeDyldELF/TOC problem (segfaults) #6606

ghost opened this issue Dec 30, 2020 · 5 comments
Labels
ISA: POWER Issue related to POWER ISA llvm LLVM related issues

Comments

@ghost
Copy link

ghost commented Dec 30, 2020

(spawned from #4026)

Summary

A missing nop after a branch-instruction results in RuntimeDyldELF overwriting an instruction with the TOC restoration code. This happnes only in reloc-mode=static.

The workaround in numba was to change (for ppc) reloc_mode back to pic (31ddd81)

See bug-identification by obilaniu within #4026 (comment)

Reproduction

# DRAFT
# on ppc64le, tested on ubuntu 16.04, Power8, Power9, qemu ppc64le

### Install Miniconda

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-ppc64le.sh
bash Miniconda3-latest-Linux-ppc64le.sh 

### Setup Conda Environment

conda create -n ppc-segfault python=3.6 numpy=1.15 llvmlite=0.35.* scipy jinja2 cffi conda-build -y
conda activate ppc-segfault
conda install gcc_linux-ppc64le gxx_linux-ppc64le conda-build make  -y

### use your preferred location

cd ~ && mkdir numba-6606 && cd numba-6606

### get a 0.52.0 version of numba patched to use `static` for `ppc`

git clone -b 0.52.0-static https://github.com/abebeos/numba.git numba-static
cd numba-static
python setup.py build_ext --inplace

### this segfaults

./runtests.py -v numba.tests.npyufunc.test_parallel_ufunc_issues.TestParGUfuncIssues

Suggested Fix

The fix was identified during the work-in to the compiler-side of the problem. Whilst looking for similar work on the relevant (pointed out by nemanjai within #4026 (comment)) PPCISelLowering.cpp / callsShareTOCBase(), the comments on the following commit indicated its relevance for the problem:

https://reviews.llvm.org/rG2812c1515627904e31605bbd4f25a887a1f8eb12

The backported (llvm-10.x) patch (incl. adopted tests):

Binaries

Patched llvm-10 / llvmlite-0.35.0 available (temporarily, for testing):

conda install llvm -c abebeos
conda install llvmlite -c abebeos

Validation

(numba repo does not automatically test on PowerPC, see numba/llvmlite#684 (comment))

I've validated the fix manually on Power8 hardware, see numba/llvmlite#684 (comment).

The validation should be trivial for any dev with a ready dev-system.

Validation with pre-compiled binaries (After doing the Reproduction steps above):

# Ensure you've executed the reproduction steps

### this llvm contains the fix

conda install llvmlite -c abebeos

### rebuild numba (just to be sure)

cd ~/numba-6606/numba-static
python setup.py build_ext --inplace 

### should work now
./runtests.py -v numba.tests.npyufunc.test_parallel_ufunc_issues.TestParGUfuncIssues

# full tests
./runtests.py -m16 #your machine thread count
@ghost
Copy link
Author

ghost commented Jan 3, 2021

We at IBM are concerned that there is a latent bug in RuntimeDyldELF for PPC64LE, but apparently there is no reproducer.

source: #4026 (comment)

I can say with near 100% confidence that (beside this bug), no other latent bugs should be in the code (re this issue). The compiler-side code I reviewed (PPCISelLowering.cpp) is very clean, with the right amount of in-source documentation.

The work/reviews on e.g. https://reviews.llvm.org/D81126?id=274770 can be easily followed.

With a little bit effort for clarifying RuntimeDyldELF (only the code that directly affects ppc), all should be good. The clarity of RuntimeDyldELF should become similar to that of PPCISelLowering.cpp, even if the future direction will be JITLink

@esc esc added the needtriage label Jan 4, 2021
@stuartarchibald stuartarchibald added the ISA: POWER Issue related to POWER ISA label Jan 5, 2021
@stuartarchibald
Copy link
Contributor

@abebeos Thanks for the patch(es) and for fixing this! As promised here I raised this issue at yesterday's development meeting. Outcome was that we're going to hopefully roll these fixes into the 0.54 release of Numba along with attempting an upgrade to LLVM 11.

@stuartarchibald stuartarchibald added llvm LLVM related issues and removed needtriage labels Jan 6, 2021
@esc
Copy link
Member

esc commented Jan 6, 2021

@abebeos thank you again for submitting this! We appreciate your efforts to improve Numba. Regarding the timeline: the 0.54 release cycle will most likely begin in a few weeks from now (end of Jan/ beg. of Feb). At that point I can commit to reviewing and validating your patch. I should also note, ahead of time, that we may need this patch to be ported to LLVM 11.0 since we will be attempting an upgrade from LLVM 10 to LLVM 11 as part of the 0.54 release.

@esc
Copy link
Member

esc commented Jan 8, 2021

@abebeos thank you, I will take it into consideration.

@esc
Copy link
Member

esc commented Jan 11, 2021

Dear @abebeos,

Judging from your response of "Ouch!" I assume that you are disappointed about the proposed timeline for validating your proposed patches? If that is the case, I am sorry that the project schedule does to align with your expectations.

Also, thank you for pre-compiling those packages but unfortunately they will be of little use for me. We will want to build and test the Numba stack (llvmdev -> llvmlite -> numba) across the supported versions of Python and Numpy on the corresponding supported architectures as part of our efforts in quality assurance.

Thank you in advance for your patience and understanding in these matters.

-esc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ISA: POWER Issue related to POWER ISA llvm LLVM related issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants