
[WIP] Compile on PyPy on Linux #2319

Closed · wants to merge 7 commits

Conversation

@gmarkall (Member)

This is something of a placeholder at the moment - the code is pretty much as I plan for it to be (and is based on #2253), but I'm still thinking about a good way to produce llvmlite packages for pypy that can be used to test Numba with. Some thoughts/comments:

  • All this enables is compiling the C extension part of Numba on PyPy - it won't even import at the moment. Getting import working correctly, along with the more straightforward use cases, will be part of the following PR - this is just to try and keep the work in chunks that are as small and manageable as possible.
  • IIRC llvmlite packages were produced on a Jenkins server internal to Continuum, and then uploaded to anaconda.org - is that still the case? If so, will it be OK to add an extra configuration to build llvmlite packages in the long term?
  • The llvmlite package that's used here at the moment is created from this recipe: https://github.com/gmarkall/llvmlite/tree/pypycondarecipehack/conda-recipes/llvmlite - this recipe rolls in pip and enum34, which is not ideal. I think that I should build the pypy package again but this time make sure it includes pip and setuptools, and make a pypy-enum34 conda package. I didn't do either of these things yet because it will be a little time-consuming, and wanted to get a bit of feedback first before going ahead with trying to do that.
  • libllvmlite.so for PyPy is 491MB, which seems a little excessive. Maybe I need to see why that's so big quite soon too.

All feedback / thoughts appreciated! In the meantime, I'll be trying to improve the packaging setup and fixing anything that this appears to have broken.

_PyObject_GC_UNTRACK is not supported by PyPy's cpyext.
PyObject_GC_UnTrack performs the same function, but with a check
that allows it to be called twice on an object. This commit simply
changes _PyObject_GC_UNTRACK to PyObject_GC_UnTrack, as it is not
expected to have an impact on performance.
This is compatible with both CPython and PyPy.
_Py_c_pow is not implemented on PyPy, but the implementation in
CPython is quite self-contained so it can be included in
_helperlib for use when compiling on PyPy.
This does not look ideal, but as per the comment it needs further
investigation as to whether the alternative doesn't work correctly
due to a bug in PyPy.
These macros should not be missing in PyPy, but for now they are
required to build Numba.
@gmarkall (Member Author)

I completely forgot to do something about making sure Numpy was installed on PyPy, and seem to have upset the build for CPython too. Will come back to this after the weekend.

@seibert (Contributor)

seibert commented Mar 27, 2017

You are right that llvmlite is still built on an internal Continuum computer, but now we're using buildbot instead of Jenkins. Same basic idea though.

One issue that comes to mind is how we should distinguish a PyPy build of a package from a CPython build of a package in conda. There might be some new conda/conda-build features that help here. I need to talk to @msarahan to see.

@gmarkall (Member Author)

> One issue that comes to mind is how we should distinguish a PyPy build of a package from a CPython build of a package in conda. There might be some new conda/conda-build features that help here. I need to talk to @msarahan to see.

I had wondered if that would be possible, but I was a bit wary of getting too stuck into generic PyPy/conda stuff and spending a lot of time on it - it would be good to know if there's some things in there that make it relatively easy to do though :-)

@msarahan

msarahan commented Mar 28, 2017 via email

@gmarkall (Member Author)

> Should be easier with conda build 3's more dynamic recipes. I have not really thought through this exact use case though. Happy to talk when I return from vacation on Monday.

It's not so much the package building that I was worried would be an issue, but more a problem of dependency resolution - for example, if I wanted to package pypy and have a numpy package that depends on it, I'm not sure how that would look. For numpy 1.12.1 there are the following variations:

    numpy      1.12.1    py27_0          defaults
    numpy      1.12.1    py27_nomkl_0    defaults    [nomkl]
    numpy      1.12.1    py35_0          defaults
    numpy      1.12.1    py35_nomkl_0    defaults    [nomkl]
    numpy   *  1.12.1    py36_0          defaults
    numpy      1.12.1    py36_nomkl_0    defaults    [nomkl]

None of these will be compatible with a pypy package, and I think quite a few questions arise, e.g.:

  • If I were to build a Numpy for PyPy, what would the build string be to ensure that it would depend on a PyPy package that implements Python version 2.7?
  • Should the PyPy package be named python (given that the CPython package in conda is called python)?
    • If it should be called python, what should its version be?
    • If not, can build strings be used to denote dependence on particular versions of non-python packages?

I imagine that the answers to these are not trivial and might require some work to support in conda - I was hoping to sidestep these questions by simply building a small "ecosystem" of PyPy and related packages that don't use the exact names "python" or "numpy" as the package names, but it's not a scalable way of working - however, if there's a better, "official" way to do this then that would be preferable for the long term.

@seibert (Contributor)

seibert commented Mar 28, 2017

Yeah, I think this is something we'll need to figure out for conda more broadly as PyPy cpyext improves and PyPy can run more of the packages relevant to conda users (scipy, pandas, etc).

@seibert (Contributor)

seibert commented Apr 12, 2017

I think for now we'll have to go with the "small ecosystem" approach. I spoke with some of the conda developers and they have some ideas for how to deal with this, but nothing that will be available soon.


set +v
source activate $CONDA_ENV
set -v

# Install latest llvmlite build
$CONDA_INSTALL -c numba llvmlite
if [ "$PYTHON" == "pypy" ]; then
Contributor

Is this the reverse of the condition you want?

Member Author

Indeed it is! Will fix.

@gmarkall (Member Author)

> I think for now we'll have to go with the "small ecosystem" approach. I spoke with some of the conda developers and they have some ideas for how to deal with this, but nothing that will be available soon.

Just to clarify - this means that I'll build and publish a set of PyPy and related packages that are completely independent of other conda packages for the purpose of testing Numba and llvmlite? (I am happy to do this)

@seibert (Contributor)

seibert commented Apr 13, 2017

Yes. If you put together the conda recipes, we're also happy to host them under our Anaconda Cloud account.


@ytrezq

ytrezq commented Oct 30, 2019

Any update?

PyPy has added almost full Cython support since the last comment was made. llvmlite builds and runs fine without any patch.

@h-g-s

h-g-s commented Dec 14, 2019

With pypy 7.2 it's still failing (Manjaro) :/
....
copying numba/tests/pycc_distutils_usecase/setup_distutils.py -> build/lib.linux-x86_64-3.6/numba/tests/pycc_distutils_usecase
running build_ext
building 'numba._dynfunc' extension
Warning: Can't read registry to find the necessary compiler setting
Make sure that Python modules winreg, win32api or win32con are installed.
C compiler: gcc -pthread -DNDEBUG -O2 -fPIC

creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/numba
compile options: '-I/opt/pypy3/include -c'
extra options: '-g'
gcc: numba/_dynfuncmod.c
In file included from numba/_dynfuncmod.c:1:
numba/_dynfunc.c: In function ‘generator_dealloc’:
numba/_dynfunc.c:361:10: error: ‘_Py_Finalizing’ undeclared (first use in this function)
  361 |     if (!_Py_Finalizing)
      |          ^~~~~~~~~~~~~~
numba/_dynfunc.c:361:10: note: each undeclared identifier is reported only once for each function it appears in
error: Command "gcc -pthread -DNDEBUG -O2 -fPIC -I/opt/pypy3/include -c numba/_dynfuncmod.c -o build/temp.linux-x86_64-3.6/numba/_dynfuncmod.o -g" failed with exit status 1
----------------------------------------

@stuartarchibald stuartarchibald added this to the PR Backlog milestone May 14, 2020
@sandys

sandys commented May 21, 2020

hi guys,
I'm attaching a Dockerfile that you can run in a single line to illustrate the issue. It builds everything up to and including llvmlite, but fails on numba.

Scipy, Pandas, Numpy, Cython - everything works with PyPy.


FROM buildpack-deps:sid

# ensure local pypy is preferred over distribution pypy
ENV PATH /usr/local/bin:$PATH

# http://bugs.python.org/issue19846
# > At the moment, setting "LANG=C" on a Linux system *fundamentally breaks Python 3*, and that's not OK.
ENV LANG C.UTF-8

# runtime dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    tcl \
    tk \
    && rm -rf /var/lib/apt/lists/*

ENV PYPY_VERSION 7.3.1

RUN set -ex; \
    \
    # this "case" statement is generated via "update.sh"
    dpkgArch="$(dpkg --print-architecture)"; \
    case "${dpkgArch##*-}" in \
    # amd64
    amd64) pypyArch='linux64'; sha256='f67cf1664a336a3e939b58b3cabfe47d893356bdc01f2e17bc912aaa6605db12' ;; \
    # arm64v8
    arm64) pypyArch='aarch64'; sha256='b900241bca7152254c107a632767f49edede99ca6360b9a064141267b47ef598' ;; \
    # i386
    i386) pypyArch='linux32'; sha256='7045b295d38ba0b5ee65bd3f078ca249fcf1de73fedeaab2d6ad78de2eab0f0e' ;; \
    # ppc64le
    ppc64el) pypyArch='ppc64le'; sha256='d6f3b701313df69483b43ebdd21b9652ae5e808b2eea5fbffe3b74b82d2e7433' ;; \
    # s390x
    s390x) pypyArch='s390x'; sha256='0fe2f7bbf42ea88b40954d7de773a43179a44f40656f2f58201524be70699544' ;; \
    *) echo >&2 "error: current architecture ($dpkgArch) does not have a corresponding PyPy $PYPY_VERSION binary release"; exit 1 ;; \
    esac; \
    \
    savedAptMark="$(apt-mark showmanual)"; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
    # sometimes "pypy3" itself is linked against libexpat1 / libncurses5, sometimes they're ".so" files in "/usr/local/lib_pypy"
    libexpat1 \
    libncurses5 \
    # (so we'll add them temporarily, then use "ldd" later to determine which to keep based on usage per architecture)
    ; \
    \
    wget -O pypy.tar.bz2 "https://bitbucket.org/pypy/pypy/downloads/pypy3.6-v${PYPY_VERSION}-${pypyArch}.tar.bz2" --progress=dot:giga; \
    echo "$sha256 *pypy.tar.bz2" | sha256sum -c; \
    tar -xjC /usr/local --strip-components=1 -f pypy.tar.bz2; \
    find /usr/local/lib-python -depth -type d -a \( -name test -o -name tests \) -exec rm -rf '{}' +; \
    rm pypy.tar.bz2; \
    \
    # smoke test
    pypy3 --version; \
    \
    if [ -f /usr/local/lib_pypy/_ssl_build.py ]; then \
    # on pypy3, rebuild ffi bits for compatibility with Debian Stretch+ (https://github.com/docker-library/pypy/issues/24#issuecomment-409408657)
    cd /usr/local/lib_pypy; \
    pypy3 _ssl_build.py; \
    # TODO rebuild other cffi modules here too? (other _*_build.py files)
    fi; \
    \
    apt-mark auto '.*' > /dev/null; \
    [ -z "$savedAptMark" ] || apt-mark manual $savedAptMark > /dev/null; \
    find /usr/local -type f -executable -exec ldd '{}' ';' \
    | awk '/=>/ { print $(NF-1) }' \
    | sort -u \
    | xargs -r dpkg-query --search \
    | cut -d: -f1 \
    | sort -u \
    | xargs -r apt-mark manual \
    ; \
    apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false; \
    rm -rf /var/lib/apt/lists/*; \
    # smoke test again, to be sure
    pypy3 --version; \
    \
    find /usr/local -depth \
    \( \
    \( -type d -a \( -name test -o -name tests \) \) \
    -o \
    \( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \
    \) -exec rm -rf '{}' +

# if this is called "PIP_VERSION", pip explodes with "ValueError: invalid truth value '<VERSION>'"
ENV PYTHON_PIP_VERSION 20.0.2
# https://github.com/pypa/get-pip
ENV PYTHON_GET_PIP_URL https://github.com/pypa/get-pip/raw/d59197a3c169cef378a22428a3fa99d33e080a5d/get-pip.py
ENV PYTHON_GET_PIP_SHA256 421ac1d44c0cf9730a088e337867d974b91bdce4ea2636099275071878cc189e

RUN set -ex; \
    \
    wget -O get-pip.py "$PYTHON_GET_PIP_URL"; \
    echo "$PYTHON_GET_PIP_SHA256 *get-pip.py" | sha256sum --check --strict -; \
    \
    pypy3 get-pip.py \
    --disable-pip-version-check \
    --no-cache-dir \
    "pip==$PYTHON_PIP_VERSION" \
    ; \
    # smoke test
    pip --version; \
    \
    find /usr/local -depth \
    \( \
    \( -type d -a \( -name test -o -name tests \) \) \
    -o \
    \( -type f -a \( -name '*.pyc' -o -name '*.pyo' \) \) \
    \) -exec rm -rf '{}' +; \
    rm -f get-pip.py

# CMD ["pypy3"]


WORKDIR /usr/src/app


# Tell Python not to recreate the bytecode files. Since this is a docker image,
# these will be recreated every time, writing them just uses unnecessary disk
# space.
ENV PYTHONDONTWRITEBYTECODE=true

RUN apt-get  update &&  apt-get install -y build-essential 

#4 opencv3
ENV OPENCV_VERSION 3.4.2


RUN apt-get install --no-install-recommends -y openssh-client git cmake make

################## shared packages for chromium headless ######################
RUN apt-get install -y libx11-xcb1 libxcb1 libx11-6 libxcomposite1 libxcursor1 \
    libxdamage1 libxext6 libxi6 libxtst6 libgtk-3-0 libnss3 libxss1 libasound2

# RUN  apt-get --no-install-recommends -y install libatlas-base-dev
RUN apt-get install --no-install-recommends -y libopenblas-base  libopenblas-dev 

# RUN pypy -m ensurepip

RUN pip download  scipy
RUN apt-get install -y gfortran
RUN pip install --no-cache-dir --only-binary cython,numpy numpy cython 
RUN pypy3 -m pip install --upgrade pip
# RUN pip install --no-cache-dir pybind11 
RUN apt-get install -y libopenblas-dev liblapack-dev 
RUN  pip install git+https://github.com/scipy/scipy@master#egg=scipy

RUN pip install --no-cache-dir  pandas dask 

# RUN pip install --no-cache-dir  pandas dask 

RUN apt-get install -y llvm-8-dev  llvm-8 pypy-dev
RUN update-alternatives --install /usr/local/bin/llvm-config llvm-config /usr/bin/llvm-config-8 40 && \
    # update-alternatives --install /usr/local/bin/clang clang /usr/bin/clang-8 40 && \
    update-alternatives --install /usr/local/bin/opt opt /usr/bin/opt-8 40 && \
    update-alternatives --install /usr/local/bin/llvm-link llvm-link /usr/bin/llvm-link-8 40
RUN pip install --no-cache-dir  numba

@gmarkall gmarkall mentioned this pull request Jun 4, 2020
@mattip (Contributor)

mattip commented Mar 28, 2021

_Py_Finalizing has been supported since release 7.3.0 in Dec 2019, and conda now has support for PyPy. Could you try again with a 7.3 version of PyPy?

@mattip (Contributor)

mattip commented Mar 28, 2021

It seems the CI has moved to azure, and the conda environment is created with this line

conda create -n %CONDA_ENV% -q -y python=%PYTHON% numpy=%NUMPY% cffi pip scipy jinja2 ipython gitpython

which can be modified to use PyPy via

conda config --set channel_priority strict
conda create -c conda-forge -n %CONDA_ENV% -q -y pypy python=3.7 numpy pip cffi scipy jinja2 ipython gitpython

The main change is to add a pypy specifier to select the PyPy builds. Other than that:

  • require conda-forge channel
  • specify only python3.7
  • do not specify a numpy version since only a few are available for pypy

@mattip (Contributor)

mattip commented Mar 28, 2021

Nope, this won't work. Compilation of numba/_dispatcher.cpp fails because PyPy's PyThreadState does not have c_profilefunc, in fact it does not have the typedef int (*Py_tracefunc)(PyObject *, struct _frame *, int, PyObject *); declaration in pystate.h.

@gmarkall (Member Author)

gmarkall commented Jul 1, 2022

I think I might just close this. It may be useful as inspiration for making current Numba work on PyPy (along with the guidance in https://www.embecosm.com/2017/01/19/running-numba-on-pypy/), but it's probably not worth attempting to continue with.

@gmarkall gmarkall closed this Jul 1, 2022
@stuartarchibald stuartarchibald added the abandoned PR is abandoned (no reason required) label Jul 1, 2022
Labels
abandoned PR is abandoned (no reason required) Blocked awaiting long term feature For PRs/Issues that require the implementation of a long term plan feature
Projects
Active
Blocked
Development


9 participants