Missing package: pyarrow #195

Open
josuuribe opened this issue Mar 7, 2021 · 4 comments

Comments

@josuuribe

Package name: pyarrow
Issue type: Build failed
Link to PyPI page: https://pypi.org/project/pyarrow
Link to piwheels page: https://www.piwheels.org/project/pyarrow/
Version: All
Python version: 3.5+
I am the maintainer: No
More information:

Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. This library is used by vaex-core, which also fails to build.

Detailed instructions about the installation can be found here:
https://arrow.apache.org/install/

Additional help
https://gist.github.com/heavyinfo/04e1326bb9bed9cecb19c2d603c8d521

I suppose the main reason is that it needs the Apache Arrow libraries.

@bennuttall
Member

This has been raised before. We closed it as it didn't seem feasible to add to our automated build.

Can you follow the instructions and build it successfully on a Pi?

@josuuribe
Author

Not yet. I expected it would be easier on a specialized build machine like yours, but I have read that several people have managed it. The problem is that this library is used by several others, especially those that deal with big data.
If I work out how to build it, I plan to create a specialized Docker container, as other open source projects such as PyTorch and TensorFlow do.

@josuuribe
Author

josuuribe commented Apr 16, 2021

FROM debian:latest

ARG DEBIAN_FRONTEND=noninteractive
ARG REPO_HOME=/repos
ARG ARROW_HOME=$REPO_HOME/dist
ARG LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH
ARG PYARROW_WITH_PARQUET=1
ARG PARQUET_TEST_DATA=$REPO_HOME/arrow/cpp/submodules/parquet-testing/data
ARG ARROW_TEST_DATA=$REPO_HOME/arrow/testing/data
ARG ARROW_BUILD_TYPE=release
ARG ARROW_TAG=apache-arrow-3.0.0

RUN apt-get update -y && apt-get install -y \
    libjemalloc-dev \
    libboost-dev \
    libboost-filesystem-dev \
    libboost-system-dev \
    libboost-regex-dev \
    make \
    build-essential \
    g++ \
    libgflags-dev \
    rapidjson-dev \
    libre2-dev \
    python3-dev \
    libatlas-base-dev \
    autoconf \
    flex \
    bison \
    libgrpc-dev \
    git && \
    rm -rf /var/lib/apt/lists/* && \
    rm -rf /tmp/*

ADD https://bootstrap.pypa.io/get-pip.py get-pip.py
RUN python3 get-pip.py
RUN python3 -m pip config --global set global.extra-index-url https://www.piwheels.org/simple
RUN python3 -m pip install --upgrade \
    cmake \
    wheel \
    numpy

WORKDIR $REPO_HOME
RUN git clone https://github.com/apache/arrow.git
WORKDIR $REPO_HOME/arrow
RUN git checkout tags/$ARROW_TAG -b build
RUN git submodule init
RUN git submodule update

WORKDIR $REPO_HOME
RUN python3 -m pip install -r arrow/python/requirements-build.txt -r arrow/python/requirements-test.txt

WORKDIR $REPO_HOME/arrow/cpp/build
RUN cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
    -DPYTHON3_EXECUTABLE=$(which python3) \
    -DPYTHON_INCLUDE_DIR=$(python3 -c "from distutils.sysconfig import get_python_inc;print(get_python_inc())") \
    -DCMAKE_INSTALL_LIBDIR=lib \
    -DPYTHON_INCLUDE_DIR2=$(python3 -c "from os.path import dirname; from distutils.sysconfig import get_config_h_filename; print(dirname(get_config_h_filename()))") \
    -DARROW_WITH_BZ2=ON \
    -DPYTHON_LIBRARY=$(python3 -c "from distutils.sysconfig import get_config_var;from os.path import dirname,join ; print(join(dirname(get_config_var('LIBPC')),get_config_var('LDLIBRARY')))") \
    -DARROW_WITH_ZLIB=ON \
    -DPYTHON3_NUMPY_INCLUDE_DIRS=$(python3 -c "import numpy; print(numpy.get_include())") \
    -DARROW_WITH_ZSTD=ON \
    -DPYTHON3_PACKAGES_PATH=$(python3 -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())") \
    -DARROW_WITH_LZ4=ON \
    -DARROW_WITH_SNAPPY=ON \
    -DARROW_WITH_BROTLI=ON \
    -DARROW_PARQUET=ON \
    -DARROW_PYTHON=ON \
    -DARROW_BUILD_TESTS=ON \
    ..
RUN make -j$(nproc)
RUN make install

WORKDIR $REPO_HOME/arrow/python
RUN python3 setup.py build_ext --inplace
RUN python3 -m pytest pyarrow 2>&1 || echo "Some unit tests have failed"
RUN python3 setup.py build_ext --build-type=$ARROW_BUILD_TYPE --bundle-arrow-cpp bdist_wheel

WORKDIR /drop
RUN cp $REPO_HOME/arrow/python/dist/*.whl .

CMD ["/bin/bash"]

@josuuribe
Author

josuuribe commented Apr 16, 2021

Run a container from the built image:
docker run -dit image_id

Copy the wheel out of the container:
docker cp container_id:/drop .

Now you can stop the container:
docker container stop container_id

It works for Apache Arrow 4.0.0 (master) and also for the latest stable version (3.0.0). You can switch versions by setting ARROW_TAG at build time (use a tag name that exists in the Arrow GitHub repository).
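
As a rough sketch of how the build and copy steps could be chained for a chosen tag, the script below only prints the docker commands rather than running them. The image name `pyarrow-builder`, the container name `pyarrow-drop`, and saving the Dockerfile as `Dockerfile.arrow` in the current directory are assumptions for illustration. It uses `docker create` instead of `docker run -dit`, since `docker cp` also works on a container that was never started.

```shell
#!/bin/sh
# Sketch: print the docker commands that build pyarrow for a given Arrow tag
# and copy the resulting wheel out of the image.
# Assumed names (not from the thread): image "pyarrow-builder",
# container "pyarrow-drop", Dockerfile saved as "Dockerfile.arrow".

wheel_build_plan() {
    # First argument: a tag that exists in the Arrow GitHub repository.
    tag="${1:-apache-arrow-3.0.0}"
    image="pyarrow-builder:${tag}"

    # Override the ARROW_TAG build argument declared in the Dockerfile.
    echo "docker build --build-arg ARROW_TAG=${tag} -t ${image} -f Dockerfile.arrow ."
    # Create a container without starting it, copy /drop out, then remove it.
    echo "docker create --name pyarrow-drop ${image}"
    echo "docker cp pyarrow-drop:/drop ."
    echo "docker rm pyarrow-drop"
}

wheel_build_plan "$@"
```

If the script is saved as, say, `plan.sh`, then `sh plan.sh apache-arrow-3.0.0` shows the commands for review, and piping that output to `sh` would execute them.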

Original here: https://github.com/josuuribe/RaraAvis/blob/blog/Docker/build/Dockerfile.arrow

I hope this helps!!

Thanks for your effort with piwheels!

@bennuttall bennuttall self-assigned this Apr 24, 2021
@bennuttall bennuttall removed their assignment Apr 24, 2021
@bennuttall bennuttall changed the title from "Failed building: pyarrow" to "Missing package: pyarrow" Nov 27, 2023