Make pyarrow optional #186

Closed
niboshi opened this issue Jun 2, 2021 · 5 comments
Labels
wontfix This will not be worked on

Comments

@niboshi
Member

niboshi commented Jun 2, 2021

I'm trying to install pfio in an environment where I can't install arrow.
I have no plan to use HDFS either.
I think it would be very helpful if pfio could be installed without HDFS support.

@kuenishi
Member

kuenishi commented Jun 3, 2021

Could you describe the actual use case behind your request, that is, the environment where you can't install it? PFIO was originally a wrapper for file systems that do not support POSIX, and HDFS is the primary file system it wraps, so we assumed that PFIO users always need pyarrow. I'd also like to know which PFIO features you use without needing HDFS. Also, pyarrow itself has an x86 wheel release, so it can be installed without Hadoop or even a Java installation. The same goes for boto3. If you don't need them, they do no harm as long as you don't use them.

Also, #187 would be a huge change that alters how PFIO is used. I don't have a good idea for how to make pyarrow and boto3 optional through extras_require.
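For readers unfamiliar with extras_require, here is a minimal sketch of what such packaging metadata could look like. It is not pfio's actual setup.py: the extra names `hdfs` and `s3`, the `all` convenience extra, the `setup_kwargs` helper, and the unpinned requirements are all assumptions made for illustration.

```python
# Hypothetical packaging metadata; NOT pfio's real setup.py.
# Core install: POSIX support needs only the standard library.
CORE_REQUIRES = []

# Optional backends, selectable as `pip install <package>[hdfs,s3]`.
# Version pins are omitted here; a real project would likely add them.
EXTRAS_REQUIRE = {
    "hdfs": ["pyarrow"],
    "s3": ["boto3"],
}

# Convenience extra that pulls in every optional backend.
EXTRAS_REQUIRE["all"] = sorted(
    {req for reqs in EXTRAS_REQUIRE.values() for req in reqs}
)


def setup_kwargs():
    """Return keyword arguments that a setup.py could pass to setuptools.setup()."""
    return {
        "name": "pfio",
        "install_requires": CORE_REQUIRES,
        "extras_require": EXTRAS_REQUIRE,
    }
```

In a setup.py this would be used as `setuptools.setup(**setup_kwargs())`. The trade-off raised in this thread still applies: existing applications that rely on a plain install bringing in pyarrow and boto3 would have to switch to requesting the extras explicitly.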

@niboshi
Member Author

niboshi commented Jun 8, 2021

Thank you for the comment. This is the result of pip install pfio in my environment (on arm64).

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting pfio
  Downloading pfio-1.4.3-py3-none-any.whl (46 kB)
Collecting boto3
  Downloading boto3-1.17.89-py2.py3-none-any.whl (131 kB)
Collecting pyarrow==3.0.0
  Downloading pyarrow-3.0.0.tar.gz (682 kB)
  Installing build dependencies: started
  Installing build dependencies: still running...
  Installing build dependencies: still running...
  Installing build dependencies: still running...
  Installing build dependencies: still running...
  Installing build dependencies: still running...
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Requirement already satisfied: numpy>=1.16.6 in ./.pyenv/versions/3.6.13/lib/python3.6/site-packages (from pyarrow==3.0.0->pfio) (1.19.5)
Collecting botocore<1.21.0,>=1.20.89
  Downloading botocore-1.20.89-py2.py3-none-any.whl (7.6 MB)
Collecting s3transfer<0.5.0,>=0.4.0
  Downloading s3transfer-0.4.2-py2.py3-none-any.whl (79 kB)
Collecting jmespath<1.0.0,>=0.7.1
  Downloading jmespath-0.10.0-py2.py3-none-any.whl (24 kB)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in ./.pyenv/versions/3.6.13/lib/python3.6/site-packages (from botocore<1.21.0,>=1.20.89->boto3->pfio) (2.8.1)
Collecting urllib3<1.27,>=1.25.4
  Downloading urllib3-1.26.5-py2.py3-none-any.whl (138 kB)
Requirement already satisfied: six>=1.5 in ./.pyenv/versions/3.6.13/lib/python3.6/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.21.0,>=1.20.89->boto3->pfio) (1.16.0)
Building wheels for collected packages: pyarrow
  Building wheel for pyarrow (PEP 517): started
  Building wheel for pyarrow (PEP 517): finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /home/niboshi/.pyenv/versions/3.6.13/bin/python3.6 /home/niboshi/.pyenv/versions/3.6.13/lib/python3.6/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmpn6mv248f
       cwd: /tmp/pip-install-t4ofepcn/pyarrow_983f6fc761534b2db63cafc8296003f5
  Complete output (233 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-aarch64-3.6
  creating build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/orc.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/compat.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/filesystem.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/fs.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/jvm.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/__init__.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/compute.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/types.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/feather.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/flight.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/hdfs.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/pandas_compat.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/util.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/benchmark.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/cffi.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/dataset.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/ipc.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/_generated_version.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/cuda.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/json.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/plasma.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/serialization.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/csv.py -> build/lib.linux-aarch64-3.6/pyarrow
  copying pyarrow/parquet.py -> build/lib.linux-aarch64-3.6/pyarrow
  creating build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_cuda.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_hdfs.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/strategies.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_builder.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_plasma_tf_op.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/conftest.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_orc.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_sparse_tensor.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_serialization_deprecated.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_serialization.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_types.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_dataset.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/__init__.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_cffi.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_cython.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_strategies.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/pandas_examples.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_adhoc_memory_leak.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_deprecations.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_tensor.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_feather.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_jvm.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/util.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_flight.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_plasma.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_compute.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_io.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_scalars.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_cuda_numba_interop.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_table.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_misc.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_pandas.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_fs.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_gandiva.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_schema.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_convert_builtin.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_array.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_extension_type.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_filesystem.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_ipc.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_json.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/arrow_7980.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/pandas_threaded_import.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_memory.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/deserialize_buffer.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  copying pyarrow/tests/test_csv.py -> build/lib.linux-aarch64-3.6/pyarrow/tests
  running egg_info
  writing pyarrow.egg-info/PKG-INFO
  writing dependency_links to pyarrow.egg-info/dependency_links.txt
  writing entry points to pyarrow.egg-info/entry_points.txt  

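The build fails because pyarrow 3.0.0 does not provide a wheel for aarch64, so pip falls back to building it from source. (pyarrow 4.0.0 does provide one, though it is a platform-specific wheel rather than an "any" wheel.)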

pfio's README says it's a file system abstraction, and we use it in our library for that purpose. The library intends to support both HDFS and POSIX file systems. Users (like me in this case) should be able to run it both with and without HDFS.
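One common way to let a library run both with and without an optional backend is to defer the import until that backend is actually requested. The sketch below illustrates the pattern only; `BACKEND_MODULES` and `load_backend_module` are hypothetical names and are not part of pfio's API.

```python
import importlib

# Hypothetical mapping from URI scheme to the third-party module it needs.
# None means the scheme needs nothing beyond the standard library.
BACKEND_MODULES = {"file": None, "hdfs": "pyarrow", "s3": "boto3"}


def load_backend_module(scheme):
    """Import the module a scheme depends on, only when the scheme is used."""
    if scheme not in BACKEND_MODULES:
        raise ValueError(f"unknown scheme: {scheme!r}")
    module_name = BACKEND_MODULES[scheme]
    if module_name is None:
        return None  # POSIX access works without optional dependencies
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Fail at use time with a clear message instead of at install time.
        raise RuntimeError(
            f"{scheme!r} support requires {module_name!r}, which is not installed"
        ) from exc
```

With this arrangement, a user who only touches local files never triggers the pyarrow import, so a missing or unbuildable pyarrow only matters to users who ask for HDFS.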

@kuenishi
Member

kuenishi commented Jun 8, 2021

Got it. Thank you for the details. How about updating pyarrow to 4.0? (4.0.1 is out now.) That should work for you if you're using Linux, because pyarrow itself does no harm (once it's installed :P ), even without HDFS. That fix would be easy for me, too.

For macOS, maybe we can implement a special workaround, such as skipping pyarrow with a warning.

I understand your concern about an unneeded dependency being installed by default, with no way to opt out. However, a naive solution such as install options like pip install pfio[hdfs,s3] would add complexity and break the dependencies of many applications. We should design and discuss this more carefully.

@kuenishi
Member

kuenishi commented Jun 8, 2021

I've released 1.5, which depends on pyarrow 4.0.1. I hope it works for you; if not, I would welcome further discussion.

kuenishi added the wontfix label on Dec 27, 2021
@kuenishi
Member

Thank you for the suggestion, but I'm closing this issue. Feel free to reopen it to resume the discussion.
