Cannot import datasets - ValueError: pyarrow.lib.IpcWriteOptions size changed, may indicate binary incompatibility #5923

Closed
ehuangc opened this issue Jun 2, 2023 · 24 comments

Comments

@ehuangc

ehuangc commented Jun 2, 2023

Describe the bug

When trying to import datasets, I get a pyarrow ValueError:

Traceback (most recent call last):
  File "/Users/edward/test/test.py", line 1, in <module>
    import datasets
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/datasets/__init__.py", line 43, in <module>
    from .arrow_dataset import Dataset
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 65, in <module>
    from .arrow_reader import ArrowReader
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/datasets/arrow_reader.py", line 28, in <module>
    import pyarrow.parquet as pq
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/pyarrow/parquet/__init__.py", line 20, in <module>
    from .core import *
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/pyarrow/parquet/core.py", line 45, in <module>
    from pyarrow.fs import (LocalFileSystem, FileSystem, FileType,
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/pyarrow/fs.py", line 49, in <module>
    from pyarrow._gcsfs import GcsFileSystem  # noqa
  File "pyarrow/_gcsfs.pyx", line 1, in init pyarrow._gcsfs
ValueError: pyarrow.lib.IpcWriteOptions size changed, may indicate binary incompatibility. Expected 88 from C header, got 72 from PyObject

Steps to reproduce the bug

import datasets

Expected behavior

Successful import

Environment info

Conda environment, macOS
python 3.9.12
datasets 2.12.0

@mariosasko
Collaborator

Based on rapidsai/cudf#10187, this probably means your pyarrow installation is not compatible with datasets.

Can you please execute the following commands in the terminal and paste the output here?

conda list | grep arrow
python -c "import pyarrow; print(pyarrow.__file__)"

@ehuangc
Author

ehuangc commented Jun 2, 2023

Based on rapidsai/cudf#10187, this probably means your pyarrow installation is not compatible with datasets.

Can you please execute the following commands in the terminal and paste the output here?

conda list | grep arrow
python -c "import pyarrow; print(pyarrow.__file__)"

Here is the output to the first command:

arrow-cpp                 11.0.0           py39h7f74497_0  
pyarrow                   12.0.0                   pypi_0    pypi

and the second:

/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/pyarrow/__init__.py

Thanks!
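
Note on the output above: arrow-cpp 11.0.0 comes from a conda channel while the pyarrow 12.0.0 wheel comes from PyPI, so the Python bindings and the Arrow C++ libraries they expect are likely out of sync, which is what the IpcWriteOptions size check is complaining about. A minimal sketch of one way to realign them, assuming a conda environment like the one shown; the pinned version is only illustrative, matching the arrow-cpp build above:

pip uninstall -y pyarrow                          # drop the PyPI wheel that shadows the conda package
conda install -y -c conda-forge pyarrow=11.0.0    # reinstall pyarrow at the version matching arrow-cpp
# or, staying on pip, pin the wheel to the version reported to work later in this thread:
# pip install "pyarrow==11.0.0"
python -c "import pyarrow; print(pyarrow.__version__, pyarrow.__file__)"   # confirm which build Python loads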

@Joheun-Kang

After installing pytesseract 0.3.10, I got the above error. FYI.

@Joheun-Kang

RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
pyarrow.lib.IpcWriteOptions size changed, may indicate binary incompatibility. Expected 88 from C header, got 72 from PyObject

@ssydyc

ssydyc commented Jun 4, 2023

I got the same error. pyarrow 12.0.0, released in May 2023 (https://pypi.org/project/pyarrow/), is not compatible; running pip install pyarrow==11.0.0 to force-install the previous version solved the problem.

Do we need to update dependencies?

@albertvillanova
Member

Please note that our CI properly passes all tests with pyarrow-12.0.0, for Python 3.7 and Python 3.10, for Ubuntu and Windows: see for example https://github.com/huggingface/datasets/actions/runs/5157324334/jobs/9289582291

@Joheun-Kang

Joheun-Kang commented Jun 5, 2023

For conda with Python 3.8.16, this solved my problem! Thanks!

I got the same error. pyarrow 12.0.0, released in May 2023 (https://pypi.org/project/pyarrow/), is not compatible; running pip install pyarrow==11.0.0 to force-install the previous version solved the problem.

Do we need to update dependencies? I can work on that if no one else is working on it.

@Joheun-Kang

Thanks for replying. I am not sure about those environments, but it seems like pyarrow 12.0.0 does not work for conda with Python 3.8.16.

Please note that our CI properly passes all tests with pyarrow-12.0.0, for Python 3.7 and Python 3.10, for Ubuntu and Windows: see for example https://github.com/huggingface/datasets/actions/runs/5157324334/jobs/9289582291

@lorelupo

Got the same error with:

arrow-cpp                 11.0.0          py310h7516544_0  
pyarrow                   12.0.0                   pypi_0    pypi

python                    3.10.11              h7a1cb2a_2  

datasets                  2.13.0             pyhd8ed1ab_0    conda-forge

@lorelupo

I got the same error. pyarrow 12.0.0, released in May 2023 (https://pypi.org/project/pyarrow/), is not compatible; running pip install pyarrow==11.0.0 to force-install the previous version solved the problem.

Do we need to update dependencies?

This solved the issue for me as well.

@imarquart

I got the same error. pyarrow 12.0.0, released in May 2023 (https://pypi.org/project/pyarrow/), is not compatible; running pip install pyarrow==11.0.0 to force-install the previous version solved the problem.

Do we need to update dependencies?

Solved it for me also

@YY0649

YY0649 commented Jul 11, 2023

Based on rapidsai/cudf#10187, this probably means your pyarrow installation is not compatible with datasets.

Can you please execute the following commands in the terminal and paste the output here?

conda list | grep arrow
python -c "import pyarrow; print(pyarrow.__file__)"

arrow-cpp 11.0.0 py310h7516544_0
pyarrow 12.0.1 pypi_0 pypi

/root/miniconda3/lib/python3.10/site-packages/pyarrow/__init__.py

@kimjansheden

Got the same problem with

arrow-cpp 11.0.0 py310h1fc3239_0
pyarrow 12.0.1 pypi_0 pypi

miniforge3/envs/mlp/lib/python3.10/site-packages/pyarrow/__init__.py

Reverting to pyarrow 11 solved the problem.

@B8ni

B8ni commented Aug 7, 2023

Solved with pip install pyarrow==11.0.0

@wolf-li

wolf-li commented Aug 31, 2023

My case was different. Solved with:
pip install pyarrow==12.0.1
pip install cchardet

env:
Python 3.9.16
transformers 4.32.1

@5uryansh

5uryansh commented Sep 3, 2023

I got the same error. pyarrow 12.0.0, released in May 2023 (https://pypi.org/project/pyarrow/), is not compatible; running pip install pyarrow==11.0.0 to force-install the previous version solved the problem.

Do we need to update dependencies?

This works for me as well

@williamLyh

My case was different. Solved with pip install pyarrow==12.0.1 and pip install cchardet

env: Python 3.9.16, transformers 4.32.1

I guess it also depends on the Python version. I got Python 3.11.5 and pyarrow==12.0.0.
It works!

@thierrydecae

Hi, if this helps anyone, pip install pyarrow==11.0.0 did not work for me (I'm using Colab) but this worked:
!pip install --extra-index-url=https://pypi.nvidia.com cudf-cu11
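
A guess at why the cudf route helps: the cudf-cu11 wheel pins its own pyarrow requirement, so installing it makes pip pull in a pyarrow build consistent with the rest of the Colab image. If you try it, a quick way to check which versions you ended up with (package names as used in the command above):

pip show pyarrow cudf-cu11 | grep -E "^(Name|Version)"   # prints the resolved package names and versions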

ayutaz added a commit to ayutaz/japanese-mistral-300m-recipe that referenced this issue Jan 21, 2024
ValueError                                Traceback (most recent call last)
<timed exec> in <module>

/usr/local/lib/python3.10/dist-packages/datasets/__init__.py in <module>
     20 __version__ = "2.14.5"
     21 
---> 22 from .arrow_dataset import Dataset
     23 from .arrow_reader import ReadInstruction
     24 from .builder import ArrowBasedBuilder, BeamBasedBuilder, BuilderConfig, DatasetBuilder, GeneratorBasedBuilder

4 frames
/usr/local/lib/python3.10/dist-packages/pyarrow/_parquet.pyx in init pyarrow._parquet()

ValueError: pyarrow.lib.IpcWriteOptions size changed, may indicate binary incompatibility. Expected 88 from C header, got 72 from PyObject

To address the above issue, update the library.
See the following for reference:
huggingface/datasets#5923
@JerryRen471

Hi, if this helps anyone, pip install pyarrow==11.0.0 did not work for me (I'm using Colab) but this worked: !pip install --extra-index-url=https://pypi.nvidia.com cudf-cu11

Thanks! I met the same problem and your suggestion solved it.

@merveenoyan
Contributor

merveenoyan commented Feb 10, 2024

(I was doing a quiet install, so I didn't notice it initially.)
I've been loading the same dataset for months on Colab, and just now I got this error as well. I think Colab has changed their image recently (I had some errors regarding CUDA previously as well). Beware of this and restart the runtime if you're doing quiet pip installs.
Moreover, installing the stable version of datasets from PyPI gives this:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ibis-framework 7.1.0 requires pyarrow<15,>=2, but you have pyarrow 15.0.0 which is incompatible.
Successfully installed datasets-2.17.0 dill-0.3.8 multiprocess-0.70.16 pyarrow-15.0.0
WARNING: The following packages were previously imported in this runtime:
  [pyarrow]
You must restart the runtime in order to use newly installed versions.

@rasith1998

For Colab: pip install pyarrow==11.0.0

@PennlaineChu

The above methods didn't help me, so I installed an older version: !pip install datasets==2.16.1
and import datasets worked!

@mariosasko
Collaborator

@rasith1998 @PennlaineChu You can avoid this issue by restarting the session after the datasets installation (see #6661 for more info)

Also, we've contacted Google Colab folks to update the default PyArrow installation, so the issue should soon be "officially" resolved on their side.
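
After restarting, a quick sanity check that the session is really using the freshly installed wheel and that the import now succeeds (a sketch; run in a terminal, or prefix each line with ! in a Colab cell, and expect whatever versions you just installed):

python -c "import pyarrow; print(pyarrow.__version__, pyarrow.__file__)"
python -c "import datasets; print(datasets.__version__)"
# if the printed pyarrow version is still the old one, the runtime was not actually
# restarted or the install went into a different environment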

@mariosasko
Collaborator

Also, we've contacted Google Colab folks to update the default PyArrow installation, so the issue should soon be "officially" resolved on their side.

This has been done! Google Colab now pre-installs PyArrow 14.0.2, which makes this issue unlikely to happen, so I'm closing it.
