pandas/io/feather_format.py should call use_threads instead of nthreads to prevent breakage in pyarrow 0.11.0 #23053

Closed
bvanderhaar opened this issue Oct 9, 2018 · 11 comments · Fixed by #23112
Labels
Compat pandas objects compatability with Numpy or Python functions IO Data IO issues that don't fit into a more specific label
Milestone

Comments

@bvanderhaar

bvanderhaar commented Oct 9, 2018

Code Sample

import pandas

d = {'one': [1., 2., 3., 4.],
     'two': [4., 3., 2., 1.]}
df = pandas.DataFrame(d)
df.to_feather('example.feather')

# with pyarrow 0.10.0 this succeeds with a deprecation warning
# with pyarrow 0.11.0 this fails with TypeError: unexpected argument 'nthreads'
df = pandas.read_feather('example.feather')

# manually setting nthreads also results in TypeError: unexpected argument 'nthreads'
df = pandas.read_feather('example.feather', nthreads=4)

# passing 'use_threads' results in TypeError: unexpected argument 'nthreads'
df = pandas.read_feather('example.feather', use_threads=True)

Problem description

Pandas introduced nthreads for reading feather files in issue 16359.

With PyArrow 0.10.0 a deprecation warning is shown from this source: "nthreads argument is deprecated, pass use_threads instead"

With PyArrow 0.11.0, Python raises: TypeError: read_feather() got an unexpected keyword argument 'nthreads'.

I've searched with 'pyarrow' and 'nthreads' keywords and didn't see this issue posted.

Specifically, line 112 of feather_format.py should be changed to
return feather.read_dataframe(path, use_threads=True)
or the method signature should be changed to allow overriding use_threads:
return feather.read_dataframe(path, use_threads=use_threads)
I will submit a PR if the only barrier to a fix is coding effort.
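
For illustration, here is a minimal sketch of how the keyword could be chosen by pyarrow version, so the same call works on both sides of the 0.11.0 change. The helper name feather_threading_kwargs, its legacy_nthreads parameter, and the simplified version parsing are all assumptions made for this example; this is not the code that pandas ended up with.

```python
def feather_threading_kwargs(pyarrow_version, use_threads=True, legacy_nthreads=4):
    """Return the threading keyword argument suited to a pyarrow version.

    Hypothetical helper: the name, the legacy_nthreads default and the
    version parsing are illustrative assumptions, not pandas or pyarrow API.
    """
    # Simplified parsing: only the first two numeric components are compared,
    # which is enough to tell "0.10.x" apart from "0.11.x" and later.
    major_minor = tuple(int(part) for part in pyarrow_version.split(".")[:2])

    if major_minor < (0, 11):
        # Before 0.11.0 the reader takes a thread count via nthreads.
        return {"nthreads": legacy_nthreads if use_threads else 1}

    # From 0.11.0 on, nthreads is rejected and use_threads is a boolean.
    return {"use_threads": bool(use_threads)}


print(feather_threading_kwargs("0.10.0"))
print(feather_threading_kwargs("0.11.0"))
```

The resulting dictionary could then be unpacked into the call from the proposal above, for example feather.read_dataframe(path, **feather_threading_kwargs(pyarrow.__version__)).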

Expected Output

I expect no error output upon running pandas.read_feather() with PyArrow 0.11.0

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 40.3.0
Cython: None
numpy: 1.15.1
scipy: 1.1.0
pyarrow: 0.10.0
xarray: None
IPython: 6.5.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger added the IO Data and Compat labels Oct 9, 2018
@TomAugspurger added this to the 0.24.0 milestone Oct 9, 2018
@TomAugspurger
Contributor

AFAICT, pyarrow doesn't have nightly builds we can test against, so I'm not sure what the best way to test this is. Will probably just have to be manual until 0.11 is released.

@bvanderhaar
Author

bvanderhaar commented Oct 9, 2018

Hmmm, well, it's on PyPI, but the mailing list indicates it's an RC.

I've been waiting on a fix for another PyArrow issue on Python 3.7, so I may be one of the first to try it out.

@xhochy
Contributor

xhochy commented Oct 9, 2018

pyarrow has nightly conda packages on the twosigma channel.

pyarrow==0.11 is released, so you can test against it. We're missing Python 3.7 wheels at the moment, but at least for Linux these will appear in the next few days.

@TomAugspurger
Contributor

TomAugspurger commented Oct 9, 2018 via email

@johnolos

We are seeing the same issue: our tests have been broken since pyarrow 0.11.0. Using use_threads=True seems to fix the issue for us.

@TomAugspurger
Contributor

@johnolos thanks. AFAIK, no one has submitted a PR updating pandas. If you could do so we'll include it in the next pandas release.

@ingwinlu
Contributor

I provided a PR addressing the issue; however, it is not clear to me whether I should also change the CI deps to require pyarrow 0.11.0.

@zhouyan

zhouyan commented Oct 15, 2018

Now this is a hard error when used against pyarrow 0.11.0

@bartolsthoorn

bartolsthoorn commented Oct 16, 2018

Work-around might be useful to some people:

import feather
frame = feather.read_dataframe('filename.feather')
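
A minimal sketch of wrapping this workaround so that the fallback is only used when the nthreads error actually occurs. The function read_with_fallback and its parameters are hypothetical names introduced for this example; the two readers are passed in as callables so the sketch does not depend on any particular pandas or pyarrow version.

```python
def read_with_fallback(path, primary, fallback):
    """Try the primary reader; use the fallback only for the nthreads TypeError.

    Hypothetical helper for illustration: primary and fallback are any
    callables that take a path and return a DataFrame.
    """
    try:
        return primary(path)
    except TypeError as exc:
        # Only swallow the specific failure discussed in this issue;
        # any other TypeError is re-raised unchanged.
        if "nthreads" not in str(exc):
            raise
        return fallback(path)
```

With pandas and feather imported, this could be called as read_with_fallback('filename.feather', pandas.read_feather, feather.read_dataframe).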

jorisvandenbossche pushed a commit that referenced this issue Nov 1, 2018
The nthreads argument is no longer supported since pyarrow 0.11.0 and
was replaced with use_threads.
Hence we deprecate the argument now as well so we can remove it in the
future.

This commit also:
- removes feather-format as a dependency and replaces it with usage of
  pyarrow directly.
- sets CI dependencies to respect the changes above.

We test backwards compatibility with pyarrow 0.9.0 as conda does not
provide a pyarrow 0.10.0 and the conda-forge version has compatibility
issues with the rest of the installed packages.

Resolves #23053.
Resolves #21639.
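
The deprecation described in the commit message above can be sketched roughly as follows. This is an illustrative shim under stated assumptions, not the code from the commit: the helper name resolve_use_threads and the rule that nthreads greater than 1 maps to use_threads=True are choices made for this example.

```python
import warnings


def resolve_use_threads(use_threads=True, nthreads=None):
    """Map a deprecated nthreads argument onto the newer use_threads flag.

    Hypothetical helper: the name and the "nthreads > 1 means threaded"
    mapping are assumptions made for this sketch.
    """
    if nthreads is not None:
        # Warn callers that still pass the old argument, but keep working.
        warnings.warn(
            "nthreads is deprecated, pass use_threads instead",
            FutureWarning,
            stacklevel=2,
        )
        return nthreads > 1
    return bool(use_threads)
```

A reader function that accepts both arguments could call this helper first and then pass only use_threads on to pyarrow, which keeps older call sites working while steering them toward the new keyword.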
@andersrmr

#23112 So the fix will show up in the next release of pandas/pyarrow?

@TomAugspurger
Contributor

TomAugspurger commented Dec 4, 2018 via email
