BUG: regression: pandas.read_stata(filename, iterator=True) raises ValueError #37280

Closed
2 of 3 tasks
remram44 opened this issue Oct 20, 2020 · 5 comments · Fixed by #37302
Labels
Bug; IO Stata (read_stata, to_stata); Regression (functionality that used to work in a prior pandas version)

remram44 commented Oct 20, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

>>> import pandas
>>> # for example https://gitlab.com/ViDA-NYU/datamart/datamart/-/blob/master/tests/data/stata118.dta
>>> stata_file_name = "stata118.dta"  # assumed local path to a downloaded copy
>>> iterator = pandas.read_stata(stata_file_name, iterator=True)
>>> list(iterator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "site-packages/pandas/io/stata.py", line 1523, in __next__
    raise ValueError(
ValueError: chunksize must be set to a positive integer to use as an iterator.

Problem description

read_stata(filename, iterator=True) no longer works in pandas 1.1.3. It worked in pandas 1.0.5.

Expected Output

DataFrame is loaded correctly

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit           : db08276bc116c438d3fdee492026f8223584c477
python           : 3.8.5.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.0-51-generic
Version          : #56-Ubuntu SMP Mon Oct 5 14:28:49 UTC 2020
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.1.3
numpy            : 1.19.2
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 44.0.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : 3.2.1
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.14.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : 0.8.4
fastparquet      : None
gcsfs            : 0.7.1
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pyxlsb           : None
s3fs             : 0.4.2
scipy            : 1.5.3
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : 1.2.0
xlwt             : None
numba            : None
remram44 added the Bug and Needs Triage labels Oct 20, 2020
remram44 changed the title from "BUG:" to "BUG: regression: pandas.read_stata(filename, iterator=True) raises ValueError" Oct 20, 2020

remram44 (Author) commented

(apologies for the forgotten title, that's a first...)

I have checked that it happens on 1.1.0 and master as well. So the problem must have been introduced between 1.0.5 and 1.1.0.

#31072 looked like a prime suspect but b54aaf7 works for me.

remram44 (Author) commented

Bisecting shows that 035e1fe is the culprit (#34128).

bashtage (Contributor) commented

Default chunksize needs to be set to 1.

if self._chunksize is None:
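
To illustrate the idea of falling back to a default chunk size of 1 instead of raising, here is a toy sketch. `RowReader` is a hypothetical stand-in written for this example; it is not the pandas `StataReader` code and not necessarily how the fix was implemented.

```python
class RowReader:
    """Hypothetical chunked reader used only for illustration (not pandas code)."""

    def __init__(self, rows, chunksize=None):
        self._rows = list(rows)
        self._chunksize = chunksize
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Fall back to one row per step when no chunksize was given,
        # rather than raising ValueError.
        size = 1 if self._chunksize is None else self._chunksize
        if self._pos >= len(self._rows):
            raise StopIteration
        chunk = self._rows[self._pos:self._pos + size]
        self._pos += size
        return chunk


chunks_default = list(RowReader(["a", "b", "c"]))
chunks_of_two = list(RowReader(["a", "b", "c"], chunksize=2))
```

With no `chunksize`, iteration yields one row at a time; with `chunksize=2`, it yields two rows per step and a shorter final chunk.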

bashtage added the IO Stata and Regression labels and removed the Needs Triage label Oct 20, 2020

bashtage (Contributor) commented

Workaround is to pass a chunksize.
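
A minimal sketch of that workaround, assuming pandas is installed. The helper name `read_stata_in_chunks`, the chunk size, and the throwaway `example.dta` file are made up for this example; only `read_stata`, `to_stata`, and `concat` are pandas calls.

```python
import os
import tempfile

import pandas as pd


def read_stata_in_chunks(path, chunksize=1000):
    """Read a .dta file chunk by chunk and return one combined DataFrame."""
    # Passing chunksize explicitly avoids the ValueError raised during iteration.
    with pd.read_stata(path, iterator=True, chunksize=chunksize) as reader:
        chunks = list(reader)
    return pd.concat(chunks, ignore_index=True)


# Write a small throwaway file so the sketch is self-contained.
frame = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [0.5, 1.5, 2.5, 3.5, 4.5]})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.dta")
    frame.to_stata(path, write_index=False)
    result = read_stata_in_chunks(path, chunksize=2)
```

The chunk size only controls how many rows are read per step; the concatenated result should contain every row of the file.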

remram44 (Author) commented

Thanks for the workaround, I might do this in the interim.

remram44 added a commit to VIDA-NYU/auctus that referenced this issue Oct 20, 2020
Set chunksize to work around pandas-dev/pandas#37280.
bashtage added a commit to bashtage/pandas that referenced this issue Oct 21, 2020
Remove error message incorrectly added
Fixed new issues identified by mypy

closes pandas-dev#37280
@jreback jreback added this to the 1.1.4 milestone Oct 21, 2020
bashtage added a commit to bashtage/pandas that referenced this issue Oct 21, 2020
Remove error message incorrectly added
Fixed new issues identified by mypy
Add test to ensure conversion of large ints is correct

closes pandas-dev#37280
bashtage added a commit to bashtage/pandas that referenced this issue Oct 23, 2020
Remove error message incorrectly added
Fixed new issues identified by mypy
Add test to ensure conversion of large ints is correct

closes pandas-dev#37280
jreback pushed a commit that referenced this issue Oct 23, 2020
Remove error message incorrectly added
Fixed new issues identified by mypy
Add test to ensure conversion of large ints is correct

closes #37280
JulianWgs pushed a commit to JulianWgs/pandas that referenced this issue Oct 26, 2020
Remove error message incorrectly added
Fixed new issues identified by mypy
Add test to ensure conversion of large ints is correct

closes pandas-dev#37280
kesmit13 pushed a commit to kesmit13/pandas that referenced this issue Nov 2, 2020
Remove error message incorrectly added
Fixed new issues identified by mypy
Add test to ensure conversion of large ints is correct

closes pandas-dev#37280