Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pd.ExcelFile closes stream on destruction in pandas 1.0.0 #31467

Closed
johny-b opened this issue Jan 30, 2020 · 7 comments · Fixed by #32544
Closed

pd.ExcelFile closes stream on destruction in pandas 1.0.0 #31467

johny-b opened this issue Jan 30, 2020 · 7 comments · Fixed by #32544
Assignees
Labels
IO Excel read_excel, to_excel Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@johny-b
Copy link

johny-b commented Jan 30, 2020

Code Sample, a copy-pastable example if possible

import pandas as pd
print(pd.__version__)
a = open('some_file.xlsx', 'rb')
x = pd.ExcelFile(a)
del x
print(a.read())

Problem description

Above script behaves in different way in pandas 0.25.3 and 1.0.0:

$ python3 t1.py
0.25.3
b''
$ sudo pip3 install pandas --upgrade --quiet
$ python3 t1.py
1.0.0
Traceback (most recent call last):
  File "t1.py", line 6, in <module>
    print(a.read())
ValueError: read of closed file

It seems that stream is closed when ExcelFile is destroyed - and I don't see why.

Expected Output

I'd expect either notice in release notes, or the same output in 0.25.3 and 1.0.0.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 5.0.0-1028-gcp
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 9.0.1
setuptools : 39.0.1
Cython : None
pytest : 4.3.0
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.1
html5lib : 0.999999999
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : None
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : None
numexpr : None
odfpy : None
openpyxl : 2.5.14
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 4.3.0
pyxlsb : None
s3fs : None
scipy : 1.2.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : None

@MarcoGorelli MarcoGorelli added the Regression Functionality that used to work in a prior pandas version label Jan 30, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.0.1 milestone Jan 31, 2020
@jorisvandenbossche jorisvandenbossche added the IO Excel read_excel, to_excel label Jan 31, 2020
@jorisvandenbossche
Copy link
Member

Possibly related to #30096 since that added a __del__ (cc @jbrockmendel), although I would assume that the ExcelFile.close() should not necessarily close the file handle (in which case it is not related to that PR).

@johny-b if you do x.close() instead of del x, is the file stream also closed?

@johny-b
Copy link
Author

johny-b commented Jan 31, 2020

x.close() closes stream on both pandas versions.

@WillAyd
Copy link
Member

WillAyd commented Jan 31, 2020

I don’t think this is a bug. del simply decrements a reference count but makes no guarantees around when the GC will actually destroy the object.

@jreback
Copy link
Contributor

jreback commented Feb 1, 2020

in read_csv where we track if an open stream is given (as opposed to a file), then can close only things that we opened. this might take a bit of work to fix though.

if someone wants to do this for 1.0.1 great, but won't consider this a blocker.

@jorisvandenbossche
Copy link
Member

Yes, the "bug" (or missing feature, how you want to call it) is that that close() should only close the file handle if it was opened by pandas itself.
The fact that now __del__ closes the file handle is only because __del__ now calls close(), which was a corect fix, but it surfaced the close issue.

@roberthdevries
Copy link
Contributor

take

@roberthdevries
Copy link
Contributor

roberthdevries commented Mar 8, 2020

It seems that the changes for #3441 (PR #4933) introduced this statement (in 2013). The changes have been carried forward to this day, but seem to be obsolete or already covered by other code that closes file handles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Excel read_excel, to_excel Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants