Unclosed file on EmptyDataError #31488

Closed
fominok opened this issue Jan 31, 2020 · 1 comment · Fixed by #32598
Labels
Bug IO CSV read_csv, to_csv

fominok commented Jan 31, 2020

Code Sample, a copy-pastable example if possible

import pandas
import psutil

proc = psutil.Process()
try:
    data = pandas.read_csv('kek.csv', engine='python')  # kek.csv is an empty file
except pandas.errors.EmptyDataError:
    pass

print(proc.open_files())

Problem description

Hi, reading an empty .csv file by filename (rather than by passing an open file handle) appears to leave the file open after EmptyDataError is raised.
Because pandas opened the file itself, the caller has no handle and cannot close it after the exception.
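
Until this is fixed, one possible workaround is for the caller to open the file and pass the handle in, so that closing it does not depend on the parser. The sketch below is only an illustration: `read_with_owned_handle` and its `reader` parameter are hypothetical names, not part of pandas, and it assumes the reader accepts an open text-mode file object (as `pandas.read_csv` does).

```python
def read_with_owned_handle(path, reader):
    """Open `path` ourselves, hand the file object to `reader`, and always close it.

    `reader` is any callable taking an open text-mode file object.
    The `with` block closes the file even if `reader` raises.
    """
    with open(path, newline="") as fh:
        return reader(fh)


# Hypothetical usage with the sample above (requires pandas):
#
# try:
#     data = read_with_owned_handle(
#         "kek.csv", lambda fh: pandas.read_csv(fh, engine="python")
#     )
# except pandas.errors.EmptyDataError:
#     data = None
```

With this shape the exception still propagates to the caller, but the file is closed by the `with` block before it does, so `proc.open_files()` should no longer list it.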

Expected Output

[]

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.14.13-1.el7.elrepo.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.2.2
setuptools : 45.1.0
Cython : 0.29.14
pytest : 5.0.1
hypothesis : 4.32.2
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : 0.9.2
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.1
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : 0.15.0
pytables : None
pytest : 5.0.1
pyxlsb : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.7
tables : 3.5.2
tabulate : 0.8.3
xarray : 0.12.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8
numba : 0.45.1

jorisvandenbossche added Bug IO CSV read_csv, to_csv labels Jan 31, 2020
roberthdevries (Contributor) commented

It appears that in the __init__ method of PythonParser, the call to _infer_columns can raise EmptyDataError.
The exception is not caught there, so the file resource allocated in get_handle is never released.
A simple fix is to wrap the _infer_columns call in a try/except block and call close() to release all allocated file handles. However, that leaves the rest of __init__ unguarded against other exceptions.
So which is preferred: a try/except around _infer_columns only, or one that runs from the self.handles.extend() call after get_handle to the end of __init__?
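
To make the second option concrete, here is a minimal, self-contained sketch of the pattern. It is not the pandas source: `SketchParser` and the local `EmptyDataError` are hypothetical stand-ins, and only the names `handles`, `_infer_columns`, and `close` are borrowed from the discussion above. The idea is that everything after the handle is acquired sits inside one try/except that closes the handles and re-raises.

```python
class EmptyDataError(ValueError):
    """Stand-in for pandas.errors.EmptyDataError (illustration only)."""


class SketchParser:
    """Toy parser that opens its own file (hypothetical, not PythonParser)."""

    def __init__(self, path):
        self.handles = []
        f = open(path, newline="")
        self.handles.append(f)
        # Guard all remaining setup, not just column inference, so that any
        # exception raised after the file is opened still releases it.
        try:
            self.data = f
            self.columns = self._infer_columns()
        except Exception:
            self.close()
            raise

    def _infer_columns(self):
        # Read the header line; an empty file has no columns to infer.
        header = self.data.readline()
        if not header:
            raise EmptyDataError("No columns to parse from file")
        return header.rstrip("\r\n").split(",")

    def close(self):
        # Close every handle this parser opened and forget them.
        for handle in self.handles:
            handle.close()
        self.handles.clear()
```

Guarding the whole remainder of the constructor costs one extra indentation level but means later additions to the setup code are covered too. Wrapping only `_infer_columns` is the smaller change, at the price of leaving other failure paths unprotected.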

jreback added this to the 1.1 milestone Mar 14, 2020