
BUG: SpooledTemporaryFile no longer working with read_csv #43439

Closed
2 of 3 tasks
oskli opened this issue Sep 7, 2021 · 4 comments · Fixed by #43447
Labels
Bug IO CSV read_csv, to_csv Regression Functionality that used to work in a prior pandas version
Comments

@oskli

oskli commented Sep 7, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas
import tempfile
import traceback

with tempfile.SpooledTemporaryFile() as f:
    f.write(b'abcd')
    f.seek(0)

    try:
        result = pandas.read_csv(f)
        print(result)

    except Exception as e:
        print(e)
        print(traceback.format_exc())

Issue Description

This worked in pandas 1.1.5 but is no longer working in version >=1.2.0.

Expected Behavior

According to the read_csv description I would expect this to work.

filepath_or_bufferstr, path object or file-like object

Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.
If you want to pass in a path object, pandas accepts any os.PathLike.
By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via builtin open function) or StringIO.

Installed Versions

INSTALLED VERSIONS ------------------ commit : 5f648bf python : 3.7.9.final.0 python-bits : 64 OS : Darwin OS-release : 20.5.0 Version : Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64 machine : x86_64 processor : i386 byteorder : little LC_ALL : en_US.UTF-8 LANG : None LOCALE : en_US.UTF-8

pandas : 1.3.2
numpy : 1.21.2
pytz : 2021.1
dateutil : 2.8.2
pip : 20.1.1
setuptools : 47.1.0
IPython : 7.27.0

@oskli oskli added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 7, 2021
@twoertwein
Member

It fails at

handle = TextIOWrapper(

in get_handle with

AttributeError: 'SpooledTemporaryFile' object has no attribute 'readable'
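The failure can be reproduced without pandas: io.TextIOWrapper checks at construction time that its buffer implements readable(), and SpooledTemporaryFile did not expose the full io interface on the Python versions in this report. A minimal sketch with a hypothetical stand-in object:

```python
import io

class MinimalReader:
    """Hypothetical file-like object: has read() but no readable(),
    like tempfile.SpooledTemporaryFile on older Python versions."""
    def read(self, n=-1):
        return b""

try:
    io.TextIOWrapper(MinimalReader())
except AttributeError as e:
    # Same failure mode as the traceback above:
    # '... object has no attribute 'readable''
    print(e)
```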

@twoertwein twoertwein added IO CSV read_csv, to_csv and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 7, 2021
@simonjayhawkins simonjayhawkins added the Regression Functionality that used to work in a prior pandas version label Sep 7, 2021
@twoertwein
Member

twoertwein commented Sep 7, 2021

In 1.1
A binary SpooledTemporaryFile works with engine='c' (the default engine) in 1.1 but doesn't work with engine='python'. tempfile.SpooledTemporaryFile(mode="w+t") also works only with the c-engine.

In 1.3
A binary SpooledTemporaryFile does not work with any engine (it should work with the pyarrow engine in 1.4?).
tempfile.SpooledTemporaryFile(mode="w+t") works with the c-engine but still fails with the python engine.

SpooledTemporaryFile does not implement __next__, which makes it incompatible with the python engine.

If tempfile.SpooledTemporaryFile(mode="w+t") with engine="c" doesn't work for you, the only workaround on pandas's side I can think of is to add a blacklist/user-facing argument to skip the text wrapping.

edit: it might be easier to skip the text wrapping if the object doesn't have the "readable" attribute (hoping that doesn't break other objects), or simply to put the TextIOWrapper call into a try/except block.
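A user-side workaround that sidesteps the wrapping problem entirely (my own suggestion, not an official fix): copy the spooled contents into an io.BytesIO, which implements the full interface that TextIOWrapper expects:

```python
import io
import tempfile
import pandas

with tempfile.SpooledTemporaryFile() as f:
    f.write(b"a,b\n1,2\n")
    f.seek(0)
    # BytesIO implements readable()/writable()/seekable(), so pandas'
    # internal TextIOWrapper call succeeds.
    buffer = io.BytesIO(f.read())

df = pandas.read_csv(buffer)
print(df)
```

This trades memory for compatibility: the spooled data is read fully into RAM, which defeats the disk-spilling behavior of SpooledTemporaryFile for large files.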

@oskli
Author

oskli commented Sep 8, 2021


Sorry, I should have added some more details. Thanks for clarifying this.

To me it seems like _open_handles in BaseParser, where

self.handles = get_handle(

is called, could accept the is_text argument of get_handle.

According to the docs

is_text : bool, default True
    Whether the type of the content passed to the file/buffer is string or
    bytes. This is not the same as `"b" not in mode`. If a string content is
    passed to a binary file/buffer, a wrapper is inserted.
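Conceptually, the wrapper that get_handle inserts when is_text=True but the buffer yields bytes amounts to this standard-library sketch:

```python
import io

# A binary buffer, like the SpooledTemporaryFile in the bug report
binary = io.BytesIO(b"a,b\n1,2\n")

# get_handle-style wrapping: present a str interface over a bytes buffer
text = io.TextIOWrapper(binary, encoding="utf-8")
print(text.read())
```

It is exactly this TextIOWrapper call that fails when the binary buffer does not implement the full IOBase interface.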

@twoertwein
Member

To fix the 1.3 release, we probably can't introduce new keyword arguments - using a try/except or hasattr should make it work on 1.3. For 1.4, we probably have more options - my favorite is to mirror the mode keyword from to_csv. In your case, you would then pass mode='tr' to force pandas to treat the handle as a readable text handle.
