Scan zipped files #9601

Open
gab23r opened this issue Jun 28, 2023 · 2 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@gab23r
Contributor

gab23r commented Jun 28, 2023

Problem description

I wish I could use polars to scan zipped CSV files (and possibly other formats).

This example works with read_csv but fails with scan_csv:

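import os
import shutil
import zipfile

import pandas as pd
import polars as pl

df = pd.DataFrame({'col': [126.3263, 45.23874]})

# create a zip archive containing a single CSV file
os.mkdir('tmp')
df.to_csv('./tmp/tmp.csv')
shutil.make_archive('myzip', 'zip', 'tmp')

# try to scan the zipped file: read_csv accepts these bytes, scan_csv does not
with zipfile.ZipFile('myzip.zip') as zipFile:
    df = pl.scan_csv(zipFile.read('tmp.csv'))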

@sm-Fifteen

Scan needs to receive a path, whereas with zipfile you have to hand Polars a file handle to the file inside the archive, because your zip could contain more than one file. Even for files that do have an unambiguous path, though, like csv.gz and csv.xz, scan_csv will refuse to read them and ask you to use read_csv instead (see #7287).
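Since scan wants a path, one possible workaround is to extract the wanted member to disk first and scan the extracted file. This is only a sketch using the standard library: `extract_member` is a hypothetical helper name, not something Polars provides, and the final `scan_csv` call is left as a comment because it assumes Polars is installed as `pl`.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_member(zip_path, member, dest_dir):
    """Extract one named member of a zip archive and return its on-disk path.

    Hypothetical helper for illustration; naming the member explicitly avoids
    the ambiguity of archives that contain more than one file.
    """
    with zipfile.ZipFile(zip_path) as zf:
        # ZipFile.extract returns the path of the file it created
        return Path(zf.extract(member, path=dest_dir))


with tempfile.TemporaryDirectory() as tmp:
    # Build a small archive similar to the one in the issue description
    archive = Path(tmp) / "myzip.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("tmp.csv", "col\n126.3263\n45.23874\n")

    csv_path = extract_member(archive, "tmp.csv", Path(tmp) / "extracted")
    print(csv_path.name)

    # Assumed follow-up, with polars imported as `pl`:
    # lf = pl.scan_csv(csv_path)
```

The trade-off is extra disk I/O for the extracted copy, and the copy has to outlive the lazy query, so the temporary directory must stay open until the result is collected.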

@neverlink

read_csv can read single compressed files just fine. But when globbing, scan_csv gets called instead, which causes it to give up.
Not sure why this doesn't work in the current implementation.
