Add a "filename" column option when reading multiple CSVs with globbing #9096

Closed
Tiv0w opened this issue May 29, 2023 · 2 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

Tiv0w commented May 29, 2023

Problem description

I'd like to use Polars (in Python) to read multiple CSVs at once with a glob, e.g. pl.read_csv('*.csv'). All of these files have exactly the same structure, so once they are combined the rows can only be told apart by the filename they came from.

For now, my code is:

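    import glob
    import os

    import polars as pl

    # Read each CSV separately and tag its rows with the source filename.
    files = glob.glob('./*.csv')
    data = pl.concat(
        [
            pl.read_csv(f).with_columns(pl.lit(os.path.basename(f)).alias("Symbol"))
            for f in files
        ]
    )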

It would be great if an add_filename option could be added to the read_csv/scan_csv functions.
If set, this would include an extra filename column in the resulting DataFrame/LazyFrame, so further processing can be done with this information.

Tiv0w added the "enhancement" label May 29, 2023

@ritchie46 (Member)

Your code is already a fine solution, isn't it? No need to put it in Polars itself.

Tiv0w (Author) commented Jul 20, 2023

Yeah, realistically, it's fine. Thanks.
