Create streaming lazy dataframe from a generator #7514

Closed
AroneyS opened this issue Mar 12, 2023 · 0 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

AroneyS commented Mar 12, 2023

Problem description

I have a generator that yields a very large number of values, which I want to process as a column using Polars. Because of its size, I want to run it in lazy streaming mode with the generator as the source, but I have been unable to work out how to do this (if it is possible at all).

Creating a normal DataFrame and then converting it to lazy doesn't work, since the generator is exhausted when the DataFrame is constructed, before the lazy plan is run with collect(). The same happens with the LazyFrame initialiser, which is just a shortcut for the above.

Are there any other options that don't involve writing and then scanning a CSV?

Example code:

import polars as pl

def Generator():
    yield 1
    yield 2
    yield 3

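generator = Generator()
# The eager constructor consumes the generator immediately
df = pl.DataFrame({"a": generator}).lazy()

print(df)
# naive plan...

# Nothing is left in the generator, although collect() was never called
print(list(generator))
# []

generator2 = Generator()
# The LazyFrame constructor also consumes the generator up front
df = pl.LazyFrame({"a": generator2})

print(df)
# naive plan...

print(list(generator2))
# []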

Also posted on Stack Overflow: https://stackoverflow.com/questions/75680581/how-to-stream-from-a-generator-to-a-polars-dataframe-and-subsequent-lazy-plan

@AroneyS AroneyS added the enhancement New feature or an improvement of an existing feature label Mar 12, 2023
@AroneyS AroneyS closed this as completed Jun 28, 2023