feat: make python map_batches safer #13181

Merged
merged 1 commit into from
Dec 22, 2023

Conversation

ritchie46
Member

This makes map_batches default to a safer behavior. We now assume the function cannot be run on partial batches, i.e. that it doesn't merely do an elementwise operation like + or pow, but does something that requires all the data, e.g. sort, sum, rolling_min, etc.

map_batches will now not run by default on the streaming engine unless you set is_elementwise=True. In that case it can produce wrong results in a group-by, but hey, you said it was elementwise. 🤷

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Dec 21, 2023
@ritchie46
Member Author

@MarcoGorelli @deanm0000 FYI

@ritchie46 ritchie46 changed the title feat: make python map_bathches safer feat: make python map_batches safer Dec 21, 2023
@avimallu
Contributor

avimallu commented Dec 22, 2023

@ritchie46, the explanation you gave in the issue isn't quite clear to me. I also think the documentation should better explain what is_elementwise does, and somehow explain what a batch is in map_batches in the context of this argument.

For example, the obvious next questions a user would ask are "Why does map_batches not work in streaming mode by default? What are the tradeoffs of setting this argument to True?"

@ritchie46
Member Author

Because some functions are not correct on a subset of the data. E.g. a sort needs all data to be correct.
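To make the point about subsets concrete, here is a minimal plain-Python sketch. It does not use Polars; `run_in_batches`, `add_one`, and `sort_batch` are hypothetical helpers that only mimic the idea of an engine handing a function one chunk of the data at a time. An elementwise function gives the same result either way, while a sort applied per chunk does not.

```python
from typing import Callable, List


def run_in_batches(
    func: Callable[[List[int]], List[int]],
    data: List[int],
    batch_size: int,
) -> List[int]:
    """Apply func to consecutive chunks of data and concatenate the results.

    Hypothetical stand-in for an engine that only ever sees part of the data.
    """
    out: List[int] = []
    for start in range(0, len(data), batch_size):
        out.extend(func(data[start:start + batch_size]))
    return out


def add_one(batch: List[int]) -> List[int]:
    # Elementwise: each output value depends only on its own input value.
    return [x + 1 for x in batch]


def sort_batch(batch: List[int]) -> List[int]:
    # Not elementwise: the position of each value depends on all other values.
    return sorted(batch)


data = [5, 1, 4, 2, 3, 0]

# Elementwise function: whole-data and batched results agree.
elementwise_whole = add_one(data)
elementwise_batched = run_in_batches(add_one, data, batch_size=2)

# Sort: each chunk is sorted, but the concatenation is not globally sorted.
sort_whole = sort_batch(data)
sort_batched = run_in_batches(sort_batch, data, batch_size=2)

print(elementwise_whole == elementwise_batched)  # True
print(sort_whole)    # [0, 1, 2, 3, 4, 5]
print(sort_batched)  # [1, 5, 2, 4, 0, 3]
```

In this sketch, declaring a function elementwise amounts to promising that it behaves like `add_one`: the result is the same no matter how the data is split.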

Labels
accepted Ready for implementation enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars