Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: broadcasting of unit LHS in string operations #12737

Merged
merged 6 commits into from
Dec 28, 2023

Conversation

nameexhaustion
Copy link
Collaborator

@nameexhaustion nameexhaustion commented Nov 28, 2023

Would fix #12632.

@github-actions github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Nov 28, 2023
@ritchie46
Copy link
Member

I don't think we should automatically broadcast. That has implicit performance implications. I asserted the invariant in #12738 and ensure that the caller upholds it.

@nameexhaustion
Copy link
Collaborator Author

nameexhaustion commented Nov 28, 2023

I don't think it's broadcasting Performance should be ok - I'm using std::iter::repeat and some macro magic to avoid materializing the entire column values :) (and also avoiding Box<dyn> :D).

@ritchie46
Copy link
Member

Ah, right. In that case I think it is a nice feature, but I think it should be in a special broadcasting variant of binary_element_apply.

Otherwise we will monomorphize both code variants where in many cases we only need one of the two.

@nameexhaustion nameexhaustion changed the title fix: fix broadcasting in binary elementwise apply fix: fix broadcasting in string operations Nov 28, 2023
@orlp
Copy link
Collaborator

orlp commented Nov 29, 2023

@nameexhaustion Why all the macros?

@nameexhaustion
Copy link
Collaborator Author

nameexhaustion commented Nov 29, 2023

Mainly to be able to apply a function to N inputs that are conditionally broadcasted using iter::repeat without heap allocating/dynamic dispatch (I was assuming we wanted to avoid doing that here).

Although given that we currently only need to handle 2 inputs, I suppose it could also be a function instead.

@orlp
Copy link
Collaborator

orlp commented Nov 29, 2023

Rather than using an indirect implementation using std::iter::repeat can't you just directly implement the broadcast?

@orlp orlp changed the title fix: fix broadcasting in string operations fix: broadcasting in string operations Nov 29, 2023
@nameexhaustion
Copy link
Collaborator Author

nameexhaustion commented Nov 29, 2023

Hmm - I could also just do ca.new_from_index(0) - it would make things much simpler, but it would materialize an entire array with the same values.

Otherwise I don't know of any better way to do the broadcasting o.o

@orlp
Copy link
Collaborator

orlp commented Nov 29, 2023

@nameexhaustion As an example in pseudocode, for binary_elementwise_broadcast(lhs, rhs, op), add an unary_elementwise(ca, op) function with similar semantics. Then if lhs.len() == 1 you call unary_elementwise(rhs, |r| op(lhs[0], r)) and if rhs.len() == 1 you call unary_elementwise(lhs, |l| op(l, rhs[0])).

Note to self: at some point in the near future I want to do a pass over the function names in arity.rs to give them a clearer naming scheme because right now it's a bit all over the place.

@nameexhaustion nameexhaustion marked this pull request as draft November 29, 2023 09:01
@nameexhaustion nameexhaustion marked this pull request as ready for review November 29, 2023 12:06
@nameexhaustion nameexhaustion marked this pull request as draft December 1, 2023 05:32
@nameexhaustion nameexhaustion marked this pull request as ready for review December 1, 2023 07:26
@nameexhaustion nameexhaustion changed the title fix: broadcasting in string operations fix: broadcasting of unit LHS in string operations Dec 4, 2023
Copy link
Collaborator

@orlp orlp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being so picky 😅 But these functions could get used a lot so we should get them right.

V: PolarsDataType,
F: FnMut(Option<T::Physical<'a>>) -> Result<Option<K>, E>,
V::Array: ArrayFromIter<Option<K>>,
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm - this would mean requiring unary functions that take Option - i.e. F: FnMut(Option<T::Physical<'a>>) -> Result<Option<K>, E> to always behave as F(None) == None - as otherwise the fast-path would be incorrect.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, we could also cheaply elide this requirement by checking the output of F(input[0]) if input.len() == input.null_count()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. strictly speaking this would also require the function to be Fn instead of FnMut

V::Array: for<'a> ArrayFromIter<
<F as BinaryFnMut<Option<T::Physical<'a>>, Option<U::Physical<'a>>>>::Ret,
>,
{
Copy link
Collaborator

@orlp orlp Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we can only do a fast-path here if both are full-null... Makes me wonder if this is a good idea after all :( We could still handle that wherever we call them though. Nevertheless, we should do a fast-path here for that case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm - this would mean we require F: for<'a> BinaryFnMut<Option<T::Physical<'a>>, Option<U::Physical<'a>>> to always behave as F(None, None) == None

crates/polars-core/src/chunked_array/ops/arity.rs Outdated Show resolved Hide resolved
crates/polars-core/src/chunked_array/ops/arity.rs Outdated Show resolved Hide resolved
crates/polars-core/src/datatypes/static_array.rs Outdated Show resolved Hide resolved
@orlp
Copy link
Collaborator

orlp commented Dec 28, 2023

@nameexhaustion Could you resolve the merge conflicts? Then this is ready to go I think.

@nameexhaustion
Copy link
Collaborator Author

@orlp I've rebased

@orlp orlp merged commit e39a4d9 into pola-rs:main Dec 28, 2023
25 checks passed
@nameexhaustion nameexhaustion deleted the macro branch February 22, 2024 05:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

df.filter( pl.lit("some_str").str.contains( pl.col(...) ) does not filter properly
3 participants