Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Short circuit && on subset? #3427

Closed
kescobo opened this issue Mar 1, 2024 · 1 comment
Closed

Short circuit && on subset? #3427

kescobo opened this issue Mar 1, 2024 · 1 comment

Comments

@kescobo
Copy link
Contributor

kescobo commented Mar 1, 2024

Prompted by the closing of JuliaData/DataFramesMeta.jl#310 - I thought I'd mention this here.

At the moment, each conditional of subset is treated as and, but they don't short-circuit (that is, if the first term is false, subsequent terms are tested anyway). You can of course use a short-circuit conditional manually, but it might be nice if this could be done by default for ByRow operations.

I suspect that there may be too much of a change to the way the code works to make this change worth it, especially since the short-circuiting behavior can be achieved through other means, but it could provide a modest speed-up.

julia> df = DataFrame(x = repeat(["ab", "ac", "bc"], 10_000));

julia> @btime subset($df,
                  "x"=> startswith('a') |> ByRow,
                  "x"=> endswith('b') |> ByRow,
                  "x"=> contains('c') |> ByRow);
  501.902 μs (273 allocations: 106.80 KiB)

julia> @btime subset($df,
                  "x"=> ByRow(x->
                       startswith(x, 'a') &&
                       endswith(x, 'b') &&
                       contains(x, 'c')))
  247.928 μs (125 allocations: 42.91 KiB)
@bkamins
Copy link
Member

bkamins commented Mar 2, 2024

The issue is that, in general the conditions passed to subset could be evaluated in parallel (they are not now in all cases but are in some). This renders short-circuting infeasible. As you commented - if one wants short-circut behavior it has to be hard-coded in the function.

Thank you for reporting.

@bkamins bkamins closed this as completed Mar 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants