Use compute.FilterRecord instead of bitmask and manually filtering #672
Comments
Sorry, apparently fat-fingered the close button. I played around with this today:
func filter(ctx context.Context, pool memory.Allocator, filterExpr BooleanExpression, ar arrow.Record) (arrow.Record, bool, error) {
bitmap, err := filterExpr.Eval(ar)
if err != nil {
return nil, true, err
}
if bitmap.IsEmpty() {
return nil, true, nil
}
// Construct filter array
// NOTE: this is intermediary right now. The Eval function should return a boolean array instead so we can directly pass it in.
bldr := array.NewBooleanBuilder(pool)
defer bldr.Release()
for i := 0; i < int(ar.NumRows()); i++ {
bldr.Append(bitmap.Contains(uint32(i)))
}
filterArr := bldr.NewArray()
defer filterArr.Release()
result, err := compute.FilterRecordBatch(ctx, ar, filterArr, compute.DefaultFilterOptions())
if err != nil {
return nil, true, err
}
return result, false, nil
}
And ended up with a panic:
--- FAIL: Test_DB_All (0.00s)
db_test.go:2085:
Error Trace: /Users/thor/go/src/github.com/polarsignals/frostdb/db_test.go:2085
Error: Received unexpected error:
not implemented: function 'array_take' has no kernel matching input types (dictionary<values=utf8, indices=uint32, ordered=false>, uint16)
Test: Test_DB_All
Looks like it's not implemented for all types. |
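The error above names `array_take`, which suggests the record filter is carried out as a take over the selected row indices. The following is a minimal stdlib-only sketch of that relationship, not Arrow's implementation. The helper names (`indicesFromMask`, `takeStrings`, `filterStrings`) are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// indicesFromMask converts a boolean selection mask into the row indices
// it selects. Hypothetical helper, not part of Arrow's API.
func indicesFromMask(mask []bool) []uint32 {
	indices := make([]uint32, 0, len(mask))
	for i, keep := range mask {
		if keep {
			indices = append(indices, uint32(i))
		}
	}
	return indices
}

// takeStrings gathers the values at the given row indices.
func takeStrings(col []string, indices []uint32) []string {
	out := make([]string, len(indices))
	for i, idx := range indices {
		out[i] = col[idx]
	}
	return out
}

// filterStrings expresses "filter by mask" as "take by selected indices".
func filterStrings(col []string, mask []bool) []string {
	return takeStrings(col, indicesFromMask(mask))
}

func main() {
	col := []string{"a", "b", "c", "d"}
	mask := []bool{true, false, false, true}
	fmt.Println(filterStrings(col, mask)) // prints [a d]
}
```

Under this model, a filter works for a column type only if a take exists for that type, which would be consistent with the failure on the dictionary column.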
That is unfortunate. Should we open an issue on apache arrow for this? |
Yea and link it here so we know if/when we can actually implement this. |
No need for this. We can in fact implement it now. Basically, this is how filter on records works
So, actually @thorfour's solution is correct, we just need to be smart with
So, @thorfour can you please open the PR with your changes? I will help and make sure we massage it until it works for every case we currently have. We will probably need to run benchmarks as well to make sure we don't introduce regressions. |
I took another look, we already have Something like
So steps will be
|
Quick check says it is not a simple change. I'm taking this task. Will submit supplementary patches to make it possible. |
This is a supplementary patch needed for polarsignals#672. I am submitting it separately to simplify reviewing.
The result arrow.Record has the same number of rows as the input indices array. This is part of polarsignals#672
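To illustrate the row-count property for a dictionary-encoded column, here is a small stdlib-only sketch. `dictColumn` and its methods are hypothetical stand-ins, not the frostdb or Arrow types; the sketch assumes a take only needs to gather the index buffer and can share the dictionary.

```go
package main

import "fmt"

// dictColumn is a hypothetical stand-in for a dictionary-encoded column:
// row i holds dict[indices[i]].
type dictColumn struct {
	dict    []string
	indices []uint32
}

// take returns a column with one row per entry in rows. Only the index
// buffer is gathered; the dictionary is shared with the input.
func (c dictColumn) take(rows []uint32) dictColumn {
	out := make([]uint32, len(rows))
	for i, r := range rows {
		out[i] = c.indices[r]
	}
	return dictColumn{dict: c.dict, indices: out}
}

// values decodes the column into plain strings.
func (c dictColumn) values() []string {
	vals := make([]string, len(c.indices))
	for i, idx := range c.indices {
		vals[i] = c.dict[idx]
	}
	return vals
}

func main() {
	col := dictColumn{dict: []string{"x", "y"}, indices: []uint32{0, 1, 1, 0}}
	taken := col.take([]uint32{1, 3})
	fmt.Println(len(taken.indices), taken.values()) // prints 2 [y x]
}
```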
`r` and `indices` are externally managed resources. Owning them increases the ref count, and since we never release them, they will leak when `r` has no dictionary field. There is no need to own these resources inside `Take`. This is part of polarsignals#672
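The leak described here can be modeled with a toy reference counter. This is a sketch under assumptions: `resource`, `takeOwning`, and `takeBorrowing` are hypothetical, and the early-return shape of `takeOwning` is a guess at how a retain without a matching release could arise, not the actual `Take` code.

```go
package main

import "fmt"

// resource is a hypothetical stand-in for a reference-counted value such as
// an arrow.Record; it would be freed when the count reaches zero.
type resource struct {
	refs int
}

func newResource() *resource { return &resource{refs: 1} }

func (r *resource) Retain()  { r.refs++ }
func (r *resource) Release() { r.refs-- }

// takeOwning retains its input but has an early return that skips Release.
func takeOwning(r *resource, hasDict bool) {
	r.Retain()
	if !hasDict {
		return // leak: the extra reference is never released
	}
	r.Release()
}

// takeBorrowing leaves the count alone; the caller manages the lifetime.
func takeBorrowing(r *resource, hasDict bool) {
	// Intentionally no Retain/Release: r is only borrowed for the call.
}

func main() {
	a := newResource()
	takeOwning(a, false)
	a.Release()         // the caller's own release
	fmt.Println(a.refs) // prints 1: one reference leaked

	b := newResource()
	takeBorrowing(b, false)
	b.Release()
	fmt.Println(b.refs) // prints 0: nothing leaked
}
```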
😓 finally I have this working. I will wait for supplementary patches to land then I will drop the PR. |
Done in #697. I wish someone with access to a production-like workload would run some benchmarks and give us numbers. |
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
A lot of the filters use a bitmask and then manually filter the record batch. I think this could be simplified by using the built-in
compute.FilterRecord
to do all the work. I'm not 100% sure whether it would improve performance, but I think it should give a memory improvement.