Null fixes on Arrow bridge #7411
Summary: Ensure null_count is always set, add support for null constants, and enhance tests to check the bitmap values match null_count. Differential Revision: D50997553
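The invariant the summary describes (the exported null_count agrees with the validity bitmap, including for an all-null constant) can be sketched outside Velox as follows. This is a minimal illustration, not the PR's test code: `countNullBits`, `nullCountMatchesBitmap`, and `allNullBitmap` are hypothetical helpers, and the bitmap is assumed to follow the Arrow layout (least-significant bit first, 1 = valid).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts the unset (null) bits among the first `length`
// bits of an Arrow-style validity bitmap (LSB-first, 1 = valid).
int64_t countNullBits(const std::vector<uint8_t>& validity, size_t length) {
  int64_t nulls = 0;
  for (size_t i = 0; i < length; ++i) {
    const bool valid = (validity[i / 8] >> (i % 8)) & 1;
    if (!valid) {
      ++nulls;
    }
  }
  return nulls;
}

// Hypothetical check: the declared null count must equal what the bitmap says.
bool nullCountMatchesBitmap(
    const std::vector<uint8_t>& validity,
    size_t length,
    int64_t nullCount) {
  return nullCount == countNullBits(validity, length);
}

// Hypothetical helper: bitmap for a constant null of `length` rows.
// Every bit is zero, so the expected null count equals `length`.
std::vector<uint8_t> allNullBitmap(size_t length) {
  return std::vector<uint8_t>((length + 7) / 8, 0);
}
```

For a constant null of 10 rows, `nullCountMatchesBitmap(allNullBitmap(10), 10, 10)` holds, while any other declared count would fail the check.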
This pull request was exported from Phabricator. Differential Revision: D50997553
Looks good, some questions though.
// If we're only exporting a subset, create a new validity buffer.
if (rows.changed()) {
  nulls = AlignedBuffer::allocate<bool>(out.length, pool);
nit: Should the size of this be out.length, vec.size(), or rows.end()? Are we guaranteed to have out.length >= vec.size()?
@kgpai out.length is set to rows.count(), which is guaranteed to be no larger than vec.size() (the selected rows are a subset of the vector's rows).
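To illustrate why the new buffer is sized by the number of selected rows rather than by the source vector, here is a minimal standalone sketch of compacting a validity bitmap for a subset. It does not use Velox's `AlignedBuffer` or selection types; `SubsetValidity` and `gatherValidity` are hypothetical names, and the bitmap is assumed to be LSB-first with 1 = valid.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: a compacted bitmap plus the null count found
// while building it.
struct SubsetValidity {
  std::vector<uint8_t> bits; // one bit per selected row
  int64_t nullCount = 0;
};

// Hypothetical helper: copies the validity bit of each selected source row
// into consecutive output positions. The output holds selected.size() bits,
// which can never exceed the number of rows in the source.
SubsetValidity gatherValidity(
    const std::vector<uint8_t>& source,
    const std::vector<size_t>& selected) {
  SubsetValidity out;
  out.bits.assign((selected.size() + 7) / 8, 0);
  for (size_t i = 0; i < selected.size(); ++i) {
    const size_t row = selected[i];
    const bool valid = (source[row / 8] >> (row % 8)) & 1;
    if (valid) {
      out.bits[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    } else {
      ++out.nullCount;
    }
  }
  return out;
}
```

Counting nulls during the gather means the subset's null count is known as soon as the buffer is built, without a second pass.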
// Set null counts.
if (!rows.changed() && (vec.getNullCount() != std::nullopt)) {
  out.null_count = *vec.getNullCount();
Maybe this is wrong, but my impression was always that the null count isn't always up to date and is best effort. Maybe we should just use countNulls.
getNullCount() returns a std::optional. In many cases it's not set, but when it is set we ensure it is correct.
I suppose I was unsure whether we always unset it when some operation modifies the vector after it has been set. I will take your word on this, though.
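The pattern discussed in this thread (trust the cached count when it is present and the whole vector is exported, otherwise derive it from the bitmap) might look roughly like the sketch below. The fallback branch is an assumption, since the quoted diff only shows the cached path; `resolveNullCount` and its parameters are hypothetical and stand in for `vec.getNullCount()` and `rows.changed()`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: returns a null count that is always set.
// - `cached` plays the role of an optional count kept on the vector.
// - `subsetExported` plays the role of rows.changed().
// - `validity` is an LSB-first bitmap with 1 = valid, covering `length` rows.
int64_t resolveNullCount(
    std::optional<int64_t> cached,
    bool subsetExported,
    const std::vector<uint8_t>& validity,
    size_t length) {
  // The cached count describes the whole vector, so it is only usable when
  // no subset was taken.
  if (!subsetExported && cached.has_value()) {
    return *cached;
  }
  // Assumed fallback: count unset bits directly.
  int64_t nulls = 0;
  for (size_t i = 0; i < length; ++i) {
    if (((validity[i / 8] >> (i % 8)) & 1) == 0) {
      ++nulls;
    }
  }
  return nulls;
}
```

The cached value is returned without verification, which matches the reply above: the count is either absent or trusted to be correct.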
This pull request has been merged in 8542bf8.
Conbench analyzed the 1 benchmark run on this commit. There were no benchmark performance regressions. 🎉