[REVIEW] Adds fill, validityAsBooleanVector, replaceNulls, and isNull/isNotNull [skip ci] #2129
Out of curiosity, how is
@jrhemstad we filled a boolean vector with
@abellina when the PR is ready for review, please tag it, and also put [REVIEW] in the title. Java is also not tied into the CI system yet, so until that happens you can put [skip ci] at the end of the title to avoid running the CI tests.
Ah, okay, that makes sense. It'd certainly be more efficient to implement this in a custom kernel. I think your solution makes sense in the medium term. Would you mind creating an issue to add this as a native libcudf feature?
@jrhemstad will do. It would be best if isNull and isNotNull also moved to libcudf as unary ops. I'll create issues for these as well.
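To make the unary-op idea concrete, here is a minimal host-side sketch of what isNull/isNotNull compute from a validity bitmask. It assumes an Arrow-style layout (one bit per row, packed least-significant-bit first, 1 = valid) and treats a missing mask as "no nulls". ValiditySketch and its methods are hypothetical names, not the cudf Java API; a native libcudf version would perform the same per-row bit test inside a GPU kernel.

```java
/**
 * Hypothetical host-side model of isNull / isNotNull; not the cudf Java API.
 * Assumed layout: bit i describes row i, bits are packed least-significant-bit
 * first into bytes, a set bit means "valid", and a null mask means "no nulls".
 */
final class ValiditySketch {
    private ValiditySketch() {}

    /** True if row i is valid according to the mask. */
    static boolean isValid(byte[] mask, int i) {
        if (mask == null) {
            return true; // assumed: no mask allocated means every row is valid
        }
        // Select byte i / 8, shift bit i % 8 down to position 0, and test it.
        return ((mask[i >>> 3] >>> (i & 7)) & 1) == 1;
    }

    /** Per-row validity as booleans (true = valid), i.e. the isNotNull result. */
    static boolean[] isNotNull(byte[] mask, int rows) {
        boolean[] out = new boolean[rows];
        for (int i = 0; i < rows; i++) {
            out[i] = isValid(mask, i);
        }
        return out;
    }

    /** Element-wise negation of isNotNull. */
    static boolean[] isNull(byte[] mask, int rows) {
        boolean[] out = isNotNull(mask, rows);
        for (int i = 0; i < rows; i++) {
            out[i] = !out[i];
        }
        return out;
    }
}
```

For example, a one-byte mask 0b00000101 over three rows marks rows 0 and 2 as valid, so isNull returns {false, true, false}.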
I am a little nervous about fill operating in place, simply because none of our other code expects the contents of a vector to change underneath it, and we have to do things like update the cached Java-side state. The way we are using it now is fine, but if we do expose it publicly, we should discuss a bit more how to handle this so we don't have issues later on. If we don't expose it, we should fold fill and fromScalar into a single function and remove much of the overlap between them.
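One possible way to fold the two, as suggested above: keep the in-place fill private and expose only a factory that allocates fresh storage and fills it, so no caller ever sees an existing vector change underneath it. This is a hypothetical sketch over plain Java arrays; FillSketch, fillInPlace, and this fromScalar signature are illustrative names, not the cudf Java API.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: the mutating fill is private and only ever touches storage
 * that fromScalar has just allocated, so no shared vector is modified in place.
 * Columns are modeled as double arrays; these names are not the cudf Java API.
 */
final class FillSketch {
    private FillSketch() {}

    /** In-place fill, kept private so callers cannot mutate an existing column. */
    private static void fillInPlace(double[] data, double value) {
        Arrays.fill(data, value);
    }

    /** Public factory: allocate rows elements, then fill them with the scalar's value. */
    static double[] fromScalar(double value, int rows) {
        if (rows < 0) {
            throw new IllegalArgumentException("rows must be non-negative: " + rows);
        }
        double[] data = new double[rows];
        fillInPlace(data, value);
        return data;
    }
}
```

With this shape, any cached metadata only has to be computed once, when the new column is created, rather than kept in sync with later in-place writes.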
I had asked @abellina to file an issue first so we could discuss the best approach for this in libcudf before implementing it...
All of these things seem useful (except
@harrism, for
This is conditional on approval from @harrism too.
I really would like to see isNull and isNotNull go in, because their absence is blocking us from doing other things with Spark. At the same time, I want to be sure that we eventually do it right, so waiting until we know how to do it right is fine.
I'm fine with you guys proceeding as long as you replace your code with calls to the libcudf version once it exists.
I added a commit to remove the overwrite I did for null_count (after fill), as we now have #2142 merged. @tgravescs could you give it a look? A follow-up PR should add the Java
Actually, please hold off on merging this. I think I found something else.
@tgravescs OK, it's good now. I had to set the bitmask to all 1's before sending it to
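For illustration only, here is what setting a bitmask to all 1's amounts to for n rows, assuming an Arrow-style layout (one bit per row, least-significant-bit first, 1 = valid) with the allocation rounded up to whole bytes. AllValidMask and its methods are hypothetical; real cudf allocations may be padded further, and this sketch also sets the padding bits past the last row, as a memset to 0xFF would.

```java
/**
 * Hypothetical helper: builds a validity bitmask with every row marked valid and
 * counts nulls in such a mask. Assumes one bit per row, packed least-significant-bit
 * first, 1 = valid, allocation rounded up to whole bytes. Not the cudf Java API.
 */
final class AllValidMask {
    private AllValidMask() {}

    /** Number of bytes needed to hold one bit per row. */
    static int bytesFor(int rows) {
        if (rows < 0) {
            throw new IllegalArgumentException("rows must be non-negative: " + rows);
        }
        // Unsigned shift keeps the result correct even if rows + 7 overflows int.
        return (rows + 7) >>> 3;
    }

    /** Mask with every bit set, including padding bits beyond the last row. */
    static byte[] allValid(int rows) {
        byte[] mask = new byte[bytesFor(rows)];
        java.util.Arrays.fill(mask, (byte) 0xFF);
        return mask;
    }

    /** Counts rows whose validity bit is 0, i.e. the null count. */
    static int countNulls(byte[] mask, int rows) {
        int nulls = 0;
        for (int i = 0; i < rows; i++) {
            if (((mask[i >>> 3] >>> (i & 7)) & 1) == 0) {
                nulls++;
            }
        }
        return nulls;
    }
}
```

A mask built this way yields a null count of 0 for any row count, which is what a fully filled vector should report.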
@tgravescs comments addressed.
@kkraus14 I think we are good to go. Also, should we just stop using [skip ci] so that pull requests don't get stuck here?
This PR adds support in the Java API for:
- cudf::fill, for the full-vector case
- ColumnVector.replaceNulls: to mirror cudf::replace_nulls
- ColumnVector.validityAsBooleanVector: obtains the validity bitmask as a boolean ColumnVector. Note: this function will go away if libcudf adds support for isNull as a column operation.
- ColumnVector.isNull/isNotNull: required when filtering nullable columns
- ColumnVector.fromScalar: given a Scalar and a size, returns a ColumnVector of that size filled with the scalar's value