Use rmm::device_uvector in place of rmm::device_vector in cuIO #8151
Codecov Report
@@ Coverage Diff @@
## branch-0.20 #8151 +/- ##
===============================================
- Coverage 82.88% 82.88% -0.01%
===============================================
Files 103 104 +1
Lines 17668 17899 +231
===============================================
+ Hits 14645 14836 +191
- Misses 3023 3063 +40
rerun tests
Looks good. Biggest change is to reverse the order of all MR, stream parameters so stream comes first, as in the rest of libcudf.
 *
 * @return Boolean value; true if string is found, false otherwise
 */
__host__ __device__ inline bool serialized_trie_contains(device_span<serial_trie_node const> trie,
I don't think a device_span can be passed to __device__ code. Does this function really need to be __host__ __device__?
It can be passed to kernels; we use this a lot in cuIO.
This function is used in CSV and JSON kernels to filter special values, so it needs to be __device__.
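To illustrate the point above, here is a minimal sketch of a span-like view passed by value to a function that can be compiled for both host and device. It is not the cuIO implementation: `span_view`, `trie_node`, `trie_contains`, `contains_na_literal`, and the node layout are all hypothetical names invented for this example. The view only holds a pointer and a size, so copying it into a kernel is cheap; the pointer must of course refer to memory the device can access. Under nvcc the `HOST_DEVICE` macro expands to `__host__ __device__`; with a plain C++ compiler it expands to nothing, so the same code runs on the host.

```cpp
#include <cstddef>
#include <cstdint>

// With nvcc the functions below are compiled for host and device;
// with a plain C++ compiler the macro is empty.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical non-owning view: just a pointer and a length, passed by value.
template <typename T>
struct span_view {
  T* ptr;
  std::size_t count;
  HOST_DEVICE T& operator[](std::size_t i) const { return ptr[i]; }
  HOST_DEVICE std::size_t size() const { return count; }
};

// Hypothetical flattened trie node (an assumption, not cuIO's layout).
// The children of a node are stored contiguously in the node array.
struct trie_node {
  char character;             // character on the edge leading to this node
  bool is_end;                // true if a stored key ends at this node
  std::int16_t first_child;   // index of the first child in the node array
  std::int16_t num_children;  // number of contiguous children
};

// Returns true only if the whole key matches a stored string.
HOST_DEVICE inline bool trie_contains(span_view<trie_node const> trie,
                                      char const* key,
                                      std::size_t key_len)
{
  if (trie.size() == 0) { return false; }
  std::size_t cur = 0;  // node 0 is the root
  for (std::size_t k = 0; k < key_len; ++k) {
    trie_node const& node = trie[cur];
    bool found = false;
    for (std::int16_t c = 0; c < node.num_children; ++c) {
      auto const child = static_cast<std::size_t>(node.first_child + c);
      if (trie[child].character == key[k]) {
        cur   = child;
        found = true;
        break;
      }
    }
    if (!found) { return false; }
  }
  return trie[cur].is_end;
}

// Host-side usage example: a trie holding the keys "NA" and "NaN".
inline bool contains_na_literal(char const* key, std::size_t len)
{
  static trie_node const nodes[] = {
    {'\0', false, 1, 1},  // 0: root, child 'N'
    {'N', false, 2, 2},   // 1: children 'A' and 'a'
    {'A', true, 0, 0},    // 2: end of "NA"
    {'a', false, 4, 1},   // 3: child 'N'
    {'N', true, 0, 0},    // 4: end of "NaN"
  };
  return trie_contains(span_view<trie_node const>{nodes, 5}, key, len);
}
```

In a kernel, the same `trie_contains` call would take a view whose pointer refers to device memory; nothing else about the function changes.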
OK. I think device_span works in device code because of relaxed constexpr. So we should watch out for the problems discussed in #7795.
Couple small things.
cpp/src/io/json/json_gpu.cu
Outdated
auto d_column_infos =
  cudf::detail::make_zeroed_device_uvector_async<cudf::io::column_type_histogram>(num_columns,
                                                                                  stream);
This would be a very mild optimization, but you could leave this uninitialized and then set everything to 0 in an else branch of the if (do_set_null_count) { block.
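A host-only sketch of the suggested pattern, under stated assumptions: `type_histogram` and `make_column_infos` are hypothetical stand-ins, not the cuIO types, and a plain heap array stands in for the device buffer. The idea is that the buffer is allocated without a zero fill, and each branch then writes every element exactly once, so the zeroing cost is paid only on the path that needs it.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical stand-in for a per-column histogram of detected types.
struct type_histogram {
  int null_count;
  int float_count;
  int string_count;
};

// Allocates one histogram per column. The allocation itself does not zero
// the memory; exactly one of the two branches initializes every element.
inline std::unique_ptr<type_histogram[]> make_column_infos(std::size_t num_columns,
                                                           bool do_set_null_count,
                                                           int num_rows)
{
  // new T[n] default-initializes a trivial T, i.e. no zero fill happens here.
  std::unique_ptr<type_histogram[]> infos(new type_histogram[num_columns]);
  if (do_set_null_count) {
    // Every element is fully overwritten, so zeroing first would be wasted work.
    for (std::size_t i = 0; i < num_columns; ++i) {
      infos[i] = type_histogram{num_rows, 0, 0};
    }
  } else {
    // Only this path needs the buffer set to zero.
    std::memset(infos.get(), 0, num_columns * sizeof(type_histogram));
  }
  return infos;
}
```

On the device, the equivalent would be an uninitialized buffer plus an asynchronous memset on the else path; the saving is one redundant fill, which is why the reviewer calls it mild.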
done
Sorry for the delay. I'm still wrapping my head around this part of the code-base, so I only have nitpicks at the moment.
@gpucibot merge
Issue #7287
Replaces device_vector with device_uvector. Additional changes were needed to provide the stream parameter at construction time. Reduced the mutable state of the JSON reader.
Other changes: moved the trie implementation to the correct location and fixed its naming and namespace.
Because of changes to the trie, CSV and JSON are potentially impacted.
Measured impact: