Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use rmm::device_uvector in place of rmm::device_vector in cuIO #8151

Merged
merged 18 commits into from
May 7, 2021

Conversation

vuule
Copy link
Contributor

@vuule vuule commented May 4, 2021

Issue #7287

Replaces device_vector with device_uvector. Additional changes were needed to provide the stream parameter at construction time. Reduced the mutable state of the JSON reader.

Other changes: move trie implementation to correct location and fixed naming and namespace.

Because of changes to the trie, CSV and JSON are potentially impacted.
Measured impact:

  • JSON - no impact
  • CSV - 5-10% faster
  • Parquet - no impact

@vuule vuule added cuIO cuIO issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels May 4, 2021
@vuule vuule self-assigned this May 4, 2021
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label May 4, 2021
@codecov
Copy link

codecov bot commented May 4, 2021

Codecov Report

Merging #8151 (acf283b) into branch-0.20 (51336df) will decrease coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.20    #8151      +/-   ##
===============================================
- Coverage        82.88%   82.88%   -0.01%     
===============================================
  Files              103      104       +1     
  Lines            17668    17899     +231     
===============================================
+ Hits             14645    14836     +191     
- Misses            3023     3063      +40     
Impacted Files Coverage Δ
python/cudf/cudf/core/tools/datetimes.py 80.42% <0.00%> (-4.11%) ⬇️
python/cudf/cudf/core/column/decimal.py 90.64% <0.00%> (-2.29%) ⬇️
python/cudf/cudf/core/column/datetime.py 88.03% <0.00%> (-1.88%) ⬇️
python/cudf/cudf/core/column/struct.py 94.73% <0.00%> (-1.56%) ⬇️
python/cudf/cudf/utils/dtypes.py 82.20% <0.00%> (-1.24%) ⬇️
python/dask_cudf/dask_cudf/groupby.py 91.28% <0.00%> (-0.88%) ⬇️
python/cudf/cudf/core/series.py 91.17% <0.00%> (-0.56%) ⬇️
python/cudf/cudf/core/index.py 92.52% <0.00%> (-0.55%) ⬇️
python/cudf/cudf/core/column/column.py 88.20% <0.00%> (-0.44%) ⬇️
python/cudf/cudf/core/column/lists.py 86.98% <0.00%> (-0.43%) ⬇️
... and 28 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2207577...acf283b. Read the comment docs.

@github-actions github-actions bot added the CMake CMake build issue label May 4, 2021
@vuule vuule marked this pull request as ready for review May 4, 2021 21:16
@vuule vuule requested review from a team as code owners May 4, 2021 21:16
@vuule vuule requested review from mythrocks and nvdbaranec May 4, 2021 21:16
@vuule
Copy link
Contributor Author

vuule commented May 4, 2021

rerun tests

Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Biggest change is to reverse the order of all MR, stream parameters so stream comes first, as in the rest of libcudf.

cpp/include/cudf/io/detail/avro.hpp Outdated Show resolved Hide resolved
cpp/include/cudf/io/detail/avro.hpp Outdated Show resolved Hide resolved
cpp/src/io/json/reader_impl.cu Show resolved Hide resolved
*
* @return Boolean value; true if string is found, false otherwise
*/
__host__ __device__ inline bool serialized_trie_contains(device_span<serial_trie_node const> trie,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a device_span can be passed to __device__ code. Does this function really need to be __host__ __device__?

Copy link
Contributor Author

@vuule vuule May 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be passed to kernels, we use this a lot in cuIO.
This function is used in CSV and JSON kernels to filter special values, so it needs to be __device__.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I think device_span works in device code because or relaxed constexpr. So we should watch out for the problems discussed in #7795

Copy link
Contributor

@nvdbaranec nvdbaranec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple small things.

Comment on lines 795 to 797
auto d_column_infos =
cudf::detail::make_zeroed_device_uvector_async<cudf::io::column_type_histogram>(num_columns,
stream);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a very very mild optimization, but you could leave this uninitialized and then set everything to 0 in an else block to the if (do_set_null_count) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

cpp/src/io/utilities/trie.cuh Show resolved Hide resolved
@vuule vuule requested review from harrism and nvdbaranec May 5, 2021 19:49
@vuule vuule requested a review from mythrocks May 6, 2021 20:46
Copy link
Contributor

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. I'm still wrapping my head around this part of the code-base, so I only have nitpicks at the moment.

@vuule
Copy link
Contributor Author

vuule commented May 7, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit b46913b into rapidsai:branch-0.20 May 7, 2021
@vuule vuule deleted the avoid-device-vector-cuio branch May 7, 2021 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue cuIO cuIO issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants