Normalizing offsets iterator #14234

davidwendt · 2023-09-28T21:08:11Z

Description

Creates a normalizing offsets iterator that returns an int64 value given either int32 or int64 column data.
Depends on #14206
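
As a rough illustration of what "normalizing" means here, the host-only sketch below widens offsets stored as either int32 or int64 into int64 values. The function name `normalize_offsets` and its `is_int64` flag are hypothetical and not part of libcudf; the iterator in this PR yields values lazily on the device, whereas this sketch copies eagerly just to show the value conversion.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a libcudf API): reads `count` offsets that are
// stored as int32 or int64 and returns them widened to int64.
inline std::vector<std::int64_t> normalize_offsets(void const* data,
                                                   bool is_int64,
                                                   std::size_t count)
{
  std::vector<std::int64_t> out(count);
  for (std::size_t i = 0; i < count; ++i) {
    // The storage type is chosen at runtime; the result type is always int64.
    out[i] = is_int64
               ? static_cast<std::int64_t const*>(data)[i]
               : static_cast<std::int64_t>(static_cast<std::int32_t const*>(data)[i]);
  }
  return out;
}
```

Code that consumes the result only ever sees int64, so it does not need to be templated on the storage type of the offsets.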

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

DaceT · 2023-10-11T16:35:01Z

benchmark bot, please test this PR

Enables indexalator to be instantiated from device code. Also adds gtests for the output indexalator. This change helps enable the offset-normalizing iterator in #14234.
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Bradley Dice (https://github.com/bdice)
- Yunsong Wang (https://github.com/PointKernel)
URL: #14206

davidwendt · 2023-11-02T17:26:37Z

This definitely increases compile time but I was able to minimize the impact somewhat. Here are the top 10 offenders:

| File | Compile time | Difference |
|---|---|---|
| src/reductions/scan/scan_inclusive.cu.o | 30:29 min | 6:14 min |
| src/join/mixed_join_size_kernel_nulls.cu.o | 28:15 min | 15.970 s |
| src/join/mixed_join_kernel_nulls.cu.o | 25:10 min | 9.061 s |
| src/groupby/sort/group_argmin.cu.o | 21:05 min | 112.269 s |
| src/groupby/sort/group_argmax.cu.o | 21:00 min | 106.078 s |
| tests/table/row_operator_tests_utilities.cu.o | 20:53 min | 91.883 s |
| src/join/mixed_join_size_kernel.cu.o | 20:31 min | 40.165 s |
| src/sort/sort_column.cu.o | 20:25 min | -10.141 s |
| src/sort/stable_sort_column.cu.o | 20:19 min | -17.539 s |
| src/groupby/hash/groupby.cu.o | 20:13 min | 65.311 s |

A negative difference means the file compiled faster with this change, which is the case for the 2 sort files. scan_inclusive jumped from 3rd place to 1st place.
I will look at improving some of these (by splitting out specializations) in a separate PR.

divyegala · 2023-11-06T17:16:18Z

cpp/include/cudf/detail/indexalator.cuh

+   */
+  struct normalize_type {
+    template <typename T, std::enable_if_t<cudf::is_index_type<T>()>* = nullptr>
+    __device__ cudf::size_type operator()(void const* tp)


Why is this not T const* tp?

Because that is not the type that is being passed to the function.
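
A minimal host-only sketch of the pattern under discussion, assuming the caller holds only a type-erased pointer plus a runtime tag: the functor takes `void const*` and uses the template parameter `T` solely to decide how to reinterpret it. The names `normalize_type_sketch`, `index_kind`, `read_index`, and `size_type_sketch` are hypothetical stand-ins, and `std::is_integral` is used in place of `cudf::is_index_type`.

```cpp
#include <cstdint>
#include <type_traits>

// Stand-in for cudf::size_type (a 32-bit signed integer).
using size_type_sketch = std::int32_t;

// Hypothetical stand-in for the functor in the diff. The argument is untyped
// because the caller never has a T const*; T only selects the cast.
struct normalize_type_sketch {
  template <typename T, std::enable_if_t<std::is_integral<T>::value>* = nullptr>
  size_type_sketch operator()(void const* tp) const
  {
    return static_cast<size_type_sketch>(*static_cast<T const*>(tp));
  }
};

// Hypothetical runtime tag describing how the pointed-to value is stored.
enum class index_kind { INT8, INT32, INT64 };

// The caller picks T from the runtime tag and passes the same untyped pointer.
inline size_type_sketch read_index(void const* p, index_kind kind)
{
  normalize_type_sketch fn{};
  switch (kind) {
    case index_kind::INT8: return fn.operator()<std::int8_t>(p);
    case index_kind::INT32: return fn.operator()<std::int32_t>(p);
    default: return fn.operator()<std::int64_t>(p);
  }
}
```

With a `T const*` parameter, every call site would need a typed pointer before it could invoke the functor, which is exactly what the type-erased storage does not provide.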

To understand the impact of type_dispatcher on the reworked design: it seems to me that we are still using it, but there are no cascading calls to type_dispatcher and it is called exactly once. Is that correct?

Yes. We only call the type-dispatcher in the factory now.

Yes, and once when setting up the non-templated class input_indexalator::normalize_input. If you use a normal if-else dispatch there instead of type_dispatcher, are you able to see any benefits? Especially in src/reductions/scan/scan_inclusive.cu.o, where there's a 6-minute compile-time increase.

Thanks! Looks good.

But you are correct, the base class's type-dispatcher is still called inside every element() call.
I think that is worth considering here.

We'll have to study the assembly here. Is the type_dispatcher expanded only once when the class is compiled (so when the header is included) or is it expanded every time element() is called?

I created a separate ctor that just passes the width instead of type-dispatching to resolve it.
This did improve the compile time: https://downloads.rapids.ai/ci/cudf/pull-request/14234/ab0edb7/cuda12_x86_64.ninja_log.html

So we halved the compile-time increase in scan_inclusive? That is good!
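
The idea in this thread can be sketched as follows, under the assumption that the element width is all the reader needs to know about the stored type. The runtime type is inspected once in a factory with a plain conditional, and the non-templated reader is constructed from the width alone. `type_tag`, `width_reader`, and `make_reader` are hypothetical names; whether this mirrors the actual constructor added in the PR is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a runtime type id.
enum class type_tag { INT32, INT64 };

// Non-templated reader: its constructor takes the element width directly,
// so constructing it does not go through a type-dispatch template.
class width_reader {
 public:
  width_reader(void const* data, std::size_t width) : data_{data}, width_{width} {}

  // A simple width check selects the load; the result is always int64.
  std::int64_t element(std::size_t idx) const
  {
    if (width_ == sizeof(std::int32_t)) {
      return static_cast<std::int32_t const*>(data_)[idx];
    }
    return static_cast<std::int64_t const*>(data_)[idx];
  }

 private:
  void const* data_;
  std::size_t width_;
};

// The only place the runtime type is examined: a plain conditional
// instead of a templated dispatcher.
inline width_reader make_reader(void const* data, type_tag tag)
{
  std::size_t const width =
    (tag == type_tag::INT32) ? sizeof(std::int32_t) : sizeof(std::int64_t);
  return width_reader{data, width};
}
```

Keeping the constructor free of templated dispatch means each translation unit that builds a reader instantiates less code, which is the kind of change the compile-time numbers above are measuring.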

cpp/include/cudf/detail/indexalator.cuh

cpp/include/cudf/detail/normalizing_iterator.cuh

cpp/include/cudf/detail/offsets_iterator.cuh

cpp/include/cudf/detail/offsets_iterator_factory.cuh

PointKernel

Some doc nits. LGTM

cpp/include/cudf/detail/indexalator.cuh

Splits out the `strings` and `struct` specializations in `scan_inclusive.cu` into separate source files to improve compile time. Each specialization is unique code with limited aggregation types. No functional changes. Just code moved around. Found while working on #14234
Authors:
- David Wendt (https://github.com/davidwendt)
Approvers:
- Bradley Dice (https://github.com/bdice)
- Nghia Truong (https://github.com/ttnghia)
URL: #14358

davidwendt · 2023-11-13T14:05:14Z

/merge

davidwendt added 5 commits September 26, 2023 23:27

Enable indexalator for device code

32e1029

Merge branch 'branch-23.12' into indexalator-device-enable

1548588

return ref experiment

f6419b4

Merge branch 'branch-23.12' into indexalator-device-enable

b5b4449

Normalizing offsets iterator

0e369dd

davidwendt added the labels 2 - In Progress, libcudf, improvement, non-breaking Sep 28, 2023

davidwendt self-assigned this Sep 28, 2023

davidwendt added 8 commits October 4, 2023 20:18

Merge branch 'branch-23.12' into indexalator-device-enable

e451116

23.12 baseline compile-time commit

7dcb134

Merge branch 'branch-23.12' into indexalator-device-enable

a248d75

undo temp change

88f6dff

use cudf::is_index_type

081cb84

use cudf::is_index_type part 2

a28a9ff

Merge branch 'branch-23.12' into indexalator-device-enable

df063b6

Merge branch 'indexalator-device-enable' into offsets-iterator

f5c898c

davidwendt mentioned this pull request Oct 6, 2023

Enable indexalator for device code #14206

Merged

3 tasks

davidwendt added 4 commits October 6, 2023 14:12

add offsetalator factory

ccc5bf5

Merge branch 'branch-23.12' into indexalator-device-enable

73c04d8

use size_t for index_sizeof_fn

eb586f4

Merge branch 'indexalator-device-enable' into offsets-iterator

4ba5a70

davidwendt added 3 commits October 12, 2023 09:15

Merge branch 'branch-23.12' into indexalator-device-enable

e76ec97

Merge branch 'branch-23.12' into indexalator-device-enable

1ae36f3

Merge branch 'indexalator-device-enable' into offsets-iterator

6fdeadf

davidwendt added 2 commits October 17, 2023 11:33

Merge branch 'branch-23.12' into offsets-iterator

84e7510

Merge branch 'branch-23.12' into offsets-iterator

2bfb3e1

davidwendt added 6 commits November 1, 2023 16:26

Merge branch 'offsets-iterator' of github.com:davidwendt/cudf into of…

a132e33

…fsets-iterator

add alignas

8ee15a3

Merge branch 'offsets-iterator' of github.com:davidwendt/cudf into of…

43c4984

…fsets-iterator

add more alignases

a97d061

re-enable device test

1ed5e29

remove unneeded test

5839fcd

change std::enable_if_t to CUDF_ENABLE_IF

e0d4e5f

davidwendt marked this pull request as ready for review November 2, 2023 23:35

add anonymous namespace around internal functor

6ec6258

davidwendt mentioned this pull request Nov 3, 2023

Split up scan_inclusive.cu to improve its compile time #14358

Merged

3 tasks

davidwendt requested a review from divyegala November 6, 2023 13:35

Merge branch 'branch-23.12' into offsets-iterator

8514a71

divyegala reviewed Nov 6, 2023

davidwendt added 3 commits November 6, 2023 14:56

remove type-dispatcher call from ctor

3a21fe1

Merge branch 'branch-23.12' into offsets-iterator

ab0edb7

Merge branch 'branch-23.12' into offsets-iterator

7b10cb7

divyegala reviewed Nov 7, 2023

cpp/include/cudf/detail/offsets_iterator_factory.cuh (outdated, resolved)

cpp/include/cudf/detail/offsets_iterator_factory.cuh (outdated, resolved)

PointKernel reviewed Nov 7, 2023

davidwendt added 2 commits November 7, 2023 19:13

removed unneeded dispatch in factory

f4634f9

Merge branch 'branch-23.12' into offsets-iterator

713d601

divyegala approved these changes Nov 8, 2023

Merge branch 'branch-23.12' into offsets-iterator

0d8caf1

PointKernel approved these changes Nov 8, 2023

cpp/include/cudf/detail/indexalator.cuh (outdated, resolved)

cpp/include/cudf/detail/indexalator.cuh (outdated, resolved)

davidwendt added 2 commits November 8, 2023 15:45

fix doxygen comments

56a73c5

Merge branch 'branch-23.12' into offsets-iterator

23a8bf6

rapids-bot merged commit 04d13d8 into rapidsai:branch-23.12 Nov 13, 2023
61 checks passed

davidwendt deleted the offsets-iterator branch November 13, 2023 14:05

This pull request was closed.
