Add keep option to distinct nvbench #16497

bdice · 2024-08-05T22:40:01Z

Description

This PR adopts some work from @srinivasyadav18 with additional modifications. This is meant to complement #16484.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

srinivasyadav18 · 2024-08-05T23:10:26Z

cpp/benchmarks/stream_compaction/distinct.cpp

-  .add_int64_axis("NumRows", {10'000, 100'000, 1'000'000, 10'000'000});
+  .add_string_axis("keep", {"any", "first", "last", "none"})
+  .add_int64_axis("cardinality",
+                  {100, 1'000, 10'000, 100'000, 1'000'000, 10'000'000, 100'000'000, 1'000'000'000})


We can just decrease the default values in the axis.
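For the `keep` string axis in the diff above, the benchmark body has to translate each axis string into a keep option before calling the algorithm. The following is a minimal sketch of such a helper, not the code from this PR: the name `get_keep` is borrowed from the later commit "Move get_keep to common file", but its signature here is an assumption, and `keep_option` is a hypothetical local stand-in for `cudf::duplicate_keep_option` so that the sketch compiles without libcudf headers.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for cudf::duplicate_keep_option (assumption: the real
// benchmark would use the libcudf enum directly).
enum class keep_option { KEEP_ANY, KEEP_FIRST, KEEP_LAST, KEEP_NONE };

// Maps a "keep" axis string to the enum. Unknown strings throw, so a typo in
// the axis definition fails loudly instead of silently using a default.
inline keep_option get_keep(std::string const& keep_str)
{
  if (keep_str == "any") { return keep_option::KEEP_ANY; }
  if (keep_str == "first") { return keep_option::KEEP_FIRST; }
  if (keep_str == "last") { return keep_option::KEEP_LAST; }
  if (keep_str == "none") { return keep_option::KEEP_NONE; }
  throw std::invalid_argument("unknown keep option: " + keep_str);
}
```

Throwing on unknown input is a design choice of this sketch; the actual helper may handle it differently.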

PointKernel

One non-blocking suggestion

LGTM.

cpp/benchmarks/stream_compaction/distinct.cpp

GregoryKimball · 2024-08-07T16:21:56Z

cpp/benchmarks/stream_compaction/distinct.cpp

+  if (cardinality > num_rows) {
+    state.skip("cardinality > num_rows");
+    return;
+  }


I would prefer to omit this skipping condition. I recognize that we can't have 1M distinct elements in 1K rows, but this condition adds a lot of friction when sweeping NumRows for the high cardinality case. It forces me to run a full factorial of matching NumRows and Cardinality values and filter the outputs for the highest Cardinality unskipped for each NumRows.
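The post-filtering step described above can be sketched as follows. This is an illustrative helper, not part of the PR: the function name and the idea of passing both axes in as vectors are assumptions. It applies the same `cardinality > num_rows` condition as the skip in the diff and reports, for each NumRows value, the highest cardinality that would still run.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// For each num_rows value, find the largest cardinality on the axis that the
// skip condition (cardinality > num_rows) would not reject. A num_rows value
// with no valid cardinality is left out of the result.
inline std::map<std::int64_t, std::int64_t> highest_unskipped_cardinality(
  std::vector<std::int64_t> const& num_rows_axis,
  std::vector<std::int64_t> const& cardinality_axis)
{
  std::map<std::int64_t, std::int64_t> best;
  for (auto const num_rows : num_rows_axis) {
    for (auto const cardinality : cardinality_axis) {
      if (cardinality > num_rows) { continue; }  // same condition as the skip
      auto const it = best.find(num_rows);
      if (it == best.end() || cardinality > it->second) { best[num_rows] = cardinality; }
    }
  }
  return best;
}
```

With a NumRows axis of {1'000, 1'000'000} and a cardinality axis of {100, 10'000, 1'000'000, 100'000'000}, only three of the eight combinations pass the condition, and the helper keeps one per NumRows value.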

I'll rewrite this logic. Thanks for the feedback!

Hmm. @GregoryKimball I reviewed the NVBench docs and I don't see a way to filter out certain jobs except by skipping them. https://github.com/NVIDIA/nvbench/blob/main/docs/benchmarks.md#beware-combinatorial-explosion-is-lurking

We might be able to use a string axis like {"100,100", "100,1000", ..., "1000000000,1000000000"} and parse it, but that's hard to maintain.
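A rough sketch of the parsing that the string-axis idea would need is shown below. It is not code from this PR. The function name is hypothetical, and the field order is an assumption: the comment does not say which number comes first, so this sketch reads each entry as "cardinality,num_rows".

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Parses one axis entry such as "100,1000" into (cardinality, num_rows).
// Assumption: the first field is the cardinality and the second is num_rows.
inline std::pair<std::int64_t, std::int64_t> parse_cardinality_rows(std::string const& entry)
{
  auto const comma = entry.find(',');
  if (comma == std::string::npos) {
    throw std::invalid_argument("expected \"cardinality,num_rows\", got: " + entry);
  }
  // std::stoll throws std::invalid_argument or std::out_of_range on bad input.
  std::int64_t const cardinality = std::stoll(entry.substr(0, comma));
  std::int64_t const num_rows    = std::stoll(entry.substr(comma + 1));
  return {cardinality, num_rows};
}
```

The maintenance cost mentioned above is visible here: every valid pair has to be spelled out by hand in the axis, and the two values show up in the results as a single string column rather than two numeric ones.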

> I don't see a way to filter out certain jobs except by skipping them.

NVIDIA/nvbench#80 can solve this issue but the PR has been stalled for a while.

Co-authored-by: Yunsong Wang <yunsongw@nvidia.com>

GregoryKimball · 2024-08-07T20:34:23Z

Thanks guys, here is a performance snapshot of these benchmarks on A100.

I would like to merge this update ASAP and then have @srinivasyadav18 pull the changes into #16484

I'm noticing that these throughput numbers on A100 are about 10x lower than what @srinivasyadav18 posted for H100. This is another reason I'm interested in running wider benchmarks on #16484

cpp/benchmarks/stream_compaction/stable_distinct.cpp

bdice

@srinivasyadav18 Can you approve?

srinivasyadav18

CI is still failing due to style issues.
Otherwise LGTM! Thanks!

bdice · 2024-08-07T21:57:01Z

Just fixed the CI style check. I'll merge this when CI passes.

bdice · 2024-08-07T21:57:10Z

/merge

add keep option to distinct nvbench

40e88c7

github-actions bot added the libcudf label Aug 5, 2024

bdice added 2 commits August 5, 2024 15:53

Update benchmarks, apply clang-format.

8d0c8eb

Update stable_distinct.

2885000

bdice added the improvement and non-breaking labels Aug 5, 2024

bdice self-assigned this Aug 5, 2024

bdice marked this pull request as ready for review August 5, 2024 23:08

bdice requested a review from a team as a code owner August 5, 2024 23:08

bdice requested review from mythrocks and vuule August 5, 2024 23:08

srinivasyadav18 reviewed Aug 5, 2024

PointKernel approved these changes Aug 5, 2024

cpp/benchmarks/stream_compaction/distinct.cpp (outdated, resolved)

Move get_keep to common file.

4f80d1d

github-actions bot added the CMake label Aug 6, 2024

GregoryKimball reviewed Aug 7, 2024

Shrink benchmark axes.

e43cd10

Co-authored-by: Yunsong Wang <yunsongw@nvidia.com>

bdice commented Aug 7, 2024

cpp/benchmarks/stream_compaction/stable_distinct.cpp (outdated, resolved)

Shrink benchmark axes for stable_distinct.

017f925

bdice commented Aug 7, 2024

bdice added 2 commits August 7, 2024 14:55

clang-format

b998449

Merge remote-tracking branch 'upstream/branch-24.10' into update-distinct-benchmark-keep

bfb924b

srinivasyadav18 approved these changes Aug 7, 2024

rapids-bot bot merged commit 1bbe440 into rapidsai:branch-24.10 Aug 8, 2024
80 checks passed
