
Write variable bit-width keys for Parquet dictionary encoded pages#22279

Open
mhaseeb123 wants to merge 27 commits into rapidsai:main from mhaseeb123:fea/pq-dict-encode-optimize

Conversation


@mhaseeb123 mhaseeb123 commented Apr 23, 2026

Description

Contributes to #13995

This PR enables the Parquet writer to assign lower dictionary indices to elements that appear earlier in each column chunk, so that earlier pages can be written with a reduced (variable) number of dictionary key bits.

New algorithm:

  1. map_insert_fn now inserts the pair {row_idx, frag_idx} into the static map instead of {row_idx, row_idx}. frag_idx is simply blockIdx.x and records which page fragment actually inserted the entry into the static map (CAS-race dependent, but earlier thread blocks generally win this race).
  2. map_insert_fn also writes the PageFragment::num_dict_vals field for each page fragment, recording the number of unique values it inserted.
  3. Then, in collect_map_entries_kernel (where we assign keys to dict values), we optionally assign spatially local keys to dictionary values inserted by the same page fragment. See algorithm details here
  4. Finally, we launch compute_page_dict_bits_kernel to compute the maximum bit width required to encode the dict keys of each parquet page. See algorithm here

See follow-up PR #22323, which deterministically writes the index of the first page fragment that sees each dictionary key.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Repurpose the unused mapped_type slot in populate_chunk_hash_maps_kernel
to carry the fragment index of the block that first inserts each value.
Rewrite collect_map_entries_kernel to bucket dict_ids by that fragment
index so pages that only reference earlier fragments' values see small
max dict_index.

No file-size delta in isolation; prerequisite for the per-page bit-width
change landing next.

Made-with: Cursor
…ai#13995)

Each data page now RLE-encodes dictionary indices using
ceil(log2(page_max_dict_index + 1)) bits instead of the chunk-wide
maximum. Combined with the first-appearance dict_id ordering from the
prior commit, this closes the ~30% file-size gap vs Spark on
moderate-cardinality INT64 and STRING workloads.

Page size estimation continues to use the chunk-wide bits as a
conservative upper bound; the dictionary page itself is unaffected.

Made-with: Cursor
@copy-pr-bot

copy-pr-bot Bot commented Apr 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions Bot added libcudf Affects libcudf (C++/CUDA) code. CMake CMake build issue labels Apr 23, 2026
@mhaseeb123 mhaseeb123 added feature request New feature or request cuIO cuIO issue non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS 3 - Ready for Review Ready for review by team labels Apr 28, 2026
@mhaseeb123 mhaseeb123 changed the title 🚧 Write variable bit-width keys for Parquet dictionary encoded pages Write variable bit-width keys for Parquet dictionary encoded pages Apr 28, 2026
@mhaseeb123 mhaseeb123 marked this pull request as ready for review April 28, 2026 19:24
@mhaseeb123 mhaseeb123 requested review from a team as code owners April 28, 2026 19:24

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR updates the Parquet writer’s dictionary encoding to support smaller (variable) RLE bit-widths on earlier data pages by biasing dictionary id assignment toward earlier page fragments, and adds validation/benchmarking around the new behavior.

Changes:

  • Assign dictionary ids using a fragment-locality hint so earlier fragments tend to get smaller ids.
  • Compute and store per-page dict_rle_bits and use it during page encoding.
  • Add/adjust tests and introduce an nvbench benchmark to measure the effect.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

  • cpp/src/io/parquet/chunk_dict.cu — Implements fragment-hinted dictionary id assignment and adds a per-page dict bit-width computation kernel.
  • cpp/src/io/parquet/page_enc.cu — Switches dict encoding to use the per-page dict_rle_bits for data pages.
  • cpp/src/io/parquet/writer_impl.cu — Wires page-boundary finalization to per-page bit-width computation.
  • cpp/src/io/parquet/parquet_gpu.hpp — Extends GPU structs with a fragment count and per-page dict_rle_bits.
  • cpp/src/io/parquet/parquet_gpu.cuh — Updates APIs for fragment mutation and adds the compute_per_page_dict_bits declaration.
  • cpp/tests/io/parquet_writer_test.cpp — Adds a test asserting variable per-page dictionary bit-width behavior.
  • cpp/tests/io/parquet_misc_test.cpp — Updates a dictionary test to validate the max bit width across pages (not only the first page).
  • cpp/benchmarks/io/parquet/parquet_writer_dict.cpp — Adds a benchmark to measure encoding perf and the resulting per-page dict bit stats.
  • cpp/benchmarks/CMakeLists.txt — Registers the new parquet dictionary writer benchmark.


Comment thread cpp/src/io/parquet/chunk_dict.cu
Comment on lines +139 to +142
// TODO(mh): Here we insert the fragment index of the CAS winner, which may not be the
// smallest one (relies on monotonic block scheduling). Switch to static_map's
// `insert_or_apply` with `cuco::op::min` for deterministic first-fragment semantics
is_unique = map_insert_ref.insert(slot_type{static_cast<key_type>(val_idx), frag_idx});
Member Author


Thanks, captain obvious 😄

Comment thread cpp/benchmarks/io/parquet/parquet_writer_dict.cpp
Comment thread cpp/tests/io/parquet_writer_test.cpp Outdated
auto const col = chunk->col_desc;
column_device_view const& data_col = *col->leaf_column;
__shared__ size_type total_num_dict_entries;
__shared__ size_type num_dict_vals;
Member Author


Here we store the number of unique values inserted into the dictionary by this fragment (thread block).

if (total_num_dict_entries > MAX_DICT_SIZE) { break; }
} // for loop
// Flush the number of unique values inserted by this fragment
if (t == 0) { frag->num_dict_vals = num_dict_vals; };
Member Author


Flush the number of unique values from this fragment to global mem

// TODO(mh): Here we insert the fragment index of the CAS winner, which may not be the
// smallest one (relies on monotonic block scheduling). Switch to static_map's
// `insert_or_apply` with `cuco::op::min` for deterministic first-fragment semantics
is_unique = map_insert_ref.insert(slot_type{static_cast<key_type>(val_idx), frag_idx});
Member Author


Insert the index of the fragment (aka blockIdx.x) that inserted this key into the dictionary, for now hoping that the lower blockIdx.x is scheduled first and wins the insert.

auto const frag = frags[col_idx][block_x];
auto chunk = frag.chunk;
auto col = chunk->col_desc;
auto const col_idx = blockIdx.y;
Member Author


Only cosmetic changes here. Variable names are slightly renamed for consistency.

auto const block_x = blockIdx.x;
auto const frag = frags[col_idx][block_x];
auto chunk = frag.chunk;
auto const col_idx = blockIdx.y;
Member Author


Cosmetic changes only

EncColumnChunk* const chunk;
template <typename T>
__device__ void operator()(size_type const s_start_value_idx,
__device__ void operator()(size_type const start_value_idx,
Member Author


Cosmetic changes only

* @param pages Pages span
*/
CUDF_KERNEL void __launch_bounds__(DEFAULT_BLOCK_SIZE)
compute_page_dict_bits_kernel(device_span<EncPage> pages)
Member Author

@mhaseeb123 mhaseeb123 Apr 28, 2026


Kernel logic:

  1. Launch one warp per parquet page
  2. Cooperatively find the largest dictionary key on the page, i.e. max(dict_index)
  3. One lane writes cuda::std::bit_width(max(dict_index)) to the page.dict_rle_bits field.

device_span<EncColumnChunk> chunks,
cudf::detail::device_2dspan<PageFragment const> frags)
{
auto& chunk = chunks[blockIdx.x];
Member Author

@mhaseeb123 mhaseeb123 Apr 28, 2026


Split this kernel into two parts now.

Part 1: If page fragments per column chunk are < 1024, run two phases:

  • Phase 1: Compute exclusive sum (offset) of number of unique values inserted by each page fragment (in map_insert_fn)
  • Phase 2: For each dictionary slot, assign a monotonically increasing dictionary index starting at the offset (computed in phase 1) of the fragment it was inserted from. After this, keys inserted by each fragment have spatial locality allowing dict bit width compaction if data distribution allows.

Part 2: If page fragments per column chunk are > 1024, run a single phase:

  • Same code as before (see the else branch): assign a dictionary index to each slot from a global atomic counter.

Comment thread cpp/tests/io/parquet_misc_test.cpp Outdated
Comment on lines +177 to +182
page_dict_bits.reserve(oi.page_locations.size());
std::transform(
oi.page_locations.begin(),
oi.page_locations.end(),
std::back_inserter(page_dict_bits),
[&source](auto const& page_location) { return read_dict_bits(source, page_location); });
Member Author


Pages no longer all carry the max bit width. Collect the bit widths of all pages and find the max element instead.

@mhaseeb123 mhaseeb123 marked this pull request as draft April 28, 2026 23:47
@mhaseeb123 mhaseeb123 marked this pull request as ready for review April 29, 2026 00:04
if (is_valid) {
// Insert the keys using a single thread for best performance for now.
is_unique = map_insert_ref.insert(cuco::pair{val_idx, val_idx});
// TODO(mh): Here we insert the fragment index of the CAS winner, which may not be the
Member Author

@mhaseeb123 mhaseeb123 Apr 29, 2026


Follow-up PR #22323 makes this deterministic, but then the number of unique elements inserted by each page fragment can only be computed with a loop and atomics in collect_map_entries_kernel, which adds potential cost.
