GH-32613: [C++] Simplify IPC writer for dense unions (#33822)
JIRA: https://issues.apache.org/jira/browse/ARROW-17339 
Closes: apache#32613

### Rationale for this change
The Arrow spec mandates that dense union offsets are non-decreasing for any given child, but the C++ implementation still assumes that the offsets may be in any order. This can be improved.

### What changes are included in this PR?

A change that eliminates looping over the length of a `DenseUnionArray` twice: the per-child start offsets, shifted offsets, and child lengths are now computed in a single pass.
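The single-pass approach can be sketched as a standalone function (a minimal sketch mirroring the logic in `cpp/src/arrow/ipc/writer.cc`; the name `ShiftDenseUnionOffsets` and the `ShiftResult` struct are made up for illustration). Because offsets for each child are non-decreasing, the first offset seen for a type code is that child's base offset; every later slot for that code is shifted relative to the base, and the running maximum (+1) gives the truncated child length.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical container for the three outputs the writer computes.
struct ShiftResult {
  std::vector<int32_t> shifted_offsets;  // per-slot offsets rebased per child
  std::vector<int32_t> child_offsets;    // base offset per type code (-1 = unseen)
  std::vector<int32_t> child_lengths;    // number of values needed per child
};

ShiftResult ShiftDenseUnionOffsets(const std::vector<uint8_t>& type_codes,
                                   const std::vector<int32_t>& unshifted_offsets,
                                   int num_children) {
  ShiftResult r;
  r.shifted_offsets.resize(type_codes.size());
  r.child_offsets.assign(num_children, -1);
  r.child_lengths.assign(num_children, 0);
  for (size_t i = 0; i < type_codes.size(); ++i) {
    const uint8_t code = type_codes[i];
    if (r.child_offsets[code] == -1) {
      // First slot for this child: its offset becomes the child's base,
      // so the rebased offset is 0.
      r.child_offsets[code] = unshifted_offsets[i];
      r.shifted_offsets[i] = 0;
    } else {
      r.shifted_offsets[i] = unshifted_offsets[i] - r.child_offsets[code];
    }
    // Track how many values of this child are actually referenced.
    r.child_lengths[code] =
        std::max(r.child_lengths[code], r.shifted_offsets[i] + 1);
  }
  return r;
}
```

For instance, with type codes `{0, 1, 0, 1, 0}` and offsets `{2, 5, 3, 6, 4}`, child 0's base is 2 and child 1's base is 5, so the rebased offsets are `{0, 0, 1, 1, 2}` and the child lengths are `{3, 2}`.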

### Are these changes tested?

I am not functionally changing anything. All changes respect the spec, and behavior should be 1:1 with the existing implementation. I believe existing tests should suffice.

### Are there any user-facing changes?

There are no user-facing changes.


Lead-authored-by: Ramasai Tadepalli <ramasai.tadepalli+3108@gmail.com>
Co-authored-by: Ramasai <ramasai.tadepalli+3108@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
2 people authored and Mike Hancock committed Feb 17, 2023
1 parent 5816a53 commit a0450d1
Showing 1 changed file with 10 additions and 14 deletions.
24 changes: 10 additions & 14 deletions cpp/src/arrow/ipc/writer.cc
```diff
@@ -473,23 +473,19 @@ class RecordBatchSerializer {
       int32_t* shifted_offsets =
           reinterpret_cast<int32_t*>(shifted_offsets_buffer->mutable_data());
 
-      // Offsets may not be ascending, so we need to find out the start offset
-      // for each child
-      for (int64_t i = 0; i < length; ++i) {
-        const uint8_t code = type_codes[i];
+      // Offsets are guaranteed to be increasing according to the spec, so
+      // the first offset we find for a child is the initial offset and
+      // will become the 0th offset for this child.
+      for (int64_t code_idx = 0; code_idx < length; ++code_idx) {
+        const uint8_t code = type_codes[code_idx];
         if (child_offsets[code] == -1) {
-          child_offsets[code] = unshifted_offsets[i];
+          child_offsets[code] = unshifted_offsets[code_idx];
+          shifted_offsets[code_idx] = 0;
         } else {
-          child_offsets[code] = std::min(child_offsets[code], unshifted_offsets[i]);
+          shifted_offsets[code_idx] = unshifted_offsets[code_idx] - child_offsets[code];
         }
-      }
-
-      // Now compute shifted offsets by subtracting child offset
-      for (int64_t i = 0; i < length; ++i) {
-        const int8_t code = type_codes[i];
-        shifted_offsets[i] = unshifted_offsets[i] - child_offsets[code];
         // Update the child length to account for observed value
-        child_lengths[code] = std::max(child_lengths[code], shifted_offsets[i] + 1);
+        child_lengths[code] =
+            std::max(child_lengths[code], shifted_offsets[code_idx] + 1);
       }
 
       value_offsets = std::move(shifted_offsets_buffer);
```
