Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbsp: Fix hidden dependency on vec-based batches in input_upsert. #1913

Merged
merged 1 commit into from
Jun 20, 2024

Conversation

blp
Copy link
Member

@blp blp commented Jun 20, 2024

The vec-based batches set keys, values, and weights to their default values when their builders accept tuples via Builder::push or Builder::push_vals, which take mutable references. The code for input_upsert accumulates updates to keys in a key_updates vector, and then adds them to a builder via extend, which internally uses Builder::push and thus clears all the updates that were added to have weight 0.

Before this commit, that code didn't later clear the key_updates vector, which meant that anything added later was appended to a bunch of zeroed tuples. This was benign because, the next time key_updates was used, it was first consolidated, which meant that all the zero-weight tuples were dropped.

This failed with the file-based batches, which don't have any special optimizations for mutable references and thus don't zero anything, which meant that nothing in key_updates ever got cleared. This commit fixes the problem by properly clearing key_updates to empty after using it.

Is this a user-visible change (yes/no): ___

The vec-based batches set keys, values, and weights to their default
values when their builders accept tuples via `Builder::push` or
`Builder::push_vals`, which take mutable references.  The code for
`input_upsert` accumulates updates to keys in a `key_updates` vector,
and then adds them to a builder via `extend`, which internally uses
`Builder::push` and thus clears all the updates that were added to have
weight 0.

Before this commit, that code didn't later clear the `key_updates` vector,
which meant that anything added later was appended to a bunch of zeroed
tuples.  This was benign because, the next time `key_updates` was used,
it was first consolidated, which meant that all the zero-weight tuples
were dropped.

This failed with the file-based batches, which don't have any special
optimizations for mutable references and thus don't zero anything, which
meant that nothing in `key_updates` ever got cleared.  This commit fixes
the problem by properly clearing `key_updates` to empty after using it.

Signed-off-by: Ben Pfaff <blp@feldera.com>
@blp blp added DBSP core Related to the core DBSP library storage Persistence for internal state in DBSP operators rust Pull requests that update Rust code labels Jun 20, 2024
@blp blp requested a review from ryzhyk June 20, 2024 21:24
@blp blp merged commit 4a5a3d6 into main Jun 20, 2024
5 checks passed
@blp blp deleted the input-upsert-bug branch June 20, 2024 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DBSP core Related to the core DBSP library rust Pull requests that update Rust code storage Persistence for internal state in DBSP operators
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants