Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transforms: fix offset at deploy time #17590

Merged
merged 7 commits into from
Apr 10, 2024

Conversation

rockwotj
Copy link
Contributor

@rockwotj rockwotj commented Apr 3, 2024

This makes testing transforms easier as the offset that we start from is relative to the deploy and not when the VM starts. This means you can do:

rpk transform deploy
rpk topic produce input-topic
rpk topic consume output-topic

Without dropping records (there is a 3 second injected delay between deploy and processor start, so today everything in the first 3 seconds is dropped).

Ducktape tests will be written in a followup.

Fixes: CORE-1927, #17370

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x

Release Notes

Improvements

  • Transforms now start at a record relative to the deploy time instead of the time at which the VM starts.
    This allows for easier testing of transforms, as one does not have to wait for the VM to boot before producing.

@github-actions github-actions bot added area/redpanda area/wasm WASM Data Transforms labels Apr 3, 2024
@rockwotj
Copy link
Contributor Author

rockwotj commented Apr 3, 2024

Please ignore 84f6b55, that's being reviewed in #17537

@rockwotj rockwotj requested a review from oleiman April 3, 2024 19:37
@rockwotj rockwotj self-assigned this Apr 3, 2024
@rockwotj rockwotj force-pushed the fix-offset-at-deploy-time branch 2 times, most recently from 626c86d to 2df072a Compare April 3, 2024 20:48
@rockwotj rockwotj requested a review from a team as a code owner April 3, 2024 20:48
@rockwotj rockwotj requested review from rpdevmp and removed request for a team April 3, 2024 20:48
@rockwotj
Copy link
Contributor Author

rockwotj commented Apr 4, 2024

Force push: rebase to fix conflicts

src/v/serde/rw/variant.h Outdated Show resolved Hide resolved
@oleiman oleiman self-requested a review April 5, 2024 00:25
oleiman
oleiman previously approved these changes Apr 5, 2024
Copy link
Member

@oleiman oleiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, lgtm!

Comment on lines +183 to +189
return _cond_var
.wait(
1s,
[this, index, o] {
auto it = _committed.find(index);
return it != _committed.end() && it->second >= o;
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-format
  really
  went for 
    it
    here!

I rather like it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seriously. Not how I would have formatted it, but meh

@rockwotj
Copy link
Contributor Author

Force push: drop the serde commit

In order to support starting from specific points in time during
deploys, we add a variant that allows starting from the end of the log
or a specific point in time, at which we will resolve to an offset using
a timequery.

Adding a new field to the end of the struct is backwards compatible, so
we are tree to add this new field - the default value will be
latest_offset, which is consistent with existing behavior.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Require that Wasm transform sources can perform time queries to resolve
a timestamp to an offset.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
A new change in clang-tidy-17 complains when exceptions are ignored.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
This commits respects the initial position in the metadata, and means
that when we start processing records we'll start from the deploy time
for newly deployed transforms, while old transforms we will still start
at the end of the log.

This will remove the need for the song and dance around waiting for
transforms to commit before producing records in tests and matches the
expectations and papercuts customers have ran into when trying
transforms.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
This makes debugging test failures a little easier, as you can narrow
down which cond_var timed out.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Add a ducktape test proving that we no longer need to wait after a
deploy to produce records to a data transform.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
@rockwotj
Copy link
Contributor Author

Force push rebase against dev

@rockwotj rockwotj requested a review from oleiman April 10, 2024 14:47
@rockwotj rockwotj merged commit 067c249 into redpanda-data:dev Apr 10, 2024
18 checks passed
@rockwotj rockwotj deleted the fix-offset-at-deploy-time branch April 10, 2024 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/redpanda area/wasm WASM Data Transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants