Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow kinesis source start with timestamp #12241

Merged
merged 3 commits into from
Sep 13, 2023

Conversation

tabVersion
Copy link
Contributor

@tabVersion tabVersion commented Sep 12, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

close #12247

as title

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

allow kinesis scan.startup.mode be timestamp
and new option scan.startup.timestamp.millis as i64

please consider removing scan.startup.sequence_number and scan.startup.mode = sequence_number in the doc
because kinesis seq num encodes shard-related info.
lets say seq_num_0 is specified when shard 1, 2, 3 are available, but later some scale op are performed, shard 1 goes down and shard 4 goes up. Specify seq_num_0 can start with shard 2 and 3 but shard 4. The shard 4's executor will goes into a crash loop.
cc @neverchanje and @fuyufjh

update: we no longer support scan.startup.mode = 'sequence_number' and option scan.startup.sequence_number

}
KinesisOffset::Latest => (None, ShardIteratorType::Latest),
KinesisOffset::Earliest => (None, None, ShardIteratorType::TrimHorizon),
KinesisOffset::SequenceNumber(seq) => (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it would be better if we could give an warning to users, something like
"The sequence is only unique within a certain shard, so the semantics they expect may deviate from the reality, etc."

We give a warning instead of completely disabling it because we still want to keep backward compatibility

Copy link
Member

@fuyufjh fuyufjh Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a completely wrong design. Recommend removing it. (Yeah I know it's breaking change...)

Each data record has a sequence number that is unique per partition-key within its shard

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the warning here cannot propagate to the frontend. Let's make some breaking changes here.

Copy link
Member

@fuyufjh fuyufjh Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the warning here cannot propagate to the frontend. Let's make some breaking changes here.

If possible, I suggest we can leave the option here to avoid panic (perhaps print a warning log), but ignore the option, removing all related implementation code.

Copy link
Contributor

@curiosityyy curiosityyy Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is no sequence_number option, unless the shards are fixed and pass sequence_numbers for each shard to rw. Otherwise, this feature doesn't make sense.

@lmatz lmatz added the user-facing-changes Contains changes that are visible to users label Sep 13, 2023
@lmatz
Copy link
Contributor

lmatz commented Sep 13, 2023

For the doc, we remove the existing seq number option to avoid users from using in the future

and provide the new timestamp option

@codecov
Copy link

codecov bot commented Sep 13, 2023

Codecov Report

Merging #12241 (4d538dc) into main (3c1f52f) will decrease coverage by 0.01%.
Report is 2 commits behind head on main.
The diff coverage is 14.81%.

@@            Coverage Diff             @@
##             main   #12241      +/-   ##
==========================================
- Coverage   69.67%   69.67%   -0.01%     
==========================================
  Files        1410     1410              
  Lines      236128   236137       +9     
==========================================
- Hits       164526   164523       -3     
- Misses      71602    71614      +12     
Flag Coverage Δ
rust 69.67% <14.81%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
src/connector/src/source/kinesis/mod.rs 0.00% <ø> (ø)
src/connector/src/source/kinesis/source/reader.rs 19.00% <14.81%> (-2.03%) ⬇️

... and 2 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

- Modify `start_position` variable to use `KinesisOffset::Timestamp`
- Remove `seq_offset` field from `KinesisProperties`
- Add `timestamp_offset` field with value `123456789098765432` to `KinesisProperties`
- Remove `test_reject_redundant_seq_props` test function
- Remove `seq_offset` field from `KinesisProperties` struct in `test_single_thread_kinesis_reader` test function
- Modify `scan_startup_mode` field in `KinesisProperties` struct
- Change accepted values for `scan_startup_mode` to include "timestamp"
- Add `timestamp_offset` field to `KinesisProperties` struct

Signed-off-by: tabVersion <tabvision@bupt.icu>
@tabVersion tabVersion added this pull request to the merge queue Sep 13, 2023
Merged via the queue into main with commit 1e2a4e5 Sep 13, 2023
29 of 30 checks passed
@tabVersion tabVersion deleted the tab/kinesis-timestamp-bias branch September 13, 2023 08:16
@lmatz
Copy link
Contributor

lmatz commented Sep 13, 2023

SCR-20230913-mmk

why does the bot do that

@fuyufjh
Copy link
Member

fuyufjh commented Sep 13, 2023

why does the bot do that

@tabVersion forgot to tick this "My PR contains breaking changes"

  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change type/feature user-facing-changes Contains changes that are visible to users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The sequence number in Kinesis shard is applicable to a single shard only, use timestamp instead
4 participants