Alternator: Support Kinesis Streams #8786

Open
nyh opened this issue Jun 2, 2021 · 3 comments

nyh commented Jun 2, 2021

DynamoDB used to support change-data-capture to DynamoDB Streams, which is very similar to Amazon Kinesis, but starting in November 2020, it is possible to use real Kinesis Data Streams to record changes, and according to @elcallio the new 2.0 version of Amazon Kinesis Client Library (KCL) supports only these Kinesis Streams, not the old DynamoDB Streams.

The question is what, if anything, we need to add to Alternator so that an application like KCL, which supports only Kinesis Streams, can make use of Alternator's Streams. The Kinesis API appears to be very similar to (perhaps a superset of) the old DynamoDB Streams API, and the question is which additional operations we need to implement.
Moreover, enabling a Kinesis Stream requires a different process now, and also new DynamoDB operations: DescribeKinesisStreamingDestination, DisableKinesisStreamingDestination, and EnableKinesisStreamingDestination.
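
As a rough illustration of what these three operations look like on the wire, here is a minimal sketch that builds the unsigned request headers and JSON bodies a client would send. The `build_request` helper, the table name, and the stream ARN are hypothetical and exist only for this example; request signing and the actual HTTP call are deliberately left out.

```python
import json

# Target prefix used by the DynamoDB low-level JSON API.
TARGET_PREFIX = "DynamoDB_20120810."


def build_request(operation, table_name, stream_arn=None):
    """Return (headers, body) for one streaming-destination operation.

    The request is unsigned; a real client would also add SigV4 headers.
    """
    payload = {"TableName": table_name}
    if stream_arn is not None:
        payload["StreamArn"] = stream_arn
    headers = {
        "Content-Type": "application/x-amz-json-1.0",
        "X-Amz-Target": TARGET_PREFIX + operation,
    }
    return headers, json.dumps(payload)


# Hypothetical table name and stream ARN, for illustration only.
ARN = "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream"
for op, arn in [
    ("EnableKinesisStreamingDestination", ARN),
    ("DescribeKinesisStreamingDestination", None),
    ("DisableKinesisStreamingDestination", ARN),
]:
    headers, body = build_request(op, "example_table", arn)
    print(headers["X-Amz-Target"], body)
```

Supporting these operations would mean Alternator recognizing the three additional `X-Amz-Target` values; what it should do with the `StreamArn` is exactly the open question of this issue.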

avikivity pushed a commit that referenced this issue Jun 3, 2021
In the last year, four new features were added to DynamoDB which we
don't yet support - Kinesis Streams, PartiQL, Contributor Insights and
Export to S3. Let's document them as missing Alternator features, and
point to the four newly-created issues about these features.

Refs #8786
Refs #8787
Refs #8788
Refs #8789

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210603125825.1179171-1-nyh@scylladb.com>
slivne added this to the 4.x milestone Jun 15, 2021

nyh commented Nov 1, 2021

The new "Kinesis Streams" differs from the older "DynamoDB Streams" not only in its API; there are also semantic differences, which are documented here: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/streamsmain.html

Here are some of the more interesting differences I noticed:

  1. Whereas DynamoDB Streams drops records after 24 hours, Kinesis Streams holds records for a year. This means we should use a higher TTL (1 year instead of 24 hours) for the CDC table.
  2. Kinesis Streams supports pushing new events over HTTP/2, not just pulling them via a GetRecords request.
  3. While DynamoDB Streams guarantees that the user receives each event exactly once, Kinesis Streams only guarantees at-least-once delivery. We can still provide exactly-once because that's what the underlying CDC provides, so I don't know whether the weaker guarantee lets us simplify anything.

The document also says that you can enable both streaming models on the same table.
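
To make the at-least-once point concrete: a consumer written against that guarantee has to tolerate repeated events, typically by deduplicating on some record identity. The sketch below shows the idea; the `shard_id` and `sequence_number` field names and the choice of key are assumptions made for this example, not the actual Kinesis record layout, and the unbounded `seen` set is a simplification.

```python
def deduplicate(records):
    """Yield each record once, skipping repeats delivered more than once.

    Records are plain dicts here; the key fields are hypothetical.
    """
    seen = set()
    for record in records:
        key = (record["shard_id"], record["sequence_number"])
        if key in seen:
            continue  # duplicate delivery, already processed
        seen.add(key)
        yield record


# The third record is a redelivery of the first one.
incoming = [
    {"shard_id": "s1", "sequence_number": "1", "data": "a"},
    {"shard_id": "s1", "sequence_number": "2", "data": "b"},
    {"shard_id": "s1", "sequence_number": "1", "data": "a"},
]
for record in deduplicate(incoming):
    print(record["sequence_number"], record["data"])
```

If Alternator keeps delivering each event exactly once, a consumer like this simply never hits the duplicate branch.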


elcallio commented Nov 1, 2021

When we discussed this last (spring? - I mentioned it regarding v2 (3?) of AWS sdk & kinesis), we decided not to support kinesis. At least not yet. Should we change this approach?
TTL 1y sounds big...


nyh commented Nov 2, 2021

> When we discussed this last (spring? - I mentioned it regarding v2 (3?) of AWS sdk & kinesis), we decided not to support kinesis. At least not yet.
> Should we change this approach?

Nothing really changed - I don't think we should rush to do this. I just think it's important to document all the missing features in Alternator, and this is one of them.

You're right that Amazon's KCL library version 2 supports only the new Kinesis Streams API, not the older DynamoDB Streams API, so I'm guessing that in the long run, that will become the reason why we will need to implement the new API.

> TTL 1y sounds big...

Indeed. The document I linked above might have overstated this one-year thing. I checked again, and it appears that while Kinesis Streams does allow you to retain records for up to one year, that's not the default; the default is still 24 hours. According to https://aws.amazon.com/kinesis/data-streams/faqs/:

"By default, Records of a stream are accessible for up to 24 hours from the time they are added to the stream. You can raise this limit to up to 7 days by enabling extended data retention or up to 365 days by enabling long-term data retention."
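
For reference, the three retention limits quoted above translate into the following TTL values in seconds. This is only arithmetic; the tier names and the `ttl_seconds` helper are made up for this sketch, and how (or whether) Alternator would map them onto the CDC table's TTL is an open assumption.

```python
from datetime import timedelta

# Retention limits taken from the FAQ quote; the tier names are hypothetical.
RETENTION = {
    "default": timedelta(hours=24),
    "extended": timedelta(days=7),
    "long_term": timedelta(days=365),
}


def ttl_seconds(tier):
    """Return the retention period of the given tier in whole seconds."""
    return int(RETENTION[tier].total_seconds())


for tier in RETENTION:
    print(tier, ttl_seconds(tier))
```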
