Alternator: support TTL #5060

Closed
nyh opened this issue Sep 18, 2019 · 13 comments

Comments

@nyh
Contributor

nyh commented Sep 18, 2019

This issue is about supporting DynamoDB's TTL feature, which is very different from the Scylla feature of the same name:

While Scylla's TTL allows individual column values to be expired, DynamoDB's TTL is different - it specifies expiration times for entire items.
DynamoDB’s TTL works by specifying with UpdateTimeToLive, for each table you want TTL for, the name of an attribute which will hold an item’s expiration time - in seconds since the Unix epoch. An item with this attribute set to a time older than the current time will eventually be deleted. The DescribeTimeToLive operation can be used to inquire about a table’s TTL setting.
The TTL documentation explains that “a background job checks the TTL attribute of items to see if they are expired.” We could do this too, and may have to, although it is inefficient.
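
As a rough illustration of the rule described above, the following Python sketch checks whether a single item counts as expired. The function name `is_expired`, the plain-dict item representation, and the choice to skip missing or non-numeric values are assumptions made for this illustration; none of it is taken from Alternator's code.

```python
import time
from decimal import Decimal
from numbers import Real


def is_expired(item, ttl_attribute, now=None):
    """Return True if `item` is past its expiration time.

    `item` is a plain dict mapping attribute names to values (a hypothetical
    representation). The expiration time is in seconds since the Unix epoch.
    """
    if now is None:
        now = time.time()
    value = item.get(ttl_attribute)
    # Assumption: an item without the attribute, or with a non-numeric
    # value in it, never expires.
    if isinstance(value, bool) or not isinstance(value, (Real, Decimal)):
        return False
    # "Older than the current time" means strictly less than `now`.
    return value < now
```

Note that this only answers whether an item is eligible for deletion; when the deletion actually happens is left to whatever background mechanism is chosen.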

Considering that Scylla already has compaction and other processes in place to expire data, it would be nice if we could reuse those. We can perhaps do something like this: add to the table a TTL column (in addition to the map we always have). When an UpdateItem or PutItem operation changes an item’s chosen TTL attribute (we know its name), also put this value into the TTL column, with the same time also used as the timestamp. During compaction, if the Unix time specified in the TTL column has passed, we add a row tombstone (i.e., a range tombstone) with the column’s timestamp. However, there’s a big difficulty in correctly supporting changes to an item's TTL. The problem is that a non-major compaction may not see all the data, so it may see an old TTL value and decide to delete data based on that old value. We may need to do this TTLing only if we're sure the compaction includes all the sstables containing the item (as we do today in tombstone GC). Also, since this node may have missed a TTL update, we need to be sure a repair happened, so we need to wait for the GC grace period (exactly as we do in tombstone GC). Another option is to have a new, additional primary-key bloom filter. I’m worried that this last option is overkill for just the TTL feature, and will also be inefficient (and have a high false-positive rate) when there are many small sstables (e.g., LCS).
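
The two safety conditions discussed above can be sketched as a small Python predicate. This is only an illustration: the function and parameter names are hypothetical, and counting the GC grace period from the expiration time is an assumption made by analogy with tombstone GC, not something the proposal spells out.

```python
def may_expire_during_compaction(expiration_time, now, gc_grace_seconds,
                                 compaction_sees_all_sstables_for_item):
    """Decide whether a compaction may delete an item based on its TTL column.

    All times are in seconds since the Unix epoch.
    """
    # A partial (non-major) compaction may only see a stale TTL value, so it
    # must not delete anything based on it.
    if not compaction_sees_all_sstables_for_item:
        return False
    # This node may have missed a later TTL update. Assumption: waiting for
    # the GC grace period after the expiration time gives repair a chance to
    # deliver that update first.
    return expiration_time + gc_grace_seconds <= now
```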

@nyh nyh added the area/alternator Alternator related Issues label Sep 18, 2019
@slivne slivne added this to the 3.x milestone Sep 22, 2019
@nyh
Contributor Author

nyh commented Dec 8, 2019

By the way, as noted by @amoskong and @fastio, the Redis API also has a per-item (not per-cell) TTL, which will need a similar feature:

The above documentation says that expiration may be delayed by only 1ms after the specified expiration time. Because we obviously can't find and delete expired data with such accuracy, we will need to check for the possibility of expiration in every read - so that even if expired data is not expunged from disk yet, it's at least not returned.

Note that this read-time check is not necessary in Alternator - the DynamoDB documentation https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html explains that an item may be deleted an unspecified long time ("typically" up to 48 hours) after it was requested to be expired - and until the final deletion the expired items can still be read and modified (and it's even possible to change their expiration time, to cancel their would-be deletion).
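
The difference between the two APIs can be sketched in Python as a single read path with a switch. The names `read_item` and `hide_expired`, and the dict-based store, are hypothetical and only serve this illustration.

```python
def read_item(store, key, ttl_attribute, now, hide_expired):
    """Read an item from `store` (a dict mapping keys to item dicts).

    With hide_expired=True (the behavior Redis would need), an item whose
    expiration time has passed is not returned even if it has not been
    expunged yet. With hide_expired=False (sufficient for the DynamoDB API),
    an expired item stays readable until it is actually deleted.
    """
    item = store.get(key)
    if item is None:
        return None
    expiration = item.get(ttl_attribute)
    if hide_expired and expiration is not None and expiration < now:
        return None
    return item
```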

@nyh
Contributor Author

nyh commented May 4, 2020

An idea on how to implement the TTL feature quickly but somewhat inefficiently:

Now that Alternator uses LWT for every write and writes are substantially slower, one way we can implement the TTL feature over the existing Scylla TTL is by making every write a read-modify-write operation, as detailed below. Note that an LWT write which needs to read is a bit slower than an LWT write that doesn't read, but probably not much slower.

  1. When an UpdateItem modifies the TTL attribute (and possibly other attributes as well), we read the entire item and write it again with a different TTL on all its attributes.
  2. When an UpdateItem modifies other attributes but not the TTL, we read the TTL column first to know which TTL to put on the new mutation.
  3. When PutItem is used, we know the entire item is replaced so we don't need to read at all.
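
The three cases above can be sketched as a small decision function in Python. The function name and the string results are hypothetical; the sketch only captures which read each kind of write would need, not how the write itself is performed.

```python
def read_needed_before_write(operation, modified_attributes, ttl_attribute):
    """Return what must be read before the write can be applied.

    `modified_attributes` is the set of attribute names the operation changes.
    """
    if operation == "PutItem":
        # Case 3: the entire item is replaced, so nothing needs to be read.
        return "nothing"
    if operation == "UpdateItem":
        if ttl_attribute in modified_attributes:
            # Case 1: every attribute must be rewritten with the new TTL.
            return "entire item"
        # Case 2: only the current TTL is needed for the new mutation.
        return "ttl column"
    raise ValueError(f"unsupported operation: {operation}")
```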

@nyh
Contributor Author

nyh commented Sep 21, 2020

We just had an interesting discussion on the mailing list - see https://groups.google.com/g/scylladb-dev/c/Ddv31GGB9pI - on the compaction-based implementation, some implementation details and possible pitfalls, and how it would work for the Alternator and Redis APIs.

@slivne
Contributor

slivne commented Nov 11, 2020

The implementation needs to take into account more aspects:

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html

"
As items are deleted from the table, two background operations happen simultaneously:

Items are removed from any local secondary index and global secondary index in the same way as a DeleteItem operation.

A delete operation for each item enters the DynamoDB Stream, but is tagged as a system delete and not a regular delete. For more information about how to use this system delete, see DynamoDB Streams and Time to Live.
"

  1. DynamoDB TTL also removes items from GSIs and LSIs, and it's not clear whether the TTL column is projected into the MV.

  2. DynamoDB TTL has an effect on what's recorded in DynamoDB Streams.

  3. DynamoDB allows updating the column that is used for TTL, and allows enabling / disabling this feature.
    (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/time-to-live-ttl-before-you-start.html#time-to-live-ttl-before-you-start-notes)

This has implications on how we can add support for this feature (e.g. entering future tombstones can't be done).

@avikivity
Member

Right. Since our views/indexes are coordinator-managed, this is hard to do.

@psarna
Contributor

psarna commented Dec 4, 2020

@avikivity as for secondary indexing, isn't it enough to always include the special TTL attribute in the underlying materialized view as well? It will receive all updates via existing mechanisms, and the garbage collection process could then just treat materialized views just as normal tables, removing the rows once they expire.

@psarna
Contributor

psarna commented Dec 4, 2020

DynamoDB TTL docs also mention that:

You cannot reconfigure TTL to look for a different attribute. You must disable TTL, and then reenable TTL with the new attribute going forward.

, which would mean that when reenabling TTL, we'd have to add a column to existing views (which is not legal via CQL, but sounds possible by just upgrading the view schemas). In any case, we could start with one-shot TTLs that cannot be disabled and reenabled to another column, for simplicity's sake.

@avikivity
Member

> @avikivity as for secondary indexing, isn't it enough to always include the special TTL attribute in the underlying materialized view as well? It will receive all updates via existing mechanisms, and the garbage collection process could then just treat materialized views just as normal tables, removing the rows once they expire.

I think it should work. Auto-project the TTL attribute to the index.

Requires a read-before-write, but we're heading in that direction anyway (and indexes already do that).
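
A minimal Python sketch of the auto-projection idea, assuming a view's projection is simply a list of attribute names. The function name is hypothetical and this is not how Scylla's view schemas are actually built.

```python
def view_projected_attributes(requested_attributes, ttl_attribute):
    """Return the attributes to store in an index's underlying view.

    The table's TTL attribute (None if TTL is disabled) is always added, so
    that view rows carry their own expiration time and can be expired like
    rows of a normal table.
    """
    projected = list(requested_attributes)
    if ttl_attribute is not None and ttl_attribute not in projected:
        projected.append(ttl_attribute)
    return projected
```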

@avikivity
Member

> DynamoDB TTL docs also mention that:
>
> > You cannot reconfigure TTL to look for a different attribute. You must disable TTL, and then reenable TTL with the new attribute going forward.
>
> , which would mean that when reenabling TTL, we'd have to add a column to existing views (which is not legal via CQL, but sounds possible by just upgrading the view schemas). In any case, we could start with one-shot TTLs that cannot be disabled and reenabled to another column, for simplicity's sake.

We could launch an index rebuild job (similar to the existing job, but just copies the TTL attribute).

@slivne
Contributor

slivne commented Dec 6, 2020 via email

@slivne
Contributor

slivne commented Dec 6, 2020 via email

nyh added a commit to nyh/scylla that referenced this issue Apr 28, 2021
This patch adds a test suite for the DynamoDB API's TTL (item expiration)
feature.

The tests check the two new API commands added by this feature
(UpdateTimeToLive and DescribeTimeToLive), and also how items are
expired in practice. Because DynamoDB has long delays in expiring
items and in configuring expiration, some of these tests are marked
"verylong" because they take up to 30 minutes to complete, and are
skipped in ordinary test runs (use "--runverylong" to run them).

Two things are *not* yet tested by these tests:
1. How in a table with a GSI/LSI, expiring an item also expires the
   index item.
2. How in a table with Streams enabled, an expired item also generates a
   special stream event.

All these tests currently pass on DynamoDB, but xfail on Alternator
because the two commands UpdateTimeToLive and DescribeTimeToLive are
currently rejected by Alternator.

Refs scylladb#5060

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
@nyh
Contributor Author

nyh commented Apr 28, 2021

Started to work on this feature. A draft pull request - currently including just tests - is in #8564.

@nyh nyh self-assigned this Jul 10, 2021
nyh added a commit to nyh/scylla that referenced this issue Jul 12, 2021
This patch adds a test suite for the DynamoDB API's TTL (item expiration)
feature.

The tests check the two new API commands added by this feature
(UpdateTimeToLive and DescribeTimeToLive), and also how items are
expired in practice. Because DynamoDB has long delays in expiring
items and in configuring expiration, some of these tests take up
to 30 minutes to complete. We mark them "verylong", so they are skipped
in ordinary test runs - use the "--runverylong" option to run them.

Two things are *not* yet tested by these tests:
1. How in a table with a GSI/LSI, expiring an item also expires the
   index item.
2. How in a table with Streams enabled, an expired item also generates a
   special stream event.

All these tests currently pass on DynamoDB, but xfail on Alternator
because the two commands UpdateTimeToLive and DescribeTimeToLive are
currently rejected by Alternator.

Refs scylladb#5060

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
nyh added a commit to nyh/scylla that referenced this issue Jul 13, 2021
This patch adds a comprehensive test suite for the DynamoDB API's TTL
(item expiration) feature.

The tests check the two new API commands added by this feature
(UpdateTimeToLive and DescribeTimeToLive), and also how items are
expired in practice, and how item expiration interacts with other
features such as GSI, LSI and DynamoDB Streams.

Because DynamoDB has extremely long delays until items are expired, or
until expiration configuration may be changed, several of these tests
take up to 30 minutes to complete. We mark these tests with the
 "verylong" marker, so they are skipped in ordinary test runs - use the
"--runverylong" option to run them.

All these tests currently pass on DynamoDB, but xfail on Alternator
because the two commands UpdateTimeToLive and DescribeTimeToLive are
currently rejected by Alternator.

Refs scylladb#5060

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
nyh added a commit to nyh/scylla that referenced this issue Jul 14, 2021
psarna added a commit that referenced this issue Jul 14, 2021
This series includes a comprehensive test suite for the DynamoDB API's
TTL (item expiration) feature described in issue #5060. Because we have
not yet implemented the TTL feature in Alternator, all of the tests
still xfail, but they all pass on DynamoDB and demonstrate exactly how
the TTL feature works and how it interacts with other features such as
LSI, GSI and Streams. The patch which introduces these tests is heavily
commented to explain exactly what it tests, and why.

Because DynamoDB only expires items some 10-30 minutes after their
expiration time (the documentation even suggests it can be delayed by 24
hours!), some of these tests are extremely long (up to 30 minutes!), so
we also introduce in this series a new marker for "verylong" tests.
verylong tests are skipped by default, unless the "--runverylong" option
is given. In the future, when we implement the TTL feature in Alternator
and start testing it, we may be able to configure it with a much shorter
expiration timeout and then we might be able to run these tests in a
reasonable time and make them run by default.

Closes #8564

* github.com:scylladb/scylla:
  test/alternator: add tests for the Alternator TTL feature
  test/alternator: add marker for "veryslow" tests
  test/alternator: add new_test_table() utility function
psarna added a commit that referenced this issue Sep 21, 2021
... from Nadav Har'El

This small series adds a stub implementation of Alternator's
UpdateTimeToLive and DescribeTimeToLive operations. These operations can
enable, disable, or inquire about, the chosen expiration-time attribute.
Currently, the information about the chosen attribute is only saved,
with no actual expiration of any items taking place.

Because this is an incomplete implementation of this feature, it is not
enabled unless an experimental flag is enabled on all nodes in the
cluster.

See the individual patches for more information on what this series
does.

Refs #5060.

Closes #9345

* github.com:scylladb/scylla:
  test/alternator: rename utility function test_table_name()
  alternator: stub TTL operations
  alternator: make three utility functions in executor.cc non-static
  test/alternator: test another corner case of TTL
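
To illustrate what a stub that only saves the chosen attribute might look like, here is a minimal Python sketch. The class `TableTTLSetting`, its in-memory storage, and the use of `ValueError` for rejected requests are all assumptions made for this illustration. The shape of the returned description follows the DynamoDB `DescribeTimeToLive` response, and the rule that an enabled TTL must be disabled before another attribute can be chosen follows the DynamoDB documentation quoted earlier in this thread.

```python
class TableTTLSetting:
    """Hypothetical per-table record of the chosen expiration-time attribute.

    It only stores the setting; nothing here expires any item.
    """

    def __init__(self):
        self.attribute_name = None  # None means TTL is disabled

    def update_time_to_live(self, enabled, attribute_name):
        if enabled:
            # TTL cannot be reconfigured in place: it has to be disabled
            # first and then enabled again with the new attribute.
            if self.attribute_name is not None:
                raise ValueError("TTL is already enabled; disable it first")
            self.attribute_name = attribute_name
        else:
            if self.attribute_name is None:
                raise ValueError("TTL is already disabled")
            self.attribute_name = None

    def describe_time_to_live(self):
        if self.attribute_name is None:
            return {"TimeToLiveDescription": {"TimeToLiveStatus": "DISABLED"}}
        return {"TimeToLiveDescription": {
            "TimeToLiveStatus": "ENABLED",
            "AttributeName": self.attribute_name,
        }}
```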
psarna added a commit that referenced this issue Nov 26, 2021
…ce' from Nadav Har'El

In this patch series we add an implementation of an
expiration service to Alternator, which periodically scans the data in
the table, looking for expired items and deleting them.

We also continue to improve the TTL test suite to cover additional
corner cases discovered during the development of the code.

This implementation is good enough to make all existing tests but one,
plus a few new ones, pass, but is still a very partial and inefficient
implementation littered with FIXMEs throughout the code. Among other
things, this initial implementation doesn't do anything reasonable about pacing of
the scan or about multiple tables, it scans entire items instead of only the
needed parts, and because each shard "owns" a different subset of the
token ranges, if a node goes down, partitions which it "owns" will not
get expired.

The current tests cannot expose these problems, so we will need to develop
additional tests for them.

Because this implementation is very partial, the Alternator TTL continues
to remain "experimental", cannot be used without explicitly enabling this
experimental feature, and must not be used for any important deployment.

Refs #5060 but doesn't close the issue (let's not close it until we have a
reasonably complete implementation - not this partial one).

Closes #9624

* github.com:scylladb/scylla:
  alternator: fix TTL expiration scanner's handling of floating point
  test/alternator: add TTL test for more data
  test/alternator: remove "xfail" tag from passing tests in test_ttl.py
  test/alternator: make test_ttl.py tests fast on Alternator
  alternator: initial implmentation of TTL expiration service
  alternator: add another unwrap_number() variant
  alternator: add find_tag() function
  test/alternator: test another corner case of TTL setting
  test/alternator: test TTL expiration for table with sort key
  test/alternator: improve basic test for TTL expiration
  test/alternator: extract is_aws() function
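
The core of a scan-based expiration pass can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the code from this series: the table is a plain dict, the function name `expire_items` is hypothetical, and pacing, multiple tables, and the division of token ranges between shards are all left out. Accepting integer, floating-point and `Decimal` values is an assumption about which numeric forms the expiration attribute may take.

```python
from decimal import Decimal


def expire_items(table, ttl_attribute, now):
    """Delete expired items from `table` and return the deleted keys.

    `table` is a dict mapping keys to item dicts; `now` and the expiration
    times are in seconds since the Unix epoch.
    """
    expired_keys = []
    for key, item in table.items():
        value = item.get(ttl_attribute)
        # Skip items without a numeric expiration time.
        if isinstance(value, bool) or not isinstance(value, (int, float, Decimal)):
            continue
        if value < now:
            expired_keys.append(key)
    # Delete in a second pass, so the dict is not modified while iterating.
    for key in expired_keys:
        del table[key]
    return expired_keys
```

A periodic service would call something like this repeatedly for each table with TTL enabled, which is why the cost of scanning entire items matters.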
avikivity pushed a commit that referenced this issue Nov 28, 2021
@nyh
Contributor Author

nyh commented Sep 12, 2022

At this point, we already have support for TTL in Alternator, although it is still marked "experimental".
I think it's time to close this issue, and instead open issues for specific things that need to be fixed or improved in this feature, of course with the goal of graduating it from "experimental" to GA.

@nyh nyh closed this as completed Sep 12, 2022
@nyh nyh added the area/ttl label Sep 13, 2022
@DoronArazii DoronArazii modified the milestones: 5.x, 5.2 Nov 8, 2022
avikivity added a commit that referenced this issue Nov 24, 2022
…hlik

Currently, TTL is listed as one of the experimental features: https://docs.scylladb.com/stable/alternator/compatibility.html#experimental-api-features

This PR moves the feature description from the Experimental Features section to a separate section.
I've also added some links and improved the formatting.

@tzach I've relied on your release notes for RC1.

Refs: #5060

Closes #11997

* github.com:scylladb/scylladb:
  Update docs/alternator/compatibility.md
  doc: update the link to Enabling Experimental Features
  doc: remove the note referring to the previous ScyllaDB versions and add the relevant limitation to the paragraph
  doc: update the links to the Enabling Experimental Features section
  doc: add the link to the Enabling Experimental Features section
  doc: move the TTL Alternator feature from the Experimental Features section to the production-ready section