
Alternator: support "global tables" #5062

Open
nyh opened this issue Sep 19, 2019 · 4 comments
Labels: area/alternator, enhancement
Milestone: 3.x

Comments

nyh commented Sep 19, 2019

DynamoDB did not originally support replication between different geographical regions: a table called “xyz” in one AWS region is completely separate from a table with the same name “xyz” in a different region. "Global tables" is a cross-region replication solution: it allows a customer to designate a table in several regions, and Amazon automatically synchronizes updates between these separate tables (via DynamoDB’s CDC feature, “Streams”). The customer is charged for the writes to each of the regional tables, and for the Streams activity used to maintain this synchronization.
Requests - reads and writes - to a global table can go to any one of the regions participating in it (the client decides to which region to send each request). Writes follow a “last writer wins” policy: if writes are made to multiple regions concurrently, the newest write (in wall-clock time) will eventually spread to the rest of the regions and replace the older writes.
A global table does not support strongly-consistent reads in the same sense as normal tables: a strongly-consistent read is only guaranteed to return the last written data if that write was done in the same region (otherwise it takes more time until the update spreads between regions - Amazon claims this happens “within seconds”).
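The “last writer wins” rule above can be sketched as a simple deterministic merge: every region eventually converges to the version with the newest wall-clock timestamp. This is an illustrative sketch, not DynamoDB's actual implementation; the `Write` type and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Write:
    """A replicated item version, tagged with the wall-clock time of the write."""
    value: str
    timestamp_ms: int  # assigned by the region that accepted the write

def last_writer_wins(replicas: list[Write]) -> Write:
    """Pick the version all regions eventually converge to: the newest write."""
    return max(replicas, key=lambda w: w.timestamp_ms)

# Two regions accept conflicting writes concurrently; the newer one wins everywhere.
us_east = Write("v1", timestamp_ms=1_000)
eu_west = Write("v2", timestamp_ms=1_005)
print(last_writer_wins([us_east, eu_west]).value)  # → v2
```

Note that because the merge uses wall-clock time, clock skew between regions can make a “later” write lose; that caveat applies to DynamoDB's conflict resolution as well.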

In Alternator, we can implement multi-region (“global”) tables without CDC, and instead simply use Scylla’s existing multi-DC support: we can set up a keyspace with multiple DCs, one in each region, with RF=3 replication in each region. We already do writes with LOCAL_QUORUM, and reads with LOCAL_QUORUM (“strong consistency”) or LOCAL_ONE (“eventual consistency”), so we achieve the same consistency that DynamoDB guarantees. Conditional updates should be done with LWT at LOCAL_SERIAL consistency, so they are only guaranteed to be serialized inside a single data center - which is exactly what DynamoDB guarantees.
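The mapping above can be made concrete with a small sketch: a helper that builds the multi-DC keyspace definition, plus a table of how DynamoDB request modes would map onto CQL consistency levels. The keyspace name and RF values are illustrative assumptions, not the actual Alternator schema.

```python
def alternator_keyspace_cql(name: str, dc_rf: dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement for a multi-region ("global") table:
    NetworkTopologyStrategy with one entry per data center (one DC per region)."""
    opts = ", ".join(f"'{dc}': {rf}" for dc, rf in sorted(dc_rf.items()))
    return (f"CREATE KEYSPACE {name} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {opts}}};")

# DynamoDB request modes mapped onto CQL consistency levels, per the scheme above.
CONSISTENCY = {
    "write": "LOCAL_QUORUM",
    "strong_read": "LOCAL_QUORUM",        # ConsistentRead=true
    "eventual_read": "LOCAL_ONE",         # ConsistentRead=false
    "conditional_write": "LOCAL_SERIAL",  # LWT, serialized within one DC only
}

print(alternator_keyspace_cql("alternator_xyz", {"us_east": 3, "eu_west": 3}))
```

With LOCAL_QUORUM on both sides, a read in the same DC as a preceding write sees it, while cross-DC visibility is eventual - matching DynamoDB's per-region strong consistency.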

In DynamoDB, to create a global table one first creates identically-configured, empty tables in several regions, and then uses the CreateGlobalTable operation to join them into a global table. We can support this operation by having CreateGlobalTable delete the empty tables from the individual regions’ keyspaces, and create an identical (and empty) table in the global table’s own keyspace.
The UpdateGlobalTable operation can be used to add another region to a global table, or to remove one. Scylla supports changing the replication strategy of the global table’s keyspace, but we may also need to run repair on this keyspace in the added data center, and/or cleanup in the removed data center. As with CreateGlobalTable, adding a region requires that an identical empty table already exist in the new region, and we can remove that table as part of the operation.
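Under the keyspace-based design sketched above, UpdateGlobalTable would essentially become an ALTER KEYSPACE on the replication map, followed by repair or cleanup. The following is a hypothetical sketch of that translation; the function name and keyspace naming are assumptions.

```python
from typing import Optional

def update_global_table_cql(keyspace: str, current: dict[str, int],
                            add_dc: Optional[str] = None,
                            remove_dc: Optional[str] = None,
                            rf: int = 3) -> str:
    """Sketch of UpdateGlobalTable as an ALTER KEYSPACE: add or drop one DC
    from the replication map. A repair (for an added DC) or cleanup (for a
    removed DC) must be run separately afterwards."""
    new = dict(current)
    if add_dc:
        new[add_dc] = rf
    if remove_dc:
        new.pop(remove_dc, None)
    opts = ", ".join(f"'{dc}': {n}" for dc, n in sorted(new.items()))
    return (f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {opts}}};")

# Add a region to an existing single-DC global table:
print(update_global_table_cql("alternator_xyz", {"us_east": 3}, add_dc="eu_west"))
```

The statement only changes where new writes are replicated; existing data reaches the added DC only after repair, which is why the operation cannot be considered complete when the ALTER returns.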

DynamoDB does not support taking an existing single-region table, with data, and converting it into a global table (UPDATE Nov 2019: this is no longer true - see additional comment below). In Alternator we could support this feature even though DynamoDB doesn’t. It would require either putting each table in its own keyspace in advance, or alternatively adding an ability to move an existing table between keyspaces.

We also need to implement additional operations:

  • DescribeGlobalTable, to inquire about the configuration of a global table.
  • ListGlobalTables, to get a list of global tables, or just those that have a replica in a specific region.
  • All tables joined into a single global table must have the same capacity settings (billing mode, provisioned capacity, scaling, etc.). To update these settings after the global table is created, we should implement the UpdateGlobalTableSettings operation. To inquire about the current settings of a global table, there is DescribeGlobalTableSettings.
@nyh nyh added the area/alternator Alternator related Issues label Sep 19, 2019
@slivne slivne added this to the 3.x milestone Sep 22, 2019
nyh commented Sep 24, 2019

After the lengthy description, here is a short summary. If we want to be 100% compatible with DynamoDB's semantics of global vs. local tables:

  1. "Global tables" are basically the default in Alternator now, but they shouldn't be - the default should be a local table, so we should probably have a separate keyspace for each DC to store local tables in.
  2. Our single global "alternator" keyspace should be used just for global tables, declared via the GlobalTables commands.

nyh commented Nov 23, 2019

Update: AWS announced two days ago that it is now possible to take an existing local (normal) table and make it into a global table. The new API adds to the UpdateTable operation a ReplicaUpdates option, which adds more regions to an existing table (and the table becomes a global table, if it is not already one).
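For reference, the request shape for this newer API looks roughly like the following. The table and region names are illustrative; the `ReplicaUpdates` parameter on UpdateTable is part of the 2019 ("version 2019.11.21") global tables API.

```python
# Shape of an UpdateTable request that adds a replica region to an existing
# table, turning a local table into a global one.
request = {
    "TableName": "xyz",
    "ReplicaUpdates": [
        {"Create": {"RegionName": "eu-west-1"}},
    ],
}

# With boto3 this would be sent as:
#   boto3.client("dynamodb").update_table(**request)
print(request["ReplicaUpdates"][0])
```

Alternator would translate such a request into the keyspace replication change plus repair discussed above.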

So we need to support this feature too, which probably means that each Alternator table should be put in its own separate keyspace, so that its global distribution can be changed in the future independently of any other Alternator tables. We should probably make this change as soon as possible - changing this after we start to support upgrades will be messier.

After a ReplicaUpdates operation, we will probably need to start the appropriate repair operation in Scylla to ensure the table is copied to the new data center. I believe Scylla does not do this automatically. TODO: figure out how DynamoDB tells the user when the new region is ready - does the table's state remain "UPDATING" until the new replica is ready? Or is it impossible to know?

There are also two additional operations: DescribeTableReplicaAutoScaling and UpdateTableReplicaAutoScaling.

tzach commented Nov 24, 2019

Since DynamoDB does not have the notion of a keyspace, it makes sense.

nyh commented Feb 13, 2020

A patch for #5611 and partially #5596 was committed, so that the current state of Alternator is:

  1. Each table is created in its own keyspace with the same name as the table (with the "alternator_" prefix).
  2. Each of these keyspaces starts with all the DCs in the cluster.
    This means that each of the tables is now created as a "global" table, but has the option to later change its globalness separately from other tables.

In #5611 I explained how we can go from here to being more compatible with how DynamoDB does things, namely that tables start local but can later be made global, either by joining them with additional empty tables or with new special commands. I'll copy what I proposed there:

We need a scheme that supports all the possible ways that global and local tables are used and created, keeping in mind that Scylla does not currently allow moving tables between keyspaces, and does not even allow renaming tables or keyspaces. This isn't trivial, and I propose the following plan:

  1. The first time that a table named X is created (from DC Y), it will get keyspace X and table name X.
  2. After keyspace X already exists, creating a second table with the same name "X", now in a different DC Z, will get keyspace "X_Z", table name X. (note - this can only happen after we add the capability of creating a non-global table).
  3. Since both tables may remain alive and separate, reading from table X on a client in DC Z will need to inspect both keyspaces X and X_Z and find the one which has DC Z in its replication.
  4. In the old technique for creating a global table, the second table called X (in keyspace X_Z) remains empty and is then joined into a global table. This is easy - we remove the keyspace X_Z and update keyspace X to include the new DC. (EDIT: what if the user created first X, then X_Z, then put data in X_Z and then asked to join X into it? It is unlikely this will happen, but I think DynamoDB does allow it...)
  5. In the new (2019) technique of creating a global table, the user asks to modify the DCs of table X without creating another empty table first. We can do this by modifying keyspace X.
  6. But there's a snag - above we created an invariant that a global table X will always be in a keyspace X (and not, say, X_Z). But theoretically (we need to test this in practice!) DynamoDB may allow two global tables with the same name... E.g., we may have one global table X on DCs Y and Z, and another one with the same name X in DCs Q and W. Only one keyspace can be called "X". If the other one will be "X_Q" for example, how would a read on DC W know to look up a keyspace called X_Q? I don't have a pretty solution for this. Maybe we should just forbid this situation, and not allow two global tables with the same name to exist? I doubt anybody really wants this confusing situation?
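Steps 1-3 of the plan above can be sketched as two small helpers: one choosing the keyspace when a table is created, one probing the candidate keyspaces on a read. This is a sketch of the proposal only (the "alternator_" prefix is omitted for brevity), and the function names are hypothetical.

```python
from typing import Optional

def keyspace_for_new_table(table: str, dc: str, existing: set[str]) -> str:
    """Steps 1-2: the first creation of table X gets keyspace X; a second,
    separate table with the same name, created from DC Z, gets keyspace X_Z."""
    return table if table not in existing else f"{table}_{dc}"

def keyspace_for_read(table: str, dc: str,
                      replication: dict[str, set[str]]) -> Optional[str]:
    """Step 3: a read of table X from DC Z must inspect both candidate
    keyspaces (X and X_Z) and pick the one replicated to DC Z."""
    for ks in (table, f"{table}_{dc}"):
        if dc in replication.get(ks, set()):
            return ks
    return None

# Table "X" was first created from DC Y; a second "X" created from DC Z
# lands in keyspace "X_Z", and reads in DC Z resolve to it.
print(keyspace_for_new_table("X", "Z", existing={"X"}))            # → X_Z
print(keyspace_for_read("X", "Z", {"X": {"Y"}, "X_Z": {"Z"}}))     # → X_Z
```

The snag in step 6 shows up directly in this sketch: `keyspace_for_read` only probes `X` and `X_Z`, so a global table stored under some other suffix (e.g. one created from a third DC) would be invisible to readers in DCs it was later extended to, which is an argument for forbidding two global tables with the same name.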
