-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternator Streams does does not properly distinguish the type of change event #6918
Comments
I think the key is to reduce the splitting of the alternator write into several CDC log rows. CDC can distinguish insert and update, so if we can ensure alternator uses update semantics when the data exists, and insert when not (i.e. row marker/not row marker), and fix the cdc log splitting, things should solve themselves. Big ifs... |
@elcallio this is a nice idea, and can actually work. Alternator update operations could read the old value, decide if not to do the write at all (e.g., updating a value identical to the existing value, or deleting an already non-existent item), or if it's a new item (in which case write also the row marker) or there was a pre-existing value (in which case we won't write the row marker). The last case becomes more complex when replacing an existing item with However, this idea has one performance drawback: It requires Alternator to always do a read-before-write - in Alternator - when CDC is enabled for that table. Although Alternator by default uses LWT for all writes, it doesn't actually do the read operation if the update requires it (e.g., it's a conditional update), and now, when CDC is enabled, it will need to do those extra read for all writes. And this is in addition to the read-before-write that CDC does (in a less consistent manner...) for OLD/NEW_IMAGE, when those are enabled. Eliminating this double read is my point 2 in the original issue description. We will still need to do more work than before in the KEYS_ONLY case, but maybe this extra work is worth for getting correctness (and CDC adds extra work in any case...). |
Or maybe just accept that we might give the wrong eventName if the alternator write is not read-before-write? If streams preimage is enabled we can further add the amount of correctness in reporting by simply keep some state tracking the the |
This patch adds additional tests for Alternator Streams, which helped uncover 9 new issues. The stream tests are noticibly slower than most other Alternator tests - test_streams.py now has 27 tests taking a total of 20 seconds. Much of this slowness is attributed to Alternator Stream's 512 "shards" per stream in the single-node test setup with 256 vnodes, meaning that we need over 1000 API requests per test using GetRecords. These tests could be made significantly faster (as little as 4 seconds) by setting a lower number of vnodes. Issue #6979 is about doing this in the future. The tests in this patch have comments explaining clearly (I hope) what they test, and also pointing to issues I opened about the problems discovered through these tests. In particular, the tests reproduce the following bugs: Refs #6918 Refs #6926 Refs #6930 Refs #6933 Refs #6935 Refs #6939 Refs #6942 The tests also work around the following issues (and can be changed to be more strict and reproduce these issues): Refs #6918 Refs #6931 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200804154755.1461309-1-nyh@scylladb.com>
This patch adds additional tests for Alternator Streams, which helped uncover 9 new issues. The stream tests are noticibly slower than most other Alternator tests - test_streams.py now has 27 tests taking a total of 20 seconds. Much of this slowness is attributed to Alternator Stream's 512 "shards" per stream in the single-node test setup with 256 vnodes, meaning that we need over 1000 API requests per test using GetRecords. These tests could be made significantly faster (as little as 4 seconds) by setting a lower number of vnodes. Issue #6979 is about doing this in the future. The tests in this patch have comments explaining clearly (I hope) what they test, and also pointing to issues I opened about the problems discovered through these tests. In particular, the tests reproduce the following bugs: Refs #6918 Refs #6926 Refs #6930 Refs #6933 Refs #6935 Refs #6939 Refs #6942 The tests also work around the following issues (and can be changed to be more strict and reproduce these issues): Refs #6918 Refs #6931 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200804154755.1461309-1-nyh@scylladb.com>
pushing this out till we get some feedback |
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_streams_Record.html#DDB-Type-streams_Record-eventName explains that DynamoDB assigns to each change record one of three types:
This description has some interesting implications:
PutItem
orUpdateItem
can be reported as either MODIFY or INSERT, depending on whether the item previously existed.PutItem
andUpdateItem
, if the pre-existing item is identical to the new item, no change event is produced at all!DeleteItem
of a non-existant item doesn't produce any change event at all.Alternator does not currently implement any of these finer distinctions. All updates will be marked MODIFY, all deletes (whether or not an item existed) marked as REMOVE. This is a known issue (there is a comment about it in
executor::get_records
).xfailing
test/alternator
test cases:test_streams.py
::test_streams_updateitem_keys_only
test_streams.py
::test_streams_1_keys_only
test_streams.py
::test_streams_1_new_image
test_streams.py
::test_streams_1_old_image
test_streams.py
::test_streams_1_new_and_old_images
Implementing these distinctions in CDC will require it to do a read-before-write. But there are two serious problems to consider:
KEYS_ONLY
case, where CDC doesn't do a read-before-write.CC @elcallio
The text was updated successfully, but these errors were encountered: