Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternator: wrong ordering of bytes attributes with "negative" bytes #6573

Closed
nyh opened this issue Jun 2, 2020 · 3 comments
Closed

Alternator: wrong ordering of bytes attributes with "negative" bytes #6573

nyh opened this issue Jun 2, 2020 · 3 comments
Assignees
Labels
area/alternator Alternator related Issues bug

Comments

@nyh
Copy link
Contributor

nyh commented Jun 2, 2020

DynamoDB documentation defines (see https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Condition.html) the order of byte arrays assuming they are unsigned bytes.

Unfortunately, Alternator in conditions.cc's check_compare() does

return cmp(base64_decode(kv1.value), base64_decode(kv2.value));

But base64_decode() returns Scylla's bytes type, which is an array of signed bytes... This means that Scylla sorts byte 128 as coming before byte 127, instead of after as it should.
In addition to check_compare(), similar buggy code also exists in check_BETWEEN().

This wrong type results in incorrect comparison results in Expected or ConditionalExpression conditional operations (unfortunately we don't have a test for the negative bytes for those!), and also results in incorrect comparisons in uncommitted QueryFilter code (fortunately, we do have a test there also for the negative bytes case).

It's also possible that KeyConditions / KeyConditionExpression works incorrectly with bytes as sort key with operations like < or BETWEEN and when there are negative bytes. We also need a test for that case (and if broken, fixing this would require different work than fixing the issues above...)

@nyh nyh added bug area/alternator Alternator related Issues labels Jun 2, 2020
@nyh nyh self-assigned this Jun 2, 2020
@nyh
Copy link
Contributor Author

nyh commented Jun 3, 2020

Happily, the KeyConditions / KeyConditionExpression do not have this bug - the sort keys are already sorted in correct unsigned order and range queries on them work as needed, and tests I wrote for this pass successfully on Alternator. Apparently, Cassandra's sort ordering for blobs is also as unsigned which is why we already did this correctly, but I can't find where this is documented.

I confirmed we do have the bug with Expected, ConditionExpression, QueryFilter and ScanFilter, though, and I'm preparing a patch.

@avikivity
Copy link
Member

@nyh please evaluate for backport

nyh added a commit that referenced this issue Aug 26, 2020
We implemented the order operators (LT, GT, LE, GE, BETWEEN) incorrectly
for binary attributes: DynamoDB requires that the bytes be treated as
unsigned for the purpose of order (so byte 128 is higher than 127), but
our implementation uses Scylla's "bytes" type which has signed bytes.

The solution is simple - we can continue to use the "bytes" type, but
we need to use its compare_unsigned() function, not its "<" operator.

This bug affected conditional operations ("Expected" and
"ConditionExpression") and also filters ("QueryFilter", "ScanFilter",
"FilterExpression"). The bug did *not* affect Query's key conditions
("KeyConditions", "KeyConditionExpression") because those already
used Scylla's key comparison functions - which correctly compare binary
blobs as unsigned bytes (in fact, this is why we have the
compare_unsigned() function).

The patch also adds tests that reproduce the bugs in conditional
operations, and show that the bug did not exist in key conditions.

Fixes #6573

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200603084257.394136-1-nyh@scylladb.com>
(cherry picked from commit f6b1f45)
Manually removed tests in test_key_conditions.py that did not exist in this branch
nyh added a commit that referenced this issue Aug 26, 2020
We implemented the order operators (LT, GT, LE, GE, BETWEEN) incorrectly
for binary attributes: DynamoDB requires that the bytes be treated as
unsigned for the purpose of order (so byte 128 is higher than 127), but
our implementation uses Scylla's "bytes" type which has signed bytes.

The solution is simple - we can continue to use the "bytes" type, but
we need to use its compare_unsigned() function, not its "<" operator.

This bug affected conditional operations ("Expected" and
"ConditionExpression") and also filters ("QueryFilter", "ScanFilter",
"FilterExpression"). The bug did *not* affect Query's key conditions
("KeyConditions", "KeyConditionExpression") because those already
used Scylla's key comparison functions - which correctly compare binary
blobs as unsigned bytes (in fact, this is why we have the
compare_unsigned() function).

The patch also adds tests that reproduce the bugs in conditional
operations, and show that the bug did not exist in key conditions.

Fixes #6573

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200603084257.394136-1-nyh@scylladb.com>
(cherry picked from commit f6b1f45)
Manyally removed tests in test_key_conditions.py which didn't exist in this branch.
@nyh
Copy link
Contributor Author

nyh commented Aug 26, 2020

Already in branch-4.2.
Now backported to next-4.1(5f48444) and next-4.0 (1935f2b).
In both branches I had to remove the tests in test_key_conditions.py, a file which didn't exist in in those branches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/alternator Alternator related Issues bug
Projects
None yet
Development

No branches or pull requests

3 participants