
test complex_data_types_test failed with error unpack requires a string argument of length 8 #4348

Closed
fgelcer opened this issue Mar 20, 2019 · 12 comments

@fgelcer

fgelcer commented Mar 20, 2019

I'm running it locally on my Ubuntu 18.10, using Scylla scylla-0.10-11550-gc7d05b88a.

When trying to verify the fix for ticket #3662, I got this exception (even when running inside cqlsh):

Traceback (most recent call last):
  File "/home/fabio/git/scylla-tools-java/bin/cqlsh.py", line 1053, in perform_simple_statement
    result = future.result()
  File "/home/fabio/git/scylla-tools-java/bin/../lib/cassandra-driver-internal-only-3.11.0-bb96859b.zip/cassandra-driver-3.11.0-bb96859b/cassandra/cluster.py", line 3925, in result
    raise self._final_exception
DriverException: Failed decoding result column "many_sinks" of type list<frozen<t_kitchen_sink>>: unpack requires a string argument of length 8
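The message in the title comes from Python's struct module inside the driver: decoding a fixed-width field fails because the serialized cell is shorter than declared. A minimal sketch of that failure mode (the wording "unpack requires a string argument of length 8" is Python 2's; Python 3 phrases it as a buffer of 8 bytes):

```python
import struct

# An 8-byte big-endian field (e.g. a bigint) decodes fine when all bytes are there.
good = b"\x00\x00\x00\x00\x00\x00\x00\x07"
(value,) = struct.unpack(">q", good)
print(value)  # 7

# A truncated cell, as produced by the serialization bug, blows up at this point.
truncated = b"\x00\x00\x00\x05"
try:
    struct.unpack(">q", truncated)
except struct.error as e:
    print("decode failed:", e)
```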

The following SELECT, though, succeeded:

SELECT key1, many_sinks from complex_types where key1 = 'row1' ;

 key1 | many_sinks
------+------------
 row1 | [{item1: 'asdf', item2: 0x0012, item3: '127.0.0.2', item4: 'whatev1', item5: '2012-02-03 04:05:00.000000+0000', item6: d05a10c8-7c12-11e4-949d-b4b6763e9d6f, item7: f90b04b1-f9ad-4ffa-b869-a7d894ce6003, item8: 'tyru', item9: -9223372036854771111, item10: 4321.45678, item11: 1.0012e+07, item12: 4.0012e+07, item13: -1147483648, item14: 2047483648, item15: True, item16: [1, 1, 2, 3, 5, 8]}, {item1: 'fdsa', item2: 0x0013, item3: '127.0.0.3', item4: 'whatev2', item5: '2013-02-03 04:05:00.000000+0000', item6: d8ac38c8-7c12-11e4-8955-b4b6763e9d6f, item7: e3e84f21-f28c-4e0f-80e0-068a640ae53a, item8: 'uytr', item9: -3333372036854775808, item10: 1234.12321, item11: 2.0012e+07, item12: 5.0012e+07, item13: -1547483648, item14: 1947483648, item15: False, item16: [3, 6, 9, 12, 15]}, {item1: 'zxcv', item2: 0x0014, item3: '127.0.0.4', item4: 'whatev3', item5: '2014-02-03 04:05:00.000000+0000', item6: de30838a-7c12-11e4-a907-b4b6763e9d6f, item7: f9381f0e-9467-4d4c-9315-eb9f0232487b, item8: 'fghj', item9: -2239372036854775808, item10: 5555.55555, item11: 3.0012e+07, item12: 6.0012e+07, item13: 2147483647, item14: 1347483648, item15: True, item16: [0, 1, 0, 1, 2, 0]}]
@roydahan

To reproduce by dtest - complex_data_types_test

@slivne slivne added this to the 3.1 milestone Mar 24, 2019
@fgelcer
Author

fgelcer commented Mar 25, 2019

To reproduce the issue, run one of these tests (comment out the @require):

  • complex_data_types_test on line 701
  • complex_data_types_test on line 1084
  • complex_schema_test on line 1458

@psarna
Contributor

psarna commented Mar 25, 2019

I narrowed it down to a problem with serializing the decimal type (which is based on varint) to JSON - I can see it's serialized properly in the debug messages, but somehow it's not written properly to sstables - only part of the decimal is written (the scale; the value is somehow not written at all), which causes deserialization errors on reads. I have no idea why that's the case yet. If anything I wrote here rings a bell for somebody, please let me know.

@avikivity
Member

Possibly emptyable<cpp_int>::_is_empty somehow got set to true during the decode process.

@psarna
Contributor

psarna commented Mar 25, 2019

I need to finish this great fun for today. So far I know that the value returned from lists_type_impl::from_json_object() is a correct sequence of bytes representing a valid list of type instances with decimals inside it, but by the time the list values are iterated over in lists::setter, they already contain semi-garbage, and this semi-garbage is later flushed to sstables.

By semi-garbage I mean that bytes that should represent a serialized decimal:

000000080000000501ea810e
[      ][      ][      ]
    ^       ^       ^
    |       |       +-- value
    |       +---------- scale
    +------------------ number of serialized bytes

... are instead just:

0000000800000005

so it declares that 8 bytes are serialized while only 4 follow, which messes up the whole deserialization process.

I'll continue figuring it out tomorrow morning. What's weird is that it's always just the decimal value that's missing; the scale is always there. But maybe that's just a series of coincidences from my attempts so far.
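The byte layout above can be exercised with a small Python sketch. This is a hypothetical decoder written purely for illustration - the field layout (4-byte big-endian length, 4-byte scale, big-endian two's-complement unscaled value) is read off the hexdump above, not taken from Scylla's source:

```python
import struct
from decimal import Decimal

def decode_decimal_cell(buf):
    (length,) = struct.unpack_from(">i", buf, 0)   # number of serialized bytes
    cell = buf[4:4 + length]
    if len(cell) < length:
        raise ValueError("need %d bytes, got %d" % (length, len(cell)))
    (scale,) = struct.unpack_from(">i", cell, 0)   # decimal scale
    unscaled = int.from_bytes(cell[4:], "big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

# the healthy cell from the comment above: scale 5, unscaled 0x01ea810e
print(decode_decimal_cell(bytes.fromhex("000000080000000501ea810e")))  # 321.45678

# the corrupted cell: declares 8 bytes, but only the 4 scale bytes follow
try:
    decode_decimal_cell(bytes.fromhex("0000000800000005"))
except ValueError as e:
    print("decode failed:", e)
```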

@psarna
Contributor

psarna commented Mar 26, 2019

The issue is unfortunately broader than JSON itself: varint serialization did not update the output iterator properly in all cases, so if a varint was embedded in a structure (e.g. a tuple), the next serialized value would overwrite part of its data. I'll push a fix in a couple of minutes once tests are done.
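The mechanism described here can be shown with a toy serializer (Python for brevity; this mirrors the class of bug, not Scylla's actual C++ writer). When a writer fails to return the advanced output position, whatever is serialized next lands on top of the varint's bytes:

```python
def write_be32(buf, pos, value):
    # fixed-width writer that correctly advances the output position
    buf[pos:pos + 4] = value.to_bytes(4, "big", signed=True)
    return pos + 4

def write_varint(buf, pos, value, advance):
    # toy variable-length integer writer; advance=False simulates the bug
    data = value.to_bytes(max(1, (value.bit_length() + 8) // 8), "big", signed=True)
    buf[pos:pos + len(data)] = data
    return pos + len(data) if advance else pos  # buggy path returns a stale position

good = bytearray(8)
end = write_varint(good, 0, 0x01ea810e, advance=True)
write_be32(good, end, 7)
print(good.hex())  # 01ea810e00000007 -- varint intact, next field lands after it

bad = bytearray(8)
end = write_varint(bad, 0, 0x01ea810e, advance=False)
write_be32(bad, end, 7)
print(bad.hex())   # 0000000700000000 -- next field overwrote the varint
```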

psarna added a commit to psarna/scylla that referenced this issue Mar 26, 2019
avikivity added a commit that referenced this issue Mar 26, 2019
"
Fixes #4348

v2 changes:
 * added a unit test

This miniseries fixes decimal/varint serialization - it did not update the
output iterator in all cases, which could lead to overwriting decimal data
if any other value followed it directly in the same buffer (e.g. in a tuple).
It also comes with a reproducing unit test covering both decimals and varints.

Tests: unit (dev)
dtest: json_test.FromJsonUpdateTests.complex_data_types_test
   json_test.FromJsonInsertTests.complex_data_types_test
   json_test.ToJsonSelectTests.complex_data_types_test
"

* 'fix_varint_serialization_2' of https://github.com/psarna/scylla:
  tests: add test for unpacking decimals
  types: fix varint and decimal serialization
avikivity pushed a commit that referenced this issue Mar 26, 2019
Varint and decimal types serialization did not update the output
iterator after generating a value, which may lead to corrupted
sstables - variable-length integers were properly serialized,
but if anything followed them directly in the buffer (e.g. in a tuple),
their value would be overwritten.

Fixes #4348

Tests: unit (dev)
dtest: json_test.FromJsonUpdateTests.complex_data_types_test
       json_test.FromJsonInsertTests.complex_data_types_test
       json_test.ToJsonSelectTests.complex_data_types_test

Note that dtests still do not succeed 100% due to formatting differences
in compared results (e.g. 1.0e+07 vs 1.0E7), but it's no longer a query
correctness issue.

(cherry picked from commit 287a02d)
avikivity pushed a commit that referenced this issue Mar 26, 2019
Varint and decimal types serialization did not update the output
iterator after generating a value, which may lead to corrupted
sstables - variable-length integers were properly serialized,
but if anything followed them directly in the buffer (e.g. in a tuple),
their value would be overwritten.

Fixes #4348

Tests: unit (dev)
dtest: json_test.FromJsonUpdateTests.complex_data_types_test
       json_test.FromJsonInsertTests.complex_data_types_test
       json_test.ToJsonSelectTests.complex_data_types_test

Note that dtests still do not succeed 100% due to formatting differences
in compared results (e.g. 1.0e+07 vs 1.0E7), but it's no longer a query
correctness issue.

(cherry picked from commit 287a02d)
@roydahan

@psarna do you see, or can you think of, similar cases that are missed from our testing?
I would like to add a similar test that reproduces this issue without JSON involved.

@slivne
Contributor

slivne commented Mar 26, 2019 via email

@avikivity
Member

We should test it directly. A test in types_test.cc that iterates over all types and checks that the deserialization works using the updated output iterator.
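Such an exhaustive check could look roughly like this (a Python stand-in for the suggested types_test.cc test; the writers here are illustrative, not Scylla's). For every writer, serialize a value, write a sentinel byte at the position the writer returned, and verify both survive - a writer that returns a stale output position fails the sentinel check:

```python
import struct

def write_i32(buf, pos, v):
    struct.pack_into(">i", buf, pos, v)
    return pos + 4

def write_i64(buf, pos, v):
    struct.pack_into(">q", buf, pos, v)
    return pos + 8

WRITERS = {4: write_i32, 8: write_i64}  # value width in bytes -> writer

def roundtrip_pair(width, v, sentinel=0x5A):
    # Write the value, then a sentinel at the returned position; if the writer
    # failed to advance, the sentinel clobbers the value and the check fails.
    buf = bytearray(width + 1)
    pos = WRITERS[width](buf, 0, v)
    buf[pos] = sentinel
    (back,) = struct.unpack_from(">i" if width == 4 else ">q", buf, 0)
    return back == v and buf[width] == sentinel

assert all(roundtrip_pair(4, v) for v in (0, 1, -1, 2**31 - 1, -2**31))
assert all(roundtrip_pair(8, v) for v in (0, -1, 2**63 - 1, -2**63))
```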

@tzach
Contributor

tzach commented Apr 16, 2019

@fgelcer please provide the actual CQL examples that fail

@fgelcer
Author

fgelcer commented Apr 16, 2019

@tzach, as per the content of complex_data_types_test in json_test.py:

cqlsh("CREATE TYPE t_todo_item (label text, details text)")
cqlsh("CREATE TYPE t_todo_list (name text, todo_list list<frozen<t_todo_item>>)")
cqlsh('''
... CREATE TYPE t_kitchen_sink (
... item1 ascii,
... item2 blob,
... item3 inet,
... item4 text,
... item5 timestamp,
... item6 timeuuid,
... item7 uuid,
... item8 varchar,
... item9 bigint,
... item10 decimal,
... item11 double,
... item12 float,
... item13 int,
... item14 varint,
... item15 boolean,
... item16 list<int> )
... ''')

cqlsh('''
... CREATE TABLE complex_types (
... key1 text PRIMARY KEY,
... mylist list<text>,
... myset set<uuid>,
... mymap map<text, int>,
... mytuple frozen<tuple<text, int, uuid, boolean>>,
... myudt frozen<t_kitchen_sink>,
... mytodolists list<frozen<t_todo_list>>,
... many_sinks list<frozen<t_kitchen_sink>>,
... named_sinks map<text, frozen<t_kitchen_sink>> )
... ''')

Define a row with the complex data types:

cqlsh('''
... INSERT INTO complex_types (key1, mylist, myset, mymap, mytuple, myudt, mytodolists, many_sinks, named_sinks)
... VALUES (
... 'foo',
... ['five', 'six', 'seven', 'eight'],
... {4b66458a-2a19-41d3-af25-6faef4dea9fe, 080fdd90-ae74-41d6-9883-635625d3b069, 6cd7fab5-eacc-45c3-8414-6ad0177651d6},
... {'one' : 1, 'two' : 2, 'three': 3, 'four': 4},
... ('hey', 10, 16e69fba-a656-4932-8a01-6782a34505d9, true),
... {item1: 'heyimascii', item2: 0x0011, item3: '127.0.0.1', item4: 'whatev', item5: '2011-02-03 04:05+0000', item6: 0ad6dfb6-7a6e-11e4-bc39-b4b6763e9d6f, item7: bdf5e8ac-a75e-4321-9ac8-938fc9576c4a, item8: 'bleh', item9: -9223372036854775808, item10: 1234.45678, item11: 98712312.1222, item12: 98712312.5252, item13: -2147483648, item14: 2147483647, item15: false, item16: [1,3,5,7,11,13]},
... [{name: 'stuff to do!', todo_list: [{label: 'buy groceries', details: 'bread and milk'}, {label: 'pick up car from shop', details: '$325 due'}, {label: 'call dave', details: 'for some reason'}]}, {name: 'more stuff to do!', todo_list:[{label: 'buy new car', details: 'the old one is getting expensive'}, {label: 'price insurance', details: 'current cost is $95/mo'}]}],
... [{item1: 'asdf', item2: 0x0012, item3: '127.0.0.2', item4: 'whatev1', item5: '2012-02-03 04:05+0000', item6: d05a10c8-7c12-11e4-949d-b4b6763e9d6f, item7: f90b04b1-f9ad-4ffa-b869-a7d894ce6003, item8: 'tyru', item9: -9223372036854771111, item10: 4321.45678, item11: 10012312.1222, item12: 40012312.5252, item13: -1147483648, item14: 2047483648, item15: true, item16: [1,1,2,3,5,8]}, {item1: 'fdsa', item2: 0x0013, item3: '127.0.0.3', item4: 'whatev2', item5: '2013-02-03 04:05+0000', item6: d8ac38c8-7c12-11e4-8955-b4b6763e9d6f, item7: e3e84f21-f28c-4e0f-80e0-068a640ae53a, item8: 'uytr', item9: -3333372036854775808, item10: 1234.12321, item11: 20012312.1222, item12: 50012312.5252, item13: -1547483648, item14: 1947483648, item15: false, item16: [3,6,9,12,15]},{item1: 'zxcv', item2: 0x0014, item3: '127.0.0.4', item4: 'whatev3', item5: '2014-02-03 04:05+0000', item6: de30838a-7c12-11e4-a907-b4b6763e9d6f, item7: f9381f0e-9467-4d4c-9315-eb9f0232487b, item8: 'fghj', item9: -2239372036854775808, item10: 5555.55555, item11: 30012312.1222, item12: 60012312.5252, item13: 2147483647, item14: 1347483648, item15: true, item16: [0,1,0,1,2,0]}],
... {'namedsink1':{item1: 'asdf', item2: 0x0012, item3: '127.0.0.2', item4: 'whatev1', item5: '2012-02-03 04:05+0000', item6: d05a10c8-7c12-11e4-949d-b4b6763e9d6f, item7: f90b04b1-f9ad-4ffa-b869-a7d894ce6003, item8: 'tyru', item9: -9223372036854771111, item10: 4321.45678, item11: 10012312.1222, item12: 40012312.5252, item13: -1147483648, item14: 2047483648, item15: true, item16: [1,1,2,3,5,8]},'namedsink2':{item1: 'fdsa', item2: 0x0013, item3: '127.0.0.3', item4: 'whatev2', item5: '2013-02-03 04:05+0000', item6: d8ac38c8-7c12-11e4-8955-b4b6763e9d6f, item7: e3e84f21-f28c-4e0f-80e0-068a640ae53a, item8: 'uytr', item9: -3333372036854775808, item10: 1234.12321, item11: 20012312.1222, item12: 50012312.5252, item13: -1547483648, item14: 1947483648, item15: false, item16: [3,6,9,12,15]},'namedsink3':{item1: 'zxcv', item2: 0x0014, item3: '127.0.0.4', item4: 'whatev3', item5: '2014-02-03 04:05+0000', item6: de30838a-7c12-11e4-a907-b4b6763e9d6f, item7: f9381f0e-9467-4d4c-9315-eb9f0232487b, item8: 'fghj', item9: -2239372036854775808, item10: 5555.55555, item11: 30012312.1222, item12: 60012312.5252, item13: 2147483647, item14: 1347483648, item15: true, item16: [0,1,0,1,2,0]}})
... ''')

Query back the JSON (one field at a time, to make it easier to read) and make sure it looks as it should.
Check that the list is returned OK:

cqlsh_print("SELECT toJson(mylist) from complex_types where key1 = 'foo'")

 system.tojson(mylist)
-----------------------------------
 ["five", "six", "seven", "eight"]

(1 rows)

cqlsh_print("SELECT toJson(myset) from complex_types where key1 = 'foo'")

 system.tojson(myset)
--------------------------------------------------------------------------------------------------------------------------
 ["080fdd90-ae74-41d6-9883-635625d3b069", "4b66458a-2a19-41d3-af25-6faef4dea9fe", "6cd7fab5-eacc-45c3-8414-6ad0177651d6"]

(1 rows)

cqlsh_print("SELECT toJson(mymap) from complex_types where key1 = 'foo'")

 system.tojson(mymap)
---------------------------------------------
 {"four": 4, "one": 1, "three": 3, "two": 2}

(1 rows)

cqlsh_print("SELECT toJson(mytuple) from complex_types where key1 = 'foo'")

 system.tojson(mytuple)
-----------------------------------------------------------
 ["hey", 10, "16e69fba-a656-4932-8a01-6782a34505d9", true]

(1 rows)

cqlsh_print("SELECT toJson(myudt) from complex_types where key1 = 'foo'")

 system.tojson(myudt)
------------------------------------------------------------------------------------------------------------------------
 {"item1": "heyimascii", "item2": "0x0011", "item3": "127.0.0.1", "item4": "whatev", "item5": "2011-02-03T04:05:00", "item6": "0ad6dfb6-7a6e-11e4-bc39-b4b6763e9d6f", "item7": "bdf5e8ac-a75e-4321-9ac8-938fc9576c4a", "item8": "bleh", "item9": -9223372036854775808, "item10": 1234.45678, "item11": 9.87123e+07, "item12": 9.87123e+07, "item13": -2147483648, "item14": 2147483647, "item15": false, "item16": [1, 3, 5, 7, 11, 13]}

(1 rows)

cqlsh_print("SELECT toJson(mytodolists) from complex_types where key1 = 'foo'")

 system.tojson(mytodolists)
------------------------------------------------------------------------------------------------------------------------
 [{"name": "stuff to do!", "todo_list": [{"label": "buy groceries", "details": "bread and milk"}, {"label": "pick up car from shop", "details": "$325 due"}, {"label": "call dave", "details": "for some reason"}]}, {"name": "more stuff to do!", "todo_list": [{"label": "buy new car", "details": "the old one is getting expensive"}, {"label": "price insurance", "details": "current cost is $95/mo"}]}]

(1 rows)

cqlsh_print("SELECT toJson(many_sinks) from complex_types where key1 = 'foo'")

 system.tojson(many_sinks)
------------------------------------------------------------------------------------------------------------------------
 [{"item1": "asdf", "item2": "0x0012", "item3": "127.0.0.2", "item4": "whatev1", "item5": "2012-02-03T04:05:00", "item6": "d05a10c8-7c12-11e4-949d-b4b6763e9d6f", "item7": "f90b04b1-f9ad-4ffa-b869-a7d894ce6003", "item8": "tyru", "item9": -9223372036854771111, "item10": 4321.45678, "item11": 1.00123e+07, "item12": 4.00123e+07, "item13": -1147483648, "item14": 2047483648, "item15": true, "item16": [1, 1, 2, 3, 5, 8]}, {"item1": "fdsa", "item2": "0x0013", "item3": "127.0.0.3", "item4": "whatev2", "item5": "2013-02-03T04:05:00", "item6": "d8ac38c8-7c12-11e4-8955-b4b6763e9d6f", "item7": "e3e84f21-f28c-4e0f-80e0-068a640ae53a", "item8": "uytr", "item9": -3333372036854775808, "item10": 1234.12321, "item11": 2.00123e+07, "item12": 5.00123e+07, "item13": -1547483648, "item14": 1947483648, "item15": false, "item16": [3, 6, 9, 12, 15]}, {"item1": "zxcv", "item2": "0x0014", "item3": "127.0.0.4", "item4": "whatev3", "item5": "2014-02-03T04:05:00", "item6": "de30838a-7c12-11e4-a907-b4b6763e9d6f", "item7": "f9381f0e-9467-4d4c-9315-eb9f0232487b", "item8": "fghj", "item9": -2239372036854775808, "item10": 5555.55555, "item11": 3.00123e+07, "item12": 6.00123e+07, "item13": 2147483647, "item14": 1347483648, "item15": true, "item16": [0, 1, 0, 1, 2, 0]}]

(1 rows)

cqlsh_print("SELECT toJson(named_sinks) from complex_types where key1 = 'foo'")

 system.tojson(named_sinks)
------------------------------------------------------------------------------------------------------------------------
 {"namedsink1": {"item1": "asdf", "item2": "0x0012", "item3": "127.0.0.2", "item4": "whatev1", "item5": "2012-02-03T04:05:00", "item6": "d05a10c8-7c12-11e4-949d-b4b6763e9d6f", "item7": "f90b04b1-f9ad-4ffa-b869-a7d894ce6003", "item8": "tyru", "item9": -9223372036854771111, "item10": 4321.45678, "item11": 1.00123e+07, "item12": 4.00123e+07, "item13": -1147483648, "item14": 2047483648, "item15": true, "item16": [1, 1, 2, 3, 5, 8]}, "namedsink2": {"item1": "fdsa", "item2": "0x0013", "item3": "127.0.0.3", "item4": "whatev2", "item5": "2013-02-03T04:05:00", "item6": "d8ac38c8-7c12-11e4-8955-b4b6763e9d6f", "item7": "e3e84f21-f28c-4e0f-80e0-068a640ae53a", "item8": "uytr", "item9": -3333372036854775808, "item10": 1234.12321, "item11": 2.00123e+07, "item12": 5.00123e+07, "item13": -1547483648, "item14": 1947483648, "item15": false, "item16": [3, 6, 9, 12, 15]}, "namedsink3": {"item1": "zxcv", "item2": "0x0014", "item3": "127.0.0.4", "item4": "whatev3", "item5": "2014-02-03T04:05:00", "item6": "de30838a-7c12-11e4-a907-b4b6763e9d6f", "item7": "f9381f0e-9467-4d4c-9315-eb9f0232487b", "item8": "fghj", "item9": -2239372036854775808, "item10": 5555.55555, "item11": 3.00123e+07, "item12": 6.00123e+07, "item13": 2147483647, "item14": 1347483648, "item15": true, "item16": [0, 1, 0, 1, 2, 0]}}

(1 rows)

@ashutosh008verma

Is there a workaround to correct the affected records in the Cassandra table?
