Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fromJson() fails to set a map<ascii, int> #7949

Closed
nyh opened this issue Jan 21, 2021 · 1 comment
Closed

fromJson() fails to set a map<ascii, int> #7949

nyh opened this issue Jan 21, 2021 · 1 comment

Comments

@nyh
Copy link
Contributor

nyh commented Jan 21, 2021

The fromJson() function can take a map JSON and use it to set a map column. For example, if we have a map<text, int> column m, the following test works well in coverting the JSON {"a": 1, "b": 2} to a value for this column:

stmt = cql.prepare(f"INSERT INTO {table} (p, m) VALUES (?, fromJson(?))")
cql.execute(stmt, [p, '{"a": 1, "b": 2}'])
assert list(cql.execute(f"SELECT p, m from {table} where p = {p}")) == [(p, {'a': 1, 'b': 2})]

However, we have a bug if the column type is a map<ascii, int> - note the ascii instead of text. Although the strings "a" and "b" above are perfectly legal ASCII, Scylla fails the statement execution, with the error:

E   cassandra.FunctionFailure: Error from server: code=1400 [User Defined Function failure] message="execution of fromJson failed: Failed parsing fromJson parameter: Parsing JSON failed: Invalid value."

The same thing works correctly in Cassandra, and is in fact checked by the Cassandra test cassandra_tests/validation/entities/json_test.py::testFromJsonFct - which fails on Scylla on this issue. I also created separate tests for this issue (based on the example code above) in test_json.py::test_fromjson_map_ascii_unprepared and test_fromjson_map_ascii_prepared.

The "Parsing JSON failed" error is a red herring: This is a perfectly valid JSON, and moreover, exactly the same statement works if the type of the column is map<text,int> instead of map<ascii,int> - and it parses exactly the same JSON without any complains. I'm guessing that the ASCII validation is incorrectly applied to the correctly-parsed JSON, and when this validation (wrongly) fails, we produce a misleading error message.

As part of fixing this issue we should also improve this error message - which may (?) be visible in other cases as well and mislead us again.
When we discover the cause for this bug, we should consider whether it is specific to ascii (why?) and map - or maybe a similar bug applies to other key types? To value types? To other collection types?

psarna pushed a commit that referenced this issue Jan 22, 2021
The fromJson() function can take a map JSON and use it to set a map column.
However, the specific example of a map<ascii, int> doesn't work in Scylla
(it does work in Cassandra). The xfailing tests in this patch demonstrate
this. Although the tests use perfectly legal ASCII, scylla fails the
fromJson() function, with a misleading error.

Refs #7949.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210121233855.100640-1-nyh@scylladb.com>
syuu1228 pushed a commit to syuu1228/scylla that referenced this issue Jan 25, 2021
The fromJson() function can take a map JSON and use it to set a map column.
However, the specific example of a map<ascii, int> doesn't work in Scylla
(it does work in Cassandra). The xfailing tests in this patch demonstrate
this. Although the tests use perfectly legal ASCII, scylla fails the
fromJson() function, with a misleading error.

Refs scylladb#7949.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210121233855.100640-1-nyh@scylladb.com>
@slivne slivne added this to the 4.5 milestone Jan 25, 2021
avikivity pushed a commit that referenced this issue Feb 16, 2021
In this patch, we port validation/entities/json_test.java, containing
21 tests for various JSON-related operations - SELECT JSON, INSERT JSON,
and the fromJson() and toJson() functions.

In porting these tests, I uncovered 19 (!!) previously unknown bugs in
Scylla:

Refs #7911: Failed fromJson() should result in FunctionFailure error, not
            an internal error.
Refs #7912: fromJson() should allow null parameter.
Refs #7914: fromJson() integer overflow should cause an error, not silent
            wrap-around.
Refs #7915: fromJson() should accept "true" and "false" also as strings.
Refs #7944: fromJson() should not accept the empty string "" as a number.
Refs #7949: fromJson() fails to set a map<ascii, int>.
Refs #7954: fromJson() fails to set null tuple elements.
Refs #7972: toJson() truncates some doubles to integers.
Refs #7988: toJson() produces invalid JSON for columns with "time" type.
Refs #7997: toJson() is missing a timezone on timestamp.
Refs #8001: Documented unit "µs" not supported for assigning a "duration"
            type.
Refs #8002: toJson() of decimal type doesn't use exponents so can produce
            huge output.
Refs #8077: SELECT JSON output for function invocations should be
            compatible with Cassandra.
Refs #8078: SELECT JSON ignores the "AS" specification.
Refs #8085: INSERT JSON with bad arguments should yield InvalidRequest
            error, not internal error.
Refs #8086: INSERT JSON cannot handle user-defined types with case-
            sensitive component names.
Refs #8087: SELECT JSON incorrectly quotes strings inside map keys.
Refs #8092: SELECT JSON missing null component after adding field to
            UDT definition.
Refs #8100: SELECT JSON with IN and ORDER BY does not obey the ORDER BY.

Due to these bugs, 9 out of the 21 tests here currently xfail.

As usual in these sort of tests, all 21 tests pass when running against
Cassandra.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210216154858.1172313-1-nyh@scylladb.com>
avikivity pushed a commit that referenced this issue Feb 18, 2021
In this patch, we port validation/entities/json_test.java, containing
21 tests for various JSON-related operations - SELECT JSON, INSERT JSON,
and the fromJson() and toJson() functions.

In porting these tests, I uncovered 19 (!!) previously unknown bugs in
Scylla:

Refs #7911: Failed fromJson() should result in FunctionFailure error, not
            an internal error.
Refs #7912: fromJson() should allow null parameter.
Refs #7914: fromJson() integer overflow should cause an error, not silent
            wrap-around.
Refs #7915: fromJson() should accept "true" and "false" also as strings.
Refs #7944: fromJson() should not accept the empty string "" as a number.
Refs #7949: fromJson() fails to set a map<ascii, int>.
Refs #7954: fromJson() fails to set null tuple elements.
Refs #7972: toJson() truncates some doubles to integers.
Refs #7988: toJson() produces invalid JSON for columns with "time" type.
Refs #7997: toJson() is missing a timezone on timestamp.
Refs #8001: Documented unit "µs" not supported for assigning a "duration"
            type.
Refs #8002: toJson() of decimal type doesn't use exponents so can produce
            huge output.
Refs #8077: SELECT JSON output for function invocations should be
            compatible with Cassandra.
Refs #8078: SELECT JSON ignores the "AS" specification.
Refs #8085: INSERT JSON with bad arguments should yield InvalidRequest
            error, not internal error.
Refs #8086: INSERT JSON cannot handle user-defined types with case-
            sensitive component names.
Refs #8087: SELECT JSON incorrectly quotes strings inside map keys.
Refs #8092: SELECT JSON missing null component after adding field to
            UDT definition.
Refs #8100: SELECT JSON with IN and ORDER BY does not obey the ORDER BY.

Due to these bugs, 8 out of the 21 tests here currently xfail and one
has to be skipped (issue #8100 causes the sanitizer to detect a use
after free, and crash Scylla).

As usual in these sort of tests, all 21 tests pass when running against
Cassandra.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210217130732.1202811-1-nyh@scylladb.com>
@slivne slivne modified the milestones: 4.5, 4.6 Mar 29, 2021
@slivne slivne modified the milestones: 4.6, 4.7 Nov 10, 2021
@DoronArazii DoronArazii modified the milestones: 5.0, 5.x Oct 12, 2022
michaelhly added a commit to michaelhly/scylladb that referenced this issue Sep 20, 2023
For column type of map<ascii, int>, we don't want to parse the map keys as JSON.
Because ascii map keys are not valid JSON. Fixes: scylladb#7949

Signed-off-by: michaelhly <michaelhly@gmail.com>
michaelhly added a commit to michaelhly/scylladb that referenced this issue Sep 20, 2023
For column type of map<ascii, int>, we don't want to parse the map keys as JSON.
Because ascii map keys are not valid JSON. Fixes: scylladb#7949

Signed-off-by: michaelhly <michaelhly@gmail.com>
michaelhly added a commit to michaelhly/scylladb that referenced this issue Sep 20, 2023
For column type of map<ascii, int>, we don't want to parse the map keys as JSON.
Because ascii map keys are not valid JSON. Fixes: scylladb#7949

Signed-off-by: michaelhly <michaelhly@gmail.com>
michaelhly added a commit to michaelhly/scylladb that referenced this issue Sep 20, 2023
For column type of map<ascii, int>, we don't want to parse the map keys as JSON.
Because ascii characters are not valid JSON. Fixes: scylladb#7949

Signed-off-by: michaelhly <michaelhly@gmail.com>
michaelhly added a commit to michaelhly/scylladb that referenced this issue Sep 20, 2023
For column type of map<ascii, int>, we don't want to parse the map keys as JSON.
Because ascii characters are not valid JSON. Fixes: scylladb#7949

Signed-off-by: michaelhly <michaelhly@gmail.com>
michaelhly added a commit to michaelhly/scylladb that referenced this issue Sep 21, 2023
For JSON objects represented as map<ascii, int>, don't treat ASCII keys
as a nested JSON string. We were doing that prior to the patch, which
led to parsing errors.

Fixes: scylladb#7949

Signed-off-by: michaelhly <michaelhly@gmail.com>
michaelhly added a commit to michaelhly/scylladb that referenced this issue Sep 21, 2023
For JSON objects represented as map<ascii, int>, don't treat ASCII keys
as a nested JSON string. We were doing that prior to the patch, which
led to parsing errors.

Fixes: scylladb#7949

Signed-off-by: michaelhly <michaelhly@gmail.com>
michaelhly added a commit to michaelhly/scylladb that referenced this issue Sep 21, 2023
For JSON objects represented as map<ascii, int>, don't treat ASCII keys
as a nested JSON string. We were doing that prior to the patch, which
led to parsing errors.

Included the error offset where JSON parsing failed for
rjson::parse related functions to help identify parsing errors
better.

Fixes: scylladb#7949

Signed-off-by: Michael Huang <michaelhly@gmail.com>
michaelhly added a commit to michaelhly/scylladb that referenced this issue Sep 22, 2023
For JSON objects represented as map<ascii, int>, don't treat ASCII keys
as a nested JSON string. We were doing that prior to the patch, which
led to parsing errors.

Included the error offset where JSON parsing failed for
rjson::parse related functions to help identify parsing errors
better.

Fixes: scylladb#7949

Signed-off-by: Michael Huang <michaelhly@gmail.com>
denesb pushed a commit that referenced this issue Dec 18, 2023
For JSON objects represented as map<ascii, int>, don't treat ASCII keys
as a nested JSON string. We were doing that prior to the patch, which
led to parsing errors.

Included the error offset where JSON parsing failed for
rjson::parse related functions to help identify parsing errors
better.

Fixes: #7949

Signed-off-by: Michael Huang <michaelhly@gmail.com>

Closes #15499

(cherry picked from commit 75109e9)
@denesb
Copy link
Contributor

denesb commented Dec 18, 2023

Backported to 5.2.

nyh added a commit to nyh/scylla that referenced this issue Apr 30, 2024
More than three years ago, in issue scylladb#7949, we noticed that trying to
set a `map<ascii, int>` from JSON input (i.e., using INSERT JSON or the
fromJson() function) fails - the ascii key is incorrectly parsed.
We fixed that issue in commit 75109e9
but unfortunately, did not do our due diligence: We did not write enough
tests inspired by this bug, and failed to discover that actually we have
the same bug for many other key types, not just for "ascii". Specifically,
the following key types have exactly the same bug:

  * blob
  * date
  * inet
  * time
  * timestamp
  * timeuuid
  * uuid

Other types, like numbers or boolean worked "by accident" - instead of
parsing them as a normal string, we asked the JSON parser to parse them
again after removing the quotes, and because unquoted numbers and
unquoted true/false happwn to work in JSON, this didn't fail.

The fix here is very simple - for all *native* types (i.e., not
collections or tuples), the encoding of the key in JSON is simply a
quoted string - and removing the quotes is all we need to do and there's
no need to run the JSON parser a second time. Only for more elaborate
types - collections and tuples - we need to run the JSON parser a
second time on the key string to build the more elaborate object.

This patch also includes tests for fromJson() reading a map with all
native key types, confirming that all the aforementioned key types
were broken before this patch, and all key types (including the numbers
and booleans which worked even befoe this patch) work with this patch.

Fixes scylladb#18477.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
avikivity pushed a commit that referenced this issue May 5, 2024
More than three years ago, in issue #7949, we noticed that trying to
set a `map<ascii, int>` from JSON input (i.e., using INSERT JSON or the
fromJson() function) fails - the ascii key is incorrectly parsed.
We fixed that issue in commit 75109e9
but unfortunately, did not do our due diligence: We did not write enough
tests inspired by this bug, and failed to discover that actually we have
the same bug for many other key types, not just for "ascii". Specifically,
the following key types have exactly the same bug:

  * blob
  * date
  * inet
  * time
  * timestamp
  * timeuuid
  * uuid

Other types, like numbers or boolean worked "by accident" - instead of
parsing them as a normal string, we asked the JSON parser to parse them
again after removing the quotes, and because unquoted numbers and
unquoted true/false happwn to work in JSON, this didn't fail.

The fix here is very simple - for all *native* types (i.e., not
collections or tuples), the encoding of the key in JSON is simply a
quoted string - and removing the quotes is all we need to do and there's
no need to run the JSON parser a second time. Only for more elaborate
types - collections and tuples - we need to run the JSON parser a
second time on the key string to build the more elaborate object.

This patch also includes tests for fromJson() reading a map with all
native key types, confirming that all the aforementioned key types
were broken before this patch, and all key types (including the numbers
and booleans which worked even befoe this patch) work with this patch.

Fixes #18477.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #18482
mergify bot pushed a commit that referenced this issue May 5, 2024
More than three years ago, in issue #7949, we noticed that trying to
set a `map<ascii, int>` from JSON input (i.e., using INSERT JSON or the
fromJson() function) fails - the ascii key is incorrectly parsed.
We fixed that issue in commit 75109e9
but unfortunately, did not do our due diligence: We did not write enough
tests inspired by this bug, and failed to discover that actually we have
the same bug for many other key types, not just for "ascii". Specifically,
the following key types have exactly the same bug:

  * blob
  * date
  * inet
  * time
  * timestamp
  * timeuuid
  * uuid

Other types, like numbers or boolean worked "by accident" - instead of
parsing them as a normal string, we asked the JSON parser to parse them
again after removing the quotes, and because unquoted numbers and
unquoted true/false happwn to work in JSON, this didn't fail.

The fix here is very simple - for all *native* types (i.e., not
collections or tuples), the encoding of the key in JSON is simply a
quoted string - and removing the quotes is all we need to do and there's
no need to run the JSON parser a second time. Only for more elaborate
types - collections and tuples - we need to run the JSON parser a
second time on the key string to build the more elaborate object.

This patch also includes tests for fromJson() reading a map with all
native key types, confirming that all the aforementioned key types
were broken before this patch, and all key types (including the numbers
and booleans which worked even befoe this patch) work with this patch.

Fixes #18477.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 7821bba)
mergify bot pushed a commit that referenced this issue May 5, 2024
More than three years ago, in issue #7949, we noticed that trying to
set a `map<ascii, int>` from JSON input (i.e., using INSERT JSON or the
fromJson() function) fails - the ascii key is incorrectly parsed.
We fixed that issue in commit 75109e9
but unfortunately, did not do our due diligence: We did not write enough
tests inspired by this bug, and failed to discover that actually we have
the same bug for many other key types, not just for "ascii". Specifically,
the following key types have exactly the same bug:

  * blob
  * date
  * inet
  * time
  * timestamp
  * timeuuid
  * uuid

Other types, like numbers or boolean worked "by accident" - instead of
parsing them as a normal string, we asked the JSON parser to parse them
again after removing the quotes, and because unquoted numbers and
unquoted true/false happwn to work in JSON, this didn't fail.

The fix here is very simple - for all *native* types (i.e., not
collections or tuples), the encoding of the key in JSON is simply a
quoted string - and removing the quotes is all we need to do and there's
no need to run the JSON parser a second time. Only for more elaborate
types - collections and tuples - we need to run the JSON parser a
second time on the key string to build the more elaborate object.

This patch also includes tests for fromJson() reading a map with all
native key types, confirming that all the aforementioned key types
were broken before this patch, and all key types (including the numbers
and booleans which worked even befoe this patch) work with this patch.

Fixes #18477.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 7821bba)

# Conflicts:
#	test/cql-pytest/test_json.py
nyh added a commit that referenced this issue May 5, 2024
More than three years ago, in issue #7949, we noticed that trying to
set a `map<ascii, int>` from JSON input (i.e., using INSERT JSON or the
fromJson() function) fails - the ascii key is incorrectly parsed.
We fixed that issue in commit 75109e9
but unfortunately, did not do our due diligence: We did not write enough
tests inspired by this bug, and failed to discover that actually we have
the same bug for many other key types, not just for "ascii". Specifically,
the following key types have exactly the same bug:

  * blob
  * date
  * inet
  * time
  * timestamp
  * timeuuid
  * uuid

Other types, like numbers or boolean worked "by accident" - instead of
parsing them as a normal string, we asked the JSON parser to parse them
again after removing the quotes, and because unquoted numbers and
unquoted true/false happwn to work in JSON, this didn't fail.

The fix here is very simple - for all *native* types (i.e., not
collections or tuples), the encoding of the key in JSON is simply a
quoted string - and removing the quotes is all we need to do and there's
no need to run the JSON parser a second time. Only for more elaborate
types - collections and tuples - we need to run the JSON parser a
second time on the key string to build the more elaborate object.

This patch also includes tests for fromJson() reading a map with all
native key types, confirming that all the aforementioned key types
were broken before this patch, and all key types (including the numbers
and booleans which worked even befoe this patch) work with this patch.

Fixes #18477.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 21557cf)

Closes #18522
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants