Within a string that contains an unescaped quote followed at a later point by a comma, the string is truncated at the second `"` character of the unescaped quote pair. If this string is at the end of the JSON object and is not immediately followed by `}` (i.e. it is followed by whitespace or, e.g., a comma), the final word of the string is also parsed as a key with an empty string value.
This seems related to #44, but the attempted fix for that report apparently did not fully resolve this case.
How to reproduce
(Note: I've formatted the recovered/output JSON to make it more readable.)
For
```
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum"}')
```
the recovered JSON is:
```
{
  "lorem": "Lorem \"ipsum"
}
```
For any of the following examples
```
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum" }')
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum"\n}')
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum",}')
```
the recovered JSON is:
```
{
  "lorem": "Lorem \"ipsum",
  "laborum": ""
}
```
With the comma removed, the output is what we'd expect:
```
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint suntid est laborum"}')
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint suntid est laborum" }')
```
both yield
```
{
  "lorem": "Lorem \"ipsum\" excepteur sint suntid est laborum"
}
```
Expected behavior
```
>>> print(repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum"}'))
{"lorem": "Lorem \"ipsum\" excepteur sint, suntid est laborum"}
>>> print(repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum" }'))
{"lorem": "Lorem \"ipsum\" excepteur sint, suntid est laborum"}
```
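The outputs above were reformatted for readability, so comparing raw strings is fragile. Below is a minimal sketch that uses only the standard-library `json` module. It checks a repaired result by comparing the *parsed* object against the value the broken input was meant to carry. It also makes the data loss in the buggy output concrete. `check_repair` and `INTENDED` are hypothetical names introduced here. They are not part of json_repair.

```python
import json

# Hypothetical constant: the value the broken input was meant to carry
# (inner quotes kept, comma kept).
INTENDED = 'Lorem "ipsum" excepteur sint, suntid est laborum'


def check_repair(repaired: str, key: str = "lorem") -> bool:
    """Hypothetical helper: True if `repaired` parses to exactly {key: INTENDED}.

    Comparing parsed objects instead of raw strings ignores formatting
    differences such as indentation or spacing.
    """
    try:
        obj = json.loads(repaired)
    except json.JSONDecodeError:
        return False
    return obj == {key: INTENDED}


# Expected output from the issue (JSON escapes the inner quotes).
expected = '{"lorem": "Lorem \\"ipsum\\" excepteur sint, suntid est laborum"}'
# Output reported for the trailing-whitespace / trailing-comma variants.
buggy = '{"lorem": "Lorem \\"ipsum", "laborum": ""}'

print(check_repair(expected))  # True
print(check_repair(buggy))     # False
print(json.loads(buggy))       # {'lorem': 'Lorem "ipsum', 'laborum': ''}
```

A check like this would accept both the compact expected output and an indented version of it.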
This was a tough one, because the library is actually behaving as designed. I found a workaround, which I am releasing now, but it is an unstable equilibrium when it comes to wrong delimiters, because there are a million corner cases that can go wrong. Nonetheless, the solution seems to work and passes all tests.
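The point about wrong delimiters can be illustrated with a simple lookahead rule. This is a sketch under stated assumptions, *not* the workaround shipped in json_repair, which is not described here. `find_closing_quote`, `_skip_ws` and `repair_single_value` are hypothetical names. The assumed rule is as follows. A `"` inside a string value closes it only if the next non-whitespace character is `}`, `]` or end of input. It also closes the value if that character is a `,` that is itself followed by `"`, `}`, `]` or end of input. Every other quote is treated as part of the text.

```python
import json


def _skip_ws(text: str, i: int) -> int:
    """Return the index of the first non-whitespace character at or after i."""
    while i < len(text) and text[i].isspace():
        i += 1
    return i


def find_closing_quote(text: str, start: int) -> int:
    """Hypothetical heuristic: index of the '"' that most plausibly closes a
    string value whose contents begin at text[start]; -1 if none qualifies."""
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character, including an escaped quote
            continue
        if ch == '"':
            j = _skip_ws(text, i + 1)
            nxt = text[j] if j < len(text) else ""
            if nxt in ("", "}", "]"):
                return i  # quote followed by end of object/array/input
            if nxt == ",":
                k = _skip_ws(text, j + 1)
                after = text[k] if k < len(text) else ""
                if after in ("", '"', "}", "]"):
                    return i  # comma that looks like a real separator
        i += 1
    return -1


def repair_single_value(text: str) -> str:
    """Hypothetical helper: repair an object shaped like {"key": "value"} whose
    value may contain unescaped quotes, then re-serialize it."""
    key_start = text.index('"') + 1
    key_end = text.index('"', key_start)
    value_start = text.index('"', text.index(":", key_end)) + 1
    value_end = find_closing_quote(text, value_start)
    if value_end == -1:
        raise ValueError("no plausible closing quote found")
    # json.dumps escapes the inner quotes, producing valid JSON.
    return json.dumps({text[key_start:key_end]: text[value_start:value_end]})


for broken in (
    '{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum"}',
    '{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum" }',
    '{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum"\n}',
    '{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum",}',
):
    # Each prints {"lorem": "Lorem \"ipsum\" excepteur sint, suntid est laborum"}
    print(repair_single_value(broken))
```

The rule is still fragile. For `{"lorem": "Lorem "ipsum", "sint"}` it closes the value right after `ipsum`, because `", "` is indistinguishable from a real separator. This is the kind of corner case that makes any such heuristic an unstable equilibrium.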
Version of the library
0.19.2