Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Strings containing unescaped quotes followed by commas are incorrectly truncated #46

Closed
bwest2397 opened this issue May 24, 2024 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@bwest2397
Copy link

Version of the library

0.19.2

Describe the bug

Within a string with an unescaped quote followed at a later point by a comma, the string gets truncated after the second " character in the unescaped quote within the string. If this string is at the end of the JSON object and the string is not immediately followed by } (i.e. is followed by whitespace or e.g. a comma), then the final word in the string is parsed as a key with an empty (string) value.

This seems to relate to #44, but it seems the attempted fix for that bug report didn't fully resolve this.

How to reproduce

(Note, I've formatted the recovered/output JSON just to make it more readable)

For

>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum"}')

the recovered JSON is:

{
  "lorem": "Lorem \"ipsum"
}

For any of the following examples

>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum" }')
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum"\n}')
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum",}')

the recovered JSON is:

{
  "lorem": "Lorem \"ipsum",
  "laborum": ""
}

Removing the comma, the output matches what we'd expect:

>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint suntid est laborum"}')
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint suntid est laborum" }')

yields

{
  "lorem": "Lorem \"ipsum\" excepteur sint suntid est laborum"
}

Expected behavior

>>> print(repair_json('{"lorem": "Lorem "ipsum" excepteur sint suntid est laborum"}'))
{"lorem": "Lorem \"ipsum\" excepteur sint, suntid est laborum"}

>>> print(repair_json('{"lorem": "Lorem "ipsum" excepteur sint suntid est laborum" }'))
{"lorem": "Lorem \"ipsum\" excepteur sint, suntid est laborum"}
@bwest2397 bwest2397 added the bug Something isn't working label May 24, 2024
@mangiucugna
Copy link
Owner

This was tough because the library is actually acting as expected, I found a workaround that I am releasing now but is an unstable equilibrium when it comes to wrong delimiters because there are a million corner cases that can go wrong. Nonetheless the solution I found seems to be working and passes all tests.

@bwest2397
Copy link
Author

Awesome, thanks! I tested the new release with some samples I had and they seem to work 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants