Hi,
I was dealing with having to parse json returned from ChatGPT and was looking for some kind of best effort json parser, which is how I found this.
Good work on doing this!
I noticed that some of the unit tests currently fail, for example parsing "12." to 12. test_incomplete_string also fails, because no exception is raised. That made me wonder why an exception would even be the expected behaviour there. Shouldn't a string that doesn't end with a " still be parsed?
Dealing with incomplete json is only one of the issues, and actually not my main issue.
I found that often ChatGPT would include newlines.
For this I found you can use
import json
json_decoder = json.JSONDecoder(strict=False)
json_decoder.decode(s)
This preserves the newlines inside string values. Raw newlines in strings are not legal JSON, but that is what we want here.
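To make the difference concrete, here is a small sketch comparing the default decoder with strict=False. The sample input and the helper name parse_lenient are made up for illustration and are not part of this library:

```python
import json

# Hypothetical model reply: the "\n" here is a real newline character
# inside the JSON string value, not an escaped "\\n".
raw = '{"answer": "first line\nsecond line"}'


def parse_lenient(text):
    """Decode JSON while allowing control characters inside strings."""
    return json.JSONDecoder(strict=False).decode(text)


# The default (strict) decoder rejects the raw newline.
try:
    json.loads(raw)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

# The lenient decoder accepts it and keeps the newline in the value.
result = parse_lenient(raw)
```

Note that strict=False only relaxes the rule about control characters inside strings. Truncated or otherwise malformed input still raises json.JSONDecodeError, so it does not replace a best-effort parser.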
There's also a similar project for javascript:
https://github.com/beenotung/best-effort-json-parser
And an old Python-based JSON parser that's supposedly also able to parse somewhat illegal JSON:
https://pypi.org/project/demjson/
This one is outdated and doesn't work with newer Python versions.
There may be some ideas for features or edge cases in those projects to help you improve this library!
It would be nice to have a very robust parser for GPT-produced JSON.