Make Object and JsonElement deserialization iterative #1912

Merged

Conversation

@Marcono1234 (Collaborator) commented Jun 18, 2021

Often when Object or JsonElement is deserialized, the format of the JSON data is unknown and it might come from an untrusted source. To avoid a StackOverflowError from maliciously crafted, deeply nested JSON, deserialize Object and JsonElement iteratively instead of recursively.
Note that in most cases Gson already catches StackOverflowError, so users of current Gson versions would normally not encounter it.

The concept is based on FasterXML/jackson-databind@51fd2fa, but the implementation is not.
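
To illustrate the idea, here is a minimal, self-contained sketch of the technique, not the PR's exact code (for instance, numbers are simplified to plain doubles): the currently open arrays and objects live on an explicit Deque instead of the call stack, so each `[` or `{` pushes the suspended container and each `]` or `}` pops it, and the number of Java stack frames stays constant regardless of nesting depth.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import com.google.gson.JsonNull;
import com.google.gson.JsonObject;
import com.google.gson.JsonPrimitive;
import com.google.gson.stream.JsonReader;

public class IterativeJsonElementReader {

  /** Reads one complete JSON value without recursion. */
  public static JsonElement read(JsonReader in) throws IOException {
    JsonElement current = tryBeginNesting(in);
    if (current == null) {
      return readScalar(in); // top-level value is not a container
    }

    // Enclosing containers whose remaining members still have to be read
    Deque<JsonElement> stack = new ArrayDeque<>();

    while (true) {
      while (in.hasNext()) {
        // A name only exists for JSON object members
        String name = (current instanceof JsonObject) ? in.nextName() : null;

        JsonElement value = tryBeginNesting(in);
        boolean startedContainer = (value != null);
        if (value == null) {
          value = readScalar(in);
        }

        if (current instanceof JsonArray) {
          ((JsonArray) current).add(value);
        } else {
          ((JsonObject) current).add(name, value);
        }

        if (startedContainer) {
          // Descend: suspend the current container and fill the new one first
          stack.addLast(current);
          current = value;
        }
      }

      // The current container has been read completely; close it
      if (current instanceof JsonArray) {
        in.endArray();
      } else {
        in.endObject();
      }

      if (stack.isEmpty()) {
        return current; // the top-level container is complete
      }
      current = stack.removeLast(); // resume the enclosing container
    }
  }

  /** Begins a JSON array or object, or returns null if the next value is a scalar. */
  private static JsonElement tryBeginNesting(JsonReader in) throws IOException {
    switch (in.peek()) {
      case BEGIN_ARRAY:
        in.beginArray();
        return new JsonArray();
      case BEGIN_OBJECT:
        in.beginObject();
        return new JsonObject();
      default:
        return null;
    }
  }

  /** Reads a non-container value (simplified: numbers are read as doubles). */
  private static JsonElement readScalar(JsonReader in) throws IOException {
    switch (in.peek()) {
      case STRING:
        return new JsonPrimitive(in.nextString());
      case NUMBER:
        return new JsonPrimitive(in.nextDouble());
      case BOOLEAN:
        return new JsonPrimitive(in.nextBoolean());
      case NULL:
        in.nextNull();
        return JsonNull.INSTANCE;
      default:
        throw new IllegalStateException("Unexpected token: " + in.peek());
    }
  }
}
```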

@Marcono1234 force-pushed the marcono1234/iterative-deserialization branch from 851728d to 539952e on Jun 19, 2021
@Marcono1234 (Collaborator, Author) commented May 16, 2022

@eamonnmcmanus, what do you think about these changes? They would probably fix #2109 and #2111.
This is also related to this oss-fuzz comment.

@eamonnmcmanus (Member) left a comment

Thanks! I think this is a great idea, and I appreciate your taking the trouble to make it work. The current business of catching StackOverflowError has always felt very wobbly to me.

I'm running this against Google's internal tests to check whether they turn anything up.

My main comment is that we should probably still have some limit on the maximum depth. Otherwise we could still be exposing users to denial-of-service attacks. If every { causes a new LinkedTreeMap to be allocated, then an input string with a million of them is probably going to allocate on the order of 30 MB. We can imagine that being enough to cause a server to fall over.

We could have a hardcoded limit on nesting, say 1000, or we could make that a parameter settable by GsonBuilder.
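
With an explicit stack such a limit is cheap to enforce. A hedged sketch of how a hardcoded cap could hook into the loop sketched above (`MAX_DEPTH` and the exception message are illustrative, not an existing Gson API):

```java
// Illustrative only: a hardcoded nesting limit, checked when descending into a
// newly opened container (extends the `startedContainer` branch of the sketch above).
private static final int MAX_DEPTH = 1000;

// ... inside the read loop:
if (startedContainer) {
  if (stack.size() + 1 > MAX_DEPTH) {
    // Fail fast instead of allocating one container per `{` of attacker input
    throw new JsonParseException("Nesting depth exceeded limit of " + MAX_DEPTH);
  }
  stack.addLast(current);
  current = value;
}
```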

```java
 * the next element is neither of those.
 */
private JsonElement tryBeginNesting(JsonReader in, JsonToken peeked) throws IOException {
  if (peeked == JsonToken.BEGIN_ARRAY) {
```
@eamonnmcmanus (Member) commented May 16, 2022

Same comment about switch here, and about LinkedList below. Is it possible to reduce the amount of code duplication between this class and ObjectTypeAdapter?

@Marcono1234 (Collaborator, Author) commented May 17, 2022

> Is it possible to reduce the amount of code duplication between this class and ObjectTypeAdapter?

The code indeed looks pretty similar. Maybe this could be done by having a separate internal abstract IterativeDeserializer (or similar) with subclasses for Object and JsonElement deserialization, but I am not sure whether that would end up more complex (and possibly less efficient). Do you have a specific implementation in mind?
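
Purely as a sketch of one possible shape, assuming a hypothetical internal base class (no such type exists in Gson): the shared iterative loop would live in the base class, and the two subclasses would only decide how containers and scalars are materialized.

```java
import java.io.IOException;
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonToken;

// Hypothetical shared base class, for illustration only.
// T is the value representation: JsonElement for one subclass, Object for the other.
abstract class IterativeDeserializer<T> {
  /** Begins a JSON array/object and returns its representation, or null for scalars. */
  abstract T tryBeginNesting(JsonReader in, JsonToken peeked) throws IOException;

  /** Reads a non-container value. */
  abstract T readTerminal(JsonReader in, JsonToken peeked) throws IOException;

  /** Adds a value to an enclosing container; name is null for array elements. */
  abstract void add(T container, String name, T value);

  // A concrete read(JsonReader) driving these hooks with an explicit stack,
  // as in the sketch at the top of this thread, would replace the loops
  // currently duplicated in both adapters.
}
```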

@eamonnmcmanus (Member) commented May 18, 2022

I don't think it's necessary to do anything about this right away, unless there is some way that people could switch between the two to circumvent any depth (etc) limitations we are trying to introduce here.

@Marcono1234 (Collaborator, Author) commented May 17, 2022

> My main comment is that we should probably still have some limit on the maximum depth. Otherwise we could still be exposing users to denial-of-service attacks.

Sounds reasonable, but maybe the limit shouldn't be on depth but on the total count. Otherwise an adversary could circumvent a depth restriction by creating one JSON array and placing all the JSON objects directly inside it. So instead, the total number of encountered JSON arrays and objects could be tracked per deserialization call for these deserializers (a rough sketch follows at the end of this comment). What do you think?

> If every { causes a new LinkedTreeMap to be allocated, then an input string with a million of them is probably going to allocate on the order of 30 MB.

Maybe there is a related issue: when ArrayList increases its capacity, the new backing array length is based on the current array length (it grows by roughly 1.5×). Possibly that could be abused as well, by deserializing JSON arrays with just enough items to force the capacity to grow. But that issue probably cannot be solved easily without using a custom List implementation, or without manually building the list first and then copying the elements into a properly sized ArrayList.
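
A rough sketch of the total-count idea (`MAX_CONTAINERS` and the exception message are made up for illustration, not a Gson setting):

```java
// Illustrative only: instead of a depth limit, count every array/object begun
// during a single read() call. Unlike a depth limit, this also catches flat
// inputs like [{},{},{},...].
private static final int MAX_CONTAINERS = 100_000;

// At the start of read(JsonReader):
int containerCount = 0;

// ... then, every time tryBeginNesting returns a new JsonArray/JsonObject:
if (++containerCount > MAX_CONTAINERS) {
  throw new JsonParseException(
      "JSON document contains more than " + MAX_CONTAINERS + " arrays/objects");
}
```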

@eamonnmcmanus (Member) commented May 18, 2022

> Sounds reasonable, but maybe the limit shouldn't be on depth but on the total count. Otherwise an adversary could circumvent a depth restriction by creating one JSON array and placing all the JSON objects directly inside it. So instead, the total number of encountered JSON arrays and objects could be tracked per deserialization call for these deserializers. What do you think?

I see what you mean. Instead of causing {{{{{{{{... to be decoded, Dr Evil could cause [{},{},{},{},...] to be decoded, with similar effect. It's probably a fool's errand to try to avoid problems like these. We might just have an overall configurable limit on how big a string we are prepared to decode, but it's pretty trivial for users to check that size themselves. There's nothing new here: if client code is prepared to decode any arbitrary string of any length then it's already potentially vulnerable to OutOfMemoryError DoS. But we do already catch OutOfMemoryError in appropriate places. So the main consequence of this PR would be that we wouldn't really need to catch StackOverflowError as well.
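
Such a caller-side check needs nothing from Gson itself; a minimal example, with the limit chosen by the caller:

```java
import com.google.gson.Gson;
import com.google.gson.JsonElement;

public class SizeCheckedParse {
  // Caller-chosen limit, not a Gson setting
  private static final int MAX_JSON_LENGTH = 1_000_000;

  static JsonElement parseUntrusted(String json) {
    if (json.length() > MAX_JSON_LENGTH) {
      throw new IllegalArgumentException(
          "Refusing to parse " + json.length() + " chars of JSON");
    }
    return new Gson().fromJson(json, JsonElement.class);
  }
}
```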

@Marcono1234 (Collaborator, Author) commented Jun 10, 2022

Is this pull request fine as it is, or are there other things you would like changed?

@eamonnmcmanus (Member) left a comment

This looks good. Sorry for the delay. I checked it with all of Google's internal tests and didn't find any problems.

@eamonnmcmanus merged commit 2d01d6a into google:master on Jun 23, 2022
5 checks passed
@Marcono1234 deleted the marcono1234/iterative-deserialization branch on Jun 27, 2022