
proposal: encoding/json: allow limiting Decoder.Decode read size #56733

rolandshoemaker opened this issue Nov 14, 2022 · 7 comments

@rolandshoemaker (Member) commented Nov 14, 2022

Typically, the advice for avoiding reading excessively large values into memory from an untrusted io.Reader is to wrap the reader in an io.LimitedReader. For encoding/json.NewDecoder this is not necessarily a reasonable approach, since the Decoder may be intended to read from a long-lived stream (e.g. a net.Conn) where the user may not want to limit the total number of bytes read from the Reader across multiple reads, but does want to limit the number of bytes read during a single call to Decoder.Decode. (For example, reading one hundred 500-byte messages across the lifetime of the Decoder may be perfectly acceptable, while a single 50,000-byte read is not.)

A solution to this problem would be to add a method to Decoder that limits the amount read from the Reader into the internal Decoder buffer on each call to Decoder.Decode, returning an error if the number of bytes required to decode the value exceeds the set limit. Something along the lines of:

// LimitValueSize limits the number of bytes read from the wrapped io.Reader
// during a single call to dec.Decode. If decoding the value requires reading
// more than limit bytes, dec.Decode will return an error.
func (dec *Decoder) LimitValueSize(limit int)
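
For illustration, usage might look something like this (a hypothetical sketch: LimitValueSize is the proposed method and does not exist yet, and Message, conn, and handle are placeholders):

```go
dec := json.NewDecoder(conn) // conn is a long-lived net.Conn
dec.LimitValueSize(500)      // proposed method: cap each decoded value at 500 bytes

for {
	var msg Message
	if err := dec.Decode(&msg); err != nil {
		// An oversized value would surface as an error here rather
		// than being buffered into memory in its entirety.
		break
	}
	handle(msg)
}
```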

cc @dsnet

@gopherbot added this to the Proposal milestone Nov 14, 2022
@rittneje commented Nov 15, 2022

Should the bytes that are already in the decoder's buffer count against this limit on subsequent calls to Decode? I would think so, as otherwise it would lead to inconsistent behavior, but the way it is phrased in your proposal is not entirely clear.

@dsnet (Member) commented Nov 15, 2022

The feature seems reasonable in general, but it's unclear whether the limit should be value-based or token-based. Value-based limits make sense when calling Decode, while token-based limits make sense when calling Token.

As a work-around, you could reset the limit on an io.LimitedReader after every Decode call.
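
A minimal sketch of that workaround: wrap the stream once, then restore the LimitedReader's budget before every Decode call (maxValueSize and readLoop are placeholder names):

```go
import (
	"encoding/json"
	"io"
)

const maxValueSize = 64 << 10 // hypothetical per-Decode budget

func readLoop(conn io.Reader, handle func(json.RawMessage)) error {
	lr := &io.LimitedReader{R: conn, N: maxValueSize}
	dec := json.NewDecoder(lr)
	for {
		lr.N = maxValueSize // restore the budget before each Decode
		var msg json.RawMessage
		if err := dec.Decode(&msg); err != nil {
			// An exhausted budget mid-value surfaces as io.ErrUnexpectedEOF.
			return err
		}
		handle(msg)
	}
}
```

Note that the Decoder reads ahead into its own buffer, so bytes already buffered for the next value were charged against an earlier budget; as the question above highlights, the bound is therefore only approximate.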

@Jorropo (Contributor) commented Nov 15, 2022

@dsnet The annoying thing with LimitedReader is that, as you pointed out, you need to reset it every time.

@Jorropo (Contributor) commented Nov 15, 2022

> Should the bytes that are already in the decoder's buffer count against this limit on subsequent calls to Decode? I would think so, as otherwise it would lead to inconsistent behavior, but the way it is phrased in your proposal is not entirely clear.

The goal is to limit how much memory a single message is allowed to keep alive, to help prevent memory-exhaustion attacks. So I would say the limit should apply to how big the buffer is allowed to grow, not counting buffered over-reads, since those are an implementation detail. All of this to say: I think this should limit how big one message is allowed to be.

@Jorropo (Contributor) commented Nov 15, 2022

I think when the limit is reached, Decode should continue to read (but discard) the data, reading fully past the problematic message. This can be implemented with a "simple" state machine that counts how many [], {}, "", ... pairs we have seen (much like how the initial buffering already works, but discarding already-read data).

This means that if you have two messages, where the first one is too big but the second one is not, Decode would return an error the first time, but calling it again would successfully read the second message.
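
A rough sketch of that discarding state machine, assuming for brevity that the oversized value is an object, array, or string (bare numbers and literals like true/null would need extra handling, and malformed input is not validated):

```go
import "bufio"

// skipValue reads and discards one JSON object, array, or string from r,
// counting {} and [] nesting and tracking string state so that brackets
// inside strings (and escaped quotes) are not miscounted.
func skipValue(r *bufio.Reader) error {
	depth := 0
	inString, escaped := false, false
	for {
		b, err := r.ReadByte()
		if err != nil {
			return err
		}
		switch {
		case escaped:
			escaped = false // byte was escaped; ignore it
		case inString:
			switch b {
			case '\\':
				escaped = true
			case '"':
				inString = false
				if depth == 0 {
					return nil // the whole value was a bare string
				}
			}
		default:
			switch b {
			case '"':
				inString = true
			case '{', '[':
				depth++
			case '}', ']':
				depth--
				if depth == 0 {
					return nil
				}
			}
		}
	}
}
```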

@rsc (Contributor) commented Nov 16, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc (Contributor) commented Nov 30, 2022

If you call SetValueLimit and then only call Token, is there just no limit?

Using an io.LimitedReader where you reset the limit is also a possibility, of course,
and it seems clearer what it means.

Maybe part of the problem is when the reset happens. What if there was just a SetInputLimit that applied to all future reads, and you have to reset it yourself when you want to allow more data?
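
For comparison, those SetInputLimit semantics can be approximated today with a small wrapper (limitedInput and ErrInputLimit are hypothetical names, not part of encoding/json); unlike io.LimitedReader, which simply returns io.EOF at the limit, it reports hitting the limit as a distinct error:

```go
import (
	"errors"
	"io"
)

// ErrInputLimit distinguishes "limit exceeded" from a genuine end of stream.
var ErrInputLimit = errors.New("json: input limit exceeded")

// limitedInput enforces a budget across all future reads until the caller
// raises it again, mirroring the SetInputLimit idea.
type limitedInput struct {
	r io.Reader
	n int64 // remaining budget; reset by the caller to allow more data
}

func (l *limitedInput) Read(p []byte) (int, error) {
	if l.n <= 0 {
		return 0, ErrInputLimit
	}
	if int64(len(p)) > l.n {
		p = p[:l.n]
	}
	n, err := l.r.Read(p)
	l.n -= int64(n)
	return n, err
}
```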
