Expose message length as a property of TextReceived #86

Closed
mehaase opened this issue Nov 9, 2018 · 5 comments
mehaase commented Nov 9, 2018

I want to enforce a maximum message size in trio-websocket. For a BytesReceived event, it is easy to check len(event.data). But for a TextReceived event, the string is already decoded and len(event.data) returns something else. (The number of characters? The number of code points?) I can encode the string and check its length, but this is a bit of extra work, and I'm unsure that decoding and re-encoding will yield the same bytes.
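The mismatch is easy to see in plain Python: len() on a decoded str counts code points, which can differ from the number of bytes that arrived on the wire. A quick illustration (not tied to wsproto):

```python
# len() on a decoded str counts code points, not wire bytes.
text = "naïve — café"               # contains multi-byte UTF-8 characters
print(len(text))                    # → 12 code points
print(len(text.encode("utf-8")))    # → 16 bytes on the wire
```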

There are two possible ways to address this in wsproto.

  1. Add a "maximum message size" argument and let wsproto enforce it, i.e. close the connection with code 1009 (MESSAGE_TOO_BIG). This behavior can be disabled by setting max size to None.
  2. Add a data_size property to TextReceived that stores the message length in bytes. This requires the caller to enforce maximum message sizes. It's a bit awkward to implement because the message size is a field on the header object, not the frame.

Is there interest in this? I can make a PR.

njsmith commented Nov 9, 2018

How important do you think it is to measure this limit exactly?

Notice that you're not going to get the actual on-the-wire size from this, since it doesn't include headers. And you don't really care about on-the-wire size here either, I don't think – the goal is to avoid spending unbounded amounts of memory on buffering, right?

mehaase commented Nov 10, 2018

The goal is to avoid spending unbounded amounts of memory on buffering, right?

Yes, that's the goal. Since we don't know how much memory will be used until we decode the string (I think that's why you asked if exactness matters), I was thinking that it would be far easier to enforce maximum message size by comparing to the "payload len" field in the header. If the payload length exceeds the maximum, then the library doesn't even need to decode the payload.

This is how aaugustin's websockets appears to do it. Their maximum is enforced per frame, not per message. (Enforcing per message makes more sense to me.) It also won't decompress more than the maximum amount, to prevent a zlib bomb, I suppose.
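For illustration, the RFC 6455 "payload len" field (including its 16- and 64-bit extended forms) can be read from the first few header bytes, so an oversized frame could be rejected before any payload is buffered or decoded. The helper name here is hypothetical, not part of wsproto's API:

```python
import struct

# Hypothetical helper: read the "payload len" field from a raw RFC 6455
# frame header. An oversized frame could be rejected using only this value,
# before the payload is buffered or decoded.
def wire_payload_len(header: bytes) -> int:
    length = header[1] & 0x7F          # low 7 bits; mask bit stripped
    if length == 126:
        (length,) = struct.unpack("!H", header[2:4])   # 16-bit extended form
    elif length == 127:
        (length,) = struct.unpack("!Q", header[2:10])  # 64-bit extended form
    return length
```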

njsmith commented Nov 14, 2018

wsproto intentionally doesn't enforce any maximum on individual frames, because it can handle arbitrarily large frames in bounded memory (by streaming out pieces of the frame as fragments), and it wants to support users who can handle streaming too. (I still think it'd be nice to support this someday in trio-websocket, e.g. with a low-level receive_partial method :-).) Generally wsproto will give back all the data you put in, so as long as you only pass in a reasonable amount of data at once, then it'll give you back reasonable amounts of data. And separation of concerns suggests that the layer that's doing the buffering should be the layer that enforces limits on buffering :-).

The issue with zlib bombs is interesting though... if that's a concern then it can't be neatly handled at a higher level. The low-level frame decoding code actually needs to know about it.

Some options:

  • Ignore the zlib bomb problem: according to zlib technical notes, the theoretical maximum expansion factor for decompressing a deflate stream is 1032:1. So if you feed in N kilobytes of network data at a time, the most you could possibly get out at a time is N megabytes. This is not nearly as worrisome as if you could feed in N kilobytes and get out N petabytes! But I guess a 1000x expansion is not totally trivial either.

  • We could add frame-size limiting to wsproto. This seems a bit arbitrary and ugly to me though: the unusual users who want to handle messages in a streaming way couldn't use this, and the more common users who want to limit message sizes could use it but would still have to implement their own message-size limiting.

  • Since the zlib module can apparently stream out the decompressed data incrementally, we could tweak wsproto's decoder so that it uses this to split up frames with high compression ratios into multiple pseudo-frames. We already break up large frames into multiple pseudo-frames, so it's a pretty reasonable thing to do. It does look like this would require some refactoring of FrameDecoder.process_buffer, and maybe some refactoring of the "extensions" interface (but that interface is way overengineered anyway, so I wouldn't cry to see it changed or removed).

    This would require some policy for choosing how large of pieces to break the decompressed frames into – either by picking a good value automatically, or else making this value configurable. If we aren't using compression, then we implicitly let the user control how large each event is by how much data they choose to feed wsproto at a time. I guess for compressed data we could break it up into pieces whose size matches the last data chunk that was passed in? That feels messy and fiddly to me. We could simply hard-code the maximum size of individual pieces as like, 64k. We could make it an argument to WSConnection.__init__. We could make it an argument to WSConnection.events. Probably making it an argument to __init__ is the simplest option all around.
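The third option can be sketched using the zlib module's max_length argument to Decompress.decompress(), which caps the output of each call and stashes unread input in unconsumed_tail. This is a sketch under assumptions, not wsproto code; the function name and MAX_CHUNK value are made up (64k, per the hard-coded option above):

```python
import zlib

# Sketch of breaking a compressed payload into bounded pieces, so a "zlib
# bomb" can only ever expand MAX_CHUNK bytes at a time. Not wsproto code.
MAX_CHUNK = 64 * 1024  # the hard-coded piece size discussed above

def decompress_in_pieces(compressed: bytes):
    d = zlib.decompressobj()
    data = compressed
    while True:
        piece = d.decompress(data, MAX_CHUNK)  # at most MAX_CHUNK bytes out
        if piece:
            yield piece                        # emit as one pseudo-frame
        data = d.unconsumed_tail               # input zlib hasn't read yet
        if not data and not piece:
            break                              # no input left, no output pending
```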

mehaase commented Nov 15, 2018

wsproto intentionally doesn't enforce any maximum on individual frames, because it can handle arbitrarily large frames in bounded memory

Yes, but isn't the fragmentation of WebSocket frames controlled by the peer that is sending them? I don't see where wsproto can break up large frames and stream them to the caller. Am I confused?

njsmith commented Nov 16, 2018

Ah, that's the confusion! wsproto's decoder actually does re-fragment WebSocket frames on the fly to reduce buffering.

If you look at FrameDecoder, it tracks a few different numbers:

  • The number of bytes in the frame as encoded on the wire: this is .header.payload_len
  • The minimum number of bytes that have to be present before we can start returning payload data to the user: this is .payload_required.
  • The number of bytes from the wire frame that we've already decoded and returned: this is .payload_consumed.

If you look in FrameDecoder.parse_header, you'll see that for control frames (which have strictly bounded size and don't support fragmentation), payload_required = payload_len. For data frames (which have unbounded size and allow fragmentation), payload_required = 0, meaning we're allowed to fragment them while decoding.
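As a simplified sketch of that rule (the helper and opcode constants here are illustrative, not wsproto's actual code):

```python
# Simplified sketch of the payload_required rule described above.
CONTROL_OPCODES = {0x8, 0x9, 0xA}  # CLOSE, PING, PONG per RFC 6455

def payload_required(opcode: int, payload_len: int) -> int:
    if opcode in CONTROL_OPCODES:
        return payload_len  # control frames must be buffered whole
    return 0                # data frames may be streamed piece by piece
```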

And then in FrameDecoder.process_buffer, you'll see that if we're in the case where payload_required = 0 (a TEXT or BINARY frame), then it will happily construct a Frame object out of whatever partial frame it has (so it should really be called a PartialFrame or something like that), and Frame objects have both .frame_finished (which tells us whether this is the last piece in the current wire frame) and .message_finished (which tells us whether this wire frame is the last piece in the current message). Then in MessageDecoder.process_frame, the Frame class gets reused a second time with a different, incompatible interpretation: now .message_finished tells us whether this particular fragment-of-a-fragment is the very last fragment-of-a-fragment in the overall message. (That's what that finished = frame.frame_finished and frame.message_finished line that you linked to is computing.)

And then eventually these end up as the .frame_finished and .message_finished attributes on DataReceived or TextReceived:

class DataReceived(Event):
    def __init__(self, data, frame_finished, message_finished):
        #: The message data as byte string, can be decoded as UTF-8 for
        #: TEXT messages. This only represents a single chunk of data and
        #: not a full WebSocket message. You need to buffer and reassemble
        #: these chunks to get the full message.
        self.data = data
        #: This has no semantic content, but is provided just in case some
        #: weird edge case user wants to be able to reconstruct the
        #: fragmentation pattern of the original stream. You don't want it:
        self.frame_finished = frame_finished
        #: True if this frame is the last one of this message, False if
        #: more frames are expected.
        self.message_finished = message_finished

This is really convoluted and confusing, sorry! But at the end of the day, what it means as a user is:

  • Messages can be broken up into multiple DataReceived or TextReceived events
  • This can happen either because the sender broke the message into multiple frames on the wire, or because wsproto broke the frame into multiple pieces while parsing it
  • These two sources of fragmentation are exactly equivalent as far as the user is concerned, so you don't have to care, just look at the .message_finished attribute and you'll be fine
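Concretely, the user-side loop reduces to something like the following. TextReceived is stubbed here so the sketch runs standalone; in real code the events come from wsproto:

```python
# Stub of the event class, for illustration only.
class TextReceived:
    def __init__(self, data, frame_finished, message_finished):
        self.data = data
        self.frame_finished = frame_finished
        self.message_finished = message_finished

def reassemble(events):
    """Join TextReceived chunks into complete messages, regardless of
    whether fragmentation came from the peer or from wsproto's decoder."""
    buffer = []
    for event in events:
        buffer.append(event.data)
        if event.message_finished:   # the only attribute you need to check
            yield "".join(buffer)
            buffer = []
```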

@Kriechi Kriechi closed this as completed Apr 26, 2020