werkzeug.serving.DechunkedInput.read returns more than what was asked for #2021

@ghost

Description

I believe this is a bug in werkzeug 1.0.1 and not Flask because the error is produced by request.stream.read(), which is a werkzeug.serving.DechunkedInput instance. Please let me know if this actually belongs to Flask, and I will report it there instead. Note that I have used Flask in these examples for simplicity and because this is where I first found the issue.

With Python 3.9 on Windows 10, given the following Flask app (pip install flask==1.1.2) as server.py:

from flask import Flask, request

SIZE = 320000
app = Flask('bug')

@app.route('/bug', methods=['POST'])
def bug():
    chunk = request.stream.read(SIZE)
    if len(chunk) > SIZE:
        raise ValueError(f'bug in {type(request.stream)}; read({SIZE}) returned {len(chunk)} bytes')

    return 'ok'

if __name__ == "__main__":
    app.run()

And the following client.py file (pip install aiohttp==3.7.3):

import asyncio
import aiohttp

URL = 'http://localhost:5000/bug'
EXTRA = 44
CHUNK = 4000
DELAY = 0.125

async def req():
    yield bytes(EXTRA)
    while True:
        yield bytes(CHUNK)
        await asyncio.sleep(DELAY)

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.post(URL, data=req()) as resp:
            async for line in resp.content:
                print(line)

asyncio.run(main())

Running python server.py followed by python client.py in another terminal will eventually lead to the following error:

[2021-01-27 16:20:20,419] ERROR in app: Exception on /bug [POST]
Traceback (most recent call last):
  File "C:\Python39\lib\site-packages\flask\app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Python39\lib\site-packages\flask\app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Python39\lib\site-packages\flask\app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "C:\Python39\lib\site-packages\flask\_compat.py", line 39, in reraise
    raise value
  File "C:\Python39\lib\site-packages\flask\app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Python39\lib\site-packages\flask\app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\server.py", line 10, in bug
    raise ValueError(f'bug in {type(request.stream)}; read({SIZE}) returned {len(chunk)} bytes')
ValueError: bug in <class 'werkzeug.serving.DechunkedInput'>; read(320000) returned 320044 bytes

I expected read to never return more than size bytes, but it did. PEP 3333 (Input and Error Streams) defines read on the input stream as follows (emphasis mine):

The semantics of each method are as documented in the Python Library Reference, except for these notes as listed in the table above:

  1. The server is not required to read past the client's specified Content-Length, and should simulate an end-of-file condition if the application attempts to read past that point. The application should not attempt to read more data than is specified by the CONTENT_LENGTH variable.

And the Python Library Reference for io.RawIOBase.read reads:

Read up to size bytes from the object and return them. As a convenience, if size is unspecified or -1, all bytes until EOF are returned. Otherwise, only one system call is ever made. Fewer than size bytes may be returned if the operating system call returns fewer than size bytes.

If 0 bytes are returned, and size was not 0, this indicates end of file. If the object is in non-blocking mode and no bytes are available, None is returned.

The default implementation defers to readall() and readinto().

Note that it says "Read up to size", and not "Read at least size". Werkzeug's behaviour does not match what the documentation describes, which I think is a bug.
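To illustrate the contract quoted above, here is a minimal sketch of a raw stream that relies on the default io.RawIOBase.read. The TrickleStream class and its piece parameter are hypothetical names made up for this example and have nothing to do with Werkzeug; the point is only that read(size) may return fewer than size bytes, but never more.

```python
import io


class TrickleStream(io.RawIOBase):
    """Hypothetical raw stream that hands out at most `piece` bytes per call."""

    def __init__(self, data: bytes, piece: int = 3) -> None:
        super().__init__()
        self._data = data
        self._pos = 0
        self._piece = piece

    def readable(self) -> bool:
        return True

    def readinto(self, buf) -> int:
        # Copy no more than the caller's buffer can hold, no more than one
        # "piece", and no more than what is left in the data.
        n = min(len(buf), self._piece, len(self._data) - self._pos)
        buf[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n


stream = TrickleStream(b"abcdefgh", piece=3)
first = stream.read(10)   # a short read is allowed
second = stream.read(2)   # the result is capped by the requested size
assert len(first) <= 10 and len(second) <= 2
```

A short read here is legitimate; returning 320044 bytes for read(320000) is not.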

Because the default read implementation delegates to readinto, the buggy implementation is likely here, although I have not investigated further:

def readinto(self, buf: bytearray) -> int: # type: ignore
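For comparison, below is a rough sketch of how a dechunking readinto could cap each copy at the space remaining in the caller's buffer. This is not Werkzeug's code: BoundedDechunker, encode_chunked, and all attribute names are hypothetical, trailers are ignored, and whether a missing bound of this kind is what actually causes the over-read in Werkzeug is an assumption I have not verified.

```python
import io


class BoundedDechunker(io.RawIOBase):
    """Hypothetical dechunking reader; NOT Werkzeug's implementation."""

    def __init__(self, rfile) -> None:
        super().__init__()
        self._rfile = rfile   # binary file holding a chunked request body
        self._left = 0        # unread bytes in the current chunk
        self._done = False

    def readable(self) -> bool:
        return True

    def _next_chunk_len(self) -> int:
        # Chunk header: hex length, optionally followed by ";extensions".
        line = self._rfile.readline()
        return int(line.split(b";", 1)[0].strip().decode("ascii"), 16)

    def readinto(self, buf) -> int:
        filled = 0
        while not self._done and filled < len(buf):
            if self._left == 0:
                self._left = self._next_chunk_len()
                if self._left == 0:
                    self._rfile.readline()  # final CRLF (trailers ignored)
                    self._done = True
                    break
            # Bound by the space LEFT in buf, not by len(buf) as a whole.
            n = min(len(buf) - filled, self._left)
            data = self._rfile.read(n)
            if not data:
                raise OSError("unexpected end of chunked body")
            buf[filled:filled + len(data)] = data
            filled += len(data)
            self._left -= len(data)
            if self._left == 0:
                self._rfile.readline()  # CRLF that terminates the chunk data
        return filled


def encode_chunked(parts) -> bytes:
    """Hypothetical helper: encode byte strings as a chunked body."""
    body = b"".join(b"%x\r\n%s\r\n" % (len(p), p) for p in parts)
    return body + b"0\r\n\r\n"


# Same shape as the client above: one 44-byte chunk, then 4000-byte chunks.
body = encode_chunked([bytes(44)] + [bytes(4000)] * 3)
stream = BoundedDechunker(io.BytesIO(body))
chunk = stream.read(8000)
assert len(chunk) <= 8000
```

One detail worth keeping in mind when reading the real implementation: assigning to a bytearray slice with a longer right-hand side grows the bytearray instead of raising, so an unbounded copy into buf would not fail on its own.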
