
werkzeug.formparser is really slow with large binary uploads #875

Open
sekrause opened this issue Mar 3, 2016 · 24 comments
Labels
bug

Comments


@sekrause sekrause commented Mar 3, 2016

When I perform a multipart/form-data upload of any large binary file in Flask, those uploads are very easily CPU bound (with Python consuming 100% CPU) instead of I/O bound on any reasonably fast network connection.

A little bit of CPU profiling reveals that almost all CPU time during these uploads is spent in werkzeug.formparser.MultiPartParser.parse_parts(). The reason is that the method parse_lines() yields a lot of very small chunks, sometimes even single bytes:

# we have something in the buffer from the last iteration.
# this is usually a newline delimiter.
if buf:
    yield _cont, buf
    buf = b''

So parse_parts() goes through a lot of small iterations (more than 2 million for a 100 MB file) processing single "lines", always writing just very short chunks or even single bytes into the output stream. This adds a lot of overhead, slowing down the whole process and making it CPU bound very quickly.

A quick test shows that a speed-up is very easily possible by first collecting the data in a bytearray in parse_lines() and only yielding that data back into parse_parts() when self.buffer_size is exceeded. Something like this:

buf = b''
collect = bytearray()
for line in iterator:
    if not line:
        self.fail('unexpected end of stream')

    if line[:2] == b'--':
        terminator = line.rstrip()
        if terminator in (next_part, last_part):
            # yield remaining collected data
            if collect:
                yield _cont, collect
            break

    if transfer_encoding is not None:
        if transfer_encoding == 'base64':
            transfer_encoding = 'base64_codec'
        try:
            line = codecs.decode(line, transfer_encoding)
        except Exception:
            self.fail('could not decode transfer encoded chunk')

    # we have something in the buffer from the last iteration.
    # this is usually a newline delimiter.
    if buf:
        collect += buf
        buf = b''

    # If the line ends with windows CRLF we write everything except
    # the last two bytes.  In all other cases however we write
    # everything except the last byte.  If it was a newline, that's
    # fine, otherwise it does not matter because we will write it
    # the next iteration.  this ensures we do not write the
    # final newline into the stream.  That way we do not have to
    # truncate the stream.  However we do have to make sure that
    # if something else than a newline is in there we write it
    # out.
    if line[-2:] == b'\r\n':
        buf = b'\r\n'
        cutoff = -2
    else:
        buf = line[-1:]
        cutoff = -1

    collect += line[:cutoff]

    if len(collect) >= self.buffer_size:
        yield _cont, collect
        collect.clear()

This change alone reduces the upload time for my 34 MB test file from 4200 ms to around 1100 ms over localhost on my machine, which is almost a 4X increase in performance. All tests are done on Windows (64-bit Python 3.4); I'm not sure if it's as much of a problem on Linux.

It's still mostly CPU bound, so I'm sure there is even more potential for optimization. I think I'll look into it when I find a bit more time.

@languanghao languanghao commented Mar 18, 2016

I also have the same problem: when I upload a 200 MB ISO file, the first call to request.form takes about 7 seconds.

@RonnyPfannschmidt RonnyPfannschmidt commented Mar 18, 2016

Two things seem interesting for further optimization: experimenting with Cython, and experimenting with interpreting the content-size headers for smarter MIME message parsing.

(no need to scan for lines if you know the content-length of a sub-message)

@lnielsen lnielsen commented Aug 22, 2016

Just a quick note: if you stream the file directly in the request body (i.e. no multipart/form-data), you completely bypass the form parser and read the file directly from request.stream.
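To illustrate the idea (a minimal sketch, not production code): Flask's request.stream is a wrapper around the WSGI input stream, so a raw-body upload boils down to reading environ['wsgi.input'] in chunks. The function name and chunk size below are illustrative.

```python
def raw_upload_app(environ, start_response):
    """Minimal WSGI sketch: consume a raw (non-multipart) request body
    straight from the input stream, so no form parsing ever runs.
    Flask's request.stream exposes this same environ['wsgi.input']."""
    length = int(environ.get('CONTENT_LENGTH') or 0)
    stream = environ['wsgi.input']
    received = 0
    while received < length:
        chunk = stream.read(min(65536, length - received))
        if not chunk:
            break
        received += len(chunk)
        # a real app would write each chunk to disk or S3 here
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [('received %d bytes' % received).encode()]
```

The client would send the file with Content-Type: application/octet-stream so the form parser never kicks in.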

@carbn carbn commented Oct 18, 2016

I have the same issue with slow upload speeds for multipart uploads when using jQuery-File-Upload's chunked upload method. When using small chunks (~10 MB), the transfer speed jumps between 0 and 12 MB/s while the network and server are fully capable of speeds over 50 MB/s. The slowdown is caused by the CPU-bound multipart parsing, which takes about as long as the actual upload. Sadly, using streaming uploads to bypass the multipart parsing is not really an option, as I must support iOS devices that can't do streaming in the background.

The patch provided by @sekrause looks nice but doesn't work in python 2.7.

@cuibonobo cuibonobo commented Oct 18, 2016

@carbn: I was able to get the patch to work in Python 2.7 by changing the last line to collect = bytearray(). This just creates a new bytearray instead of clearing the existing one.

@carbn carbn commented Oct 18, 2016

@cuibonobo: That's the first thing I changed but still had another error. I can't check the working patch at the moment, but IIRC the yields had to be changed from yield _cont, collect to yield _cont, str(collect). This allowed the code to be tested and the patch yielded about 30% increase in the multipart processing speed. It's a nice speedup, but the performance is still pretty bad.

@sekrause sekrause commented Oct 18, 2016

A little further investigation shows that werkzeug.wsgi.make_line_iter is already too much of a bottleneck to really be able to optimize parse_lines(). Look at this Python 3 test script:

import io
import time
from werkzeug.wsgi import make_line_iter

filename = 'test.bin' # Large binary file
lines = 0

# load a large binary file into memory
with open(filename, 'rb') as f:
    data = f.read()
    stream = io.BytesIO(data)
    filesize = len(data) / 2**20 # MB

start = time.perf_counter()
for _ in make_line_iter(stream):
    lines += 1
stop = time.perf_counter()
delta = stop - start

print('File size: %.2f MB' % filesize)
print('Time: %.1f seconds' % delta)
print('Read speed: %.2f MB/s' % (filesize / delta))
print('Number of lines yielded by make_line_iter: %d' % lines)

For a 923 MB video file with Python 3.5, the output looks something like this on my laptop:

File size: 926.89 MB
Time: 20.6 seconds
Read speed: 44.97 MB/s
Number of lines yielded by make_line_iter: 7562905

So even if you apply my optimization above and tune it to perfection, you'll still be limited to ~45 MB/s for large binary uploads, simply because make_line_iter can't deliver the data fast enough, and your boundary-checking loop will still run 7.5 million iterations for 923 MB of data.

I guess the only great optimization will be to completely replace parse_lines() with something else. A possible solution that comes to mind is to read a reasonably large chunk of the stream into memory then use string.find() (or bytes.find() in Python 3) to check if the boundary is in the chunk. In Python find() is a highly optimized string search algorithm written in C, so that should give you some performance. You would just have to take care of the case where the boundary might be right between two chunks.
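A rough sketch of that chunk-and-find approach (the function name and chunk size are illustrative, not werkzeug code): read the stream in large chunks, search each one with bytes.find(), and keep a small tail between reads so a boundary that straddles two chunks is still detected.

```python
def iter_until_boundary(stream, boundary, chunk_size=65536):
    """Yield the data of one multipart part, stopping at the boundary.

    Sketch only: a real parser must also handle part headers, the
    closing '--' marker, and CRLF trimming.
    """
    delimiter = b'\r\n--' + boundary
    buf = b''
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # stream ended without a boundary; flush what's left
            if buf:
                yield buf
            return
        buf += chunk
        idx = buf.find(delimiter)  # C-speed search instead of line scanning
        if idx != -1:
            if idx:
                yield buf[:idx]
            return
        # retain len(delimiter) - 1 bytes so a delimiter split across
        # two chunks is still found on the next iteration
        keep = len(delimiter) - 1
        if len(buf) > keep:
            yield buf[:-keep]
            buf = buf[-keep:]
```

Each iteration now moves tens of kilobytes instead of one line, so the per-iteration overhead is paid far less often.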

@sdizazzo sdizazzo commented Jun 2, 2017

I wanted to mention doing the parsing on the stream in chunks as it is received. @siddhantgoel wrote this great little parser for us. It's working great for me. https://github.com/siddhantgoel/streaming-form-data

@davidism davidism added the bug label Jun 2, 2017
@lambdaq lambdaq commented Jun 20, 2017

I guess the only great optimization will be to completely replace parse_lines()

+1 for this.

I am writing a bridge to stream users' uploads directly to S3 without any intermediate temp files, possibly with backpressure, and I find the werkzeug and Flask situation frustrating: you can't move data directly between two pipes.

@pallets pallets deleted a comment from huhuanming Jun 20, 2017
@davidism davidism commented Jun 20, 2017

@lambdaq I agree it's a problem that needs to be fixed. If this is important to you, I'd be happy to review a patch changing the behavior.

@lnielsen lnielsen commented Jun 20, 2017

@lambdaq Note that if you just stream data directly in the request body and use application/octet-stream then the form parser doesn't kick in at all and you can use request.stream (i.e. no temp files etc).

The only problem we had is that the werkzeug form parser eagerly checks the content length against the allowed max content length before knowing whether it should actually parse the request body.

This prevents you from setting a max content length for normal form data while still allowing very large file uploads.

We fixed it by reordering the checks in the function a bit. Not sure if it makes sense to contribute this upstream, as some apps might rely on the existing behaviour.

@lambdaq lambdaq commented Jun 21, 2017

Note that if you just stream data directly in the request body and use application/octet-stream then the form parser doesn't kick in at all and you can use request.stream (i.e. no temp files etc).

Unfortunately not. It's just normal form uploads with multipart.

I'd be happy to review a patch changing the behavior.

I tried to hack werkzeug.wsgi.make_line_iter and parse_lines() using the generator's send(), so we could signal _iter_basic_lines() to emit whole chunks instead of lines. It turns out not to be so easy.

Basically, the rabbit hole starts with 'itertools.chain' object has no attribute 'send'... 😂

@ThiefMaster ThiefMaster commented Jun 21, 2017

I wonder how much this code could be sped up with native speedups written in C (or Cython etc.). I think it's important to handle semi-large files (a few hundred MB, but not huge as in many GB) more efficiently without changing how the app uses them (i.e. buffering rather than streaming them directly). For many applications streaming would be overkill and isn't absolutely necessary (actually, even the current somewhat slow performance is probably OK for them), but making things faster is always nice!

@lambdaq lambdaq commented Sep 30, 2017

Another possible solution is to offload the multipart parsing job to nginx:

https://www.nginx.com/resources/wiki/modules/upload/
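With that third-party module, a configuration along these lines (a sketch based on the module's documented directives; paths and the backend location are illustrative) makes nginx write the uploaded files to disk itself and pass only small form fields on to the WSGI app:

```nginx
# Assumes nginx was built with the upload module linked in.
location /upload {
    upload_pass   /internal_upload_done;      # backend sees only metadata
    upload_store  /var/tmp/nginx_uploads;     # nginx writes the file here

    # forward the original name and the temp path as ordinary form fields
    upload_set_form_field $upload_field_name.name "$upload_file_name";
    upload_set_form_field $upload_field_name.path "$upload_tmp_path";

    # remove the temp file if the backend responds with an error
    upload_cleanup 400 404 499 500-505;
}
```

The Python side then never touches the multipart body at all; it just receives the stored file's path.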

@ThiefMaster ThiefMaster commented Dec 14, 2017

Both repos look dead.

@mdemin914 mdemin914 commented Feb 1, 2018

so is there no known solution to this?

@lnielsen lnielsen commented Feb 1, 2018

There's a workaround👆

@sdizazzo sdizazzo commented Feb 2, 2018

Under uwsgi, we use its built-in chunked_read() function and parse the stream on our own as it comes in. It works 99% of the time, but it has a bug that I have yet to track down. See my earlier comment for an out-of-the-box streaming form parser. Under Python 2 it was slow, so we rolled our own and it is fast. :)

@davidism davidism commented Feb 2, 2018

Quoting from above:

I agree it's a problem that needs to be fixed. If this is important to you, I'd be happy to review a patch changing the behavior.

I don't really have time to work on this right now. If this is something that you are spending time on, please consider contributing a patch. Contributions are very welcome.

@siddhantgoel siddhantgoel commented Feb 3, 2018

@sdizazzo

but it has a bug that I have yet to track down

are you talking about streaming-form-data? if so, I'd love to know what the bug is.

@kneufeld kneufeld commented Jul 21, 2018

Our problem was that the slow form processing prevented concurrent request handling which caused nomad to think the process was hung and killed it.

My fix was to add a sleep(0) in werkzeug/formparser.py:MultiPartParser.parse_lines():

            for i, line in enumerate(iterator):
                if not line:
                    self.fail('unexpected end of stream')

                # give other greenlets a chance to run every 100 lines
                if i % 100 == 0:
                    time.sleep(0)

Search for "unexpected end of stream" if you want to find the right spot to apply this patch.

@patrislav1 patrislav1 commented Oct 12, 2018

I wanted to mention doing the parsing on the stream in chunks as it is received. @siddhantgoel wrote this great little parser for us. It's working great for me. https://github.com/siddhantgoel/streaming-form-data

Seconded. This speeds up file uploads to my Flask app by more than a factor of 10.

@mkirisame mkirisame commented Sep 10, 2019

@siddhantgoel
Thanks a lot for your fix with streaming-form-data. I can finally upload gigabyte sized files at good speed and without memory filling up!
