werkzeug.formparser is really slow with large binary uploads #875
I also have the same problem: when I upload an ISO file (200 MB), the first call to `request.form` takes about 7 seconds.
Two things seem interesting for further optimization: experimenting with Cython, and experimenting with interpreting the content-length headers for smarter MIME message parsing (no need to scan for lines if you know the content-length of a sub-message).
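A sketch of what that second idea could look like; the helper name and header access are hypothetical, not Werkzeug code:

```python
def read_part_body(stream, part_headers):
    # Hypothetical: trust an explicit per-part Content-Length and do a
    # single exact-size read instead of line-by-line boundary scanning.
    length = part_headers.get("content-length")
    if length is not None:
        return stream.read(int(length))
    return None  # unknown size: fall back to the usual boundary scan
```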
Just a quick note: if you stream the file directly in the request body (i.e. no multipart encoding), the form parser never runs and the upload stays I/O-bound.
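A minimal sketch of that approach, assuming a Flask app (route name, target path, and chunk size are arbitrary). The client sends the bare file bytes as the request body, for example with `curl --data-binary @file URL`, so the multipart parser is never invoked:

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/upload", methods=["POST"])
def upload():
    # Read the raw request body in chunks; request.form/request.files
    # are never touched, so no multipart parsing happens.
    written = 0
    with open("/tmp/upload.bin", "wb") as f:
        while True:
            chunk = request.stream.read(64 * 1024)
            if not chunk:
                break
            f.write(chunk)
            written += len(chunk)
    return f"stored {written} bytes\n"
```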
I have the same issue with slow upload speeds for multipart uploads when using jQuery-File-Upload's chunked upload method. When using small chunks (~10 MB), the transfer speed jumps between 0 and 12 MB/s while the network and server are fully capable of speeds over 50 MB/s. The slowdown is caused by the CPU-bound multipart parsing, which takes about as long as the actual upload. Sadly, using streaming uploads to bypass the multipart parsing is not really an option, as I must support iOS devices that can't do streaming in the background. The patch provided by @sekrause looks nice but doesn't work in Python 2.7.
@carbn: I was able to get the patch to work in Python 2.7 by changing the last line.
@cuibonobo: That's the first thing I changed, but I still had another error. I can't check the working patch at the moment, but IIRC the yields also had to be changed.
A little further investigation shows that `make_line_iter` itself caps the throughput:

```python
import io
import time

from werkzeug.wsgi import make_line_iter

filename = 'test.bin'  # large binary file
lines = 0

# Load the file into memory so disk I/O doesn't skew the timing.
with open(filename, 'rb') as f:
    data = f.read()

stream = io.BytesIO(data)
filesize = len(data) / 2**20  # MB

start = time.perf_counter()
for _ in make_line_iter(stream):
    lines += 1
stop = time.perf_counter()

delta = stop - start
print('File size: %.2f MB' % filesize)
print('Time: %.1f seconds' % delta)
print('Read speed: %.2f MB/s' % (filesize / delta))
print('Number of lines yielded by make_line_iter: %d' % lines)
```

For a 923 MB video file with Python 3.5, this reports a read speed of roughly 45 MB/s on my laptop.
So even if you apply my optimization above and optimize it further to perfection, you'll still be limited to ~45 MB/s for large binary uploads, simply because that's all the line iterator delivers. I guess the only great optimization will be to completely replace the line-based parsing with something faster.
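For a sense of the available headroom, here is a rough, self-contained measurement of how fast a single `bytes.find()` pass scans a comparable buffer (the boundary string is made up, and the numbers will vary by machine):

```python
import time

data = b"A" * (256 * 2**20)  # 256 MB of part data, boundary not present

start = time.perf_counter()
pos = data.find(b"--some-boundary")  # one C-level scan of the buffer
delta = time.perf_counter() - start

print('find() returned %d' % pos)  # -1, i.e. not found
print('Scan speed: %.0f MB/s' % (256 / delta))
```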
I wanted to mention doing the parsing on the stream in chunks as it is received. @siddhantgoel wrote this great little parser for us, and it's working great for me: https://github.com/siddhantgoel/streaming-form-data
+1 for this. I am writing a bridge to stream users' uploads directly to S3 without any intermediate temp files, possibly with backpressure, and I find `werkzeug.formparser` to be the bottleneck.
@lambdaq I agree it's a problem that needs to be fixed. If this is important to you, I'd be happy to review a patch changing the behavior.
@lambdaq Note that if you just stream the data directly in the request body and read it from `request.stream`, none of the slow multipart parsing runs.

The only problem we had is that the Werkzeug form parser eagerly checks the content length against the allowed max content length before knowing whether it should actually parse the request body. This prevents you from setting a max content length for normal form data while still allowing very large file uploads. We fixed it by reordering the checks in the function a bit. Not sure if it makes sense to provide this upstream, as some apps might rely on the existing behaviour.
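A rough sketch of that reordering as a standalone function (the function name and the set of form mimetypes are stand-ins, not Werkzeug's actual code): the size check only fires once we know the body is going to be form-parsed.

```python
from werkzeug.exceptions import RequestEntityTooLarge

FORM_MIMETYPES = {"multipart/form-data", "application/x-www-form-urlencoded"}

def check_content_length(mimetype, content_length, max_content_length):
    # Only enforce the limit for bodies we are actually going to parse;
    # raw streamed uploads pass through untouched.
    if mimetype not in FORM_MIMETYPES:
        return
    if (
        max_content_length is not None
        and content_length is not None
        and content_length > max_content_length
    ):
        raise RequestEntityTooLarge()
```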
Unfortunately not. It's just normal form uploads with multipart.
I tried to hack on it myself. Basically, the rabbit hole starts with the line-based parsing in `parse_lines()`.
I wonder how much this code could be sped up using native speedups written in C (or Cython, etc.). I think handling semi-large files (a few hundred MB, but not huge as in many GB) more efficiently is important without having to change how the app uses them (i.e. streaming them directly instead of buffering). For many applications streaming would be overkill and isn't absolutely necessary (actually, even the current somewhat slow performance is probably OK for them), but making things faster is always nice!
Another possible solution is to offload the multipart parsing to one of the existing third-party parsers.
Both repos look dead.
So is there no known solution to this?
There's a workaround 👆
Under uWSGI, we use its built-in request-body buffering instead.
Quoting from above:

> I don't really have time to work on this right now. If this is something that you are spending time on, please consider contributing a patch. Contributions are very welcome.
Are you talking about streaming-form-data? If so, I'd love to know what the bug is.
Our problem was that the slow form processing blocked the worker, which prevented concurrent request handling. My fix was to add a `time.sleep(0)` to the parsing loop; under gevent's monkey-patching that yields control, so other greenlets get a chance to run:

```python
for i, line in enumerate(iterator):
    if not line:
        self.fail('unexpected end of stream')
    # give other greenlets a chance to run every 100 lines
    if i % 100 == 0:
        time.sleep(0)
    # ... rest of the original loop body ...
```

Search for `unexpected end of stream` in `formparser.py` to find the loop.
Seconded.
@siddhantgoel
See #1788, which discusses rewriting the parser to be sans-io. Based on the feedback here, I think that would address this issue too.
@davidism I don't think this issue should be closed, because the speed-up is negligible. Below is a little test script to benchmark the multipart parser and compare Werkzeug with streaming-form-data. Run it with the parser to test (`streaming-form-data` or `werkzeug`) and the path of a large test file as arguments.

With a 425 MB zip file on my laptop, the new parser is only about 25% faster than the old parser, but still more than an order of magnitude slower than a fast parser:

```python
import argparse
import io
import time
from os.path import basename

from flask import Flask, request
from streaming_form_data import StreamingFormDataParser
from streaming_form_data.targets import BaseTarget
from werkzeug.test import EnvironBuilder, run_wsgi_app

app = Flask(__name__)


class LengthTarget(BaseTarget):
    """Discard the uploaded data, just count how many bytes arrived."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.total = 0

    def on_data_received(self, chunk: bytes):
        self.total += len(chunk)


@app.route("/streaming-form-data", methods=['POST'])
def streaming_form_data_upload():
    target = LengthTarget()
    parser = StreamingFormDataParser(headers=request.headers)
    parser.register('file', target)

    while True:
        chunk = request.stream.read(131072)
        if not chunk:
            break
        parser.data_received(chunk)

    print(target.total)
    return 'done'


@app.route("/werkzeug", methods=['POST'])
def werkzeug_upload():
    file = request.files['file']
    stream = file.stream
    stream.seek(0, io.SEEK_END)
    print(stream.tell())
    return 'done'


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('parser', choices=['streaming-form-data', 'werkzeug'])
    parser.add_argument('file')
    args = parser.parse_args()

    with open(args.file, 'rb') as f:
        data = f.read()

    # Prepare the whole environment in advance so that this doesn't slow
    # down the benchmark.
    e = EnvironBuilder(method='POST', path=f'/{args.parser}')
    e.files.add_file('file', io.BytesIO(data), basename(args.file))
    environ = e.get_environ()

    start = time.perf_counter()
    run_wsgi_app(app, environ)
    stop = time.perf_counter()

    delta = (stop - start) * 1000
    print(f'{delta:.1f} ms')


if __name__ == "__main__":
    main()
```
@sekrause Hey, I really appreciate the detail you're providing. However, in the five years since you opened this issue, neither you nor anyone else invested in seeing the issue fixed has actually submitted a fix. I personally will not have the time to learn that library's implementation and identify how it can be applied to ours.

Note that the library you're comparing to is implemented in C, so it's unlikely we'll ever achieve the same speed. It's also already possible to use that library with Werkzeug when that speed is required. Perhaps someone could turn that into an extension library so it's more integrated.

I'm happy to consider a PR that adds further improvements to the parser, but leaving this issue open so far doesn't seem to have resulted in that.
Author of the other library here. I'm more than happy to review proposals/patches in case someone wants to provide an extension so it can work better with Werkzeug.
@davidism So I looked into your current implementation to check where it's slow, and I think it turns out that from here we can get another 10x speedup by adding less than 10 lines of code.

When uploading a large binary file, most of the time is spent in the `State.DATA` branch, where the regular expression scans the whole buffer for line breaks. But we don't really need to look at all line breaks. The trick is to offload as much work as possible to `bytearray.find()`, which is extremely fast. When we execute `self.buffer.find(self.boundary_end)` and get no hit, we know the closing boundary isn't in the buffer and can emit almost all of it as data right away. When uploading a large file, almost all iterations of the loop can return immediately after the `find()` call.

If you want to test it yourself, add this to the decoder's `__init__()`:

```python
self.boundary_end = b'--' + boundary + b'--'
```

And then change the `elif self.state == State.DATA:` branch like this:

```python
elif self.state == State.DATA:
    if len(self.buffer) <= len(self.boundary_end):
        event = NEED_DATA
    elif self.buffer.find(self.boundary_end) == -1:
        # Fast path: the closing delimiter is nowhere in the buffer,
        # so emit everything except a tail long enough to hold a
        # partial delimiter that might still be arriving.
        data = bytes(self.buffer[:-len(self.boundary_end)])
        del self.buffer[:-len(self.boundary_end)]
        event = Data(data=data, more_data=True)
    else:
        # Return up to the last line break as data, anything past
        # that line break could be a boundary - more data may be
        # required to know for sure.
        lines = list(LINE_BREAK_RE.finditer(self.buffer))

        if len(lines):
            data_length = del_index = lines[-1].start()
            match = self.boundary_re.search(self.buffer)

            if match is not None:
                if match.group(1).startswith(b"--"):
                    self.state = State.EPILOGUE
                else:
                    self.state = State.PART

                data_length = match.start()
                del_index = match.end()

            data = bytes(self.buffer[:data_length])
            del self.buffer[:del_index]
            more_data = match is None

            if data or not more_data:
                event = Data(data=data, more_data=more_data)
```

Everything under the `else:` is the original code; the fast path with `find()` is the only new part. What do you think?
Sounds interesting, can you make a PR?
I think we can make it work. The regular expression used by the decoder is:

```python
self.boundary_re = re.compile(
    br"%s--%s(--[^\S\n\r]*%s?|[^\S\n\r]*%s)"
    % (LINE_BREAK, boundary, LINE_BREAK, LINE_BREAK),
    re.MULTILINE,
)
```

So if the regular expression matches anywhere, the buffer must contain the literal bytes `--` followed by the boundary. If a plain `find()` for that substring comes up empty, the regular expression cannot match either, and the expensive regex search can be skipped. This additional precheck with `find()` is therefore safe.
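That argument is easy to sanity-check in isolation. In this sketch the boundary value and buffer contents are made up, and `LINE_BREAK` is assumed to be a plain line-break alternation as used by the pattern above:

```python
import re

LINE_BREAK = b"(?:\r\n|\n|\r)"  # assumed definition, see pattern above
boundary = re.escape(b"boundary123")

boundary_re = re.compile(
    br"%s--%s(--[^\S\n\r]*%s?|[^\S\n\r]*%s)"
    % (LINE_BREAK, boundary, LINE_BREAK, LINE_BREAK),
    re.MULTILINE,
)

buf = b"x" * (10 * 2**20)  # 10 MB of part data, no boundary anywhere

# The regex can only match where the literal bytes `--boundary123`
# occur, so a find() miss proves the regex search would miss too.
if buf.find(b"--boundary123") == -1:
    assert boundary_re.search(buf) is None
```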
Your change is a ~2x speed-up: from 7000 ms to 3700 ms on my computer with my 430 MB test file. I've posted my benchmark program in #875 (comment) if you want to compare yourself.
Final summary, now that our changes have landed in GitHub master. A small benchmark uploading a 64 MB file of random data 10 times in a row and measuring the average request time on an Intel Core i7-8550U shows that, with reasonably large files, this is a 15x improvement (the difference is a little smaller with small files because of the request overhead). On a somewhat fast server CPU, Werkzeug's multipart parser should now be able to saturate a gigabit Ethernet link! I'm happy with the result. :)
When I perform a `multipart/form-data` upload of any large binary file in Flask, those uploads are very easily CPU-bound (with Python consuming 100% CPU) instead of I/O-bound on any reasonably fast network connection.

A little bit of CPU profiling reveals that almost all CPU time during these uploads is spent in `werkzeug.formparser.MultiPartParser.parse_parts()`. The reason is that the method `parse_lines()` yields a lot of very small chunks, sometimes even just single bytes. So `parse_parts()` goes through a lot of small iterations (more than 2 million for a 100 MB file) processing single "lines", always writing just very short chunks or even single bytes into the output stream. This adds a lot of overhead, slowing down the whole process and making it CPU-bound very quickly.

A quick test shows that a speed-up is very easily possible by first collecting the data in a `bytearray` in `parse_lines()` and only yielding that data back into `parse_parts()` when `self.buffer_size` is exceeded. Something like this:
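A minimal sketch of that idea as a standalone generator (the name `coalesce` and the default buffer size are illustrative; the actual patch did the buffering inside `parse_lines()` itself):

```python
def coalesce(chunks, buffer_size=64 * 1024):
    """Collect many tiny chunks and re-yield them as few large blocks."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) >= buffer_size:
            # Cross the generator boundary once per big block instead
            # of once per tiny line fragment.
            yield bytes(buf)
            buf.clear()
    if buf:
        yield bytes(buf)  # flush whatever is left at the end
```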
This change alone reduces the upload time for my 34 MB test file from 4200 ms to around 1100 ms over localhost on my machine; that's almost a 4x increase in performance. All tests were done on Windows (64-bit Python 3.4); I'm not sure if it's as much of a problem on Linux.

It's still mostly CPU-bound, so I'm sure there is even more potential for optimization. I think I'll look into it when I find a bit more time.