Python Flask Gevent stack - Simple “Hello World” app shows as inefficient when benchmarked #1073
I have the following simple "Hello World" app:
As you can see it's pretty straightforward.
The problem is that, despite its simplicity, it's pretty slow, as the following benchmark (made with ApacheBench) shows:
Increasing the number of connections and/or the concurrency doesn't bring better results; in fact, it gets worse.
What I'm most concerned about is the fact that I can't go over 700 Requests per second and a Transfer rate of 98 Kbytes/sec.
Also, the individual Time per request seems too high.
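As a rough cross-check of these figures, the sketch below relates the numbers ApacheBench reports to each other. It assumes the run used 100 concurrent connections (the figure mentioned later in this thread); the function names are made up for illustration.

```python
def time_per_request_ms(requests_per_second, concurrency):
    """Mean time per request as seen by one client connection.

    ApacheBench derives this as concurrency * time_taken / requests,
    which is the same as concurrency / requests_per_second.
    """
    return concurrency / requests_per_second * 1000.0


def bytes_per_request(transfer_rate_kb_s, requests_per_second):
    """Average bytes on the wire per response (headers plus body)."""
    return transfer_rate_kb_s * 1024.0 / requests_per_second


if __name__ == "__main__":
    # 700 req/s and 98 Kbytes/sec are the ceilings quoted above;
    # a concurrency of 100 is an assumption taken from a later comment.
    print(round(time_per_request_ms(700, 100), 1))  # about 142.9 ms
    print(round(bytes_per_request(98, 700), 2))     # about 143.36 bytes
```

Under that assumption each response is only about 143 bytes, so the transfer rate is small simply because the payload is tiny; requests per second is the number that matters here.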
I got curious about what Python and Gevent are doing in the background, or rather what the OS is doing, so I used strace to look for possible system-side issues, and here's the result:
As you can see there are 5103 errors, the worst offender being the open syscall, which I suspect has to do with files not being found (ENOENT). To my surprise, epoll didn't look like a troublemaker, even though I had heard many horror stories about it.
I would like to post the full strace, which details every single call, but it is far too large.
A final note: I also set the following system parameters (to the maximum allowed values), hoping it would change the situation, but it didn't:
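For anyone who wants to rank syscalls by error count in a summary like this, here is a minimal sketch that parses the table printed by `strace -c`. The sample text uses made-up numbers purely to show the layout; it is not the output from this issue, and the function name is hypothetical.

```python
def parse_strace_summary(text):
    """Return (syscall, calls, errors) tuples sorted by errors, highest first."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, separator and blank lines: data rows start with a number.
        if len(fields) < 5 or not fields[0].replace(".", "", 1).isdigit():
            continue
        # Skip the trailing "total" row.
        if fields[-1] == "total":
            continue
        # The errors column is left blank when a syscall never failed.
        errors = int(fields[4]) if len(fields) == 6 else 0
        rows.append((fields[-1], int(fields[3]), errors))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Illustrative sample only (invented numbers).
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.00    0.002000          10       200       150 open
 30.00    0.001200          12       100           epoll_wait
 20.00    0.000800           8       100        20 stat
------ ----------- ----------- --------- --------- ----------------
100.00    0.004000                   400       170 total
"""

if __name__ == "__main__":
    for syscall, calls, errors in parse_strace_summary(SAMPLE):
        print(syscall, calls, errors)
```

A large share of failed `open` calls is often just the interpreter probing its import search path, so a high error count by itself does not necessarily point at a bottleneck.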
My question is: given that the sample I'm using can't be changed much to fix these issues, where should I look to correct them?
For comparison, I wrote the following "Hello World" script with Wheezy.web & Gevent and got ~2000 Requests per second:
And the benchmark results:
I find Wheezy.web's speed great, but I'd still like to use Flask as it's far simpler and less time consuming to work with.
And now measure Django please, and tell them that it is too slow. I am sure
On Fri, May 30, 2014 at 04:30:38AM -0700, yakamooz wrote:
I agree with @untitaker that striving for anything close to wheezy.web performance is not realistic: wheezy.web was designed explicitly for speed and high concurrency, and thus lacks the flexibility of Flask and doesn't do nearly as much for you.
In fact, if concurrent performance is that important, then Go would probably be a better choice than Python.
@danielchatfield Here is the Wheezy.web strace (weird that it took more time in the background):
The fact is that I'm trying to squeeze the most out of Flask because, above all, I like its simplicity and speed of development.
This wasn't an attempt to bash Flask and/or Python, really. I hope that, given this benchmark and its strace, someone can help me find the "culprit" behind Flask's lower concurrency and fix it.
I wouldn't go with Go (sorry for the confusion I introduced) because Python is cleaner and easier to work with.
@methane Looking at the straces, there doesn't seem to be much difference between the two. Now that the community has confirmed its performance, what do you suggest I do to handle 1500-2000 Requests per second without modifying Flask? Switch from CPython to PyPy? Spread Python processes over many servers and CPUs?
First of all, 10,000 requests is a fairly small number; you want to increase that to about 100,000 or even 1,000,000.
Nevertheless, if I replicate your benchmark exactly on my machine (Mid 2011 MacBook Air 1.8 GHz i7), I get more than twice the performance.
Switching to PyPy for faster interpretation, using gunicorn with eventlet (no gevent with PyPy, at least not yet), using 6 worker processes, which seemed to produce optimal results, and raising the number of requests to 1,000,000, I get a throughput of 780 Kb/s and 4600 req/s.
Looking further at the benchmark method used, I can't help but feel that 100 concurrent requests is also fairly low. In fact, there are people reconfiguring kernels and developing async systems to achieve more than 10k concurrent requests. Simply raising the file descriptor limit with ulimit -n 10000 allowed me to increase the number of concurrent requests to 350. That is by far not as high as I had hoped, and with more effort one could probably make more concurrent requests work, but it allowed for a small but decent increase to about 5200 req/s and 900 Kb/s.
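To see why the file descriptor limit matters for concurrency, here is a small sketch using Python's standard `resource` module (Unix only). Each open connection consumes one descriptor in the server process, so the soft limit caps how many can be held at once. The `reserved` headroom of 64 descriptors is an arbitrary assumption for illustration, and the helper names are hypothetical.

```python
import resource


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits(concurrency, soft_limit, reserved=64):
    """Rough check: can `concurrency` sockets fit under the soft limit?

    `reserved` is an assumed allowance for log files, listening sockets,
    epoll descriptors and so on; it is not a measured value.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return concurrency + reserved <= soft_limit


if __name__ == "__main__":
    soft, hard = fd_limits()
    print("soft limit:", soft, "hard limit:", hard)
    print("350 connections fit:", fits(350, soft))
```

With a common default soft limit of 1024, `fits(1000, 1024)` is already false, which is why the limit has to be raised before testing higher concurrency.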
This is far faster than what you have achieved for both Flask and Wheezy, even accounting for my apparently faster hardware.
The problem here is not that Flask is slow; you simply haven't configured your web server correctly. You could probably improve performance further still by using Varnish, for example. My machine is not exactly server material, and given that hardware costs much less than developer time, getting a nice server would be an easy way to increase performance significantly as well.
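A minimal sketch of a Gunicorn configuration file for the kind of setup described above (6 workers, eventlet worker class). This is not the exact configuration used for the numbers in this comment; the bind address is an assumption, and `worker_connections` is shown at Gunicorn's default value.

```python
# gunicorn.conf.py: Gunicorn reads plain Python assignments from this file.

# Address to listen on (assumed for illustration).
bind = "127.0.0.1:8000"

# Number of worker processes; 6 is the count reported as optimal above.
workers = 6

# Use the eventlet-based async worker (requires the eventlet package).
worker_class = "eventlet"

# Maximum simultaneous clients per async worker (Gunicorn's default).
worker_connections = 1000
```

It would be started with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a hypothetical module and Flask application name, with Gunicorn installed into the interpreter you want to run under.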
@DasIch It looks like your machine has more throughput than mine. May I ask you how you run Gunicorn and PyPy?
I wanted to try PyPy too and got faster results like yours.
For the test I used Monocle + Tornado (and PyPy of course) and 1000 concurrent connections x1000 times.
I got ~6000 req/s with it. I got way worse results with Wheezy.web this time.
I know that Gevent isn't supposed to work with PyPy (yet), but I wanted to give it a try and make it work anyway. As you might guess, I got it working without too much effort. I'm very doubtful that it works 100%, but it's a starting point nonetheless.
So, I got the Gevent + Flask snippet to work with PyPy and it wasn't bad (~4000-5000 req/s when fully "warmed"). It was still slower than Monocle + Tornado. But if you have to trade the simplicity of Flask for the performance of Monocle + Tornado, you can live with the performance of Flask + Gevent, as there isn't much difference and you get to develop faster.
I want to share with you how I got Gevent and PyPy working, so we may fix remaining issues.
First make sure that you have all the required libraries in your system:
Install the cffi module:
Install a version of Gevent which has been modified to run on PyPy:
I also patched the gevent.core cffi module to fix the "erroneous" byte declaration that stopped the installation process. You may want to apply it:
There is a socket.py that I patched in the "pypycore" folder you cloned from GitHub. Replace the one in /usr/lib/pypy/lib-python/2.7 with it (make a backup first for safety).
Before doing anything with PyPy and Gevent make sure Gevent uses the right gevent.core in the following way:
Now you can use Gevent and PyPy together!
I'd be glad if you posted your performance with it, to see whether you get more throughput than the ~4000-5000 req/s I had.
I'm going to patch it to work with PyPy and see how much I get.