
Benchmark accuracy #11

Open
ioquatix opened this issue May 15, 2018 · 3 comments

ioquatix commented May 15, 2018

I thought I'd add some notes while I'm looking through the code.

  • Agoo doesn't implement the same benchmark as the other Rack-compatible servers, because it serves from a static directory by default. Whether or not this reflects the real world (e.g. does passenger do this by default too?) should probably be discussed, but at the very least, I think we should have the SAME rackup file and run it for all servers (see the sketch after this list).

  • It's not clear to me why we are using perfer rather than wrk, ab, or a variety of other testing tools. wrk can definitely push a large number of requests. I'll be interested to see the results I get with perfer.

  • The puma benchmark uses the rackup command. At least in the case of falcon, the rackup command imposes severe performance limitations. It might not be the same for puma, but I don't know. The best way to test puma would be in cluster mode.

  • If we used wrk to run the tests, we could also report latency, which is a useful metric. Throughput and latency are related, and both are useful numbers to report.

  • The benchmark page doesn't feel very impartial. I think we should make the benchmark results as objective as possible. There should be a caveats section so that people know the limitations of such benchmarks.
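
To make the shared-rackup point concrete, something like the minimal config.ru below is what I have in mind. This is only a sketch (the actual benchmark app may differ), but the idea is that every server gets launched against the same file:

```ruby
# config.ru -- a minimal shared rackup file (sketch only; the real benchmark
# app may differ). Every server is started against this one file, so any
# throughput difference comes from the server, not the application.
app = lambda do |env|
  [200, { 'Content-Type' => 'text/plain' }, ['Hello World']]
end

run app
```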

ohler55 commented May 16, 2018

Let me comment on your points in order.

  1. Agoo does use a different rackup file, but only to allow serving of static files. It is something that can drastically improve the performance seen by end users, so I think it is worth showing. Do you know how to configure any of the other servers to serve static assets? Maybe we can update the other files to reflect that.

  2. I used perfer because wrk could not keep up with Agoo or, for that matter, Iodine.

  3. I used what worked at the time for puma. I might be able to do better next time. If you would like to update and run them, that would be great; or, if you prefer, update them and I can run them, since I was going to update versions and make another pass in the next week or two. Would you be willing to add falcon to the mix? As for cluster mode, I caught that with Iodine but missed it with puma (see the sketch after this list). I think both clustered and single-node results are useful, since in cluster mode apps must remain stateless.

  4. I completely agree latency is important. I think you will find that perfer does report latency, which is shown in the results along with the throughput.

  5. Help me out with making the benchmarks more impartial. I know each server has its own advantages, and benchmarks are just one aspect of that. The main page covers some other features, but it feels a bit spread out. Do you have ideas on how to improve the presentation?
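
On point 3, my understanding is that a cluster-mode puma run would use a config file along these lines; the worker and thread counts below are placeholders, not the values the benchmark would necessarily use:

```ruby
# puma.rb -- sketch of a cluster-mode configuration; counts are placeholders.
workers 4            # fork one worker process per core (roughly)
threads 0, 16        # min/max threads per worker
preload_app!         # load the app once before forking
bind 'tcp://0.0.0.0:9292'
rackup 'config.ru'
```

The benchmark would then start puma with `puma -C puma.rb` rather than going through rackup.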

@ioquatix

@ohler55 Great response, thanks for all the details and the positive feedback.

I will need to come back to you with specific answers, but just generally:

  • I know passenger also supports serving static files. Given my experience with HTTP/2, it's going to become a lot harder to serve static files efficiently, because HTTP/2 doesn't allow you to use sendfile/splice, etc. very effectively. falcon doesn't have any special case for static files because it didn't really make sense given that it supports HTTP/1.x and HTTP/2.x, which both have their own framing requirements, so static files are served just like any other (i.e. dynamically generated) content. For the other Rack servers, a middleware-based approach might work (see the sketch after this list).

  • I'm surprised wrk wasn't good enough. I've used it with servers implemented purely in C++ which scale to 100,000 requests per CPU core, and it's still been a pretty decent benchmarking tool, although I do sometimes wonder if you run into limitations.
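
As for configuring the other servers to serve static assets: for any Rack-compliant server it can be done in the rackup file itself, e.g. with the Rack::Static middleware. A rough sketch, assuming the assets live under ./public (an assumption about the layout, not what this repo actually uses):

```ruby
# config.ru -- sketch only; the /assets prefix and ./public root are assumptions.
require 'rack/static'

# Serve anything under /assets directly from ./public, bypassing the app.
use Rack::Static, urls: ['/assets'], root: 'public'

run lambda { |env|
  [200, { 'Content-Type' => 'text/plain' }, ['Hello World']]
}
```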

ohler55 commented May 16, 2018

wrk was okay with generated content, but I found it could not keep up with static file serving. On my desktop machine I was hitting 200K+, and that machine has only 4 cores. By the way, I changed the output of perfer, so the benchmark script needs an update.
