Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

benchmarks - wrk overload testing files [ci skip] #2695

Closed
wants to merge 2 commits into from

Conversation

MSP-Greg
Copy link
Member

Description

After working with wrk for a while, I thought having a script that ran several sets would be helpful. Also, after each run, collecting Puma stats and output from smem would show additional info. The summary output is as below:

────────wrk────────  ─Request─time─distribution─(ms)─  Worker─requests  ─wrk─requests─
 -t    -c   req/sec   50%    75%    90%    99%   100%  spread   total     total   bad
 10    50    10504    1.3   24.6   38.8   48.2   58.9   0.29   158708    158708     0
 13    65    10536    1.4   34.1   52.8   64.7   74.4   0.28   159153    159153     0
 17    85    10580    1.5   46.3   71.2   86.4   96.5   0.19   159859    159859     0
 23   115    10627    1.7   65.5  100.1  121.0  131.5   0.27   160631    160631     0
 30   150    10604    2.4   86.9  132.4  159.6  170.9   0.22   160249    160249     0
             10570                                    Totals   798600    798600
══════════════════════════════════════════════════════════════════════════════════════

══════════════════════════════════ Memory Change per Request
WRK Threads            10     13     17     23     30
    Connections        50     65     85    115    150    All
    Av Resp Size    52974  52960  52995  53008  53000  52987
════════════════════════════════════════════════════════════
worker-0-00    USS     42     25     17    -17      9     15
worker-0-01    USS     23     21     19    -12      6     11
All            USS     32     23     18    -15      7     13
════════════════════════════════════════════════════════════

The top section shows data from the five wrk runs, which start at the total number of threads the Puma server has, and stop at three times that number. The 'spread' column shows how consistent the requests are spread between workers. It's interesting that as the load increases, throughput (req/sec) stays about the same, but the response time increases, especially in the slowest responses.

The bottom section shows memory used after each wrk run, and should remain low. Oddly, using this with Ruby head lead to the discover of a memory leak in Ruby head, as the 'All' column was over 500 bytes per request.

Some notes:

  1. Since CI is not consistent enough to check performance, a good wrk script is helpful for determining whether changes improve performance.
  2. The code is commented in overload/benchmarks/local/bench_overload_wrk.rb. It uses smem for memory info, and also uses @ioquatix's fork of wrk. It only works with Puma in cluster mode (with workers).
  3. wrk doesn't support UNIXSockets, so I may try to get code running using some of the files in Add new test framework files, test for PR #2675 #2694, which shares rackup files with this PR.

Your checklist for this pull request

  • I have reviewed the guidelines for contributing to this repository.
  • I have added (or updated) appropriate tests if this PR fixes a bug or adds a feature.
  • My pull request is 100 lines added/removed or less so that it can be easily reviewed.
  • If this PR doesn't need tests (docs change), I added [ci skip] to the title of the PR.
  • If this closes any issues, I have added "Closes #issue" to the PR description or my commit messages.
  • I have updated the documentation accordingly.
  • All new and existing tests passed, including Rubocop.

Copy link
Contributor

@jacobherrington jacobherrington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a cool idea!

I noticed some nit-picky things (that might have been left in on purpose) and left notes on them, but you're welcome to disregard them. 😄

setup_options

unless File.exist? @state_file
puts "can't fined state file '#{@state_file}'"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed a small typo here! 🙈

Suggested change
puts "can't fined state file '#{@state_file}'"
puts "can't find state file '#{@state_file}'"


print " |#{wrk_data[/^ +\d+ +requests.+/].rstrip}\n"

# puts '', wrk_data, '' # for debugging or output format changes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be removed or uncommented?

Suggested change
# puts '', wrk_data, '' # for debugging or output format changes


def parse_stats
obj = @puma_info.run 'stats'
# puts ''; pp obj; puts ''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above!

Suggested change
# puts ''; pp obj; puts ''

pids = @pids.keys

smem_info = %x[smem -c 'pid rss pss uss command']
# puts '', smem_info, ''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above!

Suggested change
# puts '', smem_info, ''

overall_summary
leak_calc
puts RUBY_DESCRIPTION, ''
# puts ''; pp @info; puts ''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above!

Suggested change
# puts ''; pp @info; puts ''

@ioquatix
Copy link
Contributor

To confirm, by USS you mean unique set size right? i.e. private non-sharable data.

@MSP-Greg
Copy link
Member Author

@jacobherrington

Thanks for reviewing and trying it.

I've updated things quite a bit, adding another generic rackup file, which, based on a request header, switches the body type between array, chunked, string, or io, and also the body size. I'll be adding some more benchmark scripts. I'll remove all the debug code as your comments above.

The code currently outputs to the console, but I've set it up so all the measurement data is a Ruby object, so it be can used for 'asserts' or pass/fail code. CI machines vary quite a bit, and I haven't (yet) figured out a way to determine how 'fast' they're running. But, if that can be done, we can add some performance testing to CI...

@@ -1 +1,3 @@
run lambda { |env| [200, {"Content-Type" => "text/plain"}, ["Hello World"]] }
hdrs = {'Content-Type'.freeze => 'text/plain'.freeze}.freeze
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Freezing these is a good call - we're here to benchmark Puma's overhead, not the Rack app 👍

Copy link
Member

@nateberkopec nateberkopec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this merging whenever you're ready Greg

@MSP-Greg
Copy link
Member Author

@nateberkopec

Thanks. I'm still working on it, what I've expanded it to is a wrk script and a 'ruby connections' script that both are able to test the four types of request bodies (array, chunked, string, and IO), and also allow specifying the size of the body. Without a parameter for the body type/size, they run a matrix of all the combinations, but you can select a vector of the matrix, or a single 'cell/element'. The output shows requests per second, response time, response size, etc.

The 'ruby connections' script was using code that I originally wrote to send a stream of connections for use in CI tests, for things like restarts, shutdown, etc. Without a reasonably fast connection stream, testing for those state changes in Puma might not show real metrics about request handling, etc.

A couple of things:

One might ask 'why the Ruby script?' First, wrk doesn't support UNIXSockets. Secondly, 'requests per second' (RPS) is somewhat separate from response time (RT). With 'wrk', one can fill up the listener backlog. When that happens, RPS stays almost the same, but RT increases, first in the longest 20% or so, then starts to affect all the requests. One can query for the backlog size with a TCP listener, and I added it to stats so I could see it's value. I haven't looked at whether it can be added to stats for production.

Anyway, I'm still working on it, and also working on comments/documentation. Even, today, I'm still renaming methods that aren't clear...

@MSP-Greg
Copy link
Member Author

MSP-Greg commented Jan 1, 2022

I'll be posting a large update to this, please do not merge! Sorry for the delay...

@nateberkopec nateberkopec added waiting-for-changes Waiting on changes from the requestor and removed waiting-for-review Waiting on review from anyone labels Jan 18, 2022
@MSP-Greg MSP-Greg marked this pull request as draft September 27, 2022 18:25
@MSP-Greg
Copy link
Member Author

This code is similar to the code already committed in PR #2895. That code used a constant wrk setup, and allowed a matrix of body types and sizes. The code here increases the load on Puma, and shows behavior in terms of RPS (requests per sec) and metrics on response time. Or, how Puma behaves as the incoming requests increase to the point of overloading it.

Given that there are several other issues that I'd like to work on, and also the merge conflicts shown above, closing with the expectation that I'll hoepfully reopen with updated code.

@MSP-Greg MSP-Greg closed this Sep 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf waiting-for-changes Waiting on changes from the requestor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants