SSH traffic volume devastates HTTPS test scaling #194

Closed
FliesLikeABrick opened this issue Nov 16, 2017 · 3 comments

@FliesLikeABrick
Contributor

FliesLikeABrick commented Nov 16, 2017

BWMG operates by opening an SSH session to each instance and reading the output of apachebench, which runs in verbose mode ("-v 3"). BWMG then aggregates the fine-grained timing and response status data from that output itself.

This verbose output includes a large number of lines that BWMG does not need for statistics generation, especially when attacking HTTPS URLs (due to additional SSL/TLS debug output sent to stderr and stdout). stderr is entirely discarded within bees.py anyway.

For an instance performing 2500 HTTP req/s, this is approximately 800 kilobytes/s sent over SSH. For instances performing 400-600 HTTPS req/s, this is approximately 1.2 megabytes/s.

In a test with 100 instances, this adds up: 80 megabytes/s of raw data to support 250k req/s of HTTP load testing. Reaching 250k req/s of HTTPS at roughly 500 req/s per instance takes about 500 instances, or about 600 megabytes/s (5 gigabits/s).
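The totals above follow from simple per-instance scaling. A minimal sketch of that arithmetic, assuming decimal units (1 megabyte = 1000 kilobytes) and taking 500 req/s as the HTTPS per-instance rate (the midpoint of the 400-600 range quoted above); the function name is made up for illustration:

```python
def aggregate_ssh_load(target_req_s, per_instance_req_s, per_instance_kb_s):
    """Return (instances needed, total SSH payload in megabytes/s).

    Assumes every instance behaves identically and 1 MB = 1000 KB.
    """
    instances = target_req_s / per_instance_req_s
    total_mb_s = instances * per_instance_kb_s / 1000.0
    return instances, total_mb_s


# HTTP: 2500 req/s and ~800 KB/s per instance (figures from this issue).
http_instances, http_mb_s = aggregate_ssh_load(250000, 2500, 800)

# HTTPS: ~500 req/s (assumed midpoint) and ~1200 KB/s per instance.
https_instances, https_mb_s = aggregate_ssh_load(250000, 500, 1200)

print("HTTP:  %d instances, %.0f MB/s over SSH" % (http_instances, http_mb_s))
print("HTTPS: %d instances, %.0f MB/s (%.1f Gbit/s) over SSH"
      % (https_instances, https_mb_s, https_mb_s * 8 / 1000.0))
```

Under these assumptions the HTTPS case needs five times as many instances as the HTTP case for the same request rate, which is where most of the extra SSH volume comes from.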

Unfortunately, this has a few negative side-effects:

  • Instances with 1 CPU face increased CPU contention and context switches to send this data
  • Data outbound from AWS is billable, and this bandwidth usage (in certain use cases) is actually greater than the data used for the actual benchmarking
  • For tests with a large number of instances, this bandwidth requirement becomes too large for the Python client to collect reliably: on the order of hundreds of megabytes/s of SSH payload data.

Instance-side filtering of stdout and stderr down to only the necessary data would alleviate these scaling limitations, reduce cost, and likely increase the requests/s that a given instance can perform once it no longer has to carry this overhead (especially on single-vCPU instances).

@FliesLikeABrick
Contributor Author

I will be opening a PR shortly with a proposed solution for this issue.

Results (before -> after for [protocol] (req/s)):
800 kbyte/s -> 145 kbyte/s for HTTP (3650 req/s)
1200 kbyte/s -> 25 kbyte/s for HTTPS (600 req/s)

Notice that the requests per second are appreciably higher than before, due to lower CPU contention and less context switching on the single-vCPU instance used (m3.medium). HTTP performance to the same target, 3 ms from us-east-1, increased from 2500 req/s to 3650 req/s. HTTPS performance increased from 400-500 req/s up to 630 req/s.

This issue was much worse for HTTPS due to the SSL verbosity from apachebench. For HTTPS, this change represents a 98% reduction in output carried over SSH.
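The reduction figures can be checked directly from the before/after rates listed in the results. A small sketch (Python 3; the helper name is made up for illustration):

```python
def reduction_percent(before_kb_s, after_kb_s):
    """Percentage drop in SSH payload rate between two measurements."""
    return (1.0 - after_kb_s / before_kb_s) * 100.0


# Before/after rates in kbyte/s, taken from the results in this comment.
measurements = [("HTTP", 800, 145), ("HTTPS", 1200, 25)]

for protocol, before, after in measurements:
    print("%s: %.1f%% less output over SSH"
          % (protocol, reduction_percent(before, after)))
```

This works out to roughly 82% for HTTP and roughly 98% for HTTPS. Because the request rate also rose after the change, the reduction per request is larger than the per-second figures suggest.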

FliesLikeABrick pushed a commit to FliesLikeABrick/beeswithmachineguns that referenced this issue Nov 16, 2017
…rload of data carried via SSH to Python BWMG, especially for HTTPS. stderr discarded in benchmark_command, and stdout filtered remotely with fast non-regexp fgrep.
FliesLikeABrick pushed a commit to FliesLikeABrick/beeswithmachineguns that referenced this issue Nov 16, 2017
FliesLikeABrick changed the title from "Devastating HTTPS performance issues" to "SSH traffic volume devastates HTTPS test scaling" on Nov 16, 2017
@FliesLikeABrick
Contributor Author

PR #195 opened:

  • stderr is discarded on the instance side, inside the benchmark_command shell command. BWMG never consumes stderr, so there is no reason to carry it over SSH.
  • fgrep was used because simple substring matching is sufficient.
  • A list of the substrings required for statistics assembly is added, joined by newlines, and used with -F. This avoids needing to drop a file with the list of patterns onto the AMI, at the "cost" of a larger benchmark_command. If this leads to undesirable output on the client (as benchmark_command is printed to the user), the printed command could be split on | and truncated (or abbreviated at that point with an ellipsis "...") to at least indicate to the user that part of the command is hidden.
  • Filtering via fgrep is less ideal than having apachebench produce output tailored to BWMG's usage, but it offers one benefit: multi-vCPU instances will likely schedule fgrep (and ssh) on another core, allowing ab to run on one CPU with fewer interruptions. This is hypothetical/unproven.
  • This only minimally helps with context switches on the instance side when running on single-vCPU instances, as fgrep will contend for the same CPU. However, fgrep does far less work than SSH would have done carrying the unfiltered output. As mentioned in the previous comment, there is a definite benefit to apachebench performance with this patch.
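The approach in the bullets above can be sketched as follows. This is not the code from PR #195: the substring list, the function names, and the use of `-e` are assumptions made for illustration, and the real list of substrings is whatever bees.py needs for its statistics.

```python
import shlex

# Hypothetical substrings for illustration only; not the list used in the PR.
NEEDED_SUBSTRINGS = [
    "Time taken for tests:",
    "Complete requests:",
    "Failed requests:",
    "Requests per second:",
    "LOG: Response code",
]


def build_filtered_command(ab_command, substrings=NEEDED_SUBSTRINGS):
    """Append instance-side filtering to an ab command line.

    stderr is dropped on the instance, and stdout is reduced to lines that
    contain any of the substrings. Newline-separated patterns are passed as
    a single quoted argument, so no pattern file is needed on the AMI.
    """
    patterns = "\n".join(substrings)
    return "%s 2>/dev/null | fgrep -F -e %s" % (ab_command, shlex.quote(patterns))


def abbreviate_for_display(command):
    """Hide the filter stage when echoing the command to the user."""
    head, sep, _ = command.partition("|")
    return head.rstrip() + " | ..." if sep else command


full = build_filtered_command('ab -v 3 -n 1000 -c 10 "https://example.com/"')
print(abbreviate_for_display(full))
```

Splitting on the first | assumes the ab portion of the command contains no pipe character of its own (for example inside a quoted URL); a real implementation would keep the two parts separate rather than re-splitting the string.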

@FliesLikeABrick
Contributor Author

Closing - resolved by PR #195 merge.
