Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

utlity: Add benchmark for string-utility functions #2348

Merged
merged 18 commits into from
Jan 23, 2018

Conversation

jmarantz
Copy link
Contributor

@jmarantz jmarantz commented Jan 11, 2018

Description:
Adds a benchmarking for new string utility functions added in #2368 using the google microbenchmarking library (https://github.com/google/benchmark), which was added in #2350. This serves as an example and proof of concept on how to use the benchmarking library, and as a reference point for the approximate performance of some string-parsing functions.

Risk Level: Very Low

Testing:
This is just a new speed-test, so running it on its own is sufficient to test it.

Results:

2018-01-18 08:01:44
Run on (12 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 15360K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------
Benchmark                                              Time           CPU Iterations
-------------------------------------------------------------------------------------
BM_RTrimStringView                                    11 ns         11 ns   54467092
BM_RTrimStringViewAlreadyTrimmed                       9 ns          9 ns   77143046
BM_RTrimStringViewAlreadyTrimmedAndMakeString         24 ns         24 ns   29094696
BM_FindToken                                         165 ns        165 ns    4230849
BM_FindTokenValueNestedSplit                         427 ns        427 ns    1653627
BM_FindTokenValueSearchForEqual                      180 ns        180 ns    4046401

In practice I did not find this benchmark to be noisy on repeated runs.

to the other handlers with a browser.

Add a private mechanism to have admin handlers supply arbitrary
headers.  Note that the handler callback doesn't expose a header
structure, so the public v2 API does not provide this header-control.
However, it was specifying text/html as content-type by default, so
I changed the default to text/plain (with nosniff).

Added sanitization of both the prefix string and the help text, which
need to be safely injected into the HTML doc.

Added some styling and a reference to the existing Envoy favicon as well.

Note that no existing handlers get any response bytes changed as a result
of this commit, just their content-type.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ms in particular.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
… can set the

content-type and/or cache-control.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Adds an alternative string right-trimming function based on
absl::string_view, which improves on the existing std::string-trimming
function by about 3x:

2018-01-11 00:45:49
Run on (12 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 15360K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
avg trimmed=3 of 1 iters.
avg trimmed=3 of 10 iters.
avg trimmed=3 of 100 iters.
avg trimmed=3 of 1000 iters.
avg trimmed=3 of 10000 iters.
avg trimmed=3 of 100000 iters.
avg trimmed=3 of 1000000 iters.
avg trimmed=3 of 8390885 iters.
avg trimmed=3 of 18864031 iters.
----------------------------------------------------------
Benchmark                   Time           CPU Iterations
----------------------------------------------------------
BM_RTrimString             34 ns         34 ns   18864031
avg trimmed=3 of 1 iters.
avg trimmed=3 of 10 iters.
avg trimmed=3 of 100 iters.
avg trimmed=3 of 1000 iters.
avg trimmed=3 of 10000 iters.
avg trimmed=3 of 100000 iters.
avg trimmed=3 of 1000000 iters.
avg trimmed=3 of 10000000 iters.
avg trimmed=3 of 64745962 iters.
BM_RTrimStringView         11 ns         11 ns   64745962

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Rewrites the rightTrim function to coincide with the impl in
https://github.com/envoyproxy/envoy/pull/2087/files which is
shorter (although it benchmarks out to be the same speed).

Signed-off-by: Joshua Marantz <jmarantz@google.com>
return absl::string_view(source.data(), pos + 1);
}
return absl::string_view("");
absl::string_view StringUtil::rightTrim(absl::string_view source) {
Copy link
Member

@gsagula gsagula Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmarantz I created a new class StringViewUtil with similar method/implementation in my last commit. Check this out: https://github.com/gsagula/envoy/blob/02ba40dd236645b89aec7689a81c9ff89f71dfc2/source/common/common/utility.cc#L222

Comment:
I think it's important to explicitly differentiate methods that use string view. Let me know what you think, so I might just move everything under StringUtil instead.

Copy link
Contributor Author

@jmarantz jmarantz Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In absl -- and in earlier versions of similar libraries in Google & Chromium, they are co-mingled. Note that absl::string_view is I think intended as a stop-gap until C++17, when std::string_view (http://en.cppreference.com/w/cpp/string/basic_string_view) becomes available.

The reason -- I think -- that they are co-mingled is that there's no reason for the ones that take std::string arguments to exist at all. E.g. in my example if you want to right-strip a std::string you can just pass in it, because absl::string_view has an implicit constructor from std::string, though it requires an explicit std::string out-conversion, e.g.:

    std::string foo;
    std::string bar = std::string(StringUtil::rightStrip(foo));

Ordinarily, though, if the backing-store stays alive along enough, you can just live in the string_view world and never make copies.

See https://github.com/abseil/abseil-cpp/blob/master/absl/strings/strip.h where the doc speaks in terms of std::string but actually all the functions take string_view.

So my preference is that you replace the ones in StringUtil:: rather than adding new alternatives. This will help spread the knowledge about string_view's performance by eliminating the slower equivalent.

Having said that, I knew your PR was coming, and the only reason I added my own version of rightStrip was as a testing vehicle for the benchmarking integration. And I want the benchmarking for something unrelated (faster startup in the presence of giant configs). Do you think you could split your StringUtil stuff in your PR into a smaller one, in which you also eliminate entirely the old rtrim and fix its call-sites? Unfortunately the rtrim function's signature (modifying a std::string&) can't be fixed in place.

What I'll do in any case is move my new utility function into the benchmark code itself, and your version can be authoritative, and I'll have a TODO on mine to delete it it and call yours instead once it's in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmarantz Thanks for the details.

Sounds good to me. I will open a small PR for StringUtil refactoring and some new methods that I need for #2087

…rt it.

The authoritative work for this is in
envoyproxy#2087 .

Another speed-test case I thought might be worth covering, where no
trimming was necessary, and the result needs to be converted back
to a std::string.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat.


int main(int argc, char** argv) {
benchmark::Initialize(&argc, argv);
if (benchmark::ReportUnrecognizedArguments(argc, argv))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: braces on even single line if preferred.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done; this was just a c&p from the example in benchmark.h, but I added the braces.

for (auto _ : state) {
absl::string_view text(AlreadyTrimmed, AlreadyTrimmedLength);
std::string string_copy = std::string(rightTrim(text));
accum += AlreadyTrimmedLength - string_copy.size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how small the trim text is, I think the other loop overheads might diminish the actual speedup from the optimization, could be worth using much larger inner loop text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still showed a consistent 3x improvement.

@@ -0,0 +1,100 @@
// Note: this should be run with --compilation_mode=opt.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on a some experience doing this kind of microbenchmark work on Google workstations, I think the caveats are wider than just opt. For example, you want to disable cstate power management, ideally have a quiescent system and/or use CPU affinity with taskset to carve off some CPU to work on. Ideally we would run these in something equivalent to perflab.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some more text to that effect, though my results were pretty stable after multiple iterations. I guess a good place to put the results is the PR description.


set -x

export COMMIT="e1c3a83b8197cf02e794f61228461c27d4e78cfb" # benchmark @ Jan 11, 2018
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll continue to iterate on that.

envoy_cc_binary(
name = "utility_speed_test",
deps = ["utility_speed_test_lib"],
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come the binary/lib split here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

envoy_cc_binary does not allow an external lib. I'll add that as a comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also extend envoy_cc_binary to do this, it's only a couple of lines in the .bzl if you look at what envoy_cc_library does.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
…ate library for the speed_test.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
jmarantz added a commit to jmarantz/envoy that referenced this pull request Jan 11, 2018
This is extracted from envoyproxy#2348
and needs to go in first -- with no code depdning on it -- so the CI
works in the PR which uses the new dependency.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz changed the title utlity: Add benchmark library utlity: Add benchmark for string-trimming Jan 11, 2018
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool stuff. @jmarantz as a drive by comment, as long as you are doing stuff like this, can you audit all the string utility functions we have and potentially see which ones should be outright swapped for absl versions? For sure split should be replaced with the string view one. Maybe others.

@@ -69,3 +71,17 @@ envoy_cc_test(
srcs = ["callback_impl_test.cc"],
deps = ["//source/common/common:callback_impl_lib"],
)

# This is broken out as a separate library because envoy_cc_binary does
# not allow for external_deps.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment.

@@ -2,6 +2,8 @@ licenses(["notice"]) # Apache 2

load(
"//bazel:envoy_build_system.bzl",
"envoy_cc_binary",
"envoy_cc_library",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed.

htuch pushed a commit that referenced this pull request Jan 17, 2018
This is a prereq for running CI on #2348

Description:
Integrates github.com/google/benchmark so it can be used in future PRs for demonstrating improvement. Note this PR does not actually depend on it, so it can pass CI.

Risk Level: Low

Tested: //test/...

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz jmarantz changed the title utlity: Add benchmark for string-trimming utlity: Add benchmark for string-utility functions Jan 17, 2018
@jmarantz
Copy link
Contributor Author

CI fails due to precompiled benchmark library not being available on asan/tsan/release. Format is probably something else :)

I don't really need to the speed_tests to be built with asan or tsan, but release-mode binaries would be good. How do I:

  • indicate in BUILD or CI not to build tsan/asan versions of the speed tests
  • make the release build work?

Note that libbenchmark is a 'prebuilt' at this point. I could try to do this the "right" way as a resolution, but other prebuilts must of the same issue?

@mattklein123
Copy link
Member

@jmarantz for third party libraries like the one you added, you need to bump the CI build image SHA: https://github.com/envoyproxy/envoy/blob/master/.circleci/config.yml#L3 https://hub.docker.com/r/envoyproxy/envoy-build/tags/

That should work for all builds.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz
Copy link
Contributor Author

Thanks @mattklein123 ! That helped. I'm having a little trouble now understanding the format failure, as the one .cc file I modified does not seem affected running fix_format locally. WDYT of a PR to check_format.py to remove the >/dev/null redirection on the check commands, so we can see what's wrong in the CI log?

It's interesting to see the overhead of creating the split-vector
per-token, which resolved by just looking for '=' and calling substr
manually.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
@htuch
Copy link
Member

htuch commented Jan 18, 2018

@jmarantz are you running fix_format under Docker or locally? It's basically useless if you run it outside of Docker, unless you have precisely the same version of clang-format (and potentially other tools) as those used under CI. Best practice is to run under Docker IMHO.

I agree that better debugging in check_format would be useful, +1 to any change here.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@jmarantz
Copy link
Contributor Author

@htuch I sorted out my format issues; it was a script problem in my own setup. PTAL.

@htuch htuch merged commit bfc6408 into envoyproxy:master Jan 23, 2018
Shikugawa pushed a commit to Shikugawa/envoy that referenced this pull request Mar 28, 2020
Previously, 265MB release packages with debug symbols were published
instead of 9MB release packages without them.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants