utlity: Add benchmark for string-utility functions #2348

jmarantz · 2018-01-11T06:06:15Z

Description:
Adds a benchmark for the new string utility functions added in #2368, using the Google microbenchmarking library (https://github.com/google/benchmark), which was added in #2350. This serves as an example and proof of concept of how to use the benchmarking library, and as a reference point for the approximate performance of some string-parsing functions.

Risk Level: Very Low

Testing:
This is just a new speed-test, so running it on its own is sufficient to test it.

Results:

2018-01-18 08:01:44
Run on (12 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 15360K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------
Benchmark                                              Time           CPU Iterations
-------------------------------------------------------------------------------------
BM_RTrimStringView                                    11 ns         11 ns   54467092
BM_RTrimStringViewAlreadyTrimmed                       9 ns          9 ns   77143046
BM_RTrimStringViewAlreadyTrimmedAndMakeString         24 ns         24 ns   29094696
BM_FindToken                                         165 ns        165 ns    4230849
BM_FindTokenValueNestedSplit                         427 ns        427 ns    1653627
BM_FindTokenValueSearchForEqual                      180 ns        180 ns    4046401

In practice I did not find this benchmark to be noisy on repeated runs.

to the other handlers with a browser. Add a private mechanism to have admin handlers supply arbitrary headers. Note that the handler callback doesn't expose a header structure, so the public v2 API does not provide this header-control. However, it was specifying text/html as content-type by default, so I changed the default to text/plain (with nosniff). Added sanitization of both the prefix string and the help text, which need to be safely injected into the HTML doc. Added some styling and a reference to the existing Envoy favicon as well. Note that no existing handlers get any response bytes changed as a result of this commit, just their content-type. Signed-off-by: Joshua Marantz <jmarantz@google.com>

Adds an alternative string right-trimming function based on absl::string_view, which improves on the existing std::string-trimming function by about 3x:

2018-01-11 00:45:49
Run on (12 X 3800 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 15360K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
avg trimmed=3 of 1 iters.
avg trimmed=3 of 10 iters.
avg trimmed=3 of 100 iters.
avg trimmed=3 of 1000 iters.
avg trimmed=3 of 10000 iters.
avg trimmed=3 of 100000 iters.
avg trimmed=3 of 1000000 iters.
avg trimmed=3 of 8390885 iters.
avg trimmed=3 of 18864031 iters.
----------------------------------------------------------
Benchmark                   Time           CPU Iterations
----------------------------------------------------------
BM_RTrimString             34 ns         34 ns   18864031
avg trimmed=3 of 1 iters.
avg trimmed=3 of 10 iters.
avg trimmed=3 of 100 iters.
avg trimmed=3 of 1000 iters.
avg trimmed=3 of 10000 iters.
avg trimmed=3 of 100000 iters.
avg trimmed=3 of 1000000 iters.
avg trimmed=3 of 10000000 iters.
avg trimmed=3 of 64745962 iters.
BM_RTrimStringView         11 ns         11 ns   64745962

Signed-off-by: Joshua Marantz <jmarantz@google.com>

Rewrites the rightTrim function to coincide with the impl in https://github.com/envoyproxy/envoy/pull/2087/files which is shorter (although it benchmarks out to be the same speed). Signed-off-by: Joshua Marantz <jmarantz@google.com>

gsagula · 2018-01-11T07:10:48Z

source/common/common/utility.cc

-    return absl::string_view(source.data(), pos + 1);
-  }
-  return absl::string_view("");
+absl::string_view StringUtil::rightTrim(absl::string_view source) {


@jmarantz I created a new class StringViewUtil with similar method/implementation in my last commit. Check this out: https://github.com/gsagula/envoy/blob/02ba40dd236645b89aec7689a81c9ff89f71dfc2/source/common/common/utility.cc#L222

Comment:
I think it's important to explicitly differentiate methods that use string view. Let me know what you think; otherwise I can just move everything under StringUtil instead.

In absl, and in earlier versions of similar libraries at Google and in Chromium, they are co-mingled. Note that absl::string_view is, I think, intended as a stop-gap until C++17, when std::string_view (http://en.cppreference.com/w/cpp/string/basic_string_view) becomes available.

The reason, I think, that they are co-mingled is that there's no reason for the ones that take std::string arguments to exist at all. E.g. in my example, if you want to right-trim a std::string you can just pass it in, because absl::string_view has an implicit constructor from std::string, though it requires an explicit std::string out-conversion, e.g.:

std::string foo;
std::string bar = std::string(StringUtil::rightTrim(foo));

Ordinarily, though, if the backing store stays alive long enough, you can just live in the string_view world and never make copies.
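To make the conversion rules above concrete, here is a minimal sketch. It assumes C++17 and uses std::string_view as a stand-in for absl::string_view; the free functions rightTrim and trimmedCopy and the whitespace set are illustrative, not the code in this PR.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for StringUtil::rightTrim. Uses std::string_view
// (C++17) where the PR uses absl::string_view. The whitespace set is an
// assumption made for this sketch.
std::string_view rightTrim(std::string_view source) {
  // Index of the last character that is not trailing whitespace.
  const std::size_t pos = source.find_last_not_of(" \t\f\v\n\r");
  if (pos == std::string_view::npos) {
    return std::string_view();  // Empty or all-whitespace input.
  }
  return source.substr(0, pos + 1);  // A view into the caller's buffer; no copy.
}

// A caller that owns a std::string: the conversion into string_view is
// implicit, while the conversion back to std::string must be spelled out.
std::string trimmedCopy(const std::string& input) {
  return std::string(rightTrim(input));
}
```

The view returned by rightTrim is only valid while the buffer it points into is alive, which is why the copy in trimmedCopy is explicit rather than implicit.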

See https://github.com/abseil/abseil-cpp/blob/master/absl/strings/strip.h where the doc speaks in terms of std::string but actually all the functions take string_view.

So my preference is that you replace the ones in StringUtil:: rather than adding new alternatives. This will help spread the knowledge about string_view's performance by eliminating the slower equivalent.

Having said that, I knew your PR was coming, and the only reason I added my own version of rightTrim was as a testing vehicle for the benchmarking integration. And I want the benchmarking for something unrelated (faster startup in the presence of giant configs). Do you think you could split the StringUtil stuff in your PR into a smaller one, in which you also eliminate the old rtrim entirely and fix its call-sites? Unfortunately the rtrim function's signature (modifying a std::string&) can't be fixed in place.

What I'll do in any case is move my new utility function into the benchmark code itself, and your version can be authoritative, and I'll have a TODO on mine to delete it and call yours instead once it's in.

@jmarantz Thanks for the details.

Sounds good to me. I will open a small PR for StringUtil refactoring and some new methods that I need for #2087

…rt it. The authoritative work for this is in envoyproxy#2087 . Another speed-test case I thought might be worth covering, where no trimming was necessary, and the result needs to be converted back to a std::string. Signed-off-by: Joshua Marantz <jmarantz@google.com>

htuch

Neat.

htuch · 2018-01-11T13:22:03Z

test/common/common/utility_speed_test.cc

+
+int main(int argc, char** argv) {
+  benchmark::Initialize(&argc, argv);
+  if (benchmark::ReportUnrecognizedArguments(argc, argv))


Nit: braces are preferred even on a single-line if.

Done; this was just a copy-and-paste from the example in benchmark.h, but I added the braces.

htuch · 2018-01-11T13:24:47Z

test/common/common/utility_speed_test.cc

+  for (auto _ : state) {
+    absl::string_view text(AlreadyTrimmed, AlreadyTrimmedLength);
+    std::string string_copy = std::string(rightTrim(text));
+    accum += AlreadyTrimmedLength - string_copy.size();


Given how small the trim text is, I think the other loop overheads might diminish the actual speedup from the optimization; it could be worth using a much larger text in the inner loop.

It still showed a consistent 3x improvement.
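The question of input size can be explored with a sketch like the one below, which times an in-place std::string trim (which needs a fresh copy per iteration) against a string_view trim for a caller-chosen payload length. This is a hand-rolled std::chrono loop, not the google/benchmark harness used in the PR, and every name in it (rtrimInPlace, rightTrimView, TrimTiming, timeTrims) is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <string_view>

// Whitespace set assumed for this sketch; not taken from the PR.
constexpr char kWhitespace[] = " \t\f\v\n\r";

// Old style: mutates a std::string, so the benchmark must copy the input first.
void rtrimInPlace(std::string& s) {
  const std::size_t pos = s.find_last_not_of(kWhitespace);
  if (pos == std::string::npos) {
    s.clear();
  } else {
    s.erase(pos + 1);
  }
}

// New style: returns a narrower view over the same bytes.
std::string_view rightTrimView(std::string_view s) {
  const std::size_t pos = s.find_last_not_of(kWhitespace);
  return pos == std::string_view::npos ? std::string_view() : s.substr(0, pos + 1);
}

struct TrimTiming {
  std::size_t trimmed_copy;  // Total characters removed by the in-place variant.
  std::size_t trimmed_view;  // Total characters removed by the view variant.
  double ns_copy;            // Average nanoseconds per iteration, in-place variant.
  double ns_view;            // Average nanoseconds per iteration, view variant.
};

// Times both variants on payload_len non-space characters followed by three
// trailing spaces. The accumulated counts keep the results observable.
TrimTiming timeTrims(std::size_t payload_len, std::size_t iters) {
  assert(iters > 0);
  const std::string text = std::string(payload_len, 'x') + "   ";
  using Clock = std::chrono::steady_clock;
  TrimTiming r{0, 0, 0.0, 0.0};

  const auto t0 = Clock::now();
  for (std::size_t i = 0; i < iters; ++i) {
    std::string copy = text;
    rtrimInPlace(copy);
    r.trimmed_copy += text.size() - copy.size();
  }
  const auto t1 = Clock::now();
  for (std::size_t i = 0; i < iters; ++i) {
    const std::string_view view = rightTrimView(text);
    r.trimmed_view += text.size() - view.size();
  }
  const auto t2 = Clock::now();

  const double n = static_cast<double>(iters);
  r.ns_copy = std::chrono::duration<double, std::nano>(t1 - t0).count() / n;
  r.ns_view = std::chrono::duration<double, std::nano>(t2 - t1).count() / n;
  return r;
}
```

Calling timeTrims with, say, payload lengths of 16 and 4096 gives a rough feel for how the copy cost scales with input size. The numbers from such a loop are only indicative; the benchmark library handles iteration counts and reporting more carefully.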

htuch · 2018-01-11T13:27:07Z

test/common/common/utility_speed_test.cc

@@ -0,0 +1,100 @@
+// Note: this should be run with --compilation_mode=opt.


Based on some experience doing this kind of microbenchmark work on Google workstations, I think the caveats are wider than just opt. For example, you want to disable C-state power management, ideally have a quiescent system, and/or use CPU affinity with taskset to carve off some CPU to work on. Ideally we would run these in something equivalent to perflab.

I added some more text to that effect, though my results were pretty stable after multiple iterations. I guess a good place to put the results is the PR description.
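The CPU-affinity caveat can also be handled from inside the process rather than with taskset. The sketch below assumes Linux with glibc (sched_getaffinity, sched_setaffinity and the CPU_* macros); the helper names pinToFirstAllowedCpu and allowedCpuCount are hypothetical and nothing like this is part of the PR.

```cpp
#include <sched.h>

#include <cassert>

// Pins the calling thread to the first CPU it is currently allowed to run on.
// Returns that CPU's index, or -1 if the affinity could not be read or set.
// Linux/glibc only; this is an assumption of the sketch.
int pinToFirstAllowedCpu() {
  cpu_set_t allowed;
  CPU_ZERO(&allowed);
  if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
    return -1;
  }
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (CPU_ISSET(cpu, &allowed)) {
      cpu_set_t one;
      CPU_ZERO(&one);
      CPU_SET(cpu, &one);
      return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
    }
  }
  return -1;
}

// Number of CPUs the calling thread may currently run on, or -1 on error.
int allowedCpuCount() {
  cpu_set_t allowed;
  CPU_ZERO(&allowed);
  if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) {
    return -1;
  }
  return CPU_COUNT(&allowed);
}
```

Pinning only stops the scheduler from migrating the benchmark thread. It does not address CPU frequency scaling or C-state management, which still have to be configured on the host.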

htuch · 2018-01-11T13:28:43Z

ci/build_container/build_recipes/benchmark.sh

+
+set -x
+
+export COMMIT="e1c3a83b8197cf02e794f61228461c27d4e78cfb"  # benchmark @ Jan 11, 2018


FYI, the new preferred way to add external deps is https://github.com/envoyproxy/envoy/blob/master/bazel/EXTERNAL_DEPS.md#adding-external-dependencies-to-envoy-genrule-repository.

OK, I'll continue to iterate on that.

htuch · 2018-01-11T13:29:26Z

test/common/common/BUILD

+envoy_cc_binary(
+    name = "utility_speed_test",
+    deps = ["utility_speed_test_lib"],
+)


How come the binary/lib split here?

envoy_cc_binary does not allow an external lib. I'll add that as a comment.

You could also extend envoy_cc_binary to do this; it's only a couple of lines in the .bzl if you look at what envoy_cc_library does.

This is extracted from envoyproxy#2348 and needs to go in first, with no code depending on it, so that CI works in the PR which uses the new dependency. Signed-off-by: Joshua Marantz <jmarantz@google.com>

mattklein123

Cool stuff. @jmarantz as a drive by comment, as long as you are doing stuff like this, can you audit all the string utility functions we have and potentially see which ones should be outright swapped for absl versions? For sure split should be replaced with the string view one. Maybe others.

htuch · 2018-01-12T17:28:13Z

test/common/common/BUILD

@@ -69,3 +71,17 @@ envoy_cc_test(
    srcs = ["callback_impl_test.cc"],
    deps = ["//source/common/common:callback_impl_lib"],
 )
+
+# This is broken out as a separate library because envoy_cc_binary does
+# not allow for external_deps.


Stale comment.

htuch · 2018-01-12T17:28:18Z

test/common/common/BUILD

@@ -2,6 +2,8 @@ licenses(["notice"])  # Apache 2

 load(
    "//bazel:envoy_build_system.bzl",
+    "envoy_cc_binary",
+    "envoy_cc_library",


Not needed.

This is a prereq for running CI on #2348.

Description:
Integrates github.com/google/benchmark so it can be used in future PRs for demonstrating improvement. Note this PR does not actually depend on it, so it can pass CI.

Risk Level: Low

Tested: //test/...

Signed-off-by: Joshua Marantz <jmarantz@google.com>

jmarantz · 2018-01-18T00:10:07Z

CI fails due to precompiled benchmark library not being available on asan/tsan/release. Format is probably something else :)

I don't really need the speed_tests to be built with asan or tsan, but release-mode binaries would be good. How do I:

indicate in BUILD or CI not to build tsan/asan versions of the speed tests
make the release build work?

Note that libbenchmark is a 'prebuilt' at this point. I could try to do this the "right" way as a resolution, but other prebuilts must have the same issue?

mattklein123 · 2018-01-18T00:16:57Z

@jmarantz for third party libraries like the one you added, you need to bump the CI build image SHA: https://github.com/envoyproxy/envoy/blob/master/.circleci/config.yml#L3 https://hub.docker.com/r/envoyproxy/envoy-build/tags/

That should work for all builds.

jmarantz · 2018-01-18T01:26:22Z

Thanks @mattklein123! That helped. I'm having a little trouble now understanding the format failure, as the one .cc file I modified does not seem to be affected when I run fix_format locally. WDYT of a PR to check_format.py to remove the >/dev/null redirection on the check commands, so we can see what's wrong in the CI log?

It's interesting to see the overhead of creating the split-vector per token, which is resolved by just looking for '=' and calling substr manually. Signed-off-by: Joshua Marantz <jmarantz@google.com>
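The two approaches contrasted in that commit message can be sketched as follows. This assumes C++17 and std::string_view in place of absl::string_view. The helpers split, valueNestedSplit and valueSearchForEqual, and the ';'-separated key=value input format, are illustrative and are not the functions measured in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Minimal splitter, written for this sketch only.
std::vector<std::string_view> split(std::string_view s, char delim) {
  std::vector<std::string_view> parts;
  std::size_t start = 0;
  while (true) {
    const std::size_t end = s.find(delim, start);
    if (end == std::string_view::npos) {
      parts.push_back(s.substr(start));
      break;
    }
    parts.push_back(s.substr(start, end - start));
    start = end + 1;
  }
  return parts;
}

// Variant 1: split every token on '=' again, allocating one vector per token.
// Only tokens with exactly one '=' are considered.
std::string_view valueNestedSplit(std::string_view source, std::string_view key) {
  for (std::string_view token : split(source, ';')) {
    const std::vector<std::string_view> kv = split(token, '=');
    if (kv.size() == 2 && kv[0] == key) {
      return kv[1];
    }
  }
  return {};
}

// Variant 2: look for the first '=' and take substrings by hand, so no
// per-token vector is created. The value is everything after the first '='.
std::string_view valueSearchForEqual(std::string_view source, std::string_view key) {
  for (std::string_view token : split(source, ';')) {
    const std::size_t eq = token.find('=');
    if (eq != std::string_view::npos && token.substr(0, eq) == key) {
      return token.substr(eq + 1);
    }
  }
  return {};
}
```

The two variants agree on well-formed input but differ on tokens containing more than one '=': the first skips them, while the second treats everything after the first '=' as the value.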

htuch · 2018-01-18T15:32:15Z

@jmarantz are you running fix_format under Docker or locally? It's basically useless if you run it outside of Docker, unless you have precisely the same version of clang-format (and potentially other tools) as those used under CI. Best practice is to run under Docker IMHO.

I agree that better debugging in check_format would be useful, +1 to any change here.

jmarantz · 2018-01-23T15:41:49Z

@htuch I sorted out my format issues; it was a script problem in my own setup. PTAL.

Previously, 265MB release packages with debug symbols were published instead of 9MB release packages without them. Signed-off-by: Piotr Sikora <piotrsikora@google.com>

jmarantz added 6 commits December 21, 2017 18:10

Add unit tests and get all tests working, correcting some path proble…

8eb9e25

…ms in particular. Signed-off-by: Joshua Marantz <jmarantz@google.com>

Make the admin HandlerCb pass through a headers reference so handlers…

29bc713

… can set the content-type and/or cache-control. Signed-off-by: Joshua Marantz <jmarantz@google.com>

Merge remote-tracking branch 'refs/remotes/origin/master'

8fe318b

Shortens rightTrim function.

b79d16a

jmarantz mentioned this pull request Jan 11, 2018

Performance: Allocating stats is O(n) #1824

Closed

gsagula reviewed Jan 11, 2018

jmarantz added 2 commits January 11, 2018 08:11

Remove absl includes now that I reverted the references to it.

be822f4

Signed-off-by: Joshua Marantz <jmarantz@google.com>

htuch reviewed Jan 11, 2018

jmarantz added 2 commits January 11, 2018 09:48

Fix braces nit in boilerplate main code. Remove excess printing.

fcdfe13

Signed-off-by: Joshua Marantz <jmarantz@google.com>

Add external_deps support to envoy_cc_binary and remove the intermedi…

daf4d41

…ate library for the speed_test. Signed-off-by: Joshua Marantz <jmarantz@google.com>

jmarantz mentioned this pull request Jan 11, 2018

utility: add benchmark library, but don't use it yet. #2350

Merged

jmarantz changed the title ~~utlity: Add benchmark library~~ utlity: Add benchmark for string-trimming Jan 11, 2018

Remove gtest dependency from benchmark.

9057266

Signed-off-by: Joshua Marantz <jmarantz@google.com>

mattklein123 reviewed Jan 12, 2018

gsagula mentioned this pull request Jan 12, 2018

filter: implemented gzip http filter #2087

Merged

htuch reviewed Jan 12, 2018

mattklein123 assigned htuch Jan 12, 2018

gsagula mentioned this pull request Jan 15, 2018

utility: modified string util class #2368

Merged

jmarantz changed the title ~~utlity: Add benchmark for string-trimming~~ utlity: Add benchmark for string-utility functions Jan 17, 2018

jmarantz added 3 commits January 17, 2018 18:21

Merge branch 'master' into add-benchmark-library

c94cf1d

Merge branch 'master' into add-benchmark-library

dc1db43

Remove some detritus from earlier PRs and stats_speed_test, which is …

5132ae7

…not ready yet. Signed-off-by: Joshua Marantz <jmarantz@google.com>

Update the SHA to hopefully get CI to work.

bcf5812

Signed-off-by: Joshua Marantz <jmarantz@google.com>

Add an alternative faster implementation for finding a value in a token.

b22ea04

jmarantz added 2 commits January 18, 2018 20:18

Add an alternative faster findToken operation.

dda3cda

Signed-off-by: Joshua Marantz <jmarantz@google.com>

formatting fix-up.

12b97c6

Signed-off-by: Joshua Marantz <jmarantz@google.com>

htuch approved these changes Jan 23, 2018

htuch merged commit bfc6408 into envoyproxy:master Jan 23, 2018
