Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added benchmark to rosidl_runtime_cpp bounded vector #528

Merged
merged 5 commits into from
Sep 24, 2020

Conversation

ahcorde
Copy link
Contributor

@ahcorde ahcorde commented Sep 15, 2020

This PR builds on top of this other PR #521. Added some benchmark test to the BoundedVector class

Compiling up-to rosidl_runtime_cpp and testing rosidl_runtime_cpp with -DAMENT_RUN_PERFORMANCE_TESTS=ON:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

Signed-off-by: ahcorde ahcorde@gmail.com

Signed-off-by: ahcorde <ahcorde@gmail.com>
@ahcorde ahcorde added the tests label Sep 15, 2020
@ahcorde ahcorde self-assigned this Sep 15, 2020
@ahcorde
Copy link
Contributor Author

ahcorde commented Sep 15, 2020

@cottsay Does it make sense to add a new folder for the benchmark tests?

@cottsay
Copy link
Member

cottsay commented Sep 15, 2020

Benchmark job: (job deleted)

----------------------------------------------------------------------------------------------------------------
Benchmark                                                      Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------------------
PerformanceTest/bounded_vector/8                            10.3 ns         10.3 ns     67490890 heap_allocations=88.9009n
PerformanceTest/bounded_vector/16                           19.3 ns         19.3 ns     36406180 heap_allocations=192.275n
PerformanceTest/bounded_vector/32                           43.5 ns         43.5 ns     16094356 heap_allocations=497.069n
PerformanceTest/bounded_vector/64                           73.0 ns         73.0 ns      9623053 heap_allocations=935.254n
PerformanceTest/bounded_vector/128                           151 ns          151 ns      4642678 heap_allocations=2.15393u
PerformanceTest/bounded_vector/256                           293 ns          293 ns      2377555 heap_allocations=4.6266u
PerformanceTest/bounded_vector/512                           580 ns          580 ns      1196697 heap_allocations=10.0276u
PerformanceTest/bounded_vector/1024                         1151 ns         1151 ns       605978 heap_allocations=21.4529u
PerformanceTest/bounded_vector/2048                         2296 ns         2296 ns       304568 heap_allocations=45.9667u
PerformanceTest/bounded_vector/4096                         4583 ns         4583 ns       152521 heap_allocations=98.3471u
PerformanceTest/bounded_vector_rvalue/8                     9.53 ns         9.53 ns     73172679 heap_allocations=81.9978n
PerformanceTest/bounded_vector_rvalue/16                    18.5 ns         18.5 ns     37899693 heap_allocations=184.698n
PerformanceTest/bounded_vector_rvalue/32                    43.0 ns         43.0 ns     16286802 heap_allocations=491.195n
PerformanceTest/bounded_vector_rvalue/64                    72.1 ns         72.1 ns      9732560 heap_allocations=924.731n
PerformanceTest/bounded_vector_rvalue/128                    152 ns          152 ns      4601041 heap_allocations=2.17342u
PerformanceTest/bounded_vector_rvalue/256                    295 ns          295 ns      2375556 heap_allocations=4.63049u
PerformanceTest/bounded_vector_rvalue/512                    581 ns          581 ns      1207204 heap_allocations=9.94032u
PerformanceTest/bounded_vector_rvalue/1024                  1154 ns         1154 ns       608669 heap_allocations=21.3581u
PerformanceTest/bounded_vector_rvalue/2048                  2298 ns         2298 ns       304648 heap_allocations=45.9547u
PerformanceTest/bounded_vector_rvalue/4096                  4588 ns         4588 ns       152668 heap_allocations=98.2524u
PerformanceTest/bounded_vector_insert/8                     2.23 ns         2.23 ns    313603669 heap_allocations=22.3212n
PerformanceTest/bounded_vector_insert/16                    2.23 ns         2.23 ns    313571066 heap_allocations=25.5126n
PerformanceTest/bounded_vector_insert/32                    3.72 ns         3.72 ns    187923123 heap_allocations=47.8919n
PerformanceTest/bounded_vector_insert/64                    3.63 ns         3.63 ns    192793007 heap_allocations=51.8691n
PerformanceTest/bounded_vector_insert/128                   8.50 ns         8.50 ns     75496726 heap_allocations=145.702n
PerformanceTest/bounded_vector_insert/256                   15.1 ns         15.1 ns     69463806 heap_allocations=172.752n
PerformanceTest/bounded_vector_insert/512                   28.5 ns         28.5 ns     24575068 heap_allocations=528.991n
PerformanceTest/bounded_vector_insert/1024                  55.3 ns         55.3 ns     12554901 heap_allocations=1.1151u
PerformanceTest/bounded_vector_insert/2048                  95.0 ns         95.0 ns      6690774 heap_allocations=2.24189u
PerformanceTest/bounded_vector_insert/4096                   250 ns          250 ns      3043871 heap_allocations=5.25646u
PerformanceTest/bounded_vector_input_iterators/8             215 ns          215 ns      3203446 heap_allocations=1
PerformanceTest/bounded_vector_input_iterators/16            210 ns          210 ns      3324255 heap_allocations=1
PerformanceTest/bounded_vector_input_iterators/32            216 ns          216 ns      3326207 heap_allocations=1
PerformanceTest/bounded_vector_input_iterators/64            215 ns          215 ns      3193527 heap_allocations=1
PerformanceTest/bounded_vector_input_iterators/128           210 ns          210 ns      3325638 heap_allocations=1
PerformanceTest/bounded_vector_input_iterators/256           213 ns          213 ns      3271500 heap_allocations=1.00001
PerformanceTest/bounded_vector_input_iterators/512           215 ns          215 ns      3324627 heap_allocations=1.00001
PerformanceTest/bounded_vector_input_iterators/1024          213 ns          213 ns      3325783 heap_allocations=1.00001
PerformanceTest/bounded_vector_input_iterators/2048          211 ns          211 ns      3273590 heap_allocations=1.00001
PerformanceTest/bounded_vector_input_iterators/4096          211 ns          211 ns      3322525 heap_allocations=1.00001
PerformanceTest/bounded_vector_forward_iterators/8          11.0 ns         11.0 ns     63447399 heap_allocations=173.372n
PerformanceTest/bounded_vector_forward_iterators/16         20.0 ns         20.0 ns     34933370 heap_allocations=543.893n
PerformanceTest/bounded_vector_forward_iterators/32         54.0 ns         54.0 ns     12988760 heap_allocations=2.69464u
PerformanceTest/bounded_vector_forward_iterators/64          139 ns          139 ns      5024708 heap_allocations=13.3341u
PerformanceTest/bounded_vector_forward_iterators/128         447 ns          447 ns      1565266 heap_allocations=83.6918u
PerformanceTest/bounded_vector_forward_iterators/256         877 ns          877 ns       799666 heap_allocations=323.885u
PerformanceTest/bounded_vector_forward_iterators/512        1733 ns         1733 ns       403432 heap_allocations=1.27655m
PerformanceTest/bounded_vector_forward_iterators/1024       4493 ns         4493 ns       156361 heap_allocations=6.56813m
PerformanceTest/bounded_vector_forward_iterators/2048       9811 ns         9811 ns        71833 heap_allocations=0.0285523
PerformanceTest/bounded_vector_forward_iterators/4096      19477 ns        19477 ns        36311 heap_allocations=0.112886

I think that pre-allocating a static 4096 size in the vector is really making the heap_allocations statistic useless. Any reason not to pre-allocate to len instead?

@ahcorde
Copy link
Contributor Author

ahcorde commented Sep 15, 2020

the BoundedVector class is templated:

 rosidl_runtime_cpp::BoundedVector<int, 4096> v;

This required a constexpr which is defined in compile time. we cannot use the variable len. any thoughts ?

@cottsay
Copy link
Member

cottsay commented Sep 17, 2020

Okay, I have a better understanding of what's going on now. My interpretation of the template arguments was wrong, but we should still be pre-allocating, or it will skew the results on the first iteration. Looks like v.reserve(len); is what you should be adding to prevent the need to allocate memory during the testing loop. That said, it may be interesting to test the case when allocation is necessary as the vector grows, but if that is the intent of the test, the vector needs to be shrunk between iterations so it is grown again each time, and I don't believe v.clear(); de-allocates the memory so that it can be re-allocated on the next iteration.

On a separate note, you should drop the calls to st.SetComplexityN(len);. They're only necessary for asymptotic complexity analysis, which we're not doing here. I need to remove them from the rosidl_runtime_c PR as well.

Signed-off-by: ahcorde <ahcorde@gmail.com>
Signed-off-by: ahcorde <ahcorde@gmail.com>
Copy link
Member

@cottsay cottsay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're still not reserving capacity in the vector before starting the timing loop, so there is still a single allocation happening for the first iteration but not for any subsequent iterations, resulting in a really small value for the heap_allocations value in the results.

All but one of these tests act in this way. The bounded_vector_input_iterators test, however, is performing an allocation for every assign invocation, which is interesting (this is demonstrated by heap_allocations=1 in the results, meaning that on average, there was 1 allocation for every iteration of the timing loop). My guess is that there is an unconditional call to realloc that's probably not in our code.

This is the patch I used to resolve that:

--- a/rosidl_runtime_cpp/test/benchmark/benchmark_bounded_vector.cpp
+++ b/rosidl_runtime_cpp/test/benchmark/benchmark_bounded_vector.cpp
@@ -28,6 +28,10 @@ BENCHMARK_DEFINE_F(PerformanceTest, bounded_vector)(benchmark::State & st)
 {
   rosidl_runtime_cpp::BoundedVector<int, 1> v;
 
+  v.reserve(1);
+
+  reset_heap_counters();
+
   for (auto _ : st) {
     v.push_back(0);
     v.erase(v.begin());
@@ -39,6 +43,10 @@ BENCHMARK_DEFINE_F(PerformanceTest, bounded_vector_rvalue)(benchmark::State & st
 {
   rosidl_runtime_cpp::BoundedVector<int, 1> v;
 
+  v.reserve(1);
+
+  reset_heap_counters();
+
   for (auto _ : st) {
     v.emplace_back(0);
     v.erase(v.begin());
@@ -49,9 +57,10 @@ BENCHMARK_REGISTER_F(PerformanceTest, bounded_vector_rvalue);
 BENCHMARK_DEFINE_F(PerformanceTest, bounded_vector_insert)(benchmark::State & st)
 {
   rosidl_runtime_cpp::BoundedVector<int, 1> v;
-  rosidl_runtime_cpp::BoundedVector<int, 1> v2;
 
+  rosidl_runtime_cpp::BoundedVector<int, 1> v2;
   v2.push_back(0);
+  v.reserve(1);
 
   reset_heap_counters();
 
@@ -71,6 +80,7 @@ BENCHMARK_DEFINE_F(PerformanceTest, bounded_vector_input_iterators)(benchmark::S
   std::istringstream ss;
   ss.str(vector_string);
   std::istream_iterator<int> ii(ss), end;
+  v.reserve(1);
 
   reset_heap_counters();
 
@@ -87,6 +97,9 @@ BENCHMARK_DEFINE_F(PerformanceTest, bounded_vector_forward_iterators)(benchmark:
 
   std::forward_list<int> l;
   l.push_front(0);
+  v.reserve(1);
+
+  reset_heap_counters();
 
   for (auto _ : st) {
     v.assign(l.begin(), l.end());

Signed-off-by: ahcorde <ahcorde@gmail.com>
@ahcorde
Copy link
Contributor Author

ahcorde commented Sep 23, 2020

Compiling up-to rosidl_runtime_cpp and testing rosidl_runtime_cpp with -DAMENT_RUN_PERFORMANCE_TESTS=ON:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@cottsay cottsay merged commit 5b9b8dc into perf_tests Sep 24, 2020
@delete-merged-branch delete-merged-branch bot deleted the ahcorde/perf_test branch September 24, 2020 19:35
cottsay pushed a commit that referenced this pull request Sep 24, 2020
Signed-off-by: Alejandro Hernández Cordero <alejandro@openrobotics.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants