Add benchmarks for rosidl_runtime_* packages #521

cottsay · 2020-09-03T19:14:30Z

~~This should not be merged before ros2/ros_buildfarm_config#119 or ros2/ci#512~~
The PR build will fail until ament/ament_cmake#276 is released

Linux
Linux-aarch64
macOS
Windows

With -DAMENT_RUN_PERFORMANCE_TESTS=ON:

Linux
Linux-aarch64
macOS
Windows

This demonstrates that in a typical build, performance tests will be skipped, but the executables are still built on all platforms.

With the flag set to enable the tests, they are run on platforms which support the osrf_testing_tools_cpp memory tools, and platforms which don't support the memory tools get a build warning and the tests are skipped.

Performance test output:

-------------------------------------------------------------------------------------------------------
Benchmark                                             Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------
PerformanceTest/string_assign/8                     597 ns          594 ns       993376 heap_allocations=1
PerformanceTest/string_assign/16                    659 ns          655 ns      1141052 heap_allocations=1
PerformanceTest/string_assign/32                    629 ns          626 ns      1164928 heap_allocations=1
PerformanceTest/string_assign/64                    622 ns          620 ns      1004753 heap_allocations=1
PerformanceTest/string_assign/128                   612 ns          611 ns      1002201 heap_allocations=1
PerformanceTest/string_assign/256                   593 ns          592 ns      1182533 heap_allocations=1
PerformanceTest/string_assign/512                   601 ns          600 ns      1170811 heap_allocations=1
PerformanceTest/string_assign/1024                  611 ns          610 ns      1163985 heap_allocations=1
PerformanceTest/string_assign/2048                  626 ns          625 ns      1082659 heap_allocations=1
PerformanceTest/string_assign/4096                  644 ns          643 ns      1083679 heap_allocations=1
PerformanceTest/string_resize_assign/8             1201 ns         1199 ns       586648 heap_allocations=2
PerformanceTest/string_resize_assign/16            1205 ns         1203 ns       592009 heap_allocations=2
PerformanceTest/string_resize_assign/32            1251 ns         1249 ns       567994 heap_allocations=2
PerformanceTest/string_resize_assign/64            1265 ns         1264 ns       547082 heap_allocations=2
PerformanceTest/string_resize_assign/128           1301 ns         1298 ns       540009 heap_allocations=2
PerformanceTest/string_resize_assign/256           1283 ns         1281 ns       538794 heap_allocations=2
PerformanceTest/string_resize_assign/512           1338 ns         1334 ns       541751 heap_allocations=2
PerformanceTest/string_resize_assign/1024          1312 ns         1309 ns       393994 heap_allocations=2
PerformanceTest/string_resize_assign/2048          1315 ns         1311 ns       531893 heap_allocations=2
PerformanceTest/string_resize_assign/4096          1329 ns         1328 ns       526063 heap_allocations=2
PerformanceTest/u16string_assign/8                  608 ns          604 ns      1168873 heap_allocations=1
PerformanceTest/u16string_assign/16                 565 ns          563 ns      1287944 heap_allocations=1
PerformanceTest/u16string_assign/32                 703 ns          698 ns      1270066 heap_allocations=1
PerformanceTest/u16string_assign/64                 570 ns          568 ns       988422 heap_allocations=1
PerformanceTest/u16string_assign/128                575 ns          573 ns      1147430 heap_allocations=1
PerformanceTest/u16string_assign/256                564 ns          563 ns      1157172 heap_allocations=1
PerformanceTest/u16string_assign/512                568 ns          566 ns      1208399 heap_allocations=1
PerformanceTest/u16string_assign/1024               601 ns          600 ns      1165304 heap_allocations=1
PerformanceTest/u16string_assign/2048               622 ns          621 ns      1016077 heap_allocations=1
PerformanceTest/u16string_assign/4096               649 ns          648 ns       816846 heap_allocations=1
PerformanceTest/u16string_resize_assign/8          1219 ns         1217 ns       499635 heap_allocations=2
PerformanceTest/u16string_resize_assign/16         1217 ns         1215 ns       549265 heap_allocations=2
PerformanceTest/u16string_resize_assign/32         1222 ns         1220 ns       550277 heap_allocations=2
PerformanceTest/u16string_resize_assign/64         1286 ns         1284 ns       537823 heap_allocations=2
PerformanceTest/u16string_resize_assign/128        1308 ns         1306 ns       509220 heap_allocations=2
PerformanceTest/u16string_resize_assign/256        1272 ns         1271 ns       545931 heap_allocations=2
PerformanceTest/u16string_resize_assign/512        1273 ns         1271 ns       545449 heap_allocations=2
PerformanceTest/u16string_resize_assign/1024       1257 ns         1255 ns       541757 heap_allocations=2
PerformanceTest/u16string_resize_assign/2048       1284 ns         1282 ns       530219 heap_allocations=2
PerformanceTest/u16string_resize_assign/4096       1305 ns         1303 ns       531641 heap_allocations=2

Signed-off-by: Scott K Logan <logans@cottsay.net>

cottsay · 2020-09-03T19:16:54Z

Hmm, as a side effect of removing the CMake variable to disable the performance tests by default, I don't think there is a mechanism for suppressing the CMake warning that comes up when the osrf_testing_tools_cpp memory tools aren't available. We'll need to solve that before merging any tests so that we don't regress the arm64 and Windows builds, where the memory tools aren't supported.

EDIT: This has been addressed.

rosidl_runtime_c/test/benchmark_string_conversion.cpp

Signed-off-by: Scott K Logan <logans@cottsay.net>

brawner

The complexity result was N^2 for strings and N*log(N) for wstrings, but they had very similar performance numbers. That's a bit surprising. But the time is probably dominated by heap allocations since those grow faster super-linearly.

cottsay · 2020-09-11T01:59:42Z

The complexity result was N^2 for strings and N*log(N) for wstrings, but they had very similar performance numbers.

I've plotted the results and ran my own analysis several times, and I think the problem is that there is so much overhead relative to the part of the code that actually scales based on the size, the results end up pretty noisy. If I run the analysis on really small sizes (100-10000), the difference in performance is negligible. If I make the numbers huge, the trends start to come through, but we'll testing string copy operations on 2 MiB strings is pretty much useless in this context.

I hate to say it, but the results are a lot better on my local machine. I'm seeing O(N) on all of them with an RMS closer to 5%. The EC2 machines just aren't giving us consistent enough results to run this type of analysis reliably.

I think part of the blame is on the analysis itself, too. It's fitting a log(n) and n^2 curves with coefficients that are so small that they're pretty much linear anyway, but they fit the data just a little better than a straight-up linear regression.

tl;dr - I think what we're doing here is so simple that the noise pretty much takes over. It's just not a good candidate for this type of test, where the overhead of calling the function in another library is about as costly as the function's algorithm.

cottsay · 2020-09-11T02:03:59Z

Tell you what. I'm going to drop the asymptotic analysis for now. Maybe I can tweak the detection algorithm to be a little more reasonable about how it chooses the result.

I'll also change the range of sizes to test what we're realistically expecting to see for string conversions. (more like 8-2048 characters). It doesn't provide value to test unrealistic string sizes just to get the analysis to act how I'm expecting it to.

Signed-off-by: Scott K Logan <logans@cottsay.net>

cottsay · 2020-09-15T17:31:52Z

@ros-pull-request-builder, retest this please

Signed-off-by: Scott K Logan <logans@cottsay.net>

ahcorde · 2020-09-23T07:55:33Z

Running CI with -DAMENT_RUN_PERFORMANCE_TESTS=ON

Linux
Linux-aarch64
macOS
Windows

Signed-off-by: Alejandro Hernández Cordero <alejandro@openrobotics.org>

Signed-off-by: Scott K Logan <logans@cottsay.net>

There isn't any interesting data here, and we aren't expecting the performance to regress based on the size alone. I'm using an `Arg` instead of hard-coding the size so that the test name reflects that size, which should make it easier to compare results should we decide to change it the size someday. Signed-off-by: Scott K Logan <logans@cottsay.net>

Co-authored-by: Alejandro Hernández Cordero <alejandro@openrobotics.org> Signed-off-by: Alejandro Hernández Cordero <alejandro@openrobotics.org> Signed-off-by: Scott K Logan <logans@cottsay.net>

Add tests for asymptotic complexity of string conversions

002c7e8

Signed-off-by: Scott K Logan <logans@cottsay.net>

cottsay self-assigned this Sep 3, 2020

brawner reviewed Sep 9, 2020

View reviewed changes

rosidl_runtime_c/test/benchmark_string_conversion.cpp Outdated Show resolved Hide resolved

rosidl_runtime_c/test/benchmark_string_conversion.cpp Outdated Show resolved Hide resolved

cottsay added 2 commits September 10, 2020 15:34

Add round-trip resizing, tweak case counts

a986d2c

Signed-off-by: Scott K Logan <logans@cottsay.net>

Merge branch 'master' into perf_tests

5187d75

brawner approved these changes Sep 11, 2020

View reviewed changes

Drop complexity analysis

d46f249

Signed-off-by: Scott K Logan <logans@cottsay.net>

cottsay changed the title ~~Add tests for asymptotic complexity of string conversions~~ Add benchmarks for string conversion functions Sep 11, 2020

ahcorde mentioned this pull request Sep 15, 2020

Added benchmark to rosidl_runtime_cpp bounded vector #528

Merged

ahcorde mentioned this pull request Sep 15, 2020

Added benchmark test for rosidl_generator_c #529

Closed

ahcorde approved these changes Sep 16, 2020

View reviewed changes

cottsay added 2 commits September 17, 2020 11:07

Drop complexity notation, fix heap statistics

7239d67

Signed-off-by: Scott K Logan <logans@cottsay.net>

Move benchmarks to a separate directory

2166ddc

Signed-off-by: Scott K Logan <logans@cottsay.net>

ahcorde and others added 2 commits September 24, 2020 15:03

Added benchmark to rosidl_runtime_cpp bounded vector (#528)

03e3718

Signed-off-by: Alejandro Hernández Cordero <alejandro@openrobotics.org>

A couple of fixups

9285549

Signed-off-by: Scott K Logan <logans@cottsay.net>

cottsay force-pushed the perf_tests branch from 1d54f41 to 9285549 Compare September 24, 2020 22:03

cottsay changed the title ~~Add benchmarks for string conversion functions~~ Add benchmarks for rosidl_runtime_* packages Sep 24, 2020

cottsay merged commit be18237 into master Sep 24, 2020

delete-merged-branch bot deleted the perf_tests branch September 24, 2020 22:38

ahcorde mentioned this pull request Sep 25, 2020

Updated Quality Level to 1 #532

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add benchmarks for rosidl_runtime_* packages #521

Add benchmarks for rosidl_runtime_* packages #521

cottsay commented Sep 3, 2020 •

edited

Loading

cottsay commented Sep 3, 2020 •

edited

Loading

brawner left a comment

cottsay commented Sep 11, 2020

cottsay commented Sep 11, 2020

cottsay commented Sep 15, 2020

ahcorde commented Sep 23, 2020

Add benchmarks for rosidl_runtime_* packages #521

Add benchmarks for rosidl_runtime_* packages #521

Conversation

cottsay commented Sep 3, 2020 • edited Loading

cottsay commented Sep 3, 2020 • edited Loading

brawner left a comment

Choose a reason for hiding this comment

cottsay commented Sep 11, 2020

cottsay commented Sep 11, 2020

cottsay commented Sep 15, 2020

ahcorde commented Sep 23, 2020

cottsay commented Sep 3, 2020 •

edited

Loading

cottsay commented Sep 3, 2020 •

edited

Loading