Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add benchmarks for rosidl_runtime_* packages #521

Merged
merged 9 commits into from
Sep 24, 2020
Merged

Add benchmarks for rosidl_runtime_* packages #521

merged 9 commits into from
Sep 24, 2020

Conversation

cottsay
Copy link
Member

@cottsay cottsay commented Sep 3, 2020

This should not be merged before ros2/ros_buildfarm_config#119 or ros2/ci#512
The PR build will fail until ament/ament_cmake#276 is released

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

With -DAMENT_RUN_PERFORMANCE_TESTS=ON:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

This demonstrates that in a typical build, performance tests will be skipped, but the executables are still built on all platforms.

With the flag set to enable the tests, they are run on platforms which support the osrf_testing_tools_cpp memory tools, and platforms which don't support the memory tools get a build warning and the tests are skipped.

Performance test output:

-------------------------------------------------------------------------------------------------------
Benchmark                                             Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------
PerformanceTest/string_assign/8                     597 ns          594 ns       993376 heap_allocations=1
PerformanceTest/string_assign/16                    659 ns          655 ns      1141052 heap_allocations=1
PerformanceTest/string_assign/32                    629 ns          626 ns      1164928 heap_allocations=1
PerformanceTest/string_assign/64                    622 ns          620 ns      1004753 heap_allocations=1
PerformanceTest/string_assign/128                   612 ns          611 ns      1002201 heap_allocations=1
PerformanceTest/string_assign/256                   593 ns          592 ns      1182533 heap_allocations=1
PerformanceTest/string_assign/512                   601 ns          600 ns      1170811 heap_allocations=1
PerformanceTest/string_assign/1024                  611 ns          610 ns      1163985 heap_allocations=1
PerformanceTest/string_assign/2048                  626 ns          625 ns      1082659 heap_allocations=1
PerformanceTest/string_assign/4096                  644 ns          643 ns      1083679 heap_allocations=1
PerformanceTest/string_resize_assign/8             1201 ns         1199 ns       586648 heap_allocations=2
PerformanceTest/string_resize_assign/16            1205 ns         1203 ns       592009 heap_allocations=2
PerformanceTest/string_resize_assign/32            1251 ns         1249 ns       567994 heap_allocations=2
PerformanceTest/string_resize_assign/64            1265 ns         1264 ns       547082 heap_allocations=2
PerformanceTest/string_resize_assign/128           1301 ns         1298 ns       540009 heap_allocations=2
PerformanceTest/string_resize_assign/256           1283 ns         1281 ns       538794 heap_allocations=2
PerformanceTest/string_resize_assign/512           1338 ns         1334 ns       541751 heap_allocations=2
PerformanceTest/string_resize_assign/1024          1312 ns         1309 ns       393994 heap_allocations=2
PerformanceTest/string_resize_assign/2048          1315 ns         1311 ns       531893 heap_allocations=2
PerformanceTest/string_resize_assign/4096          1329 ns         1328 ns       526063 heap_allocations=2
PerformanceTest/u16string_assign/8                  608 ns          604 ns      1168873 heap_allocations=1
PerformanceTest/u16string_assign/16                 565 ns          563 ns      1287944 heap_allocations=1
PerformanceTest/u16string_assign/32                 703 ns          698 ns      1270066 heap_allocations=1
PerformanceTest/u16string_assign/64                 570 ns          568 ns       988422 heap_allocations=1
PerformanceTest/u16string_assign/128                575 ns          573 ns      1147430 heap_allocations=1
PerformanceTest/u16string_assign/256                564 ns          563 ns      1157172 heap_allocations=1
PerformanceTest/u16string_assign/512                568 ns          566 ns      1208399 heap_allocations=1
PerformanceTest/u16string_assign/1024               601 ns          600 ns      1165304 heap_allocations=1
PerformanceTest/u16string_assign/2048               622 ns          621 ns      1016077 heap_allocations=1
PerformanceTest/u16string_assign/4096               649 ns          648 ns       816846 heap_allocations=1
PerformanceTest/u16string_resize_assign/8          1219 ns         1217 ns       499635 heap_allocations=2
PerformanceTest/u16string_resize_assign/16         1217 ns         1215 ns       549265 heap_allocations=2
PerformanceTest/u16string_resize_assign/32         1222 ns         1220 ns       550277 heap_allocations=2
PerformanceTest/u16string_resize_assign/64         1286 ns         1284 ns       537823 heap_allocations=2
PerformanceTest/u16string_resize_assign/128        1308 ns         1306 ns       509220 heap_allocations=2
PerformanceTest/u16string_resize_assign/256        1272 ns         1271 ns       545931 heap_allocations=2
PerformanceTest/u16string_resize_assign/512        1273 ns         1271 ns       545449 heap_allocations=2
PerformanceTest/u16string_resize_assign/1024       1257 ns         1255 ns       541757 heap_allocations=2
PerformanceTest/u16string_resize_assign/2048       1284 ns         1282 ns       530219 heap_allocations=2
PerformanceTest/u16string_resize_assign/4096       1305 ns         1303 ns       531641 heap_allocations=2

Signed-off-by: Scott K Logan <logans@cottsay.net>
@cottsay cottsay self-assigned this Sep 3, 2020
@cottsay
Copy link
Member Author

cottsay commented Sep 3, 2020

Hmm, as a side effect of removing the CMake variable to disable the performance tests by default, I don't think there is a mechanism for suppressing the CMake warning that comes up when the osrf_testing_tools_cpp memory tools aren't available. We'll need to solve that before merging any tests so that we don't regress the arm64 and Windows builds, where the memory tools aren't supported.

EDIT: This has been addressed.

Copy link
Contributor

@brawner brawner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The complexity result was N^2 for strings and N*log(N) for wstrings, but they had very similar performance numbers. That's a bit surprising. But the time is probably dominated by heap allocations since those grow faster super-linearly.

@cottsay
Copy link
Member Author

cottsay commented Sep 11, 2020

The complexity result was N^2 for strings and N*log(N) for wstrings, but they had very similar performance numbers.

I've plotted the results and ran my own analysis several times, and I think the problem is that there is so much overhead relative to the part of the code that actually scales based on the size, the results end up pretty noisy. If I run the analysis on really small sizes (100-10000), the difference in performance is negligible. If I make the numbers huge, the trends start to come through, but we'll testing string copy operations on 2 MiB strings is pretty much useless in this context.

I hate to say it, but the results are a lot better on my local machine. I'm seeing O(N) on all of them with an RMS closer to 5%. The EC2 machines just aren't giving us consistent enough results to run this type of analysis reliably.

I think part of the blame is on the analysis itself, too. It's fitting a log(n) and n^2 curves with coefficients that are so small that they're pretty much linear anyway, but they fit the data just a little better than a straight-up linear regression.

tl;dr - I think what we're doing here is so simple that the noise pretty much takes over. It's just not a good candidate for this type of test, where the overhead of calling the function in another library is about as costly as the function's algorithm.

@cottsay
Copy link
Member Author

cottsay commented Sep 11, 2020

Tell you what. I'm going to drop the asymptotic analysis for now. Maybe I can tweak the detection algorithm to be a little more reasonable about how it chooses the result.

I'll also change the range of sizes to test what we're realistically expecting to see for string conversions. (more like 8-2048 characters). It doesn't provide value to test unrealistic string sizes just to get the analysis to act how I'm expecting it to.

Signed-off-by: Scott K Logan <logans@cottsay.net>
@cottsay cottsay changed the title Add tests for asymptotic complexity of string conversions Add benchmarks for string conversion functions Sep 11, 2020
@cottsay
Copy link
Member Author

cottsay commented Sep 15, 2020

@ros-pull-request-builder, retest this please

Signed-off-by: Scott K Logan <logans@cottsay.net>
Signed-off-by: Scott K Logan <logans@cottsay.net>
@ahcorde
Copy link
Contributor

ahcorde commented Sep 23, 2020

Running CI with -DAMENT_RUN_PERFORMANCE_TESTS=ON

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

ahcorde and others added 2 commits September 24, 2020 15:03
Signed-off-by: Alejandro Hernández Cordero <alejandro@openrobotics.org>
Signed-off-by: Scott K Logan <logans@cottsay.net>
There isn't any interesting data here, and we aren't expecting the
performance to regress based on the size alone. I'm using an `Arg`
instead of hard-coding the size so that the test name reflects that
size, which should make it easier to compare results should we decide to
change it the size someday.

Signed-off-by: Scott K Logan <logans@cottsay.net>
@cottsay cottsay changed the title Add benchmarks for string conversion functions Add benchmarks for rosidl_runtime_* packages Sep 24, 2020
@cottsay cottsay merged commit be18237 into master Sep 24, 2020
@delete-merged-branch delete-merged-branch bot deleted the perf_tests branch September 24, 2020 22:38
ahcorde added a commit that referenced this pull request Oct 5, 2020
Co-authored-by: Alejandro Hernández Cordero <alejandro@openrobotics.org>
Signed-off-by: Alejandro Hernández Cordero <alejandro@openrobotics.org>
Signed-off-by: Scott K Logan <logans@cottsay.net>
ahcorde added a commit that referenced this pull request Oct 8, 2020
Co-authored-by: Alejandro Hernández Cordero <alejandro@openrobotics.org>
Signed-off-by: Alejandro Hernández Cordero <alejandro@openrobotics.org>
Signed-off-by: Scott K Logan <logans@cottsay.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants