
[mppi] Investigate speed up potential of preallocated noise. #4232

Closed
aosmw opened this issue Apr 2, 2024 · 3 comments

Comments

@aosmw
Contributor

aosmw commented Apr 2, 2024

Feature request

Feature description

Add a noise_benchmark to help explore potential speed up of using an xt::adapt(ed) vector of pre-generated noise rather than lazily evaluating it with xt::random::randn.

Implementation considerations

  • Trade off memory usage for speed of operation?
  • How much noise is enough?
  • Acceptable startup delay to pre-generate noise?
  • Option to reuse pre-generated noise during optimiser reset?
  • Does it translate to an actual improvement?
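The xt::adapt idea can be sketched without pulling in xtensor: sample a large pool of Gaussian noise once at startup, then hand out reused windows of it each iteration instead of calling the RNG. This is a hypothetical illustration only — the `NoisePool` class and its API are invented for this sketch and are not the linked benchmark code:

```cpp
// Hypothetical sketch (not the linked benchmark code): trade startup time
// and memory for per-iteration speed by sampling a large Gaussian pool once,
// then serving reused windows of it instead of calling the RNG each cycle.
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

class NoisePool {
public:
  NoisePool(std::size_t pool_size, float stddev, unsigned seed = 42)
  : pool_(pool_size), offset_(0)
  {
    std::mt19937 gen(seed);
    std::normal_distribution<float> dist(0.0f, stddev);
    for (auto & v : pool_) {
      v = dist(gen);  // one-time startup cost ("acceptable startup delay?")
    }
  }

  // Return a pointer to n reusable samples; advance the offset so
  // consecutive calls yield different (though eventually repeating) noise.
  const float * next(std::size_t n)
  {
    assert(n <= pool_.size());
    if (offset_ + n > pool_.size()) {
      offset_ = 0;  // wrap around: "how much noise is enough?"
    }
    const float * view = pool_.data() + offset_;
    offset_ += n;
    return view;
  }

private:
  std::vector<float> pool_;
  std::size_t offset_;
};
```

In the actual benchmark such a buffer would presumably be wrapped with `xt::adapt` to obtain an xtensor view over the pre-generated samples without copying.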

Preliminary work

Code - https://github.com/aosmw/navigation2/blob/feature/mw/pregenerate_noise/nav2_mppi_controller/benchmark/noise_benchmark.cpp
Results - noise_benchmark.txt

2024-04-02T17:07:32+11:00
Running ./build/nav2_mppi_controller/benchmark/noise_benchmark
Run on (24 X 4600 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x12)
  L1 Instruction 32 KiB (x12)
  L2 Unified 1024 KiB (x12)
  L3 Unified 19712 KiB (x1)
Load Average: 0.38, 0.51, 0.45
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------------------------------------
Benchmark                                          Time             CPU   Iterations
------------------------------------------------------------------------------------
BM_Noise_xt_random                            896127 ns       896116 ns          631
BM_Noise_xt_random_noalias                    885876 ns       885859 ns          762
BM_Noise_adapt_vector_1k                       81336 ns        81331 ns         8505
BM_Noise_adapt_vector_3k                       81357 ns        81355 ns         8602
BM_Noise_adapt_vector_shape_1k                 95690 ns        95684 ns         7114
BM_Noise_adapt_vector_shape_3k                 94969 ns        94966 ns         7315
BM_Noise_adapt_vector_shape_3k_adopt_once      81218 ns        81211 ns         8577
@SteveMacenski
Member

We are not lazily evaluating it with xt::random::randn. If you look at the data structures involved, it's immediately evaluated into the xtensor<float, 2>, not kept as an unevaluated expression. But I'm happy for any contributions you might be able to provide to accelerate that process :-) Perhaps some of the patterns here could be applied to other bottlenecks like trajectory rollouts.

Note that we have the regenerate_noises parameter that lets you select whether to regenerate noises each iteration or use one static set of noises for all iterations. After speaking with some folks about this, it's not ideal for small sample counts, but as the number of trajectories being evaluated increases, it becomes increasingly less important that the noise is resampled each time. Several companies that use MPPI in automotive applications told me they did not regenerate noises each iteration in their certified implementations, so I took my inspiration from them: "good enough for them, good enough for me".
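The two modes the regenerate_noises parameter selects can be illustrated with a minimal stand-in. This is a simplified sketch, not the real nav2_mppi_controller code; the `NoiseSource` struct is invented for illustration:

```cpp
// Hedged sketch of the two behaviours: either resample the noise buffer
// every optimizer iteration, or sample it once at construction and reuse
// the same static set for all iterations.
#include <cstddef>
#include <random>
#include <vector>

struct NoiseSource {
  bool regenerate_noises;  // named after the ROS parameter in the discussion
  std::vector<float> noises;
  std::mt19937 gen{123};
  std::normal_distribution<float> dist{0.0f, 1.0f};

  NoiseSource(bool regen, std::size_t n)
  : regenerate_noises(regen), noises(n)
  {
    resample();  // always sample at least once at startup
  }

  void resample()
  {
    for (auto & v : noises) {
      v = dist(gen);
    }
  }

  // Called once per optimizer iteration.
  const std::vector<float> & get()
  {
    if (regenerate_noises) {
      resample();  // fresh samples every iteration
    }
    return noises;  // otherwise reuse the static set
  }
};
```

With large sample counts the static set is the cheaper choice, which matches the rationale above.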

But your vector 1k/3k results being 10x faster seem really compelling. Can you open a PR to introduce that? I can test it quickly on my side and merge it. I would mention that your numbers don't align with my experience of the computational time of these operations (they're a bit low), so I wonder whether, in the full controller context, there are cache effects that account for some of that slowdown. But hey, saving 0.7ms is a worthwhile activity!

@SteveMacenski
Member

@aosmw any insights from your work?

@SteveMacenski
Member

Closing without update. The other threads with Eigen seem more relevant.
