ENH: Standalone benchmark script for the inner loops of ufunc #15987

seiko2plus · 2020-04-15T15:17:34Z

ENH: A standalone benchmark script for the inner loops of ufunc

This script measures only the performance of the inner loops of ufuncs.
The idea behind it is to remove umath object calls from the equation
in order to reduce noise and provide stable ratios.

benchmarks/misc/benchin_ufunc.py

eric-wieser · 2020-04-15T15:32:24Z

Can we reuse our existing benchmark machinery here?

seiko2plus · 2020-04-15T15:42:03Z

@eric-wieser, I tried to use ASV, but the results weren't stable enough; see this patch and patch2 from #13516. The idea behind this patch is to benchmark only the inner loops of ufuncs in order to reduce noise as much as possible. ASV is also rather slow.

EDIT: I moved the two mentioned patches to separate pull-requests #15992 and #15990

eric-wieser · 2020-04-15T15:49:35Z

It would be nice if we could at least hook into ASV for things like benchmark result comparisons and storage, rather than building our own version of those too. It might be worth starting a conversation with @pv about the best way to do that.

seberg · 2020-04-15T16:28:14Z

@seiko2plus you are repeating the function call multiple times here within your run function. Might that be enough to stabilize the results a bit in asv?

EDIT: This got lost: "You are doing a few other things here that you are not doing in the asv version."

For example, you could define the run function in C (and monkeypatch it into Benchmark) and have it make a couple of C-level calls (to offset the ~200ns or so of overhead). Might that be enough to get a stable result as well?
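The repetition idea in this comment can be sketched as follows. This is a minimal illustration, not the PR's script and not ASV's implementation: `time_per_call`, `inner_repeats`, `samples`, and `workload` are hypothetical names, and the repetition happens in a Python loop here, whereas the comment proposes doing it at the C level, which would avoid the per-iteration interpreter overhead that this sketch still pays.

```python
import time


def time_per_call(func, inner_repeats=1000, samples=20):
    """Return the best observed time per call of func, in seconds.

    Each sample times `inner_repeats` back-to-back calls, so the fixed
    cost of starting and stopping the timer is spread over many calls.
    """
    best = float("inf")
    for _ in range(samples):
        start = time.perf_counter()
        for _ in range(inner_repeats):
            func()
        elapsed = time.perf_counter() - start
        # Keep the minimum: it is the sample least disturbed by other activity.
        best = min(best, elapsed / inner_repeats)
    return best


def workload():
    # Placeholder (assumption) standing in for a ufunc call
    # such as np.add(a, b, out=out).
    return sum(range(100))


if __name__ == "__main__":
    print(f"{time_per_call(workload) * 1e9:.1f} ns per call")
```

Taking the minimum over samples is one common choice; a median would be an equally reasonable alternative if occasional fast outliers are a concern.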

seiko2plus · 2020-04-15T17:03:37Z

@seberg, ASV already collects multiple samples for each benchmark, but the results are still not stable enough, even on an idle CPU.

This script is not a replacement for the current ASV implementation. Its main purpose is to detect performance changes in the inner loops of ufuncs while removing the functionality of umath and multiarray from the equation, in order to reduce noise as much as possible. It also provides more test cases, such as multiple strides and sizes, and better control over the testing process.

"For example, if you just define the run function in C (and monkeypatch it into Benchmark), and make it do a couple of C-level calls (to offset the ~200ns or so overhead. That might be enough to get a stable result as well?"

The problem is that ASV doesn't provide a way to specify the elapsed time manually.

seiko2plus · 2020-04-15T19:46:06Z

EDIT: This got lost: "You are doing a few other things here that you are not doing in the asv version."

@seberg, I moved the mentioned patches from #13516 into separate pull requests, #15992 and #15990. I also changed the number of repeats and samples to match the default settings of this script.
Still, the ASV ratios are not stable enough.
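To make "stable ratio" concrete, here is a minimal sketch of one way to compare two sets of timing samples against their own run-to-run spread. It does not reproduce how ASV or this PR's script computes ratios; `relative_spread`, `compare`, and the sample values are hypothetical and chosen only for illustration.

```python
import statistics


def relative_spread(samples):
    """Sample standard deviation divided by the mean (a unitless noise figure)."""
    return statistics.stdev(samples) / statistics.mean(samples)


def compare(before, after):
    """Return (ratio, noise, significant) for two lists of timings.

    ratio       : median(after) / median(before)
    noise       : the larger relative spread of the two sample sets
    significant : True if the ratio differs from 1.0 by more than the noise
    """
    ratio = statistics.median(after) / statistics.median(before)
    noise = max(relative_spread(before), relative_spread(after))
    return ratio, noise, abs(ratio - 1.0) > noise


if __name__ == "__main__":
    # Made-up timings in microseconds, for illustration only.
    before = [10.0, 11.0, 9.0]
    after = [8.0, 8.2, 7.9]
    ratio, noise, significant = compare(before, after)
    print(f"ratio={ratio:.2f} noise={noise:.1%} significant={significant}")
```

A ratio is only informative when it lies outside the noise band, which is why reducing noise matters more than collecting more samples of a noisy measurement.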

r-devulap · 2020-04-17T03:48:11Z

One possible source of noise is turbo mode. If you haven't already done so, I would recommend disabling it for benchmarking purposes (set /sys/devices/system/cpu/intel_pstate/no_turbo to 1). Maybe that will help? I haven't seen much variability while benchmarking ufuncs with asv.

seiko2plus · 2020-04-18T16:56:17Z

@r-devulap, before I run any benchmarks, I usually:

- isolate logical cores from scheduling through the Linux kernel options isolcpus and rcu_nocbs
- reduce scheduling-clock ticks on the isolated cores through nohz_full
- use the --cpu-affinity option that comes with this script, or what ASV provides, to pin to the isolated cores
- set the scaling governor to performance via /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
- disable turbo boost via /sys/devices/system/cpu/intel_pstate/no_turbo
- make sure that the ASLR (address space layout randomization) state is 'full randomization' by writing 2 to /proc/sys/kernel/randomize_va_space
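The checklist above can be verified before a run with a small read-only script. This is a sketch under assumptions: `read_setting` and `collect_settings` are hypothetical helpers, the `root` parameter exists only so the function can be pointed at a test directory, and `/sys/devices/system/cpu/isolated` (the kernel's report of isolated CPUs) is an addition not mentioned in the list. The script only reads files; changing the settings requires root and is not attempted here.

```python
import glob
import os


def read_setting(path):
    """Return the stripped contents of a sysfs/procfs file, or None if unreadable."""
    try:
        with open(path) as fh:
            return fh.read().strip()
    except OSError:
        return None


def collect_settings(root="/"):
    """Gather the benchmark-relevant kernel settings found under `root`."""
    cpu = os.path.join(root, "sys/devices/system/cpu")
    governor_files = glob.glob(os.path.join(cpu, "cpu*/cpufreq/scaling_governor"))
    governors = {read_setting(path) for path in governor_files}
    return {
        # CPUs isolated via isolcpus (an empty string means none are isolated)
        "isolated_cpus": read_setting(os.path.join(cpu, "isolated")),
        # "1" means turbo boost is disabled (intel_pstate driver only)
        "no_turbo": read_setting(os.path.join(cpu, "intel_pstate/no_turbo")),
        # "2" means full address space layout randomization
        "randomize_va_space": read_setting(
            os.path.join(root, "proc/sys/kernel/randomize_va_space")
        ),
        # Distinct governors in use; ideally just ["performance"]
        "governors": sorted(g for g in governors if g is not None),
    }


if __name__ == "__main__":
    for name, value in collect_settings().items():
        print(f"{name}: {value}")
```

A value of None means the file does not exist or could not be read, for example no_turbo on a machine that does not use the intel_pstate driver.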

Recently, I discovered a Python module called pyperf, which provides a tool that tunes the system with the above tips and many more via the command pyperf system tune.

However, it seems I need idle hardware, not just a few isolated logical cores, to get almost stable ratios from ASV. Any system calls that interrupt the thread while the benchmark samples are being collected will cancel out the benefits of isolating the logical cores via isolcpus and rcu_nocbs.

One of the things I don't like about ASV is that it uses a separate process for each collected sample,
which makes it too slow.

mattip · 2020-07-10T10:45:11Z

ping @pv. Is there something here that we all are missing?

This script only measuring the performance of inner loops of ufunc, the idea behind it is to remove umath object calls from the equation, in order to reduce the number of noises and provides stable ratios.

hameerabbasi · 2020-11-09T14:07:24Z

I ran this PR on a live environment without a desktop (Ubuntu Server), using the method in the PR description. The noise was around 3% and this PR had a performance impact of ±5%, so not too much of a difference.

Whoops, had the wrong tab open. This comment was meant for #16247, copy pasting there.

charris · 2021-12-28T02:54:28Z

close/reopen

eric-wieser reviewed Apr 15, 2020

benchmarks/misc/benchin_ufunc.py (outdated)

eric-wieser reviewed Apr 15, 2020

benchmarks/misc/benchin_ufunc.py (outdated)

seiko2plus force-pushed the new_ufunc_benchmark branch from 3fb1562 to 28b0c07 Compare April 15, 2020 16:37

seiko2plus force-pushed the new_ufunc_benchmark branch 2 times, most recently from a58ab33 to 5f4bbde Compare April 15, 2020 19:41

seiko2plus mentioned this pull request Apr 15, 2020

ENH: Provides a deep benchmark for universal functions #15992

Draft

seiko2plus changed the title ~~ENH: Benchmark script for the inner loops of universal functions.~~ ENH: A standalone benchmark script for the inner loops of ufunc Apr 15, 2020

seiko2plus force-pushed the new_ufunc_benchmark branch from 5f4bbde to 9b4245b Compare April 15, 2020 20:24

seiko2plus marked this pull request as draft April 17, 2020 02:25

seiko2plus force-pushed the new_ufunc_benchmark branch from a28f11e to 230bc23 Compare April 18, 2020 17:13

seiko2plus marked this pull request as ready for review April 18, 2020 17:13

seiko2plus force-pushed the new_ufunc_benchmark branch 2 times, most recently from 8408248 to f17305e Compare April 19, 2020 13:06

charris added 01 - Enhancement 28 - Benchmark component: benchmarks labels Apr 19, 2020

charris changed the title ~~ENH: A standalone benchmark script for the inner loops of ufunc~~ ENH: Standalone benchmark script for the inner loops of ufunc Apr 19, 2020

seiko2plus force-pushed the new_ufunc_benchmark branch 3 times, most recently from dbce6f3 to e62c951 Compare April 23, 2020 03:08

seiko2plus force-pushed the new_ufunc_benchmark branch from a2ed2e5 to a231322 Compare April 29, 2020 02:08

seiko2plus mentioned this pull request May 1, 2020

ENH: enable multi-platform SIMD compiler optimizations #13516

Merged

seiko2plus mentioned this pull request Jul 11, 2020

ENH: Move dispatch-able umath fast-loops to the new dispatcher #16396

Closed

ENH: Standalone benchmark script for the inner loops of ufunc

5e557b5

This script only measuring the performance of inner loops of ufunc, the idea behind it is to remove umath object calls from the equation, in order to reduce the number of noises and provides stable ratios.

seiko2plus force-pushed the new_ufunc_benchmark branch from a231322 to 5e557b5 Compare October 7, 2020 07:28

seiko2plus mentioned this pull request Oct 7, 2020

ENH:Umath Replace raw SIMD of unary float point(32-64) with NPYV - g0 #16247

Merged

11 tasks

seiko2plus mentioned this pull request Oct 20, 2020

SIMD: Replace raw SIMD of sin/cos with NPYV(universal intrinsics) #17587

Merged

5 tasks

mattip mentioned this pull request Nov 12, 2020

BUG, Benchmark: fix passing optimization build options to asv #17736

Merged

seiko2plus added 2 commits November 14, 2020 19:10

improve argument parsing and add new option --rand-range

519d464

print numpy info

cd05ff6

Base automatically changed from master to main March 4, 2021 02:04

charris closed this Dec 28, 2021

charris reopened this Dec 28, 2021

mattip mentioned this pull request Jan 11, 2022

BENCH: consistently test benchmarks (specifically argmax/argmin) #20785

Open
