
ENH: Standalone benchmark script for the inner loops of ufunc #15987

Open: wants to merge 3 commits into main


@seiko2plus (Member) commented Apr 15, 2020

ENH: A standalone benchmark script for the inner loops of ufunc

This script measures only the performance of the inner loops of ufuncs.
The idea behind it is to remove the umath object calls from the equation
in order to reduce noise and provide stable ratios.
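For readers new to the "stable ratio" idea, here is a minimal plain-Python sketch for illustration only. It is not the script in this PR, which times the C inner loops directly, and `best_per_call` and `speedup_ratio` are hypothetical helpers. Each sample times many back-to-back calls, and the minimum over samples is kept on the assumption that interruptions can only add time.

```python
import timeit


def best_per_call(func, number=1000, repeat=7):
    """Best (minimum) per-call time in seconds over `repeat` samples.

    Each sample times `number` back-to-back calls, so the cost of reading
    the clock is spread across many calls.
    """
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number


def speedup_ratio(baseline, contender, number=1000, repeat=7):
    """Baseline time divided by contender time; > 1.0 means the contender is faster."""
    return best_per_call(baseline, number, repeat) / best_per_call(contender, number, repeat)


if __name__ == "__main__":
    data = list(range(100))
    # Two hypothetical workloads standing in for an old and a new inner loop.
    old = lambda: [x + 1 for x in data]
    new = lambda: list(map((1).__add__, data))
    print(f"speedup: {speedup_ratio(old, new):.2f}x")
```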

@eric-wieser (Member)

Can we reuse our existing benchmark machinery here?

@seiko2plus (Member, Author) commented Apr 15, 2020

@eric-wieser, I tried to use ASV, but the results weren't stable enough; check this patch and patch2 from #13516. The idea behind this patch is to benchmark only the inner loop of the ufunc in order to reduce noise as much as possible. ASV is also rather slow.

EDIT: I moved the two mentioned patches to separate pull requests, #15992 and #15990.

@eric-wieser (Member)

It would be nice if we could at least hook into ASV for things like benchmark result comparisons and storage, rather than building our own version of those too. It might be worth starting a conversation with @pv about the best way to do that.

@seberg (Member) commented Apr 15, 2020

@seiko2plus you are repeating the function run multiple times here within your run function. Might that be enough to stabilize the results in asv a bit?

EDIT: This got lost: "You are doing a few other things here that you are not doing in the asv version."

For example, if you just define the run function in C (and monkeypatch it into Benchmark) and make it do a couple of C-level calls (to offset the ~200 ns or so of overhead), would that be enough to get a stable result as well?
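The arithmetic behind this suggestion can be sketched as follows. If one Python-level call costs a fixed ~200 ns and drives `k` C-level inner-loop calls, the overhead charged to each inner-loop call shrinks to 200/k ns. The 1000 ns inner-loop time is a made-up figure, and `overhead_share` is a hypothetical helper.

```python
def overhead_share(overhead_ns, loop_ns, calls_per_sample):
    """Fraction of the measured per-inner-loop time that is fixed call overhead.

    One Python-level call (cost `overhead_ns`) drives `calls_per_sample`
    C-level inner-loop calls (each costing `loop_ns`), so the overhead is
    amortized over all of them.
    """
    amortized = overhead_ns / calls_per_sample
    return amortized / (loop_ns + amortized)


for k in (1, 10, 100):
    # Assumes ~200 ns of call overhead and a hypothetical 1000 ns inner loop.
    print(f"k={k:>3}: {overhead_share(200, 1000, k):.2%} of the measurement is overhead")
```

With these numbers the overhead share drops from 16.67% at k=1 to 1.96% at k=10 and 0.20% at k=100.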

@seiko2plus (Member, Author)

@seberg, ASV already collects multiple samples for each benchmark, but the results are still not stable enough, even on an idle CPU.

This script is not meant to replace the current ASV benchmarks. Its main purpose is to detect performance changes in the inner loops of ufuncs while removing the umath and multiarray machinery from the equation, in order to reduce noise as much as possible. It also covers more test cases, such as multiple strides and sizes, and gives better control over the testing process.
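The script's own case list isn't reproduced in this thread. As a rough sketch of what covering "multiple strides and sizes" can mean, the hypothetical helpers below enumerate (size, input stride, output stride) combinations and build strided views with the same number of elements, using only `array` and `memoryview` from the standard library. The sizes and strides are arbitrary examples.

```python
from array import array
from itertools import product

SIZES = (1024, 4096)   # element counts per call (arbitrary examples)
STRIDES = (1, 2, 4)    # strides in elements (arbitrary examples)


def strided_view(size, stride):
    """Return a view with `size` float64 elements spaced `stride` elements apart."""
    # The buffer is `stride` times longer than needed so the view still has `size` elements.
    buf = array("d", range(size * stride))
    return memoryview(buf)[::stride]


def cases():
    """Yield (size, in_stride, out_stride, input_view, output_view) tuples."""
    for size, s_in, s_out in product(SIZES, STRIDES, STRIDES):
        yield size, s_in, s_out, strided_view(size, s_in), strided_view(size, s_out)


if __name__ == "__main__":
    for size, s_in, s_out, src, dst in cases():
        print(size, s_in, s_out, len(src), src.strides, dst.strides)
```

Keeping the element count fixed while varying the stride means that timing differences between cases reflect memory access patterns rather than the amount of work.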

    For example, if you just define the run function in C (and monkeypatch it into Benchmark) and make it do a couple of C-level calls (to offset the ~200 ns or so of overhead), would that be enough to get a stable result as well?

The problem is that ASV doesn't provide a way to specify the elapsed time manually.

@seiko2plus force-pushed the new_ufunc_benchmark branch 2 times, most recently from a58ab33 to 5f4bbde April 15, 2020 19:41
@seiko2plus (Member, Author) commented Apr 15, 2020

    EDIT: This got lost: "You are doing a few other things here that you are not doing in the asv version."

@seberg, I moved the mentioned patches from #13516 into separate pull requests, #15992 and #15990, and also changed the number of repeats and samples to match the default settings of this script.
Still, the ASV ratios are not stable enough.

@seiko2plus changed the title from "ENH: Benchmark script for the inner loops of universal functions." to "ENH: A standalone benchmark script for the inner loops of ufunc" Apr 15, 2020
@seiko2plus marked this pull request as draft April 17, 2020 02:25
@r-devulap (Member)

One reason that could be causing noise is turbo mode. If you haven't already done so, I would recommend disabling it for benchmarking purposes (set /sys/devices/system/cpu/intel_pstate/no_turbo to 1). Maybe that will help? I haven't seen much variability while benchmarking ufuncs with asv.
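A small sketch for checking that setting before a run follows. It assumes the intel_pstate driver is in use, because the no_turbo file only exists under that driver. `read_sysfs`, `turbo_disabled` and `disable_turbo` are hypothetical helpers, and writing the file requires root.

```python
from pathlib import Path

NO_TURBO = Path("/sys/devices/system/cpu/intel_pstate/no_turbo")


def read_sysfs(path):
    """Return the stripped contents of a sysfs file, or None if it can't be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def turbo_disabled(value):
    """Interpret the contents of no_turbo: True if turbo is off, None if unknown."""
    if value is None:
        return None
    return value == "1"


def disable_turbo(path=NO_TURBO):
    """Write 1 to no_turbo (requires root)."""
    Path(path).write_text("1\n")


if __name__ == "__main__":
    state = turbo_disabled(read_sysfs(NO_TURBO))
    messages = {True: "turbo disabled", False: "turbo ENABLED", None: "intel_pstate not available"}
    print(messages[state])
```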

@seiko2plus (Member, Author)

@r-devulap, before running any benchmarks, I usually:

  • isolate logical cores from scheduling through the Linux kernel options isolcpus and rcu_nocbs
  • reduce scheduling-clock ticks on the isolated cores through nohz_full
  • pin the benchmark to the isolated cores with the --cpu-affinity option that comes with this script, or with what ASV provides
  • set the scaling governor to performance via /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
  • disable turbo boost via /sys/devices/system/cpu/intel_pstate/no_turbo
  • make sure that the ASLR (address space layout randomization) state is 'full randomization' by
    writing 2 to /proc/sys/kernel/randomize_va_space
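Several of these settings can be verified from Python before a run. The following is a minimal, Linux-only sketch. `parse_cpu_list`, `kernel_param_cpus`, `audit` and `pin_to` are hypothetical helpers. `parse_cpu_list` only understands plain lists such as 2-3,6: it skips flag words like managed_irq and does not handle the more exotic CPU-list syntax.

```python
import os
import re
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def read_text(path):
    """Stripped file contents, or None if the file is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def parse_cpu_list(spec):
    """Expand a kernel CPU list such as '2-3,6' into {2, 3, 6}."""
    cpus = set()
    for part in spec.split(","):
        match = re.fullmatch(r"(\d+)(?:-(\d+))?", part)
        if match is None:
            continue  # flag words (e.g. managed_irq) or unsupported syntax
        lo = int(match.group(1))
        hi = int(match.group(2) or lo)
        cpus.update(range(lo, hi + 1))
    return cpus


def kernel_param_cpus(cmdline, name):
    """CPUs passed to `name=` (isolcpus, rcu_nocbs, nohz_full) on the kernel command line."""
    for token in cmdline.split():
        if token.startswith(name + "="):
            return parse_cpu_list(token.split("=", 1)[1])
    return set()


def audit():
    """Collect the current values of the tunables listed above."""
    cmdline = read_text("/proc/cmdline") or ""
    report = {name: kernel_param_cpus(cmdline, name) for name in ("isolcpus", "rcu_nocbs", "nohz_full")}
    report["governors"] = {
        path.parent.parent.name: read_text(path)  # e.g. {"cpu0": "performance"}
        for path in CPU_ROOT.glob("cpu[0-9]*/cpufreq/scaling_governor")
    }
    # "2" means full randomization.
    report["randomize_va_space"] = read_text("/proc/sys/kernel/randomize_va_space")
    return report


def pin_to(cpus):
    """Restrict the current process to `cpus` (Linux-only os.sched_setaffinity)."""
    os.sched_setaffinity(0, cpus)


if __name__ == "__main__":
    report = audit()
    print(report)
    if report["isolcpus"] and hasattr(os, "sched_setaffinity"):
        pin_to(report["isolcpus"])
```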

Recently, I found that a Python module called pyperf provides a tool to tune the system with the above tips and many more via the command pyperf system tune.

However, it seems that I need idle hardware, not just a few isolated logical cores, to get reasonably stable ratios from ASV. Any system calls that interrupt the thread while the benchmark samples are being collected eliminate the benefits of isolating the logical cores via isolcpus and rcu_nocbs.
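One way to detect such interruptions, rather than prevent them, is to compare the process's involuntary context-switch counter before and after each sample and to discard any sample in which it changed. Below is a minimal sketch using the Unix-only `resource` module. `clean_samples` is a hypothetical helper. It only catches preemption by the scheduler, not hardware interrupts serviced on the same core without a context switch.

```python
import resource
import time


def involuntary_switches():
    """Involuntary context switches of this process so far (ru_nivcsw)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_nivcsw


def clean_samples(func, n_samples=20, number=1000):
    """Time `func` and keep only samples taken without being preempted.

    Returns (kept_per_call_seconds, discarded_count).
    """
    kept, discarded = [], 0
    for _ in range(n_samples):
        before = involuntary_switches()
        start = time.perf_counter()
        for _ in range(number):
            func()
        elapsed = time.perf_counter() - start
        if involuntary_switches() == before:
            kept.append(elapsed / number)
        else:
            discarded += 1  # the scheduler preempted this process during the sample
    return kept, discarded


if __name__ == "__main__":
    kept, discarded = clean_samples(lambda: sum(range(100)))
    print(f"kept {len(kept)} samples, discarded {discarded}")
    if kept:
        print(f"best: {min(kept) * 1e9:.1f} ns per call")
```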

One thing I don't like about ASV is that it uses a separate process for each collected sample, which makes it too slow.

@seiko2plus marked this pull request as ready for review April 18, 2020 17:13
@seiko2plus force-pushed the new_ufunc_benchmark branch 2 times, most recently from 8408248 to f17305e April 19, 2020 13:06
@charris changed the title from "ENH: A standalone benchmark script for the inner loops of ufunc" to "ENH: Standalone benchmark script for the inner loops of ufunc" Apr 19, 2020
@seiko2plus force-pushed the new_ufunc_benchmark branch 3 times, most recently from dbce6f3 to e62c951 April 23, 2020 03:08
@mattip (Member) commented Jul 10, 2020

ping @pv. Is there something here that we all are missing?

    This script measures only the performance of the inner loops
    of ufuncs. The idea behind it is to remove the umath object calls
    from the equation in order to reduce noise and provide stable
    ratios.
@hameerabbasi (Contributor) commented Nov 9, 2020

I ran this PR on a live environment without a desktop (Ubuntu Server), using the method in the PR description. The noise was around 3% and this PR had a performance impact of ±5%, so not too much of a difference.

Whoops, had the wrong tab open. This comment was meant for #16247, copy pasting there.

Base automatically changed from master to main March 4, 2021 02:04
@charris (Member) commented Dec 28, 2021

close/reopen


7 participants