
Add performance benchmarking scripts #1979

Merged: 18 commits into projectmesa:main on Jan 20, 2024
Conversation

@EwoutH (Member) commented Jan 19, 2024

This is a cherry-pick from #1978, without the CI part. It can be reviewed and merged first, and the CI can then be built on top.

General usage (once this is merged to main; a scripted version of these steps is sketched after the list):

  1. Go to the main branch.
  2. Run global_benchmarks.py
  3. Go to your development branch
  4. Run global_benchmarks.py again
  5. Run compare_timings.py
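
As a rough illustration, the steps above could be scripted roughly as follows. This is only a sketch: it assumes both scripts are invoked from the repository root, that compare_timings.py finds the saved result files on its own, and the helper names and branch name are made up for the example.

```python
# Hypothetical helper automating the main-vs-branch comparison described above.
import subprocess


def run(cmd):
    """Echo a command, run it, and abort on failure."""
    print("$ " + " ".join(cmd))
    subprocess.run(cmd, check=True)


def benchmark_against_main(dev_branch):
    """Benchmark the models on main and on dev_branch, then compare the timings."""
    run(["git", "checkout", "main"])
    run(["python", "global_benchmarks.py"])  # baseline timings, saved to disk
    run(["git", "checkout", dev_branch])
    run(["python", "global_benchmarks.py"])  # candidate timings
    run(["python", "compare_timings.py"])    # prints the comparison table


if __name__ == "__main__":
    benchmark_against_main("my-feature-branch")  # illustrative branch name
```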

compare_timings.py should output a table like this:

| Model | Size | Init time [95% CI] | Run time [95% CI] |
| --- | --- | --- | --- |
| SchellingModel | small | 🔵 +5.7% [+1.7%, +10.8%] | 🔵 +5.8% [+1.1%, +12.0%] |
| SchellingModel | large | 🔵 -1.0% [-2.1%, -0.2%] | 🔵 -0.5% [-2.1%, +1.2%] |
| WolfSheep | small | 🔵 +0.0% [-0.6%, +0.6%] | 🔵 -0.0% [-0.5%, +0.4%] |
| WolfSheep | large | 🔴 +211.2% [+172.4%, +255.0%] | 🔵 +4.6% [-8.0%, +22.4%] |
| BoidFlockers | small | 🔵 +0.1% [-9.7%, +11.6%] | 🔵 +3.1% [-0.7%, +7.4%] |
| BoidFlockers | large | 🔵 -11.9% [-25.2%, -1.8%] | 🟢 -18.2% [-31.6%, -4.7%] |

Positive values indicate an increase in runtime, negative values a decrease. As for the colors (see the sketch after this list):
If the 95% confidence interval is

  • fully below -3%: 🟢
  • fully above +3%: 🔴
  • else: 🔵
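
For illustration, here is that color rule expressed in Python. The function name and the thresholds-as-fractions are my own choices for the sketch, not necessarily how compare_timings.py implements it.

```python
# Illustrative re-implementation of the color rule above; thresholds and emoji
# follow the PR description, the function name is not from the codebase.
def classify(ci_lower, ci_upper, threshold=0.03):
    """Map a 95% confidence interval on the relative change to an emoji."""
    if ci_upper < -threshold:
        return "🟢"  # whole interval below -3%: clear speedup
    if ci_lower > threshold:
        return "🔴"  # whole interval above +3%: clear slowdown
    return "🔵"      # interval overlaps the ±3% band: no clear change


print(classify(-0.316, -0.047))  # 🟢, e.g. BoidFlockers large run time
print(classify(+1.724, +2.550))  # 🔴, e.g. WolfSheep large init time
```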

Open points

  1. I'm not 100% sure the seeds work correctly. The fluctuations in WolfSheep in particular are sometimes unexpectedly large, which makes me question whether the runs are fully deterministic. If someone wants to dive in, please do.
  2. The models are copied from ABMFrameworksComparison and don't follow our own Black and pre-commit code standards. I'm thinking of excluding them from those checks.
  3. I'm using bootstrapping to work around the small sample sizes (a sketch of the idea follows below). If someone with a statistics background can say whether it's good enough for this use case, please do.
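
Point 3 refers to a paired bootstrap. Below is a minimal sketch of that technique, not the PR's actual bootstrap_speedup_confidence_interval implementation; the function name, defaults, and sample data are illustrative only.

```python
# Sketch of a paired bootstrap for the mean relative change between two sets of
# timings (one value per seed/replication). Illustrative, not the PR's code.
import numpy as np


def bootstrap_speedup_ci(timings_old, timings_new, n_boot=5000, alpha=0.05, rng=None):
    rng = rng or np.random.default_rng()
    old = np.asarray(timings_old, dtype=float)
    new = np.asarray(timings_new, dtype=float)
    # Relative change per paired observation: positive means the new code is slower.
    changes = (new - old) / old
    means = np.empty(n_boot)
    for i in range(n_boot):
        # Resample the paired changes with replacement and record the mean.
        sample = rng.choice(changes, size=changes.size, replace=True)
        means[i] = sample.mean()
    lower, upper = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return changes.mean(), lower, upper


mean, lo, hi = bootstrap_speedup_ci([1.00, 1.02, 0.98], [1.05, 1.10, 1.01])
print(f"{mean:+.1%} [{lo:+.1%}, {hi:+.1%}]")
```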

Curious if this is useful and what everybody thinks!

Commit messages from the branch:

  • Fewer replications, some more seeds. Every benchmark now takes between 10 and 20 seconds (on my machine).
  • Allows switching branches without benchmark results disappearing.
  • Prints some progress output while running and saves a pickle file after running.
  • The bootstrap_speedup_confidence_interval function calculates the mean speedup and its confidence interval using bootstrapping, which is more suitable for paired data.
  • The mean speedup and confidence interval are calculated for both initialization and run times.
  • Positive values indicate an increase in time (longer duration), and negative values indicate a decrease (shorter duration).
  • The results are displayed in a DataFrame with the percentage changes and their confidence intervals.
  • If the 95% confidence interval is fully below -3%: 🟢; fully above +3%: 🔴; else: 🔵.
@EwoutH added the feature and Performance labels on Jan 19, 2024
@EwoutH requested review from rht and Corvince on January 19, 2024 21:39
@rht (Contributor) commented Jan 20, 2024

It makes sense to decouple the CI issue from the rest, which already works.

@rht (Contributor) commented Jan 20, 2024

I fixed the Ruff and Codespell issues, and am merging this PR as a checkpoint.

@rht merged commit 8973d56 into projectmesa:main on Jan 20, 2024. 12 checks passed.
@EwoutH (Member, Author) commented Jan 20, 2024

Thanks for the fixes and merging!
