
Add performance benchmarking scripts #1979

Merged: 18 commits into projectmesa:main on Jan 20, 2024
Conversation

@EwoutH (Member) commented Jan 19, 2024

This is a cherry-pick from #1978, without the CI part. It can be reviewed and merged first, and the CI can then be built on top.

General usage (once this is merged to main; a scripted version of these steps is sketched after the list):

  1. Go to the main branch.
  2. Run global_benchmarks.py
  3. Go to your development branch
  4. Run global_benchmarks.py again
  5. Run compare_timings.py
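
As a rough illustration, the steps above could be scripted roughly as follows. This is only a sketch: it assumes both scripts are invoked from the repository root, that compare_timings.py finds the saved result files on its own, and the helper names and branch name are made up for the example.

```python
# Hypothetical helper automating the main-vs-branch comparison described above.
import subprocess


def run(cmd):
    """Echo a command, run it, and abort on failure."""
    print("$ " + " ".join(cmd))
    subprocess.run(cmd, check=True)


def benchmark_against_main(dev_branch):
    """Benchmark the models on main and on dev_branch, then compare the timings."""
    run(["git", "checkout", "main"])
    run(["python", "global_benchmarks.py"])  # baseline timings, saved to disk
    run(["git", "checkout", dev_branch])
    run(["python", "global_benchmarks.py"])  # candidate timings
    run(["python", "compare_timings.py"])    # prints the comparison table


if __name__ == "__main__":
    benchmark_against_main("my-feature-branch")  # illustrative branch name
```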

compare_timings.py should output a table like this:

| Model | Size | Init time [95% CI] | Run time [95% CI] |
| --- | --- | --- | --- |
| SchellingModel | small | 🔵 +5.7% [+1.7%, +10.8%] | 🔵 +5.8% [+1.1%, +12.0%] |
| SchellingModel | large | 🔵 -1.0% [-2.1%, -0.2%] | 🔵 -0.5% [-2.1%, +1.2%] |
| WolfSheep | small | 🔵 +0.0% [-0.6%, +0.6%] | 🔵 -0.0% [-0.5%, +0.4%] |
| WolfSheep | large | 🔴 +211.2% [+172.4%, +255.0%] | 🔵 +4.6% [-8.0%, +22.4%] |
| BoidFlockers | small | 🔵 +0.1% [-9.7%, +11.6%] | 🔵 +3.1% [-0.7%, +7.4%] |
| BoidFlockers | large | 🔵 -11.9% [-25.2%, -1.8%] | 🟢 -18.2% [-31.6%, -4.7%] |

Positive values indicate an increase in runtime, negative values a decrease. As for the colors (see the sketch after this list):
If the 95% confidence interval is

  • fully below -3%: 🟢
  • fully above +3%: 🔴
  • else: 🔵
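
For illustration, here is that color rule expressed in Python. The function name and the thresholds-as-fractions are my own choices for the sketch, not necessarily how compare_timings.py implements it.

```python
# Illustrative re-implementation of the color rule above; thresholds and emoji
# follow the PR description, the function name is not from the codebase.
def classify(ci_lower, ci_upper, threshold=0.03):
    """Map a 95% confidence interval on the relative change to an emoji."""
    if ci_upper < -threshold:
        return "🟢"  # whole interval below -3%: clear speedup
    if ci_lower > threshold:
        return "🔴"  # whole interval above +3%: clear slowdown
    return "🔵"      # interval overlaps the ±3% band: no clear change


print(classify(-0.316, -0.047))  # 🟢, e.g. BoidFlockers large run time
print(classify(+1.724, +2.550))  # 🔴, e.g. WolfSheep large init time
```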

Open points

  1. I'm not 100% sure the seeds work correctly. The fluctuations in WolfSheep in particular are sometimes unexpectedly large, which makes me question whether the runs are fully deterministic. If someone wants to dive in, please do.
  2. The models are copied from ABMFrameworksComparison and don't follow our own Black and pre-commit code standards. I'm thinking of excluding them from those checks.
  3. I'm using bootstrapping to work around the small sample sizes (a sketch of the idea follows below). If someone with a statistics background can say whether it's good enough for this use case, please do.
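
Point 3 refers to a paired bootstrap. Below is a minimal sketch of that technique, not the PR's actual bootstrap_speedup_confidence_interval implementation; the function name, defaults, and sample data are illustrative only.

```python
# Sketch of a paired bootstrap for the mean relative change between two sets of
# timings (one value per seed/replication). Illustrative, not the PR's code.
import numpy as np


def bootstrap_speedup_ci(timings_old, timings_new, n_boot=5000, alpha=0.05, rng=None):
    rng = rng or np.random.default_rng()
    old = np.asarray(timings_old, dtype=float)
    new = np.asarray(timings_new, dtype=float)
    # Relative change per paired observation: positive means the new code is slower.
    changes = (new - old) / old
    means = np.empty(n_boot)
    for i in range(n_boot):
        # Resample the paired changes with replacement and record the mean.
        sample = rng.choice(changes, size=changes.size, replace=True)
        means[i] = sample.mean()
    lower, upper = np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return changes.mean(), lower, upper


mean, lo, hi = bootstrap_speedup_ci([1.00, 1.02, 0.98], [1.05, 1.10, 1.01])
print(f"{mean:+.1%} [{lo:+.1%}, {hi:+.1%}]")
```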

Curious if this is useful and what everybody thinks!

Commit messages from the branch:

  • Fewer replications, some more seeds. Every benchmark now takes between 10 and 20 seconds (on my machine).
  • Allows switching branches without benchmark results disappearing.
  • Prints some progress output while running and saves a pickle file after running.
  • The bootstrap_speedup_confidence_interval function calculates the mean speedup and its confidence interval using bootstrapping, which is more suitable for paired data.
  • The mean speedup and confidence interval are calculated for both initialization and run times.
  • Positive values indicate an increase in time (longer duration), and negative values indicate a decrease (shorter duration).
  • The results are displayed in a DataFrame with the percentage changes and their confidence intervals.
  • If the 95% confidence interval is fully below -3%: 🟢; fully above +3%: 🔴; else: 🔵.
@EwoutH added the feature and Performance labels on Jan 19, 2024
@EwoutH requested review from rht and Corvince on January 19, 2024 21:39
@rht (Contributor) commented Jan 20, 2024

It makes sense to decouple the CI issue from the rest, which already works.

@rht (Contributor) commented Jan 20, 2024

I fixed the Ruff and Codespell issues, and am merging this PR as a checkpoint.

@rht merged commit 8973d56 into projectmesa:main on Jan 20, 2024. 12 checks passed.
@EwoutH (Member, Author) commented Jan 20, 2024

Thanks for the fixes and merging!
