Benchmarks are not running anymore #12

Open · abitrolly opened this issue Aug 24, 2021 · 10 comments

@abitrolly (Contributor)

It looks like the benchmarks have not been run for 3 years already (see https://numba.pydata.org/numba-benchmark/#summarylist?sort=0&dir=asc&llvmpy=%5Bnone%5D&os=Linux%203.13.0-54-generic&python=3.6&ram=3288606). What happened to them?

@esc (Member) commented Aug 24, 2021

> It looks like the benchmarks have not been run for 3 years already (see https://numba.pydata.org/numba-benchmark/#summarylist?sort=0&dir=asc&llvmpy=%5Bnone%5D&os=Linux%203.13.0-54-generic&python=3.6&ram=3288606). What happened to them?

There were insufficient resources, so this is largely on hold.

@abitrolly (Contributor, Author)

@esc I could not find Numba on https://opencollective.com/search?q=numba

What resources are needed?

@esc (Member) commented Aug 24, 2021

> @esc I could not find Numba on https://opencollective.com/search?q=numba
>
> What resources are needed?

I am not sure how to answer this. The Numba project/ecosystem (which this repo is a part of) needs more contributors, and especially people who are able to review pull requests adequately.

@abitrolly (Contributor, Author)

@esc I actually thought about computing resources. I can try to set up CI/CD on GitHub Actions or GitLab Pipelines, and if there is testing hardware, run the benchmarks there to avoid going over the GitHub/GitLab quota.

It is not that I have actually hoarded a lot of time to do this. With OpenCollective, this time could at least be somewhat compensated.
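
For illustration, a minimal GitHub Actions smoke-test job along these lines might look as follows. The workflow name, Python version, `benchmarks/` directory, and the assumption that asv can build the project on a stock runner are guesses, not this repository's actual setup:

```yaml
# Hypothetical .github/workflows/benchmarks.yml: a smoke test only; timings
# from shared runners are not meant to be compared or published.
name: benchmark-smoke-test
on: [push, pull_request]

jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0             # asv needs git history to resolve commits
      - uses: actions/setup-python@v2
        with:
          python-version: "3.9"
      - name: Install tooling
        run: pip install asv flake8
      - name: Lint the benchmark suite
        run: flake8 benchmarks/      # assumes the benchmarks live in benchmarks/
      - name: Run each benchmark once, quickly
        run: |
          asv machine --yes
          asv run --quick --show-stderr HEAD^!
```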

@esc (Member) commented Aug 24, 2021

> @esc I actually thought about computing resources. I can try to set up CI/CD on GitHub Actions or GitLab Pipelines, and if there is testing hardware, run the benchmarks there to avoid going over the GitHub/GitLab quota.
>
> It is not that I have actually hoarded a lot of time to do this. With OpenCollective, this time could at least be somewhat compensated.

Interesting, and thank you for bringing this to our attention. I am not sure it makes sense for you to work on this right now; there would probably be no one to review and merge your changes at this stage.

@abitrolly (Contributor, Author)

Let's try lucky number #13.

@gmarkall (Member)

Thanks for having a look at this. I think "resources" is a broad term that in this case encompasses both:

- Compute resources: As you suggested, one possibility is running benchmarks using GitHub Actions. However, for machines that we're assigned by a CI / CD system, I'm not sure how we can have a stable baseline on which to run benchmarks - the machines that run the benchmarks will surely differ in their level of contention, load, memory / CPU specifications, etc., so it would be very difficult to make any comparisons from run to run, which I think limits the value of benchmarking. I find it hard to see how we can get to a useful and reliable benchmark setup without having greater / absolute control over the machines on which we run the benchmarks. (A same-machine comparison approach is sketched after this comment.)
- Time / people resources: As @esc already mentioned, reviewing PRs that add this sort of thing is one thing we have limited time to do, but in general, once a benchmarking system is set up, it will require continuous monitoring / maintenance to ensure that it continues to run correctly and usefully. Maintaining the existing CI / testing setups for Numba (the Actions / pipelines for Numba / llvmlite and the buildfarm) presently consumes all the available time for this sort of task.

Whilst it would be great to get the benchmarks running continuously again, I'm not sure I see the way to set this up and keep it running at the present time.
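
As a hedged aside on the first point: asv's `asv continuous` command benchmarks two revisions back to back on the same machine and reports only relative slowdowns beyond a chosen factor, which sidesteps part of the baseline problem on ephemeral runners. A sketch of such a step (the base branch name is an assumption):

```yaml
# Hypothetical pull-request job step: both revisions run on the same (possibly
# noisy) runner, so only relative regressions beyond the factor are reported.
- name: Compare this commit against the base branch on the same machine
  run: |
    asv machine --yes
    asv continuous -f 1.1 origin/master HEAD   # origin/master is an assumed base
```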

@abitrolly (Contributor, Author)

1. The value of ephemeral CI/CD machines is just to ensure that the benchmarks are not crashing; right now they are failing the flake8 import checks (Use Github Actions #13). Beyond that, at least in GitLab it is possible to use your own machines as runners, from which GitLab collects reports and logs (a sketch of such a job follows this list). Baseline machines with specific hardware are needed; is it a financial problem to get them?
2. If there are not enough people to take care of the infrastructure, does that mean Numba is low on budget, so that it is a financial problem again? If somebody wanted to represent the financial problem as a mathematical problem, I think the benchmarks would be a key variable for third parties to estimate their savings, and hence a reason to fund their availability.
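
For illustration, a `.gitlab-ci.yml` job pinned to a dedicated machine via a runner tag might look roughly like this; the tag name, the Python tooling available on the runner, and the `html/` output path are assumptions:

```yaml
# Hypothetical job that only runs on a self-hosted runner registered with the
# tag "benchmark-box", i.e. a dedicated baseline machine.
run-benchmarks:
  tags:
    - benchmark-box
  script:
    - pip install asv
    - asv machine --yes
    - asv run --show-stderr NEW   # NEW = commits that have not been benchmarked yet
    - asv publish                 # renders the static results site into html/
  artifacts:
    paths:
      - html/
```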

@gmarkall (Member)

1. That's a good point; it would be good to make sure that they're not crashing :-) It would make sense to have them running for this purpose, just with some clear messaging / guidelines that the benchmark results aren't particularly usable (one way to make that explicit is sketched below).
2. My impression is that the limitations are mainly around maintainer time rather than money.
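
One hedged way to make that messaging concrete: the smoke-test step could pass `--dry-run`, so asv never saves results from shared runners and nothing from them can be published or compared against the historical numbers:

```yaml
# Hypothetical variant of the smoke-test step: --dry-run tells asv not to save
# any results to disk.
- name: Smoke-test the benchmarks without recording results
  run: |
    asv machine --yes
    asv run --quick --show-stderr --dry-run HEAD^!
```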

@abitrolly (Contributor, Author)

If there is no problem with funding, then I don't see a reason not to grow the project's capacity and onboard more people to take care of that part specifically. :)
