Cirrus CI free usage going away - job runtime issues & credits #24280
Comments
What do you want to happen with the build label? I use it when I want to test wheel builds. It is a bit annoying, as simply adding a label will retrigger the builds. |
Labels are for indicating what a PR or issue is about, so I add it to build-related PRs. And running wheel builds by default is fine, but the |
Fair enough, but some labels do trigger wheel builds, so they aren't only about what the PR or issue is about. Not sure how we could change that unless we want to add a |
Why Lines 23 to 25 in 7fc7277
It looks like it's not the build label though, we run wheel builds on Cirrus always. E.g. look at this doc-only PR with no labels: gh-24277. This wastes a huge amount of compute time. |
Cirrus is a bit different. One problem is that it cannot be manually run, I think it needs something in the *.yml file for that. I also think (only) wheels are built and tested, I'm not sure how a build can be made without that. It is on my problem list, but not annoying enough to spend time on it. Maybe @andyfaff has some thoughts. |
.cirrus.star is triggered to run with a lot of GH events on the numpy/numpy repo, e.g. if there are commits to PRs, commits to branches, PRs are opened, labels are attached to PRs, tags are pushed, merges, etc. Here is the current logic for numpy's cirrus CI, as contained in .cirrus.star:
This logic will dictate that the wheels always get built if there is no
I'm pretty sure [skip cirrus] works for commits to PRs. But for other events if the commit message for the SHA doesn't contain those words then the wheel build and macosx_arm64 will run. For example adding labels to a PR will trigger these runs if the last commit to the PR doesn't contain the magic words. It's not clear to me from previous comments what additional logic is being requested. The following environment variables may be useful in reducing the number of runs made:
Possible extra logic that could be done:
Relevant links:CIRRUS_CHANGE_IN_REPO EDIT: |
Having just said that I see an issue at https://github.com/numpy/numpy/blob/main/.cirrus.star#L27. I'll open a PR. |
You can examine what we requested from the GH API in 24282. The github request is: This is what is returned:
The cirrus CI didn't run because [skip cirrus] is in |
I think we should be able to limit the wheel build to whatever is in the GHA wheels build, if that's desired. EDIT: apart from manual trigger, not sure how to do that. |
Thanks for the fix @andyfaff! Given the major reduction in free resources available per 1 Sep (see https://cirrus-ci.org/blog/2023/07/17/limiting-free-usage-of-cirrus-ci/), I think we have a lot more work to do here unfortunately (and may consider buying some credits). Regarding the label-based trigger, I think there are two things wrong with it:
Given the above and that our resource usage at the current rate (see screenshot below) is completely unsustainable and would run at ~$2,500/month (or ~$1,400 after the upcoming price reductions also announced in the blog post linked above) if we'd have to pay for it from 1 Sep, I'd much prefer to get rid of label-based triggering completely. Manual wheel build triggers should be rare and reserved to maintainers who know what they are doing and are able to push an empty commit with the correct commands in the commit message. CPU usage is also bad on I'll note that on jobs with 2 CPUs, using Example log from a recent
Here is the full list of jobs and runtimes for a single run: That's a total of ~222 CPU minutes for wheel builds per run, divided in
So each wheel build costs about $0.80 each time it's triggered - this is a lot. We also have issues with some tests in the full test suite that need fixing (e.g., the slow typing tests shouldn't be run by default, they're the same on all platforms and take well over a minute). But most importantly, we should not be triggering wheel builds so much, they're only very rarely useful. |
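To put the ~$0.80 figure in perspective, here is a rough back-of-the-envelope sketch. The $100/month budget and the function name are assumptions for illustration only, and the calculation pretends the whole budget goes to wheel builds, which it would not in practice. Amounts are kept in integer cents to avoid floating-point rounding.

```python
# Rough estimate only: how many wheel-build triggers fit in a monthly budget.
# COST_PER_WHEEL_BUILD_CENTS comes from the ~$0.80 estimate in this thread;
# ASSUMED_MONTHLY_BUDGET_CENTS is a hypothetical number, not a project decision.
COST_PER_WHEEL_BUILD_CENTS = 80
ASSUMED_MONTHLY_BUDGET_CENTS = 100 * 100  # $100/month, assumed


def max_wheel_builds(budget_cents: int, cost_per_build_cents: int) -> int:
    """Upper bound on wheel-build triggers if the budget paid for nothing else."""
    if cost_per_build_cents <= 0:
        raise ValueError("cost_per_build_cents must be positive")
    return budget_cents // cost_per_build_cents


if __name__ == "__main__":
    print(max_wheel_builds(ASSUMED_MONTHLY_BUDGET_CENTS, COST_PER_WHEEL_BUILD_CENTS))  # 125
```

Under these assumptions that is roughly four wheel builds per day before any other Cirrus job is paid for, which is why triggering them on every label change or doc-only PR adds up quickly.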
As you can see in gh-24289, that PR - which only tweaked a code comment in a |
And then after a merge to main, it's running yet again: https://cirrus-ci.com/build/5871472732798976. |
This is the relevant code in

    # Obtain commit message for the event. Unfortunately CIRRUS_CHANGE_MESSAGE
    # only contains the actual commit message on a non-PR trigger event.
    # For a PR event it contains the PR title and description.
    SHA = env.get("CIRRUS_CHANGE_IN_REPO")
    url = "https://api.github.com/repos/numpy/numpy/git/commits/" + SHA
    dct = http.get(url).json()
    # if "[wheel build]" in dct["message"]:
    #     return fs.read("ci/cirrus_wheels.yml")
    if "[skip cirrus]" in dct["message"] or "[skip ci]" in dct["message"]:
        return []
    # add extra jobs to the cirrus run by += adding to config
    config = fs.read("tools/ci/cirrus_wheels.yml")
    config += fs.read("tools/ci/cirrus_macosx_arm64.yml")
    return config

I don't see any label-based triggers, also not in |
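The decision that snippet makes can be sketched as a pure Python function, which makes it easy to see that the only opt-out is a marker in the commit message. This is an illustrative model, not the actual Starlark: `select_configs` and `SKIP_MARKERS` are hypothetical names, and the function returns file paths instead of reading the files.

```python
# Illustrative model of the skip logic quoted above (not the real .cirrus.star).
# The markers and config paths are taken from the quoted snippet; the names
# SKIP_MARKERS and select_configs are made up for this sketch.
SKIP_MARKERS = ("[skip cirrus]", "[skip ci]")

CONFIG_FILES = [
    "tools/ci/cirrus_wheels.yml",
    "tools/ci/cirrus_macosx_arm64.yml",
]


def select_configs(commit_message: str) -> list:
    """Return the config files that would be loaded for this commit message."""
    if any(marker in commit_message for marker in SKIP_MARKERS):
        return []  # nothing runs on Cirrus
    # No label, path, or event-type check: every other commit gets both configs.
    return list(CONFIG_FILES)


if __name__ == "__main__":
    print(select_configs("DOC: fix typo [skip cirrus]"))  # []
    print(select_configs("DOC: fix typo"))
```

In this model a doc-only commit without a marker still selects both configs, which matches the behaviour reported for the doc-only PRs above.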
I'll try to fix some of the test suite invocation and runtime issues. EDIT: see gh-24291 |
I'm currently experimenting with ccache for scipy builds (which use meson). Would the numpy macosx_arm64 benefit from this? |
I think so - not by much though, given that the whole build is less than a minute and ~10 seconds of that is the configure stage. So if ccache helps by a factor of ~2x, it may save 20 sec or so. |
Is there a way to tie the cirrus CI builds into the successful run of the smoke test from github actions? |
@mattip , I'm not sure. It might be possible to have manual triggering if desired, https://cirrus-ci.org/guide/writing-tasks/#manual-tasks |
I think this can probably be closed now |
The skip/run logic is fixed (thanks!), but gh-24291 still needs finishing and then we need to deal with Cirrus CI credits. So let me re-title this issue rather than close it. |
Current state after 12 days in August: this is looking pretty good, at ~3x over the free limit. We haven't done many wheel builds in August though, and we do need those soon for the 1.26.x releases. Finishing up gh-24291 should be useful there. And then we'll probably end up with an O($150/month) bill; we can figure out whether we're happy with that and, if so, the logistics of paying it. |
We're 19 credits away from an outage, so at this rate another 5 days or so. I'll have a look at buying some credits or wiring up a credit card tomorrow. |
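The "another 5 days or so" estimate is a simple runway calculation. A minimal sketch, assuming a constant burn rate: the 3.8 credits/day used below is back-derived from the 19 credits and 5 days mentioned above, not a measured figure, and `days_until_outage` is a hypothetical helper.

```python
# Runway estimate under an assumed constant daily burn rate.
def days_until_outage(credits_remaining: float, credits_per_day: float) -> float:
    """Days until the remaining credits run out at a constant burn rate."""
    if credits_per_day <= 0:
        raise ValueError("credits_per_day must be positive")
    return credits_remaining / credits_per_day


if __name__ == "__main__":
    # 19 credits left; ~3.8 credits/day is an assumption implied by the comment.
    print(round(days_until_outage(19, 3.8), 1))
```

A burst of wheel builds before a release would shorten this considerably, so the number is only a rough guide.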
See docs at https://cirrus-ci.org/pricing/#compute-credits

Starting with collaborators only, because those are the only ones who should trigger wheel builds, and also author the vast majority of PRs where architecture-specific CI is actually useful. We can always set it to an unconditional "true" later on.

xref gh-24280

[skip actions] [skip azp] [skip circle]
Cirrus upped the free credits from 40 to 50, and we're at 41 now - so no problems so far. I've bought a bunch of credits and opened gh-24695 to enable using them. |
For the record, the NumPy Steering Council signed off on my proposal to spend credits - up to max $200/month. My goal would be to stay below $100/month, and that seems to be feasible. And the invoicing and consumption reporting seems reasonably smooth, so all good so far. |
There are a couple more things to try, manual triggering of the Mac run, or a Cron job e.g. every couple of days. |
We used $45 in the first 28 days. I just added $99, so we're good for quite a while now. The consumption is close to what I estimated before. |
We're at $0 now. There's a couple of issues:
I have to follow up with them, but in the meantime CI jobs may stop running. |
If worst comes to worst, I also have a credit card 😉 |
I've added $98 today (20 Dec '23), they're accepting my credit card again. Don't know what happened, assuming some validation issue. I still need to follow up with Cirrus CI about issues with mixing NumPy/SciPy credits and getting better invoices. |
In anticipation of lots of wheel builds in the run-up to 2.0.0rc1 I added $121 more credits - we're at 164 credits as of today (7 Jan 2024). Should be enough until sometime in Feb. Next up: testing the reimbursement process. |
Bought another $97 in credits (note: slightly different amount each time is to ensure the invoices are easier to tell apart). With a number of the macOS arm64 jobs now migrated to GHA, this should hopefully last us a little while. |
Just a note on support quality: I noticed that a single job ran for ~17 minutes longer yesterday, and hence consumed more credits. I emailed Cirrus support; they were already on it, and fixed the issue with a comprehensive explanation within an hour or so, with credits returned and all previous jobs in the last month also audited. Impressive. |
@rgommers This is an example that supports the argument some economists make for leaving spending decisions to the individual :) |
This is getting kinda annoying: [skip cirrus] is broken, and the wheel build and other CI jobs are running way too often. It was probably broken by the addition of other logic, like always triggering on PRs with a build-related label.