Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI Linux: Build the default platform in one job #36627

Merged
merged 5 commits into from
Nov 5, 2023

Conversation

mkoeppe
Copy link
Member

@mkoeppe mkoeppe commented Nov 2, 2023

This job is the combination of the "default-pre" job and the first half of the "default" job (with make ptest removed). It takes about 4 hours and so it fits robustly in the 6h job limit. (Test run: https://github.com/mkoeppe/sage/actions/runs/6727624213/job/18285698639)

Together with #36616, which deliberately clogs the pipeline with ~50 "standard-pre" jobs, this gives the following behavior when a release tag is pushed:

  • "default" is scheduled
  • ~45 "standard-pre" jobs are scheduled (runtime 0.5 to 4 hours)
  • ~45 "minimal-pre" jobs (with max-parallel = 20) are scheduled (3 to 4 hours)
  • this is calibrated (and can be recalibrated later) so as to keep CI jobs from PRs waiting until the Docker image built by "default" is available.

During the clogging time, developers can rebase their PRs to the latest develop.

📝 Checklist

  • The title is concise, informative, and self-explanatory.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation accordingly.

⌛ Dependencies

@mkoeppe mkoeppe self-assigned this Nov 2, 2023
Copy link

github-actions bot commented Nov 2, 2023

Documentation preview for this PR (built with commit c1d97ca; changes) is ready! 🎉

@kwankyu
Copy link
Collaborator

kwankyu commented Nov 2, 2023

I presume that "standard-" jobs are better to finish as fast as possible for early checking, but "minimal-" and others are not urgent. Then to give more chances for PR workflows to get in, shouldn't we set max-parallel for "minimal-" and all others small such as 10?

@mkoeppe
Copy link
Member Author

mkoeppe commented Nov 2, 2023

The "minimal-pre" jobs are not urgent; in fact, before this PR, they are run after the "standard" jobs.

In this PR, I am running the "minimal-pre" earlier precisely so that PR jobs are blocked until the new Docker image is ready (~4 hours). The "standard-pre" jobs by themselves are not enough to do this: We have 60 runners, but only 45 "standard-pre" jobs (runtime 0.5 to 4 hours).

@kwankyu
Copy link
Collaborator

kwankyu commented Nov 2, 2023

The "minimal-pre" jobs are not urgent; in fact, before this PR, they are run after the "standard" jobs.

In this PR, I am running the "minimal-pre" earlier precisely so that PR jobs are blocked until the new Docker image is ready (~4 hours). The "standard-pre" jobs by themselves are not enough to do this: We have 60 runners, but only 45 "standard-pre" jobs (runtime 0.5 to 4 hours).

Ah, I missed that you removed needs: [standard]. So you fill 60 - 45 = 15 runners with "minimal-pre" jobs. OK.

What I suggested seems independent from what you are doing here and Tobias' PR may be used later for that, if this PR is not sufficient to solve the problem we are tackling.

OK.

Copy link
Collaborator

@kwankyu kwankyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@mkoeppe
Copy link
Member Author

mkoeppe commented Nov 2, 2023

What I suggested seems independent from what you are doing here and Tobias' PR may be used later for that, if this PR is not sufficient to solve the problem we are tackling.

Yes, there are more knobs that we can adjust. We'll observe and iterate

@mkoeppe
Copy link
Member Author

mkoeppe commented Nov 2, 2023

Thanks for reviewing!

@tobiasdiez
Copy link
Contributor

tobiasdiez commented Nov 3, 2023

Combining the default jobs is a nice improvement.

Thinking more about this, I still don't get how the "we clog everything"-strategy is suppose to work. You are blocking 5 * 60 = 300 hours of cpu time from jobs outside the ci-linux. For this you get a reduction of 4-5h of cpu time per PR (1-1.5h x 3 jobs). So in order to get back your investment of cpu time, you would need 60 PR that get updated in the first 5 hours. Such a high number is very unrealistic.

What you are optimizing is the overall cpu time spent, where you get a reduction by about 5h x PRs updated in the first 5 hours. But what should be optimized (at least in my opinion) is the average time until a build & test workflow reports its results. I don't get happy by the fact that the build & test workflow for my PR only takes 4 hours instead of 5 of execution time, if I have to wait 20 hours until it gets started.

@vbraun vbraun merged commit 7354bdb into sagemath:develop Nov 5, 2023
20 of 21 checks passed
@mkoeppe mkoeppe added this to the sage-10.2 milestone Nov 5, 2023
@mkoeppe mkoeppe deleted the ci_linux_default_one_job branch November 5, 2023 17:47
@mkoeppe
Copy link
Member Author

mkoeppe commented Nov 5, 2023

Observations about how it is playing out in the 10.2.rc0 run: #36616 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants