Add a timeout to every CI step to halt hung builds #906

Merged
huonw merged 3 commits into develop from bugfix/905-build-timeout on Feb 21, 2020

huonw (Member) commented Feb 20, 2020

Previously, if a step in a build hung or took an unusually long time, CI would let it continue, occupying machines indefinitely. In lieu of a global timeout (buildkite/feedback#170, https://forum.buildkite.community/t/pipeline-timeouts/722), we can manually apply a timeout to every step as a last resort to catch slow or hung builds. This uses the optional timeout_in_minutes step attribute, which the Buildkite documentation describes as:

The number of minutes a job created from this step is allowed to run. If the job does not finish within this limit, it will be automatically canceled and the build will fail.

Our steps currently take between 30 seconds and 8 minutes, so 30 minutes should be a safe "something is seriously wrong" timeout.
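As a rough sketch of what this looks like in a Buildkite pipeline file: the step labels and commands below are hypothetical placeholders, not the actual steps in this repository; only the timeout_in_minutes attribute and the 30-minute value come from this PR.

```yaml
# Illustrative pipeline fragment; labels and commands are assumed for the example.
steps:
  - label: "unit tests"        # hypothetical step
    command: "pytest"          # hypothetical command
    # Cancel the job and fail the build if this step runs longer than 30 minutes.
    timeout_in_minutes: 30

  - label: "docs"              # hypothetical step
    command: "make docs"       # hypothetical command
    timeout_in_minutes: 30
```

Because there is no global setting, the attribute has to be repeated on every step, including any steps added later.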

See: #905

codeclimate bot commented Feb 20, 2020

Code Climate has analyzed commit f33239b and detected 0 issues on this pull request.

View more on Code Climate.

huonw (Member, Author) commented Feb 20, 2020

Example build with timeouts reduced to 2 minutes as a test: https://buildkite.com/stellar/stellargraph-public/builds/1596

@huonw huonw marked this pull request as ready for review February 20, 2020 20:47
kieranricardo (Contributor) left a comment

nice - 30 minutes seems like a safe timeout to apply to all the steps 👍

stellar-graph-bot commented

Codecov Report

Merging #906 into develop will decrease coverage by 0.4%.
The diff coverage is n/a.

@@            Coverage Diff            @@
##           develop    #906     +/-   ##
=========================================
- Coverage     85.3%   84.9%   -0.4%     
=========================================
  Files           51      51             
  Lines         5189    5026    -163     
=========================================
- Hits          4427    4266    -161     
+ Misses         762     760      -2     

| Impacted Files | Coverage Δ | |
|---|---|---|
| stellargraph/core/graph.py | 98.5% <0.0%> (-0.1%) | ⬇️ |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 928f80b...f33239b.

@huonw huonw merged commit 913cf97 into develop Feb 21, 2020
@huonw huonw deleted the bugfix/905-build-timeout branch February 21, 2020 03:40