Jenkins: Get nightly to two sigma reliability #2878

Closed
oskarth opened this issue Jan 3, 2018 · 11 comments

Comments

@oskarth
Contributor

oskarth commented Jan 3, 2018

User Story

As a developer, I want nightly to work so I can focus on my work.

As a tester, I want nightly to work so I can focus on my work.

Description

Type: Feature/Devops (and tooling)

Summary: Get the nightly build to 95% (two sigma) reliability over some reasonable recent time frame (30d / 100 builds).

Expected behavior

95% build success of nightly build over some reasonable time frame.

Additionally, visibility into this with something like https://wiki.jenkins.io/display/JENKINS/build-metrics-plugin and a simple weekly report to people in Slack/Github Teams.

UPDATE: Additionally: https://github.com/status-im/status-react/commits/develop CI/checks feedback should be green. Not sure why they often aren't right now, maybe WIP integration?

Actual behavior

Over last 154 builds (full history): 64/154 (42%)
Over last 100 builds: 60/100 (60%)
Over last 50 builds: 21/50 (42%)

Source: https://jenkins.status.im/job/status-react/job/nightly/buildTimeTrend (and copy-paste / text edit / grep "Success")

Yes, sometimes the build fails multiple times in a row, and sigma over a specific time period would be a better metric. But the per-build success rate can serve as a proxy.
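For reference, the counting above can be sketched in a few lines of Python. This is only an illustration: the status strings and the ordering inside the synthetic history are assumptions, arranged so that the per-window counts match the numbers reported above. The real data lives on the Jenkins build trend page.

```python
TARGET = 0.95  # the 95% ("two sigma") goal from this issue


def success_rate(statuses, window=None):
    """Return the fraction of successful builds in the newest `window` builds.

    `statuses` is a list of status strings ordered newest first;
    window=None means the full history.
    """
    recent = statuses if window is None else statuses[:window]
    if not recent:
        raise ValueError("no builds to evaluate")
    return sum(1 for s in recent if s == "Success") / len(recent)


def meets_target(statuses, window=None, target=TARGET):
    """True if the success rate over the window reaches the target."""
    return success_rate(statuses, window) >= target


# Synthetic history (assumption): only the per-window counts come from the
# issue; the order inside each window and the "Failed" label are made up.
history = (
    ["Success"] * 21 + ["Failed"] * 29    # newest 50 builds: 21 successes
    + ["Success"] * 39 + ["Failed"] * 11  # builds 51-100: brings last 100 to 60
    + ["Success"] * 4 + ["Failed"] * 50   # oldest 54 builds: brings total to 64
)

for window in (None, 100, 50):
    label = "all" if window is None else window
    rate = success_rate(history, window)
    print(f"last {label}: {rate:.0%}, meets target: {meets_target(history, window)}")
```

None of the three windows comes close to the 95% target, which is the gap this issue is about.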

@oskarth oskarth added the devops label Jan 3, 2018
@oskarth
Contributor Author

oskarth commented Jan 3, 2018

Note that this isn't purely a devops task; it also depends on devs not merging faulty PRs. The general goal is the same and is part of ensuring reliability so core contributors can focus on work, not on things breaking all the time.

@oskarth
Contributor Author

oskarth commented Jan 4, 2018

@v2nek thoughts on this one? want to own it and figure out what we need to do to get there? Might include processes for merging / simplifying build steps etc. Happy to assist too.

@v2nek
Contributor

v2nek commented Jan 4, 2018

This looks stable since most of the pipeline configs were migrated to Jenkinsfile from git; you and @rasom did most of the required changes, and now it depends mostly on devs to keep it green as much as possible.

We can improve this by building all PRs after a change to the destination branch, but right now that creates a queue of builds lasting a few hours. We can add more macOS machines, reduce the number of open PRs, or just wait for these builds.
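To make the trade-off concrete, here is a rough back-of-envelope sketch in Python. All numbers are hypothetical placeholders, not measurements from the Jenkins setup; the sketch only shows how the wait scales with the number of open PRs and build machines when every PR is rebuilt after the destination branch changes.

```python
import math


def queue_drain_minutes(open_prs, machines, build_minutes):
    """Estimate how long it takes to rebuild every open PR once.

    Assumes each machine runs one build at a time and all builds take
    roughly `build_minutes`, so builds run in ceil(open_prs / machines) rounds.
    """
    if machines < 1:
        raise ValueError("need at least one build machine")
    rounds = math.ceil(open_prs / machines)
    return rounds * build_minutes


# Hypothetical inputs: 20 open PRs, 30-minute builds.
for machines in (2, 4):
    minutes = queue_drain_minutes(open_prs=20, machines=machines, build_minutes=30)
    print(f"{machines} machines: about {minutes / 60:.1f} hours")
```

Under these assumptions, doubling the machines or halving the open PRs shortens the queue by about the same amount.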

The last step is to switch nightly to Jenkinsfile from SCM. I don't expect a lot of problems there, as we ran into most of them during the migration of the parametrized and gh folders.

@oskarth
Contributor Author

oskarth commented Jan 4, 2018

> and now it depends mostly on devs to keep it green as much as possible.

While I agree it is up to each dev, it is also up to someone to own the problem and do whatever is necessary to make sure this happens, for example by installing processes/automation/telling people to do X/not Y. I was hoping this person could be you! :)

@oskarth
Contributor Author

oskarth commented Jan 4, 2018

Marking this as high priority because it is unbelievable how much time we have wasted over the last few months on the Jenkins/QA merge cycle being blocked. We have to fix this.

@oskarth
Contributor Author

oskarth commented Jan 5, 2018

Additionally: https://github.com/status-im/status-react/commits/develop CI/checks feedback should be green. Not sure why they often aren't right now, maybe WIP integration?

Not sure if this is a separate issue or not.

@oskarth
Contributor Author

oskarth commented Jun 19, 2018

@jakubgs Re OKR scoring

Also to make this 100% without relying on sloppy devs: https://graydon.livejournal.com/186550.html

A technical note about a program I wrote last year called bors and some of its ancestry. This is excruciatingly boring unless you happen to build software for a living, in which case I recommend taking a minute to read it. Thirteen years ago I worked at Cygnus/RedHat on a project with a delightful…

@jakubgs
Member

jakubgs commented Jun 22, 2018

I think a lot of those build failures are due to issues with things like Artifactory, or uploading to Play Store and iTunes (I've seen plenty), but there's definitely a benefit in using automation like bors to avoid breaking the mainline branch.
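A toy sketch of the bors idea referenced above: the mainline only advances to a state that has already passed the tests as a merge result. This is not how bors itself is implemented. `run_tests`, `fake_tests`, and the list-of-strings model of a branch are hypothetical stand-ins for illustration.

```python
def merge_queue(mainline, candidates, run_tests):
    """Advance `mainline` only with candidates whose merged result passes.

    `mainline` is a list of already-merged change names, `candidates` is an
    ordered list of pending changes, and `run_tests(tree)` returns True when
    the given list of changes builds and tests cleanly (hypothetical hook).
    Returns the new mainline and the list of rejected changes.
    """
    rejected = []
    for change in candidates:
        trial = mainline + [change]  # test the merge result, not the PR alone
        if run_tests(trial):
            mainline = trial         # advance: mainline stays green
        else:
            rejected.append(change)  # mainline is left untouched
    return mainline, rejected


# Toy test hook: pretend any tree containing "broken-pr" fails to build.
def fake_tests(tree):
    return "broken-pr" not in tree


new_mainline, rejected = merge_queue(
    ["base"], ["feature-a", "broken-pr", "feature-b"], fake_tests
)
print(new_mainline)  # ['base', 'feature-a', 'feature-b']
print(rejected)      # ['broken-pr']
```

A gate like this only protects against faulty merges; flaky external steps such as artifact uploads would still need separate handling.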

@oskarth
Contributor Author

oskarth commented Jun 25, 2018

Update with same methodology last 60 builds:

> wc -l two-sigma-end-of-june
      60 two-sigma-end-of-june
> grep Success two-sigma-end-of-june | wc -l
      20

So 20/60 (33%), i.e. still bad.

@status-github-bot

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@status-github-bot

This issue has been automatically closed. Please re-open if this issue is important to you.
