Automate merging of snapshot/timestamp PR #553
Comments
Agreed, sounds like a good improvement!
We have issues for the failed jobs, but not for prober issues. The one thing I don't have is visibility into when root probers fail. If we had that, I'd be happy to automate this. But I don't know when on-callers are getting pinged when something is going wrong in staging. I could add a prober that runs at a coarser granularity and reports issues, in which case I'd be OK doing automatic merges. I'm not totally sure how we can practically achieve it -- maybe give write perms to the job and commit instead of creating a PR. Less visibility, but so be it.
Oncall gets paged 9am-5pm PST for probers, and if a prober fails overnight, it will page at 9am the next day. There's an open ticket in the infrastructure repo to create an issue automatically every time a page occurs. I'm sure we could also set it up to create an issue in this repo when just the root prober fails. Does that sound good as the blocker for automation?
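To make the paging delay concrete, here is a minimal sketch of the rule described above. It assumes the window is 09:00 to 17:00 Pacific every day (the comment does not say how weekends are handled), treats 17:00 as outside the window, and uses naive local datetimes. The name `page_time` is made up for illustration.

```python
from datetime import datetime, time, timedelta

# Paging window taken from the comment above; boundary handling is an assumption.
PAGE_START = time(9, 0)
PAGE_END = time(17, 0)


def page_time(failure: datetime) -> datetime:
    """Return when on-call would be paged for a prober failure.

    `failure` is a naive datetime in local Pacific time (assumption).
    """
    if PAGE_START <= failure.time() < PAGE_END:
        # Inside the window: page immediately.
        return failure
    day = failure.date()
    if failure.time() >= PAGE_END:
        # After hours: the page waits until the next morning.
        day += timedelta(days=1)
    # Before 09:00 the page waits until 09:00 the same day.
    return datetime.combine(day, PAGE_START)
```

Under these assumptions, a failure right at the end of the window would go unseen for about 16 hours, which is the gap an automatically filed issue would help cover.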
I've seen Trillian do this with serverless monitors on GH. Create a PR, and have another job that watches for PRs and auto-approves and merges them.
OK, mentioned offline -- The presubmits do the job of testing cosign against the new repo which is the main thing I want to test. I like the solution of monitoring a PR. Let me see if I can get that in.
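A rough sketch of the decision a PR-watching job might make before approving and merging, assuming it can read the PR author, the changed files, and the presubmit results. Everything named here is hypothetical: the bot account `snapshot-timestamp-bot`, the file names, and the `PullRequest` shape are illustrations, not this repo's actual setup or the GitHub API.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical values, chosen only for illustration.
BOT_AUTHOR = "snapshot-timestamp-bot"
ALLOWED_FILES = {"snapshot.json", "timestamp.json"}


@dataclass
class PullRequest:
    author: str
    changed_files: list[str]
    check_results: dict[str, str]  # check name -> "success", "failure", or "pending"


def should_auto_merge(pr: PullRequest) -> bool:
    """Return True only for bot-authored snapshot/timestamp PRs with green presubmits."""
    if pr.author != BOT_AUTHOR:
        return False
    if not pr.changed_files:
        return False
    # Every changed file must be one of the expected metadata files.
    for path in pr.changed_files:
        if path.rsplit("/", 1)[-1] not in ALLOWED_FILES:
            return False
    # Require at least one presubmit result, and all of them successful.
    if not pr.check_results:
        return False
    return all(result == "success" for result in pr.check_results.values())
```

Gating on the presubmits keeps the cosign test mentioned above as the condition for merging, and any PR that fails the gate simply stays open for a human to look at.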
Description
Previously we didn't automate this due to a lack of e2e tests and a lack of a staging environment. Now that we have good testing, probes, a preprod environment, and issue filing for failed automation, I'd like to revisit this. Merging the snapshot/timestamp PR has been a source of issues in the past, especially over holidays. Since we have a long break coming up, I think we should prioritize this automation.
cc @asraa @kommendorkapten for thoughts