feat(proposals): support scheduled builds #772
Conversation
The implementation for this needs some kind of lock or synchronization method in the server so that if there are multiple servers (HA: Active-Active), they do not kick off the same job at the same time. Either a single server needs to own schedule management, or the jobs need to be partitioned somehow.
@cognifloyd the plan is to leverage a timestamp stored with each schedule in the DB, similar to how we check the DB for things like the build number before creating a build.

Each time a build is pushed to the queue from a schedule, the server will update that stored timestamp. For each server being used, it'll compare the stored timestamp against the schedule's cadence to determine whether the schedule is due before queueing a new build.

To provide an example scenario, let's say I have two servers (X and Y). Let's assume all defaults are being used, so both servers see the same entry for the schedule in the DB. Server X scans first, sees the schedule is due, queues the build, and updates the timestamp. So, when Server Y scans, it sees the updated timestamp, determines the schedule is not due, and skips it. Since only the server that updated the timestamp queues the build, duplicate builds shouldn't occur.
## Questions
1. Should we create a new field (i.e. `schedule`/`schedule_target`) or reuse existing `target` field in the `ruleset`?
reuse for now, and add another field later if that causes any problems.
2. Should we prevent running multiple builds for the same schedule for a repo?
Possibly. This might need a new YAML pipeline key so the pipeline user can declare whether or not overlapping workflows are expected or safe.
What if X and Y end up scanning for and starting the build at the same time? Hmm. I guess this could work if both queueing the build and updating the timestamp happen in a single database transaction. In which case, the DB transaction would serve as the locking mechanism. Is that what you're suggesting?
The thread will be set up to scan the table every half of the minimum supported cadence, i.e. `1hr` / 2 = `30m`
The server will use [jitter](https://en.wikipedia.org/wiki/Jitter) to minimize the number of builds created at the same time across all repos.
When there's an outage (i.e. upgrades, disasters, etc.), how would you anticipate Scheduled Builds working? Does it skip any "schedules" it missed or would it attempt to catch up? If we play "catch up", that may spawn a lot of builds immediately following an outage. Alternatively, if we skip the "schedules", how much scope creep do we have to add an on-demand re-run of the pipeline without a PR?
When there's an outage (i.e. upgrades, disasters, etc), how would you anticipate Scheduled Builds working?
I think the idea is to adopt the same behavior we have today for other features/functionality of the server.
i.e. the server doesn't push builds to the queue for webhooks delivered while it was unavailable
Regardless of the reason why the server(s) were down, no attempts will be made to catch up.
Once the server(s) comes back online, they'll start processing the schedules as expected.
@jbrockopp How about an on-demand re-run? We've seen this request in the past and, depending on how we approach Scheduled Builds, this could be implemented very similarly. Do you think a "Run Now" option could fit into this proposal?
Yeah, I do think there is value in providing such a feature and we're hoping to leverage deployments for this:
@cognifloyd this shouldn't be possible since we plan to use jitter with some level of randomness. In an ideal scenario, this means two or more servers can't push builds to the queue for the same schedule.
Yes, the plan is to adopt the same logic we have in other parts of the codebase that depend on data from the DB, i.e. build number, hook number, repo counter, etc.
overall looks great. +1 to wondering how we'll do the database transaction locking (hoping to use that same logic in future functionality, so this will be good inspiration)
have you had any thoughts on if we should do cleanup on stale schedules? perhaps a "scheduled_until" field that defaults to 6 months, that gets renewed with some sort of engagement/interaction from the user?
any thoughts on having a limit to the amount of schedules a repo can have? or is the plan to have the repo-build-limit be the gatekeeper for the excessive build traffic this might cause?
@plyr4 understand the concern and approach.. what if we based it off the repo's activity instead? i.e. if a repo hasn't produced any builds within some window, stop processing its schedules.

My concern with a "scheduled_until" style expiration is the user experience around it. For example, do we expect Vela to send reminders to users to notify them their schedule will stop running? If so, then we'll need to build out that notification component within the system. Also, do we only notify the repo owner? Or do we have to notify all repo admins? I can already imagine users complaining that their scheduled builds didn't run and they weren't notified about it.
We have no opposition to setting a limit on the number of schedules a repo can have. What did you have in mind as a limit for the number of schedules per repo? To answer your question, the plan was to leverage the build limits per repo to prevent excessive build traffic. The value here is that the build limit is configurable per Vela installation, granting admins some level of control. Ideally that, combined with a configurable minimum schedule cadence, would grant enough control for admins.
@jbrockopp awesome thank you for typing that up
good idea
noo thank you 😅
👍
@jbrockopp I've still got one concern with your explanation on not needing a database locking mechanism.
We already see a number of issues with concurrency using database constraints (see #213), and it's arguably easy to reproduce. Maybe I'm not following, but wouldn't not having a database locking mechanism just compound the problem? Could you clarify a bit on where the jitter would be implemented? The server scanning interval, the actual cron scheduling time, or both?
also, any thoughts on making this feature-flaggable?
another idea I heard some mention of: thoughts on having a field to control whether the schedule is "active" or "enabled"?
Are you saying you're actively facing issues with concurrency today? 😮

AFAIK, the issue for #213 is only present for questionable use-cases, i.e. spamming builds via restart or webhooks. Ever since go-vela/server#191 was completed, I haven't seen or heard customers complaining. To be clear, the underlying issue still exists and you'll see that reflected in the server logs, but customers don't 😅
I'm not entirely sure, but that kinda seems out of scope for the initial implementation? Based off my limited exposure to database locking, I believe it should be a separate proposal 😅 Database locks come with their own set of caveats/tradeoffs and can be harmful to a system, i.e. "excessive blocking". Happy to write up a feature issue for this once the initial code is implemented 😃
In an ideal world, this shouldn't be possible since we'll use jitter with a variation of randomness 😄 If implemented correctly, the chances of two+ servers scanning at the same time should be a fraction of a percent
The jitter will be involved on both sides of the coin to prevent duplicate workloads when using multiple servers 👍
No objections to that approach, but that kinda seems out of scope for the initial implementation? At least, we don't have a need for such functionality for the first attempt at rolling this out. Happy to write up an enhancement issue for this once the initial code is implemented 😃
Could you elaborate on the value for this? I assume the idea is a user wants to stop a schedule from running anymore. There isn't a ton of information required to create a schedule, so why wouldn't they simply remove the schedule?
Yes. Affected folks have put workarounds in place, so it's not pressing, but certainly not ideal.
The hope is to allow for the flexibility of a controlled rollout, similar to the allow repos flag (though that is not changeable post server launch at the moment).
The idea is to have something in place that users and platform admins can leverage to pause schedules if needed without having to recall the schedule that may have been in place previously. A toggle would be pretty user friendly.
Agreed 😃 certainly not ideal
Makes sense and sounds like a great enhancement once that capability is available 👍 Until then, sounds like we want a flag similar to the repo allowlist: https://go-vela.github.io/docs/installation/server/reference/#vela_repo_allowlist
Makes sense, so are we good with a toggle field on the schedule? If so, any other things we need to account for besides the below?
EDIT: Thinking about this more, do we want to enable all schedules for a repo if it has been re-enabled?
I would like a CLI flag on the server to completely disable support for any scheduled builds in Vela. At least for now, we're trying to push all scheduled runs of things into a different service so that there is a central view of all scheduled jobs (and no more ad-hoc cron on random servers). So, when this feature is implemented, I need a way to just turn it off and prevent people from using it.
@cognifloyd sounds good 👍 The ability to disable this feature will be available via the server configuration (an allowlist-style flag similar to the repo allowlist mentioned above).
lgtm
xref: #538
This proposes adding support for scheduling builds on a periodic cadence for a repo in Vela.