
feat(proposals): support scheduled builds #772

Merged
6 commits merged into main on Oct 11, 2023

Conversation

jbrockopp
Contributor

@jbrockopp jbrockopp commented Mar 16, 2023

xref: #538

This proposes adding support to schedule builds on a periodic cadence for a repo in Vela.

@jbrockopp jbrockopp requested a review from a team as a code owner March 16, 2023 15:05
@cognifloyd
Member

The implementation for this needs some kind of lock or synchronization method in the server so that if there are multiple servers (HA: Active-Active), they do not kick off the same job at the same time. Either a single server needs to own schedule management, or the jobs need to be partitioned somehow.

@jbrockopp
Contributor Author

jbrockopp commented Mar 17, 2023

@cognifloyd the plan is to leverage the scheduled_at field within the database for this.

Similar to how we check the DB for things like build.number and repo.counter before taking further actions.

Each time a build is pushed to the queue from a schedule, the server will update the scheduled_at field.

For each server being used, it'll compare the entry for the schedule against the scheduled_at field.

To provide an example scenario, let's say I have two servers (X & Y) with a repo (foo) that has a schedule (bar).

Let's assume all defaults are being used so the entry for bar is 1hr and the servers scan the table every 30m.

Server X scans the table, sees the schedule for foo and finds the scheduled_at field is empty.

So, X pushes a build to the queue and updates the scheduled_at field for bar.

Server Y scans the table, sees the schedule for foo and finds the scheduled_at field is NOT empty.

So, Y crunches the data and notices a build for the schedule was just run 10m ago.

Since 10m < 1hr, Y will skip doing anything else with foo and move on to the next schedule.


## Questions

1. Should we create a new field (i.e. `schedule`/`schedule_target`) or reuse existing `target` field in the `ruleset`?
Member

reuse for now, and add another field later if that causes any problems.



2. Should we prevent running multiple builds for the same schedule for a repo?
Member

Possibly. This might need a new YAML pipeline key so the pipeline user can declare whether overlapping workflows are expected or safe.

@cognifloyd
Member

> @cognifloyd the plan is to leverage the scheduled_at field within the database for this.
>
> Similar to how we check the DB for things like build.number and repo.counter before taking further actions.
>
> Each time a build is pushed to the queue from a schedule, the server will update the scheduled_at field.
>
> For each server being used, it'll compare the entry for the schedule against the scheduled_at field.
>
> To provide an example scenario, let's say I have two servers (X & Y) with a repo (foo) that has a schedule (bar).
>
> Let's assume all defaults are being used so the entry for bar is 1hr and the servers scan the table every 30m.
>
> Server X scans the table, sees the schedule for foo and finds the scheduled_at field is empty.
>
> So, X pushes a build to the queue and updates the scheduled_at field for bar.
>
> Server Y scans the table, sees the schedule for foo and finds the scheduled_at field is NOT empty.
>
> So, Y crunches the data and notices a build for the schedule was just run 10m ago.
>
> Since 10m < 1hr, Y will skip doing anything else with foo and move on to the next schedule.

What if X and Y end up scanning for and starting the build at the same time?

Hmm. I guess this could work if both queueing and updating scheduled_at happen within a transaction that aborts if the schedule row has changed:

> So, X pushes a build to the queue and updates the scheduled_at field for bar.

In which case, the db transaction would serve as the locking mechanism. Is that what you're suggesting?


The thread will be set up to scan the table every half of the minimum supported cadence i.e. `1hr` / 2 = `30m`

The server will use [jitter](https://en.wikipedia.org/wiki/Jitter) to minimize the amount of builds created at the same time for all repos.
Contributor

When there's an outage (i.e. upgrades, disasters, etc.), how would you anticipate Scheduled Builds working? Does it skip any "schedules" it missed or would it attempt to catch up? If we play "catch up", that may spawn a lot of builds immediately following an outage. Alternatively, if we skip the "schedules", how much scope creep would it take to add an on-demand re-run of the pipeline without a PR?

Contributor Author

> When there's an outage (i.e. upgrades, disasters, etc.), how would you anticipate Scheduled Builds working?

I think the idea is to adopt the same behavior we have today for other features/functionality of the server.

i.e. the server doesn't push builds to the queue for webhooks delivered while it was unavailable

Regardless of the reason why the server(s) were down, no attempts will be made to catch up.

Once the server(s) comes back online, they'll start processing the schedules as expected.

Contributor

@jbrockopp How about an on-demand re-run? We've seen this request in the past and, depending on how we approach Scheduled Builds, this could be implemented very similarly. Do you think a "Run Now" option could fit into this proposal?

Contributor Author

Yeah, I do think there is value in providing such a feature and we're hoping to leverage deployments for this:

https://github.com/go-vela/community/blob/feature/proposal/scheduled_builds/proposals/2023/03-16_scheduled-builds.md#yaml

@jbrockopp
Contributor Author

> What if X and Y end up scanning for and starting the build at the same time?

@cognifloyd this shouldn't be possible since we plan to use jitter with some level of randomness.

In an ideal scenario, this means two or more servers can't push builds to the queue for the same schedule.

> In which case, the db transaction would serve as the locking mechanism. Is that what you're suggesting?

Yes, the plan is to adopt the same logic we have in other parts of the codebase that depend on data from the DB.

i.e. build number, hook number, repo counter etc.

@plyr4
Contributor

plyr4 commented Mar 20, 2023

overall looks great

+1 to wondering how we'll do the database transaction locking (hoping to use that same logic in future functionality, so this will be good inspiration)

@plyr4
Contributor

plyr4 commented Mar 20, 2023

have you had any thoughts on if we should do clean up on stale schedules? perhaps a "scheduled_until" field that defaults to 6 months, that gets renewed with some sort of engagement/interaction from the user?
i fear the "setup-and-forget" scenario where users set up schedules for testing and never clean them up.

@plyr4
Contributor

plyr4 commented Mar 20, 2023

any thoughts on having a limit to the amount of schedules a repo can have? or is the plan to have the repo-build-limit be the gatekeeper for the excessive build traffic this might cause?
i think overall our build rate limiting could use some work.. so maybe it's scope creep to think about it in this feature 🤷
and i suppose it should be up to the pipeline designer to be wary of hitting the rate limit

@jbrockopp
Contributor Author

jbrockopp commented Mar 21, 2023

> have you had any thoughts on if we should do clean up on stale schedules? perhaps a "scheduled_until" field that defaults to 6 months, that gets renewed with some sort of engagement/interaction from the user?
> i fear the "setup-and-forget" scenario where users set up schedules for testing and never clean them up.

@plyr4 I understand the concern and approach.. what if we based it off the repo.active field?

i.e. if a repo's active field is false, then we don't run any schedules for that repo

My concern with scheduled_until is the extra complexity and logistics required to deliver on that.

For example, do we expect Vela to send reminders to users to notify them their schedule will stop running?

If so, then we'll need to build out that notification component within the system.

Also, do we only notify the repo owner? Or do we have to notify all repo admins?

I can already imagine users complaining that their scheduled builds didn't run and they weren't notified about it.

> any thoughts on having a limit to the amount of schedules a repo can have? or is the plan to have the repo-build-limit be the gatekeeper for the excessive build traffic this might cause?
> i think overall our build rate limiting could use some work.. so maybe it's scope creep to think about it in this feature 🤷
> and i suppose it should be up to the pipeline designer to be wary of hitting the rate limit

We have no opposition to setting a limit on the number of schedules a repo can have.

What did you have in mind as a limit for the number of schedules per repo?

To answer your question, the plan was to leverage the build limits per repo to prevent excessive build traffic.

The value here is that the build limit is configurable per Vela installation, granting admins some level of control.

Ideally that combined with configurable minimum schedule cadence would grant enough control for admins.

@jbrockopp jbrockopp added feature Indicates a new feature area/api Indicates a change to the API area/cli Indicates a change to the CLI area/server Indicates a change to the server area/ui Indicates a change to the UI labels Mar 21, 2023
JordanSussman
JordanSussman previously approved these changes Mar 22, 2023
@plyr4
Contributor

plyr4 commented Mar 29, 2023

@jbrockopp awesome thank you for typing that up

> i.e. if a repo's active field is false, then we don't run any schedules for that repo

good idea

> notification component

noo thank you 😅

> We have no opposition to setting a limit on the number of schedules a repo can have.

👍

@plyr4
Contributor

plyr4 commented Mar 29, 2023

@jbrockopp i've still got one concern with your explanation on not needing a database locking mechanism

> Yes, the plan is to adopt the same logic we have in other parts of the codebase that depend on data from the DB.

we already see a number of issues with concurrency using database constraints (see #213). it's arguably easy to reproduce. maybe i'm not following, but would not having a database locking mechanism just compound the problem?
i know you mentioned jitter, which sounds like it would help in most scenarios, but is it still possible (albeit unlikely) that two servers could sync up and pick up the same schedule?

could you clarify a bit on where the jitter would be implemented? the server scanning interval, the actual cron scheduling time, both?

@plyr4
Contributor

plyr4 commented Mar 29, 2023

also, any thoughts on making this feature-flaggable?
it might work on a per-repo basis, gated somehow by the platform admins. (speaking for some ideas i heard, totally stolen)

@plyr4
Contributor

plyr4 commented Mar 29, 2023

another idea i heard some mention, thoughts on having a field to control if the schedule is "active" or "enabled"
enabled would be easier to discern from the repo.active field but maybe active would be more consistent 🤷

@jbrockopp
Contributor Author

@plyr4

> we already see a number of issues with concurrency using database constraints (see #213).

Are you saying you're actively facing issues with concurrency today? 😮

AFAIK, the issue for #213 is only present for questionable use-cases i.e. spamming builds via restart or webhooks

Ever since go-vela/server#191 was completed, I haven't seen or heard customers complaining

To be clear, the underlying issue still exists and you'll see that reflected in the server logs but customers don't 😅

> but would not having a database locking mechanism just compound the problem?

I'm not entirely sure but that kinda seems out of scope for the initial implementation?

Based off my limited exposure to database locking, I believe it should be a separate proposal 😅

Database locks come with their own set of caveats/tradeoffs and can be harmful to a system i.e. "excessive blocking"

Happy to write up a feature issue for this once the initial code is implemented 😃

> i know you mentioned jitter which sounds like it would help in most scenarios but is it still possible (albeit unlikely) that two servers could sync up and pick up the same schedule?

In an ideal world, this shouldn't be possible since we'll use jitter with a variation of randomness 😄

If implemented correctly, the chances of two+ servers scanning at the same time should be a fraction of a percent

> could you clarify a bit on where the jitter would be implemented? the server scanning interval, the actual cron scheduling time, both?

The jitter will be involved on both sides of the coin to prevent duplicate workloads when using multiple servers 👍

> also, any thoughts on making this feature-flaggable?
> it might work on a per-repo basis, gated somehow by the platform admins. (speaking for some ideas i heard, totally stolen)

No objections to that approach but that kinda seems out of scope for the initial implementation?

At least, we don't have a need for such functionality for the first attempt at rolling this out

Happy to write up an enhancement issue for this once the initial code is implemented 😃

> another idea i heard some mention, thoughts on having a field to control if the schedule is "active" or "enabled"
> enabled would be easier to discern from the repo.active field but maybe active would be more consistent

Could you elaborate on the value for this?

I assume the idea is a user wants to stop a schedule from running anymore.

There isn't a ton of information required to create a schedule so why wouldn't they simply remove the schedule?

@wass3rw3rk
Member

> > we already see a number of issues with concurrency using database constraints (see #213).
>
> Are you saying you're actively facing issues with concurrency today? 😮

Yes. Affected folks have put workarounds in place, so it's not pressing, but certainly not ideal.


> > also, any thoughts on making this feature-flaggable?
> > it might work on a per-repo basis, gated somehow by the platform admins. (speaking for some ideas i heard, totally stolen)
>
> No objections to that approach but that kinda seems out of scope for the initial implementation?

The hope is to allow for the flexibility of a controlled rollout, similar to the allow repos flag (though that is not changeable post server launch at the moment).


> > another idea i heard some mention, thoughts on having a field to control if the schedule is "active" or "enabled"
> > enabled would be easier to discern from the repo.active field but maybe active would be more consistent
>
> Could you elaborate on the value for this?

The idea is to have something in place that users and platform admins can leverage to pause schedules if needed without having to recall the schedule that may have been in place previously. A toggle would be pretty user friendly.

@jbrockopp
Contributor Author

jbrockopp commented Apr 5, 2023

@wass3rw3rk

> Yes. Affected folks have put workarounds in place, so it's not pressing, but certainly not ideal.

Agreed 😃 certainly not ideal

> The hope is to allow for the flexibility of a controlled rollout, similar to the allow repos flag (though that is not changeable post server launch at the moment).

Makes sense and sounds like a great enhancement once that capability is available 👍

Until then, sounds like we want a VELA_SCHEDULE_ALLOWLIST that replicates VELA_REPO_ALLOWLIST?

https://go-vela.github.io/docs/installation/server/reference/#vela_repo_allowlist

> The idea is to have something in place that users and platform admins can leverage to pause schedules if needed without having to recall the schedule that may have been in place previously. A toggle would be pretty user friendly.

Makes sense so are we good with a schedule.active field that replicates the behavior of repo.active?

If so, any other things we need to account for besides the below?

- `active` field for all schedules attached to a repo mirrors its `active` field
  - disabling a repo will also disable all schedules for that repo
  - enabling a repo will also enable all schedules for that repo (unsure if we actually want this part)
- servers scanning the schedules table will only check for schedules where `active == true`
- API/CLI/UI need to support modifying the `active` field for a schedule

EDIT: Thinking about this more, do we want to enable all schedules for a repo if it has been reenabled?

@cognifloyd
Member

cognifloyd commented Apr 21, 2023

> > also, any thoughts on making this feature-flaggable?
> > it might work on a per-repo basis, gated somehow by the platform admins. (speaking for some ideas i heard, totally stolen)
>
> No objections to that approach but that kinda seems out of scope for the initial implementation?

I would like a CLI flag on the server to completely disable support for any scheduled builds in Vela. At least for now, we're trying to push all scheduled runs of things into a different service so that there is a central view of all scheduled jobs (and no more ad-hoc cron on random servers). So, when this feature is implemented, I need a way to just turn it off and prevent people from using it.

@jbrockopp
Contributor Author

@cognifloyd sounds good 👍

The ability to disable this feature will be available via the VELA_SCHEDULE_ALLOWLIST environment variable

@jbrockopp jbrockopp changed the title feat(proposals): start scheduled builds feat(proposals): support scheduled builds May 3, 2023
plyr4
plyr4 previously approved these changes Jul 18, 2023
wass3r
wass3r previously approved these changes Jul 18, 2023
@plyr4 plyr4 dismissed stale reviews from wass3r and themself via fb6ad9a October 11, 2023 14:25
Contributor

@KellyMerrick KellyMerrick left a comment

lgtm

@ecrupper ecrupper merged commit 9408b4b into main Oct 11, 2023
@ecrupper ecrupper deleted the feature/proposal/scheduled_builds branch October 11, 2023 14:25