pr-1567/derrickstolee/maintenance-random-minute-v2
tagged this
10 Aug 20:39
When we initially created background maintenance -- with its hourly, daily, and weekly schedules -- we considered the effects of all clients launching fetches to the server every hour on the hour. The worry of DDoSing server hosts was noted, but left as something we would consider for a future update.

As background maintenance has gained more adoption over the past three years, our worries about DDoSing the big Git hosts have been unfounded. Those systems, especially those serving public repositories, are already resilient to thundering herds of much smaller scale.

However, sometimes organizations spin up specific custom server infrastructure either in addition to or on top of their Git host. Some of these technologies are built for a different range of scale, and can hit concurrency limits sooner. Organizations with such custom infrastructures are more likely to recommend tools like scalar, which furthers their adoption of background maintenance.

This series attempts to help by spreading out background maintenance to a random minute of the hour. This minute is selected during 'git maintenance start', and the same minute applies to each of the three schedules.

This isn't a full solution to this problem, as the custom infrastructure needs to be resilient to bursts of activity, but at least this will help somewhat.

Each of the integrated schedulers needs a different way of integrating the random minute. For the most part, each scheduler's integration is relatively simple. The exception is systemd: our integration made clever use of templates to write one schedule that inserted the hourly, daily, and weekly schedules as a string into the template. That technique is no longer possible when making this adjustment, so it needs some refactoring before the custom minute can be inserted.
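As a rough illustration of the idea above, the following sketch picks one minute and reuses it for all three schedules in a cron-style table. It is not the code from this series: pick_minute() and format_cron() are hypothetical names, the override argument is a hypothetical test hook (for example, the value of an environment variable), rand() is only a stand-in for a CSPRNG-backed helper, and the crontab fields are an assumed layout.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Pick the minute once, when the schedules are installed.
 * 'override' is a hypothetical test hook: when it is a non-empty
 * string, its numeric value (mod 60) is used instead of a random one.
 * rand() is a stand-in here and is not seeded by this sketch.
 */
static int pick_minute(const char *override)
{
	if (override && *override)
		return (int)(strtoul(override, NULL, 10) % 60);

	return rand() % 60;
}

/*
 * Write three crontab lines that all share the same minute.
 * Assumed layout: hourly runs in hours 1-23, while daily and weekly
 * split hour 0 by day of week, so no two lines match the same time.
 * Returns the snprintf() result.
 */
static int format_cron(char *buf, size_t len, int minute)
{
	return snprintf(buf, len,
			"%d 1-23 * * * git maintenance run --schedule=hourly\n"
			"%d 0 * * 1-6 git maintenance run --schedule=daily\n"
			"%d 0 * * 0 git maintenance run --schedule=weekly\n",
			minute, minute, minute);
}
```

A caller would do something like format_cron(buf, sizeof(buf), pick_minute(NULL)) once at install time; because the minute is baked into the written schedule, it stays stable until the schedules are installed again.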
Patches 5-7 involve systemd, though patch 5 is just a move of code (without edits) to make the diff in patch 6 somewhat simpler (it's still complicated due to templating changes). Patch 7 fixes an issue where the systemd schedules overlap.

Patch 8 fixes an issue where config changes persist even if the scheduler fails to initialize. Thanks for noticing, Philip!

Updates in version 2
====================

 * get_random_minute() now uses a new helper, git_rand(), which is itself a wrapper around csprng_bytes() for easier use.
 * get_random_minute() also had an error in its use of getenv() which is now fixed.
 * Patch 6 has a lot of new changes, including:
   * Keeping the .service template.
   * Deleting the old .timer template when safe to do so.
 * Patch 7 fixes the schedule overlap in systemd.
 * Patch 8 fixes the issue where 'maintenance.auto=false' would persist even if the scheduler failed to initialize.

Thanks,
-Stolee

Derrick Stolee (8):
  maintenance: add get_random_minute()
  maintenance: use random minute in launchctl scheduler
  maintenance: use random minute in Windows scheduler
  maintenance: use random minute in cron scheduler
  maintenance: swap method locations
  maintenance: use random minute in systemd scheduler
  maintenance: fix systemd schedule overlaps
  maintenance: update schedule before config

 builtin/gc.c           | 291 +++++++++++++++++++++++++++++------------
 t/t7900-maintenance.sh |  28 +++-
 wrapper.c              |  10 ++
 wrapper.h              |   6 +
 4 files changed, 250 insertions(+), 85 deletions(-)

base-commit: a82fb66fed250e16d3010c75404503bea3f0ab61
Submitted-As: https://lore.kernel.org/git/pull.1567.v2.git.1691699987.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.1567.git.1691434300.gitgitgadget@gmail.com
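The letter does not spell out how the systemd overlap is removed. One plausible arrangement, shown here purely as an assumption, is to give each timer a disjoint OnCalendar window that shares the chosen minute: hourly for hours 1 through 23, daily at hour 0 on six days, and weekly at hour 0 on the remaining day. The enum and format_on_calendar() below are hypothetical names for illustration only.

```c
#include <stdio.h>

/* Hypothetical schedule identifiers for this sketch. */
enum example_schedule {
	EXAMPLE_HOURLY,
	EXAMPLE_DAILY,
	EXAMPLE_WEEKLY
};

/*
 * Format a systemd OnCalendar= value for one schedule.
 * The three windows are disjoint (assumed layout):
 *   hourly: every day, hours 1-23
 *   daily:  Tuesday through Sunday, hour 0
 *   weekly: Monday, hour 0
 * Returns the snprintf() result, or -1 for an unknown schedule.
 */
static int format_on_calendar(char *buf, size_t len,
			      enum example_schedule s, int minute)
{
	switch (s) {
	case EXAMPLE_HOURLY:
		return snprintf(buf, len, "*-*-* 1..23:%02d:00", minute);
	case EXAMPLE_DAILY:
		return snprintf(buf, len, "Tue..Sun *-*-* 0:%02d:00", minute);
	case EXAMPLE_WEEKLY:
		return snprintf(buf, len, "Mon 0:%02d:00", minute);
	}
	return -1;
}
```

Because the minute now varies per installation, each value has to be written out when the timer file is generated rather than substituted from a single shared template string, which is the kind of refactoring the letter describes for patch 6.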