
pr-1567/derrickstolee/maintenance-random-minute-v2

When we initially created background maintenance -- with its hourly, daily,
and weekly schedules -- we considered the effects of all clients launching
fetches to the server every hour on the hour. The worry of DDoSing server
hosts was noted, but left as something we would consider for a future
update.

As background maintenance has gained more adoption over the past three
years, our worries about DDoSing the big Git hosts have been unfounded.
Those systems, especially those serving public repositories, are already
resilient to thundering herds of much smaller scale.

However, sometimes organizations spin up specific custom server
infrastructure either in addition to or on top of their Git host. Some of
these technologies are built for a different range of scale, and can hit
concurrency limits sooner. Organizations with such custom infrastructures
are more likely to recommend tools like scalar, which furthers their
adoption of background maintenance.

This series attempts to help by spreading out background maintenance to a
random minute of the hour. This minute is selected during git maintenance
start, and the same minute applies to each of the three schedules.
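
As a rough illustration of "the same minute applies to each of the three
schedules", here is a minimal sketch of how one minute value could feed three
cron-style time specs. The names (sketch_schedule, sketch_cron_spec) are
hypothetical and not taken from builtin/gc.c, and the hour and weekday fields
are assumptions chosen only so that the three schedules never fire at the
same time.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names for illustration; not git's actual identifiers. */
enum sketch_schedule { SKETCH_HOURLY, SKETCH_DAILY, SKETCH_WEEKLY };

/*
 * Write a five-field cron time spec for the given schedule into buf,
 * using the same minute for all three schedules. The hour and weekday
 * fields are assumptions picked so the schedules never coincide.
 * Returns 0 on success, -1 on bad input or truncation.
 */
static int sketch_cron_spec(enum sketch_schedule s, int minute,
			    char *buf, size_t len)
{
	const char *rest;
	int n;

	assert(buf != NULL);
	if (minute < 0 || minute > 59)
		return -1;

	switch (s) {
	case SKETCH_HOURLY:
		rest = "1-23 * * *";	/* every hour except the midnight hour */
		break;
	case SKETCH_DAILY:
		rest = "0 * * 1-6";	/* midnight hour, Monday to Saturday */
		break;
	case SKETCH_WEEKLY:
		rest = "0 * * 0";	/* midnight hour, Sunday */
		break;
	default:
		return -1;
	}

	n = snprintf(buf, len, "%d %s", minute, rest);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling sketch_cron_spec() three times with one minute value yields three
specs that differ only in their hour and weekday fields, so clients that
picked different minutes no longer contact the server at the same instant.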

This isn't a full solution to this problem, as the custom infrastructure
needs to be resilient to bursts of activity, but at least this will help
somewhat.

Each of the integrated schedulers needs a different way of integrating the
random minute. For the most part, each scheduler's integration is relatively
simple. The exception is systemd: that integration used a clever templating
technique, writing one schedule template into which the hourly, daily, and
weekly schedules were inserted as a string. That technique is no longer
possible with this change, so the integration needs some refactoring before
the custom minute can be inserted.

Patches 5-7 involve systemd, though patch 5 is just a move of code (without
edits) to make the diff in patch 6 somewhat simpler (it's still complicated
due to templating changes). Patch 7 fixes an issue where the systemd
schedules overlap.

Patch 8 fixes an issue where config changes persist even if the scheduler
fails to initialize. Thanks for noticing, Philip!

Updates in version 2
====================

 * get_random_minute() now uses a new helper, git_rand(), which is itself a
   wrapper around csprng_bytes() for easier use.
 * get_random_minute() also had an error in its use of getenv() which is now
   fixed.
 * Patch 6 has a lot of new changes, including:
   * Keeping the .service template.
   * Deleting the old .timer template when safe to do so.
 * Patch 7 fixes the schedule overlap in systemd.
 * Patch 8 fixes the issue where 'maintenance.auto=false' would persist even
   if the scheduler failed to initialize.
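
For readers who want a picture of the first two items, here is a minimal
sketch of a random-minute helper layered on a small random-number wrapper. It
is not the code from the patches: sketch_rand() stands in for the git_rand()
wrapper and reads /dev/urandom instead of calling csprng_bytes(), and the
environment variable name and the fixed value 13 are assumptions made up for
this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Stand-in for a CSPRNG-backed helper. Reading /dev/urandom is an
 * assumption for this sketch and only works on Unix-like systems.
 */
static uint32_t sketch_rand(void)
{
	uint32_t result;
	FILE *f = fopen("/dev/urandom", "rb");

	if (!f || fread(&result, sizeof(result), 1, f) != 1) {
		fprintf(stderr, "unable to get random bytes\n");
		exit(1);
	}
	fclose(f);
	return result;
}

/* Pick the minute of the hour at which scheduled maintenance runs. */
static int sketch_random_minute(void)
{
	/*
	 * Hypothetical variable name and value: pin the minute when the
	 * variable is set so that test output stays deterministic.
	 */
	if (getenv("SKETCH_TEST_SCHEDULER"))
		return 13;

	/* The modulo bias is negligible for a 32-bit input and 60 buckets. */
	return (int)(sketch_rand() % 60);
}
```

Checking only whether the variable is set, rather than parsing its value,
keeps the override simple for this sketch.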

Thanks, -Stolee

Derrick Stolee (8):
  maintenance: add get_random_minute()
  maintenance: use random minute in launchctl scheduler
  maintenance: use random minute in Windows scheduler
  maintenance: use random minute in cron scheduler
  maintenance: swap method locations
  maintenance: use random minute in systemd scheduler
  maintenance: fix systemd schedule overlaps
  maintenance: update schedule before config

 builtin/gc.c           | 291 +++++++++++++++++++++++++++++------------
 t/t7900-maintenance.sh |  28 +++-
 wrapper.c              |  10 ++
 wrapper.h              |   6 +
 4 files changed, 250 insertions(+), 85 deletions(-)

base-commit: a82fb66fed250e16d3010c75404503bea3f0ab61

Submitted-As: https://lore.kernel.org/git/pull.1567.v2.git.1691699987.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.1567.git.1691434300.gitgitgadget@gmail.com