Skip to content

Conversation

@derrickstolee
Copy link

  • This is an early version of work already under review upstream.

See gitgitgadget#1567 for the version submitted upstream. This is an early version for the microsoft/git fork so we could potentially include it in a release to our Microsoft partners. Upgrading to a version with these changes may help with some of the auth problems plaguing GVFS Cache Servers. (It's not a long-term fix, but would avoid having fingers pointed in this direction in the future.)

At least on Windows, we should re-run git maintenance start as part of scalar reconfigure during the installer. On other platforms, we will need to rely on users slowly rotating through their local repositories.

@derrickstolee derrickstolee self-assigned this Aug 7, 2023
When we initially created background maintenance -- with its hourly,
daily, and weekly schedules -- we considered the effects of all clients
launching fetches to the server every hour on the hour. The worry of
DDoSing server hosts was noted, but left as something we would consider
for a future update.

As background maintenance has gained more adoption over the past three
years, our worries about DDoSing the big Git hosts has been unfounded.
Those systems, especially those serving public repositories, are already
resilient to thundering herds of much smaller scale.

However, sometimes organizations spin up specific custom server
infrastructure either in addition to or on top of their Git host. Some
of these technologies are built for a different range of scale, and can
hit concurrency limits sooner. Organizations with such custom
infrastructures are more likely to recommend tools like `scalar` which
furthers their adoption of background maintenance.

To help solve for this, create get_random_minute() as a method to help
Git select a random minute when creating schedules in the future. The
integrations with this method do not yet exist, but will follow in
future changes.

To avoid multiple calls to srand() in the same process, create a new
git_rand() method that focuses on the initialization of srand() and the
call to rand(). To make it clear through this abstraction that it never
returns a negative number, convert the result to uint32_t (this has
implications in the typical case of taking the result modulo some
positive integer). There is another use of srand() in lockfile.c, but
this is being replaced with a cryptographic generator in an independent
change.

One thing that is important for testability is that we notice when we
are under a test scenario and return a predictable result. The schedules
themselves are not checked for this value, but at least one launchctl
test checks that we do not unnecessarily reboot the schedule if it has
not changed from a previous version.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The get_random_minute() method was created to allow maintenance
schedules to be fixed to a random minute of the hour. This randomness is
only intended to spread out the load from a number of clients, but each
client should have an hour between each maintenance cycle.

Use get_random_minute() when constructing the schedules for launchctl.

The format already includes a 'Minute' key which is modified from 0 to
the random minute.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The get_random_minute() method was created to allow maintenance
schedules to be fixed to a random minute of the hour. This randomness is
only intended to spread out the load from a number of clients, but each
client should have an hour between each maintenance cycle.

Add this random minute to the Windows scheduler integration.

We need only to modify the minute value for the 'StartBoundary' tag
across the three schedules.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The get_random_minute() method was created to allow maintenance
schedules to be fixed to a random minute of the hour. This randomness is
only intended to spread out the load from a number of clients, but each
client should have an hour between each maintenance cycle.

Add this random minute to the cron integration.

The cron schedule specification starts with a minute indicator, which
was previously inserted as the "0" string but now takes the given minute
as an integer parameter.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The systemd_timer_write_unit_templates() method writes a single template
that is then used to start the hourly, daily, and weekly schedules with
systemd.

However, in order to schedule systemd maintenance on a given minute,
these templates need to be replaced with specific schedules for each of
these jobs.

Before modifying the schedules, move the writing method above the
systemd_timer_enable_unit() method, so we can write a specific schedule
for each unit.

The diff is computed smaller by showing systemd_timer_enable_unit() move
instead of systemd_timer_write_unit_templates().

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The get_random_minute() method was created to allow maintenance
schedules to be fixed to a random minute of the hour. This randomness is
only intended to spread out the load from a number of clients, but each
client should have an hour between each maintenance cycle.

Add this random minute to the systemd integration.

This integration is more complicated than similar changes for other
schedulers because of a neat trick that systemd allows: templating.

The previous implementation generated two template files with names
of the form 'git-maintenance@.(timer|service)'. The '.timer' or
'.service' indicates that this is a template that is picked up when we
later specify '...@<schedule>.timer' or '...@<schedule>.service'. The
'<schedule>' string is then used to insert into the template both the
'OnCalendar' schedule setting and the '--schedule' parameter of the
'git maintenance run' command.

In order to set these schedules to a given minute, we can no longer use
the 'hourly', 'daily', or 'weekly' strings for '<schedule>' and instead
need to abandon the template model.

Modify the template with a custom schedule in the 'OnCalendar' setting.
This schedule has some interesting differences from cron-like patterns,
but is relatively easy to figure out from context. The one that might be
confusing is that '*-*-*' is a date-based pattern, but this must be
omitted when using 'Mon' to signal that we care about the day of the
week. Monday is used since that matches the day used for the 'weekly'
schedule used previously.

Now that these are no longer templates, we might want to abandon the '@'
symbol in the file names. However, this would cause users with existing
schedules to get two competing schedules due to different names. The
work to remove the old schedule name is one thing that we can avoid by
keeping the '@' symbol in our unit names.

The rest of the change involves making sure we are writing these .timer
and .service files before initializing the schedule with 'systemctl' and
deleting the files when we are done. Some changes are also made to share
the random minute along with a single computation of the execution path
of the current Git executable.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The 'git maintenance run' command prevents concurrent runs in the same
repository using a 'maintenance.lock' file. However, when using systemd
the hourly maintenance runs the same time as the daily and weekly runs.
(Similarly, daily maintenance runs at the same time as weekly
maintenance.) These competing commands result in some maintenance not
actually being run.

This overlap was something we could not fix until we made the recent
change to not use the builting 'hourly', 'daily', and 'weekly' schedules
in systemd. We can adjust the schedules such that:

 1. Hourly runs avoid the 0th hour.
 2. Daily runs avoid Monday.

This will keep maintenance runs from colliding when using systemd.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
@derrickstolee derrickstolee force-pushed the maintenance-random-minute branch from 14e340b to 958f644 Compare August 8, 2023 17:38
@derrickstolee
Copy link
Author

I needed to close this so I can reopen it with a new branch name. My upstream contribution needed to be rebased onto a more-recent version to avoid conflicts in header files.

derrickstolee added a commit that referenced this pull request Aug 14, 2023
* [x] This is an early version of work already under review upstream.

See gitgitgadget#1567 for the version submitted upstream. This is an
early version for the `microsoft/git` fork so we could potentially
include it in a release to our Microsoft partners. Upgrading to a
version with these changes may help with some of the auth problems
plaguing GVFS Cache Servers. (It's not a long-term fix, but would avoid
having fingers pointed in this direction in the future.)

At least on Windows, we should re-run `git maintenance start` as part of
`scalar reconfigure` during the installer. On other platforms, we will
need to rely on users slowly rotating through their local repositories.

> This PR is a recreation of #593 now that the upstream change required
rebasing to resolve header conflicts.
jeffhostetler pushed a commit that referenced this pull request Aug 23, 2023
* [x] This is an early version of work already under review upstream.

See gitgitgadget#1567 for the version submitted upstream. This is an
early version for the `microsoft/git` fork so we could potentially
include it in a release to our Microsoft partners. Upgrading to a
version with these changes may help with some of the auth problems
plaguing GVFS Cache Servers. (It's not a long-term fix, but would avoid
having fingers pointed in this direction in the future.)

At least on Windows, we should re-run `git maintenance start` as part of
`scalar reconfigure` during the installer. On other platforms, we will
need to rely on users slowly rotating through their local repositories.

> This PR is a recreation of #593 now that the upstream change required
rebasing to resolve header conflicts.
dscho pushed a commit that referenced this pull request Nov 3, 2023
* [x] This is an early version of work already under review upstream.

See gitgitgadget#1567 for the version submitted upstream. This is an
early version for the `microsoft/git` fork so we could potentially
include it in a release to our Microsoft partners. Upgrading to a
version with these changes may help with some of the auth problems
plaguing GVFS Cache Servers. (It's not a long-term fix, but would avoid
having fingers pointed in this direction in the future.)

At least on Windows, we should re-run `git maintenance start` as part of
`scalar reconfigure` during the installer. On other platforms, we will
need to rely on users slowly rotating through their local repositories.

> This PR is a recreation of #593 now that the upstream change required
rebasing to resolve header conflicts.
dscho pushed a commit that referenced this pull request Nov 3, 2023
* [x] This is an early version of work already under review upstream.

See gitgitgadget#1567 for the version submitted upstream. This is an
early version for the `microsoft/git` fork so we could potentially
include it in a release to our Microsoft partners. Upgrading to a
version with these changes may help with some of the auth problems
plaguing GVFS Cache Servers. (It's not a long-term fix, but would avoid
having fingers pointed in this direction in the future.)

At least on Windows, we should re-run `git maintenance start` as part of
`scalar reconfigure` during the installer. On other platforms, we will
need to rely on users slowly rotating through their local repositories.

> This PR is a recreation of #593 now that the upstream change required
rebasing to resolve header conflicts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant