Skip to content

pr-1134/neerajsi-msft/ns/batched-fsync-v5

…jects

V5 changes:

 * Remove 'local-with-assignment' from perf-lib.sh
 * Add a new patch to put an ODB transaction around add_files_to_cache.
 * Move the perf tests to t/perf/p0008-odb-fsync.sh and add more perf tests.

V4 changes:

 * Make ODB transactions nestable.
 * Add an ODB transaction around writing out the cached tree.
 * Change update-index to use a more straightforward way of managing ODB
   transactions.
 * Fix missing 'local's in lib-unique-files
 * Add a per-iteration setup mechanism to test_perf.
 * Fix camelCasing in warning message.

V3 changes:

 * Rebrand plug/unplug-bulk-checkin to "begin_odb_transaction" and
   "end_odb_transaction"
 * Add a patch to pass filenames to fsync_or_die, rather than the string
   "loose object"
 * Update the commit description for "core.fsyncmethod to explain why we do
   not directly expose objects until an fsync occurs.
 * Also explain in the commit description why we're using a dummy file for
   the fsync.
 * Create the bulk-fsync tmp-objdir lazily the first time a loose object is
   added. We now do fsync iff that objdir exists.
 * Do batch fsync if core.fsyncMethod=batch and core.fsync contains
   loose-object, regardless of the core.fsyncObjectFiles setting.
 * Mitigate the risk in update-index of an object not being visible due to
   bulk checkin.
 * Add a perf comment to justify the unpack-objects usage of bulk-checkin.
 * Add a new patch to create helpers for parsing OIDs from git commands.
 * Add a comment to the lib-unique-files.sh helper about uniqueness only
   within a repo.
 * Fix style and add '&&' chaining to test helpers.
 * Comment on some magic numbers in tests.
 * Take the object list as an argument in
   ./t5300-pack-object.sh:check_unpack ()
 * Drop accidental change to t/perf/perf-lib.sh

V2 changes:

 * Change doc to indicate that only some repo updates are batched
 * Null and zero out control variables in do_batch_fsync under
   unplug_bulk_checkin
 * Make batch mode default on Windows.
 * Update the description for the initial patch that cleans up the
   bulk-checkin infrastructure.
 * Rebase onto 'seen' at 0cac37f38f9.

--Original definition-- When core.fsync includes loose-object, we issue an
fsync after every written object. For a 'git-add' or similar command that
adds a lot of files to the repo, the costs of these fsyncs adds up. One
major factor in this cost is the time it takes for the physical storage
controller to flush its caches to durable media.

This series takes advantage of the writeout-only mode of git_fsync to issue
OS cache writebacks for all of the objects being added to the repository
followed by a single fsync to a dummy file, which should trigger a
filesystem log flush and storage controller cache flush. This mechanism is
known to be safe on common Windows filesystems and expected to be safe on
macOS. Some linux filesystems, such as XFS, will probably do the right thing
as well. See [1] for previous discussion on the predecessor of this patch
series.

This series is important on Windows, where loose-objects are included in the
fsync set by default in Git-For-Windows. In this series, I'm also setting
the default mode for Windows to turn on loose object fsyncing with batch
mode, so that we can get CI coverage of the actual git-for-windows
configuration upstream. We still don't actually issue fsyncs for the test
suite since GIT_TEST_FSYNC is set to 0, but we exercise all of the
surrounding batch mode code.

This work is based on 'next' at c54b8eb302. It's dependent on
ns/core-fsyncmethod.

[1]
https://lore.kernel.org/git/2c1ddef6057157d85da74a7274e03eacf0374e45.1629856293.git.gitgitgadget@gmail.com/

Neeraj Singh (14):
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
  object-file: pass filename to fsync_or_die
  core.fsyncmethod: batched disk flushes for loose-objects
  cache-tree: use ODB transaction around writing a tree
  builtin/add: add ODB transaction around add_files_to_cache
  update-index: use the bulk-checkin infrastructure
  unpack-objects: use the bulk-checkin infrastructure
  core.fsync: use batch mode and sync loose objects by default on
    Windows
  test-lib-functions: add parsing helpers for ls-files and ls-tree
  core.fsyncmethod: tests for batch mode
  t/perf: add iteration setup mechanism to perf-lib
  core.fsyncmethod: performance tests for batch mode
  core.fsyncmethod: correctly camel-case warning message

 Documentation/config/core.txt          |   8 ++
 builtin/add.c                          |  13 +++-
 builtin/unpack-objects.c               |   3 +
 builtin/update-index.c                 |  24 ++++++
 bulk-checkin.c                         | 101 ++++++++++++++++++++++---
 bulk-checkin.h                         |  17 ++++-
 cache-tree.c                           |   3 +
 cache.h                                |  12 ++-
 compat/mingw.h                         |   3 +
 config.c                               |   6 +-
 git-compat-util.h                      |   2 +
 object-file.c                          |  15 ++--
 t/lib-unique-files.sh                  |  34 +++++++++
 t/perf/p0008-odb-fsync.sh              |  81 ++++++++++++++++++++
 t/perf/p4220-log-grep-engines.sh       |   3 +-
 t/perf/p4221-log-grep-engines-fixed.sh |   3 +-
 t/perf/p5302-pack-index.sh             |  15 ++--
 t/perf/p7519-fsmonitor.sh              |  18 +----
 t/perf/p7820-grep-engines.sh           |   6 +-
 t/perf/perf-lib.sh                     |  62 +++++++++++++--
 t/t3700-add.sh                         |  28 +++++++
 t/t3903-stash.sh                       |  20 +++++
 t/t5300-pack-object.sh                 |  41 ++++++----
 t/t5317-pack-objects-filter-objects.sh |  91 +++++++++++-----------
 t/test-lib-functions.sh                |  10 +++
 25 files changed, 501 insertions(+), 118 deletions(-)
 create mode 100644 t/lib-unique-files.sh
 create mode 100755 t/perf/p0008-odb-fsync.sh

base-commit: c54b8eb302ffb72f31e73a26044c8a864e2cb307

Submitted-As: https://lore.kernel.org/git/pull.1134.v5.git.1648616734.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.1134.git.1647379859.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.1134.v2.git.1647760560.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.1134.v3.git.1648097906.gitgitgadget@gmail.com
In-Reply-To: https://lore.kernel.org/git/pull.1134.v4.git.1648514552.gitgitgadget@gmail.com
Assets 2