Comparing changes

base repository: git/git
base: 4c719308ce59dc70e606f910f40801f2c6051b24
...
head repository: git/git
compare: caff8b73402d4b5edb2c6c755506c5a90351b69a
  • 7 commits
  • 6 files changed
  • 1 contributor

Commits on Sep 1, 2021

  1. fetch: speed up lookup of want refs via commit-graph

    When updating our local refs based on the refs fetched from the remote,
    we need to iterate through all requested refs and load their respective
    commits such that we can determine whether they need to be appended to
    FETCH_HEAD or not. In cases where we're fetching from a remote with
    exceedingly many refs, resolving these refs can be quite expensive given
    that we repeatedly need to unpack object headers for each of the
    referenced objects.
    
    Speed this up by opportunistically trying to resolve object IDs via the
    commit graph. We only do so for any refs which are not in "refs/tags":
    more likely than not, these are going to be a commit anyway, and this
    lets us avoid having to unpack object headers completely in case the
    object is a commit that is part of the commit-graph. This significantly
    speeds up mirror-fetches in a real-world repository with 2.3M refs:
    
        Benchmark #1: HEAD~: git-fetch
          Time (mean ± σ):     56.482 s ±  0.384 s    [User: 53.340 s, System: 5.365 s]
          Range (min … max):   56.050 s … 57.045 s    5 runs
    
        Benchmark #2: HEAD: git-fetch
          Time (mean ± σ):     33.727 s ±  0.170 s    [User: 30.252 s, System: 5.194 s]
          Range (min … max):   33.452 s … 33.871 s    5 runs
    
        Summary
          'HEAD: git-fetch' ran
            1.67 ± 0.01 times faster than 'HEAD~: git-fetch'
    
    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    pks-t authored and gitster committed Sep 1, 2021
    fe7df03
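
The fast path described in this commit can be illustrated with a toy model. This is only a sketch under stated assumptions, not git's actual code: object IDs are plain integers, the "commit-graph" is a sorted array, and every name (`toy_commit_graph`, `graph_has_commit`, `slow_lookup_is_commit`, `ref_points_at_commit`) is hypothetical. It shows the control flow only: refs outside `refs/tags/` try the cheap lookup first, and the expensive lookup runs only when that misses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model: an "object ID" is just an unsigned integer here. */
typedef unsigned int toy_oid;

/* Hypothetical stand-in for the commit-graph: a sorted array of commit IDs. */
struct toy_commit_graph {
	const toy_oid *oids;
	size_t nr;
};

static int cmp_oid(const void *a, const void *b)
{
	toy_oid x = *(const toy_oid *)a, y = *(const toy_oid *)b;
	return (x > y) - (x < y);
}

/* Cheap lookup: a binary search, no object has to be opened. */
static int graph_has_commit(const struct toy_commit_graph *g, toy_oid oid)
{
	assert(g != NULL);
	return bsearch(&oid, g->oids, g->nr, sizeof(*g->oids), cmp_oid) != NULL;
}

/* Counts how often the expensive path ran, so the effect is observable. */
static unsigned long slow_lookups;

/* Hypothetical stand-in for unpacking an object header to learn its type. */
static int slow_lookup_is_commit(toy_oid oid)
{
	slow_lookups++;
	return oid % 2 == 0; /* arbitrary rule for this toy object store */
}

/* Returns 1 if the ref points at a commit, 0 otherwise. */
static int ref_points_at_commit(const struct toy_commit_graph *g,
				const char *refname, toy_oid oid)
{
	/* Refs under refs/tags/ may point at tag objects: skip the fast path. */
	if (strncmp(refname, "refs/tags/", 10) != 0 && graph_has_commit(g, oid))
		return 1;
	return slow_lookup_is_commit(oid);
}
```

In this model a branch ref whose target is in the graph never touches `slow_lookups`, which is the saving the benchmark above measures at a much larger scale.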
  2. fetch: avoid unpacking headers in object existence check

    When updating local refs after the fetch has transferred all objects, we
    do an object existence test as a safety guard to avoid updating a ref to
    an object which we don't have. We do so via `oid_object_info()`: if it
    returns an error, then we know the object does not exist.
    
    One side effect of `oid_object_info()` is that it parses the object's
    type, and to do so it must unpack the object header. This is completely
    pointless: we don't care for the type, but only want to assert that the
    object exists.
    
    Refactor the code to use `repo_has_object_file()`, which both makes the
    code's intent clearer and is also faster because it does not unpack
    object headers. In a real-world repo with 2.3M refs, this results in a
    small speedup when doing a mirror-fetch:
    
        Benchmark #1: HEAD~: git-fetch
          Time (mean ± σ):     33.686 s ±  0.176 s    [User: 30.119 s, System: 5.262 s]
          Range (min … max):   33.512 s … 33.944 s    5 runs
    
        Benchmark #2: HEAD: git-fetch
          Time (mean ± σ):     31.247 s ±  0.195 s    [User: 28.135 s, System: 5.066 s]
          Range (min … max):   30.948 s … 31.472 s    5 runs
    
        Summary
          'HEAD: git-fetch' ran
            1.08 ± 0.01 times faster than 'HEAD~: git-fetch'
    
    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    pks-t authored and gitster committed Sep 1, 2021
    47c6100
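
The difference between a type query and a pure existence query can be sketched with a toy object store. Everything here is an assumption made for illustration: `toy_store`, `toy_object_type` and `toy_has_object` are hypothetical names, and the `headers_unpacked` counter merely stands in for the cost of unpacking an object header. It is not the implementation of `oid_object_info()` or `repo_has_object_file()`.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int toy_oid;

enum toy_type { TOY_BAD = -1, TOY_COMMIT = 1, TOY_BLOB = 2 };

struct toy_object {
	toy_oid oid;
	enum toy_type type;
};

struct toy_store {
	const struct toy_object *objects;
	size_t nr;
	unsigned long headers_unpacked; /* instrumentation for the sketch */
};

/* Locate an object by ID; models the index lookup both queries share. */
static const struct toy_object *toy_find(const struct toy_store *s, toy_oid oid)
{
	size_t i;

	for (i = 0; i < s->nr; i++)
		if (s->objects[i].oid == oid)
			return &s->objects[i];
	return NULL;
}

/* Type query: finds the object, then "unpacks" its header. */
static enum toy_type toy_object_type(struct toy_store *s, toy_oid oid)
{
	const struct toy_object *o;

	assert(s != NULL);
	o = toy_find(s, oid);
	if (!o)
		return TOY_BAD;
	s->headers_unpacked++;
	return o->type;
}

/* Existence query: the lookup alone answers the question. */
static int toy_has_object(const struct toy_store *s, toy_oid oid)
{
	assert(s != NULL);
	return toy_find(s, oid) != NULL;
}
```

A caller that only needs a yes/no answer can use `toy_has_object()` and leave the counter untouched.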
  3. connected: refactor iterator to return next object ID directly

    The object ID iterator used by the connectivity checks returns the next
    object ID via an out-parameter and then uses a return code to indicate
    whether an item was found. This is a bit roundabout: instead of a
    separate error code, we can just return the next object ID directly and
    use a `NULL` pointer to indicate that the iterator has no items left.
    Furthermore, this avoids a copy of the object ID.
    
    Refactor the iterator and all its implementations to return object IDs
    directly. This brings a tiny performance improvement when doing a
    mirror-fetch of a repository with about 2.3M refs:
    
        Benchmark #1: 328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch
          Time (mean ± σ):     30.110 s ±  0.148 s    [User: 27.161 s, System: 5.075 s]
          Range (min … max):   29.934 s … 30.406 s    10 runs
    
        Benchmark #2: 328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch
          Time (mean ± σ):     29.899 s ±  0.109 s    [User: 26.916 s, System: 5.104 s]
          Range (min … max):   29.696 s … 29.996 s    10 runs
    
        Summary
          '328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch' ran
            1.01 ± 0.01 times faster than '328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch'
    
    While this 1% speedup could be labelled as statistically insignificant,
    the speedup is consistent on my machine. Furthermore, this is an
    end-to-end test, so it is expected that the improvement in the
    connectivity check itself is more significant.
    
    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    pks-t authored and gitster committed Sep 1, 2021
    9fec7b2
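
The two iterator shapes this commit contrasts can be put side by side in a minimal sketch. The types and names (`toy_oid`, `oid_array_iter`, `iter_next_copy`, `iter_next`, `count_oids`) are hypothetical and are not the signatures used in git's connectivity check; they only illustrate the out-parameter style versus the pointer-returning style.

```c
#include <assert.h>
#include <stddef.h>

struct toy_oid {
	unsigned char hash[20];
};

/* Iterator state over a plain array of object IDs. */
struct oid_array_iter {
	const struct toy_oid *items;
	size_t nr, pos;
};

/* Old shape: copy into an out-parameter, signal exhaustion via return code. */
static int iter_next_copy(void *cb_data, struct toy_oid *out)
{
	struct oid_array_iter *it = cb_data;

	assert(it != NULL && out != NULL);
	if (it->pos >= it->nr)
		return -1;
	*out = it->items[it->pos++]; /* copies the whole ID */
	return 0;
}

/* New shape: return a pointer to the next ID, or NULL when exhausted. */
static const struct toy_oid *iter_next(void *cb_data)
{
	struct oid_array_iter *it = cb_data;

	assert(it != NULL);
	if (it->pos >= it->nr)
		return NULL;
	return &it->items[it->pos++]; /* no copy */
}

/* A consumer loop over the pointer-returning iterator. */
static size_t count_oids(const struct toy_oid *(*next)(void *), void *cb_data)
{
	const struct toy_oid *oid;
	size_t n = 0;

	while ((oid = next(cb_data)) != NULL)
		n++;
	return n;
}
```

The consumer loop needs neither a scratch `struct toy_oid` nor a separate status variable, which is the simplification the commit message describes.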
  4. fetch-pack: optimize loading of refs via commit graph

    In order to negotiate a packfile, we need to dereference refs to see
    which commits we have in common with the remote. To do so, we first look
    up the object's type -- if it's a tag, we peel until we hit a non-tag
    object. If we hit a commit eventually, then we return that commit.
    
    In case the object ID points to a commit directly, we can avoid the
    initial lookup of the object type by opportunistically looking up the
    commit via the commit-graph, if available, which gives us a slight speed
    bump of about 2% in a huge repository with about 2.3M refs:
    
        Benchmark #1: HEAD~: git-fetch
          Time (mean ± σ):     31.634 s ±  0.258 s    [User: 28.400 s, System: 5.090 s]
          Range (min … max):   31.280 s … 31.896 s    5 runs
    
        Benchmark #2: HEAD: git-fetch
          Time (mean ± σ):     31.129 s ±  0.543 s    [User: 27.976 s, System: 5.056 s]
          Range (min … max):   30.172 s … 31.479 s    5 runs
    
        Summary
          'HEAD: git-fetch' ran
            1.02 ± 0.02 times faster than 'HEAD~: git-fetch'
    
    If the commit-graph lookup fails, we fall back to the old code, which
    peels the objects to a commit.
    
    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    pks-t authored and gitster committed Sep 1, 2021
    62b5a35
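
The "try the commit-graph first, otherwise peel" flow can be modelled as follows. This is a sketch under assumptions, not the fetch-pack code: objects are in-memory structs, `graph_lookup` is a hypothetical stand-in for a commit-graph query (here a flag on the object), and `type_lookups` counts how often the more expensive type lookup would have run.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_COMMIT, TOY_TAG, TOY_BLOB };

struct toy_object {
	enum toy_type type;
	const struct toy_object *tagged; /* target, only used for TOY_TAG */
	int in_commit_graph;             /* only meaningful for TOY_COMMIT */
};

/* Instrumentation: how often the "expensive" type lookup ran. */
static unsigned long type_lookups;

static enum toy_type lookup_type(const struct toy_object *o)
{
	type_lookups++;
	return o->type;
}

/* Stand-in for a commit-graph query: it only ever knows about commits. */
static const struct toy_object *graph_lookup(const struct toy_object *o)
{
	return (o->type == TOY_COMMIT && o->in_commit_graph) ? o : NULL;
}

/* Resolve an object to a commit, or NULL if it does not lead to one. */
static const struct toy_object *deref_to_commit(const struct toy_object *o)
{
	const struct toy_object *c;

	assert(o != NULL);

	/* Fast path: a direct hit in the graph skips the type lookup. */
	c = graph_lookup(o);
	if (c)
		return c;

	/* Fallback: look up the type and peel tags until a non-tag is hit. */
	while (lookup_type(o) == TOY_TAG) {
		o = o->tagged;
		if (!o)
			return NULL;
	}
	return o->type == TOY_COMMIT ? o : NULL;
}
```

Only refs that point straight at a commit known to the graph benefit, which is consistent with the modest speedup reported above.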
  5. fetch: refactor fetch refs to be more extendable

    Refactor `fetch_refs()` code to make it more extendable by explicitly
    handling error cases. The refactored code should behave the same.
    
    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    pks-t authored and gitster committed Sep 1, 2021
    284b2ce
  6. fetch: merge fetching and consuming refs

    The functions `fetch_refs()` and `consume_refs()` must always be called
    together such that we first obtain all missing objects and then update
    our local refs to match the remote refs. In a subsequent patch, we'll
    further require that `fetch_refs()` must always be called before
    `consume_refs()` such that it can correctly assert that we have all
    objects after the fetch given that we're about to move the connectivity
    check.
    
    Make this requirement explicit by merging both functions into a single
    `fetch_and_consume_refs()` function.
    
    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    pks-t authored and gitster committed Sep 1, 2021
    1c7d1ab
  7. fetch: avoid second connectivity check if we already have all objects

    When fetching refs, we are doing two connectivity checks:
    
        - The first one is done such that we can skip fetching refs in the
          case where we already have all objects referenced by the updated
          set of refs.
    
        - The second one verifies that we have all objects after we have
          fetched objects.
    
    We always execute both connectivity checks, but this is wasteful in case
    the first connectivity check already notices that we have all objects
    locally available.
    
    Skip the second connectivity check in case we already had all objects
    available. This gives us a nice speedup when doing a mirror-fetch in a
    repository with about 2.3M refs where the fetching repo already has all
    objects:
    
        Benchmark #1: HEAD~: git-fetch
          Time (mean ± σ):     30.025 s ±  0.081 s    [User: 27.070 s, System: 4.933 s]
          Range (min … max):   29.900 s … 30.111 s    5 runs
    
        Benchmark #2: HEAD: git-fetch
          Time (mean ± σ):     25.574 s ±  0.177 s    [User: 22.855 s, System: 4.683 s]
          Range (min … max):   25.399 s … 25.765 s    5 runs
    
        Summary
          'HEAD: git-fetch' ran
            1.17 ± 0.01 times faster than 'HEAD~: git-fetch'
    
    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    pks-t authored and gitster committed Sep 1, 2021
    caff8b7
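
The control flow of a merged fetch-and-consume step that skips the second connectivity check might look roughly like the sketch below. It is a toy model, not the code in `builtin/fetch.c`: `toy_fetch`, `check_connected`, `transfer_objects` and `fetch_and_consume` are hypothetical names, and the counters exist only to make the number of checks visible.

```c
#include <assert.h>
#include <stddef.h>

struct toy_fetch {
	int have_all_objects;         /* what a connectivity check would find */
	int transfer_ok;              /* whether the object transfer succeeds */
	unsigned connectivity_checks; /* instrumentation */
	unsigned transfers;           /* instrumentation */
	unsigned refs_updated;        /* instrumentation */
};

/* Returns 0 if everything reachable from the new refs is present. */
static int check_connected(struct toy_fetch *f)
{
	f->connectivity_checks++;
	return f->have_all_objects ? 0 : -1;
}

/* Models the object transfer; on success all objects are present. */
static int transfer_objects(struct toy_fetch *f)
{
	f->transfers++;
	if (!f->transfer_ok)
		return -1;
	f->have_all_objects = 1;
	return 0;
}

/* Fetch missing objects (if any), verify, then update local refs. */
static int fetch_and_consume(struct toy_fetch *f)
{
	assert(f != NULL);

	if (check_connected(f) != 0) {
		/* Something is missing: fetch it, then verify the result. */
		if (transfer_objects(f) != 0)
			return -1;
		if (check_connected(f) != 0)
			return -1;
	}
	/* If the first check passed, no second check is needed. */
	f->refs_updated++;
	return 0;
}
```

Because fetching and consuming live in one function, the "already connected" result of the first check is directly available at the point where the second check would otherwise run.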