Gitoxide for performance improvements #635

Merged 38 commits on Apr 4, 2022

@Byron Byron commented Mar 29, 2022

This is a very first preview of what it would mean to use gitoxide for commit-traversal alone, done with the smallest possible changes for everything to remain as familiar as possible. It ain't super pretty yet, but it will get there. My rough plan is to get feedback early and ultimately squash all/most commits once a merge is possible.

Furthermore I think a delayed progress bar could easily be implemented to provide some entertainment while people are waiting for their huge repositories (those will never finish below 1s no matter how hard we try 😅).

Right now it's 2.1x faster on reactos and 2.4x faster on the linux kernel at 4% of the heap memory consumption compared to what's on main.

I am looking forward to your feedback.

Making-of Video

  • A video highlighting a few improvements done to gitoxide to make this PR possible.

Additional Changes

Tasks

  • git-mailmap
  • initial use of gitoxide for commit graph traversal
  • avoid allocating all commits and avoid string-duplication in Sig
  • Use gitoxide in all possible places to validate that the API is on par with git2 or more convenient
    • configuration access and worktree-status are still done by git2, but we are working on it; this should be ready this year.
  • basic parallelization
  • …what about progress ❓
  • release gitoxide and switch to using crates.io

Performance

  • e3b29b0 - commit traversal with gitoxide

    • reactos - 1.492 s → 1.040 s = 1.43x
    • linux v5.16 - 19.634 s → 10.886 s = 1.80x, 1.47 GB → 1.18 GB peak mem = 0.8x
  • 0652bbe - minimize allocations

    • reactos - 986.2 ms → 915.4 ms = 1.08x
    • linux v5.16 - 11.313 s → 9.670 s = 1.17x, 1180 MB → 62 MB peak mem = 0.05x
  • 633f0ce - parallelize workspace status, tokei, and commit traversal

    • reactos - 915.4 ms → 703.7 ms = 1.30x
    • linux v5.16 - 9.670 s → 8.070 s = 1.20x
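
The parallelization step in 633f0ce can be pictured roughly as follows. This is a minimal sketch using scoped threads from the Rust standard library, not the code from this PR: `run_in_parallel` and its three closure parameters are hypothetical stand-ins for workspace status, tokei and commit traversal.

```rust
use std::thread;

/// Runs three independent jobs concurrently and returns their results in a fixed order.
/// The closures are placeholders (assumption): they stand in for workspace status,
/// language statistics and commit traversal.
fn run_in_parallel<A, B, C>(
    status: impl FnOnce() -> A + Send,
    languages: impl FnOnce() -> B + Send,
    traversal: impl FnOnce() -> C,
) -> (A, B, C)
where
    A: Send,
    B: Send,
{
    thread::scope(|scope| {
        // Spawn two of the jobs on their own threads.
        let status = scope.spawn(status);
        let languages = scope.spawn(languages);
        // Run the third job on the current thread while the others are busy.
        let traversal = traversal();
        (
            status.join().expect("status thread panicked"),
            languages.join().expect("languages thread panicked"),
            traversal,
        )
    })
}
```

Because the jobs overlap, the wall-clock time approaches that of the slowest job rather than the sum of all three, which fits the higher User time relative to the mean wall-clock time in the hyperfine output.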

Performance comparison with main

  • reactos - 1.492 s → 703.7 ms = 2.12x
  • linux v5.16 - 19.634 s → 8.070 s = 2.43x, 1470 MB → 62 MB peak mem = 0.04x
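
The factors in these lists are plain ratios of the measured values. A small sketch of the arithmetic, with hypothetical helper names (`speedup`, `memory_ratio`, `as_factor`) chosen only for illustration:

```rust
/// How many times faster the new run is: time before divided by time after.
fn speedup(before_secs: f64, after_secs: f64) -> f64 {
    before_secs / after_secs
}

/// Fraction of the original peak memory that is still used.
fn memory_ratio(before_mb: f64, after_mb: f64) -> f64 {
    after_mb / before_mb
}

/// Formats a ratio the way the lists above do, with two decimals.
fn as_factor(ratio: f64) -> String {
    format!("{:.2}x", ratio)
}
```

With the numbers above, `as_factor(speedup(1.492, 0.7037))` gives `2.12x`, `as_factor(speedup(19.634, 8.070))` gives `2.43x`, and `as_factor(memory_ratio(1470.0, 62.0))` gives `0.04x`.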

On 633f0ce

reactos

➜  reactos git:(master) hyperfine ~/dev/github.com/o2sh/onefetch/target/release/onefetch
Benchmark 1: /Users/byron/dev/github.com/o2sh/onefetch/target/release/onefetch
  Time (mean ± σ):     703.7 ms ±  54.6 ms    [User: 2148.3 ms, System: 526.4 ms]
  Range (min … max):   605.4 ms … 814.1 ms    10 runs

linux v5.16

➜  linux git:(df0cc57e057f) ✗ hyperfine ~/dev/github.com/o2sh/onefetch/target/release/onefetch
Benchmark 1: /Users/byron/dev/github.com/o2sh/onefetch/target/release/onefetch
  Time (mean ± σ):      8.070 s ±  0.139 s    [User: 12.196 s, System: 1.909 s]
  Range (min … max):    7.900 s …  8.314 s    10 runs

On 0652bbe

reactos

➜  reactos git:(master) hyperfine onefetch ~/dev/github.com/o2sh/onefetch/target/release/onefetch
Benchmark 1: onefetch
  Time (mean ± σ):     986.2 ms ±   9.9 ms    [User: 2120.3 ms, System: 618.4 ms]
  Range (min … max):   976.1 ms … 1003.5 ms    10 runs

Benchmark 2: /Users/byron/dev/github.com/o2sh/onefetch/target/release/onefetch
  Time (mean ± σ):     915.4 ms ±  19.5 ms    [User: 2062.3 ms, System: 578.8 ms]
  Range (min … max):   902.2 ms … 966.8 ms    10 runs

Summary
  '/Users/byron/dev/github.com/o2sh/onefetch/target/release/onefetch' ran
    1.08 ± 0.03 times faster than 'onefetch'

linux

➜  linux git:(df0cc57e057f) ✗ hyperfine onefetch ~/dev/github.com/o2sh/onefetch/target/release/onefetch
Benchmark 1: onefetch
  Time (mean ± σ):     11.313 s ±  0.193 s    [User: 13.065 s, System: 2.030 s]
  Range (min … max):   10.997 s … 11.572 s    10 runs

Benchmark 2: /Users/byron/dev/github.com/o2sh/onefetch/target/release/onefetch
  Time (mean ± σ):      9.670 s ±  0.118 s    [User: 12.319 s, System: 1.782 s]
  Range (min … max):    9.482 s …  9.835 s    10 runs

Summary
  '/Users/byron/dev/github.com/o2sh/onefetch/target/release/onefetch' ran
    1.17 ± 0.02 times faster than 'onefetch'

➜  linux git:(df0cc57e057f) ✗ /usr/bin/time -lp /Users/byron/dev/github.com/o2sh/onefetch/target/release/onefetch
                 ++++++                    Sebastian Thiel ~ git version 2.32.0 (Apple Git-132)
              ++++++++++++                 ----------------------------------------------------
          ++++++++++++++++++++             Project: linux (1 branch, 735 tags)
       ++++++++++++++++++++++++++          HEAD: df0cc57 (linux-checkout-4)
    ++++++++++++++++++++++++++++++++       Pending: 13+- 13-
 +++++++++++++************+++++++++++++    Version: v5.17-rc7
+++++++++++******************++++++++;;;   Created: 7 years ago
+++++++++**********************++;;;;;;;   Languages:
++++++++*********++++++******;;;;;;;;;;;              ● C (99.0 %) ● Shell (0.5 %)
+++++++********++++++++++**;;;;;;;;;;;;;              ● Python (0.2 %) ● Perl (0.2 %)
+++++++*******+++++++++;;;;;;;;;;;;;;;;;              ● Assembly (0.0 %) ● C++ (0.0 %)
+++++++******+++++++;;;;;;;;;;;;;;;;;;;;              ● Other (0.0 %)
+++++++*******+++:::::;;;;;;;;;;;;;;;;;;   Authors: 3% Linus Torvalds 30365
+++++++********::::::::::**;;;;;;;;;;;;;            1% David S. Miller 13285
++++++++*********::::::******;;;;;;;;;;;            1% Arnd Bergmann 8636
++++++:::**********************::;;;;;;;   Last change: 2 months ago
+++::::::::******************::::::::;;;   Contributors: 31713
 :::::::::::::************:::::::::::::    Repo: https://github.com/torvalds/linux
    ::::::::::::::::::::::::::::::::       Commits: 1060298
       ::::::::::::::::::::::::::          Lines of code: 16026142
          ::::::::::::::::::::             Size: 1.02 GiB (74304 files)
              ::::::::::::
                 ::::::

real 9.51
user 12.18
sys 2.00
           868958208  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               73721  page reclaims
                   1  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   6  voluntary context switches
               23996  involuntary context switches
        113921189621  instructions retired
         43680206129  cycles elapsed
            62381056  peak memory footprint
➜  linux git:(df0cc57e057f) ✗ /usr/bin/time -lp onefetch
                 ++++++                    Sebastian Thiel ~ git version 2.32.0 (Apple Git-132)
              ++++++++++++                 ----------------------------------------------------
          ++++++++++++++++++++             Project: linux (1 branch, 735 tags)
       ++++++++++++++++++++++++++          HEAD: df0cc57 (linux-checkout-4)
    ++++++++++++++++++++++++++++++++       Pending: 13+- 13-
 +++++++++++++************+++++++++++++    Version: v5.17-rc7
+++++++++++******************++++++++;;;   Created: 7 years ago
+++++++++**********************++;;;;;;;   Languages:
++++++++*********++++++******;;;;;;;;;;;              ● C (99.0 %) ● Shell (0.5 %)
+++++++********++++++++++**;;;;;;;;;;;;;              ● Python (0.2 %) ● Perl (0.2 %)
+++++++*******+++++++++;;;;;;;;;;;;;;;;;              ● Assembly (0.0 %) ● C++ (0.0 %)
+++++++******+++++++;;;;;;;;;;;;;;;;;;;;              ● Other (0.0 %)
+++++++*******+++:::::;;;;;;;;;;;;;;;;;;   Authors: 3% Linus Torvalds 30365
+++++++********::::::::::**;;;;;;;;;;;;;            1% David S. Miller 13285
++++++++*********::::::******;;;;;;;;;;;            1% Arnd Bergmann 8636
++++++:::**********************::;;;;;;;   Last change: 2 months ago
+++::::::::******************::::::::;;;   Contributors: 31712
 :::::::::::::************:::::::::::::    Repo: https://github.com/torvalds/linux
    ::::::::::::::::::::::::::::::::       Commits: 1060298
       ::::::::::::::::::::::::::          Lines of code: 16026142
          ::::::::::::::::::::             Size: 1.02 GiB (74304 files)
              ::::::::::::
                 ::::::

real 11.42
user 12.88
sys 2.17
          2300035072  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              140632  page reclaims
                 638  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
               45086  voluntary context switches
               47751  involuntary context switches
        125092258058  instructions retired
         47152882263  cycles elapsed
          1178364992  peak memory footprint

On e3b29b0

reactos

➜  reactos git:(master) hyperfine onefetch ~/dev/github.com/o2sh/onefetch/target/release/onefetch
Benchmark 1: onefetch
  Time (mean ± σ):      1.492 s ±  0.084 s    [User: 2.549 s, System: 0.698 s]
  Range (min … max):    1.437 s …  1.726 s    10 runs

Benchmark 2: /Users/byron/dev/github.com/o2sh/onefetch/target/release/onefetch
  Time (mean ± σ):      1.040 s ±  0.130 s    [User: 2.158 s, System: 0.609 s]
  Range (min … max):    0.989 s …  1.408 s    10 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (1.408 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Summary
  '/Users/byron/dev/github.com/o2sh/onefetch/target/release/onefetch' ran
    1.43 ± 0.20 times faster than 'onefetch'

linux v5.16

➜  linux git:(df0cc57e057f) ✗ hyperfine onefetch ~/dev/github.com/o2sh/onefetch/target/release/onefetch
Benchmark 1: onefetch
  Time (mean ± σ):     19.634 s ±  1.127 s    [User: 20.267 s, System: 2.932 s]
  Range (min … max):   18.966 s … 22.787 s    10 runs

  Warning: The first benchmarking run for this command was significantly slower than the rest (22.787 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.

Benchmark 2: /Users/byron/dev/github.com/o2sh/onefetch/target/release/onefetch
  Time (mean ± σ):     10.886 s ±  0.416 s    [User: 13.057 s, System: 2.024 s]
  Range (min … max):   10.351 s … 11.571 s    10 runs

Summary
  '/Users/byron/dev/github.com/o2sh/onefetch/target/release/onefetch' ran
    1.80 ± 0.12 times faster than 'onefetch'

It also uses less memory (already)

➜  linux git:(df0cc57e057f) ✗ /usr/bin/time -lp onefetch
                 ++++++                    Sebastian Thiel ~ git version 2.32.0 (Apple Git-132)
              ++++++++++++                 ----------------------------------------------------
          ++++++++++++++++++++             Project: linux (1 branch, 735 tags)
       ++++++++++++++++++++++++++          HEAD: df0cc57 (linux-checkout-4)
    ++++++++++++++++++++++++++++++++       Pending: 13+- 13-
 +++++++++++++************+++++++++++++    Version: v5.17-rc7
+++++++++++******************++++++++;;;   Created: 16 years ago
+++++++++**********************++;;;;;;;   Languages:
++++++++*********++++++******;;;;;;;;;;;              ● C (99.0 %) ● Shell (0.5 %)
+++++++********++++++++++**;;;;;;;;;;;;;              ● Python (0.2 %) ● Perl (0.2 %)
+++++++*******+++++++++;;;;;;;;;;;;;;;;;              ● Assembly (0.0 %) ● C++ (0.0 %)
+++++++******+++++++;;;;;;;;;;;;;;;;;;;;              ● Other (0.0 %)
+++++++*******+++:::::;;;;;;;;;;;;;;;;;;   Authors: 3% Linus Torvalds 30365
+++++++********::::::::::**;;;;;;;;;;;;;            1% David S. Miller 13285
++++++++*********::::::******;;;;;;;;;;;            1% Arnd Bergmann 8636
++++++:::**********************::;;;;;;;   Last change: 2 months ago
+++::::::::******************::::::::;;;   Contributors: 31710
 :::::::::::::************:::::::::::::    Repo: https://github.com/torvalds/linux
    ::::::::::::::::::::::::::::::::       Commits: 1060298
       ::::::::::::::::::::::::::          Lines of code: 16026142
          ::::::::::::::::::::             Size: 1.02 GiB (74304 files)
              ::::::::::::
                 ::::::

real 19.71
user 20.29
sys 3.22
          2462187520  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              197675  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
               45187  voluntary context switches
               50938  involuntary context switches
        235305804597  instructions retired
         74307371842  cycles elapsed
          1579909760  peak memory footprint
➜  linux git:(df0cc57e057f) ✗ /usr/bin/time -lp ~/dev/github.com/o2sh/onefetch/target/release/onefetch
                 ++++++                    Sebastian Thiel ~ git version 2.32.0 (Apple Git-132)
              ++++++++++++                 ----------------------------------------------------
          ++++++++++++++++++++             Project: linux (1 branch, 735 tags)
       ++++++++++++++++++++++++++          HEAD: df0cc57 (linux-checkout-4)
    ++++++++++++++++++++++++++++++++       Pending: 13+- 13-
 +++++++++++++************+++++++++++++    Version: v5.17-rc7
+++++++++++******************++++++++;;;   Created: 7 years ago
+++++++++**********************++;;;;;;;   Languages:
++++++++*********++++++******;;;;;;;;;;;              ● C (99.0 %) ● Shell (0.5 %)
+++++++********++++++++++**;;;;;;;;;;;;;              ● Python (0.2 %) ● Perl (0.2 %)
+++++++*******+++++++++;;;;;;;;;;;;;;;;;              ● Assembly (0.0 %) ● C++ (0.0 %)
+++++++******+++++++;;;;;;;;;;;;;;;;;;;;              ● Other (0.0 %)
+++++++*******+++:::::;;;;;;;;;;;;;;;;;;   Authors: 3% Linus Torvalds 30365
+++++++********::::::::::**;;;;;;;;;;;;;            1% David S. Miller 13285
++++++++*********::::::******;;;;;;;;;;;            1% Arnd Bergmann 8636
++++++:::**********************::;;;;;;;   Last change: 2 months ago
+++::::::::******************::::::::;;;   Contributors: 31712
 :::::::::::::************:::::::::::::    Repo: https://github.com/torvalds/linux
    ::::::::::::::::::::::::::::::::       Commits: 1060298
       ::::::::::::::::::::::::::          Lines of code: 16026142
          ::::::::::::::::::::             Size: 1.02 GiB (74304 files)
              ::::::::::::
                 ::::::

real 11.07
user 13.31
sys 2.00
          2298560512  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              140884  page reclaims
                 308  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
               14833  voluntary context switches
               24536  involuntary context switches
        123147113212  instructions retired
         47528928511  cycles elapsed
          1181969472  peak memory footprint

Note that the contributor count is off by 2. This is likely related to the mailmap, and I don't know yet whether libgit2 or gitoxide is off. I will look into it.

It's using the latest version from github for now until it's clear
what should be published later - gitoxide might change to provide
the best possible experience.

@spenserblack spenserblack left a comment

Thank you for all your hard work so far! I have a couple more questions:

… identity

Additionally, we sort contributors by name in case they have the same
number of commits to ensure stable results, which may help in some
cases where similar commit counts would be displayed because they
are in the top 3.
We cache the lower-case value only if it is different from what's there
to speed up comparisons while using memory/allocations only when needed.
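
The two ideas from these commit messages, a stable tie-break by name and a lowercase copy that is stored only when it differs, can be sketched as follows. This is an illustration under assumptions, not the code from the PR: the types `Name` and `Contributor` and the function `sort_contributors` are hypothetical, and using the lowercase form for the tie-break is a choice made here.

```rust
/// A contributor name with a lowercase copy that is stored only when it differs.
struct Name {
    original: String,
    lowercase: Option<String>,
}

impl Name {
    fn new(original: &str) -> Self {
        let lower = original.to_lowercase();
        Name {
            original: original.to_owned(),
            // Keep the lowercase form only if it differs from the original.
            lowercase: (lower != original).then_some(lower),
        }
    }

    /// The form used for comparisons: the cached lowercase value if present.
    fn folded(&self) -> &str {
        self.lowercase.as_deref().unwrap_or(self.original.as_str())
    }
}

struct Contributor {
    name: Name,
    commits: usize,
}

/// Sorts by commit count (descending), then by name so that ties are stable.
fn sort_contributors(contributors: &mut [Contributor]) {
    contributors.sort_by(|a, b| {
        b.commits
            .cmp(&a.commits)
            .then_with(|| a.name.folded().cmp(b.name.folded()))
    });
}
```

With this ordering, two contributors with the same number of commits always appear in the same order between runs, so the top 3 does not flip.
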
@Byron Byron changed the title [DRAFT] Gitoxide for performance improvements Gitoxide for performance improvements Mar 31, 2022

Byron commented Mar 31, 2022

I think it's ready for a formal review. Thanks.

Byron commented Apr 1, 2022

I took a look at what mainline does when displaying shallow clones and noticed that it displays them as 0 commits, even though there is one commit. The code in this branch shows that one commit, but I thought it would be good to indicate that the history has been truncated. Thus 927815a introduces a new form of display that shows this case as 1 commit (shallow).

Hopefully this is something you find as valuable as I do; please let me know what you think.

Edit: Here is what it looks like.

That way apps using it won't be surprised by default, but instead
can upgrade to deal even better with shallow clones by using
the data provided in the iterator.
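
For illustration, shallow detection and the adjusted display could be sketched like this. It relies on git writing a `shallow` file into the git directory of a shallow clone; the function names and the exact output format are assumptions, and the PR itself uses the data provided by gitoxide's iterator rather than this file check.

```rust
use std::path::Path;

/// Returns true if `git_dir` contains a `shallow` file, which git writes for
/// shallow clones. Linked worktrees are ignored in this sketch (assumption).
fn is_shallow(git_dir: &Path) -> bool {
    git_dir.join("shallow").is_file()
}

/// Renders the value of the commit count field, marking truncated history.
/// The "(shallow)" suffix format is assumed from the description above.
fn format_commit_count(count: usize, shallow: bool) -> String {
    if shallow {
        format!("{count} (shallow)")
    } else {
        count.to_string()
    }
}
```

A caller would combine the two, for example `format_commit_count(1, is_shallow(Path::new(".git")))`.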

spenserblack commented Apr 1, 2022

I thought it would be good to indicate the history has been truncated.

That is a good idea! Shallow clones can misrepresent the commit count, after all. Issue #592 may also be relevant -- it seems to be another way that a user might truncate the history.

I'd consider this a new feature in addition to the performance improvement goal. Would you be willing to keep a log of additional bugfixes/features in this PR's description? That way we can make sure to give you credit for all your improvements in the next release 🙂
Just something simple, like

## Additional Changes

- detect shallow clones

o2sh commented Apr 1, 2022

Would you be willing to keep a log of additional bugfixes/features in this PR's description? That way we can make sure to give you credit for all your improvements in the next release 🙂

I have created 3 issues related to this PR: #628 #629 #636

Byron commented Apr 2, 2022

@spenserblack Thanks so much for pointing me to this issue. Even though I stumbled across the graft/replace feature when studying git's source I could never make sense of it. That changed now and I will implement support for it in gitoxide. No change will be needed in onefetch for that to work.

@o2sh Thanks for the issues, I created a new section in the PR body to track those.

Besides that, I think all review comments are now addressed 🎉, but I will let you know separately once I think it's ready for a final look.

Byron commented Apr 2, 2022

Many thanks for the hint about replacement objects. Thanks to that, gitoxide now respects them by default, similar to what git does, and also picks up typical git environment variables by default, like GIT_NO_REPLACE_OBJECTS and the under-documented GIT_REPLACE_REFS_BASE.

This means that from my side, this PR is now feature complete and I am looking forward to your feedback to push it over the finishing line.
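
As a rough illustration of what respecting replacement objects means: git stores replacements as refs named after the replaced object's id, and a lookup substitutes the replacement unless the feature is switched off. The sketch below is a simplification under assumptions, not gitoxide's implementation. It models the replace refs as a plain map, performs a single lookup step, and treats any set value of GIT_NO_REPLACE_OBJECTS as "disabled"; the function names are hypothetical.

```rust
use std::collections::HashMap;

/// Replacements are active unless GIT_NO_REPLACE_OBJECTS is set.
/// `no_replace_objects` is the value of that variable, if any (assumption:
/// any value disables replacements).
fn replacements_enabled(no_replace_objects: Option<&str>) -> bool {
    no_replace_objects.is_none()
}

/// Maps an object id to its replacement if one is registered and enabled.
/// `replacements` models the replace refs as "original id -> replacement id".
fn resolve_replacement<'a>(
    id: &'a str,
    replacements: &'a HashMap<String, String>,
    enabled: bool,
) -> &'a str {
    if !enabled {
        return id;
    }
    replacements.get(id).map(String::as_str).unwrap_or(id)
}
```

In a real program the flag could come from `std::env::var_os("GIT_NO_REPLACE_OBJECTS").is_none()`, so a traversal that resolves every id through such a lookup sees the replaced history without any change in the calling code.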

o2sh commented Apr 2, 2022

I've just approved the PR.
We'll wait for @spenserblack to finish his review.
After that, you'll be free to merge this branch with main.

Prior to that, don't forget to:

when ready for merge: release gitoxide and switch to using crates.io

@spenserblack, do we keep the cargo run in the ci.yml?

Thanks again for your contribution @Byron, onefetch + gitoxide seems to be a great match and I can't wait for git_config and worktree-status to be supported so that we can drop libgit2 entirely.

@spenserblack spenserblack left a comment

Thanks for your hard work maintaining gitoxide and making this PR!

These are just some comments to think about. The only one that's somewhat high-priority is the BUG expects, but overall LGTM.

Ideally we add more test coverage so users have no chance of ever
seeing it.
Previously this wasn't possible as commits would be kept in `Repo`
which would cause self-referential borrow check issues unless
the git2 repository was kept outside.

Byron commented Apr 4, 2022

Thanks for your hard work maintaining gitoxide and making this PR!

You are welcome, it's a pleasure :).

Thanks for all your suggestions, you will find them implemented in the new commits. I will let it settle for a day or so in case other bits and pieces come up, and go ahead with the merge otherwise.

@Byron Byron merged commit cc6f332 into o2sh:main Apr 4, 2022
@Byron Byron deleted the gitoxide-for-traversal branch April 4, 2022 12:41

Byron commented Apr 4, 2022

And it's done 🎉! Thanks everyone for their suggestions and comments!

I can't wait to finalize the transition to gitoxide in the course of the year, you can subscribe here to track the remaining missing features.

@Byron Byron added this to the v2.13.0 milestone Apr 6, 2022