Make precommit check staged changes instead of entire working directory #351

Merged
merged 1 commit into from Apr 25, 2022

Conversation

hong-godaddy
Contributor

@hong-godaddy hong-godaddy commented Apr 23, 2022

To help us get this pull request reviewed and merged quickly, please be sure to include the following items:

  • Tests (if applicable)
  • Documentation (if applicable)
  • Changelog entry
  • A full explanation here in the PR description of the work done

PR Type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update (formatting, local variables)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Backward Compatibility

Is this change backward compatible with the most recently released version? Does it introduce changes which might change the user experience in any way? Does it alter the API in any way?

  • Yes (backward compatible)
  • No (breaking changes)

What's new?

Since the move to pygit2, tartufo pre-commit in Tartufo 3 is much slower than in Tartufo 2.

The point of pre-commit is to check the diff between the index (staging area) and HEAD. There should be no need to compare against the entire working directory, which can take a while in a large repo. The current behavior can be seen by running pre-commit with no changes staged: it still compares HEAD against the working directory.
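To illustrate the difference, here is a minimal sketch (not the actual tartufo code) of the two comparisons. The helper names `staged_paths` and `workdir_paths` are hypothetical; `repo` is assumed to be a `pygit2.Repository` whose HEAD already points at a commit.

```python
def staged_paths(repo):
    """Return paths that differ between HEAD and the index (staged changes)."""
    # cached=True makes the diff compare HEAD against the index,
    # so the working directory is never walked.
    diff = repo.diff("HEAD", cached=True)
    return [delta.new_file.path for delta in diff.deltas]


def workdir_paths(repo):
    """Return paths that differ between HEAD and the working directory."""
    # Without cached=True the diff has to inspect the working directory,
    # which is the slow part in a large repository.
    diff = repo.diff("HEAD")
    return [delta.new_file.path for delta in diff.deltas]


# Example usage (assumes pygit2 is installed and "." is a git repository):
#   import pygit2
#   repo = pygit2.Repository(".")
#   print(staged_paths(repo))
```

With nothing staged, `staged_paths` should return an empty list, while `workdir_paths` still has to examine the working directory to find out whether anything changed.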

Here's the v2 PrecommitScanner, which looks at the repo index instead of the working directory: https://github.com/godaddy/tartufo/blob/v2.10.1/tartufo/scanner.py#L742

Here are some performance differences in our repo, with and without cached=True:

>>> import profile
>>> profile.run("repo.diff('HEAD')")
         15 function calls in 4.074 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    4.062    4.062    4.062    4.062 :0(diff_to_workdir)
        1    0.000    0.000    4.063    4.063 :0(exec)
        4    0.000    0.000    0.000    0.000 :0(isinstance)
        2    0.000    0.000    0.000    0.000 :0(peel)
        1    0.000    0.000    0.000    0.000 :0(revparse_single)
        1    0.011    0.011    0.011    0.011 :0(setprofile)
        1    0.000    0.000    4.063    4.063 <string>:1(<module>)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    4.074    4.074 profile:0(repo.diff('HEAD'))
        2    0.000    0.000    0.000    0.000 repository.py:458(__whatever_to_tree_or_blob)
        1    0.000    0.000    4.063    4.063 repository.py:482(diff)


>>> profile.run("repo.diff('HEAD', cached=True)")
         24 function calls in 0.004 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 :0(__new__)
        1    0.003    0.003    0.003    0.003 :0(diff_to_index)
        1    0.000    0.000    0.004    0.004 :0(exec)
        1    0.000    0.000    0.000    0.000 :0(git_index_free)
        1    0.000    0.000    0.000    0.000 :0(git_repository_index)
        4    0.000    0.000    0.000    0.000 :0(isinstance)
        1    0.000    0.000    0.000    0.000 :0(new)
        2    0.000    0.000    0.000    0.000 :0(peel)
        1    0.000    0.000    0.000    0.000 :0(revparse_single)
        1    0.000    0.000    0.000    0.000 :0(setprofile)
        1    0.000    0.000    0.004    0.004 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 errors.py:33(check_error)
        1    0.000    0.000    0.000    0.000 index.py:56(from_c)
        1    0.000    0.000    0.000    0.000 index.py:65(_pointer)
        1    0.000    0.000    0.000    0.000 index.py:69(__del__)
        0    0.000             0.000          profile:0(profiler)
        1    0.000    0.000    0.004    0.004 profile:0(repo.diff('HEAD', cached=True))
        2    0.000    0.000    0.000    0.000 repository.py:458(__whatever_to_tree_or_blob)
        1    0.000    0.000    0.004    0.004 repository.py:482(diff)
        1    0.000    0.000    0.000    0.000 repository.py:643(index)

https://www.pygit2.org/diff.html
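For anyone who wants to repeat the comparison on their own repository, a rough timing sketch follows. The helpers `time_call` and `compare_diff_timings` are hypothetical names, not part of tartufo or pygit2, and `repo` is again assumed to be a `pygit2.Repository` with at least one commit. It uses wall-clock timing rather than the `profile` module, so the numbers will not match the output above exactly.

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def compare_diff_timings(repo, repeats=3):
    """Time HEAD-vs-workdir against HEAD-vs-index diffs on the same repo."""
    return {
        # Default behavior: HEAD compared against the working directory.
        "workdir": time_call(lambda: repo.diff("HEAD"), repeats),
        # cached=True: HEAD compared against the index only.
        "index": time_call(lambda: repo.diff("HEAD", cached=True), repeats),
    }


# Example usage (assumes pygit2 is installed and "." is a git repository):
#   import pygit2
#   print(compare_diff_timings(pygit2.Repository(".")))
```

Taking the best of several runs reduces noise from filesystem caching, which can otherwise dominate the first working-directory scan.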

@hong-godaddy hong-godaddy requested a review from a team as a code owner April 23, 2022 01:51
@hong-godaddy hong-godaddy changed the title from "pass cached=True to Repository.diff() for PrecommitScanner" to "Make precommit check staged changes instead of entire working directory" Apr 23, 2022
3 participants