Fix pre-commit ignoring untracked files #352

jwilbur-godaddy · 2022-04-27T19:50:56Z

To help us get this pull request reviewed and merged quickly, please be sure to include the following items:

Tests (if applicable)
Documentation (if applicable)
Changelog entry
A full explanation here in the PR description of the work done

PR Type

What kind of change does this PR introduce?

Backward Compatibility

Is this change backward compatible with the most recently released version? Does it introduce changes which might change the user experience in any way? Does it alter the API in any way?

Yes (backward compatible)
No (breaking changes)

Issue Linking

closes #331

What's new?

Fix pre-commit not scanning new files. It only took a few extra libgit2 diff flags to show untracked files in the diff against HEAD.

rbailey-godaddy · 2022-04-28T13:54:55Z

This commit is admirably simple but hurts my brain to think about it. A few questions/requests/comments...

Can you add a comment citing https://github.com/libgit2/libgit2/blob/main/include/git2/diff.h so people who want to understand these flags can quickly find information about what they do?
I believe I understand why this works, but I don't understand why it is necessary. Both #331 ("Pre-commit hook: High entropy strings not detected in new files by 3.0.0") and #350 ("No feature parity in detection between pre-commit and scan-local-repo for tartufo v3, contrary to v2") provide test cases that explicitly do a git add before running the pre-commit check. At that point the bad file should be in the index, and it seems the existing cached=True should be sufficient to do what we want, i.e. "the index is compared to 'a' ["HEAD"]". Additionally, tossing untracked files into the mix seems logically problematic: we do not want to fail a pre-commit check on the basis of files the caller has not added to the commit.

This is perhaps a bit expansive for a unit test, but can you validate the following scenario?

date | shasum > file-a.txt
date | shasum > file-b.txt
git add file-a.txt
tartufo pre-commit

We expect that a finding will be generated for file-a.txt, but not for file-b.txt. I worry that the change as proposed at this point will flag both files.
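The expectation in this scenario can be pictured with a small model in which HEAD, the index, and the working tree are simply sets of paths. This is a sketch for reasoning about the scenario, not tartufo or libgit2 code; `staged_new_files` and `unstaged_untracked_files` are hypothetical helpers, and `README.md` is an assumed pre-existing file.

```python
from typing import Iterable, List


def staged_new_files(head: Iterable[str], index: Iterable[str]) -> List[str]:
    """Paths present in the index but absent from HEAD (what a pre-commit check should see)."""
    return sorted(set(index) - set(head))


def unstaged_untracked_files(index: Iterable[str], workdir: Iterable[str]) -> List[str]:
    """Paths present in the working tree that were never added to the index."""
    return sorted(set(workdir) - set(index))


# State after creating file-a.txt and file-b.txt, then running `git add file-a.txt`.
head = {"README.md"}
index = {"README.md", "file-a.txt"}
workdir = {"README.md", "file-a.txt", "file-b.txt"}

to_scan = staged_new_files(head, index)
to_skip = unstaged_untracked_files(index, workdir)
print(to_scan)  # ['file-a.txt']
print(to_skip)  # ['file-b.txt']
```

Under this model, a check that compares only the index against HEAD would report file-a.txt and never look at file-b.txt, which is the behavior being asked for.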

jwilbur-godaddy · 2022-04-28T15:01:41Z


Hey Scott,

I will add that comment you requested. That's a good idea. As for the rest, I have a test case that I have shared elsewhere, which I will paste below. For clarification, this does not scan all files in the repo. Because of the recent addition of cached=True on the same exact line that I changed, this will only apply to untracked files that were added to the index, meaning that this will not result in tartufo scanning unstaged changes. I think the problem is that a file still counts as "untracked" even after it has been added to the index. It only becomes tracked once it has actually been committed.

Test Case

1. Clone tartufo.
2. Run openssl genrsa > private_key.pem to place an RSA private key right in the tartufo repository. If you do this in PowerShell, the file will be UTF-16-LE encoded, so tartufo will NOT catch it (I am going to report this as a bug, too). You can re-encode it in VS Code.
3. After ~line 571 (for chunk in self.chunks) in scanner.py, add print(chunk.file_path) so you can see what files are being scanned.
4. On line ~909 of scanner.py, remove the flags.
5. Run poetry run python .\tartufo_main_.py pre-commit. Observe that it DOES NOT scan private_key.pem. This is because there are no flags telling it to include untracked files and their contents.
6. On line ~909 of scanner.py, reintroduce the flags.
7. Run the command from step 5 again. Observe that it DOES NOT scan private_key.pem. This is because private_key.pem is not in the Git index.
8. Stage private_key.pem.
9. Run the command from step 5 again. Observe that it DOES scan private_key.pem because it is now staged.
"does that cause us to screen non-staged untracked files as well"
No, it does not, because of the cached=True flag that was introduced recently on that same line.
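The PowerShell encoding caveat in the test case can also be handled without an editor. A minimal sketch, assuming the redirected file starts with a UTF-16-LE byte order mark (the usual behavior of `>` in Windows PowerShell 5.x); `to_utf8` is a hypothetical helper and not part of tartufo.

```python
import codecs


def to_utf8(raw: bytes) -> bytes:
    """Re-encode UTF-16-LE input (detected by its BOM) as UTF-8; return other input unchanged."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        text = raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
        return text.encode("utf-8")
    return raw


# Simulate what a UTF-16-LE redirect would write for a PEM header line.
pem_line = "-----BEGIN RSA PRIVATE KEY-----\n"
utf16_bytes = codecs.BOM_UTF16_LE + pem_line.encode("utf-16-le")

converted = to_utf8(utf16_bytes)
```

In the UTF-16-LE bytes every ASCII character is followed by a zero byte, so a byte-level search for the PEM header does not match until the content is re-encoded.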

rbailey-godaddy · 2022-04-28T16:25:00Z

The lightbulb goes on...

I think the problem is that a file still counts as "untracked" even after it has been added to the index. It only becomes tracked once it has actually been committed.

This was totally non-obvious to me. Can you extend your comment to include this bit by way of explanation regarding why the chosen flags are present? Otherwise, I'm happy.

rbailey-godaddy

But I'm expecting to see that those URLs will trigger entropy findings that will need to be added to pyproject.toml lol...

rbailey-godaddy · 2022-04-28T18:29:42Z

@jwilbur-godaddy can you cherry-pick 78b27ee from the tartufo repo into your branch? This should fix the issues I noted earlier in my review.

* Reformat with black; fixes failed CI test
* Add signature; silences alert on long URLs in comments

pmevzek-godaddy · 2022-05-04T18:28:13Z

This commit is admirably simple but hurts my brain to think about it.

I fully agree with this, and personally I would not look favorably on this being merged: even if it seems to solve a problem, I still think there is a code smell here.

At the very least, there is a terminology problem. A new file being staged/cached IS NOT UNTRACKED.
This comes straight from git-status manual:

Short Format
In the short-format, the status of each path is shown as one of these forms

       XY PATH
       XY ORIG_PATH -> PATH

...

       X          Y     Meaning
       -------------------------------------------------
                [AMD]   not updated
       M        [ MTD]  updated in index
       T        [ MTD]  type changed in index
       A        [ MTD]  added to index
       D                deleted from index
       R        [ MTD]  renamed in index
       C        [ MTD]  copied in index
       [MTARC]          index and work tree matches
       [ MTARC]    M    work tree changed since index
       [ MTARC]    T    type changed in work tree since index
       [ MTARC]    D    deleted in work tree
                   R    renamed in work tree
                   C    copied in work tree
       -------------------------------------------------
       D           D    unmerged, both deleted
       A           U    unmerged, added by us
       U           D    unmerged, deleted by them
       U           A    unmerged, added by them
       D           U    unmerged, deleted by us
       A           A    unmerged, both added
       U           U    unmerged, both modified
       -------------------------------------------------
       ?           ?    untracked
       !           !    ignored
       -------------------------------------------------

Now let us try with a new file being committed:

$ git init test
$ cd test
$ git status --porcelain
$

# empty, as expected

$ > a_new_file
$ git status --porcelain
?? a_new_file

# per manual above, THIS is an untracked file for now

# let us stage it now

$ git add a_new_file
$ git status --porcelain
A  a_new_file

# per the documentation above, the file is "ADDED"; it is NOT untracked anymore.

Maybe the confusion comes from git and/or libgit2 itself (maybe the flag has to be read as "untracked OR newly added" and not just "untracked"), but at the very least I don't think it is right to say that a newly added (staged/cached) file is untracked. It is not untracked anymore once staged.
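The distinction drawn from the git-status table can be checked mechanically. The sketch below classifies lines of `git status --porcelain` output using only the `??` and `A` rows of the table quoted above; `classify_porcelain` is a hypothetical helper, and rename, copy, and unmerged entries are lumped into an "other" bucket for brevity.

```python
from typing import Dict, List


def classify_porcelain(output: str) -> Dict[str, List[str]]:
    """Group paths from short-format status output into coarse buckets."""
    buckets: Dict[str, List[str]] = {"untracked": [], "added_to_index": [], "other": []}
    for line in output.splitlines():
        if len(line) < 4:
            continue  # too short to be an "XY PATH" entry
        x, y, path = line[0], line[1], line[3:]
        if x == "?" and y == "?":
            buckets["untracked"].append(path)
        elif x == "A" and y in " MTD":
            # "A" with Y in [ MTD] means "added to index"; AA/AU are unmerged states.
            buckets["added_to_index"].append(path)
        else:
            buckets["other"].append(path)
    return buckets


# The two status outputs from the transcript above.
before_add = classify_porcelain("?? a_new_file\n")
after_add = classify_porcelain("A  a_new_file\n")
```

Here `before_add` places a_new_file under "untracked", while `after_add` places it under "added_to_index", matching the transcript.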

jwilbur-godaddy · 2022-05-04T19:25:26Z

Okay, so do you just want me to change the comment? What are you asking for?

hong-godaddy · 2022-05-18T18:06:55Z

Can we get additional reviews on this? Would love to get this merged and have a release

jgowdy · 2022-05-18T18:27:29Z

Do we have a unit test that, run against the existing main branch, fails to detect something in the index that is untracked, and that passes with these new flags? And do we have a unit test demonstrating that, with the flags, untracked files not in the index are NOT scanned? I see the steps to reproduce above, but we should codify those in tests to validate the change we are making.

jgowdy · 2022-05-18T18:30:11Z

This also needs a changelog entry per the PR template

jgowdy · 2022-05-23T14:57:19Z

Can we test the negative case also? i.e. That this change doesn't cause untracked files not staged to the index to be scanned?

tarkatronic · 2022-05-26T18:19:39Z

Can we test the negative case also? i.e. That this change doesn't cause untracked files not staged to the index to be scanned?

I've got to agree with Jeremiah on this one. I really like this solution, and its elegance, but I definitely think we want that other test just to be extra careful and cautious. Looking at the test you already wrote, it looks like it should even be a decently simple one to add!

jwilbur-godaddy · 2022-05-27T14:32:17Z

@jgowdy @tarkatronic I don't understand how this section of code does not deliver what you are asking for:

https://github.com/jwilbur-godaddy/tartufo/blob/bcc2716fe10757808fa48e097bb1ab911a898b8c/tests/test_scan_local_repo.py#L51-L55

tarkatronic · 2022-05-27T18:27:40Z

Ah ha. I had missed that part. Can you split that out to a separate unit test so that the two conditions are tested (and can potentially fail) on their own?
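Splitting the two conditions might look roughly like the following. This is a structural sketch only: `paths_to_scan` is a hypothetical stand-in for running the real pre-commit scan, and the file names are invented. The actual tests in tests/test_scan_local_repo.py operate on a real repository.

```python
import unittest
from typing import Iterable, List


def paths_to_scan(head: Iterable[str], index: Iterable[str]) -> List[str]:
    """Hypothetical stand-in for the scan: new paths are those in the index but not in HEAD."""
    return sorted(set(index) - set(head))


class PreCommitNewFileTests(unittest.TestCase):
    def test_staged_new_file_is_scanned(self):
        # Positive case: the new file has been added to the index.
        result = paths_to_scan(head=["README.md"], index=["README.md", "secret.txt"])
        self.assertIn("secret.txt", result)

    def test_unstaged_new_file_is_not_scanned(self):
        # Negative case: the new file exists on disk only, so the index matches HEAD.
        result = paths_to_scan(head=["README.md"], index=["README.md"])
        self.assertNotIn("secret.txt", result)


def run_suite() -> unittest.TestResult:
    """Run both tests and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PreCommitNewFileTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With the cases separated, a regression that starts scanning unstaged files fails only the negative test, which makes the cause easier to read off the test report.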

tarkatronic · 2022-05-27T18:29:17Z

One other thing, it looks like black and pylint are failing in the CI pipeline -- could you take a look and see if you can get those passing locally? Then we can get this merged and released!

tarkatronic · 2022-05-31T15:22:42Z

tests/test_scan_local_repo.py

+        repo.index.add("tests/data/config/" + file_name)
+        repo.index.write() # This actually writes the index to disk. Without it, the tracked file is not actually staged.
+        result = runner.invoke(cli.main, ["--entropy-sensitivity", "1", "pre-commit"])
+        self.assertNotEqual(result.exit_code, 0)


I finally realized what struck me as odd here. An exit code of 0 would indicate a successful scan with no issues found. So what I'm not sure about here is, how does this verify that the file was scanned? Since you're writing a sha256 digest to the file, that should get flagged for entropy, though I notice you modify the entropy sensitivity. Shouldn't we be letting it fail, and checking for an exit code greater than 0? Or even better maybe capture json output from the command and ensure that the filename in question is in the output?

jwilbur-godaddy

Did you miss the assertNotEqual? I am checking that the return code is not zero.

tarkatronic

Oops haha yes I did, sorry. ☕ has clearly not kicked in yet!
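The stricter check suggested in this thread, confirming that the specific file appears in the output rather than only checking the exit code, could be sketched as below. The key names `found_issues` and `file_path` are assumptions about the report layout and should be verified against the tool's actual JSON output; `flagged_paths` and the sample payload are hypothetical.

```python
import json
from typing import Set


def flagged_paths(json_output: str) -> Set[str]:
    """Collect file paths from a JSON report (assumed schema: {"found_issues": [{"file_path": ...}]})."""
    report = json.loads(json_output)
    return {issue["file_path"] for issue in report.get("found_issues", [])}


# Hand-written sample standing in for captured command output.
sample_output = json.dumps(
    {"found_issues": [{"file_path": "tests/data/config/new_file.txt", "issue_type": "High Entropy"}]}
)

paths = flagged_paths(sample_output)
```

A test could then assert that the staged file's path is in `paths`, which distinguishes "the file was scanned and flagged" from "the command failed for some other reason".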

tarkatronic

Looks great to me! Thanks for all the work and putting up with us through the process @jwilbur-godaddy!

@jgowdy can you take another look and make sure your concerns are covered?

jgowdy

Black still doesn't seem happy, but my concerns about tests are satisfied. Can we fix the black linting and then we're g2g

jwilbur-godaddy · 2022-05-31T18:45:46Z

Done! Thank you, all!

jwilbur-godaddy · 2022-05-31T19:04:24Z

Just a heads up, I am starting paternity leave tomorrow, so I will not be able to follow up on this after today. I also do not have write access to this repo, so I will not be able to merge this in.

Fix pre-commit ignoring untracked files

94518bd

jwilbur-godaddy requested a review from a team as a code owner April 27, 2022 19:50

jwilbur-godaddy requested review from irodelta, jmink-godaddy and rscottbailey April 27, 2022 19:50

Add explanatory comment

28aec70

Expand on explanatory comment

762026a

rbailey-godaddy approved these changes Apr 28, 2022

PR cleanups

8d372be

* Reformat with black; fixes failed CI test
* Add signature; silences alert on long URLs in comments

mdayanc-GoDaddy approved these changes May 6, 2022

jwilbur-godaddy added 4 commits May 20, 2022 14:05

Add CHANGELOG entry

db35c19

Test Python 3.10

8501321

Add tests for godaddy#352

ed8ce81

Merge branch 'main' into main

bcc2716

sushantmimani requested a review from tarkatronic May 25, 2022 22:32

Split tests and address linting issues

b57817c

Merge branch 'main' of https://github.com/jwilbur-godaddy/tartufo

3a7fa7d

tarkatronic reviewed May 31, 2022

tarkatronic approved these changes May 31, 2022

jgowdy approved these changes May 31, 2022

Fix whitespace

59151d7

rbailey-godaddy approved these changes May 31, 2022

sushantmimani merged commit 20e7b51 into godaddy:main May 31, 2022
