Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclusion Reasons #202

Closed
dclayton-godaddy opened this issue Jul 6, 2021 · 8 comments · Fixed by #286
Closed

Exclusion Reasons #202

dclayton-godaddy opened this issue Jul 6, 2021 · 8 comments · Fixed by #286
Labels
enhancement New feature or request
Milestone

Comments

@dclayton-godaddy
Copy link
Contributor

Feature Request

Is your feature request related to a problem? Please describe.

It took me a while to generate an exclusion report that explains what is excluded and why. One thing I’d love to see in tartufo is the ability to specify the reason along-side the exclusion and have a report produced that we can easily provide to a security team. When someone adds an exclusion, it might not be front-and-center we need to track the reason until it’s time to create an exclusion report. We need to reverse engineer the exclusions to determine the reason.

Describe the solution you'd like

A reason field along-side any exclusion.

Describe alternatives you've considered

tartufo --json ... along with manually generating the report.

Teachability, Documentation, Adoption, Migration Strategy

Files

tartufo.toml::contains tartufo signatures flagged by entropy

Config

exclude-signatures=[
   "dfg99df8g0d98fg0s9d8fg09d8sfgd90sfg8s0df9::characters in google doc url"
]
exclude-entropy=[
  "file1.md::^[a-zA-Z0-9]{40}$::git sha in a markdown file"
}
@dclayton-godaddy dclayton-godaddy added the enhancement New feature or request label Jul 6, 2021
@jakeswenson
Copy link

@dclayton-godaddy would adding a TOML comment suffice?

exclude-signatures=[
   "dfg99df8g0d98fg0s9d8fg09d8sfgd90sfg8s0df9" # characters in google doc url
]
exclude-entropy=[
  "file1.md::^[a-zA-Z0-9]{40}$" # git sha in a markdown file
]

@tarkatronic
Copy link
Contributor

@jakeswenson I certainly think that your suggestion is the "good enough" path for now, and that's what I generally do myself.

I do also think that @dclayton-godaddy may have hit on a good idea, and this is something I've been ruminating about for some time myself. I've been considering, perhaps with v3.0, switching to a new pattern for exclusion signatures. Something like:

exclude-signatures = [
    "tartufo:blake2:dfg99df8g0d98fg0s9d8fg09d8sfgd90sfg8s0df9:characters in google doc url"
]

The pattern here is the static string tartufo; the hash algorithm used to generate the signature, in this case blake2; the hash itself; a comment describing the signature.

This accomplishes a number of things, of varying levels of importance. Some perhaps even mostly useless, but just sound shiny and neat to me. In no particular order:

  • By having that strict structure, we are better able to exclude signatures during scans, fixing Removing signatures from excluded-signatures should not get flagged #201
  • We could potentially use a faster hashing algorithm in the future. I know I've been curious to try moving from blake2 to blake3.
  • We get built-in comments
  • Unlike TOML comments, we would have ready access to the comments in the code, for whatever reporting we might want to do
  • This format is machine readable, machine writable, and importantly human readable.

Of course, I'm sure there are many reasons not to go this route that I haven't thought of yet. And I would love to hear them!

@jakeswenson
Copy link

Agree with all of the goals you've mentioned, however why come up with a custom string format? What about instead just allowing strings or nested TOML objects, like so:

exclude-signatures = [
    { match = "tartufo", algo = "blake2", signature = "dfg99df8g0d98fg0s9d8fg09d8sfgd90sfg8s0df9", reason = "characters in google doc url" }
]

OR (the beauty of TOML)

[[exclude-signatures]]
match = "tartufo"
algo = "blake2"
signature = "dfg99df8g0d98fg0s9d8fg09d8sfgd90sfg8s0df9"
reason = "characters in google doc url"

This feels like a better approach to achieve the great goals you've mentioned @tarkatronic

What do you think?

@tarkatronic
Copy link
Contributor

Oh yeah I totally forgot about that. I like that much better. 😄

I'm just wondering... is this becoming an over-engineered solution? Fixing a problem that doesn't exist?

@tarkatronic
Copy link
Contributor

Hmmm also I think the TOML table syntax makes the first point much harder. The struggle I've had with trying to figure out a fix for #201 is that, in order to effectively solve the problem, we would have to reconstruct the entire history of the config file from git history, and collect all exclusion signatures that were ever configured. If we go with the TOML table syntax, do we run the risk of a similar issue? We can always exclude matches that look like signature = "<hash>"... but that seems pretty dangerous to me, with a potential for false negatives. Whereas we can much more easily exclude "tartufo:<algo>:<hash>:<text>" with an extremely high level of confidence.

@jakeswenson
Copy link

I'm just wondering... is this becoming an over-engineered solution? Fixing a problem that doesn't exist?

Some form of structured signature will be required to version the signature algorithm (efficiently.)
I also like the idea of having a parsable reason as to why a signature was added; maybe someone wants to write a github check for tartufo and it could be nice to post an informative message with the reason/context for ignored matches...

So there is definitely a forward-looking benefit to thinking about this.

If those are the goals, it would be better to have structure and not require someone to write some fragile string split approach.

in order to effectively solve the problem, we would have to reconstruct the entire history of the config file from git history, and collect all exclusion signatures that were ever configured.

Not sure I understand, and maybe this discussion is better had in #201, but are you suggesting it would be an issue to parse every config version? (and thus you're looking for creative ways to avoid it?)

If we go with the TOML table syntax, do we run the risk of a similar issue? We can always exclude matches that look like signature = ""... but that seems pretty dangerous to me, with a potential for false negatives. Whereas we can much more easily exclude "tartufo:::" with an extremely high level of confidence.

I understand the thinking that a specific string structure lowers the probability of falsely matching a secret.
But I disagree that it's solving the problem the right way for #201, but maybe I don't have enough context.

Happy to chat more tomorrow about #201

@dclayton-godaddy
Copy link
Contributor Author

Ideally we run a scan and the results of the scan should output all the information to audit the scan. If the exclusions can read from multiple formats, that would allow the exclusions be in a structured format as well as a string delimited format.

@tarkatronic
Copy link
Contributor

Fixed in #286

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants