Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Taint tracking produces duplicate findings #3742

Closed
erwanlr opened this issue Aug 21, 2021 · 4 comments · Fixed by #3781
Closed

Taint tracking produces duplicate findings #3742

erwanlr opened this issue Aug 21, 2021 · 4 comments · Fixed by #3781
Assignees
Labels

Comments

@erwanlr
Copy link

erwanlr commented Aug 21, 2021

Describe the bug
Taint mode can produces duplicate findings

To Reproduce
https://semgrep.dev/s/WOzo

Expected behavior
There should not be duplicate findings

Screenshots
Screenshot 2021-08-21 at 21 06 15

What is the priority of the bug to you?

P1: important to fix or quite annoying

Environment
Semgrep.dev & CLI 0.62.0 on MacOSX (Installed via Homebrew)

@ievans ievans added bug Something isn't working feature:taint labels Aug 21, 2021
@IagoAbal IagoAbal self-assigned this Aug 27, 2021
@IagoAbal
Copy link
Collaborator

The way taint tracking works it is non-trivial to guarantee that there are no duplicates. Right now, internally, we use search-mode queries to compute the ranges of sources, sanitizers, and sinks. Then if e.g. an expression is inside a pattern-sources range, then we consider it a source of taint; and, if it is inside a pattern-sinks range, we consider it a sink. We have some mechanisms to remove duplicates, but they don't work when a single Generic "instruction" translates into multiple Dataflow IL instructions.

@IagoAbal IagoAbal changed the title PHP - Taint - Duplicate Findings Taint tracking produces duplicate findings Aug 30, 2021
@IagoAbal
Copy link
Collaborator

Another example reported by @emjin: https://semgrep.dev/s/d0dZ/

@IagoAbal
Copy link
Collaborator

I will let the engine produce the duplicates and filter these duplicates at the end. Unless there are too many taint matches, it should not cause perf problems. There may be a better solution but I need to think more on it and discuss it with some colleagues.

@erwanlr
Copy link
Author

erwanlr commented Aug 30, 2021

Fine by me ;) I think the max duplicates per issue I had so far is 3

Btw, I noticed another issue (ticket linked) when trying to reduce duplicate output by adding functions such as isset(...) as sanitizer (which works in term of eliminating some duplicates, but then some code won't match :o)

IagoAbal added a commit that referenced this issue Aug 31, 2021
Closes #3742

test plan:
make test # test included
brendongo pushed a commit that referenced this issue Sep 1, 2021
Closes #3742

test plan:
make test # test included
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.

4 participants