Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

False Positives in Taint Mode when multiple related sanitizer rules defined with focus-metavariable #10167

Open
1 of 3 tasks
andrew-konstantinov opened this issue Apr 26, 2024 · 4 comments
Labels
bug Something isn't working feature:taint

Comments

@andrew-konstantinov
Copy link

andrew-konstantinov commented Apr 26, 2024

Describe the bug
I have a code base where a certain sanitizer is used both as sink(id) as well as sanitize(id.toString()) (this is Kotlin). I've defined two sanitizer rules, each of them correctly matching one of these patterns, however when used together it looks like a more generic one takes priority (sink(id)) and the other one is completely ignored:

    # These two sanitizer rules together should produce zero findings
    # (identifying sanitizers in all four test cases), but they produce
    # (false positive) findings at test1 and test2
    pattern-sanitizers:
      # Alone, produces findings at test1 and test2
      - patterns:
        - pattern: sanitize($X)
        - focus-metavariable: $X
        by-side-effect: true

      # Alone, produces findings at test3 and test4
      - patterns:
        - pattern: sanitize($X.toString())
        - focus-metavariable: $X
        by-side-effect: true

To Reproduce
A concise example that demonstrates the problem (Kotlin): https://semgrep.dev/playground/s/5rzll

Expected behavior
All sanitizer rules should be matched separately. It should be never possible to create more matches by adding a new sanitizer rule to existing ones, but that's exactly what we observe right now: commenting out the second rule will avoid matching test1 and test2, but enabling it produce these false positive matches.

Screenshots
With only one sanitizer pattern enabled (sanitize($X.toString())), test1 and test2 examples are correctly identified as sanitized and not matched:
image

By adding an additional sanitizer pattern, parsing of the previous rule appears to be affected, as test and test2 not are considered matches:
image

What is the priority of the bug to you?

  • P0: blocking your adoption of Semgrep or workflow
  • P1: important to fix or quite annoying
  • P2: regular bug that should get fixed

Environment
The issue surfaced when using Semgrep Pro for integration within the CI/CD, but it is reproducible on semgrep.dev as well.

Use case
Significantly reduce false positives, increasing confidence in the Semgrep results and enabling it's adoption in our CI/CD.

@andrew-konstantinov
Copy link
Author

Another example of the same bug I believe: https://semgrep.dev/playground/s/BYDb9

There are three sanitization rules, each of them matching the relevant test example when the other two are commented out, but as soon as any two are enabled, they all get effectively cancelled and produce three false positives:

image

@IagoAbal
Copy link
Collaborator

IagoAbal commented May 9, 2024

@andrew-konstantinov for the first issue here you have a workaround: https://semgrep.dev/playground/s/7KNqw; the second rule I would have written it like this: https://semgrep.dev/playground/s/8GNlB. Since sources/sanitizers/sinks can be arbitrary search-mode formulas, there is often a way to work around things. But yeah we need to address those limitations to make it simpler to write taint rules, both were known to us, but still very useful to get this bug report, so thanks for that and for the examples.

@andrew-konstantinov
Copy link
Author

Thank you, @IagoAbal! I will try to apply this approach to my rules.

Are there any resources you could share about debugging issues such as this one, besides https://semgrep.dev/docs/troubleshooting/rules? I like the "Inspect Rule" gadget in the Playground UI, but it's sometimes it's not enough. I'm comfortable reading/hacking the source code, so pointers to useful methods would suffice as well.

@IagoAbal
Copy link
Collaborator

Hope it helps! For the issues you reported here, I think "Inspect Rule" is probably one of the best tools you have. In the second example, you can observe that the individual pattern-insides are matching code, but the patterns matches nothing (because it computes the intersection). If you use the semgrep-core binary you also have the -dfg_tainting flag, although I don't think it would have helped you for any of these issues. You are also welcome to tag me in our Community Slack ("iago (Semgrep)") any time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working feature:taint
Development

No branches or pull requests

3 participants