False Positives in Taint Mode when multiple related sanitizer rules defined with focus-metavariable #10167

andrew-konstantinov · 2024-04-26T13:19:23Z

Describe the bug
I have a code base where a certain sanitizer is used both as sink(id) as well as sanitize(id.toString()) (this is Kotlin). I've defined two sanitizer rules, each of them correctly matching one of these patterns, however when used together it looks like a more generic one takes priority (sink(id)) and the other one is completely ignored:

    # These two sanitizer rules together should produce zero findings
    # (identifying sanitizers in all four test cases), but they produce
    # (false positive) findings at test1 and test2
    pattern-sanitizers:
      # Alone, produces findings at test1 and test2
      - patterns:
        - pattern: sanitize($X)
        - focus-metavariable: $X
        by-side-effect: true

      # Alone, produces findings at test3 and test4
      - patterns:
        - pattern: sanitize($X.toString())
        - focus-metavariable: $X
        by-side-effect: true

To Reproduce
A concise example that demonstrates the problem (Kotlin): https://semgrep.dev/playground/s/5rzll

Expected behavior
All sanitizer rules should be matched separately. It should be never possible to create more matches by adding a new sanitizer rule to existing ones, but that's exactly what we observe right now: commenting out the second rule will avoid matching test1 and test2, but enabling it produce these false positive matches.

Screenshots
With only one sanitizer pattern enabled (sanitize($X.toString())), test1 and test2 examples are correctly identified as sanitized and not matched:

By adding an additional sanitizer pattern, parsing of the previous rule appears to be affected, as test and test2 not are considered matches:

What is the priority of the bug to you?

P0: blocking your adoption of Semgrep or workflow
P1: important to fix or quite annoying
P2: regular bug that should get fixed

Environment
The issue surfaced when using Semgrep Pro for integration within the CI/CD, but it is reproducible on semgrep.dev as well.

Use case
Significantly reduce false positives, increasing confidence in the Semgrep results and enabling it's adoption in our CI/CD.

The text was updated successfully, but these errors were encountered:

andrew-konstantinov · 2024-04-26T14:43:00Z

Another example of the same bug I believe: https://semgrep.dev/playground/s/BYDb9

There are three sanitization rules, each of them matching the relevant test example when the other two are commented out, but as soon as any two are enabled, they all get effectively cancelled and produce three false positives:

IagoAbal · 2024-05-09T11:02:38Z

@andrew-konstantinov for the first issue here you have a workaround: https://semgrep.dev/playground/s/7KNqw; the second rule I would have written it like this: https://semgrep.dev/playground/s/8GNlB. Since sources/sanitizers/sinks can be arbitrary search-mode formulas, there is often a way to work around things. But yeah we need to address those limitations to make it simpler to write taint rules, both were known to us, but still very useful to get this bug report, so thanks for that and for the examples.

andrew-konstantinov · 2024-05-10T18:46:24Z

Thank you, @IagoAbal! I will try to apply this approach to my rules.

Are there any resources you could share about debugging issues such as this one, besides https://semgrep.dev/docs/troubleshooting/rules? I like the "Inspect Rule" gadget in the Playground UI, but it's sometimes it's not enough. I'm comfortable reading/hacking the source code, so pointers to useful methods would suffice as well.

IagoAbal · 2024-05-13T11:31:44Z

Hope it helps! For the issues you reported here, I think "Inspect Rule" is probably one of the best tools you have. In the second example, you can observe that the individual pattern-insides are matching code, but the patterns matches nothing (because it computes the intersection). If you use the semgrep-core binary you also have the -dfg_tainting flag, although I don't think it would have helped you for any of these issues. You are also welcome to tag me in our Community Slack ("iago (Semgrep)") any time.

andrew-konstantinov mentioned this issue Apr 26, 2024

Ability to Specify Code Assumptions #10169

Open

ievans added bug Something isn't working feature:taint labels Apr 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

False Positives in Taint Mode when multiple related sanitizer rules defined with focus-metavariable #10167

False Positives in Taint Mode when multiple related sanitizer rules defined with focus-metavariable #10167

andrew-konstantinov commented Apr 26, 2024 •

edited

andrew-konstantinov commented Apr 26, 2024

IagoAbal commented May 9, 2024

andrew-konstantinov commented May 10, 2024

IagoAbal commented May 13, 2024

False Positives in Taint Mode when multiple related sanitizer rules defined with focus-metavariable #10167

False Positives in Taint Mode when multiple related sanitizer rules defined with focus-metavariable #10167

Comments

andrew-konstantinov commented Apr 26, 2024 • edited

andrew-konstantinov commented Apr 26, 2024

IagoAbal commented May 9, 2024

andrew-konstantinov commented May 10, 2024

IagoAbal commented May 13, 2024

andrew-konstantinov commented Apr 26, 2024 •

edited