filter-irrelevant-rule causes memory/perf blowup on some rules #7839

emjin · 2023-05-19T01:26:21Z

Describe the bug
Running https://semgrep.dev/playground/s/E2B5 takes forever (more than 13 minutes).

So does https://semgrep.dev/playground/s/LbgX (rule = kotlin_faster_but_longer.yaml, target = kotlin_slow_import.kt). However, although this rule is longer, it takes less time (about 2 minutes).

The two rules are very similar: the shorter rule is the longer rule with some patterns deleted.

To Reproduce

Create the rule and target files from the playground example, then run:

➜  misc semgrep --config kotlin_faster_but_longer.yaml kotlin_slow_import.kt -d
➜  misc sc -json -rules /Users/emma/.semgrep/semgrep_rules.json -j 12 -targets /Users/emma/.semgrep/semgrep_targets.txt -timeout 30 -timeout_threshold 3 -max_memory 0 -json_time -fast -debug
[0.043  Info       Main.Core_CLI        ] Executed as: /Users/emma/workspace/semgrep/_build/install/default/bin/semgrep-core -json -rules /Users/emma/.semgrep/semgrep_rules.json -j 12 -targets /Users/emma/.semgrep/semgrep_targets.txt -timeout 30 -timeout_threshold 3 -max_memory 0 -json_time -fast -debug
[0.043  Info       Main.Core_CLI        ] Version: semgrep-core version: 1.22.0
[0.043  Info       Main.Run_semgrep     ] Parsing /Users/emma/.semgrep/semgrep_rules.json:
[0.049  Info       Main.Run_semgrep     ] extracting nested content from 1 files
[0.049  Info       Main.Run_semgrep     ] processing 1 files, skipping 0 files
[0.000  Info       Main.Run_semgrep     ] [51223] Analyzing kotlin_slow_import.kt
[0.897  Warning    Main.Memory_limit    ] [51223] [kotlin_slow_import.kt] large heap size: 673 MiB (memory limit is 0 MiB). If a crash follows, you could suspect OOM.
[1.496  Warning    Main.Memory_limit    ] [51223] [kotlin_slow_import.kt] large heap size: 1024 MiB (memory limit is 0 MiB). If a crash follows, you could suspect OOM.
[2.484  Warning    Main.Memory_limit    ] [51223] [kotlin_slow_import.kt] large heap size: 1791 MiB (memory limit is 0 MiB). If a crash follows, you could suspect OOM.
[8.372  Warning    Main.Memory_limit    ] [51223] [kotlin_slow_import.kt] large heap size: 4765 MiB (memory limit is 0 MiB). If a crash follows, you could suspect OOM.
[15.810 Warning    Main.Memory_limit    ] [51223] [kotlin_slow_import.kt] large heap size: 8335 MiB (memory limit is 0 MiB). If a crash follows, you could suspect OOM.
[28.459 Warning    Main.Memory_limit    ] [51223] [kotlin_slow_import.kt] large heap size: 14578 MiB (memory limit is 0 MiB). If a crash follows, you could suspect OOM.
[60.922 Error      Main.Analyze_rule    ] [51223] CNF size exploded on rule id ktor_request_xss

Takes about 2 minutes.

Then try running with filter-irrelevant-rules disabled (`-no_filter_irrelevant_rules`):

➜  misc time sc -json -rules /Users/emma/.semgrep/semgrep_rules.json -j 12 -targets /Users/emma/.semgrep/semgrep_targets.txt -timeout 30 -timeout_threshold 3 -max_memory 0 -no_filter_irrelevant_rules 
.
{"matches":[{"rule_id":"ktor_request_xss","location":{"path":"kotlin_slow_import.kt","start":{"line":56,"col":23,"offset":2983},"end":{"line":56,"col":36,"offset":2996}},"extra":{"message":"","metavars":{"$F":{"start":{"line":20,"col":36,"offset":592},"end":{"line":20,"col":42,"offset":598},"abstract_content":"accept"},"$RESPFUNC":{"start":{"line":55,"col":18,"offset":2947},"end":{"line":55,"col":30,"offset":2959},"abstract_content":"respondBytes"},"$INPUT":{"start":{"line":56,"col":23,"offset":2983},"end":{"line":56,"col":36,"offset":2996},"abstract_content":"\"\"hi! ${${resp}\""}},"dataflow_trace":{"taint_source":["CoreLoc",{"path":"kotlin_slow_import.kt","start":{"line":20,"col":32,"offset":588},"end":{"line":20,"col":44,"offset":600}}],"intermediate_vars":[{"location":{"path":"kotlin_slow_import.kt","start":{"line":19,"col":17,"offset":516},"end":{"line":19,"col":21,"offset":520}}}],"taint_sink":["CoreLoc",{"path":"kotlin_slow_import.kt","start":{"line":56,"col":23,"offset":2983},"end":{"line":56,"col":36,"offset":2996}}]},"engine_kind":"OSS"}}],"errors":[],"skipped_rules":[],"explanations":[],"stats":{"okfiles":1,"errorfiles":0},"rules_by_engine":[["ktor_request_xss","OSS"]],"engine_requested":"OSS"}
/Users/emma/workspace/semgrep/_build/install/default/bin/semgrep-core -json    0.05s user 0.03s system 29% cpu 0.260 total

Expected behavior
The optimized run (with filter-irrelevant-rules enabled) should not be significantly slower than the unoptimized one, which finishes in about 0.26 s here.

What is the priority of the bug to you?

P0: blocking your adoption of Semgrep or workflow
P1: important to fix or quite annoying
P2: regular bug that should get fixed

Environment
If not using semgrep.dev: are you running off docker, an official binary, a local build?

Use case
What will fixing this bug enable for you?


Closes #7839

In Analyze_rule, we distribute formulas like `(x0 ^ x1) v (y0 ^ y1)` into CNF. Call `x0 ^ x1` p and `y0 ^ y1` q. Currently, we check that `p` and `q` are each under 1,000,000 elements to prevent memory blowup. However, distributing `p v q` produces a formula that is `|p| * |q|` elements long, so even under these constraints this step can still take far too long when `p` and `q` are both large (two lists just under the limit yield nearly 10^12 clauses).

Instead, we need to guard the size of the result. This PR requires that the result be under 10,000,000 elements. It leaves the individual `p` and `q` limits intact just in case, which lets us use `List.compare_length` to bail out early on an incredibly long `p` or `q` instead of traversing it completely.

To consider: do we still need the gates for `p` and `q`? Is it possible to get a list for `p` or `q` so large that traversing it is a problem?

Test plan: will add tests
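The size argument above can be sketched in OCaml, the language semgrep-core is written in. This is a minimal, hypothetical model, not the actual Analyze_rule code: the `cnf` type, `distribute_or`, `Cnf_exploded`, and the two limit constants are illustrative names, with the limits taken from the numbers quoted in the PR. The per-operand gates use the stdlib's `List.compare_length_with`, which may not be the exact function the PR used.

```ocaml
(* Hypothetical CNF model (NOT semgrep's real representation):
   a formula is a conjunction (list) of clauses, and each clause
   is a disjunction (list) of atoms. *)
type atom = string
type clause = atom list (* a0 v a1 v ... *)
type cnf = clause list  (* c0 ^ c1 ^ ... *)

exception Cnf_exploded

(* Illustrative limits mirroring the numbers quoted in the PR. *)
let max_operand_clauses = 1_000_000
let max_result_clauses = 10_000_000

(* p v q with both sides in CNF. By distributivity,
   (p0 ^ p1 ^ ...) v (q0 ^ q1 ^ ...) = AND over all i, j of (pi v qj),
   so the result has |p| * |q| clauses. *)
let distribute_or (p : cnf) (q : cnf) : cnf =
  (* Per-operand gates: compare_length_with inspects at most about
     limit + 1 cells, so a huge list is rejected without a full traversal. *)
  if List.compare_length_with p max_operand_clauses >= 0
     || List.compare_length_with q max_operand_clauses >= 0
  then raise Cnf_exploded;
  (* Result gate: both lengths are now below 10^6, so List.length is cheap
     and the product (< 10^12) cannot overflow a 63-bit OCaml int. *)
  if List.length p * List.length q >= max_result_clauses then
    raise Cnf_exploded;
  (* Build every clause pi v qj by concatenating the two atom lists. *)
  List.concat_map (fun pi -> List.map (fun qj -> pi @ qj) q) p

let () =
  let p = [ [ "x0" ]; [ "x1" ] ] and q = [ [ "y0" ]; [ "y1" ] ] in
  (* (x0 ^ x1) v (y0 ^ y1) = (x0 v y0) ^ (x0 v y1) ^ (x1 v y0) ^ (x1 v y1) *)
  Printf.printf "%d clauses\n" (List.length (distribute_or p q))
  (* prints "4 clauses" *)
```

The order of the checks is the point of the design: the cheap `compare_length_with` gates bound both lengths first, so the `List.length` calls in the product check only ever walk lists shorter than 1,000,000 elements, while the product check catches two moderately sized operands (say 4,000 and 3,000 clauses) that would pass the per-operand gates but still produce 12,000,000 clauses.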


emjin self-assigned this May 19, 2023

emjin changed the title ~~filter-irrelevant-rule causes memory blowup on some rules~~ filter-irrelevant-rule causes memory/perf blowup on some rules May 19, 2023

emjin mentioned this issue May 19, 2023

fix(perf): set better limits on analyze_rule #7845 (merged)

aryx closed this as completed in #7845 May 19, 2023
