filter-irrelevant-rule causes memory/perf blowup on some rules #7839

Closed
1 of 3 tasks
emjin opened this issue May 19, 2023 · 0 comments · Fixed by #7845

emjin commented May 19, 2023

Describe the bug
https://semgrep.dev/playground/s/E2B5 takes a very long time to run (more than 13 minutes).

https://semgrep.dev/playground/s/LbgX (rule = kotlin_faster_but_longer.yaml, target = kotlin_slow_import.kt) is also slow. However, although this rule is longer, it takes less time (about 2 minutes).

The two rules are very similar; the shorter rule is the longer rule with some patterns deleted.

To Reproduce

Create the rule and target files from the playground example, then run:

➜  misc semgrep --config kotlin_faster_but_longer.yaml kotlin_slow_import.kt -d
➜  misc sc -json -rules /Users/emma/.semgrep/semgrep_rules.json -j 12 -targets /Users/emma/.semgrep/semgrep_targets.txt -timeout 30 -timeout_threshold 3 -max_memory 0 -json_time -fast -debug
[0.043  Info       Main.Core_CLI        ] Executed as: /Users/emma/workspace/semgrep/_build/install/default/bin/semgrep-core -json -rules /Users/emma/.semgrep/semgrep_rules.json -j 12 -targets /Users/emma/.semgrep/semgrep_targets.txt -timeout 30 -timeout_threshold 3 -max_memory 0 -json_time -fast -debug
[0.043  Info       Main.Core_CLI        ] Version: semgrep-core version: 1.22.0
[0.043  Info       Main.Run_semgrep     ] Parsing /Users/emma/.semgrep/semgrep_rules.json:
[0.049  Info       Main.Run_semgrep     ] extracting nested content from 1 files
[0.049  Info       Main.Run_semgrep     ] processing 1 files, skipping 0 files
[0.000  Info       Main.Run_semgrep     ] [51223] Analyzing kotlin_slow_import.kt
[0.897  Warning    Main.Memory_limit    ] [51223] [kotlin_slow_import.kt] large heap size: 673 MiB (memory limit is 0 MiB). If a crash follows, you could suspect OOM.
[1.496  Warning    Main.Memory_limit    ] [51223] [kotlin_slow_import.kt] large heap size: 1024 MiB (memory limit is 0 MiB). If a crash follows, you could suspect OOM.
[2.484  Warning    Main.Memory_limit    ] [51223] [kotlin_slow_import.kt] large heap size: 1791 MiB (memory limit is 0 MiB). If a crash follows, you could suspect OOM.
[8.372  Warning    Main.Memory_limit    ] [51223] [kotlin_slow_import.kt] large heap size: 4765 MiB (memory limit is 0 MiB). If a crash follows, you could suspect OOM.
[15.810 Warning    Main.Memory_limit    ] [51223] [kotlin_slow_import.kt] large heap size: 8335 MiB (memory limit is 0 MiB). If a crash follows, you could suspect OOM.
[28.459 Warning    Main.Memory_limit    ] [51223] [kotlin_slow_import.kt] large heap size: 14578 MiB (memory limit is 0 MiB). If a crash follows, you could suspect OOM.
[60.922 Error      Main.Analyze_rule    ] [51223] CNF size exploded on rule id ktor_request_xss

This takes about 2 minutes.

Then, try running without filter-irrelevant-rules (note the added -no_filter_irrelevant_rules flag).

➜  misc time sc -json -rules /Users/emma/.semgrep/semgrep_rules.json -j 12 -targets /Users/emma/.semgrep/semgrep_targets.txt -timeout 30 -timeout_threshold 3 -max_memory 0 -no_filter_irrelevant_rules 
.
{"matches":[{"rule_id":"ktor_request_xss","location":{"path":"kotlin_slow_import.kt","start":{"line":56,"col":23,"offset":2983},"end":{"line":56,"col":36,"offset":2996}},"extra":{"message":"","metavars":{"$F":{"start":{"line":20,"col":36,"offset":592},"end":{"line":20,"col":42,"offset":598},"abstract_content":"accept"},"$RESPFUNC":{"start":{"line":55,"col":18,"offset":2947},"end":{"line":55,"col":30,"offset":2959},"abstract_content":"respondBytes"},"$INPUT":{"start":{"line":56,"col":23,"offset":2983},"end":{"line":56,"col":36,"offset":2996},"abstract_content":"\"\"hi! ${${resp}\""}},"dataflow_trace":{"taint_source":["CoreLoc",{"path":"kotlin_slow_import.kt","start":{"line":20,"col":32,"offset":588},"end":{"line":20,"col":44,"offset":600}}],"intermediate_vars":[{"location":{"path":"kotlin_slow_import.kt","start":{"line":19,"col":17,"offset":516},"end":{"line":19,"col":21,"offset":520}}}],"taint_sink":["CoreLoc",{"path":"kotlin_slow_import.kt","start":{"line":56,"col":23,"offset":2983},"end":{"line":56,"col":36,"offset":2996}}]},"engine_kind":"OSS"}}],"errors":[],"skipped_rules":[],"explanations":[],"stats":{"okfiles":1,"errorfiles":0},"rules_by_engine":[["ktor_request_xss","OSS"]],"engine_requested":"OSS"}
/Users/emma/workspace/semgrep/_build/install/default/bin/semgrep-core -json    0.05s user 0.03s system 29% cpu 0.260 total

Expected behavior
The optimized version (with filter-irrelevant-rules enabled) should not be significantly slower than the unoptimized one.

What is the priority of the bug to you?

  • P0: blocking your adoption of Semgrep or workflow
  • P1: important to fix or quite annoying
  • P2: regular bug that should get fixed

Environment
If not using semgrep.dev: are you running off docker, an official binary, a local build?

Use case
What will fixing this bug enable for you?

@emjin emjin self-assigned this May 19, 2023
@emjin emjin changed the title from "filter-irrelevant-rule causes memory blowup on some rulesed" to "filter-irrelevant-rule causes memory/perf blowup on some rules" May 19, 2023
emjin pushed a commit that referenced this issue May 19, 2023
Closes #7839

In Analyze_rule, we distribute formulas like `(x0 ^ x1) v (y0 ^ y1)` into CNF.
Let's call `x0 ^ x1` `p` and `y0 ^ y1` `q`. Currently, we check that `p` and `q` are
individually under 1,000,000 elements to prevent memory blowup. However, distributing
`p v q` produces a formula that is `p * q` elements long. So, even under these
constraints, if `p` and `q` are both large, this step can still take too long.
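
To illustrate the size growth described above, here is a minimal OCaml sketch of distributing a disjunction over two conjunctions. It is not the actual `Analyze_rule` code: the `cnf` type and the `distribute_or` function are hypothetical names chosen for this example, and literals are modeled as plain strings.

```ocaml
(* A CNF formula is a conjunction (outer list) of clauses; each clause is a
   disjunction (inner list) of literals. Names here are illustrative only. *)
type cnf = string list list

(* (c1 ^ ... ^ cn) v (d1 ^ ... ^ dm) distributes to the conjunction of
   (ci v dj) for every pair (i, j), so the result has n * m clauses. *)
let distribute_or (p : cnf) (q : cnf) : cnf =
  List.concat_map (fun c -> List.map (fun d -> c @ d) q) p

let () =
  let p = [ [ "x0" ]; [ "x1" ] ] in
  let q = [ [ "y0" ]; [ "y1" ] ] in
  let r = distribute_or p q in
  (* 2 * 2 = 4 clauses: (x0 v y0) ^ (x0 v y1) ^ (x1 v y0) ^ (x1 v y1) *)
  assert (List.length r = List.length p * List.length q);
  List.iter (fun clause -> print_endline (String.concat " v " clause)) r
```

With both operands just under the 1,000,000-element limit, the same product would approach 10^12 clauses, which is why checking `p` and `q` individually is not enough.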

Instead, we need to guard the size of the result. This PR currently requires that
the result be under 10,000,000 elements. It leaves the `p` and `q` restrictions
intact just in case. This allows us to take advantage of `List.compare_length`
in case we get an incredibly long `p` or `q`.
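
A guard along these lines could look like the sketch below. This is an assumption-laden illustration, not the code from the PR: `shorter_than`, `guarded_distribute_or`, and the optional `max_operand` / `max_result` parameters are hypothetical, and the sketch uses the standard library's `List.compare_length_with`, on the assumption that this is the kind of bounded length check the message refers to. That function stops after at most `n` steps, so checking an enormous list against the limit does not require traversing all of it.

```ocaml
(* [shorter_than l n] is true when [l] has fewer than [n] elements.
   List.compare_length_with inspects at most [n] cells of [l]. *)
let shorter_than l n = List.compare_length_with l n < 0

(* Distribute [p v q] into CNF only if both operands and the product of
   their sizes stay under the limits; otherwise return None.
   The default limits mirror the numbers quoted in the commit message. *)
let guarded_distribute_or ?(max_operand = 1_000_000)
    ?(max_result = 10_000_000) p q =
  if not (shorter_than p max_operand && shorter_than q max_operand) then None
  else
    (* Both lengths are now known to be below [max_operand], so computing
       them is bounded and the product fits in OCaml's native int. *)
    let n = List.length p and m = List.length q in
    if n * m >= max_result then None
    else Some (List.concat_map (fun c -> List.map (fun d -> c @ d) q) p)

let () =
  let clauses n = List.init n (fun i -> [ i ]) in
  (* 2 * 3 = 6 clauses: well under the limit, so distribution happens. *)
  assert (guarded_distribute_or (clauses 2) (clauses 3) <> None);
  (* 4_000 * 4_000 = 16_000_000 clauses would exceed the result limit,
     even though each operand is far below the per-operand limit. *)
  assert (guarded_distribute_or (clauses 4_000) (clauses 4_000) = None)
```

The second assertion is the case the original per-operand checks miss: both inputs pass individually, but their product does not.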

To consider: do we still need the gates for `p` and `q`? Is it possible to get
a list for `p` or `q` so large that traversing it is a problem?

Test plan: will add tests
aryx pushed a commit that referenced this issue May 19, 2023
Closes #7839

PR checklist:

- [x] Purpose of the code is [evident to future
readers](https://semgrep.dev/docs/contributing/contributing-code/#explaining-code)
- [x] Tests included or PR comment includes a reproducible test plan
- [x] Documentation is up-to-date
- [x] A changelog entry was [added to
changelog.d](https://semgrep.dev/docs/contributing/contributing-code/#adding-a-changelog-entry)
for any user-facing change
- [x] Change has no security implications (otherwise, ping security
team)

If you're unsure about any of this, please see:

- [Contribution
guidelines](https://semgrep.dev/docs/contributing/contributing-code)!
- [One of the more specific guides located
here](https://semgrep.dev/docs/contributing/contributing/)

---------

Co-authored-by: Emma Jin <--get>