Skip to content
This repository has been archived by the owner on Apr 9, 2024. It is now read-only.

Add overrides via .semgrepconfig.yml #319

Closed
wants to merge 6 commits into from
Closed

Conversation

underyx
Copy link
Member

@underyx underyx commented Jul 1, 2021

You can add a .semgrepconfig.yml that looks like this:

overrides:
  - if.path: "tests/*" # the first two lines in this example are "conditions"
    if.rule_id: "secrets.*aws*"
    mute: true # this last line in this example is an "action"

  - if.policy_slug: "*important*"
    if.severity_in: ["ERROR"]
    unmute: true # report issues even if they were muted with # nosemgrep
    set_severity: WARNING # but lower their severity a bit

Every finding will be checked against these override definitions one by one.
The override definitions will run in the same order you have them in your config file.

An override's actions are applied when all if. conditions are true. If an override applies, it'll take actions as described by keys not starting with if.*.

Available conditions

key example value True if the finding's…
if.path "tests/*" path matches the given glob
if.rule_id "secrets.*aws*" rule ID matches the given glob
if.ruleset_id "secrets" rule is from a ruleset matching the given glob
if.finding_id "1fd8aac00" finding's syntactic_id starts with this value
if.policy_slug "security*" rule is from a policy matching the given glob
if.severity_in ["INFO", "WARNING"] rule's severity is in the given list

Available actions

key example value description
mute true acts as if the line had a # nosemgrep comment
unmute true ignores any # nosemgrep comments
set_severity "INFO" changes the reported severity of the issue

@aryx
Copy link
Contributor

aryx commented Jul 1, 2021

Why not be more general and just let people filter the JSON with whatever they want (jq, python comprehension, whatever)?

@underyx
Copy link
Member Author

underyx commented Jul 1, 2021

Why not be more general and just let people filter the JSON with whatever they want (jq, python comprehension, whatever)?

First, as context, here's my earlier response to this from Slack:

That’s because semgrep-agent publishes the findings to semgrep.dev as well. So if you’re thinking of a pipeline such as scan | postprocess | publish it’s not trivial to place a processing step in the middle cause semgrep-agent already does scan | publish, and by design it’s supposed to handle the piping of data as well (from scan to publish) so that users can drop the Semgrep CI config in their repo and don’t have to think about it.

Now, a more specific list of reasons against that solution:

  1. users would need to learn semgrep's JSON schema
  2. we'd need to commit much more to a stable JSON format (including internal-only keys within metadata:)
  3. it'd make it easier for users to introduce bugs (including security bugs) in Semgrep CI's behavior
  4. maybe there is a clean way to do this but I haven't yet thought of a way to inject arbitrary code in the middle of semgrep-agent's execution — I'd want to avoid eval() if possible
  5. this is even more difficult if we want to allow people to use any language or even binary (such as jq), cause then we need to pipe to an arbitrary subprocess
  6. the semgrep-agent docker image doesn't even include jq nor any language environment other than Python right now, we try to keep this image lean
  7. but even if we include some extra languages, what if a user doesn't know how to write code in any of them?
  8. and even if users can already write python, it sounds annoying for everyone to individually have to search Python docs for how to do glob matching and similar operations
  9. with this PR's declarative overrides, we can apply them at any point in semgrep-agent's execution, while piping results to jq is post-processing only. this PR also only does post-processing for the sake of simplicity, but we could later keep using the same config format and make some overrides skip scanning entire files for a performance boost

@DrewDennison
Copy link
Member

DrewDennison commented Jul 1, 2021

Thoughts on naming this semgrep.yml? I feel like we have an opportunity grow this into semgrep’s version of package.json or Dockerfile so we should choose wisely on the name. I would not do .hidden since want people to see that semgrep is installed when running ls

@underyx
Copy link
Member Author

underyx commented Jul 1, 2021

Thoughts on naming this semgrep.yml?

I'd be okay with it. I just wanted the solution to work from the get-go and iterate in the PR. semgrep.yml already has a special meaning as it contains local rules if you want your rules committed, so we'll need to redo some of the special handling of that. But it's not too bad since rules: and overrides: can easily coexist.

Should I update the PR to name it semgrep.yml?

@ievans
Copy link
Member

ievans commented Jul 1, 2021

@underyx is/will this be documented in semgrep-docs?

@underyx
Copy link
Member Author

underyx commented Jul 1, 2021

@underyx is/will this be documented in semgrep-docs?

Depends on the reaction here, really. Currently I'm not even sure if this PR will be accepted. If there's very good reception from users and r2seers, then we can add it in semgrep-docs.

Copy link
Member

@brendongo brendongo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should plan out more what pain point this is solving and if adding more fields and complexity to semgrep-action is the best solution for those problems.

@sabrinabrogren sabrinabrogren linked an issue Jul 2, 2021 that may be closed by this pull request
@brendongo brendongo closed this Nov 19, 2021
@brendongo brendongo deleted the bence/semgrepconfig branch March 8, 2022 19:02
@underyx underyx restored the bence/semgrepconfig branch April 6, 2022 22:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

False positives management
5 participants