Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pre-commit] Semgrep trying to scan not supported languages file with python rule #4310

Closed
vedanandusekar opened this issue Nov 18, 2021 · 11 comments
Assignees
Labels
bug Something isn't working priority:high Issue requires immediate attention

Comments

@vedanandusekar
Copy link

We are currently running semgrep with the following ruleset
semgrep --config=p/r2c-security-audit --JSON in the repository which has mixed-language support.

So ideally it should just ignore the language files which it doesn't support.
But instead of that, it is trying to run python-Django rule-set against R type file.

Here is the error :

line 967, characters 4-71\nCalled from Match_rules.check.(fun) in file \"src/engine/Match_rules.ml\", line 999, characters 17-79\n\n-----[ END error trace ]-----\n", "path": "test.R", "rule_id": "python.django.security.injection.path-traversal.path-traversal-file-name.path-traversal-file-name", "type": "Fatal error"}], "results": []}

Let us know how we can fix it.
Any help would be appreciated.

@aryx
Copy link
Collaborator

aryx commented Nov 19, 2021

which version of semgrep are you using?

@ievans ievans added the bug Something isn't working label Nov 19, 2021
@vedanandusekar
Copy link
Author

It's 0.73.0

@daneah
Copy link

daneah commented Dec 2, 2021

We're experiencing the same on 0.74.0 and 0.75.0—semgrep attempts to parse a SCSS file as Python despite the following in a custom rule:

# .semgrep/error/python.yml

rules:
  - id: mutable-default-arguments
    languages:
      - python
    message: Do not use mutable values as default arguments
    pattern-either:
      - pattern: |
            def $FUNC(..., $ARG=[], ...):
                ...
      - pattern: |
            def $FUNC(..., $ARG={}, ...):
                ...
    severity: ERROR

We're running Semgrep using pre-commit with the following command:

-   repo: https://github.com/returntocorp/semgrep
    rev: 'v0.74.0'
    hooks:
    -   id: semgrep
        args: ['--config', '.semgrep/error', '--error']
Error details
Semgrep Core ERROR - Fatal error: When running mutable-default-arguments on path/to/one/of/our/scss/files.scss: (Failure "Lexer_python.top_mode: empty stack")
-----[ BEGIN error trace ]-----
Raised at Parse_target.run in file "src/parsing/Parse_target.ml", line 185, characters 17-26
Called from Parse_target.parse_and_resolve_name_use_pfff_or_treesitter in file "src/parsing/Parse_target.ml", line 349, characters 30-60
Called from Parse_with_caching.parse_generic.(fun) in file "src/runner/Parse_with_caching.ml", line 102, characters 12-80
Called from Parse_with_caching.cache_computation.(fun) in file "src/runner/Parse_with_caching.ml", line 55, characters 20-24
Called from Parse_with_caching.parse_generic in file "src/runner/Parse_with_caching.ml", line 86, characters 4-1023
Called from CamlinternalLazy.force_lazy_block in file "camlinternalLazy.ml", line 31, characters 17-27
Re-raised at CamlinternalLazy.force_lazy_block in file "camlinternalLazy.ml", line 36, characters 4-11
Called from Common.with_time in file "src/pfff/commons/Common.ml", line 215, characters 12-16
Called from Match_rules.matches_of_patterns in file "src/engine/Match_rules.ml", line 264, characters 8-67
Called from Match_rules.matches_of_xpatterns in file "src/engine/Match_rules.ml", line 601, characters 6-105
Called from Match_rules.matches_of_formula in file "src/engine/Match_rules.ml", line 931, characters 4-109
Called from Match_rules.check_rule in file "src/engine/Match_rules.ml", line 967, characters 4-71
Called from Match_rules.check.(fun) in file "src/engine/Match_rules.ml", line 999, characters 17-79

@aryx aryx self-assigned this Dec 2, 2021
@aryx
Copy link
Collaborator

aryx commented Dec 2, 2021

Ok looking

@aryx
Copy link
Collaborator

aryx commented Dec 2, 2021

@vedanandusekar are you also using semgrep via pre-commit? Or you run it on the command-line? Could you copy-paste this test.R file so I can try to reproduce.
I've created a testing /tmp/tests/test.R file and ran --debug --config=p/r2c-security-audit --json /tmp/test/
but I don't get any of the error you mention.
If you could also attach the content of the output of semgrep with the --debug option that would be great.

@aryx
Copy link
Collaborator

aryx commented Dec 2, 2021

@daneah can you run the pre-commit command on the command-line that shows the error and also attach this file.scss?
I try to reproduce on my machine but I can't get semgrep 0.74 to run on it.

@daneah
Copy link

daneah commented Dec 2, 2021

@aryx Interestingly, running pre-commit run directly I get no error:

semgrep..............................................(no files to check)Skipped

The file.scss is arbitrary—I can add a newline to any of a wide variety of our SCSS files and when I attempt to commit it I can reproduce the error. Here's an example, but I want to reiterate that I don't think it's "special" in any way:

////
/// Much of the styling here could be reduced to an `all: unset` rule
/// But `all`, `unset`, and `initial` are not supported in IE11
////

///
/// Reset Foundation style that applies to all images on the site
///

#jstor-logo img {
    vertical-align: initial;
}

///
/// Reset Foundation's styling from overly-broad selectors
/// (e.g. [type="text"] rather than input[type="text"])
///

[type=submit],
[type=reset],
[type=button] {
    appearance: none;
    -webkit-appearance: none;
    -moz-appearance: none;

    @at-root button#{&} {
        appearance: button;
    }
}

pharos-text-input,
[data-pharos-component="PharosTextInput"],
.form-jstor pharos-text-input,
.form-jstor [data-pharos-component="PharosTextInput"],
pharos-input-group,
[data-pharos-component="PharosInputGroup"],
.form-jstor pharos-input-group,
.form-jstor [data-pharos-component="PharosInputGroup"] {

    &[type="text"],
    &[type="password"],
    &[type="date"],
    &[type="datetime"],
    &[type="datetime-local"],
    &[type="month"],
    &[type="week"],
    &[type="email"],
    &[type="number"],
    &[type="search"],
    &[type="tel"],
    &[type="time"],
    &[type="url"],
    &[type="color"] {
        display: block;
        box-sizing: border-box !important;
        margin: 0 0 0 0;
        padding: 0 !important;
        border: none !important;
        border-radius: 0 !important;
        background-color: transparent !important;
        box-shadow: none !important;
        font-family: initial !important;
        font-size: initial !important;
        font-weight: initial !important;
        line-height: initial !important;
        color: initial !important;
        transition: initial !important;
        height: initial !important;

        & input {
            margin: 0;
        }
    }
}

///
/// Update Foundation's styling for layout classes to account for the change in base font size
///

.column, .columns {
    padding-right: 0.9375rem;
    padding-left: 0.9375rem;
}

This leads me to suspect that when run as part of pre-commit's pre-commit hook, Semgrep is attempting to parse all changed files for each rule, whereas other modes of execution respect the languages key. Explicitly adding the following stops the error during pre-commit hooks:

paths:
  exclude:
    - "*.scss"

@aryx aryx added the priority:high Issue requires immediate attention label Dec 2, 2021
@aryx
Copy link
Collaborator

aryx commented Dec 2, 2021

Ok I was able to reproduce the error. Thx!
Seems like a regression introduced in 0.72.
Will try to fix ASAP, in the mean time you can use 0.71 to be on the safe side.

@aryx aryx changed the title Semgrep trying to scan not supported languages file [pre-commit] Semgrep trying to scan not supported languages file since 0.72 Dec 2, 2021
@aryx aryx changed the title [pre-commit] Semgrep trying to scan not supported languages file since 0.72 [pre-commit] Semgrep trying to scan not supported languages file with python rule Dec 3, 2021
@aryx
Copy link
Collaborator

aryx commented Dec 3, 2021

Actually it was not introduced in 0.72. The bug has always been there it's just that it appears only when someone commit a file that will make the python lexer fails with a bad exception.

@aryx
Copy link
Collaborator

aryx commented Dec 3, 2021

There are 2 problems:

  • we should not parse non Python files with python, but in pre-commit we get explicitely passed a list of file, which we interpret as the user knows what is doing. Maybe we should change the default of --scan-unknown-exts to false.
  • we should return a regular parse error in the Python lexer, not a Failure (which is then turned into a Fatal error)

aryx added a commit to semgrep/pfff that referenced this issue Dec 3, 2021
aryx added a commit that referenced this issue Dec 3, 2021
This will help #4310

test plan:
test file included
mmcqd pushed a commit that referenced this issue Dec 3, 2021
* [Python] generate a proper lexical exn for unbalanced braces

This will help #4310

test plan:
test file included

* changelog
@vedanandusekar
Copy link
Author

Thank you for fixing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority:high Issue requires immediate attention
Development

No branches or pull requests

4 participants