Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metavariable-regex not working with pattern-inside? #2548

Closed
xmo-odoo opened this issue Feb 11, 2021 · 16 comments · Fixed by #2689
Closed

metavariable-regex not working with pattern-inside? #2548

xmo-odoo opened this issue Feb 11, 2021 · 16 comments · Fixed by #2689
Assignees
Labels
bug Something isn't working feature:metavariable priority:low user:external requested by someone outside of r2c

Comments

@xmo-odoo
Copy link

xmo-odoo commented Feb 11, 2021

When using pattern-inside and metavariable-regex which tries to filter one of the pattern-inside's metavariables, the match seems to never happen https://semgrep.dev/s/8ykq

  patterns:
    - pattern-either: 
      - pattern: |
          return None
      - pattern: |
          return
    - pattern-inside: |
        class $C:
            def $METHOD(...):
              ...
    - metavariable-regex:
        metavariable: "$METHOD"
        regex: "^[^_].*"

The intent is to match all methods which are "public" (not starting with _) and use a bare return or a return None. Commenting the metavariable-regex, all methods with the return are matched (including those we want to exclude).

Duplicating the pattern-inside so the metavariables are extracted from a pattern, things work as expected: https://semgrep.dev/s/gLGx

  patterns:
    - pattern-inside: |
        class $C:
            ...
    - pattern-either:
      - pattern: |
              def $METHOD(...):
                ...
                return
      - pattern: |
              def $METHOD(...):
                ...
                return None
    - metavariable-regex:
        metavariable: "$METHOD"
        regex: "^[^_].*"

The problem with the workaround is that I also want to match functions which don't return (and thus have an implicit return None), but I don't see how I could match a pattern-not: return $X inside a method with the proper name.

@mschwager mschwager self-assigned this Feb 17, 2021
@mschwager
Copy link
Contributor

Hey @xmo-odoo,

Thanks for bringing this to our attention, I'm going to investigate what's going on. This looks similar to #1699, hopefully we can solve this "metavariables with pattern-inside" in a generalized way 👍

@DrewDennison
Copy link
Member

@xmo-odoo thanks for the bug report. Just checking what the priority of this fix is for you and if there are any timelines you're looking for a fix by?

@xmo-odoo
Copy link
Author

@DrewDennison this is a low priority thing, I'm looking into semgrep and trying to use it personally (in part in order to better understand the quirks of pattern combinations, I don't think my mental model is correct yet) but that's about it.

This I decided to report because I spent a while trying to find out what I was doing wrong and a reduced test case showed it didn't seem to be supported (sometimes I can't get a rule to work properly in situ but I can't get the mismatch to trigger in a reduced test case, in those cases I rather assume I am doing something wrong somehow, even if I don't know what).

Maybe accessing a metavariable which doesn't exist should trigger an error or something? At least optionally? Or is there a tracing / debugging mode through which it'd be possible to see the flow of an item through a rule?

@xmo-odoo
Copy link
Author

@mschwager

Thanks for bringing this to our attention, I'm going to investigate what's going on. This looks similar to #1699, hopefully we can solve this "metavariables with pattern-inside" in a generalized way +1

It does look like the same issue (just revealing itself in a different context). Should I close this one?

Maybe edit the title of #1699 so there's more chance people will find it when searching? Given #2154 it looks like I'm not the only one which missed the original.

@mschwager
Copy link
Contributor

Hey @xmo-odoo,

After looking into this I understand what's going on. The patterns operator will only pass along metavariable information for findings that are present in all sub-operators. In other words, in your first example, since $METHOD is not present in your pattern operators its information will not be propagated to metavariable-regex.

The bad news is I don't see any code improvements we can make to get around this, the good news is we can create a rule that meets your needs, we just have to be crafty with our pattern-not: https://semgrep.dev/s/Zv34/.

More on why we can't make any code improvements: as mentioned above, metavariables are only propagated when they're present in all sub-operators. If this was not the case then patterns would propagate metavariables that don't meet our boolean logic specification, which would produce incorrect results.

Hope this is helpful! Let me know if this solves your problem and/or if there's anything else I can help with.

@xmo-odoo
Copy link
Author

@mschwager thanks for the example, I'll take a look at it.

More on why we can't make any code improvements: as mentioned above, metavariables are only propagated when they're present in all sub-operators. If this was not the case then patterns would propagate metavariables that don't meet our boolean logic specification, which would produce incorrect results.

I understand the reasoning for pattern-either, but I can't grasp it for patterns (possibly because my mental model of semgrep remains incomplete or incorrect): since all branches of the pattern must match, surely it doesn't matter which branch matched it, and the metavariable will always match the logic specification?

@mschwager
Copy link
Contributor

mschwager commented Feb 23, 2021

I understand the reasoning for pattern-either, but I can't grasp it for patterns (possibly because my mental model of semgrep remains incomplete or incorrect): since all branches of the pattern must match, surely it doesn't matter which branch matched it, and the metavariable will always match the logic specification?

Hmm, after thinking about it a bit more, I think you're correct that it makes sense to propagate metavariables in some situations. My initial thought is that this propagation makes sense when using patterns + pattern or pattern-inside, i.e. additive operations guaranteed to have equivalent metavariables under the conjunctive nature of patterns, but not things like pattern-either or filtering patterns (e.g. pattern-not, pattern-not-inside, etc.).

I'm trying to think through any obvious bugs that might occur with this change, but, again, I think we're okay due to the conjunctive nature of patterns. @brendongo does this seem reasonable? The nice thing here is bugs like #1699 and others relying on metavariable data will be solved in a more general nature with metavariable propagation.

I think the general flow would be something like:

  • Only propagate metavariables when in patterns.
  • Only propagate metavariables from additive pattern or pattern-inside sub-operators.
  • Only propagate additional metavariables for a particular range, don't overwrite existing metavariables. I think this is a non-issue, since metavariable matching saves us, but I'd include this as a safety precaution to prevent unforeseen breakage.

Thoughts?

@brendongo
Copy link
Member

Yeah propagating metavariables in patterns and pattern-inside but not in pattern-either makes sense.

@aryx
Copy link
Collaborator

aryx commented Feb 23, 2021

Subscribe

@xmo-odoo
Copy link
Author

xmo-odoo commented Feb 24, 2021

@mschwager awesome

but not things like pattern-either or filtering patterns (e.g. pattern-not, pattern-not-inside, etc.).

That I've not used negative-filtering patterns much yet and thus didn't really think of them when I answered is probably why I had a much more optimistic view than you did. But glad to know it seems to work out in the end.

It'd make sense that only "positive" assertions "export" metavariables and negative ones don't, though from what you're writing it feels like semgrep uses metavariables more like unified terms so a negative assertion could theoretically bind a metavar if it was not bound by an other pattern?

aryx added a commit that referenced this issue Feb 25, 2021
You first need to create a token to access github API.
Then you can create a "base" board with
./curl.shell $GITHUB_TOKEN > yesterday.json

test plan:
```
$ ./curl.sh $GITHUB_TOKEN > today.json
$ ./_build/default/oncall.exe -base yesterday.json today.json

To Discuss (UNASSIGNED)

	- [story] Figma can run `nginx` rules on their codebase and report that the performance was good
	<semgrep/semgrep-app#1736> ()
	- read timeout on downloading the semgrep config !PRIORITY:HIGH!
	<#2620> ()

Unassigned To do

Assigned To do

	- Analyzing the entire Figma codebase with Figma's policies takes less than 5 minutes !PRIORITY:HIGH!
	<semgrep/semgrep-app#1727> (DrewDennison colleend)
	- Parser error when parsing obfuscated JavaScript (CLI and Online Rules Editor) !PRIORITY:HIGH!
	<semgrep/enterprise#572> (underyx)
	- Negation not working with subexpressions
	<#2457> (aryx)
	- Clearly document that metavariables match all expressions, and how to exclude array assignment
	<#2305> (brendongo)
	- metavariable-regex not working with pattern-inside?
	<#2548> (mschwager)
	- Prevent users from naming two notifications the same thing _without_ sending a Sentry error
	<semgrep/semgrep-app#1724> (chmccreery)
	- Make deployment id and org name more visible in UI
	<semgrep/semgrep-app#1739> (chmccreery)

Closed since last email (good job guys!)

	- [Php] Add ellipsis nested statement feature support
	<#2622> (aryx)
	- Go parsed with pfff does not do nested statement matching
	<#2636> (aryx)
	- [MacOS] semgrep-core now uses libpcre but is not linked statically in macOS !PRIORITY:HIGH!
	<#2624> (mjambon brendongo)
	- [Python] Constant tracking do not follow assignment operators
	<#2616> (IagoAbal)
```
@mschwager
Copy link
Contributor

It'd make sense that only "positive" assertions "export" metavariables and negative ones don't, though from what you're writing it feels like semgrep uses metavariables more like unified terms so a negative assertion could theoretically bind a metavar if it was not bound by an other pattern?

Yeah the problem with negative assertion is there could be many things to "match" against. E.g.

pattern-not-inside: |
    def $FUNC(...):
        ...

If we're looking for a = ... in the following, what do we set $FUNC to?

a = 1

def func1():
    a = 2
    ...

def func2():
    a = 3
    ...

Generally I think it makes sense to propagate metavariables as outlined in my comment above, so I'll go ahead with the implementation 👍

aryx added a commit that referenced this issue Mar 1, 2021
* Script to automate the oncall email

You first need to create a token to access github API.
Then you can create a "base" board with
./curl.shell $GITHUB_TOKEN > yesterday.json

test plan:
```
$ ./curl.sh $GITHUB_TOKEN > today.json
$ ./_build/default/oncall.exe -base yesterday.json today.json

To Discuss (UNASSIGNED)

	- [story] Figma can run `nginx` rules on their codebase and report that the performance was good
	<semgrep/semgrep-app#1736> ()
	- read timeout on downloading the semgrep config !PRIORITY:HIGH!
	<#2620> ()

Unassigned To do

Assigned To do

	- Analyzing the entire Figma codebase with Figma's policies takes less than 5 minutes !PRIORITY:HIGH!
	<semgrep/semgrep-app#1727> (DrewDennison colleend)
	- Parser error when parsing obfuscated JavaScript (CLI and Online Rules Editor) !PRIORITY:HIGH!
	<semgrep/enterprise#572> (underyx)
	- Negation not working with subexpressions
	<#2457> (aryx)
	- Clearly document that metavariables match all expressions, and how to exclude array assignment
	<#2305> (brendongo)
	- metavariable-regex not working with pattern-inside?
	<#2548> (mschwager)
	- Prevent users from naming two notifications the same thing _without_ sending a Sentry error
	<semgrep/semgrep-app#1724> (chmccreery)
	- Make deployment id and org name more visible in UI
	<semgrep/semgrep-app#1739> (chmccreery)

Closed since last email (good job guys!)

	- [Php] Add ellipsis nested statement feature support
	<#2622> (aryx)
	- Go parsed with pfff does not do nested statement matching
	<#2636> (aryx)
	- [MacOS] semgrep-core now uses libpcre but is not linked statically in macOS !PRIORITY:HIGH!
	<#2624> (mjambon brendongo)
	- [Python] Constant tracking do not follow assignment operators
	<#2616> (IagoAbal)
```

* Report new semgrep issues not in the customer board

Report also issues not updated since last time

* Fix CI, follow semgrep recommendations or skip file

* Adding more documentation
@nbrahms nbrahms added bug Something isn't working priority:low user:external requested by someone outside of r2c feature:metavariable and removed priority:medium labels Mar 1, 2021
mschwager pushed a commit that referenced this issue Mar 3, 2021
mschwager added a commit that referenced this issue Mar 9, 2021
* Add metavariable propagation within 'patterns'

Fixes #2548.

* Fix snapshot tests

* Stabilize evaluation ordering
@mschwager
Copy link
Contributor

Hey @xmo-odoo, you're rule matches now! https://semgrep.dev/s/8ykq?version=0.42.0

@xmo-odoo
Copy link
Author

@mschwager awesome! Thanks ❤️

@aryx
Copy link
Collaborator

aryx commented Mar 22, 2021

I think there is no need to propagate metavariables during the match; we just need to AND the metavariable-regexp with the relevant patterns;
In some sense, metavariable-regexp is a trick to overcome some limitations of semgrep; we currently allow regexp in strings, so you don't do $E = "$STR" where metavariab-regexp: $STR regexp: foo|bar, but instead write directly $E = "=~/foo|bar"
so we just need to do the same; attach the metavariable-regexp to the relevant pattern as if it was part of the pattern.

@aryx
Copy link
Collaborator

aryx commented Mar 22, 2021

I think this was a simple solution that didn't need any change to the engine: https://semgrep.dev/s/Doq2/ just group the metavariable-regexp with a nested patterns:

@mschwager
Copy link
Contributor

I think this was a simple solution that didn't need any change to the engine: https://semgrep.dev/s/Doq2/ just group the metavariable-regexp with a nested patterns:

Not entirely sure why, but that rule errors out in v0.41.1: https://semgrep.dev/s/Doq2/?version=v0.41.1, so I'm not sure that would've fixed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working feature:metavariable priority:low user:external requested by someone outside of r2c
Development

Successfully merging a pull request may close this issue.

6 participants