integrate tree-sitter java parser into sgrep #2

Closed
minusworld opened this issue Jan 21, 2020 · 3 comments
Labels: enhancement (New feature or request), wontfix (This will not be worked on)

Comments

@minusworld
Member

No description provided.

@aryx
Collaborator

aryx commented Jan 21, 2020

There is already a Java parser in sgrep; it parses 98% of code.
The title should be "integrate tree-sitter java parser into sgrep".

@ulziibay changed the title from "Integrate a Java parser into sgrep" to "integrate tree-sitter java parser into sgrep" on Jan 28, 2020
@ulziibay
Contributor

ulziibay commented Feb 4, 2020

@ievans added the enhancement (New feature or request) label on Feb 20, 2020
@dlukeomalley added the wontfix (This will not be worked on) label on May 22, 2020
@dlukeomalley
Member

Closing until broader tree-sitter work lands. Java is already supported. CC @DrewDennison @nbrahms

spencerdrak pushed a commit that referenced this issue Feb 28, 2023
# This is the 1st commit message:

fix: address issues with brew nightly

# This is the commit message #2:

dbg: turn on debug logging for install

# This is the commit message #3:

dbg: cat formula

# This is the commit message #4:

fix: remove notifications

# This is the commit message #5:

force update

# This is the commit message #6:

dbg: add repository

# This is the commit message #7:

dbg: verbose update
emjin added a commit that referenced this issue May 25, 2023
Below was the old PR header. Now, this just adds a new `steps` mode
available only in `semgrep-core`, so that I can build off it in
semgrep-proprietary, and moves collect for later convenience.

----
Initial changes for a potential join-mode v2. See
https://www.notion.so/r2cdev/Multi-language-rules-join-97f2d6b91a914afeb22e38e4f81c7848?pvs=4
for the motivation and planning.

**Intro to the PR**

This PR is less scary than it looks; many of the files are test files.
In hindsight, I should have made smaller PRs, but now I'd rather not.
There are basically four parts, in order of first appearance:

1. Adding a "syntactic equal" comparison option. This allows
metavariables to be compared for purposes of joining in `Join_util.ml`.
(See `AST_generic.ml`, `AST_utils.ml`, `common.ml`)
2. Renaming extract mode's collect to group and moving it to common2.
(See `common2.ml`, `Match_extract_mode.ml`). This is later used in
`Join_util.ml`
3. Changing the rule syntax to include join mode. (See `Rule.ml`,
`Parse_rule.ml`, and a bunch of matching/analyzing files where I had to
thread the change). When deciding how to make changes to the rule type,
I prioritized making them as reversible as possible. That's why join is
just an additional mode.
4. Having rules run with join. (See `Run_semgrep.ml`, `Join_util.ml`). I
put as much of the join-specific code into `Join_util` as possible.
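
As a rough illustration of what a "syntactic equal" comparison in part 1 could mean, here is a minimal Python sketch. It is an analogy only: it uses Python's standard `ast` module rather than `AST_generic.ml`, the function name `syntactically_equal` is hypothetical, and the PR text does not spell out the exact definition used in the OCaml code. The idea shown is that two pieces of code compare equal when their trees have the same shape, regardless of where they appear in a file.

```python
import ast


def syntactically_equal(left_src: str, right_src: str) -> bool:
    """Compare two Python expressions by AST shape, ignoring positions.

    ast.dump() omits line/column attributes by default, so two snippets
    that differ only in location or whitespace produce the same dump.
    This is an assumed stand-in for the PR's comparison, not its code.
    """
    left = ast.parse(left_src, mode="eval")
    right = ast.parse(right_src, mode="eval")
    return ast.dump(left) == ast.dump(right)


if __name__ == "__main__":
    # Same tree, different spacing.
    print(syntactically_equal("a + b", "a   +   b"))
    # Different trees: operands are swapped.
    print(syntactically_equal("a + b", "b + a"))
```

A comparison of this kind is what lets a value bound to `$B` in one file be treated as the same `$B` in another file, even though the two occurrences have different source locations.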

**An example of what the PR does**

Here is a new join rule:

```
➜  abc git:(emma/join-mode-experiment) ✗ cat deep.yaml 
rules:
  - id: abc
    message: "abc"
    languages: [python]
    severity: WARNING
    mode: join
    steps:
        - languages: [python]
          patterns:
          - pattern: |
                x = $A + $B
        - languages: [python]
          patterns:
          - pattern: |
                y = $B + $C
        - languages: [python]
          patterns:
          - pattern: |
                z = $A + $C
```

When run on some files that set `x`, `y`, and `z`, it will only match
`z = $A + $C` for `$A` and `$C` bound by the previous variables. Note
that `$B` needs to be the same `$B` bound by `$A + $B` as in `$B + $C`
(though actually the code for that right now is not quite right).

I kept the python join mode's paradigm that the matches occur on the
last step.

Here is the result. For simplicity, the previous matches are still
shown. Otherwise, I would have to change the print_match hook. I think
this is kind of nice though for the text mode display.

```
➜  abc git:(emma/join-mode-experiment) ✗ pwd
/Users/emma/workspace/semgrep/tests/join/abc
➜  abc git:(emma/join-mode-experiment) ✗  sc -rules deep.yaml . -l py 
./abc.py:5 with rule abc__step_2
     z = b + c
./abc.py:4 with rule abc__step_2
     z = a + c
./abc.py:2 with rule abc__step_2
     z = a + b
./ab.py:2 with rule abc__step_0
     x = a + b 
./bc.py:2 with rule abc__step_1
     y = b + c


---------------------------------------------------

The previous matches include matches for join steps. Here are the final matches:

./abc.py:4
     z = a + c
```
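
The join behaviour in the example above can be sketched in a few lines of Python. This is a simplified model, not the code in `Join_util.ml`: the names `compatible` and `join_steps` are hypothetical, metavariable values are plain strings compared with `==` (standing in for the syntactic comparison), and each step's matches are written out by hand from the output shown above.

```python
from typing import Dict, List, Optional, Tuple

Bindings = Dict[str, str]
StepMatch = Tuple[str, Bindings]  # (matched text, metavariable bindings)


def compatible(env: Bindings, bindings: Bindings) -> bool:
    """True if every metavariable shared by both sides has the same value."""
    return all(bindings[k] == v for k, v in env.items() if k in bindings)


def join_steps(steps: List[List[StepMatch]]) -> List[str]:
    """Return the last-step matches consistent with all earlier steps."""
    # Each entry pairs an accumulated environment with the text of the
    # most recent match that produced it.
    envs: List[Tuple[Bindings, Optional[str]]] = [({}, None)]
    for step in steps:
        next_envs: List[Tuple[Bindings, Optional[str]]] = []
        for env, _ in envs:
            for text, bindings in step:
                if compatible(env, bindings):
                    next_envs.append(({**env, **bindings}, text))
        envs = next_envs
    return [text for _, text in envs if text is not None]


# Matches per step, transcribed from the example run above.
steps = [
    [("x = a + b", {"$A": "a", "$B": "b"})],
    [("y = b + c", {"$B": "b", "$C": "c"})],
    [
        ("z = b + c", {"$A": "b", "$C": "c"}),
        ("z = a + c", {"$A": "a", "$C": "c"}),
        ("z = a + b", {"$A": "a", "$C": "b"}),
    ],
]

if __name__ == "__main__":
    print(join_steps(steps))  # ['z = a + c']
```

Only `z = a + c` survives because it is the only last-step match whose `$A` and `$C` agree with the values bound in steps 0 and 1, which mirrors the final match reported for `./abc.py:4`.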

**Limitations**

Compared to the previous join mode, this does less. The things it does
not do are:

1. Join between files of multiple languages
2. Allow comparison between metavariables (`$A < $B`)
3. Allow access to the paths of metavariables (`path($A) == $B`)
4. Recursive joins

#1 is definitely a must-do. It just doesn't work because file targeting
annoyingly happens in Python. I will probably try to make it work with
`osemgrep`!

#2 and #3 will be made possible in Pro, where `metavariable-comparison`
will be available. It will be easy to extend the existing syntax to
allow for `path`, and we can also make substring easier if necessary. I
would prefer to do this in Pro so that we can easily reuse Semgrep's
existing syntax as much as possible. I also think it's a natural way to
make a distinction between OSS and Pro.

#4: I am not currently planning on supporting recursive join mode, and
if I do it'll be in Pro.

PR checklist:

- [x] Purpose of the code is [evident to future readers](https://semgrep.dev/docs/contributing/contributing-code/#explaining-code)
- [x] Tests included or PR comment includes a reproducible test plan
- [x] Documentation is up-to-date
- [x] A changelog entry was [added to changelog.d](https://semgrep.dev/docs/contributing/contributing-code/#adding-a-changelog-entry) for any user-facing change
- [x] Change has no security implications (otherwise, ping security team)

If you're unsure about any of this, please see:

- [Contribution guidelines](https://semgrep.dev/docs/contributing/contributing-code)!
- [One of the more specific guides located here](https://semgrep.dev/docs/contributing/contributing/)

---------

Co-authored-by: Emma Jin <emma@Emmas-M2.local>
Co-authored-by: Emma Jin <--get>
6 participants