integrate tree-sitter java parser into sgrep #2
There is already a Java parser in sgrep; it parses 98% of code.
ulziibay changed the title from "Integrate a Java parser into sgrep" to "integrate tree-sitter java parser into sgrep" on Jan 28, 2020
Final code lives in https://github.com/returntocorp/ocaml-tree-sitter/tree/master/proto

Closing until broader tree-sitter work lands. Java is already supported. CC @DrewDennison @nbrahms
spencerdrak pushed a commit that referenced this issue on Feb 28, 2023
# This is the 1st commit message:
fix: address issues with brew nightly
# This is the commit message #2:
dbg: turn on debug logging for install
# This is the commit message #3:
dbg: cat formula
# This is the commit message #4:
fix: remove notifications
# This is the commit message #5:
force update
# This is the commit message #6:
dbg: add repostiroy
# This is the commit message #7:
dbg: verbose update
mjambon added a commit that referenced this issue on May 25, 2023
emjin added a commit that referenced this issue on May 25, 2023
Below was the old PR header. Now, this just adds a new `steps` mode available only in `semgrep-core`, so that I can build off it in semgrep-proprietary, and moves collect for later convenience.

----

Initial changes for a potential join-mode v2. See https://www.notion.so/r2cdev/Multi-language-rules-join-97f2d6b91a914afeb22e38e4f81c7848?pvs=4 for the motivation and planning.

**Intro to the PR**

This PR is less scary than it looks, many of the files are test files. In hindsight, I should have made smaller PRs, but now I'd rather not. There are basically four parts, in order of first appearance:

1. Adding a "syntactic equal" comparison option. This allows metavariables to be compared for purposes of joining in `Join_util.ml`. (See `AST_generic.ml`, `AST_utils.ml`, `common.ml`)
2. Renaming extract mode's collect to group and moving it to common2. (See `common2.ml`, `Match_extract_mode.ml`). This is later used in `Join_util.ml`
3. Changing the rule syntax to include join mode. (See `Rule.ml`, `Parse_rule.ml`, and a bunch of matching/analyzing files where I had to thread the change). When deciding how to make changes to the rule type, I prioritized making them as reversible as possible. That's why join is just an additional mode.
4. Having rules run with join. (See `Run_semgrep.ml`, `Join_util.ml`). I put as much of the join-specific code into `Join_util` as possible.

**An example of what the PR does**

Here is a new join rule:

```
➜ abc git:(emma/join-mode-experiment) ✗ cat deep.yaml
rules:
  - id: abc
    message: "abc"
    languages: [python]
    severity: WARNING
    mode: join
    steps:
      - languages: [python]
        patterns:
          - pattern: |
              x = $A + $B
      - languages: [python]
        patterns:
          - pattern: |
              y = $B + $C
      - languages: [python]
        patterns:
          - pattern: |
              z = $A + $C
```

When run on some files that set `x`, `y`, and `z`, it will only match `z = $A + $C` for `$A` and `$C` bound by the previous variables.
Note that `$B` needs to be the same `$B` bound by `$A + $B` as in `$B + $C` (though actually the code for that right now is not quite right). I kept the python join mode's paradigm that the matches occur on the last step.

Here is the result. For simplicity, the previous matches are still shown. Otherwise, I would have to change the print_match hook. I think this is kind of nice though for the text mode display.

```
➜ abc git:(emma/join-mode-experiment) ✗ pwd
/Users/emma/workspace/semgrep/tests/join/abc
➜ abc git:(emma/join-mode-experiment) ✗ sc -rules deep.yaml . -l py
./abc.py:5 with rule abc__step_2
 z = b + c
./abc.py:4 with rule abc__step_2
 z = a + c
./abc.py:2 with rule abc__step_2
 z = a + b
./ab.py:2 with rule abc__step_0
 x = a + b
./bc.py:2 with rule abc__step_1
 y = b + c
---------------------------------------------------
The previous matches include matches for join steps. Here are the final matches:
./abc.py:4
 z = a + c
```

**Limitations**

Compared to the previous join mode, this does less. The things it does not do are:

1. Join between files of multiple languages
2. Allow comparison between metavariables (`$A < $B`)
3. Allow access to the paths of metavariables (`path($A) == $B`)
4. Recursive joins

#1 is definitely a must-do. It just doesn't work because file targeting annoyingly happens in Python. I will probably try to make it work with `osemgrep`!

#2 and #3 will be made possible in Pro, where `metavariable-comparison` will be available. It will be easy to extend the existing syntax to allow for `path`, and we can also make substring easier if necessary. I would prefer to do this in Pro so that we can easily reuse Semgrep's existing syntax as much as possible. I also think it's a natural way to make a distinction between OSS and Pro.

#4: I am not currently planning on supporting recursive join mode, and if I do it'll be in Pro.
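To make the intended join semantics concrete, here is a minimal Python sketch of how the `deep.yaml` example could resolve to its single final match. It is not the code in `Join_util.ml`: the names `Match`, `compatible`, and `join_steps` are hypothetical, and plain string equality on metavariable values stands in for the "syntactic equal" comparison the PR adds. The sketch assumes that a last-step match is kept only when its bindings agree with some consistent combination of bindings from all earlier steps.

```python
from typing import Dict, List, NamedTuple

Bindings = Dict[str, str]


class Match(NamedTuple):
    """Hypothetical stand-in for one step's match: a location plus metavariable bindings."""
    location: str
    bindings: Bindings


def compatible(env: Bindings, bindings: Bindings) -> bool:
    # Two binding sets agree when every metavariable they share has the same value.
    # String equality is an assumption standing in for syntactic equality on ASTs.
    return all(bindings[name] == value for name, value in env.items() if name in bindings)


def join_steps(steps: List[List[Match]]) -> List[Match]:
    """Return the last step's matches that join with all earlier steps."""
    if not steps:
        return []
    # Each environment is one consistent combination of bindings from the earlier steps.
    envs: List[Bindings] = [{}]
    for step in steps[:-1]:
        envs = [
            {**env, **match.bindings}
            for env in envs
            for match in step
            if compatible(env, match.bindings)
        ]
    # Matches are reported on the last step only.
    return [m for m in steps[-1] if any(compatible(env, m.bindings) for env in envs)]


# Bindings read off the per-step matches shown in the output above.
step_0 = [Match("./ab.py:2", {"$A": "a", "$B": "b"})]
step_1 = [Match("./bc.py:2", {"$B": "b", "$C": "c"})]
step_2 = [
    Match("./abc.py:2", {"$A": "a", "$C": "b"}),
    Match("./abc.py:4", {"$A": "a", "$C": "c"}),
    Match("./abc.py:5", {"$A": "b", "$C": "c"}),
]

final = join_steps([step_0, step_1, step_2])
print([m.location for m in final])  # ['./abc.py:4']
```

Under these assumptions, `z = a + b` is dropped because `$C` was bound to `c` by step 1, and `z = b + c` is dropped because `$A` was bound to `a` by step 0, which agrees with the final matches listed in the output.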
PR checklist:

- [x] Purpose of the code is [evident to future readers](https://semgrep.dev/docs/contributing/contributing-code/#explaining-code)
- [x] Tests included or PR comment includes a reproducible test plan
- [x] Documentation is up-to-date
- [x] A changelog entry was [added to changelog.d](https://semgrep.dev/docs/contributing/contributing-code/#adding-a-changelog-entry) for any user-facing change
- [x] Change has no security implications (otherwise, ping security team)

If you're unsure about any of this, please see:

- [Contribution guidelines](https://semgrep.dev/docs/contributing/contributing-code)!
- [One of the more specific guides located here](https://semgrep.dev/docs/contributing/contributing/)

---------

Co-authored-by: Emma Jin <emma@Emmas-M2.local>
Co-authored-by: Emma Jin <--get>
No description provided.