Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] fix(kotlin): classes with newlines #477

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

amchiclet
Copy link
Contributor

This PR refactors Brandon's PR, so the CST to Generic AST process becomes a bit easier. The rest of the description is copied and pasted from #476

What:

This PR makes it so we can parse Kotlin classes that have a newline between the class identifier and constructor.

Why:

Parse rate.

How:

The problem is in cases like:

class Foo
constructor Bar() { ... }

Here, we insert an automatic semicolon between Foo and constructor. This leaves us able to parse class Foo as a class_declaration, but constructor Bar ... is not allowed on its own.

We simply allow constructor Bar ... to be a standalone statement, and resolve to stitch them together at parsing time.

Security

  • Change has no security implications (otherwise, ping the security team)

@amchiclet amchiclet marked this pull request as ready for review March 27, 2024 03:12
@amchiclet amchiclet requested a review from a team as a code owner March 27, 2024 03:12
@amchiclet amchiclet requested review from nmote, ajbt200128, aryx and brandonspark and removed request for a team and ajbt200128 March 27, 2024 03:12
@amchiclet
Copy link
Contributor Author

I will merge commits from this PR into Brandon's original PR: #476 (comment).

Not closing this PR yet. I'd like the branch to still be alive while merging changes.

@amchiclet amchiclet changed the title fix(kotlin): classes with newlines [DO NOT MERGE] fix(kotlin): classes with newlines Mar 27, 2024
amchiclet added a commit to semgrep/semgrep that referenced this pull request Mar 27, 2024
Fixes: SAF-899
Fixes: SAF-923

Requires: semgrep/ocaml-tree-sitter-semgrep#477

The current version of Kotlin's tree-sitter doesn't support newlines
when declaring classes. It has to do with automatic semicolons. For
example, the following won't parse:

```
class C
constructor(arg:Int){}
```

but the following parses fine:

```
class C constructor(arg:Int){}
```

The fix requires a hack in the grammar. (Ideally the scanner seems to be
the right place for a fix, but modifying the scanner is complicated.)

The hack works as follows. When the parser sees the code:

```
class C
constructor(arg:Int){}
```

The first line is parsed as a class declaration node, and the second
line is parsed as a partial class declaration node. These need to be
consolidated during CST -> Generic AST. This PR implements this merge.

Test plan:
* `make core-test` with new tests
* ran `semgrep-core -lang kotlin -dump_ast` with some examples that have
new lines in class declarations, including the one reported on Linear.
The output AST shows that there's a single class definition that is
correctly populated.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

2 participants