New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C# support #1392
Comments
Thank you for filing this @mackowski. Adding @aryx and @nbrahms to discuss what adding support for a new language looks like. We'd love to get more folks involved there. |
Are you familiar with OCaml? the only remaining part to handle C# needs to be written in OCaml (in semgrep/semgrep-core/parsing/ like the Parse_ruby_tree_sitter.ml in this directory for example, but here it would be a Parse_csharp_tree_sitter.ml. We can generate the boilerplate for this file, but we still need to convert that in the generic AST (defined in pfff/h_program-lang/AST_generic.ml) |
I do not know OCaml, but it seems to be fun to learn it. Anyway, as I did not do it before I do not know how much time is needed to do it. I am ready to learn something new and help you with this project, but I also do not want to be a roadblock. |
Sounds great! @mjambon can you add csharp in ocaml-tree-sitter-lang? I'll create the Parse_csharp_tree_sitter.ml boilerplate with a few cases filled in as a starting point. |
@aryx I just added csharp to ocaml-tree-sitter-lang. I parsed one sample program successfully on the first try but didn't run large-scale stats. |
Ok, I'll create the Parse_csharp_tree_sitter.ml boilerplate with a few cases filled in as a starting point. |
This will help semgrep/semgrep#1392 test plan: make
This will help semgrep/semgrep#1392 test plan: make
This is the start of #1392 Test plan: + /home/pad/github/semgrep/semgrep-core/_build/default/bin/Main.exe -dump_ast -dump_ast hello_world.cs Some constructs are not handled yet CST was: | Using_dire | | using | | Simple_name | | | Choice_global | | | Id | | | System | | ";" | Name_decl | | namespace | | Simple_name | | | Choice_global | | | Id | | | HelloWorldApp | | "{" | | | Class_decl | | | | class | | | | Geeks | | | | "{" | | | | | Meth_decl | | | | | | | Static | | | | | | | static | | | | | | Void_kw | | | | | | void | | | | | | Main | | | | | | "(" | | | | | | | | Param | | | | | | | | | Array_type | | | | | | | | | | Pred_type | | | | | | | | | | string | | | | | | | | | | "[" | | | | | | | | | | "]" | | | | | | | | args | | | | | | ")" | | | | | | Blk | | | | | | | "{" | | | | | | | | Exp_stmt | | | | | | | | | Invo_exp | | | | | | | | | | Member_access_exp | | | | | | | | | | | Exp | | | | | | | | | | | Simple_name | | | | | | | | | | | | Choice_global | | | | | | | | | | | | Id | | | | | | | | | | | | Console | | | | | | | | | | | DOT | | | | | | | | | | | "." | | | | | | | | | | | Choice_global | | | | | | | | | | | Id | | | | | | | | | | | WriteLine | | | | | | | | | | "(" | | | | | | | | | | | | Exp | | | | | | | | | | | | Lit | | | | | | | | | | | | | Str_lit | | | | | | | | | | | | | "\"" | | | | | | | | | | | | | | Blank | | | | | | | | | | | | | "\"" | | | | | | | | | | ")" | | | | | | | | ";" | | | | | | | | Exp_stmt | | | | | | | | | Invo_exp | | | | | | | | | | Member_access_exp | | | | | | | | | | | Exp | | | | | | | | | | | Simple_name | | | | | | | | | | | | Choice_global | | | | | | | | | | | | Id | | | | | | | | | | | | Console | | | | | | | | | | | DOT | | | | | | | | | | | "." | | | | | | | | | | | Choice_global | | | | | | | | | | | Id | | | | | | | | | | | ReadKey | | | | | | | | | | "(" | | | | | | | | | | ")" | | | | | | | | ";" | | | | | | | "}" | | | | "}" | | "}" Original backtrace: Raised at file "stdlib.ml", line 29, characters 22-33 Called from file "parsing/Parse_csharp_tree_sitter.ml", line 2176, characters 18-29 Called from file "list.ml", line 92, characters 20-23 Called from file "parsing/Parse_csharp_tree_sitter.ml", line 2194, characters 4-28 Fatal error: exception (Failure "not implemented") Raised at file "parsing/Parse_csharp_tree_sitter.ml", line 2203, characters 12-15 Called from file "parsing/Parse_code.ml", line 72, characters 12-42 Called from file "bin/Main.ml", line 846, characters 12-78 Called from file "Common.ml", line 1294, characters 12-16 Re-raised at file "Common.ml", line 1292, characters 8-302 Called from file "Common.ml", line 1262, characters 45-49 Called from file "Common.ml", line 82, characters 14-18 Re-raised at file "Common.ml", line 87, characters 10-11 Called from file "Common.ml", line 1259, characters 6-10 Called from file "bin/Main.ml", line 1155, characters 2-55
This is the start of #1392 Test plan: + /home/pad/github/semgrep/semgrep-core/_build/default/bin/Main.exe -dump_ast -dump_ast hello_world.cs Some constructs are not handled yet CST was: | Using_dire | | using | | Simple_name | | | Choice_global | | | Id | | | System | | ";" | Name_decl | | namespace | | Simple_name | | | Choice_global | | | Id | | | HelloWorldApp | | "{" | | | Class_decl | | | | class | | | | Geeks | | | | "{" | | | | | Meth_decl | | | | | | | Static | | | | | | | static | | | | | | Void_kw | | | | | | void | | | | | | Main | | | | | | "(" | | | | | | | | Param | | | | | | | | | Array_type | | | | | | | | | | Pred_type | | | | | | | | | | string | | | | | | | | | | "[" | | | | | | | | | | "]" | | | | | | | | args | | | | | | ")" | | | | | | Blk | | | | | | | "{" | | | | | | | | Exp_stmt | | | | | | | | | Invo_exp | | | | | | | | | | Member_access_exp | | | | | | | | | | | Exp | | | | | | | | | | | Simple_name | | | | | | | | | | | | Choice_global | | | | | | | | | | | | Id | | | | | | | | | | | | Console | | | | | | | | | | | DOT | | | | | | | | | | | "." | | | | | | | | | | | Choice_global | | | | | | | | | | | Id | | | | | | | | | | | WriteLine | | | | | | | | | | "(" | | | | | | | | | | | | Exp | | | | | | | | | | | | Lit | | | | | | | | | | | | | Str_lit | | | | | | | | | | | | | "\"" | | | | | | | | | | | | | | Blank | | | | | | | | | | | | | "\"" | | | | | | | | | | ")" | | | | | | | | ";" | | | | | | | | Exp_stmt | | | | | | | | | Invo_exp | | | | | | | | | | Member_access_exp | | | | | | | | | | | Exp | | | | | | | | | | | Simple_name | | | | | | | | | | | | Choice_global | | | | | | | | | | | | Id | | | | | | | | | | | | Console | | | | | | | | | | | DOT | | | | | | | | | | | "." | | | | | | | | | | | Choice_global | | | | | | | | | | | Id | | | | | | | | | | | ReadKey | | | | | | | | | | "(" | | | | | | | | | | ")" | | | | | | | | ";" | | | | | | | "}" | | | | "}" | | "}" Original backtrace: Raised at file "stdlib.ml", line 29, characters 22-33 Called from file "parsing/Parse_csharp_tree_sitter.ml", line 2176, characters 18-29 Called from file "list.ml", line 92, characters 20-23 Called from file "parsing/Parse_csharp_tree_sitter.ml", line 2194, characters 4-28 Fatal error: exception (Failure "not implemented") Raised at file "parsing/Parse_csharp_tree_sitter.ml", line 2203, characters 12-15 Called from file "parsing/Parse_code.ml", line 72, characters 12-42 Called from file "bin/Main.ml", line 846, characters 12-78 Called from file "Common.ml", line 1294, characters 12-16 Re-raised at file "Common.ml", line 1292, characters 8-302 Called from file "Common.ml", line 1262, characters 45-49 Called from file "Common.ml", line 82, characters 14-18 Re-raised at file "Common.ml", line 87, characters 10-11 Called from file "Common.ml", line 1259, characters 6-10 Called from file "bin/Main.ml", line 1155, characters 2-55
@aryx looks like you added the basic parser; did you want to follow up with @mackowski ? |
No I didn't. I guess I need to add a few simple cases to be a better starting point and maybe some documentation on where to find the relevant AST definitions. |
The main work @mackowski is to replace all the todo in https://github.com/returntocorp/semgrep/blob/develop/semgrep-core/parsing/Parse_csharp_tree_sitter.ml with the relevant construct defined in https://github.com/returntocorp/pfff/blob/master/h_program-lang/AST_generic.ml |
looks good! Next week I will be on a small trip without my laptop but after that I will start working on that :) |
I have started an attempt. |
I can't find anything in AST_Generic to represent an expression with an infix operator, such as |
Yes you need to convert that in a Call (IdSpecial (Op Plus), e1, e2) |
Feel free to ask many many questions @Sjord. We know AST_generic.ml is not very well commented, so we know it's tedious to write those converters from tree-sitter CST to AST_generic.ml |
C# has multidimensional arrays: // Three-dimensional array.
int[, ,] array3D = new int[,,] { { { 1, 2, 3 }, { 4, 5, 6 } },
{ { 7, 8, 9 }, { 10, 11, 12 } } };
// The same array with dimensions specified.
int[, ,] array3Da = new int[2, 2, 3] { { { 1, 2, 3 }, { 4, 5, 6 } },
{ { 7, 8, 9 }, { 10, 11, 12 } } };
// Accessing array elements.
System.Console.WriteLine(array2D[0, 0]); I was thinking of representing this as a TyArray with a Tuple, so that writing However, this presents a problem when parsing something like this: int[, ,] array3D This is an array with a tuple of (nothing, nothing, nothing), and that can't be represented as a Tuple as far as I know. Any ideas? Edit: an alternative is to treat it as an array of arrays. C# also has those, but maybe it's not so bad if we mix up jagged arrays with multidimensional arrays. |
Hmm, yes not super easy to represent right now. We could extend the generic AST at some point to better handle those. And later when we improve the generic AST we can transform those OtherType in something more precise. |
How are things progressing @Sjord ? Anything we can do to help? |
Progress is slow but steady. The parser can currently parse most common features. We started with 329 todo's. I have resolved 201 of those, and added another 30. The TODO's I added are mostly when I am not sure if what I'm doing is the correct way. I ran into a couple of upstream issues in treesitter. The treesitter developers are very fast and helpful with resolving these, but I could use your help to get these fixes into semgrep-core. I am currently only working on the parser, and testing by running The parser is currently half-finished. We could merge it in the current state. It wouldn't break anything, but it is not really complete. Do you have a review process? Perhaps it is a good idea to start reviewing already, before the whole thing is done. |
Fantastic! We can definitely help later to update to the latest tree-sitter-csharp and also to extend the grammar to allow semgrep-specific constructs (metavariables, ellipsis). In the mean time, I think the easiest way to go forward and for you to make a Pull Request on our repo so we can review it and merge it. |
E.g. `a?.b` and `a?[0]`. AST_generic has no `?.` or `?[...]` equivalents, so we map this to a Conditional: `null == a ? null : a.b`. The downside of this is that it seems that `a` is evaluated twice.o Related to semgrep#1392
* Test file for conditional access * Parse conditional access in C# E.g. `a?.b` and `a?[0]`. AST_generic has no `?.` or `?[...]` equivalents, so we map this to a Conditional: `null == a ? null : a.b`. The downside of this is that it seems that `a` is evaluated twice.o Related to #1392 * Reindent Parse_csharp_tree_sitter.ml
@Sjord congratulations, you have done a lot, the community praises you. P.S. Sorry for my English, I'm brazilian guy that made a cheaper course :) |
Thanks, @cfguimaraes. You can help me improve the parser by creating small C# test files that have a construct that is currently not implemented. There are two ways to find those:
You can run
If you have a minimal test case, this makes it easier for me to implement correct parsing. What's also needed is semgrep integration tests, like these but for C#. However, I think C# is currently not sufficiently implemented to make passable tests, and I am not really sure how to write these. @aryx, can you comment on that? |
For the semgrep integration tests you can follow https://semgrep.dev/docs/status/#maturity-definitions and try at least the tests for the "Alpha" category. You can imitate the tests for lua/ which has currently the "Alpha" state. |
I've been looking at these links, I believe I can help you right now with option 2, parsing files and making it reproducible. Talk to you as soon as I found this exception in our codebase (is plenty of scenarios) |
@cfguimaraes I am excited that you are interested in helping! If you would like even faster responses, you can also join us on Slack: https://r2c.dev/slack. Most of the developers are there and we are very friendly! |
We really need the latest tree-sitter-c-sharp to make semgrep work correctly, even for alpha functionality. I started updating it in semgrep-grammars, but this broke the build after the PR was merged, and I don't fully understand why. |
What is the error message? What is broken? |
@mjambon, could you perhaps import the latest upstream C# grammar? The version in semgrep-c-sharp is pretty old and contains several serious bugs. |
@mjambon I can have a go at it. Good way to test for me the new ocaml-tree-sitter organization. |
Hello all :) @Sjord when running locally in windows via docker I have got these exceptions:
I know this isn't what you are want me to look for |
This is probably because of a bug in the tree-sitter grammar. It doesn't currently make much sense to create a test case for this, until we have an updated grammar.
This is new to me. This could be interesting to have a test case for. |
I found what is the cause of the Error you are interested in. I've made the minimum reproducible code that I can Example
{
public class ExampleClass
{
public async Task<ApiResponseTransacao> Method(Request request)
{
(int x, int y) = request.A.B().Value;
}
}
}
In the code above Request is a DTO Class, request.A is a string property and request.B is an extension method that returns an I want to know from you what is the best way you would like to receive these snippets. And of course, My next step is learn more about @aryx suggestion on https://semgrep.dev/docs/status/#maturity-definitions to known a little more about definitions and after @Sjord if you want, I can make a project with this code. |
Important too, I've been running docker this way |
Thanks, @cfguimaraes. Yes, I think a new GitHub project with code snippets is the easiest way. Then I will add them to the semgrep parsing tests as soon as I implement the parsing code for them. |
update to latest tree-sitter-csharp here: semgrep/ocaml-tree-sitter-semgrep#154 |
And here is the PR to update semgrep: #2567 (incomplete). |
@Sjord sorry for the late reply. The link with samples is https://github.com/cfguimaraes/semgrep-AST_generic_helpers.NotAnExpr |
Can we close this out now that C# has reached alpha: #3121? If we'd like to track additional improvements we should create new, specific issues (e.g. beta, GA, parse errors, etc). |
Closing out - feel free to open new, specific issues for additional C# work! Thanks for all the hard work here! |
I'm opening a ticket to indicate interest in support for the C# Language.
I can help with developing it if you can point me how to start and where to look.
The text was updated successfully, but these errors were encountered: