Use external parsers fparser / lfortran #85
Comments
I'm also looking at using There's still some f2008 features it doesn't support yet either, but I'm already looking at implementing some of those |
The problem with However, in practice this is much more involved since the implementation of a server's request is usually very specific to the parser used. So, although it would be possible to swap fortl's parser with Fingers crossed this does not apply to |
Unfortunately the parser in ford is very tightly coupled to the other parts, so swapping it out for something else would require some work. But I do think it would be worth doing so. I'm very interested in helping support some nature of common parsing library, it would take a bunch of the maintenance burden off ford! The original author of ford had looked into using |
@ZedThree I wholeheartedly understand. I remember that post and I would be more than glad to also be part of it. However, I think we are running into a "show-stopper"; creating an abstract enough parser that can be used for Specifically for I know In general, I think a user friendly, robust and performant Fortran parser is a project that would require at least a couple people working on it continuously part-time to ensure timely bug fixes and quick implementation of new standards. Unfortunately, I don't see that happening anytime soon. From what I can tell this is one of the objectives for |
LFortran has a Fortran 2018 parser and AST that is very fast to parse and as far as we know complete. If you find a bug, we'll try to fix it quickly. If you are interested in using it, I am happy to help out. |
@certik I was wondering, does |
Yes, there is an API that you can use. We don't have documentation for it, but there is an API that LFortran itself uses to walk and query the AST. |
I found this: https://github.com/lfortran/lfortran/blob/master/src/lfortran/parser/parser.h EDIT: Is this the Python parser (and its usage comments) I should be looking at? BTW To test your parser I would also suggest you try parsing some of the examples in https://github.com/gnikit/fortls/tree/master/test/test_source They contain a lot of edge cases (unfortunately, they are mixed along with more traditional examples). Another test, which
#define subroutine val
subroutine foo(val)
real :: val
end subroutine foo |
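To make the preprocessor edge case above concrete, here is a minimal Python sketch. It assumes the snippet is meant to pass through a C-style preprocessor with object-like macros; `expand_macros` is a hypothetical helper written for illustration and is not part of fortls, fparser or LFortran.

```python
import re


def expand_macros(source: str) -> str:
    """Apply object-like #define macros by whole-word substitution.

    Simplified on purpose: no function-like macros, no #undef, no #ifdef.
    """
    macros = {}
    out = []
    for line in source.splitlines():
        m = re.match(r"\s*#\s*define\s+(\w+)\s+(.*)", line)
        if m:
            macros[m.group(1)] = m.group(2).strip()
            continue  # the directive itself is not emitted
        for name, body in macros.items():
            # A callable replacement avoids backslash interpretation in `body`.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, b=body: b, line)
        out.append(line)
    return "\n".join(out)


SOURCE = """#define subroutine val
subroutine foo(val)
real :: val
end subroutine foo"""

expanded = expand_macros(SOURCE)
print(expanded)
```

Under these assumptions the keyword `subroutine` is itself rewritten, so a parser that looks only at the raw text and one that looks only at the expanded text see two different programs.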
@ZedThree you might want to have a look at https://github.com/pyparsing/pyparsing. One of the things I wanted to investigate was to create a rudimentary fortran parser with pyparsing but I have been quite busy this last month. |
Yes, the parser is in You can test it like this:
$ lfortran --show-ast test_submodule.f90
(TranslationUnit [(Submodule foo_module () submodule1 () [] [(ImplicitNone [] ())] [] [(Procedure foo1 [] [(SimpleAttribute AttrModule)] () [] [] [] [] [(Write 0 [(()) ((String "(A)"))] [] [(StrOp (StrOp (StrOp (String "testing :: ") Concat (FuncCallOrArray trim [] [(() a ())] [] [])) Concat (String "::")) Concat (FuncCallOrArray trim [] [(() b ())] [] []))] ())] [])])])
All the files seem to work, except |
Yes, @certik Python wrappers would be great, basically something that returns the AST and/or ASR trees for a URI, but I know that you are super busy so I don't want to add more things to your plate! |
Re: #235 (comment) A language server's parser is not meant to be a strict implementation of the standard, like that of a compiler. It is actually meant to be the opposite: permissive and error resistant. How is a compiler going to parse incomplete syntax? This for example breaks lfortran (and every other compiler) while fortls is capable of parsing it and giving completion suggestions at both
program name
implicit none
inte
integer :: val
print*, v
end program name
You can't rely on cached valid states for dynamic features like completion, hover and syntax highlighting, simply because there is absolutely no guarantee that such a state existed or will exist while the user types. Moreover, a cached state is not useful for fetching info from the ASR if the underlying ASR has changed; that will simply result in the wrong data being fetched. Cached states are good for improving performance of very few and specific LSP requests like Potentially, you can turn a compiler parser into something that a language server can use by removing all failure checks that are present and instead carrying on, attempting to parse the rest of the code. That will almost definitely come with a performance hit, since you will have to check for more types of AST nodes while also screening for false positives. Remember, the source code is incomplete, e.g.
program main
implicit none
integer :: int = 1
int = int! while typing the second `int`, can you tell if it's a variable or an intrinsic function?
end program main
In summary, a compiler's parser and a language server's parser are meant to handle two very different states of source code. Caching valid states is not a valid strategy since those rarely exist. Unless you can convert the compiler's parser to be more error resistant for invalid syntax, using a compiler's parser in a language server would not work. |
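As a rough illustration of what "permissive and error resistant" can mean in practice, the Python sketch below scans the incomplete program from this comment line by line, skips anything it does not recognise, and still offers completion candidates. It is not how fortls is implemented; `declared_names` and `complete` are hypothetical names, and the regular expression covers only a few intrinsic types.

```python
import re

# Matches simple declarations such as "integer :: val" or "real, intent(in) :: x, y".
DECL = re.compile(
    r"^\s*(integer|real|logical|character|complex)\b[^:]*::\s*(.+)$",
    re.IGNORECASE,
)


def declared_names(source: str) -> list[str]:
    """Collect declared variable names, silently skipping unparsable lines."""
    names = []
    for line in source.splitlines():
        line = line.split("!", 1)[0]  # drop trailing comment (ignores '!' inside strings)
        m = DECL.match(line)
        if not m:
            continue  # incomplete or unknown statement: carry on instead of failing
        for entity in m.group(2).split(","):
            name = re.match(r"\s*([A-Za-z_]\w*)", entity)
            if name:
                names.append(name.group(1))
    return names


def complete(source: str, prefix: str) -> list[str]:
    """Return declared names that start with `prefix` (case-insensitive)."""
    prefix = prefix.lower()
    return [n for n in declared_names(source) if n.lower().startswith(prefix)]


SOURCE = """program name
implicit none
inte
integer :: val
print*, v
end program name"""

suggestions = complete(SOURCE, "v")
print(suggestions)
```

The half-typed `inte` line never raises an error here; it is simply ignored, which is the behaviour a strict compiler front end would have to be modified to provide. The price is that the scanner understands far less Fortran than a real parser does.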
There are three ways forward, each with pros / cons:
I encourage you to try to figure out from your end how we can collaborate instead of giving up on collaboration simply because there are slightly different requirements for a language server compared to a compiler. For example, the C# compiler works as a language server. How does the C++ clang language server handle this? Do they really maintain a separate parser and semantics for If we can collaborate on a single project, we both win big time, as does the whole community, which will not be fragmented maintaining two separate compiler frontends. The Fortran community is small, we need to figure out how to reuse each other's work, instead of each of us working on our own project and not wanting to collaborate. I don't need to use LFortran's parser+semantics, I would be happy to use yours if it can be as fast and can handle as much Fortran as LFortran can. It seems LFortran's parser + semantics is closer to the "common goal", that's why I recommend figuring out how we can use it. |
For me the only viable solution is no. 2. I know that no. 3 seems like it would work, but personally, based on what I know so far about parsers and language servers, I cannot see how that would work robustly. LSP requests are served on the latest version of your doc; how could you query your AST for param X at line Y and column Z when that parameter did not exist in the cached AST? For
I agree, hence this issue, we just need to allocate some time to work on this and set some targets. At least for me the problem right now is that I don't have much free time for extra work.
I would say that it should be the opposite, use a better parser (LFortran) and potentially attach it to an existing LSP frontend like fortls OR copy the algorithms from fortls into your own frontend. fortls will become way more reusable once There is a lot more work to be done to make a good language server, be that fortls or a new one under LFortran. The parser is just one of the many things needed. In my personal view implementing some of the LSP requests is noticeably harder than writing a parser. |
Ok. Let's experiment with 2. and see if we can satisfy the LSP requirements. I'll start with incomplete statements, that would be the most useful and then we'll go from there. How should we represent this in AST --- InvalidStatement AST node? Is the idea that for a compiler application, the compiler might decide to just stop the processing until all syntax errors are fixed (that's the current behavior), while for the LSP application you will then simply ignore InvalidStatement and try to construct ASR as best as you can, ignoring/recovering from whatever semantic errors you encounter? So for an LSP application you want to construct an ASR as best as you can for as large invalid source codes as you can? |
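One way to picture the `InvalidStatement` idea from this comment is the toy Python sketch below. Only the name `InvalidStatement` comes from the discussion; `Declaration`, `parse`, its `strict` flag and `symbol_table` are hypothetical, the grammar recognises nothing but simple declarations, and none of this reflects LFortran's actual AST or API.

```python
import re
from dataclasses import dataclass


@dataclass
class Declaration:
    type_name: str
    names: list
    line: int


@dataclass
class InvalidStatement:
    text: str
    line: int


# Toy grammar: only "type :: name[, name...]" is considered valid.
DECL = re.compile(
    r"^\s*(integer|real|logical)\s*::\s*(\w+(?:\s*,\s*\w+)*)\s*$",
    re.IGNORECASE,
)


def parse(source: str, strict: bool = False) -> list:
    """Parse line by line; unparsable lines raise (strict) or become nodes."""
    nodes = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        if not text.strip():
            continue
        m = DECL.match(text)
        if m:
            names = [n.strip() for n in m.group(2).split(",")]
            nodes.append(Declaration(m.group(1).lower(), names, lineno))
        elif strict:
            # Compiler-style behaviour: stop at the first syntax error.
            raise SyntaxError(f"line {lineno}: cannot parse {text!r}")
        else:
            # Language-server-style behaviour: record the error and continue.
            nodes.append(InvalidStatement(text, lineno))
    return nodes


def symbol_table(nodes: list) -> dict:
    """Build a name -> type map, ignoring InvalidStatement nodes."""
    return {
        name: node.type_name
        for node in nodes
        if isinstance(node, Declaration)
        for name in node.names
    }


SOURCE = "integer :: val\ninte\nreal :: x, y"
nodes = parse(SOURCE)
print(symbol_table(nodes))
```

With `strict=True` the same function stops at the first bad line, so in this sketch a single front end serves both uses and the two modes differ only in what happens at the error site.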
@gnikit let's continue the discussion. As fortran-lang, we should not be developing two parsers and semantics, the ball is in your court to lead this change. |
We have to distinguish between the usage of the 2 parsers. Importantly there are multiple fundamental differences between fortls and LFortran, that make replacing the parser of the former with the latter a difficult task.
Just because the fortls parser cannot parse every single semantic or definition of the standard does not make it inferior. As I demoed in the linked issue, it's equally easy to break any compiler's parser as it is to break fortls'. With regard to me making this happen, I am afraid it will take more than just myself. There are multiple components in Fortls and LFortran that we would have to redesign and/or create. I know some of them but I would have to spend a nontrivial amount of time researching the possible solutions to others, something I currently can't do. The tasks I see as needing research to be able to make informed design decisions are:
Finally, I would like to emphasize that we can't break fortls for the sake of merging the two parsers. Fortls has had a pretty good track record of not breaking the language server. This endeavour needs to be realistic, the merge should add features to the existing capabilities of the language server, not remove any. If it's deemed easier to create a second language server in LFortran we should do that instead, and I would be happy to be part of it. |
I will set aside some time (~2h) by Friday to better understand what |
You have to first define the requirements on the parser. For example, if we said the requirement is to only parse arithmetic expressions and nothing else, then indeed it's easy to make a stable parser. I propose the requirements are that it can parse all valid Fortran code. And possibly some invalid code, to be determined. LFortran can parse all valid Fortran code, as far as we know. Fortls cannot. Consequently, Fortls is either a prototype or alpha, but it cannot be beta or production (stable). LFortran's parser on the other hand is beta. Much more stable and further along. Consequently, we should switch to the more advanced parser as soon as we can, not wait, as it will get harder and harder the further fortls is developed. |
@gnikit Are you aware of tree-sitter? It was designed specifically for language servers and similar tools, so syntax errors don't prevent the rest of the file from being parsed. E.g.:
I unfortunately can't say whether it would be easier to update LFortran's parser or write a tree-sitter grammar from scratch, but it is another option that you might want to look into. |
Tree-sitter certainly seems like it has significant upsides, as it's designed to be very fast and fault tolerant -- ideal for things like syntax highlighting and LSP. The downside might be that it's designed for parsing single files at a time, and Fortran often requires information from used modules at the parsing stage. I'm not sure how much of a limitation that is in practice. |
The new helix editor uses tree-sitter, although not for Fortran as yet, while fortls is already supported. Perhaps this? |
Ah, I didn't realise there was a Fortran parser already implemented, thanks @freevryheid! While it's not as complete as any of the parsers discussed here ( Tree-sitter also comes with a query language which is supposed to be very fast. I've not had a play with that yet, but it looks promising. I'll try and find some time to experiment with getting into |
Thanks for the link @Sean1708, tree-sitter looks interesting, but I don't think we will be making a push for it anytime soon. This functionality will be in addition to the existing Python parser which is relatively error tolerant but slow as hell. For fortls currently the priorities are focused on the LSP side of things and also ironing out some bugs before the major v3.0.0 release (which I have postponed since October) |
The current parser for fortls is flimsy and commonly runs into problems when it comes to finding implementations of overloaded methods and preprocessor definitions. It would be interesting to see if fparser could be used to do the parsing of the source files instead.
Potential problems:
f2018 and newer standards