New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Kotlin tags #3034
base: master
Are you sure you want to change the base?
Conversation
2d02f07
to
84b0295
Compare
scripts/update-ctags.py
Outdated
@@ -13,12 +13,18 @@ | |||
dstdir = os.path.abspath(sys.argv[2]) | |||
|
|||
os.chdir(dstdir + '/parsers') | |||
parser_dst_files = glob.glob('*.c') + glob.glob('*.h') | |||
parser_dst_files = glob.glob('**/*.c') + glob.glob('**/*.h') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I would slightly prefer mirroring the ctags directory structure - i.e. having peg outside the parsers directory. That would mean explicitly mentioning peg
and parsers
directories in the globs and in the code below eliminating the
if os.path.exists('parsers/' + f):
shutil.copy('parsers/' + f, dstdir + '/parsers/' + f)
part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I can add ctags/peg/
directory.
name�16384�com.example.helloworld.Worker�0 | ||
process�16�com.example.helloworld.AbstractWorker�0 | ||
process�16�com.example.helloworld.Worker�0 | ||
result�16384�com.example.helloworld.main�0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't checked but does the unit test cover all kinds from TMParserMapEntry map_KOTLIN
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it covers all 8 tag kinds.
Looks good to me in general. I'm not familiar with the peg parsers - I assume these are generated and one has to always run |
Yes, they are not committed in the git, but they are generated by |
Maybe a good idea, yes. It could be easily forgotten otherwise. |
os.system('make -C {} peg/{}'.format(srcdir, ' peg/'.join(peg_dst_files))) | ||
print('Copying peg parsers... ({} files)'.format(len(peg_dst_files))) | ||
for f in peg_dst_files: | ||
shutil.copy(f, dstdir + '/peg') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the code does more or less the same thing twice, it could be changed to something like the following:
os.chdir(srcdir + '/peg')
os.system('make -C {} peg/{}'.format(srcdir, ' peg/'.join(peg_dst_files)))
for dir in ['parsers', 'peg']:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually thinking that the script could use some "readability improvements". Like using functions and giving it some structure... But it seemed a bit out of scope for this PR, so I stayed with the current style.
Would you mind if I implemented it in follow-up PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind if I implemented it in follow-up PR?
Yep, agreed. #3032 will do some more changes to the script so maybe better wait until both of these pull requests are merged.
I just noticed the size of the parser - 25klocs and 1MB of sources. Have you checked how much the size of stripped libgeany.so increases by this patch? |
@techee maybe its time for "somebody" to add the capability for uctags to load parsers from dll :-) Although the latest Ubuntu uctags package has an installed size of just 2Mb, and we don't use it all (yet :-) Of more interest is how fast (slow) is it, remembering it runs while editing, will Kotlists get big pauses? |
Yes, the generated code is humongous. I guess it is price for implementing full Kotlin grammar. I have tried to check the library size with
Frankly, the speed could be better. This has already have been discussed when I was adding the parser to ctags and it even resulted in some improvements in PackCC (which is used to generate the code), but it is still slower than handwritten parsers. You can have a look at universal-ctags/ctags#2866 and universal-ctags/ctags#2878 if you're interested. Anyway, the pauses during editing are not noticeable for reasonably sized files. The only situation where the speed might be annoying is when user opens big kotlin project. But I'm not sure how this actually works, does Geany always process all open files on start? Or is there some kind of caching? I think some of the projects-related plugin even allowed to set option to index all files, even those that are not open. This could then take quite some time. |
IIUC yes, no, yes, probably |
Implementing the whole grammar has problems for this use-case, not only speed, but also robustness. It is being used on files that are being edited and may be eronneous or incomplet. Unless the parser can recover from an error it is going to lose all or many symbols in more cases if it parses things that are not essential to detecting the tags since it will then find more errors. It is a problem with many languages, not just kotlin (see for example #2916), but the fact many tag parsers skip lots of code reduces the impact of partly edited code (outside of declarations). This is also why full fat IDEs to try to insist that at least the the structural syntax is complete (eg {}s match) so their parsers have some handles to resync and recover after errors. |
The kotlin parser can recover from errors. If it fails to parse something, it continues until it finds next parseable top-level object (e.g. function or class). The backtracking parser is actually quite good for this. I did try at first to write simple, handwritten parser that only gets the important tags. However the problem with Kotlin is that the grammar is quite complex and it lead to way too many inaccuracies. That is why I had to implement it all in the end. Also, it was very hard to track scope, when only some parts of the code were parsed. |
It's not that bad, I guess I'm still stuck with the "it must fit a floppy" mentality.
This is quite surprising, I would have expected the performance to be somewhere between a hand-written parser and a regex parser but it probably depends on the grammar and the lookahead it has to perform. Have you tried profiling the parser to see if there's not some unexpected bottleneck? With hand-written parsers I used the following: https://wiki.geany.org/howtos/profiling/gperftools
To be fair, hand-written ctags parsers aren't that great at error recovery either - when you type an extra |
If the table above is valid and from what I remember the C parser speed is about 50MB of sources per second and the kotlin parser is about 100x slower than the C parser, it gives us 500KB/s for kotlin. And the pauses should be just something like 0.2s not to be noticeable so about 100KB kotlin source files would be roughly the maximum for Geany. |
I did a lot of profiling on PackCC and it turns out it is not very efficient with memory allocations (see universal-ctags/ctags#2866 (comment)), the backtracking is constantly allocating and deallocating small chunks of memory. So, more rules leads to more backtracking, which leads to more allocations, which slows it down. I have even proposed couple possible optimizations, but I did not have from enough manpower to actually implement it (sadly, I am not very proficient in C). I might look into the grammar optimization in future, so the parser might become somewhat faster (especially, if it slows Geany down, that would annoy me a lot). Probably could beat the RegExp based ones (for comparable grammars). According to creator of ctags, the peg parser is about 15-30 times slower than handwritten parser (citation from universal-ctags/ctags#2866 (comment)). Even with grammar optimizations, it would still probably be about an order of magnitude slower than let's say current C parser. |
Hmm, I'm afraid I don't have very good news regarding the performance. I tried it on Raspberry Pi 400 (which should be slightly faster than regular Raspberry Pi 4) with the following file which IMO isn't that big and the lag was already very annoying (to the point that I'd much rather have parsing disabled than editing this way). IMO Raspberry Pi 4 is a reasonable low-end machine Geany should support without big speed problems (apart from that Geany is the semi-official Raspberry Pi editor as it's installed by default) and the 500 LOC file isn't anything extreme either. So I'm not sure whether this parser should be used in Geany. It's shame that PEG parsers are so slow currently - if they were e.g. 10x slower than hand-written parsers, it would be fine I think but 100x is a little too much. By the way I also tried this file which seems to contain many anonymous functions and it seems to be a common pattern in Kotlin files and seeing the symbol tree filled with |
That is really surprising Well, I started this as a "Christmas project" last year, so I guess I'll just have a another one for this year: to make it faster.
Yes, lambdas can be quite common in Kotlin code and you're right, that they are probably useless in the symbol tree most of the time. I think the only valid reason to keep them is that there might be something else declared in their scope. Perhaps we could filter out all lambdas without any child nodes? |
That's quite probable.
I know those! #3032 is Christmas 2018 :-).
That's a good point. Filtering tags should in theory be possible, the question is how many such things we should do in Geany. But maybe anonymous functions are quite a common pattern to deserve this special handling. I'm just preparing a pull request that improves handling of anonymous tags so I might add something like that. |
a696977
to
8b69570
Compare
c088322
to
fe2acf5
Compare
7c66a56
to
784b5ba
Compare
784b5ba
to
da3726e
Compare
fa83ce4
to
fca1f25
Compare
fca1f25
to
05d2472
Compare
This is a third incarnation of PR to add support for Kotlin language. Previous discussions can be found in #2778 and dolik-rce#1.
Included changes:
Disclaimer: The parser was written (and is maintained) by me.