Add support for Kotlin tags #3034

dolik-rce · 2021-11-29T15:20:02Z

This is a third incarnation of PR to add support for Kotlin language. Previous discussions can be found in #2778 and dolik-rce#1.

Included changes:

Small modification of update-ctags.sh, so it can import peg based parsers from ctags.
Pull Kotlin parser from ctags and integrate it into tagmanager.
Rename filetypes.Kotlin.conf to filetypes.kotlin, to make it work correctly with tagmanager.
Added test for kotlin tags.

Disclaimer: The parser was written (and is maintained) by me.

techee · 2021-12-05T21:37:27Z

scripts/update-ctags.py

@@ -13,12 +13,18 @@
 dstdir = os.path.abspath(sys.argv[2])

 os.chdir(dstdir + '/parsers')
-parser_dst_files = glob.glob('*.c') + glob.glob('*.h')
+parser_dst_files = glob.glob('**/*.c') + glob.glob('**/*.h')


Maybe I would slightly prefer mirroring the ctags directory structure - i.e. having peg outside the parsers directory. That would mean explicitly mentioning peg and parsers directories in the globs and in the code below eliminating the

if os.path.exists('parsers/' + f): shutil.copy('parsers/' + f, dstdir + '/parsers/' + f)

part.

Ok, I can add ctags/peg/ directory.

techee · 2021-12-05T21:39:02Z

tests/ctags/kotlin.kt.tags

+name�16384�com.example.helloworld.Worker�0
+process�16�com.example.helloworld.AbstractWorker�0
+process�16�com.example.helloworld.Worker�0
+result�16384�com.example.helloworld.main�0


Haven't checked but does the unit test cover all kinds from TMParserMapEntry map_KOTLIN?

Yes, it covers all 8 tag kinds.

techee · 2021-12-05T21:40:50Z

Looks good to me in general. I'm not familiar with the peg parsers - I assume these are generated and one has to always run make before so the sources get created, right?

dolik-rce · 2021-12-06T05:28:07Z

I'm not familiar with the peg parsers - I assume these are generated and one has to always run make before so the sources get created, right?

Yes, they are not committed in the git, but they are generated by packcc as needed. At the very least make peg/kotllin.c should be called, otherwise the update script fails. Should I add the call in the script?

techee · 2021-12-06T13:25:37Z

Yes, they are not committed in the git, but they are generated by packcc as needed. At the very least make peg/kotllin.c should be called, otherwise the update script fails. Should I add the call in the script?

Maybe a good idea, yes. It could be easily forgotten otherwise.

techee · 2021-12-06T17:12:18Z

scripts/update-ctags.py

+os.system('make -C {} peg/{}'.format(srcdir, ' peg/'.join(peg_dst_files)))
+print('Copying peg parsers... ({} files)'.format(len(peg_dst_files)))
+for f in peg_dst_files:
+    shutil.copy(f, dstdir + '/peg')


Since the code does more or less the same thing twice, it could be changed to something like the following:

os.chdir(srcdir + '/peg') os.system('make -C {} peg/{}'.format(srcdir, ' peg/'.join(peg_dst_files))) for dir in ['parsers', 'peg']: ...

I was actually thinking that the script could use some "readability improvements". Like using functions and giving it some structure... But it seemed a bit out of scope for this PR, so I stayed with the current style.

Would you mind if I implemented it in follow-up PR?

Would you mind if I implemented it in follow-up PR?

Yep, agreed. #3032 will do some more changes to the script so maybe better wait until both of these pull requests are merged.

techee · 2021-12-06T20:21:35Z

I just noticed the size of the parser - 25klocs and 1MB of sources. Have you checked how much the size of stripped libgeany.so increases by this patch?

elextr · 2021-12-07T00:41:38Z

@techee maybe its time for "somebody" to add the capability for uctags to load parsers from dll :-)

Although the latest Ubuntu uctags package has an installed size of just 2Mb, and we don't use it all (yet :-)

Of more interest is how fast (slow) is it, remembering it runs while editing, will Kotlists get big pauses?

dolik-rce · 2021-12-07T08:40:30Z

I just noticed the size of the parser - 25klocs and 1MB of sources. Have you checked how much the size of stripped libgeany.so increases by this patch?

Yes, the generated code is humongous. I guess it is price for implementing full Kotlin grammar. I have tried to check the library size with make && strip --strip-unneeded. Current master is about 3.57 MB, this branch yields 3.72 MB, so roughly 150kB. I agree it is quite a lot, but still manageable I hope.

Of more interest is how fast (slow) is it, remembering it runs while editing, will Kotlists get big pauses?

Frankly, the speed could be better. This has already have been discussed when I was adding the parser to ctags and it even resulted in some improvements in PackCC (which is used to generate the code), but it is still slower than handwritten parsers. You can have a look at universal-ctags/ctags#2866 and universal-ctags/ctags#2878 if you're interested.

Anyway, the pauses during editing are not noticeable for reasonably sized files. The only situation where the speed might be annoying is when user opens big kotlin project. But I'm not sure how this actually works, does Geany always process all open files on start? Or is there some kind of caching? I think some of the projects-related plugin even allowed to set option to index all files, even those that are not open. This could then take quite some time.

elextr · 2021-12-07T08:50:59Z

does Geany always process all open files on start? Or is there some kind of caching? I think some of the projects-related plugin even allowed to set option to index all files, even those that are not open. This could then take quite some time.

IIUC yes, no, yes, probably

elextr · 2021-12-07T09:07:06Z

Implementing the whole grammar has problems for this use-case, not only speed, but also robustness. It is being used on files that are being edited and may be eronneous or incomplet.

Unless the parser can recover from an error it is going to lose all or many symbols in more cases if it parses things that are not essential to detecting the tags since it will then find more errors. It is a problem with many languages, not just kotlin (see for example #2916), but the fact many tag parsers skip lots of code reduces the impact of partly edited code (outside of declarations).

This is also why full fat IDEs to try to insist that at least the the structural syntax is complete (eg {}s match) so their parsers have some handles to resync and recover after errors.

dolik-rce · 2021-12-07T09:35:08Z

The kotlin parser can recover from errors. If it fails to parse something, it continues until it finds next parseable top-level object (e.g. function or class). The backtracking parser is actually quite good for this.

I did try at first to write simple, handwritten parser that only gets the important tags. However the problem with Kotlin is that the grammar is quite complex and it lead to way too many inaccuracies. That is why I had to implement it all in the end. Also, it was very hard to track scope, when only some parts of the code were parsed.

techee · 2021-12-07T15:29:20Z

Yes, the generated code is humongous. I guess it is price for implementing full Kotlin grammar. I have tried to check the library size with make && strip --strip-unneeded. Current master is about 3.57 MB, this branch yields 3.72 MB, so roughly 150kB. I agree it is quite a lot, but still manageable I hope.

It's not that bad, I guess I'm still stuck with the "it must fit a floppy" mentality.

Implementation	Time	Avg. per file	Tags added
C	4 s	66 us	567695
RegExp	155 s	2.5 ms	527908
Peg	523s	8.6 ms	782500

This is quite surprising, I would have expected the performance to be somewhere between a hand-written parser and a regex parser but it probably depends on the grammar and the lookahead it has to perform. Have you tried profiling the parser to see if there's not some unexpected bottleneck? With hand-written parsers I used the following:

https://wiki.geany.org/howtos/profiling/gperftools

Unless the parser can recover from an error it is going to lose all or many symbols in more cases if it parses things that are not essential to detecting the tags since it will then find more errors. It is a problem with many languages, not just kotlin (see for example #2916), but the fact many tag parsers skip lots of code reduces the impact of partly edited code (outside of declarations).

To be fair, hand-written ctags parsers aren't that great at error recovery either - when you type an extra { at the beginning of a file, it won't get parsed.

techee · 2021-12-07T16:06:29Z

Of more interest is how fast (slow) is it, remembering it runs while editing, will Kotlists get big pauses?

If the table above is valid and from what I remember the C parser speed is about 50MB of sources per second and the kotlin parser is about 100x slower than the C parser, it gives us 500KB/s for kotlin. And the pauses should be just something like 0.2s not to be noticeable so about 100KB kotlin source files would be roughly the maximum for Geany.

dolik-rce · 2021-12-07T18:40:27Z

This is quite surprising, I would have expected the performance to be somewhere between a hand-written parser and a regex parser but it probably depends on the grammar and the lookahead it has to perform. Have you tried profiling the parser to see if there's not some unexpected bottleneck?

I did a lot of profiling on PackCC and it turns out it is not very efficient with memory allocations (see universal-ctags/ctags#2866 (comment)), the backtracking is constantly allocating and deallocating small chunks of memory. So, more rules leads to more backtracking, which leads to more allocations, which slows it down. I have even proposed couple possible optimizations, but I did not have from enough manpower to actually implement it (sadly, I am not very proficient in C).

I might look into the grammar optimization in future, so the parser might become somewhat faster (especially, if it slows Geany down, that would annoy me a lot). Probably could beat the RegExp based ones (for comparable grammars). According to creator of ctags, the peg parser is about 15-30 times slower than handwritten parser (citation from universal-ctags/ctags#2866 (comment)). Even with grammar optimizations, it would still probably be about an order of magnitude slower than let's say current C parser.

techee · 2021-12-12T09:28:49Z

Hmm, I'm afraid I don't have very good news regarding the performance. I tried it on Raspberry Pi 400 (which should be slightly faster than regular Raspberry Pi 4) with the following file which IMO isn't that big

https://github.com/JetBrains/Exposed/blob/master/exposed-core/src/main/kotlin/org/jetbrains/exposed/sql/SQLExpressionBuilder.kt

and the lag was already very annoying (to the point that I'd much rather have parsing disabled than editing this way). IMO Raspberry Pi 4 is a reasonable low-end machine Geany should support without big speed problems (apart from that Geany is the semi-official Raspberry Pi editor as it's installed by default) and the 500 LOC file isn't anything extreme either. So I'm not sure whether this parser should be used in Geany. It's shame that PEG parsers are so slow currently - if they were e.g. 10x slower than hand-written parsers, it would be fine I think but 100x is a little too much.

By the way I also tried this file

https://github.com/JetBrains/Exposed/blob/master/exposed-core/src/main/kotlin/org/jetbrains/exposed/sql/SQLExpressionBuilder.kt

which seems to contain many anonymous functions and it seems to be a common pattern in Kotlin files and seeing the symbol tree filled with <lambda> doesn't seem that useful to me so maybe these shouldn't be reported to Geany. In any case, the performance issue seems to be a bigger problem though.

dolik-rce · 2021-12-12T10:07:57Z

That is really surprising ☹️ I'd expect RPi 4 to handle this much better. I don't have any slower hardware readily available right now, but I have tested it with cpulimit -l 50 -i geany SQLExpressionBuilder.kt, which on my i5 should mean it should get about 1.2GHz and was still pretty smooth. There might of course also be some difference due to memory speed.

Well, I started this as a "Christmas project" last year, so I guess I'll just have a another one for this year: to make it faster.

... seems to contain many anonymous functions and it seems to be a common pattern in Kotlin files and seeing the symbol tree filled with doesn't seem that useful to me so maybe these shouldn't be reported to Geany.

Yes, lambdas can be quite common in Kotlin code and you're right, that they are probably useless in the symbol tree most of the time. I think the only valid reason to keep them is that there might be something else declared in their scope. Perhaps we could filter out all lambdas without any child nodes?

techee · 2021-12-12T10:33:33Z

There might of course also be some difference due to memory speed.

That's quite probable.

Well, I started this as a "Christmas project" last year, so I guess I'll just have a another one for this year: to make it faster.

I know those! #3032 is Christmas 2018 :-).

I think the only valid reason to keep them is that there might be something else declared in their scope.

That's a good point. Filtering tags should in theory be possible, the question is how many such things we should do in Geany. But maybe anonymous functions are quite a common pattern to deserve this special handling. I'm just preparing a pull request that improves handling of anonymous tags so I might add something like that.

dolik-rce mentioned this pull request Nov 29, 2021

Add support for Kotlin tags dolik-rce/geany#1

Closed

dolik-rce force-pushed the add-kotlin-support-3 branch from 2d02f07 to 84b0295 Compare December 4, 2021 12:30

techee reviewed Dec 5, 2021

View reviewed changes

techee added the ctags parser label Dec 5, 2021

techee reviewed Dec 6, 2021

View reviewed changes

techee mentioned this pull request Dec 11, 2021

Use cxx parser from uctags #3032

Merged

dolik-rce force-pushed the add-kotlin-support-3 branch from a696977 to 8b69570 Compare January 23, 2022 14:08

elextr mentioned this pull request Apr 20, 2022

Status of ctags parsers in Geany #3170

Open

dolik-rce force-pushed the add-kotlin-support-3 branch 2 times, most recently from c088322 to fe2acf5 Compare August 15, 2022 12:17

dolik-rce force-pushed the add-kotlin-support-3 branch from 7c66a56 to 784b5ba Compare September 21, 2022 03:33

techee mentioned this pull request Sep 12, 2023

Q: optlib parsers in ctags #3557

Closed

dolik-rce force-pushed the add-kotlin-support-3 branch from 784b5ba to da3726e Compare October 31, 2023 16:15

Add support for Kotlin tags

ab8cc5e

follow ctags directory structure, call make before copying peg parsers

25e73f8

dolik-rce force-pushed the add-kotlin-support-3 branch 2 times, most recently from fa83ce4 to fca1f25 Compare October 31, 2023 17:59

fix meson build

05d2472

dolik-rce force-pushed the add-kotlin-support-3 branch from fca1f25 to 05d2472 Compare October 31, 2023 18:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for Kotlin tags #3034

Add support for Kotlin tags #3034

dolik-rce commented Nov 29, 2021

techee Dec 5, 2021

dolik-rce Dec 6, 2021

techee Dec 5, 2021

dolik-rce Dec 6, 2021

techee commented Dec 5, 2021

dolik-rce commented Dec 6, 2021

techee commented Dec 6, 2021

techee Dec 6, 2021

dolik-rce Dec 6, 2021

techee Dec 6, 2021

techee commented Dec 6, 2021

elextr commented Dec 7, 2021

dolik-rce commented Dec 7, 2021

elextr commented Dec 7, 2021

elextr commented Dec 7, 2021

dolik-rce commented Dec 7, 2021

techee commented Dec 7, 2021

techee commented Dec 7, 2021

dolik-rce commented Dec 7, 2021

techee commented Dec 12, 2021

dolik-rce commented Dec 12, 2021

techee commented Dec 12, 2021

Add support for Kotlin tags #3034

Are you sure you want to change the base?

Add support for Kotlin tags #3034

Conversation

dolik-rce commented Nov 29, 2021

techee Dec 5, 2021

Choose a reason for hiding this comment

dolik-rce Dec 6, 2021

Choose a reason for hiding this comment

techee Dec 5, 2021

Choose a reason for hiding this comment

dolik-rce Dec 6, 2021

Choose a reason for hiding this comment

techee commented Dec 5, 2021

dolik-rce commented Dec 6, 2021

techee commented Dec 6, 2021

techee Dec 6, 2021

Choose a reason for hiding this comment

dolik-rce Dec 6, 2021

Choose a reason for hiding this comment

techee Dec 6, 2021

Choose a reason for hiding this comment

techee commented Dec 6, 2021

elextr commented Dec 7, 2021

dolik-rce commented Dec 7, 2021

elextr commented Dec 7, 2021

elextr commented Dec 7, 2021

dolik-rce commented Dec 7, 2021

techee commented Dec 7, 2021

techee commented Dec 7, 2021

dolik-rce commented Dec 7, 2021

techee commented Dec 12, 2021

dolik-rce commented Dec 12, 2021

techee commented Dec 12, 2021