Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync ctags parsers with big changes with Geany #2991

Merged
merged 23 commits into from Dec 7, 2021

Conversation

techee
Copy link
Member

@techee techee commented Nov 8, 2021

This pull request isn't meant to be merged, it should just serve as a discussion place which parsers apart from those in #2990 to select for syncing. I'm not so familiar with the state of Geany parsers and if there are some patches that should be first applied to uctags before we can grab the uctags version, it would be nice if people who know more than me (@b4n @eht16 @elextr and others) could help with this.

Just to start somewhere, I included a few parsers to this commit:

  1. html and go parsers - these are those I initially wrote and I think their development happened in uctags
  2. diff, jscript, make, objc, rust - there don't seem to be too many changes in these parsers and the uctags version seems to be a superset of the geany version

I went through all the parsers and divided them into 2 categories:

Parsers where uctags implementation is similar to the Geany implementation:

  • asm
  • basic
  • diff (already included)
  • fortran
  • go (already included)
  • html (already included)
  • jscript (already included)
  • lua
  • make (already included)
  • objc (already included)
  • pascal
  • perl
  • php
  • ruby
  • rust (already included)
  • sql
  • nsis

Parsers which differ a lot:

  • python (but here I assume the uctags version is the Colomban's new python parser we could grab)
  • iniconf
  • matlab
  • r
  • sh
  • tcl
  • tex
  • verilog
  • vhdl
  • and of course all the languages from c.c

Apart from python, the languages from the first group seem to be the most obvious candidates for syncing. So does anyone have a better knowledge of the state of these parsers?

@elextr
Copy link
Member

elextr commented Nov 8, 2021

Warning Brain dump:

Of the first list the only one I would suspect may have been modified in Geany independent of ctags is Lua.

Of the second list I'm pretty sure Verilog and VHDL were added to ctags then brought into Geany but probably have developed from that initial version so the ctags version should be good.

c.c -- runs screaming from the room. If languages in c.c have separate parsers in ctags we should import them instead (eg the new c/c++ parser) and only keep c.c for those languages that do not have ctags parsers (if any).

I presume that any parser not in the lists is the same in both Geany and ctags so won't need updating?

@techee
Copy link
Member Author

techee commented Nov 9, 2021

Of the first list the only one I would suspect may have been modified in Geany independent of ctags is Lua.

Of the second list I'm pretty sure Verilog and VHDL were added to ctags then brought into Geany but probably have > developed from that initial version so the ctags version should be good.

Thanks, let's see if others have some more information.

c.c -- runs screaming from the room. If languages in c.c have separate parsers in ctags we should import them instead (eg the new c/c++ parser) and only keep c.c for those languages that do not have ctags parsers (if any).

After #2984 is merged, I'd like to open a pull request switching us to the new c/c++ parser from uctags so these 2 most terrible languages (parsing-wise) should be eliminated from c.c. The rest of course remains. I remember the pre-babies version of Colomban talking about about separating those parsers from c.c but I suspect this task is delayed by about 15 years now ;-).

Anyway, the purpose of this PR is to switch to those parsers where we know we can just grab the uctag version - we'll have to deal with the rest later parser by parser.

I presume that any parser not in the lists is the same in both Geany and ctags so won't need updating?

Those with minimal or no changes are in PR #2990

@Davidy22
Copy link
Contributor

I'm here because I searched python ctag in pull requests to see if it was being addressed anywhere after being told that the GDScript ctags file that I adapted from the python ctags file I found in the geany repository was actually based on a decrepit version of the python parser. I intend to drag in the uctags python parser right after I'm finished redoing my GDScript alterations to the newer version of the python parser in ctags, unless someone else is doing this already.

The tests only differ in added "()" to all functions in generated tags.
The section name seems to be parsed correctly now.
There seem to be missing () which aren't present in the source anyway
so probably OK. Test1 procedure is correctly named here, not as the class
name TTest.
The new go tags contain extra scope information.
PHP and zephir now use "\\" as the context separator instead of "::".
For zephir, return values of methods are now parsed.
The new parser parses also method arguments.
Scope information seems to be correctly parsed now.
@techee
Copy link
Member Author

techee commented Nov 27, 2021

I just took the parsers from the first group plus python and tried them against our unit tests to see what happens. Fortran seems to produce different output in many situations so I dropped it for now. The asm parser uses the cpreprocessor.c/h that the new cxx parser uses and depends on some things from the new cxx parser so I dropped asm from now too. Finally, the new python parser generates different tags for imports and we should make some modifications in Geany to adjust for that so it's dropped for now too.

The rest of the parsers seem to work fine and the updates in unit tests seem to do the right thing. I pushed the updates to unit tests language by language including comments about the changes to simplify review.

I can of course drop some parser from this pull request if someone knows some updates were made to our parser that should go upstream first.

To summarize, this is the result of this pull request:

Parsers included here:

  • basic
  • diff
  • go
  • html
  • jscript
  • lua
  • make
  • objc
  • pascal
  • perl
  • php
  • ruby
  • rust
  • sql
  • nsis

Different parsers (longer-term stuff):

  • iniconf
  • matlab
  • r
  • sh
  • tcl
  • tex
  • verilog
  • vhdl
  • fortran
  • c.c

Parsers for which I will make separate PRs soon:

  • python (needs some more adjustments)
  • cxx
  • asm (depends on cxx)

@techee techee changed the title Decide what parsers from uctags we can sync with Geany Sync ctags parsers with big changes with Geany Nov 27, 2021
@techee
Copy link
Member Author

techee commented Nov 27, 2021

@Davidy22 I'll prepare a separate PR for python soon - not sure if the version I use for all the parsers here from the end of October is new enough for you but when the python parser is in, it should be no problem to update it to the version you need.

@Davidy22
Copy link
Contributor

I've already been muddling around with my modified for gdscript python parser so I think I have a handle on the tags in it. I can update my gdscript PR in a bit so you can yoink the not-gdscript file changes maybe

@techee
Copy link
Member Author

techee commented Nov 27, 2021

@Davidy22 Are all your changes submitted here?

https://github.com/universal-ctags/ctags

The point of all these pull requests is that we don't have any diffs of our own against universal ctags and all changes have to go there first and we just copy the corresponding parsers to Geany.

@Davidy22
Copy link
Contributor

Yeah the one on the ctags repo is my latest work, haven't updated my PR here in a bit, probably will once I get merged in lexilla and ctags, my open PRs linked in the pull request here. Geany does have a little extra layer of mappings that I fiddled to make the symbol table on the left show the right things

@techee
Copy link
Member Author

techee commented Nov 27, 2021

Then I'd suggest to wait until the PR with python I'm planning to make lands in Geany so we don't have to deal with 2 different parsers at the same time.

@elextr
Copy link
Member

elextr commented Nov 28, 2021

Applied #2990 and #2991 to Geany at eabc09a, works for me.

make check passes, just a query, are there any more test files we can copy from ctags?

Also checked any filetypes I happened to have a source file lying around for, and knew enough to check what should be generated, none seem to have any different failures from a version of Geany before they were applied.

If nobody beats me or objects I'd be happy to push #2990 and #2991 in a week (or so).

@Davidy22
Copy link
Contributor

Davidy22 commented Nov 28, 2021

Oh my work (#3012) is on a new parser for a language with similarities to python, no worries about colliding implementations

@eht16
Copy link
Member

eht16 commented Nov 28, 2021

I had a quick look on the changes and this looks good, we'll get some more new tag types and probably the newer uctags code is less buggy than the current ones.

Though at least in the Pascal parser we will lose two features:

  • parsing "type" definitions (IIRC "type" in Pascal is somewhat similar to "structs" in C)
  • parsing argument list of functions

This two features are not in the uctags Pascal parser. I don't know where they came from, maybe we added them ourselves in the past.

Since these could be useful for others as well, it might be best to push these two upstream to uctags, so we'll get them back with the next uctags update.

Until then, it might be ok to have the two features removed (which will also solve #2987) by this PR and get them back with the next uctags sync.

An alternative could be to remove the Pascal parser from this PR and merge #2987 in the meantime.

I prefer the first option: use the uctags Pascal parser with this PR.

What do you think?

@elextr
Copy link
Member

elextr commented Nov 28, 2021

maybe we added them ourselves in the past.

Is it from fixing the crash on typing "type"?

But anyway it would be nice to push improvements upstream.

I would agree that using the uctags parser is the preferable option instead of drawing out the alignment to upstream even further. Out of interest, do Pascalians get any new features with the uctags parser in return for losing "type" for a release or so?

@eht16
Copy link
Member

eht16 commented Nov 28, 2021

maybe we added them ourselves in the past.

Is it from fixing the crash on typing "type"?

Well, "fixing the crash on typing "type"" is the consequence of the feature that "type" is parsed at all which seems to be an exclusive Geany feature. though not sure if this is what you asked :D.

I would agree that using the uctags parser is the preferable option instead of drawing out the alignment to upstream even further. Out of interest, do Pascalians get any new features with the uctags parser in return for losing "type" for a release or so?

Nope, the parsers seems equal apart from the "type" tag type and the argument list parsing.

@elextr
Copy link
Member

elextr commented Nov 28, 2021

though not sure if this is what you asked :D.

It is what I asked, which just exposes my confusion :-)

Nope, the parsers seems equal apart from the "type" tag type and the argument list parsing.

Ok, probably should add to release news something like "Pascal parser has been upgraded to align with ctags, the Geany specific extensions of 'type' and argument list parsing have been pushed upstream and will return next update."

Unless the next Geany release takes so long that changes pushed upstream are accepted in a new parser beforehand and we can import it. Having synchronised with uctags at that stage importing should be simple if the symbols.c handling of those features is left in place.

@techee
Copy link
Member Author

techee commented Nov 28, 2021

make check passes, just a query, are there any more test files we can copy from ctags?

I think they have many more in uctags. In Geany though I think we don't even have to care about the parsers themselves (those are tested in uctags already) but rather the mapping to our internal representation (which is revealed in our tags files). So it's only important that all the tag types we support are generated by the tests.

@eht16
Copy link
Member

eht16 commented Nov 28, 2021

Though at least in the Pascal parser we will lose two features:

I just made a diff of the 2 implementations and our pascal parser is a clear superset of the uctags version (nothing dramatic though) so I'd rather keep our parser and then introduce our changes upstream and drop pascal from this PR. This means you can just merge #2987 because this parser would stay for now.

Ok. Let's go this way. I'll try to prepare PRs for uctags with the two missing features. If/When they are merged, we can sync the Pascal parser.

@techee
Copy link
Member Author

techee commented Nov 28, 2021

Ok. Let's go this way. I'll try to prepare PRs for uctags with the two missing features. If/When they are merged, we can sync the Pascal parser.

Cool, I was going to prepare the PR myself - you were too fast :-)

I just reverted the parser back to the original one (if desired, I can squash it with the previous commits).

@eht16
Copy link
Member

eht16 commented Nov 29, 2021

Ok. Let's go this way. I'll try to prepare PRs for uctags with the two missing features. If/When they are merged, we can sync the Pascal parser.

Cool, I was going to prepare the PR myself - you were too fast :-)

:)

However, I only PR'ed the argument list parsing feature.
While preparing tests for the "type" parsing, I noticed the current code is quite buggy and incomplete. It only parses very easy type declarations but it probably fails for real world uses.
For example:

type
  TTest=class
    procedure Test1;
    procedure  Test2;
    procedure   Test3;
  end;

Will create a tag for "TTest" but also for the procedures "Test2" and "Test3" which are wrong. And "Test1" is missing for some reason (fun fact: even our test case expects that broken behavior in https://github.com/geany/geany/blob/master/tests/ctags/bug612019.pas.tags).

I guess the parsing of "type" declarations needs to implemented in a more sophisticated way. At least I don't want to submit it upstream in the current state.
Since I don't want and can't fix the implementation, we maybe should just remove that particular feature instead of keeping an incomplete implementation.

@techee
Copy link
Member Author

techee commented Nov 29, 2021

Since I don't want and can't fix the implementation, we maybe should just remove that particular feature instead of keeping an incomplete implementation.

Yep, agreed.

I just added the iniconf parser to this pull request too - after looking at it in more detail, there are just extra things added and there are no changes in our unit tests.

…hp separators

This patch introduces a new function tm_parser_update_scope() which
can be used to modify scope information when the provided value from
ctags doesn't suit our needs.

This patch uses it to replace php scope separator \ with :: so there's
a single scope separator for this language.
"." is used as the context separator in ruby now, the comment is
probably just some historic artifact.
@elextr
Copy link
Member

elextr commented Dec 1, 2021

@techee can #2990 and #2991 be merged, or have you more fiddling to do?

PS both WFM (at least for the languages I tried)

@elextr elextr mentioned this pull request Dec 2, 2021
@techee
Copy link
Member Author

techee commented Dec 2, 2021

@techee can #2990 and #2991 be merged, or have you more fiddling to do?

I think #2990 is OK to be merged (just added a comment there regarding what I did).

I would still wait with this one. First, I guess merging #2990 will cause some merge conflicts here so these will have to be resolved first.

Second, there is the problem with the go parser I mentioned in #3032 (comment) and subsequent comment and I'd like to know the opinion of @masatake.

Also, someone might want to have a look at the patch I added yesterday here to rewrite PHP scope information if I didn't do something stupid there.

# Conflicts:
#	ctags/Makefile.am
This patch allows us to enable/disable ctags roles for certain languages
and kinds. Roles are currently disabled only for the Go kind 'p'
for the reasons mentioned in the comment message in the patch.
@techee
Copy link
Member Author

techee commented Dec 2, 2021

@elextr I think I'm more or less done here. Merge conflicts are resolved, the scope separator problem of PHP should be fixed now and the Go problem from universal-ctags/ctags#3211 too. All the bugs introduced by me are your responsibility now :-P.

@elextr
Copy link
Member

elextr commented Dec 3, 2021

All the bugs introduced by me are your responsibility now :-P.

Ok, so long as you fix #3032 to add all the new capabilities to Geany :-)

Will try this on the weekend and merge next week if it WFM, giving others time to try it too.

This patch is similar to the patch allowing us to enable/disable roles,
it just does this for ctags kinds. This can be useful when there is
a bug in a ctags parser causing incorrect parsing when certain kind is
enabled - which is what happens when parsing cython source files
with the 'z' flag enabled.
@techee
Copy link
Member Author

techee commented Dec 3, 2021

Ok, so long as you fix #3032 to add all the new capabilities to Geany :-)

Enabling some of the kinds may be much easier than making sure nothing breaks here ;-)

I added one more patch here to allow us to enable/disable certain kinds in ctags - will help to resolve the cython problem in #3031 until it's fixed in ctags and will allow us to easily handle such problems in the future.

Instead of enabling all kinds and then manually disabling them if
there are problems with them, we can enable/disable them based on whether
we actually use them - i.e., when they are mapped to something else
than tm_tag_undef_t.
@techee
Copy link
Member Author

techee commented Dec 4, 2021

Sorry for the last-minute push I just did, I just got an idea we could enable/disable kinds automatically based on whether we use them for the given language or not instead of enabling them all and this is just much better than hard-coding some kinds when there's some problem for them. This could also speed up parsing as the corresponding code paths can be skipped in individual parsers (and potentially reduces crashes because less code from the parser will be executed). This should fix the problem with cython parameters automatically as we don't use the 'z' kind.

@elextr elextr merged commit ad1debc into geany:master Dec 7, 2021
@elextr
Copy link
Member

elextr commented Dec 7, 2021

Still WFM, tests pass, merged before @techee can sneak more in :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants