Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync ctags with upstream so that most parsers can be copied from uctags #2018

Merged
merged 36 commits into from Apr 6, 2019

Conversation

Projects
None yet
5 participants
@techee
Copy link
Member

commented Dec 17, 2018

Had to create the new pull request quickly - not having a huge ctags-related pull request in Geany brings bad luck!

This pull request brings the changes from upstream ctags made in the last 2 years so that parsers are compatible with uctags parsers. This isn't a complete sync to avoid too big pull request, just the minimal changes required so syncing parsers isn't blocked. In addition, to verify that subparser support works, this pull request uses html, javascript and css parsers from upstream and enables subparser support for HTML.

In general most of the patches are fortunately boring, the "interesting" ones are:

8822e4e - using kindIndex instead of kindDefinition. This one took a bit of work and I hope I didn't introduce any problems on the way. In some functions which aren't relevant to Geany I left just function stubs to avoid extra syncing with upstream. This patch also disables regex parsing - the parsing is only disabled inside ctags, I kept the regex parsers there for now until we decide what we want to do about them.

72d4b5d - using upstream javascript parser. I haven't verified completely if the generated tags are correct (seems OK to someone who doesn't really know javascript) but better to check by somebody else.

5de9b3e - Support subparsers in Geany. The biggest part of the patch is additional mapping of subparser tag types to the tag type used by Geany. This is necessary because we don't want subparser tag types to clash with main parser tag types.

I tried to copy all the parsers from upstream to Geany to see if it compiles (I have only tried the compilation part, not if the parsers actually work correctly) and the only ones that didn't compile because of upstream changes were:

  1. asm - would need upstream preprocessor.c/h implementation. But it probably wouldn't be hard to convert it to our lcpp.c/h

  2. make and tcl parsers - need subparser.c/h and it would require more syncs with upstream

  3. c.c and cxx parsers - I haven't actually tried these beasts

techee added some commits Dec 11, 2018

Use latest version of vstring
This also requires adding trashbox.c/h which is now used by vstring and
inline macros from inline.h.
Rename fieldSpec to fieldDefinition
See b56bd065123d69087acd6f202499d71a86a7ea7a upstream.
Rename kindOption to kindDefinition
See e112e8ab6e0933b5bd7922e0dfb969b1f28c60fa upstream
Rename kinds field in parserDefinition to kindTable
See 09ae690face8b5cde940e2d7cf40f8860381067b upstream.
Rename structure fields about field in parserDefinition
See a739fa5fb790bc349a66b2bee0bf42cf289994e8 upstream.
Use kindIndex instead of kindDefinition
This patch replaces kindDefinition related entries from sTagEntryInfo
with kindIndex so kinds are referenced indirectly using the index. For
more info please refer to commits:

16a2541c0698bd8ee03c1be8172ef3191f6e695a
f92e6bf2aeb21fd6b04756487f98d0eefa16d9ce

Some other changes had to be made to make the sources compile (without
bringing all the diffs from upstream). At some places, which aren't used
by Geany, only stub implementations have been created.

In particular, the regex parser has been disabled (for now?) because its
current implementation doesn't allow accessing kindDefinitions using
index and allowing this would require big changes in its implementation.
The affected parsers are Cobol, ActionScript and HTML. For HTML we can
use the token-based parser from upstream, and we should consider
whether Cobol and ActionScript are worth the effort to maintain a separate
regex implementation using GRegex (IMO these languages are dead enough
not to justify the extra effort).

The patch also disables tests for languages using regex parsers.
Rename roleDesc to roleDefinition
See 1345725842c196cc0523ff60231192bcd588961b upstream. Since we don't care
about roles in Geany, we don't have to do the additional stuff the upstream
patch does.
Add XTAG_ANONYMOUS used by jscript
See 0e4c5d4a0461bc8d9616fe3b97d75b91d014246e upstream.
Don't use hash value as an Anonymous field identifier
Instead of something like "Anonymous0ab283cd9402" use sequential integer
values like "Anonymous1".
Call anonReset in main part
See 3c91b1ea509df238feb86c9cbd552b621e462653 upstream.
Support subparsers in Geany and add HTML parser demonstrating this fe…
…ature

This feature requires several changes:

1. Propagating language of the tag from ctags to Geany so we know whether
the tag comes from a master parser or a subparser.

2. We need to address the problem that tag types from a subparsers can
clash with tag types from master parsers or other subparsers used by the
master parser. For instance, HTML and both its css and javascript
subparsers use tm_tag_class_t but HTML uses it for <h2> headings, and
css and javascript for classes. Representing all of them using
tm_tag_class_t would lead to complete mess where all of these types would
for instance be listed in the same branch of the tree in the sidebar.

To avoid this problem, this patch adds another mapping for subparsers where
each tag type can be mapped to another tag type (which isn't used neither
by master parser or other subparsers). To avoid unwanted clashes with other
parsers, only tags explicitly mentioned in such mappings are added to tag
manager; other subparser tags are discarded.

For HTML this patch introduces mapping only for tm_tag_function_t (which
in this case maps to the same type) to mimick the previous HTML parser
behavior but other javascript and css tag types can be added this way
in the future too.

3. Since in most of the code Geany and tag manager assume that tags from
one file use the same language, subparser's tags are modified to have the
same language like the master parser.

4. HTML parser itself was copied from upstream without any modifications.
Tests were fixed as the parser now correctly ignores comments.
Rename truncateLine field of tagEntryInfo
See 0e70b22791877322598f03ecbe3eb26a6b661001 upstream. Needed for Fortran
parser.
Add dummy mbcs.h and trace.h
Included by javascript parser.
Introduce an accessor to `locate' field of `Option'
See fb5ef68859f71ff2949f1d9a7cab7515f523532f upstream. Needed for Fortran.
Add numarray.c/h
Needed by various parsers.
Add getLanguageForFilename() and getLanguageForCommand()
See

416c5e6b8807feaec318d7f8addbb4107370c187
334e072f9d6d9954ebd3eb89bbceb252c20ae9dd

upstream. Needed for Sh parser.
@elextr

This comment has been minimized.

Copy link
Member

commented Dec 17, 2018

Asciidoc parser is broken, it no longer shows the document outline, instead the headings are grouped by section levels. Same with ReST.

@techee

This comment has been minimized.

Copy link
Member Author

commented Dec 18, 2018

@elextr Thanks for noticing - apparently scope info is missing, this also affects current master. It seems we don't have asciidoc and rest unit tests :(.

@@ -78,7 +78,7 @@ static void makeAsciidocTag (const vString* const name, const int kind)
{
tagEntryInfo e;

initTagEntry (&e, vStringValue (name), &(AsciidocKinds [kind]));
initTagEntry (&e, vStringValue (name), kind);

e.lineNumber--; /* we want the line before the '---' underline chars */

This comment has been minimized.

Copy link
@elextr

elextr Dec 18, 2018

Member

I suspect the scope information that used to be stored in extensions about here is the cause of the breakage of asciidoc structuring and similar change for rest. Needs to be fixed, note that if upstream won't accept it then these two parsers will have to remain Geany specific.

This comment has been minimized.

Copy link
@techee

techee Dec 18, 2018

Author Member

No, this is correct - it's the "Use kindIndex instead of kindDefinition" patch mentioned above. It's now enough to use the index (which is "kind" here) instead of the value from the structure.

But the problem seems to be this patch:

3e8365c#diff-349c047a4c33ab772a7ad84be2c13b4c

Upstream there's some additional comment why using cork isn't good in this case:

https://github.com/universal-ctags/ctags/blob/master/parsers/asciidoc.c

I'll prepare a patch tomorrow to fix that.

This comment has been minimized.

Copy link
@elextr

elextr Dec 18, 2018

Member

yes, the lines that patch was removing was what I was referring to.

@elextr

This comment has been minimized.

Copy link
Member

commented Dec 18, 2018

@techee overlapping comments :)

Would tests have picked up extension stuff?

@techee

This comment has been minimized.

Copy link
Member Author

commented Dec 18, 2018

Would tests have picked up extension stuff?

Yep, this would appear in the tags file. I'll grab some tests from uctags (and maybe check if we have at least 1 test for every file type) so problems like that don't appear in the future - especially important now that we make many changes in our ctags implementation

@elextr

This comment has been minimized.

Copy link
Member

commented Jan 14, 2019

Hehe smile I gave COBOL a little try, but it now it seems a lot more of a nightmare to do it right (from what I gathered, not only the official syntax uses a pretty random notation, but it also seem close to impossible to parse without knowing every single bit of it -- e.g. it's not easy to locate "statements" or alike). But well… I'll still try and find a way.

If you really insist on having this fun, couldn't you just use the readLineFromInputFile() ctags interface and more or less only simulate the regex expressions from the existing regex parser (without actually having to learn how the language looks like)?

Or just use glib regex and use the regexen from the ctags Cobol modified as needed for glib?

techee added some commits Dec 21, 2018

Add more ctags unit tests
This patch adds unit tests for: nsis, docbook, haskell, haxe, abaqus, vala,
abc.

The only missing unit tests are for GLSL and Ferite parsers which
however share the implementation with the C parser and should be
reasonably well covered by other C-like language tests.

The tests were put together from various tutorials and help of the
languages in order to cover the tags these parsers generate. No guarantee
they'd compile with real parsers.

@techee techee force-pushed the techee:ctags_sync3 branch from 608fe0f to 5863f57 Jan 14, 2019

@techee

This comment has been minimized.

Copy link
Member Author

commented Jan 14, 2019

@b4n I hope I addressed all the minor problems (apart from the missing parsers) - let me know if I forgot about something.

@elextr

This comment has been minimized.

Copy link
Member

commented Jan 14, 2019

@elextr Since you like finding all the possible cases where the current C parser fails, would you give it a try?

@techee ahh, no, not until this gets committed, then you can make a PR to use the new C++ parser instead of the one in c.c. After all once this is committed we will need another ctags PR to replace it 😁

@elextr

This comment has been minimized.

Copy link
Member

commented Jan 14, 2019

Latest commits don't seem to have broken anything, and using a rubbish example from the web, latex seems to be working (well at least I have symbols).

Seems as though nobody else cares enough to check their favourite language still works, so as soon as @b4n accepts it should be merged to allow it to mature for a bit before the next release.

@techee

This comment has been minimized.

Copy link
Member Author

commented Jan 15, 2019

@techee ahh, no, not until this gets committed, then you can make a PR to use the new C++ parser instead of the one in c.c.

Sure. I just thought you might be interested to try it before and see if it fixes some of the problems with the existing C/C++ parser for you.

@elextr

This comment has been minimized.

Copy link
Member

commented Mar 6, 2019

@techee

In particular, the regex parser has been disabled (for now?) because its
current implementation doesn't allow accessing kindDefinitions using
index and allowing this would require big changes in its implementation.
The affected parsers are Cobol, ActionScript and HTML. For HTML we can
use the token-based parser from upstream, and we should consider
whether Cobol and ActionScript are worth the effort to maintain a separate
regex implementation using GRegex (IMO these languages are dead enough
not to justify the extra effort).

Basically it looks like this is stalled because it removes features, so how are those languages handled upstream?

@elextr elextr merged commit a1cf475 into geany:master Apr 6, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@elextr

This comment has been minimized.

Copy link
Member

commented Apr 6, 2019

Merged because

  1. it fixes three languages, Asciidoc, Rest (Geanys own documentation) and Latex and progresses the process of making it easier to maintain parsers from upstream.

  2. It has been working for me for months

  3. If we keep delaying every ctags update for a year we will always have huge updates :(

The downside is it loses Cobol, but that appears to require major work which can be better provided as a pull request on top of this when agreement on either making a non-regex parser or accepting two regex parsers (since ctags uses a different variant to Glib).

Summary: we need to make progress, not block because its not perfect.

@codebrainz

This comment has been minimized.

Copy link
Member

commented Apr 6, 2019

I don't think we should release with this until the broken languages are fixed.

This PR shouldn't have been merged until that was done, as per @b4n's review requesting changes first, or at least until there was some kind of consensus that it should be merged.

@elextr

This comment has been minimized.

Copy link
Member

commented Apr 6, 2019

See #2119. As @b4n refused a second regex engine the "requested changes" amount to whole new parsers. Thats not in the scope of this PR. Now they can be made as clean standalone PRs instead of buried inside an already too big PR.

@b4n

This comment has been minimized.

Copy link
Member

commented Apr 6, 2019

@elextr please don't squash-merge this kind of big PRs with great commit splitting, we now lost the whole history here, and although yes, some commits were big already, most were very well split apart. I'll also make bisecting a potential issue a lot harder.

Anyway, as per the merge itself: I obviously am not a fan, but well, I also admit we need to move forward at some point, and this kind of forces (my) hand. Sad it happens right when I'm back with some time for Geany dev… well I guess them I should have started with this one yesterday instead of #2092.

  1. it fixes three languages, Asciidoc, Rest (Geanys own documentation) and Latex and progresses the process of making it easier to maintain parsers from upstream.

Let's just be fair: I already proposed a standalone fix for Asciidoc and Rest (#2019) without this PR, and you rejected it because merging this was a better choice -- which is fair enough, but then not so much an argument for pushing this one :)
I wasn't aware that Latex was broken in master, probably to blame on my lack of Geany time lately.

  1. It has been working for me for months

That's a great point :)

  1. If we keep delaying every ctags update for a year we will always have huge updates :(

yes, but with this PR, so long as it can still be merged in Geany, we should have a clean-ish diff from uctags, meaning update should be fairly easy no matter how big they are, as most can just be applied as-is.

The downside is it loses Cobol

An ActionScript as well. I don't think we can release in the current state without those, especially as we have at least one active COBOL user that even submitted changes recently.

Anyway, we now have to pull up our sleeves and fix what needs to be fixed before the release. Help is welcome.

@elextr

This comment has been minimized.

Copy link
Member

commented Apr 6, 2019

Yes, I know there is one Cobol user, and I agree its not nice for them, but keeping Cobol blocked everything else, so it was not nice for everyone else, not just the three languages that didn't work, but also all improvements from upstream for many other languages.

And since it will be a separate PR the Cobol/Actionscript or whatever else can potentially be a short term fix, and it will be easy to remove later when a better one comes along :)

PRs with great commit splitting

Sorry. I did look, but maybe because I really don't understand the ctags code organisation they seemed to just be progressive things, so you could never revert just one, the whole lot would stop working.

happens right when I'm back with some time for Geany

Neat :)

Talking to @codebrainz in another forum he suggested that the release can be delayed if needed, and I would agree, the changes (other than this :) are not so significant that it would hurt for them to wait. And besides we missed the last one by several months, why break a good record :)

Anyway, we now have to pull up our sleeves and fix what needs to be fixed before the release. Help is welcome.

Lets discuss on #2119

@codebrainz

This comment has been minimized.

Copy link
Member

commented Apr 6, 2019

please don't squash-merge this kind of big PRs with great commit splitting

Couldn't the squashed merge be reverted and re-merged with the individual commits? (not that I agree with merging this at all until it doesn't cause serious regressions).

@techee

This comment has been minimized.

Copy link
Member Author

commented Apr 10, 2019

I didn't port the regex parser from ctags because it seemed too much work to me (for the two not-so-important languages - at least to me) but it might not be that hard if I take the current version of the parser from uctags and apply the GRegex-related changes on top of that. I was just a little worried I might miss some semantic differences between the two regex engines.

I might have some time to have a look at it in the next week but no guarantee.

@Skif-off

This comment has been minimized.

Copy link
Contributor

commented Apr 14, 2019

I don't know where I can ask: what about "Hacking Geany" > "Adding a filetype"? Is this section needs updating?

@elextr

This comment has been minimized.

Copy link
Member

commented Apr 14, 2019

@Skif-off the part that possibly is affected is the Adding a TagManager parser section, but its very general and looks like its written for those who already know what they are doing (which is not me as far as adding ctags parsers, so I may be wrong).

Since its so general I don't think its affected by any changes to the ctags layout. What it might possibly need is an improvement, but thats something to make a separate issue/pull request.

@Skif-off

This comment has been minimized.

Copy link
Contributor

commented Apr 15, 2019

@elextr, yes, section Adding a TagManager parser is interesting firstly (possible changes in updating add_top_level_items() & adding TMParserMapEntry ).
Ok, thanks, I thought so, but I decided to clarify this moment, just in case :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.