Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync ctags main part #1263

Merged
merged 102 commits into from Dec 17, 2018

Conversation

8 participants
@techee
Copy link
Member

commented Oct 15, 2016

Alright, hopefully my last Colomban-killer patchset. I tried to sync as much as possible from ctags/main here (no big parser changes at this point, only those caused by some change in main).

I'm not sure how much it's worth reviewing patch by patch (maybe except those patches that touch some Geany code or parsers) - it's much easier to diff this against universal-ctags and see the differences - there just a few of them. I made some minor changes to uctags as well so better to diff against https://github.com/techee/ctags/tree/geany_sync2 which contains those changes.

I tried to librarify ctags a bit by adding CTAGS_LIB macro by which I protect the library-specific stuff. There are however still some things which are either missing in uctags or which we should change in Geany:

  • varType missing in uctags
  • GRegex vs GNU regex - I kept GRegex in lregex.c (but synced all the rest) but we should think whether not to convert back to GNU regex
  • isIgnoreToken() works differently than in uctags. If it simply just dropped certain tags, we could move the logic into TM but right now, it helps the C parser with parsing by suggesting whether the contents of braces should be dropped too. I'm not sure if we need to preserve this functionality (never used the ignored.tags file myself) but if we move to the new cxx parser we'd have to keep this diff unless there's some interest for this in uctags.
  • lcpp contains lots of changes as it's tightly related to c.c. However, as it's used by c.c only and quite independent of the rest, if we move to the cxx parser we could keep a separate copy of lcpp for the obsolete c.c parser.

Except 829ea7d I tried to do all the work piece by piece so nothing should be lost. There wasn't much work on parsers except the mentioned patch where parsers using nestlevel had to be updated to its new API and using cork (which works so every uctags parser could be now ported to Geany).

I think it's too late for Geany 1.29 but it would be good to have it ready for early 1.30 so it can be tested during the whole release cycle.

techee added some commits Oct 6, 2016

Grab the complete uctags vString implementation
All the changes in the first half of .c/.h were introduced by me in uctags
(and thus they are guaranteed to be great ;-).

The following functions were missing in our ctags implementation and are
added by this commit (currently unused):

extern vString *vStringNewOrClear (vString *const string);
extern char    *vStringDeleteUnwrap (vString *const string);
extern void vStringCatSWithEscaping (vString* b, const char *s);
extern void vStringCatSWithEscapingAsPattern (vString *output, const char* input);

read.c has been updated to use vStringResize() instead of
vStringAutoResize().
Eliminate uses of g_stat()
Convert g_stat() to stat() and introduce eStat() to routines.c/h and use
it instead of getFileSize() and isExecutable().

On the way grab implementations of isSameFile() and tempFile() from
universal-ctags (I haven't checked in detail if all the ifdef cases do the
right thing but these functions aren't probably called in Geany so
we don't have to worry much). Also drop unused setCurrentDirectory().
Drop some more unused functions from routines.c/h
Also get the error() implementation from error.c/h (modified slightly to
make sure exit() isn't called and which doesn't call errorPrinter() as
this one should be set somewhere else in the code and it doesn't happen
now).
Remove most uses of glib calls
Some special cases which require more work still remain, plus there's
still GRegex.
Grab uctags implementation of strlist
Except missing mio_free() in stringListNewFromFile() which has been ported
to uctags.

Except the use of ptrarray for strlist implementation, there's nothing
interesting in the Geany version which would be worth preserving.
Grab uctags keyword.c/h and add types.h with type declarations
keyword.c/h contains only changes made by me. To make the sync complete,
add type.h with type declarations (and remove the added declarations from
their original locations).
Make combinePathAndFile() return char * instead of vString
Also move eStrdup() to the correct position in the header.

In addition add the same includes into debug.h as those in uctags (the
removal of vstring inclusion inside routines.h causes compilation errors -
it would be best to explicitly include all needed files in every source to
avoid problems like this but let's do just syncing the two implementations
for now).
Sync the rest of routines.c/h
Requires checking for errno.h limits.h in autoconf as there are ifdefs
around those in routines.c.
Grab args.c/h uctags implementation
Basically dead code for us.
Grab ctags version of general.h and make related changes
Added e_msoft.h, changed configure.ac to make more checks for header files
so we have corresponding macros defined, drop HAVE_REGEX macro check which
isn't defined any more, drop debugging function from lua parser (complains
about missing definition of errout).

Tested only on Linux (TODO: Windows, OS X)
Grab ctags implementation of sort.c/h
I haven't studies the changes much but this file is responsible for sorting
tag files which isn't interesing for us.
Create geany.c/h and put isIgnoreToken()
This is a (hopefully) temporary file where we put geany-specific code
that for some reason has to be still in ctags. Put isIgnoreToken()
in this file.
Sync options.c/h (and introduce a lot of new garbage)
This commit started innocently as an attempt to grab options.c/h from
uctags. However it started referencing other files (which started
referencing other files) so the result is lots of new files are added.

When trying to make it compile, I run into many differences of existing
code which would have to be resolved. As all the added code is more or
less garbage for Geany, I just simply commented-out various function
bodies so it compiles without additional modifications. Should get fixed
once more files get synced with uctags.
Some sync of lcpp.c/h
Sync things which are possible to sync from lcpp.c/h. This file is heavily
dependent on c.c and since our c.c differs quite significantly, it's
not possible to use the universal-ctags implementation right now.
Move eTagFile from entry.h to entry.c
Add access functions and use them outside entry.c. In addition to functions
defined in uctags setMaxTagsLine() had to be added because it is needed
in parser.c (at the moment, will be probably gone in some of the next
commits).
Sync the beginning of entry.c/h
Mostly sTagEntryInfo and eTagFile taken from uctags and sync of includes.
@techee

This comment has been minimized.

Copy link
Member Author

commented Dec 6, 2018

@elextr I agree with @b4n - parsers cannot be updated one at a time unless you have these changes in the ctags core. And I agree with you it would be nice if this could be merged at the beginning of the release cycle so it can be tested.

I haven't checked what has changed in ctags in the last two years and if there's something more we need to import in order to be able to grab the ctags parsers but once these patches are merged, I'll try to prepare a new patch set so we can sync the parsers easily. And I'll do my best to do it in a timely manner and not in several months ;-).

masatake and others added some commits Apr 12, 2017

main: remove duplicated declarations
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Add dummy definitions to Assert* macros for suppressing compiler warn…
…ings

main/field.c:968:23: warning: \
     suggest braces around empty body in an ‘else’ statement [-Wempty-body]

     AssertNotReached();

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Don't export some local functions
They are already fixed upstream, so it doesn't add to the diff.
@b4n
Copy link
Member

left a comment

OK, here's my latest with the help of GCC.
See also techee#1 which contains some small stuff that I think are relevant, but please review them, don't take them as gospel.

Apart from that, I think the next thing is merging an see how it goes, as as far as I can tell it works very well.

pushNarrowedInputStream (language,
startLine, startCharOffset,
endLine, endCharOffset,
sourceLineOffset);
#ifndef CTAGS_LIB
tagFileResized = createTagsWithFallback1 (language);
#else
/* Simple parsing without rescans - not used by any sub-parsers anyway */

This comment has been minimized.

Copy link
@b4n

b4n Dec 9, 2018

Member

I don't really care right now because as you pointed out there's no use case yet, but you mentioning in the commit message that "the C and Fortran parsers" being the only parsers using retry mode and that they do not support sub-parsers makes the retry case irrelevant is not correct: nothing prevents a parser from calling the C or Fortran parsers as sub-parsers. IIUC, the upstream Flex or so parser actually does this.

But as said, don't worry too much about that right now; we can fix this stuff when we actually have use for it.

This comment has been minimized.

Copy link
@techee

techee Dec 10, 2018

Author Member

OK, good point. I wanted to avoid re-parsing within subparsers because this would complicate things in the tag manager - if re-parsing happens in the main parser, we clear the tag list and start over. However, if such a thing happened within a subparser, we'd have to have some way to clear just the part of the tags generated by the subparser while leaving the tags generated by the main parser and we currently don't have anything like that in place.

Personally I'd just sacrifice re-parsing in this case (i.e. when using flex running cpp parser as the subparser) to simplify things. If I remember correctly, re-parsing happens quite rarely, at least when I tried the linux kernel sources.

Show resolved Hide resolved ctags/main/lcpp.c
Show resolved Hide resolved ctags/parsers/python.c Outdated
Show resolved Hide resolved ctags/parsers/rest.c Outdated
Show resolved Hide resolved ctags/parsers/asciidoc.c Outdated
@techee

This comment has been minimized.

Copy link
Member Author

commented Dec 10, 2018

@b4n Thanks for the patches, I've added them to this pull request and also removed the currently unused code from asciidoc and rest parsers.

I guess it's kind of safe to assume now that there won't be too many changes in this pull request, right? If so (and if I have some time left this week), I can start checking what changes have been made in the current ctags master in the last two years which we might need to have all the necessary stuff from main so we can just take over parsers from ctags.

@techee

This comment has been minimized.

Copy link
Member Author

commented Dec 10, 2018

I had a brief look at what needs to be done so we can grab ctags parsers - the history of the Python parser summarizes it quite nicely:

https://github.com/universal-ctags/ctags/commits/master/parsers/python.c

There seem to have been some refactorings/renames in main we'll need to get but there doesn't seem to be anything major fortunately. I'll make a pull request to make these changes first so parser syncing can start.

Question: there seem to be quite massive changes in the lregex.c/h upstream and this one isn't so trivial to merge because we use GRegex and ctags uses GNU Regex. Syncing this would mean taking care of the different semantics of the two regex libraries which might be quite a lot of work. What if we just removed support of regex-based parsers from Geany and #if0'd the bodies of all functions from lregex.c? Currently we use regex parsers for HTML, Cobol and ActionScript - for HTML there's a token-based parser upstream we can use which would mean we'd just kill parsing support for Cobol and ActionScript. I may be biased but I don't think these particular languages are so important to justify all the work related to the regex-based parsers.

@techee

This comment has been minimized.

Copy link
Member Author

commented Dec 17, 2018

Just for info, I've synced ctags to the state where the majority of parsers can just be copied from upstream ctags without any modification. The code is here

https://github.com/techee/geany/commits/ctags_sync3

and I'll convert it to a proper pull request once this pull request is merged. As part of this branch I used the upstream versions of html/jscript/css parsers to be able to debug subparser support and it seems to be working pretty well now after some patches.

@kugel-

This comment has been minimized.

Copy link
Member

commented Dec 17, 2018

BTW: We should consider using git-subtree for future updates. This allows us to track an upstream git repository while having local changes. Then updating becomes a simple merge.

Avoid warnings about unused variables in rest and asciidoc parsers
We don't use the return value of getNestingLevel(), only its
nestingLevelsPop() semantics. Get rid of the return value and rename the
function to avoid confusion.

@techee techee force-pushed the techee:ctags_sync_main branch from e47612b to 457658d Dec 17, 2018

@b4n

This comment has been minimized.

Copy link
Member

commented Dec 17, 2018

I guess it's kind of safe to assume now that there won't be too many changes in this pull request, right?

yeah, should be pretty close to merge.

Question: there seem to be quite massive changes in the lregex.c/h upstream and this one isn't so trivial to merge because we use GRegex and ctags uses GNU Regex. Syncing this would mean taking care of the different semantics of the two regex libraries which might be quite a lot of work. What if we just removed support of regex-based parsers from Geany and #if0'd the bodies of all functions from lregex.c? Currently we use regex parsers for HTML, Cobol and ActionScript - for HTML there's a token-based parser upstream we can use which would mean we'd just kill parsing support for Cobol and ActionScript. I may be biased but I don't think these particular languages are so important to justify all the work related to the regex-based parsers.

Hum… on one hand it's kind of sad not to allow regex parsers because some u-ctags parser use it (12 built-in ones apparently), and it makes creating a basic parser easier; on the other I already said that I'd rather not have an additional copy of the GNU regex lib and I understand that having a GRegex module is trickier. Ideally we'd submit an alternative GRegex-based engine upstream and keep it up-to-date over there.

Anyway, for the case at hand: I don't really like dropping support for languages we have, but me being myself I'll give a shot at translating those parsers as "real" ones, which should make them faster as well, and unless they have highly complicated regexes, it should be fairly easy.
So I'd say you can for now assume we can at least temporarily drop lregex, and we'll see what we can do after.

@b4n

b4n approved these changes Dec 17, 2018

@techee

This comment has been minimized.

Copy link
Member Author

commented Dec 17, 2018

Hum… on one hand it's kind of sad not to allow regex parsers because some u-ctags parser use it (12 built-in ones apparently), and it makes creating a basic parser easier; on the other I already said that I'd rather not have an additional copy of the GNU regex lib and I understand that having a GRegex module is trickier. Ideally we'd submit an alternative GRegex-based engine upstream and keep it up-to-date over there.

Doesn't the regex syntax differ between them? If so, it would mean also to support different regexes for all the regex-based parsers which is probably not going to happen.

Anyway, for the case at hand: I don't really like dropping support for languages we have, but me being myself I'll give a shot at translating those parsers as "real" ones, which should make them faster as well, and unless they have highly complicated regexes, it should be fairly easy.

The diff against ctags looked really scary but maybe if I just took the uctags version and GRegex-ified it, it might not be so bad. I just wasn't particularly motivated by the languages we use - ActionScript (which is used by Flash and is about to die) and Cobol which might be used by some legacy mainframe software but that's about it. I also don't think it's worth writing proper parsers for these. It might be different if we want to use more regex-based parsers from uctags but my current thinking is something like: "If a language is popular enough, it probably has a proper parser in uctags and then we should support it in Geany. If the language is not popular enough to have an uctags parser, there's no need to support it in Geany".

So I'd say you can for now assume we can at least temporarily drop lregex, and we'll see what we can do after.

This is what I did in the ctags_sync3 branch.

@b4n b4n merged commit 457658d into geany:master Dec 17, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

b4n added a commit that referenced this pull request Dec 17, 2018

Merge pull request #1263 from techee/ctags_sync_main
First part of syncing with Universal-CTags.
@b4n

This comment has been minimized.

Copy link
Member

commented Dec 17, 2018

Doesn't the regex syntax differ between them?

Hum, good point, it might. Although, it probably is "mostly" the same if using the same extension mode. IIRC GRegex is using gnu regex, but maybe modified.

The diff against ctags looked really scary but maybe if I just took the uctags version and GRegex-ified it, it might not be so bad.

That'd be a good surprise ^^

I just wasn't particularly motivated by the languages we use - ActionScript (which is used by Flash and is about to die) and Cobol which might be used by some legacy mainframe software but that's about it.

I'd be more interested in Cobol than ActionScript, but yeah both are probably niches in 2018.

I also don't think it's worth writing proper parsers for these.

Meh, were would the fun be then? :(

It might be different if we want to use more regex-based parsers from uctags but my current thinking is something like: "If a language is popular enough, it probably has a proper parser in uctags and then we should support it in Geany. If the language is not popular enough to have an uctags parser, there's no need to support it in Geany".

Makes some sense. And actually, the largest use case for the regex engine in ctags is custom language definitions on the CLI/options file, and we don't support that, so it's no so useful.

@b4n

This comment has been minimized.

Copy link
Member

commented Dec 17, 2018

So… more than 2 years after, this is merged! Thanks, and we're ready for more :)

@elextr

This comment has been minimized.

Copy link
Member

commented Dec 19, 2018

I'd be more interested in Cobol than ActionScript, but yeah both are probably niches in 2018.

@b4n, Cobol at least would seem to be still alive, see #2021

And anyway whats wrong with ctags using a different regex library, its something we link to, and its a dynamic dll, not as if its anything inside Geany itself. Hell, its probably already in memory for most Linuxen.

@DansLeRuSH

This comment has been minimized.

Copy link

commented Dec 19, 2018

Hi everyone !

If I may, and in my humble opinion, it would be a mistake to drop support for COBOL :

  • Huge heritage of code lines (220 out of 310 billion code lines developed worldwide, last year's estimate)
  • There are still many companies on the mainframe (including banks and insurance companies)
  • Geany is one of the few to take it into account and it is an argument
  • Companies running on AIX mainframe solutions (Linux) need a local IDE
  • On my last three missions, I managed to replace Notepad+++ by Geany (integration of the compilation exe on F5), especially with the filedef I provide you today

So yes I know that COBOL isn't modern, sexy but it is widely used. And in this case, why not also abandon Assembler, Fortran or Pascal ? (and I spare you the REXX filedef I developed) 😉

[ Source ]

@elextr

This comment has been minimized.

Copy link
Member

commented Dec 19, 2018

@DansLeRuSH its great that you have contributed filetype improvements. But if Cobol is so useful then Cobol users should step up and provide support for the parser.

Geany is a totally volunteer project, its is not supported by any companies. Those banks and companies you mention should be able to have their people contribute if they want support to continue (sadly I suspect they all use Eclipse).

The suggestion to abandon Cobol is because something needs to be done to continue support, and in a volunteer project, if no one contributes that, then support can not continue.

I'm not an expert on whats needed, but IIUC the problem is that the Cobol parser uses a different Regex library from the rest of Geany, and, as I asked above I'm not sure why thats an issue, and what is needed to make it work. Perhaps open a specific issue to discuss it if its an issue [pun intended].

@DansLeRuSH

This comment has been minimized.

Copy link

commented Dec 20, 2018

@DansLeRuSH its great that you have contributed filetype improvements.

You're welcome, I love Geany and if I could help 😊

But if Cobol is so useful then Cobol users should step up and provide support for the parser.

You're absolutely right ! I told so to my past and actual colleagues.
Perhaps improving actual possibilities could help 🤔

Geany is a totally volunteer project, its is not supported by any companies. Those banks and companies you mention should be able to have their people contribute if they want support to continue

The main problem here is that those companies love to give a lot of money to historical software suppliers, have no trust in open source projects and there's a gap between CIO/supplier trade agreements and developpers needs. But yes I'm agree with you.

(sadly I suspect they all use Eclipse).

You're partly right if it's a classical mainframe on z/OS. The problem here is that IBM has built a solution on top of Eclipse (IDz) but it's not open source and really expensive, small companies can't afford that.

The suggestion to abandon Cobol is because something needs to be done to continue support, and in a volunteer project, if no one contributes that, then support can not continue.

I'm aware of that and above all, thank you for the hard work !

I'm not an expert on whats needed, but IIUC the problem is that the Cobol parser uses a different Regex library from the rest of Geany, and, as I asked above I'm not sure why thats an issue, and what is needed to make it work. Perhaps open a specific issue to discuss it if its an issue [pun intended].

I'm a proud daddy and I haven't so much time but I could help sometimes If you need 🖖 🤓

@elextr

This comment has been minimized.

Copy link
Member

commented Dec 20, 2018

@DansLeRuSH just a final comment on this closed PR, when we say Cobol is removed, what we mean is that support for symbols is removed, but editing highlighting etc is still supported. The filetype is still available.

@elextr

This comment has been minimized.

Copy link
Member

commented Dec 20, 2018

The filetype is still available.

Which means the changes in #2021 to the Cobol filetype are still useful since they apply to the highlighting and other non-parser functionality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.