Use cxx parser from uctags #3032
This is to avoid a clash with cpreprocessor.c/h used by the new cxx parser. Merging lcpp.c/h with cpreprocessor.c/h would be difficult (at least for now) because of the differences in c.c, so keep them separate.
As a result, when we copy the new cxx parser, we don't have clashes of these symbols from the two different parsers.
This patch only makes the parser compile; it doesn't enable it yet.
There are several things needed for this:
1. The new preprocessor has to be defined as a separate parser.
2. Tags from the new c/c++ parsers and the preprocessor parser have to be mapped to Geany types. We still need to keep the old mappings because some parsers like Ferite or GLSL still use the old C parser.
3. Anonymous tags have a different name so we have to reflect this in tm_tag_is_anon().
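Point 3 can be illustrated with a minimal sketch. It assumes the two naming schemes quoted in this pull request (`anon_struct_1`-style names from the old parser, `__anon1`-style names from the new one). `is_anon_name()` and `all_digits()` are hypothetical helpers written for illustration; this is not the actual body of tm_tag_is_anon().

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `s` is a non-empty string of decimal digits. */
static bool all_digits(const char *s)
{
	if (*s == '\0')
		return false;
	for (; *s != '\0'; s++)
		if (!isdigit((unsigned char)*s))
			return false;
	return true;
}

/* Hypothetical helper (not Geany's tm_tag_is_anon()): accepts both the
 * new-style "__anon<N>" and the old-style "anon_<kind>_<N>" names. */
static bool is_anon_name(const char *name)
{
	/* New scheme assumed from this PR: "__anon" followed by digits. */
	if (strncmp(name, "__anon", 6) == 0)
		return all_digits(name + 6);

	/* Old scheme assumed from this PR: "anon_", a lowercase kind word,
	 * an underscore, then digits (e.g. "anon_struct_1"). */
	if (strncmp(name, "anon_", 5) == 0)
	{
		const char *p = name + 5;
		const char *kind_start = p;

		while (islower((unsigned char)*p))
			p++;
		if (p == kind_start || *p != '_')
			return false;
		return all_digits(p + 1);
	}
	return false;
}
```

Name-based detection like this is only a fallback; as noted later in the thread, the information ctags itself provides about anonymous tags would be a more reliable source.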
The changes are mostly these:
1. Spaces in function argument list
   (int var1, int var2, ...) - before
   (int var1,int var2,...) - now
2. Anonymous tags
   anon_struct_1 anon_union_2 anon_typedef_3
   vs
   __anon1 __anon2 __anon3
3. Improved parsing of the new parser
919a234 to 1f03474
Just repushed, there was a problem with one unit test. |
Wow 77 files changed, although some are tests. |
Yeah, this plus the new cxx parser consisting of about 30 files. |
Hmmm, well, I can't tell any difference from the old one, are you sure it's being used? None of the features here seem to work, maybe Geany needs more to support them? |
Also following error was output on terminal:
I will look for it tomorrow and try to make an MRE. [Edit: which I guess proves it's being used :)] |
These tag kinds are currently not mapped to anything in Geany but better to make sure the parser works correctly before adding more stuff (I did the same with other parsers too):
Have you tried some tricky file that produced wrong output before with the new parser?
I'll have a look at this. |
I created a bug report regarding the warning here: It should be harmless though. |
You can use is |
Yep, I'm aware of that and I want to address that in the future. The problem is that our internal representation of tags Line 87 in eabc09a
doesn't contain a field where this information could be stored and adding more fields is an ABI change because this structure is accessible to plugins so better avoid that. But I think we could use the |
It's ok to add fields at the end of an ABI structure, just don't change the order of any existing ones. |
It wouldn't matter if you store pointers only to TMTag. But when you have values like
the size of TMTag increases with a new field and things break. |
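The general C point behind this exchange can be sketched as follows. `TagV1` and `TagV2` are hypothetical two- and three-field structs, not the real TMTag layout. Appending a field leaves the offsets of the existing fields alone, so code that only dereferences pointers keeps working, but `sizeof` grows, so anything that stores tags by value and was compiled against the old size would compute the wrong stride.

```c
#include <stddef.h>

/* Hypothetical "before" layout; NOT the real TMTag. */
typedef struct {
	char *name;
	unsigned long line;
} TagV1;

/* Same struct with one field appended at the end. */
typedef struct {
	char *name;
	unsigned long line;
	char *extra;	/* hypothetical new field */
} TagV2;

/* Existing fields keep their offsets when a field is appended, so
 * access through a pointer to a single tag is unaffected. */
static int existing_offsets_match(void)
{
	return offsetof(TagV1, name) == offsetof(TagV2, name) &&
	       offsetof(TagV1, line) == offsetof(TagV2, line);
}

/* Byte offset of element `index` in an array of tags stored by value.
 * Code compiled against the old elem_size lands on the wrong address
 * once the library uses the bigger struct. */
static size_t element_offset(size_t elem_size, size_t index)
{
	return elem_size * index;
}
```

Whether this matters in practice depends on whether any consumer actually holds tags by value, which is what the following comments work out.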
I've just pushed fix for the warning from |
I guess you mean a pointer to an array is passed from TM and used by plugins, not that plugins allocate That's going to be a bit of a problem when languages that are not C need additional data, sigh, add it to the TM hate list. Since it was stolen from Anjuta, maybe we should steal their Sqlite replacement :) |
Anyway crisis temporarily averted. |
I actually meant TMTags allocated by plugins e.g. on stack but I was wrong - in that case it's not a problem so it shouldn't be theoretically a problem to add more fields.
No way, I think our TM is pretty good now (many parts rewritten/removed by me and Colomban) and storing tags to a database doesn't solve anything for us. |
@masatake Maybe one question - currently we have scope separators hard-coded for individual languages here: geany/src/tagmanager/tm_parser.c Line 764 in d8f2f14
Is it possible to get this information from ctags? |
That's outlawed by HACKING for exactly this reason. |
To answer myself after looking at several parsers: no, these are hard-coded in parsers themselves in many cases. Maybe it's possible for those using ATTACH_SEPARATORS() but it's just some parsers. |
Sorry to be late. In tags output, you can see separator definitions:
If you need improvement in this area, please, make an issue at u-ctags. |
I'm reading mini-geany.c. However, there is no way to access 'parserDefinition' where a default scope separator is defined. How about following interface:
Of course, a parser defines separators explicitly. But it is a bit different topic. |
@masatake Thanks for your detailed explanation. I don't think we really need any special API for that, I was just asking in case we were doing something stupid unnecessarily :-) Those multiple scope separators in PHP are quite unpleasant though - our code currently assumes there's a single type of scope separator per language. I guess the easiest way for us will be to rewrite tags we receive from ctags to have a single type of scope separator. Is there any other language that uses multiple types of scope separators? (I'm not sure if multiple types of scope separators are a good idea even if the given language uses these for certain kinds - I think tools processing ctags files won't be very happy they have to deal with this.) I also started realizing we'll have to take "roles" into account which we currently ignore - in the go compiler for instance the role is used to distinguish between the package of the file and imported packages which are 2 different things and confuse our code:
Am I right to assume that
|
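The idea mentioned above of rewriting the tags received from ctags so that each language ends up with a single scope separator could look roughly like the sketch below. `normalize_scope()` is a hypothetical helper, not existing Geany or ctags API, and the pair of PHP separators used in the usage note (`\` and `::`) is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `scope` into `out`, replacing every
 * occurrence of any separator in `seps` with `target`.
 * Returns 0 on success, -1 if `out` is too small. */
static int normalize_scope(const char *scope,
                           const char *const *seps, size_t n_seps,
                           const char *target,
                           char *out, size_t out_size)
{
	size_t o = 0;
	size_t tlen = strlen(target);

	assert(out_size > 0);
	while (*scope != '\0')
	{
		size_t matched = 0;

		/* Prefer the longest separator matching at this position. */
		for (size_t i = 0; i < n_seps; i++)
		{
			size_t len = strlen(seps[i]);
			if (len > matched && strncmp(scope, seps[i], len) == 0)
				matched = len;
		}
		if (matched > 0)
		{
			if (o + tlen >= out_size)
				return -1;	/* no room for target plus NUL */
			memcpy(out + o, target, tlen);
			o += tlen;
			scope += matched;
		}
		else
		{
			if (o + 1 >= out_size)
				return -1;	/* no room for char plus NUL */
			out[o++] = *scope++;
		}
	}
	out[o] = '\0';
	return 0;
}
```

With separators `{ "\\", "::" }` and target `"::"`, a scope such as `App\Models\User::save` would come out as `App::Models::User::save`, so the rest of the code can keep assuming one separator per language.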
@masatake After thinking about it for a while, is it actually a good idea to distinguish between imports and package definitions using a role in go (and possibly some other languages)? The way I understand semantics of roles is that they further distinguish the nature of a single kind but if this extra information is ignored, nothing bad happens. For instance in Python I think the following is alright because it just distinguishes different types of imports
But package definitions and imports of other packages are two fundamentally different things and IMO there should rather be two different kinds in go for these. Note: Take the above as a note of someone who is lazy to do the extra work in Geany and possibly just tries to figure out how to delegate some work to others :-). What I wrote above makes sense in my (very biased) brain but whatever is implemented should primarily be a good thing for ctags. |
OK, understood, I think no API is needed, I just converted
It's true I misunderstood roles, the problem is still present though - I just created a new issue against ctags here universal-ctags/ctags#3211 so we don't pollute this thread. |
@elextr What are the tags you are missing with the new parser? Is it the local variables? It is the 'l' kind you can map e.g. to tm_tag_variable_t and then you get all local variables parsed (and everything works - sidebar, going to tag definition/declaration, scope completion etc.). I just think that maybe showing local variables in the sidebar is a little verbose. I think we should improve our tag mapping capabilities for the sidebar and also improve the maintainability of the code by moving the mappings from Line 459 in 808b7a3
to tm_ctags.c next to the ctags mappings so they are side by side and easier to manage. It should be also made possible to disable some kinds for the sidebar while keeping them for the rest of what we use tags for like going to tag definitions, declarations etc. One example might be the local variables (but I don't use the sidebar myself much and maybe we want them) but it is also possible to get tags for function parameters - while these can be good for autocompletion or jumping in the code, we don't want to show them separately in the sidebar as they are already part of function prototypes. |
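A rough sketch of what such a combined mapping might look like, with a per-kind flag that keeps a kind available for navigation and completion while hiding it from the sidebar. Everything here is hypothetical: `TagType` is a simplified stand-in for Geany's tag types, `KindMap` and `lookup_kind()` do not exist in Geany, and the `'z'` letter for parameters is an assumption about the cxx parser's kinds (the `'l'` letter for locals is the one mentioned above).

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for Geany's tag types; the real enum is larger. */
typedef enum {
	TAG_UNDEF,
	TAG_FUNCTION,
	TAG_VARIABLE,
	TAG_LOCAL_VAR,
	TAG_PARAMETER
} TagType;

/* Hypothetical mapping entry: ctags kind letter, internal type, and
 * whether tags of this kind should appear in the sidebar. */
typedef struct {
	char kind;
	TagType type;
	bool in_sidebar;
} KindMap;

static const KindMap c_kinds[] = {
	{ 'f', TAG_FUNCTION,  true  },
	{ 'v', TAG_VARIABLE,  true  },
	{ 'l', TAG_LOCAL_VAR, false },	/* parsed, but hidden in the sidebar */
	{ 'z', TAG_PARAMETER, false },	/* assumed letter for parameters */
};

/* Returns the entry for `kind`, or NULL if the kind is not mapped. */
static const KindMap *lookup_kind(char kind)
{
	for (size_t i = 0; i < sizeof c_kinds / sizeof c_kinds[0]; i++)
		if (c_kinds[i].kind == kind)
			return &c_kinds[i];
	return NULL;
}
```

Keeping the kind letter, the internal type and the sidebar decision in one table is what would make the mappings "side by side and easier to manage".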
#define map_CPP map_C
# define COMMON_C_NEW_PARSER \
{'h', tm_tag_undef_t}, \
{'l', tm_tag_undef_t}, \
@elextr If you want to experiment, map the 'l' kind here to tm_tag_variable_t and you'll get all local variables parsed.
The syntax is slightly different from the previous syntax and is described here: https://docs.ctags.io/en/latest/parser-cxx.html Basic usage should be the same, uctags just doesn't support Geany's wildcard ignores like G_GNUC_*. On the other hand the new parser is much more resilient to macros so there shouldn't be so much need for manual ignores. The original code is still kept for parsers from c.c that still use the old preprocessor.
I just added a patch to pass our |
# Conflicts: # ctags/Makefile.am
I just resolved the conflict (only Makefile.am) for this pull request, #3031 and #3035. All 3 pull requests are more or less ready to be merged from my perspective. For the cxx parser here, I'd just point out 2 things to decide:
|
Will test soon(ish).
This is one of the downsides of following closely upstream, and as you say, not much we can do
Please. |
Seems to work.
Hmmm, I see what you mean, looking at the typedefs at the start of tm_parser.c the struct is shown as I enabled local variables as you suggested (why I was editing But not all (see example below) :-(. What about function parameters? These two will help with many simple cases to show options for But it's raising the question of how far should we try to go? Without But even with all of those, the trend to inferred types is going to be a big limitation (ok, not for C but for C++ and for other languages that infer types). For example (showing examples of STL includes and scope and inference):
#include <utility>
#include <string>
using namespace std::string_literals;
void f() {
    // the type returned from make_pair is inferred from its parameters as std::pair<int, std::string>
    // the types of i and s are inferred by decomposing that return type
    auto [i, s] = std::make_pair(1, "abc"s);
    // as above but assigning the pair undecomposed
    auto p = std::make_pair(1, "abc"s);
    s.si????
    p.se????
}
If I do ctrl+space at the ????s what should autocomplete show (answer "size" and "second"). But in fact i and s are not parsed as locals currently so "si" gets me tons of Linux signal symbols, and although p is parsed, because it is inferred, ctags gives it no type and no scope autocompletion, so it gets a motley mixture of "se" symbols from C!!!. Still, being able to get explicitly declared locals and parameters would be a big help with Geany 😄 |
The disconnection between typedefs and e.g. anonymous enums isn't something new - it was already present in the old parser. The new thing here is that the name lacks the information about what kind of anonymous type it is. For instance, before, you'd get If it's not a problem, I would open a separate pull request for this - it's not just a simple renaming of tag names, the problem is that these anonymous tags may appear in scopes as parents of other tags and then we have to update the scope information too. Also we should start using the information from ctags to detect whether a tag is anonymous and not depend on the detection based on just tag name.
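To show why renaming anonymous tags also means touching scopes, here is a minimal sketch that rewrites one component of a scope string. `rename_scope_component()` is a hypothetical helper, and the example names follow the naming schemes listed earlier in this pull request. A real implementation would also need to know each anonymous tag's kind and would have to walk all tags of a file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `scope` into `out`, replacing every scope
 * component that is exactly `old_name` with `new_name`. Components are
 * delimited by `sep`. Returns 0 on success, -1 on error. */
static int rename_scope_component(const char *scope, const char *sep,
                                  const char *old_name, const char *new_name,
                                  char *out, size_t out_size)
{
	size_t sep_len = strlen(sep);
	size_t old_len = strlen(old_name);
	size_t o = 0;
	const char *p = scope;

	if (sep_len == 0 || out_size == 0)
		return -1;
	for (;;)
	{
		const char *end = strstr(p, sep);
		size_t comp_len = end ? (size_t)(end - p) : strlen(p);
		const char *src = p;
		size_t src_len = comp_len;

		/* Only whole components are renamed, so "__anon10" is not
		 * touched when renaming "__anon1". */
		if (comp_len == old_len && strncmp(p, old_name, old_len) == 0)
		{
			src = new_name;
			src_len = strlen(new_name);
		}

		size_t need = src_len + (end ? sep_len : 0);
		if (o + need >= out_size)
			return -1;	/* keep room for the trailing NUL */
		memcpy(out + o, src, src_len);
		o += src_len;
		if (!end)
			break;
		memcpy(out + o, sep, sep_len);
		o += sep_len;
		p = end + sep_len;
	}
	out[o] = '\0';
	return 0;
}
```

For instance, renaming `__anon1` to `anon_struct_1` in the scope `Outer::__anon1::member` would give `Outer::anon_struct_1::member`, and the same rewrite has to be applied to every tag whose scope mentions the renamed tag.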
Neither our scope completion nor jumping to tag definition/declaration is ready for enabling local variables yet - there will be too many false positives. As you said, we can't fully rely on the scope information because we don't know what files get parsed and I think we'll have to find some hybrid approach where we use the scope information in some cases and not in others.
They should eventually get enabled too, but again, we aren't ready for them yet. We primarily need to improve our ability to decide what gets shown in the symbol tree and how. For instance, while we want the parameters to be parsed by ctags, we don't want to see them below the function name in the sidebar - those would be confused with local variables. Once all the currently pending parser pull requests are merged, I'd like to move the tree mappings from symbols.c to tm_parsers and improve their flexibility and also simplify maintenance and adding new parsers. For instance for makefiles the definition would look as follows:
|
Ok, I had "autocomplete symbols" off and it seems ctrl+space won't scope autocomplete, just name autocomplete, I didn't realise that. Both scope and name work with it on. Scope autocomplete works for locals too 👍 (eg typing map-> at tm_parser.c:674) but as you pointed out also works outside the scope of the local 😞
Well technically parameters are more correct than locals since they are visible for the whole of the function whereas locals are only visible from the declaration to end of scope. Neither Vscode nor Eclipse shows locals or parameters in their symbols for functions, which sort of makes sense, you can't scope specify them (ie can't do funct_name::local_name or funct_name::param_name) but both show their declarations in a tooltip on hover (which I'm still ambivalent on, it can be annoying). And the locals and parameters names are styled differently, but that's not something we can do because Scintilla. Anyway this is just experimenting, not part of this PR, which so far seems to work, I would be happy for it to be merged so it gets more testing. |
Yeah, both of them should be disabled for the sidebar.
👍 |
Docs LGTM |
If nobody raises a problem or beats me to it will squish and merge "soon" |
Yeah :-). Anyway, I really think that all the parser pull requests should be merged soon (notice - no apostrophes) to get some testing. If there are problems, we can either report them upstream and get a fixed version or just revert back to the old parser - that should be mostly trivial. The only parser I'd like to have a look at first is #3034 and test it on Raspberry Pi to see how it works on slower hardware. |
@elextr One more thing: once this pull request is merged, it will be possible to use the upstream ASM parser that depends on the new cxx preprocessor this pull request adds (the parser would have been included in the "parsers with big changes" pull request otherwise if there weren't this dependency). Should I create a separate pull request for that or add it e.g. into #3035? |
Just the weekend in case somebody has a problem, really, truly, honestly ... 😄
I would, small is good, especially in pull requests. |
Just a gentle reminder - the weekend is over, another weekend is coming :-). |
Well, I didn't say which weekend did I?? 😀 Just ran out of time, hopefully soon. |
There ya go, just had to paranoid re-check it still worked for me. 😄 |
This pull request switches us to the new cxx parser for C and C++.
There is nothing really special worth noting here apart from the
change in how anonymous tags are named in the new parser. In the
old parser we had e.g.
which are now called just
The question is whether we should do something about that or just keep it this way (it should be possible to rename the tags in our code, the only problem is that we have to go through all the tags and apart from tag names also update scope names).
Fixes #2349, fixes #1314, fixes #1249, fixes #1944, fixes #2916