New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve handling of anonymous tags #3059
Conversation
Its an API addition so bump it, increasing APIs is no problem, the previous field was uncommented so it should not have been used by plugins and its not an ABI change so long as it remains the same field type. |
Good point, done. |
I havn't looked at the details yet. But points 1 and 2 sound sane. Points 3 and 4 sound useful so long as the kind1 is right. Point 5 would be a good improvement, the disconnect between the struct and the typedef was annoying. What happens with this if a declaration that has both the struct name and the typedef? Footnotes
|
src/tagmanager/tm_parser.c
Outdated
|
||
if (lang == TM_PARSER_C || lang == TM_PARSER_CPP) | ||
{ | ||
return sscanf(name, "anon_%*[a-z]_%u%c", &i, &dummy) == 1 || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does sscanf really understand regexes? This is also new over the old code in tm_tag_is_anon()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code actually was there before the new cxx parser was merged. In the cxx pull request I changed it to __anon
detection only but then thought there might be some old tag files around so better to use the previous checks too (plus now we get anonymous tags in this format again). Since it worked before, I guess it's OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree
src/tagmanager/tm_workspace.c
Outdated
@@ -695,7 +695,7 @@ static void fill_find_tags_array_prefix(GPtrArray *dst, const GPtrArray *src, | |||
for (i = 0; i < count && num < max_num; ++i) | |||
{ | |||
if (tm_parser_langs_compatible(lang, (*tag)->lang) && | |||
!tm_tag_is_anon(*tag) && | |||
!((*tag)->flags & tm_tag_flag_anon_t) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the this is not an improvement. Might keep tm_tag_is_anon()
and reduce its implementation to just test the flag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you are right, looking at it now side by side, this is much less readable. Will fix.
@@ -51,6 +51,3 @@ | |||
main_diff = set(main_src_files) - set(main_dst_files) | |||
if main_diff: | |||
print('Files added to main: ' + str(main_diff)) | |||
|
|||
os.chdir(dstdir) | |||
os.system('patch -p1 <ctags_changes.patch') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yay
src/tagmanager/tm_ctags.c
Outdated
/* Temporarily store kind letter for anon name generation to flags | ||
* as we don't have any other place where to store it. This will | ||
* be cleared later in rename_anon_tags() */ | ||
tag->flags |= (guint)kind_letter << 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you just add a member to TMTag
instead of this workaround?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure if it was OK to add new members to TMTag when it's exposed to plugins. I guess ABI should be bumped in this case?
But I think having the original ctags kind letter would be useful to have in TMTag because once we map it to our TMTagType
representation, there's no way back to get the original kind. This might be needed if we decide to generate tags files in the ctags format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was discussed elsewhere, appending members is generally OK without bumping the ABI.
But I wouldn't fear ABI bumps anyway. For most users those go unnoticed because they just get their recompiled plugins from their distro, it's just a minor inconvenience for those who compile the plugins themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kugel- is right, add the members to the end and its not necessary to bump the ABI.
The ABI problem really only manifests itself when a distro releases Geany with a new ABI, but not the plugins at the same time (Debian has done that before).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since then the Debian package implements a technical barrier, where plugins depend on the exact ABI version (so in practice its not possible to update the geany package to a conflicting version, without also updating the plugins).
src/tagmanager/tm_ctags.c
Outdated
if (!new_name) | ||
{ | ||
gchar buf[50]; | ||
guint anon_counter = GPOINTER_TO_UINT(g_hash_table_lookup(anon_counter_table, kind_name)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a hash_table is a overweight here. Since kind
is in the range 0...255 a simple array would sufficie
guint anon_counter = anon_counter_table[kind]++;
Alternatively, you could use a hash table that uses kind
as the key and not a more expensive hash of kind_name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will change. I was also thinking I could allocate the removed_typedefs
GPtrArray lazily if needed so it isn't allocated every time (and unnecessarily for most languages).
|
||
strncpy(str, nested_tag->scope, prefix_len); | ||
strcpy(str + prefix_len, new_name); | ||
strcpy(str + prefix_len + new_name_len, pos + orig_name_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such string building could use asprintf()
(not requesting a change here; also I see that GLIB only provides g_vasprintf()
but not g_asprintf()
, very strange)
@@ -11,6 +9,8 @@ Named5 | |||
Named6�2�Constants�0 | |||
Named7�2�Constants�0 | |||
a�4�Enum#1�0 | |||
anon_enum_1�2�Constants�1 | |||
anon_enum_2�2�Constants�1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the new naming is really appreciated by Fortran users? I guess we'll see ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. But the new naming is now universal for all languages and we won't need to handle each of them separately. Hopefully none of the 2 users who use Geany for Fortran coding will complain ;-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, lets see if there's any (negative) feedback.
sscanf(name, "Interface#%u%c", &i, &dummy) == 1 || | ||
sscanf(name, "Enum#%u%c", &i, &dummy) == 1; | ||
} | ||
return FALSE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unit test files for java script indicate that the naming is changed. Why is that not handled here (I know it also wasn't in the old tm_tag_is_anon()
but I would like to understand why)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically we should now always get the anon flag in our tags file (notice the 1
at the end) so it should be needed only for legacy tags files. But it's true we should probably check for the __anon
name in case we read tags in the ctags file format and this file is generated without the E
field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what about existing tag files for java script?
-AnonymousFunction2�16�(n)�class3.c3m1�0
+anon_function_1�16�()�class2.c2m1�1
This in fact is the ctags 'kind' but it should be the kind as ordinary humans understand it. For instance, when you check the cxx parser here in the 3rd column, it is what you would expect to see: geany/ctags/parsers/cxx/cxx_tag.c Line 42 in 5ea3e3e
Both will be kept. This pull request addresses just the anonymous tags and in this case the struct isn't anonymous. |
@kugel- Thanks for the review! I'll go through the points now. |
src/tagmanager/tm_ctags.c
Outdated
const gchar *kind_name = tm_ctags_get_kind_name(kind, tag->lang); | ||
|
||
if (!anon_counter_table) | ||
anon_counter_table = g_new0(gchar, 256); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err, well, I thought of simply placing the array on the stack (at the function begin). I'm also not sure if 8 bit counter will suit everyone. I think generated sources (from a large vala file or someting like that) could easily overflow 256 anon structs.
I think a gint anon_counter_table[256]
should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but then you have to zero it manually (which will be very fast but you will have to do it every time instead of lazily like now). No problem changing it but I think it's about the same like the current code.
The gchar
is definitely my error, I got confused by that 256 value, will correct that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strong about array vs. g_new0, do as you chose (hint: zero-initialization of arrays: gint foo[N] = {0}
)
src/tagmanager/tm_parser.c
Outdated
|
||
if (lang == TM_PARSER_C || lang == TM_PARSER_CPP) | ||
{ | ||
return sscanf(name, "anon_%*[a-z]_%u%c", &i, &dummy) == 1 || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree
@@ -11,6 +9,8 @@ Named5 | |||
Named6�2�Constants�0 | |||
Named7�2�Constants�0 | |||
a�4�Enum#1�0 | |||
anon_enum_1�2�Constants�1 | |||
anon_enum_2�2�Constants�1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, lets see if there's any (negative) feedback.
sscanf(name, "Interface#%u%c", &i, &dummy) == 1 || | ||
sscanf(name, "Enum#%u%c", &i, &dummy) == 1; | ||
} | ||
return FALSE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what about existing tag files for java script?
-AnonymousFunction2�16�(n)�class3.c3m1�0
+anon_function_1�16�()�class2.c2m1�1
I would maybe just keep it unhandled - this bug has been there for some time already and nobody has noticed. In addition, I think PHP parser also generates anonymous tags (they just aren't captured by our unit tests) and we didn't detect them either. |
I think I can go with that. What happens if we read such tag files? Some symbols might not be marked as anonymous, anything else? |
That's pretty much it. The biggest problem this can cause is that they'll appear in the autocompletion popup where they shouldn't be normally shown because this name doesn't really exist in the code. |
@@ -94,14 +101,15 @@ typedef struct TMTag | |||
TMSourceFile *file; /**< File in which the tag occurs; NULL for global tags */ | |||
gulong line; /**< Line number of the tag */ | |||
gboolean local; /**< Is the tag of local scope */ | |||
guint pointerOrder; | |||
guint flags; /**< Additional flags */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This adds Doxygen documentation for the member, which traditionally approves it for plugin use. Is that your intention? Otherwise I would add a simple comment (like with kind_letter) until someone officially requests this member to be part of the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was intentional - plugins should know if a tag is anonymous (I currently don't do it right in my ProjectOrganizer plugin but in Find Tag where you can find a tag by name I should ignore anonymous tags).
I didn't do this with the kind_letter because there I really don't see much reason why plugins should look at it (and in addition, it may not always be present in the TMTag structure, for instance when the tag is read from our custom tag file format).
src/tagmanager/tm_workspace.c
Outdated
@@ -836,7 +836,7 @@ find_scope_members (const GPtrArray *tags_array, const gchar *name, TMSourceFile | |||
|
|||
/* anonymous type defined in a different file than the variable - | |||
* this isn't the type we are looking for */ | |||
if (tm_tag_is_anon(test_tag) && (file != test_tag->file || test_tag->file == NULL)) | |||
if (tm_tag_is_anon(tag) && (file != test_tag->file || test_tag->file == NULL)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This confuses me and Github draws the indentation strangely.
Anyway, I seems wrong to do s/test_tag/tag/, don't you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dammit, I should stop coding today, I start making errors. Sure thing, this is wrong.
Alright. I think some commits should be squashed but overall it looks good to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General approval from my side but I think some later fixup commits could be squashed into the initial ones.
Thanks a lot for your review Thomas, really appreciated! |
I guess you were lucky, I usually don't find much spare time to review other people's valuable work (like yours) while working on my own topics. That's really tragical but that's life I think (with two young children). |
Sadly, the same here, reviewing code just isn't much fun, especially when everyone finds something else interesting and worth the work. I should definitely contribute more in this area myself. (And I don't even have children so no excuses there.) |
So long as it also includes |
Yes, those are used for C++ (and for all other languages they are listed usually at the beginning of the parser). |
@techee Can you cleanup the commits so it can be merged? |
We now use varType instead of pointerOrder and it has been the case for quite and we can use the pointerOrder field for storing something useful. When we store something else inside the renamed pointerOrder, we will break compatibility with some very old tag files but I don't think there are any such files around (also pointerOrder was probably just reported by the C parser). Bump API for the pointerOrder change to flags
Report anonymous tags using XTAG_ANONYMOUS. These anonymous tags aren't generated by the upstream uctags parser so this change hasn't been made available there (will have to be part of some future sync).
The flag is set when XTAG_ANONYMOUS is defined and is used for detecting anonymous tags inside tm_tag_is_anon().
At this point all built in parsers should report anonymous tags correctly so for them we don't need to determine anonymous tags based on tag name. Since tag files might be generated without this information, we still have to determine anonymous tags based on name for them.
Check all the collected tags once a file is parsed (i.e. when we have all tags, including those from subparsers) and renamea them in the form anon_enum_1 anon_struct_1 anon_enum_2 anon_struct_2 where the second component is a ctags kind and the number is per-kind. In addition, scopes of the nested tags have to be updated if the parent tag is an anonymous tag. Finally, for anonymous tags of the form typedef struct Foo{int a;}; we can use the name of the typedef instead of generating the anonymous name. In this case we can drop the typedef tag once the anonymous tag is updated with its name. More details can be found in comments.
Done. |
@kugel- Yay, thanks! |
There are several problems with how we handle anonymous tags which this pull request tries to resolve:
pointerOrder
member of TMTag (see the corresponding commit about more details toflags
and uses one bit to indicate whether a tag is anonymous. This is technically a API change but since pointerOrder always contained 0 and was unused (or maybe used by some super-old parser 15 years back), it doesn't really matter (no g-p plugin uses this field).__anonNUM
- before they were reported as e.ganon_struct_NUM
- i.e. they contained information about the type in their name which made them easier to identify in the symbol tree. This pull request adds back this naming.anon_enum_1
,anon_struct_2
,anon_enum_3
which is a bit strange as it suggests there are 3 enums but there are 2 instead. This pull request makes these numbers per-type so the above becomesanon_enum_1
,anon_struct_1
,anon_enum_2
.typedef struct{int a} Foo;
you would previously get an anonymous tag for the struct under whicha
would be shown and then separately tagFoo
as a typedef. After this patch, you get a tagFoo
for the struct witha
as its member and the anonymous name is dropped which makes things much clearer in the sidebar.And finally, thanks to the fact that we do renaming of anonymous tags ourselves now, the last diff against ctags main can be dropped so we can use the upstream version without any modifications and the patch file isn't necessary any more. Hurray!!!