Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of anonymous tags #3059

Merged
merged 6 commits into from Feb 12, 2022
Merged

Improve handling of anonymous tags #3059

merged 6 commits into from Feb 12, 2022

Conversation

techee
Copy link
Member

@techee techee commented Dec 19, 2021

There are several problems with how we handle anonymous tags which this pull request tries to resolve:

  1. Anonymous tags are currently determined based on how these tags are named by ctags which may not be completely reliable and unnecessary. ctags now has XTAG_ANONYMOUS flag we can query and determine whether a tag is anonymous or not based on that. Our fortran parser didn't report XTAG_ANONYMOUS so it has been updated to do so.
  2. In order to store the information about anonymous tags, this pull request renames the unused pointerOrder member of TMTag (see the corresponding commit about more details to flags and uses one bit to indicate whether a tag is anonymous. This is technically a API change but since pointerOrder always contained 0 and was unused (or maybe used by some super-old parser 15 years back), it doesn't really matter (no g-p plugin uses this field).
  3. With the introduction of the new cxx parser, anonymous tags are reported as __anonNUM - before they were reported as e.g anon_struct_NUM - i.e. they contained information about the type in their name which made them easier to identify in the symbol tree. This pull request adds back this naming.
  4. In addition to (3), the NUM in the previous parser was global for all types - i.e. tags were named as anon_enum_1, anon_struct_2, anon_enum_3 which is a bit strange as it suggests there are 3 enums but there are 2 instead. This pull request makes these numbers per-type so the above becomes anon_enum_1, anon_struct_1, anon_enum_2.
  5. This pull request sets the name of anonymous tag if it's followed by the corresponding typedef tag and the typedef tag is removed. For instance, for typedef struct{int a} Foo; you would previously get an anonymous tag for the struct under which a would be shown and then separately tag Foo as a typedef. After this patch, you get a tag Foo for the struct with a as its member and the anonymous name is dropped which makes things much clearer in the sidebar.

And finally, thanks to the fact that we do renaming of anonymous tags ourselves now, the last diff against ctags main can be dropped so we can use the upstream version without any modifications and the patch file isn't necessary any more. Hurray!!!

@techee techee added the tag-manager Tag manager related patches and bugs label Dec 19, 2021
@elextr
Copy link
Member

elextr commented Dec 20, 2021

This is technically a API change but since pointerOrder always contained 0 and was unused (or maybe used by some super-old parser 15 years back), it doesn't really matter (no g-p plugin uses this field).

Its an API addition so bump it, increasing APIs is no problem, the previous field was uncommented so it should not have been used by plugins and its not an ABI change so long as it remains the same field type.

@techee
Copy link
Member Author

techee commented Dec 20, 2021

Its an API addition so bump it, increasing APIs is no problem, the previous field was uncommented so it should not have been used by plugins and its not an ABI change so long as it remains the same field type.

Good point, done.

@elextr
Copy link
Member

elextr commented Dec 20, 2021

I havn't looked at the details yet.

But points 1 and 2 sound sane.

Points 3 and 4 sound useful so long as the kind1 is right.

Point 5 would be a good improvement, the disconnect between the struct and the typedef was annoying. What happens with this if a declaration that has both the struct name and the typedef?

Footnotes

  1. using "kind" in the common type theory meaning of "classification system for types" eg "struct" "class" etc


if (lang == TM_PARSER_C || lang == TM_PARSER_CPP)
{
return sscanf(name, "anon_%*[a-z]_%u%c", &i, &dummy) == 1 ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does sscanf really understand regexes? This is also new over the old code in tm_tag_is_anon()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes on Linux but IIUC its not portable, especially a-z meaning lower case ASCII alphabetic, see here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code actually was there before the new cxx parser was merged. In the cxx pull request I changed it to __anon detection only but then thought there might be some old tag files around so better to use the previous checks too (plus now we get anonymous tags in this format again). Since it worked before, I guess it's OK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

@@ -695,7 +695,7 @@ static void fill_find_tags_array_prefix(GPtrArray *dst, const GPtrArray *src,
for (i = 0; i < count && num < max_num; ++i)
{
if (tm_parser_langs_compatible(lang, (*tag)->lang) &&
!tm_tag_is_anon(*tag) &&
!((*tag)->flags & tm_tag_flag_anon_t) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the this is not an improvement. Might keep tm_tag_is_anon() and reduce its implementation to just test the flag

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are right, looking at it now side by side, this is much less readable. Will fix.

@@ -51,6 +51,3 @@
main_diff = set(main_src_files) - set(main_dst_files)
if main_diff:
print('Files added to main: ' + str(main_diff))

os.chdir(dstdir)
os.system('patch -p1 <ctags_changes.patch')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yay

/* Temporarily store kind letter for anon name generation to flags
* as we don't have any other place where to store it. This will
* be cleared later in rename_anon_tags() */
tag->flags |= (guint)kind_letter << 16;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you just add a member to TMTag instead of this workaround?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if it was OK to add new members to TMTag when it's exposed to plugins. I guess ABI should be bumped in this case?

But I think having the original ctags kind letter would be useful to have in TMTag because once we map it to our TMTagType representation, there's no way back to get the original kind. This might be needed if we decide to generate tags files in the ctags format.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was discussed elsewhere, appending members is generally OK without bumping the ABI.

But I wouldn't fear ABI bumps anyway. For most users those go unnoticed because they just get their recompiled plugins from their distro, it's just a minor inconvenience for those who compile the plugins themselves.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kugel- is right, add the members to the end and its not necessary to bump the ABI.

The ABI problem really only manifests itself when a distro releases Geany with a new ABI, but not the plugins at the same time (Debian has done that before).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since then the Debian package implements a technical barrier, where plugins depend on the exact ABI version (so in practice its not possible to update the geany package to a conflicting version, without also updating the plugins).

if (!new_name)
{
gchar buf[50];
guint anon_counter = GPOINTER_TO_UINT(g_hash_table_lookup(anon_counter_table, kind_name));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a hash_table is a overweight here. Since kind is in the range 0...255 a simple array would sufficie

guint anon_counter = anon_counter_table[kind]++;

Alternatively, you could use a hash table that uses kind as the key and not a more expensive hash of kind_name

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will change. I was also thinking I could allocate the removed_typedefs GPtrArray lazily if needed so it isn't allocated every time (and unnecessarily for most languages).


strncpy(str, nested_tag->scope, prefix_len);
strcpy(str + prefix_len, new_name);
strcpy(str + prefix_len + new_name_len, pos + orig_name_len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such string building could use asprintf() (not requesting a change here; also I see that GLIB only provides g_vasprintf() but not g_asprintf(), very strange)

@@ -11,6 +9,8 @@ Named5
Named6�2�Constants�0
Named7�2�Constants�0
a�4�Enum#1�0
anon_enum_1�2�Constants�1
anon_enum_2�2�Constants�1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the new naming is really appreciated by Fortran users? I guess we'll see ;-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. But the new naming is now universal for all languages and we won't need to handle each of them separately. Hopefully none of the 2 users who use Geany for Fortran coding will complain ;-).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, lets see if there's any (negative) feedback.

sscanf(name, "Interface#%u%c", &i, &dummy) == 1 ||
sscanf(name, "Enum#%u%c", &i, &dummy) == 1;
}
return FALSE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit test files for java script indicate that the naming is changed. Why is that not handled here (I know it also wasn't in the old tm_tag_is_anon() but I would like to understand why)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically we should now always get the anon flag in our tags file (notice the 1 at the end) so it should be needed only for legacy tags files. But it's true we should probably check for the __anon name in case we read tags in the ctags file format and this file is generated without the E field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what about existing tag files for java script?

-AnonymousFunction2�16�(n)�class3.c3m1�0
+anon_function_1�16�()�class2.c2m1�1

@techee
Copy link
Member Author

techee commented Dec 20, 2021

Points 3 and 4 sound useful so long as the kind1 is right.

This in fact is the ctags 'kind' but it should be the kind as ordinary humans understand it. For instance, when you check the cxx parser here in the 3rd column, it is what you would expect to see:

#define CXX_COMMON_KINDS(_langPrefix, _szMemberDescription, _syncWith) \

Point 5 would be a good improvement, the disconnect between the struct and the typedef was annoying. What happens with this if a declaration that has both the struct name and the typedef?

Both will be kept. This pull request addresses just the anonymous tags and in this case the struct isn't anonymous.

@techee
Copy link
Member Author

techee commented Dec 20, 2021

@kugel- Thanks for the review! I'll go through the points now.

const gchar *kind_name = tm_ctags_get_kind_name(kind, tag->lang);

if (!anon_counter_table)
anon_counter_table = g_new0(gchar, 256);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err, well, I thought of simply placing the array on the stack (at the function begin). I'm also not sure if 8 bit counter will suit everyone. I think generated sources (from a large vala file or someting like that) could easily overflow 256 anon structs.

I think a gint anon_counter_table[256] should work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but then you have to zero it manually (which will be very fast but you will have to do it every time instead of lazily like now). No problem changing it but I think it's about the same like the current code.

The gchar is definitely my error, I got confused by that 256 value, will correct that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel strong about array vs. g_new0, do as you chose (hint: zero-initialization of arrays: gint foo[N] = {0})


if (lang == TM_PARSER_C || lang == TM_PARSER_CPP)
{
return sscanf(name, "anon_%*[a-z]_%u%c", &i, &dummy) == 1 ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

@@ -11,6 +9,8 @@ Named5
Named6�2�Constants�0
Named7�2�Constants�0
a�4�Enum#1�0
anon_enum_1�2�Constants�1
anon_enum_2�2�Constants�1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, lets see if there's any (negative) feedback.

sscanf(name, "Interface#%u%c", &i, &dummy) == 1 ||
sscanf(name, "Enum#%u%c", &i, &dummy) == 1;
}
return FALSE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But what about existing tag files for java script?

-AnonymousFunction2�16�(n)�class3.c3m1�0
+anon_function_1�16�()�class2.c2m1�1

@techee
Copy link
Member Author

techee commented Dec 20, 2021

But what about existing tag files for java script?

I would maybe just keep it unhandled - this bug has been there for some time already and nobody has noticed. In addition, I think PHP parser also generates anonymous tags (they just aren't captured by our unit tests) and we didn't detect them either.

@kugel-
Copy link
Member

kugel- commented Dec 20, 2021

But what about existing tag files for java script?

I would maybe just keep it unhandled - this bug has been there for some time already and nobody has noticed. In addition, I think PHP parser also generates anonymous tags (they just aren't captured by our unit tests) and we didn't detect them either.

I think I can go with that. What happens if we read such tag files? Some symbols might not be marked as anonymous, anything else?

@techee
Copy link
Member Author

techee commented Dec 20, 2021

Some symbols might not be marked as anonymous

That's pretty much it. The biggest problem this can cause is that they'll appear in the autocompletion popup where they shouldn't be normally shown because this name doesn't really exist in the code.

@@ -94,14 +101,15 @@ typedef struct TMTag
TMSourceFile *file; /**< File in which the tag occurs; NULL for global tags */
gulong line; /**< Line number of the tag */
gboolean local; /**< Is the tag of local scope */
guint pointerOrder;
guint flags; /**< Additional flags */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds Doxygen documentation for the member, which traditionally approves it for plugin use. Is that your intention? Otherwise I would add a simple comment (like with kind_letter) until someone officially requests this member to be part of the API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was intentional - plugins should know if a tag is anonymous (I currently don't do it right in my ProjectOrganizer plugin but in Find Tag where you can find a tag by name I should ignore anonymous tags).

I didn't do this with the kind_letter because there I really don't see much reason why plugins should look at it (and in addition, it may not always be present in the TMTag structure, for instance when the tag is read from our custom tag file format).

@@ -836,7 +836,7 @@ find_scope_members (const GPtrArray *tags_array, const gchar *name, TMSourceFile

/* anonymous type defined in a different file than the variable -
* this isn't the type we are looking for */
if (tm_tag_is_anon(test_tag) && (file != test_tag->file || test_tag->file == NULL))
if (tm_tag_is_anon(tag) && (file != test_tag->file || test_tag->file == NULL))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This confuses me and Github draws the indentation strangely.

Anyway, I seems wrong to do s/test_tag/tag/, don't you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dammit, I should stop coding today, I start making errors. Sure thing, this is wrong.

@kugel-
Copy link
Member

kugel- commented Dec 20, 2021

Alright. I think some commits should be squashed but overall it looks good to me.

Copy link
Member

@kugel- kugel- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General approval from my side but I think some later fixup commits could be squashed into the initial ones.

@techee
Copy link
Member Author

techee commented Dec 20, 2021

Alright. I think some commits should be squashed but overall it looks good to me.

Thanks a lot for your review Thomas, really appreciated!

@kugel-
Copy link
Member

kugel- commented Dec 20, 2021

I guess you were lucky, I usually don't find much spare time to review other people's valuable work (like yours) while working on my own topics. That's really tragical but that's life I think (with two young children).

@techee
Copy link
Member Author

techee commented Dec 20, 2021

I guess you were lucky, I usually don't find much spare time to review other people's valuable work (like yours) while working on my own topics. That's really tragical but that's life I think (with two young children).

Sadly, the same here, reviewing code just isn't much fun, especially when everyone finds something else interesting and worth the work. I should definitely contribute more in this area myself. (And I don't even have children so no excuses there.)

@elextr
Copy link
Member

elextr commented Dec 20, 2021

This in fact is the ctags 'kind' but it should be the kind as ordinary humans understand it. For instance, when you check the cxx parser here in the 3rd column, it is what you would expect to see:

So long as it also includes g_aCXXCPPKinds for C++?

@techee
Copy link
Member Author

techee commented Dec 23, 2021

So long as it also includes g_aCXXCPPKinds for C++?

Yes, those are used for C++ (and for all other languages they are listed usually at the beginning of the parser).

@kugel-
Copy link
Member

kugel- commented Jan 10, 2022

@techee Can you cleanup the commits so it can be merged?

We now use varType instead of pointerOrder and it has been the case for
quite and we can use the pointerOrder field for storing something useful.
When we store something else inside the renamed pointerOrder, we will
break compatibility with some very old tag files but I don't think there
are any such files around (also pointerOrder was probably just reported
by the C parser).

Bump API for the pointerOrder change to flags
Report anonymous tags using XTAG_ANONYMOUS. These anonymous tags aren't
generated by the upstream uctags parser so this change hasn't been made
available there (will have to be part of some future sync).
The flag is set when XTAG_ANONYMOUS is defined and is used for detecting
anonymous tags inside tm_tag_is_anon().
At this point all built in parsers should report anonymous tags correctly
so for them we don't need to determine anonymous tags based on tag name.
Since tag files might be generated without this information, we still
have to determine anonymous tags based on name for them.
Check all the collected tags once a file is parsed (i.e. when
we have all tags, including those from subparsers) and renamea
them in the form

anon_enum_1
anon_struct_1
anon_enum_2
anon_struct_2

where the second component is a ctags kind and the number is
per-kind.

In addition, scopes of the nested tags have to be updated if the parent
tag is an anonymous tag.

Finally, for anonymous tags of the form typedef struct Foo{int a;};
we can use the name of the typedef instead of generating the
anonymous name. In this case we can drop the typedef tag once
the anonymous tag is updated with its name.

More details can be found in comments.
@techee
Copy link
Member Author

techee commented Jan 10, 2022

@techee Can you cleanup the commits so it can be merged?

Done.

@kugel- kugel- merged commit 9642f17 into geany:master Feb 12, 2022
kugel- added a commit that referenced this pull request Feb 12, 2022
Anonymous enums are now flagged as such (last digit == 1).

GDScript support (#3012) and the anonymous tags PR (#3059) did not know about
each other so this wasn't catched by CI until now.
@techee
Copy link
Member Author

techee commented Feb 12, 2022

@kugel- Yay, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tag-manager Tag manager related patches and bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants