Improve handling of anonymous tags #3059

techee · 2021-12-19T20:58:01Z

There are several problems with how we handle anonymous tags which this pull request tries to resolve:

Anonymous tags are currently determined based on how these tags are named by ctags which may not be completely reliable and unnecessary. ctags now has XTAG_ANONYMOUS flag we can query and determine whether a tag is anonymous or not based on that. Our fortran parser didn't report XTAG_ANONYMOUS so it has been updated to do so.
In order to store the information about anonymous tags, this pull request renames the unused pointerOrder member of TMTag (see the corresponding commit about more details to flags and uses one bit to indicate whether a tag is anonymous. This is technically a API change but since pointerOrder always contained 0 and was unused (or maybe used by some super-old parser 15 years back), it doesn't really matter (no g-p plugin uses this field).
With the introduction of the new cxx parser, anonymous tags are reported as __anonNUM - before they were reported as e.g anon_struct_NUM - i.e. they contained information about the type in their name which made them easier to identify in the symbol tree. This pull request adds back this naming.
In addition to (3), the NUM in the previous parser was global for all types - i.e. tags were named as anon_enum_1, anon_struct_2, anon_enum_3 which is a bit strange as it suggests there are 3 enums but there are 2 instead. This pull request makes these numbers per-type so the above becomes anon_enum_1, anon_struct_1, anon_enum_2.
This pull request sets the name of anonymous tag if it's followed by the corresponding typedef tag and the typedef tag is removed. For instance, for typedef struct{int a} Foo; you would previously get an anonymous tag for the struct under which a would be shown and then separately tag Foo as a typedef. After this patch, you get a tag Foo for the struct with a as its member and the anonymous name is dropped which makes things much clearer in the sidebar.

And finally, thanks to the fact that we do renaming of anonymous tags ourselves now, the last diff against ctags main can be dropped so we can use the upstream version without any modifications and the patch file isn't necessary any more. Hurray!!!

elextr · 2021-12-20T00:01:25Z

This is technically a API change but since pointerOrder always contained 0 and was unused (or maybe used by some super-old parser 15 years back), it doesn't really matter (no g-p plugin uses this field).

Its an API addition so bump it, increasing APIs is no problem, the previous field was uncommented so it should not have been used by plugins and its not an ABI change so long as it remains the same field type.

techee · 2021-12-20T09:33:25Z

Its an API addition so bump it, increasing APIs is no problem, the previous field was uncommented so it should not have been used by plugins and its not an ABI change so long as it remains the same field type.

Good point, done.

elextr · 2021-12-20T10:08:02Z

I havn't looked at the details yet.

But points 1 and 2 sound sane.

Points 3 and 4 sound useful so long as the kind¹ is right.

Point 5 would be a good improvement, the disconnect between the struct and the typedef was annoying. What happens with this if a declaration that has both the struct name and the typedef?

using "kind" in the common type theory meaning of "classification system for types" eg "struct" "class" etc ↩

kugel- · 2021-12-20T11:11:19Z

src/tagmanager/tm_parser.c

+
+	if (lang == TM_PARSER_C || lang == TM_PARSER_CPP)
+	{
+		return sscanf(name, "anon_%*[a-z]_%u%c", &i, &dummy) == 1 ||


Does sscanf really understand regexes? This is also new over the old code in tm_tag_is_anon()

Yes on Linux but IIUC its not portable, especially a-z meaning lower case ASCII alphabetic, see here

This code actually was there before the new cxx parser was merged. In the cxx pull request I changed it to __anon detection only but then thought there might be some old tag files around so better to use the previous checks too (plus now we get anonymous tags in this format again). Since it worked before, I guess it's OK.

kugel- · 2021-12-20T11:11:24Z

src/tagmanager/tm_workspace.c

@@ -695,7 +695,7 @@ static void fill_find_tags_array_prefix(GPtrArray *dst, const GPtrArray *src,
 	for (i = 0; i < count && num < max_num; ++i)
 	{
 		if (tm_parser_langs_compatible(lang, (*tag)->lang) &&
-			!tm_tag_is_anon(*tag) &&
+			!((*tag)->flags & tm_tag_flag_anon_t) &&


IMO the this is not an improvement. Might keep tm_tag_is_anon() and reduce its implementation to just test the flag

Yeah, you are right, looking at it now side by side, this is much less readable. Will fix.

kugel- · 2021-12-20T11:13:52Z

scripts/update-ctags.py

@@ -51,6 +51,3 @@
 main_diff = set(main_src_files) - set(main_dst_files)
 if main_diff:
    print('Files added to main: ' + str(main_diff))
-
-os.chdir(dstdir)
-os.system('patch -p1 <ctags_changes.patch')


kugel- · 2021-12-20T11:15:12Z

src/tagmanager/tm_ctags.c

+		/* Temporarily store kind letter for anon name generation to flags
+		 * as we don't have any other place where to store it. This will
+		 * be cleared later in rename_anon_tags() */
+		tag->flags |= (guint)kind_letter << 16;


Can't you just add a member to TMTag instead of this workaround?

I wasn't sure if it was OK to add new members to TMTag when it's exposed to plugins. I guess ABI should be bumped in this case?

But I think having the original ctags kind letter would be useful to have in TMTag because once we map it to our TMTagType representation, there's no way back to get the original kind. This might be needed if we decide to generate tags files in the ctags format.

I think it was discussed elsewhere, appending members is generally OK without bumping the ABI.

But I wouldn't fear ABI bumps anyway. For most users those go unnoticed because they just get their recompiled plugins from their distro, it's just a minor inconvenience for those who compile the plugins themselves.

@kugel- is right, add the members to the end and its not necessary to bump the ABI.

The ABI problem really only manifests itself when a distro releases Geany with a new ABI, but not the plugins at the same time (Debian has done that before).

Since then the Debian package implements a technical barrier, where plugins depend on the exact ABI version (so in practice its not possible to update the geany package to a conflicting version, without also updating the plugins).

kugel- · 2021-12-20T11:21:49Z

src/tagmanager/tm_ctags.c

+			if (!new_name)
+			{
+				gchar buf[50];
+				guint anon_counter = GPOINTER_TO_UINT(g_hash_table_lookup(anon_counter_table, kind_name));


I think a hash_table is a overweight here. Since kind is in the range 0...255 a simple array would sufficie

guint anon_counter = anon_counter_table[kind]++;

Alternatively, you could use a hash table that uses kind as the key and not a more expensive hash of kind_name

Good point, will change. I was also thinking I could allocate the removed_typedefs GPtrArray lazily if needed so it isn't allocated every time (and unnecessarily for most languages).

kugel- · 2021-12-20T11:24:21Z

src/tagmanager/tm_ctags.c

+
+					strncpy(str, nested_tag->scope, prefix_len);
+					strcpy(str + prefix_len, new_name);
+					strcpy(str + prefix_len + new_name_len, pos + orig_name_len);


Such string building could use asprintf() (not requesting a change here; also I see that GLIB only provides g_vasprintf() but not g_asprintf(), very strange)

kugel- · 2021-12-20T11:25:32Z

tests/ctags/enum.f90.tags

@@ -11,6 +9,8 @@ Named5
 Named6�2�Constants�0
 Named7�2�Constants�0
 a�4�Enum#1�0
+anon_enum_1�2�Constants�1
+anon_enum_2�2�Constants�1


I wonder if the new naming is really appreciated by Fortran users? I guess we'll see ;-)

Yeah. But the new naming is now universal for all languages and we won't need to handle each of them separately. Hopefully none of the 2 users who use Geany for Fortran coding will complain ;-).

Yea, lets see if there's any (negative) feedback.

kugel- · 2021-12-20T11:28:17Z

src/tagmanager/tm_parser.c

+			sscanf(name, "Interface#%u%c", &i, &dummy) == 1 ||
+			sscanf(name, "Enum#%u%c", &i, &dummy) == 1;
+	}
+	return FALSE;


The unit test files for java script indicate that the naming is changed. Why is that not handled here (I know it also wasn't in the old tm_tag_is_anon() but I would like to understand why)?

Basically we should now always get the anon flag in our tags file (notice the 1 at the end) so it should be needed only for legacy tags files. But it's true we should probably check for the __anon name in case we read tags in the ctags file format and this file is generated without the E field.

But what about existing tag files for java script?

-AnonymousFunction2�16�(n)�class3.c3m1�0 +anon_function_1�16�()�class2.c2m1�1

techee · 2021-12-20T15:27:49Z

Points 3 and 4 sound useful so long as the kind1 is right.

This in fact is the ctags 'kind' but it should be the kind as ordinary humans understand it. For instance, when you check the cxx parser here in the 3rd column, it is what you would expect to see:

geany/ctags/parsers/cxx/cxx_tag.c

Line 42 in 5ea3e3e

#define CXX_COMMON_KINDS(_langPrefix, _szMemberDescription, _syncWith) \

Point 5 would be a good improvement, the disconnect between the struct and the typedef was annoying. What happens with this if a declaration that has both the struct name and the typedef?

Both will be kept. This pull request addresses just the anonymous tags and in this case the struct isn't anonymous.

techee · 2021-12-20T15:28:56Z

@kugel- Thanks for the review! I'll go through the points now.

kugel- · 2021-12-20T20:36:30Z

src/tagmanager/tm_ctags.c

+				const gchar *kind_name = tm_ctags_get_kind_name(kind, tag->lang);
+
+				if (!anon_counter_table)
+					anon_counter_table = g_new0(gchar, 256);


Err, well, I thought of simply placing the array on the stack (at the function begin). I'm also not sure if 8 bit counter will suit everyone. I think generated sources (from a large vala file or someting like that) could easily overflow 256 anon structs.

I think a gint anon_counter_table[256] should work.

Yes, but then you have to zero it manually (which will be very fast but you will have to do it every time instead of lazily like now). No problem changing it but I think it's about the same like the current code.

The gchar is definitely my error, I got confused by that 256 value, will correct that.

I don't feel strong about array vs. g_new0, do as you chose (hint: zero-initialization of arrays: gint foo[N] = {0})

kugel- · 2021-12-20T20:37:07Z

src/tagmanager/tm_parser.c

+
+	if (lang == TM_PARSER_C || lang == TM_PARSER_CPP)
+	{
+		return sscanf(name, "anon_%*[a-z]_%u%c", &i, &dummy) == 1 ||


kugel- · 2021-12-20T20:41:32Z

tests/ctags/enum.f90.tags

@@ -11,6 +9,8 @@ Named5
 Named6�2�Constants�0
 Named7�2�Constants�0
 a�4�Enum#1�0
+anon_enum_1�2�Constants�1
+anon_enum_2�2�Constants�1


Yea, lets see if there's any (negative) feedback.

kugel- · 2021-12-20T20:43:03Z

src/tagmanager/tm_parser.c

+			sscanf(name, "Interface#%u%c", &i, &dummy) == 1 ||
+			sscanf(name, "Enum#%u%c", &i, &dummy) == 1;
+	}
+	return FALSE;


But what about existing tag files for java script?

-AnonymousFunction2�16�(n)�class3.c3m1�0 +anon_function_1�16�()�class2.c2m1�1

techee · 2021-12-20T21:07:09Z

But what about existing tag files for java script?

I would maybe just keep it unhandled - this bug has been there for some time already and nobody has noticed. In addition, I think PHP parser also generates anonymous tags (they just aren't captured by our unit tests) and we didn't detect them either.

kugel- · 2021-12-20T21:13:22Z

But what about existing tag files for java script?

I would maybe just keep it unhandled - this bug has been there for some time already and nobody has noticed. In addition, I think PHP parser also generates anonymous tags (they just aren't captured by our unit tests) and we didn't detect them either.

I think I can go with that. What happens if we read such tag files? Some symbols might not be marked as anonymous, anything else?

techee · 2021-12-20T21:21:18Z

Some symbols might not be marked as anonymous

That's pretty much it. The biggest problem this can cause is that they'll appear in the autocompletion popup where they shouldn't be normally shown because this name doesn't really exist in the code.

kugel- · 2021-12-20T21:21:10Z

src/tagmanager/tm_tag.h

@@ -94,14 +101,15 @@ typedef struct TMTag
 	TMSourceFile *file; /**< File in which the tag occurs; NULL for global tags */
 	gulong line; /**< Line number of the tag */
 	gboolean local; /**< Is the tag of local scope */
-	guint pointerOrder;
+	guint flags; /**< Additional flags */


This adds Doxygen documentation for the member, which traditionally approves it for plugin use. Is that your intention? Otherwise I would add a simple comment (like with kind_letter) until someone officially requests this member to be part of the API.

This was intentional - plugins should know if a tag is anonymous (I currently don't do it right in my ProjectOrganizer plugin but in Find Tag where you can find a tag by name I should ignore anonymous tags).

I didn't do this with the kind_letter because there I really don't see much reason why plugins should look at it (and in addition, it may not always be present in the TMTag structure, for instance when the tag is read from our custom tag file format).

kugel- · 2021-12-20T21:23:30Z

src/tagmanager/tm_workspace.c

@@ -836,7 +836,7 @@ find_scope_members (const GPtrArray *tags_array, const gchar *name, TMSourceFile

 			/* anonymous type defined in a different file than the variable -
 			 * this isn't the type we are looking for */
-			if (tm_tag_is_anon(test_tag) && (file != test_tag->file || test_tag->file == NULL))
+			if (tm_tag_is_anon(tag) && (file != test_tag->file || test_tag->file == NULL))


This confuses me and Github draws the indentation strangely.

Anyway, I seems wrong to do s/test_tag/tag/, don't you think?

Dammit, I should stop coding today, I start making errors. Sure thing, this is wrong.

kugel- · 2021-12-20T21:42:23Z

Alright. I think some commits should be squashed but overall it looks good to me.

kugel-

General approval from my side but I think some later fixup commits could be squashed into the initial ones.

techee · 2021-12-20T21:44:17Z

Alright. I think some commits should be squashed but overall it looks good to me.

Thanks a lot for your review Thomas, really appreciated!

kugel- · 2021-12-20T21:47:41Z

I guess you were lucky, I usually don't find much spare time to review other people's valuable work (like yours) while working on my own topics. That's really tragical but that's life I think (with two young children).

techee · 2021-12-20T22:20:51Z

I guess you were lucky, I usually don't find much spare time to review other people's valuable work (like yours) while working on my own topics. That's really tragical but that's life I think (with two young children).

Sadly, the same here, reviewing code just isn't much fun, especially when everyone finds something else interesting and worth the work. I should definitely contribute more in this area myself. (And I don't even have children so no excuses there.)

elextr · 2021-12-20T22:46:30Z

This in fact is the ctags 'kind' but it should be the kind as ordinary humans understand it. For instance, when you check the cxx parser here in the 3rd column, it is what you would expect to see:

So long as it also includes g_aCXXCPPKinds for C++?

techee · 2021-12-23T21:57:38Z

So long as it also includes g_aCXXCPPKinds for C++?

Yes, those are used for C++ (and for all other languages they are listed usually at the beginning of the parser).

kugel- · 2022-01-10T20:24:03Z

@techee Can you cleanup the commits so it can be merged?

We now use varType instead of pointerOrder and it has been the case for quite and we can use the pointerOrder field for storing something useful. When we store something else inside the renamed pointerOrder, we will break compatibility with some very old tag files but I don't think there are any such files around (also pointerOrder was probably just reported by the C parser). Bump API for the pointerOrder change to flags

Report anonymous tags using XTAG_ANONYMOUS. These anonymous tags aren't generated by the upstream uctags parser so this change hasn't been made available there (will have to be part of some future sync).

The flag is set when XTAG_ANONYMOUS is defined and is used for detecting anonymous tags inside tm_tag_is_anon().

At this point all built in parsers should report anonymous tags correctly so for them we don't need to determine anonymous tags based on tag name. Since tag files might be generated without this information, we still have to determine anonymous tags based on name for them.

Check all the collected tags once a file is parsed (i.e. when we have all tags, including those from subparsers) and renamea them in the form anon_enum_1 anon_struct_1 anon_enum_2 anon_struct_2 where the second component is a ctags kind and the number is per-kind. In addition, scopes of the nested tags have to be updated if the parent tag is an anonymous tag. Finally, for anonymous tags of the form typedef struct Foo{int a;}; we can use the name of the typedef instead of generating the anonymous name. In this case we can drop the typedef tag once the anonymous tag is updated with its name. More details can be found in comments.

techee · 2022-01-10T22:17:25Z

@techee Can you cleanup the commits so it can be merged?

Done.

Anonymous enums are now flagged as such (last digit == 1). GDScript support (#3012) and the anonymous tags PR (#3059) did not know about each other so this wasn't catched by CI until now.

techee · 2022-02-12T23:41:11Z

@kugel- Yay, thanks!

techee added the tag-manager Tag manager related patches and bugs label Dec 19, 2021

kugel- requested changes Dec 20, 2021

View reviewed changes

kugel- reviewed Dec 20, 2021

View reviewed changes

kugel- approved these changes Dec 20, 2021

View reviewed changes

techee added 6 commits January 10, 2022 21:43

Use XTAG_ANONYMOUS in Fortran parser

8805f56

Report anonymous tags using XTAG_ANONYMOUS. These anonymous tags aren't generated by the upstream uctags parser so this change hasn't been made available there (will have to be part of some future sync).

Add a flag representing anonymous tags

b4afb8e

The flag is set when XTAG_ANONYMOUS is defined and is used for detecting anonymous tags inside tm_tag_is_anon().

Update unit tests with the anonymous tag changes

6f62616

techee force-pushed the anon4 branch from fc607a1 to 6f62616 Compare January 10, 2022 22:14

kugel- merged commit 9642f17 into geany:master Feb 12, 2022

kugel- added a commit that referenced this pull request Feb 12, 2022

Update GDScript test

021144e

Anonymous enums are now flagged as such (last digit == 1). GDScript support (#3012) and the anonymous tags PR (#3059) did not know about each other so this wasn't catched by CI until now.

Improve handling of anonymous tags #3059

Improve handling of anonymous tags #3059

Conversation

techee commented Dec 19, 2021

elextr commented Dec 20, 2021

techee commented Dec 20, 2021

elextr commented Dec 20, 2021

Footnotes

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

techee commented Dec 20, 2021

techee commented Dec 20, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

techee commented Dec 20, 2021

kugel- commented Dec 20, 2021

techee commented Dec 20, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kugel- commented Dec 20, 2021

kugel- left a comment

Choose a reason for hiding this comment

techee commented Dec 20, 2021

kugel- commented Dec 20, 2021

techee commented Dec 20, 2021

elextr commented Dec 20, 2021

techee commented Dec 23, 2021

kugel- commented Jan 10, 2022

techee commented Jan 10, 2022

techee commented Feb 12, 2022