Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark tags as "local" only for files with a known C/C++ source extension #3490

Merged
merged 2 commits into from
Oct 7, 2023

Conversation

techee
Copy link
Member

@techee techee commented May 8, 2023

After some more thinking about it, we can fix #3454 very easily by ourselves by checking whether the tag originates from a source file with a known/common C/C++ extension - if not, always set "local" to FALSE.

The first patch adds is_c_source flag to TMSourceFile to indicate whether the file has one of the known C/C++ extensions, the second patch uses this flag and for C/C++ sources sets "local" to TRUE only when this flag is set (in addition to ctag's isFileScope flag).

@b4n Does this fix look OK to you?

Fixes #3454.

}
}
}

Copy link
Member

@b4n b4n May 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it be simpler to have source_file->is_source, initialize it to TRUE and use it for all languages, and make it FALSE for C and C++ if the extension doesn't match any of those?

	source_file->is_source = TRUE;

	if (source_file->lang == TM_PARSER_C || source_file->lang == TM_PARSER_CPP)
	{
		const gchar *ext = strrchr(source_file->short_name, '.');

		if (! ext)
			source_file->is_source = FALSE;
		else
		{
			int i;
			const gchar *common_src_exts[] =
				{"c", "C", "cc", "cp", "cpp", "cxx", "c++", "CPP", "CXX"};

			for (i = 0; i < G_N_ELEMENTS(common_src_exts); i++)
			{
				if (strcmp(ext + 1, common_src_exts[i]) == 0)
					break
			}

			source_file->is_source = (i < G_N_ELEMENTS(common_src_exts));
		}
	}

note: code above is entirely untested.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

source_file->is_source = TRUE;

Yes, we could do that - makes probably more sense as the is_source flag is valid for all languages and not specific to C/C++.

	if (source_file->lang == TM_PARSER_C || source_file->lang == TM_PARSER_CPP)
	{
		const gchar *ext = strrchr(source_file->short_name, '.');

		if (! ext)
			source_file->is_source = FALSE;
		else
		{
			int i;
			const gchar *common_src_exts[] =
				{"c", "C", "cc", "cp", "cpp", "cxx", "c++", "CPP", "CXX"};

			for (i = 0; i < G_N_ELEMENTS(common_src_exts); i++)
			{
				if (strcmp(ext + 1, common_src_exts[i]) == 0)
					break
			}

			source_file->is_source = (i < G_N_ELEMENTS(common_src_exts));
		}
	}

Does this code offer any benefit compared to what I wrote? I personally find my code a bit easier to read as you avoid the extra branch where . is not found and one also has to think a bit when looking at

source_file->is_source = (i < G_N_ELEMENTS(common_src_exts));

what exactly it means.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we could do that - makes probably more sense as the is_source flag is valid for all languages and not specific to C/C++.

Yeah, the main reason I suggest this would be to make the init_tag() check simply tag->local = tag_entry->isFileScope && file->is_source rather than having language-specific tests there as well.

Does this code offer any benefit compared to what I wrote? I personally find my code a bit easier to read as you avoid the extra branch where . is not found

Merely premature optimization (comparison only compares useful things, and traverse the basename only once), so no, it doesn't really matter.

If it's only a readability issue, you could easily rewrite this ever so slightly like so:

	source_file->is_source = TRUE;

	if (source_file->lang == TM_PARSER_C || source_file->lang == TM_PARSER_CPP)
	{
		const gchar *ext = strrchr(source_file->short_name, '.');

		source_file->is_source = FALSE;
		if (ext)
		{
			const gchar *common_src_exts[] =
				{"c", "C", "cc", "cp", "cpp", "cxx", "c++", "CPP", "CXX"};

			for (int i = 0; i < G_N_ELEMENTS(common_src_exts); i++)
			{
				if (strcmp(ext + 1, common_src_exts[i]) == 0)
				{
					source_file->is_source = TRUE;
					break;
				}
			}
		}
	}

anyway, either way is fine.

@b4n
Copy link
Member

b4n commented May 8, 2023

I'll try to test it properly tomorrow, but the approach, although a bit of a workaround (as it's basically overriding a similar check in uctags), it looks pretty good indeed (esp. with the discussion on the uctags PR).

@elextr
Copy link
Member

elextr commented May 9, 2023

What about header files, why are they not C/C++ source?

@kugel-
Copy link
Member

kugel- commented May 9, 2023

Sometimes we include C files in other C files. Then static is not file-local. So we should probably also consider includes (not just files with header extension) or just revert the original change.

@elextr
Copy link
Member

elextr commented May 9, 2023

Since (IIUC) the cause of #3454 is the intention to limit the symbols visible at any point by using heuristics (originally dodgy uctags ones) to exclude symbols that definitely can't be visible. The "its not the same language so it can't be visible" heuristic is the cause of the OP of #3454 exacerbated by simply using extensions, not the filetype.

As @kugel- pointed out for C/C++ there are situations where actual combinations of files affect visibility, and in reverse just because a .h file is open in Geany does not mean all its symbols are available in another C/C++ file but they are considered for autocompletes for example.

This is the effect of the inability of the uctags/Geany to know about the build, without actual dependencies and following includes its simply not possible to do it correctly.

So its necessary to resort to heuristics, and they will never handle all situations. The decision needs to be made to find the most useful tradeoff between how many omitted visible symbols there are vs how many incorrectly visible symbols there are.

This was one of the reasons I moved to Vscode for C++ and I can sort of confirm the problem using Vscode because I notice that when I add another file to a project but before I add it to the build, Vscode (even using an LSP) will give a very similar excessive set of rubbish symbols to uctags/Geany.

@techee
Copy link
Member Author

techee commented May 9, 2023

Sometimes we include C files in other C files. Then static is not file-local. So we should probably also consider includes (not just files with header extension) or just revert the original change.

That's true, I'm just wondering how such files typically look like and for what purpose they include the sources. One such scenario I can imagine is that you want to split a huge source file into multiple files without some refactoring and create master_file.c containing

#include "file1.c"
#include "file2.c"
#include "file3.c"
...

In this case, though, this PR isn't a problem as you don't have code in this master_file.c so you don't need those static declarations from the individual source files visible there.

Do you have some concrete examples of when including source files is used? In the worst case we can always do #3457, it would just be shame to do it because of some really obscure and almost never used scenario.

@elextr
Copy link
Member

elextr commented May 9, 2023

Do you have some concrete examples of when including source files is used?

An example in one of the projects I'm currently doing has platform specific files included like:

#if defined(LIN)
#include "plat_lin.cpp"
#elif defined(WIN)
#include "plat_win.cpp"
#elif defined(MAC)
#include "plat_mac.cpp"
#endif

then the code that uses those in the including file. Other selective includes are the network code (using either Boost++ or cURL) and other libraries where there is more than one solution that can be wrapped to a consistent interface.

Its much neater than lots of ifdefs throughout the code.

@b4n
Copy link
Member

b4n commented May 9, 2023

What about header files, why are they not C/C++ source?

You could come up with a better name for the flag, maybe most_probably_not_a_file_whose_local_symbols_will_be_visible_to_others? 😁

@b4n
Copy link
Member

b4n commented May 9, 2023

@elextr interesting… I usually see this still using separate source compiled conditionally with a common header defining the interface. But OK, so you have a case where there is no way to know but having complete knowledge of the whole project tree -- which we won't have (well, a plugin like projectorganizer could look at all the files, figure out the includes and make a slightly more educated filtering guess, but it'd still be a heuristic without knowing the real build settings and all).

But I agree with @techee, the filtering is useful to lower the amount of irrelevant suggestions in most situations, so we have to find where to draw the line. Or we go the Geany way and add a setting? 😄

@elextr
Copy link
Member

elextr commented May 9, 2023

You could come up with a better name for the flag

Yeah, I only realised that the problem was the name of the flag after I posted that.

Based on my following post semi_random_guess_if_symbols_should_leak_everywhere 😁

Hmmm, maybe just simply globally_visible?

[Edit: oops, wrong way round, maybe local_visibility]

But I agree with @techee, the filtering is useful to lower the amount of irrelevant suggestions in most situations, so we have to find where to draw the line.

Agree, see my long post

Or we go the Geany way and add a setting? 😄

Oh, how could we resist ...

@b4n
Copy link
Member

b4n commented May 9, 2023

Based on my following post semi_random_guess_if_symbols_should_leak_everywhere grin

Hmmm, maybe just simply globally_visible?

[Edit: oops, wrong way round, maybe local_visibility]

Actually if we want that to work for all languages, we could make it technical: trust_file_scope (which is really what this flags is about: whether we want to trust uctags' isFileScope)

@elextr
Copy link
Member

elextr commented May 9, 2023

Actually if we want that to work for all languages, we could make it technical: trust_file_scope (which is really what this flags is about: whether we want to trust uctags' isFileScope)

Isn't the real question if the Geany flag is a copy of isFileScope or our own heuristic, or some combination of both? For example where does it leave our own heuristic "inter-language symbols not allowed".

Isn't that what causes #3454 where symbols from *.foo are not visible in C code?

@b4n
Copy link
Member

b4n commented May 9, 2023

@elextr no, that's what I though initially, but it's actually because of the isFileScope filtering. uctags sets it when it does not match a known header extension, which my file didn't. Here we're just taking the more conservative approach of setting it only if it matches a known non-header extension.

@elextr
Copy link
Member

elextr commented May 9, 2023

only if it matches a known non-header extension.

Won't that break a whole lot of languages where there is no separation of header and body, where modules from one body file are imported into another body file?

@b4n
Copy link
Member

b4n commented May 9, 2023

@elextr well, we do that only for C and C++ ATM (see code), but that's basically the reason why I suggest a generic name merely stating how the flag is used rather than what we try to achieve with it (which does not make sense for languages with no header/source separation)

@elextr
Copy link
Member

elextr commented May 9, 2023

only for C and C++ ATM

And in a heartbeat he drops the obligatory "C++ isn't C" comment.

Although not used much yet, modules have been a thing since C++20, three years ... and the interfaces are in the .cpp files not in "header" files.

Not to mention private modules.

And declarations inside an anonymous namespace are internal linkage (implicit static).

@elextr walks away whistling and ducks a brick 😁

@elextr
Copy link
Member

elextr commented May 9, 2023

Just to flesh out how C++ modules impact this.

Modules break the assumption that there must be a declaration in a header file for a symbol to be used in another file, the module is exported from a body file and the declarations from the modules are imported by an import statement, no header needed as the compiler does the work.

So if we set the filescope/local/whatever flag for .cpp files that will make the symbols exported with the module invisible elsewhere, but they should be visible.

Its not just an academic C++ exercise, since this is basically how modules work in most languages if we can solve it for C++ we can extend the heuristic to other languages too.

[Edit: see also https://docs.ctags.io/en/latest/parser-cxx.html#file-scope]

@b4n
Copy link
Member

b4n commented May 10, 2023

I've been running this (well, almost… see below 😁) for a couple days and it seems to work nicely

My "small" patch on top just because:

diff --git a/src/tagmanager/tm_ctags.c b/src/tagmanager/tm_ctags.c
index a5cb9531e..16e2ebf04 100644
--- a/src/tagmanager/tm_ctags.c
+++ b/src/tagmanager/tm_ctags.c
@@ -119,12 +119,7 @@ static gboolean init_tag(TMTag *tag, TMSourceFile *file, const tagEntryInfo *tag
 
 	tag->name = g_strdup(tag_entry->name);
 	tag->type = type;
-	/* ctags sets "isFileScope" also for files with an unknown extension -
-	 * make sure we set "local" only for files with a known C/C++ extension */
-	if (file->lang == TM_PARSER_C || file->lang == TM_PARSER_CPP)
-		tag->local = tag_entry->isFileScope && file->is_c_source;
-	else
-		tag->local = tag_entry->isFileScope;
+	tag->local = tag_entry->isFileScope && file->trust_file_scope;
 	tag->flags = tm_tag_flag_none_t;
 	if (isTagExtraBitMarked(tag_entry, XTAG_ANONYMOUS))
 		tag->flags |= tm_tag_flag_anon_t;
diff --git a/src/tagmanager/tm_source_file.c b/src/tagmanager/tm_source_file.c
index 599ae4fa7..e5a27419b 100644
--- a/src/tagmanager/tm_source_file.c
+++ b/src/tagmanager/tm_source_file.c
@@ -609,20 +609,28 @@ static gboolean tm_source_file_init(TMSourceFile *source_file, const char *file_
 	else
 		source_file->lang = tm_ctags_get_named_lang(name);
 
-	source_file->is_c_source = FALSE;
+	source_file->trust_file_scope = TRUE;
 
+	/* ctags sets "isFileScope" for all C/C++ files without a known header
+	 * extension, but we don't want to use it for files with a less conventional
+	 * extension, not to exclude symbols we shouldn't */
 	if (source_file->lang == TM_PARSER_C || source_file->lang == TM_PARSER_CPP)
 	{
-		const gchar **ext;
-		const gchar *common_src_exts[] =
-			{".c", ".C", ".cc", ".cp", ".cpp", ".cxx", ".c++", ".CPP", ".CXX", NULL};
+		const gchar *ext = strrchr(source_file->short_name, '.');
 
-		for (ext = common_src_exts; *ext; ext++)
+		source_file->trust_file_scope = FALSE;
+		if (ext)
 		{
-			if (g_str_has_suffix(source_file->short_name, *ext))
+			const gchar *common_src_exts[] =
+				{"c", "C", "cc", "cp", "cpp", "cxx", "c++", "CPP", "CXX"};
+
+			for (guint i = 0; i < G_N_ELEMENTS(common_src_exts); i++)
 			{
-				source_file->is_c_source = TRUE;
-				break;
+				if (strcmp(ext + 1, common_src_exts[i]) == 0)
+				{
+					source_file->trust_file_scope = TRUE;
+					break;
+				}
 			}
 		}
 	}
diff --git a/src/tagmanager/tm_source_file.h b/src/tagmanager/tm_source_file.h
index f3ca0cbc8..342d1259c 100644
--- a/src/tagmanager/tm_source_file.h
+++ b/src/tagmanager/tm_source_file.h
@@ -36,7 +36,7 @@ typedef struct TMSourceFile
 	char *short_name; /**< Just the name of the file (without the path) */
 	GPtrArray *tags_array; /**< Sorted tag array obtained by parsing the object. @elementtype{TMTag} */
 	/* Flag indicating whether the file is a C/C++ source (i.e. not a header) based on its extension */
-	gboolean is_c_source;
+	gboolean trust_file_scope;
 } TMSourceFile;
 
 GType tm_source_file_get_type(void);

@b4n
Copy link
Member

b4n commented May 10, 2023

Just to flesh out how C++ modules impact this. […]

I'll look a tad into modules out of curiosity one day, but does that mean that a kind of header equivalent is generated by the compiler, so that it can be installed and imported?

@b4n
Copy link
Member

b4n commented May 10, 2023

So if we set the filescope/local/whatever flag for .cpp files that will make the symbols exported with the module invisible elsewhere, but they should be visible.

Just looking quickly at https://en.cppreference.com/w/cpp/language/modules suggests that there will be an export keyword, making things fairly easy for the C++ parser, doesn't it?

@elextr
Copy link
Member

elextr commented May 10, 2023

I'll look a tad into modules out of curiosity one day, but does that mean that a kind of header equivalent is generated by the compiler, so that it can be installed and imported?

Presumably somewhere internal to compiler data files somewhere I guess ... oooh, that google is good https://gcc.gnu.org/wiki/cxx-modules#CMI_Location #3245

Just looking quickly at https://en.cppreference.com/w/cpp/language/modules suggests that there will be an export keyword, making things fairly easy for the C++ parser, doesn't it?

Yes, if for that declaration only having an "export" keyword it overrules the file long "local visibility only" ruling made from having a .cpp or equivalent extension.

ctags treats unknown C/C++ file extensions as if they were source
files. Because of this, the isFileScope flag is incorrectly set
to TRUE for such files even if they are headers and the defined
tags don't have a file-only scope.

To fix this, we intorduce a flag called trust_file_scope which we
set to TRUE only when the source file name matches one of the
known C/C++ extensions.

For all other languages the flag is set to TRUE.
@techee
Copy link
Member Author

techee commented May 15, 2023

@b4n I've just repushed a hybrid version of the patch - using the trust_file_scope name but using g_str_has_suffix() as I used in the original patch which I like a bit better (but I don't insist on that one too much if the other version is preferred).

Now the hard stuff - someone has to decide whether to merge this PR or #3457. I would lean towards merging this PR and if we start receiving bug reports related to this issue, we can revert to #3457.

@techee techee added this to the 1.39/2.0 milestone Aug 12, 2023
@b4n b4n merged commit 57631f1 into geany:master Oct 7, 2023
4 checks passed
@b4n
Copy link
Member

b4n commented Oct 7, 2023

I've been running the ever so slightly different implementation of this for a long while and it worked nicely for me. Just tested this PR again, and it seems to indeed actually behave the same, so I'm merging it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot go to tag if using an unconventional file extension
4 participants