-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mark tags as "local" only for files with a known C/C++ source extension #3490
Conversation
} | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it be simpler to have source_file->is_source
, initialize it to TRUE
and use it for all languages, and make it FALSE
for C and C++ if the extension doesn't match any of those?
source_file->is_source = TRUE;
if (source_file->lang == TM_PARSER_C || source_file->lang == TM_PARSER_CPP)
{
const gchar *ext = strrchr(source_file->short_name, '.');
if (! ext)
source_file->is_source = FALSE;
else
{
int i;
const gchar *common_src_exts[] =
{"c", "C", "cc", "cp", "cpp", "cxx", "c++", "CPP", "CXX"};
for (i = 0; i < G_N_ELEMENTS(common_src_exts); i++)
{
if (strcmp(ext + 1, common_src_exts[i]) == 0)
break
}
source_file->is_source = (i < G_N_ELEMENTS(common_src_exts));
}
}
note: code above is entirely untested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source_file->is_source = TRUE;
Yes, we could do that - makes probably more sense as the is_source
flag is valid for all languages and not specific to C/C++.
if (source_file->lang == TM_PARSER_C || source_file->lang == TM_PARSER_CPP)
{
const gchar *ext = strrchr(source_file->short_name, '.');
if (! ext)
source_file->is_source = FALSE;
else
{
int i;
const gchar *common_src_exts[] =
{"c", "C", "cc", "cp", "cpp", "cxx", "c++", "CPP", "CXX"};
for (i = 0; i < G_N_ELEMENTS(common_src_exts); i++)
{
if (strcmp(ext + 1, common_src_exts[i]) == 0)
break
}
source_file->is_source = (i < G_N_ELEMENTS(common_src_exts));
}
}
Does this code offer any benefit compared to what I wrote? I personally find my code a bit easier to read as you avoid the extra branch where .
is not found and one also has to think a bit when looking at
source_file->is_source = (i < G_N_ELEMENTS(common_src_exts));
what exactly it means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we could do that - makes probably more sense as the
is_source
flag is valid for all languages and not specific to C/C++.
Yeah, the main reason I suggest this would be to make the init_tag()
check simply tag->local = tag_entry->isFileScope && file->is_source
rather than having language-specific tests there as well.
Does this code offer any benefit compared to what I wrote? I personally find my code a bit easier to read as you avoid the extra branch where
.
is not found
Merely premature optimization (comparison only compares useful things, and traverse the basename only once), so no, it doesn't really matter.
If it's only a readability issue, you could easily rewrite this ever so slightly like so:
source_file->is_source = TRUE;
if (source_file->lang == TM_PARSER_C || source_file->lang == TM_PARSER_CPP)
{
const gchar *ext = strrchr(source_file->short_name, '.');
source_file->is_source = FALSE;
if (ext)
{
const gchar *common_src_exts[] =
{"c", "C", "cc", "cp", "cpp", "cxx", "c++", "CPP", "CXX"};
for (int i = 0; i < G_N_ELEMENTS(common_src_exts); i++)
{
if (strcmp(ext + 1, common_src_exts[i]) == 0)
{
source_file->is_source = TRUE;
break;
}
}
}
}
anyway, either way is fine.
I'll try to test it properly tomorrow, but the approach, although a bit of a workaround (as it's basically overriding a similar check in uctags), it looks pretty good indeed (esp. with the discussion on the uctags PR). |
What about header files, why are they not C/C++ source? |
Sometimes we include C files in other C files. Then |
Since (IIUC) the cause of #3454 is the intention to limit the symbols visible at any point by using heuristics (originally dodgy uctags ones) to exclude symbols that definitely can't be visible. The "its not the same language so it can't be visible" heuristic is the cause of the OP of #3454 exacerbated by simply using extensions, not the filetype. As @kugel- pointed out for C/C++ there are situations where actual combinations of files affect visibility, and in reverse just because a This is the effect of the inability of the uctags/Geany to know about the build, without actual dependencies and following includes its simply not possible to do it correctly. So its necessary to resort to heuristics, and they will never handle all situations. The decision needs to be made to find the most useful tradeoff between how many omitted visible symbols there are vs how many incorrectly visible symbols there are. This was one of the reasons I moved to Vscode for C++ and I can sort of confirm the problem using Vscode because I notice that when I add another file to a project but before I add it to the build, Vscode (even using an LSP) will give a very similar excessive set of rubbish symbols to uctags/Geany. |
That's true, I'm just wondering how such files typically look like and for what purpose they include the sources. One such scenario I can imagine is that you want to split a huge source file into multiple files without some refactoring and create
In this case, though, this PR isn't a problem as you don't have code in this Do you have some concrete examples of when including source files is used? In the worst case we can always do #3457, it would just be shame to do it because of some really obscure and almost never used scenario. |
An example in one of the projects I'm currently doing has platform specific files included like: #if defined(LIN)
#include "plat_lin.cpp"
#elif defined(WIN)
#include "plat_win.cpp"
#elif defined(MAC)
#include "plat_mac.cpp"
#endif then the code that uses those in the including file. Other selective includes are the network code (using either Boost++ or cURL) and other libraries where there is more than one solution that can be wrapped to a consistent interface. Its much neater than lots of |
You could come up with a better name for the flag, maybe |
@elextr interesting… I usually see this still using separate source compiled conditionally with a common header defining the interface. But OK, so you have a case where there is no way to know but having complete knowledge of the whole project tree -- which we won't have (well, a plugin like projectorganizer could look at all the files, figure out the includes and make a slightly more educated filtering guess, but it'd still be a heuristic without knowing the real build settings and all). But I agree with @techee, the filtering is useful to lower the amount of irrelevant suggestions in most situations, so we have to find where to draw the line. Or we go the Geany way and add a setting? 😄 |
Yeah, I only realised that the problem was the name of the flag after I posted that. Based on my following post Hmmm, maybe just simply [Edit: oops, wrong way round, maybe
Agree, see my long post
Oh, how could we resist ... |
Actually if we want that to work for all languages, we could make it technical: |
Isn't the real question if the Geany flag is a copy of isFileScope or our own heuristic, or some combination of both? For example where does it leave our own heuristic "inter-language symbols not allowed". Isn't that what causes #3454 where symbols from |
@elextr no, that's what I though initially, but it's actually because of the |
Won't that break a whole lot of languages where there is no separation of header and body, where modules from one body file are imported into another body file? |
@elextr well, we do that only for C and C++ ATM (see code), but that's basically the reason why I suggest a generic name merely stating how the flag is used rather than what we try to achieve with it (which does not make sense for languages with no header/source separation) |
And in a heartbeat he drops the obligatory "C++ isn't C" comment. Although not used much yet, modules have been a thing since C++20, three years ... and the interfaces are in the Not to mention private modules. And declarations inside an anonymous namespace are internal linkage (implicit static). @elextr walks away whistling and ducks a brick 😁 |
Just to flesh out how C++ modules impact this. Modules break the assumption that there must be a declaration in a header file for a symbol to be used in another file, the module is exported from a body file and the declarations from the modules are imported by an So if we set the filescope/local/whatever flag for Its not just an academic C++ exercise, since this is basically how modules work in most languages if we can solve it for C++ we can extend the heuristic to other languages too. [Edit: see also https://docs.ctags.io/en/latest/parser-cxx.html#file-scope] |
I've been running this (well, almost… see below 😁) for a couple days and it seems to work nicely My "small" patch on top just because: diff --git a/src/tagmanager/tm_ctags.c b/src/tagmanager/tm_ctags.c
index a5cb9531e..16e2ebf04 100644
--- a/src/tagmanager/tm_ctags.c
+++ b/src/tagmanager/tm_ctags.c
@@ -119,12 +119,7 @@ static gboolean init_tag(TMTag *tag, TMSourceFile *file, const tagEntryInfo *tag
tag->name = g_strdup(tag_entry->name);
tag->type = type;
- /* ctags sets "isFileScope" also for files with an unknown extension -
- * make sure we set "local" only for files with a known C/C++ extension */
- if (file->lang == TM_PARSER_C || file->lang == TM_PARSER_CPP)
- tag->local = tag_entry->isFileScope && file->is_c_source;
- else
- tag->local = tag_entry->isFileScope;
+ tag->local = tag_entry->isFileScope && file->trust_file_scope;
tag->flags = tm_tag_flag_none_t;
if (isTagExtraBitMarked(tag_entry, XTAG_ANONYMOUS))
tag->flags |= tm_tag_flag_anon_t;
diff --git a/src/tagmanager/tm_source_file.c b/src/tagmanager/tm_source_file.c
index 599ae4fa7..e5a27419b 100644
--- a/src/tagmanager/tm_source_file.c
+++ b/src/tagmanager/tm_source_file.c
@@ -609,20 +609,28 @@ static gboolean tm_source_file_init(TMSourceFile *source_file, const char *file_
else
source_file->lang = tm_ctags_get_named_lang(name);
- source_file->is_c_source = FALSE;
+ source_file->trust_file_scope = TRUE;
+ /* ctags sets "isFileScope" for all C/C++ files without a known header
+ * extension, but we don't want to use it for files with a less conventional
+ * extension, not to exclude symbols we shouldn't */
if (source_file->lang == TM_PARSER_C || source_file->lang == TM_PARSER_CPP)
{
- const gchar **ext;
- const gchar *common_src_exts[] =
- {".c", ".C", ".cc", ".cp", ".cpp", ".cxx", ".c++", ".CPP", ".CXX", NULL};
+ const gchar *ext = strrchr(source_file->short_name, '.');
- for (ext = common_src_exts; *ext; ext++)
+ source_file->trust_file_scope = FALSE;
+ if (ext)
{
- if (g_str_has_suffix(source_file->short_name, *ext))
+ const gchar *common_src_exts[] =
+ {"c", "C", "cc", "cp", "cpp", "cxx", "c++", "CPP", "CXX"};
+
+ for (guint i = 0; i < G_N_ELEMENTS(common_src_exts); i++)
{
- source_file->is_c_source = TRUE;
- break;
+ if (strcmp(ext + 1, common_src_exts[i]) == 0)
+ {
+ source_file->trust_file_scope = TRUE;
+ break;
+ }
}
}
}
diff --git a/src/tagmanager/tm_source_file.h b/src/tagmanager/tm_source_file.h
index f3ca0cbc8..342d1259c 100644
--- a/src/tagmanager/tm_source_file.h
+++ b/src/tagmanager/tm_source_file.h
@@ -36,7 +36,7 @@ typedef struct TMSourceFile
char *short_name; /**< Just the name of the file (without the path) */
GPtrArray *tags_array; /**< Sorted tag array obtained by parsing the object. @elementtype{TMTag} */
/* Flag indicating whether the file is a C/C++ source (i.e. not a header) based on its extension */
- gboolean is_c_source;
+ gboolean trust_file_scope;
} TMSourceFile;
GType tm_source_file_get_type(void); |
I'll look a tad into modules out of curiosity one day, but does that mean that a kind of header equivalent is generated by the compiler, so that it can be installed and imported? |
Just looking quickly at https://en.cppreference.com/w/cpp/language/modules suggests that there will be an |
Presumably somewhere internal to compiler data files somewhere I guess ... oooh, that google is good https://gcc.gnu.org/wiki/cxx-modules#CMI_Location #3245
Yes, if for that declaration only having an "export" keyword it overrules the file long "local visibility only" ruling made from having a |
ctags treats unknown C/C++ file extensions as if they were source files. Because of this, the isFileScope flag is incorrectly set to TRUE for such files even if they are headers and the defined tags don't have a file-only scope. To fix this, we intorduce a flag called trust_file_scope which we set to TRUE only when the source file name matches one of the known C/C++ extensions. For all other languages the flag is set to TRUE.
@b4n I've just repushed a hybrid version of the patch - using the Now the hard stuff - someone has to decide whether to merge this PR or #3457. I would lean towards merging this PR and if we start receiving bug reports related to this issue, we can revert to #3457. |
I've been running the ever so slightly different implementation of this for a long while and it worked nicely for me. Just tested this PR again, and it seems to indeed actually behave the same, so I'm merging it. |
After some more thinking about it, we can fix #3454 very easily by ourselves by checking whether the tag originates from a source file with a known/common C/C++ extension - if not, always set "local" to FALSE.
The first patch adds
is_c_source
flag to TMSourceFile to indicate whether the file has one of the known C/C++ extensions, the second patch uses this flag and for C/C++ sources sets "local" to TRUE only when this flag is set (in addition to ctag's isFileScope flag).@b4n Does this fix look OK to you?
Fixes #3454.