-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sort autocompletion tags also based on their presence in included files #3269
Conversation
/* sort local vars first (with highest line number first), followed | ||
* by tags from current file, followed by workspace tags, followed by | ||
* global tags */ | ||
/* sort local vars first (with highest line number first), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add this list to the manual.
When it says "local" vars, do you mean all locals in the current function, or only those declared before the cursor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this is not something that should be documented. For users this whole thing is "geany does its best to place more relevant symbols to the top of the list" and how Geany does this is an implementation detail. You won't really notice this logic in practice, just that the symbol you want to type (hopefully) appears at or near the top of the list. If we happen to add support for LSP, each server will have its own logic which will be totally different from what we use in Geany now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When it says "local" vars, do you mean all locals in the current function, or only those declared before the cursor?
Currently it's those before the cursor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it's those before the cursor.
Ok, its fine for C, if locals only and sorting only its ok for C++ so long as no other declarations (eg members) that happen after the cursor are not excluded from the autocomplete (even if they get sorted later).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elextr We have already been there :-)
I haven't changed anything about this in this PR, it works as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never rely on my umm, you know thingy, umm, oh yeah ... memory!!
Before I look too deep: So I understand this only affects the sorting but does not add tags*? I previously understood that it adds tags for (private?) members that are declared in class declarations so that they can be autocompleted in the corresponding source file but not in other source files. *: the new tm_tag_header_t only says "foo.h is included" so that we can later lookup foo.h and import tags from that file, right? So not a real tag but more of a pointer to a TMSourceFile? |
Adding to @kugel- question, my understanding is that it only looks up include files that are open or loaded via tagfiles, it won't open any extra files, is that right? |
Yes (except for the header tags themselves for every include, see below).
For every include in the parsed file such as
the header tag generates tag whose name is
This is completely independent of what files we have open, it can be whatever file name. It's up to us to match this file name to the open
so we can take the header tag and quickly look up the corresponding
No, it doesn't (and didn't) add any tags, it serves for sorting purposes only but in two different situations where in (2) it affects what results the user gets:
and expects we show a list of |
Before I forget, thinking about it now, we should probably store the header tags to the generated tag files too as they can be useful for global tags as well. I won't do it now (so we don't have to fight with modified unit tests if some rebases are needed), we should just keep it in mind. |
src/tagmanager/tm_parser.c
Outdated
@@ -69,7 +69,7 @@ static GHashTable *subparser_map = NULL; | |||
{'u', tm_tag_union_t}, /* union */ \ | |||
{'v', tm_tag_variable_t}, /* variable */ \ | |||
{'x', tm_tag_externvar_t}, /* externvar */ \ | |||
{'h', tm_tag_undef_t}, /* header */ \ | |||
{'h', tm_tag_header_t}, /* header */ \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's called header
in ctags but maybe we should call it tm_tag_include_t
to make it clear this is something that's generated for every #include
directive and not for every header file in project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
@techee "Yes" would have done ;-) but thanks for being explicit. As for the documentation, yeah, "sorted by likely relevance" should do, just something to point out its not simply alphabetic. |
Also I assume the header tags are not followed recursively, so symbols in includes included are not sorted higher? (not asking for it, keep this PR simple he says quickly :-) |
See the last paragraph in my first post ;-). It wasn't much more complicated to implement, it just didn't seem to produce better results. Maybe if it were implemented as a separate sort step
so it wouldn't compete with symbols from directly included files, it would work better but I'm not sure if it wouldn't complicate things too much and if the benefits are worth it. |
Ahh. I was thinking of where there was a "single include" situation where the direct include file does nothing much more than include others, but yeah it could be less good in other situations. Lets keep it simple for now. |
Yeah, I was thinking that too but then the big fat headers like |
I'm wondering if it it's really appropriate for non-scope auto-completion. If I hack sidebar.c, why would sidebar.h symbols be more relevant than gtk_* symbols? For scope auto-completion (this->) it maybe a different story. |
Both are equally relevant but the question is when the autocompletion starts returning useful results. For GTK it will be only after you type a sufficiently long prefix like
|
@kugel- your comment is appropriate for C, but not for C++ (have I mentioned C++ is NOT C). To repeat my comment from #3175, C may only have minimal declarations in headers, but C++ typically has far more, especially when using generic programming (ie templates), because the body needs to be available when the template is used. For example Geany So for C++ more names come from the associated headers (rather than random general headers) than is the case with C, and as @techee explained above the heuristic moves these names up the list, making it more useful. Edit: also just to point out, nobody using C++ writes |
I understand that for C++ scope-completion is more relevant, and for that I can understand the idea as you often do Anyway I won't argue further if you both agree it's a good idea. My worry is that all of Geany's headers (except sidebar.h) are penalized when editing sidebar.c and I have no reason to assume that I want sidebar.h symbols more than other Geany symbols (might even be that the opposite is true, not sure). |
I have to correct you there, as my edit above said, C++ programmers almost never do |
Apart from that,
Since tags from the current file are already placed above other symbols, you won't see much difference because Also note that the "penalization" already happens, it's just in the form of alphabetic ordering and more or less random in terms of what would be useful for the user. For instance, when you have symbols
Here,
GTK gets penalized but to get the symbol you want, you'll have to type much longer prefix anyway so it isn't a problem. |
This allows us to quickly access TMSourceFile based on the file name.
75640df
to
dd98544
Compare
Just repushed to fix a small conflict in |
dd98544
to
7d56b6f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might have overlooked, but is there something that prevents this logic to be applied to header files (if such avoidance even wanted)?
I can imagine a header file foo.h that include a identically-named file, either using #include_next <foo.h>
or with include path trickery.
Actually thinking about it, it might even make sense to sort tags of the corresponding source file earlier, when editing a header file. For example when I want to create a declaration in a header file it's likely to be a function that might be already implemented in the corresponding source file.
src/tagmanager/tm_workspace.c
Outdated
@@ -742,9 +797,54 @@ static void fill_find_tags_array_prefix(GPtrArray *dst, const char *name, | |||
} | |||
|
|||
|
|||
/* fill 'includes' with the included files except the header corresponding | |||
* to the current file which is returned in 'header' */ | |||
static void fill_includes(TMSourceFile *source, TMSourceFile **header, GHashTable *includes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would find it idiomatic to make this get_includes()
that returns the included files list.
Also, I find it strange that the corresponding header is excluded from the included files list. I think it wouldn't hurt because sort_found_tags()
is special-casing the corresponding header anyway and might make this function be useful in other situations as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would find it idiomatic to make this get_includes() that returns the included files list.
This is a relict of when I experimented with running this function recursively. Since it didn't work very well, it indeed looks better as you describe.
Also, I find it strange that the corresponding header is excluded from the included files list. I think it wouldn't hurt because sort_found_tags() is special-casing the corresponding header anyway and might make this function be useful in other situations as well.
Can do.
src/tagmanager/tm_workspace.c
Outdated
if (*header == NULL && g_strcmp0(hdr_comps[0], src_comps[0]) == 0) | ||
*header = hdr; | ||
else | ||
g_hash_table_add(includes, hdr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm missing hash collision detection. The ptr array source_file_map
might contain TMSourceFile
s with different basenames that have the same hash value. So here you'd need to check if the TMSourceFile matches with the header tag
Glib handles that, sorry for the noise
src/tagmanager/tm_workspace.c
Outdated
@@ -768,6 +871,16 @@ static gint sort_found_tags(gconstpointer a, gconstpointer b, gpointer user_data | |||
return -1; | |||
else if (t2->file == info->file && t1->file != info->file) | |||
return 1; | |||
else if (info->header_file && | |||
t1->file == info->header_file && t2->file != info->header_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err, if t1 is the corresponding header, t2 can't be (can it?). The && t2->file ...
part confuses me and could be left out, right?
Same below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I understand the question correctly but
t1->file == info->header_file && t2->file != info->header_file
says that when we have two tags, t1
and t2
, and t1
comes from header_file
and t2
does not come from this header file, then t1
should be sorted in front of t2
. The initial check for info->header_file
non-NULL value is there because for global tags t1->file
will also be NULL and then this check would do a wrong thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Tags are compared, not files. Nevermind.
There is nothing like that. I wanted to avoid hard-coding header file extensions and without this we can't say if the current file is a header or a source.
This case isn't currently handled (not sure if it's common enough and also the consequences won't be too bad). Also I kind of assumed with the code that all the includes are uniquely named and don't do anything special about the duplicates. The first found header with the given name "wins".
Makes sense, we'd just have to do this reverse-include thing to lookup sources. To find headers, we can use the header tags for includes from ctags but we don't have anything in the opposite direction. |
I think this assumption easily breaks down on larger (maybe proprietary) code bases. Perhaps we can improve by looking for common tags (tags for function declarations must match with the source file in question right?) but not necessarily in this PR. |
Agree that include path clashes is not uncommon, especially for portable software that has multiple backends, and therefore multiple versions of the same include. Can we use the path instead of just the name, presumably that is handled since Geany can open two files of the same name and symbols don't get mixed up (hopefully can't try ATM). That should be unique unless a global tags file overlaps local files. Also agree that this PR should be kept simple, just make an issue for remembering it and do later. |
Yes, though the probability you are editing a file affected by this is probably low.
That would be one possibility (though maybe a bit an overkill). I was thinking we could use a path-based heuristic assuming that related headers are stored closer to the corresponding sources on the file system. For instance in
the header for |
If you mean what I said above, I agree. But note that we can't know exactly what header at what path to use. We only see includes like
but we don't know their exact path (which may be determined by e.g. |
We must have known a path to get symbols from the file? |
We are talking about the situation where you have
in your source file and 2 identically named headers at paths
Symbols from which of these headers should be placed to the top of the autocompletion list when editing your source file? |
Oh, I thought you were saying we didn't know the paths. Ok, clearly the best is:
😄 In other words we can't know how the project is organised, thats a known limitation of Geany's version of "lightweight" IDE, so I doubt there is a simple heuristic that will improve some organisations without making others worse, let alone improve all cases. project/windows_backend/header.hpp:
project/linux_backend/header.hpp:
What type do we use for I guess an associated question would be what would happen if both the declarations of |
Ok, let's hard-code that path and the problem's solved forever ;-). With your What might be a bigger problem is some Anyway, the worst case that happens is that tags won't be sorted in the most optimal way which can however happen for many other reasons so right now I'd suggest to just keep the implementation simple and possibly improve it in the future.
ctags extracts tags from both branches. |
👍
Ok, so the |
Thinking about it again, this won't work - for global tags we don't have file names stored for individual tags so having names of included headers won't help us. So leaving this as it is. |
@kugel- I made the changes you requested in the latest commit. Is this all you requested or did I forget about something? |
Regarding header ambiguity, why not keep it simple and accept /all/ headers with matching names to sort their tags earlier? At least this is easy to explain (e.g. in the manual) and we won't get the wrong header (unless it's named completely different, of course). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGBI, not tested myself!
Might squash the last two commits into earlier ones but it's your call.
Good idea, can do, shouldn't be too complicated.
I personally wouldn't explain anything to users :-). Basically, from user's perspective, the "AI" in Geany does its best to sort tags so the most relevant ones come at the top and that's it.
Yeah, that's the plan. |
Despite @kugel-'s documentation allergy ;-) we do need to note that this happens to avoid regular issues being raised for the list being out of alphabetical order, but I agree we can skip explaining the algorithm. |
Not @kugel-'s, mine :-). To be specific, I really think this is implementation detail that users shouldn't worry about.
We already did it in the previous PR: a91a9c6 |
I have mentioned my poor, erm, thingy, umm, oh yeah, memory!! So fine, nothing to do. |
@kugel- Done in the last commit. |
src/tagmanager/tm_workspace.c
Outdated
TMSourceFile *hdr = tm_files->pdata[j]; | ||
gchar **hdr_comps = g_strsplit(hdr->short_name, ".", 2); | ||
|
||
if (g_strcmp0(hdr_comps[0], src_comps[0]) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be improved.
We're called for a foo.c and loop over all includes here. We have the file name already, and then lookup the TMSourceFile
(and then for every TMSourceFile
we check if the file name is foo.h).
Instead, we can already check for foo.h earlier (when looking up the TMSourceFile
). And then we can return a copy (or ref?) of the GPtrArray
as header_candidates
instead of creating a new list. Something like this:
GPtrArray *tm_files = g_hash_table_lookup(theWorkspace->source_file_map, hdr_name);
if (g_strcmp0(hdr_name, src_comps[0]))
Perhaps it would make sense to separate this part out of get_includes()
and make a distinct g_hash_table_lookup()
for the expected header name? Hm but then it wouldn't depend on foo.c actually including foo.h (could also be seen as a good thing, not sure)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I understand completely what you suggest but it's true we can return the GPtrArray directly and avoid creation of the list - I just made a new commit doing this. Is this what you had in mind?
Note that the inner loop should still be there as we should also use all the includes of the same name similarly to the header file.
Separating extraction of header candidates to a separate function would be possible but it will probably lead to a lot of duplicated code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, g_ptr_array_find()
was introduced in glib 2.54 which we don't have. Wouldn't be hard to implemented, it's jut an extra complication and maybe using the list is a simpler solution then (despite some extra allocation which I think won't matter much in this case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean you can return the GPtrArray right after g_hash_table_lookup(theWorkspace->source_file_map, hdr_name)
, before the inner loop. The GPtrArray has all headers with the same basename already and the inner loop only compares the basename once more.
The inner loop still adds to the includes
hash table but it doesn't need to deal with the header candidates.
Ah, g_ptr_array_find() was introduced in glib 2.54 which we don't have
All files in the GPtrArray have the same base name, so you just need to compare the basename against->pdata[0]
FWIW, we will probably bump GTK dependency to 3.24 soon and then we also get glib 2.54+ (which is a bit older isn't it?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inner loop still adds to the includes hash table but it doesn't need to deal with the header candidates.
Yeah, got it now, fixed in the last commit.
FWIW, we will probably bump GTK dependency to 3.24 soon and then we also get glib 2.54+ (which is a bit older isn't it?).
That would fix the problem. 2.54 is from 2017.
a7bc7fc
to
13f254c
Compare
{ | ||
guint j; | ||
|
||
for (j = 0; j < tm_files->len; j++) | ||
if (!*header_candidates) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't any caller that passes non-NULL, right? I would define get_includes()
such that it always writes to *header_candidates
(unless there are none).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check
if (!*header_candidates)
is to avoid doing the comparisons below if we already set header_candidates
in the previous iteration. Also notice *header_candidates = NULL;
at the beginning of the function.
What you describe would be
if (!header_candidates)
(i.e. without the dereference)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I didn't notice the outer loop. So I guess the idea is to avoid multiple assignments to *header_candidates
in case a file is included multiple times (maybe legitimately, via different directories)?
So I think this is alright now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I guess the idea is to avoid multiple assignments to *header_candidates in case a file is included multiple times (maybe legitimately, via different directories)?
This is just an optimization so the code inside the if
block doesn't have to be executed if we already found header_candidates
.
src/tagmanager/tm_workspace.c
Outdated
{ | ||
TMSourceFile *hdr = tm_files->pdata[j]; | ||
TMSourceFile *hdr = tm_files->pdata[0]; | ||
gchar **hdr_comps = g_strsplit(hdr->short_name, ".", 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this only now. Doesn't that break on files with multiple dots in name? I thought we want to compare the base name without extension, i.e. up until the last dot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. Well, I could g_strdup()
the string, use strrchr() to find the .
and replace it with \0
which would give us the base name. Or is there some simpler solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I would use g_path_get_basename()
instead of g_strdup()
. But it doesn't matter much so yea I don't know a simpler solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC the intention here is to get the filename without the last extension, g_path_get_basename()
does not do that. But utils_remove_ext_from_filename()
does.
Note that there are other strsplit()
s above that may need changing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if @techee wants to use Geany utility function in tagmanager code
AFAIK there is no "upstream" for tagmanager, and this PR is modifying it anyway, so why not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if @techee wants to use Geany utility function in tagmanager code
AFAIK there is no "upstream" for tagmanager, and this PR is modifying it anyway, so why not.
No problem with using Geany utility functions in TM, I just think that we should do some bigger Geany-fication of TM code separately (once the TM code stabilizes and there are no bigger PRs pending - which will hopefully be soon). So for now I just used the strrchr()
in the fix. g_path_get_basename()
gets called previously already so we don't have to do that here.
@elextr gazes at sky and whistles ... 😁 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Should I leave the PR as it is or rebase/squash some of the commits first? |
Squashing makes sense. You could probably even reduce to the 3 initial commits as the other ones are fix-ups that popped up during review. |
Enable header tag generation in ctags, introduce new tm_tag_header_t TMTagType. Disable usage of this tag for normal Geany usage such as symbol tree, tag generation or autocompletion. We'll use this tag type only for sorting autocompletion symbols.
This patch extends the current sorting of the tags which looks this way: - sort local vars first (with highest line number first), - followed by tags from current file, - followed by workspace tags, - followed by global tags by two new entries so it looks this way: - sort local vars first (with highest line number first), - followed by tags from current file, - NEW: followed by tags from header, - NEW: followed by tags from other included files, - followed by workspace tags, - followed by global tags Sorting tags from included files seems to be a natural heuristic as users explicitly included these and want their symbols to be present in the current file. As a special case, symbols from the header of the current source file (included file with the same base name as the current file) are listed in front of symbols from other included files as headers tend to be tightly coupled with corresponding source files in C/C++.
7cc5d52
to
151bf64
Compare
@kugel- Yes, that's what I thought, done. So unless there are any objections, I'll merge this PR in about a week. |
Great |
This is a reworked and extended implementation of sorting based on tag presence in header files discussed (and dropped) from #3185. This implementation uses the header tags from ctags so we know precisely which files are included so we can use not only the tags from the source's header file but also tags from other included files. After this change the ordering sequence looks this way:
To me, the result seems to improve autocompletion results (it's a bit hard to tell for sure, one can always prepare artificial examples to test where it works better, with the real world it's harder to spot a difference between different orderings unless something is visibly broken).
For reference, I also tried the option to use the includes from the included headers recursively to some depth but the results didn't seem visibly better and were actually slightly worse (by including
<gtk.h>
one got all the GTK symbols which were then fighting with symbols from explicitly included headers that started to appear lower in the list which was often worse).