Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort autocompletion tags also based on their presence in included files #3269

Merged
merged 3 commits into from
Nov 11, 2022

Conversation

techee
Copy link
Member

@techee techee commented Aug 29, 2022

This is a reworked and extended implementation of sorting based on tag presence in header files discussed (and dropped) from #3185. This implementation uses the header tags from ctags so we know precisely which files are included so we can use not only the tags from the source's header file but also tags from other included files. After this change the ordering sequence looks this way:

  • sort local vars first (with highest line number first),
  • followed by tags from current file,
  • NEW: followed by tags from header,
  • NEW: followed by tags from other included files,
  • followed by workspace tags,
  • followed by global tags

To me, the result seems to improve autocompletion results (it's a bit hard to tell for sure, one can always prepare artificial examples to test where it works better, with the real world it's harder to spot a difference between different orderings unless something is visibly broken).

For reference, I also tried the option to use the includes from the included headers recursively to some depth but the results didn't seem visibly better and were actually slightly worse (by including <gtk.h> one got all the GTK symbols which were then fighting with symbols from explicitly included headers that started to appear lower in the list which was often worse).

@techee techee added the tag-manager Tag manager related patches and bugs label Aug 29, 2022
/* sort local vars first (with highest line number first), followed
* by tags from current file, followed by workspace tags, followed by
* global tags */
/* sort local vars first (with highest line number first),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this list to the manual.

When it says "local" vars, do you mean all locals in the current function, or only those declared before the cursor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is not something that should be documented. For users this whole thing is "geany does its best to place more relevant symbols to the top of the list" and how Geany does this is an implementation detail. You won't really notice this logic in practice, just that the symbol you want to type (hopefully) appears at or near the top of the list. If we happen to add support for LSP, each server will have its own logic which will be totally different from what we use in Geany now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it says "local" vars, do you mean all locals in the current function, or only those declared before the cursor?

Currently it's those before the cursor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently it's those before the cursor.

Ok, its fine for C, if locals only and sorting only its ok for C++ so long as no other declarations (eg members) that happen after the cursor are not excluded from the autocomplete (even if they get sorted later).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elextr We have already been there :-)

#3175 (comment)

I haven't changed anything about this in this PR, it works as before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never rely on my umm, you know thingy, umm, oh yeah ... memory!!

@kugel-
Copy link
Member

kugel- commented Aug 30, 2022

Before I look too deep: So I understand this only affects the sorting but does not add tags*? I previously understood that it adds tags for (private?) members that are declared in class declarations so that they can be autocompleted in the corresponding source file but not in other source files.

*: the new tm_tag_header_t only says "foo.h is included" so that we can later lookup foo.h and import tags from that file, right? So not a real tag but more of a pointer to a TMSourceFile?

@elextr
Copy link
Member

elextr commented Aug 30, 2022

Adding to @kugel- question, my understanding is that it only looks up include files that are open or loaded via tagfiles, it won't open any extra files, is that right?

@techee
Copy link
Member Author

techee commented Aug 30, 2022

Before I look too deep: So I understand this only affects the sorting but does not add tags*?

Yes (except for the header tags themselves for every include, see below).

*: the new tm_tag_header_t only says "foo.h is included" so that we can later lookup foo.h and import tags from that file, right? So not a real tag but more of a pointer to a TMSourceFile?

Adding to @kugel- question, my understanding is that it only looks up include files that are open or loaded via tagfiles, it won't open any extra files, is that right?

For every include in the parsed file such as

#include <some_path/some_header.h>

the header tag generates tag whose name is

some_path/some_header.h

This is completely independent of what files we have open, it can be whatever file name. It's up to us to match this file name to the open TMSourceFile files. This is what the first patch serves for - it maintains a hash table

file_name -> List<TMSourceFile>

so we can take the header tag and quickly look up the corresponding TMSourceFile. The List is there because there may be multiple identically named files opened from different paths - it's hard to make any guess about the paths since we don't know which directories are searched for the includes by the build system. When we don't find any corresponding TMSourceFile, the file isn't open in Geany and we don't do anything for that include. The header_tag -> TMSourceFile lookup happens dynamically when performing autocompletion and it isn't stored anywhere.

I previously understood that it adds tags for (private?) members that are declared in class declarations so that they can be autocompleted in the corresponding source file but not in other source files.

No, it doesn't (and didn't) add any tags, it serves for sorting purposes only but in two different situations where in (2) it affects what results the user gets:

  1. Sorting the tags in non-scope autocompletion popup - this just affects the order in which the user sees the tags where I think it makes sense to see tags from included files before other project files.
  2. Sorting the candidate tags which are searched for members in scope autocompletion. Let's say the user has just written
name.

and expects we show a list of name members. Some tag names are very common and for instance in this case, I can imagine many classes have the name member possibly each of a different type (and different members) so it's important we pick the right name tag. When we sort all the name tags so we first check the tags from the header file, we have a good chance of showing the right members than some name from a totally different class unrelated to the current file.

@techee
Copy link
Member Author

techee commented Aug 30, 2022

Before I forget, thinking about it now, we should probably store the header tags to the generated tag files too as they can be useful for global tags as well. I won't do it now (so we don't have to fight with modified unit tests if some rebases are needed), we should just keep it in mind.

@@ -69,7 +69,7 @@ static GHashTable *subparser_map = NULL;
{'u', tm_tag_union_t}, /* union */ \
{'v', tm_tag_variable_t}, /* variable */ \
{'x', tm_tag_externvar_t}, /* externvar */ \
{'h', tm_tag_undef_t}, /* header */ \
{'h', tm_tag_header_t}, /* header */ \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's called header in ctags but maybe we should call it tm_tag_include_t to make it clear this is something that's generated for every #include directive and not for every header file in project.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

@elextr
Copy link
Member

elextr commented Aug 30, 2022

@techee "Yes" would have done ;-) but thanks for being explicit.

As for the documentation, yeah, "sorted by likely relevance" should do, just something to point out its not simply alphabetic.

@elextr
Copy link
Member

elextr commented Aug 30, 2022

Also I assume the header tags are not followed recursively, so symbols in includes included are not sorted higher? (not asking for it, keep this PR simple he says quickly :-)

@techee
Copy link
Member Author

techee commented Aug 30, 2022

Also I assume the header tags are not followed recursively, so symbols in includes included are not sorted higher? (not asking for it, keep this PR simple he says quickly :-)

See the last paragraph in my first post ;-). It wasn't much more complicated to implement, it just didn't seem to produce better results. Maybe if it were implemented as a separate sort step

...
- followed by tags from header,
- followed by tags from other directly included files,
- followed by tags from indirectly included files,
...

so it wouldn't compete with symbols from directly included files, it would work better but I'm not sure if it wouldn't complicate things too much and if the benefits are worth it.

@elextr
Copy link
Member

elextr commented Aug 30, 2022

See the last paragraph in my first post ;-).

Ahh.

I was thinking of where there was a "single include" situation where the direct include file does nothing much more than include others, but yeah it could be less good in other situations. Lets keep it simple for now.

@techee
Copy link
Member Author

techee commented Aug 30, 2022

I was thinking of where there was a "single include" situation where the direct include file does nothing much more than include others, but yeah it could be less good in other situations. Lets keep it simple for now.

Yeah, I was thinking that too but then the big fat headers like gtk.h "killed" humble project header files by the sheer amount of symbols they declare. But haven't tried the extra sort step which should mitigate this problem. But yeah, I agree this is something for a possible subsequent PR.

@kugel-
Copy link
Member

kugel- commented Aug 30, 2022

I'm wondering if it it's really appropriate for non-scope auto-completion. If I hack sidebar.c, why would sidebar.h symbols be more relevant than gtk_* symbols?

For scope auto-completion (this->) it maybe a different story.

@techee
Copy link
Member Author

techee commented Aug 30, 2022

I'm wondering if it it's really appropriate for non-scope auto-completion. If I hack sidebar.c, why would sidebar.h symbols be more relevant than gtk_* symbols?

Both are equally relevant but the question is when the autocompletion starts returning useful results. For GTK it will be only after you type a sufficiently long prefix like gtk_something_ when the symbol you want starts appearing. You definitely won't get anything useful when you just type g. On the other hand, if you want something from sidebar.h that starts with g, just typing the single g might already give you what you want. But when this is mixed with all the symbols from GTK, this symbol from sidebar.h gets lost in all the other GTK symbols and you won't get useful results after typing the g. In short:

  1. This doesn't improve (but also doesn't worsen) anything when you want a GTK symbol
  2. But improves autocompletion when you want a symbol from your own sources

@elextr
Copy link
Member

elextr commented Aug 31, 2022

@kugel- your comment is appropriate for C, but not for C++ (have I mentioned C++ is NOT C).

To repeat my comment from #3175, C may only have minimal declarations in headers, but C++ typically has far more, especially when using generic programming (ie templates), because the body needs to be available when the template is used.

For example Geany .h files are 20% of the .c files, Scintilla (which does not use templates) src/*.h is 30% of src/*.cxx but some of the headers in includes are also relevant so it would really be more, or one of my projects that uses templates the *.hpp files are nearly 50% of the *.cpp files.

So for C++ more names come from the associated headers (rather than random general headers) than is the case with C, and as @techee explained above the heuristic moves these names up the list, making it more useful.

Edit: also just to point out, nobody using C++ writes this-> so member names are not scope autocompletes.

@kugel-
Copy link
Member

kugel- commented Aug 31, 2022

I understand that for C++ scope-completion is more relevant, and for that I can understand the idea as you often do ClassName:: or this->

Anyway I won't argue further if you both agree it's a good idea. My worry is that all of Geany's headers (except sidebar.h) are penalized when editing sidebar.c and I have no reason to assume that I want sidebar.h symbols more than other Geany symbols (might even be that the opposite is true, not sure).

@elextr
Copy link
Member

elextr commented Aug 31, 2022

I understand that for C++ scope-completion is more relevant, and for that I can understand the idea as you often do ClassName:: or this->

I have to correct you there, as my edit above said, C++ programmers almost never do this-> (grep Scintilla source if you don't believe me), scope completion is not more relevant to C++, member names are most often not qualified. Please forget the Python idea of self.xxx, it does not apply to C++.

@techee
Copy link
Member Author

techee commented Aug 31, 2022

Edit: also just to point out, nobody using C++ writes this-> so member names are not scope autocompletes.

Apart from that, this isn't recognized or handled by scope autocompletion in any way right now.

My worry is that all of Geany's headers (except sidebar.h) are penalized when editing sidebar.c and I have no reason to assume that I want sidebar.h symbols more than other Geany symbols (might even be that the opposite is true, not sure).

Since tags from the current file are already placed above other symbols, you won't see much difference because symbols.h (assuming you are editing symbols.c) just contains declarations of the functions from symbols.c which are already placed above. However in C++ headers also contain members which are not declared in the source files and members are accessed frequently in C++ code and you want non-scope autocompletion for these more often than some random members from other header files.

Also note that the "penalization" already happens, it's just in the form of alphabetic ordering and more or less random in terms of what would be useful for the user. For instance, when you have symbols ga, gb, and gz in your project declared in some header files and for simplification we assume that all gtk symbols start with the gtk_ prefix, after typing g without this PR you get an autocompletion popup containing

ga
gb
gtk_...
gtk_...

Here, ga and gb "won" and gtk symbols were penailized, and gz was just killed by the amount of gtk symbols and has no chance to even appear in the list. After this PR, assuming the current file includes the headers containing these symbols, you get

ga
gb
gz
gtk_...
gtk_...

GTK gets penalized but to get the symbol you want, you'll have to type much longer prefix anyway so it isn't a problem.

This allows us to quickly access TMSourceFile based on the file name.
@techee
Copy link
Member Author

techee commented Sep 5, 2022

Just repushed to fix a small conflict in tm_workspace_is_autocomplete_tag() with the recently merged #3268.

Copy link
Member

@kugel- kugel- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have overlooked, but is there something that prevents this logic to be applied to header files (if such avoidance even wanted)?

I can imagine a header file foo.h that include a identically-named file, either using #include_next <foo.h> or with include path trickery.

Actually thinking about it, it might even make sense to sort tags of the corresponding source file earlier, when editing a header file. For example when I want to create a declaration in a header file it's likely to be a function that might be already implemented in the corresponding source file.

@@ -742,9 +797,54 @@ static void fill_find_tags_array_prefix(GPtrArray *dst, const char *name,
}


/* fill 'includes' with the included files except the header corresponding
* to the current file which is returned in 'header' */
static void fill_includes(TMSourceFile *source, TMSourceFile **header, GHashTable *includes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would find it idiomatic to make this get_includes() that returns the included files list.

Also, I find it strange that the corresponding header is excluded from the included files list. I think it wouldn't hurt because sort_found_tags() is special-casing the corresponding header anyway and might make this function be useful in other situations as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would find it idiomatic to make this get_includes() that returns the included files list.

This is a relict of when I experimented with running this function recursively. Since it didn't work very well, it indeed looks better as you describe.

Also, I find it strange that the corresponding header is excluded from the included files list. I think it wouldn't hurt because sort_found_tags() is special-casing the corresponding header anyway and might make this function be useful in other situations as well.

Can do.

if (*header == NULL && g_strcmp0(hdr_comps[0], src_comps[0]) == 0)
*header = hdr;
else
g_hash_table_add(includes, hdr);
Copy link
Member

@kugel- kugel- Sep 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm missing hash collision detection. The ptr array source_file_map might contain TMSourceFiles with different basenames that have the same hash value. So here you'd need to check if the TMSourceFile matches with the header tag

Glib handles that, sorry for the noise

@@ -768,6 +871,16 @@ static gint sort_found_tags(gconstpointer a, gconstpointer b, gpointer user_data
return -1;
else if (t2->file == info->file && t1->file != info->file)
return 1;
else if (info->header_file &&
t1->file == info->header_file && t2->file != info->header_file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err, if t1 is the corresponding header, t2 can't be (can it?). The && t2->file ... part confuses me and could be left out, right?

Same below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I understand the question correctly but

t1->file == info->header_file && t2->file != info->header_file

says that when we have two tags, t1 and t2, and t1 comes from header_file and t2 does not come from this header file, then t1 should be sorted in front of t2. The initial check for info->header_file non-NULL value is there because for global tags t1->file will also be NULL and then this check would do a wrong thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Tags are compared, not files. Nevermind.

@techee
Copy link
Member Author

techee commented Sep 9, 2022

I might have overlooked, but is there something that prevents this logic to be applied to header files (if such avoidance even wanted)?

There is nothing like that. I wanted to avoid hard-coding header file extensions and without this we can't say if the current file is a header or a source.

I can imagine a header file foo.h that include a identically-named file, either using #include_next <foo.h> or with include path trickery.

This case isn't currently handled (not sure if it's common enough and also the consequences won't be too bad). Also I kind of assumed with the code that all the includes are uniquely named and don't do anything special about the duplicates. The first found header with the given name "wins".

Actually thinking about it, it might even make sense to sort tags of the corresponding source file earlier, when editing a header file. For example when I want to create a declaration in a header file it's likely to be a function that might be already implemented in the corresponding source file.

Makes sense, we'd just have to do this reverse-include thing to lookup sources. To find headers, we can use the header tags for includes from ctags but we don't have anything in the opposite direction.

@kugel-
Copy link
Member

kugel- commented Sep 9, 2022

Also I kind of assumed with the code that all the includes are uniquely named and don't do anything special about the duplicates. The first found header with the given name "wins".

I think this assumption easily breaks down on larger (maybe proprietary) code bases. Perhaps we can improve by looking for common tags (tags for function declarations must match with the source file in question right?) but not necessarily in this PR.

@elextr
Copy link
Member

elextr commented Sep 10, 2022

Also I kind of assumed with the code that all the includes are uniquely named and don't do anything special about the duplicates. The first found header with the given name "wins".

I think this assumption easily breaks down on larger (maybe proprietary) code bases. Perhaps we can improve by looking for common tags (tags for function declarations must match with the source file in question right?) but not necessarily in this PR.

Agree that include path clashes is not uncommon, especially for portable software that has multiple backends, and therefore multiple versions of the same include. Can we use the path instead of just the name, presumably that is handled since Geany can open two files of the same name and symbols don't get mixed up (hopefully can't try ATM). That should be unique unless a global tags file overlaps local files.

Also agree that this PR should be kept simple, just make an issue for remembering it and do later.

@techee
Copy link
Member Author

techee commented Sep 10, 2022

I think this assumption easily breaks down on larger (maybe proprietary) code bases.

Yes, though the probability you are editing a file affected by this is probably low.

Perhaps we can improve by looking for common tags (tags for function declarations must match with the source file in question right?) but not necessarily in this PR.

That would be one possibility (though maybe a bit an overkill). I was thinking we could use a path-based heuristic assuming that related headers are stored closer to the corresponding sources on the file system. For instance in

/A/B/C1/D/foo.h
/A/B/C/D/foo.h
/A/B/C/D2/foo.c

the header for foo.c would be /A/B/C/D/foo.h because the identical path prefix (3 directories) is longer than for /A/B/C1/D/foo.h (2 directories).

@techee
Copy link
Member Author

techee commented Sep 10, 2022

Agree that include path clashes is not uncommon, especially for portable software that has multiple backends, and therefore multiple versions of the same include. Can we use the path instead of just the name, presumably that is handled since Geany can open two files of the same name and symbols don't get mixed up (hopefully can't try ATM).

If you mean what I said above, I agree. But note that we can't know exactly what header at what path to use. We only see includes like

#include "header.h"

but we don't know their exact path (which may be determined by e.g. gcc -I include_path)

@elextr
Copy link
Member

elextr commented Sep 10, 2022

but we don't know their exact path

We must have known a path to get symbols from the file?

@techee
Copy link
Member Author

techee commented Sep 11, 2022

We must have known a path to get symbols from the file?

We are talking about the situation where you have

#include "header.h"

in your source file and 2 identically named headers at paths

/A/B/C1/header.h
/A/B/C2/header.h

Symbols from which of these headers should be placed to the top of the autocompletion list when editing your source file?

@elextr
Copy link
Member

elextr commented Sep 11, 2022

Oh, I thought you were saying we didn't know the paths. Ok, clearly the best is:

/A/B/C1/include/header.hpp

😄

In other words we can't know how the project is organised, thats a known limitation of Geany's version of "lightweight" IDE, so I doubt there is a simple heuristic that will improve some organisations without making others worse, let alone improve all cases.

project/windows_backend/header.hpp:

class Something {
    windows_something_handle h;
public:
    Something();
   do_something();
    ~Something();
};

project/linux_backend/header.hpp:

class Something {
    linux_something_file_descriptor h;
public:
    Something();
    do_something();
    ~Something();
};

What type do we use for h when writing Something::do_something()?

I guess an associated question would be what would happen if both the declarations of Something are in the same file with an #ifdef deciding between them?

@techee
Copy link
Member Author

techee commented Sep 12, 2022

Ok, clearly the best is:

/A/B/C1/include/header.hpp

Ok, let's hard-code that path and the problem's solved forever ;-).

With your Something backend example the problem isn't so big because at least the member names are identical so non-scope autocompletion is going to work (of course scope autocompletion for h could be a problem).

What might be a bigger problem is some utils.h created at various places with different utility functions for say different submodules. Here the path-based heuristic I suggested or the @kugel- 's tag-similarity-based heuristic might work.

Anyway, the worst case that happens is that tags won't be sorted in the most optimal way which can however happen for many other reasons so right now I'd suggest to just keep the implementation simple and possibly improve it in the future.

I guess an associated question would be what would happen if both the declarations of Something are in the same file with an #ifdef deciding between them?

ctags extracts tags from both branches.

@elextr
Copy link
Member

elextr commented Sep 12, 2022

so right now I'd suggest to just keep the implementation simple and possibly improve it in the future.

👍

ctags extracts tags from both branches.

Ok, so the Something problem already exists, so its a separate problem not something (pun intended :-) changed by this PR.

@techee
Copy link
Member Author

techee commented Sep 14, 2022

Before I forget, thinking about it now, we should probably store the header tags to the generated tag files too as they can be useful for global tags as well. I won't do it now (so we don't have to fight with modified unit tests if some rebases are needed), we should just keep it in mind.

Thinking about it again, this won't work - for global tags we don't have file names stored for individual tags so having names of included headers won't help us. So leaving this as it is.

@techee
Copy link
Member Author

techee commented Sep 14, 2022

@kugel- I made the changes you requested in the latest commit. Is this all you requested or did I forget about something?

@kugel-
Copy link
Member

kugel- commented Sep 14, 2022

Regarding header ambiguity, why not keep it simple and accept /all/ headers with matching names to sort their tags earlier? At least this is easy to explain (e.g. in the manual) and we won't get the wrong header (unless it's named completely different, of course).

Copy link
Member

@kugel- kugel- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGBI, not tested myself!

Might squash the last two commits into earlier ones but it's your call.

@techee
Copy link
Member Author

techee commented Sep 14, 2022

Regarding header ambiguity, why not keep it simple and accept /all/ headers with matching names to sort their tags earlier?

Good idea, can do, shouldn't be too complicated.

At least this is easy to explain (e.g. in the manual) and we won't get the wrong header (unless it's named completely different, of course).

I personally wouldn't explain anything to users :-). Basically, from user's perspective, the "AI" in Geany does its best to sort tags so the most relevant ones come at the top and that's it.

Might squash the last two commits into earlier ones but it's your call.

Yeah, that's the plan.

@elextr
Copy link
Member

elextr commented Sep 15, 2022

I personally wouldn't explain anything to users :-). Basically, from user's perspective, the "AI" in Geany does its best to sort tags so the most relevant ones come at the top and that's it.

Despite @kugel-'s documentation allergy ;-) we do need to note that this happens to avoid regular issues being raised for the list being out of alphabetical order, but I agree we can skip explaining the algorithm.

@techee
Copy link
Member Author

techee commented Sep 15, 2022

Despite @kugel-'s documentation allergy ;-)

Not @kugel-'s, mine :-). To be specific, I really think this is implementation detail that users shouldn't worry about.

we do need to note that this happens to avoid regular issues being raised for the list being out of alphabetical order, but I agree we can skip explaining the algorithm.

We already did it in the previous PR: a91a9c6

@elextr
Copy link
Member

elextr commented Sep 15, 2022

We already did it in the previous PR:

I have mentioned my poor, erm, thingy, umm, oh yeah, memory!!

So fine, nothing to do.

@techee
Copy link
Member Author

techee commented Sep 15, 2022

Regarding header ambiguity, why not keep it simple and accept /all/ headers with matching names to sort their tags earlier?

@kugel- Done in the last commit.

TMSourceFile *hdr = tm_files->pdata[j];
gchar **hdr_comps = g_strsplit(hdr->short_name, ".", 2);

if (g_strcmp0(hdr_comps[0], src_comps[0]) == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be improved.

We're called for a foo.c and loop over all includes here. We have the file name already, and then lookup the TMSourceFile (and then for every TMSourceFile we check if the file name is foo.h).

Instead, we can already check for foo.h earlier (when looking up the TMSourceFile). And then we can return a copy (or ref?) of the GPtrArray as header_candidates instead of creating a new list. Something like this:

GPtrArray *tm_files = g_hash_table_lookup(theWorkspace->source_file_map, hdr_name);
if (g_strcmp0(hdr_name, src_comps[0]))

Perhaps it would make sense to separate this part out of get_includes() and make a distinct g_hash_table_lookup() for the expected header name? Hm but then it wouldn't depend on foo.c actually including foo.h (could also be seen as a good thing, not sure)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I understand completely what you suggest but it's true we can return the GPtrArray directly and avoid creation of the list - I just made a new commit doing this. Is this what you had in mind?

Note that the inner loop should still be there as we should also use all the includes of the same name similarly to the header file.

Separating extraction of header candidates to a separate function would be possible but it will probably lead to a lot of duplicated code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, g_ptr_array_find() was introduced in glib 2.54 which we don't have. Wouldn't be hard to implemented, it's jut an extra complication and maybe using the list is a simpler solution then (despite some extra allocation which I think won't matter much in this case).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean you can return the GPtrArray right after g_hash_table_lookup(theWorkspace->source_file_map, hdr_name), before the inner loop. The GPtrArray has all headers with the same basename already and the inner loop only compares the basename once more.

The inner loop still adds to the includes hash table but it doesn't need to deal with the header candidates.

Ah, g_ptr_array_find() was introduced in glib 2.54 which we don't have
All files in the GPtrArray have the same base name, so you just need to compare the basename against ->pdata[0]

FWIW, we will probably bump GTK dependency to 3.24 soon and then we also get glib 2.54+ (which is a bit older isn't it?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inner loop still adds to the includes hash table but it doesn't need to deal with the header candidates.

Yeah, got it now, fixed in the last commit.

FWIW, we will probably bump GTK dependency to 3.24 soon and then we also get glib 2.54+ (which is a bit older isn't it?).

That would fix the problem. 2.54 is from 2017.

{
guint j;

for (j = 0; j < tm_files->len; j++)
if (!*header_candidates)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't any caller that passes non-NULL, right? I would define get_includes() such that it always writes to *header_candidates (unless there are none).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check

if (!*header_candidates)

is to avoid doing the comparisons below if we already set header_candidates in the previous iteration. Also notice *header_candidates = NULL; at the beginning of the function.

What you describe would be

if (!header_candidates)

(i.e. without the dereference)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I didn't notice the outer loop. So I guess the idea is to avoid multiple assignments to *header_candidates in case a file is included multiple times (maybe legitimately, via different directories)?

So I think this is alright now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I guess the idea is to avoid multiple assignments to *header_candidates in case a file is included multiple times (maybe legitimately, via different directories)?

This is just an optimization so the code inside the if block doesn't have to be executed if we already found header_candidates.

{
TMSourceFile *hdr = tm_files->pdata[j];
TMSourceFile *hdr = tm_files->pdata[0];
gchar **hdr_comps = g_strsplit(hdr->short_name, ".", 2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this only now. Doesn't that break on files with multiple dots in name? I thought we want to compare the base name without extension, i.e. up until the last dot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. Well, I could g_strdup() the string, use strrchr() to find the . and replace it with \0 which would give us the base name. Or is there some simpler solution?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would use g_path_get_basename() instead of g_strdup(). But it doesn't matter much so yea I don't know a simpler solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the intention here is to get the filename without the last extension, g_path_get_basename() does not do that. But utils_remove_ext_from_filename() does.

Note that there are other strsplit()s above that may need changing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that. g_path_get_basename() (or g_strdup()) must be combined with strrchr() and setting the last dot to NUL as per @techee suggestion. utils_remove_ext_from_filename() does that and is a good candidate (though I'm not sure if @techee wants to use Geany utility function in tagmanager code).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if @techee wants to use Geany utility function in tagmanager code

AFAIK there is no "upstream" for tagmanager, and this PR is modifying it anyway, so why not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if @techee wants to use Geany utility function in tagmanager code

AFAIK there is no "upstream" for tagmanager, and this PR is modifying it anyway, so why not.

No problem with using Geany utility functions in TM, I just think that we should do some bigger Geany-fication of TM code separately (once the TM code stabilizes and there are no bigger PRs pending - which will hopefully be soon). So for now I just used the strrchr() in the fix. g_path_get_basename() gets called previously already so we don't have to do that here.

@elextr
Copy link
Member

elextr commented Sep 21, 2022

which will hopefully be soon

@elextr gazes at sky and whistles ... 😁

Copy link
Member

@kugel- kugel- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@techee
Copy link
Member Author

techee commented Oct 31, 2022

Should I leave the PR as it is or rebase/squash some of the commits first?

@kugel-
Copy link
Member

kugel- commented Nov 2, 2022

Squashing makes sense. You could probably even reduce to the 3 initial commits as the other ones are fix-ups that popped up during review.

Enable header tag generation in ctags, introduce new tm_tag_header_t
TMTagType. Disable usage of this tag for normal Geany usage such
as symbol tree, tag generation or autocompletion. We'll use this
tag type only for sorting autocompletion symbols.
This patch extends the current sorting of the tags which looks this way:

- sort local vars first (with highest line number first),
- followed by tags from current file,
- followed by workspace tags,
- followed by global tags

by two new entries so it looks this way:

- sort local vars first (with highest line number first),
- followed by tags from current file,
- NEW: followed by tags from header,
- NEW: followed by tags from other included files,
- followed by workspace tags,
- followed by global tags

Sorting tags from included files seems to be a natural heuristic as users
explicitly included these and want their symbols to be present in the
current file. As a special case, symbols from the header of the current
source file (included file with the same base name as the current file)
are listed in front of symbols from other included files as headers
tend to be tightly coupled with corresponding source files in C/C++.
@techee
Copy link
Member Author

techee commented Nov 3, 2022

@kugel- Yes, that's what I thought, done.

So unless there are any objections, I'll merge this PR in about a week.

@kugel-
Copy link
Member

kugel- commented Nov 4, 2022

Great

@techee techee merged commit f1c3665 into geany:master Nov 11, 2022
@b4n b4n added this to the 1.39/2.0 milestone Feb 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tag-manager Tag manager related patches and bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants