New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tm: Cleanup include lookup #1991

Merged
merged 3 commits into from Nov 13, 2018

Conversation

Projects
None yet
3 participants
@b4n
Copy link
Member

b4n commented Nov 10, 2018

Don't use the files inode as the hash. Although it looks like a good idea for de-duplicating links as well, it has several issues, including non-uniqueness of inodes across file systems.
The way it was done hashing the inode but comparing the file name string pointers also made the hash mostly irrelevant, as it just stored filenames sharing the same inode in the same hash bucket but without
actually doing any de-duplication, making the whole thing a convoluted way of converting to a list.

Instead, hash and compare the filenames themselves, which, even though it doesn't handle links de-duplication, is better than the non-functional previous code.

Also, directly build the list and only use the hash table as a way for checking for duplicates, which is both faster and gives a stable output.

See #1989

Show resolved Hide resolved src/tagmanager/tm_workspace.c Outdated
Show resolved Hide resolved src/tagmanager/tm_workspace.c Outdated
@bmwiedemann

This comment has been minimized.

Copy link

bmwiedemann commented Nov 10, 2018

The whole tm_file_inode_hash function can also be dropped, because it is unused

tm: Cleanup include lookup
Don't use the files inode as the hash.  Although it looks like a good
idea for de-duplicating links as well, it has several issues, including
non-uniqueness of inodes across file systems.
The way it was done hashing the inode but comparing the file name
string pointers also made the hash mostly irrelevant, as it just stored
filenames sharing the same inode in the same hash bucket but without
actually doing any de-duplication, making the whole thing a convoluted
way of converting to a list.

Instead, hash and compare the filenames themselves, which, even though
it doesn't handle links de-duplication, is better than the
non-functional previous code.

Also, directly build the list and only use the hash table as a way for
checking for duplicates, which is both faster and gives a stable
output.

@b4n b4n force-pushed the b4n:tm-lookup-includes-cleaup branch from 08a3892 to fc6a9bb Nov 12, 2018

@b4n

This comment has been minimized.

Copy link
Member Author

b4n commented Nov 12, 2018

Oops, that's what you get for making changes and committing in a hurry.
Fixed, and this time I built it before, and ran our tests.

@bmwiedemann

This comment has been minimized.

Copy link

bmwiedemann commented Nov 12, 2018

There is 6-9 lines of code duplication around the g_list_prepend calls - could be improved in a later PR.

@b4n

This comment has been minimized.

Copy link
Member Author

b4n commented Nov 12, 2018

@bmwiedemann what do you mean, between the glob and non-glob versions? I don't really mind given how the glob conditional is not trivial.

@b4n b4n force-pushed the b4n:tm-lookup-includes-cleaup branch from 998760c to 8b68c5a Nov 12, 2018

@b4n

This comment has been minimized.

Copy link
Member Author

b4n commented Nov 12, 2018

I just added an extra 2 things here:

  1. process files in the order they appear on the command line. Before, the order was stable for a given command line, but was reversed, which IMO is clearly not the expected behavior. This said, basically nobody should care.
  2. I added a test case that verifies the processing order.
@elextr

This comment has been minimized.

Copy link
Member

elextr commented Nov 12, 2018

LGBI

Is the runner.sh change really part of this, or is it a general change that possibly should be separate?

@b4n

This comment has been minimized.

Copy link
Member Author

b4n commented Nov 12, 2018

Is the runner.sh change really part of this, or is it a general change that possibly should be separate?

Well, it's not really specific to this, but it's needed for this test case because it requires parsing more than one input file at once, which the current automated setup doesn't allow. So yeah, it can be used by more test cases in theory, but in practice until now we didn't have a use case.

@elextr

This comment has been minimized.

Copy link
Member

elextr commented Nov 12, 2018

but it's needed for this test case because it requires parsing more than one input file at once

Thats fine then.

@b4n b4n merged commit 8b68c5a into geany:master Nov 13, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

b4n added a commit that referenced this pull request Nov 13, 2018

Merge pull request #1991 from b4n/tm-lookup-includes-cleaup
Process files in the order they appear on the command line when
generating tags file, instead of a more or less random order.

Closes #1989.

@b4n b4n added this to the 1.34 milestone Nov 30, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment