Make geany -g tags output reproducible #1989
Without this patch, openSUSE's glfw package would always differ in /usr/share/geany/tags/glfw.c.tags because inode numbers differ between builds (which happen in disposable VMs)
|
You don't say what the actual problem is, but I presume your problem is that the order of tags changes because the order of files in the list This change will also fail to de-dup paths that are linked to the same file, but hopefully that will be rare. If we can live with that, performance wise this should be better (stated without benchmarking of course :) because it no longer stats the filesystem to make a hash key. |
|
@elextr we did list files explicitly in https://build.opensuse.org/package/view_file/openSUSE:Factory/glfw/glfw.spec line 88 (still missing a |
Geany either:
Both of these do it in the hash table list order. So, maybe to ensure reproducible output you can pre-combine the files you want in the order you want by running the C preprocessor first and then pass the result of that to Geany. If you can get reproducible output without depending on implementation details that might change to support other languages besides C, that would be good. |
|
@elextr I do not understand 'This change will also fail to de-dup paths that are linked to the same file'. Diffs looked like http://rb.zq1.de/compare.factory-20180730/glfw-compare.out
--- old//usr/share/geany/tags/glfw.c.tags 2017-02-12 12:00:00.000000000 +0000
+++ new//usr/share/geany/tags/glfw.c.tags 2017-02-12 12:00:00.000000000 +0000
@@ -525,23 +525,23 @@
DN_MULTISHOTÌ65536Ö0
DN_RENAMEÌ65536Ö0
DTTOIFÌ131072Í(dirtype)Ö0
-DT_BLKÌ4Îanon_enum_26Ö0
+DT_BLKÌ4Îanon_enum_29Ö0
DT_BLKÌ65536Ö0
-DT_CHRÌ4Îanon_enum_26Ö0
+DT_CHRÌ4Îanon_enum_29Ö0
DT_CHRÌ65536Ö0
[...]
-KHR_xlib_surfaceÌ64Î_GLFWlibrary::anon_struct_20Ö0ÏGLFWbool
+KHR_xlib_surfaceÌ64Î_GLFWlibrary::anon_struct_4Ö0ÏGLFWbool
So is there any reason against merging this change? |
Yeah, it looks like that, which seems absurd. My guess would be that the goal was to do what @elextr expects, that is, using the inode itself as a key, which indeed would have the nice property of avoiding re-computing the same file even under a different path, but as is it actually leads to never matching any entry (as However, this function seems a bit absurd. I don't see why it doesn't simply build the list directly, because traversing a GHashTable is not very fast, and IIUC it wouldn't change anything. So we could probably change it to just do that, which would be simpler and fix everyone's concerns. Also, relying on the order of traversal of a GHashTable seems fragile; I don't think it is guaranteed to be stable across GLib versions, architectures or whatnot. |
|
@b4n, no the current code works just fine, the hash key is the inode, which is an Originally a list was used but then it was de-duped, so likely that's why it got changed to a hash, but that change is copied from Anjuta, so who knows. Not sure why duplicates are a problem, TM should handle multiple definitions of the same tagname? @b4n TM-spurtese needed. @bmwiedemann did you try running |
No it's not, it's hashed as an inode, but the key is a filename, so the equal function will compare the string pointers. Well, if we used a hash table as a means of deduping and a list as a means of, well, building the list, it would not be a problem.
ahhh, ok. So a) it doesn't work now because it's wrong, and b) comparing strings won't actually work either if both paths resolve to the same inode: the hash table will see the hashes collide but the strings being different, and add it again. So using inode as the hash is just a slow way of making a hash of a file path. In fact Anjuta seems to have made a right hash of the whole thing [pun intended]. Maybe that's why it switched to sqlite for its tags later.
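For readers following along, here is a small Python model of the mismatch described above. It is not Geany's or GLib's code: `PathKey` and `count_entries` are hypothetical names, and the class only imitates a hash-table key that is hashed by the file's inode but compared by pointer identity, which is what comparing two separately allocated strings by address amounts to.

```python
import os
import tempfile


class PathKey:
    """Hypothetical key: hashed by inode, compared by object identity."""

    def __init__(self, path):
        self.path = path

    def __hash__(self):
        # Imitates "hash the inode": costs a stat() call per insert/lookup.
        return hash(os.stat(self.path).st_ino)

    def __eq__(self, other):
        # Imitates comparing string pointers: only the very same object matches.
        return self is other


def count_entries(paths):
    """Insert each path with a check-before-insert; return the table size."""
    table = {}
    for p in paths:
        key = PathKey(p)
        # The lookup never finds a freshly built key, so this is always true.
        if key not in table:
            table[key] = True
    return len(table)


with tempfile.TemporaryDirectory() as d:
    f = os.path.join(d, "a.h")
    open(f, "w").close()
    # The same path given three times: all three hashes collide, but the
    # identity comparison never matches, so nothing is de-duplicated.
    same_path_entries = count_entries([f, f, f])
```

In this model the "contains" check before insertion is harmless but useless: the table ends up holding every path it was given, in whatever order the table iterates.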
Yeah, since 2.40 |
Well, the inode gets a goodish hash, but the equal check makes it worse than a string hash and comparison on the file path yeah.
The code already checks if the hash table contains the element before inserting it, so it should be OK no matter which version is used. |
Except that it will always fail to find it due to the bug, so it gets added anyway.
Well, as I said above, everything gets added because a hash collision won't stop different paths being added, and due to the bug all paths are seen as different at the moment anyway, so duplicates so far haven't caused any "trouble" (the word from the original de-dup comment). @b4n do you know what "trouble" they might cause? So clearly the inode stuff is not useful, and we should simply use |
|
Isn't the inode only unique within the same file system? We can't assume all files are on the same fs. |
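A minimal sketch of this point, in Python rather than Geany's C: if de-duplication by file identity were wanted at all, the key would have to be the pair `(st_dev, st_ino)` rather than the inode alone. `dedup_by_identity` is a hypothetical helper, not anything in Geany; it uses a set only for membership tests and a list for the output, so the input order is preserved. Whether inode numbers are meaningful on network file systems is not addressed here.

```python
import os
import tempfile


def dedup_by_identity(paths):
    """Return paths in input order, dropping later names for the same file."""
    seen = set()
    ordered = []
    for p in paths:
        st = os.stat(p)
        identity = (st.st_dev, st.st_ino)  # inode alone is per-filesystem
        if identity not in seen:
            seen.add(identity)
            ordered.append(p)
    return ordered


with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "a.h")
    b = os.path.join(d, "b.h")
    for name in (a, b):
        with open(name, "w") as fh:
            fh.write("/* %s */\n" % os.path.basename(name))
    alias = os.path.join(d, ".", "a.h")  # different string, same file
    result = dedup_by_identity([b, a, alias, b])
    expected = [b, a]
```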
|
@kugel- true. But it doesn't matter, it's a hash not the key, they can collide, it just costs time if they do. Nothing should break. Also what do you get for "inode" for files served via Samba? But the fact that the key comparisons are wrong in the current code means it just creates a list of all the files using a g_hash in pseudo random order, then copies that list to a g_list, no de-duping, nothing, nada, useless waste of time. So unless somebody can identify the "trouble" having multiple copies of the same file will cause (apart from some space and time) then I think we should just expand the globs straight into the g_list and that's it. |
|
I agree, keep it simple, especially if it allows this issue to be resolved.
|
|
Also waiting to see if the OP had success Even if we go the |
|
yes, cat'ing files on the caller side also helps:
-geany -c geany_config -g glfw.c.tags $(find src \( ! -name CMakeFiles \) -type f \( -iname "*.c" -o -iname "*.h" \) \( ! -iname "win32*" \) \( ! -iname "cocoa*" \) | sort
-) include/GLFW/glfw3.h
+cat $(find src \( ! -name CMakeFiles \) -type f \( -iname "*.c" -o -iname "*.h" \) \( ! -iname "win32*" \) \( ! -iname "cocoa*" \) | sort) include/GLFW/glfw3.h > tags.in.h
+geany -c geany_config -g glfw.c.tags tags.in.h
but having tools themselves more deterministic is still a plus, because it avoids having to patch all callers. |
Sure, but it's a workaround until Geany is changed. Also in many cases the callers should care which order files are included: for context sensitive languages like C/C++, knowing that an identifier is a type name is important for correct parsing. So you want to control order so that the declaration of the type name is before the use. Essentially you want things to be seen in the order that the compiler will see them. So the files need to be explicitly specified in the correct order. That means Geany should keep the order the same as the command line, except that where a command line item is a glob, it is expanded to all matching items. This behaviour is deterministic so meets both requirements. |
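The behaviour proposed above can be sketched roughly as follows. This is a Python illustration, not Geany's implementation: `expand_in_order` is a hypothetical name, and sorting the matches of each glob is an added assumption, since the comment only asks that the expansion be deterministic.

```python
import glob
import os
import tempfile


def expand_in_order(args):
    """Expand arguments in command-line order; sort the matches of each glob."""
    files = []
    for arg in args:
        # glob.glob() makes no ordering promise, so sort for determinism.
        files.extend(sorted(glob.glob(arg)))
    return files


with tempfile.TemporaryDirectory() as d:
    for name in ("types.h", "b.c", "a.c"):
        open(os.path.join(d, name), "w").close()
    # The header is named first, so it stays ahead of the globbed sources.
    args = [os.path.join(d, "types.h"), os.path.join(d, "*.c")]
    expanded = [os.path.basename(p) for p in expand_in_order(args)]
```

Because the result is built directly as a list, no hash table traversal is involved and the caller keeps control over which declarations are seen first.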
|
See #1991 |
Without this patch, openSUSE's glfw package would always differ in /usr/share/geany/tags/glfw.c.tags because inode numbers differ between builds (that happen in disposable VMs)
This was previously discussed in https://bugzilla.opensuse.org/show_bug.cgi?id=1049382