libzim creates (again) invalid title indexes #688
Comments
@mgautierfr @veloman-yunkan Could one of you please have a look at this urgently? I'm pretty concerned that we have a libzim in the field which creates broken ZIM files.
@kelson42 I will take care of it.
@veloman-yunkan Thanks
The wrong order is for the pair of articles with the following titles:
It looks like the asterisk at the beginning of the second article's title was ignored when sorting the title index. Now investigating why that happened.
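To make the ordering hypothesis concrete, here is a minimal sketch of a sortedness check on a list of titles. It assumes titles are compared byte-wise on their UTF-8 encoding, which may not match exactly what libzim or zimcheck does. The function name `find_misordered_pairs` and the sample titles are hypothetical.

```python
from typing import List, Tuple


def find_misordered_pairs(titles: List[str]) -> List[Tuple[str, str]]:
    """Return adjacent (previous, next) pairs that are out of order.

    Assumption: ordering is a plain byte-wise comparison of UTF-8 bytes.
    """
    bad = []
    for prev, nxt in zip(titles, titles[1:]):
        if prev.encode("utf-8") > nxt.encode("utf-8"):
            bad.append((prev, nxt))
    return bad


# Hypothetical titles: '*' (0x2A) sorts before 'A' (0x41), so a title
# starting with an asterisk belongs at the front of the list.
correct = ["*Read me*", "Apple", "Zebra"]
# What the list would look like if the leading asterisk were ignored.
asterisk_ignored = ["Apple", "*Read me*", "Zebra"]

print(find_misordered_pairs(correct))
print(find_misordered_pairs(asterisk_ignored))
```

Under this assumption the first list passes, while the second reports the pair around the asterisk title as misordered.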
It's quite a nasty title ^^ But unfortunately it's the original one: https://fr.ifixit.com/Tutoriel/*Read+Windows+EOL+warning*+How+to+install+the+Xbox+One+Wireless+Receiver+1713+on+Windows+7+and+Windows+8.1/102955
The previous hypothesis was wrong. The root cause of the problem has something to do with the handling of redirects.
Or else the zeros in the title index are a result of some kind of memory/storage corruption.
The value
@veloman-yunkan Ouch... I was kind of hoping this was a bug in the checking part of the algorithm :( Good luck with the next steps.
@veloman-yunkan: Are you sure it is not 1442 times?
This is probably the issue. This way we will exclude removed dirents from the title index: https://github.com/openzim/libzim/blob/master/src/writer/titleListingHandler.cpp#L83-L85
Mmm, the existing dirent is really there and it is legitimate: it is not removed and, as far as I understand, it must be in the index. The problem is rather that when we try to add a second one, the entry is not added (since it is a duplicate) but it is still present in the title index. So we have one with the appropriate index and one with 0.
I believe you can reproduce the bug very easily by adding the same entry twice. You will get the error I got in the logs, plus one entry with the appropriate index and another one with 0. We just have to remove the one with 0 from the index, don't we?
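The scenario described above can be illustrated with a small toy model. This is not libzim code: `Dirent`, `ToyCreator`, `demo` and the sample paths are hypothetical, and the model simply assumes that a rejected duplicate is still handed to the title listing and later serialized with index 0.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Dirent:
    path: str
    title: str
    index: Optional[int] = None  # set only for dirents that are really written


class ToyCreator:
    def __init__(self) -> None:
        self.by_path: Dict[str, Dirent] = {}
        self.title_candidates: List[Dirent] = []
        self.errors = 0

    def add_entry(self, path: str, title: str) -> None:
        dirent = Dirent(path, title)
        # Assumed bug: the title listing sees the dirent before the
        # duplicate check rejects it.
        self.title_candidates.append(dirent)
        if path in self.by_path:
            self.errors += 1  # corresponds to one error line in the log
            return
        self.by_path[path] = dirent

    def finish(self, drop_rejected: bool) -> List[int]:
        # Number the dirents that were kept, in path order.
        for i, path in enumerate(sorted(self.by_path)):
            self.by_path[path].index = i
        candidates = sorted(self.title_candidates, key=lambda d: d.title)
        if drop_rejected:
            # Sketch of the fix: skip dirents that never got an index.
            candidates = [d for d in candidates if d.index is not None]
        # A dirent without an index is written as 0 in this model.
        return [d.index if d.index is not None else 0 for d in candidates]


def demo(drop_rejected: bool) -> List[int]:
    creator = ToyCreator()
    creator.add_entry("A/guide", "Guide")
    creator.add_entry("A/about", "About")
    creator.add_entry("A/guide", "Guide")  # same entry added twice
    return creator.finish(drop_rejected)


print(demo(drop_rejected=False))  # title index with a stray 0 at the end
print(demo(drop_rejected=True))   # title index without the rejected duplicate
```

In this model each rejected duplicate produces exactly one error and one extra zero in the title index, which is why comparing the count of zeros with the count of errors in the creation log is a useful sanity check.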
@veloman-yunkan I've made a PR already (#690)
@benoit74 No, the count of zeros in the title index is 1440. But it's wonderful that you could link it to the count of errors in the ZIM creation log.
We have a new ifixit.com scraper which uses python-libzim; see openzim/ifixit.
A first test ZIM file has been created with the Zimfarm and is available at https://mirror.download.kiwix.org/zim/.hidden/dev/ifixit_fr_all_2022-04.zim
But the ZIM file seems to have an invalid structure. Here is the zimcheck output:
Obviously this is a blocker!