Add a method to get the cluster size of a ZIM file #83
Comments
@mgautierfr Might be interesting, as you are polishing Kiran's zimmerge/zimpatch code.
@mgautierfr The problem here is that this is not written in the ZIM. Could we add it as a new header?
No :) It would be a change (potentially incompatible) in the ZIM format.
Yes, I am asking for a new API. Considering that the header can be extended without breaking the format, a default value should just be set.
There is no way to access clusters outside of libzim, and I want it to stay that way (the same goes for dirents). Internal things must stay internal. What is your use case? How would this help you?
@mgautierfr
How could they use it?
These tools should be able to know the cluster size of a ZIM file so that they can then recreate one with the same cluster size.
If you provide the same content and compress the same way, the cluster should have the same size.
@mgautierfr How do I guarantee "the same way" if I don't know the cluster size, considering that the cluster size is a variable in the compression process?
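To illustrate why the cluster size matters here, the following is a minimal sketch and not libzim code. It assumes a hypothetical greedy policy (`pack_clusters`) that closes a cluster once it reaches a target size, and it uses Python's `lzma` module as a stand-in compressor. The same content grouped with two different target sizes produces different compressed bytes, even though both decompress to identical data.

```python
import lzma


def pack_clusters(blobs, cluster_size):
    """Greedily group blobs into clusters and compress each one separately.

    Hypothetical policy for illustration only: a cluster is closed as soon
    as its uncompressed size reaches cluster_size. libzim's real policy is
    internal and may differ.
    """
    clusters, current, current_len = [], [], 0
    for blob in blobs:
        current.append(blob)
        current_len += len(blob)
        if current_len >= cluster_size:
            clusters.append(lzma.compress(b"".join(current)))
            current, current_len = [], 0
    if current:
        # Flush the last, possibly smaller, cluster.
        clusters.append(lzma.compress(b"".join(current)))
    return clusters


# Synthetic "articles", roughly 600 bytes each.
blobs = [f"article {i}: ".encode() + b"lorem ipsum " * 50 for i in range(40)]

small = pack_clusters(blobs, 2_000)
large = pack_clusters(blobs, 8_000)

print(len(small), "clusters vs", len(large), "clusters")
print("identical compressed bytes:", b"".join(small) == b"".join(large))
```

Under these assumptions, a tool that wants a byte-identical rebuild needs both the original cluster size and the exact rule used to close clusters.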
I cannot tell you how to avoid using the cluster size if you don't tell me how you would use it. Honestly, I don't think it is possible for zimpatch to guarantee this "same way" using libzim. I have already changed how (and when) the ZIM creator decides to close a cluster, and this is totally internal. There is no way to force a specific set of articles to go into the same cluster. We cannot guarantee that articles will be written into the same clusters the same way an x-year-old implementation did.
We don't have the information about a cluster's size (except by decompressing it and seeing how much has been read).
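The "decompress it and count" approach can be sketched as follows. This is only an illustration: it treats a plain xz stream as a stand-in for a cluster, ignores any ZIM-specific framing around the compressed data, and `uncompressed_size` is a hypothetical helper, not a libzim function.

```python
import lzma


def uncompressed_size(compressed, chunk_size=4096):
    """Return the uncompressed size of an xz stream by decompressing it.

    The data is fed in chunks and only the output length is accumulated.
    """
    decomp = lzma.LZMADecompressor()
    total = 0
    for start in range(0, len(compressed), chunk_size):
        total += len(decomp.decompress(compressed[start:start + chunk_size]))
        if decomp.eof:
            # End of the stream reached; ignore any trailing bytes.
            break
    return total


payload = b"x" * 10_000 + b"abc"
print(uncompressed_size(lzma.compress(payload)))
```

The cost is a full decompression of every cluster, which is why storing the value would be cheaper than measuring it this way.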
zimdiff/zimpatch must be able to rebuild exactly the same file as the original one. To achieve this, they need to know what cluster size was used by the zimwriter. The zimlib should implement a solution to deliver this information.
Moved from https://phabricator.wikimedia.org/T54082