
Avoid duplicate cluster writing #264

Open
kelson42 opened this issue Aug 10, 2019 · 2 comments

@kelson42
Contributor

commented Aug 10, 2019

Currently, clusters, which represent 99% of the total size of a ZIM file, are written to the file system twice:

  • Each cluster is written to a file first
  • Then all the clusters are concatenated into the ZIM file.

If we could do everything in one pass, this would:

  • Save disk usage
  • Save time, because file system access is slow

We might be able to do so by simply writing ZIM files on the fly. That would mean keeping the header at the beginning of the file but writing the dirents and indexes at the end. We should be able to do this without modifying the ZIM spec (probably by pre-allocating enough file system space for the variable part of the header: mimetypes, etc.).
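A minimal sketch of the single-pass idea, under loudly stated assumptions: the file layout below is a hypothetical, simplified stand-in and NOT the real ZIM header format, and `SinglePassWriter`, `FIXED_HEADER` and `RESERVED_MIME` are illustrative names, not libzim APIs. It reserves a fixed-size header area (including room for the variable-size mimetype list), streams each cluster straight into the final file, appends the pointer list at the end, and then seeks back to fill in the header.

```python
import io
import struct

# Hypothetical simplified layout (NOT the real ZIM header):
#   bytes 0..8   : offset of the cluster pointer list (little-endian uint64)
#   bytes 8..16  : number of clusters (little-endian uint64)
#   bytes 16..16+RESERVED_MIME : space reserved for the mimetype list
#   then the clusters, appended as they are produced
#   then the cluster pointer list ("index"), written last
FIXED_HEADER = 16
RESERVED_MIME = 1024  # assumed upper bound; a real writer needs a sound estimate


class SinglePassWriter:
    def __init__(self, f):
        self.f = f
        self.cluster_offsets = []
        # Reserve the header area up front; it is patched in finish().
        self.f.write(b"\0" * (FIXED_HEADER + RESERVED_MIME))

    def add_cluster(self, data: bytes) -> None:
        # Each cluster goes directly into the final file: no temporary copy.
        self.cluster_offsets.append(self.f.tell())
        self.f.write(data)

    def finish(self, mime_types) -> None:
        # Pointers are only known once all clusters are written,
        # so they are placed at the end of the file.
        index_pos = self.f.tell()
        for off in self.cluster_offsets:
            self.f.write(struct.pack("<Q", off))
        # Mimetype list: zero-terminated strings, then an empty string.
        mimes = b"".join(m.encode() + b"\0" for m in mime_types) + b"\0"
        if len(mimes) > RESERVED_MIME:
            raise ValueError("mimetype list does not fit in the reserved space")
        # Seek back and fill in the header area reserved at the start.
        self.f.seek(0)
        self.f.write(struct.pack("<QQ", index_pos, len(self.cluster_offsets)))
        self.f.write(mimes)
        self.f.seek(0, io.SEEK_END)
```

With a real file, open it with `open(path, "w+b")` and pass the file object instead of an `io.BytesIO`. The trade-off is the one mentioned above: the reserved area must be large enough for the variable part of the header, and the sketch simply fails if that guess is too small.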

@kelson42

Contributor Author

commented Aug 21, 2019

Reopening; I guess the wrong ticket number was written in the PR.

@kelson42 reopened this Aug 21, 2019
@mgautierfr

Collaborator

commented Sep 2, 2019

Yes, this issue is still open; I closed it by mistake.

The fallocate(2) system call (https://manpages.ubuntu.com/manpages/disco/en/man2/fallocate.2.html) could help here.
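A hedged sketch of how pre-allocation might fit in. Python does not expose fallocate(2) itself; it exposes `os.posix_fallocate` (Unix only, not available on every platform), which glibc implements with fallocate(2) where the file system supports it. The idea shown: reserve blocks for an estimated final size before the single write pass, so a lack of space fails immediately rather than halfway through, then truncate any unused tail. `preallocate`, `write_with_preallocation` and `expected_size` are hypothetical names; how to estimate the final size is left open.

```python
import errno
import os


def preallocate(fd: int, expected_size: int) -> bool:
    """Best-effort reservation of disk blocks for the whole output file.

    Returns False when preallocation is unavailable or unsupported.
    """
    if expected_size <= 0 or not hasattr(os, "posix_fallocate"):
        return False
    try:
        # Raises OSError (e.g. ENOSPC) right away if the space is missing.
        os.posix_fallocate(fd, 0, expected_size)
    except OSError as e:
        # Some file systems/libcs do not support it: treat as best effort.
        if e.errno in (errno.EOPNOTSUPP, errno.EINVAL):
            return False
        raise
    return True


def write_with_preallocation(path: str, chunks, expected_size: int) -> int:
    with open(path, "w+b") as f:
        preallocate(f.fileno(), expected_size)
        for chunk in chunks:
            f.write(chunk)
        f.flush()
        written = f.tell()
        # posix_fallocate extends the file size, so drop the unused tail
        # when the estimate was larger than what was actually written.
        f.truncate(written)
    return written
```

Note that, unlike fallocate(2) with `FALLOC_FL_KEEP_SIZE`, `posix_fallocate` changes the visible file size, which is why the sketch truncates at the end.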

2 participants