refactor: build against ZIMs from 2021 #77

Merged
merged 15 commits into from
Feb 15, 2021

lidel
Member

@lidel lidel commented Feb 7, 2021

This is a WIP PR that attempts to close #66 and #24, and to unblock #60 and #61.

Preview for Turkish Wikipedia (from snapshot wikipedia_tr_all_maxi_2021-01.zim)
(Note: this is WIP. The root CID may change and it is not pinned anywhere yet, so only linked pages will work reliably.)

TODO

We no longer need to add .html to every article,
and images and JS scripts are at different paths.

Most of the image customizations are no longer needed due to the move to WebP,
but we are keeping them to limit the surface of changed code.

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
@kelson42

@lidel I see you have updated the Dockerfile, which is really important to me. But there is no documentation about how to use the Dockerfile. I would have expected the Dockerfile to take a few arguments like:

  • Source ZIM path
  • Temporary directory
  • Storage directory
  • Pinning?
  • ....
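
One hypothetical shape for such an interface, sketched as a small Python argument parser. Nothing here comes from the repository: the program name `mirror-zim`, every flag name, and every default value are assumptions made up for illustration, and the Docker entrypoint in this PR may look nothing like it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the inputs listed above (all names are hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="mirror-zim",  # assumed name, not the project's real entrypoint
        description="Unpack a ZIM snapshot and prepare it for publishing.",
    )
    # Source ZIM path: the only input without a sensible default.
    parser.add_argument("--zim", required=True, help="path to the source ZIM file")
    # Temporary directory used while unpacking.
    parser.add_argument("--tmp-dir", default="/tmp/zim-unpack",
                        help="scratch directory for the unpacked files")
    # Storage directory that receives the final output.
    parser.add_argument("--out-dir", default="./out",
                        help="directory that receives the final build")
    # Pinning is opt-in, because the result may be very large.
    parser.add_argument("--pin", action="store_true",
                        help="pin the result after the build")
    return parser
```

With these assumed flags, an invocation might look like `mirror-zim --zim wikipedia_tr_all_maxi_2021-01.zim --pin`, leaving the temporary and storage directories at their defaults.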

@lidel
Member Author

lidel commented Feb 10, 2021

@kelson42 The Docker image is not yet usable; we need to wire up Main Page detection (openzim/zim-tools#219) first. I will add Docker inputs/outputs to the README when ready.

This requires zim-tools 2.2.0 or later

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
This puts the original ZIM file in the root directory:

- makes the unpacked version easier to audit
- mitigates the problem of links pointing at the source ZIM archive
  disappearing after 3 months (Kiwix is unable to pay for infinite
  hosting)
- enables us to start experimenting with ZIMs read from IPFS without
  unpacking them, and comparing with the unpacked version

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
Keeping it for now in case anything there is useful
for providing a search feature.

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
Base automatically changed from master to main February 14, 2021 02:13
- use native link styles
- add heartbeat

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
We copy the "kiwix main page" to /wiki/index.html.
This way the original one can still be loaded if needed,
and we can decide to manually change the /wiki/ landing
after the build is done without the need for re-running it.

Example for tr:
  /wiki/
  /wiki/index.html
    → https://tr.wikipedia.org/wiki/Kullanıcı:The_other_Kiwix_guy/Landing
  /wiki/Anasayfa
    → https://tr.wikipedia.org/wiki/Anasayfa

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
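
The landing-page copy described in the commit message above could be sketched roughly as follows. This is a minimal illustration and not the script from this PR: the function name `install_landing_page`, its parameters, and the assumption that articles are unpacked as plain files under a `wiki/` directory are all hypothetical.

```python
import shutil
from pathlib import Path


def install_landing_page(unpacked_root: Path, main_page: str) -> Path:
    """Copy the article chosen as the landing page to wiki/index.html.

    The source article is copied, not moved, so it stays reachable under
    its own path, and index.html can be replaced by hand after the build
    without re-running it.
    """
    wiki_dir = unpacked_root / "wiki"   # assumed layout of the unpacked ZIM
    source = wiki_dir / main_page       # e.g. the "kiwix main page" article
    target = wiki_dir / "index.html"
    shutil.copyfile(source, target)     # overwrites an existing index.html
    return target
```

Under these assumptions, calling `install_landing_page(Path("out"), "Kullanıcı:The_other_Kiwix_guy/Landing")` would make `/wiki/` serve the Kiwix landing page, while `/wiki/Anasayfa` keeps serving the original Wikipedia main page.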
This is a workaround to fix the issue described in:
openzim/zim-tools#224
@lidel lidel marked this pull request as ready for review February 15, 2021 14:54
@lidel
Member Author

lidel commented Feb 15, 2021

Ok, this PR produces a pretty good version of Turkish Wikipedia, so I'm going to merge this as-is.
We will tackle smaller bugs in separate issues (file them if you find any!), and will continue the Turkish topic in #60.

@lidel lidel merged commit 0739d06 into main Feb 15, 2021
@lidel lidel deleted the fix/build-2021 branch February 15, 2021 14:57
Successfully merging this pull request may close these issues:

- Switch to zimdump from zim-tools
- links should be exactly the same as wikipedia