ZIMit 2.0 #193

Closed
kelson42 opened this issue May 26, 2023 · 7 comments

kelson42 commented May 26, 2023

Since its launch at the end of 2021, Zimit has proven to be an interesting tool for efficiently building portable/versatile offline versions of "random" websites. But what we have is only version 1.0, and it still suffers from many weaknesses.

The most important weakness is that it relies on ServiceWorkers. Because of this, we:

  • have to deal with all the HTTPS/certificate issues on Kiwix Hotspot
  • have to modify all our readers, which is almost impossible to achieve

We have worked hard to improve the situation over the last two years, but this has proven to be a really serious and challenging issue, not only for us as a software publisher but also for users, who have had a lot of trouble dealing with this kind of ZIM.

Fortunately, after studying in detail how URL rewriting works (this is what all of this is about), we have managed to build a POC of ZIMit 2.0 that provides the same level of features without using a ServiceWorker. In a nutshell, this POC:

  • Stores rewritten HTML/JS/CSS in the ZIM file (URL rewriting uses pywb's code)
  • Loads Wombat (no SW needed) to do live URL rewriting in the web browser
  • Modifies libkiwix to allow a few things, for example data-driven fuzzy URL matching
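The core idea behind the first bullet (turning absolute web URLs into paths that resolve inside the ZIM) can be sketched in Python. This is a hypothetical illustration only: pywb's actual rewriter handles much more (CSS `url()`, JS, `srcset`, inline scripts), and the `host/path` entry naming, the function names and the choice to emit relative links are assumptions, not zimit's confirmed format. Query strings and percent-encoding are ignored for brevity.

```python
from urllib.parse import urljoin, urlsplit


def url_to_entry_path(url: str) -> str:
    """Map a crawled URL to a ZIM entry path.

    Hypothetical naming scheme: "<host><path>", e.g.
    "https://kiwix.org/en/about/" -> "kiwix.org/en/about/".
    Query strings, fragments and percent-encoding are ignored in this sketch.
    """
    parts = urlsplit(url)
    return parts.netloc + (parts.path or "/")


def rewrite_link(page_url: str, link: str) -> str:
    """Rewrite a link found on page_url into a path relative to that page's entry."""
    target = urljoin(page_url, link)
    parts = urlsplit(target)
    if parts.scheme not in ("http", "https"):
        # mailto:, data:, javascript: ... links are left untouched
        return link
    # Climb from the page's "directory" back to the ZIM root, then descend to the target.
    depth = url_to_entry_path(page_url).count("/")
    rewritten = "../" * depth + url_to_entry_path(target)
    if parts.fragment:
        rewritten += "#" + parts.fragment
    return rewritten
```

Relative links like these resolve inside the ZIM whatever base URL a reader serves it under, which is why the sketch climbs back to the ZIM root with `../` instead of hard-coding a server address.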

Here is a screencast of the POC (with a local ZIM file of kiwix.org)

We should now schedule the ZIMit 2.0 project so we can release it before the end of 2023.

kelson42 added this to the 2.0.0 milestone on May 26, 2023
kelson42 added the task label on May 26, 2023
kelson42 self-assigned this on May 26, 2023

rgaudin commented May 26, 2023

Some important information for those who have been struggling with Service Workers and want some details:

  • The role of the SW is to replace a server backend. In the context of Webrecorder's ReplayWeb.page, being serverless is a prerequisite.
  • In the ZIM/Kiwix context, we already have a ZIM reader that can serve as a backend.
  • A very important constraint we set ourselves when building zimit was that we wanted to create regular ZIM files. We were not creating something else and thus shouldn't have to adapt our readers to zimit-made ZIMs.
  • We still had to adapt to SW but figured “it's standard web technology”.
  • Three years later, the SW requirement proved to be too much of a pain because it requires a Secure Context.
  • This new approach works as follows:
    • We still store WARC headers as individual ZIM entries.
    • We still store WARC payloads as individual ZIM entries, but:
      • Those are not the raw payloads from the crawler: HTML, CSS and JS entries go through pywb's rewriter first.
      • We also insert wombat-init-variables into every HTML entry.
    • We no longer include wabac.js, so we don't register a SW, don't have a UI (iframe), and don't manage missing entries (404).
    • Wombat is included directly (it used to come with wabac.js) and has the same role: rewriting JS-emitted events.
    • libkiwix (kiwix-serve or other readers) tests the fuzzy rules on requests that don't match any entry.
    • We probably want to store the fuzzy rules inside the ZIM and have libkiwix consume them. This will be available to all ZIMs.
    • libkiwix will also use the HTTP headers from the WARC headers when sending the response. This will also be available to any ZIM; it might help with some duplicate cases.
    • libkiwix will also conditionally (maybe via a ZIM private tag) rewrite the response to replace some known variables that are required for wombat to work (like $SERVER_URL = "http://172.16.16.4:8080/my-zim/";).
  • Some of these details may change in the future. Check the POC details:

This move should (it's only a proof of concept) free us from the most pressing issue we face with zimit and allow us to focus on other features. It also makes WARC more important for us and may make ZIM more attractive to WARC users.

Interested parties are encouraged to subscribe to this ticket to be notified when implementation starts. We'll then probably look for real-world use cases to test the solution against.
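A minimal Python sketch of the reader-side behavior described in the list above: trying data-driven fuzzy rules when a request matches no entry, then replacing a known variable with the reader's own URL. Everything here is hypothetical: a plain dict stands in for the ZIM archive, the rule format, the example rule and the `__ZIM_SERVER_URL__` placeholder are made up for illustration, the ZIM-private-tag condition is omitted, and libkiwix itself is written in C++ rather than Python.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class FuzzyRule:
    """Hypothetical data-driven rule: when `pattern` matches, rewrite the path with `replacement`."""
    pattern: re.Pattern
    replacement: str


# Example rule (an assumption, not a real zimit rule): drop a cache-busting "_=<digits>" parameter.
RULES = [FuzzyRule(re.compile(r"^(.*?)[?&]_=\d+$"), r"\1")]

# Hypothetical placeholder left in entries at scrape time, filled in by the reader at serve time.
SERVER_URL_PLACEHOLDER = "__ZIM_SERVER_URL__"


def resolve(path: str, entries: dict[str, str], rules: list[FuzzyRule]) -> str | None:
    """Return the entry path to serve: exact match first, then the first fuzzy rule that hits."""
    if path in entries:
        return path
    for rule in rules:
        candidate, count = rule.pattern.subn(rule.replacement, path)
        if count and candidate in entries:
            return candidate
    return None  # a real reader would answer 404 here


def serve(path: str, entries: dict[str, str], rules: list[FuzzyRule], server_url: str) -> str | None:
    """Resolve `path`, then substitute the reader's own base URL into the entry content."""
    key = resolve(path, entries, rules)
    if key is None:
        return None
    return entries[key].replace(SERVER_URL_PLACEHOLDER, server_url)
```

Keeping the rules as data (a regex plus a replacement) rather than code is what would let them be stored inside the ZIM and applied by any reader, as suggested above.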


rgaudin commented Jun 1, 2023

Found another scenario that could benefit from part of this solution.

With libkiwix reading/parsing/serving custom HTTP headers for entries, it would be possible for a ZIM reader to return an HTTP redirect for an entry to another URL (in-ZIM or not) with just a single lightweight H/ entry.

Not sure whether it's wanted, though.
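A rough Python sketch of the reader-side check for this redirect idea. It assumes, hypothetically, that the H/ entry holds a WARC response record's header block followed by the stored HTTP status line and headers, with no payload; the layout and function names are assumptions, not zimit's actual storage format or libkiwix's API.

```python
from __future__ import annotations


def parse_header_block(lines: list[str]) -> dict[str, str]:
    """Parse 'Name: value' lines into a dict keyed by lower-cased header name."""
    headers: dict[str, str] = {}
    for line in lines:
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as the "WARC/1.0" version line
            headers[name.strip().lower()] = value.strip()
    return headers


def redirect_from_h_entry(raw: str) -> tuple[int, str] | None:
    """Return (status, location) if a stored response record describes a redirect."""
    # Hypothetical layout: WARC header block, blank line, HTTP status line + headers.
    warc_part, _, http_part = raw.replace("\r\n", "\n").partition("\n\n")
    warc_headers = parse_header_block(warc_part.split("\n"))
    if warc_headers.get("warc-type") != "response":
        return None
    http_lines = http_part.strip("\n").split("\n")
    status_line = http_lines[0].split()
    if len(status_line) < 2 or not status_line[0].startswith("HTTP/") or not status_line[1].isdigit():
        return None
    status = int(status_line[1])
    headers = parse_header_block(http_lines[1:])
    if 300 <= status < 400 and "location" in headers:
        return status, headers["location"]
    return None
```

A reader finding such a result would answer with that status and a `Location` header instead of looking up a payload entry.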

@BenjaminJMueller

Hi everybody,

Any news on this, or on a date when the beta will be available?
We are waiting with bated breath for the possibility to use Zimit files in Kiwix.

All the best,
Benjamin

@kelson42

As of now, my best guess is 6 to 12 months.

@Jaifroid

See also kiwix/overview#95

@kelson42

@rgaudin @benoit74 I propose closing this issue as it is not really helpful anymore. zimit2 will be released in the next week; there is a milestone, and that is enough to track progress.

@benoit74

benoit74 commented Jun 3, 2024

Another interesting issue to link for posterity: kiwix/overview#95

6 participants