Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to use the webRequest API (as a fallback if ServiceWorker is unavailable) to inject the content in pages #193

Closed
mossroy opened this issue Apr 3, 2017 · 11 comments

Comments

@mossroy
Copy link
Contributor

mossroy commented Apr 3, 2017

https://developer.mozilla.org/en-US/Add-ons/WebExtensions/API/webRequest

When I discovered in #187 that ServiceWorkers were not supported in WebExtensions in Firefox, I opened https://bugzilla.mozilla.org/show_bug.cgi?id=1344561 to ask for it.
I was given a workaround with the webRequest API, which would need to be extended a bit : https://bugzilla.mozilla.org/show_bug.cgi?id=1255894

If this last feature is implemented in Firefox, it might replace the ServiceWorker. Maybe it's a third option after jQuery and ServiceWorker modes?

@mossroy mossroy added this to the v2.1 milestone Apr 3, 2017
@mossroy mossroy self-assigned this Apr 3, 2017
@mossroy mossroy changed the title Use the webRequest API instead of a ServiceWorker to inject the content in pages Try to use the webRequest API (as a fallback if ServiceWorker is unavailable) to inject the content in pages Apr 22, 2017
@mossroy mossroy modified the milestones: v2.3, v2.2 Jul 18, 2017
@mossroy
Copy link
Contributor Author

mossroy commented Sep 21, 2017

The API has been implemented by Mozilla, and will be available with Firefox 57

@mossroy
Copy link
Contributor Author

mossroy commented Dec 12, 2017

The filterResponseData function of WebRequest API has been implemented in Firefox >=57 : https://bugzilla.mozilla.org/show_bug.cgi?id=1255894

The WebRequest API itself is cross-browser BUT, for now, the filterResponseData function (which should allow us to create the response) is not supported by Chrome/Edge/Opera. See https://developer.mozilla.org/Add-ons/WebExtensions/API/webRequest and https://developer.chrome.com/extensions/webRequest

But maybe Chrome might eventually implement this function. It does not seem to be planned, but has been asked by many extension developers :
https://bugs.chromium.org/p/chromium/issues/detail?id=104058
https://bugs.chromium.org/p/chromium/issues/detail?id=487422

In any case, this API is only available for browser extensions

mossroy added a commit that referenced this issue Dec 18, 2017
And try to use it (not working at all, yet)
See #193
@Jaifroid
Copy link
Member

Do you have a sense of what the advantages of using the WebRequest API would be, if Chrome (and Edge) eventually support filterResponseData? Cleaner code or a performance advantage? Would we need fewer transformations on the intercepted data than we currently do? We currently directly intercept and transform the data coming from the decompression engine, some of it before the HTML is injected into the DOM, and some afterwards using jQuery DOM manipulation.

@mossroy
Copy link
Contributor Author

mossroy commented Dec 29, 2017

In jQuery mode, we need to transform the DOM. And there are many caveats to do that. For example, we inject the images by looking for tags. It works most of the time, but it will miss many other ways to display images : background images, CSS styles, javascript etc. It's the same thing for hyperlinks (that could be triggered by some javascript, image maps etc)
Another issue is that we have to handle manually the multi-threading, throttling, prioritizing etc.
I think using jQuery to transform the DOM is really ugly. But we had no other choice for many contexts.

The ServiceWorker or WebRequest modes should not have these issues. Because the browser traps all the HTTP requests, we can not miss any of them. And the optimization should be handled by the browser itself (which knows how to do this very well) : fetching the visible elements first for example.

I say "should" because those 2 modes are not fully working (it works mostly for ServiceWorker, but not yet for WebRequest), and it's not supported in all environments :

  • ServiceWorkers are supported by not-too-old Firefox and Chrome, and soon by Edge (https://caniuse.com/#feat=serviceworkers), but has some limitations : on Chrome, it does not work on file:// URLs; on Firefox, it does not work inside an extension
  • The WebRequest API is only supported inside an extension. And the filterResponseData (that is necessary for us) is currently only supported by Firefox

In short, with one of those 2 modes, we should need no transformation of the DOM, and it should be faster (it's already noticeable in the already-almost-working ServiceWorker mode, even if its code can be optimized).
I've started working on the WebRequest mode in the branch https://github.com/kiwix/kiwix-js/tree/first-webRequest-API-attempts , but it's still a work-in-progress. I'd like to see if it can work and if it's easy to implement.

@Jaifroid
Copy link
Member

Jaifroid commented Dec 29, 2017

OK, thanks for the helpful and interesting explanation. So it sounds like we're looking for ways to implement something like a mini web server that the decompression engine can sit behind. I wonder if Web Workers might also be used in a similar way.

@mossroy
Copy link
Contributor Author

mossroy commented Dec 29, 2017

Yes, ServiceWorkers act a bit like a transparent proxy server between the DOM and the Internet. And apparently the WebRequest API can do similar things.
WebWorkers are useful to run tasks in a separate thread and context. But they can't intercept HTTP requests by themselves.

mossroy added a commit that referenced this issue Jan 3, 2018
And try to use it (not working at all, yet)
See #193
@mossroy
Copy link
Contributor Author

mossroy commented Jan 3, 2018

I'm afraid it will not be possible to use the webRequest API for now.
I opened https://bugzilla.mozilla.org/show_bug.cgi?id=1427747 because this API can not trap the requests generated by our content (because they are on the moz-extension protocol, and their current implementation of webRequest only traps HTTP or HTTPS protocols).

@Jaifroid
Copy link
Member

Jaifroid commented Jan 3, 2018

That's a shame, @mossroy , but thanks for trying.
There are two other possibilities I can think of for improving performance (I realize these are separate issues, but just for quick comparison of options):

  • We could reduce our dependency on slow jQuery trawls through the DOM with some fast regular expressions. I have some ideas I could try out. It's no less "ugly" than doing DOM trawls, but they are very fast.
  • The main bottleneck is still decompression with xzdec, so another avenue might be looking into compiling with Emscripten one of the more recent standard ZIM-reading tools rather than relying on the old xz tools. But I imagine this would be quite a radical change, and it's quite daunting to know how to go about it, despite the Emscripten promise of being able to compile most C++ into JavaScript.

@mossroy
Copy link
Contributor Author

mossroy commented Jan 3, 2018

Yes, that's bad news.

Regarding your proposals :
Replacing some jQuery DOM modifications by some regexp on the raw HTML code : maybe. But we have to balance the speed gains with the extra complexity it adds : regular expressions are much less readable than jQuery commands, and jQuery handles all the cross-browser issues. As you also said, it's probably not the main bottleneck
Regarding xz decompression, we might backport the code of @sharun-s (that runs it in a webworker). But it's probably a bit of work. I thought #259 might do the same, but I have no news from the maintainer.

We had other ideas to improve the speed of jQuery mode : prioritizing the image loading (by backporting your code), caching the DirEntry (by backporting @sharun-s code)

Regarding emscripten, #219 might also help, and what you describe is #116 . It would certainly improve performance and compatibility, but relies on the WebAssembly API, that is now supported on most latest browsers, but not older ones. See http://caniuse.com/#feat=wasm . In any case, I really think we will need to implement #116 because maintaining the low-level backend code that reads ZIM files will probably become more and more complicated.

@Jaifroid
Copy link
Member

Jaifroid commented Jan 3, 2018

OK, thanks @mossroy . #116 looks interesting -- is it for sure dependent on Web Assembly? Maybe I should ask on that thread. What confuses me about the file access issues there is that we already have an Emscripten-compiled binary that works on very large files and that doesn't use Web Assembly, so clearly some kind of large file access is working with Emscripten compilation. Problem with Web Assembly from my perspective is that it looks like it will never be ported to Windows Mobile. Edge will get it, but not Mobile at least in its current form (MS may be merging Mobile fully into Windows 10, but that will only be for future, currently "mythical" mobile devices).

@mossroy mossroy modified the milestones: v2.3, v2.4 Jan 14, 2018
@mossroy
Copy link
Contributor Author

mossroy commented Apr 29, 2018

I close this issue, as it is not possible to use this API (https://bugzilla.mozilla.org/show_bug.cgi?id=1427747 closed as WONTFIX).
On the other hand, https://bugzilla.mozilla.org/show_bug.cgi?id=1344561 (to allow ServiceWorkers inside Firefox WebExtensions, as in Chrome) has been reopened : maybe some hope?

@mossroy mossroy closed this as completed Apr 29, 2018
@Jaifroid Jaifroid removed this from the v3.3 milestone Dec 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants