-
-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
In jQuery mode, load the images after CSS (and javascript?) #149
Comments
Hi, I'm the person porting this excellent HTML5 app to Windows 10 UWP, targeting Windows Mobile (another minority phone OS, like Firefox OS, so I sympathize....). I am also noticing the issue that with some Wikipedia pages, from a range of ZIM files, the CSS never gets injected due to (I think) a timeout. While it's still possible to read the full text of the page in the browser (and see all images), when this happens, the app is left in an unstable state. It is possible to search for new pages to load, but attempting to load the new page doesn't actually work, and dev tools show it's not just a display issue -- the html of the new page simply fails to load over A.html. Possibly an asynchronous call is still hanging around waiting for the CSS from the previous page, but that is just a guess. Only solution when this happens is to exit the app and relaunch. It would be interesting to see whether loading the CSS first, before the images, solves the issue. Additional info: this happens on Windows Mobile, but I have not seen it happen on the non-mobile UWP version of the app. Same codebase, same underlying browser engine (Edge). Yet it doesn't seem to be a memory problem: I've seen heavy pages load the CSS quickly, and I've seen relatively light, text-only pages fail to load the CSS., but I haven't found the cause. As it happened "reliably" with some pages, I'll try to debug this. |
OK, so the issue I outlined here is nothing to do with CSS, it's actually a low memory bug. See #253 (EMSCRIPTEN buffer overrun in xzdec.js). [I did experiment with trying to load CSS before images in the code, but changing the order in the jQuery mode functions does nothing. A more convoluted method like that mentioned in OP is needed.] |
Preloads the stylesheets, creates BLOB URLs for them and inserts them into the raw HTML before the html is injected into the iframe.
@sharun-s , @mossroy , sorry about the hiatus, but I've now finished the code for preloading the stylesheets, and I've submitted it as a branch: https://github.com/kiwix/kiwix-js/tree/Preload-stylesheets . What it does is to wait for the stylesheet promises to resolve (via callback function) and then create an array of BLOB URLs for them. These are then spliced into the raw HTML before the html is injected into the iframe. The good news is that we cut down on the redraws, as the stylesheet is loaded at the same time as the HTML is loaded into the iframe, and so picture placeholders are all in the right place already just waiting to be filled. Also, this is done deliberately without jQuery for speed. Regexes are my "speciality"... The not so good news is that the bottleneck is now proven to be the amount of time it takes to retrieve stylesheets from the ZIM file via the readBinaryFile subroutine. These seem to take a lot longer than images to retrieve, and I'm suspecting that it is down to their location in the ZIM file (after all, the common stylesheets are likely to be stored only once somewhere in the ZIM file, and not necessarily close, in physical terms, to the article). I think if @sharun-s 's fast image-loading routines could be "generalized" so that they load any file at all, not just images, we might be able to speed up this bottleneck. In the meantime, I'd be really grateful if you could test (and review) my code, which I've tried to document where it isn't self-explanatory. It's only been lightly tested on a few ZIM articles in Firefox, Edge and Internet Explorer, but I've not noticed any bugs yet. |
Nice! I took a quick look. I like all the jquery removed.
Are you seeing the same kind of delay across all browsers? Or it varies? And are you testing on win 10 mobile? |
Thanks for the comments, @sharun-s .
Yes, it uses a positive lookahead to check that the
A lookahead merely asserts that the regex in parentheses can be matched at some point ahead in the string, so allows searches outside of the sequential matching order.
I've tested on Edge Mobile, and it works but yes the delay is quite noticeable, and there are memory issues generally with this app on large ZIM files with lots of images, which I still need to work on (it's not specific to this code). On Edge PC the delay is acceptable but noticeable. On Firefox, which is fast generally with this app, the delay is very acceptable and hardly noticeable. I haven't tested on Chrome. |
Interesting didn't realize how the lookahead worked. Good to know! Also good to know about the delays on Edge. How big is your ZIM file? And is it split? |
For Edge Mobile, apart from Ray Charles, I've tested only on unsplit ZIM file of 14.7Gb (Spanish Wikipedia with images) and a no-image version of the English Wikipedia which is 18.7Gb. The no-image one loads significantly faster than the image one, despite larger file size. It's the searching through the file multiple times that demands more and more memory. |
I've experimentally added some local caching of common Wikipedia CSS files in the filesystem. I discovered that these files exist in both the English language and the Spanish language Wikipedia ZIM files with images, and it seems pointless to extract them from the ZIM at great cost in terms of time and memory usage, especially on mobile, when it is just a question of four or five files. See what you think. NB I know it's not a "universal" or elegant solution, but it sure speeds up the most common use case for kiwix-js, which is displaying articles from Wikipedia... We could quite easily add a number of other use cases, such as wikitravel or Wiktionary, if we support those ZIM files. Of course, we'd need to keep the stylesheets up to date periodically.... NB the stylesheets are stored in "../-/s/*" relative to the A directory, which is the same URL as that stored in the ZIM file. This will make it easy to add further files unless there turn out to be conflicts (same file URL with different content across different Wikis). @mossroy, if you don't want to adopt such a solution in kiwix-js, which I'd understand, I would nevertheless implement it in kiwix-js-windows, because on initial testing it significantly reduces the memory footprint for articles from large Wikipedia ZIM files where the CSS seems to be cached at the other end of the file from the article in question. Of course, if we can come up with a better solution through enhanced caching while keeping the memory footprint low, I'd be all for that. Clearly, this solution needs to be tested against a range of different language ZIM files. PS Direct link to the commit that has these changes: eca226c |
@Jaifroid the issue will be switching back and forth, involving commenting/uncommenting/changing code each time the need arises. Which could turn into a headache. One option is to add a simple switch (maybe customCSS=true|false) in init.js so that the app and required modules on first load, either use "default" css processing or "custom" css processing. Lots of stuff can be configured at load time this way. Eventually all these switches/settings can be exposed to the user via the config page. |
Yes, I'd be happy to make it user configurable, especially as the same logic can be used for #265 (Provide an optional dark theme both for interface and Wiki articles) and for #34 (On mobile, adapt the layout of articles like on mobile version of online Wikipedia). @sharun-s , is it likely that your web worker code, if applied to CSS retrieval as well as to images, would speed up retrieval enough to make this kind of filesystem caching unnecessary? At the moment I'm viewing this just as an experimental submission for testing the speed differential and memory footprint, particularly on mobile. |
@Jaifroid well the webworker is only going to retrieve the CSS dirent (not the blob) and from the timing data I have seen CSS dirent retrieval doesn't take any real time to matter much whether it happens on the worker or the main thread.. |
Now implemented in: 4ff71cc . The available param values for @mossroy , this commit (4ff71cc) goes some way towards implementing #34 (adapt layout of articles like on mobile version of Wikipedia). It doesn't do it in a responsive way, but includes a param which can be modified to set the default source, one of which is "mobile". You can test this now by changing the default in init.js, but we should implement it as a user-selectable choice in the config page. PS A couple more styles probably need adding to implement the Wikipedia mobile style fully. At the moment it's just one mobile-specific stylesheet. |
@sharun-s , @mossroy , Commit 2a1ccc0 implements a user-selectable display mode (in the user interface). It allows the user to choose between locally cached stylesheets, the Wikipedia mobile display style, or using the styles embedded in the ZIM file. Please have a play in particular with the mobile display style. Here are the Configuration settings: And here is the mobile transformation applied to a Ray Charles page: I've tested in Firefox, Edge PC, Edge Mobile, Internet Explorer 11, and UWP App, against these ZIM files:
|
Nice work @Jaifroid Look forward to playing with it over the weekend when I have more time. |
@sharun-s , I have a Lumia 950XL for device testing, but I also test it using the Windows Mobile emulator in Visual Studio. Kiwix-js-windows is being developed for UWP on both Windows 10 Mobile (includes tablets) and as a Windows Store app. I found the stylesheets by capturing them as the app is accessing them and before they are turned into BLOBs (breaking programme flow at that point and examining the variables). It's empirical. There are probably a few more stylesheets, and I need to check on some other languages, though it's not necessary to have a complete set, since the app will access any that are not cached in the fs. For mobile, I've used the |
Got it. Was just curious about the process. |
@mossroy , @sharun-s , I'd be really grateful if you could try out the code in the Preload-stylesheets branch. It's quite revealing on the performance issues. I've done major work putting most of the transformations code into a separate module, The transformations between the desktop and the mobile stylesheets work both in the "standard" Wikipedia and Wikivoyage files (I've tested with Obviously this needs lots of testing, and I've done a lot of debugging already and re-organizing the code into a logical flow and tidying it into the |
@Jaifroid I'll have a look at it. |
Thanks for the report, @mossroy . Obviously I can't debug FFOS. Guesses: 1) It could be that this version uses the "oneTimeOnly" parameter to create the blobs, which FFOS might not like. I should revert that here, though will keep for kiwix-js-windows. 2) This version also uses blobs for stylesheets (I could easily revert that part to dumping the stylesheet data into the HTML but in principle blobs use less memory because they're destroyed after first use), but the same issue would affect CSS in that case (oneTimeOnly). |
@mossroy, @sharun-s : I've done some performance improvement tests based on the timings I've now put into the console log. These are on Edge and Firefox. The original spreadsheet is here. Below are screenshots. Percentage improvement from click on article directory entry to first paint are: 479% for Edge and 79.9% for Firefox. Not shown here, but also significant, is that the time to load images to completion is also much shorter in both browsers. This is confirming my theory about the problem... (will explain later, after I've done tests on UWP app). MS Edge: Firefox: |
@mossroy, @sharun-s: I have pasted the test on UWP mobiles in kiwix/kiwix-js-pwa#14 , but the headline gain is 2,288% reduction in time taken from receipt of raw HTML to first paint, a reduction from 31 seconds to 1.3 seconds! Please note that 31 seconds isn't typical: I have deliberately chosen an extremely heavy article, which is very long, has six stylesheets and dozens and dozens of images, some of them little tiny ones, others large, inside a 15.5Gb single ZIM file. This was always going to be pretty taxing for mobile. Note that time to retrieve images is dramatically reduced also (not measured here), which is counter-intuitive, but confirms some things that @sharun-s discovered with his code. Also the app is far more stable (reduced memory requirements). Why the huge improvement when CSS is just ASCII text and ought to be retrieved instantly? The problem is the asynchronous process! How so? Well, if you open a ZIM file (a small one) with 7-zip or somesuch, you can see that stylesheets are stored once at the front of the file, technically in the "-" namespace, presumably named such so that they would come before any of the other namespaces. Documentation is here: http://www.openzim.org/wiki/ZIM_file_format This is followed by the "A" namespace where all the HTML articles are. I am not sure whether images are all grouped together in the "I" namespace, or whether the namespace is just a directory device for finding binary pointers to image files grouped together with the articles that use them. Either way, clearly the stylesheets are separated by some distance in a large ZIM file from the images. Asynchronous javascript programming is meant to improve responsiveness by mimicking different threads, though in reality it just delays operations that will take a long time until "front-end" operations (visible to the user) have finished. However, what happens when you delay the extraction of stylesheets and images so they all try to happen after the first paint? You get manic to-ing and fro-ing across different parts (front and middle, from "-" namespace to "I" namespace, maybe), all competing with the browser trying to repaint as new CSS and images come in higgledy-piggledy. If xzec.js is being sent backwards and forwards across the file arbitrarily to get stylesheet A, then image Z, then pause to refresh the screen, then stylesheet B, then another repaint, then image Y, then repaint, then stylesheet C then image X, etc., you get a logjam. Separating out the extraction of stylesheets from the extraction of images and then, possibly, javascript, and not allowing one to proceed till the other is complete, unjams the logs to some extent. Logically, the best file-extraction performance would be achieved by going for one stylesheet at a time, synchronously, and ditching the promise structure for this part of the code (I haven't done that). Even so, xzdec has to jump over a large gap repeatedly to go from extracting the HTML to extracting the stylesheets to extracting the images + javascript. Caching the stylesheets cuts out a huge part of that gap jumping by xzdec, as their retrieval can be left to the browser engine on first paint. If this makes some kind of sense then the implication is that we should reorganize things like this:
@sharun-s 's idea of loading images above the fold first after first paint is excellent. Currently images seem to load from the bottom up, which is useless to humans who tend to read from top down. I hope this makes some kind of sense and apologies for length. |
"Uncontrolled" asynchronous task scheduling works well enough in a lot of cases. It's the cases where things break down that require different approaches. One approach is more control of the order. The other approach is more concurrency (i.e. workers). This is a deep rabbit hole but the more people understand it the better the solutions will be. So good to see you working through it. |
@Jaifroid : I've run some tests on your branch on a FirefoxOS device. (a little suspense...) It "simply" comes from how the src->data-kiwixsrc trick is done. If I remove the string below from the replace regexp, it works : Now I should be able to test your CSS changes on my slow and small FirefoxOS devices. |
@mossroy Thank you for testing, and I've now removed the image hiding code. I now think the reason FFOS rejected the onload statement is because there is a missing semicolon:
It's the |
I now see that you were suspecting it before I found out (17588ab#commitcomment-23145180). Sorry, I had missed that comment. |
I did some quick testing on small archives (wikivoyage_fr_all_2015-11.zim, and Ray Charles archive), comparing this branch (with CSS preload) and master, on 2 identical Firefox OS devices (ZIM files in internal memory to have the same access speed) Sorry I did not have the time to do more tests for now. It's more difficult to test with bigger archives, as I would have to put them on identical big µSD cards (they do not fit in internal memory), which I don't currently have. |
A few more tests : Now, if I use the latest commit 92a77b3, or af10e6b (and I disable strict mode), the images don't load any more on Firefox OS, the console stops at "Time to First Paint", where on desktop it carries on with "** About to request images # 1 to 10..." etc. I did not investigate why so far. In all cases, I still see CSS differences (even on desktop) between using cache and not using cache. For example, if I use wikivoyage_fr_all_2015-11.zim with Kiwix 2.1, it looks like this : |
Thank you very much, @mossroy . I was actually worried about functions being declared inside functions, but it seemed to work on all the platforms I tested, so didn't try to refactor the code to make sure all functions are declared at the top level or immediately inside another function (thought I'm not sure what "immediately" means in this context). I guess FFOS only supports a certain level of nesting or recursively nested functions in strict mode. I'll see if there's an obvious fix -- moving nested functions up a level, for example. Can I test FFOS using https://addons.mozilla.org/en-US/firefox/addon/firefox-os-simulator/ ? This is for FFOS 1.1. In the description it says "To test apps with Firefox OS 1.2+, please use WebIDE built into Firefox!" Do you know if the Web IDE in Firefox gives an accurate simulation of FFOS on a phone? Regarding stylesheet changes, I'll test against the version of wikivoyage fr that is currently on kiwix, though it is a later version (2017-03), and it looks as if wikivoyage changed quite substantially its stylesheets, just from looking at the screenshot you've posted. However, the code is supposed to fall back to getting the stylesheet from the ZIM if it can't find it, so it's probably the case that back in 2015 they were using the same stylesheet name for desktop (style.css) but with different selectors and css content. Will let you know if the issue persists with the latest ZIM. |
Briefly tested this change. Looks good. |
@Jaifroid , you indeed can use the Firefox OS simulator. The simulator uses the same web engine (and version) as on a real device. For example, I can reproduce the bugs I mentioned above in the Simulator. Note that you can debug the app too, with the same tools as in Firefox desktop. |
@mossroy I think I've squashed the FFOS "strict mode" bug with commit bd31cd7 . The problem was the If you try it, you will notice images are still squashed, but they now show and the check box for turning them off now works. I will work on unsquashing the images (I have an idea for an easy fix), but I wanted to let you know that the branch now appears to be working in FFOS. Will post again when image aspect ratio is cured, so you may want to wait till then. |
I confirm, it now works out-of-the-box with Firefox OS. |
A quick test with one device on master and another one on this branch shows a huge usability difference between them. |
That's good to know, thanks @mossroy . I haven't worked on CSS since you last tested -- that's next on my list. Will let you know when I've tested Wikivoyage France. Note that I've just committed a correction to the mime type declaration for one-by-one images (I had stupidly left it as "text/css" as a result of re-using stylesheets code). It doesn't make any visual difference, surprisingly. TODO: Fix image maps in Wikivoyage (showing as elongated); fix stylesheet transformations for Wikivoyage ZIM files; fix radio button selection on FFOS for switching between desktop / mobile CSS (currently stuck on desktop and can't be changed, FFOS issue only). |
@mossroy In the summer you gave me the instructions above (on this thread) and they worked like a charm. However, recent versions of Friefox seem to have disabled the FFOS Simulator. I have managed to re-enable the Legacy addon / extension (Firefox OS 2.2 Simulator), but it no longer shows up in the Web IDE, so no way to run the app. I'm using Firefox Developer Edition (because legacy addons cannot be enabled in the standard Firefox). Do you know how to get it working again? It was very useful for ensuring for debugging and flagging anything non-standard. |
It still works with Firefox 52 ESR (which still accepts legacy extensions, and will be supported until june 2018). You can get it from https://www.mozilla.org/en-US/firefox/organizations/ |
Brilliant, thank you! |
Image loading performance has been thoroughly addressed. |
So that the CSS stylesheet is applied as soon as possible.
Maybe with something like https://stackoverflow.com/questions/2358205/invoking-a-jquery-function-after-each-has-completed
The text was updated successfully, but these errors were encountered: