WIP: Make available offline using Service Worker #571

Closed

sorin-davidoi
Contributor

@sorin-davidoi sorin-davidoi commented Jan 22, 2018

This PR introduces a Service Worker in order to dynamically cache the assets (only what the page requests, so respecting all the flags) and statically precache the chapters. The end result is that the resulting book would be available offline in supported browsers - Chrome (desktop and Android), Opera and Firefox (desktop and Android), for now, Edge and Safari later. As a bonus, if the book is used while offline, Google Analytics events are also cached and sent out when the user is back online.

Things intentionally left unaddressed (waiting for feedback):

  • user notification/configuration - I personally think caching should be done in the background, with a small toast notification informing the user that the book is available offline.

A small caveat is that the book is not available offline after the first-page load, but only after the second one. This is because on first load we precache the chapters, but the assets will be cached only when they are requested the second time.

Alongside supporting offline usage, this will also introduce a massive performance boost on second visits, since there is no need to hit the network for any of the assets.

The Service Worker is not installed during development (same procedure as for Google Analytics).

If you want to test this:

  • Add offline-support = true to the [output.html] section
  • Disable the check for localhost at the end of book.js

Closes #546.

@sorin-davidoi
Contributor Author

sorin-davidoi commented Jan 23, 2018

This also opens up the possibility of implementing search in a different way than #472 does it. Since all the pages are in the cache, we can implement it client-side only, though we would have to see what impact that would have on the memory and CPU usage.

The rough idea would be to run the following steps lazily (e.g. when the search field is focused, after the chapters are precached, inside a requestIdleCallback call):

  • Fetch chapters from the Service Worker cache
  • (Re)Generate the search index using Elasticlunr inside a Web Worker (off the main thread)
  • Cache the search index in localStorage or some other place

@sorin-davidoi
Contributor Author

sorin-davidoi commented Jan 23, 2018

I've looked into how much space the chapters will occupy in the cache:

  • mdBook: 329 KiB
  • The Rust Programming Language, 1st edition: 2.6 MiB
  • The Rust Programming Language, 2nd edition: 4.6 MiB

@projektir
Contributor

This is great!

The only thing I've found is the clipboard seems broken in offline mode.

@projektir
Contributor

Oh, another thing I am seeing is on this branch, every time I click on a sidebar chapter, the sidebar closes.

@Michael-F-Bryan
Contributor

Alongside supporting offline usage, this will also introduce a massive performance boost on second visits, since there is no need to hit the network for any of the assets.

I have a couple questions regarding caching and how long the service worker persists for:

  • How do we figure out when to invalidate the cache? For example, say I visited The Book last week and they've made an update since then, will I see the changed content?
  • Does the service worker/cache persist across browser sessions?

@sorin-davidoi
Contributor Author

How do we figure out when to invalidate the cache? For example, say I visited The Book last week and they've made an update since then, will I see the changed content?

When you visit again, you will receive a new sw.js, which will re-fetch the chapters that have changed. However, the first page you see will still be served from the cache; only subsequent page loads (when you navigate to a new page) will serve the new content. The common pattern for this is to show a small toast notification (e.g. "Rust by example has been updated! Refresh to see the new version."), like https://www.chromestatus.com/ is doing.
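The "re-fetch the chapters that have changed" step boils down to comparing two revision manifests. Below is a minimal sketch of that comparison, written in Rust only to match the sample elsewhere in this thread; in the PR the equivalent logic would run in JavaScript inside the service worker. The `manifest` and `stale_urls` names and the revision strings are hypothetical, not taken from the PR.

```rust
use std::collections::HashMap;

/// Build a URL -> revision map from literal pairs (helper for the example only).
fn manifest(entries: &[(&str, &str)]) -> HashMap<String, String> {
    entries
        .iter()
        .map(|(url, rev)| (url.to_string(), rev.to_string()))
        .collect()
}

/// Return the URLs that are new, or whose revision differs, in `new` compared
/// with `old`. The result is sorted so the output is stable.
fn stale_urls(old: &HashMap<String, String>, new: &HashMap<String, String>) -> Vec<String> {
    let mut stale: Vec<String> = new
        .iter()
        .filter(|(url, rev)| old.get(*url) != Some(*rev))
        .map(|(url, _)| url.clone())
        .collect();
    stale.sort();
    stale
}

fn main() {
    // Revisions cached during last week's visit (made-up values).
    let old = manifest(&[("index.html", "a1"), ("ch01.html", "b2")]);
    // Revisions listed by the freshly downloaded sw.js (made-up values).
    let new = manifest(&[("index.html", "a1"), ("ch01.html", "c3"), ("ch02.html", "d4")]);

    for url in stale_urls(&old, &new) {
        println!("re-fetch {}", url);
    }
}
```

This sketch only covers entries that must be fetched again; entries that disappear from the new manifest would also need to be evicted from the cache.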

Does the service worker/cache persist across browser sessions?

Yes.

I've just published this over at https://sorin-davidoi.github.io/mdBook/ for easier testing. Load the page, refresh and then go offline (Chrome Dev Tools -> Network tab -> Offline). You should be able to see all the content.

@Michael-F-Bryan
Contributor

I just had a look at Chrome's audit tool and it mentioned something about the "service worker does not serve the manifest's start_url". Other than that it seems to work quite well.


Oh, another thing I am seeing is on this branch, every time I click on a sidebar chapter, the sidebar closes.

@sorin-davidoi is @projektir's problem something to do with local storage not persisting the sidebar's state properly?

@sorin-davidoi
Contributor Author

I just had a look at Chrome's audit tool and it mentioned something about the "service worker does not serve the manifest's start_url"

Yes, we should also have a manifest.json. I want to add it, but not in this PR.

Oh, another thing I am seeing is on this branch, every time I click on a sidebar chapter, the sidebar closes.

Not sure, since I can't replicate it. Maybe his browser window was not wide enough?

@sorin-davidoi
Contributor Author

sorin-davidoi commented Jan 24, 2018

@Michael-F-Bryan I'm trying to refactor this to solve the following:

A small caveat is that the book is not available offline after the first-page load, but only after the second one. This is because on first load we precache the chapters, but the assets will be cached only when they are requested the second time.

This would mean that we will have a guarantee - if the Service Worker is installed, everything is cached, and we can display that to the user (something we can't do right now).

This requires inserting the assets into the Service Worker the same way we insert the chapters (with revision information). My idea is to store all the assets in Rust and inject them in the page like we do with additional_css and additional_js:

const MATHJAX_URL: &'static str = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
const CLIPBOARD_URL: &'static str = "https://cdn.jsdelivr.net/clipboard.js/1.6.1/clipboard.min.js";
const GOOGLE_ANALYTICS_URL: &'static str = "https://www.google-analytics.com/analytics.js";
const INITIAL_CSS: [&'static str; 7] = [
    "book.css",
    "https://fonts.googleapis.com/css?family=Open+Sans:300italic,400italic,600italic,700italic,800italic,400,300,600,700,800",
    "https://fonts.googleapis.com/css?family=Source+Code+Pro:500",
    "https://maxcdn.bootstrapcdn.com/font-awesome/4.3.0/css/font-awesome.min.css",
    "highlight.css",
    "tomorrow-night.css",
    "ayu-highlight.css",
];
const INITIAL_JS: [&'static str; 2] = [
    "highlight.js",
    "book.js",
];
const PLAYPEN_EDITOR_JS: [&'static str; 5] = [
    "editor.js",
    "ace.js",
    "mode-rust.js",
    "theme-dawn.js",
    "theme-tomorrow_night.js",
];

This way, we can compute their revision (checksum) when building the book. However, we can't compute the revision information of remote assets (we don't need it for files which are versioned, like FontAwesome, but we need it for fonts and the Google Analytics script).
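As a rough illustration of computing a revision at build time, the sketch below hashes an asset's bytes and pairs the result with its URL. It assumes a dependency-free FNV-1a hash as a stand-in for whatever checksum the build would really use; `PrecacheEntry` and `precache_entry` are hypothetical names, and the CSS bytes are made up.

```rust
/// 64-bit FNV-1a hash: a simple, dependency-free stand-in for a real checksum.
fn fnv1a64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf29ce484222325;
    const PRIME: u64 = 0x100000001b3;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Hypothetical precache entry: a URL plus a revision derived from the contents.
#[derive(Debug, Clone, PartialEq)]
struct PrecacheEntry {
    url: String,
    revision: String,
}

/// Build an entry whose revision changes whenever the file contents change.
fn precache_entry(url: &str, contents: &[u8]) -> PrecacheEntry {
    PrecacheEntry {
        url: url.to_string(),
        revision: format!("{:016x}", fnv1a64(contents)),
    }
}

fn main() {
    // In a real build the bytes would be read from the rendered asset on disk.
    let entry = precache_entry("book.css", b"body { margin: 0; }");
    println!("{} {}", entry.url, entry.revision);
}
```

This only works for files whose bytes are available while the book is being built, which is why the remote, unversioned assets mentioned above are the problematic case.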

What is your opinion on not relying on CDNs and serving the assets locally? Looking at #511 it seems like the main reason for using them is caching, which would be resolved with the Service Worker.

@Michael-F-Bryan
Contributor

What is your opinion on not relying on CDNs and serving the assets locally?

Yes please!

A lot of people have asked about serving all assets locally and making it possible to view when not connected to the internet. Making the entire thing self-contained is also pretty important for people who want to package mdbook or anything that depends on it (like Rust). In particular the Debian guys had a lot of trouble because there's no way to verify the JavaScript we use, and it's frowned upon for applications to reach out to the internet without the user expressly asking it to (e.g. serving locally but still fetching stuff from a CDN).

I really want to go through the HTML renderer and rewrite large chunks of the internals. Over time that section of the codebase has "grown organically" as new features get added and different people work on it, and the technical debt is starting to make things brittle and hard to maintain. A better dependency and asset management system will probably just "fall out" when we go through and restructure everything.

@sorin-davidoi changed the title from "Make available offline using Service Worker" to "WIP: Make available offline using Service Worker" on Jan 25, 2018
@sorin-davidoi
Contributor Author

sorin-davidoi commented Jan 25, 2018

@Michael-F-Bryan Perfect, how about I split this PR into three?

  • A pull request to remove the CDNs
  • A pull request to move the asset paths in Rust (as in the code sample above)
  • This PR, which should be much cleaner and easier to review once the previous two are merged and this gets refactored and rebased

@Michael-F-Bryan
Contributor

A pull request to move the asset paths in Rust (as in the code sample above)

How were you planning to move the assets? As well as the standard use case when someone doesn't have access to the internet, you may want to skim through #46 and #271 to give you a better idea for what Debian (and other organisations with similar requirements) are looking for.

In particular, @tjis mentioned:

Debian packages aren't supposed to make network connections without the user's consent. Also, packages aren't supposed to contain embedded resources that could be provided by other packages instead. Therefore, both the CDN and the fallback are problematic for Debian.

We are running into this issue while packaging rust, which uses mdBook to build the various rust books. In order to make the generated books comply with debian policy, we strip the embedded resources from mdBook and alter the templates so that the generated documentation complies.

But this is far from ideal. Not only might future mdBook changes break our present patch, it also means we have to ship a patched copy of mdBook with the rust source package, rather than using a pristine packaged version of mdBook and build-depend on that.

@sorin-davidoi
Contributor Author

So, our external dependencies are (as far as I can tell):

  • clipboard.js
  • MathJax
  • Google Analytics
  • Google Fonts
  • Font Awesome
  • highlight.js
  • Ace Editor

My first thought was to hard-code the paths, but it seems that reading them from the environment (falling back to the local assets) might be a better solution?

@Michael-F-Bryan
Contributor

My first thought was to hard-code the paths, but it seems that reading them from the environment (falling back to the local assets) might be a better solution?

I had something similar in mind. Cargo lets a build script set environment variables during the compilation process, which you can pair with the env!() macro to pass an arbitrary string from the build script to the compiled crate. The idea is we'd use something like const HIGHLIGHT_JS: &[u8] = include_bytes!(env!("HIGHLIGHT_JS")); to embed a copy of highlight.js in the binary at compile time.

This approach lets the people packaging mdbook override the file to use while compiling and installing, for example using highlight.js from the debian repositories instead of our vendored version.

Ideally all dependencies should be embedded in the mdbook executable, that way we don't need to call out to a CDN.
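A minimal sketch of the build-script side of this idea, under stated assumptions: the override is read from a `HIGHLIGHT_JS` environment variable, and the vendored fallback path `src/theme/highlight.js` is hypothetical. `cargo:rustc-env=VAR=VALUE` is the directive a build script prints to make `VAR` visible to `env!()` in the compiled crate; `resolve_asset` and `rustc_env_directive` are illustrative helper names.

```rust
/// Pick the file to embed: a packager-supplied override if present,
/// otherwise the copy vendored in the repository.
fn resolve_asset(override_path: Option<&str>, vendored: &str) -> String {
    match override_path {
        Some(p) if !p.is_empty() => p.to_string(),
        _ => vendored.to_string(),
    }
}

/// Format the directive a build script prints so that `env!("NAME")`
/// resolves to `path` when the crate itself is compiled.
fn rustc_env_directive(name: &str, path: &str) -> String {
    format!("cargo:rustc-env={}={}", name, path)
}

fn main() {
    // In a real build.rs the override would come from the packager's environment.
    let from_env = std::env::var("HIGHLIGHT_JS").ok();
    // Hypothetical vendored location; the real layout may differ.
    let path = resolve_asset(from_env.as_deref(), "src/theme/highlight.js");
    println!("{}", rustc_env_directive("HIGHLIGHT_JS", &path));
}
```

With that in place, the crate could use `include_bytes!(env!("HIGHLIGHT_JS"))` as described above. Since `include_bytes!` resolves relative paths against the source file that invokes it, an absolute path is probably the safer thing to emit.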

@sorin-davidoi
Contributor Author

Closing for now. Will probably open PRs with the issues mentioned above before taking another shot at this.

@jasonwilliams
Member

My first thought was to hard-code the paths, but it seems that reading them from the environment (falling back to the local assets) might be a better solution?

I had something similar in mind. Cargo lets a build script set environment variables during the compilation process, which you can pair with the env!() macro to pass an arbitrary string from the build script to the compiled crate. The idea is we'd use something like const HIGHLIGHT_JS: &[u8] = include_bytes!(env!("HIGHLIGHT_JS")); to embed a copy of highlight.js in the binary at compile time.

This approach lets the people packaging mdbook override the file to use while compiling and installing, for example using highlight.js from the debian repositories instead of our vendored version.

Ideally all dependencies should be embedded in the mdbook executable, that way we don't need to call out to a CDN.

@Michael-F-Bryan Does this mean that the build script pulls all those external deps from the CDNs when building? Or do we manually fetch the assets and put them in some folder?
I would like to pick up this bit, but I'm unsure about the implementation.

@jasonwilliams
Member

A small caveat is that the book is not available offline after the first-page load, but only after the second one. This is because on first load we precache the chapters, but the assets will be cached only when they are requested the second time.

@sorin-davidoi is this because you need to actually hit the (chapter) page for those assets to load and go into the cache? So you won't actually cache a chapter until you visit it?

@sorin-davidoi
Contributor Author

To be honest I don't remember 😞

@Michael-F-Bryan
Contributor

Does this mean that the build script pulls all those external deps from the CDNs when building? Or do we manually fetch the assets and put them in some folder?

I believe the current state of affairs is that we've vendored a copy of the files in the mdbook repository itself. So it's not pulled from a CDN or npm when building the crate.

@sorin-davidoi is this because you need to actually hit the (chapter) page for those assets to load and go into the cache? So you won't actually cache a chapter until you visit it?

That sounds about right. It's been a while since the original PR though, so there's a good chance the story around service workers has changed or this could be implemented differently.

@peaceshi

Nice.
I did something like this by using a CDN.
https://peaceshi.github.io/GameProgrammingPatterns/index.html

@jasonwilliams
Member

jasonwilliams commented Feb 22, 2020

There are still a couple of bugs with #1000, but I believe it’s getting there.

@jasonwilliams
Member

Help Needed!

#1000 seems to be working pretty well from what I can see so far; I just need some feedback now. The PR allows the Rust book to be read offline, and new changes will still take effect.

You can just navigate to https://jason-williams.co.uk/book/ and try it out, then leave any feedback in the issue above.
