[m.imgur.com] page load performance #1

Open
samccone opened this Issue Mar 8, 2016 · 16 comments

@samccone
samccone commented Mar 8, 2016

🏇 m.imgur.com page load perf 🏇


**TL;DR**

While server-side rendering is critical to get a fast first meaningful paint on
single page apps, JS can still undermine the user experience advantages of
server-side rendering.

By disabling JavaScript on m.imgur.com, the time from navigation → the paint of
the primary image decreases by more than 50%, from around 7 seconds to 3
seconds (on a Nexus 5 over a 3G connection).


What:

m.imgur.com recently rolled out a new version of the site, moving from a Backbone
app to a server-side rendered React app.

Per usual I was looking at the latest Nicolas Cage gifs when I noticed that the
page seemed to feel sluggish when first loaded, preventing me from being able to
scroll down to see the comments or even see the complete image for almost 2
seconds. Now I don't know about you but 2 seconds between me and Cage is simply
unacceptable.


Looking at a timeline recording of loading an
image on imgur on my phone (Nexus 5), the problem becomes quite clear.

[screenshot: DevTools timeline of the m.imgur.com page load]

Because the page is rendered serverside, the image starts to download right away, and gets progressively rendered and displayed to the user. However, the image appears to stop loading at around 3 seconds, which directly lines up with when a whole bunch of JavaScript starts to execute.

[screenshot: timeline detail showing the large block of JavaScript execution]

The fun thing about all of this JavaScript executing is that it locks up
the main thread in the browser, thus preventing the image from being painted to
the screen until the JS work is all done. As an experiment I loaded the page
with JavaScript disabled just to see the performance difference, and was not
surprised to see the page take well under half as long to finish displaying the
complete image, with zero user-facing jank.

The Why:

When we look at the first long frame and sort the JavaScript activity bottom-up,
we see that the browser is spending almost 40% of the time in two methods
called o and two anonymous functions…
[screenshot: bottom-up view of the first long frame in the profiler]

Looking closely at the logic in those methods, we see a familiar face:

[screenshot: the browserify module loading code in the profiled source]

It is the browserify module loading logic; this is the exact same problem we ran
into on tumblr.com.

The next most relevant bit of logic is all contained in the React mount logic;
there may be some slight optimizations to be had here, but it is far from
being the biggest offender.

[screenshot: React mount logic in the profile]

The "Fix":

The interesting story around imgur is that, having moved to a server-side
rendered site, you would expect the performance gains to be immense. However, the
time spent in the JavaScript boot-up phase causes a negative user experience,
because the JS execution blocks the main thread and thus blocks image
rendering. This problem, while almost undetectable on a MacBook Pro or other
modern computer, is painfully obvious when you test the site on a mobile phone
over a non-wifi internet connection. There are several ways to fix this issue,
none of which requires an overwhelming amount of work.

  1. Defer the JavaScript execution until the image onload event.
  2. Use a tool like rollup or Nolan Lawson's new rollupify
     (https://github.com/nolanlawson/rollupify/) to drop the overhead
     (40% of the JS time) of browserify.
  3. Break up any block of script that takes over 100ms. Potentially use a
     scheduling API like requestIdleCallback
     (https://twitter.com/igrigorik/status/706890158323269633).
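Fix #1 can be sketched in a few lines. This is an illustrative helper, not imgur's code; `deferBootUntilImageLoad`, `scheduleBoot`, and `bootApp` are hypothetical names:

```javascript
// Defer app bootstrap until the hero image has painted, so the JS boot
// cost no longer blocks the first meaningful paint.
function scheduleBoot(bootApp) {
  // Prefer requestIdleCallback when the browser supports it (fix #3's API).
  if (typeof requestIdleCallback === 'function') {
    requestIdleCallback(bootApp);
  } else {
    setTimeout(bootApp, 0);
  }
}

function deferBootUntilImageLoad(img, bootApp) {
  if (img.complete) {
    // Image already loaded; boot on the next idle period.
    scheduleBoot(bootApp);
  } else {
    img.addEventListener('load', function () {
      scheduleBoot(bootApp);
    });
  }
}
```

Usage would be something like `deferBootUntilImageLoad(document.querySelector('img'), startApp)`, so the image wins the race for the main thread.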
@paulirish
Contributor

For me, the illuminating item was this:

main thread availability required for painting updated image data

imgur has an <img src="…"> right in the HTML. This allows the network request to kick off right at the start. (That's awesome and a huge improvement over the m.imgur of two years ago).

Image resources are big and can easily be over 50k, thus taking some time to finish downloading.

But it turns out that the main thread must be available in order to paint any new data from an image. And since the main thread is completely swamped with a big block of JavaScript, it can't update the screen. I saw one example where the image download finished just 5ms after a 3000ms-long script block started. Had the network request finished just a tad earlier, the page would have visually loaded a whole 3 seconds earlier.
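One way to catch blocks like that 3000ms one in the wild is the browser's Long Tasks API, which reports tasks that occupy the main thread for more than 50ms. A minimal sketch (`observeLongTasks` is a hypothetical helper; the injectable `Observer` parameter exists only to make it testable):

```javascript
// Report main-thread blocks via the Long Tasks API. The Observer parameter
// defaults to the browser's PerformanceObserver; pass a stub for testing.
function observeLongTasks(onLongTask, Observer) {
  const obs = new (Observer || PerformanceObserver)(function (list) {
    // Each entry is a task that blocked the main thread for > 50ms.
    list.getEntries().forEach(function (entry) {
      onLongTask(entry.duration);
    });
  });
  obs.observe({ entryTypes: ['longtask'] });
  return obs;
}
```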

@laposheureux

Appreciate the audit - we're still iterating and expanding the rollout and addressing performance as we can (tuning post-load performance for slower devices and server rendering behavior have admittedly been higher on the priority list so far). #3 seems like a farther off possibility, but 1 and 2 seem pretty reasonable.

PS: Tony says hi.

@Timgur
Timgur commented Mar 8, 2016

Hi! Thanks Sam for this audit, it is always great to see how we can improve.

@jhnns
jhnns commented Mar 10, 2016

Now I don't know about you but 2 seconds between me and Cage is simply unacceptable.

Sounds like a reasonable motivation 👍

(Anyway, thanks for the insights)

@trodrigues

Use a tool like rollup or Nolan Lawson's new rollupify https://github.com/nolanlawson/rollupify/ to drop the overhead (40% of the JS time) of browserify

@samccone (or anyone else): what's the current best practice in this case for webpack users? Would running Closure Compiler on the code generated by webpack yield any improvements? I recently did a small experiment with this which did make the file smaller, but I didn't evaluate whether this module loading logic would be simplified.

@jhnns
jhnns commented Mar 10, 2016

Use a tool like rollup or Nolan Lawson's new rollupify

I don't think that rollup will speed up the execution dramatically. It reduces the ungzipped file size, yes, but all the code still needs to be executed. There are just fewer function calls for the module initialization, but I'd bet that this is not the main part.

what's the current best practice in this case for webpack users?

I think it's best to defer module execution as much as possible. So, inside your entry files you should not just require() or import() all modules immediately. It's better to use asynchronous module loading, like require.ensure() or System.import(), as soon as the modules are actually required (or inside requestIdleCallback).
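A framework-neutral sketch of that deferral idea: wrap the load in a thunk so the module body only executes on first use (`lazy` and the module name here are hypothetical, not part of webpack's API):

```javascript
// Turn an eager require into a lazy one: the loader function is not
// invoked (and the module body not executed) until first access.
function lazy(loader) {
  let cached;
  let loaded = false;
  return function () {
    if (!loaded) {
      cached = loader();
      loaded = true;
    }
    return cached;
  };
}

// Usage: const getComments = lazy(() => require('./comments'));
// Nothing from './comments' runs until getComments() is first called.
```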

However, I don't think that this is specific to webpack. It applies to all frontend JavaScript developers that have to deal with a large codebase.

Another possibility is to move as much work as possible into a separate web worker or service worker.
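The worker idea might look like this sketch (the worker URL and message shape are made up; the `WorkerCtor` parameter exists only so the global `Worker` can be injected or stubbed):

```javascript
// Ship heavy work to a worker thread so the main thread stays free to paint.
function offloadToWorker(workerUrl, payload, onResult, WorkerCtor) {
  const w = new (WorkerCtor || Worker)(workerUrl);
  w.onmessage = function (e) {
    onResult(e.data);
  };
  w.postMessage(payload);
  return w;
}
```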

@nolanlawson

@jhnns From the screenshots it looks like Browserify's module loader is taking up a non-trivial amount of self time (~10-13%), so a tool like Rollup should indeed be able to eliminate that (since there's no loader; all deps are just initialized in the same scope).

That said, I think you're right that @samccone's 40% figure is exaggerated, since it seems based on the total time instead of the self time. And I agree with all your other points.

@samccone

There are just fewer function calls for the module initialization but I'd bet that this is not the main part.

^ In fact the function call overhead is significant, and using a tool like rollup should drop the module-wrapping overhead cost considerably. The only reason I feel confident saying this is that, by looking at the timeline and profile, we can slice up exactly where the time was being spent.

(refer to https://docs.google.com/document/d/1E2w0UQ4RhId5cMYsDcdcNwsgL0gP_S6SDv27yi1mCEY/edit for a more detailed look at the module overhead)

As far as self time vs total time in this example, I used total because when I walked down the "boot" tree no one was doing any work; it was simply the passing of the require vars into new closures. Again, I would refer to the above doc for a closer look, but this module boot time overhead seems to be endemic to a majority of web apps out there.

Refer to @cramforce 's win that they got by switching to a tool that did topological sorting of dependencies:
https://twitter.com/cramforce/status/690640077865619456

If you would like to compare the raw output between these tools, check out
https://github.com/samccone/The-cost-of-transpiling-es2015-in-2016

tl;dr The module overhead is significant
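To make the wrapping overhead concrete, here is a toy contrast (illustrative only, sketched from memory, not imgur's actual bundle): browserify-style output routes every require through a small runtime loader with a closure per module, while rollup-style output hoists the same modules into one flat scope with no loader at all.

```javascript
// Browserify-style output: each module is a function in a map, resolved
// through a tiny runtime loader at boot (the kind of "o" function seen
// in the profile above).
const modules = {
  1: [function (require, module) { module.exports = require('./base') + 1; }, { './base': 2 }],
  2: [function (require, module) { module.exports = 41; }, {}],
};

function load(id, cache) {
  cache = cache || {};
  if (!cache[id]) {
    const mod = (cache[id] = { exports: {} });
    // Every require() is another function call plus a map lookup.
    modules[id][0](function (name) {
      return load(modules[id][1][name], cache);
    }, mod);
  }
  return cache[id].exports;
}

// Rollup-style output: the same two modules flattened into one scope.
// No loader, no per-module closures, nothing to execute at boot but the code.
const base = 41;
const main = base + 1;
```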

@jhnns
jhnns commented Mar 10, 2016

@samccone thanks for the additional information. Really interesting! 👍

I know that browserify can be optimized by removing the path resolution stuff.

Does anyone have more experience with rollup? Is it "valid" to put everything into the same scope, or does it introduce additional problems? Personally, I would feel uneasy changing the production code this way, but it might be a valid approach, just like minifying it. And what are the improvements in execution time in real-world examples? Is it worth it?

@robcolburn

Webpack/Babel have the ability to perform optimizations such as dead code elimination (aka tree shaking) or dropping strict compliance, though currently the setup is complex and the documentation is scattered.

@nolanlawson

I know that browserify can be optimized by removing the path resolution stuff.

Is there a plugin or something? Or are you just talking about bundle-collapser, which it appears Imgur is already using?

Does anyone have more experience with rollup? Is it valid...

ES6 module semantics mean it's perfectly valid to hoist everything into a single scope, and yes it can have big savings. I've written about this here and here. I've also gotten some good feedback about rollupify here. I've also used it on a small internal project and saw a more modest gain (<5% range).

@matita matita added a commit to matita/min-gh-jekyll that referenced this issue Mar 11, 2016
@matita matita Saved from perfs/audits#1 00fd9e6
@jhnns
jhnns commented Mar 11, 2016

Or are you just talking about bundle-collapser, which it appears Imgur is already using?

Yes, I'm talking about bundle-collapser. It makes sense to turn paths into IDs because this can be done at compile time.

@jhnns
jhnns commented Mar 11, 2016

@nolanlawson thanks for linking these interesting articles. I mostly agree with all your points 👍

@cramforce

Closure Compiler can also do scope collapsing for CommonJS modules. There may be some edge cases, but in general this works really well. Configuration is pretty tough, though.

Here is a sample PR that turned this on for my project AMP: https://github.com/ampproject/amphtml/pull/1218/files
This switches from Babel/Browserify to Closure Compiler for production codegen. We continue to use Browserify for development and have no plans to change that.

@derekkraan

Just read this, wondering if these ideas could have any impact:

  1. Give up the main thread at certain points in the code (for example: where browserify loads a module). Could do this with setTimeout. Then the browser would have more opportunities to paint images etc. to the page.
  2. Initialize the JS in an onload handler on the image itself. The most important content gets loaded and displayed to the user, then you init the JS and enable interaction with the site.
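Idea #1 could be sketched as a chunked task runner (a hypothetical helper, not from the thread): run queued boot work in roughly 10ms slices and yield with setTimeout between slices so the browser gets a chance to paint.

```javascript
// Run queued tasks in short slices, yielding the main thread between slices.
function runChunked(tasks, done) {
  function slice() {
    const start = Date.now();
    // Run tasks until roughly 10ms have elapsed in this slice.
    while (tasks.length && Date.now() - start < 10) {
      tasks.shift()();
    }
    if (tasks.length) {
      // Yield; the browser can paint before the next slice runs.
      setTimeout(slice, 0);
    } else if (done) {
      done();
    }
  }
  slice();
}
```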
@BTMPL
BTMPL commented Apr 7, 2016

@derekkraan your second idea is exactly what @samccone suggested as a fix (his #1 suggested solution). Using setTimeout to halt the JS execution can be done, but you would have to know exactly when and for how long to call it, and on a mobile connection that would be tricky, as it's generally of lower quality and more prone to sudden drops.
