This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

Design Doc: Javascript rewriting ideas

Jeff Kaufman edited this page Jan 5, 2017 · 1 revision


Jan Maessen, January 2012

We intend to parse scripts, both inline and externally included, and rewrite them in various ways, and/or use what we learn from the JavaScript to enable manipulation of HTML and CSS (for example, we might learn that certain DOM elements are not subject to rewriting, or that write(...) operations are only used to insert fresh scripts). Naturally, our ability to manipulate scripts is constrained by the fact that we cannot change page behavior, and any program analysis is necessarily limited and must be conservative.

Here are some operations we currently perform:

  • Minification. We do this without a full parser. With in-place optimization we can pretty much turn it on all the time, though we're not quite set up to do that yet.
  • Inlining and outlining: move scripts on and off the page. Inlining speeds up fetches, which is useful if those fetches would be blocking page load/render anyway. Outlining improves cacheability if scripts are shared among multiple pages or a page is reloaded often and script content changes more slowly than page content.
  • JS combining: In its simplest form, consolidate adjacent scripts, for now always into an external file. IE conditional comments and scripts in alternate languages block combining. We combine by stringifying the original scripts, then loading the strings and eval-ing them in page context. This deals with:
      • Duplicated declarations (which would otherwise get bound in a different order).
      • Exceptions within combined blocks.
  • Library recognition and redirection: When common JS libraries are used unmodified, redirect requests for local copies to a common CDN copy. This also increases the chance of useful caching due to sharing between sites. We do this for ajax.googleapis.com by default, but the set of hashes is configurable.
  • Defer Javascript: Don't load JavaScript code until OnLoad. This requires special treatment of calls like document.write so that the apparent page context works correctly. It's pretty much impossible to get this to work 100% of the time, but it speeds page load a lot on the large proportion of sites where it does work.
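
As a rough illustration of the stringify-and-eval approach to combining described in the list above, the sketch below builds the two pieces such a rewrite might emit: an external file that declares each original script as a string, and a one-line replacement for each original script element that evals its string in page context. The function name `combineScripts` and the `__combined_` variable prefix are hypothetical and are not taken from the actual implementation.

```javascript
// Sketch only: names and output shape are assumptions, not the real filter.
function combineScripts(sources, prefix = "__combined_") {
  const names = sources.map((_, i) => prefix + i);

  // One string declaration per original script. JSON.stringify yields a
  // valid JS string literal; U+2028/U+2029 are escaped as well because
  // older engines reject them inside string literals.
  const external = sources
    .map((src, i) => {
      const literal = JSON.stringify(src)
        .replace(/\u2028/g, "\\u2028")
        .replace(/\u2029/g, "\\u2029");
      return "var " + names[i] + " = " + literal + ";";
    })
    .join("\n");

  // Each original <script> is replaced by a tiny inline script that evals
  // its own string at the original position in the page.
  const inline = names.map((name) => "eval(" + name + ");");

  return { external, inline };
}

const { external, inline } = combineScripts(["var a = 1;", "var b = a + 1;"]);
console.log(external);
console.log(inline.join("\n"));
```

In this sketch each original script keeps its own eval call at its original position, so an exception in one script does not stop the ones after it, and declarations are still bound in their original order.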

Here are some operations we should consider performing:

  • Error catching and recovery. When a page contains JS errors, reload and back off optimizations automatically. We talked about this a lot, but with the property cache in place it might now be feasible. Best: handle by browser class (with, say, a partial order: all browsers; then webkit / IE / gecko / other; under webkit, Chrome / Safari / Android / other; individual browser versions under those). Can we do lattice fetches from the property cache, though? Or just use a single browser / property pair?
  • Obfuscation. Mostly local var / argument renaming; we could be more aggressive and rename top-level entities, but it's probably not worth the bother and requires understanding dynamic code loading.
  • Idiom recognition and replacement. Recognize common jQuery (and similar) idioms and replace them with faster plain JS. Lazy-load jQuery, or eliminate it if we can get rid of all uses. Should we bother doing this with Defer JS on?
  • Tag scripts for deferred loading. Less safe but easier than defer, with lower overhead on supported browsers?
  • Early fetch for deferred JS to make sure fetching paths stay busy. Consider something akin to defer JS + combine JS, where we pull down all the data we need early in the page and then execute it at onload.
  • Intercept DOM changes and lazily load previously-unreferenced CSS. Not worth it? Just load remaining CSS before executing deferred JavaScript?
  • Partial evaluation up to and including OnLoad a la Fast Previews. Hard to do compatibly, but big wins? May be a big advantage over defer JS.
  • Definition sinking: Some .js consists primarily of a series of definitions (usually function definitions). It should be easy to perform at least a limited dependency analysis for definition-only code, and then push the definitions as far down the page as possible. We can do this either with whole js files, or with interdependent regions. The latter may provide big benefits if a page includes a large js library, then calls a few of its functions in blocking contexts within the page, but most of the library code can be loaded later. One concern: if a library consists of stubs that are loaded lazily, we must ensure that re-ordering the code can't accidentally clobber previously-loaded definitions.
  • Lazy loading: some code might be transformed so that it is loaded lazily on demand. Not clear if this is a win after OnLoad, so defer JS may be good enough.
  • iframe insertion: If we can delimit the portion of a page that a script operates upon, we may be able to insert an iframe into the page and make that portion of the page load and render asynchronously, speeding up overall load time.
  • Constant propagation & Dead code removal: If we can determine some definitions are never updated, we can propagate that information to points of use. If some definitions are never referenced, we can eliminate code. Dynamic code loading may make this difficult in general, and may require us to analyze strings that are passed to eval or inserted into the document DOM. Some constant propagation and/or type propagation will likely be necessary to do anything but the simplest analysis.
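
The limited dependency analysis suggested for definition sinking could look roughly like the following sketch. It assumes a parser has already produced a map from each top-level definition to the names it references; this document does not specify such a format, and `deps`, `blockingUses`, and `partitionDefinitions` are hypothetical names. Definitions reachable from calls in blocking contexts stay early; everything else is a candidate to sink.

```javascript
// Sketch only: assumes dependency info was already extracted by a parser.
// deps: Map<string, string[]> from definition name to names it references,
//       in original source order.
// blockingUses: names called from blocking contexts within the page.
function partitionDefinitions(deps, blockingUses) {
  const early = new Set();
  const stack = [...blockingUses];

  // Transitive closure: anything reachable from a blocking use stays early.
  while (stack.length > 0) {
    const name = stack.pop();
    // Skip names already visited and names defined elsewhere (e.g. builtins).
    if (early.has(name) || !deps.has(name)) continue;
    early.add(name);
    for (const ref of deps.get(name)) stack.push(ref);
  }

  // Map preserves insertion order, so both halves keep source order.
  const order = [...deps.keys()];
  return {
    early: order.filter((n) => early.has(n)),
    deferred: order.filter((n) => !early.has(n)),
  };
}

const exampleDeps = new Map([
  ["init", ["helper"]],
  ["helper", []],
  ["rarelyUsed", ["helper"]],
]);
console.log(partitionDefinitions(exampleDeps, ["init"]));
```

This sketch ignores the lazy-stub clobbering concern; a real pass would presumably have to treat any definition that might be redefined at runtime as unmovable.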

Challenges:

  • Dynamic code evaluation using eval or insertion of script elements into the DOM. We may be able to analyze static strings and obtain some insight here, but it's officially a Hard Problem. To make matters worse, it's a common technique for dynamically loading code, and Souders' second book recommends it. We're doing it in ads starting this quarter, so we can count on it to be a common case.
  • Access of properties by computed strings (e.g. what does window[string1][string2]() call?)
  • Ordering constraints for dynamic code loading. If a page references a js library, and that library loads code lazily, we must be careful about reordering definitions: if we load part of the library early, that might cause definitions that we otherwise would have deferred to the end of page load to be created. We'll then clobber them at the end of the page.
  • The prototype-based object model, which means we can't tell what method calls refer to when we're passed some data.
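
Two of these challenges are easy to reproduce in a few lines. The sketch below uses a plain object as a stand-in for window (so it runs outside a browser) and hypothetical names throughout. It shows a call whose target is only known once two strings have been computed, and a call site whose meaning changes when a prototype is patched.

```javascript
// Stand-in for the page's global object; all names here are hypothetical.
const page = {
  widgets: {
    show() { return "widgets.show"; },
    hide() { return "widgets.hide"; },
  },
};

// Computed property access: the callee depends on runtime string values,
// so a static analysis cannot tell which function this invokes.
function callByName(root, string1, string2) {
  return root[string1][string2]();
}

// Prototype-based dispatch: the same call site can resolve to different
// functions over time, because prototypes are mutable.
function Shape() {}
Shape.prototype.area = function () { return 0; };

function describe(obj) {
  return obj.area(); // which area()? Depends on the prototype chain right now.
}

const shape = new Shape();
const before = describe(shape);
Shape.prototype.area = function () { return 42; }; // patched later by other code
const after = describe(shape);

console.log(callByName(page, "wid" + "gets", ["sh", "ow"].join("")));
console.log(before, after);
```

A conservative rewriter would have to assume that any such call may reach any function it cannot rule out.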