Skip to content

oduwsdl/Reconstructive

Repository files navigation

Reconstructive

Traditionally, web archival replay systems rewrite link and resource references in HTML/CSS/JavaScript responses so that they resolve to their corresponding archival version. Failure to do so would result in a broken rendering of archived pages (composite mementos) as the embedded resource references might resolve to their live version or an invalid location. With the growing use of JavaScript in web applications, often resources are injected dynamically, hence rewriting such references is not possible from the server side. To mitigate this issue, some JavaScript is injected in the page that overrides the global namespace to modify the DOM and monitor every network activity. We proposed a ServiceWorker-based solution to this issue that requires no server-side rewriting, but catches every network request, even those that were initiated due to dynamic resource injection.

Reconstructive is a ServiceWorker module for client-side reconstruction of composite mementos by rerouting resource requests to corresponding archived copies. This is an implementation of a published research paper. This can be used in archival replay systems such as IPWB or in the UI of memento aggregators such as MemGator.

The following figure illustrates an example where an external image reference in an archived web page would have leaked to the live-web, but due to the presence of Reconstructive, it was successfully rerouted to the corresponding archived copy instead.

Reconstructive Example

Read our introductory blog post Introducing Reconstructive - An Archival Replay ServiceWorker Module for more details.

Getting Started

Assuming that your ServiceWorker script (e.g., serviceworker.js) is already registered, add the following lines in that script.

importScripts('https://oduwsdl.github.io/Reconstructive/reconstructive.js');
const rc = new Reconstructive();
self.addEventListener('fetch', rc.reroute);

This will start monitoring every request originated from its scope and reroute them to their appropriate mementos at /memento/<datetime>/<urir> as necessary. However, the default rerouting might not work for every archival replay system. So, Reconstructive allows customization to fit to different needs.

Configuration and Customization

When the script is imported, it provides a class named Reconstructive. An instance from this class can be created with various configuration options. The instance has following public members:

  • exclusions - Object of rerouting exclusion functions.
  • reroute - Callback function to be bound on fetch event.
  • rewrite - Function to rewrite response to fix any replay issues and add an archival banner.
  • createBanner - Function to return the banner markup to the rewrite function.

Update Configurations

The constructor method of the Reconstructive class accepts an object that allows overwriting default configuration options and adding new members as necessary. Following are the default options:

{
  id: `${NAME}:${VERSION}`,
  urimPattern: `${self.location.origin}/memento/<datetime>/<urir>`,
  bannerElementLocation: `${self.location.origin}/reconstructive-banner.js`,
  bannerLogoLocation: '',
  bannerLogoHref: '/',
  showBanner: false,
  debug: false
}

To instantiate an object rc with custom configurations, initialize as following:

const rc = new Reconstructive({
  urimPattern: `${self.location.origin}/archived/<datetime>/<urir>`,
  bannerElementLocation: 'https://oduwsdl.github.io/Reconstructive/reconstructive-banner.js',
  bannerLogoLocation: 'https://oduwsdl.github.io/Reconstructive/resources/reconstructive-logo.svg',
  bannerLogoHref: `${self.location.origin}`,
  showBanner: true,
  debug: true,
  customColor: '#0C383B'
});

We have updated four existing options and added a new one, customColor, which we can use later in our custom logic.

Adding Exclusions

The exclusions property of the class is an object of functions. Each member of this object checks for certain criteria and returns a boolean to express whether or not the fetch event should be excluded from being rerouted. Following is the default exclusions object.

{
  notGet: event => event.request.method !== 'GET',
  bannerElement: event => this.showBanner && event.request.url.endsWith(this.bannerElementLocation),
  bannerLogo: event => this.showBanner && this.bannerLogoLocation && event.request.url.endsWith(this.bannerLogoLocation),
  localResource: event => !(this._regexps.urimPattern.test(event.request.url) || this._regexps.urimPattern.test(event.request.referrer))
}

Add more members to the object to add more exclusions or modify/delete existing ones.

rc.exclusions.analytics = event => event.request.url.endsWith('custom-analytics.js');

We have added a new exclusion named analytics which will return true if the requested URL ends with custom-analytics.js. This exclusion will ensure that the request will not be routed to an archived version of the file. In a practical application such exclusion rules should be kept very tight to avoid any false positives.

Custom Rerouting

Reconstructive does not register itself as a ServiceWorker, instead it is added as a module to an existing ServiceWorker for archival replay rerouting logic. Hence, it is possible to have some custom ServiceWorker logic in place while selectively calling reroute() function on some requests.

self.addEventListener('fetch', function(event) {
  if (event.requests.url.startsWith('https://example.com/api/')) {
    event.respondWith(fetch(event.request, {
      mode: 'cors'
    }));
  } else {
    rc.reroute(event);
  }
});

Custom Rewriting

Reconstructive has a rewrite() method that tries to make necessary changes in the HTML pages to fix some common replay issues and changes hyperlinks to their archival context. However, there might be times when you need some custom rewriting logic in your archival replay system. To accomplish this either override the rewrite() method of the instance or extend the Reconstructive class with an updated rewrite() method. The method is called with original response and event objects and returns a Promise that resolves to a Response object. We are illustrating the first approach below.

const customRewrite = (response, event) => {
  let customResponse = new Response();
  // Do something with the original response to create a custom response.
  return customResponse;
};
rc.rewrite = customRewrite;

Note: When overriding a method of a class instance the context of this inside the custom function could be different (use the instance name e.g., rc in place of this instead).

Custom Banner

Reconstructive has a createBanner() method that creates a banner markup using Web Components. This markup is then injected into navigational HTML pages by the rewrite() method if the showBanner configuration option is set to true. However, the default banner might not be suitable for every archival replay system. This can be updated by overriding the createBanner() method the same way as described above for the rewrite() method. Note that the banner is included by the built-in rewrite() method, which if overriden, may not include the banner unless createBanner() is called by the customRewrite() too.

const customCreateBanner = (response, event) => {
  return `<custom-replay-banner background="${rc.customColor}"></custom-replay-banner>`;
};
rc.createBanner = customCreateBanner;

As an aside, we used rc.customColor here that was an additional configuration option we supplied at the instance initialization.

How it Works?

In order to reroute requests to the URI of a potential archived copy (also known as Memento URI or URI-M) Reconstructive needs the request URL and the referrer URL, of which the latter must be a URI-M. It extracts the datetime and the original URI (or URI-R) of the referrer then combines them with the request URL as necessary to construct a potential URI-M for the request to be rerouted to. If the request URL is already a URI-M, it simply adds a custom request header X-ServiceWorker and fetches the response from the server. When necessary, the response is rewritten on the client-side to fix some quirks to make sure that the replay works as expected or to optionally add an archival banner. The following flowchart diagram shows what happens in every request/response cycle of a fetch event in Reconstructive.

Reconstructive Flowchart

Citing Project

A publication related to this project appeared in the proceedings of JCDL 2017 (Read the PDF). Please cite it as below:

Sawood Alam, Mat Kelly, Michele C. Weigle, and Michael L. Nelson. Client-side Reconstruction of Composite Mementos Using ServiceWorker. In Proceedings of the 17th ACM/IEEE-CS on Joint Conference on Digital Libraries, JCDL 2017, pp. 237-240, Toronto, Ontario, Canada, June 2017.

@inproceedings{jcdl-2017-alam-reconstructive,
  author    = {Sawood Alam and
               Mat Kelly and
               Michele Weigle and
               Michael L. Nelson},
  title     = {{Client-side Reconstruction of Composite Mementos Using ServiceWorker}},
  booktitle = {Proceedings of the 17th ACM/IEEE-CS Joint Conference on Digital Libraries},
  series    = {JCDL '17},
  year      = {2017},
  month     = {jun},
  location  = {Toronto, Ontario, Canada},
  pages     = {237--240},
  numpages  = {4},
  url       = {https://doi.org/10.1109/JCDL.2017.7991579},
  doi       = {10.1109/JCDL.2017.7991579},
  isbn      = {978-1-5386-3861-3},
  publisher = {ACM},
  address   = {New York, NY, USA}
}

Releases

No releases published

Packages

No packages published

Languages