Experimental proxy and wrapper for safely embedding Web Archives (.warc.gz, .wacz) into web pages.
This particular implementation uses Netlify and its Edge Functions as its backbone.
See also: warc-embed (Self-hosted + NGINX version)
warc-embed-netlify serves an HTML document containing a pre-configured instance of <replay-web-page>, webrecorder's front-end archive playback system, pointing at a proxied version of the requested archive.
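For security reasons, playback only starts when that document is embedded in a cross-origin <iframe>. This is an XSS prevention measure: the <iframe> needs both allow-scripts and allow-same-origin, a combination that is only safe when the embedded document is served from a different origin than its parent.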
See details for the /embed route.
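
The cross-origin requirement can be illustrated with a minimal sketch. This is not the project's code: `isInCrossOriginIframe` is a hypothetical helper, and it relies only on the standard browser behavior that reading the location of a cross-origin parent throws.

```javascript
// Hypothetical helper, not taken from the project's source.
// Returns true when `win` is framed by a document from another origin.
function isInCrossOriginIframe(win) {
  // Not framed at all: the window is its own top-level window.
  if (win.top === win.self) {
    return false;
  }
  try {
    // Reading the parent's location only succeeds for same-origin parents.
    void win.parent.location.href;
    return false;
  } catch (err) {
    // Browsers throw a SecurityError on cross-origin access.
    return true;
  }
}
```

In a browser, this would be called as `isInCrossOriginIframe(window)` before starting playback.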
warc-embed-netlify pulls the requested archive file and adds the HTTP headers <replay-web-page> requires in order to download and interpret the file, such as access-control-allow-origin and content-type.
It also offers a very basic polyfill for range requests, required for playing back .wacz files, if the server hosting the archive file does not support this feature.
See details for the /archive.warc.gz and /archive.wacz routes.
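
To make the idea of a range request polyfill concrete, here is a minimal sketch, assuming the proxy has already downloaded the whole archive into memory. `parseRange` and `sliceForRange` are hypothetical names, and the actual logic in archive.js may differ. The sketch handles single byte ranges only and falls back to a full response for anything else, where a stricter server would answer 416.

```javascript
// Hypothetical helper: parses a single-range header such as "bytes=0-99".
// Returns { start, end } (both inclusive), or null if absent or unsupported.
function parseRange(header, size) {
  const match = /^bytes=(\d*)-(\d*)$/.exec(header || "");
  if (!match || (match[1] === "" && match[2] === "")) {
    return null;
  }
  let start;
  let end;
  if (match[1] === "") {
    // Suffix form "bytes=-N": the last N bytes of the file.
    start = Math.max(size - Number(match[2]), 0);
    end = size - 1;
  } else {
    start = Number(match[1]);
    end = match[2] === "" ? size - 1 : Math.min(Number(match[2]), size - 1);
  }
  return start <= end && start < size ? { start, end } : null;
}

// Hypothetical polyfill step: slice a fully downloaded body (Uint8Array)
// down to the requested range and describe the matching response.
function sliceForRange(bytes, rangeHeader) {
  const range = parseRange(rangeHeader, bytes.length);
  if (!range) {
    // No usable range: serve the whole file.
    return { status: 200, headers: { "accept-ranges": "bytes" }, body: bytes };
  }
  return {
    status: 206,
    headers: {
      "accept-ranges": "bytes",
      "content-range": `bytes ${range.start}-${range.end}/${bytes.length}`,
      "content-length": String(range.end - range.start + 1),
    },
    body: bytes.subarray(range.start, range.end + 1),
  };
}
```

Buffering the whole file keeps the sketch simple, but costs memory and bandwidth on every request, which is why hosting archives on a server with native range support remains preferable.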
<!-- On https://*.domain.ext: -->
<iframe
src="https://warcembed.domain.ext/embed/?archive-url=https://otherdomain.ext/archive.warc.gz&original-url=https://what-was-archived.ext/path"
allow="allow-scripts allow-modals allow-forms allow-same-origin"
>
</iframe>

The proxy will only pull archive files from hosts listed in allowlist.js.
Edit this file to determine which domains a specific instance of the proxy can pull files from.
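
As an illustration of how such a check might work, here is a minimal sketch. It assumes the allowlist is a plain array of hostnames. The actual shape of allowlist.js is not shown here and may differ, and `isAllowedArchiveUrl` is a hypothetical name.

```javascript
// Assumed shape: the real allowlist.js may be structured differently.
const allowlist = ["otherdomain.ext"];

// Hypothetical helper: returns true if `archiveUrl` is an http(s) URL
// whose hostname exactly matches an entry in `hosts`.
function isAllowedArchiveUrl(archiveUrl, hosts) {
  let parsed;
  try {
    parsed = new URL(archiveUrl);
  } catch (err) {
    return false; // Not a valid absolute URL.
  }
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
    return false;
  }
  // Exact hostname comparison avoids look-alikes such as
  // "otherdomain.ext.attacker.example".
  return hosts.includes(parsed.hostname);
}
```

Comparing the parsed hostname, rather than searching the raw URL string, is what keeps prefixes, suffixes and userinfo tricks from slipping through.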
This project hosts its own copy of replayweb.page.
You may update it to the latest version by running ./update-replay-web-page.sh and pushing changes.
At the time of writing this README, Netlify's free plan grants 3M Netlify Edge function hits per month and per account.
See Netlify's pricing.
Attaching a subdomain to this deployment:
See Netlify's documentation on domains management.
Serves an HTML document containing an instance of <replay-web-page>, pointing at a proxied archive file.
Must be embedded in a cross-origin <iframe>, preferably on the same parent domain to avoid third-party cookie limitations:
warcembed.domain.ext: Hosts warc-embed-netlify
www.domain.ext: Has iframes pointing to warcembed.domain.ext/embed
GET, HEAD
| Name | Required ? | Description |
|---|---|---|
| archive-url | Yes | Full URL to the .warc.gz or .wacz file to embed. Must point to a host listed in the allowlist. |
| original-url | Yes | URL of the page that was archived. |
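
Because both parameters are themselves URLs, it can help to build the <iframe> source programmatically so that each value is percent-encoded. The following is a minimal sketch: `buildEmbedSrc` is a hypothetical helper and the hostnames are the placeholder ones used throughout this README.

```javascript
// Hypothetical helper: builds the /embed URL with encoded query parameters.
function buildEmbedSrc(embedHost, archiveUrl, originalUrl) {
  const src = new URL("/embed/", `https://${embedHost}`);
  // searchParams.set() percent-encodes each value, so characters such as
  // "&" or "?" inside the archived URL cannot break the query string.
  src.searchParams.set("archive-url", archiveUrl);
  src.searchParams.set("original-url", originalUrl);
  return src.toString();
}

const embedSrc = buildEmbedSrc(
  "warcembed.domain.ext",
  "https://otherdomain.ext/archive.warc.gz",
  "https://what-was-archived.ext/path?a=1&b=2"
);
```

The resulting string can be assigned to the `src` attribute of the <iframe>.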
<!-- On https://*.domain.ext: -->
<iframe
src="https://warcembed.domain.ext/embed/?archive-url=https://otherdomain.ext/archive.warc.gz&original-url=https://what-was-archived.ext/path"
allow="allow-scripts allow-modals allow-forms allow-same-origin"
>
</iframe>

Pulls a given .wacz or .warc.gz file from the URL given by ?archive-url and serves it with the headers needed for playback, including:

- access-control-allow-origin
- accept-ranges
- content-type
- content-disposition

The <replay-web-page> instance in the document generated by /embed points to this route.
Files should preferably be hosted on a server supporting range requests: archive.js tries to detect range request support and falls back to a basic polyfill if the server lacks it.
GET, HEAD
| Name | Required ? | Description |
|---|---|---|
| archive-url | Yes | Full URL to the .wacz or .warc.gz file to embed. Must point to a host listed in the allowlist. |
This project can be run locally using the Netlify CLI. No account is needed.
In your terminal:
# Install netlify-cli globally
npm install netlify-cli -g
# Start the development server (should run on port 8888 by default)
netlify dev