Signed HTTP Exchanges and WebPackage #121

Open
lidel opened this issue Oct 16, 2018 · 24 comments

@lidel
Member

commented Oct 16, 2018

This issue tracks ideas, use cases and work related to Web Packaging, especially Signed HTTP Exchanges (SXGs) and Bundled HTTP Exchanges which open the door to associating an origin with content that was not explicitly retrieved from that origin by the browser.
Previous workarounds for the "origin problem" can be found in #89 and #66.

Background

Google is championing work on "Web Packaging" to solve the MITM problem (aka the "misattribution problem") of the AMP Project. Signed HTTP Exchanges (SXG) decouple the origin of the content from who distributes it. Content can be published on the web without relying on a specific server, connection, or hosting service. This is highly relevant for IPFS, which is great at distributing immutable bundles of data.

A longer overview can be found at developers.google.com: Signed HTTP Exchanges.

The Google Chrome team is working towards making this an IETF spec and has a prototype built for Chrome, with an origin trial starting in Chrome 71.

People would like to use content offline and in other situations where there isn’t a direct connection to the server where the content originates. However, it’s difficult to distribute and verify the authenticity of applications and content without a connection to the network. [..]

Previous attempts at packaging web resources [..] were motivated by speeding up the download of resources from a single server [..] This attempt is instead motivated by avoiding a connection to the origin server at all.

It is worth noting that this is still very much a proof-of-concept spec: the current version of SXGs is considered harmful by Mozilla, and the spec needs further work.

Potential IPFS Use Cases

How does this fit in with P2P distribution?
Is the future of web publishing signed+versioned bundles over IPFS?

IPFS as transport for SXG

  • Signed/Bundled Exchanges provide a means to separate the URL authority from the delivery mechanism of the document. This allows IPFS to deliver documents on behalf of a third party that signed the exchange bundle, while playing nicely with legacy PKI.
    • In simpler words: a bundle with an entire website (or parts of it) can be loaded over IPFS, and a browser supporting signed exchanges will validate the signatures and render the content with the original domain and a green lock in the location bar. Click below to watch a 1-minute demo:

      ipfs-webpackage
      Source: https://github.com/jimpick/signed-exchange-test

    • This means one could set up a DNSLink pointing at a Signed HTTP Exchange, and users of IPFS Companion would load cached websites over IPFS while keeping the "original" URLs in the location bar

      • Digression: right now Chrome "lies" to the user and displays "https://" as the protocol, which raises valid concerns. I suspect it will end up being "wpack://" or something like that.
    • An alternative way to use this would be to create a Service Worker orchestration that loads the website via an SXG snapshot fetched from IPFS, as a means of failover or a workaround in DDoS or censorship scenarios. (See initial experimentation in #121 (comment))
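
To make the DNSLink idea above a bit more concrete, here is a minimal sketch of how a client might turn a DNSLink TXT value into a gateway URL for an .sxg file. The function names and the default gateway host are assumptions made for illustration; this is not how IPFS Companion actually implements DNSLink lookups.

```typescript
// Hypothetical helper: parse the value of a DNSLink TXT record,
// e.g. "dnslink=/ipfs/<cid>/index.html.sxg".
// Returns the content path, or null if the value is not a DNSLink entry.
function parseDnslink(txtValue: string): string | null {
  const prefix = "dnslink=";
  const value = txtValue.trim();
  if (value.indexOf(prefix) !== 0) return null;
  const path = value.slice(prefix.length);
  // Only accept the /ipfs/ and /ipns/ namespaces.
  if (path.indexOf("/ipfs/") !== 0 && path.indexOf("/ipns/") !== 0) return null;
  return path;
}

// Hypothetical helper: build a URL on an HTTP gateway for a content path.
// The default gateway host is an assumption.
function gatewayUrl(contentPath: string, gateway: string = "https://ipfs.io"): string {
  // Drop trailing slashes so the result has exactly one "/" before "ipfs".
  return gateway.replace(/\/+$/, "") + contentPath;
}
```

The DNS lookup itself is out of scope here; the sketch only covers what happens once the TXT value is known.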

Archival Use Cases (WebPackage Bundles)

  • The Bundled Exchange file format could provide a standardized means of creating future-proof website snapshots

  • ? (add more ipfs-specific uses in comments below!)

Learning Materials

WebPackage 101

  1. Fixing AMP URLs with Web Packaging (20min primer on Web Packaging)

    After this talk, you will have a solid grasp on the proposed solution to AMP's URL misattribution problem, and how Cloudflare is positioned to take the necessary steps to provide this fix to existing AMP publishers with minimal setup, and no code required.

  2. Web Packaging Format Explainer

    This document describes use cases for packaging websites and explains how to use the cluster of specifications in this repository to accomplish those use cases. It serves similar role as typical "Introduction" or "Using" and other non-normative sections of specs.

  3. Use Cases and Requirements for Web Packages

    A longer, more comprehensive read. This document lists use cases for signing and/or bundling collections of web pages, and extracts a set of requirements from them.

Known Problems and Concerns

  • 2018 Q1: Mozilla's Position: "harmful" (in current form)

    Mozilla has concerns about the shift in the web security model required for handling web-packaged information. Specifically, the ability for an origin to act on behalf of another without a client ever contacting the authoritative server is worrisome, as is the removal of a guarantee of confidentiality from the web security model (the host serving the web package has access to plain text). We recognise that the use cases satisfied by web packaging are useful, and would be likely to support an approach that enabled such use cases so long as the foregoing concerns could be addressed.
    https://mozilla.github.io/standards-positions/ & mozilla/standards-positions#29 (comment)

  • 2019 Q2: Signed HTTP Exchanges Could Allow Ads to get Around DNS-based ad blocking

    https://www.reddit.com/r/pihole/comments/alwkh1/

  • Built-In Tracking in Signed Packages - WICG/webpackage#422

References

Web Packaging Primer

Additional Resources

cc @jimpick @mikeal

@jimpick


commented Oct 17, 2018

This is pretty crude, but these are the files I used in my demo:

https://github.com/jimpick/signed-exchange-test

Probably the only thing really re-usable in that is the service worker (sw-ipfs.js) which intercepts requests and loads the associated .sxg file from the ipfs.io gateway.

@mikeal

Member

commented Oct 17, 2018

I doubt I'll have time to dive into this before I go on vacation but this is awesome!

@jimpick


commented Oct 27, 2018

Instructions on how to get into the origin trial here:

https://twitter.com/kinu/status/1055825077281939456

@jimpick


commented Nov 5, 2018

@lidel

Member Author

commented Nov 5, 2018

Content servers: If you want to host SXGs created by publishers on their behalf, you can participate in the origin trial to have the SXGs processed by Chrome without requiring your users turn on a flag.
/signed-exchanges#participate_in_the_origin_trial

IIUC we could look into enabling this on our HTTP Gateway,
that way we could demo .sxg with regular Chrome without passing any special flags.

@lidel

Member Author

commented Nov 14, 2018

A new talk just landed:
🎥 From Low Friction to Zero Friction with Web Packaging and Portals (Chrome Dev Summit 2018)

The focus is on UX, but I gathered highlights related to Web Packaging:

  • Introduction to Web Packaging: ~6:46
  • Signing exchanges for your site: ~10:49 (https://bit.ly/try-sxg)
  • How Cloudflare plans to support SXGs: ~14:05
  • Update on Bundled Exchanges with example use in an offline news reader: ~21:39
  • Roadmap: ~23:21

ps. Our Origin Trial setup is tracked in ipfs/infra#453

@lidel lidel added the Origin label Nov 15, 2018

@lidel

Member Author

commented Nov 29, 2018

PSA:
Origin Trial for Signed HTTP Exchanges is enabled for ipfs.io Gateway

This means anyone can publish an SXG on IPFS and it will load in regular Chrome 71 without any additional setup on the user's side.

Quick demo:

  1. Install Google Chrome 71
  2. Open SXG from our gateway, for example:
    https://ipfs.io/ipfs/QmVnnXjwXyEKhnrC1L7wegepUum2zN4JZUgtvA7DYtj4rG/sxg-location.sxg
  3. You will see the location bar replaced with the Origin read from the SXG!
    For the sample above it will be a localhost URL.

To create an .sxg with the Origin of your own domain, follow the steps from #creating_your_sxg.

This is just a brief update, expect a post at blog.ipfs.io with more details soon.

@jimpick


commented Nov 29, 2018

This is so exciting! I'm going to experiment a bit with this later...

@jimpick


commented Dec 6, 2018

I had a good meeting with the Google Chrome HTTP Signed Exchanges team in Tokyo today. I prepared a little demo:

https://ipfs.v6z.me/

It only works with Chrome Canary (Chrome Beta doesn't seem to work).

The top-level "bootstrap" website with the original index.html and service worker is published to IPFS (using IPNS). It is given its own SSL certificate using https://cloudflare-ipfs.com/

Then the web content is processed with gen-signedexchange to generate a set of .sxg "HTTP Signed Exchange" files, which are published to IPFS (not using IPNS). Finally, an 'ipfs-hash.txt' file with the hash of the content is written to the bootstrap site. The service worker reads that file and, for any file being fetched, generates a redirect to the matching .sxg file hosted on the public ipfs.io gateway that Protocol Labs runs (which has the correct HTTP headers for the origin trial).
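
For readers who want to see the redirect step in code, here is a rough sketch of the URL mapping described above. It is not the code from the demo repository: the function name, the ".sxg" suffix convention, and the "index.html" default for directory requests are all assumptions.

```typescript
// Hypothetical helper: map a request path on the bootstrap site to the
// matching .sxg file on a gateway.
// `ipfsHashTxt` is the raw content of ipfs-hash.txt. The ".sxg" suffix and
// the "index.html" default are assumptions about how the files were named.
function sxgRedirectUrl(
  requestPath: string,
  ipfsHashTxt: string,
  gateway: string = "https://ipfs.io"
): string {
  const hash = ipfsHashTxt.trim();
  // Ignore any query string when picking the file.
  const q = requestPath.indexOf("?");
  let path = q === -1 ? requestPath : requestPath.slice(0, q);
  // Treat directory requests as requests for index.html.
  if (path === "" || path.charAt(path.length - 1) === "/") path += "index.html";
  if (path.charAt(0) !== "/") path = "/" + path;
  return gateway + "/ipfs/" + hash + path + ".sxg";
}
```

Inside a service worker, a fetch handler could pass the result to `Response.redirect(...)` via `event.respondWith(...)`. That wiring is left out so the mapping can be read and tested on its own.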

It's a little hard to explain to somebody unfamiliar with all the parts involved. Now that the demo is actually working, I'd love to do a proper blog post for it!

Source code for the demo: https://github.com/jimpick/signed-exchange-test/tree/ipfs.v6z.me-origin-trial

(sorry, no documentation yet ... I only got it working yesterday)

@lidel

Member Author

commented Jan 15, 2019

Spec change opening up the ability to load SXGs from a locally running IPFS node: WICG/webpackage#352

@kyledrake


commented Feb 18, 2019

Heads up that there's an AMP conf coming up in Tokyo that will likely have relevant discussion https://www.ampproject.org/amp-conf/

@lidel

Member Author

commented Feb 26, 2019

PSA: the Origin Trial ends on Mar 6, 2019. I will extend our token until the trial ends.

@lidel

Member Author

commented Mar 21, 2019

Discussion about Service Worker and subresource SXG prefetching integration:

@lidel

Member Author

commented Apr 17, 2019

Cloudflare announced seamless generation of SXG for existing websites as "AMP Real URL".
The feature will be available for free.

  • https://blog.cloudflare.com/announcing-amp-real-url/

    Google’s AMP Crawler downloads the content of your website and stores it in the AMP Cache many times a day. If your site has AMP Real URL enabled Cloudflare will digitally sign the content we provide to that crawler, cryptographically proving it was generated by you. That signature is all a modern browser (currently just Chrome on Android) needs to show the correct URL in the address bar when a visitor arrives to your AMP content from Google’s search results.

This older blogpost contains details on how signed content can be announced to the crawler.

@jimpick


commented Apr 30, 2019

I tried copying an .sxg file found "in the wild" to IPFS and loading it through the gateway:

https://ipfs.io/ipfs/QmcMMFKpj4WtnfDinDh6vuTU5ViQD5ncVtRvTzXWYEyo5w/test1.sxg

In Chrome DevTools, the following error was displayed:

[Screenshot: Chrome DevTools error, 2019-04-29]

Looks like we might need to tweak the header on the gateway.

@jimpick


commented May 1, 2019

Good news, the gateway looks like it's updated and .sxg files are loading. :-)

https://ipfs.io/ipfs/QmWgYzCJuNupFeX1RLv27srqU1t7z6HJamUMeR9rm1zF2w

Edit: Actually, not yet ... I checked the headers, and they aren't updated yet. I think that's using a fallback. This stuff is confusing.

@jimpick


commented May 2, 2019

Found a video of an IETF presentation on Web Packaging from this March https://youtu.be/woLbXaX0Gf4?t=700

Interesting that it is being presented to the IETF as a peer-to-peer technology!

Also, listen to the questions to hear @ekr from Mozilla express his strong "considered harmful" position in person.

@jyasskin


commented May 2, 2019

@jimpick Yep, the original use case was around peer-to-peer content distribution in places where mobile data is very expensive or unreliable. We only later realized we could think of the AMP cache as a "peer".

The big unsolved problem for our peer-to-peer model is the way clients discover packages. Doing it naively in the client gives the cache a full view of the client's browsing history, which isn't acceptable. When the peer is on the internet (e.g. AMP), the source of the link to the resource (e.g. Google Search) can provide discovery without leaking any more information. Maybe IPNS can be a more general way to discover packages cached nearby, if its privacy properties are right?

@jimpick


commented May 2, 2019

@jyasskin You are absolutely correct in saying that naively sharing peer-to-peer will expose users' privacy. Full privacy is a tough problem to design for.

We're actively working on enhancements to IPNS and the DHT to improve performance. And there is a lot of ongoing work in libp2p for private networks and relaying. I think it would be neat if lookups could be restricted so that content is only ever retrieved via privacy-preserving mechanisms, and so that content can be shared or re-shared without danger. There's often going to be a tradeoff in privacy vs. performance.

To make things worse, many politicians, intelligence agencies, police forces and even corporate IT departments are opposed to true anonymity, so it gets into really tricky legal territory.

For non-client applications, there are many datasets which are essentially public and for which most people would prefer performance since the privacy concerns aren't too much of a problem. That's one reason we're primarily focused on package managers and performance this year.

@jimpick


commented May 3, 2019

Here's my take on the WebPackage controversy, which is fundamentally a rethink about SSL certificates and what they represent to the reader:

  • on one side, represented by Google/Chrome/AMP, the idea is that SSL certificates can be used to sign content at the source, so a reader can look in the browser address bar, see the lock, and be assured that the content really came from "The Washington Post". This seems very reasonable to me. It's pretty much the same thing that Beaker Browser is doing with the Dat protocol (not using SSL).

  • on the other side, represented by Firefox, the idea is that SSL certificates are used to sign the connection/transport path. So a reader can look at the browser address bar, see the lock, and be assured that nobody has snooped on the connection between the original source and the reader's browser. This also seems very reasonable to me.

Right now, with AMP and HTTP Signed Exchanges, it is possible for the "cached" exchanges to not be transported directly from the Washington Post. Instead they come from Google's or Cloudflare's CDN, which is not going to be spying on folks (they claim). Google is using their clout to provide what they call "privacy-preserving pre-fetch". Of course, if you are wearing a tinfoil hat, and you distrust Google, or the government, you might not think your privacy was preserved if Google's CDN is seeing all the documents being fetched.

The problem with peer-to-peer distribution and the experiments we (and others, such as @pfrazee at Beaker) are doing is that we are opening up the distribution to everybody, and the reader privacy problems get very tricky. So displaying the content with a lock saying it has come from "The Washington Post" might be true, but it's also quite possible that by retrieving the content via a peer-to-peer mechanism, there was a digital trail left, and reader privacy has been compromised ... so the "lock" displayed in the UI is misleading people to think that nobody can spy on them.

There has been much discussion about the reader privacy problem in the Dat community:

Clearly, there are ways to improve reader privacy on peer-to-peer networks. For example, access could be made using Tor. Or via an encrypted link to a place that the reader trusts. Content could be distributed via broadcast (eg. satellite) and multicast mechanisms, so there are no direct accesses. Not accessing things directly but via trusted intermediaries and privacy-preserving peer-to-peer networks could actually be a privacy improvement. Peer-to-peer distribution has clear advantages when it comes to censorship resistance.

I wonder if peer-to-peer web browsers for the distributed web need more than one UI element to display trust and privacy information?

Two cases:

  • the content is signed, but it was acquired via a pathway that is actively advertising that the user has a copy so there is zero reader privacy. This might be just fine if a user is altruistically sharing the content.
  • the content is signed, but it can be cryptographically verified that it made its journey to the reader only via privacy-preserving peer-to-peer networks that the reader has specifically expressed trust in. In this case, a member of a vulnerable population would be protected from a malicious interloper.

Is this an area that could benefit from UX research?

@lidel

Member Author

commented May 4, 2019

UX issues around "HTTPS spoofing"

To expand on @jimpick's take, I believe a contributing factor to the controversy is the UX of how SXG@v=b3 got implemented in Google Chrome. Looking from the sidelines, it may feel rushed and AMP-driven.

To be specific, Google Chrome makes SXG indistinguishable from regular HTTPS, which breaks basic assumptions around how users understand the green padlock in the location bar (aka "nobody but me and the Origin server can see the payload"). The UX of regular HTTPS is reused as-is, pretending that end-to-end HTTPS transport was used with the Origin from the location bar, which is not true.

The browser should be the user agent, and as one it should never lie or break this type of trust.

To me it feels like a UX problem. There should be a different presentation in the location bar than re-using the green padlock from HTTPS. The browser should be honest that WebPackage was used and show who was involved in rendering the page: who the Publisher is, when the package was created, who was involved in Distributing the content, etc.

Need for Demonstrating Archival Use Cases

I believe archiving is a missed opportunity to make a case for WebPackage and figure out technical details and UX in the browser without going into the politics of PKI and HTTPS spoofing.

I would love to see more happening around this use case. Browsers could add support for saving a website to a WebPackage bundle and loading it back from that bundle, while making it obvious to the user that they are looking at an archived snapshot, with all details at hand.

This would add real value to the web by empowering individuals and institutions (Internet Archive, Wikipedia) with tools to fight link rot and censorship. Imagine all Wikipedia References as reproducible snapshots of articles that could be downloaded, shared and read offline.

Worth looking at is the potential overlap with W3C's Packaged Web Publications: https://github.com/w3c/pwpub

Gateway Update: ipfs.io supports v=b3

Good news: we've updated the HTTP headers at our IPFS Gateway.
Errors from #121 (comment) should be gone; responses for ipfs.io/ipfs/**.sxg now include:

Content-Type: application/signed-exchange;v=b3
X-Content-Type-Options: nosniff

Test in Chrome 74+: index.html.sxg :)
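
As an illustration only, the header rule above could be expressed as a small function. This is a sketch, not the actual gateway configuration: the function name and the case-insensitive ".sxg" suffix match are assumptions.

```typescript
// Hypothetical helper: extra response headers for a given request path.
// Only paths ending in ".sxg" get the signed-exchange headers.
function sxgHeaders(path: string): { [name: string]: string } {
  const lower = path.toLowerCase();
  const isSxg = lower.length >= 4 && lower.lastIndexOf(".sxg") === lower.length - 4;
  if (!isSxg) return {};
  return {
    "Content-Type": "application/signed-exchange;v=b3",
    "X-Content-Type-Options": "nosniff",
  };
}
```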

@Gozala


commented May 15, 2019

To me it feels like UX problem

I think the problem is far greater than UX: users are not in control. If all the pages visited through Chrome are served through AMP, user privacy is compromised regardless of the icon in the location bar.

@jimpick


commented May 16, 2019

I found some more videos (thanks YouTube) that go over WebPackaging and Signed HTTP Exchanges in quite a bit of detail.

BlinkOn 9 (April 2018): https://www.youtube.com/watch?v=rcJ9BLymVQE

BlinkOn 10 (April 2019): https://www.youtube.com/watch?v=iTYr5qVbHdo

@lidel

Member Author

commented May 23, 2019

Mozilla published a 15-page paper which reaffirms their position: mozilla/standards-positions#29 (comment)

Concentrating on security issues is relatively easy. Coming to terms with a fundamental change to the security and content delivery model of the web is a more difficult task. This document tries to go further and explore other potentially problematic parts in the technology.
[...]
The increased exposure to security problems and the unknown effects of this on power dynamics is significant enough that we have to regard this as harmful until more information is available.

Quick takeaways:

  • The potential value of WebPackaging for offline uses (bundling web content) is recognized, but the paper does not spend much time on the value proposition there, because the current (b3) spec and use cases focus on SXG/AMP/content distribution and WebPackage Bundles are not yet implemented in Chrome. It could be a different story if bundles had shipped before SXG.
  • Origin substitution (aka HTTPS/Origin spoofing) remains the main technical problem. At one point the paper suggests an iterative approach where WebPackaged content is assigned a separate Origin
    • some browser vendors are already double-keying Origin when handling "third-party cookies"; I wonder if a similar mechanism could be (re)used here
  • Complexity. Is the value introduced by WebPackaging enough to justify it?
  • Power dynamics and unexpected consequences are big unknowns
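
To illustrate the double-keying idea from the takeaways above, here is a purely hypothetical sketch. It is not taken from the Mozilla paper or from any browser. It assumes state for packaged content would be keyed by both the publisher origin claimed in the package and the origin that actually delivered the bytes, so the same publisher content arriving via a different distributor would not share state with a direct visit.

```typescript
// Hypothetical storage key: the publisher origin claimed by the package,
// paired with the origin that actually delivered the bytes.
function storageKey(publisherOrigin: string, distributorOrigin: string): string {
  // JSON encoding keeps the two parts unambiguous.
  return JSON.stringify([publisherOrigin, distributorOrigin]);
}

// Two loads share state only when both parts of the key match.
function sharesState(keyA: string, keyB: string): boolean {
  return keyA === keyB;
}
```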