Content delivery network over dat

Robin Millette edited this page Dec 12, 2017 · 4 revisions

Moved from https://cryptpad.facil.services/code/#/1/edit/xxwE9xM6840SIbC6yR4XsA/mCJrdS2czXXNwggHkL5fQ+4O/

Idea triggered by https://twitter.com/leokewitz/status/938148036160770048

What about user/reader privacy?

See the Why Not Ship Wikipedia on Dat? article recently published by Joe Hand.

Kewitz: I totally agree with this, although the base assumption that a single source offers better reader privacy doesn't consider the fact that there are other layers already compromised. Do you trust your ISP to not sell your history? What DNS are you using? What are your sources? Do you trust your source's host?

Proof of concept

Manually create a dat with 3-4 js libs (and maybe versions of those libs) and see if people use it.

Possible asset URLs

  • dat://450...ef3a/jquery@3.2.1/dist/jquery.min.js # 450...ef3a could hold multiple assets
  • dat://450...ef3a+321/jquery.min.js
  • dat://555...ef3a+321/asset.min.js # here, 555...ef3a only holds jquery

Basically, we need to decide whether to use dat versions (+321) and how assets are spread among dats (or whether a single dat holds the whole CDN).
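
To make the two choices concrete, here is a minimal Python sketch that builds an asset URL under each layout. The function names and the placeholder keys are hypothetical; only the URL shapes come from the examples above, and how a library release would map to a dat version number is still an open question.

```python
def shared_dat_url(key: str, package: str, version: str, path: str) -> str:
    """One dat holds many assets; the package version lives in the path."""
    return f"dat://{key}/{package}@{version}/{path}"


def per_package_dat_url(key: str, dat_version: int, path: str) -> str:
    """One dat per package; the dat version (+N) selects the release."""
    return f"dat://{key}+{dat_version}/{path}"


# Placeholder keys for illustration, not real dats.
CDN_KEY = "<cdn-dat-key>"
JQUERY_KEY = "<jquery-dat-key>"

print(shared_dat_url(CDN_KEY, "jquery", "3.2.1", "dist/jquery.min.js"))
print(per_package_dat_url(JQUERY_KEY, 321, "asset.min.js"))
```

With the first layout the URL stays readable and a single key serves everything; with the second, each URL depends on knowing both the right key and the right dat version.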

We'll most probably use a hostname instead of a straight dat key.

Only available over dat (Beaker Browser and such); no http(s) support.

Example: https://twitter.com/leokewitz/status/938148036160770048 (fonts)

Source of assets

https://www.jsdelivr.com/

jsDelivr makes it easy to refer to assets from any GitHub or npm project, and more.

Versions:

Kewitz: Is there any way to access a version per file, rather than the version of the whole dat?

millette: pretty sure we can make a lightweight dat module to only store assets used most often.

dat supports versions. For example:

  • dat://4505deef3a8f1d4258e8fd1be49c68ba3efc475308ce484a7f8164d4dcf3dba1/fonts/
  • dat://4505deef3a8f1d4258e8fd1be49c68ba3efc475308ce484a7f8164d4dcf3dba1+3/fonts/

The first example is the latest version; the second picks version 3.
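
As a rough illustration, the optional `+<version>` suffix can be split off with a small parser. This is a sketch in Python, assuming a raw 64-character hex key (hostnames are not handled); `DatUrl` and `parse_dat_url` are names made up for this example.

```python
import re
from typing import NamedTuple, Optional


class DatUrl(NamedTuple):
    key: str
    version: Optional[int]  # None means "latest"
    path: str


# 64 hex characters for the key, an optional "+<version>", then the path.
_DAT_URL = re.compile(r"^dat://([0-9a-f]{64})(?:\+(\d+))?(/.*)?$")


def parse_dat_url(url: str) -> DatUrl:
    match = _DAT_URL.match(url)
    if match is None:
        raise ValueError(f"not a dat URL with a raw key: {url!r}")
    key, version, path = match.groups()
    return DatUrl(key, int(version) if version else None, path or "/")


KEY = "4505deef3a8f1d4258e8fd1be49c68ba3efc475308ce484a7f8164d4dcf3dba1"
latest = parse_dat_url(f"dat://{KEY}/fonts/")    # version is None
pinned = parse_dat_url(f"dat://{KEY}+3/fonts/")  # version is 3
```

Note that the version applies to the whole dat: there is no per-file component in the URL to parse.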

Kewitz: Yep, the problem is that this versioning is dat-based, not file-based. It would be really hard to index package versions within the version of the whole dat.

Kewitz: We can start by offering the latest version of libraries.

Robin: We want the same URL to give the same asset each time. As for dat versioning, I still have to dig into that: https://github.com/millette/dat-shell/issues/10

Subresource Integrity (SRI)
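
SRI would let a page check that the asset it received matches a known hash, whichever dat or mirror served it. A minimal Python sketch of computing the `integrity` value (the algorithm name, a dash, then the base64 of the raw digest) follows; `sri_hash` is a hypothetical helper, and whether dat-aware browsers enforce SRI on dat:// URLs is not established here.

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI integrity string such as 'sha384-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


# Stand-in bytes; a real CDN would hash the published file's contents.
asset = b"console.log('hello from the cdn');"
integrity = sri_hash(asset)
print(integrity)
```

The resulting string would go in the `integrity` attribute of the `<script>` or `<link>` tag that loads the asset.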

Discovery

Kewitz: What do you mean by Discovery?

millette: I want to use jquery version 17.2.13 (for instance), what's the link? See How many dats below. We could have a dat to hold mappings from asset to dat keys/versions/paths.
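
One possible shape for such a mapping, sketched in Python. The JSON layout, the `lookup` helper and the placeholder key are all assumptions made for illustration; nothing here is an existing dat format.

```python
import json

# Hypothetical directory format: "<package>@<version>" -> where the asset lives.
DIRECTORY_JSON = """
{
  "jquery@3.2.1": {"key": "<jquery-dat-key>", "dat_version": 321, "path": "/jquery.min.js"}
}
"""


def lookup(directory: dict, package: str, version: str) -> str:
    """Resolve a package release to a versioned dat URL."""
    entry = directory.get(f"{package}@{version}")
    if entry is None:
        raise KeyError(f"{package}@{version} is not in the directory")
    return f"dat://{entry['key']}+{entry['dat_version']}{entry['path']}"


directory = json.loads(DIRECTORY_JSON)
print(lookup(directory, "jquery", "3.2.1"))
# dat://<jquery-dat-key>+321/jquery.min.js
```

Published as a file inside its own dat, a directory like this would let consumers pin exact dat versions while asset dats keep evolving.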

How many dats

How many dats does it take to create a CDN? A single all-encompassing dat for all assets? Or one dat per asset (with all its versions)?

Note that hyperdb might solve some of this.

One Dat

Advantages:

  • Easier to consume: same host and shared url.
  • Easier to maintain: just keep updating libraries.
  • Easier to create mirrors: 'dat clone and share'
    • but the demand on the mirror will be astronomical (10,000s of JS assets...)
    • We could have sparse mirrors: hosts that share the same dat but hold only a few files each.

Disadvantages:

  • If we use Dat's built-in versioning system, it will be hard to find a library's version within a dat.
  • With a single dat for the whole cdn, you establish a central authority
    • Kewitz: that's true, but how to have a CDN with different dat urls?

More Dats

Advantages:

  • Spread the load and responsibilities
    • Kewitz: Not sure though, Beaker uses the sparse method by default, right?
    • Correct

Disadvantages:

  • Bigger load to handle more dats (CPU, network)

IRC conversations

  • kewitz> OK, my idea is this: Intercept file requests in a special dat-node application
  • kewitz> if we don't have the file, we download and update the cdn-dat
  • millette> how do you intercept over p2p exactly?
  • kewitz> this means that the P2P network will have dynamic nodes that can fetch files from the internet and the rest of users are going to act as a cache.
  • kewitz> that's what I'll need to crack...
  • millette> I don't think it's possible like that.
  • millette> I think we need an http(s) service to request new assets (something that could be self-hosted too)
  • millette> and publish a directory (thru a dat) of asset mapping to dats (or dat urls)
  • kewitz> This is not possible since the file validation is done by the peer itself, the peer only requests blocks and not files.
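
millette's alternative (a service that accepts requests for new assets and publishes a directory through a dat) could look roughly like the Python sketch below. It only covers the fetch-on-miss and directory update inside a local folder; the HTTP(S) front end and the dat sharing step are left out, and `request_asset`, `directory.json` and the `fetch` callable are hypothetical names.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def request_asset(cdn_dir: Path, package: str, version: str, filename: str,
                  fetch: Callable[[str, str, str], bytes]) -> Path:
    """Return the asset's path inside the dat folder, fetching it on a miss."""
    target = cdn_dir / f"{package}@{version}" / filename
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(fetch(package, version, filename))
        # Record the new asset in the directory file published with the dat.
        index_file = cdn_dir / "directory.json"
        index = json.loads(index_file.read_text()) if index_file.exists() else {}
        index.setdefault(f"{package}@{version}", []).append(filename)
        index_file.write_text(json.dumps(index, indent=2, sort_keys=True))
    return target


calls = []


def fake_fetch(package: str, version: str, filename: str) -> bytes:
    """Stand-in for a download from an upstream source such as jsDelivr."""
    calls.append((package, version, filename))
    return b"/* stand-in for the real file */"


with tempfile.TemporaryDirectory() as tmp:
    cdn = Path(tmp)
    first = request_asset(cdn, "jquery", "3.2.1", "jquery.min.js", fake_fetch)
    second = request_asset(cdn, "jquery", "3.2.1", "jquery.min.js", fake_fetch)
    index = json.loads((cdn / "directory.json").read_text())
```

The second request is served from the folder without calling `fetch` again, which is the caching behaviour the rest of the network would rely on once the folder is shared as a dat.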