Implement IPFS/IPNS #156

Open · kyledrake opened this issue Apr 17, 2015 · 29 comments

Comments

@kyledrake
Member

kyledrake commented Apr 17, 2015

I spent a day playing with IPFS, and it blew my mind. We're starting implementation, right now.

It's an ongoing project (still in beta), but now's the time to start putting together a strategy for how we're going to implement it.

How Neocities Works Now

Here's how we currently serve files:

First, we don't use S3, GCS or any of these file storage services. The main reason is cost - bandwidth is far too expensive at these providers. Right now we can serve over 150TB for $100, but with these providers it would cost upwards of $15,000. We're bootstrapped, so we need to live within our means, and this system allows us to do that (it's much faster than S3 too, but I won't go there).

So we're using dedicated servers with 7TB disk arrays.

A single canonical primary fileserver stores all of the sites. A proxy server (that can be turned into multiple proxy servers easily) stands between the fileserver and the public internet as a safety and scalability measure. It also protects us against our ISP shutting down the file server in the event of complaints regarding stored content (this gives us breathing room to deal with the complaints; the proxy servers can get shut down, but the fileserver cannot).

This primary filesystem replicates hourly to a replica fileserver using rsync over SSH. We have built a system that uses inotify to push file updates to the replica as they come in (allowing us to use the replica as a load balancer mirror), but this hasn't been implemented yet.

A large XFS volume has all the sites in subdirectories of a directory called sites. Example:

sites/kyledrake
sites/film

The proxy is running nginx, and translates site subdomains to the location on the fileserver (kyledrake.neocities.org => sites/kyledrake), so the underlying mechanism is essentially a simple HTTP file server.

I was told this would not work, but XFS actually handles a lot of subdirectories in a directory pretty well. It stores references in a B-Tree that allows for quick lookups. I'm not sure how it would handle billions of sites, but I expect it to work fine until we at least hit a million sites.

Once we pass the capabilities of a primary server + replica load balancing (read: when we get larger than 7TB), the plan was to implement a sharding technique. We looked into distributed filesystems, but they were too complex, too expensive, or didn't serve our purposes (GlusterFS, for example, becomes very slow when asked for files that aren't there - it's not designed for the needs of web hosting).

IPFS (high level) Plans

Right now, we're going to implement IPFS in addition to our current system, using an asynchronous work system that copies changes over to IPFS, and then provides a hash of the added data.

My long-term goal is to replace our primary fileserver + replica system with a system completely based on IPFS. This will require some careful design, but I think it's worth exploring.

For now, I'm exploring integration strategies that have two copies. Then I'll start thinking how we could dump the entire thing into ipfs.

IPFS Strategy 1: Add the entire sites directory in

The first obvious strategy is to, maybe once a month, run ipfs add -r sites, and then use a script to extract the last hash, and provide that as a resource for people. I don't like this approach very much, because it only works if no updates to the sites are made. I want to be able to give IPFS the latest version of the sites. It's also slow (though that will probably improve later on).

IPFS Strategy 2: Site-level updates

Strategy 2 is ipfs add -r sites/kyledrake every time a file on kyledrake.neocities.org gets updated. This will provide a hash that references the latest state of that particular site. We would store the hash from that in our database on the sites table, one for each site, and use that to reference how to access it with IPFS.

I like this better than strategy 1, and it's going to scale much better. It still has the "problem" (it's more of a feature really) of the hash changing for each site update, of course. We could make a pretty sweet archive history system using it, or we could unpin the previous hash and garbage collect it (I don't see an unpin command in the command list, though).
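
To make the workflow concrete, here is a minimal sketch of what such a per-site job could look like, assuming a local ipfs binary on the fileserver; the store_site_hash helper is hypothetical and stands in for the real database update:

import subprocess

def add_site_to_ipfs(site_name, sites_root="sites"):
    # `ipfs add -r -q` prints one hash per line; the last line is the
    # hash of the site's top-level directory.
    output = subprocess.check_output(
        ["ipfs", "add", "-r", "-q", "%s/%s" % (sites_root, site_name)]
    )
    return output.decode().strip().splitlines()[-1]

def store_site_hash(site_name, ipfs_hash):
    # Hypothetical placeholder: the real worker would update the sites table.
    print("site %s -> %s" % (site_name, ipfs_hash))

if __name__ == "__main__":
    store_site_hash("kyledrake", add_site_to_ipfs("kyledrake"))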

IPFS Strategy 3: File-level updates

Strategy 3 is ipfs add -r sites/kyledrake/index.html if I was updating that particular file. This creates a hash for each file, which we would then need to track for each individual file. Not sure this is necessary, or helpful. Strategy 2 seems like the most logical fit right now.

IPNS

IPNS in its current form is a way to create a mutable version that uses DNS to tell IPFS where to find the files.

Run dig ipfs.git.sexy txt and you'll see this:

;; ANSWER SECTION:
ipfs.git.sexy.      300 IN  TXT "QmVyS3iAy7mvDA2HqQWm2aqZDcGDH3bCRLFkEutfBWNBqN"

This is an IPFS hash that points to the site. When you go to /ipns/ipfs.git.sexy, it automagically looks this up and uses it to find the IPFS hash needed.

I'm being told that this is changing, and that there's another one that uses a private key? I'm not exactly sure how this one works yet, but the idea is to provide an ipns/kyledrake.neocities.org as a lookup resource.

The obvious thing that this would require is a nameserver that we can programmatically update and that can handle a lot of TXT entries, which will be updated with new hashes via a messaging system like nsq or something like that.

Conclusions

The current idea is to implement Strategy 2, and then have a dynamically updating DNS.

I'm not sure I've got my head around this properly, so I'd love some feedback on this idea, or an alternative suggestion.

Paging @jbenet and @tjgillies!

@jbenet

jbenet commented Apr 17, 2015

Thanks @kyledrake !! 😄 glad to be helpful.

I'm going to inline comments on your entire post, as it's more useful to be over-communicative than under.


I spent a day playing with IPFS, and it blew my mind. We're starting implementation, right now.

Thanks!!! Please ping when you need anything. 😄 happy to help.

How Neocities Works Now

Here's how we currently serve files:

First, we don't use S3, GCS or any of these file storage services. The main reason is cost - bandwidth is far too expensive at these providers. Right now we can serve over 150TB for $100, but with these providers it would cost upwards of $15,000. We're bootstrapped, so we need to live within our means, and this system allows us to do that (it's much faster than S3 too, but I won't go there).

Yeah, I agree with these. These services aren't actually as cheap as they could be. Part of it is the cost of making it highly available / redundant.

It may be useful to keep a live index of the cheapest storage / bandwidth systems.

So we're using dedicated servers with 7TB disk arrays.

A single canonical primary fileserver stores all of the sites. A proxy server (that can be turned into multiple proxy servers easily) stands between the fileserver and the public internet as a safety and scalability measure. It also protects us against our ISP shutting down the file server in the event of complaints regarding stored content (this gives us breathing room to deal with the complaints.. the proxy servers can get shut down, but the fileserver can not).

Not sure I got this right -- so what prevents the ISP from shutting down your fileserver? Is it hosted elsewhere?

This primary filesystem replicates hourly to a replica fileserver using rsync over SSH. We have built a system that uses inotify to push file updates to the replica as they come in (allowing us to use the replica as a load balancer mirror), but this hasn't been implemented yet.

This is a good setup. We've already had success watching directories and importing to IPFS. We don't yet have the commit abstraction and tools, which will let us do very nice snapshotting with history (like git, but we can make it commit periodically).

A large XFS volume has all the sites in subdirectories of a directory called sites. Example:

sites/kyledrake
sites/film

Each site is one flat-namespace identifier, correct? (like npm, not github)

The proxy is running nginx, and translates site subdomains to the location on the fileserver (kyledrake.neocities.org => sites/kyledrake), so the underlying mechanism is a simple HTTP file server essentially.

I was told this would not work, but XFS actually handles a lot of subdirectories in a directory pretty well. It stores references in a B-Tree that allows for quick lookups. I'm not sure how it would handle billions of sites, but I expect it to work fine until we at least hit a million sites.

I'm sure XFS will be fine for a while, but there's also another way out: do what git (and we) do for storing objects in the fs:

For example:

kyledrake         -> /kyl/edr/ake/
film              -> /fil/m__/___/
jbenetsrandomsite -> /jbe/net/srandomsite/

This gives you directories which each hold 40-50,000 entries. You can play with the depth and fanout factors to get the right load on the fs.
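
For illustration, a small helper in the spirit of that fanout (the fanout width, depth, and padding character here are assumptions, not a prescribed layout):

def sharded_path(site_name, fanout=3, depth=3, pad="_"):
    # Split a flat site name into fixed-width segments, git-object style,
    # so no single directory has to hold every site.
    segments = []
    for i in range(depth - 1):
        chunk = site_name[i * fanout:(i + 1) * fanout]
        segments.append(chunk.ljust(fanout, pad))
    segments.append(site_name[(depth - 1) * fanout:] or pad * fanout)
    return "/".join(segments)

# sharded_path("kyledrake")         -> "kyl/edr/ake"
# sharded_path("film")              -> "fil/m__/___"
# sharded_path("jbenetsrandomsite") -> "jbe/net/srandomsite"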

IPFS (high level) Plans

Right now, we're going to implement IPFS in addition to our current system, using an asynchronous work system that copies changes over to IPFS, and then provides a hash of the added data.

SGTM. You will definitely stress current capabilities with 7TB :). Once https://github.com/jbenet/go-ipfs/tree/master/repo/fsrepo is merged in, things will be much better. Also, will be good to figure out whether you want 1 ipfs node with a 7TB+ local repo, or maybe use a small cluster of ipfs nodes each with a subset. (depends on how the local fs works).

My long-term goal is to replace our primary fileserver + replica system to a system completely based on IPFS. This will require some careful design, but I think it's worth exploring.

👍 pretty sure this will work well.

For now, I'm exploring integration strategies that have two copies. Then I'll start thinking how we could dump the entire thing into ipfs.

👍 yes, please do this. IPFS is getting ready fast, but it's not yet to be relied on solely. It won't eat your data, but it is a bit flakey on long-living daemons, etc.

IPFS Strategy 2: Site-level updates

Strategy 2 is ipfs add -r sites/kyledrake every time a file on kyledrake.neocities.org gets updated. This will provide a hash that references the latest state of that particular site. We would store the hash from that in our database on the sites table, one for each site, and use that to reference how to access it with IPFS.

I would do this and skip Strategy 1 and 3 entirely. I'd do this because right now, ipfs add -r <path> will re-add all files under that site. It's not yet smart enough to use modtimes, and so on. (we will be, just not yet). We could look at helping support this much earlier than planned if you find it's giving you problems. My guess is that each site is small enough (typically <100MB?) to not be a problem. Could maybe specialcase very large sites to strategy 3.

I should mention that you can craft ipfs objects directly very easily. Meaning that you can construct a "directory" object easily from all the hashes, so you can still have one root hash without having to re-add the entire sites/ dir. I'll walk you through how to do this -- and we can also make better tooling for this. It's time to do so.

I like this better than strategy 1, and it's going to scale much better. It still has the "problem" (it's more of a feature really) of the hash changing for each site update, of course. We could make a pretty sweet archive history system using it, or we could unpin the previous hash and garbage collect it (I don't see an unpin command in the command list, though).

Yeah, so once we add commit objects (it's perhaps time to!), we can just create a commit like in git :D so full history. The hard part will be manipulating these histories as easily as we do in git. The tooling will take time to build out.

IPNS

IPNS in it's current form is a way to create a mutable version that uses DNS to tell IPFS where to find the files.

small correction: IPNS is a way to create a pointer out of a public key. Meaning:

<hash-of-public-key>  ---resolves--to-->  <ipfs-path>

So you can give someone a link like:

/ipns/<hash-of-my-public-key>/foo/bar/baz

And have it resolve to whatever was last published:

ipfs name publish <hash-of-obj1>
  # now resolves to /ipfs/<hash-of-obj1>/foo/bar/baz
ipfs name publish <hash-of-obj2>
  # now resolves to /ipfs/<hash-of-obj2>/foo/bar/baz
ipfs name publish <hash-of-obj3>
  # now resolves to /ipfs/<hash-of-obj3>/foo/bar/baz

Now, this isn't working perfectly yet. We're working on it.

The DNS resolution started as a hack to get interop, and maybe it should move out of IPNS into a larger name resolution system -- but it may be too much abstraction. So we'll likely keep it there.

The idea is that IPNS has a set of resolvers, such that:

/ipns/foo.com/...               resolves with DNS
/ipns/<hash-of-public-key>/...  resolves with IPNS Routing (as above)
/ipns/foo.bit/...               resolves with Namecoin (not implemented)
...

The idea is to be able to plug in name resolvers as desired.

Run dig ipfs.git.sexy txt and you'll see this:

;; ANSWER SECTION:
ipfs.git.sexy.      300 IN  TXT "QmVyS3iAy7mvDA2HqQWm2aqZDcGDH3bCRLFkEutfBWNBqN"

This is an IPFS hash that points to the site. When you go to /ipns/ipfs.git.sexy, it automagically looks this up and uses it to find the IPFS hash needed.

Yep! 👍 we will be changing the format soon to be that the TXT record is:

# resolve to ipfs path directly
DNS TXT "dnslink=/ipfs/QmVyS3iAy7mvDA2HqQWm2aqZDcGDH3bCRLFkEutfBWNBqN/foo/bar"

# resolve to ipns path, which in turn resolves to ipfs path
DNS TXT "dnslink=/ipns/Qmf7v2nNR7HXrhrAFMdgdUqHCxd2uqdmoudMDFLyREuNew"

I'm being told that this is changing, and that there's another one that uses a private key? I'm not exactly sure how this one works yet,

(see above)

but the idea is to provide an ipns/kyledrake.neocities.org as a lookup resource.

Yeah,

The obvious thing that this would require is a nameserver that we can programmically update that can handle a lot of TXT entries, which will be updated with new hashes via a messaging system like nsq or something like that.

It may be simpler to do this:

# setup one TXT record
neocities.org  TXT "dnslink=/ipns/<hash-of-neocities.org-public-key>"

# republish the root hash of `sites/` to IPNS whenever it changes
ipfs add -r sites/ # or equivalent
ipfs name publish <last-hash-of-sites>

# sites can use
/ipns/neocities.org/kyledrake

# resolution:

/ipns/neocities.org/kyledrake/foo/bar
  -> neocities.org                             # look up TXT for domain name
  -> /ipns/<hash-of-neocities.org-public-key>  # get ipns path. resolve it.
  -> /ipfs/<last-hash-of-sites>                # get ipfs path. resolve it.
  -> /ipfs/<last-hash-of-sites>/kyledrake           # path lookup...
  -> /ipfs/<last-hash-of-sites>/kyledrake/foo       # path lookup...
  -> /ipfs/<last-hash-of-sites>/kyledrake/foo/bar   # path lookup...

So you only need to update 1 IPNS record.

I understand subdomains are critical on the web-facing side to be able to do subdomain origin separation between the sites. Two things on that:

Conclusions

The current idea is to implement Strategy 2, and then have a dynamically updating DNS.

Depending on the robustness we can give IPNS soon enough, may be easy to go that route. Another route is to use the gateway that takes the subdomain into account. There's no reason your HTTP gateway has to serve paths like ours do (/<http-request-path> -- usually /ipfs/...).

You can very well serve:

/ipfs/<some-internal-state-hash>/<http-request-subdomain>/<http-request-path>

Similar to "hosts" abstraction in most http servers.

I'm not sure I've got my head around this properly, so I'd love some feedback on this idea, or an alternative suggestion.

Lmk if the comments above didn't make sense!

Paging @jbenet and @tjgillies!

o/

Also, cc @whyrusleeping

@whyrusleeping

whyrusleeping commented Apr 17, 2015

Wow, this is great! I'm always around in IRC if you've any questions.

@exiledsurfer

exiledsurfer commented Apr 17, 2015

just ... wow... jbenet !!

@kyledrake
Member Author

kyledrake commented Apr 18, 2015

@jbenet Thank you for the really good feedback! This is super helpful, I get this a lot more now.

You're right, a single TXT record using the pubkeyhash is a cleaner approach.

I've set up a live test node for this.

I've got a pubkeyhash (NodeID in the whitepaper) of QmZz4ewNiJGJ4QGRVdLbsptFjMnRociJyEFUA6kv54qmkA.

So neocities.org TXT "dnslink=/ipns/QmZz4ewNiJGJ4QGRVdLbsptFjMnRociJyEFUA6kv54qmkA" would be the ipns ref. I tried to add it, and ran into this problem:

[screenshot: error when adding the TXT record]

We've got an SPF record for the mail which is preventing us from adding this TXT record:

neocities.org.      59  IN  TXT "v=spf1 mx -all"

Keybase.io domain verification got around this by using a subdomain called _keybase:

$ dig txt _keybase.neocities.org
_keybase.neocities.org. 59  IN  TXT "keybase-site-verification=XlYL2JyH-f-6VSjK_29chFHhpLvCPCoN7BJ0-QcMhVU"

So, yeah, maybe there needs to be an _ipfs or something here, or I need to figure out how to combine the SPF record with the IPFS record (I don't know if this is allowed).

Anyhoo:

I set up a tiny mocked-up version of our sites directory. Every time there's an update (such as for the above):

$ ipfs add -r testsites/
added QmcnPFzywu6F5cuhRPR4q1cqxjTQJm6ZDU5kduVpSdBEfM testsites/blog/index.html
added QmTXBWyNAAFtgfCpoi49KeT1MbEzcpMsb5g8ULxRMAZC1U testsites/blog
added QmRZa2UhUBYvF2jbsy4CKKxeBnY95zMb4Z3UCw8sRMbyoD testsites/kyledrake/index.html
added QmZLRFWaz9Kypt2ACNMDzA5uzACDRiCqwdkNSP1UZsu56D testsites/kyledrake/secret.txt
added QmUraf1AbRjqBJutHa4kuiwji7dA2mox4piHGfw5GPrZDD testsites/kyledrake
added Qmbs7VBpbWCNoVVoXoDS7wGwUt8FgmLdVfwNKh168QQTWD testsites/

$ ipfs name publish Qmbs7VBpbWCNoVVoXoDS7wGwUt8FgmLdVfwNKh168QQTWD
Published name QmZz4ewNiJGJ4QGRVdLbsptFjMnRociJyEFUA6kv54qmkA to Qmbs7VBpbWCNoVVoXoDS7wGwUt8FgmLdVfwNKh168QQTWD

I just tried this, and it actually worked: http://gateway.ipfs.io/ipns/QmZz4ewNiJGJ4QGRVdLbsptFjMnRociJyEFUA6kv54qmkA

So, of course the problem here is that running ipfs add -r sites/ for every update to something in the subdirs is going to be very slow, and will occur very frequently.

So the trick would be the ability to quickly create a directory object every time, without reimporting all the data again, and then add the directory object as a single ipfs add event. This would have to be extremely quick, as in less than a few seconds, even for millions of subdirectories.


RE The site origin, it's definitely a problem. Implementation of that Chrome paper is not happening anytime soon. It needs to have subdomains to be used safely.

In a *.neocities.org example, an attacker could totally screw up my stored episode list on the My Little Pony Episode Guide (http://mlpfim.neocities.org), and that would pretty much ruin my life. :)

I would have to put a proxy in front of it, but nginx has a pretty good regex mechanism and I'm pretty sure I could write a config to translate to subdomains pretty easily.


So I'm at the question: Is it possible to make the sites/ hash update quickly enough that it can be done in less than a few seconds, even if there are millions of subdirectories, tens (or hundreds) of millions of files and potentially several dozen updates within sites/ per second? That's the scalability question on that option. Do you think it's achievable even with improved tooling?

@kyledrake
Member Author

kyledrake commented Apr 18, 2015

I did a little digging on the TXT issue. It turns out that you can have more than one TXT record; it's just that Amazon's Route53 interface doesn't let you add a second one (The Future Of Scaling (TM)).

This isn't really a problem for us because we're switching off of it pretty soon anyways, but it would likely be an issue for many people. You would need IPFS to parse multiple TXT records too, of course, to find the one it's looking for, in case it's not doing that yet.

@jbenet

jbenet commented Apr 18, 2015

A common strategy is to separate the entries with a space ("foo=bar foo2=baz"). Not everyone handles this correctly but most big things do I think.

@jbenet

jbenet commented Apr 18, 2015

So the trick would be the ability to quickly create a directory object every time, without reimporting all the data again, and then add the directory object as a single ipfs add event. This would have to be extremely quick, as in less than a few seconds, even for millions of subdirectories.

Yep, exactly. This can be done by creating the object yourself. It can be done either by:

  • using go-ipfs as a library. (e.g. ipfs-bootstrapd -- this is what the gateways run). See also ipfswatch
  • using the ipfs object command/api to craft the object and add it directly. This means we can construct what the ipfs directory object should have (i.e. the links) and add it every time a site changes. I can perhaps write an example of this next week (pls bug me to do so if helpful).
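
A rough sketch of that second option, assuming the `ipfs object patch ... add-link` / `rm-link` commands (which shipped in later go-ipfs releases); the hash bookkeeping here is illustrative, not the real Neocities worker:

import subprocess

def ipfs(*args):
    # Run an ipfs CLI command and return its stdout as a stripped string.
    return subprocess.check_output(["ipfs"] + list(args)).decode().strip()

def update_sites_root(sites_root_hash, site_name, site_path):
    # Re-add only the changed site and take its new root hash.
    new_site_hash = ipfs("add", "-r", "-q", site_path).splitlines()[-1]
    # Swap the site's link inside the existing sites/ directory object,
    # without re-adding any other site's data. (For a brand-new site,
    # skip the rm-link step.)
    root = ipfs("object", "patch", sites_root_hash, "rm-link", site_name)
    return ipfs("object", "patch", root, "add-link", site_name, new_site_hash)

# new_root = update_sites_root(current_root, "kyledrake", "sites/kyledrake")
# ipfs("name", "publish", new_root)  # point /ipns/<neocities-key> at the new root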

RE The site origin, it's definitely a problem. Implementation of that Chrome paper is not happening anytime soon. It needs to have subdomains to be used safely.

👍 yeah

I would have to put a proxy in front of it, but nginx has a pretty good regex mechanism and I'm pretty sure I could write a config to translate to subdomains pretty easily.

nginx is the best proxy. Yeah, I think we can write a config to manipulate the path and add the incoming subdomain. I still think we'll want to make a special build of the gateway that understands what the sites root is supposed to be (as it will be a hotly changed path).

Is it possible to make the sites/ hash update quickly enough that it can be done in less than a few seconds, even if there are millions of subdirectories, tens (or hundreds) of millions of files and potentially several dozen updates within sites/ per second? That's the scalability question on that option. Do you think it's achievable even with improved tooling?

Yeah, as described above, we can definitely do this, even now. It just means not running ipfs add -r sites/, but rather only on the one site, taking its root hash, and then manually creating what the new sites/ object has to be (i.e. modify/add/rm links in it), putting it, getting the new hash, and updating the ipns record with that. This should take µs for the changes, and ms for the ipns republish. (Writes to ipns can be coalesced too, so if changes are coming in much faster than ipns republishes, we can coalesce and republish at most once per ~500ms or something.)
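
A sketch of the coalescing idea, assuming site updates push new root hashes onto a queue and a single worker publishes only the newest one, at most once per interval:

import queue
import subprocess
import time

def coalesced_publisher(updates, interval=0.5):
    # Block until at least one new root hash arrives, drain any backlog so
    # only the newest hash survives, publish it, then wait out the interval.
    while True:
        latest = updates.get()
        while True:
            try:
                latest = updates.get_nowait()
            except queue.Empty:
                break
        subprocess.check_call(["ipfs", "name", "publish", latest])
        time.sleep(interval)

# updates = queue.Queue()
# producer threads call updates.put(new_root_hash) after every site change,
# and coalesced_publisher(updates) runs in its own thread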

@jbenet

jbenet commented Apr 18, 2015

(Lmk if the TXT record thing is resolved. I do think putting multiple k=v pairs in one TXT is considered acceptable -- if unideal. I may be wrong though...)

@kyledrake
Member Author

kyledrake commented Apr 18, 2015

RE the DNS TXT in SPF record idea:

Section 4.5 of RFC 7208:

   Starting with the set of records that were returned by the lookup,
   discard records that do not begin with a version section of exactly
   "v=spf1".  Note that the version section is terminated by either an
   SP character or the end of the record.  As an example, a record with
   a version section of "v=spf10" does not match and is discarded.

4.6:

   The check_host() function parses and interprets the SPF record to
   find a result for the current test.  The syntax of the record is
   validated first, and if there are any syntax errors anywhere in the
   record, check_host() returns immediately with the result "permerror",
   without further interpretation or evaluation.

As I understand it, a name=val pair is called a modifier. If the parser doesn't recognize the name, it is considered an unknown-modifier (Section 4.6.1, Term Evaluation).

Section 6:

Modifier Definitions


   Modifiers are name/value pairs that provide additional information.
   Modifiers always have an "=" separating the name and the value.

   The modifiers defined in this document ("redirect" and "exp") SHOULD
   appear at the end of the record, after all mechanisms, though
   syntactically they can appear anywhere in the record.  Ordering of
   these two modifiers does not matter.  These two modifiers MUST NOT
   appear in a record more than once each.  If they do, then
   check_host() exits with a result of "permerror".

   Unrecognized modifiers MUST be ignored no matter where, or how often,
   they appear in a record.  This allows implementations conforming to
   this document to gracefully handle records with modifiers that are
   defined in other specifications.

So it looks like you can throw an unknown modifier on the end safely, due to SPF wanting to conform to future modifiers.

You would need to make IPFS check for the v=spf1 on the front and then parse it. It would probably be smartest to use a proper SPF parser here. I found one in Ruby, haven't found any for Go.

It feels a bit hackish. Not sure if this is something you want to standardize for the IPFS DNS stuff. It might make more sense to do a proper parsed structure and start with v=ipfs or something like that. And have a comedy _ipfs fallback, but that would not be so great because it could require two DNS lookups per request.
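
For illustration, a toy lookup showing the idea being discussed: a dnslink value appended to the SPF record as an unrecognized modifier, and a resolver that picks it out of whatever TXT records come back (this is a sketch of the proposal, not an agreed-upon format):

def find_dnslink(txt_records):
    # Return the first dnslink=<path> value, whether it has its own TXT
    # record or rides along at the end of an SPF record.
    for record in txt_records:
        for term in record.split():
            if term.startswith("dnslink="):
                return term[len("dnslink="):]
    return None

records = ["v=spf1 mx -all dnslink=/ipns/QmZz4ewNiJGJ4QGRVdLbsptFjMnRociJyEFUA6kv54qmkA"]
print(find_dnslink(records))  # -> /ipns/QmZz4ewNiJGJ4QGRVdLbsptFjMnRociJyEFUA6kv54qmkA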

CCing @postmodern, as he's maintaining the Ruby SPF parser and might have some thoughts on the sanity of this approach.

@postmodern, in a nutshell we're thinking about putting an IPFS link in as an unknown modifier for SPF records, because some derpie DNS services like Route53 don't let you have multiple TXT records even though it's supposed to be okay. This prevents people from using IPNS references that come in via a pubkeyhash published to a DNS TXT.

@postmodern

postmodern commented Apr 18, 2015

I know nothing about IPFS (but I suspect I will have to learn about it soon). Almost all of these security protocols require that unknown directives be ignored, for future compatibility and resilience. As long as the name/value conform to the syntax, you should be able to get away with adding custom modifiers.

@kyledrake
Member Author

kyledrake commented Apr 22, 2015

I had a very productive discussion with Juan today. We're ready to go:

Phase 1: IPFS integration with Neocities sites

For the first implementation, I am going to ipfs add -r sitename via a Sidekiq worker every time a file is changed. We will track the generated IPFS hash for the site in our database.

We'll add the following table to the database:

site_archives
-----
site_id (id, indexed)
ipfs_hash (string)
is_pinned (boolean)
is_deleted (boolean)
created_at (datetime, indexed)

This will reference each ipfs hash that's been created, based on a timestamp, and whether it has been pinned (persisted) or not. We'll try to persist everything for now (why not?)
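
Just to make the shape concrete, a throwaway sqlite3 sketch of that table (the production schema would live in our normal migrations; the example hash is from the testsites run above):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE site_archives (
        id         INTEGER PRIMARY KEY,
        site_id    INTEGER NOT NULL,
        ipfs_hash  TEXT,
        is_pinned  BOOLEAN,
        is_deleted BOOLEAN,
        created_at DATETIME
    )
""")
conn.execute("CREATE INDEX idx_site_archives_site_id ON site_archives (site_id)")
conn.execute("CREATE INDEX idx_site_archives_created_at ON site_archives (created_at)")
conn.execute(
    "INSERT INTO site_archives (site_id, ipfs_hash, is_pinned, is_deleted, created_at) "
    "VALUES (?, ?, 1, 0, datetime('now'))",
    (1, "QmUraf1AbRjqBJutHa4kuiwji7dA2mox4piHGfw5GPrZDD"),
)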

With phase 1, our linking to the IPFS hash from our site is the "verification" that it's correct.

This gives us the ability to build an archive for sites as a consequence of the way it works.

A link to an IPFS gateway will be provided via the site profile, with an archive page link, and a WTF link so people know what it is. It will be listed as a preview, as it may go away or (more likely) change in its implementation.

Phase 2

Phase 2 is the destroy-the-central-cloud guerrilla warfare phase: it makes it possible to leave Neocities, but still publish to the same mutable source.

We generate a NodeID (private key / pubkeyhash) for each site, and then use our DNS via a TXT to reference the pubkeyhash for IPNS. That will be signed to be the latest version of the site.

Upon request, the site owner can request their private key, and use it to "take control" of the site from us and publish to it directly if they want to. This does signing to verify that the data came from the private key. The result is that even if the site shuts down, the user could still publish to the IPNS resource using the private key via a different tool.

It's going to be a while before the code's ready for this. There are also some security/workflow concerns here that need to be considered. And we'll need to be able to run multiple NodeIDs, and currently the client only runs one. So for now, we start with phase 1.

@Fil

Fil commented Apr 22, 2015

site_id (id, indexed) looks fine but wouldn't it be more open to decentralization if it were a UUID or a URI?

@kyledrake
Member Author

kyledrake commented Jun 23, 2015

@Fil The id within our local database is always going to be local, so making the ID a UUID doesn't give us much here. In the future the ideal way to do this is to use IPNS records for each site contained in the nameserver TXT record for the domain.

@rugk

rugk commented Jan 21, 2017

Given that we already have requests such as #202, how is the implementation of this feature going?

@iuriguilherme

iuriguilherme commented Jun 12, 2017

Maintaining IPFS hashes every time something changes is a hassle that IPNS already takes care of. I see it has been mentioned only in the regard that it can be used at the DNS level, but I personally use IPNS at the "hash level".

For every site I have, I only store / care about / remember its IPNS hash, which gets updated as I explain below.

The current drawback of this approach is that it requires an IPFS node running for every single site. This is a current IPNS limitation. In the case of every neocities.org website, I don't know how it would scale. Also, on a Raspberry Pi, it takes quite a few seconds to update each 10MB website.

PS: one suggestion to overcome this drawback is to store every site in a single IPNS namespace: ipfs/kubo#1716 (comment)


This is a Python script I use for updating my IPNS websites:

Content of ipfs-update.py:

#!/usr/bin/env python3

import json, subprocess, sys, os

try:
  sites = json.load(open('ipfs-sites.json'))
except Exception as e:
  print(e)
  sys.exit(1)

for site in sites.values():
  try:
    os.environ['IPFS_PATH'] = site['ipfs_path']
    # `ipfs add -rq` prints one hash per line; the last line is the root hash
    ipfs_add = subprocess.Popen(
      ['/usr/bin/env', 'ipfs', 'add', '-rq', site['web_root']],
      stdout=subprocess.PIPE
    )
    ipfs_hash = ipfs_add.stdout.read().decode().strip().splitlines()[-1]
    ipfs_publish = subprocess.Popen(
      ['/usr/bin/env', 'ipfs', 'name', 'publish', ipfs_hash],
      stdout=subprocess.PIPE
    )
    print(ipfs_publish.stdout.read().decode())
  except Exception as e:
    print(e)
    sys.exit(1)

Example content of ipfs-sites.json. Note that each site has a folder where its files are stored, but it doesn't have to be on the same server: it can be remotely mounted with sshfs, or it could even reside in the IPFS swarm with some adjustments to the script. The thing we can't get away from is that every site must have an IPFS daemon running and configured to use a specific IPFS_PATH environment variable:

{
  "mysite": {
    "ipfs_path": "/home/ipfs/mysite",
    "web_root": "/var/www/mysite"
  },
  "othersite": {
    "ipfs_path": "/home/ipfs/othersite",
    "web_root": "/var/www/othersite"
  }
}

I even serve my clearnet websites using IPFS as a backend, thus saving disk space (in my setup, the fileserver is separate from the ipfs/web server). My nginx web server configuration:

server {
  server_name www.example.com;
  location / {
    proxy_pass http://localhost:8080/ipns/mylongipnshash/;
  }
}

@kyledrake
Member Author

kyledrake commented Jun 12, 2017

This requires one IPFS node running as a daemon for each website. This is a current IPNS limitation. In the case of every neocities.org website, I don't know how it would scale.

There's a way to have multiple IPNS keys with one daemon:

ipfs key gen --type=ed25519 mykey
ipfs name publish --key=mykey /ipfs/QmeomffUNfmQy76CQGy9NdmqEnnHU9soCexBnGU3ezPHVH

Current issues with this (for Neocities):

  • The publish needs to be run every 12 hours.
  • The publish command is "slow". It took 22 seconds in the test I just did.

Not too bad for a few keys, but we would have to do this for 140k sites if we wanted to give each current Neocities site a key.
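
To make the scale problem concrete, a sketch of what a periodic republish pass could look like, assuming one IPNS key per site (named after the site) and a hypothetical latest_hash_for lookup into our database:

import subprocess

def latest_hash_for(site_name):
    # Hypothetical: look up the site's most recent IPFS root hash in the database.
    raise NotImplementedError

def republish_all(site_names):
    # IPNS records currently expire, so every key has to be republished
    # roughly every 12 hours, and each publish takes seconds.
    for name in site_names:
        subprocess.check_call(
            ["ipfs", "name", "publish", "--key=" + name,
             "/ipfs/" + latest_hash_for(name)]
        )

# At ~22 seconds per publish, a naive serial pass over 140k sites would take
# more than a month, so this would need heavy parallelism or faster publishes.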

Supposedly the 12 hour update requirement will be gone soon, but I don't know much about this right now: https://discuss.ipfs.io/t/what-is-ipns-how-does-it-work-and-how-to-deal-with-mutable-content/457/17

@whyrusleeping probably knows the latest.

@kyledrake
Member Author

kyledrake commented Jun 12, 2017

@rugk This is basically the situation right now:

The archiving side is implemented (though it's temporarily offline due to an infrastructure upgrade, coming back soon). The next step is to do IPNS keys for each Neocities site.

The hold up:

  • Tooling (for basically everything that's ever called itself distributed) is focused on an individual or company hosting a few dozen or fewer of their own keys, not a single company hosting hundreds of thousands for other people. This really isn't too crazy when you consider that the point of dweb is to obsolete large centralized infrastructures. :)

  • We're not using a nameserver that can handle 100k+ individual DNSLINK records. I'm probably going to have to write something custom to do this and start running our own nameservers.

  • There is no IPFS web browser (or a plugin that can do DNSLINK and understands the concept of security origins), so even if we resolve the above two issues, users will not be able to view the sites correctly.

All of these issues need to change in some measurable way before we can make more progress. The most actionable thing for me right now is the DNSLINK support for our nameserver infrastructure, which I'm going to work on when we re-write our proxy architecture to use redis replication and a pub/sub model.

@iuriguilherme

iuriguilherme commented Jul 4, 2017

@kyledrake what about parallel updating and/or IPNS updating on demand for opt-in sites?

@lidel

lidel commented Aug 27, 2018

@kyledrake I am working on improving performance and default support for DNSLink in IPFS Companion (our browser extension): ipfs/ipfs-companion#558.

TL;DR The idea for the default DNSLink resolver introduced there is to trigger a blocking DNS TXT lookup in the presence of an X-Ipfs-Path header. If found, the original connection is dropped after headers are read and replaced with IPFS transport (there is no duplication of payload data). The lookup is cached, so all following connections skip HTTP and go over IPFS directly.

Would it be possible to add an X-Ipfs-Path header to every response?
It is already returned by simple sites backed by go-ipfs HTTP Gateways:

$ curl -Is https://explore.ipld.io/static/media/ipfs-logo.4831bd1a.svg | grep X-Ipfs-Path
X-Ipfs-Path: /ipns/explore.ipld.io/static/media/ipfs-logo.4831bd1a.svg

Neocities' CDN seems to filter it out and instead adds X-Neocities-CDN.

@kyledrake
Member Author

kyledrake commented Aug 29, 2018

The CDN does not serve from go-ipfs, which is why the header is not there. I've just rolled out a change to the CDN that will allow for X-Ipfs-Path to display for domains with subdomains (ex: status.neocities.org), but it won't appear for neocities.org or for any custom domains we are hosting.

Does that work for your needs?

@lidel

lidel commented Aug 29, 2018

It needs to be a valid ipns or ipfs path (/ipns/status.neocities.org), but yes, it will be enough (a bit hacky, though).

btw: I made the UX simpler and decided to always do the async lookups in "best-effort" mode.
X-Ipfs-Path will just be a hint that a sync lookup should be done for the initial request (in the rare case of a DNSLink cache miss).

See ipfs/ipfs-companion#558 (comment)
Ready for tests: ipfs-companion v2.4.4.10960 (Beta)

@lidel

lidel commented Oct 16, 2018

@kyledrake this is now enabled by default in stable channel of ipfs-companion and unfortunately Neocities websites are super slow / broken when loaded over IPFS.

Perhaps it makes sense for you to temporarily remove the X-Ipfs-Path header until Neocities is able to answer IPFS queries in a reasonable time.

@kyledrake
Member Author

kyledrake commented Oct 19, 2018

@lidel I'm migrating Neocities over to new infrastructure by the end of the month, then we're going to re-plumb our IPFS system, which will hopefully fix this.

@sixcorners

sixcorners commented Nov 9, 2018

It seems that dotfiles are not making their way to IPFS. Is this intentional? The main place I use a dotfile is for the /.well-known/ files.

@whyrusleeping

whyrusleeping commented Nov 9, 2018

@sixcorners ipfs add ignores hidden files by default. You can use the --hidden option to change that.

@sixcorners

sixcorners commented Nov 9, 2018

@whyrusleeping I'm kind of using neocities to host the files in IPFS. They are the ones adding the files.

@iuriguilherme

iuriguilherme commented Jul 17, 2020

As of now we have IPFS Companion, a browser plugin that essentially makes IPFS behave like the regular web.

Because many (or all?) neocities sites have a TXT record with a dnslink pointing to the site's latest CID (is that right?), my browser is taking forever to resolve neocities.org sites. I even found my site's CID (by doing a dig mysite.neocities.org txt) and tried to pin it on one of my IPFS nodes, but it hasn't found anyone with the files yet.

I have tried adding my whole site to one of my IPFS nodes, but I get a different CID. I pinned the folder I get from the "Download entire site" option in the web interface. That is not the same CID as the one in my dnslink.

Edit: I changed one character in a file and the CID in my dnslink changed from v0 to v1. The DNS record (for the domain) changed in around a minute, and I have been able to pin the site. This probably means that sites which have not been updated recently (mine was last updated 6 months ago) won't show up on IPFS gateways / desktop / etc. So I'll leave this comment here for reference.

I am happy to host my own site via IPFS, I just need to know whether the dnslink TXT is updated and how I can replicate the exact same hash (IPFS CID) locally.

Edit: The above statement remains true. What is the process for archiving, so I can replicate it with my local repository (I have the whole site on my laptop; the files should be the same)?

@iuriguilherme

iuriguilherme commented Jul 17, 2020

Also, the blog is outdated; there's IPFS Desktop for Windows too.

@ValdikSS

ValdikSS commented Nov 29, 2020

Please pay attention to the non-functioning IPFS serving issue: #352
