Implement IPFS/IPNS #156
Thanks @kyledrake !! I'm going to inline comments on your entire post, as it's more useful to be over-communicative than under.
Thanks!!! Please ping when you need anything.
Yeah, I agree with these. These services aren't actually as cheap as they could be. Part of it is the cost of making it highly available / redundant. It may be useful to keep a live index of the cheapest storage / bandwidth systems.
Not sure I got this right -- so what prevents the ISP from shutting down your
This is a good setup. We've already had success watching directories and importing to IPFS. We don't yet have the commit abstraction and tools, which will let us do very nice snapshotting with history (like git, but able to commit periodically).
Each site is one flat-namespace identifier, correct? (like npm, not github)
Am sure XFS will be fine for a while, but there's also another way out: do what git and we do for storing objects in the fs: For example: This gives you directories which each hold
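The git-style object layout referenced above shards entries into subdirectories keyed by a short prefix of each name's hash, so no single directory grows unboundedly. A minimal sketch of the idea (the SHA-1 choice and two-character prefix are illustrative assumptions borrowed from git's `.git/objects` layout, not Neocities' actual scheme):

```python
import hashlib
import os

def sharded_path(root, name):
    """Map a site name to a git-style sharded path: <root>/<2-char prefix>/<rest>."""
    digest = hashlib.sha1(name.encode()).hexdigest()
    # The first two hex chars pick the bucket (256 buckets),
    # like git's .git/objects/ab/cdef... layout.
    return os.path.join(root, digest[:2], digest[2:])

print(sharded_path('sites', 'kyledrake'))
# each bucket directory ends up holding only ~1/256 of all entries
```

With two hex characters you get 256 buckets; adding another level multiplies that by 256 again, which is how git keeps directory listings fast at any scale.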
SGTM. You will definitely stress current capabilities with 7TB :). Once https://github.com/jbenet/go-ipfs/tree/master/repo/fsrepo is merged in, things will be much better. Also, it will be good to figure out whether you want one ipfs node with a 7TB+ local repo, or maybe a small cluster of ipfs nodes, each with a subset. (Depends on how the local fs works.)
I would do this and skip Strategy 1 and 3 entirely. I'd do this because right now, I should mention that you can craft ipfs objects directly very easily. Meaning, that you can construct a "directory" object easily from all the hashes, so you can still have one root hash without having to re-add the entire
Yeah, so once we add commit objects (it's perhaps time to!), we can just create a commit like in git :D so full history. The hard part will be manipulating these histories as easily as we do in git. The tooling will take time to build out.
small correction: IPNS is a way to create a pointer out of a public key. Meaning: So you can give someone a link like: And have it resolve to: Now, this isn't working perfectly yet. We're working on it. The DNS resolution started as a hack to get interop, and maybe it should move out of IPNS into a larger name resolution system -- but it may be too much abstraction. So we'll likely keep it there. The idea is that IPNS has a set of resolvers, such that: The idea is to be able to plug in name resolvers as desired.
Yep! # resolve to ipfs path directly
DNS TXT "dnslink=/ipfs/QmVyS3iAy7mvDA2HqQWm2aqZDcGDH3bCRLFkEutfBWNBqN/foo/bar"
# resolve to ipns path, which in turn resolves to ipfs path
DNS TXT "dnslink=/ipns/Qmf7v2nNR7HXrhrAFMdgdUqHCxd2uqdmoudMDFLyREuNew"
(see above)
Yeah,
It may be simpler to do this: # setup one TXT record
neocities.org TXT "dnslink=/ipns/<hash-of-neocities.org-public-key>"
# republish the root hash of `sites/` to IPNS whenever it changes
ipfs add -r sites/ # or equivalent
ipfs name publish <last-hash-of-sites>
# sites can use
/ipns/neocities.org/kyledrake
# resolution:
/ipns/neocities.org/kyledrake/foo/bar
-> neocities.org # look up TXT for domain name
-> /ipns/<hash-of-neocities.org-public-key> # get ipns path. resolve it.
-> /ipfs/<last-hash-of-sites> # get ipfs path. resolve it.
-> /ipfs/<last-hash-of-sites>/kyledrake # path lookup...
-> /ipfs/<last-hash-of-sites>/kyledrake/foo # path lookup...
-> /ipfs/<last-hash-of-sites>/kyledrake/foo/bar # path lookup...

So you only need to update 1 IPNS record. I understand subdomains are critical on the web-facing side to be able to do subdomain origin separation between the sites. Two things on that:
Depending on the robustness we can give IPNS soon enough, it may be easy to go that route. Another route is to use a gateway that takes the subdomain into account. There's no reason your HTTP gateway has to serve paths like ours do ( You can very well serve: Similar to the "hosts" abstraction in most HTTP servers.
Lmk if the comments above didn't make sense!
o/ Also, cc @whyrusleeping
Wow, this is great! I'm always around in IRC if you've any questions.
just ... wow... jbenet !!
@jbenet Thank you for the really good feedback! This is super helpful, I get this a lot more now. You're right, a single TXT record using the pubkeyhash is a cleaner approach. I've set up a live test node for this. I've got a pubkeyhash (NodeID in the whitepaper) of So

We've got an SPF record for the mail which is preventing us from adding this TXT record: Keybase.io domain verification got around this by using a subdomain called So, yeah, maybe there needs to be an _ipfs or something here, or I need to figure out how to combine the SPF record with the IPFS record (I don't know if this is allowed).

Anyhoo: I set up a tiny mocked-up version of our I just tried this, and it actually worked: http://gateway.ipfs.io/ipns/QmZz4ewNiJGJ4QGRVdLbsptFjMnRociJyEFUA6kv54qmkA

So, of course the problem here is that running So the trick would be the ability to quickly create a directory object every time, without reimporting all the data again, and then add the directory object as a single

RE the site origin, it's definitely a problem. Implementation of that Chrome paper is not happening anytime soon. It needs to have subdomains to be used safely. In a *.neocities.org example, an attacker could totally screw up my stored episode list on the My Little Pony Episode Guide (http://mlpfim.neocities.org), and that would pretty much ruin my life. :) I would have to put a proxy in front of it, but

So I'm at the question: is it possible to make the sites/ hash update quickly enough that it can be done in less than a few seconds, even if there are millions of subdirectories, tens (or hundreds) of millions of files, and potentially several dozen updates within
I did a little digging on the TXT issue. It turns out that you can have more than one TXT record, it's just that Amazon's Route53 interface doesn't let you add it (The Future Of Scaling (TM)). This isn't really a problem for us because we're switching off of it pretty soon anyways, but it would likely be an issue for many people. You would need IPFS to parse multiple TXT records too, of course, to find the one it's looking for, in case it's not doing that yet.
A common strategy is to separate the entries with a space ("foo=bar foo2=baz"). Not everyone handles this correctly but most big things do I think. |
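To illustrate the above, a small sketch of how a resolver might pick the dnslink entry out of several TXT strings, including the space-separated key=value style just mentioned (the function name and exact matching rules are illustrative assumptions, not dnslink's actual specification):

```python
def find_dnslink(txt_records):
    """Scan a domain's TXT records for a dnslink=<path> entry.

    Handles both a dedicated record ("dnslink=/ipns/Qm...") and a record
    holding several space-separated key=value pairs.
    """
    for record in txt_records:
        for entry in record.split():
            if entry.startswith('dnslink='):
                return entry[len('dnslink='):]
    return None

records = [
    'v=spf1 include:_spf.google.com ~all',
    'foo=bar dnslink=/ipns/Qmf7v2nNR7HXrhrAFMdgdUqHCxd2uqdmoudMDFLyREuNew',
]
print(find_dnslink(records))
# -> /ipns/Qmf7v2nNR7HXrhrAFMdgdUqHCxd2uqdmoudMDFLyREuNew
```

Scanning every record rather than only the first sidesteps the Route53 multiple-TXT problem entirely when the provider does allow several records.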
Yep, exactly. This can be done by creating the object yourself. It can be done either by
nginx is the best proxy. Yeah, I think we can write a config to manipulate the path and add the incoming subdomain. I still think we'll want to make a special build of the gateway that understands what the
Yeah, as described above, we can definitely do this, even now. It just means not running
(lmk if the TXT record thing is resolved. I do think putting multiple
RE the DNS TXT in SPF record idea: 4.6: As I understand it, a name=val pair is called a So it looks like you can throw an unknown modifier on the end safely, due to SPF wanting to conform to future modifiers. You would need to make IPFS check for the

It feels a bit hackish. Not sure if this is something you want to standardize for the IPFS DNS stuff. It might make more sense to do a proper parsed structure and start with v=ipfs or something like that. And have a comedy _ipfs fallback, but that would be not so great because it could require two DNS lookups per request.

CCing @postmodern, as he's maintaining the Ruby SPF parser and might have some thoughts on the sanity of this approach. @postmodern, in a nutshell: we're thinking about putting an IPFS link in as an unknown modifier for SPF records, because some derpie DNS services like Route53 don't let you have multiple TXT records even though it's supposed to be okay. This prevents people from using IPNS references that come in via a pubkeyhash published to a DNS TXT.
I know nothing about IPFS (but I suspect I will have to learn about it soon). Most of these security protocols require that unknown directives be ignored for future compatibility and resilience. As long as the name/value pairs conform to the syntax, you should be able to get away with adding custom modifiers.
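SPF (RFC 7208) records are a sequence of space-separated terms, and verifiers must ignore modifiers they don't recognize, which is what makes the piggybacking idea above possible. A sketch of extracting a hypothetical dnslink modifier from an SPF TXT record (the dnslink modifier name is this thread's proposal, not part of the SPF spec):

```python
def spf_modifier(spf_record, name):
    """Return the value of a name=value modifier in an SPF record, or None.

    SPF terms are space-separated; unknown modifiers must be skipped by
    SPF verifiers, so an extra one doesn't break mail delivery.
    """
    if not spf_record.startswith('v=spf1'):
        return None
    for term in spf_record.split()[1:]:
        if term.startswith(name + '='):
            return term[len(name) + 1:]
    return None

record = ('v=spf1 include:_spf.google.com ~all '
          'dnslink=/ipns/QmZz4ewNiJGJ4QGRVdLbsptFjMnRociJyEFUA6kv54qmkA')
print(spf_modifier(record, 'dnslink'))
# -> /ipns/QmZz4ewNiJGJ4QGRVdLbsptFjMnRociJyEFUA6kv54qmkA
```

The same helper returns None for a record that doesn't carry the modifier, so the IPFS side could fall back to a dedicated TXT record in that case.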
I had a very productive discussion with Juan today. We're ready to go:

Phase 1: IPFS integration with Neocities sites

For the first implementation, I am going to We'll add the following table to the database: This will reference each IPFS hash that's been created, based on a timestamp, and whether it has been pinned (persisted) or not. We'll try to persist everything for now (why not?)

With phase 1, our linking to the IPFS hash from our site is the "verification" that it's correct. This gives us the ability to build an archive for sites as a consequence of the way it works. A link to an IPFS gateway will be provided via the site profile, with an archive page link, and a WTF link so people know what it is. It will be listed as a preview, as it may go away or (more likely) change in its implementation.

Phase 2

Phase 2 is destroy-the-central-cloud guerrilla warfare: it makes it possible to leave Neocities, but still publish to the same mutable source. We generate a Upon request, the site owner can request their private key, and use it to "take control" of the site from us and publish to it directly if they want to. This does signing to verify that the data came from the private key. The result is that even if the site shuts down, the user could still publish to the IPNS resource using the private key via a different tool.

This is going to be a while before the code's ready. There's also some security/workflow concerns here that need to be considered. And we'll need to be able to run multiple NodeIDs, and currently the client only runs one. So for now, we start with phase 1.
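As a sketch of the Phase 1 bookkeeping described above, here's roughly what such a table and recording step could look like (the table and column names are my assumptions for illustration; the issue doesn't specify the actual schema, and Neocities uses its own database rather than SQLite):

```python
import sqlite3
import time

# Hypothetical schema: one row per IPFS hash generated for a site.
conn = sqlite3.connect(':memory:')
conn.execute('''
    CREATE TABLE ipfs_hashes (
        id         INTEGER PRIMARY KEY,
        site_name  TEXT NOT NULL,
        ipfs_hash  TEXT NOT NULL,
        created_at INTEGER NOT NULL,
        pinned     INTEGER NOT NULL DEFAULT 0
    )
''')

def record_hash(conn, site_name, ipfs_hash, pinned=True):
    """Record the hash produced by an async `ipfs add -r sites/<site>` job."""
    conn.execute(
        'INSERT INTO ipfs_hashes (site_name, ipfs_hash, created_at, pinned) '
        'VALUES (?, ?, ?, ?)',
        (site_name, ipfs_hash, int(time.time()), int(pinned)),
    )

record_hash(conn, 'kyledrake', 'QmZz4ewNiJGJ4QGRVdLbsptFjMnRociJyEFUA6kv54qmkA')
print(conn.execute('SELECT site_name, pinned FROM ipfs_hashes').fetchall())
# -> [('kyledrake', 1)]
```

Keeping every row (rather than overwriting) is what gives the "archive for sites as a consequence of the way it works" property: each update is another timestamped row pointing at an immutable hash.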
@Fil The id within our local database is always going to be local, so making the ID a UUID doesn't give us much here. In the future, the ideal way to do this is to use IPNS records for each site, contained in the nameserver TXT record for the domain.
Given that we already have requests such as #202, how is the implementation of this feature going?
Maintaining IPFS hashes every time something changes is a hassle that has already been solved by IPNS. I see it has been mentioned only in regard to its use at the DNS level, but I personally use IPNS at the "hash level". For every site I have, I only store / care / remember its IPNS hash, which gets updated as I explain below.

The current drawback of this approach is that it requires an IPFS node running for every single site. This is a current IPNS limitation. In the case of every neocities.org website, I don't know how it would scale. Also, on a Raspberry Pi, it takes many seconds to update every 10MB website.

PS: one suggestion to overcome this drawback is to store every site in a single IPNS namespace: ipfs/kubo#1716 (comment)

This is a Python script I use for updating my IPNS websites. Content of ipfs-update.py:

#!/usr/bin/env python
import json, subprocess, sys, os

try:
    sites = json.load(open('ipfs-sites.json'))
except Exception as e:
    print(e)
    sys.exit(1)

for site in sites.values():
    try:
        # each site keeps its own repo, selected via IPFS_PATH
        os.environ['IPFS_PATH'] = site['ipfs_path']
        ipfs_add = subprocess.Popen(
            ['/usr/bin/env', 'ipfs', 'add', '-rq', site['web_root']],
            stdout=subprocess.PIPE
        )
        # with -q, the last line printed is the root hash
        ipfs_hash = ipfs_add.stdout.read().decode().strip().splitlines()[-1]
        ipfs_publish = subprocess.Popen(
            ['/usr/bin/env', 'ipfs', 'name', 'publish', ipfs_hash],
            stdout=subprocess.PIPE
        )
        print(ipfs_publish.stdout.read().decode())
    except Exception as e:
        print(e)
        sys.exit(1)

Example content of ipfs-sites.json. Note that each site has a folder where its files are stored, but it doesn't have to be on the same server. It can be remotely mounted with sshfs, or it could even reside in the IPFS swarm, with some adjustments to the script. The thing we can't run away from is that everyone must have an IPFS daemon running and configured to use a specific IPFS_PATH environment variable:

{
"mysite": {
"ipfs_path": "/home/ipfs/mysite",
"web_root": "/var/www/mysite"
},
"othersite": {
"ipfs_path": "/home/ipfs/othersite",
"web_root": "/var/www/othersite"
}
}

I even serve my clearnet websites using IPFS as a backend, thus saving disk space (in my setup, the fileserver is separate from the ipfs/web server). My nginx web server configuration:

server {
    server_name www.example.com;
    location / {
        proxy_pass http://localhost:8080/ipns/mylongipnshash/;
    }
}
There's a way to have multiple IPNS keys with one daemon:

Current issues with this (for Neocities):
Not too bad for a few keys, but we would have to do this for 140k sites if we wanted to give each current Neocities site a key. Supposedly the 12-hour update requirement will be gone soon, but I don't know much about this right now: https://discuss.ipfs.io/t/what-is-ipns-how-does-it-work-and-how-to-deal-with-mutable-content/457/17 @whyrusleeping probably knows the latest.
@rugk This is basically the situation right now: The archiving side is implemented (though it's temporarily offline due to an infrastructure upgrade, coming back soon). The next step is to do IPNS keys for each Neocities site. The hold up:
All of these issues need to change in some measurable way before we can make more progress. The most actionable thing for me right now is DNSLink support for our nameserver infrastructure, which I'm going to work on when we re-write our proxy architecture to use Redis replication and a pub/sub model.
@kyledrake What about parallel updating, and/or IPNS updating on demand for opt-in sites?
@kyledrake I am working on improving performance and default support for DNSLink in IPFS Companion (our browser extension): ipfs/ipfs-companion#558.
Would it be possible to add the X-Ipfs-Path header? For example:

$ curl -Is https://explore.ipld.io/static/media/ipfs-logo.4831bd1a.svg | grep X-Ipfs-Path
X-Ipfs-Path: /ipns/explore.ipld.io/static/media/ipfs-logo.4831bd1a.svg

Neocities' CDN seems to filter it out, and instead adds
The CDN does not serve from go-ipfs, which is why the header is not there. I've just rolled out a change to the CDN that will allow X-Ipfs-Path to display for domains with subdomains (ex: status.neocities.org), but it won't appear for neocities.org or for any custom domains we are hosting. Does that work for your needs?
It needs to be a valid ipns or ipfs path ( Btw: I made the UX simpler and decided to always do the async lookups in "best-effort" mode.
@kyledrake This is now enabled by default in the stable channel of ipfs-companion, and unfortunately Neocities websites are super slow / broken when loaded over IPFS. Perhaps it makes sense for you to temporarily remove
@lidel I'm migrating Neocities over to new infrastructure by the end of the month, then we're going to re-plumb our IPFS system, which will hopefully fix this.
It seems that dotfiles are not making their way to IPFS. Is this intentional? The main place I use a dotfile is for the /.well-known/ files.
@sixcorners
@whyrusleeping I'm kind of using Neocities to host the files in IPFS. They are the ones adding the files.
As of now we have IPFS Companion, a browser plugin that essentially makes IPFS behave like the regular web. Because many (or all?) Neocities sites have a TXT record with a dnslink and the latest site's CID (is it?), my browser is taking forever to resolve neocities.org sites. I even found my site's CID (by doing a

Edit: I changed one character in a file, and the CID in my dnslink changed from v0 to v1. The DNS record (for the domain) changed in around a minute. I have been able to pin the site. Which means sites that have not been updated recently (mine was last updated 6 months ago) probably won't show up on IPFS gateways / desktop / etc. So I'll leave this comment here for reference. I am happy to host my own site via IPFS, I just need to know if the dnslink TXT is updated, and how I can replicate the exact same hash (IPFS CID) locally.

Edit: The above statement remains true. What is the process for archiving, so I can replicate with my local repository (I have the whole site on my laptop, the files should be the same)?
Also, the blog is outdated; there's IPFS Desktop for Windows too.
Please pay attention to non-functioning IPFS serving issue: #352 |

kyledrake commented Apr 17, 2015
I spent a day playing with IPFS, and it blew my mind. We're starting implementation, right now.
It's an ongoing project (still in beta), but now's the time to start putting together a strategy for how we're going to implement it.
How Neocities Works Now
Here's how we currently serve files:
First, we don't use
S3, GCS, or any of these file storage services. The main reason is cost - bandwidth is far too expensive at these providers. Right now we can serve over 150TB for $100, but with these providers it would cost upwards of $15,000. We're bootstrapped, so we need to live within our means, and this system allows us to do that (it's much faster than S3 too, but I won't go there). So we're using dedicated servers with 7TB disk arrays.
A single canonical
primary fileserver stores all of the sites. A proxy server (that can be turned into multiple proxy servers easily) stands between the fileserver and the public internet as a safety and scalability measure. It also protects us against our ISP shutting down the file server in the event of complaints regarding stored content (this gives us breathing room to deal with the complaints; the proxy servers can get shut down, but the fileserver cannot). This primary filesystem replicates hourly to a
replica fileserver using rsync over SSH. We have built a system that uses inotify to push file updates to the replica as they come in (allowing us to use the replica as a load-balancer mirror), but this hasn't been implemented yet. A large XFS volume has all the sites in subdirectories of a directory called
sites. Example:

The proxy is running
nginx, and translates site subdomains to the location on the fileserver (kyledrake.neocities.org => sites/kyledrake), so the underlying mechanism is essentially a simple HTTP file server.
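The subdomain-to-directory translation described above is simple enough to sketch (a minimal illustration of the mapping, not Neocities' actual proxy code; the helper name and error handling are made up):

```python
def site_path(host, suffix='.neocities.org', root='sites'):
    """Translate a site subdomain to its directory on the fileserver.

    e.g. 'kyledrake.neocities.org' -> 'sites/kyledrake'
    """
    if not host.endswith(suffix):
        raise ValueError('not a neocities subdomain: %r' % host)
    subdomain = host[:-len(suffix)]
    return '%s/%s' % (root, subdomain)

print(site_path('kyledrake.neocities.org'))
# -> sites/kyledrake
```

In the real setup this mapping lives in the nginx configuration rather than application code, but the idea is the same: the Host header alone determines which subdirectory gets served.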
Once we pass the capabilities of a primary server + replica load balancing (read: when we get larger than 7TB), the plan was to implement a sharding technique. We looked into distributed filesystems, but they were too complex, too expensive, or didn't serve our purposes (GlusterFS for example becomes very slow when it can't find files that aren't there - it's not designed for the needs of web hosting).
IPFS (high level) Plans
Right now, we're going to implement IPFS in addition to our current system, using an asynchronous work system that copies changes over to IPFS, and then provides a hash of the added data.
My long-term goal is to replace our primary fileserver + replica system with a system completely based on IPFS. This will require some careful design, but I think it's worth exploring.
For now, I'm exploring integration strategies that have two copies. Then I'll start thinking about how we could dump the entire thing into ipfs.
IPFS Strategy 1: Add the entire
sites directory in

The first obvious strategy is to, maybe once a month, run
ipfs add -r sites, and then use a script to extract the last hash, and provide that as a resource for people. I don't like this approach very much, because it only works if no updates to the sites are made. I want to be able to give IPFS the latest version of the sites. Also it's slow (though that will probably improve later on).

IPFS Strategy 2: Site-level updates
Strategy 2 is
ipfs add -r sites/kyledrake every time a file on kyledrake.neocities.org gets updated. This will provide a hash that references the latest state of that particular site. We would store the hash from that in our database on the sites table, one for each site, and use that to reference how to access it with IPFS.

I like this better than strategy 1, and it's going to scale much better. It still has the "problem" (it's more of a feature, really) of the hash changing for each site update, of course. We could make a pretty sweet archive history system using it, or we could unpin the previous hash and garbage collect it (I don't see an unpin command in the command list, though).
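On the unpin question: current go-ipfs exposes it as ipfs pin rm, with ipfs repo gc to actually reclaim the space. A sketch of what a Strategy 2 per-site update job could run, written as a pure function that builds the command lines (the function and flow are my illustration, not Neocities' implementation):

```python
def update_commands(site, old_hash=None):
    """Build the ipfs commands for a Strategy 2 site-level update.

    Re-add the site's directory; optionally unpin the previous root hash
    so a later garbage collection can reclaim it.
    """
    commands = [['ipfs', 'add', '-r', 'sites/%s' % site]]
    if old_hash is not None:
        commands.append(['ipfs', 'pin', 'rm', old_hash])
        commands.append(['ipfs', 'repo', 'gc'])
    return commands

for argv in update_commands('kyledrake', 'QmOldRootHashGoesHere'):
    print(' '.join(argv))
```

Skipping the unpin step instead (keeping every old root hash pinned) is exactly what turns this into the archive-history variant mentioned above.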
IPFS Strategy 3: File-level updates
Strategy 3 is
ipfs add -r sites/kyledrake/index.html if I was updating that particular file. This creates a hash for each file, which we would then need to track for each individual file. Not sure this is necessary, or helpful. Strategy 2 seems like the most logical fit right now.

IPNS
IPNS in its current form is a way to create a mutable version that uses DNS to tell IPFS where to find the files.
Run
dig ipfs.git.sexy txt and you'll see this:

This is an IPFS hash that points to the site. When you go to
/ipns/ipfs.git.sexy, it automagically looks this up and uses it to find the IPFS hash needed.

I'm being told that this is changing, and that there's another one that uses a private key? I'm not exactly sure how this one works yet, but the idea is to provide an /ipns/kyledrake.neocities.org as a lookup resource.
The obvious thing this would require is a nameserver that we can programmatically update and that can handle a lot of TXT entries, which will be updated with new hashes via a messaging system like nsq or something like that.
Conclusions
The current idea is to implement Strategy 2, and then have a dynamically updating DNS.
I'm not sure I've got my head around this properly, so I'd love some feedback on this idea, or an alternative suggestion.
Paging @jbenet and @tjgillies!