Add an IPFS content provider #1096

Open
yuvipanda opened this issue Nov 18, 2021 · 5 comments

@yuvipanda
Collaborator

Proposed change

IPFS is a content-addressable global 'file system' that can share directories using an immutable, globally unique content ID. It can be used to store code as well as data. There are also experiments in the pydata / zarr ecosystem on using it for some datasets (pangeo-forge/roadmap#40).

I'd like us to add an IPFS content provider, perhaps using https://github.com/fsspec/ipfsspec as the provider backend. When given an IPFS content ID, we can just download that directory and let repo2docker do its normal thing. As content IDs are immutable, this fits pretty well with what we want to do.
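As a rough illustration of the first step ("when given an IPFS content ID"), here is a minimal sketch of how a provider might recognise IPFS input before fetching anything. It is loosely modelled on the detection step of a repo2docker content provider, but `parse_ipfs_spec` and the accepted input forms (`ipfs://<CID>`, `/ipfs/<CID>`, a bare CID) are hypothetical, and the regular expressions are a heuristic for CIDv0 and base32 CIDv1 strings, not a full CID parser.

```python
import re

# Base58btc alphabet (no 0, O, I or l); CIDv0 strings are written in it.
_BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
# Heuristic: CIDv0 = "Qm" followed by 44 base58 characters (46 in total).
_CIDV0_RE = re.compile(rf"Qm[{_BASE58}]{{44}}")
# Heuristic: CIDv1 in base32 = multibase prefix "b" + lowercase base32 text.
_CIDV1_B32_RE = re.compile(r"b[a-z2-7]{50,}")


def parse_ipfs_spec(source):
    """Return the CID if `source` looks like IPFS content, else None.

    Hypothetical accepted forms: "ipfs://<CID>", "/ipfs/<CID>" or a bare CID.
    """
    for prefix in ("ipfs://", "/ipfs/"):
        if source.startswith(prefix):
            source = source[len(prefix):]
            break
    cid = source.strip("/")
    # fullmatch so that trailing garbage (or a sub-path) is rejected.
    if _CIDV0_RE.fullmatch(cid) or _CIDV1_B32_RE.fullmatch(cid):
        return cid
    return None


print(parse_ipfs_spec("ipfs://QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG"))
print(parse_ipfs_spec("https://github.com/jupyterhub/repo2docker"))  # None
```

A real provider would probably validate CIDs with a dedicated CID parsing library rather than regular expressions; the point here is only that detection can be cheap and offline.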

Who would use this feature?

Ideally, this would eventually end up on mybinder.org and other BinderHubs. IPFS can be a distributed alternative for storing code and content, versus something centralized like GitHub.

How much effort will adding it take?

I'd say most of the work would happen in https://github.com/fsspec/ipfsspec, and might already be done. Otherwise, I suspect it'll be minimal effort.

Who can do this work?

Some IPFS enthusiast, maybe :) Some basic understanding of IPFS concepts may be necessary to fully implement this.

@yuvipanda
Collaborator Author

/cc @d70-t, @rabernat (and others? idk). This would be the first step of letting a binder pull in content from IPFS directly.

@yuvipanda changed the title from "Support an IPFS content provider" to "Add an IPFS content provider" on Nov 18, 2021
@d70-t

d70-t commented Nov 18, 2021

If a content provider is mainly about getting a folder, then maybe

ipfs get <CID> -o <target_folder>

would do the trick (here's the doc)
The catch would be that you'd need to have a running ipfs node on the machine executing the ipfs get. But that's probably not too hard a requirement 🤷.
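A minimal sketch of wrapping the `ipfs get <CID> -o <target_folder>` command from Python, assuming the `ipfs` binary is on `PATH` and a node is available as described above. The helper names are hypothetical; returning `False` when the binary is missing is a design choice that leaves room for a gateway fallback.

```python
import shutil
import subprocess


def ipfs_get_command(cid, target_folder):
    """Build the `ipfs get <CID> -o <target_folder>` invocation as an argv list."""
    return ["ipfs", "get", cid, "-o", str(target_folder)]


def fetch_with_local_node(cid, target_folder):
    """Download `cid` into `target_folder` using the local ipfs CLI.

    Returns False (instead of raising) when no `ipfs` binary is on PATH, so a
    caller can fall back to an HTTP gateway. If `ipfs get` itself exits with a
    non-zero status, subprocess.run raises CalledProcessError.
    """
    if shutil.which("ipfs") is None:
        return False
    subprocess.run(ipfs_get_command(cid, target_folder), check=True)
    return True
```

Passing an argv list (rather than a shell string) avoids quoting issues if the target folder contains spaces.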


ipfsspec could also be used to retrieve content via gateways on other machines; however, ipfsspec is more about getting the content into Python than about writing it back out to disk (though I assume that should be easy to put on top).


Another option might be the upcoming ?format=car option (and probably also tar etc.), which would let gateways stream out an entire graph behind a CID instead of just single files.

@yuvipanda
Collaborator Author

I guess this would require the ipfs binary to be present. I do think that most users of repo2docker will not have a daemon running locally, so falling back to gateways is quite important. ?format=car sounds like a good option - do you know if it is actually being implemented right now? I can't quite tell from that issue what the status of that is.

@d70-t

d70-t commented Nov 18, 2021

So I think it is being developed, but it's more of a refactoring step that requires quite a bit of coordination; it is scheduled for the release after next.
But the functionality seems to be there already, you can use:

  • http(s)://<gateway>/api/v0/dag/export?arg=<CID> for CAR export (available since v0.10, the current release)
  • http(s)://<gateway>/api/v0/get?arg=<CID> for TAR export
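The two endpoints can be assembled as in the sketch below. `export_url` is a hypothetical helper that only builds the URLs listed above; whether a particular gateway actually serves these `/api/v0/...` paths is an assumption to check per gateway.

```python
from urllib.parse import urlencode

# Endpoint paths as given in the thread.
_EXPORT_ENDPOINTS = {
    "car": "/api/v0/dag/export",  # CAR export
    "tar": "/api/v0/get",         # TAR export
}


def export_url(gateway, cid, fmt="tar"):
    """Build the URL that exports the whole graph behind `cid` as TAR or CAR."""
    if fmt not in _EXPORT_ENDPOINTS:
        raise ValueError(f"unsupported export format: {fmt!r}")
    query = urlencode({"arg": cid})
    return gateway.rstrip("/") + _EXPORT_ENDPOINTS[fmt] + "?" + query


print(export_url("https://ipfs.io", "QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG"))
# https://ipfs.io/api/v0/get?arg=QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG
```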

E.g. you could use

curl -v -L "https://ipfs.io/api/v0/get?arg=QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG" > quickstart.tar

to obtain the quickstart folder from the tutorial using the public gateway at ipfs.io.

Likewise you'd obtain the same on your local gateway using:

curl -v -L "http://127.0.0.1:8080/api/v0/get?arg=QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG" > quickstart.tar
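A minimal Python sketch of the same download, extended with the gateway fallback discussed earlier in the thread: it tries each gateway in order (for example a local one first, then ipfs.io) and unpacks the TAR export into a target folder. Assumptions: plain GET requests work, as with the curl commands; the whole archive fits in memory; the function names and the injectable `opener` parameter (used only to make the sketch testable) are hypothetical.

```python
import io
import os
import tarfile
import urllib.request


def extract_tar_bytes(data, target_folder):
    """Unpack a TAR export into `target_folder`, rejecting member names that escape it."""
    root = os.path.realpath(target_folder)
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
        for member in tar.getmembers():
            dest = os.path.realpath(os.path.join(root, member.name))
            if os.path.commonpath([root, dest]) != root:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(root)


def fetch_tar_via_gateways(cid, target_folder, gateways, opener=urllib.request.urlopen):
    """Try each gateway in order; return the first one that served the TAR export."""
    errors = []
    for gateway in gateways:
        url = f"{gateway.rstrip('/')}/api/v0/get?arg={cid}"
        try:
            with opener(url, timeout=60) as response:
                data = response.read()  # buffers the whole archive in memory
        except OSError as exc:  # URLError, HTTPError and timeouts are OSErrors
            errors.append(f"{gateway}: {exc}")
            continue
        extract_tar_bytes(data, target_folder)
        return gateway
    raise RuntimeError(f"no gateway could serve {cid}: " + "; ".join(errors))
```

For example, `fetch_tar_via_gateways(cid, "quickstart", ["http://127.0.0.1:8080", "https://ipfs.io"])` would prefer a local node and only then use the public gateway. The path check covers member names only; symlink targets inside the archive are not inspected in this sketch.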

@d70-t

d70-t commented Nov 25, 2021

I'm just thinking about what other benefits an IPFS content provider might have.
One interesting option would be to use the CID of the binder or .binder folder as a cache key for built Docker images. That way, it might be possible to rebuild images automatically less often.
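A sketch of the cache-key idea, assuming the CID of the binder/.binder folder is already known. `image_name_for_binder_cid` and `needs_build` are hypothetical helpers; the CID is hashed because Docker repository names must be lowercase while CIDv0 strings are mixed case.

```python
import hashlib


def image_name_for_binder_cid(binder_cid, prefix="r2d-ipfs"):
    """Map the CID of a binder/.binder folder to a Docker-safe image name."""
    # Docker repository names must be lowercase, but CIDv0 strings are mixed
    # case, so hash the CID and keep a short lowercase hex digest.
    digest = hashlib.sha256(binder_cid.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}-{digest}"


def needs_build(binder_cid, existing_images):
    """True if no image has been built yet for this exact binder folder CID."""
    return image_name_for_binder_cid(binder_cid) not in existing_images
```

One caveat: the same folder can yield different CIDs when it is added with different settings (for example CID version or chunker), so this only reuses images for identical CIDs.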
