Enable outgoing ports for IPFS peering (part 2) #2082

Open
sheriflouis-FF opened this issue Nov 24, 2021 · 11 comments

Comments

@sheriflouis-FF

This is a continuation of #2069.

I want to be able to use a remote IPFS pinning service like https://estuary.tech/.
estuary.tech's IPFS services listen on TCP ports 6744 and 6745, and UDP port 6746.

Estuary primary node:

```
ListenAddrs: []string{
	"/ip4/0.0.0.0/tcp/6744",
},
```

Estuary shuttle node:

```
ListenAddrs: []string{
	"/ip4/0.0.0.0/tcp/6745",
	"/ip4/0.0.0.0/udp/6746/quic",
},
```
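As a rough illustration of what the listen addresses above imply for an egress allow list, here is a minimal Python sketch that extracts the transport and port from each multiaddr string. The helper names (`parse_listen_addr`, `required_ports`) are hypothetical, and the parser only handles the simple `/ip4/<host>/<tcp|udp>/<port>[/quic]` shape shown in this issue, not multiaddrs in general.

```python
from typing import Dict, List, Tuple

# Listen addresses copied from the Estuary configuration quoted above.
ESTUARY_LISTEN_ADDRS = [
    "/ip4/0.0.0.0/tcp/6744",       # primary node
    "/ip4/0.0.0.0/tcp/6745",       # shuttle node
    "/ip4/0.0.0.0/udp/6746/quic",  # shuttle node (QUIC over UDP)
]


def parse_listen_addr(addr: str) -> Tuple[str, int]:
    """Return (transport, port) for a simple ip4 multiaddr string.

    Hypothetical helper: it only understands the shape used in this issue.
    """
    parts = addr.strip("/").split("/")
    # e.g. ["ip4", "0.0.0.0", "tcp", "6744"], optionally followed by "quic"
    if len(parts) < 4 or parts[0] != "ip4" or parts[2] not in ("tcp", "udp"):
        raise ValueError(f"unsupported multiaddr: {addr}")
    return parts[2], int(parts[3])


def required_ports(addrs: List[str]) -> Dict[str, List[int]]:
    """Group the ports found in addrs by transport protocol."""
    ports: Dict[str, List[int]] = {"tcp": [], "udp": []}
    for addr in addrs:
        transport, port = parse_listen_addr(addr)
        ports[transport].append(port)
    return ports


if __name__ == "__main__":
    print(required_ports(ESTUARY_LISTEN_ADDRS))
```

The result groups the three requested ports by protocol, which is the information a firewall or network policy rule would need.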

cc. @yuvipanda

@welcome

welcome bot commented Nov 24, 2021

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@betatim
Member

betatim commented Nov 26, 2021

Can you explain a bit more about why you need to access this in order to fetch files from IPFS?

I missed the conversation in #2069 so asking my question now: I think fetching data from IPFS is fine, however we want to prevent mybinder.org from becoming a source of traffic. I think the latter would only happen if it would become possible to run IPFS nodes on mybinder.org that end up "seeding" (BitTorrent terminology but I don't know the right IPFS jargon) files.

Even for sample data that you want to use (that is what #2069 sounds like) my question would be "why not upload a (sample dataset) to Zenodo if your normal hosts are too unreliable?" I can see the attraction of IPFS, it is a cool idea but on mybinder.org we try to only allow things that are really, really necessary because it becomes tricky to say no to things once you start allowing a lot of things.

Hence the question "Why can't you upload your sample data to a reliable hoster (of which Zenodo is one) to which we already give people access?" Having a good answer to that question helps justify why IPFS is allowed when other services aren't.

@minrk
Member

minrk commented Nov 26, 2021

Yeah, I don't understand too much about IPFS, but binder definitely shouldn't ever be used to serve anything. getting IPFS data is appropriate, and I think it probably makes sense to support what's required for that, but it should only ever be a 'leech' on a p2p network.

@bollwyvl

The pinning services like pinata, temporal, and eternum which I think this is about, would take over the peer-of-last-resort role, for a fee... but this is achieved by acting as a server long enough for the pinning service to download the whole DAG.

Anyhow: it seems like the threat of abuse rises substantially if a mybinder.org container can be used as an IPFS "seed" of a particular CID, even if the intent is for it to be used to bridge to a pinning service.

From a p2p perspective, there might be some very interesting things that can happen if we can get an in-browser extension working, as that would more equitably shift the burden to the swarm of browsers... and this might be sufficient to get a "pinnable" (if slow) solution. The use case might then be like:

  • we start a video call
    • i start a binder session
    • i do some stuff, and ipfs add it
      • i also add it to my browser's IPFS
        • my browser acts as the initial node, downloading the data (once) from the binder, and makes it available to the swarm
        • i can optionally use a pinning service
          • again, my browser is the initial node
      • i share the CID in chat
    • others (either on a binder, or on their home box) use that CID to do something, not knowing or caring how it's working

@sheriflouis-FF
Author

It sounds like this got a bit out of scope.
Estuary is a hosted pinning service (it also backs up content to Filecoin), as @bollwyvl suggested, but it is a free service.
This request is meant to let a mybinder instance pull and push data from/to Estuary the same way it would if the data were hosted on IPFS, as in #2069.

@bollwyvl

> bit out of scope.

I'd argue we're right on scope, trying to determine the impact to security/abuse with respect to a relatively new networking paradigm. While p2p-in-the-cloud-chain sounds very cool, at the end of the day, somebody pays for every binder pod build and launch.

> hosted pinning ... Filecoin... free service.

It does look like estuary, in specific, has an API to add arbitrary data, so perhaps that would work for your uploading needs. But of course, beware putting your creds into binder!

But even though we know nobody is going to be doing any FileCoin speculating on mybinder, "alpha" and "free-as-in-beer" and "IPFS" and "FileCoin" trigger alerts in my head.

> the same way it would if that was hosted in IPFS

Right: the pinning API merely triggers the beginning of an ipfs pin by the remote agent... but for that to succeed, the requesting node has to exactly be an IPFS server, and that's what nobody is interested in offering for free.
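To make that point concrete, here is a hedged Python sketch of what a pin request body might look like. It assumes the generic IPFS Pinning Service API shape (a `cid`, an optional `name`, and optional `origins` multiaddrs that tell the service where to fetch the data from). Whether Estuary accepts exactly this shape is an assumption, and the CID, peer address, and the `build_pin_request` helper are all hypothetical placeholders. Nothing is sent over the network.

```python
import json
from typing import Dict, List, Optional


def build_pin_request(cid: str, name: str,
                      origins: Optional[List[str]] = None) -> Dict[str, object]:
    """Build the JSON body for a remote pin request (hypothetical helper).

    `origins` lists multiaddrs of nodes that already hold the data. The
    pinning service has to download the DAG from one of them, which is why
    the requesting node ends up acting as a server.
    """
    body: Dict[str, object] = {"cid": cid, "name": name}
    if origins:
        body["origins"] = list(origins)
    return body


if __name__ == "__main__":
    request_body = build_pin_request(
        cid="bafyEXAMPLEcid",  # placeholder, not a real CID
        name="sample-dataset",
        # Placeholder address of the node holding the data (e.g. a binder pod).
        origins=["/ip4/203.0.113.5/tcp/4001/p2p/QmHypotheticalPeerId"],
    )
    print(json.dumps(request_body, indent=2))
```

The request itself carries no file content, only a CID and hints about where to find it, so the transfer only succeeds if some node listed in `origins` (or otherwise reachable on the network) serves the blocks.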

@manics
Member

manics commented Nov 29, 2021

@sheriflouis-FF For the benefit of people not familiar with IPFS I think it'd be helpful if you could explain:

  • What is Estuary/IPFS pinning/any other terms you've mentioned?
  • What is its relation to IPFS?
  • What significant benefits does this offer to users of mybinder.org?

As others have pointed out, a lot of people abuse the free compute offered by mybinder.org, so unfortunately minor benefits aren't enough: there are 1000s of ports people would like us to open for a wide range of applications. It therefore needs to be a significant increase in utility that outweighs the additional abuse that is possible.

@sheriflouis-FF
Author

sheriflouis-FF commented Nov 30, 2021

Hi @manics,
I apologize for not giving a lot of context, as this was meant to be a continuation of #2069.

What is Estuary?

Estuary is a hosted service by Protocol Labs that allows external users to store their data safely on two decentralized storage technologies: IPFS and Filecoin. Estuary itself is an IPFS node. When you upload a file to Estuary, it is stored on the Estuary nodes and published to the IPFS network. In the background, the data is backed up to Filecoin.

IPFS pinning:

Pinning is the method by which a user tells IPFS not to garbage collect data. When you write a file to IPFS you add it, and it is subject to garbage collection if it is not accessed for some period of time or if local storage space fills up. If you pin it, you tell IPFS to never GC that data.
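The relationship between adding, pinning, and garbage collection described above can be sketched with a toy in-memory model. This is only an illustration of the idea, not how an IPFS node is implemented. The `ToyBlockstore` class and the CIDs are hypothetical, and the model ignores access times and storage limits.

```python
from typing import Dict, List, Set


class ToyBlockstore:
    """Toy model of a node's local storage with pins and garbage collection."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}  # cid -> content
        self.pinned: Set[str] = set()       # cids protected from GC

    def add(self, cid: str, data: bytes) -> None:
        """Store content locally; in this model it is not pinned yet."""
        self.blocks[cid] = data

    def pin(self, cid: str) -> None:
        """Mark stored content as exempt from garbage collection."""
        if cid not in self.blocks:
            raise KeyError(f"unknown cid: {cid}")
        self.pinned.add(cid)

    def unpin(self, cid: str) -> None:
        """Allow the content to be collected again."""
        self.pinned.discard(cid)

    def gc(self) -> List[str]:
        """Remove every unpinned block and return the removed cids, sorted."""
        removed = sorted(cid for cid in self.blocks if cid not in self.pinned)
        for cid in removed:
            del self.blocks[cid]
        return removed


if __name__ == "__main__":
    store = ToyBlockstore()
    store.add("cid-a", b"dataset")
    store.add("cid-b", b"scratch file")
    store.pin("cid-a")
    print(store.gc())            # only the unpinned block is removed
    print(sorted(store.blocks))  # the pinned block survives
```

A pinning service applies the same idea on a remote node: it keeps the content pinned on its own storage so the data stays available even if the original node collects it.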

What significant benefits does this offer to users of mybinder.org?

Users who use IPFS to store their code and datasets can benefit from storing that data on Estuary and retrieving it on mybinder, and the other way round. The way to do this is to install IPFS on the mybinder instance. IPFS by default uses port 4001, while Estuary uses TCP ports 6744 and 6745 and UDP port 6746.

I hope this is sufficient information.

@yuvipanda
Contributor

In some ways, this feels like we want to allow outbound HTTP but some particular set of servers use a non-standard port :( Is this a common occurrence in the IPFS world?

@betatim
Member

betatim commented Dec 3, 2021

Thanks for taking the time to explain to the newbies @sheriflouis-FF and to @manics for realising that this is something we need.

I think I now better understand what estuary is and why people use it. What I still don't quite understand is what (significant new) functionality is enabled by allowing people to use it. My (naive) understanding is that someone uploads data to IPFS from somewhere that isn't mybinder.org, by default this data could get garbage collected, so people use a service like estuary to prevent that from happening. What I don't quite understand yet is why people who just want to fetch data from IPFS need access to a service like estuary. It sounds like interacting with estuary is all about marking data (and paying for it?) to be stored "forever". This means as someone who just wants to consume data I don't need to care about estuary, I just talk to IPFS.

My conclusion right now is "there must be something I'm missing"

@sheriflouis-FF
Author

@yuvipanda no, this is not standard; it is something that the Estuary team decided on.
@betatim thanks for sharing your insights. I understand the confusion.
Estuary is a free public service. It is also an IPFS node, but it can additionally make a copy on the Filecoin network, which is decentralized storage, in case content disappears from IPFS due to age or lack of access. That is why someone would want to write to Estuary, and also read from it, at least until we bridge IPFS and Filecoin so that any IPFS node can retrieve CIDs from Filecoin. Estuary does not store data forever on its nodes; it will unpin data once it has been backed up to Filecoin. A typical use case would be: I am a researcher who has a large dataset on my IPFS nodes, and I want to back up this dataset in case my nodes are filling up and the data is old, because I do not want to delete my data ever. This way I can pin those CIDs to Estuary, and Estuary will back them up to Filecoin.
