Questions after first learning about IPFS. #154

Closed
markg85 opened this Issue Aug 2, 2016 · 15 comments

@markg85

markg85 commented Aug 2, 2016

Hi,

I've known about IPFS for only about a day, and everything seems really neat. Specifically the ability to mount an IPFS site just like any local mount. That's a great feature, in my opinion!

Having seen some videos and read some material about it, I'm still left with a couple of open questions. I'm sure most of them have been asked already, but I apparently lack the search skills to find them :)

So here we go.

  1. If you visit an IPFS site, will you essentially become a peer for the information you requested?
    If IPFS follows P2P principles (which I think it does), then visiting an IPFS site makes you a peer. The next visitors could reach your computer and get the files required to view the site. That principle is fine by me, but I'm slightly puzzled about, for example, large files. Imagine I downloaded a Docker image via IPFS; would another user then download it from me? If that's the case, I could run into bandwidth issues, since it could use up my bandwidth completely.
  2. How would you make a site truly decentralized?
    If i look at ipfs.pics it's only partially decentralized. The storage of the pictures is fully P2P, but the MySQL database it needs is still very much centralized. How would one make a service like that, but fully decentralized? I'm guessing one would need a decentralized database for that as well, like https://github.com/rqlite/rqlite, but how would one install that in IPFS? But even if it somehow magically gets connected to IPFS, how would you make it secure? By secure i mean a database used by Site X should only be accessible by Site X. Or would this involve public/private key signing of the database which only Site X can unlock? If that is the case, then where would the private key be stored (which would be needed to unlock the database)? Lots of sub questions here :)
  3. The concept of IPNS, a bit fuzzy.
    Imagine this. A populair website or service, accessible by name (thus using IPNS), but as far as i understood it IPNS points to one hash. Basically a readable alias for a hash (or is that oversimplifying?). But what would happen if the hash it points to is for whatever reason not accessible? Would that make the site unavailable? Or does the name magically know (how?) which peers it has available to try and just send data back from another peer? For a user, the site would seem online.
  4. Is IPNS searchable?
    With "the regular web" we have Google (among dozens of other search engines) to find a site. Sites in IPFS are probably indexed just fine as well if there is some public link somewhere, but what about the other stuff that isn't linked? In "the regular web" there is the concept of the "deep web" which consists of sites that are online, but no links to exist thus search engines can't find them. With IPNS everything is essentially in the DHT so doesn't that make it possible to index every IPFS site? And does something like that exist? Or is there a (technical) reason why this isn't possible?
  5. Site "mirrors"?
    From what i understand, there currently is no concept of mirroring a site in IPFS, but i wonder if a "mirror" is the right term anyway. I've seen something about pointers to hashes where a new version of something would just change the pointer. So i guess the question is something like this. How would one mirror the site content and let the pointers know (or the DHT structure) that all/some data is available at some other locations (hash)? This would for instance be vital to big sites that mainly rely on having their data in multiple data centers as mirrors, as primary means of site hosting and load balancing. A secondary place would be the P2P structure where individual users host individual pieces of the site. But i don't think what i describe here is possible (yet?)? Or i'm wrong :)
  6. Getting notified about pointer updates.
    This is basically an extension of the question above. If you host something and it becomes outdated. For instance because a news article is posted or a new video or whatever your site is about. How would one - that wants to "mirror" your site - gets notified about a change? If IPFS has the concept of pointers, does it also have the concept of push notifications or publish/subscribe? I'd imagine it would just be a list of hashes (the ones subscribed) that need to be "notified" or "pinged" when the thing they point to isn't the latest version anymore. It's then op to the receiving side of that "ping" to act upon it.
  7. Real time collaborative editing in IPFS possible?
    The presentations about IPFS claim low latency and tons of benefits, i'm sure that's all true to some extend, but what about collaborative editing of documents? Is that possible? If so, how would it work? Is there a direct connection between all connected peers that want to edit something? Even then, how is the diff being made and how are conflicts resolved? On the other hand, this might be an application specific issue, not for IPFS to solve.

I'll keep it at those 7 questions for now :)

Cheers,
Mark

@whyrusleeping

Member

whyrusleeping commented Aug 2, 2016

Hey Mark, thanks for the great questions! I'll try to answer them with as much detail as you asked them with :)

  1. Fetching content via IPFS temporarily caches it on your node. Currently, "temporarily" means "until a garbage collection is run", and garbage collection is at this time a manual process (you need to run ipfs repo gc). That means that yes, your node may be asked by other peers on the network for content it has. Bandwidth limiting is on our TODO list.
  2. It's actually quite easy to make a good number of websites 'truly decentralized'. Anything that doesn't require a backend server is trivially decentralizable. The difficult part (as you mention) comes when you need a database or some sort of backend logic. These scenarios are still possible with IPFS, but most of that work is still somewhat 'in progress'. For some examples, check out orbit and orbit-db. Orbit currently depends on a 'centralized' pubsub server, but a replacement system using only IPFS is in the works.
  3. IPNS is a system that uses PKI to create and verify name records. It maps a consistent public key hash to a changeable IPFS hash. The owner (or owners) of the keypair are able to change the value pointed to by this entry. If the hash pointed to is unavailable, the page would be unavailable.
  4. IPFS should be pretty searchable. I've thought about the problem a few times, and I think the most interesting way to start indexing all of IPFS would be to put a number of nodes out on the network such that they cover a significant portion of the Kademlia keyspace. In doing this, the nodes you control should be able to see provider records for the majority of content on the network as it is created and requested. Once you know the hashes of all the content, you can go through and request and index objects as you like. This is all pretty easy to do in a centralized way, but think about doing it in a completely distributed manner :)
  5. All you have to do to 'mirror' a site in IPFS is to visit it. If you really want to mirror it permanently, do an ipfs pin add on it and your node will keep mirroring it even after a garbage collection is run.
  6. This is one of those harder problems we're working on. Take a look at some of the discussion here: ipfs/notes#148
  7. For collaborative-editing-type applications I'm going to refer you back to orbit. A 'Google Docs' clone should be doable with orbit-db pretty easily.
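Concretely, points 1 and 5 boil down to a few commands. A minimal CLI sketch, using the illustrative hash that appears later in this thread (any content hash works the same way):

```shell
# Fetch content: this caches it locally, so your node may serve it to peers.
ipfs get /ipfs/QmNwoE1vkQeEwY3dyDdK4uyaYpm2GYTUn68mqkf4kdvXcn

# Pin it to mirror it permanently; pinned content survives garbage collection.
ipfs pin add /ipfs/QmNwoE1vkQeEwY3dyDdK4uyaYpm2GYTUn68mqkf4kdvXcn

# Unpinned cached content is only removed when GC is run manually:
ipfs repo gc
```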

Thanks!

@markg85

markg85 commented Aug 2, 2016

Hi @whyrusleeping,

That all sounds clear to me, but I still have questions about point 3.

The way you describe it sounds like there is still a single point of failure, even if the site is backed by thousands of mirrors. The main entry point seems to be name/certificate -> single hash. If that hash is offline, the site is offline. Sure, the owner can make it point to another hash, but that seems like a manual task that could (right?) easily be automated.

I might also be misunderstanding something.

Could you elaborate on this please? It sounds very interesting!

@whyrusleeping

Member

whyrusleeping commented Aug 2, 2016

For a hash (site) to be 'offline' all the peers that have mirrored it will also have to go offline.

@Ghoughpteighbteau

Ghoughpteighbteau commented Aug 2, 2016

@markg85

Generally speaking, you would think someone who is capable of resolving an IPNS name would also be hosting the IPFS content. I guess that's not technically necessary.

Also! Generally speaking it's very easy to rehost things on IPFS if you have the file. If I add a kitten gif, and you add the same kitten gif, it resolves to the same address.

And IPNS is pretty easy to automate; all you have to do is run this command:

~ $ ipfs name publish /ipfs/QmNwoE1vkQeEwY3dyDdK4uyaYpm2GYTUn68mqkf4kdvXcn
Published to QmRzYFGy9M5CyjEwNdh62udgBZV6BGbNZec8gMB9mXFhX6: /ipfs/QmNwoE1vkQeEwY3dyDdK4uyaYpm2GYTUn68mqkf4kdvXcn
~ $ ipfs name resolve QmRzYFGy9M5CyjEwNdh62udgBZV6BGbNZec8gMB9mXFhX6
/ipfs/QmNwoE1vkQeEwY3dyDdK4uyaYpm2GYTUn68mqkf4kdvXcn

https://ipfs.io/ipns/QmRzYFGy9M5CyjEwNdh62udgBZV6BGbNZec8gMB9mXFhX6

Also, I can now kill my daemon (so that I no longer answer IPNS requests) and that link will still be resolved. My understanding is that ipfs.io will be able to resolve the IPNS address for me, even though I'm not online, for another 24 hours.

Just for science's sake, I did this:

~ $ ipfs name publish -t 336h /ipfs/QmNwoE1vkQeEwY3dyDdK4uyaYpm2GYTUn68mqkf4kdvXcn
Published to QmRzYFGy9M5CyjEwNdh62udgBZV6BGbNZec8gMB9mXFhX6: /ipfs/QmNwoE1vkQeEwY3dyDdK4uyaYpm2GYTUn68mqkf4kdvXcn

That should mean other people can resolve the IPNS address for another two weeks? I wonder how long the gateway hangs on to content...
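One way to check from any machine whether the name still resolves, assuming the public ipfs.io gateway is reachable:

```shell
# Resolve the IPNS name through the public gateway (no local daemon needed);
# the gateway returns the content the name currently points to.
curl -sL "https://ipfs.io/ipns/QmRzYFGy9M5CyjEwNdh62udgBZV6BGbNZec8gMB9mXFhX6"
```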

@whyrusleeping

Member

whyrusleeping commented Aug 2, 2016

@Ghoughpteighbteau That's all mostly correct.

That should mean other people can resolve the IPNS address for another 2 weeks?

Close! It actually means that the record itself is valid for two weeks. The network will only hold onto records for at most 36 hours before they are discarded, but other nodes can put the exact same signed record out to the DHT again for up to two weeks to keep the record resolvable.
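A crude way to keep your own record fresh is simply to republish it on a schedule from the node that holds the keypair. A sketch, using the illustrative hash from above:

```shell
# Republish the IPNS record periodically so the network always has a
# recent copy. Run on the node that owns the key; the interval is arbitrary.
while true; do
  ipfs name publish /ipfs/QmNwoE1vkQeEwY3dyDdK4uyaYpm2GYTUn68mqkf4kdvXcn
  sleep 12h
done
```

(Other nodes re-putting the already-signed record, as described above, happens inside the DHT machinery rather than through a dedicated command.)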

@Ghoughpteighbteau

Ghoughpteighbteau commented Aug 3, 2016

Hmmm. Under what circumstances will those puts occur?

@Ghoughpteighbteau

Ghoughpteighbteau commented Aug 4, 2016

Still resolving two days later 👍

though, it took ~ 1 minute to resolve 👎

@Kubuxu

Member

Kubuxu commented Aug 4, 2016

That's because only very few nodes will still have the record after two days, meaning you have to connect to many nodes in that bucket of the keyspace to find one that has it. DHT record republishing is hard to get right, and since we plan to replace IPNS with IPRS (the Record System), long-lived records haven't been a focus for us.

What I can say is that we plan to improve both get and put performance of IPNS.

@jbenet

Member

jbenet commented Aug 4, 2016

IPNS resolution will get way better with pubsub improvements coming.

@markg85

markg85 commented Aug 4, 2016

@jbenet is there any ETA for that?

@markg85

markg85 commented Aug 9, 2016

A few more questions about site principles that I don't see being possible. I hope they are, and that it's merely my lack of knowledge about IPFS that makes me think "it's impossible".

(i just continue with the numbering from the first post)

  1. Imagine a site that aggregates data (a feed parser would be a perfect example). You'd want to download and fetch all the feeds on "the server", combine all fetched items, sort them by date, and present the sorted data to the user. In a server-client setup like the normal web, this is easy. I don't see how it's possible in an IPFS setup (where there is no concept of a server, right?).
  2. This is a continuation of the question above. Imagine the case where you want to run a task periodically, for instance to fetch those feeds from the earlier example and pre-parse the results. How would such a thing (cronjobs, basically) be done with IPFS?
@Ghoughpteighbteau

Ghoughpteighbteau commented Aug 10, 2016

Well, IPFS as a protocol doesn't let you think in a centralized way. Bitcoin does; that's why Bitcoin made such a splash: it managed to make decentralization act like a centralized system. It pulled off distributed consensus.

If you're planning to describe a service that downloads a bunch of feeds from different websites and distributes those feeds to its clientele, then the question is: who downloads those feeds?

  • In a centralized system you have just one server that every client connects to; the server obviously does all the work.
  • In a distributed system you have multiple servers that clients connect to; the servers work together and distribute their results among each other.
  • In a decentralized system (like IPFS) the only people left to do the work are the clients. So the clients must make the requests to the feeds and share their work with the network.

The problem then becomes: how do you trust the clients to do the job? They're no longer your servers; they could lie and say: "Oh yah, 99% Invisible totally released a whole bunch of herbal-supplement pill ads. Totally. You should buy some."

The only way I can see to do this is to establish a network of trust. I download some feeds, you download some feeds, and we share our data with each other because we know we're not cheating. You add someone you trust, I add someone I trust, and our network grows. That kind of thing. A system like this is pretty easy to describe with IPFS, though the details can be a little complicated.

Regarding point 2: you could literally have cronjobs do it. It's pretty easy to interact with IPFS, whether through an actual desktop application or the browser. That said, it's always going to be at the user's whim; there's no getting around that. You can't force people to work for the network.
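For instance, a hypothetical crontab entry that re-adds a site directory every hour and points the IPNS name at the result. The path and schedule are purely illustrative:

```shell
# m h dom mon dow  command
# Re-add the directory, grab the root hash (last line of `ipfs add -r -q`),
# and republish the node's IPNS name to point at it.
0 * * * * HASH=$(ipfs add -r -q /home/user/mysite | tail -n1) && ipfs name publish "/ipfs/$HASH"
```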

@flyingzumwalt

Contributor

flyingzumwalt commented May 23, 2017

@ambermore

ambermore commented Sep 27, 2017

Is it possible to create authenticated resources in IPFS?

@Kamelia2000

Kamelia2000 commented Feb 24, 2018

Does anyone know of any Android or iOS IPFS implementations for real-time communication?
