Content Resolution And Gateway Performance #6383
Comments
Maybe the suggestion in https://discuss.ipfs.io/t/proposal-peer-hint-uri-scheme/4649 should be considered, i.e. giving users an option to specify the location of file if the file is not popular yet, so as to make DHT resolution much faster.
It can solve the issue of #6382, as the user @voxsoftware knows exactly where the file is located.
There is this pinning service called Pinata (https://pinata.cloud/documentation#AddHashToPinQueue) which allows me to specify up to 5 addresses where my content is already available. Maybe they have a partial solution to this problem already? I have not worked at the protocol level, so maybe I am completely wrong here. But I messaged the team in their Slack in case that helps.
That just punts the problem. IPFS is supposed to be a decentralized, content-addressed network. I agree we should consider adding, e.g., a header to gateway requests that says "please connect to this specific peer when trying to find this content" but that doesn't really fix the underlying issue. Basically, we won't consider this fixed until a user can
@anshbansal Thank you for the reply. Yeah, specifying multiaddresses where the file already exists when pinning a file with Pinata is very useful. But that feature only helps Pinata find your content faster, which is not general enough. It would be better if this feature were added to IPFS itself.
@jasonzhouu fixed that. My point is that the suggestion doesn't fix the case where an ordinary user adds a file to their IPFS node and then tries to view it on the gateway.
@Stebalien Yeah, it's true. If we could specify a multiaddress for the node that already has the content when requesting a file from an IPFS gateway, the gateway could fetch it faster, like what Pinata does.
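To make the idea concrete: no such provider-hint header exists in any gateway today, so the header name and the peer address below are purely hypothetical, a sketch of what such a request might look like:

```shell
# Hypothetical request: ask the gateway to also dial a specific peer
# when resolving the CID. Neither the header name nor its value format
# is part of any current IPFS gateway API; both are illustrative only.
curl -H "X-Ipfs-Provider-Hint: /ip4/203.0.113.7/tcp/4001/p2p/<peer-id>" \
  "https://ipfs.io/ipfs/QmNNCcCF4ZRyuHutumcP9GSAgPXbzjjx1m4uddLwNsoAFg"
```

The gateway would treat the hinted peer as one extra candidate to dial, falling back to normal DHT resolution if it is unreachable or does not have the content.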
I can confirm this issue. Testing with 3 nodes:
I'm unable to access my hash on ipfs.io or ipfs.infura.io hours later. The hash is QmNNCcCF4ZRyuHutumcP9GSAgPXbzjjx1m4uddLwNsoAFg (this is public domain content).
That would help make things faster. A slight variation could be to add an option to ipfs companion extension that allows us to specify the value of the header for the requests sent to the ipfs gateway. Or perhaps an API in the local IPFS daemon that the browser extension can use to see if the hash is present and add my address to the header automatically instead of the routing option currently present? This would also be helpful for the gateways if this is standardised as the searching for content would be easier in that case which might make operating gateways cheaper due to the reduced bandwidth costs. "we won't consider this fixed" sounds great. Ultimately having it at the protocol level is the perfect solution.
It does fix it to some extent. If the browser extension adds the header to automatically include my address and disables the routing to the local node, then it should make things faster, as the gateway knows where the hash is present. It makes it easier for first impressions to be better. If I can tell the gateway that my content is at this IP, then the gateway can possibly cache it. That would make it easier for anyone using that gateway to see my content much faster. Now, if there were a standard way for gateways to have links between each other (some trackers?), content discovery should be faster between all gateways. It is not a perfect solution, as you mentioned that "IPFS is supposed to be a decentralized, content-addressed network". What we are doing is allowing central authorities to serve content more easily. I guess it does not help the decentralized part.
I am thinking of some kind of "reputation/trust algorithm", relevant to #6097?
@Stebalien What's the status of libp2p/go-libp2p#188? Is there anything blocking us from going ahead with it now that pion has full-fledged WebRTC support?
@jacobheun Assigning this to myself as this mostly deals with NATs/connectivity.
I don't know, but @jacobheun probably does.
As I've stated in #5541, at ipfs-search.com we've been structurally seeing around 40-60% timeout rates for fresh hashes. Part of this is due to us not (yet) using streaming listing, but a big part does seem to relate to go-ipfs (still) not being able to find content (although, for a while after switching to 0.5.0, we suddenly had a much better rate - I'm still trying to figure out why that was, but we need time to gather reliable data).
Circling back on this: we'll be starting with QUIC hole punching work and then expanding from there. There are several things we need to do in libp2p to make this work well. Once we have other pieces in place, like direct connection upgrading and QUIC hole punching, we'll start looking at WebRTC to expand our ability to connect to nodes not using QUIC.
Content-finding query times and success rates on the network were drastically improved in 0.5 and later. Being able to retrieve that content is the next step we're working on solving, which should create another large boost to getting content.
Some additional debugging tools would be useful, I think. I have two nodes running 0.7.0, and both are directly connectable (i.e. they are either not behind a NAT, or they are behind a NAT with port forwarding). Running
@eminence Direct peering should provide a workaround for such issues https://github.com/ipfs/go-ipfs/blob/master/docs/config.md#peering |
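As a sketch of the Peering workaround from that document, assuming you know the other node's peer ID and address (the values below are placeholders):

```shell
# Tell go-ipfs to open and maintain a persistent connection to a known
# peer, bypassing DHT discovery for content that peer holds.
# Peer ID and multiaddress are placeholders; substitute your own.
ipfs config --json Peering.Peers '[
  {
    "ID": "12D3KooWExamplePeerID",
    "Addrs": ["/ip4/203.0.113.7/tcp/4001"]
  }
]'
# Restart the daemon for the peering change to take effect.
```

With peering in place, the node reconnects automatically if the connection drops, so content held by the peered node resolves without a DHT lookup.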
I experienced a significant increase in content resolution and gateway performance using the TCP reverse proxy provided by ngrok. After exposing my laptop's tcp://localhost:4001 through ngrok, the two gateways https://ipfs.io/ipfs and https://gateway.pinata.cloud/ipfs/ found my locally hosted file within minutes. The method is detailed at https://gist.github.com/SomajitDey/25f2f7f2aae8ef722f77a7e9ea40cc7c
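A minimal sketch of that approach, assuming ngrok hands out the hostname and port shown below (substitute whatever ngrok actually reports):

```shell
# Expose the node's libp2p TCP port through ngrok's TCP tunnel.
ngrok tcp 4001

# Suppose ngrok reports tcp://0.tcp.ngrok.io:12345. Announce that
# externally reachable address so other peers and gateways can dial it
# (hostname and port here are placeholders):
ipfs config --json Addresses.Announce \
  '["/dns4/0.tcp.ngrok.io/tcp/12345"]'

# Restart the daemon so the announced address is published to the DHT.
```

The key effect is that provider records now carry a dialable address, so a gateway that finds the record can actually reach the node behind the NAT.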
Finding content on the IPFS network can be slow at times. This meta-issue tracks the problem, the reasons behind it, and the work being done to fix it.
There are two main causes for not being able to find content on the IPFS network:
- Unreachable peers
- The DHT can be slow

Unreachable Peers
The first issue is being addressed with better NAT traversal:
- AutoRelay (Enable AutoRelay by default #6290): Automatically use a relay if the IPFS node is behind a NAT.

Slow Content Resolution on the DHT
This issue was addressed in go-ipfs 0.5.0.
The primary issue was that most of the nodes in the DHT were unreachable (behind NATs). That meant every DHT query spent a significant amount of time trying to contact unreachable peers. This situation was improved by preventing NATed nodes from joining the DHT (libp2p/go-libp2p-kad-dht#216, libp2p/go-libp2p-kad-dht#330). At the moment, a large portion of the DHT is still behind NATs because our solution relies on nodes upgrading to go-ipfs 0.5.0, but the situation will improve as more and more nodes upgrade.
The second issue was that the previous DHT implementation didn't correctly implement the Kademlia protocol (libp2p/go-libp2p-kad-dht#291) and instead continued querying the DHT past the point where it should have stopped. This made queries take even longer than they should have.
Both of these issues have been addressed in go-ipfs 0.5.0; please update.
Slow Content Publishing on the DHT
If you're adding data to go-ipfs and it takes a while to "show up", this might be because your node hasn't yet advertised the content. At the moment, advertising content is a slow sequential process where each block advertised can take many seconds.
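If one specific piece of content needs to show up quickly, it can be advertised manually rather than waiting for the background advertisement cycle to reach it; a sketch using standard go-ipfs commands:

```shell
# Add a file and capture its CID (-Q prints only the final hash).
CID=$(ipfs add -Q myfile.txt)

# Explicitly announce this CID to the DHT now, instead of waiting for
# the slow sequential background provide to get to it.
ipfs dht provide "$CID"
```

This only announces the root block; the bulk of the advertisement backlog is what the accelerated client work below targets.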
There's an experimental "accelerated DHT" feature that aims to address this issue. However, this feature is experimental for a reason and will likely not be stabilized in its current form (it performs a resource-intensive operation in the background to maintain a global view of the entire DHT).
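In go-ipfs releases that ship this experiment, it is toggled with a config flag; treat the exact key name below as an assumption and verify it against `docs/experimental-features.md` for your release:

```shell
# Enable the experimental accelerated DHT client (key name as of
# go-ipfs 0.9.x; check your version's experimental-features docs).
ipfs config --json Experimental.AcceleratedDHTClient true

# Restart the daemon; expect noticeably higher background CPU,
# memory, and bandwidth usage while it maps the DHT.
```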
If this feature is successful, we hope to ship a lighter-weight version in the future.
Unreliable DHT
Many of the nodes in the DHT are ephemeral so the DHT forgets information over time. While provider records are republished every 12 hours and published to multiple (20) peers, network churn (nodes joining/leaving) may still cause the network to forget these values.
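The 12-hour figure above corresponds to the default reprovide interval, which is configurable; a sketch for nodes that want records refreshed before churn erases them:

```shell
# Inspect the current reprovide interval (defaults to 12h).
ipfs config Reprovider.Interval

# Republish more aggressively, at the cost of extra DHT traffic.
ipfs config Reprovider.Interval 6h

# Strategy controls which blocks are announced: all, pinned, or roots.
ipfs config Reprovider.Strategy all
```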
A future release will fix this by implementing libp2p/go-libp2p-kad-dht#323.
The gateway is a shared resource
The gateway is a shared resource used by many parties. It's not designed to be a reliable service for building high-load web services. If you need such a gateway, we recommend that you run one yourself or pay an "IPFS pinning" service to host your content.
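Running your own gateway is just a go-ipfs node with its HTTP gateway port exposed; a minimal sketch:

```shell
# Serve gateway requests on all interfaces instead of localhost only
# (put a reverse proxy with TLS in front for production use).
ipfs config Addresses.Gateway /ip4/0.0.0.0/tcp/8080

# Start the daemon; content is then served at
# http://<your-host>:8080/ipfs/<cid>
ipfs daemon
```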