
ipfs get doesn't automatically restart the download of a CID #7301

Open
vrde opened this issue May 10, 2020 · 15 comments

@vrde

vrde commented May 10, 2020

Version information:

go-ipfs version: 0.5.1
Repo version: 9
System version: amd64/linux
Golang version: go1.13.10

Description:

ipfs get gets stuck if the peer serving the content drops off the network and reconnects.

In this example node_0 is my laptop (behind NAT), node_1 is my VPS with a public IP.

  • node_0: add a file and get the CID.
  • node_1: run ipfs get <CID>.
  • ...wait for the download to start...
  • node_0: internet goes down.
  • node_0: internet goes up again.
  • node_1 download is now stuck.

Note that node_0 is the only peer that can serve the CID.
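
For reference, the steps map roughly to the commands below. The file name and network-interface name are placeholders, and any other way of cutting node_0's connectivity should reproduce it just as well:

# node_0 (laptop behind NAT): add a file and note the CID it prints
ipfs add ./example-file

# node_1 (VPS): start the download
ipfs get <CID>

# node_0, mid-transfer: drop and restore connectivity
# ("wlan0" is a placeholder interface name)
sudo ip link set wlan0 down
sleep 30
sudo ip link set wlan0 up

# node_1: the download is now stuck and never resumes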

@vrde vrde added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels May 10, 2020
@rpodgorny

Maybe similar to, or a dupe of, #7211?

@Stebalien
Member

How long is the download stuck? Is it possible to dial node_0 from a new node? If you restart node_1, does it work? If you explicitly re-connect node_1 to node_0, does it work?

I'm wondering if node_1 has stale addresses (ports, in this case) for node_0 and keeps trying the wrong ones.

@vrde
Author

vrde commented May 22, 2020

How long is the download stuck?

I don't remember exactly how long, but it was a long time, maybe 10–20 minutes. I can give it another shot, of course.

Is it possible to dial node_0 from a new node?

Which command should I use for that?

If you restart node_1, does it work?

I remember I was able to get it to work again by stopping the command (Ctrl+C) and running it again.

If you explicitly re-connect node_1 to node_0 does it work?

Which command should I use for that?

@Stebalien
Member

Is it possible to dial node_0 from a new node?
Which command should I use for that?

ipfs swarm connect /p2p/$PEER_ID_OF_OTHER_NODE

I was trying to determine whether your node was dialable. However, if stopping and restarting the command fixes the problem, it probably is dialable and this isn't the problem.
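
For completeness: if dialing by peer ID alone fails because no addresses for the peer are known, ipfs swarm connect also accepts a full multiaddr (the IP and port below are placeholders):

ipfs swarm connect /ip4/203.0.113.5/tcp/4001/p2p/$PEER_ID_OF_OTHER_NODE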

I remember I was able to get it work again by stopping the command (ctrl+c) and running it again.

That points to an issue in bitswap itself. Could you run:

On node_1:

> ipfs bitswap wantlist
> ipfs swarm peers --streams | grep -A5 $node_0

And the following on node_0:

> ipfs bitswap wantlist -p $node_1
> ipfs swarm peers --streams | grep -A5 $node_1

That will tell me:

  1. What node_1 is asking for.
  2. What node_0 thinks node_1 wants.
  3. Whether the peers are connected and what streams (requests) they have open.
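
If it's easier to share, the same commands can be captured to files in one pass (the peer-ID variables and file names are just examples):

# on node_1, with $NODE_0 set to node_0's peer ID
ipfs bitswap wantlist | tee node1-wantlist.txt
ipfs swarm peers --streams | grep -A5 "$NODE_0" | tee node1-streams.txt

# on node_0, with $NODE_1 set to node_1's peer ID
ipfs bitswap wantlist -p "$NODE_1" | tee node0-wantlist-for-node1.txt
ipfs swarm peers --streams | grep -A5 "$NODE_1" | tee node0-streams.txt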

@rpodgorny

I'm not the original poster, but I'm also hit by this. My situation is a little different: node_1 is on a flaky Wi-Fi connection. But the outcome is the same...

node_0: QmYLhBZsWL2MQuNXFfn6Wr5snTTq2ys121nQ9FQoZnMLYa
node_1: Qmcc6ZBqk9bMs85ZRL1h1gcowW4mFXsjuYBoVuxaYw1HML

on node_1:

ipfs swarm peers --streams | grep -A5 QmYLhBZsWL2MQuNXFfn6Wr5snTTq2ys121nQ9FQoZnMLYa
/ip6/2a01:9420:9:301:e11e:2ca4:ea5c:5065/tcp/4001/p2p/QmYLhBZsWL2MQuNXFfn6Wr5snTTq2ys121nQ9FQoZnMLYa
  /ipfs/bitswap/1.2.0
  /ipfs/lan/kad/1.0.0
ipfs bitswap wantlist
QmYhK5uZSrrxrVcpdZyaNTwNBUeXszy4DgWTDENypghMmm
QmcWQMg3MqRx1j5QXrq4JfyZ53NGMGt9M79meH7xUMdrke
QmdMojSFbivBaKVeGhdxvjAzhdWDAiDFq6fNKMH2XFQgWA
Qmf97jbWtmn2ZRAU6aXovYQZiMdCP5A9YFm9SH1igZM94E

on node_0:

ipfs swarm peers --streams | grep -A5 Qmcc6ZBqk9bMs85ZRL1h1gcowW4mFXsjuYBoVuxaYw1HML
/ip6/2002:2e24:2741:250::2/tcp/4001/p2p/Qmcc6ZBqk9bMs85ZRL1h1gcowW4mFXsjuYBoVuxaYw1HML
  /ipfs/bitswap/1.2.0
ipfs bitswap wantlist -p Qmcc6ZBqk9bMs85ZRL1h1gcowW4mFXsjuYBoVuxaYw1HML
<NOTHING>

Sometimes I see this on node_0:

ipfs swarm peers --streams | grep -A5 Qmcc6ZBqk9bMs85ZRL1h1gcowW4mFXsjuYBoVuxaYw1HML
/ip6/2002:2e24:2741:250::2/tcp/4001/p2p/Qmcc6ZBqk9bMs85ZRL1h1gcowW4mFXsjuYBoVuxaYw1HML
  /ipfs/bitswap/1.2.0
  /ipfs/bitswap/1.2.0
  /ipfs/bitswap/1.2.0
  /ipfs/bitswap/1.2.0
  /ipfs/bitswap/1.2.0

...for me the situation is so bad that I have been unable to get anything from "bitswap wantlist -p" on node_0, even after restarting "ipfs get" and even restarting "ipfs daemon" on node_1 :-(

@rpodgorny

Also:

ipfs version --all
go-ipfs version: 0.5.1-8431e2e87
Repo version: 9
System version: amd64/linux
Golang version: go1.14.2

@rpodgorny

Is there a way to force a "wantlist resend" on node_1 (to node_0)?

@Stebalien
Member

is there a way to force a "wantlist resend" on node_1 (to node_0)?

We send a full wantlist every 30s. However, we don't send every wantlist to every peer, only to peers that we think have the content. But when the download stalls, we should broadcast to all connected peers, so something fishy is going on here.
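
A rough way to watch for this from the outside (just a diagnostic loop, not official tooling) is to poll node_0's view of node_1's wants at the same ~30s cadence and see whether the rebroadcast ever lands ($NODE_1_PEER_ID is a placeholder):

while true; do
  date
  ipfs bitswap wantlist -p "$NODE_1_PEER_ID"
  sleep 30
done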

/ipfs/bitswap/1.2.0
/ipfs/bitswap/1.2.0
/ipfs/bitswap/1.2.0
/ipfs/bitswap/1.2.0
/ipfs/bitswap/1.2.0

That can happen when sending/receiving blocks.


Could you try the new RC (ipfs-update install v0.6.0-rc1)? We had a potential runaway timeout that could maybe cause this issue. I think this is a new bug, but it would be good to confirm that.

@dirkmc thoughts?

@rpodgorny

Same behaviour with 0.6.0-rc1 (compiled from git): I can see some streams on both nodes, but the wantlist does not seem to be getting transferred. :-(

@rpodgorny

Also, this may be somehow connected to a "provide" feature/bug. Currently on node_1's wantlist: QmVQ9tTPJeG2uWgjky4UWt4uubqRJeUpKGCG3qkYdM5BUU

node_0:

ipfs cat QmVQ9tTPJeG2uWgjky4UWt4uubqRJeUpKGCG3qkYdM5BUU | md5sum
bbfe0cc17f6a01caf8434c858e0caa34  -

...so the block is there. But:

on both node_0 and node_1:

ipfs dht findprovs QmVQ9tTPJeG2uWgjky4UWt4uubqRJeUpKGCG3qkYdM5BUU
<NOTHING>

after almost two minutes.

So it's almost as if node_0 is not properly announcing its data; combined with "we don't send every wantlist to every peer, only to peers that we think have the content", this may lead to the download getting stuck.

@rpodgorny

rpodgorny commented May 27, 2020

After a forceful:

ipfs dht provide QmVQ9tTPJeG2uWgjky4UWt4uubqRJeUpKGCG3qkYdM5BUU

on node_0 I now correctly see:

ipfs dht findprovs QmVQ9tTPJeG2uWgjky4UWt4uubqRJeUpKGCG3qkYdM5BUU
QmYLhBZsWL2MQuNXFfn6Wr5snTTq2ys121nQ9FQoZnMLYa

on both node_0 and node_1. But even though there's still a connection between the nodes, QmVQ9tTPJeG2uWgjky4UWt4uubqRJeUpKGCG3qkYdM5BUU stays in node_1's wantlist and never appears in node_0's wantlist listing for node_1. :-(

...even after running the findprovs. I'd expect that manually finding the provider on node_1 (which returns node_0, and given that we have a connection to that node) would immediately resolve the situation.

@Stebalien
Member

so it's almost like node_0 is not properly announcing its data and combined with "we don't send every wantlist to every peer, only to peers that we think have the content" this may lead to this download-being-stuck situation.

Sorry, we don't send the wantlist to all peers until we time out. Then, we broadcast to all peers. That's why this is really strange.

@rpodgorny

Is there anything I can do to further investigate the issue?

@dirkmc
Contributor

dirkmc commented May 28, 2020

@rpodgorny & @vrde, thanks for taking the time to follow up on this issue; it really helps us a lot to receive these bug reports.

I've been trying to reproduce this locally with some unit tests inside the go-bitswap project, but I haven't been able to so far. It may help us understand where the problem is if you capture some debug logging while performing the ipfs get, on both the requesting node and the responding node.

To output debug-level logging in the daemon window for a particular subsystem, run the ipfs log level command from another window, e.g.:
ipfs log level bs:sess debug

The subsystems that may output useful information (a snippet that turns them all on at once follows this list):

  • bs:sess
    Bitswap sessions: The session keeps track of all the outgoing requests for a particular file. The debug output should tell us when the session broadcasts a request (meaning it can't find a peer that has the blocks it wants) and when the session receives blocks that it had asked for, e.g.:
    2020-05-28T13:00:53.812-0400 DEBUG bs:sess session/session.go:218 Bitswap <- block {"local": "13bZk1tR4", "from": "13a5CQiao", "cid": "QmRFfdpSr7hvPYUNzPcKgvtQok95XGGnUKopTBy7LPa5FN", "session": 1}
  • bs:sprmgr
    Bitswap session peer manager: Tells us when a peer is added to / removed from the session
  • engine
    The Bitswap "engine": The component of Bitswap that receives incoming requests for blocks. It should tell us when someone asks us for blocks, and whether we have those blocks.
  • bitswap
    General bitswap log output: This is quite verbose, so if you turn it on you may need to grep through it for the things you are looking for. The most useful lines here are probably the ones from messagequeue.go. They tell us the want messages that get sent from the requesting node, e.g.:
    2020-05-28T13:00:53.811-0400 DEBUG bitswap messagequeue/messagequeue.go:638 sent message {"type": "WANT_BLOCK", "cid": "QmNTPrxjHgastHFGJa3eJ8q6dqVaM4tJ3bFuVGhCz1E114", "local": "13bZk1tR4", "to": "13a5CQiao"}
  • bitswap_network
    Bitswap networking operations: Very verbose
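
To turn all of the listed subsystems on at once, a small shell loop over the names works (same ipfs log level command as above):

for sub in bs:sess bs:sprmgr engine bitswap bitswap_network; do
  ipfs log level "$sub" debug
done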

@RubenKelevra
Contributor

@rpodgorny wrote:

Maybe similar to, or a dupe of, #7211?

I don't think so. #7211 is more about the loss of DHT functionality after a disconnect, where it isn't properly restored/detected.

I'm currently checking it against master, and it seems to be gone. :)

That's why I haven't updated that ticket in some time.

@Stebalien Stebalien added effort/days Estimated to take multiple days, but less than a week exp/expert Having worked on the specific codebase is important need/analysis Needs further analysis before proceeding P1 High: Likely tackled by core team if no one steps up topic/bitswap Topic bitswap and removed need/triage Needs initial labeling and prioritization labels Mar 31, 2021