Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproducable connection closure #3855

Closed
timthelion opened this issue Apr 5, 2017 · 36 comments
Closed

Reproducable connection closure #3855

timthelion opened this issue Apr 5, 2017 · 36 comments
Labels
kind/bug A bug in existing code (including security flaws) not our bug status/in-progress In progress

Comments

@timthelion
Copy link
Contributor

Version information:

timothy@yoga ~/p/p/tg> ipfs version --all
go-ipfs version: 0.4.8-
Repo version: 5
System version: amd64/linux
Golang version: go1.8

Type:

High - The main functionality of the application does not work, API breakage, repo format breakage, etc.

Description:

When I try to do

ipfs add -r -H repeatr-bug/

I eventually get

added Qmca9HVWG7m7dAsjni1E3ECbprYdC7dfmen8BAQetmQ1Eo repeatr-bug/var/lib/repeatr/io/dircacher/committed/vbl0TwPjBrjoph65IaWxOy-Yl0MZXtXEDKcxodzY0_-inUDq7rPVTEDvqugYpJAH/go/src/syscall/time_nacl_amd64p32.s
added QmegffNu6ixCBfkDPrkLEfKNYEG45kGRDz79KMaFnienzL repeatr-bug/var/lib/repeatr/io/dircacher/committed/vbl0TwPjBrjoph65IaWxOy-Yl0MZXtXEDKcxodzY0_-inUDq7rPVTEDvqugYpJAH/go/src/syscall/time_nacl_arm.s
15:39:15.187 ERROR commands/h: read tcp 127.0.0.1:46779->127.0.0.1:5001: use of closed network connection client.go:247

After adding a ton of files.

Server side it looks like

timothy@yoga ~> ipfs daemon
Initializing daemon...
Swarm listening on /ip4/127.0.0.1/tcp/4001
Swarm listening on /ip4/172.17.0.1/tcp/4001
Swarm listening on /ip4/192.168.2.14/tcp/4001
Swarm listening on /ip4/91.219.246.128/tcp/4001
Swarm listening on /ip6/::1/tcp/4001
API server listening on /ip4/127.0.0.1/tcp/5001
Gateway (readonly) server listening on /ip4/127.0.0.1/tcp/8080
Daemon is ready
15:39:15.188 ERROR commands/h: err: write tcp4 127.0.0.1:5001->127.0.0.1:46779: write: connection reset by peer handler.go:288

I origionally encountered this on 0.4.5 and then upgraded to see if the bug was fixed yet. But I didn't delete any of my ~/.ipfs files between them, so I might have some cruft there which is causing the hang up.

This is %100 repeatable for this file tree, but always happens on a different file. Here's the tail of the result of running the command again:

added QmbD7xY7S4oaTgVKSrMo45TZ5Ru2fZpbu5B7QJE9TjGoK5 repeatr-bug/var/lib/repeatr/io/dircacher/committed/vbl0TwPjBrjoph65IaWxOy-Yl0MZXtXEDKcxodzY0_-inUDq7rPVTEDvqugYpJAH/go/src/syscall/zerrors_darwin_arm.go
added QmYwnyydaLwru5J4Zwc7LjDr5ErntDx3PKLzGJ4VGj6hMt repeatr-bug/var/lib/repeatr/io/dircacher/committed/vbl0TwPjBrjoph65IaWxOy-Yl0MZXtXEDKcxodzY0_-inUDq7rPVTEDvqugYpJAH/go/src/syscall/zerrors_darwin_arm64.go
 535.93 MB / ? [---------------------------------------------=-------------------------------] 15:46:31.110 ERROR commands/h: read tcp 127.0.0.1:47352->127.0.0.1:5001: use of closed network connection client.go:247
Error: read tcp 127.0.0.1:47352->127.0.0.1:5001: use of closed network connection

And again.

added QmV97mMytRva5e83UrdcaRPSkgN7ML2zWjXjnctHn6qvLD repeatr-bug/var/lib/repeatr/io/dircacher/committed/vbl0TwPjBrjoph65IaWxOy-Yl0MZXtXEDKcxodzY0_-inUDq7rPVTEDvqugYpJAH/go/src/sort/example_test.go
added QmbZvCd7yvcMbhuGKX1Qo9CUw6NRSsMHwjU92Rf8e7vsyi repeatr-bug/var/lib/repeatr/io/dircacher/committed/vbl0TwPjBrjoph65IaWxOy-Yl0MZXtXEDKcxodzY0_-inUDq7rPVTEDvqugYpJAH/go/src/sort/example_wrapper_test.go
 534.63 MB / ? [------------------------------------------------=----------------------------] 15:47:27.005 ERROR commands/h: read tcp 127.0.0.1:47413->127.0.0.1:5001: use of closed network connection client.go:247
added QmSiPJu95yq1Yx4GANCkdhfCRjhHQHq7N1ayCpoZTdRxp5 repeatr-bug/var/lib/repeatr/io/dircacher/committed/vbl0TwPjBrjoph65IaWxOy-Yl0MZXtXEDKcxodzY0_-inUDq7rPVTEDvqugYpJAH/go/src/sort/export_test.go
Error: read tcp 127.0.0.1:47413->127.0.0.1:5001: use of closed network connection

So it doesn't seem to even be getting farther each time. Just totally randomly gives up.

@timthelion
Copy link
Contributor Author

You know, I'd send you the file tree, if ipfs would eat it :P

@whyrusleeping
Copy link
Member

@timthelion

You know, I'd send you the file tree, if ipfs would eat it :P

Heh, mind zipping it then sending it? Can you add things without the daemon running?

@timthelion
Copy link
Contributor Author

I feel like we're going to have a bug inception here, because I was trying to send someone a bug report via IPFS and IPFS crashed and now I'm sending you the file tree that crashed a program that crashed a program.

I have a feeling, that the output of tar is going to be helpfull in debugging this:

timothy@yoga ~/p/a/projects> tar ca repeatr-bug/ > reapeatr-bug.tar
tar: repeatr-bug/var/lib/repeatr/io/dircacher/stg/tar/662214126/var/log/apt/term.log: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/dircacher/stg/tar/662214126/var/log/btmp: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/dircacher/stg/tar/662214126/var/log/dmesg: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/dircacher/stg/tar/662214126/var/log/fsck/checkfs: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/dircacher/stg/tar/662214126/var/log/fsck/checkroot: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/dircacher/stg/tar/325534798: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/dircacher/stg/tar/006848114: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/55f715e28c46073d0e217e2ce8eb46b0b45e3db6: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/7ad95dd0798a40da1ccdff6dff35fd177b5edf40: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/ee210fc98cc7756aa0cf55d8d554148828e8e658: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/f1ac7f4f24f50328e6bc838ca4437d1612a0243c: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/14eb9c4951195779ecfbec34431a976de7335b0a: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/09cded8978dc9e80714c4d85b0322337b0a1e5e0: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/40e4aedc8fabf8c23e040057540867186712faa5: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/da8a52bbf73e77f567cfed2159818e7abbc7538d: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/bb849080854e7e4002087afba559ac105ee1fe96: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/ee766f965e621738006a8e00112dd30234da4cc9: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/ec7fdbb58eb3e300c8595ad5ac74a5aa50019cc7: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/5de9043ec1a39cc97edc838ef7236538a55b30a4: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/995f5b2e021c69b8b028ba6d0b05c1dd500783db: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/1e4dda7d6af7dc921d42612f9aa7e7d6fd80745c: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/f313887934ef7927b7373c604f8b095f1bc1966b: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/320cb470e3e2998b215a4b1744ce5afb7de3ba5d: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/3f41379a4bdd0906316978f4f99166fdbe960be4: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/a98ad7ee00ec53921f08832bc06ecf7fd600e6a1: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/ac4f572a19b62231a8285fa9e614c238986f3544: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/d27573fdeb3338b39a0fdeedb6c60cdf038dd687: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/47ff8dfbc528fea3003fc0ce2d88ffbbfbc46a43: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/6c89489cafabcbc76df9dbf84ebf07204673fecf: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/2baa8a1b9338cf13d9eeb27696d761155fa480be: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/dc6d2353af16e2a2b0ff6986af051d473a4ed468: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/6fdcab499d2c45e2939521c4047855f12a745a1d: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/6b12603ea6fd7f84b8c90118face09df39ea10c6: Cannot open: Permission denied
tar: repeatr-bug/var/lib/repeatr/io/git/nosub/18c6b7129324841d920ee8dc1ab66ec07054998f: Cannot open: Permission denied
tar: Exiting with failure status due to previous errors

I'm 20% sure that ipfs is choking on those same permissions errors, but then, why does it close the connection on a different file each time? Perhaps the server is trying to open those files in a separate thread, or some sort of sync issue between the client and server is causing this opaque client side behavior?

@timthelion
Copy link
Contributor Author

OK, my certainty level just raised to %97. Running the command without the daemon running gives me:

added QmQ5mhwZ6F3mdYcjTRJcTMGFq4UmktLecQphMbXwA7VZ2M repeatr-bug/var/lib/repeatr/io/dircacher/committed/vbl0TwPjBrjoph65IaWxOy-Yl0MZXtXEDKcxodzY0_-inUDq7rPVTEDvqugYpJAH/go/test/writebarrier.go
added QmZ2bfFv4e7RppT21oWzXCReSyiea6BFApzNMPvBJiakVQ repeatr-bug/var/lib/repeatr/io/dircacher/committed/vbl0TwPjBrjoph65IaWxOy-Yl0MZXtXEDKcxodzY0_-inUDq7rPVTEDvqugYpJAH/go/test/zerodivide.go
Error: open repeatr-bug/var/lib/repeatr/io/dircacher/stg/tar/006848114: permission denied

@whyrusleeping
Copy link
Member

@timthelion iiiinteresting. So the daemon output is just hiding the error messages for some reason. Try running with --progress=false

@timthelion
Copy link
Contributor Author

timthelion commented Apr 5, 2017 via email

@Kubuxu
Copy link
Member

Kubuxu commented Apr 17, 2017

Can you take a look at daemon log. The command code might be crashing or something.

@Kubuxu Kubuxu added kind/bug A bug in existing code (including security flaws) need/analysis Needs further analysis before proceeding labels Apr 17, 2017
@timthelion
Copy link
Contributor Author

The following commands will reproduce the bug:

$ ipfs get QmNYvYzYSfCN4SVyhRCePq4kvyG2AzigBZeqbJrsfcya3H
$ chmod -r QmNYvYzYSfCN4SVyhRCePq4kvyG2AzigBZeqbJrsfcya3H/pu/projects/subuser/Makefile
$ ipfs add -rwH QmNYvYzYSfCN4SVyhRCePq4kvyG2AzigBZeqbJrsfcya3H/

Note: this works with any suitably large directory.

@whyrusleeping
Copy link
Member

@timthelion thanks!
The issue here is that the process is erroring, and the error is likely getting overwritten or ignored due to how quickly messages are coming in

@timthelion
Copy link
Contributor Author

timthelion commented Apr 18, 2017 via email

@whyrusleeping
Copy link
Member

Ah, good data points there. I think that makes the issue pretty easy to track down now.

@whyrusleeping whyrusleeping added help wanted Seeking public contribution on this issue and removed need/analysis Needs further analysis before proceeding labels Apr 18, 2017
@whyrusleeping whyrusleeping added this to the Ipfs 0.4.9 milestone Apr 18, 2017
@kevina kevina self-assigned this Apr 25, 2017
@kevina kevina added status/in-progress In progress and removed help wanted Seeking public contribution on this issue labels Apr 26, 2017
@kevina
Copy link
Contributor

kevina commented Apr 26, 2017

Okay I am on it. I will see if I can track down what is going on and fix it in the next day or so.

@kevina
Copy link
Contributor

kevina commented Apr 28, 2017

So the problem is on the client side in MultiFileReader.Read. As far as I can tell it is doing the correct thing and returning an error. Here is a stack dump by using debug.PrintStack() right before it returns an error

err<1.1>: open large-dir/z476raw: permission denied
goroutine 40 [running]:
runtime/debug.Stack(0x41d639, 0x137dee8, 0xc400000000)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x79
runtime/debug.PrintStack()
        /usr/local/go/src/runtime/debug/stack.go:16 +0x22
github.com/ipfs/go-ipfs/commands/http.(*MultiFileReader).Read(0xc420230270, 0xc420254000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
        /home/kevina/gocode2/src/github.com/ipfs/go-ipfs/commands/http/multifilereader.go:72 +0x8ae
io/ioutil.(*nopCloser).Read(0xc420232130, 0xc420254000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
        <autogenerated>:4 +0x6b
io.(*multiReader).Read(0xc42022cda0, 0xc420254000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
        /usr/local/go/src/io/multi.go:20 +0xc4
io.copyBuffer(0x7f0fe1ee9240, 0xc420232300, 0x1271ee0, 0xc42022cda0, 0xc420254000, 0x8000, 0x8000, 0xc238e0, 0x0, 0x7f0fe1ee9240)
        /usr/local/go/src/io/io.go:390 +0x147
io.Copy(0x7f0fe1ee9240, 0xc420232300, 0x1271ee0, 0xc42022cda0, 0xc420232300, 0xc4202321f0, 0x6e2232)
        /usr/local/go/src/io/io.go:360 +0x68
net/http.(*transferWriter).WriteBody(0xc42017b180, 0x126ffe0, 0xc420224880, 0x2, 0x2)
        /usr/local/go/src/net/http/transfer.go:220 +0x158
net/http.(*Request).write(0xc42001a870, 0x126ffe0, 0xc420224880, 0x0, 0xc4201dfb90, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/http/request.go:565 +0x778
net/http.(*persistConn).writeLoop(0xc420146c00)
        /usr/local/go/src/net/http/transport.go:1649 +0x1ac
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:1063 +0x50e

Notice how there is only one line in the trace that is not standard go code, so these could be a problem with the go libraries themself.

I need to get to bed but will look more into this tomorrow.

@whyrusleeping
Copy link
Member

I don't think its a problem with go, When that error gets returned, who does it get returned to? The stdlib? It should pass the error on to the http client, which i think is in commands/http/client.go

@kevina
Copy link
Contributor

kevina commented Apr 28, 2017

I pushed some code with some debug println in the branch kevina/3855-debug.

@whyrusleeping
Copy link
Member

golang/go@9d29be4 which landed in go1.9 looks like it was trying to fix this issue, but we're still experiencing it rebuilding everything using go1.9

@whyrusleeping
Copy link
Member

Hey @josharian, You did the CR on that commit. Any idea what might be going on here?

@kevinburke
Copy link

Hey there - Josh is on personal leave, and he asked me to help debug this. Can you help me figure out how to reproduce the issue 100% of the time? Can you trigger it in your test suite, for example?

@kevinburke
Copy link

In the meantime, I'll try to get the project installed/ test suite for this project up and running locally.

@kevinburke
Copy link

Ah, sorry, here looks good: #3855 (comment)

I tried to build ipfs using Go tip, but the Makefile told me I should be using Go 1.8 - safe to ignore that warning?

@whyrusleeping
Copy link
Member

@kevinburke Hey! Yeah, safe to ignore that warning.

@whyrusleeping
Copy link
Member

@kevinburke Let me work on a minimal reproduction of the issue, should just take a few minutes at this point.

@whyrusleeping
Copy link
Member

@kevinburke here: https://ipfs.io/ipfs/QmcPGyf9pzRZHF57AmTHwM9yMxrDeaptXrCvnjv273bHw5

Run the main.go as its own server, then run the client.go with an argument of 100 or so. You should see "my cool error". But if you run it with an argument of 1000000, you'll get a connection closed error instead.

@whyrusleeping
Copy link
Member

Hey @timthelion, good job, looks like you found a go bug :)

@Kubuxu
Copy link
Member

Kubuxu commented Nov 1, 2017

Update: The golang/go#21760 is tagged as release-blocker for Go 1.10.

@magik6k magik6k modified the milestones: Ipfs 0.4.11, Ipfs 0.4.12 Nov 2, 2017
@Kubuxu Kubuxu modified the milestones: Ipfs 0.4.12, go-ipfs 0.4.13 Nov 6, 2017
@iain17
Copy link

iain17 commented Dec 31, 2017

I'm having the same issue here. Super annoying. Sometimes it will actually connect, but 9.9/10 times it just won't connect and returns: read: connection reset by peer

I tried debugging it myself, or coming up with some kind of patch. Am I correct in saying the problem is somewhere in https://github.com/libp2p/go-libp2p-conn?

@paralin
Copy link

paralin commented Jan 4, 2018

I was seeing this when working on the libp2p codebase. Haven't seen it recently, but I think there might be a bug somewhere.

@whyrusleeping
Copy link
Member

@iain17 @paralin If youre getting connection issues from libp2p dials, then its a different issue. This one in particular is about the http bug mentioned above. If youre seeing something else, then please file a new issue so we can start gathering information about it.

@paralin
Copy link

paralin commented Jan 4, 2018

@whyrusleeping ah whoops, I didn't realize this wasn't p2p layer.

@Stebalien
Copy link
Member

I'm having the same issue here. Super annoying. Sometimes it will actually connect, but 9.9/10 times it just won't connect and returns:

It could be anywhere (unfortunately). Your best bet is to set the env variable IPFS_LOGLEVEL=debug and look at the debug logs.

@thinkhy
Copy link

thinkhy commented Apr 9, 2018

Any update for this bug, still hit it. IPFS version is 0.4.14.

@Stebalien
Copy link
Member

No, the underlying bug is in go and hasn't been fixed (golang/go#11745).

Note: This is actually just a bug in error reporting. That is, go reports one error when it should be reporting another. In this case, you likely don't have read permissions on one of the files you're trying to add to ipfs.

@Stebalien
Copy link
Member

Closing in favor of #6552 as I'm always confused when I run into this issue and forget to scroll to the bottom :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws) not our bug status/in-progress In progress
Projects
None yet
Development

No branches or pull requests

10 participants