Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panic: bug: ClusterConfig called on closed or nonexistent connection #4170

Closed
eyeson opened this issue May 23, 2017 · 21 comments

Comments

Projects
None yet
10 participants
@eyeson
Copy link

commented May 23, 2017

I have had two different Syncthing instances crash on Windows 10. One is running v0.14.28, one v.0.14.29-rc1.
Both are using SyncTrayzor (V1.1.16).

I restarted one, only to have it crash ~5 mins later. I captured the Synctrazor log from that one and have attached it.

I will endeavour to provide any other information required.

Forum post for reference: https://forum.syncthing.net/t/syncthing-crashing/9845

SyncTrayzor.txt

@calmh

This comment has been minimized.

Copy link
Member

commented May 23, 2017

For clarity:

[5NSF2] 09:17:06 INFO: Failed to exchange Hello messages with K2ECF46 (95.97.92.221:22067): read tcp 192.168.1.50:61924->95.97.92.221:22067: i/o timeout 
entWatcher: Received 0 events 
panic: bug: ClusterConfig called on closed or nonexistent connection 
goroutine 4242 [running]: 
github.com/syncthing/syncthing/lib/model.(*Model).ClusterConfig(0xc042058280, 0xdbf211def3228856, 0xc92a31f0bf8e42e1, 0xd87e29914adce15e, 0x87221d62cccbc3d9, 0xc042631f00, 0x1, 0x1) 
	c:/jenkins/workspace/syncthing-release-windows/src/github.com/syncthing/syncthing/lib/model/model.go:789 +0x21c3 
github.com/syncthing/syncthing/lib/protocol.(*nativeModel).ClusterConfig(0xc042b91fe0, 0xdbf211def3228856, 0xc92a31f0bf8e42e1, 0xd87e29914adce15e, 0x87221d62cccbc3d9, 0xc042631f00, 0x1, 0x1) 
	<autogenerated>:74 +0x95 
github.com/syncthing/syncthing/lib/protocol.(*rawConnection).readerLoop(0xc0457315c0, 0x0, 0x0) 
	c:/jenkins/workspace/syncthing-release-windows/src/github.com/syncthing/syncthing/lib/protocol/protocol.go:306 +0x11de 
created by github.com/syncthing/syncthing/lib/protocol.(*rawConnection).Start 
	c:/jenkins/workspace/syncthing-release-windows/src/github.com/syncthing/syncthing/lib/protocol/protocol.go:171 +0x4a 
goroutine 1 [chan receive, 70 minutes]: 
main.syncthingMain(0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0xc042057a40, 0x34, 0x0, 0x0, ...) 
	c:/jenkins/workspace/syncthing-release-windows/src/github.com/syncthing/syncthing/cmd/syncthing/main.go:954 +0x1c5a 
main.main() 
	c:/jenkins/workspace/syncthing-release-windows/src/github.com/syncthing/syncthing/cmd/syncthing/main.go:459 +0x541 
goroutine 20 [select, 71 minutes]: 
github.com/syncthing/syncthing/vendor/github.com/xtaci/kcp-go.(*updateHeap).updateTask(0x10d0f40) 

@calmh calmh changed the title Syncthing Panic panic: bug: ClusterConfig called on closed or nonexistent connection May 23, 2017

@calmh calmh added the bug label May 23, 2017

@calmh calmh added this to the Planned milestone May 23, 2017

@calmh

This comment has been minimized.

Copy link
Member

commented May 23, 2017

I think the connection failure at the top is relevant. Maybe this is something simple as two slow concurrent connections (i.e., one in each direction), one succeeding first and getting added, the other failing and removing the first from the model, then the first one continuing and exploding like this.

@calmh

This comment has been minimized.

Copy link
Member

commented May 23, 2017

Note same in #3633 which this is really a dup of, but I'm leaving it open just because

@AudriusButkevicius

This comment has been minimized.

Copy link
Member

commented May 23, 2017

I think we should just remove the panic and ignore the message, no point in crashing?

@calmh

This comment has been minimized.

Copy link
Member

commented May 23, 2017

@calmh

This comment has been minimized.

Copy link
Member

commented May 23, 2017

@eyeson I don't suppose you can reproduce this somewhat reliably, or know the conditions leading up to it?

@eyeson

This comment has been minimized.

Copy link
Author

commented May 23, 2017

@calmh Sorry, unfortunately unable to reproduce. I had been doing nothing unusual prior to the crash - if it occurs again I will try and get more details

@Nnnes

This comment has been minimized.

Copy link

commented May 25, 2017

I had this crash for the first time just now. It happened around the same time some "unexpected" disconnections occurred (laptops moving out of WiFi range), but these disconnections normally happen dozens of times per day and have not given us trouble in the past.

They don't look too helpful, but here are the Synctrayzor logs from the half minute before the crash.

@scienmind

This comment has been minimized.

Copy link
Member

commented Nov 12, 2017

Happens to me as well, from time to time,
ST log: https://gist.github.com/anonymous/9ac8cb7329e8af3373efea3f79e51e8c

ST v.0.14.40

@calmh

This comment has been minimized.

Copy link
Member

commented Nov 12, 2017

@AudriusButkevicius

This comment has been minimized.

Copy link
Member

commented Nov 12, 2017

It's most likely us starting a connection, receiving a message, disconnecting, and then delivering the message on a separate routine that is still alive.

@scienmind

This comment has been minimized.

Copy link
Member

commented Nov 16, 2017

@xor-gate

This comment has been minimized.

Copy link
Member

commented Nov 19, 2017

I'm having the same https://gist.github.com/xor-gate/f5ba7db33e06da75f10848ca42452b82. Both have 0.14.40 (freebsd & osx if that is usefull).

@calmh

This comment has been minimized.

Copy link
Member

commented Nov 19, 2017

If anyone has a process for controlled reproduction that would be great.

@xor-gate

This comment has been minimized.

Copy link
Member

commented Nov 19, 2017

It only happened once for me (today), and never happened before (using syncthing two years or so). I'm not sure how it was produced. I will keep an eye on it, and will report back when it is reproducible.

@optix2000

This comment has been minimized.

Copy link

commented Jan 6, 2018

Had this happen to me today, when I was setting up syncthing for the first time. Nodes connected briefly and then disconnected and stayed disconnected, until one of them crashed and restarted.

Both running 0.14.42.
Non-crashing node running on Win7. Crashing node running on Ubuntu 16.04.

Logs seem pretty interesting as the connection appears to get get rebuilt a huge amount of times.
Non-crashing node:
https://gist.github.com/optix2000/6bb6d438492bc004fa448c258158fd64

Crashing node:
https://gist.github.com/optix2000/b30939397a6221858f94765c32e42e80

@calmh calmh modified the milestones: Planned, Unplanned (Contributions Welcome) Jan 31, 2018

@calmh calmh removed this from the Unplanned (Contributions Welcome) milestone Feb 11, 2018

@andrewufrank

This comment has been minimized.

Copy link

commented Nov 18, 2018

I have this now a lot, several times a day, but cannot produce it at will. If you tell me what log you would like I can perhaps produce it.
The configuration is 4 central nodes for syncing (all connecte to all) and relatively slow (armhf odroid - hash performance about 50 M/s) and then faster (intel) boxes connected. to one of the central nodes have 500 GB, the connected outer nodes between 20 and 200 GB synced.
what logs would you need?

@bendlas

This comment has been minimized.

Copy link

commented Feb 26, 2019

I just ran into this on my laptop https://hastebin.com/qisihisewu.log

Running syncthing-0.14.54

@imsodin

This comment has been minimized.

Copy link
Member

commented Mar 23, 2019

I'd be interested in reports of this with >=v1.0.1, as #5440 might have an influence on this.

And it looks like this is a duplicate of #3906, but more info is in this issue, so I'd close #3906 (if it really is a dupe).

imsodin added a commit to imsodin/syncthing that referenced this issue Apr 16, 2019

@imsodin

This comment has been minimized.

Copy link
Member

commented Apr 16, 2019

Another report: https://forum.syncthing.net/t/cant-add/13173
And from it's panic and one panic in this test (checked for confirmation), it looks like the issue identified in #5645 is indeed the culprit:

[...] the rawConnection causing the panic only has a readerLoop running, writerLoop doesn’t exist anymore -> “race” on closing the connection.

As #5645 does fix this as a side-effect, I opened a separate PR to fix this issue, as it's much more relevant than the main purpose of #5645, which is now closed and will be opened again once the new one is merged: #5657

imsodin added a commit to imsodin/syncthing that referenced this issue Apr 21, 2019

@calmh calmh added this to the v1.1.3 milestone Apr 29, 2019

imsodin added a commit to imsodin/syncthing that referenced this issue May 2, 2019

imsodin added a commit to imsodin/syncthing that referenced this issue May 3, 2019

@calmh calmh removed this from the v1.1.3 milestone May 8, 2019

@calmh

This comment has been minimized.

Copy link
Member

commented May 8, 2019

Reopen because revert

@calmh calmh reopened this May 8, 2019

@calmh calmh modified the milestone: v1.1.3 May 8, 2019

imsodin added a commit to imsodin/syncthing that referenced this issue May 25, 2019

@calmh calmh added this to the v1.2.0 milestone May 27, 2019

calmh added a commit to calmh/syncthing that referenced this issue May 29, 2019

Merge branch 'master' into crashrep
* master:
  lib/config: Refactor migrations a bit (syncthing#5750)
  lib/connections: Add QUIC protocol support (fixes syncthing#5377) (syncthing#5737)
  github: Create SECURITY.md (syncthing#5749)
  gui, man, authors: Update docs, translations, and contributors
  golang-ci: Disable confused scopelint check
  lib/protocol: Don't send anything else before cluster config (syncthing#5741)
  lib/model: Unflake folder restart with blocking conn test (ref syncthing#5707) (syncthing#5744)
  lib/fs, lib/model: Add error channel to Watch to avoid panics (fixes syncthing#5697) (syncthing#5734)
  lib/protocol: Don't call receiver after calling Closed (fixes syncthing#4170) (syncthing#5742)
  lib/model: Readd special handling of conn close in TestIssue5063 (syncthing#5743)
  lib/model: Close connections when model is stopped (syncthing#5733)
  lib/protocol: Test for Close on blocking send deadlock (ref syncthing#5442) (syncthing#5732)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.