Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test killed after 10min on travis with docker mongo #120

Closed
dvic opened this issue Mar 6, 2018 · 14 comments
Closed

test killed after 10min on travis with docker mongo #120

dvic opened this issue Mar 6, 2018 · 14 comments
Assignees
Labels

Comments

@dvic
Copy link

dvic commented Mar 6, 2018

Hi,

Every once in a while our mongo suite gets killed on TravicCI. We run go 1.10 and use docker for our test suites. Our Postgres and Neo4j test suites run just fine with this setup but with mgo and Mongo we're having these issues.

Stacktrace information can be found below. Any idea why this is happening?

+go test -v -race -coverprofile=coverage.out -covermode=atomic ./...
=== RUN   TestMongoSuiteWithoutCredentials
2018/03/04 13:50:01 CREATING NEW POOL
2018/03/04 13:50:01 POOL CREATED <nil>
2018/03/04 13:50:01 RUNNING MONGO CONTAINER
2018/03/04 13:50:11 MONGO CONTAINER CREATED <nil>
2018/03/04 13:50:11 BEFORE testConnect
2018/03/04 13:50:11 START DialWithTimeout
2018/03/04 13:50:11 MONGO URL = mongodb://localhost:32768
SIGQUIT: quit
PC=0x474643 m=0 sigcode=0

goroutine 31 [syscall]:
runtime.notetsleepg(0x12be9e0, 0x37e09133e, 0x16)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/lock_futex.go:227 +0x42 fp=0xc420052760 sp=0xc420052730 pc=0x422022
runtime.timerproc(0x12be9c0)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/time.go:261 +0x2f9 fp=0xc4200527d8 sp=0xc420052760 pc=0x461889
runtime.goexit()
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc4200527e0 sp=0xc4200527d8 pc=0x472bd1
created by runtime.(*timersBucket).addtimerLocked
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/time.go:160 +0x107

goroutine 1 [chan receive]:
testing.(*T).Run(0xc42021c000, 0xd2d7a8, 0x20, 0xd41b80, 0xc4201e5c00)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/testing/testing.go:825 +0x597
testing.runTests.func1(0xc42021c000)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/testing/testing.go:1063 +0xa5
testing.tRunner(0xc42021c000, 0xc4201e5d48)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/testing/testing.go:777 +0x16e
testing.runTests(0xc4201378e0, 0x127b3e0, 0x1, 0x1, 0xc420160800)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/testing/testing.go:1061 +0x4e2
testing.(*M).Run(0xc420160800, 0x0)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/testing/testing.go:978 +0x2ce
main.main()
	_testmain.go:90 +0x325

goroutine 19 [syscall]:
os/signal.signal_recv(0x472bd1)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/sigqueue.go:139 +0xa6
os/signal.loop()
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/os/signal/signal_unix.go:22 +0x30
created by os/signal.init.0
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/os/signal/signal_unix.go:28 +0x4f

goroutine 20 [semacquire]:
sync.runtime_notifyListWait(0xc42023a6e8, 0xc400000000)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/sema.go:510 +0x11a
sync.(*Cond).Wait(0xc42023a6d8)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/sync/cond.go:56 +0x8e
github.com/globalsign/mgo.(*mongoCluster).AcquireSocket(0xc42023a6c0, 0x0, 0xc420240a01, 0x6fc23ac00, 0x6fc23ac00, 0x0, 0x0, 0x0, 0x1000, 0x1c5b320, ...)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:644 +0xff
github.com/globalsign/mgo.(*Session).acquireSocket(0xc4202409c0, 0xb9e201, 0x0, 0x0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:4853 +0x271
github.com/globalsign/mgo.(*Database).Run(0xc42017bc20, 0xc2ee40, 0xda60b0, 0x0, 0x0, 0x0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:799 +0x5e
github.com/globalsign/mgo.(*Session).Run(0xc4202409c0, 0xc2ee40, 0xda60b0, 0x0, 0x0, 0xcf84e0, 0xc42023a6c0)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:2270 +0xba
github.com/globalsign/mgo.(*Session).Ping(0xc4202409c0, 0xc42023a6c0, 0x6fc23ac00)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:2299 +0x5d
github.com/globalsign/mgo.DialWithInfo(0xc4202c0000, 0x17, 0xc4202c0000, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:563 +0x566
github.com/globalsign/mgo.DialWithTimeout(0xc420026d20, 0x17, 0x6fc23ac00, 0x0, 0xc420167780, 0xc4200b0120)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:304 +0xc3
mongo_test.(*suite).testConnect(0xc42017bf48, 0xc42021c0f0)
	/home/travis/build/qdentity/qdentity/go/src/mongo/mongo_test.go:36 +0xc8
mongo_test.TestMongoSuiteWithoutCredentials(0xc42021c0f0)
	/home/travis/build/qdentity/qdentity/go/src/mongo/mongo_test.go:22 +0x187
testing.tRunner(0xc42021c0f0, 0xd41b80)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/testing/testing.go:777 +0x16e
created by testing.(*T).Run
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/testing/testing.go:824 +0x565

goroutine 30 [semacquire]:
sync.runtime_notifyListWait(0xc42023a6e8, 0xc400000001)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/sema.go:510 +0x11a
sync.(*Cond).Wait(0xc42023a6d8)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/sync/cond.go:56 +0x8e
github.com/globalsign/mgo.(*mongoCluster).AcquireSocket(0xc42023a6c0, 0x1, 0xc420240b01, 0x2540be400, 0x2540be400, 0x0, 0x0, 0x0, 0x1000, 0xc420082700, ...)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:644 +0xff
github.com/globalsign/mgo.(*Session).acquireSocket(0xc420240b60, 0xc5f001, 0x0, 0x0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:4853 +0x271
github.com/globalsign/mgo.(*Database).Run(0xc4200779b8, 0xc5f0c0, 0xc42000d200, 0xc10ec0, 0xc420232630, 0x0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:799 +0x5e
github.com/globalsign/mgo.(*Session).Run(0xc420240b60, 0xc5f0c0, 0xc42000d200, 0xc10ec0, 0xc420232630, 0x0, 0x1)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:2270 +0xba
github.com/globalsign/mgo.(*mongoCluster).isMaster(0xc42023a6c0, 0xc4202c20f0, 0xc420232630, 0xc4202c20f0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:182 +0x258
github.com/globalsign/mgo.(*mongoCluster).syncServer(0xc42023a6c0, 0xc4202c00e0, 0xd, 0xc42001ed20, 0xc4202c00e0, 0xc42023a6c0, 0xc440000000, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:231 +0x434
github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1.1(0xc420292060, 0xc420026d2a, 0xd, 0xc420292070, 0xc420026d00, 0xc4202867b0, 0xc42023a6c0, 0xc4202867e0, 0xc420286810, 0x0, ...)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:553 +0x1fb
created by github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:525 +0x175

goroutine 11 [semacquire]:
sync.runtime_Semacquire(0xc42029206c)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc420292060)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/sync/waitgroup.go:129 +0xb3
github.com/globalsign/mgo.(*mongoCluster).syncServersIteration(0xc42023a6c0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:582 +0x4c5
github.com/globalsign/mgo.(*mongoCluster).syncServersLoop(0xc42023a6c0)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:390 +0x17c
created by github.com/globalsign/mgo.newCluster
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:81 +0x2e3

goroutine 12 [sleep]:
time.Sleep(0x37e11d600)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/time.go:102 +0x146
github.com/globalsign/mgo.(*mongoServer).pinger(0xc4202c00e0, 0x479801)
	/home/travis/gopath/src/github.com/globalsign/mgo/server.go:314 +0x7ad
created by github.com/globalsign/mgo.newServer
	/home/travis/gopath/src/github.com/globalsign/mgo/server.go:89 +0x24b

goroutine 34 [IO wait]:
internal/poll.runtime_pollWait(0x7f50f3494f00, 0x72, 0x128aff0)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/netpoll.go:173 +0x5e
internal/poll.(*pollDesc).wait(0xc420234e18, 0x72, 0xda9f00, 0x128aff0, 0xffffffffffffffff)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/internal/poll/fd_poll_runtime.go:85 +0xe5
internal/poll.(*pollDesc).waitRead(0xc420234e18, 0xc420028800, 0x24, 0x24)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/internal/poll/fd_poll_runtime.go:90 +0x4b
internal/poll.(*FD).Read(0xc420234e00, 0xc420028840, 0x24, 0x24, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/internal/poll/fd_unix.go:157 +0x22a
net.(*netFD).Read(0xc420234e00, 0xc420028840, 0x24, 0x24, 0x4ab9ed, 0xc420234e00, 0x0)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/net/fd_unix.go:202 +0x66
net.(*conn).Read(0xc42000e0c8, 0xc420028840, 0x24, 0x24, 0x0, 0xc4202c24b0, 0xc420062dc0)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/net/net.go:176 +0x85
github.com/globalsign/mgo.fill(0xdb3660, 0xc42000e0c8, 0xc420028840, 0x24, 0x24, 0x0, 0x11)
	/home/travis/gopath/src/github.com/globalsign/mgo/socket.go:567 +0x64
github.com/globalsign/mgo.(*mongoSocket).readLoop(0xc4202c24b0)
	/home/travis/gopath/src/github.com/globalsign/mgo/socket.go:583 +0x15b
created by github.com/globalsign/mgo.newSocket
	/home/travis/gopath/src/github.com/globalsign/mgo/socket.go:197 +0x341

rax    0xfffffffffffffffc
rbx    0x12bb3a0
rcx    0x474643
rdx    0x0
rdi    0x12be9e0
rsi    0x0
rbp    0xc4200526e8
rsp    0xc420052698
r8     0x0
r9     0x0
r10    0xc4200526d8
r11    0x202
r12    0xc420079c80
r13    0x12bb3a0
r14    0xc420001500
r15    0x1a354620
rip    0x474643
rflags 0x202
cs     0x33
fs     0x0
gs     0x0
*** Test killed with quit: ran too long (10m0s).
FAIL	mongo	600.006s
@dvic
Copy link
Author

dvic commented Mar 6, 2018

Could it be a problem with the -race flag? We removed the -race flag and up to this point the tests have stopped failing.

@KJTsanaktsidis
Copy link

KJTsanaktsidis commented Mar 9, 2018

I got a similar (but not identical) deadlock & backtrace when running TestConnectCloseConcurrency. I think the main source of this problem are these two stacks:

goroutine 30 [semacquire]:
sync.runtime_notifyListWait(0xc42023a6e8, 0xc400000001)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/sema.go:510 +0x11a
sync.(*Cond).Wait(0xc42023a6d8)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/sync/cond.go:56 +0x8e
github.com/globalsign/mgo.(*mongoCluster).AcquireSocket(0xc42023a6c0, 0x1, 0xc420240b01, 0x2540be400, 0x2540be400, 0x0, 0x0, 0x0, 0x1000, 0xc420082700, ...)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:644 +0xff
github.com/globalsign/mgo.(*Session).acquireSocket(0xc420240b60, 0xc5f001, 0x0, 0x0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:4853 +0x271
github.com/globalsign/mgo.(*Database).Run(0xc4200779b8, 0xc5f0c0, 0xc42000d200, 0xc10ec0, 0xc420232630, 0x0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:799 +0x5e
github.com/globalsign/mgo.(*Session).Run(0xc420240b60, 0xc5f0c0, 0xc42000d200, 0xc10ec0, 0xc420232630, 0x0, 0x1)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:2270 +0xba
github.com/globalsign/mgo.(*mongoCluster).isMaster(0xc42023a6c0, 0xc4202c20f0, 0xc420232630, 0xc4202c20f0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:182 +0x258
github.com/globalsign/mgo.(*mongoCluster).syncServer(0xc42023a6c0, 0xc4202c00e0, 0xd, 0xc42001ed20, 0xc4202c00e0, 0xc42023a6c0, 0xc440000000, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:231 +0x434
github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1.1(0xc420292060, 0xc420026d2a, 0xd, 0xc420292070, 0xc420026d00, 0xc4202867b0, 0xc42023a6c0, 0xc4202867e0, 0xc420286810, 0x0, ...)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:553 +0x1fb
created by github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:525 +0x175

and

goroutine 11 [semacquire]:
sync.runtime_Semacquire(0xc42029206c)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc420292060)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/sync/waitgroup.go:129 +0xb3
github.com/globalsign/mgo.(*mongoCluster).syncServersIteration(0xc42023a6c0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:582 +0x4c5
github.com/globalsign/mgo.(*mongoCluster).syncServersLoop(0xc42023a6c0)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:390 +0x17c
created by github.com/globalsign/mgo.newCluster
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:81 +0x2e3

As near as I can tell....

  • Goroutine 11 has the syncServersLoop, which loops every few hundred ms and checks the topology of the cluster.
  • syncServersLoop calls syncServersIteration to do its actual work on every pump of the loop
  • syncServersIteration spawns a new goroutine 30 and blocks goroutine 11 waiting for 30 on a sync.waitGroup
  • The anonymous function in syncServersIteration calls cluster.syncServer() to probe it and add it to the cluster.masters and cluster.servers slices.
  • cluster.syncServer explicitly opens a socket to this particular server with a call to server.AcquireSocket (as opposed to opening a socket to any server in the cluster)
  • cluster.syncServer calls server.isMaster() with this socket, to ask if the server is a replset master
  • isMaster creates a new session and explicitly assigns the passed-in socket to it. It prepares a command and then attempts to execute it with session.Run
  • This eventually falls in to Database.Run(), which calls session.acquireSocket()
  • acquireSocket() should be a no-op, since the isMaster call a few frames above explicitly set s.setSocket. However, it apparently fails the checks that s.masterSocket != nil && s.masterSocket.dead == nil or s.slaveSocket != nil && s.slaveSocket.dead == nil && s.slaveOk && slaveOk && (s.masterSocket == nil || s.consistency != PrimaryPreferred && s.consistency != Monotonic), and thus falls into s.cluster().AcquireSocket(). THIS is I believe the bug; the code higher up the stack is trying to call isMaster on a particular server, but this is going to get a connection to any arbitrary server matching the tags.
  • AcquireSocket looks for a server in its understanding of the topology by checking cluster.masters.Len() and cluster.servers.Len(). However, the cluster discovery hasn't actually run yet - syncServersIteration (further up our call stack in this goroutine) is supposed to populate those collections with a call to cluster.addServer(), but it needs to finish its call to syncServer/isMaster first.
  • Since the cluster topology isn't populated yet, AcquireSocket attempts to poke the syncServers loop on goroutine 11 by calling cluster.syncServers which just writes to a channel. This is actually a total no-op because both sides of the channel are read/written to nonblocking and the data is just a signal, but this is a different bug and not the actual issue.
  • AcquireSocket then waits on the condition variable cluster.serverSynced.Wait().
  • BUT, that condition variable is broadcast from three places:
    • syncServersLoop, which is not iterating at the moment because goroutine 11 is blocked on the waitgroup in syncServersIteration
    • addServer and syncServer, both of which are only called from syncServersIteration, which we are blocking on goroutine 30
  • Thus, we have a deadlock.

phew. That was fun.

I'm pretty sure the bug is that isMaster is using session.setSocket to ensure that the command with Run is run against the right server, but if something is wrong with the socket, instead of passing an error up to isMaster, Run calls acquireSocket which just attempts to make a new socket to any random server in the cluster. The deadlock is not a code path that should ever be made to work, I think.

Thoughts?

@domodwyer
Copy link

domodwyer commented Mar 9, 2018

Hi @dvic and @KJTsanaktsidis

First off - @dvic thanks for the solid report, and @KJTsanaktsidis thanks for diving deeper into mgo than is good for your sanity!

We'll take a look at this - we've never seen any deadlocks ourselves but the possibility is definitely there - there's an amazing amount of interplay with the locks (as @KJTsanaktsidis can clearly attest!) Do either of you have any reproducing code we can look at?

Dom

@KJTsanaktsidis
Copy link

I’ll have a look and see if I can find a solid reproduction next week - maybe a “mongo” server that accepts then closes all connections might trigger this code path?

KJTsanaktsidis added a commit to zendesk/mgo that referenced this issue Mar 10, 2018
We've seen a deadlock happen occasionally where syncServers needs to
acquire a socket to call isMaster, but the socket acquisition needs to
know the server topology which isn't known yet. See globalsign#120
issue for a detailed breakdown.

This replicates the issue by setting up a mongo "server" which closes
sockets as soon as they're opened; about 20% of the time, this will
trigger the deadlock because the acquired socket for ismaster() dies and
needs to be reacquired.
KJTsanaktsidis added a commit to zendesk/mgo that referenced this issue Mar 10, 2018
As discussed in the issue globalsign#120, isMaster() can cause a
deadlock with the topology scanner if the connection it makes dies
before running the command; mgo automagically attempts to make another
socket in acquireSocket, but this can't work without topology.

This commit forces isMaster() to actually run on the intended socket.
@KJTsanaktsidis
Copy link

@domodwyer I think I've managed to provide a repro in #121 - the test in the first commit fails about 20% of the time when i run it with go test -check.v -check.f "S.TestNoDeadlockOnClose" -timeout 25s on my machine.

@domodwyer domodwyer added bug and removed needs info labels Mar 12, 2018
@domodwyer
Copy link

Hi @dvic

We're going to merge #121 into development ASAP (thanks to @KJTsanaktsidis !) and cut a hotfix to master once it's tested. In the meantime would you be able to run your tests using the development mgo branch to check if it resolves this issue?

Dom

@dvic
Copy link
Author

dvic commented Mar 12, 2018

Hi @domodwyer, sure no problem. Thanks! Will try it now and get back to you.

@domodwyer
Copy link

Hey @dvic

It's not merged just yet - I'll post here when it's done 👍

Dom

@dvic
Copy link
Author

dvic commented Mar 12, 2018

No problem, for now I just used https://github.com/zendesk/mgo/tree/fix_dial_deadlock directly, TravisCI is running.. 🤞

@dvic
Copy link
Author

dvic commented Mar 12, 2018

Good news: I ran the test suite three times now, each passed without problems 👍 I'll keep them running just to be sure and I can also run it a few times on the dev branch once you're ready.

@dvic
Copy link
Author

dvic commented Mar 12, 2018

@domodwyer Tests keep passing, #121 definitely seems to solve the problem (for me at least). Let me know if you want me to perform additional test runs on the dev branch.

@domodwyer
Copy link

This is great news - thanks @dvic for reporting and @KJTsanaktsidis for such a comprehensive analysis and fix! Open source communities are alive and well! 👍

I will close this after the hotfix - thanks a lot!

Dom

KJTsanaktsidis added a commit to zendesk/mgo that referenced this issue Mar 12, 2018
We've seen a deadlock happen occasionally where syncServers needs to
acquire a socket to call isMaster, but the socket acquisition needs to
know the server topology which isn't known yet. See globalsign#120
issue for a detailed breakdown.

This replicates the issue by setting up a mongo "server" which closes
sockets as soon as they're opened; about 20% of the time, this will
trigger the deadlock because the acquired socket for ismaster() dies and
needs to be reacquired.
KJTsanaktsidis added a commit to zendesk/mgo that referenced this issue Mar 12, 2018
As discussed in the issue globalsign#120, isMaster() can cause a
deadlock with the topology scanner if the connection it makes dies
before running the command; mgo automagically attempts to make another
socket in acquireSocket, but this can't work without topology.

This commit forces isMaster() to actually run on the intended socket.
@KJTsanaktsidis
Copy link

Really happy to help - having this library be actively maintained helps everyone!

szank added a commit that referenced this issue Mar 22, 2018
@domodwyer
Copy link

Hi @dvic, @KJTsanaktsidis

Sorry for disappearing, I was out the country! It looks like this has been fixed (thanks!) but with a direct push to development so this didn't close (I'll also find out how that happened - it should be PR only) so closing now.

I will cut a hotfix release after a test run - thanks again!

Dom

domodwyer added a commit that referenced this issue Apr 2, 2018
For a impressively thorough breakdown of the problem, see:
	#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.
domodwyer added a commit that referenced this issue Apr 3, 2018
* cluster: fix deadlock in cluster synchronisation (#120)

For a impressively thorough breakdown of the problem, see:
	#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis
domodwyer pushed a commit that referenced this issue Apr 12, 2018
* socket: only send client metadata once per socket (#105)

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Merge Development (#111)

* Brings in a patch on having flusher not suppress errors. (#81)

go-mgo#360

* Fallback to JSON tags when BSON tag isn't present (#91)

* Fallback to JSON tags when BSON tag isn't present


Cleanup.

* Add test to demonstrate tagging fallback.

- Test coverage for tagging test.

* socket: only send client metadata once per socket

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Cluster abended test 254 (#100)

* Add a test that mongo Server gets their abended reset as necessary.

See https://github.com/go-mgo/mgo/issues/254 and
https://github.com/go-mgo/mgo/pull/255/files

* Include the patch from Issue 255.

This brings in a test which fails without the patch, and passes with the
patch. Still to be tested, manual tcpkill of a socket.

* changeStream support (#97)

Add $changeStream support

* readme: credit @peterdeka and @steve-gray (#110)

* Hotfix #120  (#136)

* cluster: fix deadlock in cluster synchronisation (#120)

For a impressively thorough breakdown of the problem, see:
	#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis

* added support for marshalling/unmarshalling maps with non-string keys

* refactor method receiver
domodwyer pushed a commit that referenced this issue May 9, 2018
* socket: only send client metadata once per socket (#105)

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Merge Development (#111)

* Brings in a patch on having flusher not suppress errors. (#81)

go-mgo#360

* Fallback to JSON tags when BSON tag isn't present (#91)

* Fallback to JSON tags when BSON tag isn't present


Cleanup.

* Add test to demonstrate tagging fallback.

- Test coverage for tagging test.

* socket: only send client metadata once per socket

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Cluster abended test 254 (#100)

* Add a test that mongo Server gets their abended reset as necessary.

See https://github.com/go-mgo/mgo/issues/254 and
https://github.com/go-mgo/mgo/pull/255/files

* Include the patch from Issue 255.

This brings in a test which fails without the patch, and passes with the
patch. Still to be tested, manual tcpkill of a socket.

* changeStream support (#97)

Add $changeStream support

* readme: credit @peterdeka and @steve-gray (#110)

* Hotfix #120  (#136)

* cluster: fix deadlock in cluster synchronisation (#120)

For a impressively thorough breakdown of the problem, see:
	#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis

* added support for marshalling/unmarshalling maps with non-string keys

* refactor method receiver

* added support for json-compatible support for slices and maps Marshal() func: nil slice or map converts to nil, not empty (initialized with len=0)

* fix IsNil on slices and maps

format

* added godoc

* fix sasl empty payload

* fix scram-sha-1 auth

* revert fix sasl empty payload
domodwyer pushed a commit that referenced this issue Jun 5, 2018
* socket: only send client metadata once per socket (#105)

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Merge Development (#111)

* Brings in a patch on having flusher not suppress errors. (#81)

go-mgo#360

* Fallback to JSON tags when BSON tag isn't present (#91)

* Fallback to JSON tags when BSON tag isn't present


Cleanup.

* Add test to demonstrate tagging fallback.

- Test coverage for tagging test.

* socket: only send client metadata once per socket

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Cluster abended test 254 (#100)

* Add a test that mongo Server gets their abended reset as necessary.

See https://github.com/go-mgo/mgo/issues/254 and
https://github.com/go-mgo/mgo/pull/255/files

* Include the patch from Issue 255.

This brings in a test which fails without the patch, and passes with the
patch. Still to be tested, manual tcpkill of a socket.

* changeStream support (#97)

Add $changeStream support

* readme: credit @peterdeka and @steve-gray (#110)

* Hotfix #120  (#136)

* cluster: fix deadlock in cluster synchronisation (#120)

For a impressively thorough breakdown of the problem, see:
	#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis

* Allow passing slice pointer as an interface pointer to Iter.All

* Reverted to original error message, added test case for interface{} ptr
domodwyer pushed a commit that referenced this issue Jun 8, 2018
* socket: only send client metadata once per socket (#105)

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Merge Development (#111)

* Brings in a patch on having flusher not suppress errors. (#81)

go-mgo#360

* Fallback to JSON tags when BSON tag isn't present (#91)

* Fallback to JSON tags when BSON tag isn't present


Cleanup.

* Add test to demonstrate tagging fallback.

- Test coverage for tagging test.

* socket: only send client metadata once per socket

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Cluster abended test 254 (#100)

* Add a test that mongo Server gets their abended reset as necessary.

See https://github.com/go-mgo/mgo/issues/254 and
https://github.com/go-mgo/mgo/pull/255/files

* Include the patch from Issue 255.

This brings in a test which fails without the patch, and passes with the
patch. Still to be tested, manual tcpkill of a socket.

* changeStream support (#97)

Add $changeStream support

* readme: credit @peterdeka and @steve-gray (#110)

* Hotfix #120  (#136)

* cluster: fix deadlock in cluster synchronisation (#120)

For a impressively thorough breakdown of the problem, see:
	#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis

* findAndModify support writeConcern

* fix
domodwyer added a commit that referenced this issue Jun 15, 2018
* allow ptr in inline structs

* inline pointer_to_struce mode: update comments. return error on pointer not to struct

* fix(dbtest): Use os.Kill on windows instead of Interrupt 🐛

I've added a use for os.Kill, instead of os.Interrupt signal, when using
Windows. I'm current developing my project on Windows, and using
DBServer.Stop() was resulting in: "timeout waiting for mongod process to
die". After investigating, I've discovered that os.Interrupt isn't
implemented on Windows, and it seems golang has Frozen this issue due to
age (2013). They instruct to use os.Kill instead. Using this, the
DBServer on my project works with no problem.

* Respect nil slices, maps in bson encoder (#147)

* socket: only send client metadata once per socket (#105)

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Merge Development (#111)

* Brings in a patch on having flusher not suppress errors. (#81)

go-mgo#360

* Fallback to JSON tags when BSON tag isn't present (#91)

* Fallback to JSON tags when BSON tag isn't present


Cleanup.

* Add test to demonstrate tagging fallback.

- Test coverage for tagging test.

* socket: only send client metadata once per socket

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Cluster abended test 254 (#100)

* Add a test that mongo Server gets their abended reset as necessary.

See https://github.com/go-mgo/mgo/issues/254 and
https://github.com/go-mgo/mgo/pull/255/files

* Include the patch from Issue 255.

This brings in a test which fails without the patch, and passes with the
patch. Still to be tested, manual tcpkill of a socket.

* changeStream support (#97)

Add $changeStream support

* readme: credit @peterdeka and @steve-gray (#110)

* Hotfix #120  (#136)

* cluster: fix deadlock in cluster synchronisation (#120)

For a impressively thorough breakdown of the problem, see:
	#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis

* added support for marshalling/unmarshalling maps with non-string keys

* refactor method receiver

* added support for json-compatible support for slices and maps Marshal() func: nil slice or map converts to nil, not empty (initialized with len=0)

* fix IsNil on slices and maps

format

* added godoc

* fix sasl empty payload

* fix scram-sha-1 auth

* revert fix sasl empty payload

* Separate read/write network timeouts (#161)

* socket: separate read/write network timeouts

Splits DialInfo.Timeout (defaults to 60s when using mgo.Dial()) into ReadTimeout
and WriteTimeout to address #160. Read/write timeout defaults to
DialInfo.Timeout to preserve existing behaviour.

* cluster: remove AcquireSocket

Only used by tests, replaced by the pool-aware acquire socket functions:
	* AcquireSocketWithPoolTimeout
	* AcquireSocketWithBlocking

* cluster: use configured timeouts for cluster operations

* `mongoCluster.syncServer()` no longer uses hard-coded 5 seconds
* `mongoCluster.isMaster()` no longer uses hard-coded 10 seconds

* tests: use DialInfo for internal timeouts

* server: fix fantastic serverTags nil slice bug

When unmarshalling serverTags, it is now an empty slice, instead of a nil slice.

`len(thing) == 0` works all the time, regardless.

* cluster: remove unused duplicate pool config

* session: avoid calculating default values in hot path

Changes `DialWithInfo` to handle setting default values by setting the relevant
`DialInfo` field, rather than calling the respective methods in the hot path for:

	* `PoolLimit`
	* `ReadTimeout`
	* `WriteTimeout`

* session: remove unused consts

* session: update docs

* add URI options: "w", "j", "wtimeoutMS" (#162)

* add URI options: "w", "j", "wtimeoutMS"

* change "w" to "j"

* Add Collation support for calling Count() on a Query (#166)

* Expand documentation for *Iter.Next (#163)

The documentation now explains the difference between calling Err and
Close after Next returns false.

The example code has been expanded to include checking for timeout.

* add NewMongoTimestamp() and MongoTimestamp.Time(),Counter() (#171)

code is inspired by go-mgo#202

* MGO-156 Avoid iter.Next deadlock on dead sockets (#182)

* Allow passing slice pointer as an interface pointer to Iter.All (#181)

* socket: only send client metadata once per socket (#105)

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Merge Development (#111)

* Brings in a patch on having flusher not suppress errors. (#81)

go-mgo#360

* Fallback to JSON tags when BSON tag isn't present (#91)

* Fallback to JSON tags when BSON tag isn't present


Cleanup.

* Add test to demonstrate tagging fallback.

- Test coverage for tagging test.

* socket: only send client metadata once per socket

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Cluster abended test 254 (#100)

* Add a test that mongo Server gets their abended reset as necessary.

See https://github.com/go-mgo/mgo/issues/254 and
https://github.com/go-mgo/mgo/pull/255/files

* Include the patch from Issue 255.

This brings in a test which fails without the patch, and passes with the
patch. Still to be tested, manual tcpkill of a socket.

* changeStream support (#97)

Add $changeStream support

* readme: credit @peterdeka and @steve-gray (#110)

* Hotfix #120  (#136)

* cluster: fix deadlock in cluster synchronisation (#120)

For a impressively thorough breakdown of the problem, see:
	#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis

* Allow passing slice pointer as an interface pointer to Iter.All

* Reverted to original error message, added test case for interface{} ptr

* Contributing:findAndModify support writeConcern (#185)

* socket: only send client metadata once per socket (#105)

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Merge Development (#111)

* Brings in a patch on having flusher not suppress errors. (#81)

go-mgo#360

* Fallback to JSON tags when BSON tag isn't present (#91)

* Fallback to JSON tags when BSON tag isn't present


Cleanup.

* Add test to demonstrate tagging fallback.

- Test coverage for tagging test.

* socket: only send client metadata once per socket

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Cluster abended test 254 (#100)

* Add a test that mongo Server gets their abended reset as necessary.

See https://github.com/go-mgo/mgo/issues/254 and
https://github.com/go-mgo/mgo/pull/255/files

* Include the patch from Issue 255.

This brings in a test which fails without the patch, and passes with the
patch. Still to be tested, manual tcpkill of a socket.

* changeStream support (#97)

Add $changeStream support

* readme: credit @peterdeka and @steve-gray (#110)

* Hotfix #120  (#136)

* cluster: fix deadlock in cluster synchronisation (#120)

For a impressively thorough breakdown of the problem, see:
	#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis

* findAndModify support writeConcern

* fix

* readme: credit everyone (#187)

* @cedric-cordenier
* @DaytonG
* @ddspog
* @gedge
* @jefferickson
* @larrycinnabar
* @Mei-Zhao
* @roobre

* revert: MGO-156 Avoid iter.Next deadlock on dead sockets (#182) (#188)

This reverts commit 7253b2b.

* Add support for ssl dial string (#184)

* Add support for ssl dial string

* Ensure we dont override user settings

* update examples

* update ssl value parsing

* PingSsl test

* skip test requiring system certificates

* readme: credit @tbruyelle (#190)
domodwyer pushed a commit that referenced this issue Jul 4, 2018
* socket: only send client metadata once per socket (#105)

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Merge Development (#111)

* Brings in a patch on having flusher not suppress errors. (#81)

go-mgo#360

* Fallback to JSON tags when BSON tag isn't present (#91)

* Fallback to JSON tags when BSON tag isn't present


Cleanup.

* Add test to demonstrate tagging fallback.

- Test coverage for tagging test.

* socket: only send client metadata once per socket

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Cluster abended test 254 (#100)

* Add a test that mongo Server gets their abended reset as necessary.

See https://github.com/go-mgo/mgo/issues/254 and
https://github.com/go-mgo/mgo/pull/255/files

* Include the patch from Issue 255.

This brings in a test which fails without the patch, and passes with the
patch. Still to be tested, manual tcpkill of a socket.

* changeStream support (#97)

Add $changeStream support

* readme: credit @peterdeka and @steve-gray (#110)

* Hotfix #120  (#136)

* cluster: fix deadlock in cluster synchronisation (#120)

For a impressively thorough breakdown of the problem, see:
	#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis

* Release/r2018.06.15 (#191)

* allow ptr in inline structs

* inline pointer_to_struce mode: update comments. return error on pointer not to struct

* fix(dbtest): Use os.Kill on windows instead of Interrupt 🐛

I've added a use for os.Kill, instead of os.Interrupt signal, when using
Windows. I'm current developing my project on Windows, and using
DBServer.Stop() was resulting in: "timeout waiting for mongod process to
die". After investigating, I've discovered that os.Interrupt isn't
implemented on Windows, and it seems golang has Frozen this issue due to
age (2013). They instruct to use os.Kill instead. Using this, the
DBServer on my project works with no problem.

* Respect nil slices, maps in bson encoder (#147)

* socket: only send client metadata once per socket (#105)

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Merge Development (#111)

* Brings in a patch on having flusher not suppress errors. (#81)

go-mgo#360

* Fallback to JSON tags when BSON tag isn't present (#91)

* Fallback to JSON tags when BSON tag isn't present


Cleanup.

* Add test to demonstrate tagging fallback.

- Test coverage for tagging test.

* socket: only send client metadata once per socket

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Cluster abended test 254 (#100)

* Add a test that mongo Server gets their abended reset as necessary.

See https://github.com/go-mgo/mgo/issues/254 and
https://github.com/go-mgo/mgo/pull/255/files

* Include the patch from Issue 255.

This brings in a test which fails without the patch, and passes with the
patch. Still to be tested, manual tcpkill of a socket.

* changeStream support (#97)

Add $changeStream support

* readme: credit @peterdeka and @steve-gray (#110)

* Hotfix #120  (#136)

* cluster: fix deadlock in cluster synchronisation (#120)

For a impressively thorough breakdown of the problem, see:
	#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis

* added support for marshalling/unmarshalling maps with non-string keys

* refactor method receiver

* added support for json-compatible support for slices and maps Marshal() func: nil slice or map converts to nil, not empty (initialized with len=0)

* fix IsNil on slices and maps

format

* added godoc

* fix sasl empty payload

* fix scram-sha-1 auth

* revert fix sasl empty payload

* Separate read/write network timeouts (#161)

* socket: separate read/write network timeouts

Splits DialInfo.Timeout (defaults to 60s when using mgo.Dial()) into ReadTimeout
and WriteTimeout to address #160. Read/write timeout defaults to
DialInfo.Timeout to preserve existing behaviour.

* cluster: remove AcquireSocket

Only used by tests, replaced by the pool-aware acquire socket functions:
	* AcquireSocketWithPoolTimeout
	* AcquireSocketWithBlocking

* cluster: use configured timeouts for cluster operations

* `mongoCluster.syncServer()` no longer uses hard-coded 5 seconds
* `mongoCluster.isMaster()` no longer uses hard-coded 10 seconds

* tests: use DialInfo for internal timeouts

* server: fix fantastic serverTags nil slice bug

When unmarshalling serverTags, it is now an empty slice, instead of a nil slice.

`len(thing) == 0` works all the time, regardless.

* cluster: remove unused duplicate pool config

* session: avoid calculating default values in hot path

Changes `DialWithInfo` to handle setting default values by setting the relevant
`DialInfo` field, rather than calling the respective methods in the hot path for:

	* `PoolLimit`
	* `ReadTimeout`
	* `WriteTimeout`

* session: remove unused consts

* session: update docs

* add URI options: "w", "j", "wtimeoutMS" (#162)

* add URI options: "w", "j", "wtimeoutMS"

* change "w" to "j"

* Add Collation support for calling Count() on a Query (#166)

* Expand documentation for *Iter.Next (#163)

The documentation now explains the difference between calling Err and
Close after Next returns false.

The example code has been expanded to include checking for timeout.

* add NewMongoTimestamp() and MongoTimestamp.Time(),Counter() (#171)

code is inspired by go-mgo#202

* MGO-156 Avoid iter.Next deadlock on dead sockets (#182)

* Allow passing slice pointer as an interface pointer to Iter.All (#181)

* socket: only send client metadata once per socket (#105)

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Merge Development (#111)

* Brings in a patch on having flusher not suppress errors. (#81)

go-mgo#360

* Fallback to JSON tags when BSON tag isn't present (#91)

* Fallback to JSON tags when BSON tag isn't present


Cleanup.

* Add test to demonstrate tagging fallback.

- Test coverage for tagging test.

* socket: only send client metadata once per socket

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Cluster abended test 254 (#100)

* Add a test that mongo Server gets their abended reset as necessary.

See https://github.com/go-mgo/mgo/issues/254 and
https://github.com/go-mgo/mgo/pull/255/files

* Include the patch from Issue 255.

This brings in a test which fails without the patch, and passes with the
patch. Still to be tested, manual tcpkill of a socket.

* changeStream support (#97)

Add $changeStream support

* readme: credit @peterdeka and @steve-gray (#110)

* Hotfix #120  (#136)

* cluster: fix deadlock in cluster synchronisation (#120)

For a impressively thorough breakdown of the problem, see:
	#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis

* Allow passing slice pointer as an interface pointer to Iter.All

* Reverted to original error message, added test case for interface{} ptr

* Contributing:findAndModify support writeConcern (#185)

* socket: only send client metadata once per socket (#105)

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Merge Development (#111)

* Brings in a patch on having flusher not suppress errors. (#81)

go-mgo#360

* Fallback to JSON tags when BSON tag isn't present (#91)

* Fallback to JSON tags when BSON tag isn't present


Cleanup.

* Add test to demonstrate tagging fallback.

- Test coverage for tagging test.

* socket: only send client metadata once per socket

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes #101 and fixes #103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Cluster abended test 254 (#100)

* Add a test that mongo Server gets their abended reset as necessary.

See https://github.com/go-mgo/mgo/issues/254 and
https://github.com/go-mgo/mgo/pull/255/files

* Include the patch from Issue 255.

This brings in a test which fails without the patch, and passes with the
patch. Still to be tested, manual tcpkill of a socket.

* changeStream support (#97)

Add $changeStream support

* readme: credit @peterdeka and @steve-gray (#110)

* Hotfix #120  (#136)

* cluster: fix deadlock in cluster synchronisation (#120)

For a impressively thorough breakdown of the problem, see:
	#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis

* findAndModify support writeConcern

* fix

* readme: credit everyone (#187)

* @cedric-cordenier
* @DaytonG
* @ddspog
* @gedge
* @jefferickson
* @larrycinnabar
* @Mei-Zhao
* @roobre

* revert: MGO-156 Avoid iter.Next deadlock on dead sockets (#182) (#188)

This reverts commit 7253b2b.

* Add support for ssl dial string (#184)

* Add support for ssl dial string

* Ensure we dont override user settings

* update examples

* update ssl value parsing

* PingSsl test

* skip test requiring system certificates

* readme: credit @tbruyelle (#190)

* strip space of flag
libi pushed a commit to libi/mgo that referenced this issue Dec 1, 2022
We've seen a deadlock happen occasionally where syncServers needs to
acquire a socket to call isMaster, but the socket acquisition needs to
know the server topology which isn't known yet. See globalsign#120
issue for a detailed breakdown.

This replicates the issue by setting up a mongo "server" which closes
sockets as soon as they're opened; about 20% of the time, this will
trigger the deadlock because the acquired socket for ismaster() dies and
needs to be reacquired.
libi pushed a commit to libi/mgo that referenced this issue Dec 1, 2022
As discussed in the issue globalsign#120, isMaster() can cause a
deadlock with the topology scanner if the connection it makes dies
before running the command; mgo automagically attempts to make another
socket in acquireSocket, but this can't work without topology.

This commit forces isMaster() to actually run on the intended socket.
libi pushed a commit to libi/mgo that referenced this issue Dec 1, 2022
libi pushed a commit to libi/mgo that referenced this issue Dec 1, 2022
* cluster: fix deadlock in cluster synchronisation (globalsign#120)

For a impressively thorough breakdown of the problem, see:
	globalsign#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis
libi pushed a commit to libi/mgo that referenced this issue Dec 1, 2022
* socket: only send client metadata once per socket (globalsign#105)

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes globalsign#101 and fixes globalsign#103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Merge Development (globalsign#111)

* Brings in a patch on having flusher not suppress errors. (globalsign#81)

go-mgo#360

* Fallback to JSON tags when BSON tag isn't present (globalsign#91)

* Fallback to JSON tags when BSON tag isn't present


Cleanup.

* Add test to demonstrate tagging fallback.

- Test coverage for tagging test.

* socket: only send client metadata once per socket

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes globalsign#101 and fixes globalsign#103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Cluster abended test 254 (globalsign#100)

* Add a test that mongo Server gets their abended reset as necessary.

See https://github.com/go-mgo/mgo/issues/254 and
https://github.com/go-mgo/mgo/pull/255/files

* Include the patch from Issue 255.

This brings in a test which fails without the patch, and passes with the
patch. Still to be tested, manual tcpkill of a socket.

* changeStream support (globalsign#97)

Add $changeStream support

* readme: credit @peterdeka and @steve-gray (globalsign#110)

* Hotfix globalsign#120  (globalsign#136)

* cluster: fix deadlock in cluster synchronisation (globalsign#120)

For a impressively thorough breakdown of the problem, see:
	globalsign#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis

* added support for marshalling/unmarshalling maps with non-string keys

* refactor method receiver
libi pushed a commit to libi/mgo that referenced this issue Dec 1, 2022
* allow ptr in inline structs

* inline pointer_to_struce mode: update comments. return error on pointer not to struct

* fix(dbtest): Use os.Kill on windows instead of Interrupt 🐛

I've added a use for os.Kill, instead of os.Interrupt signal, when using
Windows. I'm current developing my project on Windows, and using
DBServer.Stop() was resulting in: "timeout waiting for mongod process to
die". After investigating, I've discovered that os.Interrupt isn't
implemented on Windows, and it seems golang has Frozen this issue due to
age (2013). They instruct to use os.Kill instead. Using this, the
DBServer on my project works with no problem.

* Respect nil slices, maps in bson encoder (globalsign#147)

* socket: only send client metadata once per socket (globalsign#105)

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes globalsign#101 and fixes globalsign#103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Merge Development (globalsign#111)

* Brings in a patch on having flusher not suppress errors. (globalsign#81)

go-mgo#360

* Fallback to JSON tags when BSON tag isn't present (globalsign#91)

* Fallback to JSON tags when BSON tag isn't present


Cleanup.

* Add test to demonstrate tagging fallback.

- Test coverage for tagging test.

* socket: only send client metadata once per socket

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes globalsign#101 and fixes globalsign#103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Cluster abended test 254 (globalsign#100)

* Add a test that mongo Server gets their abended reset as necessary.

See https://github.com/go-mgo/mgo/issues/254 and
https://github.com/go-mgo/mgo/pull/255/files

* Include the patch from Issue 255.

This brings in a test which fails without the patch, and passes with the
patch. Still to be tested, manual tcpkill of a socket.

* changeStream support (globalsign#97)

Add $changeStream support

* readme: credit @peterdeka and @steve-gray (globalsign#110)

* Hotfix globalsign#120  (globalsign#136)

* cluster: fix deadlock in cluster synchronisation (globalsign#120)

For a impressively thorough breakdown of the problem, see:
	globalsign#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis

* added support for marshalling/unmarshalling maps with non-string keys

* refactor method receiver

* added support for json-compatible support for slices and maps Marshal() func: nil slice or map converts to nil, not empty (initialized with len=0)

* fix IsNil on slices and maps

format

* added godoc

* fix sasl empty payload

* fix scram-sha-1 auth

* revert fix sasl empty payload

* Separate read/write network timeouts (globalsign#161)

* socket: separate read/write network timeouts

Splits DialInfo.Timeout (defaults to 60s when using mgo.Dial()) into ReadTimeout
and WriteTimeout to address globalsign#160. Read/write timeout defaults to
DialInfo.Timeout to preserve existing behaviour.

* cluster: remove AcquireSocket

Only used by tests, replaced by the pool-aware acquire socket functions:
	* AcquireSocketWithPoolTimeout
	* AcquireSocketWithBlocking

* cluster: use configured timeouts for cluster operations

* `mongoCluster.syncServer()` no longer uses hard-coded 5 seconds
* `mongoCluster.isMaster()` no longer uses hard-coded 10 seconds

* tests: use DialInfo for internal timeouts

* server: fix fantastic serverTags nil slice bug

When unmarshalling serverTags, it is now an empty slice, instead of a nil slice.

`len(thing) == 0` works all the time, regardless.

* cluster: remove unused duplicate pool config

* session: avoid calculating default values in hot path

Changes `DialWithInfo` to handle setting default values by setting the relevant
`DialInfo` field, rather than calling the respective methods in the hot path for:

	* `PoolLimit`
	* `ReadTimeout`
	* `WriteTimeout`

* session: remove unused consts

* session: update docs

* add URI options: "w", "j", "wtimeoutMS" (globalsign#162)

* add URI options: "w", "j", "wtimeoutMS"

* change "w" to "j"

* Add Collation support for calling Count() on a Query (globalsign#166)

* Expand documentation for *Iter.Next (globalsign#163)

The documentation now explains the difference between calling Err and
Close after Next returns false.

The example code has been expanded to include checking for timeout.

* add NewMongoTimestamp() and MongoTimestamp.Time(),Counter() (globalsign#171)

code is inspired by go-mgo#202

* MGO-156 Avoid iter.Next deadlock on dead sockets (globalsign#182)

* Allow passing slice pointer as an interface pointer to Iter.All (globalsign#181)

* socket: only send client metadata once per socket (globalsign#105)

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes globalsign#101 and fixes globalsign#103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Merge Development (globalsign#111)

* Brings in a patch on having flusher not suppress errors. (globalsign#81)

go-mgo#360

* Fallback to JSON tags when BSON tag isn't present (globalsign#91)

* Fallback to JSON tags when BSON tag isn't present


Cleanup.

* Add test to demonstrate tagging fallback.

- Test coverage for tagging test.

* socket: only send client metadata once per socket

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes globalsign#101 and fixes globalsign#103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Cluster abended test 254 (globalsign#100)

* Add a test that mongo Server gets their abended reset as necessary.

See https://github.com/go-mgo/mgo/issues/254 and
https://github.com/go-mgo/mgo/pull/255/files

* Include the patch from Issue 255.

This brings in a test which fails without the patch, and passes with the
patch. Still to be tested, manual tcpkill of a socket.

* changeStream support (globalsign#97)

Add $changeStream support

* readme: credit @peterdeka and @steve-gray (globalsign#110)

* Hotfix globalsign#120  (globalsign#136)

* cluster: fix deadlock in cluster synchronisation (globalsign#120)

For a impressively thorough breakdown of the problem, see:
	globalsign#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis

* Allow passing slice pointer as an interface pointer to Iter.All

* Reverted to original error message, added test case for interface{} ptr

* Contributing:findAndModify support writeConcern (globalsign#185)

* socket: only send client metadata once per socket (globalsign#105)

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes globalsign#101 and fixes globalsign#103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Merge Development (globalsign#111)

* Brings in a patch on having flusher not suppress errors. (globalsign#81)

go-mgo#360

* Fallback to JSON tags when BSON tag isn't present (globalsign#91)

* Fallback to JSON tags when BSON tag isn't present


Cleanup.

* Add test to demonstrate tagging fallback.

- Test coverage for tagging test.

* socket: only send client metadata once per socket

Periodic cluster synchronisation calls isMaster() which currently resends the
"client" metadata every call - the spec specifies:

	isMaster commands issued after the initial connection handshake MUST NOT
	contain handshake arguments

	https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake

This hotfix prevents subsequent isMaster calls from sending the client metadata
again - fixes globalsign#101 and fixes globalsign#103.

Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial
issue, opening tickets, and having the problem debugged with a PoC fix before I
even woke up.

* Cluster abended test 254 (globalsign#100)

* Add a test that mongo Server gets their abended reset as necessary.

See https://github.com/go-mgo/mgo/issues/254 and
https://github.com/go-mgo/mgo/pull/255/files

* Include the patch from Issue 255.

This brings in a test which fails without the patch, and passes with the
patch. Still to be tested, manual tcpkill of a socket.

* changeStream support (globalsign#97)

Add $changeStream support

* readme: credit @peterdeka and @steve-gray (globalsign#110)

* Hotfix globalsign#120  (globalsign#136)

* cluster: fix deadlock in cluster synchronisation (globalsign#120)

For a impressively thorough breakdown of the problem, see:
	globalsign#120 (comment)

Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

* readme: credit @dvic and @KJTsanaktsidis

* findAndModify support writeConcern

* fix

* readme: credit everyone (globalsign#187)

* @cedric-cordenier
* @DaytonG
* @ddspog
* @gedge
* @jefferickson
* @larrycinnabar
* @Mei-Zhao
* @roobre

* revert: MGO-156 Avoid iter.Next deadlock on dead sockets (globalsign#182) (globalsign#188)

This reverts commit 7253b2b.

* Add support for ssl dial string (globalsign#184)

* Add support for ssl dial string

* Ensure we dont override user settings

* update examples

* update ssl value parsing

* PingSsl test

* skip test requiring system certificates

* readme: credit @tbruyelle (globalsign#190)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants