test killed after 10min on travis with docker mongo #120

dvic · 2018-03-06T08:55:43Z

Hi,

Every once in a while our mongo suite gets killed on Travis CI. We run Go 1.10 and use Docker for our test suites. Our Postgres and Neo4j test suites run just fine with this setup, but with mgo and Mongo we're having these issues.

Stacktrace information can be found below. Any idea why this is happening?

+go test -v -race -coverprofile=coverage.out -covermode=atomic ./...
=== RUN   TestMongoSuiteWithoutCredentials
2018/03/04 13:50:01 CREATING NEW POOL
2018/03/04 13:50:01 POOL CREATED <nil>
2018/03/04 13:50:01 RUNNING MONGO CONTAINER
2018/03/04 13:50:11 MONGO CONTAINER CREATED <nil>
2018/03/04 13:50:11 BEFORE testConnect
2018/03/04 13:50:11 START DialWithTimeout
2018/03/04 13:50:11 MONGO URL = mongodb://localhost:32768
SIGQUIT: quit
PC=0x474643 m=0 sigcode=0

goroutine 31 [syscall]:
runtime.notetsleepg(0x12be9e0, 0x37e09133e, 0x16)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/lock_futex.go:227 +0x42 fp=0xc420052760 sp=0xc420052730 pc=0x422022
runtime.timerproc(0x12be9c0)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/time.go:261 +0x2f9 fp=0xc4200527d8 sp=0xc420052760 pc=0x461889
runtime.goexit()
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc4200527e0 sp=0xc4200527d8 pc=0x472bd1
created by runtime.(*timersBucket).addtimerLocked
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/time.go:160 +0x107

goroutine 1 [chan receive]:
testing.(*T).Run(0xc42021c000, 0xd2d7a8, 0x20, 0xd41b80, 0xc4201e5c00)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/testing/testing.go:825 +0x597
testing.runTests.func1(0xc42021c000)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/testing/testing.go:1063 +0xa5
testing.tRunner(0xc42021c000, 0xc4201e5d48)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/testing/testing.go:777 +0x16e
testing.runTests(0xc4201378e0, 0x127b3e0, 0x1, 0x1, 0xc420160800)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/testing/testing.go:1061 +0x4e2
testing.(*M).Run(0xc420160800, 0x0)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/testing/testing.go:978 +0x2ce
main.main()
	_testmain.go:90 +0x325

goroutine 19 [syscall]:
os/signal.signal_recv(0x472bd1)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/sigqueue.go:139 +0xa6
os/signal.loop()
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/os/signal/signal_unix.go:22 +0x30
created by os/signal.init.0
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/os/signal/signal_unix.go:28 +0x4f

goroutine 20 [semacquire]:
sync.runtime_notifyListWait(0xc42023a6e8, 0xc400000000)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/sema.go:510 +0x11a
sync.(*Cond).Wait(0xc42023a6d8)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/sync/cond.go:56 +0x8e
github.com/globalsign/mgo.(*mongoCluster).AcquireSocket(0xc42023a6c0, 0x0, 0xc420240a01, 0x6fc23ac00, 0x6fc23ac00, 0x0, 0x0, 0x0, 0x1000, 0x1c5b320, ...)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:644 +0xff
github.com/globalsign/mgo.(*Session).acquireSocket(0xc4202409c0, 0xb9e201, 0x0, 0x0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:4853 +0x271
github.com/globalsign/mgo.(*Database).Run(0xc42017bc20, 0xc2ee40, 0xda60b0, 0x0, 0x0, 0x0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:799 +0x5e
github.com/globalsign/mgo.(*Session).Run(0xc4202409c0, 0xc2ee40, 0xda60b0, 0x0, 0x0, 0xcf84e0, 0xc42023a6c0)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:2270 +0xba
github.com/globalsign/mgo.(*Session).Ping(0xc4202409c0, 0xc42023a6c0, 0x6fc23ac00)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:2299 +0x5d
github.com/globalsign/mgo.DialWithInfo(0xc4202c0000, 0x17, 0xc4202c0000, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:563 +0x566
github.com/globalsign/mgo.DialWithTimeout(0xc420026d20, 0x17, 0x6fc23ac00, 0x0, 0xc420167780, 0xc4200b0120)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:304 +0xc3
mongo_test.(*suite).testConnect(0xc42017bf48, 0xc42021c0f0)
	/home/travis/build/qdentity/qdentity/go/src/mongo/mongo_test.go:36 +0xc8
mongo_test.TestMongoSuiteWithoutCredentials(0xc42021c0f0)
	/home/travis/build/qdentity/qdentity/go/src/mongo/mongo_test.go:22 +0x187
testing.tRunner(0xc42021c0f0, 0xd41b80)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/testing/testing.go:777 +0x16e
created by testing.(*T).Run
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/testing/testing.go:824 +0x565

goroutine 30 [semacquire]:
sync.runtime_notifyListWait(0xc42023a6e8, 0xc400000001)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/sema.go:510 +0x11a
sync.(*Cond).Wait(0xc42023a6d8)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/sync/cond.go:56 +0x8e
github.com/globalsign/mgo.(*mongoCluster).AcquireSocket(0xc42023a6c0, 0x1, 0xc420240b01, 0x2540be400, 0x2540be400, 0x0, 0x0, 0x0, 0x1000, 0xc420082700, ...)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:644 +0xff
github.com/globalsign/mgo.(*Session).acquireSocket(0xc420240b60, 0xc5f001, 0x0, 0x0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:4853 +0x271
github.com/globalsign/mgo.(*Database).Run(0xc4200779b8, 0xc5f0c0, 0xc42000d200, 0xc10ec0, 0xc420232630, 0x0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:799 +0x5e
github.com/globalsign/mgo.(*Session).Run(0xc420240b60, 0xc5f0c0, 0xc42000d200, 0xc10ec0, 0xc420232630, 0x0, 0x1)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:2270 +0xba
github.com/globalsign/mgo.(*mongoCluster).isMaster(0xc42023a6c0, 0xc4202c20f0, 0xc420232630, 0xc4202c20f0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:182 +0x258
github.com/globalsign/mgo.(*mongoCluster).syncServer(0xc42023a6c0, 0xc4202c00e0, 0xd, 0xc42001ed20, 0xc4202c00e0, 0xc42023a6c0, 0xc440000000, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:231 +0x434
github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1.1(0xc420292060, 0xc420026d2a, 0xd, 0xc420292070, 0xc420026d00, 0xc4202867b0, 0xc42023a6c0, 0xc4202867e0, 0xc420286810, 0x0, ...)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:553 +0x1fb
created by github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:525 +0x175

goroutine 11 [semacquire]:
sync.runtime_Semacquire(0xc42029206c)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc420292060)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/sync/waitgroup.go:129 +0xb3
github.com/globalsign/mgo.(*mongoCluster).syncServersIteration(0xc42023a6c0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:582 +0x4c5
github.com/globalsign/mgo.(*mongoCluster).syncServersLoop(0xc42023a6c0)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:390 +0x17c
created by github.com/globalsign/mgo.newCluster
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:81 +0x2e3

goroutine 12 [sleep]:
time.Sleep(0x37e11d600)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/time.go:102 +0x146
github.com/globalsign/mgo.(*mongoServer).pinger(0xc4202c00e0, 0x479801)
	/home/travis/gopath/src/github.com/globalsign/mgo/server.go:314 +0x7ad
created by github.com/globalsign/mgo.newServer
	/home/travis/gopath/src/github.com/globalsign/mgo/server.go:89 +0x24b

goroutine 34 [IO wait]:
internal/poll.runtime_pollWait(0x7f50f3494f00, 0x72, 0x128aff0)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/netpoll.go:173 +0x5e
internal/poll.(*pollDesc).wait(0xc420234e18, 0x72, 0xda9f00, 0x128aff0, 0xffffffffffffffff)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/internal/poll/fd_poll_runtime.go:85 +0xe5
internal/poll.(*pollDesc).waitRead(0xc420234e18, 0xc420028800, 0x24, 0x24)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/internal/poll/fd_poll_runtime.go:90 +0x4b
internal/poll.(*FD).Read(0xc420234e00, 0xc420028840, 0x24, 0x24, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/internal/poll/fd_unix.go:157 +0x22a
net.(*netFD).Read(0xc420234e00, 0xc420028840, 0x24, 0x24, 0x4ab9ed, 0xc420234e00, 0x0)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/net/fd_unix.go:202 +0x66
net.(*conn).Read(0xc42000e0c8, 0xc420028840, 0x24, 0x24, 0x0, 0xc4202c24b0, 0xc420062dc0)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/net/net.go:176 +0x85
github.com/globalsign/mgo.fill(0xdb3660, 0xc42000e0c8, 0xc420028840, 0x24, 0x24, 0x0, 0x11)
	/home/travis/gopath/src/github.com/globalsign/mgo/socket.go:567 +0x64
github.com/globalsign/mgo.(*mongoSocket).readLoop(0xc4202c24b0)
	/home/travis/gopath/src/github.com/globalsign/mgo/socket.go:583 +0x15b
created by github.com/globalsign/mgo.newSocket
	/home/travis/gopath/src/github.com/globalsign/mgo/socket.go:197 +0x341

rax    0xfffffffffffffffc
rbx    0x12bb3a0
rcx    0x474643
rdx    0x0
rdi    0x12be9e0
rsi    0x0
rbp    0xc4200526e8
rsp    0xc420052698
r8     0x0
r9     0x0
r10    0xc4200526d8
r11    0x202
r12    0xc420079c80
r13    0x12bb3a0
r14    0xc420001500
r15    0x1a354620
rip    0x474643
rflags 0x202
cs     0x33
fs     0x0
gs     0x0
*** Test killed with quit: ran too long (10m0s).
FAIL	mongo	600.006s

dvic · 2018-03-06T09:26:47Z

Could it be a problem with the -race flag? Since we removed the -race flag, the tests have not failed again so far.

KJTsanaktsidis · 2018-03-09T03:07:59Z

I got a similar (but not identical) deadlock & backtrace when running TestConnectCloseConcurrency. I think the main sources of this problem are these two stacks:

goroutine 30 [semacquire]:
sync.runtime_notifyListWait(0xc42023a6e8, 0xc400000001)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/sema.go:510 +0x11a
sync.(*Cond).Wait(0xc42023a6d8)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/sync/cond.go:56 +0x8e
github.com/globalsign/mgo.(*mongoCluster).AcquireSocket(0xc42023a6c0, 0x1, 0xc420240b01, 0x2540be400, 0x2540be400, 0x0, 0x0, 0x0, 0x1000, 0xc420082700, ...)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:644 +0xff
github.com/globalsign/mgo.(*Session).acquireSocket(0xc420240b60, 0xc5f001, 0x0, 0x0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:4853 +0x271
github.com/globalsign/mgo.(*Database).Run(0xc4200779b8, 0xc5f0c0, 0xc42000d200, 0xc10ec0, 0xc420232630, 0x0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:799 +0x5e
github.com/globalsign/mgo.(*Session).Run(0xc420240b60, 0xc5f0c0, 0xc42000d200, 0xc10ec0, 0xc420232630, 0x0, 0x1)
	/home/travis/gopath/src/github.com/globalsign/mgo/session.go:2270 +0xba
github.com/globalsign/mgo.(*mongoCluster).isMaster(0xc42023a6c0, 0xc4202c20f0, 0xc420232630, 0xc4202c20f0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:182 +0x258
github.com/globalsign/mgo.(*mongoCluster).syncServer(0xc42023a6c0, 0xc4202c00e0, 0xd, 0xc42001ed20, 0xc4202c00e0, 0xc42023a6c0, 0xc440000000, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:231 +0x434
github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1.1(0xc420292060, 0xc420026d2a, 0xd, 0xc420292070, 0xc420026d00, 0xc4202867b0, 0xc42023a6c0, 0xc4202867e0, 0xc420286810, 0x0, ...)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:553 +0x1fb
created by github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:525 +0x175

and

goroutine 11 [semacquire]:
sync.runtime_Semacquire(0xc42029206c)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc420292060)
	/home/travis/.gimme/versions/go1.10.linux.amd64/src/sync/waitgroup.go:129 +0xb3
github.com/globalsign/mgo.(*mongoCluster).syncServersIteration(0xc42023a6c0, 0x0)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:582 +0x4c5
github.com/globalsign/mgo.(*mongoCluster).syncServersLoop(0xc42023a6c0)
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:390 +0x17c
created by github.com/globalsign/mgo.newCluster
	/home/travis/gopath/src/github.com/globalsign/mgo/cluster.go:81 +0x2e3

As near as I can tell....

1. Goroutine 11 runs syncServersLoop, which loops every few hundred ms and checks the topology of the cluster.
2. syncServersLoop calls syncServersIteration to do its actual work on every pump of the loop.
3. syncServersIteration spawns a new goroutine (30) and blocks goroutine 11 waiting for 30 on a sync.WaitGroup.
4. The anonymous function in syncServersIteration calls cluster.syncServer() to probe the server and add it to the cluster.masters and cluster.servers slices.
5. cluster.syncServer explicitly opens a socket to this particular server with a call to server.AcquireSocket (as opposed to opening a socket to any server in the cluster).
6. cluster.syncServer calls cluster.isMaster() with this socket, to ask if the server is a replset master.
7. isMaster creates a new session and explicitly assigns the passed-in socket to it. It prepares a command and then attempts to execute it with session.Run.
8. This eventually falls into Database.Run(), which calls session.acquireSocket().
9. acquireSocket() should be a no-op, since the isMaster call a few frames above explicitly called s.setSocket. However, it apparently fails the checks that s.masterSocket != nil && s.masterSocket.dead == nil or s.slaveSocket != nil && s.slaveSocket.dead == nil && s.slaveOk && slaveOk && (s.masterSocket == nil || s.consistency != PrimaryPreferred && s.consistency != Monotonic), and thus falls into s.cluster().AcquireSocket(). THIS, I believe, is the bug: the code higher up the stack is trying to call isMaster on a particular server, but this is going to get a connection to any arbitrary server matching the tags.
10. AcquireSocket looks for a server in its understanding of the topology by checking cluster.masters.Len() and cluster.servers.Len(). However, the cluster discovery hasn't actually run yet. syncServersIteration (further up our call stack in this goroutine) is supposed to populate those collections with a call to cluster.addServer(), but it needs to finish its call to syncServer/isMaster first.
11. Since the cluster topology isn't populated yet, AcquireSocket attempts to poke the syncServers loop on goroutine 11 by calling cluster.syncServers, which just writes to a channel. This is actually a total no-op because both sides of the channel read and write without blocking and the data is just a signal, but this is a different bug and not the actual issue.
12. AcquireSocket then waits on the condition variable with cluster.serverSynced.Wait().
13. BUT, that condition variable is broadcast from three places:
    - syncServersLoop, which is not iterating at the moment because goroutine 11 is blocked on the waitgroup in syncServersIteration
    - addServer and syncServer, both of which are only called from syncServersIteration, which we are blocking on goroutine 30
14. Thus, we have a deadlock.

phew. That was fun.

I'm pretty sure the bug is that isMaster is using session.setSocket to ensure that the command with Run is run against the right server, but if something is wrong with the socket, instead of passing an error up to isMaster, Run calls acquireSocket which just attempts to make a new socket to any random server in the cluster. The deadlock is not a code path that should ever be made to work, I think.

Thoughts?

domodwyer · 2018-03-09T15:43:54Z

Hi @dvic and @KJTsanaktsidis

First off - @dvic thanks for the solid report, and @KJTsanaktsidis thanks for diving deeper into mgo than is good for your sanity!

We'll take a look at this - we've never seen any deadlocks ourselves but the possibility is definitely there - there's an amazing amount of interplay with the locks (as @KJTsanaktsidis can clearly attest!) Do either of you have any reproducing code we can look at?

Dom

KJTsanaktsidis · 2018-03-10T04:43:24Z

I’ll have a look and see if I can find a solid reproduction next week - maybe a “mongo” server that accepts then closes all connections might trigger this code path?

We've seen a deadlock happen occasionally where syncServers needs to acquire a socket to call isMaster, but the socket acquisition needs to know the server topology which isn't known yet. See globalsign#120 issue for a detailed breakdown. This replicates the issue by setting up a mongo "server" which closes sockets as soon as they're opened; about 20% of the time, this will trigger the deadlock because the acquired socket for ismaster() dies and needs to be reacquired.
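
For readers who want to try the same idea, a fake "server" of this kind needs nothing beyond the standard library. This is a sketch under assumptions, not the test from the commit: startClosingServer and readFrom are hypothetical names, and the client below is a plain TCP client, not mgo.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// startClosingServer listens on a free local port and closes every
// connection right after accepting it. It returns the listen address
// and a function that stops the listener.
func startClosingServer() (string, func(), error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener was closed
			}
			c.Close() // drop the client immediately
		}
	}()
	return ln.Addr().String(), func() { ln.Close() }, nil
}

// readFrom dials addr and tries to read a single byte, returning the
// error a client sees when the peer has already closed the connection.
func readFrom(addr string) error {
	c, err := net.DialTimeout("tcp", addr, time.Second)
	if err != nil {
		return err
	}
	defer c.Close()
	if err := c.SetReadDeadline(time.Now().Add(time.Second)); err != nil {
		return err
	}
	_, err = c.Read(make([]byte, 1))
	return err
}

func main() {
	addr, stop, err := startClosingServer()
	if err != nil {
		panic(err)
	}
	defer stop()
	fmt.Println("client read error:", readFrom(addr))
}
```

Pointing a driver at such an address makes every freshly acquired socket die before a command can run on it, which is the condition the description above relies on.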

As discussed in the issue globalsign#120, isMaster() can cause a deadlock with the topology scanner if the connection it makes dies before running the command; mgo automagically attempts to make another socket in acquireSocket, but this can't work without topology. This commit forces isMaster() to actually run on the intended socket.

KJTsanaktsidis · 2018-03-10T08:03:03Z

@domodwyer I think I've managed to provide a repro in #121 - the test in the first commit fails about 20% of the time when I run it with go test -check.v -check.f "S.TestNoDeadlockOnClose" -timeout 25s on my machine.

domodwyer · 2018-03-12T12:07:26Z

Hi @dvic

We're going to merge #121 into development ASAP (thanks to @KJTsanaktsidis !) and cut a hotfix to master once it's tested. In the meantime would you be able to run your tests using the development mgo branch to check if it resolves this issue?

Dom

dvic · 2018-03-12T12:22:50Z

Hi @domodwyer, sure no problem. Thanks! Will try it now and get back to you.

domodwyer · 2018-03-12T12:29:37Z

Hey @dvic

It's not merged just yet - I'll post here when it's done 👍

Dom

dvic · 2018-03-12T12:31:50Z

No problem, for now I just used https://github.com/zendesk/mgo/tree/fix_dial_deadlock directly, TravisCI is running.. 🤞

dvic · 2018-03-12T12:50:10Z

Good news: I ran the test suite three times now, each passed without problems 👍 I'll keep them running just to be sure and I can also run it a few times on the dev branch once you're ready.

dvic · 2018-03-12T12:55:50Z

@domodwyer Tests keep passing, #121 definitely seems to solve the problem (for me at least). Let me know if you want me to perform additional test runs on the dev branch.

domodwyer · 2018-03-12T13:00:48Z

This is great news - thanks @dvic for reporting and @KJTsanaktsidis for such a comprehensive analysis and fix! Open source communities are alive and well! 👍

I will close this after the hotfix - thanks a lot!

Dom

KJTsanaktsidis · 2018-03-12T21:53:28Z

Really happy to help - having this library be actively maintained helps everyone!

Proposed fix for deadlock in #120

domodwyer · 2018-03-26T07:50:22Z

Hi @dvic, @KJTsanaktsidis

Sorry for disappearing, I was out of the country! It looks like this has been fixed (thanks!), but with a direct push to development, so this issue didn't close automatically. I'll find out how that happened (it should be PR only). Closing now.

I will cut a hotfix release after a test run - thanks again!

Dom

@dvic

* cluster: fix deadlock in cluster synchronisation (#120) For an impressively thorough breakdown of the problem, see: #120 (comment) Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.
* readme: credit @dvic and @KJTsanaktsidis

@changwoo-nam

* socket: only send client metadata once per socket (#105) Periodic cluster synchronisation calls isMaster() which currently resends the "client" metadata every call - the spec specifies: isMaster commands issued after the initial connection handshake MUST NOT contain handshake arguments https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake This hotfix prevents subsequent isMaster calls from sending the client metadata again - fixes #101 and fixes #103. Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial issue, opening tickets, and having the problem debugged with a PoC fix before I even woke up.
* Merge Development (#111)
* Brings in a patch on having flusher not suppress errors. (#81) go-mgo#360
* Fallback to JSON tags when BSON tag isn't present (#91)
* Fallback to JSON tags when BSON tag isn't present Cleanup.
* Add test to demonstrate tagging fallback. - Test coverage for tagging test.
* socket: only send client metadata once per socket Periodic cluster synchronisation calls isMaster() which currently resends the "client" metadata every call - the spec specifies: isMaster commands issued after the initial connection handshake MUST NOT contain handshake arguments https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake This hotfix prevents subsequent isMaster calls from sending the client metadata again - fixes #101 and fixes #103. Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial issue, opening tickets, and having the problem debugged with a PoC fix before I even woke up.
* Cluster abended test 254 (#100)
* Add a test that mongo Server gets their abended reset as necessary. See https://github.com/go-mgo/mgo/issues/254 and https://github.com/go-mgo/mgo/pull/255/files
* Include the patch from Issue 255. This brings in a test which fails without the patch, and passes with the patch. Still to be tested, manual tcpkill of a socket.
* changeStream support (#97) Add $changeStream support
* readme: credit @peterdeka and @steve-gray (#110)
* Hotfix #120 (#136)
* cluster: fix deadlock in cluster synchronisation (#120) For an impressively thorough breakdown of the problem, see: #120 (comment) Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.
* readme: credit @dvic and @KJTsanaktsidis
* added support for marshalling/unmarshalling maps with non-string keys
* refactor method receiver

@changwoo-nam

* socket: only send client metadata once per socket (#105) Periodic cluster synchronisation calls isMaster() which currently resends the "client" metadata every call - the spec specifies: isMaster commands issued after the initial connection handshake MUST NOT contain handshake arguments https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake This hotfix prevents subsequent isMaster calls from sending the client metadata again - fixes #101 and fixes #103. Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial issue, opening tickets, and having the problem debugged with a PoC fix before I even woke up. * Merge Development (#111) * Brings in a patch on having flusher not suppress errors. (#81) go-mgo#360 * Fallback to JSON tags when BSON tag isn't present (#91) * Fallback to JSON tags when BSON tag isn't present Cleanup. * Add test to demonstrate tagging fallback. - Test coverage for tagging test. * socket: only send client metadata once per socket Periodic cluster synchronisation calls isMaster() which currently resends the "client" metadata every call - the spec specifies: isMaster commands issued after the initial connection handshake MUST NOT contain handshake arguments https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake This hotfix prevents subsequent isMaster calls from sending the client metadata again - fixes #101 and fixes #103. Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial issue, opening tickets, and having the problem debugged with a PoC fix before I even woke up. * Cluster abended test 254 (#100) * Add a test that mongo Server gets their abended reset as necessary. See https://github.com/go-mgo/mgo/issues/254 and https://github.com/go-mgo/mgo/pull/255/files * Include the patch from Issue 255. This brings in a test which fails without the patch, and passes with the patch. 
Still to be tested, manual tcpkill of a socket. * changeStream support (#97) Add $changeStream support * readme: credit @peterdeka and @steve-gray (#110) * Hotfix #120 (#136) * cluster: fix deadlock in cluster synchronisation (#120) For a impressively thorough breakdown of the problem, see: #120 (comment) Huge thanks to @dvic and @KJTsanaktsidis for the report and fix. * readme: credit @dvic and @KJTsanaktsidis * added support for marshalling/unmarshalling maps with non-string keys * refactor method receiver * added support for json-compatible support for slices and maps Marshal() func: nil slice or map converts to nil, not empty (initialized with len=0) * fix IsNil on slices and maps format * added godoc * fix sasl empty payload * fix scram-sha-1 auth * revert fix sasl empty payload

@changwoo-nam

* socket: only send client metadata once per socket (#105) Periodic cluster synchronisation calls isMaster() which currently resends the "client" metadata every call - the spec specifies: isMaster commands issued after the initial connection handshake MUST NOT contain handshake arguments https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake This hotfix prevents subsequent isMaster calls from sending the client metadata again - fixes #101 and fixes #103. Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial issue, opening tickets, and having the problem debugged with a PoC fix before I even woke up. * Merge Development (#111) * Brings in a patch on having flusher not suppress errors. (#81) go-mgo#360 * Fallback to JSON tags when BSON tag isn't present (#91) * Fallback to JSON tags when BSON tag isn't present Cleanup. * Add test to demonstrate tagging fallback. - Test coverage for tagging test. * socket: only send client metadata once per socket Periodic cluster synchronisation calls isMaster() which currently resends the "client" metadata every call - the spec specifies: isMaster commands issued after the initial connection handshake MUST NOT contain handshake arguments https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake This hotfix prevents subsequent isMaster calls from sending the client metadata again - fixes #101 and fixes #103. Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial issue, opening tickets, and having the problem debugged with a PoC fix before I even woke up. * Cluster abended test 254 (#100) * Add a test that mongo Server gets their abended reset as necessary. See https://github.com/go-mgo/mgo/issues/254 and https://github.com/go-mgo/mgo/pull/255/files * Include the patch from Issue 255. This brings in a test which fails without the patch, and passes with the patch. 
Still to be tested, manual tcpkill of a socket. * changeStream support (#97) Add $changeStream support * readme: credit @peterdeka and @steve-gray (#110) * Hotfix #120 (#136) * cluster: fix deadlock in cluster synchronisation (#120) For a impressively thorough breakdown of the problem, see: #120 (comment) Huge thanks to @dvic and @KJTsanaktsidis for the report and fix. * readme: credit @dvic and @KJTsanaktsidis * Allow passing slice pointer as an interface pointer to Iter.All * Reverted to original error message, added test case for interface{} ptr

@changwoo-nam

* socket: only send client metadata once per socket (#105) Periodic cluster synchronisation calls isMaster() which currently resends the "client" metadata every call - the spec specifies: isMaster commands issued after the initial connection handshake MUST NOT contain handshake arguments https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake This hotfix prevents subsequent isMaster calls from sending the client metadata again - fixes #101 and fixes #103. Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial issue, opening tickets, and having the problem debugged with a PoC fix before I even woke up. * Merge Development (#111) * Brings in a patch on having flusher not suppress errors. (#81) go-mgo#360 * Fallback to JSON tags when BSON tag isn't present (#91) * Fallback to JSON tags when BSON tag isn't present Cleanup. * Add test to demonstrate tagging fallback. - Test coverage for tagging test. * socket: only send client metadata once per socket Periodic cluster synchronisation calls isMaster() which currently resends the "client" metadata every call - the spec specifies: isMaster commands issued after the initial connection handshake MUST NOT contain handshake arguments https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake This hotfix prevents subsequent isMaster calls from sending the client metadata again - fixes #101 and fixes #103. Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial issue, opening tickets, and having the problem debugged with a PoC fix before I even woke up. * Cluster abended test 254 (#100) * Add a test that mongo Server gets their abended reset as necessary. See https://github.com/go-mgo/mgo/issues/254 and https://github.com/go-mgo/mgo/pull/255/files * Include the patch from Issue 255. This brings in a test which fails without the patch, and passes with the patch. 
Still to be tested, manual tcpkill of a socket. * changeStream support (#97) Add $changeStream support * readme: credit @peterdeka and @steve-gray (#110) * Hotfix #120 (#136) * cluster: fix deadlock in cluster synchronisation (#120) For a impressively thorough breakdown of the problem, see: #120 (comment) Huge thanks to @dvic and @KJTsanaktsidis for the report and fix. * readme: credit @dvic and @KJTsanaktsidis * findAndModify support writeConcern * fix

@changwoo-nam

* allow ptr in inline structs
* inline pointer_to_struce mode: update comments. return error on pointer not to struct
* fix(dbtest): Use os.Kill on windows instead of Interrupt 🐛 I've added a use for os.Kill, instead of the os.Interrupt signal, when using Windows. I'm currently developing my project on Windows, and using DBServer.Stop() was resulting in: "timeout waiting for mongod process to die". After investigating, I've discovered that os.Interrupt isn't implemented on Windows, and it seems golang has frozen this issue due to age (2013). They instruct to use os.Kill instead. Using this, the DBServer on my project works with no problem.
* Respect nil slices, maps in bson encoder (#147)
* socket: only send client metadata once per socket (#105) Periodic cluster synchronisation calls isMaster() which currently resends the "client" metadata every call - the spec specifies: isMaster commands issued after the initial connection handshake MUST NOT contain handshake arguments https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake This hotfix prevents subsequent isMaster calls from sending the client metadata again - fixes #101 and fixes #103. Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial issue, opening tickets, and having the problem debugged with a PoC fix before I even woke up.
* Merge Development (#111)
* Brings in a patch on having flusher not suppress errors. (#81) go-mgo#360
* Fallback to JSON tags when BSON tag isn't present (#91)
* Fallback to JSON tags when BSON tag isn't present Cleanup.
* Add test to demonstrate tagging fallback.
  * Test coverage for tagging test.
* Cluster abended test 254 (#100)
* Add a test that mongo Server gets their abended reset as necessary. See https://github.com/go-mgo/mgo/issues/254 and https://github.com/go-mgo/mgo/pull/255/files
* Include the patch from Issue 255. This brings in a test which fails without the patch, and passes with the patch. Still to be tested, manual tcpkill of a socket.
* changeStream support (#97) Add $changeStream support
* readme: credit @peterdeka and @steve-gray (#110)
* Hotfix #120 (#136)
* cluster: fix deadlock in cluster synchronisation (#120) For an impressively thorough breakdown of the problem, see: #120 (comment) Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.
* readme: credit @dvic and @KJTsanaktsidis
* added support for marshalling/unmarshalling maps with non-string keys
* refactor method receiver
* added support for json-compatible support for slices and maps Marshal() func: nil slice or map converts to nil, not empty (initialized with len=0)
* fix IsNil on slices and maps format
* added godoc
* fix sasl empty payload
* fix scram-sha-1 auth
* revert fix sasl empty payload
* Separate read/write network timeouts (#161)
* socket: separate read/write network timeouts Splits DialInfo.Timeout (defaults to 60s when using mgo.Dial()) into ReadTimeout and WriteTimeout to address #160. Read/write timeout defaults to DialInfo.Timeout to preserve existing behaviour.
* cluster: remove AcquireSocket Only used by tests, replaced by the pool-aware acquire socket functions:
  * AcquireSocketWithPoolTimeout
  * AcquireSocketWithBlocking
* cluster: use configured timeouts for cluster operations
  * `mongoCluster.syncServer()` no longer uses hard-coded 5 seconds
  * `mongoCluster.isMaster()` no longer uses hard-coded 10 seconds
* tests: use DialInfo for internal timeouts
* server: fix fantastic serverTags nil slice bug When unmarshalling serverTags, it is now an empty slice, instead of a nil slice. `len(thing) == 0` works all the time, regardless.
* cluster: remove unused duplicate pool config
* session: avoid calculating default values in hot path Changes `DialWithInfo` to handle setting default values by setting the relevant `DialInfo` field, rather than calling the respective methods in the hot path for:
  * `PoolLimit`
  * `ReadTimeout`
  * `WriteTimeout`
* session: remove unused consts
* session: update docs
* add URI options: "w", "j", "wtimeoutMS" (#162)
* change "w" to "j"
* Add Collation support for calling Count() on a Query (#166)
* Expand documentation for *Iter.Next (#163) The documentation now explains the difference between calling Err and Close after Next returns false. The example code has been expanded to include checking for timeout.
* add NewMongoTimestamp() and MongoTimestamp.Time(),Counter() (#171) code is inspired by go-mgo#202
* MGO-156 Avoid iter.Next deadlock on dead sockets (#182)
* Allow passing slice pointer as an interface pointer to Iter.All (#181)
* Reverted to original error message, added test case for interface{} ptr
* Contributing:findAndModify support writeConcern (#185)
* fix
* readme: credit everyone (#187)
  * @cedric-cordenier
  * @DaytonG
  * @ddspog
  * @gedge
  * @jefferickson
  * @larrycinnabar
  * @Mei-Zhao
  * @roobre
* revert: MGO-156 Avoid iter.Next deadlock on dead sockets (#182) (#188) This reverts commit 7253b2b.
* Add support for ssl dial string (#184)
* Ensure we dont override user settings
* update examples
* update ssl value parsing
* PingSsl test
* skip test requiring system certificates
* readme: credit @tbruyelle (#190)

@changwoo-nam

* socket: only send client metadata once per socket (#105)
* Merge Development (#111)
* Hotfix #120 (#136)
* Release/r2018.06.15 (#191)
* strip space of flag

We have occasionally seen a deadlock in which syncServers needs to acquire a socket to call isMaster, but acquiring a socket requires the server topology, which is not known yet. See issue globalsign#120 for a detailed breakdown. This commit replicates the problem by setting up a mongo "server" that closes sockets as soon as they are opened; about 20% of the time this triggers the deadlock, because the socket acquired for isMaster() dies and has to be reacquired.
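
For readers who want to picture the reproduction setup, here is a minimal sketch of a fake "server" that closes every connection immediately after accepting it. It uses only the Go standard library; `startFlakyServer` and `probe` are hypothetical names invented for this illustration and are not taken from the mgo test suite, which may build its fake server differently.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// startFlakyServer (hypothetical helper) listens on a random loopback port
// and closes every connection as soon as it has been accepted.
func startFlakyServer() (addr string, stop func(), err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener was closed
			}
			conn.Close() // drop the client before it can run any command
		}
	}()
	return ln.Addr().String(), func() { ln.Close() }, nil
}

// probe (hypothetical helper) dials addr and reports whether the peer closed
// the connection before sending any data, which is what a driver would see
// when its freshly acquired socket dies.
func probe(addr string) (closedByPeer bool, err error) {
	conn, err := net.DialTimeout("tcp", addr, time.Second)
	if err != nil {
		return false, err
	}
	defer conn.Close()
	if err := conn.SetReadDeadline(time.Now().Add(time.Second)); err != nil {
		return false, err
	}
	_, readErr := conn.Read(make([]byte, 1))
	if readErr == nil {
		return false, nil // the peer actually sent data
	}
	var netErr net.Error
	if errors.As(readErr, &netErr) && netErr.Timeout() {
		return false, nil // the peer kept the connection open
	}
	return true, nil // EOF or reset: the peer closed the socket
}

func main() {
	addr, stop, err := startFlakyServer()
	if err != nil {
		panic(err)
	}
	defer stop()
	closed, err := probe(addr)
	fmt.Println("closed by peer:", closed, "err:", err)
}
```

Pointing a driver at such an address means the initial dial succeeds while the socket is already dead by the time a command such as isMaster is sent, which is the condition described above.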

As discussed in issue globalsign#120, isMaster() can deadlock with the topology scanner if the connection it makes dies before the command runs: mgo automatically tries to obtain another socket in acquireSocket, but that cannot succeed until the topology is known. This commit forces isMaster() to actually run on the intended socket.

Proposed fix for deadlock in globalsign#120

@dvic

* cluster: fix deadlock in cluster synchronisation (globalsign#120) For an impressively thorough breakdown of the problem, see: globalsign#120 (comment) Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.
* readme: credit @dvic and @KJTsanaktsidis

@changwoo-nam

* socket: only send client metadata once per socket (globalsign#105) Periodic cluster synchronisation calls isMaster() which currently resends the "client" metadata every call - the spec specifies: isMaster commands issued after the initial connection handshake MUST NOT contain handshake arguments https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake This hotfix prevents subsequent isMaster calls from sending the client metadata again - fixes globalsign#101 and fixes globalsign#103. Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial issue, opening tickets, and having the problem debugged with a PoC fix before I even woke up.
* Merge Development (globalsign#111)
* Brings in a patch on having flusher not suppress errors. (globalsign#81) go-mgo#360
* Fallback to JSON tags when BSON tag isn't present (globalsign#91)
* Fallback to JSON tags when BSON tag isn't present Cleanup.
* Add test to demonstrate tagging fallback.
  * Test coverage for tagging test.
* Cluster abended test 254 (globalsign#100)
* Add a test that mongo Server gets their abended reset as necessary. See https://github.com/go-mgo/mgo/issues/254 and https://github.com/go-mgo/mgo/pull/255/files
* Include the patch from Issue 255. This brings in a test which fails without the patch, and passes with the patch. Still to be tested, manual tcpkill of a socket.
* changeStream support (globalsign#97) Add $changeStream support
* readme: credit @peterdeka and @steve-gray (globalsign#110)
* Hotfix globalsign#120 (globalsign#136)
* cluster: fix deadlock in cluster synchronisation (globalsign#120) For an impressively thorough breakdown of the problem, see: globalsign#120 (comment) Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.
* readme: credit @dvic and @KJTsanaktsidis
* added support for marshalling/unmarshalling maps with non-string keys
* refactor method receiver

@changwoo-nam

* allow ptr in inline structs * inline pointer_to_struce mode: update comments. return error on pointer not to struct * fix(dbtest): Use os.Kill on windows instead of Interrupt 🐛 I've added a use for os.Kill, instead of os.Interrupt signal, when using Windows. I'm current developing my project on Windows, and using DBServer.Stop() was resulting in: "timeout waiting for mongod process to die". After investigating, I've discovered that os.Interrupt isn't implemented on Windows, and it seems golang has Frozen this issue due to age (2013). They instruct to use os.Kill instead. Using this, the DBServer on my project works with no problem. * Respect nil slices, maps in bson encoder (globalsign#147) * socket: only send client metadata once per socket (globalsign#105) Periodic cluster synchronisation calls isMaster() which currently resends the "client" metadata every call - the spec specifies: isMaster commands issued after the initial connection handshake MUST NOT contain handshake arguments https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake This hotfix prevents subsequent isMaster calls from sending the client metadata again - fixes globalsign#101 and fixes globalsign#103. Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial issue, opening tickets, and having the problem debugged with a PoC fix before I even woke up. * Merge Development (globalsign#111) * Brings in a patch on having flusher not suppress errors. (globalsign#81) go-mgo#360 * Fallback to JSON tags when BSON tag isn't present (globalsign#91) * Fallback to JSON tags when BSON tag isn't present Cleanup. * Add test to demonstrate tagging fallback. - Test coverage for tagging test. 
* socket: only send client metadata once per socket Periodic cluster synchronisation calls isMaster() which currently resends the "client" metadata every call - the spec specifies: isMaster commands issued after the initial connection handshake MUST NOT contain handshake arguments https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake This hotfix prevents subsequent isMaster calls from sending the client metadata again - fixes globalsign#101 and fixes globalsign#103. Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial issue, opening tickets, and having the problem debugged with a PoC fix before I even woke up. * Cluster abended test 254 (globalsign#100) * Add a test that mongo Server gets their abended reset as necessary. See https://github.com/go-mgo/mgo/issues/254 and https://github.com/go-mgo/mgo/pull/255/files * Include the patch from Issue 255. This brings in a test which fails without the patch, and passes with the patch. Still to be tested, manual tcpkill of a socket. * changeStream support (globalsign#97) Add $changeStream support * readme: credit @peterdeka and @steve-gray (globalsign#110) * Hotfix globalsign#120 (globalsign#136) * cluster: fix deadlock in cluster synchronisation (globalsign#120) For a impressively thorough breakdown of the problem, see: globalsign#120 (comment) Huge thanks to @dvic and @KJTsanaktsidis for the report and fix. 
* readme: credit @dvic and @KJTsanaktsidis
* added support for marshalling/unmarshalling maps with non-string keys
* refactor method receiver
* added support for json-compatible support for slices and maps
  Marshal() func: nil slice or map converts to nil, not empty (initialized with len=0)
* fix IsNil on slices and maps format
* added godoc
* fix sasl empty payload
* fix scram-sha-1 auth
* revert fix sasl empty payload
* Separate read/write network timeouts (globalsign#161)
* socket: separate read/write network timeouts
  Splits DialInfo.Timeout (defaults to 60s when using mgo.Dial()) into ReadTimeout and WriteTimeout to address globalsign#160.
  Read/write timeout defaults to DialInfo.Timeout to preserve existing behaviour.
* cluster: remove AcquireSocket
  Only used by tests, replaced by the pool-aware acquire socket functions:
  * AcquireSocketWithPoolTimeout
  * AcquireSocketWithBlocking
* cluster: use configured timeouts for cluster operations
  * `mongoCluster.syncServer()` no longer uses hard-coded 5 seconds
  * `mongoCluster.isMaster()` no longer uses hard-coded 10 seconds
* tests: use DialInfo for internal timeouts
* server: fix fantastic serverTags nil slice bug
  When unmarshalling serverTags, it is now an empty slice, instead of a nil slice. `len(thing) == 0` works all the time, regardless.
* cluster: remove unused duplicate pool config
* session: avoid calculating default values in hot path
  Changes `DialWithInfo` to handle setting default values by setting the relevant `DialInfo` field, rather than calling the respective methods in the hot path for:
  * `PoolLimit`
  * `ReadTimeout`
  * `WriteTimeout`
* session: remove unused consts
* session: update docs
* add URI options: "w", "j", "wtimeoutMS" (globalsign#162)
* add URI options: "w", "j", "wtimeoutMS"
* change "w" to "j"
* Add Collation support for calling Count() on a Query (globalsign#166)
* Expand documentation for *Iter.Next (globalsign#163)
  The documentation now explains the difference between calling Err and Close after Next returns false. The example code has been expanded to include checking for timeout.
* add NewMongoTimestamp() and MongoTimestamp.Time(),Counter() (globalsign#171)
  code is inspired by go-mgo#202
* MGO-156 Avoid iter.Next deadlock on dead sockets (globalsign#182)
* Allow passing slice pointer as an interface pointer to Iter.All (globalsign#181)
* socket: only send client metadata once per socket (globalsign#105)
  Periodic cluster synchronisation calls isMaster() which currently resends the "client" metadata every call - the spec specifies:
  isMaster commands issued after the initial connection handshake MUST NOT contain handshake arguments
  https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake
  This hotfix prevents subsequent isMaster calls from sending the client metadata again - fixes globalsign#101 and fixes globalsign#103.
  Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial issue, opening tickets, and having the problem debugged with a PoC fix before I even woke up.
* Merge Development (globalsign#111)
* Brings in a patch on having flusher not suppress errors. (globalsign#81) go-mgo#360
* Fallback to JSON tags when BSON tag isn't present (globalsign#91)
* Fallback to JSON tags when BSON tag isn't present
  Cleanup.
* Add test to demonstrate tagging fallback.
  - Test coverage for tagging test.
* socket: only send client metadata once per socket
  Periodic cluster synchronisation calls isMaster() which currently resends the "client" metadata every call - the spec specifies:
  isMaster commands issued after the initial connection handshake MUST NOT contain handshake arguments
  https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake
  This hotfix prevents subsequent isMaster calls from sending the client metadata again - fixes globalsign#101 and fixes globalsign#103.
  Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial issue, opening tickets, and having the problem debugged with a PoC fix before I even woke up.
* Cluster abended test 254 (globalsign#100)
* Add a test that mongo Server gets their abended reset as necessary.
  See https://github.com/go-mgo/mgo/issues/254 and https://github.com/go-mgo/mgo/pull/255/files
* Include the patch from Issue 255.
  This brings in a test which fails without the patch, and passes with the patch. Still to be tested, manual tcpkill of a socket.
* changeStream support (globalsign#97)
  Add $changeStream support
* readme: credit @peterdeka and @steve-gray (globalsign#110)
* Hotfix globalsign#120 (globalsign#136)
* cluster: fix deadlock in cluster synchronisation (globalsign#120)
  For a impressively thorough breakdown of the problem, see: globalsign#120 (comment)
  Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.
* readme: credit @dvic and @KJTsanaktsidis
* Allow passing slice pointer as an interface pointer to Iter.All
* Reverted to original error message, added test case for interface{} ptr
* Contributing:findAndModify support writeConcern (globalsign#185)
* socket: only send client metadata once per socket (globalsign#105)
  Periodic cluster synchronisation calls isMaster() which currently resends the "client" metadata every call - the spec specifies:
  isMaster commands issued after the initial connection handshake MUST NOT contain handshake arguments
  https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake
  This hotfix prevents subsequent isMaster calls from sending the client metadata again - fixes globalsign#101 and fixes globalsign#103.
  Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial issue, opening tickets, and having the problem debugged with a PoC fix before I even woke up.
* Merge Development (globalsign#111)
* Brings in a patch on having flusher not suppress errors. (globalsign#81) go-mgo#360
* Fallback to JSON tags when BSON tag isn't present (globalsign#91)
* Fallback to JSON tags when BSON tag isn't present
  Cleanup.
* Add test to demonstrate tagging fallback.
  - Test coverage for tagging test.
* socket: only send client metadata once per socket
  Periodic cluster synchronisation calls isMaster() which currently resends the "client" metadata every call - the spec specifies:
  isMaster commands issued after the initial connection handshake MUST NOT contain handshake arguments
  https://github.com/mongodb/specifications/blob/master/source/mongodb-handshake/handshake.rst#connection-handshake
  This hotfix prevents subsequent isMaster calls from sending the client metadata again - fixes globalsign#101 and fixes globalsign#103.
  Thanks to @changwoo-nam @qhenkart @canthefason @jyoon17 for spotting the initial issue, opening tickets, and having the problem debugged with a PoC fix before I even woke up.
* Cluster abended test 254 (globalsign#100)
* Add a test that mongo Server gets their abended reset as necessary.
  See https://github.com/go-mgo/mgo/issues/254 and https://github.com/go-mgo/mgo/pull/255/files
* Include the patch from Issue 255.
  This brings in a test which fails without the patch, and passes with the patch. Still to be tested, manual tcpkill of a socket.
* changeStream support (globalsign#97)
  Add $changeStream support
* readme: credit @peterdeka and @steve-gray (globalsign#110)
* Hotfix globalsign#120 (globalsign#136)
* cluster: fix deadlock in cluster synchronisation (globalsign#120)
  For a impressively thorough breakdown of the problem, see: globalsign#120 (comment)
  Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.
* readme: credit @dvic and @KJTsanaktsidis
* findAndModify support writeConcern
* fix
* readme: credit everyone (globalsign#187)
  * @cedric-cordenier
  * @DaytonG
  * @ddspog
  * @gedge
  * @jefferickson
  * @larrycinnabar
  * @Mei-Zhao
  * @roobre
* revert: MGO-156 Avoid iter.Next deadlock on dead sockets (globalsign#182) (globalsign#188)
  This reverts commit 7253b2b.
* Add support for ssl dial string (globalsign#184)
* Add support for ssl dial string
* Ensure we dont override user settings
* update examples
* update ssl value parsing
* PingSsl test
* skip test requiring system certificates
* readme: credit @tbruyelle (globalsign#190)

domodwyer added the needs info label Mar 9, 2018

KJTsanaktsidis mentioned this issue Mar 10, 2018

Proposed fix for deadlock in globalsign/mgo#120 #121

Merged

domodwyer added bug and removed needs info labels Mar 12, 2018

domodwyer assigned dvic Mar 12, 2018

szank added a commit that referenced this issue Mar 22, 2018

Fix for deadlock in cluster: isMaster() #120 (#121)

876956d

Proposed fix for deadlock in #120

domodwyer closed this as completed Mar 26, 2018

domodwyer added the needs stg test label Mar 26, 2018

domodwyer added a commit that referenced this issue Apr 2, 2018

cluster: fix deadlock in cluster synchronisation (#120)

06f95aa

For a impressively thorough breakdown of the problem, see: #120 (comment) Huge thanks to @dvic and @KJTsanaktsidis for the report and fix.

This was referenced Apr 2, 2018

Hotfix #120 #136

Merged

Blocked in function cluster.AcquireSocket #134

Closed

Data race for socket #135

Closed

domodwyer removed the needs stg test label Apr 3, 2018

QieKai mentioned this issue Aug 8, 2018

Release/r2018.06.15 (#191) nzgogo/mgo#2

Closed

libi pushed a commit to libi/mgo that referenced this issue Dec 1, 2022

Fix for deadlock in cluster: isMaster() globalsign#120 (globalsign#121)

00e7550

Proposed fix for deadlock in globalsign#120
