Introduce a single txn log watcher for state pool. #8223

howbazaar · 2017-12-14T03:30:32Z

Backport of 2.3 work to split up the txns.log watcher.

Back in the dawn of time, there was only a single model, and a single State instance. As the controller concept became a thing, we had additional State instances for each model.

A core aspect of how the event model works in Juju is watching the txns.log collection. As documents are added, updated, and removed by the transaction system, documents are added to the txns.log capped collection. Each of these documents refers to the documents that were touched in a single transaction. Each State instance has its own txn watcher, and each one of these tails the txns.log collection to watch for changes.

As the number of models in a single controller grows, so does the number of things watching the txns.log collection. If the controller is HA, there are n x m watchers, where n is the number of models and m is the number of API servers. This creates quite a bit of idle load on mongo once the number of models grows beyond a handful. As the number approaches 200, there is a LOT of load, especially when actions occur that trigger a bunch of transactions that touch a bunch of documents, such as removing an application, and particularly an application that has a bunch of subordinates.

So... this branch changes how the polling of the database is done. The state pool now has a watcher that publishes all document changes to a hub, which is also owned by the state pool. Whenever the state pool creates a new State instance, the base watcher for that State instance listens to the hub for changes rather than polling the database.

This drastically reduces the mongo load.
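
The arrangement described above can be sketched roughly as follows. This is only an illustration of the fan-out idea, not the code in this branch: `Change`, `Hub`, and `fanOut` are hypothetical names, and the hub here is a hand-rolled stand-in rather than the juju pubsub API. A single publisher (standing in for the one txns.log tailer) pushes each change to every subscriber (standing in for the per-State base watchers).

```go
package main

import (
	"fmt"
	"sync"
)

// Change is a hypothetical stand-in for one document change read from txns.log.
type Change struct {
	Collection string
	ID         string
	Revno      int64
}

// Hub is a minimal synchronous publish/subscribe hub (illustrative only).
type Hub struct {
	mu       sync.Mutex
	nextID   int
	handlers map[int]func(Change)
}

// NewHub returns an empty hub with no subscribers.
func NewHub() *Hub {
	return &Hub{handlers: make(map[int]func(Change))}
}

// Subscribe registers a handler and returns a function that removes it.
func (h *Hub) Subscribe(handler func(Change)) (unsubscribe func()) {
	h.mu.Lock()
	defer h.mu.Unlock()
	id := h.nextID
	h.nextID++
	h.handlers[id] = handler
	return func() {
		h.mu.Lock()
		defer h.mu.Unlock()
		delete(h.handlers, id)
	}
}

// Publish delivers a change to every handler subscribed at the time of the call.
func (h *Hub) Publish(c Change) {
	h.mu.Lock()
	handlers := make([]func(Change), 0, len(h.handlers))
	for _, fn := range h.handlers {
		handlers = append(handlers, fn)
	}
	h.mu.Unlock()
	// Handlers run outside the lock so they may subscribe or unsubscribe.
	for _, fn := range handlers {
		fn(c)
	}
}

// fanOut simulates one tailer publishing changes to per-model subscribers
// and returns how many changes each model observed.
func fanOut(models int, changes []Change) []int {
	hub := NewHub()
	seen := make([]int, models)
	for i := 0; i < models; i++ {
		i := i
		hub.Subscribe(func(Change) { seen[i]++ })
	}
	for _, c := range changes {
		hub.Publish(c)
	}
	return seen
}

func main() {
	changes := []Change{{"units", "mysql/0", 3}, {"machines", "0", 7}}
	fmt.Println(fanOut(3, changes)) // prints [2 2 2]
}
```

In this sketch the database would be read once per change regardless of how many models subscribe; only the in-memory delivery scales with the number of State instances.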

QA steps

I verified this by starting with a 2.2.6 controller, deploying 55 units to each of four models, and adding 700 empty models. This showed significant i/o load on the juju.txns.log collection.

After upgrading the controller, the load dropped significantly, from ~1000ms read / 10s to 15ms read / 10s.

Additional QA was done against EC2 with 3500 unit agents.

Bug reference

https://bugs.launchpad.net/juju/+bug/1727973
https://bugs.launchpad.net/juju/+bug/1733708
https://bugs.launchpad.net/juju/+bug/1737255

wallyworld

I'm assuming the hub watcher and txn watcher stuff has been reviewed elsewhere in the original PR?
If so, LGTM, if not, I'll need to look at those bits properly.

wallyworld · 2017-12-14T03:38:58Z

state/pool.go

+	// If systemState is nil, this is clearly a test, and a poorly
+	// isolated one. However now is not the time to fix all those broken
+	// tests.
+	if systemState != nil {


Can we invert this to avoid indentation and keep the code closer to what it should be without the workaround?
if systemState == nil {
	return pool
}

wallyworld · 2017-12-14T03:40:25Z

state/pool.go

 		systemState: systemState,
 		pool:        make(map[string]*PoolItem),
+		hub:         pubsub.NewSimpleHub(nil),
+	}
+	// If systemState is nil, this is clearly a test, and a poorly


Can we log a warning, so that if we accidentally introduce nil system state in production code, we at least get to see an artefact?
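
Taken together, the two suggestions on state/pool.go (invert the nil check, and log a warning) might look something like the sketch below. The `State` and `StatePool` types are hypothetical, stripped-down stand-ins, the `watching` field merely represents "the txn log watcher was started", and the standard library `log` package is used in place of whatever logger the real code uses.

```go
package main

import (
	"fmt"
	"log"
)

// State is a hypothetical stand-in for the real State type.
type State struct{ name string }

// StatePool is a hypothetical stand-in for the real pool.
type StatePool struct {
	systemState *State
	watching    bool // stands in for "the txn log watcher was started"
}

// NewStatePool returns early, with a warning, when systemState is nil,
// so the normal path stays unindented.
func NewStatePool(systemState *State) *StatePool {
	pool := &StatePool{systemState: systemState}
	if systemState == nil {
		// Expected only in poorly isolated tests; the warning leaves a
		// trace if it ever happens in production code.
		log.Println("WARNING: state pool created with nil system state; txn log watcher not started")
		return pool
	}
	pool.watching = true
	return pool
}

func main() {
	fmt.Println(NewStatePool(nil).watching)                        // prints false
	fmt.Println(NewStatePool(&State{name: "controller"}).watching) // prints true
}
```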

jameinel

Mostly approving this based on reviewing it against 2.3 and feeling like at a minimum, it is better than what we had before. There might still be possibilities of pushing filtering higher in the stack (so callbacks are invoked per thread, but filtering is done in, say, a single thread, rather than filter into callback).
Regardless, we need something like this, and this is a good first step.

jameinel · 2017-12-14T04:41:52Z

state/watcher/hubwatcher.go

+func (w *HubWatcher) flush() {
+	// syncEvents are stored first in first out.
+	for i := 0; i < len(w.syncEvents); i++ {
+		e := &w.syncEvents[i]


Do we allow for syncEvents to also grow during the loop? Should we put a comment to that effect?
Certainly this paradigm supports it changing.

Yes, adding comment.
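
To illustrate the point about the loop shape: an index-based loop re-reads the slice length on every iteration, so elements appended while the loop is running are visited as well. The sketch below uses hypothetical names (`event`, `flush`, `handle`) and is not the HubWatcher code; it copies each element by value rather than taking a pointer, since in this sketch `append` may reallocate the backing array.

```go
package main

import "fmt"

// event is a hypothetical queued notification.
type event struct{ id int }

// flush processes queued events first in, first out. len(queue) is
// re-evaluated on every iteration, so events that handle enqueues while
// the loop is running are processed in the same pass.
func flush(queue []event, handle func(event) []event) []int {
	var processed []int
	for i := 0; i < len(queue); i++ {
		e := queue[i] // copy: append below may reallocate the slice
		processed = append(processed, e.id)
		queue = append(queue, handle(e)...)
	}
	return processed
}

func main() {
	// Handling event 1 enqueues event 3; nothing else enqueues anything.
	handle := func(e event) []event {
		if e.id == 1 {
			return []event{{id: 3}}
		}
		return nil
	}
	fmt.Println(flush([]event{{id: 1}, {id: 2}}, handle)) // prints [1 2 3]
}
```

A `for ... range` loop over the same slice would evaluate the range expression once and so would stop after the original two events.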

howbazaar · 2017-12-14T08:28:52Z

$$merge$$

jujubot · 2017-12-14T08:30:11Z

Status: merge request accepted. Url: http://ci.jujucharms.com/job/github-merge-juju

howbazaar added 3 commits December 14, 2017 12:12

Introduce BaseWatcher interface.

55c14a8

Introduce a txns.log watcher in the state pool.

e99fb89

Fix some clocks.

43cae1a

wallyworld approved these changes Dec 14, 2017

jameinel approved these changes Dec 14, 2017

Tweaks following review.

8dfc091

jujubot merged commit edc8d6d into juju:2.2 Dec 14, 2017

howbazaar deleted the 2.2-txn-log-watcher-pubsub branch December 14, 2017 08:49
