Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Introduce a single txn log watcher for state pool. #8223
Conversation
howbazaar
added some commits
Dec 8, 2017
wallyworld
approved these changes
Dec 14, 2017
I'm assuming the hub watcher and txn watcher stuff has been reviewed elsewhere in the original PR?
If so, LGTM, if not, I'll need to look at those bits properly.
| systemState: systemState, | ||
| pool: make(map[string]*PoolItem), | ||
| + hub: pubsub.NewSimpleHub(nil), | ||
| + } | ||
| + // If systemState is nil, this is clearly a test, and a poorly |
wallyworld
Dec 14, 2017
Owner
Can we log a warning so that if we accidentally introduce nil system state in production code, we at least get to see an artefact
| + // If systemState is nil, this is clearly a test, and a poorly | ||
| + // isolated one. However now is not the time to fix all those broken | ||
| + // tests. | ||
| + if systemState != nil { |
wallyworld
Dec 14, 2017
Owner
can we invert this to avoid indentation and jeep the code closer to what it should be without the work around
if systemState == nil {
return pool
}
jameinel
approved these changes
Dec 14, 2017
Mostly approving this based on reviewing it against 2.3 and feeling like at a minimum, it is better than what we had before. There might still be possibilities of pushing filtering higher in the stack (so callbacks are invoked per thread, but filtering is done in, say, a single thread, rather than filter into callback).
Regardless, we need something like this, and this is a good first step.
| +func (w *HubWatcher) flush() { | ||
| + // syncEvents are stored first in first out. | ||
| + for i := 0; i < len(w.syncEvents); i++ { | ||
| + e := &w.syncEvents[i] |
jameinel
Dec 14, 2017
Owner
do we allow for syncEvents to also grow during the loop? Should we put a comment to that effect?
certainly this paradigm supports it changing.
|
$$merge$$ |
|
Status: merge request accepted. Url: http://ci.jujucharms.com/job/github-merge-juju |
howbazaar commentedDec 14, 2017
Backport of 2.3 work to split up txn.logs watcher.
Back in the dawn of time, there was only a single model, and a single State instance. As the controller concept became a thing, we had additional State instances for each model. A core aspect of how the event model works in Juju is watching of the txns.log collection. As documents are added, updated, and removed by the transaction system, documents are added to the txns.log capped collection. Each document refers to the documents that were touched a single transaction. Each State instance has its own txn watcher. Each one of these tails the txns.log collection to watch for changes. As the number of models grows in a single controller, so does the number of things watching the txns.log collection. If the controller is HA, then there is n x m watchers where n is the number of models and m is the number of API servers. This creates quite a bit of idle load on mongo as the number of models grows above a small number. As numbers approach 200, there is a LOT of load. Especially when actions occur that trigger a bunch of transactions that touch a bunch of documents, such as removing an application. Particularly an application that has a bunch of subordinates.
So... this branch changes how the polling of the database is done. The state pool now has a watcher, that publishes all document changes to a hub that is also owned by the state pool. Whenever the state pool creates a new State instance, the base watcher for that State instance listens to the hub for changes rather than polling the database.
This drastically reduces the mongo load.
QA steps
I verified this by starting with a 2.2.6 controller, deploying 55 units to each of four models, and adding 700 empty models. This showed significant i/o load on the juju.txns.log collection.
After upgrading the controller, the load dropped significantly. From ~1000ms read / 10s to 15ms read / 10s.
Additional QA was done against EC2 with 3500 unit agents.
Bug reference
https://bugs.launchpad.net/juju/+bug/1727973
https://bugs.launchpad.net/juju/+bug/1733708
https://bugs.launchpad.net/juju/+bug/1737255