Add the option of a shared txn watcher for state objects. #8153
anastasiamac
approved these changes
Nov 29, 2017
I think it looks ok overall.
It would be nice to test it with a considerable number of models on the controller...
Also, I am surprised that no existing unit tests were affected and no new unit tests were written...
Either way, the logic is better than before...
+ 		sharedTxnWatcher: txnWatcher,
+ 	}
+ 	if ws.sharedTxnWatcher == nil {
+ 		ws.StartWorker(txnLogWorker, func() (worker.Worker, error) {
Actually, questions (with more coffee):
My main concern with this change is that filters run inline in the watcher, and some of those filters are quite expensive, so one busy model can starve another of events. Aside from that, I don't like the idea of tying one State to another. I could live with it short term, but what I want to see (and started on, but didn't get too far with) is a "state.Manager" which manages all of the State objects and their workers. The state.Manager would run a state/watcher and provide it to each of the States. States would be tied to a state.Manager, but not to any other State.
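To make the ownership described above concrete, here is a minimal sketch, assuming a much-simplified model. `Manager`, `State`, `Watcher`, `NewManager`, and `Release` are hypothetical names for illustration and are not the real juju types; the sketch only shows States depending on the manager rather than on each other, and does not address the inline-filter starvation concern.

```go
package main

// Watcher is a hypothetical stand-in for a state/watcher instance.
type Watcher struct{ id int }

// Manager owns the single watcher and every State created through it.
type Manager struct {
	watcher *Watcher
	states  map[string]*State
}

// State holds a reference to its Manager only, never to another State.
type State struct {
	modelUUID string
	manager   *Manager
}

// NewManager starts the one watcher that all managed States will use.
func NewManager() *Manager {
	return &Manager{
		watcher: &Watcher{id: 1},
		states:  make(map[string]*State),
	}
}

// State returns the State for a model, creating it on first use.
func (m *Manager) State(modelUUID string) *State {
	if st, ok := m.states[modelUUID]; ok {
		return st
	}
	st := &State{modelUUID: modelUUID, manager: m}
	m.states[modelUUID] = st
	return st
}

// Release forgets one State; the watcher and the other States are untouched.
func (m *Manager) Release(modelUUID string) {
	delete(m.states, modelUUID)
}

// Watcher asks the manager for the watcher instead of caching it.
func (s *State) Watcher() *Watcher {
	return s.manager.watcher
}
```

Because each State reaches the watcher through the manager, releasing one model's State cannot leave another model without a watcher.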
This approach is too naïve, and a restarted worker in the main state leaves others broken.
howbazaar
closed this
Nov 30, 2017
I agree that we should be providing a factory for the watcher, instead of just a watcher, so that restarts, etc. can still give you a valid watcher.
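A rough sketch of the factory idea, under simplifying assumptions: `WatcherSource`, `Get`, and `Kill` are hypothetical names, and a `dead` flag stands in for a worker that has stopped. The point is that callers ask for the watcher each time instead of holding a pointer that may go stale after a restart.

```go
package main

import "sync"

// Watcher is a hypothetical stand-in for the txn log watcher worker.
type Watcher struct {
	generation int
	dead       bool
}

// WatcherSource hands out the current live watcher, replacing it on demand.
type WatcherSource struct {
	mu         sync.Mutex
	current    *Watcher
	generation int
}

// Get returns the live watcher, starting a replacement if the previous
// one has died or none has been started yet.
func (s *WatcherSource) Get() *Watcher {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.current == nil || s.current.dead {
		s.generation++
		s.current = &Watcher{generation: s.generation}
	}
	return s.current
}

// Kill marks the current watcher dead, simulating a worker failure.
func (s *WatcherSource) Kill() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.current != nil {
		s.current.dead = true
	}
}
```

A State that cached the result of the first `Get` would be left with a dead watcher after `Kill`, whereas a State that calls `Get` whenever it needs the watcher picks up the replacement.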
Also note the bug that Junien just filed. Our existing TXN watcher restart
code *doesn't* refresh the underlying Mongo session. So if the connection
dies, the TXN watcher goes into death throes, dying and restarting every 1s.
John
=:->
howbazaar commented Nov 29, 2017
Whenever juju creates a new state object, it starts a txn log watcher. This watcher polls the txns.log collection every five seconds. This watcher is the basis of most other document watchers.
Problems occur when we have many models. For example, prodstack had 184 models and 3 API servers. Every model was in every state pool, so there were 552 state objects, each with a worker that was polling the transaction log. When a spike of transactions was added, as in the removal of an application, the I/O load on the mongo database went through the roof as all the objects read the txns.log collection. This then caused slowdowns on other documents, and caused juju status on an unrelated model to go from half a second to 20 seconds.
This change uses the current state object's txn log watcher when creating a new state instance for the state pool.
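The effect on the number of pollers can be illustrated with a small sketch. This is not the code in this PR: `txnWatcher`, `state`, `newState`, and `countPollers` are hypothetical stand-ins, and the sketch assumes one pool per API server whose first state always starts its own watcher.

```go
package main

// txnWatcher is a hypothetical stand-in for the worker that polls txns.log.
type txnWatcher struct{}

// state is a simplified stand-in for a State object; only the watcher matters here.
type state struct {
	watcher *txnWatcher
}

// newState reuses the shared watcher when one is supplied and otherwise
// starts its own, incrementing *started so that pollers can be counted.
func newState(shared *txnWatcher, started *int) *state {
	if shared != nil {
		return &state{watcher: shared}
	}
	*started++
	return &state{watcher: &txnWatcher{}}
}

// countPollers fills one pool per API server with one state per model and
// returns how many watchers end up polling the transaction log.
func countPollers(models, servers int, share bool) int {
	started := 0
	for s := 0; s < servers; s++ {
		// The first state in each pool always starts its own watcher.
		primary := newState(nil, &started)
		for m := 1; m < models; m++ {
			var shared *txnWatcher
			if share {
				shared = primary.watcher
			}
			newState(shared, &started)
		}
	}
	return started
}
```

With the prodstack numbers, `countPollers(184, 3, false)` gives 552 pollers, while `countPollers(184, 3, true)` gives 3, one per API server.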
QA steps
Deploy a bunch of models into a controller, and add units and machines. Relate things.
Documentation changes
No documentation change here, this is just an internal optimisation.
Bug reference
https://bugs.launchpad.net/juju/+bug/1733708