Add the option of a shared txn watcher for state objects. #8153

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
4 participants
Owner

howbazaar commented Nov 29, 2017

Whenever juju creates a new state object, it starts a txn log watcher. This watcher polls the txns.log collection every five seconds. This watcher is the basis of most other document watchers.

Problems occur when we have many models. For example, prodstack had 184 models and 3 api servers. Every model was in every state pool, so there were 552 objects each with a worker that was polling the transaction log. When a spike of transactions were added, like in the removal of an application, the i/o load on the mongo database went through the roof as all the objects read the txns.log collection. This then caused slow downs on other documents, and caused juju status on an unrelated model go from half a second to 20 seconds.

This change uses the current state objects txn log watcher when creating a new state instance for the state pool.

QA steps

Deploy a bunch of models into a controller, and add units and machines. Relate things.

Documentation changes

No documentation change here, this is just an internal optimisation.

Bug reference

https://bugs.launchpad.net/juju/+bug/1733708

I think it looks ok overall.
It would be nice to test it with considerable number of models on the controller...

Also I am surprised that no existing unit tests were affected and no new unit tests were written...

Either way, the logic is better than before...

+ sharedTxnWatcher: txnWatcher,
+ }
+ if ws.sharedTxnWatcher == nil {
+ ws.StartWorker(txnLogWorker, func() (worker.Worker, error) {
@anastasiamac

anastasiamac Nov 29, 2017

Member

Don't you need to assign this to ws.sharedTxnWatcher?

Member

anastasiamac commented Nov 30, 2017

Actually, questions (with more coffee):

  • you are storing the watcher on the worker, what will happen if something happens to the watcher? All these workers will refer to an invalid/non-working watcher and will keep passing it around?...
  • Does the worker really care if it is using shared watcher or not? It seems to only care that it has a watcher.. So would it not be safer to (1) pass a watcher into a constructor; (2) check that the passed-in watcher is either non-nil or ok; (3) if it's usable, use it or if not create a new one. In other words, do not store the reference to it on the worker...
Member

axw commented Nov 30, 2017

My main concern with this change is that filters run inline in the watcher, and some of those filters are quite expensive; and so one busy model can starve another of events.

Aside from that, I don't like the idea of tying one State to another. I could live with it short term, but what I want to see (and started on, but didn't get too far with) is a "state.Manager" which manages all of the State objects, and their workers. The state.Manager would run a state/watcher, and provide it to each of the States. States would be tied to a state.Manager, but not to any other State.

Owner

howbazaar commented Nov 30, 2017

This approach is too naïve, and a restarted worker in the main state leaves others broken.
I will rework and resubmit.

@howbazaar howbazaar closed this Nov 30, 2017

Owner

jameinel commented Nov 30, 2017

I agree that we should be providing a factory for the watcher, instead of just a watcher, so that restarts, etc can still give you a valid watcher.
I'm less concerned about filters in one model causing starvation in others, though if we wanted to provide per-model consumers of the overall watcher that could be ok. (Especially if we start explicitly knowing that these only watch a given model, which means we only need to send possible events for that model.)
I do believe we have a few collections that are cross model (like modelusers, etc), which any State object can track. I don't know if we have explicit watchers on those collections.
Having a "generic" group that is not model specific, and then separate model-specific objects that run in separate goroutines and the core watcher only feeds them model-specific changes seems like a good breakdown. Though a fair bit more invasive than just sharing a single txnwatcher.

Owner

jameinel commented Dec 1, 2017

@howbazaar howbazaar deleted the howbazaar:shared-txnwatcher branch Dec 7, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment