Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Restart agent when txn log errors occur; provider configurable txn log size #7382
Conversation
|
What about after upgrades, etc. There are more times than just bootstrap where we will need to initialize all of the tables. |
| @@ -83,7 +84,10 @@ func (w *commonWatcher) commonLoop() { | ||
| defer wg.Done() | ||
| <-w.tomb.Dying() | ||
| if err := w.call("Stop", nil); err != nil { | ||
| - logger.Errorf("error trying to stop watcher: %v", err) | ||
| + // Don't log an error if a watcher is stopped due to an agent restart. | ||
| + if err.Error() != worker.ErrRestartAgent.Error() { |
wallyworld
May 25, 2017
Owner
Because the error here is a comes from across the wire and just contains a string message
| modelTag := names.NewModelTag(args.ControllerModelArgs.Config.UUID()) | ||
| - st, err := open(modelTag, args.MongoInfo, args.MongoDialOpts, args.NewPolicy, args.Clock, nil) | ||
| + if !names.IsValidModel(modelTag.Id()) { |
wallyworld
May 24, 2017
Owner
Not sure since you would expect the uuid to be validated elsewhere. But the code had this check already so I kept it.
| @@ -526,18 +558,13 @@ func newState( | ||
| } | ||
| }() | ||
| - // Set up database. | ||
| + schema := allCollections() |
| + current: make(map[watchKey]int64), | ||
| + request: make(chan interface{}), | ||
| + } | ||
| + if w.iteratorFunc == nil { |
howbazaar
May 24, 2017
Owner
This is quite ick.
Couldn't you instead keep New function taking one arg, but having a test method that provider an iterator override?
Having the testing arg part of the public interface is not nice.
wallyworld
May 24, 2017
Owner
We use this pattern elsewhere. But we also use a test New function also in other places. Matter of taste as to how one does DI. Doing it this way makes the dependency explicit. I can go either way.
| @@ -441,7 +464,15 @@ func (w *Watcher) sync() error { | ||
| } | ||
| } | ||
| if err := iter.Close(); err != nil { |
howbazaar
May 24, 2017
Owner
Isn't there an iter.Err() that we should be looking for, rather than just using Close?
| +type badIter struct { | ||
| + *mgo.Iter | ||
| + | ||
| + errorAfter *int |
howbazaar
May 24, 2017
Owner
Why a pointer? Since the interface is supported by a *badIter, you can just use a normal int here.
|
As @jameinel says, can we reconfigure the max txn log size after the controller has been set up? |
|
Upgrades is a good point. I still think it's bad to, regardless of context, do the collection setup each time a new state is created. eg state.ForModel() should not have to do it. I think it would be better to have the upgrade process explicitly redo the collections as part of the upgrade. Also, controller settings are immutable at the moment. If/when that changes, again, I'd rather be explicit about messing with the collections, rather than just doing it every time we open a state. |
|
I'm perfectly happy to have an explicit point where we do DB reconfiguration (be it Upgrades or Install, or even user initiated reset and restart). But I wanted to make sure we didn't forget things like upgrades. Especially wrt, how will things function if you upgrade to a version that now has this config, but never had it set. As far as initialization of collections, I'm actually more concerned that we'll add a collection that, say, has a really important second index, but because we aren't applying the DB schema, we forget to create that index when upgrading. Mongo does generally follow the rule of "ensure this index exists" rather than "create an explicit index" for this precise reason. You can always just ask to make sure the index you want exists. |
|
The latest revision ensures that each time the agent is restarted, we apply indices, create explicit collections etc. This covers the upgrade case, as well as not unnecessarily doing it each time we simply get a new state from the state pool etc. |
|
$$merge$$ |
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
|
Build failed: Tests failed |
wallyworld
added some commits
May 24, 2017
|
$$merge$$ |
|
Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju |
wallyworld commentedMay 24, 2017
Description of change
During scale testing, destroying many models all at once would cause the capped txn log to roll over from underneath the watcher iterator. The watcher would error and things would go pear shaped from there.
We now catch that error in the watcher and bubble it up as a fatal agent error. This causes the agent to restart. The restart has the effect of starting from a known point again.
A small refactoring was done as part of this work to only initialise the database collections during state initialisation at bootstrap time, rather than every time a state instance is opened. This was needed because it was only at init time that the controller config with the max txn log size is available, plus it is dumb to initialise the collections after initialisation.
QA steps
$ juju bootstrap aws --config max-txn-log-size=1M
$ juju controller-config
verify that the max-txn-log-size is shown as 1M
run juju debug-log
create many models and destroy all at once
verify that debug log shows the capped position lost error and the agent should be restarted error and then exits as the agent is killed
run juju debug-log again and see that the logs show the agent has restarted
$ juju bootstrap aws
$ juju controller-config
verify that the max-txn-log-size is shown as 10M
bootstrap with an older client and upgrade
$ juju controller-config
verify that the max-txn-log-size is shown as 10M
Documentation changes
There's a new controller config attribute - max-txn-log-size, defaults to 10M
// MaxTxnLogSize is the maximum size the of capped txn log collection, eg "10M"
MaxTxnLogSize = "max-txn-log-size"
Bug reference
https://bugs.launchpad.net/juju/+bug/1692792