Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trigger upgrade-series-prepare hook. #8864

Merged
merged 45 commits into from Jul 13, 2018

Conversation

ExternalReality
Copy link
Contributor

Description of change

When a upgrade-series prepare is initiated hooks need to be run.

@ExternalReality
Copy link
Contributor Author

notifyWatcher := watchertest.NewNotifyWatcherC(c, watcher, s.BackingState.StartSync)
defer notifyWatcher.AssertStops()

// Why do I have two initial events?!?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are apparently two initial events coming out of this puppy. Why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you find out why this is happening? I noticed the TestWatchActionNotificationsMoreResults() is testing that only 1 watch notification happens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is what I mean above. No I don't know why, its innocuous (since the Uniter will just loop twice), but I don't know why its sending two initial events.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Histerical raisins.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All watchers send an initial event to trigger the checking of initial state. Many times this isn't useful, and you'll see comments like "consume initial event".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, fixed now.

@ExternalReality ExternalReality force-pushed the ensure_upgrade_series_hooks_run branch from 7591243 to ff2f1e7 Compare July 2, 2018 08:00
@ExternalReality ExternalReality force-pushed the ensure_upgrade_series_hooks_run branch from 254edfe to be3d968 Compare July 2, 2018 19:58
@@ -984,6 +984,77 @@ func (u *UniterAPI) WatchActionNotifications(args params.Entities) (params.Strin
return common.WatchActionNotifications(args, canAccess, watchOne), nil
}

// WatchActionNotifications returns a StringsWatcher for observing the status of an Upgrade Series
func (u *UniterAPI) WatchUpgradeSeriesNotifications(args params.Entities) (params.NotifyWatchResults, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we can put these into an api shared between the uniter and the new upgrade series worker? Both need to WatchUpgradeSeriesNotifications(), UpgradeSeriesStatus() etc. Entities deal in tags, so we can get different data if necessary for machines and units.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is being moved into a new API as per the above, therefor there is no reason to bump version here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ExternalReality I'm not sure what you are meaning "as per the above". If we are adding methods to the uniter API, then we need to bump the facade version, even if they are brought in through a common implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@howbazaar, the above comment by @hmlanigan. The code is being moved out of the UniterAPI entirely in the next PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally we have a facade per-worker, and while they might both embed a common.Methods, they would be separate grouping of methods.

state/machine.go Outdated
@@ -2205,29 +2205,62 @@ func (m *Machine) RemoveUpgradeSeriesLock() error {

// UpgradeSeriesStatus returns the status of a series upgrade.
func (m *Machine) UpgradeSeriesStatus() (string, error) {
return "preparing", nil
coll, closer := m.st.db().GetCollection(machineUpgradeSeriesLocksC)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move 2208-2215 into a helper method

state/machine.go Outdated
if err != nil {
return "", errors.Trace(err)
}
s := fmt.Sprintf("%d", lock.PrepareUnits[0].Status)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest we rename to UpgradeSeriesPrepareStatus() and return the lock.PrepareStatus, and have a new Unit.UpgradeSeriesPrepareStatus() for the individual units?

state/machine.go Outdated
@@ -2205,29 +2205,62 @@ func (m *Machine) RemoveUpgradeSeriesLock() error {

// UpgradeSeriesStatus returns the status of a series upgrade.
func (m *Machine) UpgradeSeriesStatus() (string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why pass back a string? the two status types are typed.

state/machine.go Outdated
return "", errors.Trace(err)
}
s := fmt.Sprintf("%d", lock.PrepareUnits[0].Status)
return s, nil
}

// SetUpgradeSeriesStatus sets the status of a series upgrade.
func (m *Machine) SetUpgradeSeriesStatus(unitName string, status string) (string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetUpgradeSeriesPrepareUnitStatus()?

@ExternalReality ExternalReality changed the title WIP: Trigger upgrade-series-prepare hook. Trigger upgrade-series-prepare hook. Jul 7, 2018
UnitStarted UnitSeriesUpgradeStatus = "Started"
UnitErrored UnitSeriesUpgradeStatus = "Errored"
UnitCompleted UnitSeriesUpgradeStatus = "Completed"
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These constants may eventually get replaced with int iota for ordering. This would require a stringer a custom BSON marshaling. Of course, ordering can be done in other ways too.

@ExternalReality ExternalReality force-pushed the ensure_upgrade_series_hooks_run branch from d9ae88d to 1da751e Compare July 9, 2018 00:03
@ExternalReality ExternalReality force-pushed the ensure_upgrade_series_hooks_run branch from 1da751e to afd2b0e Compare July 9, 2018 00:05
// First we create the prepare lock or the required state will not exists
s.CreateUpgradeSeriesLock(c)

err := s.apiUnit.SetUpgradeSeriesStatus(params.UnitStarted)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps use UnitCompleted instead, as UnitStarted is now the "default"

@hmlanigan
Copy link
Member

LGTM, waiting for changes we discussed and review by @jameinel

}

// UpgradeSeriesStatus returns the upgrade series status of a unit from remote state
func (u *Unit) UpgradeSeriesStatus() (params.UnitSeriesUpgradeStatus, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't return any params structs or types from the API package.

If they really need their own type, I'd be tempted to make a core package. Alternatively keep the type in the same api package as the method.

}

// UpgradeSeriesStatus sets the upgrade series status of the unit in the remote state
func (u *Unit) SetUpgradeSeriesStatus(status params.UnitSeriesUpgradeStatus) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Both the args and return values from API calls should not expose the params package.

notifyWatcher := watchertest.NewNotifyWatcherC(c, watcher, s.BackingState.StartSync)
defer notifyWatcher.AssertStops()

// Why do I have two initial events?!?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Histerical raisins.

notifyWatcher := watchertest.NewNotifyWatcherC(c, watcher, s.BackingState.StartSync)
defer notifyWatcher.AssertStops()

// Why do I have two initial events?!?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All watchers send an initial event to trigger the checking of initial state. Many times this isn't useful, and you'll see comments like "consume initial event".

func (s *unitSuite) CreateUpgradeSeriesLock(c *gc.C, additionalUnits ...string) {
unitNames := additionalUnits
unitNames = append(unitNames, s.wordpressUnit.Name())
series := "trust"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to add validation that the series mentioned is one that juju knows about?

package params

//MachineSeriesUpgradeStatus is the current status a machine series upgrade
type MachineSeriesUpgradeStatus string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps these constants would be better suited in core/model package?

state/machine.go Outdated
Id string `bson:"machineid"`
ToSeries string `bson:"toseries"`
FromSeries string `bson:"fromseries"`
PrepareStatus params.MachineSeriesUpgradeStatus `bson:"preparestatus"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned above, core/model would be a good place. But I'm not sure we need a type. A string with validation functions at the entry points is almost certainly fine.

err := m.st.db().Run(buildTxn)
if err != nil {
err = onAbort(err, ErrDead)
logger.Errorf("cannot set series upgrade status for unit %q of machine %q: %v", unitName, m.Id(), err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally you either log or return an error, not both.

if err := m.isStillAlive(); err != nil {
return nil, errors.Trace(err)
}
coll, closer := m.st.db().GetCollection(machineUpgradeSeriesLocksC)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should use UpgradeSeriesStatus above to short circuit the check.

unitStatus, err := m.UpgradeSeriesStatus(unitName)
if err != nil {
  return nil, errors.Trace(err)
}
if unitStatus == status {
return nil, jujutxn.ErrNoOperations
}
// pass in the current status to assert against, and the status to set it to.
return setUpgradeSeriesTxnOps(m.Id(), unitStatus, status), nil
}

state/machine.go Outdated
Id: machineDocId,
Assert: txn.DocExists,
Update: bson.D{
{"$set", bson.D{{unitField, status}}}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be better to have the unit fields keyed off the unit tag rather than an offset. This would make it easier to make the sested set. Also you should assert that the field matches the expected value.

Copy link
Member

@jameinel jameinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely looks to be progressing well, I think the fix for 2 Next events is easily fixable.

defer notifyWatcher.AssertStops()

// Why do I have two initial events?!?
_, ok := <-notifyWatcher.Watcher.Changes()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bare channel operation is almost always dangerous, as it can block indefinitely (and then you only find out when the test suite is killed after 25min).

However, getting 2 events is even worse, as it throws off everyone's expectations.
I don't know exactly how your code is initialized, but I'll note that the NotifyWatcher's that I know of don't transmit an event over-the-wire for the first event. They intentionally swallow the first event on the server side, and trust that the client side knows its a NotifyWatcher and thus triggers the event on the client side.
I'm guessing you have the same "always trigger an event" on your client side, but don't have the "swallow the initial event" on the server side.

c.Assert(err, jc.ErrorIsNil)

// A notification that the document was removed (i.e. the lock was released)
notifyWatcher.AssertOneChange()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other than the initial double event, this test looks good.

}

func (s *unitSuite) TestUpgradeSeriesStatus(c *gc.C) {
// First we create the prepare lock
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth checking the pre-condition of s.apiUnit.UpgradeSeriesStatus() before we create the lock? So that we see the value changes to Started?

It seems odd that you have this test, which says the status is Started, even though there isn't a call to SetUpgradeSeriesStatus(UnitStarted).
Shouldn't the status of a unit start in something like Pending?

If we want to just have "started" as the initial state, then I'd say we don't need a test that you can set the state to Started (drop TestSetUpgradeSeriesStatus, or change it to set the status to Completed)
or clarify that we just treat setting the status to Started as a no-op?
Do we support going from Completed back to Started? Or should that be considered an error? (I'd tend toward the latter, though re-running a pre upgrade step is something we've discussed)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestUpgradeSeriesStatus tests that a call to the api gets the current remote state - regardless of what it is.

It just so happens that the remote state starts in the UnitStartedState. When you create a lock, an upgrade is considered started. The uniter see's that and syncs its local state accordingly, so the initial states are as follows:

Uniter: UnitNotStarted
Controller: UnitStarted (as soon as a lock is created)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but correct, lets test our assumption! We will put in a check of the pre-condition. I only note it in a comment.

c.Assert(err, jc.ErrorIsNil)

// Creating a lock for the machine transitions all units to started state
s.CreateUpgradeSeriesLock(c, unit2.Name())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that this syntax is correct, as it is a locally defined function that knows about the various machines/units in play, and only creates a lock for s.wordpressMachine and takes additional unit names to include.
I will say that the signature of the method is confusing, as just reading through the code looks like you're taking an UpgradeSeriesLock for unit2, which isn't even a machine.
Maybe its ok, but it does mean I originally had a comment about correctness, and had to come back and reevaluate.
Maybe an issue is that we are using exactly the same function name, so I was expecting a similar signature? CreateTestUpgradeSeriesLock ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this invocation also pass in a unit name? It isn't clear to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its because the Suite function takes a list of "additional units to include in the preparation"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It compares the passed in values with the known values, and thus if someone has added a machine then we can't start the update. The user must confirm again in light of the new information, that is, a newly created machine. Its just used for validation.

@@ -984,6 +984,111 @@ func (u *UniterAPI) WatchActionNotifications(args params.Entities) (params.Strin
return common.WatchActionNotifications(args, canAccess, watchOne), nil
}

// WatchActionNotifications returns a StringsWatcher for observing the status of an Upgrade Series
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc string: WatchActionNotifications creates a NotifyWatcher

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bah, the copy pasta monster - or creepy pasta - depending on which way you want to look at it.

@@ -121,7 +121,7 @@ var debugHooksTests = []struct {
}, {
info: `invalid hook`,
args: []string{"mysql/0", "invalid-hook"},
error: `unit "mysql/0" contains neither hook nor action "invalid-hook", valid actions are [fakeaction] and valid hooks are [collect-metrics config-changed install juju-info-relation-broken juju-info-relation-changed juju-info-relation-departed juju-info-relation-joined leader-deposed leader-elected leader-settings-changed meter-status-changed server-admin-relation-broken server-admin-relation-changed server-admin-relation-departed server-admin-relation-joined server-relation-broken server-relation-changed server-relation-departed server-relation-joined start stop update-status upgrade-charm]`,
error: `unit "mysql/0" contains neither hook nor action "invalid-hook", valid actions are [fakeaction] and valid hooks are [collect-metrics config-changed install juju-info-relation-broken juju-info-relation-changed juju-info-relation-departed juju-info-relation-joined leader-deposed leader-elected leader-settings-changed meter-status-changed post-series-upgrade pre-series-upgrade server-admin-relation-broken server-admin-relation-changed server-admin-relation-departed server-admin-relation-joined server-relation-broken server-relation-changed server-relation-departed server-relation-joined start stop update-status upgrade-charm]`,
}, {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list is getting long enough, I have to question if it is being useful vs having something you can run to find out the list...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just a regex unit "mysql/0" contains neither hook nor action "invalid-hook", valid actions are .*. Unless we don't trust the list generation and want to test it here, I'd say we don't need the list at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its more a signal that we're not giving a helpful message to the user if we're drowning them in a list that wraps multiple times.
Occasionally validating long lists is good, as it makes sure things are wired up/listed

state/machine.go Outdated
}
return &upgradeSeriesLock{
Id: m.Id(),
ToSeries: toSeries,
FromSeries: m.Series(),
PrepareStatus: NotStarted,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why default to Started before the unit has noticed that it needs to run the hook? is it just simpler and more straightforward elsewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my mind there are two definitions possible for "Upgrade Series Started". These are:
An upgrade starts when a user performs juju upgrade-series prepare
Alternatively
An upgrade starts when the pre-series-upgrade hook runs.

There former is by far the easiest to implement - I went occam's razor - hopefully I don't get cut.
So by definition the Lock is initialized with machine & units in the started state.

state/machine.go Outdated
Assert: isAliveDoc,
},
{
C: machineUpgradeSeriesLocksC,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would we want to take the unit id and assert that
prepareunits.%d.id
is what we expect?
making a change to the "nth" entry in an array always seems a bit concerning without it. I realize the array shouldn't be changing order, but certainly I would think we want to associate the status to the Id rather than the offset.

@@ -170,7 +170,7 @@ func (w *RemoteStateWatcher) CommandCompleted(completed string) {
}

func (w *RemoteStateWatcher) setUp(unitTag names.UnitTag) (err error) {
// TODO(dfc) named return value is a time bomb
// TODO(dfc) named return value is a time bomb (externalreality) I second the notion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any reason we can't just remove 'err error' and change it to 'error' ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a hesitance to dabble. Will change.

@@ -346,6 +346,7 @@ func (u *Uniter) loop(unitTag names.UnitTag) (err error) {
localState := resolver.LocalState{
CharmURL: charmURL,
CharmModifiedVersion: charmModifiedVersion,
UpgradeSeriesStatus: params.UnitNotStarted,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only place where I've seen UnitNotStarted, is this exposed somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the Uniter API starts in UnitNotStarted state. Yet, the act of creating a lock indicates that the user wants to start the process of upgrading the units. Therefore the controller's corresponding remote state is create in the core.UnitStarted state. This will cause the hook to fire once the user issues a juju upgrade-series prepare. In turn the Uniter's status will be synced.

defer notifyWatcher.AssertStops()

// Why do I have two initial events?!?
_, ok := <-notifyWatcher.Watcher.Changes()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! Yes, the other watchers do consume an initial event on the server side - subsequently, on the client side, the additional event is not observed. Thank you @jameinel.

Copy link
Member

@hmlanigan hmlanigan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if @jameinel or @howbazaar approve.

Copy link
Contributor

@howbazaar howbazaar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the whole this is very good, but I have some conerns around the document structure stored in the db.

Generally we like separating words for bson with hyphens. It just makes the direct db json docs easier to read in the mongo shell.

// UpgradeSeriesStatus sets the upgrade series status of the unit in the remote state
func (u *Unit) SetUpgradeSeriesStatus(status string) error {
var results params.ErrorResults
args := params.SetUpgradeSeriesStatusParams{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of the bulk calls isn't to apply a particular value to multiple targets, but instead to be able to process multiple actions in a single call. I think this is why the param struct this way feels weird. The bulk nature of the SetUpgradeSeriesStatus should be to take pairs of agents and status values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

// assigned units.
status, err := s.apiUnit.UpgradeSeriesStatus()
c.Assert(err, jc.ErrorIsNil)
c.Assert(status, gc.Equals, string(model.UnitStarted))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just reading this test leaves me wondering what model.UnitStarted means. Do you think the constant would be more clear to say model.SeriesUpgradeStarted?
We know that it is for a unit, because we asked on a unit. It seems that a machine might also have an UpgradeSeriesStatus, in which case the constant shouldn't really refer to units.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have two sets of constants defined in the same package; one for the update status of the machine and the other for units. We can't have a name like "UpgradeSeriesStarted" for both because go would complain about redundant names. We can't use the same sets for both Units and Machines because they have different upgrade series states.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to have different types for Machine vs Unit? It does make it simpler if it is unified, but if there is a good reason to leave them as separate types that can be ok.

}

func (s *unitSuite) TestSetUpgradeSeriesStatusFailsIfNoLockExists(c *gc.C) {
arbitaryStatus := string(model.UnitNotStarted)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again with the unit mention. See above comment.

arbitaryStatus := string(model.UnitNotStarted)

err := s.apiUnit.SetUpgradeSeriesStatus(arbitaryStatus)
c.Assert(err, gc.ErrorMatches, "Machine \"[0-9]*\" is not locked for upgrade")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really locked or just marked? Using the terminology of locked has me thinging that I need to use commands to lock and unlock the machine, whereas that isn't what we are doing. This is more just commentary rather than expecting change but it seems to just grate on me a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it does lock the machine in the sense that other "threads/users/whathaveyou" will be blocked from a defined set of operations while the machine and its subordinates are being upgraded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in fact a lock of a kind. In the future it will block actions

c.Assert(err, jc.ErrorIsNil)

// Creating a lock for the machine transitions all units to started state
s.CreateUpgradeSeriesLock(c, unit2.Name())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this invocation also pass in a unit name? It isn't clear to me.

state/machine.go Outdated
PrepareUnits []unitStatus `bson:"prepareunits"`
CompleteStatus MachineSeriesUpgradeStatus `bson:"completestatus"`
CompleteUnits []unitStatus `bson:"completeunits"`
Id string `bson:"machineid"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Id is a bit confusing when compared with the implicit _id mongo field. Perhaps "MachineID"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue is that the real Machine doc has this bug, but I think thats mostly historical-reasons and I agree its better to use Id => _id and something else for any other id.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

state/machine.go Outdated
Id string `bson:"machineid"`
ToSeries string `bson:"toseries"`
FromSeries string `bson:"fromseries"`
PrepareStatus model.MachineSeriesUpgradeStatus `bson:"preparestatus"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this structure very confusing. Also I think that recording the time things happened would also be useful.

What do you think of something like this:

type upgradeStatus struct {
    Value   model.SeriesUpgradeStatus `bson:"value"`
    Timestamp  int64  `bson:"timestamp"`
}


type upgradeSeriesLockDoc struct {
  MachineID  string `bson:"machine-id"` // note the hyphens for readiblity
  ToSeries string  `bson:"to-series"`
  FromSeries string `bson:"from-series"`
  MachineStatus upgradeStatus `bson:"machine-status"`
  UnitStatus map[string]upgradeStatus  `bson:"unit-status"`
}

You could then have a simple function that creates an upgradeStatus sub-document from a status value, and have it automatically populate the timestamp with the state's clock.

Setting the status for a particular unit is a single set call.

Even if we don't immediately expose the timestamps, it would be very useful for debugging.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of timestamps, and its something we've done for other status fields.
I'm the one that asked for a slice instead of a map, but I can admit fault in that the complexity around dealing with a slice doesn't outweight the benefit of not putting arbitrary data as BSON attribute names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a Trello Card for adding a time stamp. I certainly do agree. However the PR is getting quite long in the tooth and scope creep is starting to set in. This PR was initially meant to be really tiny - it was meant to detect a lock has been created and fire the hook. I feel that much of the small errors that we are seeing are due to the size of the PR coupled with the rapid pace of development.

case hooks.PreSeriesUpgrade:
logger.Debugf("completing pre upgrade series hook. updating state of series upgrade.")
err = rh.callbacks.SetUpgradeSeriesStatus(model.UnitCompleted)
// Does the unit status need to be set to something here?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think the unit status needs to be set there, it should be caught further up.

Where do we set the error state for pre-series-upgrade?

case _, ok := <-upgradeSeriesw.Changes():
logger.Debugf("got upgrade series change")
if !ok {
return errors.New("upgrades series watcher closed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who is going to close this? And is it really an error? The other watchers don't follow this pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop where this code lives has 12 other watchers that follow this pattern. They all return a similar error. The loop function is a catacomb worker and the function's watchers are bound the life cycle of that catacomb using the catacomb's Add function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is an appropriate safeguard that the event we got was an actual event rather than just a closed channel.
I agree with Eric that the other watchers are following a similar pattern. It might just be that the very previous watcher wasn't doing this that made it seem to be a different pattern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, just saw the one above. Happy to have consistency here.

if err != nil {
return err
}
status, err := model.ValidateUnitSeriesUpgradeStatus(rawStatus)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should assume that the api server will only ever give us valid status types. Otherwise we end up double guessing ourselves all over the place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to generate an err than a panic(), right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a string, it can't panic.

Copy link
Contributor

@howbazaar howbazaar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any reason not to land this now and deal with other issues in follow up branches.

I do however still feel strongly that the underlying doc in the database needs to be looked at again.

@ExternalReality
Copy link
Contributor Author

!!build!!

@ExternalReality
Copy link
Contributor Author

$$MERGE$$

@jujubot jujubot merged commit 3560f1e into juju:develop Jul 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants