-
Notifications
You must be signed in to change notification settings - Fork 224
checkpointer: implement state machine. #755
Conversation
coreosbot run e2e checkpointer |
cc @thorfour |
coreosbot run e2e checkpointer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@diegs This looks promising. Thanks for the update!
Would you like us to test this on coreos/tectonic-installer#1938 before merging here?
@mxinden i am going to test this against a tectonic cluster this morning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple minor comments, and somewhat quick pass (probably good to have another pair of eyes take a look at)
pkg/checkpoint/process.go
Outdated
state: stateNone{}, | ||
} | ||
} | ||
// Always overwrite with the localParentPod since it is a better source of truth. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better source of truth than what (and for what)? Do we not just use name/namespace (I wouldn't expect that to change).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's using this pod to write to disk so we want to make sure we have the most updated copy of it right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh. Duh.
pkg/checkpoint/process.go
Outdated
if !ok || isPodCheckpointer(inactiveCheckpoints[id], checkpointerPod) { | ||
glog.V(4).Infof("Should start checkpoint %s", id) | ||
start = append(start, id) | ||
// The apiserver pod is deleted, transition to stateInactiveGracePeriod. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the apiserver parent pod?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/checkpoint/process.go
Outdated
for k := range removeMap { | ||
if k == podCheckpointerID { | ||
continue | ||
// The apiserver pod is still deleted, remain in stateInactiveGracePeriod. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apiserver parent pod?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
b9d2b16
to
4544738
Compare
coreosbot run e2e checkpointer |
@aaronlevy thanks for the comments. Made a few small changes, also testing it against tectonic and it seems to be surviving apiserver destruction nicely |
coreosbot run e2e checkpointer |
pkg/checkpoint/manifest.go
Outdated
@@ -73,3 +69,11 @@ func writeManifestIfDifferent(path, name string, data []byte) (bool, error) { | |||
glog.Infof("Writing manifest for %q to %q", name, path) | |||
return true, writeAndAtomicRename(path, data, 0644) | |||
} | |||
|
|||
func writeAndAtomicRename(path string, data []byte, perm os.FileMode) error { | |||
tmpfile := filepath.Join(filepath.Dir(path), "."+filepath.Base(path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I notice that maybe we should not write the file under the directory.
Even with the prefix of ".", it will still be scanned by ReadDir()
in func getFileCheckpoints()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yifan-gu since the checkpointer is single-threaded getFileCheckpoints()
should not run concurrently. The only problem is if the program crashes in the middle of writeAndAtomicRename
and we have a leftover .foo.json
file sitting there.
I can adjust getFileCheckpoints()
to ignore files that start with .
, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my question is why we are not just using a system temp dir here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yifan-gu Atomic renames are only guaranteed in the same directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yifan-gu added some code in getFileCheckpoints()
to clean up old files that start with .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://godoc.org/os#Rename calls rename()
says 'OS-specific restrictions may apply when oldpath and newpath are in different directories. If there is an error, it will be of type *LinkError.'
If you look at http://man7.org/linux/man-pages/man2/rename.2.html, EXDEV
occurs if the old and new locations are on different filesystems (I've seen this happen before). It's safer to keep it in the same directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ioutil.TempFile would be a smart choice to create a temporary file within a certain directory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rphillips the only difference would be that it will choose a random rather than deterministic filename, as we would be calling it with the current directory and .
prefix, and then writing to it... I can make the change anyway though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it isn't a hard blocker... just a suggestion if we need to enumerate those files with a better prefix in the future...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh already changed it.
t.Errorf("For test: %s\nExpected start: %s Got: %s\nExpected stop: %s Got: %s\nExpected remove: %s Got: %s\n", | ||
tc.desc, tc.expectStart, gotStart, tc.expectStop, gotStop, tc.expectRemove, gotRemove) | ||
!reflect.DeepEqual(tc.expectRemove, gotRemove) || | ||
!reflect.DeepEqual(tc.expectGraceStart, gotGraceStart) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe just print them one by one? I mean:
if !reflect.DeepEqual() {
t.Errorf()
}
if !reflect.DeepEqual() {
t.Errorf()
}
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are multiple states this can get a little verbose. I prefer to keep it as is.
pkg/checkpoint/process.go
Outdated
return starts, stops, removes | ||
} | ||
|
||
// stateSelfCheckpoint represents a checkpoint of the checkpointer itself, which has special |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe define all these states in a separate file? this process.go
is getting big, I think it would be cleaner to just keep control logics here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/checkpoint/process.go
Outdated
} | ||
|
||
// state represents the current state of a checkpoint. | ||
type state interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was a little confused by the name at first because we also have apiState
, either change apiState
to something else like apiCondition
, or rename this to podState
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (both)
@yifan-gu thanks for the review! PTAL |
coreosbot run e2e checkpointer |
Thanks a lot @diegs, first time coreos/tectonic-installer#1938 passed green with this one 👍 |
pkg/checkpoint/state.go
Outdated
// behavior. | ||
// | ||
// stateSelfCheckpoint can transition to stateInactiveGracePeriod. | ||
type stateSelfCheckpoint struct{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: stateSelfCheckpointActive ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the direction, overall LGTM.
I hope to see some more unit tests that exercises all the transition path between those states in the future, not a blocker.
This implements an explicit state machine to capture the various states a checkpoint can be in. It recovers those states from disk upon startup (if applicable) and then reconciles them with the states that are fetched from the various apiservers. It also adds 2 new states to attempt to work around issues with kubernetes 1.8 in which daemonset pods are deleted before being recreated. For single-master clusters this can lead to a total outage during apiserver upgrades since the checkpointer will aggressively retire the checkpoint for the deleted apiserver pod before it has a chance to schedule the new one.
@yifan-gu thanks, I added a simple test to enforce only allowed transitions can occur. It doesn't actually check behavior (I was relying on the existing test for that) but I think it's a small improvement for now. |
coreosbot run e2e checkpointer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
If the checkpointer crashes during `writeAndAtomicRename()` then it may leave temporary files that start with `.`. This checks for and removes them. Also changes `writeAndAtomicRename()` to use `ioutil.TempFile()`.
coreosbot run e2e checkpointer |
coreosbot run e2e checkpointer |
This implements an explicit state machine to capture the various states
a checkpoint can be in. It recovers those states from disk upon startup
(if applicable) and then reconciles them with the states that are
fetched from the various apiservers.
It also adds 2 new states to attempt to workaround issues with
kubernetes 1.8 in which daemonset pods are deleted before being
recreated. For single-master clusters this can lead to a total outage
during apiserver upgrades since the checkpointer will aggressively
retire the checkpoint for the deleted apiserver pod before it has
a chance to schedule the new one.