cmd/snap-update-ns: add execWritableMimic #4315

Merged
merged 7 commits into from Jan 3, 2018

Conversation

Projects
None yet
4 participants
Contributor

zyga commented Nov 28, 2017

This patch adds a function that executes a plan for a writable mimic and
constructs one and returns an "undo plan" that contains simplified view
of the changes that are suitable for undo.

Signed-off-by: Zygmunt Krynicki zygmunt.krynicki@canonical.com

codecov-io commented Nov 28, 2017

Codecov Report

❗️ No coverage uploaded for pull request base (master@8bb4282). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #4315   +/-   ##
=========================================
  Coverage          ?   78.08%           
=========================================
  Files             ?      449           
  Lines             ?    30997           
  Branches          ?        0           
=========================================
  Hits              ?    24203           
  Misses            ?     4780           
  Partials          ?     2014
Impacted Files Coverage Δ
cmd/snap-update-ns/utils.go 100% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8bb4282...06859a2. Read the comment docs.

cmd/snap-update-ns: add execWritableMimic
This patch adds a function that executes a plan for a writable mimic and
constructs one and returns an "undo plan" that contains simplified view
of the changes that are suitable for undo.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>

@zyga zyga changed the title from cmd/snap-update-ns: add code that executes a plan for writable mimic to cmd/snap-update-ns: add execWritableMimic Nov 29, 2017

cmd/snap-update-ns/utils.go
@@ -362,6 +362,93 @@ func planWritableMimic(dir string) ([]*Change, error) {
return changes, nil
}
+// execWritableMimic executes the plan for a writable mimic.
+// The result is a transformed mount namespace and a set of fake mount changes
@bboozzoo

bboozzoo Nov 29, 2017

Contributor

Got slightly confused with the wording, or maybe I'm just missing a wider context here. The idea is that the function returns a list of changes that need to be 'undone' when cleaning up. Did I get this right?

@zyga

zyga Nov 29, 2017

Contributor

A list of changes that can be handled by the undo algorithm.

@bboozzoo

bboozzoo Nov 29, 2017

Contributor

Got it. Thanks for explaining.

+ if i == 0 || i == len(plan)-1 {
+ // Don't represent the initial and final changes in the undo plan.
+ // The initial change is the safe-keeping bind mount, the final
+ // change is the safe-keeping unmount.
@bboozzoo

bboozzoo Nov 29, 2017

Contributor

Does that warrant a check len(plan) >= 2 somewhere at the top of the function?

@zyga

zyga Nov 29, 2017

Contributor

I was thinking about it but I'm more inclined to just slap the two together in one function. They were split for testing more than anything. I don't want to add lots of checks for the structure that would be irrelevant if this was one function.

cmd/snap-update-ns/utils.go
+ if _, err2 := changePerform(recoveryUndoChange); err2 != nil {
+ // Drat, we failed when trying to recover from an error.
+ // We cannot do anything at this stage.
+ panic(fmt.Errorf("cannot undo change %q while recovering from earlier error %v: %v", recoveryUndoChange, err, err2))
@bboozzoo

bboozzoo Nov 29, 2017

Contributor

I'm not convinced that panicking here is the right thing to do. Perhaps passing an error further up the stack would be a better solution. You should have more context there and can produce something more informative than this error and a stack trace.

Since it's a 'fatal-needing-a-panic error' this might warrant a separate type so that it's distinguishable.

type FatalError struct {
    error
}

func IsFatalError(err error) bool {
    _, ok := err.(*FatalError)
    return ok
}

func NewFatalError(err error) *FatalError {
    return &FatalError{error: err}
}

working example: https://play.golang.org/p/nMTpTDMbHl

@zyga

zyga Nov 29, 2017

Contributor

Well, the caller up the stack will just error (which is what this will do). I can make the transformation but really, at this stage we cannot do anything useful.

@zyga

zyga Dec 5, 2017

Contributor

Done

zyga added some commits Nov 30, 2017

cmd/snap-update-ns: don't panic, return FatalError
This has little semantic difference but allows the callers to handle the
error (even if just to terminate the process) in a more standard way.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>

Looks very good, just a few minor comments, mostly questions. Thanks for extensive comments!

cmd/snap-update-ns/utils.go
+ if _, err := changePerform(change); err != nil {
+ recoveryUndoChanges := make([]*Change, 0, len(undoChanges)+1)
+ if i > 0 {
+ // The undo plan doesn't contain the entry for the initial bind
@stolowski

stolowski Jan 3, 2018

Contributor

Question: why don't we just include an entry for initial bind in the undo plan and get rid of special casing here?

@zyga

zyga Jan 3, 2018

Contributor

Because the initial bind is temporary and doesn't exist when the plan is over. In either case we are looking at some special casing and I choose to do it on the error path where it felt more natural.

cmd/snap-update-ns/utils.go
+ }
+ recoveryUndoChanges = append(recoveryUndoChanges, undoChanges...)
+
+ // Drat, we failed! Let's undo everything according to our own undo
@stolowski

stolowski Jan 3, 2018

Contributor

I find this comment slightly misplaced, I think it belongs a bit above - around if _, err := changePerform(change); err != nil {

@zyga

zyga Jan 3, 2018

Contributor

Yep. Moved now.

+ if _, err2 := changePerform(recoveryUndoChange); err2 != nil {
+ // Drat, we failed when trying to recover from an error.
+ // We cannot do anything at this stage.
+ return nil, &FatalError{error: fmt.Errorf("cannot undo change %q while recovering from earlier error %v: %v", recoveryUndoChange, err, err2)}
@stolowski

stolowski Jan 3, 2018

Contributor

Naive question: I understand we have a chain of changes that depend on predecessors, so if something fails we cannot fully recover anymore; but at the same time (AFAIR) we may have some changes that are independent (e.g. multiple mounts?), wouldn't it make sense to just try undo all mount actions for example to clean as much as possible (and return the error(s) anyway of course)?

@stolowski

stolowski Jan 3, 2018

Contributor

Also.. Is there a possibility of a transient failure (e.g. a mountpoint being held by a running process?), and a retry later would succeed? Is this something we should be worried about here?

@zyga

zyga Jan 3, 2018

Contributor

There are no running processes that can interfere with things in any way here. First of all we freeze everything so we can be confident that no ongoing mutations occur. Second of all we are constructing a mirror of something that we just mostly read. The final operations that replace the original cannot be stopped by any open files as we simply mount a tmpfs over existing files.

The error path here is really just about apparmor profile being broken. In normal cases where the code is correct it would never happen. If apparmor gets in the way all bets are off (it has the power to prevent us from undoing everything, for example). While I coded this defensively the idea is that it would never fail as a course of normal operation.

Independent changes are handled at a layer above this and this is indeed what we do, if something fails we just carry on with other changes that don't strongly depend on the things we failed with (I think we may even just try all the things, I need to check).

@stolowski

stolowski Jan 3, 2018

Contributor

Great, thanks for explanation!

+ // change is the safe-keeping unmount.
+ continue
+ }
+ if kind, _ := change.Entry.OptStr("x-snapd.kind"); kind == "symlink" {
@stolowski

stolowski Jan 3, 2018

Contributor

I don't think we have a test to check that this is excluded from the undo plan, do we? Can you extend the test below if this is the case?

@zyga

zyga Jan 3, 2018

Contributor

It was meant to be handled by
TestExecWirableMimicErrorWithRecovery but I put things in the wrong order. Fixed now.

zyga added some commits Jan 3, 2018

cmd/snap-update-ns: move comment around
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
cmd/snap-update-ns: fix golint
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
cmd/snap-update-ns: test symlink undo plan
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>

Thank you!

@zyga zyga merged commit 55648cb into snapcore:master Jan 3, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@zyga zyga deleted the zyga:feature/exec-writable-mimic branch Jan 3, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment