boot: cmd/snap-bootstrap: handle a candidate recovery system v2 #9940

bboozzoo · 2021-02-18T07:47:39Z

Supersedes #9928

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

…om initramfs Add helpers for checking whether the current recovery system is being tried and marking its state. Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

…rdingly When we 'try' a new recovery system, make sure it boots up to the point where snap-bootstrap has set up the system for further execution in the after-pivot world. This ensures that the try system is functional up to that point, but does not execute any services from it in case those would alter the system state. We also verify that ubuntu-data is mounted and accessible, and as such the new recovery system should be functional if fully booted. Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

…being removed Auto labeler seems to be removing the labels unexpectedly, see: - snapcore#9940 (comment) label was added manually when opening a PR and later removed by bot - snapcore#9943 (comment) modified files match the label, but labeler removed 'Run nested' label Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

…t' into bboozzoo/uc20-recovery-mgmt-sb-try-handling-v2-wip

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

…t' into bboozzoo/uc20-recovery-mgmt-sb-try-handling-v2

…em and back Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

…overy-mgmt-sb-try-handling-v2

…est' into bboozzoo/uc20-recovery-mgmt-sb-try-handling-v2

anonymouse64

A couple comments, I have not finished reviewing the tests yet

anonymouse64 · 2021-03-04T02:47:37Z

cmd/snap-bootstrap/cmd_initramfs_mounts.go

+	// do we even have ubuntu-data?
+	haveData := m.degradedState.partition("ubuntu-data").FindState == partitionFound
+
+	if m.noFallback && (!haveData || assumeEncrypted) {


This logic is a bit confusing to me. What is the purpose of the || assumeEncrypted) part?

Also, I think this may warrant at least an explanation in the README about the behavior, if not an update to the state machine diagram (though I don't think there are any new transitions, just an additional exit path).

I don't understand why we have a && part here at all? which scenario are we considering? I'm not understanding the comment below

Those are the 2 possible scenarios where we want to exit early:

1. The system is encrypted (assumeEncrypted == true), so using the fallback key is unwarranted, whatever the reason that led here (unlocking data or save with the run key must have failed, or data is not mounted).

2. We could not find ubuntu-data (!haveData == true), so we don't know whether the system is encrypted or not. The code would then try the save fallback key, but we don't want that either.

I have left out the case of the system being unencrypted and data failing to mount, as I assume this is part of the health check, which basically tells whether the relevant data is there (in this case it's state.json). Though to make it consistent in all cases, the check could verify that data is actually mounted and block this path early too. Does that make sense?

As discussed in a quick chat, I'll try to prepare a little cleanup of the state functions around this part so that it is clear whether we are following the encrypted or unencrypted device code paths.
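The two early-exit scenarios listed in this thread can be written out as a small truth table. This is only a sketch: shouldExitEarly is a hypothetical helper, and its three booleans stand in for m.noFallback, haveData and assumeEncrypted from the diff. It is not the code that landed.

```go
package main

import "fmt"

// shouldExitEarly is a hypothetical helper that mirrors the condition under
// review: with the fallback disabled, bail out when ubuntu-data was not found
// or when the system is assumed to be encrypted.
func shouldExitEarly(noFallback, haveData, assumeEncrypted bool) bool {
	return noFallback && (!haveData || assumeEncrypted)
}

func main() {
	cases := []struct {
		note                                  string
		noFallback, haveData, assumeEncrypted bool
	}{
		{"encrypted, fallback key unwarranted", true, true, true},
		{"ubuntu-data not found", true, false, false},
		{"unencrypted, data found (left to the health check)", true, true, false},
		{"fallback allowed", false, false, true},
	}
	for _, tc := range cases {
		fmt.Printf("%-52s exit early: %t\n", tc.note,
			shouldExitEarly(tc.noFallback, tc.haveData, tc.assumeEncrypted))
	}
}
```

The third case is the one the reply above deliberately leaves to the health check rather than to this condition.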

anonymouse64 · 2021-03-04T02:50:44Z

cmd/snap-bootstrap/cmd_initramfs_mounts.go

+	}
+
+	if outcome == boot.TryRecoverySystemOutcomeSuccess {
+		if err := healthCheck(); err != nil {


this err is not assigned to the one declared by the function as intended (:= creates a new, shadowing variable), should this be

Suggested change

if err := healthCheck(); err != nil {

if err = healthCheck(); err != nil {

that way it can be included in the first error message in the defer'd function

I'm not sure what's the original intention here, but this change in particular doesn't help surfacing the error because we will still end with "return nil" and that nil will be assigned to err before we process the defers

I've added a log here instead.

pedronis

did a pass, to be honest right now I'm a bit confused by some of the code here

pedronis · 2021-03-05T09:35:07Z

cmd/snap-bootstrap/cmd_initramfs_mounts.go

+	}
+
+	if outcome == boot.TryRecoverySystemOutcomeSuccess {
+		if err := healthCheck(); err != nil {


I'm not sure what's the original intention here, but this change in particular doesn't help surfacing the error because we will still end with "return nil" and that nil will be assigned to err before we process the defers

pedronis · 2021-03-05T09:36:41Z

cmd/snap-bootstrap/cmd_initramfs_mounts.go

+		if finalizeErr != nil {
+			err = finalizeErr
+		}
+		panic(fmt.Errorf("%v tried recovery system %q: %v", status, mst.recoverySystem, err))


successful/failed tried reads a bit oddly ?

Yeah, I tried to make it too fancy. Dropped this now. Also made sure that relevant errors are actually logged, so that we at least have some trace of what failed.

pedronis · 2021-03-05T09:40:08Z

cmd/snap-bootstrap/cmd_initramfs_mounts.go

+	// do we even have ubuntu-data?
+	haveData := m.degradedState.partition("ubuntu-data").FindState == partitionFound
+
+	if m.noFallback && (!haveData || assumeEncrypted) {


I don't understand why we have a && part here at all? which scenario are we considering? I'm not understanding the comment below

pedronis · 2021-03-05T09:41:17Z

cmd/snap-bootstrap/cmd_initramfs_mounts.go

+			// return back to run mode
+			finalizeErr := finalizeTryRecoverySystemAndReboot(boot.TryRecoverySystemOutcomeInconsistent)
+			// not reached, unless in tests
+			panic(fmt.Errorf("inconsistent tried recovery system bootenv: %v", finalizeErr))


s/tried/try?

my 2cts is that it would be nicer if finalizeTryRecoverySystemAndReboot would not return an error but either do its job or panic directly

fair point, i've refactored the code to end in finalizeTry..

…overy-mgmt-sb-try-handling-v2

…errors, extend tests Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

…oo/uc20-recovery-mgmt-sb-try-handling-v2-wip

…overy-mgmt-sb-try-handling-v2-wip

pedronis

thanks for the changes, couple of comments

pedronis · 2021-03-24T16:04:34Z

cmd/snap-bootstrap/cmd_initramfs_mounts.go

+		if err == nil && !machine.degraded() {
+			outcome = boot.TryRecoverySystemOutcomeSuccess
+		}
+		if outcome == boot.TryRecoverySystemOutcomeFailure {


maybe I'm missing something but wouldn't an else be clearer than setting outcome and then checking it again?

pedronis · 2021-03-24T16:07:38Z

cmd/snap-bootstrap/cmd_initramfs_mounts.go

@@ -948,6 +956,11 @@ func (m *recoverModeStateMachine) openUnencryptedSave() (stateFunc, error) {
 func (m *recoverModeStateMachine) unlockEncryptedSaveFallbackKey() (stateFunc, error) {
 	// try to unlock save with the fallback key on ubuntu-seed, which must have
 	// been mounted at this point
+
+	if m.noFallback {


Could we also short-circuit unlockMaybeEncryptedAloneSaveFallbackKey? It still works as is, because on that path we mark things as degraded, but it is a bit of pointless work?

I think I'd leave the check here. It's probably a bit cleaner this way. Also, we identify the system as failed if we either get an error or are in the degraded mode. In case of disabled fallback, we'd get an error if we reach the fallback path.

…overy-mgmt-sb-try-handling-v2

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

anonymouse64

overall, lgtm; some test nitpicks

anonymouse64 · 2021-03-29T19:50:01Z

cmd/snap-bootstrap/cmd_initramfs_mounts.go

+
+	tryingCurrentSystem, err := boot.InitramfsIsTryingRecoverySystem(mst.recoverySystem)
+	if err != nil {
+		if boot.IsInconsystemRecoverySystemState(err) {


it just now occurred to me that this is a typo, I assume that the originally intended name of the function is:

Suggested change

if boot.IsInconsystemRecoverySystemState(err) {

if boot.IsInconsistentSystemRecoverySystemState(err) {

Haha, good catch.

anonymouse64 · 2021-03-29T19:51:14Z

cmd/snap-bootstrap/cmd_initramfs_mounts.go

+			logger.Noticef("try recovery system state is inconsistent: %v", err)
+			finalizeTryRecoverySystemAndReboot(boot.TryRecoverySystemOutcomeInconsistent)
+		}
+		// this could be an inconsistency in the state


Could you clarify what kind of situations could lead here? The phrase "inconsistency" is confusing, since just above we check IsInconsistent.... Saying that this could be an inconsistency in the state is a bit weird, because one would assume any inconsistency would already have been caught by the check above.

actually this comment is no longer correct and should be dropped

anonymouse64 · 2021-03-29T20:04:03Z

tests/core/uc20-try-recovery/task.yaml

@@ -0,0 +1,76 @@
+summary: verify early boot handling of a try recovery system on UC20


nice spread test!

anonymouse64 · 2021-03-29T20:05:03Z

cmd/snap-bootstrap/cmd_initramfs_mounts_test.go

@@ -4949,3 +4949,496 @@ func (s *initramfsMountsSuite) TestInitramfsMountsInstallModeUnsetMeasure(c *C)
 func (s *initramfsMountsSuite) TestInitramfsMountsRecoverModeMeasure(c *C) {
 	s.testInitramfsMountsInstallRecoverModeMeasure(c, "recover")
 }
+
+func (s *initramfsMountsSuite) runInitramfsMountsUnencryptedTryRecovery(c *C, triedSystem bool) (err error) {


we might seriously need to think about not adding new tests to this file and instead creating new files here, it's a bit cough cough large

anonymouse64 · 2021-03-29T20:27:53Z

cmd/snap-bootstrap/cmd_initramfs_mounts_test.go

+	defer bootloader.Force(nil)
+
+	hostUbuntuData := filepath.Join(boot.InitramfsRunMntDir, "host/ubuntu-data/")
+	mockedState := filepath.Join(hostUbuntuData, "system-data/var/lib/snapd/state.json")


mmmh it would be nice if we actually wrote this file only after the systemd-mount for ubuntu-data was called/performed in the callback, but this is probably okay as-is

anonymouse64 · 2021-03-29T20:35:09Z

cmd/snap-bootstrap/cmd_initramfs_mounts_test.go

+		// system
+		"recovery_system_status": "try",
+		"try_recovery_system":    "1234",
+		// system is set up to go into run more if rebooted


Suggested change

// system is set up to go into run more if rebooted

// system is set up to go into run mode if rebooted

anonymouse64 · 2021-03-29T20:38:08Z

cmd/snap-bootstrap/cmd_initramfs_mounts_test.go

+		switch unlockVolumeWithSealedKeyCalls {
+
+		case 1:
+			// ubuntu data can't be unlocked with run key


Suggested change

// ubuntu data can't be unlocked with run key

anonymouse64 · 2021-03-29T20:38:27Z

cmd/snap-bootstrap/cmd_initramfs_mounts_test.go

+			c.Assert(name, Equals, "ubuntu-data")
+			c.Assert(sealedEncryptionKeyFile, Equals, filepath.Join(s.tmpDir, "run/mnt/ubuntu-boot/device/fde/ubuntu-data.sealed-key"))
+			if unlockDataFails {
+				return foundEncrypted("ubuntu-data"), fmt.Errorf("failed to unlock ubuntu-data with run object")


Suggested change

return foundEncrypted("ubuntu-data"), fmt.Errorf("failed to unlock ubuntu-data with run object")

// ubuntu-data can't be unlocked with the run key

return foundEncrypted("ubuntu-data"), fmt.Errorf("failed to unlock ubuntu-data with run object")

anonymouse64 · 2021-03-29T20:41:12Z

cmd/snap-bootstrap/cmd_initramfs_mounts_test.go

+	hostUbuntuData := filepath.Join(boot.InitramfsRunMntDir, "host/ubuntu-data/")
+	mockedState := filepath.Join(hostUbuntuData, "system-data/var/lib/snapd/state.json")
+	c.Assert(os.MkdirAll(filepath.Dir(mockedState), 0750), IsNil)
+	c.Assert(ioutil.WriteFile(mockedState, []byte(mockStateContent), 0640), IsNil)


why create these files if we are mocking a failure for the recovery health check anyways?

It's a somewhat artificial scenario where setting up the try recovery system has been successful so far (thus copying of state files from ubuntu-data was successful too), and only the health check fails. I've added a comment in the code to cover that.

…overy-mgmt-sb-try-handling-v2

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

bboozzoo · 2021-03-30T13:31:43Z

The relevant test is green and the failures are unrelated. @pedronis I think we can land it.

bboozzoo added 3 commits February 18, 2021 08:24

boot: export initramfs reboot helper

f6c4621

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

boot: helper for checking and marking tried recovery system status fr…

45076df

…om initramfs Add helpers for checking whether the current recovery system is being tried and marking its state. Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

bboozzoo added the Run nested The PR also runs tests included in nested suite label Feb 18, 2021

bboozzoo requested review from pedronis and anonymouse64 February 18, 2021 07:47

github-actions bot removed the Run nested The PR also runs tests included in nested suite label Feb 18, 2021

bboozzoo added the Run nested The PR also runs tests included in nested suite label Feb 18, 2021

bboozzoo mentioned this pull request Feb 18, 2021

boot: helper for checking and marking tried recovery system status from initramfs #9942

Merged

cmd/snap-bootstrap: cleanup and refactor disabled fallback path handling

e1e5115

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

github-actions bot removed the Run nested The PR also runs tests included in nested suite label Feb 18, 2021

bboozzoo added the Run nested The PR also runs tests included in nested suite label Feb 18, 2021

bboozzoo mentioned this pull request Feb 18, 2021

github: temporarily disable action labeler due to issues with labels being removed #9944

Merged

bboozzoo added 6 commits February 25, 2021 12:42

Merge branch 'bboozzoo/uc20-recovery-mgmt-sb-try-handling-v2-just-boo…

6e63c31

…t' into bboozzoo/uc20-recovery-mgmt-sb-try-handling-v2-wip

cmd/snap-bootstrap: use try recovery system outcome, extend tests

52bae8b

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

Merge branch 'bboozzoo/uc20-recovery-mgmt-sb-try-handling-v2-just-boo…

6e84920

…t' into bboozzoo/uc20-recovery-mgmt-sb-try-handling-v2

tests/core/uc20-try-recovery: verify switching to a try recovery syst…

764de5e

…em and back Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

Merge remote-tracking branch 'upstream/master' into bboozzoo/uc20-rec…

2e5ba08

…overy-mgmt-sb-try-handling-v2

Merge branch 'bboozzoo/uc20-recovery-mgmt-sb-try-handling-v2-spread-t…

c37312e

…est' into bboozzoo/uc20-recovery-mgmt-sb-try-handling-v2

bboozzoo mentioned this pull request Mar 2, 2021

boot: helper for setting up a try recover system #9921

Merged

anonymouse64 reviewed Mar 4, 2021


pedronis added the UC20 label Mar 4, 2021

pedronis reviewed Mar 5, 2021


bboozzoo added 4 commits March 5, 2021 11:10

Merge remote-tracking branch 'upstream/master' into bboozzoo/uc20-rec…

c2fabf8

…overy-mgmt-sb-try-handling-v2

cmd/snap-bootstrap: tweak finalize try recovery system handling, log …

36c4961

…errors, extend tests Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

Merge branch 'bboozzoo/sb-un-encrypted-code-path-cleanup' into bboozz…

be8e108

…oo/uc20-recovery-mgmt-sb-try-handling-v2-wip

Merge remote-tracking branch 'upstream/master' into bboozzoo/uc20-rec…

d34f804

…overy-mgmt-sb-try-handling-v2-wip

bboozzoo requested review from pedronis and anonymouse64 March 23, 2021 13:30

pedronis approved these changes Mar 24, 2021


bboozzoo added 2 commits March 25, 2021 11:13

Merge remote-tracking branch 'upstream/master' into bboozzoo/uc20-rec…

80fe0ed

…overy-mgmt-sb-try-handling-v2

cmd/snap-bootstrap: tweak tried system handling

ccf8adb

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

anonymouse64 approved these changes Mar 29, 2021


bboozzoo added 3 commits March 30, 2021 13:20

Merge remote-tracking branch 'upstream/master' into bboozzoo/uc20-rec…

36805f6

…overy-mgmt-sb-try-handling-v2

boot, cmd/snap-bootstrap: fix typo in helper name

51550db

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

cmd/snap-bootstrap: comments and test tweaks

d4bb220

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>

pedronis merged commit f525ab4 into snapcore:master Mar 30, 2021

