Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

boot: helper for setting up a try recover system #9921

Merged

Conversation

bboozzoo
Copy link
Collaborator

Add a new helper for setting up a new recovery system to be tried. A request to
reboot into given recovery system must be issued separately.

Based on top of #9918, plus some trivial enhancements in seal and bootloader test.

@bboozzoo bboozzoo force-pushed the bboozzoo/uc20-recovery-mgmt-boot-set-try branch from 1c1d4b0 to 63794c5 Compare February 10, 2021 10:49
@anonymouse64 anonymouse64 self-requested a review February 10, 2021 14:20
Improve some error messages in the sealing code.

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
…BootVars errors

This gives slightly more flexibility in mocking different failure modes.

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
Add a new helper for setting up a new recovery system to be tried. A request to
reboot into given recovery system must be issued separately.

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
@bboozzoo bboozzoo force-pushed the bboozzoo/uc20-recovery-mgmt-boot-set-try branch from 63794c5 to 879e3c8 Compare February 10, 2021 14:50
Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, some minor comments and a question

boot/systems.go Show resolved Hide resolved
)

func clearTryRecoverySystem(dev Device, systemLabel string) error {
if !dev.HasModeenv() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we expect to call this in other contexts more standalone to have to do this check again?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this eventually becomes ClearTryRecoverySystem and gets called from a devicestate handler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if indeed this function will be used in more places than just to specifically cleanup in failure modes of SetTryRecoverySystem we need to think carefully about the order of operations here to make sure we don't accidentally brick things with an unexpected reboot in between operations

boot/systems.go Outdated
}
vars := map[string]string{
"try_recovery_system": systemLabel,
"recovery_system_status": "try",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these will not appear on the kernel command line, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only bootenv. Well, at least our boot script is not passing that on the command line.

boot/seal.go Outdated
@@ -592,6 +592,10 @@ func runModeBootChains(rbl, bl bootloader.Bootloader, model *asserts.Model, mode
// hashes from modeenv plus it returns separately the last BootFile
// which is for the kernel.
func buildBootAssets(bootFiles []bootloader.BootFile, modeenv *Modeenv) (assets []bootAsset, kernel bootloader.BootFile, err error) {
if len(bootFiles) == 0 {
// useful in testing, when mocking is insufficient
return nil, bootloader.BootFile{}, fmt.Errorf("internal error: no boot files")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it doesn't matter because it will show up only in tests, so there's context. I'm just slightly worried that as a message in doesn't make too clear where it is coming from. Though at least atm we don't have any ohter "no boot files" error

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
Copy link
Member

@anonymouse64 anonymouse64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, some questions about the ordering of operations though as it pertains to getting rebooted in the middle of things

boot/systems.go Outdated
Comment on lines 59 to 62
if found {
// we may be repeating the cleanup, in which case the system may
// not be present in modeenv already
if err := m.Write(); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if found {
// we may be repeating the cleanup, in which case the system may
// not be present in modeenv already
if err := m.Write(); err != nil {
// we may be repeating the cleanup, in which case the system was
// already removed from the modeenv and we don't need to rewrite
// the modeenv
if found {
if err := m.Write(); err != nil {

boot/systems.go Outdated
Comment on lines 69 to 77
resealErr := resealKeyToModeenv(dirs.GlobalRootDir, dev.Model(), m, expectReseal)

// clear both variables, no matter the values they hold
vars := map[string]string{
"try_recovery_system": "",
"recovery_system_status": "",
}
// try to clear regardless of reseal failing
blErr := bl.SetBootVars(vars)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the ordering here safe? if we end up resealing before we manage to write the boot vars and got rebooted in between, would the system try to try booting back into that recovery system which is no longer trusted since we resealed to remove it? I would think that we would want the opposite order, writing bootloader variables first, then resealing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

speaking of this kind of ordering, what is your opinion on how hard it would be to make something in resealKeyToModeenv panic() thus simulating a reboot in the middle of the operation to ensure that operations are in a consistent state? too much work / too little reward since it's a fairly simple set of operations right now?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Ian is right here about the order unless we are confused

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Booting into the system is controlled by snapd_recovery_mode and snapd_recovery_system, which are set separately. This cleanup is invoked if SetTryRecoverySystem fails, which is before we try to reboot to a new system, or after we're back to run mode, in which case the bootloader variables selecting the mode/system would have been modified and would no longer boot into the tried system. Need to think about it a bit, maybe I missed something.

boot/systems.go Outdated
}
}()
const expectReseal = true
if err := resealKeyToModeenv(dirs.GlobalRootDir, dev.Model(), m, expectReseal); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw I think the ordering of operations here is safe, we are resealing first, but only indicate to the bootloader that it is safe to boot into those by setting the vars after resealing is successful

err := boot.SetTryRecoverySystem(s.uc20dev, "1234")
c.Assert(err, IsNil)

c.Check(mtbl.BootVars, DeepEquals, map[string]string{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's try and use GetBootVars as much as possible so we can stop exporting BootVars at some point and make the mock bootloaders act more like real bootloaders

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may still need to have something line GetAllBootVars() to make sure nothing unexpected was set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes agreed on that


modeenvRead, err := boot.ReadModeenv("")
c.Assert(err, IsNil)
c.Check(modeenvRead.CurrentRecoverySystems, DeepEquals, []string{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick, but I also have slight preference for doing a full DeepEquals comparison to ensure that nothing else was modified accidentally on the object

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeepEquals pick up the read flag that gets set when it's read from a file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right, since we are in the boot pkg though you could use modeenvRead.DeepEqual(exp) I think

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hint, I've updated the tests.

})
}

func (s *systemsSuite) TestSetTryRecoverySystemSetBootVarsErr(c *C) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice test!

})
}

func (s *systemsSuite) TestSetTryRecoverySystemCleanupOnErrorAfterReseal(c *C) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also a great test, thanks for this

)

func clearTryRecoverySystem(dev Device, systemLabel string) error {
if !dev.HasModeenv() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if indeed this function will be used in more places than just to specifically cleanup in failure modes of SetTryRecoverySystem we need to think carefully about the order of operations here to make sure we don't accidentally brick things with an unexpected reboot in between operations

…ith mocked bootloader

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
boot/systems.go Outdated
}

// SetTryRecoverySystem sets up the boot environment for trying out a recovery
// system with given label. Once done, the caller should request switching to a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"switching to the..." ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed that when I pushed the first few fixed, but added it now.

…t' into bboozzoo/uc20-recovery-mgmt-boot-set-try
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
}

const expectReseal = true
return resealKeyToModeenv(dirs.GlobalRootDir, dev.Model(), m, expectReseal)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have altered the order of operations here, to match what we have in the document, that is:

  1. set try_recovery_system, recovery_system_status boot variables
  2. reseal

If for any reason we reboot, and unexpectedly boot into the tried recovery system before resealing, with the changes in #9940 accessing data will be impossible and we will reboot back to run mode.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is fairly strange to do it differently than everything else though

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fine I think but it merits a comment explaining what you said but also that is up to the caller to also switch the mode touching the env var that actually affect where we boto, I also wonder if it means that we should have a helper actually doing both? but maybe that's clearer when one sees the full code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that these things should be called out in comments, but the ordering and logic here makes sense

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comments

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
Copy link
Collaborator

@pedronis pedronis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is ok but still a bit confusing so if nothing else it needs more comment, I also wonder if we need a helper that actually does more, basically an EnsureNextBootToTryRecoverySystem ? maybe is clear in the full code

boot/systems.go Outdated

// SetTryRecoverySystem sets up the boot environment for trying out a recovery
// system with given label. Once done, the caller should request switching to
// the given recovery system.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should describe that this adds to the current_recovery_systems and also reseals to it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

}

const expectReseal = true
return resealKeyToModeenv(dirs.GlobalRootDir, dev.Model(), m, expectReseal)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fine I think but it merits a comment explaining what you said but also that is up to the caller to also switch the mode touching the env var that actually affect where we boto, I also wonder if it means that we should have a helper actually doing both? but maybe that's clearer when one sees the full code

Copy link
Member

@anonymouse64 anonymouse64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @pedronis about adding a comment, but otherwise lgtm, thanks for the work here. One nitpicky thing about tests that you can take in a followup or disregard as you prefer.

}

const expectReseal = true
return resealKeyToModeenv(dirs.GlobalRootDir, dev.Model(), m, expectReseal)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that these things should be called out in comments, but the ordering and logic here makes sense


modeenvRead, err := boot.ReadModeenv("")
c.Assert(err, IsNil)
c.Check(modeenvRead.CurrentRecoverySystems, DeepEquals, []string{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right, since we are in the boot pkg though you could use modeenvRead.DeepEqual(exp) I think

Thanks to @anonymouse64 for the suggestion.

Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
Signed-off-by: Maciej Borzecki <maciej.zenon.borzecki@canonical.com>
@bboozzoo bboozzoo merged commit 5721e5e into snapcore:master Mar 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants