
Add upgrade-recovery subcommand #1974

Merged

Conversation

anmazzotti
Contributor

@anmazzotti anmazzotti commented Feb 23, 2024

A different take on #1966.

This PR does not actually introduce a new upgrade-recovery subcommand; instead it just adds a --recovery-only argument to the upgrade subcommand.

Scratch the above.
This is now something in the middle.

elemental upgrade-recovery --recovery-system.uri <imgRef> is the final choice.

elemental upgrade --recovery can still be used as usual to upgrade both recovery and system at the same time.

The upgrade-recovery subcommand shares the UpgradeSpec with upgrade to reuse the same configuration/initialization logic, but filters it at the command level.
Only --recovery-system probably makes sense here; most of the other flags should be disabled.
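
To make the shape of this concrete, here is a minimal cobra sketch of the wiring; the type and function names below are illustrative stand-ins, not the identifiers this PR actually adds, and the real action of course does much more than print a message.

// Sketch only: UpgradeSpec and runUpgradeRecovery are stand-ins for the shared
// spec type and the filtered action, not the PR's real identifiers.
package cmd

import (
	"fmt"

	"github.com/spf13/cobra"
)

type UpgradeSpec struct {
	RecoverySystemURI string
}

func runUpgradeRecovery(spec *UpgradeSpec) error {
	// The real action would mount the recovery partition, deploy the image
	// referenced by spec.RecoverySystemURI and update the install state.
	fmt.Printf("upgrading recovery system from %s\n", spec.RecoverySystemURI)
	return nil
}

func NewUpgradeRecoveryCmd() *cobra.Command {
	spec := &UpgradeSpec{}
	c := &cobra.Command{
		Use:   "upgrade-recovery",
		Short: "Upgrade the recovery system only",
		RunE: func(cmd *cobra.Command, args []string) error {
			return runUpgradeRecovery(spec)
		},
	}
	// Only the recovery-system flag is exposed; the rest of the `upgrade`
	// flags are deliberately not registered on this command.
	c.Flags().StringVar(&spec.RecoverySystemURI, "recovery-system.uri", "",
		"image reference to use for the recovery system")
	return c
}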

Contributor

@frelon frelon left a comment

I prefer adding the upgrade-recovery command.

Adding flags that completely change the implementation of the upgrade command feels like a recipe for unintended consequences.

@codecov-commenter

codecov-commenter commented Feb 23, 2024

Codecov Report

Attention: Patch coverage is 38.38863%, with 130 lines in your changes missing coverage. Please review.

Project coverage is 72.27%. Comparing base (9538960) to head (3718d94).

Files Patch % Lines
pkg/action/upgrade-recovery.go 56.03% 40 Missing and 11 partials ⚠️
cmd/upgrade-recovery.go 20.00% 31 Missing and 1 partial ⚠️
cmd/config/config.go 0.00% 20 Missing ⚠️
pkg/types/v1/config.go 0.00% 16 Missing ⚠️
pkg/action/upgrade.go 41.66% 5 Missing and 2 partials ⚠️
pkg/constants/constants.go 0.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1974      +/-   ##
==========================================
- Coverage   73.04%   72.27%   -0.77%     
==========================================
  Files          74       76       +2     
  Lines        8713     8895     +182     
==========================================
+ Hits         6364     6429      +65     
- Misses       1831     1938     +107     
- Partials      518      528      +10     


@anmazzotti
Contributor Author

I prefer adding the upgrade-recovery command.

Adding flags that completely change the implementation of the upgrade command feels like a recipe for unintended consequences.

From @davidcassany's other comment I got the impression that adding the argument was better. This is indeed simpler to implement. (And adding a separate action here is probably not needed either.)

@anmazzotti
Contributor Author

Part of rancher/elemental#1218

@anmazzotti anmazzotti changed the title from "Add --recovery-only upgrade argument" to "Add upgrade-recovery subcommand" on Feb 26, 2024
@anmazzotti anmazzotti force-pushed the implement-upgrade-recovery-command-2 branch from 49451b4 to 3246717 on February 27, 2024 12:45
Signed-off-by: Andrea Mazzotti <andrea.mazzotti@suse.com>
@anmazzotti anmazzotti force-pushed the implement-upgrade-recovery-command-2 branch from 0e19aad to 825c402 on February 28, 2024 11:03
Signed-off-by: Andrea Mazzotti <andrea.mazzotti@suse.com>
@anmazzotti anmazzotti marked this pull request as ready for review February 29, 2024 08:01
@anmazzotti anmazzotti requested a review from a team as a code owner February 29, 2024 08:01
Contributor

@davidcassany davidcassany left a comment

Nice work 👍

I think the only actual change request I have is to include the yaml loading unit test. I think the current sub viper setup is wrong (speaking from memory, I might be missing some detail), but in any case a simple unit test on that front will tell us, and will actually verify it works when fixed, if it needs to be fixed.

The other issues are probably worth addressing in a separate PR, though I'm not so sure. If we don't face obvious breaks and regressions with this, I'd vote to merge it and iterate from that point. I think it might be easier this way.

statePath := filepath.Join(u.spec.Partitions.State.MountPoint, constants.InstallStateFile)
if u.spec.Partitions.State.MountPoint == "/" || u.spec.Partitions.State.MountPoint == "/.snapshots" {
	statePath = filepath.Join(constants.RunningStateDir, constants.InstallStateFile)
}
Contributor

These lines are concerning; they are likely to explain why this does not work in a pod run. I got rid of this hack in upgrade because those paths shouldn't be hardcoded and we should try to minimize the suc-upgrade impact in toolkit code (so I'd say that testing against /host/ or /host/.snapshots is also not an option).

Contributor

Probably a compromise to start walking is to not update the state.yaml in the State partition, figure out the consequences of such an inconsistency, and work out whether there are better alternatives for managing the state.yaml data (it could probably be stored somewhere else, such as in OEM; even though I am not a fan of the idea, it would make its handling way easier).

Contributor Author

Good catch. I did inherit the hack and I didn't notice it was removed.
It's gone now. Thank you

return elementalError.NewFromError(err, elementalError.MountStatePartition)
}
cleanup.Push(umount)
umount, err = elemental.MountRWPartition(u.cfg.Config, u.spec.Partitions.State)
Contributor

This is likely to produce an undesired effect or even to fail, as it will likely attempt to mount the current root as RW. By default /run/initramfs/elemental-state is already mounted RW on a running system with the current setup, so there should be no need to mount it RW again; it already is.

Also, on a running system on btrfs, the active subvolume is mounted by default on this partition and it could also be the mount point reported by the ghw library. This is the actual reason this was only called in recovery mode in the former code. We probably need to move or add the logic of

func findStateMount(runner v1.Runner, device string) (rootDir string, stateMount string, err error) {
	output, err := runner.Run("findmnt", "-lno", "SOURCE,TARGET", device)
	if err != nil {
		return "", "", err
	}
	r := regexp.MustCompile(`@/.snapshots/\d+/snapshot`)
	scanner := bufio.NewScanner(strings.NewReader(strings.TrimSpace(string(output))))
	for scanner.Scan() {
		lineFields := strings.Fields(scanner.Text())
		if len(lineFields) != 2 {
			continue
		}
		if strings.Contains(lineFields[1], constants.RunningStateDir) {
			stateMount = lineFields[1]
		} else if r.MatchString(lineFields[0]) {
			rootDir = lineFields[1]
		}
	}
	if stateMount == "" || rootDir == "" {
		err = fmt.Errorf("could not find expected mountpoints, findmnt output: %s", string(output))
	}
	return rootDir, stateMount, err
}
into the utils.GetAllPartitions method, so we can have a known and consistent mountpoint defined for the state partition.
The problem is that the current Partitions struct and the underlying ghw library assume a single mountpoint per partition, while there could be more. In fact, in the upgrade pod the mountpoints under /host/run and /run are duplicated, and we have no criteria in elemental-toolkit to distinguish them or even report them.
In the mid term I'd probably drop the ghw library in favor of an lsblk wrapper (we used to have one at some point) and report a list of mountpoints per partition. But this is a relatively ambitious refactor... especially in terms of testing.
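
As a rough sketch of that direction (not code from this PR or the toolkit), an lsblk wrapper reporting a list of mountpoints per partition could look something like this, assuming a util-linux recent enough to expose the MOUNTPOINTS column:

// Hypothetical lsblk-based listing; none of these names exist in elemental-toolkit.
package utils

import (
	"encoding/json"
	"os/exec"
)

type blockDevice struct {
	Name        string   `json:"name"`
	MountPoints []string `json:"mountpoints"`
}

type lsblkOutput struct {
	BlockDevices []blockDevice `json:"blockdevices"`
}

// listMountPoints returns every mountpoint lsblk reports for each device,
// instead of the single mountpoint assumed by ghw.
func listMountPoints() (map[string][]string, error) {
	// MOUNTPOINTS (plural) requires util-linux >= 2.37; --list flattens children.
	out, err := exec.Command("lsblk", "--json", "--list", "-o", "NAME,MOUNTPOINTS").Output()
	if err != nil {
		return nil, err
	}
	var parsed lsblkOutput
	if err := json.Unmarshal(out, &parsed); err != nil {
		return nil, err
	}
	mounts := map[string][]string{}
	for _, dev := range parsed.BlockDevices {
		for _, mp := range dev.MountPoints {
			if mp != "" { // lsblk reports null for unmounted filesystems
				mounts[dev.Name] = append(mounts[dev.Name], mp)
			}
		}
	}
	return mounts, nil
}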

Contributor Author

Interesting. I actually changed this to always remount it because the CI tests were failing: /run/initramfs/elemental-state could not be found. I will revert this change here; most likely this was not the solution if there was an issue with it.
I also noticed the state partition was already mounted RW, which is why I added the debug log somewhere else to confirm it. Then the test just worked. 🤷

if err != nil {
	return nil, fmt.Errorf("failed initializing upgrade recovery spec: %v", err)
}
vp := viper.Sub("upgrade-recovery")
Contributor

Are we sure this is correct? Shouldn't it be vp := viper.Sub("upgrade")? IIRC this states the sub schema item, so it maps whatever is loaded from the yaml under an upgrade-recovery stanza. I guess this could be tested and validated by checking that this method properly reads the upgrade section of a yaml. I assume this falls into the scope of cmd/config/config_test.go.

I think this should be fixed, or at least validated with a unit test, in this PR.
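
For illustration only (this is not the project's cmd/config/config_test.go), a self-contained test of how viper.Sub selects a yaml stanza could look like this:

// Standalone illustration of viper.Sub behaviour, not taken from this PR.
package config_test

import (
	"bytes"
	"testing"

	"github.com/spf13/viper"
)

const testYAML = `
upgrade:
  recovery-system:
    uri: oci://example/recovery:latest
`

func TestViperSubSelectsStanza(t *testing.T) {
	v := viper.New()
	v.SetConfigType("yaml")
	if err := v.ReadConfig(bytes.NewBufferString(testYAML)); err != nil {
		t.Fatal(err)
	}

	// Sub("upgrade") exposes the keys nested under the `upgrade:` stanza.
	if sub := v.Sub("upgrade"); sub == nil || sub.GetString("recovery-system.uri") == "" {
		t.Error("expected the upgrade stanza to be readable via Sub(\"upgrade\")")
	}

	// Sub("upgrade-recovery") returns nil because no such stanza exists in the
	// yaml, which is exactly the concern about reading the upgrade section.
	if v.Sub("upgrade-recovery") != nil {
		t.Error("did not expect an upgrade-recovery stanza in this yaml")
	}
}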

Contributor Author

Yes, sorry, I got confused with it (again :D).
I removed it all and just added a (bit ugly) flag to sanitize the same stanza when it is used for recovery only, since that's the only logical difference in there.
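
For context, the rough shape of such a flag could look like the sketch below; the field and method names are hypothetical and not the actual code in this PR:

// Hypothetical shape of the "recovery only" sanitize switch; not the PR's code.
package v1

import "fmt"

type UpgradeSpec struct {
	RecoveryUpgrade   bool   // also upgrade recovery during a regular `upgrade`
	RecoveryOnly      bool   // set by the upgrade-recovery subcommand
	SystemURI         string // active/passive system image source
	RecoverySystemURI string // recovery system image source
}

// Sanitize validates only the fields relevant to the requested operation.
func (s *UpgradeSpec) Sanitize() error {
	if s.RecoveryOnly {
		// Recovery-only upgrades ignore the system image entirely.
		if s.RecoverySystemURI == "" {
			return fmt.Errorf("undefined recovery system source")
		}
		return nil
	}
	if s.SystemURI == "" {
		return fmt.Errorf("undefined system source to upgrade")
	}
	if s.RecoveryUpgrade && s.RecoverySystemURI == "" {
		return fmt.Errorf("undefined recovery system source")
	}
	return nil
}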

Signed-off-by: Andrea Mazzotti <andrea.mazzotti@suse.com>
return u.cfg.WriteInstallState(
u.spec.State, statePath,
u.spec.State, filepath.Join(u.spec.Partitions.State.MountPoint, constants.InstallStateFile),
Contributor

This is likely to cause failures in recovery-only upgrades, as for deployments using the btrfs snapshotter u.spec.Partitions.State.MountPoint is likely to point to / (if running directly on the host) or /host (if running in the system-upgrade-controller pod).

As said before, I think this can be addressed later in a follow-up PR; it is not a regression.

@anmazzotti anmazzotti merged commit 2c5be14 into rancher:main Feb 29, 2024
14 of 16 checks passed