🆕⌚️ Automatic updates #247

Open · cgwalters opened this issue Mar 23, 2016 · 45 comments

cgwalters commented Mar 23, 2016

EDIT 20181206:

Today with rpm-ostree if you want to enable automatic background updates, edit /etc/rpm-ostreed.conf, and ensure that the Daemon section looks like:

[Daemon]
AutomaticUpdatePolicy=stage
#IdleExitTimeout=60

Then: systemctl enable rpm-ostreed-automatic.timer.

This won't automatically reboot though.
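The steps above can be scripted. A minimal sketch, operating on a scratch copy of the config for illustration (on a real host you would edit /etc/rpm-ostreed.conf itself as root, then reload the daemon and enable the timer):

```shell
#!/bin/sh
set -eu

# Scratch copy for illustration; the real file is /etc/rpm-ostreed.conf.
conf="${TMPDIR:-/tmp}/rpm-ostreed.conf.demo"
cat > "$conf" <<'EOF'
[Daemon]
#AutomaticUpdatePolicy=none
#IdleExitTimeout=60
EOF

# Set (or uncomment) the policy key to "stage".
sed -i 's/^#\{0,1\}AutomaticUpdatePolicy=.*/AutomaticUpdatePolicy=stage/' "$conf"
grep '^AutomaticUpdatePolicy' "$conf"
# → AutomaticUpdatePolicy=stage

# On a real host, follow up with:
#   rpm-ostree reload
#   systemctl enable --now rpm-ostreed-automatic.timer
```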

This thread contains a lot of background information and design discussion around the higher-level issues.


Initial PR: #1147

cgwalters commented Mar 23, 2016

If we do have hands-off upgrades that's going to drive a more immediate need for automated rollbacks. That's #177

cgwalters commented Jan 19, 2017

I am now thinking the default model for automatic updates should involve automatic downloading/queuing. Having to download just the rpmdb to display diffs sucks for multiple reasons. Among them it's going to be hard to support if we move to OCI images. Plus I'd like to support a "deltas only" ostree repo mode. Or a combination.

Beyond that, for the majority of cases (standalone desktop, enterprise desktop, enterprise server) this is what I think is a good default, enterprise particularly so if we encourage local mirroring. One case where people may not want this is standalone embedded systems, but we can obviously support the status quo of typing rpm-ostree upgrade. This is more about the defaults and the UI workflow of the tools.

cgwalters commented Jan 19, 2017

So specifically for Cockpit, I'd like to move them away from the GetCached* DBus API towards a UI that's oriented around controlling automatic updates.

dustymabe commented Jan 20, 2017

Having to download just the rpmdb to display diffs sucks

So... I have an idea for this (probably not a very good one). See #558, where in the 2nd paragraph I say:

I think we can achieve this goal if we add, in a predictable format, the list of rpms in that commit to the commit log message.

That way we don't need the rpmdb to do a diff. Also we can choose to only use the rpm data from the commit message if the rpmdb doesn't exist locally, i.e. we only have metadata about the commit. WDYT?
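A "predictable format" could be as simple as one NEVRA per line, C-locale sorted. A toy sketch (the format, the hard-coded package names, and the rpm query-format shown in the comment are illustrative assumptions, not the implemented scheme):

```shell
#!/bin/sh
set -eu

# Hypothetical "predictable format": one NEVRA per line, sorted stably,
# generated here from a hard-coded list instead of the real rpmdb.
printf '%s\n' \
  'kernel-0:4.14.16-300.fc27.x86_64' \
  'bash-0:4.4.19-1.fc27.x86_64' \
  'openssh-clients-0:7.6p1-3.fc27.x86_64' | LC_ALL=C sort

# On a real host this would be roughly:
#   rpm -qa --qf '%{NAME}-%{EPOCHNUM}:%{VERSION}-%{RELEASE}.%{ARCH}\n' | LC_ALL=C sort
```

Stable sorting matters here: two commits with the same package set then produce byte-identical lists, so a diff of the embedded lists is a diff of the packages.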

cgwalters commented Jul 11, 2017

Yeah, I think we can put at least the NEVRAs in the commit header.

# rpm -qa|xz | wc -c
4180

which isn't too bad.

cgwalters commented Oct 25, 2017

I think this also blocks on ostreedev/ostree#545

jlebon commented Dec 5, 2017

Had a chat with @cgwalters about this today. Here are the notes from that:

High-level expectations:

  1. rpm-ostree status should indicate:

    • if auto-update is completely off, then a line to that effect
    • if no updates are present, then the last time updates were successfully checked for; this is important to ensure users are aware of any e.g. timer/networking etc... issues that may give them a false sense of security
    • if an update is present, then what the pending version/csum and pkgs are, and importantly whether there are any security updates. Be able to provide a diff with e.g. -v (bottom has mock-up outputs).
  2. Users can choose between different levels of automation. Possible levels to consider:
    a) [none] (current)
    b) [check] (download the minimal amount of ostree/rpmmd metadata to know that there is an update and describe it)
    - This would be a good default to ship with
    walters: Two check phases: Check just md freshness, versus download full md? Or maybe too hard.
    c) [download] (download the full ostree/new packages)
    d) [deploy] (deploy but don't reboot)
    - This of course would be blocked on #40 and ostreedev/ostree#545
    e) [reboot] (deploy and reboot)

I feel like between all of these steps, at least for the desktop we need to think about having gnome-software be in control of triggers. Similarly for server side, Ansible control for blue/green.

Implementation:

  1. include rpmdb pkglist in commit metadata during compose
    • for jigdo, should we split the jigdo RPM into a thinner commit metadata only one and a fatter content one? this could also help with gpg signature verification
    • or just make the jigdo RPM Requires all the packages and fetch that pkglist from rpmmd
  2. leave package_diff and cached* API business separate for now; they need to always work for Cockpit even on commits without the new rpmdb pkglist and they download /usr/share/rpm -- we can look to unify this with the deploy_transaction_execute flow afterwards so that it uses the new pkglist if available, otherwise falls back?
  3. teach the deploy transaction the needed logic to support the [check] mode (i.e. turn on commit metadata only, refresh rpmmd, heuristically try to find updates to layered pkgs). [download] is already supported by --download-only.
  4. enhance the CachedUpdate property in a backcompatible manner to also include rpm diff and make deploy transaction update that during non-deploy mode.
  5. teach status to read CachedUpdate property and display the relevant info
  6. ship systemd timer & service that calls upgrade with a hidden --auto=$MODE switch with MODE coming from e.g. /etc/rpm-ostree-automatic.conf.

Other considerations:

  • where to keep:
    1. last update check
      • bump timestamp on a file in /var/cache?
    2. auto-update policy setting
      • /etc/rpm-ostree-automatic.conf?
  • how should the systemd timer & auto-update mode be managed? purely by systemctl and e.g. vi /etc/rpm-ostree-automatic.conf, or should rpm-ostree provide a wrapper for it? leaning more towards the former.
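Item 6 of the implementation list might look roughly like the following unit pair. This is a sketch only: the unit names, the --auto=check spelling, and the timer intervals are extrapolated from the notes above, not a shipped implementation, and the files are written to a scratch directory here instead of /usr/lib/systemd/system:

```shell
#!/bin/sh
set -eu

# Scratch directory standing in for /usr/lib/systemd/system.
unitdir="${TMPDIR:-/tmp}/rpm-ostree-units-demo"
mkdir -p "$unitdir"

# Service: one-shot run of the configured automatic-update mode.
cat > "$unitdir/rpm-ostreed-automatic.service" <<'EOF'
[Unit]
Description=rpm-ostree automatic update (per configured policy)
ConditionPathExists=/run/ostree-booted

[Service]
Type=oneshot
# MODE would come from e.g. /etc/rpm-ostree-automatic.conf (see notes above).
ExecStart=/usr/bin/rpm-ostree upgrade --auto=check
EOF

# Timer: periodic trigger for the service above.
cat > "$unitdir/rpm-ostreed-automatic.timer" <<'EOF'
[Unit]
Description=Periodically trigger rpm-ostree automatic updates

[Timer]
OnBootSec=10min
OnUnitInactiveSec=1h

[Install]
WantedBy=timers.target
EOF

ls "$unitdir"
```

Keeping the "last update check" timestamp could then fall out of systemd itself (the timer's last-trigger time), rather than a hand-bumped file in /var/cache.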

Mock-up status outputs:

$ rpm-ostree status
State: idle, automatic updates enabled (download)
Deployments:
● atomicws:fedora/x86_64/workstation
                   Version: 26.230 (2017-10-15 03:11:00)
                BaseCommit: b8503c69c36591606c11743abdfeb5591c1ae8d9c3c69c18a583071b3b7caf3f
           LayeredPackages: krb5-workstation libvirt-client mosh sshpass strace tmux

Pending update:
            Version: 26.241 (2017-11-28 12:09:24)
             Commit: abcdef12344591606c11743abdfeb5591c1ae8d9c3c69c18a583071b3b7caf3f
               Diff: 12 upgrades, 2 downgrades, 2 removals, 1 addition

$ rpm-ostree status --verbose
State: idle, automatic updates enabled (download)
Deployments:
● atomicws:fedora/x86_64/workstation
                   Version: 26.230 (2017-10-15 03:11:00)
                BaseCommit: b8503c69c36591606c11743abdfeb5591c1ae8d9c3c69c18a583071b3b7caf3f
           LayeredPackages: krb5-workstation libvirt-client mosh sshpass strace tmux

Pending update:
            Version: 26.241 (2017-11-28 12:09:24)
             Commit: abcdef12344591606c11743abdfeb5591c1ae8d9c3c69c18a583071b3b7caf3f
           Upgraded: 12 packages
                     |- asdf 1.23.213 -> 5.12.23
                     ...
                     `- rtyu 2.4 -> 12.3
                     (includes both tree updates and layering updates)
         Downgraded: 2 packages
                     |- zxcv-2.1.23
                     ...
                     (just includes tree updates)
            Removed: 2 packages
                     |- zxcv-2.1.23
                     ...
                     (just includes tree updates)
               Added: 1 package
                     |- zxcv-2.1.23
                     ...
                     (just includes tree updates)

# when there are security updates:

$ rpm-ostree status
...
Pending update:
            Version: 26.241 (2017-11-28 12:09:24)
             Commit: abcdef12344591606c11743abdfeb5591c1ae8d9c3c69c18a583071b3b7caf3f
    SecurityUpdates: 2 packages (kernel, openssh-clients)    [[BOLDED RED]]
               Diff: 2 upgrades

$ rpm-ostree status --verbose
...
Pending update:
            Version: 26.241 (2017-11-28 12:09:24)
             Commit: abcdef12344591606c11743abdfeb5591c1ae8d9c3c69c18a583071b3b7caf3f
    SecurityUpdates: 2 packages
                     |- kernel
                     |  |- <list of available references & URLs>
                     |  `- ...
                     `- openssh-clients
                        |- <list of available references & URLs>
                        `- ...
           Upgraded: 2 packages
                     |- kernel 1.2.3 -> 4.5.6
                     `- openssh-clients 1.2 -> 3.4

@jlebon jlebon changed the title from integrate optional systemd timer for individual host automatic upgrades to 🆕⌚️ Automatic updates Dec 5, 2017

@jlebon jlebon added the jira label Dec 5, 2017

dustymabe commented Dec 6, 2017

In that status output I think Available Update vs Pending Update would probably be more appropriate, especially if we haven't staged a deployment. Then we should probably list the state of the update: not downloaded, downloaded, deployed and staged for next reboot. We can come up with more succinct words to describe those states.

jlebon commented Dec 7, 2017

Definitely, we need to describe the state as well. Another interesting piece of information that would be worth displaying is the size of the download. Interestingly, this is something we can easily calculate for jigdo remotes. In the ostree remote case, we can only display that if there are static deltas.

dustymabe commented Dec 14, 2017

ok. one other thing I wonder if we're covering: automatic rollbacks based on some conditions. If we enable automatic updates including the reboot, then we should at least think about automatic rollbacks in case of some sort of failure. For this we can only do so much, since the mechanism that triggers the rollback would depend on the system coming up at least somewhat, but it is something I'd love to see us brainstorm.

jlebon commented Dec 14, 2017

Right, this is #177. I'm open to discuss whether to hide the reboot mode until that's supported. At the very least, we'd need a warning of some sort to make that clear. OTOH, I don't want to completely not support reboot because of that either. E.g. I don't mind taking on the risk for my pet home servers. :)

dustymabe commented Dec 14, 2017

Right, this is #177.

cool

jlebon commented Jan 5, 2018

WIP in #1147.

cgwalters commented Jan 19, 2018

OK so let's try to agree on what happens with the "first cut" of this. Are we thinking that we'll land this but it will just be disabled by default and people who want it can opt-in for now?

I'm generally OK with that. But there are definitely issues in turning on even check by default. A good example of a past conversation is around including fedora-motd in Atomic Host.

Now a good thing here is we're not triggering the updates out of PAM. But we still have the problem for example that a whole lot of people need to configure a proxy.

What I'd like to see for example is adding the notion of "auto-cancellable transactions" or so. Basically if while the rpm-ostree upgrade --automatic timer is running, I do rpm-ostree override remove or whatever, I don't want to get an error and have to rpm-ostree cancel.

Further, there's a whole big conceptual issue around the degree to which our systemd units are "special". We also need to support e.g. gnome-software, Cockpit, and Ansible at least; @jlebon mentioned that in:

I feel like between all of these steps, at least for the desktop we need to think about having gnome-software be in control of triggers. Similarly for server side, Ansible control for blue/green.

I think in the "personal desktop" case it's pretty clear gnome-software could just frob the settings in the config file (do we own the polkit gateway for that? expose an API?)

BTW down the line for the "CSB laptop" case I'd actually like to support a mode where if e.g. someone has their laptop suspended/turned off for a month while they go on vacation, when they boot up Internet access is disabled for everything except rpm-ostree upgrades until they get updated. I'm sure some people would despise this idea but if we make updates fast and painless we can get a lot closer to having both security and convenience.

cgwalters commented Jan 19, 2018

(Actually for the desktop case implementing that is probably a gnome-software thing given flatpaks need updating too)

kalev commented Jan 19, 2018

Instead of a config file, I think it may be easier to have gnome-software drive the automatic updates over dbus -- it already has a session service specifically for that purpose. This way it could also make sure that base OS and flatpak updates are applied at the same time, reducing user interruptions etc. Would that make sense?

jlebon commented Jan 19, 2018

That makes sense and is part of the design in #1147. Basically, gnome-software could just turn off the timer and call AutomaticUpdateTrigger() at its leisure. We don't support a deploy mode right now since it wouldn't make sense from a timer without fixing #40 first. But in an "update & reboot" model, #40 is less relevant, and we can add support for that. (Of course, that can be done today as well with the code in #1147 by just using a follow-up UpdateDeployment() in cache-only mode.)

kalev commented Jan 19, 2018

That sounds great! Let me see if I can quickly hack up gnome-software to make use of the new goodness and then report back on Monday or so.

dustymabe commented Jan 19, 2018

@kalev, is there any sort of gnome-software cli? gnome-software incorporates rpms, flatpaks, firmware, ostree?? It would be really nice to have something like that on my Atomic Host (not workstation) system in a cli form to report potential updates and allow me to choose what to install. Related discussion in #405 (comment)

cgwalters commented Jan 19, 2018

rpm-ostree auto-updates --disable/--enable=$MODE

Yeah, I like that. But it's something we can do later.

cgwalters commented Jan 19, 2018

The topic I have in mind now is partially a design thing but also partially implementation.

Basically with --download-only, for base ostree updates we end up holding a strong reference to both the new and old commits (the former via the ref update, the latter via the magic ostree/N/M deployment refs).

But nothing holds a ref to (imported) layered packages, so doing other operations (e.g. rpm-ostree initramfs --enable) could end up pruning them as part of our normal GC.

These issues are why I feel like everything is going to work out a lot better after we do #40. Deployments are the basis of a lot of things in the implementation, and that leaks through a lot into the UI.

I guess a root question here is - after #40 is implemented - who would want to use download? Would we change the semantics of it to start being "download and make pending", i.e. we actually assemble the filesystem, run scripts etc.?

jlebon commented Jan 19, 2018

I guess a root question here is - after #40 is implemented - who would want to use download?

Yeah, that's a good point. I'm a bit on the fence on this one. Maybe let's rephrase it a different way.

A deploy policy allows sysadmins to queue up an update until the next convenient reboot time. OTOH, a download policy basically means that your decision to update is decoupled from your decision to reboot. You may reboot at any time, but only update when you're ready. The advantage over check is that you minimize downtime by having everything already cached.

So the question is, are there contexts where this distinction is relevant? I think I would answer "probably?", though I don't have any clear cut examples to present. It is really cool though that our update model allows us to even make this distinction and it's nice to expose that.

But clearly the GC issue is a thorn. Maybe the first cut should be restricted to off, check, and reboot? (At least as documented).

cgwalters commented Jan 19, 2018

We can actually cut yet another distinction here...say deploy and prepare perhaps? The difference between the two is simply whether or not the ostree-complete-pending.service (or whatever we decide to call it) runs by default on shutdown or not.

In prepare we wouldn't run the service, meaning that whether or not the update takes effect is completely independent of rebooting.

Broadly speaking...I think deploy might be better for servers but prepare better for desktops, like the UI would have a button that says "Yes I want to update" or something?

cgwalters commented Jan 19, 2018

But clearly the GC issue is a thorn. Maybe the first cut should be restricted to off, check, and reboot? (At least as documented).

If we don't want to block on #40 (let's start calling it "pending deploys") then...yeah, I'm OK with that. It'd certainly be a cool milestone to at least have e.g. gnome-software doing update notifications with "check" and being able to then take it to "reboot". Maybe something like what iOS does with "schedule update between 2am-4am"?

Or... it might work to have a "download but don't import" phase? That's what PK's offline updates do... it would sidestep the GC issue, but be less elegant.

cgwalters commented Jan 19, 2018

The reason I brought this up is that I was reading your "rpmmd diff" code, and while it all looks good offhand, it's also complex. That's not your fault; it's just the problem domain!

But...a whole lot of things get simpler if we primarily do prepare updates, and then do the diff of that, which is exactly the same code path we have for upgrade + diff today. It's just fundamentally more reliable since we're only doing e.g. the depsolve once.

(I think I babbled about this before but I have no idea where to find that discussion)

The counter here though is that some use cases want to show e.g. how much will be downloaded before actually doing it. And particularly with layering involved we can't do that until we depsolve. Which...hm maybe is your check phase. So I guess we do need that.

I'm keeping this whole comment here since it might be useful, hopefully?

cgwalters commented Jan 25, 2018

Timing of the reboot

There's a pretty huge semantic difference between reboot and anything not-reboot. Having it executed out of the same timer unit feels...weird. I think people are going to want a lot of control over the reboots. This came up on IRC earlier.

WDYT about actually having separate systemd units for this? Something like:

rpm-ostree-reboot-for-updates.{timer,service}?

That way someone could easily unconditionally systemctl mask rpm-ostree-reboot-for-updates.service rather than editing the config file.

I think what I'm getting at here is I can see it being extremely common for "management tools" like gnome-software/Cockpit/Ansible to want a lot of fine grained control over the reboot cycle. It feels really like we should encourage people to always use check/deploy in the config, and use mgmt tools for rebooting (particularly in the desktop case). Our reboot-for-updates.timer is then a really simple policy engine for people who don't have a management tool (say IoT devices without reliable internet that you just want to auto-apply updates as they make it to the device).
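The split proposed here could be sketched as a dedicated unit whose only job is rebooting into a ready update. The unit name is quoted from the comment above; everything else in the sketch (contents, ExecStart) is a hypothetical assumption, and the file is written to a scratch directory for illustration:

```shell
#!/bin/sh
set -eu

unitdir="${TMPDIR:-/tmp}/rpm-ostree-reboot-demo"
mkdir -p "$unitdir"

# Hypothetical reboot-only unit: the update work lives elsewhere, so masking
# this single unit opts a host out of automatic reboots without config edits.
cat > "$unitdir/rpm-ostree-reboot-for-updates.service" <<'EOF'
[Unit]
Description=Reboot to apply a pending rpm-ostree update (hypothetical sketch)

[Service]
Type=oneshot
ExecStart=/usr/bin/rpm-ostree upgrade -r
EOF

# Opting out is then one config-free step:
#   systemctl mask rpm-ostree-reboot-for-updates.service
grep '^ExecStart' "$unitdir/rpm-ostree-reboot-for-updates.service"
```

The design point being illustrated: because masking operates per-unit, separating "reboot" into its own unit gives management tools (gnome-software, Cockpit, Ansible) a clean hook without touching the check/deploy policy.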

peterbaouoft commented Jan 30, 2018

Hi, I gave auto-updates a try. It works nicely for me =). I do have a few questions though (hopefully you won't mind =) ). Note: the test output might be long (but there isn't that much actual content). I also did not read many of the comments above, so if I happen to miss something, please let me know =P

1: When applying the auto-update patch, rpm-ostree status takes noticeably longer than before. Is that expected?

[root@localhost ~]# time rpm-ostree status    
State: idle; auto updates disabled
Deployments:
* ostree://fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.60 (2018-01-16 16:35:15)
                BaseCommit: 972e5a8158b610fec80f3f73f3372b7bea2b841038f2e246aa7623dbf5b5a751
              GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4
           LayeredPackages: man
                  Unlocked: development

  ostree://fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.61 (2018-01-17 15:52:47)
                BaseCommit: 772ab185b0752b0d6bc8b2096d08955660d80ed95579e13e136e6a54e3559ca9
              GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4
           LayeredPackages: man

real	0m0.040s
user	0m0.023s
sys	0m0.005s

vs

[root@localhost ~]# time rpm-ostree status -v 
State: idle; auto updates enabled (check; last run unknown)
Deployments:
* ostree://fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.60 (2018-01-16 16:35:15)
                BaseCommit: 972e5a8158b610fec80f3f73f3372b7bea2b841038f2e246aa7623dbf5b5a751
                    Commit: ab75f9249820bd6c32e16ebbf9947322b484aaa9d4164cf573bc7480a1c2a22b
                 StateRoot: fedora-atomic
              GPGSignature: 1 signature
                            Signature made Tue Jan 16 16:35:22 2018 using RSA key ID F55E7430F5282EE4
                            Good signature from "Fedora 27 <fedora-27@fedoraproject.org>"
           LayeredPackages: man
                  Unlocked: development

  ostree://fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.61 (2018-01-17 15:52:47)
                BaseCommit: 772ab185b0752b0d6bc8b2096d08955660d80ed95579e13e136e6a54e3559ca9
                    Commit: bfb5f4147f4b9aa6d5b0277ec337ee38871cedbcc2e97721609f242f15d3b37c
                 StateRoot: fedora-atomic
              GPGSignature: 1 signature
                            Signature made Wed Jan 17 15:52:59 2018 using RSA key ID F55E7430F5282EE4
                            Good signature from "Fedora 27 <fedora-27@fedoraproject.org>"
           LayeredPackages: man

Available update:
       Version: 27.61 (2018-01-17 15:52:47)
        Commit: 772ab185b0752b0d6bc8b2096d08955660d80ed95579e13e136e6a54e3559ca9
  GPGSignature: 1 signature
                Signature made Wed Jan 17 15:52:59 2018 using RSA key ID F55E7430F5282EE4
                Good signature from "Fedora 27 <fedora-27@fedoraproject.org>"
      Upgraded: docker 2:1.13.1-42.git4402c09.fc27 -> 2:1.13.1-44.git584d391.fc27
                docker-common 2:1.13.1-42.git4402c09.fc27 -> 2:1.13.1-44.git584d391.fc27
                docker-rhel-push-plugin 2:1.13.1-42.git4402c09.fc27 -> 2:1.13.1-44.git584d391.fc27

real	0m25.050s
user	0m0.022s
sys	0m0.010s

2: It seems like I have to do an upgrade --preview in order to make rpm-ostree status show the available update, is that the expected behavior?

[root@localhost ~]# rpm-ostree status
State: idle; auto updates disabled
Deployments:
* ostree://fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.60 (2018-01-16 16:35:15)
                BaseCommit: 972e5a8158b610fec80f3f73f3372b7bea2b841038f2e246aa7623dbf5b5a751
              GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4
           LayeredPackages: man
                  Unlocked: development

  ostree://fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.61 (2018-01-17 15:52:47)
                BaseCommit: 772ab185b0752b0d6bc8b2096d08955660d80ed95579e13e136e6a54e3559ca9
              GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4
           LayeredPackages: man
[root@localhost ~]# vi /etc/rpm-ostreed.conf 
[root@localhost ~]# cat /etc/rpm-ostreed.conf 
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# For option meanings, see rpm-ostreed.conf(5).

[Daemon]
AutomaticUpdatePolicy=check
#IdleExitTimeout=60
[root@localhost ~]# rpm-ostree reload
[root@localhost ~]# time rpm-ostree status
State: idle; auto updates enabled (check; last run unknown)
Deployments:
* ostree://fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.60 (2018-01-16 16:35:15)
                BaseCommit: 972e5a8158b610fec80f3f73f3372b7bea2b841038f2e246aa7623dbf5b5a751
              GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4
           LayeredPackages: man
                  Unlocked: development

  ostree://fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.61 (2018-01-17 15:52:47)
                BaseCommit: 772ab185b0752b0d6bc8b2096d08955660d80ed95579e13e136e6a54e3559ca9
              GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4
           LayeredPackages: man

real	0m25.064s
user	0m0.024s
sys	0m0.006s
[root@localhost ~]# rpm-ostree upgrade --preview
1 metadata, 0 content objects fetched; 569 B transferred in 0 seconds
Enabled rpm-md repositories: updates fedora

Updating metadata for 'updates': [=============] 100%
rpm-md repo 'updates'; generated: 2018-01-29 17:58:29

Updating metadata for 'fedora': [=============] 100%
rpm-md repo 'fedora'; generated: 2017-11-05 05:51:47

Importing metadata [=============] 100%
Available update:
       Version: 27.61 (2018-01-17 15:52:47)
        Commit: 772ab185b0752b0d6bc8b2096d08955660d80ed95579e13e136e6a54e3559ca9
  GPGSignature: 1 signature
                Signature made Wed Jan 17 15:52:59 2018 using RSA key ID F55E7430F5282EE4
                Good signature from "Fedora 27 <fedora-27@fedoraproject.org>"
      Upgraded: docker 2:1.13.1-42.git4402c09.fc27 -> 2:1.13.1-44.git584d391.fc27
                docker-common 2:1.13.1-42.git4402c09.fc27 -> 2:1.13.1-44.git584d391.fc27
                docker-rhel-push-plugin 2:1.13.1-42.git4402c09.fc27 -> 2:1.13.1-44.git584d391.fc27

[root@localhost ~]# time rpm-ostree status   
State: idle; auto updates enabled (check; last run unknown)
Deployments:
* ostree://fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.60 (2018-01-16 16:35:15)
                BaseCommit: 972e5a8158b610fec80f3f73f3372b7bea2b841038f2e246aa7623dbf5b5a751
              GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4
           LayeredPackages: man
                  Unlocked: development

  ostree://fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.61 (2018-01-17 15:52:47)
                BaseCommit: 772ab185b0752b0d6bc8b2096d08955660d80ed95579e13e136e6a54e3559ca9
              GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4
           LayeredPackages: man

Available update:
       Version: 27.61 (2018-01-17 15:52:47)
        Commit: 772ab185b0752b0d6bc8b2096d08955660d80ed95579e13e136e6a54e3559ca9
  GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4
          Diff: 3 upgraded

real	0m25.069s
user	0m0.022s
sys	0m0.010s

3: Last question: how do I get last run to show something other than unknown in the status?

Other than that, the functionality looks nice =). Sorry it took a while; I had to spend time understanding the testing procedure. And this is the complete test log if you are interested:
https://paste.fedoraproject.org/paste/F~Nxr4I7w3j3QSctno~jbQ (also long; read with caution)

jlebon commented Jan 31, 2018

Thanks @peterbaouoft for trying it out! :)

1: When applying the auto-update patch, rpm-ostree status takes noticeably longer than before. Is that expected?

Ahh, you're probably hitting fedora-selinux/selinux-policy-contrib#45. You can either use the same hack we use in the testsuite, or just setenforce 0.

2: It seems like I have to do an upgrade --preview in order to make rpm-ostree status show the available update, is that the expected behavior?

Right. The reload only reloads the configuration. The actual check for updates happens according to the rpm-ostreed-automatic.timer. You can also do rpm-ostree upgrade --trigger-automatic-update-policy to force a check.

3: Last question: how do I get last run to show something other than unknown in the status?

That's due to the SELinux policy issue above.

peterbaouoft commented Jan 31, 2018

Ahh, you're probably hitting fedora-selinux/selinux-policy-contrib#45. You can either use the same hack we use in the testsuite, or just setenforce 0.

Yup, applying setenforce 0 does make it a lot faster, and it also seems to solve the unknown status problem. Two birds with one stone! =P

[root@localhost ~]# time rpm-ostree status
State: idle; auto updates enabled (check; no runs since boot)
Deployments:
* ostree://fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.60 (2018-01-16 16:35:15)
                BaseCommit: 972e5a8158b610fec80f3f73f3372b7bea2b841038f2e246aa7623dbf5b5a751
              GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4
           LayeredPackages: man
                  Unlocked: development

  ostree://fedora-atomic:fedora/27/x86_64/atomic-host
                   Version: 27.61 (2018-01-17 15:52:47)
                BaseCommit: 772ab185b0752b0d6bc8b2096d08955660d80ed95579e13e136e6a54e3559ca9
              GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4
           LayeredPackages: man

Available update:
       Version: 27.61 (2018-01-17 15:52:47)
        Commit: 772ab185b0752b0d6bc8b2096d08955660d80ed95579e13e136e6a54e3559ca9
  GPGSignature: Valid signature by 860E19B0AFA800A1751881A6F55E7430F5282EE4
          Diff: 3 upgraded

real	0m0.047s
user	0m0.024s
sys	0m0.008s

The actual check for updates happens according to the rpm-ostreed-automatic.timer. You can also do rpm-ostree upgrade --trigger-automatic-update-policy to force a check.

I see, makes sense. Thanks for the explanation! I'm more and more excited about this new auto-update feature now! =D

jlebon commented Feb 8, 2018

The counter here though is that some use cases want to show e.g. how much will be downloaded before actually doing it. And particularly with layering involved we can't do that until we depsolve. Which...hm maybe is your check phase. So I guess we do need that.

Yeah, I think there's a lot of use cases where you don't want your updater to auto-download in the background. E.g. for FAW, I'd feel comfortable shipping with check by default, but not download/prepare. The depsolve issue is indeed unfortunate but not terrible. I think in the great majority of cases, our heuristics will work. Perfect is the enemy of good. :)

I'm keeping this whole comment here since it might be useful, hopefully?

I think it helps to reason out things explicitly to make sure we're going the right way!

There's a pretty huge semantic difference between reboot and anything not-reboot.

To get back to this, I do see where you're coming from. I think in that case, I'd rather we not ship such a timer at all for now?

My initial thought before was to add some of these "policy engine"-like settings to rpm-ostree itself, such as "auto reboot only for security errata" or "auto reboot, but not for layered packages". I still think there's some value in doing this for the lone server/IoT case, because even though it's not very hard to implement manually, it makes things really easy to configure OOTB. But I guess that should be a separate discussion from whether to have a dumb reboot policy at all. So I'd vote for leaving this out for now until we gain more experience in the managed workflows like GNOME Software and cluster cases.
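To make the idea concrete, here is a purely hypothetical sketch of what such policy knobs could look like in `/etc/rpm-ostreed.conf` (none of these reboot keys exist; the names are invented for illustration):

```ini
[Daemon]
AutomaticUpdatePolicy=stage
# Hypothetical policy-engine knobs, not implemented:
#AutoReboot=true
#AutoRebootScope=security-only
#AutoRebootSkipLayered=true
```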

@ashcrow

Member

ashcrow commented Apr 10, 2018

I tend to think anything that will reboot the node needs to be handled outside of the daemon directly. The daemon itself (unless I'm mistaken) isn't aware of how many other nodes it lives with and can't initiate a restart without the possibility of downtime. Instead, having rpm-ostree be in a state noting that it's ready to apply its update (or that updates are available) seems ideal. Then the external controller can make intelligent decisions based on state.

Edit: s/agent/daemon/g

@cgwalters

Member

cgwalters commented Apr 10, 2018

Yeah...it's tempting to take the reboot mode out of rpm-ostreed entirely but I actually am still today using the timer linked at the top that just does rpm-ostree upgrade -r on my home server, just accepting the downtime. I should probably switch now to the reboot policy but eh.

This all ties back into the (just posted) https://pagure.io/atomic-wg/issue/453

@ashcrow

Member

ashcrow commented Apr 10, 2018

I think having the -r is fine. The more I think about it, the policy idea sounds good as well ... but it should default to the least surprising setting. Part of an external management system's job would be to ensure that the policy is set to download-only so it can reliably control when the deployment occurs.

@jlebon

Member

jlebon commented Apr 10, 2018

One question is whether rpm-ostreed-automatic initiates new deployment creation (as suggested in the WIP that proposes a new stage policy: #1321), or the agent. The former is clearly useful also for single node/workstation cases. Though in the cluster case, an argument for the latter is that the agent is a better place to embed policy engine style settings. E.g. I'm not sure we want to cause updates across the whole cluster if only a utility layered pkg was updated.

@ashcrow

Member

ashcrow commented Apr 10, 2018

@jlebon isn't rpm-ostreed-automatic still dependent on what policy is set? If so, we could document how one could configure non-agent-managed nodes by changing the policy. If they are using an agent, then the agent could verify/set the proper policy it expects. It would follow as:

  1. A single node/group of nodes: We default to downloading updates (or doing nothing)
  2. A single node/group of nodes with auto deploy on: We download and deploy with a reboot
  3. A single node/group of nodes managed by an agent: We download updates and defer to the agent to tell us when to deploy and reboot

This is a tricky subject though. My initial feeling is to put as much orchestration in the agent and as little in rpm-ostree. What keeps me from outright pushing for that is any agent that is used will likely be tied to a specific orchestration system or tool. If we try to make a generic agent then we are basically providing an interface and, to me, that would seem more at home in rpm-ostree anyway.
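The three cases above can be sketched as a small decision function. The status dict below loosely mirrors what `rpm-ostree status --json` reports, but the `"cached-update"` field name and overall shape are assumptions here, for illustration only:

```python
# Sketch of the decision logic an external agent or timer might implement
# for the three cases above. Field names are illustrative assumptions.

def next_action(status, auto_deploy=False, managed_by_agent=False):
    """Return 'download', 'reboot', or 'none' for one node."""
    staged = status.get("cached-update") is not None
    if not staged:
        # All three cases start by staging the update in the background.
        return "download"
    if managed_by_agent:
        # Case 3: the agent decides when to deploy and reboot.
        return "none"
    if auto_deploy:
        # Case 2: deploy by rebooting into the staged update.
        return "reboot"
    # Case 1: staged and waiting for the administrator.
    return "none"
```

An orchestrator could then sequence the resulting "reboot" actions across the cluster to avoid downtime.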

@cgwalters

Member

cgwalters commented Apr 11, 2018

This is a tricky subject though. My initial feeling is to put as much orchestration in the agent and as little in rpm-ostree. What keeps me from outright pushing for that is any agent that is used will likely be tied to a specific orchestration system or tool. If we try to make a generic agent then we are basically providing an interface and, to me, that would seem more at home in rpm-ostree anyway.

Yeah, that's the core tension. I guess my core feeling is let's not delete anything that exists in rpm-ostreed today, but I would vote that the Kube agent initiates updates itself rather than relying on the timer.

@jlebon

Member

jlebon commented Apr 11, 2018

So, from discussions here, I think what we want is "both". I.e. we do want a "stage" mode that rpm-ostreed knows about and enacted by the timer. E.g. that's something I'd love to have on my workstation. But we also want to be fully compatible with agents that want to take over all aspects of node management, including when stage deployments are created (and obviously when to reboot).

We could slice this further even into node agents that could still rely on rpm-ostree's "check" mode to know that a node has an update vs a more controlled environment where the "update available" signal comes directly to the agent OOB from some other metadata protocol (in which case, the rpm-ostree timer/policy is completely off).

@jlebon

Member

jlebon commented Apr 11, 2018

Yeah...it's tempting to take the reboot mode out of rpm-ostreed entirely but I actually am still today using the timer linked at the top that just does rpm-ostree upgrade -r on my home server, just accepting the downtime. I should probably switch now to the reboot policy but eh.

Note that reboot is not actually supported right now. Depending on how we want to implement https://pagure.io/atomic-wg/issue/453 re. the single node case, it might make sense to add it (and e.g. let that be the default we ship with in Fedora). Though maybe not if there's a bunch of "policy" type things we want to account for (e.g. "are any users logged in and how long have they been idle for?"). I think I'd rather have that logic live somewhere else.

@ashcrow

Member

ashcrow commented Apr 11, 2018

So, from discussions here, I think what we want is "both". I.e. we do want a "stage" mode that rpm-ostreed knows about and enacted by the timer. E.g. that's something I'd love to have on my workstation.

To clarify, stage mode would download updates and have them ready for deployment (not actually deploy), correct?

But we also want to be fully compatible with agents that want to take over all aspects of node management, including when stage deployments are created (and obviously when to reboot).

👍

Note that reboot is not actually supported right now. Depending on how we want to implement https://pagure.io/atomic-wg/issue/453 re. the single node case, it might make sense to add it (and e.g. let that be the default we ship with in Fedora). Though maybe not if there's a bunch of "policy" type things we want to account for (e.g. "are any users logged in and how long have they been idle for?"). I think I'd rather have that logic live somewhere else.

That makes sense. Assuming staging means downloading and being ready to deploy, I say let's get that in. Having a configurable timer to deploy and reboot is fine too, as long as we can disable the timer's deploy portion. Being able to disable the auto staging would be a nice-to-have but could be added at a later time.
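For the record, this staging behavior is what eventually shipped; per the 2018-12-06 edit at the top of this issue, it is enabled with:

```ini
# /etc/rpm-ostreed.conf
[Daemon]
AutomaticUpdatePolicy=stage
```

combined with `systemctl enable rpm-ostreed-automatic.timer`.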

@ashcrow

Member

ashcrow commented Apr 12, 2018

Do we have a path forward on this?

@cgwalters

Member

cgwalters commented May 4, 2018

That makes sense and is part of the design in #1147. Basically, gnome-software could just turn off the timer and call AutomaticUpdateTrigger() at its leisure.

I think if we do this though I'd like to have something like:

rpm-ostree upgrade --trigger-automatic-update-policy=timer
rpm-ostree upgrade --trigger-automatic-update-policy=gnome-software

And the daemon then tracks (somewhere) the name passed. The idea here is that then

# rpm-ostree status
State: idle; auto updates enabled (stage, agent=gnome-software)

So administrators understand what's going on. And we should probably explicitly throw an error if the built-in timer is enabled and anything else executes the auto-update policy.

This "tracking the last agent" though only works after things have run at least once. But I think that's OK.

@cgwalters

Member

cgwalters commented May 4, 2018

馃 Though...today we probably could use sd_pid_get_unit() (or sd_pid_get_user_unit()) to work this out automatically.

cgwalters added a commit to cgwalters/rpm-ostree that referenced this issue May 15, 2018

daemon: Load sd unit for callers, log it
The high level goal is to render in a better way what caused an
update: projectatomic#247 (comment)

This gets us for Cockpit:
`Initiated txn DownloadUpdateRpmDiff for client(dbus:1.28 unit:session-6.scope uid:0): /org/projectatomic/rpmostree1/fedora_atomic`
which isn't as good as I'd hoped; I was thinking we'd get `cockpit.service`
but actually Cockpit does invocations as a real login for good reason.

We get a similar result from the CLI.

rh-atomic-bot added a commit that referenced this issue May 16, 2018

daemon: Load sd unit for callers, log it

Closes: #1368
Approved by: jlebon