-
Notifications
You must be signed in to change notification settings - Fork 58
[nexus] Update status API #8897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ff5dd1c
to
0bf80f5
Compare
|
||
let counts = status.components_by_release_version; | ||
assert_eq!(counts.get("install dataset").unwrap(), &7); | ||
assert_eq!(counts.get("unknown").unwrap(), &15); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will probably want to change the Display
impl to make these snake case. I could just write my own helper that does that and leave the Display
impl alone, but it's probably bad to have to two ways of doing this.
omicron/nexus/types/src/internal_api/views.rs
Lines 573 to 588 in 8571be3
impl Display for TufRepoVersion { | |
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | |
match self { | |
TufRepoVersion::Unknown => write!(f, "unknown"), | |
TufRepoVersion::InstallDataset => { | |
write!(f, "install dataset") | |
} | |
TufRepoVersion::Version(version) => { | |
write!(f, "{}", version) | |
} | |
TufRepoVersion::Error(s) => { | |
write!(f, "{}", s) | |
} | |
} | |
} | |
} |
4a75d91
to
3a017f1
Compare
3a017f1
to
5f8076c
Compare
/// rack should eventually correspond to the release described here. | ||
/// | ||
/// Will only be null if a target release has never been set. | ||
pub target_release: Option<TargetRelease>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've collapsed the TargetReleaseSource
enum with its Unspecified
and SystemVersion
arms into Option<TargetRelease>
, where TargetRelease
is always a system version. This is based on my understanding that it's basically never going to be Unspecified
after the first few weeks of update being released. By handling that with target_release: null
, we ensure future values will be nice ones like target_release: { version: "2.0.0", time_requested: "..." }
that don't distract the user with the possibility of other shapes. In the previous TargetReleaseSource
enum, this was the distinction between { "type": "unspecified" }
vs. { "type": "system_version", ... }
. The option approach lets us make the unspecified
arm nearly invisible.
/// The source of the target release. | ||
pub release_source: TargetReleaseSource, | ||
/// The specified release of the rack's system software | ||
pub version: Version, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the word "source" here very confusing. Now there is no source. There either is no target release or there is one, and when there is one, it always has a system version.
I also thought it was a little weird that you could have a target release like TargetRelease { time_requested: "...", release_source: Unspecified }
, namely unspecified but still with a non-null time_requested
. Based on the migration adding the unspecified value, time_requested
is just whenever the migration ran, which to me is not very meaningful to the user. In this PR, time_requested
is still inside TargetRelease
, but since TargetRelease
is now optional, you only have time_requested
in the response when you have a release set.
omicron/schema/crdb/create-target-release/up3.sql
Lines 1 to 12 in c7673aa
-- System software is by default from the `install` dataset. | |
INSERT INTO omicron.public.target_release ( | |
generation, | |
time_requested, | |
release_source, | |
tuf_repo_id | |
) VALUES ( | |
1, | |
NOW(), | |
'unspecified', | |
NULL | |
) ON CONFLICT DO NOTHING; |
#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize, JsonSchema)] | ||
pub struct TargetRelease { | ||
/// The target-release generation number. | ||
pub generation: i64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been removed. I don't think it's meaningful to the user.
.chain(rot_version) | ||
.chain(std::iter::once(bootloader_version)) | ||
.chain(host_version) | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine to do this by hand here, but "make a flat list of all the versions in the system" feels like a good operation to have a canonical version of that lives somewhere else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not had a chance to look carefully at the changes to nexus/src/app/update.rs
or the updates.rs integration test. The API changes like fine to me. We'll need to coordinate an update to the CLI as people are testing this.
/// configuration) was made a target by the update system. In other words, | ||
/// it's the last time the system decided on a next step to take. | ||
pub time_last_progress: DateTime<Utc>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point I think we talked about a field here that reflects "the system is not doing anything because of a recent MUPdate".
Internally, this can be determined if the current target release generation number (not the version, the generation number of the target release object) is less than or equal to the target blueprint's target_release_minimum_generation
. If that's the case, that means a MUPdate has happened some time after the last time the operator set the target release. In this case, the system stops doing anything update-related until the operator sets the target release again. We'll want to identify and call out this condition in the console.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this as paused
in 3ca18a7
/// Count of components running each release version | ||
pub components_by_release_version: BTreeMap<String, usize>, | ||
|
||
/// Time of last meaningful change to update status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Time of last meaningful change to update status | |
/// Time of most recent update-related activity (internal to the system) |
I'm definitely open to other things. I don't love "last meaningful change to update status" because technically this is the time when the system most recently decided to do something new (not even necessarily update-related, and it definitely hasn't done it yet). The real definition is too implementation-specific though, hence my vague "update-related activity"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went for "Time of most recent update planning activity" and time_last_blueprint
in 3ca18a7. As much as it feels bad to expose an internal term, I would feel worse calling it something else (like last_update_planning_step
or last_update_step
) and literally meaning a blueprint anyway. I'd rather just say blueprint.
tuf_repo::table | ||
.select(tuf_repo::system_version) | ||
.filter(tuf_repo::id.eq(tuf_repo_id.into_untyped_uuid())) | ||
.first_async::<SemverVersion>(&*conn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this handle the case where there was no version at all? I feel like this should probably be a 500 error. It shouldn't happen because we don't let you delete TUF repos at all, and when we do, you won't be able to delete one that's being used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean when there is no row found in the table with this ID? Every row has a version. I tested it with a nonexistent UUID and got a 500 with unexpected database error: Record not found
, so I think it's already doing what you want.
nexus/src/app/update.rs
Outdated
let time_last_progress = self | ||
.datastore() | ||
.blueprint_target_get_current(opctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have the current target blueprint sitting in a watch channel from the blueprint_loader
task. It'd be nice to use that here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's interesting. As far as I can tell this is the first case of an external API endpoint pulling data from a watch channel. Is it possible that it could be out of date with respect to the other stuff we're straight from the DB? I have a commit on the way that seems to work fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Attempted in 321374d. It seems fine? It always makes me nervous to do something we don't do anywhere else.
self.datastore() | ||
.target_release_get_generation(opctx, Generation(prev_gen)) | ||
.await | ||
.internal_context("fetching previous target release")? | ||
.and_then(|r| r.tuf_repo_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a little sketchy? I'm not sure.
Example: after a MUPdate, you might set the target release to the same value it was before. Then the previous row will not be the previous release.
Today, the PlanningInput has this information. Maybe PlanningInputFromDb
has a better way to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, that is where I copied this logic from. The comment suggests that #8056 helps but I am not sure it does.
omicron/nexus/reconfigurator/preparation/src/lib.rs
Lines 163 to 199 in 4d5bdc6
let tuf_repo = TufRepoPolicy { | |
target_release_generation: target_release.generation.0, | |
description: target_release_desc, | |
}; | |
// NOTE: We currently assume that only two generations are in play: the | |
// target release generation and its previous one. This depends on us | |
// not setting a new target release in the middle of an update: see | |
// https://github.com/oxidecomputer/omicron/issues/8056. | |
// | |
// We may need to revisit this decision in the future. See that issue | |
// for some discussion. | |
let old_repo = if let Some(prev) = target_release.generation.prev() { | |
let prev_release = datastore | |
.target_release_get_generation(opctx, Generation(prev)) | |
.await | |
.internal_context("fetching previous target release")?; | |
let description = if let Some(prev_release) = prev_release { | |
if let Some(repo_id) = prev_release.tuf_repo_id { | |
TargetReleaseDescription::TufRepo( | |
datastore | |
.tuf_repo_get_by_id(opctx, repo_id.into()) | |
.await | |
.internal_context( | |
"fetching previous target release repo", | |
)? | |
.into_external(), | |
) | |
} else { | |
TargetReleaseDescription::Initial | |
} | |
} else { | |
TargetReleaseDescription::Initial | |
}; | |
TufRepoPolicy { target_release_generation: prev, description } | |
} else { | |
TufRepoPolicy::initial() | |
}; |
I certainly see your point, though, and it is a problem. In the internal API logic I'm borrowing, it seems like we will only turn up component versions that match either the old or the new one passed in, because we're matching against artifact hashes. In the MUPdate case where current and prev release are the same, components that are on the the MUPdate's version will all show up as TufRepoVersion::Unknown
.
omicron/nexus/types/src/internal_api/views.rs
Lines 937 to 968 in 4d5bdc6
fn zone_image_source_to_version( | |
old: &TargetReleaseDescription, | |
new: &TargetReleaseDescription, | |
source: &OmicronZoneImageSource, | |
res: &ConfigReconcilerInventoryResult, | |
) -> TufRepoVersion { | |
if let ConfigReconcilerInventoryResult::Err { message } = res { | |
return TufRepoVersion::Error(message.clone()); | |
} | |
let &OmicronZoneImageSource::Artifact { hash } = source else { | |
return TufRepoVersion::InstallDataset; | |
}; | |
if let Some(new) = new.tuf_repo() { | |
if new.artifacts.iter().any(|meta| meta.hash == hash) { | |
return TufRepoVersion::Version( | |
new.repo.system_version.clone(), | |
); | |
} | |
} | |
if let Some(old) = old.tuf_repo() { | |
if old.artifacts.iter().any(|meta| meta.hash == hash) { | |
return TufRepoVersion::Version( | |
old.repo.system_version.clone(), | |
); | |
} | |
} | |
TufRepoVersion::Unknown | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jgallagher says to use this instead, and sort by generation number to figure out what are old and new releases (not sure I actually need old and new anyway, I just need all the versions on hand).
Closes #8869
update_status
endpoint at/v1/system/update/status
that includes:target_release_view
)time_last_progress
, representing thetime_made_target
of the latestbp_target
— meant to indicate the last time the update system Did Somethingcomponents_by_release_version
, a map where the keys are to_string'dTufRepoVersion
s and the values are counts of components on that versiontarget_release_view
endpoint, which is fully redundant with update statustarget_release_update
that you can use update status to check the current target releaseTargetRelease
struct that was previously being returned fromtarget_release_view
and is now part of the update status responsegeneration
fromTargetRelease
because it is not meaningful to the end userListing blockers or problems preventing further update progress is mentioned as a goal of #8869, but they are currently stored as JSON blobs in the DB for debug purposes. They will soon be stored in a more regimented way that should make it easy to stick a list of blockers in the response body. If this PR gets merged without that, I'll make a dedicated issue for it.
Example status response