manage cockroachdb cluster version with blueprints #5603

Merged
merged 28 commits into from
May 25, 2024
4de6545
manage crdb cluster version with blueprints
iliana Apr 23, 2024
8d40d59
review feedback
iliana Apr 23, 2024
234d57b
check cockroachdb state before barreling forward
iliana Apr 24, 2024
1d14380
ugh
iliana Apr 25, 2024
18ad60c
Merge remote-tracking branch 'origin/main' into iliana/crdb-cluster-v…
iliana Apr 25, 2024
5218f78
add logging to planner
iliana Apr 25, 2024
0763171
excise last few mentions of "cluster settings"
iliana Apr 25, 2024
be49010
move settings execution to end of executor
iliana Apr 25, 2024
2cd8ce0
write a Big Theory Statement on all this
iliana Apr 30, 2024
9925918
document the gadget better
iliana Apr 30, 2024
2b7590d
ensure that a fingerprint mismatch doesn't have side effects
iliana Apr 30, 2024
dd68d8f
Merge remote-tracking branch 'origin/main' into iliana/crdb-cluster-v…
iliana May 1, 2024
b26abaf
clarify "not set" into something useful
iliana May 1, 2024
29d1b7b
Merge remote-tracking branch 'origin/main' into iliana/crdb-cluster-v…
iliana May 8, 2024
4051bc2
write planner test (it found a bug!)
iliana May 8, 2024
d7ba0c1
write execution test
iliana May 8, 2024
0d25f84
Merge remote-tracking branch 'origin/main' into iliana/crdb-cluster-v…
iliana May 21, 2024
8f1114a
use an enum to describe the preserve downgrade setting
iliana May 22, 2024
3ab0ede
Merge remote-tracking branch 'origin/main' into iliana/crdb-cluster-v…
iliana May 22, 2024
d67b2cb
update docs for enum
iliana May 22, 2024
a5f49cc
dbinit comment nit
iliana May 22, 2024
04afc26
oops
iliana May 23, 2024
aa252b2
set preserve downgrade option during rack init
iliana May 23, 2024
3030fc5
Merge remote-tracking branch 'origin/main' into iliana/crdb-cluster-v…
iliana May 24, 2024
9fddf32
add crdb settings to blueprint display
iliana May 24, 2024
cbe475c
oops
iliana May 24, 2024
7df2d80
nits from self-review
iliana May 24, 2024
0871b51
Merge remote-tracking branch 'origin/main' into iliana/crdb-cluster-v…
iliana May 25, 2024
12 changes: 12 additions & 0 deletions dev-tools/omdb/tests/successes.out
@@ -542,6 +542,10 @@ WARNING: Zones exist without physical disks!



COCKROACHDB SETTINGS:
state fingerprint::::::::::::::::: d4d87aa2ad877a4cc2fddd0573952362739110de
cluster.preserve_downgrade_option: "22.1"

METADATA:
created by::::::::::: nexus-test-utils
created at::::::::::: <REDACTED TIMESTAMP>
@@ -576,6 +580,10 @@ WARNING: Zones exist without physical disks!



COCKROACHDB SETTINGS:
state fingerprint::::::::::::::::: d4d87aa2ad877a4cc2fddd0573952362739110de
cluster.preserve_downgrade_option: "22.1"

METADATA:
created by::::::::::: nexus-test-utils
created at::::::::::: <REDACTED TIMESTAMP>
@@ -613,6 +621,10 @@ to: blueprint ......<REDACTED_BLUEPRINT_ID>.......
nexus ..........<REDACTED_UUID>........... in service ::ffff:127.0.0.1


COCKROACHDB SETTINGS:
state fingerprint::::::::::::::::: d4d87aa2ad877a4cc2fddd0573952362739110de (unchanged)
cluster.preserve_downgrade_option: "22.1" (unchanged)

METADATA:
internal DNS version: 1 (unchanged)
external DNS version: 2 (unchanged)
115 changes: 115 additions & 0 deletions docs/crdb-upgrades.adoc
@@ -0,0 +1,115 @@
:showtitle:
:numbered:
:toc: left

= So You Want To Upgrade CockroachDB

CockroachDB has a number of overlapping things called "versions":

1. The `cockroachdb` executable is built from a particular version, such
as v22.2.19. We'll call this the *executable version*.
2. The executable version is made up of three components: a number
representing the release year, a number representing which release
it was within that year, and a patch release number. The first two
components constitute the *major version* (such as v22.2).
3. There is also a version for the on-disk data format that CockroachDB
writes and manages. This is called the *cluster version*. When
you create a new cluster while running major version v22.2, it
is initialized at cluster version `22.2`. Each major version of
CockroachDB can operate on both its own associated cluster version,
and the previous major version's cluster version, to facilitate
rolling upgrades.
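
As a rough illustration of how these three notions relate, here is a minimal Rust sketch. The `ExecutableVersion` type and its methods are hypothetical names invented for this example and are not part of Omicron or CockroachDB; the only facts taken from the text above are the three-component layout and that a new cluster on v22.2 starts at cluster version `22.2`.

```rust
/// Hypothetical type for illustration only; not an Omicron or CockroachDB API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ExecutableVersion {
    /// Release year component (the "22" in v22.2.19).
    pub year: u32,
    /// Release-within-the-year component (the "2" in v22.2.19).
    pub release: u32,
    /// Patch release component (the "19" in v22.2.19).
    pub patch: u32,
}

impl ExecutableVersion {
    /// Parse a string such as "v22.2.19"; the leading "v" is optional.
    /// Returns `None` unless there are exactly three numeric components.
    pub fn parse(s: &str) -> Option<Self> {
        let s = s.strip_prefix('v').unwrap_or(s);
        let mut parts = s.split('.');
        let year = parts.next()?.parse().ok()?;
        let release = parts.next()?.parse().ok()?;
        let patch = parts.next()?.parse().ok()?;
        if parts.next().is_some() {
            return None;
        }
        Some(Self { year, release, patch })
    }

    /// The major version is the first two components.
    pub fn major_version(&self) -> (u32, u32) {
        (self.year, self.release)
    }

    /// The cluster version a brand-new cluster would be initialized at
    /// when created by this executable, formatted like `22.2`.
    pub fn initial_cluster_version(&self) -> String {
        format!("{}.{}", self.year, self.release)
    }
}
```

For example, parsing `v22.2.19` yields the major version `(22, 2)` and an initial cluster version of `22.2`.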

By default the cluster version is upgraded and _finalized_ once
all nodes in the cluster have upgraded to a new major version
(the CockroachDB docs refer to this as "auto-finalization").
<<crdb-tn-upgrades>> However, it is not possible to downgrade the
cluster version. To mitigate the risk of one-way upgrades, we use a
CockroachDB cluster setting named `cluster.preserve_downgrade_option`
to prevent auto-finalization and... preserve our option to downgrade in
a future release, as the option name would suggest. We then perform an
upgrade to the next major version across at least two releases, which we
refer to as a tick-tock cycle:

- In a *tick release*, we upgrade the executable versions across the
cluster.
- In a *tock release*, we release our downgrade option and allow
CockroachDB to perform the cluster upgrade automatically. When the
upgrade is complete, we configure the "preserve downgrade option"
setting again to prepare for the next tick release.

(This is not strictly speaking a "tick-tock" cycle, because any number
of releases may occur between a tick and a tock, and between a tock and
a tick, but they must occur in that order.)

== Process for a tick release

. Determine whether to move to the next major release of CockroachDB.
We have generally avoided being early adopters of new major releases
and prefer to see the rate of https://www.cockroachlabs.com/docs/advisories/[technical
advisories] that solely affect the new major version drop off. (This
generally won't stop you from working on building and testing the
next major version, however, as the build process sometimes changes
significantly from release to release.)
. Build a new version of CockroachDB for illumos. You will want to
update the https://github.com/oxidecomputer/garbage-compactor/tree/master/cockroach[build
scripts in garbage-compactor].
. In your local Omicron checkout on a Helios machine, unpack the
resulting tarball to `out/cockroachdb`, and update `tools/cockroachdb_version`
to the version you've built.
. Add an enum variant for the new version to `CockroachDbClusterVersion`
in `nexus/types/src/deployment/planning_input.rs`, and change the
associated constant `NEWLY_INITIALIZED` to that value.
. Run the test suite, which should catch any unexpected SQL
compatibility issues between releases and help validate that your
build works.
* You will need to run the `test_omdb_success_cases` test from
omicron-omdb with `EXPECTORATE=overwrite`; this file contains the
expected output of various omdb commands, including a fingerprint of
CockroachDB's cluster state.
. Submit a PR for your changes to garbage-compactor; when merged,
publish the final build to the `oxide-cockroachdb-build` S3 bucket.
. Update `tools/cockroachdb_checksums`. For non-illumos checksums, use
the https://www.cockroachlabs.com/docs/releases/[official releases]
matching the version you built.
. Submit a PR with your changes (including `tools/cockroachdb_version`
and `tools/cockroachdb_checksums`) to Omicron.

== Process for a tock release

. Change the associated constant `CockroachDbClusterVersion::POLICY` in
`nexus/types/src/deployment/planning_input.rs` from the previous major
version to the current major version.

== What Nexus does

The Reconfigurator collects the current cluster version, and compares
this to the desired cluster version set by policy (which we update in
tock releases).

If they do not match, Nexus ensures the
`cluster.preserve_downgrade_option` setting is the default value (an
empty string), which allows CockroachDB to perform the upgrade to the
desired version. The end result of this upgrade is that the current and
desired cluster versions will match.

When they match, Nexus ensures that the
`cluster.preserve_downgrade_option` setting is set to the current
cluster version, to prevent automatic cluster upgrades when CockroachDB
is next upgraded to a new major version.
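
The decision described above can be sketched as a small pure function. This is only an illustration under stated assumptions: `PreserveDowngradeAction` and `plan_preserve_downgrade` are hypothetical names, cluster versions are modeled as plain strings, and the real planner in this PR works with its own types.

```rust
/// Hypothetical description of what the planner wants the
/// `cluster.preserve_downgrade_option` setting to be.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PreserveDowngradeAction {
    /// Reset the setting to its default (an empty string) so that
    /// CockroachDB is free to finalize the cluster upgrade.
    AllowUpgrade,
    /// Pin the setting to this cluster version so that a later
    /// executable upgrade does not auto-finalize.
    Set(String),
}

/// Compare the collected cluster version with the version set by policy.
pub fn plan_preserve_downgrade(
    current_cluster_version: &str,
    policy_cluster_version: &str,
) -> PreserveDowngradeAction {
    if current_cluster_version == policy_cluster_version {
        // Versions match: preserve the downgrade option at the current version.
        PreserveDowngradeAction::Set(current_cluster_version.to_string())
    } else {
        // Versions differ: release the downgrade option and let the
        // cluster upgrade proceed toward the policy version.
        PreserveDowngradeAction::AllowUpgrade
    }
}
```

In a tock release the policy moves ahead of the current cluster version, so this sketch returns `AllowUpgrade`; once the upgrade finishes and the two versions match again, it returns `Set` with the new cluster version.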

Because blueprints are serialized and continue to run even if the
underlying state has changed, Nexus needs to ensure its view of the
world is not out-of-date. Nexus saves a fingerprint of the current
cluster state in the blueprint (intended to be opaque, but ultimately
a hash of the cluster version and executable version of the node we're
currently connected to). When setting CockroachDB options, it verifies
this fingerprint in a way that, on a mismatch, causes an error instead
of setting the option.
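
To make the fingerprint check concrete, here is a minimal in-memory sketch. Everything in it is an assumption made for illustration: `ClusterState`, `fingerprint`, and `set_preserve_downgrade` are hypothetical names, the state lives in a struct rather than in CockroachDB, and the hash is Rust's `DefaultHasher` rather than whatever the real implementation uses. The one property it is meant to show is that a stale fingerprint produces an error and leaves the setting untouched.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the state Nexus observes on the node it is
/// connected to.
pub struct ClusterState {
    pub cluster_version: String,
    pub executable_version: String,
    /// Current value of `cluster.preserve_downgrade_option`.
    pub preserve_downgrade: String,
}

/// Opaque fingerprint over the cluster version and executable version.
/// (Assumption: the real fingerprint is computed differently.)
pub fn fingerprint(state: &ClusterState) -> String {
    let mut hasher = DefaultHasher::new();
    state.cluster_version.hash(&mut hasher);
    state.executable_version.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Set the option only if the state still matches the fingerprint saved
/// in the blueprint; otherwise return an error without side effects.
pub fn set_preserve_downgrade(
    state: &mut ClusterState,
    expected_fingerprint: &str,
    value: &str,
) -> Result<(), String> {
    let actual = fingerprint(state);
    if actual != expected_fingerprint {
        return Err(format!(
            "state fingerprint mismatch: expected {expected_fingerprint}, found {actual}"
        ));
    }
    state.preserve_downgrade = value.to_string();
    Ok(())
}
```

Because the sketch leaves the setting itself out of the fingerprint, changing the setting does not invalidate the blueprint's saved fingerprint; only a change to the cluster version or executable version does.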

[bibliography]
== External References

- [[[crdb-tn-upgrades]]] Cockroach Labs. Cluster versions and upgrades.
November 2023.
https://github.com/cockroachdb/cockroach/blob/53262957399e6e0fccd63c91add57a510b86dc9a/docs/tech-notes/version_upgrades.md
13 changes: 13 additions & 0 deletions nexus/db-model/src/deployment.rs
@@ -25,6 +25,7 @@ use nexus_types::deployment::BlueprintTarget;
use nexus_types::deployment::BlueprintZoneConfig;
use nexus_types::deployment::BlueprintZoneDisposition;
use nexus_types::deployment::BlueprintZonesConfig;
use nexus_types::deployment::CockroachDbPreserveDowngrade;
use omicron_common::api::internal::shared::NetworkInterface;
use omicron_common::disk::DiskIdentity;
use omicron_uuid_kinds::GenericUuid;
@@ -41,6 +42,8 @@ pub struct Blueprint {
pub parent_blueprint_id: Option<Uuid>,
pub internal_dns_version: Generation,
pub external_dns_version: Generation,
pub cockroachdb_fingerprint: String,
pub cockroachdb_setting_preserve_downgrade: Option<String>,
pub time_created: DateTime<Utc>,
pub creator: String,
pub comment: String,
@@ -53,6 +56,10 @@ impl From<&'_ nexus_types::deployment::Blueprint> for Blueprint {
parent_blueprint_id: bp.parent_blueprint_id,
internal_dns_version: Generation(bp.internal_dns_version),
external_dns_version: Generation(bp.external_dns_version),
cockroachdb_fingerprint: bp.cockroachdb_fingerprint.clone(),
cockroachdb_setting_preserve_downgrade: bp
.cockroachdb_setting_preserve_downgrade
.to_optional_string(),
time_created: bp.time_created,
creator: bp.creator.clone(),
comment: bp.comment.clone(),
@@ -67,6 +74,12 @@ impl From<Blueprint> for nexus_types::deployment::BlueprintMetadata {
parent_blueprint_id: value.parent_blueprint_id,
internal_dns_version: *value.internal_dns_version,
external_dns_version: *value.external_dns_version,
cockroachdb_fingerprint: value.cockroachdb_fingerprint,
cockroachdb_setting_preserve_downgrade:
CockroachDbPreserveDowngrade::from_optional_string(
&value.cockroachdb_setting_preserve_downgrade,
)
.ok(),
time_created: value.time_created,
creator: value.creator,
comment: value.comment,
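
The hunks above convert between the enum and a nullable text column with `to_optional_string` and `from_optional_string`, whose bodies are not shown in this part of the diff. The following is a guess at the shape of that round trip, not the PR's code: the type name `PreserveDowngrade`, the variant names, the use of a plain `String` for the version, and the mapping of `NULL` and the empty string are all assumptions chosen to match the doc's description of the setting.

```rust
/// Hypothetical model of the preserve-downgrade setting in a blueprint.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PreserveDowngrade {
    /// Leave the setting alone (assumed to be stored as NULL).
    DoNotModify,
    /// Reset to the default, an empty string, allowing the upgrade.
    AllowUpgrade,
    /// Pin to a cluster version such as "22.1".
    Set(String),
}

impl PreserveDowngrade {
    /// Convert to the value stored in a nullable text column.
    pub fn to_optional_string(&self) -> Option<String> {
        match self {
            Self::DoNotModify => None,
            Self::AllowUpgrade => Some(String::new()),
            Self::Set(version) => Some(version.clone()),
        }
    }

    /// Parse the stored value, rejecting strings that are not of the
    /// form "<year>.<release>".
    pub fn from_optional_string(value: &Option<String>) -> Result<Self, String> {
        match value.as_deref() {
            None => Ok(Self::DoNotModify),
            Some("") => Ok(Self::AllowUpgrade),
            Some(v) if is_cluster_version(v) => Ok(Self::Set(v.to_string())),
            Some(v) => Err(format!("unrecognized cluster version: {v:?}")),
        }
    }
}

/// True for strings with exactly two numeric components, like "22.1".
fn is_cluster_version(s: &str) -> bool {
    match s.split_once('.') {
        Some((year, release)) => {
            year.parse::<u32>().is_ok() && release.parse::<u32>().is_ok()
        }
        None => false,
    }
}
```

Note that the `BlueprintMetadata` conversion in the diff calls `.ok()` on the parse result, so an unparseable stored value becomes `None` there rather than an error.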
3 changes: 3 additions & 0 deletions nexus/db-model/src/schema.rs
@@ -1504,6 +1504,9 @@ table! {

internal_dns_version -> Int8,
external_dns_version -> Int8,
cockroachdb_fingerprint -> Text,

cockroachdb_setting_preserve_downgrade -> Nullable<Text>,
}
}

3 changes: 2 additions & 1 deletion nexus/db-model/src/schema_versions.rs
@@ -17,7 +17,7 @@ use std::collections::BTreeMap;
///
/// This must be updated when you change the database schema. Refer to
/// schema/crdb/README.adoc in the root of this repository for details.
-pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(65, 0, 0);
+pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(66, 0, 0);

/// List of all past database schema versions, in *reverse* order
///
@@ -29,6 +29,7 @@ static KNOWN_VERSIONS: Lazy<Vec<KnownVersion>> = Lazy::new(|| {
// | leaving the first copy as an example for the next person.
// v
// KnownVersion::new(next_int, "unique-dirname-with-the-sql-files"),
KnownVersion::new(66, "blueprint-crdb-preserve-downgrade"),
KnownVersion::new(65, "region-replacement"),
KnownVersion::new(64, "add-view-for-v2p-mappings"),
KnownVersion::new(63, "remove-producer-base-route-column"),