OSDOCS-2427: Update best practices
skopacz1 committed Nov 28, 2023
1 parent acc7329 commit 98427d1
Showing 2 changed files with 61 additions and 1 deletion.
57 changes: 57 additions & 0 deletions modules/update-best-practices.adoc
@@ -0,0 +1,57 @@
// Module included in the following assemblies:
//
// * updating/preparing_for_updates/updating-cluster-prepare.adoc

:_mod-docs-content-type: PROCEDURE
[id="update-best-practices_{context}"]
= Best practices for cluster updates

{product-title} is designed to provide a robust update experience that allows clusters to update with minimal disruptions to workloads.
Updates will not begin unless the cluster is determined to be in an upgradeable state at the time of the update request.

While this design helps ensure that updates succeed as long as some key conditions are met, you can take several actions to increase your chances of a successful cluster update.

[discrete]
[id="recommended-versions_{context}"]
=== Choose versions recommended by the OpenShift Update Service

The OpenShift Update Service (OSUS) provides update recommendations based on cluster characteristics such as the cluster's subscribed channel.
The Cluster Version Operator saves these recommendations as either recommended or conditional updates.
While it is possible to attempt an update to a version that is not recommended by OSUS, doing so significantly increases the risk of update failure or unintended consequences to the cluster after the update has finished.

Choose only update targets that are recommended by OSUS to increase the likelihood of a successful update.
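
The distinction between recommended and conditional updates can be sketched as follows. This is a minimal illustration, not the Cluster Version Operator's implementation. It assumes that the `ClusterVersion` object has been exported to a Python dictionary and that recommended targets appear under `status.availableUpdates` while conditional targets appear under `status.conditionalUpdates`. The function name and the sample versions are hypothetical.

```python
def classify_target(cluster_version, target):
    """Return 'recommended', 'conditional', or 'not recommended' for a target version."""
    status = cluster_version.get("status", {})
    # Either list can be null (None) when it has no entries.
    recommended = {u.get("version") for u in status.get("availableUpdates") or []}
    conditional = {
        u.get("release", {}).get("version")
        for u in status.get("conditionalUpdates") or []
    }
    if target in recommended:
        return "recommended"
    if target in conditional:
        return "conditional"
    return "not recommended"


# Hypothetical sample data, not taken from a real cluster.
sample = {
    "status": {
        "availableUpdates": [{"version": "4.14.2"}],
        "conditionalUpdates": [{"release": {"version": "4.14.3"}}],
    }
}
print(classify_target(sample, "4.14.2"))  # recommended
print(classify_target(sample, "4.15.0"))  # not recommended
```

A target that is classified as conditional still requires a review of the risks that are associated with it before you decide whether to update.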

[discrete]
[id="critical-alerts_{context}"]
=== Address all critical alerts on the cluster

Critical alerts must always be addressed as soon as possible, but it is especially important to address these alerts and resolve any problems before initiating a cluster update.
Failing to address critical alerts before beginning an update can cause a loss of data or a major failure of cluster services.

You should also periodically review Warning and Info alerts on the cluster to address any potentially problematic conditions before initiating an update.

[discrete]
[id="cluster-upgradeable_{context}"]
=== Ensure that the cluster is in an Upgradeable state

When one or more Operators have not reported their `Upgradeable` condition as `true` for more than an hour, the `ClusterNotUpgradeable` warning alert is triggered in the cluster.
In most cases, this alert does not block patch updates, but you cannot perform a minor version update until you resolve the alert and all Operators report `Upgradeable` as `true`.
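
As a rough sketch of this check, the following example lists the Operators that explicitly report `Upgradeable` as false. It assumes that the `ClusterOperator` objects have been exported to a list of Python dictionaries with the usual `status.conditions` layout. The function name and the Operator names in the sample data are hypothetical, and the sketch deliberately ignores Operators that do not report the condition at all.

```python
def operators_blocking_minor_update(cluster_operators):
    """Return the names of Operators whose Upgradeable condition is False."""
    blocking = []
    for operator in cluster_operators:
        conditions = operator.get("status", {}).get("conditions", [])
        for condition in conditions:
            if condition.get("type") == "Upgradeable" and condition.get("status") == "False":
                blocking.append(operator["metadata"]["name"])
    return blocking


# Hypothetical sample data, not taken from a real cluster.
operators = [
    {
        "metadata": {"name": "example-operator-a"},
        "status": {"conditions": [{"type": "Upgradeable", "status": "True"}]},
    },
    {
        "metadata": {"name": "example-operator-b"},
        "status": {"conditions": [{"type": "Upgradeable", "status": "False"}]},
    },
]
print(operators_blocking_minor_update(operators))  # ['example-operator-b']
```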

[discrete]
[id="nodes-ready_{context}"]
=== Ensure that all nodes are available

// Completely guessing the explanation in this section just to have something to start with when this is reviewed by an SME.
Nodes should not be down when beginning an update.
Nodes that are not running and available may limit a cluster's ability to perform an update with minimal disruption to cluster workloads.

Depending on the configured value of the cluster's `maxUnavailable` spec, an unavailable node can also prevent itself and other nodes from having machine configuration changes applied during a cluster update.
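
The effect of `maxUnavailable` can be illustrated with a simplified model. The model assumes that the number of nodes in a pool that can begin updating equals `maxUnavailable` minus the number of nodes that are already unavailable. The function name is hypothetical, and the actual behavior of the cluster might take additional conditions into account.

```python
def update_capacity(max_unavailable, unavailable_nodes):
    """Return how many more nodes can be taken down for an update (simplified model)."""
    # Capacity never drops below zero: nodes that are already down use up the budget.
    return max(0, max_unavailable - unavailable_nodes)


# With maxUnavailable set to 1, a single node that is already down leaves no capacity.
print(update_capacity(1, 1))  # 0
# With every node available, one node at a time can be updated.
print(update_capacity(1, 0))  # 1
```

Under this model, one unavailable node in a pool with a `maxUnavailable` value of `1` stalls the update of every other node in that pool until the node recovers.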

[discrete]
[id="pod-disruption-budget_{context}"]
=== Ensure that the cluster's PodDisruptionBudget is properly configured

The `PodDisruptionBudget` object allows you to define the minimum number or percentage of pod replicas that must be available at any given time.
This configuration allows workloads to be protected from disruptions during maintenance tasks such as cluster updates.

However, it is possible to configure the `PodDisruptionBudget` for a given topology in a way that prevents nodes from being drained and updated during a cluster update.
When planning a cluster update, check the configuration of the `PodDisruptionBudget` object to ensure that it will not prevent nodes from being drained, unless it is your explicit intention to keep a workload safe during the update process.
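
The following sketch shows why some `PodDisruptionBudget` configurations block node drains. It assumes a budget that is expressed through a minimum availability value, given either as an integer or as a percentage string, and it assumes that percentages are rounded up to a whole number of pods. The function name and the replica counts are hypothetical.

```python
def disruptions_allowed(healthy_pods, expected_pods, min_available):
    """Return how many pods can be evicted without violating the budget."""
    if isinstance(min_available, str) and min_available.endswith("%"):
        percent = int(min_available[:-1])
        # Round the percentage up to a whole number of pods.
        required = (percent * expected_pods + 99) // 100
    else:
        required = int(min_available)
    return max(0, healthy_pods - required)


# Three replicas that must all stay available: no pod can be evicted, so drains stall.
print(disruptions_allowed(3, 3, 3))      # 0
# Three replicas with 50% required (rounded up to 2): one pod can be evicted at a time.
print(disruptions_allowed(3, 3, "50%"))  # 1
```

A result of `0` for a workload means that a node running one of its pods cannot be drained until the budget or the replica count changes.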
5 changes: 4 additions & 1 deletion updating/preparing_for_updates/updating-cluster-prepare.adoc
@@ -51,4 +51,7 @@ include::modules/update-preparing-conditional.adoc[leveloffset=+1]

[role="_additional-resources"]
.Additional resources
* xref:../../updating/understanding_updates/how-updates-work.adoc#update-evaluate-availability_how-updates-work[Evaluation of update availability]

// Best practices for cluster updates
include::modules/update-best-practices.adoc[leveloffset=+1]
