OSDOCS-2427: Update best practices

openshift · Nov 28, 2023 · 98427d1
1 parent acc7329

Showing 2 changed files with 61 additions and 1 deletion.
diff --git a/modules/update-best-practices.adoc b/modules/update-best-practices.adoc
@@ -0,0 +1,57 @@
+// Module included in the following assemblies:
+//
+// * updating/preparing_for_updates/updating-cluster-prepare.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="update-best-practices_{context}"]
+= Best practices for cluster updates
+
+{product-title} is designed to provide a robust update experience that allows clusters to update with minimal disruption to workloads.
+Updates do not begin unless the cluster is determined to be in an upgradeable state at the time of the update request.
+
+This design helps ensure that updates succeed when certain key conditions are met, but you can take several actions to increase the chances of a successful cluster update.
+
+[discrete]
+[id="recommended-versions_{context}"]
+=== Choose versions recommended by the OpenShift Update Service
+
+The OpenShift Update Service (OSUS) provides update recommendations based on cluster characteristics such as the cluster's subscribed channel.
+The Cluster Version Operator saves these recommendations as either recommended or conditional updates.
+Although it is possible to attempt an update to a version that OSUS does not recommend, doing so significantly increases the risk of update failure or of unintended consequences for the cluster after the update has finished.
+
+To help ensure a successful update, choose only update targets that OSUS recommends.
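
One way to check whether a target version is a recommended or a conditional update is to inspect the `ClusterVersion` object's status. The following Python sketch assumes you have already retrieved and parsed the JSON output of `oc get clusterversion version -o json`; it reads the `status.availableUpdates` and `status.conditionalUpdates` fields. The helper name `classify_target` and the sample data are hypothetical.

```python
"""Classify an update target using a ClusterVersion object's status.

Hypothetical helper: assumes `clusterversion` is the parsed JSON output of
`oc get clusterversion version -o json`.
"""


def classify_target(clusterversion: dict, target: str) -> str:
    """Return "recommended", "conditional", or "not-offered" for a target version."""
    status = clusterversion.get("status", {})

    # Recommended updates are listed in status.availableUpdates.
    # The field can be null, so fall back to an empty list.
    for release in status.get("availableUpdates") or []:
        if release.get("version") == target:
            return "recommended"

    # Conditional updates carry known risks that might apply to this cluster.
    for update in status.get("conditionalUpdates") or []:
        if update.get("release", {}).get("version") == target:
            return "conditional"

    # The target appears in neither list.
    return "not-offered"


if __name__ == "__main__":
    # Trimmed, hypothetical sample data; real objects contain many more fields.
    sample = {
        "status": {
            "availableUpdates": [{"version": "4.14.3"}],
            "conditionalUpdates": [
                {"release": {"version": "4.14.5"}, "risks": [{"name": "ExampleRisk"}]}
            ],
        }
    }
    for version in ("4.14.3", "4.14.5", "4.15.0"):
        print(version, classify_target(sample, version))
```

A target reported as `not-offered` is one that OSUS did not provide for this cluster, which is the case the paragraph above advises against.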
+
+[discrete]
+[id="critical-alerts_{context}"]
+=== Address all critical alerts on the cluster
+
+Critical alerts must always be addressed as soon as possible, but it is especially important to address these alerts and resolve any problems before initiating a cluster update.
+Failing to address critical alerts before beginning an update can cause a loss of data or a major failure of cluster services.
+
+You should also periodically review Warning and Info alerts on the cluster to address any potentially problematic conditions before initiating an update.
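
Before you start an update, grouping the firing alerts by severity shows what to resolve first. This Python sketch assumes a response body shaped like the one returned by the Prometheus HTTP API endpoint `/api/v1/alerts`, already parsed into a dictionary. How you reach that endpoint on your cluster, the helper name `alerts_by_severity`, and the sample alert names are assumptions and are not taken from this module.

```python
"""Group firing alerts by severity from a Prometheus /api/v1/alerts response."""
from collections import defaultdict


def alerts_by_severity(response: dict) -> dict:
    """Map each severity label to a sorted list of firing alert names."""
    grouped = defaultdict(set)
    for alert in response.get("data", {}).get("alerts", []):
        # Pending alerts have not yet met their `for` duration; skip them.
        if alert.get("state") != "firing":
            continue
        labels = alert.get("labels", {})
        severity = labels.get("severity", "none")
        grouped[severity].add(labels.get("alertname", "<unnamed>"))
    return {severity: sorted(names) for severity, names in grouped.items()}


if __name__ == "__main__":
    # Hypothetical sample response.
    sample = {
        "status": "success",
        "data": {
            "alerts": [
                {"labels": {"alertname": "ExampleCriticalAlert", "severity": "critical"}, "state": "firing"},
                {"labels": {"alertname": "ClusterNotUpgradeable", "severity": "warning"}, "state": "firing"},
                {"labels": {"alertname": "ExamplePendingAlert", "severity": "critical"}, "state": "pending"},
            ]
        },
    }
    result = alerts_by_severity(sample)
    if result.get("critical"):
        print("Resolve before updating:", ", ".join(result["critical"]))
    for severity in ("warning", "info"):
        print(f"Review ({severity}):", ", ".join(result.get(severity, [])) or "none")
```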
+
+[discrete]
+[id="cluster-upgradeable_{context}"]
+=== Ensure that the cluster is in an Upgradeable state
+
+When one or more Operators have not reported their `Upgradeable` condition as `true` for more than an hour, the `ClusterNotUpgradeable` warning alert is triggered in the cluster.
+In most cases, patch updates are not blocked by this alert, but you cannot perform a minor version update until this alert is resolved and all Operators report `Upgradeable` as `true`.
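
To find the Operators that are blocking a minor version update, you can look for `Upgradeable` conditions whose status is not `True`. This Python sketch assumes the parsed JSON output of `oc get clusteroperators -o json`; the helper name and the sample Operator names are hypothetical. The sketch also assumes that an Operator which does not publish an `Upgradeable` condition at all does not block the update, so such Operators are omitted.

```python
"""List cluster Operators whose Upgradeable condition is not True."""


def not_upgradeable(clusteroperators: dict) -> dict:
    """Map Operator name to (status, message) for non-True Upgradeable conditions.

    Assumption: Operators without an Upgradeable condition are treated as
    not blocking and are left out of the result.
    """
    blocking = {}
    for operator in clusteroperators.get("items", []):
        name = operator.get("metadata", {}).get("name", "<unknown>")
        for condition in operator.get("status", {}).get("conditions") or []:
            if condition.get("type") != "Upgradeable":
                continue
            # Condition status values are the strings "True", "False", or "Unknown".
            if condition.get("status") != "True":
                blocking[name] = (condition.get("status"), condition.get("message", ""))
    return blocking


if __name__ == "__main__":
    # Hypothetical sample data.
    sample = {
        "items": [
            {"metadata": {"name": "operator-a"},
             "status": {"conditions": [{"type": "Upgradeable", "status": "True"}]}},
            {"metadata": {"name": "operator-b"},
             "status": {"conditions": [{"type": "Upgradeable", "status": "False",
                                        "message": "Example reason"}]}},
        ]
    }
    for name, (status, message) in not_upgradeable(sample).items():
        print(f"{name}: Upgradeable={status} {message}")
```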
+
+[discrete]
+[id="nodes-ready_{context}"]
+=== Ensure that all nodes are available
+
+// Completely guessing the explanation in this section just to have something to start with when this is reviewed by an SME.
+Nodes should not be down when beginning an update.
+Nodes that are not running and available may limit a cluster's ability to perform an update with minimal disruption to cluster workloads.
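
A quick pre-update check is to list nodes whose `Ready` condition is not `True`. This Python sketch assumes the parsed JSON output of `oc get nodes -o json`; the helper name is hypothetical. It also flags cordoned nodes (`spec.unschedulable` set to `true`), on the assumption that you want to review those as well before starting; whether a cordoned node counts as unavailable for your cluster is a judgment call.

```python
"""Report nodes that are not Ready or that are cordoned."""


def unavailable_nodes(nodes: dict) -> dict:
    """Map node name to the reasons the node might not be available."""
    problems = {}
    for node in nodes.get("items", []):
        name = node.get("metadata", {}).get("name", "<unknown>")
        reasons = []
        conditions = node.get("status", {}).get("conditions") or []
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        # Ready is "True" only when the node is healthy and ready to accept pods.
        if ready is None or ready.get("status") != "True":
            reasons.append("NotReady")
        # A cordoned node has spec.unschedulable set to true.
        if node.get("spec", {}).get("unschedulable"):
            reasons.append("SchedulingDisabled")
        if reasons:
            problems[name] = reasons
    return problems


if __name__ == "__main__":
    # Hypothetical sample data.
    sample = {
        "items": [
            {"metadata": {"name": "worker-0"},
             "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
            {"metadata": {"name": "worker-1"},
             "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}},
        ]
    }
    print(unavailable_nodes(sample))
```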
+
+Depending on the `maxUnavailable` value configured for a machine config pool, an unavailable node can also prevent both itself and other nodes in that pool from having machine configuration changes applied during a cluster update.
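
The effect of `maxUnavailable` can be worked through with a short calculation. This Python sketch assumes that the Machine Config Operator treats an unset `maxUnavailable` as `1`, rounds percentage values down against the number of nodes in the pool, and never resolves the value below `1`. Verify these rules against the Machine Config Operator documentation for your version; the function names are hypothetical.

```python
"""Estimate how many more nodes in a machine config pool can start updating."""


def resolve_max_unavailable(value, pool_size: int) -> int:
    """Resolve an int or "N%" maxUnavailable value to a node count.

    Assumptions: unset means 1, percentages round down, and the result is at least 1.
    """
    if value is None:
        value = 1
    if isinstance(value, str) and value.endswith("%"):
        resolved = int(value[:-1]) * pool_size // 100  # floor of the percentage
    else:
        resolved = int(value)
    return max(resolved, 1)


def nodes_that_can_start_updating(value, pool_size: int, unavailable: int) -> int:
    """Return how many additional nodes can be taken down for an update now."""
    budget = resolve_max_unavailable(value, pool_size)
    # Nodes that are already unavailable consume the budget first.
    return max(budget - unavailable, 0)


if __name__ == "__main__":
    # Default maxUnavailable with one node already down: no other node can start.
    print(nodes_that_can_start_updating(None, pool_size=3, unavailable=1))
    # "50%" of 6 nodes resolves to 3; with one node down, 2 more can start.
    print(nodes_that_can_start_updating("50%", pool_size=6, unavailable=1))
```

With the default value, a single unavailable node uses up the whole budget, which is why one down node can stall updates for the rest of its pool.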
+
+[discrete]
+[id="pod-disruption-budget_{context}"]
+=== Ensure that PodDisruptionBudget objects are properly configured
+
+The `PodDisruptionBudget` object allows you to define the minimum number or percentage of pod replicas that must be available at any given time.
+This configuration protects workloads from voluntary disruptions during maintenance tasks such as cluster updates.
+
+However, it is possible to configure the `PodDisruptionBudget` for a given topology in a way that prevents nodes from being drained and updated during a cluster update.
+When planning a cluster update, check the configuration of each `PodDisruptionBudget` object to ensure that it does not prevent nodes from being drained, unless you explicitly intend to keep a workload safe during the update process.
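
Whether a `PodDisruptionBudget` blocks a drain depends on how many voluntary disruptions it currently allows. When that number is `0`, the eviction API refuses to evict the covered pods and the node drain stalls. This Python sketch approximates the value that Kubernetes reports in `status.disruptionsAllowed`. It assumes that percentage values round up and that `expected_pods` is the workload's desired replica count. The function names are hypothetical; on a live cluster you can read `status.disruptionsAllowed` directly.

```python
"""Approximate PodDisruptionBudget disruptionsAllowed to spot drain blockers."""


def _scaled(value, total: int) -> int:
    """Resolve an int or "N%" value; percentages round up (assumed)."""
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        return (percent * total + 99) // 100  # integer ceiling division
    return int(value)


def disruptions_allowed(current_healthy: int, expected_pods: int,
                        min_available=None, max_unavailable=None) -> int:
    """Return how many pods can be evicted now without violating the budget."""
    if max_unavailable is not None:
        desired_healthy = max(expected_pods - _scaled(max_unavailable, expected_pods), 0)
    elif min_available is not None:
        desired_healthy = _scaled(min_available, expected_pods)
    else:
        raise ValueError("set min_available or max_unavailable")
    return max(current_healthy - desired_healthy, 0)


if __name__ == "__main__":
    # minAvailable equals the replica count: no pod can be evicted, so drains stall.
    print(disruptions_allowed(3, 3, min_available=3))
    # minAvailable of 2 with 3 healthy replicas: one eviction at a time.
    print(disruptions_allowed(3, 3, min_available=2))
    # maxUnavailable "25%" of 4 replicas rounds up to 1 eviction.
    print(disruptions_allowed(4, 4, max_unavailable="25%"))
```

The first case is the configuration the paragraph above warns about: a budget that allows no disruptions keeps the workload running but prevents its nodes from being drained.
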
diff --git a/updating/preparing_for_updates/updating-cluster-prepare.adoc b/updating/preparing_for_updates/updating-cluster-prepare.adoc
@@ -51,4 +51,7 @@ include::modules/update-preparing-conditional.adoc[leveloffset=+1]
 
 [role="_additional-resources"]
 .Additional resources
-* xref:../../updating/understanding_updates/how-updates-work.adoc#update-evaluate-availability_how-updates-work[Evaluation of update availability]
+* xref:../../updating/understanding_updates/how-updates-work.adoc#update-evaluate-availability_how-updates-work[Evaluation of update availability]
+
+// Best practices for cluster updates
+include::modules/update-best-practices.adoc[leveloffset=+1]