Commit

feat (cluster): [day2-ops] image update configuration node-level only (
ferantivero committed Mar 20, 2024
1 parent 065d44b commit 9754715
Showing 2 changed files with 6 additions and 64 deletions.
51 changes: 5 additions & 46 deletions 07-bootstrap-validation.md
@@ -26,59 +26,18 @@ GitOps allows a team to author Kubernetes manifest files, persist them in their
echo AKS_CLUSTER_NAME: $AKS_CLUSTER_NAME
```

1. Validate the current day2 strategy this baseline follows to upgrade the AKS cluster

```bash
az aks show -n $AKS_CLUSTER_NAME -g rg-bu0001a0008 --query "autoUpgradeProfile"
```

```outcome
{
"nodeOsUpgradeChannel": "NodeImage",
"upgradeChannel": "node-image"
}
```

> This cluster now receives weekly updates for both the Operating System (OS) and Kubernetes. For workloads that need to always run the most secure OS version, you can opt-in for regular updates by selecting the `SecurityPatch` channel.

> The node update phase of the cluster’s lifecycle belongs to day2 operations. Cluster operations will regularly update node images for two main reasons: 1) to update the Kubernetes cluster version, and 2) to keep up with node-level OS updates. A new AKS release introduces new features, such as addons and new Kubernetes versions, while new AKS node images bring changes at the OS level. Both types of releases adhere to Azure Safe Deployment Practices for rollout across all regions. For more information, please refer to [How to use the release tracker](https://learn.microsoft.com/azure/aks/release-tracker#how-to-use-the-release-tracker). Additionally, cluster operations aim to stay updated with supported Kubernetes versions for Service Level Agreement (SLA) compliance and to avoid accumulating updates, as version updates cannot be skipped at will. For more details, please see [Kubernetes version upgrades](https://learn.microsoft.com/azure/aks/upgrade-aks-cluster?tabs=azure-cli#kubernetes-version-upgrades).

> When a new update becomes available, it can be manually applied for the greatest degree of control by making requests against the Azure control plane. Alternatively, the operations team can opt to automatically update to the latest version by configuring an update channel to follow the desired cadence. This can be combined with a planned maintenance window, one for Kubernetes version updates and another for OS-level upgrades. AKS offers two different configurable auto-upgrade channels dedicated to these update types. For more information, please refer to [Upgrade options for Azure Kubernetes Service (AKS) clusters](https://learn.microsoft.com/azure/aks/upgrade-cluster). Node pools in this AKS cluster span multiple availability zones. Therefore, it’s important to note that automatic updates are conducted based on a best-effort zone balancing in node groups. To prevent zone imbalance and increase availability, Nodes Max Surge and Pod Disruption Budget are configured in this baseline. By default, cluster nodes are updated one at a time. Max Surge can adjust the speed of a cluster upgrade. In clusters with 6+ nodes hosting disruption-sensitive workloads, a surge of up to `33%` is recommended for a safe upgrade pace. For more information, please see [Customize node surge upgrade](https://learn.microsoft.com/azure/aks/upgrade-aks-cluster?tabs=azure-cli#customize-node-surge-upgrade). To minimize disruption, production clusters should be configured with [node draining timeout](https://learn.microsoft.com/azure/aks/upgrade-aks-cluster?tabs=azure-cli#set-node-drain-timeout-valuei) and [soak time](https://learn.microsoft.com/azure/aks/upgrade-aks-cluster?tabs=azure-cli#set-node-soak-time-value), taking into account the specific characteristics of their workloads.

1. See your maintenance configuration

```bash
az aks maintenanceconfiguration list --cluster-name $AKS_CLUSTER_NAME -g rg-bu0001a0008
```

> When managing an Azure Kubernetes Service (AKS) cluster, it is crucial to plan your upgrades thoughtfully. Here are some recommendations to consider:
>
> Mindful Timing for Upgrades:
> - Be mindful of when upgrades should occur. If you have overlapping maintenance windows, AKS will determine the running order.
> - To avoid conflicts, leave at least 24 hours between maintenance window configurations. The timing will depend on the number of nodes in your specific cluster and the duration required for upgrades.
>
> OS-Level Updates:
> - By default, the OS-level updates maintenance window is scheduled on a weekly cadence. This is because the OS channel is configured with `NodeImage`, where a new node image is shipped every week.
> - If you choose the `SecurityPatch` channel, consider changing the maintenance window to daily for more frequent updates.
>
> Kubernetes Version Management:
> - To stay current with the latest Kubernetes version, a monthly cadence is generally sufficient. However, you can adjust this based on your specific needs.
> - For more regular updates, configure your cluster to upgrade every two weeks.
>
> Maintenance Operations:
> - Keep in mind that performing maintenance operations is considered best-effort. They are not guaranteed to occur within a specific window.
> - While it’s not strictly recommended, if you require greater control, consider manually updating your cluster.
>
> Remember that these guidelines provide flexibility, allowing you to strike a balance between timely updates and operational control. Choose the approach that aligns best with your organization’s requirements.

1. Validate there are no available image upgrades. As this AKS cluster was recently deployed, only a race condition between the publication of new available images and the deployment image fetch could result in a different state.

```bash
az aks nodepool get-upgrades -n npuser01 --cluster-name $AKS_CLUSTER_NAME -g rg-bu0001a0008 && \
az aks nodepool show -n npuser01 --cluster-name $AKS_CLUSTER_NAME -g rg-bu0001a0008 --query nodeImageVersion
```

> Typically, base node images don't contain a suffix with a date (e.g. `AKSUbuntu-2204gen2containerd`). If the `nodeImageVersion` value looks like `AKSUbuntu-2204gen2containerd-202402.26.0`, a SecurityPatch or NodeImage upgrade has been applied to the aks node.
> Typically, base node images don't contain a suffix with a date (e.g. `AKSUbuntu-2204gen2containerd`). If the `nodeImageVersion` value looks like `AKSUbuntu-2204gen2containerd-202402.26.0`, a SecurityPatch or NodeImage upgrade has been applied to the AKS node.
> The AKS nodes are configured to automatically receive weekly image updates, including security patches, kernel updates, and other node-level changes. The AKS cluster version won't be updated automatically, since production clusters should be updated manually after testing in lower environments.

> Node image updates ship on a weekly cadence by default. The maintenance window of this AKS cluster for node image updates is configured for every Tuesday at 9PM. If a node image is released outside this maintenance window, the nodes will catch up on the following occurrence. AKS nodes that need more frequent updates could change their auto-upgrade channel to `SecurityPatch` and configure a daily maintenance window.
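
Illustration only, not part of this commit: the date-suffix rule described in the notes above could be checked with a small shell helper. This is a minimal sketch; the function name `has_dated_suffix` is hypothetical, and the suffix pattern is an assumption inferred from the single example value `AKSUbuntu-2204gen2containerd-202402.26.0`, so real image versions may not all follow it.

```bash
# Hypothetical helper: succeeds when the given nodeImageVersion ends in a
# date-like suffix such as "-202402.26.0".
# Assumption: the suffix is six digits, a dot, two digits, a dot, and one or
# more digits, as in the example value quoted in the notes above.
has_dated_suffix() {
  printf '%s\n' "$1" | grep -Eq -- '-[0-9]{6}\.[0-9]{2}\.[0-9]+$'
}

# Try the helper against the two sample values from the notes above.
for version in \
  "AKSUbuntu-2204gen2containerd" \
  "AKSUbuntu-2204gen2containerd-202402.26.0"
do
  if has_dated_suffix "$version"; then
    echo "$version: dated image, an upgrade has been applied"
  else
    echo "$version: base image, no dated suffix"
  fi
done
```

Against a live cluster, the value passed to the helper would presumably come from the `az aks nodepool show ... --query nodeImageVersion` command in the step above, with `-o tsv` appended so the result is printed without JSON quotes.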

1. Get AKS `kubectl` credentials.

19 changes: 1 addition & 18 deletions cluster-stamp.bicep
@@ -1818,7 +1818,7 @@ resource mc 'Microsoft.ContainerService/managedClusters@2024-01-02-preview' = {
}
autoUpgradeProfile: {
nodeOSUpgradeChannel: 'NodeImage'
upgradeChannel: 'node-image'
upgradeChannel: 'none'
}
azureMonitorProfile: {
metrics: {
@@ -1937,27 +1937,10 @@ resource mc 'Microsoft.ContainerService/managedClusters@2024-01-02-preview' = {
intervalWeeks: 1
}
}
startTime: '09:00'
}
}
}

resource k8s_maintenanceConfigurations 'maintenanceConfigurations' = {
name: 'aksManagedAutoUpgradeSchedule'
properties: {
maintenanceWindow: {
durationHours: 12
schedule: {
weekly: {
dayOfWeek: 'Wednesday'
intervalWeeks: 2
}
}
startTime: '21:00'
}
}
}

}

resource acrKubeletAcrPullRole_roleAssignment 'Microsoft.Authorization/roleAssignments@2020-10-01-preview' = {