
feat (cluster): [day2-ops] node update configuration #403

Merged
merged 14 commits on Mar 21, 2024
30 changes: 0 additions & 30 deletions 05-bootstrap-prep.md
@@ -57,36 +57,6 @@ In addition to Azure Container Registry being deployed to support bootstrapping,
# Get your ACR instance name
export ACR_NAME_AKS_BASELINE=$(az deployment group show -g rg-bu0001a0008 -n acr-stamp --query properties.outputs.containerRegistryName.value -o tsv)
echo ACR_NAME_AKS_BASELINE: $ACR_NAME_AKS_BASELINE

# Import core image(s) hosted in public container registries to be used during bootstrapping
az acr import --source ghcr.io/kubereboot/kured:1.15.0 -n $ACR_NAME_AKS_BASELINE
```

> In this walkthrough, only one image is included in the bootstrapping process; it's included as a reference for this process. Whether you use the Kubernetes Reboot Daemon (Kured) or any other images, including Helm charts, as part of your bootstrapping is your choice to make.

1. Update bootstrapping manifests to pull from your Azure Container Registry. *Optional. Fork required.*

> Your cluster will immediately begin processing the manifests in [`cluster-manifests/`](./cluster-manifests/) due to the bootstrapping configuration that will be applied to it. So, before you deploy the cluster, now would be the right time to push the following changes to your fork so that it uses your files instead of the files found in the original mspnp repo, which point to public container registries:
>
> - update the one `image:` value in [`kured.yaml`](./cluster-manifests/cluster-baseline-settings/kured.yaml) to use your container registry instead of a public container registry. See the comment in the file for instructions (or you can simply run the following command).

:warning: Without updating these files and using your own fork, your cluster deployment will take dependencies on public container registries. This is generally okay for exploratory/testing purposes, but not suitable for production. Before going to production, ensure *all* image references you bring to your cluster are from *your* container registry (like the image imported in the prior step) or another registry that you feel confident relying on.

```bash
sed -i "s:ghcr.io:${ACR_NAME_AKS_BASELINE}.azurecr.io:" ./cluster-manifests/cluster-baseline-settings/kured.yaml
```

Note that if you are on macOS, you might need to use the following command instead:

```bash
sed -i '' 's:ghcr.io:'"${ACR_NAME_AKS_BASELINE}"'.azurecr.io:g' ./cluster-manifests/cluster-baseline-settings/kured.yaml
```

Now commit the changes to your repository.

```bash
git commit -a -m "Update image source to use my ACR instance instead of a public container registry."
git push
```

### Save your work in-progress
13 changes: 13 additions & 0 deletions 07-bootstrap-validation.md
@@ -26,6 +26,19 @@ GitOps allows a team to author Kubernetes manifest files, persist them in their
echo AKS_CLUSTER_NAME: $AKS_CLUSTER_NAME
```

1. Validate that there are no available image upgrades. Because this AKS cluster was recently deployed, only a race condition between the publication of a new node image and the image fetch during deployment could result in a different state.

```bash
az aks nodepool get-upgrades -n npuser01 --cluster-name $AKS_CLUSTER_NAME -g rg-bu0001a0008 && \
az aks nodepool show -n npuser01 --cluster-name $AKS_CLUSTER_NAME -g rg-bu0001a0008 --query nodeImageVersion
```

> Typically, base node images don't contain a date suffix (for example, `AKSUbuntu-2204gen2containerd`). If the `nodeImageVersion` value looks like `AKSUbuntu-2204gen2containerd-202402.26.0`, a SecurityPatch or NodeImage upgrade has been applied to the AKS nodes.
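For illustration only, a small shell check (using the example version strings from the note above; not part of the walkthrough) that distinguishes a base image name from a dated node image version:

```bash
# A dated suffix (e.g. -202402.26.0) indicates a SecurityPatch or
# NodeImage upgrade has already been applied to the nodes.
node_image_version="AKSUbuntu-2204gen2containerd-202402.26.0"

if [[ "$node_image_version" =~ -[0-9]{6}\.[0-9]+\.[0-9]+$ ]]; then
  echo "Node image upgrade applied: $node_image_version"
else
  echo "Base node image, no upgrade applied yet: $node_image_version"
fi
```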

> The AKS nodes are configured to automatically receive weekly node image updates, which include security patches, kernel updates, and other node-level components. The AKS cluster version won't be upgraded automatically, because production clusters should be updated manually after the new version has been tested in lower environments.

> Node image updates ship on a weekly cadence by default. This AKS cluster's maintenance window for node image updates is configured for every Tuesday at 9 PM. If a node image is released outside of this maintenance window, the nodes will catch up on the following occurrence. Clusters whose nodes need to be updated more frequently could change their auto-upgrade channel to `SecurityPatch` and configure a daily maintenance window.
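As a sketch of that alternative, the channel and window could be changed with the Azure CLI. This is not part of the walkthrough; the resource group and maintenance configuration names below are taken from this deployment, but verify the parameter names against your installed `az` version before running:

```bash
# Switch the node OS auto-upgrade channel to SecurityPatch
# (smaller, more frequent OS security updates).
az aks update -n $AKS_CLUSTER_NAME -g rg-bu0001a0008 \
  --node-os-upgrade-channel SecurityPatch

# Replace the weekly window with a daily one (9 PM, 4 hours)
# on the aksManagedNodeOSUpgradeSchedule maintenance configuration.
az aks maintenanceconfiguration update -g rg-bu0001a0008 \
  --cluster-name $AKS_CLUSTER_NAME -n aksManagedNodeOSUpgradeSchedule \
  --schedule-type Daily --interval-days 1 --start-time 21:00 --duration 4
```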

1. Get AKS `kubectl` credentials.

> In the [Microsoft Entra ID Integration](03-microsoft-entra-id.md) step, we placed our cluster under Microsoft Entra group-backed RBAC. This is the first time we are seeing this used. `az aks get-credentials` sets your `kubectl` context so that you can issue commands against your cluster. Even when you have enabled Microsoft Entra ID integration with your AKS cluster, an Azure user who has sufficient permissions on the cluster resource can still access your AKS cluster by using the `--admin` switch on this command. Using this switch *bypasses* Microsoft Entra ID and uses client certificate authentication instead; that isn't what we want to happen. So in order to prevent that practice, local account access (such as `clusterAdmin` or `clusterMonitoringUser`) is expressly disabled.
21 changes: 19 additions & 2 deletions cluster-stamp.bicep
@@ -1640,7 +1640,7 @@ resource pdzAksIngress 'Microsoft.Network/privateDnsZones@2020-06-01' = {
}
}

resource mc 'Microsoft.ContainerService/managedClusters@2023-02-02-preview' = {
resource mc 'Microsoft.ContainerService/managedClusters@2024-01-02-preview' = {
name: clusterName
location: location
tags: {
@@ -1800,7 +1800,8 @@ resource mc 'Microsoft.ContainerService/managedClusters@2023-02-02-preview' = {
enabled: false // Using Microsoft Entra Workload IDs for pod identities.
}
autoUpgradeProfile: {
upgradeChannel: 'stable'
nodeOSUpgradeChannel: 'NodeImage'
upgradeChannel: 'none'
}
azureMonitorProfile: {
metrics: {
Expand Down Expand Up @@ -1907,6 +1908,22 @@ resource mc 'Microsoft.ContainerService/managedClusters@2023-02-02-preview' = {
kvPodMiIngressControllerKeyVaultReader_roleAssignment
kvPodMiIngressControllerSecretsUserRole_roleAssignment
]

resource os_maintenanceConfigurations 'maintenanceConfigurations' = {
name: 'aksManagedNodeOSUpgradeSchedule'
properties: {
maintenanceWindow: {
durationHours: 12
schedule: {
weekly: {
dayOfWeek: 'Tuesday'
intervalWeeks: 1
}
}
startTime: '21:00'
}
}
}
}

resource acrKubeletAcrPullRole_roleAssignment 'Microsoft.Authorization/roleAssignments@2020-10-01-preview' = {