# Configure and deploy Apache Airflow on Azure Kubernetes Service (AKS)

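This notebook walks you through configuring and deploying Apache Airflow on AKS with Helm. It assumes that the AKS cluster (with workload identity and the Blob CSI driver enabled), the managed identity, the Azure Key Vault and the storage account for the logs already exist, and that the environment variables referenced in the cells are set.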
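The cells below read several environment variables that the notebook never sets. A minimal sketch of a fail-fast guard, assuming a POSIX shell; `require_vars` is a hypothetical helper. Because each `%%bash` cell runs in its own process, a function like this must be defined in the same cell that calls it, and the variables must be exported in the kernel's environment.

```shell
# Hypothetical helper: reports every named variable that is unset or empty.
require_vars() {
  _rv_missing=""
  for _rv_name in "$@"; do
    # eval gives indirect expansion in plain POSIX sh (only bash has ${!name}).
    eval "_rv_value=\${$_rv_name:-}"
    if [ -z "$_rv_value" ]; then
      _rv_missing="$_rv_missing $_rv_name"
    fi
  done
  if [ -n "$_rv_missing" ]; then
    echo "Missing variables:$_rv_missing" >&2
    return 1
  fi
  return 0
}

# Usage, with the variables read by the cells of this notebook:
#   require_vars AKS_AIRFLOW_NAMESPACE SERVICE_ACCOUNT_NAME MY_IDENTITY_NAME_CLIENT_ID \
#     KEYVAULTURL AKS_AIRFLOW_LOGS_STORAGE_SECRET_NAME MY_RESOURCE_GROUP_NAME \
#     AKS_AIRFLOW_LOGS_STORAGE_ACCOUNT_NAME AKS_AIRFLOW_LOGS_STORAGE_CONTAINER_NAME
```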

## 1. Configure a workload identity

In [None]:
%%bash
kubectl create namespace ${AKS_AIRFLOW_NAMESPACE} --dry-run=client --output yaml | kubectl apply -f -

In [None]:
%%bash
export TENANT_ID=$(az account show --query tenantId -o tsv)
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    azure.workload.identity/client-id: "${MY_IDENTITY_NAME_CLIENT_ID}"
    azure.workload.identity/tenant-id: "${TENANT_ID}"
  name: "${SERVICE_ACCOUNT_NAME}"
  namespace: "${AKS_AIRFLOW_NAMESPACE}"
EOF
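The ServiceAccount annotation only works if the managed identity trusts tokens that AKS issues for that ServiceAccount, through a federated identity credential whose subject is `system:serviceaccount:<namespace>:<name>`. The notebook assumes this credential already exists. A sketch of creating it, assuming `AKS_CLUSTER_NAME` and `MY_IDENTITY_NAME` (the identity behind `MY_IDENTITY_NAME_CLIENT_ID`) are set, that the identity lives in `MY_RESOURCE_GROUP_NAME`, and that the cluster's OIDC issuer is enabled; the credential name `airflow-federated-credential` is hypothetical.

```shell
# Subject claim that AKS writes into tokens issued for a ServiceAccount.
fic_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Hypothetical wrapper around the Azure CLI. Assumes AKS_CLUSTER_NAME and
# MY_IDENTITY_NAME are set and that the identity is in MY_RESOURCE_GROUP_NAME.
create_fic() {
  # OIDC issuer URL of the cluster, which signs the ServiceAccount tokens.
  _issuer=$(az aks show --name "$AKS_CLUSTER_NAME" \
    --resource-group "$MY_RESOURCE_GROUP_NAME" \
    --query oidcIssuerProfile.issuerUrl --output tsv) || return 1
  az identity federated-credential create \
    --name airflow-federated-credential \
    --identity-name "$MY_IDENTITY_NAME" \
    --resource-group "$MY_RESOURCE_GROUP_NAME" \
    --issuer "$_issuer" \
    --subject "$(fic_subject "$AKS_AIRFLOW_NAMESPACE" "$SERVICE_ACCOUNT_NAME")" \
    --audience api://AzureADTokenExchange
}
```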

## 2. Install the External Secrets Operator

In [None]:
%%bash
helm repo add external-secrets https://charts.external-secrets.io
helm repo update
helm install external-secrets external-secrets/external-secrets \
  --namespace ${AKS_AIRFLOW_NAMESPACE} \
  --create-namespace \
  --set installCRDs=true \
  --wait
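The manifests in the next section use `apiVersion: external-secrets.io/v1`; depending on the chart version installed, older releases of the operator only serve `v1beta1`. A sketch of checking which versions the CRD actually serves before applying them; `crd_serves_version` is a hypothetical helper built on `kubectl` JSONPath output.

```shell
# Succeeds if the CRD named in $1 serves the API version in $2 (e.g. v1).
crd_serves_version() {
  # Prints one "name=served" pair per declared version, e.g. "v1=true v1beta1=false ".
  _served=$(kubectl get crd "$1" \
    -o jsonpath='{range .spec.versions[*]}{.name}={.served}{" "}{end}') || return 1
  case " $_served " in
    *" $2=true "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (same %%bash cell):
#   crd_serves_version secretstores.external-secrets.io v1 \
#     || echo "external-secrets.io/v1 not served: upgrade the chart or use v1beta1"
```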

## 3. Sync the secrets from Azure Key Vault
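The ExternalSecret in this section reads two Key Vault secrets, `AKS-AIRFLOW-LOGS-STORAGE-ACCOUNT-NAME` and `AKS-AIRFLOW-LOGS-STORAGE-ACCOUNT-KEY`, which the notebook assumes already exist. A sketch of populating them from the storage account, assuming a hypothetical `KEYVAULT_NAME` variable (the vault behind `KEYVAULTURL`) and a caller allowed to list storage keys and write secrets; the workload identity itself only needs read access (for example the Key Vault Secrets User role on an RBAC-enabled vault).

```shell
# Hypothetical helper: copies the log storage account name and first access key into Key Vault.
store_logs_secrets() {
  _key=$(az storage account keys list \
    --account-name "$AKS_AIRFLOW_LOGS_STORAGE_ACCOUNT_NAME" \
    --resource-group "$MY_RESOURCE_GROUP_NAME" \
    --query "[0].value" --output tsv) || return 1
  az keyvault secret set --vault-name "$KEYVAULT_NAME" \
    --name AKS-AIRFLOW-LOGS-STORAGE-ACCOUNT-NAME \
    --value "$AKS_AIRFLOW_LOGS_STORAGE_ACCOUNT_NAME" --output none || return 1
  az keyvault secret set --vault-name "$KEYVAULT_NAME" \
    --name AKS-AIRFLOW-LOGS-STORAGE-ACCOUNT-KEY \
    --value "$_key" --output none
}
```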

In [None]:
%%bash
kubectl apply -f - <<EOF
apiVersion: external-secrets.io/v1
kind: SecretStore
metadata:
  name: azure-store
  namespace: ${AKS_AIRFLOW_NAMESPACE}
spec:
  provider:
    azurekv:
      authType: WorkloadIdentity
      vaultUrl: "${KEYVAULTURL}"
      serviceAccountRef:
        name: ${SERVICE_ACCOUNT_NAME}
EOF
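The SecretStore reports a `Ready` condition once the operator can authenticate to the vault, so a failing federated credential or missing role shows up here rather than later in Airflow. A sketch of waiting on that condition and printing the resource's events on failure; `wait_ready` and the 120-second timeout are assumptions.

```shell
# Waits for a Ready=True condition on a namespaced resource; describes it on failure.
wait_ready() {
  # $1 = kind/name, e.g. secretstore/azure-store ; $2 = namespace
  if ! kubectl wait --for=condition=Ready "$1" --namespace "$2" --timeout=120s; then
    kubectl describe "$1" --namespace "$2"
    return 1
  fi
}

# Usage (same %%bash cell):
#   wait_ready secretstore/azure-store "$AKS_AIRFLOW_NAMESPACE"
```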

In [None]:
%%bash
kubectl apply -f - <<EOF
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: airflow-aks-azure-logs-secrets
  namespace: ${AKS_AIRFLOW_NAMESPACE}
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: SecretStore
    name: azure-store
  target:
    name: ${AKS_AIRFLOW_LOGS_STORAGE_SECRET_NAME}
    creationPolicy: Owner
  data:
    - secretKey: azurestorageaccountname
      remoteRef:
        key: AKS-AIRFLOW-LOGS-STORAGE-ACCOUNT-NAME
    - secretKey: azurestorageaccountkey
      remoteRef:
        key: AKS-AIRFLOW-LOGS-STORAGE-ACCOUNT-KEY
EOF
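Once the ExternalSecret has synced, the Kubernetes Secret named by `${AKS_AIRFLOW_LOGS_STORAGE_SECRET_NAME}` should hold the two keys that the Blob CSI driver reads through `nodeStageSecretRef`. A sketch of checking that both keys are present without printing their values; `secret_has_keys` is a hypothetical helper.

```shell
# Succeeds only if every listed key has a non-empty value in the Secret's data.
secret_has_keys() {
  # $1 = secret name, $2 = namespace, remaining arguments = expected data keys
  _sk_secret=$1
  _sk_ns=$2
  shift 2
  for _sk_key in "$@"; do
    # Values stay base64-encoded; only their presence is checked.
    _sk_val=$(kubectl get secret "$_sk_secret" --namespace "$_sk_ns" \
      -o jsonpath="{.data.$_sk_key}") || return 1
    if [ -z "$_sk_val" ]; then
      echo "Key $_sk_key missing from secret $_sk_secret" >&2
      return 1
    fi
  done
  return 0
}

# Usage (same %%bash cell):
#   secret_has_keys "$AKS_AIRFLOW_LOGS_STORAGE_SECRET_NAME" "$AKS_AIRFLOW_NAMESPACE" \
#     azurestorageaccountname azurestorageaccountkey
```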

## 4. Create persistent storage for the Airflow logs

In [None]:
%%bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-airflow-logs
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteMany
  storageClassName: azureblob-fuse-premium
  csi:
    driver: blob.csi.azure.com
    volumeHandle: airflow-logs-1
    volumeAttributes:
      resourceGroup: ${MY_RESOURCE_GROUP_NAME}
      storageAccount: ${AKS_AIRFLOW_LOGS_STORAGE_ACCOUNT_NAME}
      containerName: ${AKS_AIRFLOW_LOGS_STORAGE_CONTAINER_NAME}
    nodeStageSecretRef:
      name: ${AKS_AIRFLOW_LOGS_STORAGE_SECRET_NAME}
      namespace: ${AKS_AIRFLOW_NAMESPACE}
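---
# Claim referenced by logs.persistence.existingClaim in the Helm values (section 5);
# volumeName pre-binds it to the PersistentVolume above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-airflow-logs
  namespace: ${AKS_AIRFLOW_NAMESPACE}
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: azureblob-fuse-premium
  resources:
    requests:
      storage: 5Gi
  volumeName: pv-airflow-logs
EOF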
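The Helm values in the next section point the Airflow pods at the claim `pvc-airflow-logs`, which must be `Bound` before any pod can mount it; it stays `Pending` if no matching claim exists or if its storage class, access mode or size does not match the PersistentVolume. A sketch of polling the claim's phase, assuming the storage class binds immediately; `wait_pvc_bound` and its retry budget are hypothetical.

```shell
# Polls the PVC phase until it is Bound or the attempts run out (2 s apart).
wait_pvc_bound() {
  # $1 = PVC name, $2 = namespace, $3 = max attempts (default 30)
  _tries=${3:-30}
  _phase=""
  while [ "$_tries" -gt 0 ]; do
    _phase=$(kubectl get pvc "$1" --namespace "$2" -o jsonpath='{.status.phase}' 2>/dev/null)
    [ "$_phase" = "Bound" ] && return 0
    _tries=$((_tries - 1))
    sleep 2
  done
  echo "PVC $1 not Bound (last phase: ${_phase:-unknown})" >&2
  return 1
}

# Usage (same %%bash cell):
#   wait_pvc_bound pvc-airflow-logs "$AKS_AIRFLOW_NAMESPACE"
```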

## 5. Deploy Apache Airflow with Helm

In [None]:
%%bash
cat <<EOF > airflow_values.yaml
executor: "KubernetesExecutor"
postgresql:
  enabled: true
logs:
  persistence:
    enabled: true
    existingClaim: pvc-airflow-logs
EOF

helm repo add apache-airflow https://airflow.apache.org
helm repo update
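# Install into the namespace that holds pvc-airflow-logs; otherwise existingClaim cannot be resolved.
helm install airflow apache-airflow/airflow --namespace ${AKS_AIRFLOW_NAMESPACE} -f airflow_values.yaml --debug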
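Running `helm install` a second time fails because the release already exists, and an unpinned chart picks up whatever default Airflow version the latest chart ships. A sketch of an idempotent alternative using `helm upgrade --install` pinned to a chart version; `deploy_airflow` and the `AIRFLOW_CHART_VERSION` variable are assumptions (list candidates with `helm search repo apache-airflow/airflow --versions`).

```shell
# Hypothetical wrapper: installs the release if absent, upgrades it otherwise.
deploy_airflow() {
  if [ -z "${AIRFLOW_CHART_VERSION:-}" ]; then
    echo "Set AIRFLOW_CHART_VERSION first" >&2
    return 1
  fi
  # Same release name, chart and values file as the cell above, with a pinned version.
  helm upgrade --install airflow apache-airflow/airflow \
    --namespace "$AKS_AIRFLOW_NAMESPACE" \
    --version "$AIRFLOW_CHART_VERSION" \
    --values airflow_values.yaml
}
```

`--wait` is deliberately left out: the chart runs its database migration job as a Helm hook, and waiting for pods that themselves wait for that migration can stall until the timeout.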

## 6. Verify the deployment and access Airflow

In [None]:
%%bash
kubectl get pods -n ${AKS_AIRFLOW_NAMESPACE}
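A healthy release mixes long-running pods (scheduler, webserver, PostgreSQL and so on) with job pods, such as the database migration, that may linger in `Completed`, so reading the raw list by eye is error-prone. A sketch that filters `kubectl get pods --no-headers` output down to pods that are neither fully ready nor completed; `pods_not_ready` is a hypothetical helper.

```shell
# Reads `kubectl get pods --no-headers` lines on stdin (NAME READY STATUS ...)
# and prints the names of pods that are not Completed and not fully ready.
pods_not_ready() {
  awk '{
    split($2, r, "/")            # READY column, e.g. "1/2"
    if ($3 == "Completed") next  # finished job pods are expected
    if ($3 != "Running" || r[1] != r[2]) print $1
  }'
}

# Usage (same %%bash cell):
#   kubectl get pods --namespace "$AKS_AIRFLOW_NAMESPACE" --no-headers | pods_not_ready
```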

In [None]:
%%bash
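# Blocks until interrupted; then open http://localhost:8080 in a browser.
# Assumes an Airflow 2.x image: with Airflow 3.x the chart replaces the webserver with an API server (svc/airflow-api-server).
kubectl port-forward svc/airflow-webserver 8080:8080 -n ${AKS_AIRFLOW_NAMESPACE}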
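Because `kubectl port-forward` blocks, the cell never returns on its own. A sketch that runs the tunnel in the background and polls the UI until it answers without an HTTP error (status below 400), assuming `curl` is installed and that the Airflow 2 webserver's `/health` endpoint is reachable through the tunnel; `wait_http` and its retry budget are hypothetical.

```shell
# Polls a URL once per second until curl succeeds or the attempts run out.
wait_http() {
  # $1 = URL, $2 = max attempts (default 20)
  _left=${2:-20}
  while [ "$_left" -gt 0 ]; do
    # -f makes curl fail on HTTP status >= 400.
    curl -fsS -o /dev/null "$1" && return 0
    _left=$((_left - 1))
    sleep 1
  done
  return 1
}

# Usage (same %%bash cell):
#   kubectl port-forward svc/airflow-webserver 8080:8080 \
#     --namespace "$AKS_AIRFLOW_NAMESPACE" >/dev/null 2>&1 &
#   PF_PID=$!
#   wait_http http://localhost:8080/health && echo "Airflow UI reachable"
#   kill "$PF_PID"   # stop the tunnel when done
```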

## 7. Production best practices
- Use a managed database (Azure Database for PostgreSQL) instead of the PostgreSQL subchart bundled with the Helm chart
- Enable monitoring (Prometheus, Grafana)
- Secure identities with Microsoft Entra Workload ID
- Manage your DAGs with Git and CI/CD
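The database and DAG recommendations map onto values of the official Apache Airflow chart: `postgresql.enabled: false` with `data.metadataSecretName` for an external metadata database, and `dags.gitSync` for DAGs pulled from Git. A sketch that writes such an override file, assuming a Secret named `airflow-metadata-db` (hypothetical) whose `connection` key holds the SQLAlchemy URI of the Azure Database for PostgreSQL server, and a DAG repository URL passed as the first argument; the `main` branch and `dags` sub-path are placeholders.

```shell
# Hypothetical helper: writes production overrides to $2 (default airflow_values_prod.yaml).
write_prod_values() {
  cat > "${2:-airflow_values_prod.yaml}" <<EOF
postgresql:
  enabled: false
data:
  metadataSecretName: airflow-metadata-db
dags:
  gitSync:
    enabled: true
    repo: $1
    branch: main
    subPath: dags
EOF
}

# Usage: later --values files override earlier ones.
#   write_prod_values https://example.com/org/dags.git
#   helm upgrade --install airflow apache-airflow/airflow --namespace "$AKS_AIRFLOW_NAMESPACE" \
#     --values airflow_values.yaml --values airflow_values_prod.yaml
```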