# TSG109 - Set upgrade timeouts

## Description

When viewing the upgrade configmap, its `message` field may report
“Controller upgrade stalled” or “ControllerDb upgrade has timed out”, e.g.:

> data: controller-upgrade:
> '{"upgradeInitiatedTimestamp":"2019-12-19T21:07:37.1608034+00:00","lastTransitionTimestamp":"2019-12-19T21:15:08.7304489+00:00","targetVersion":"<image_tag>","currentVersion":"15.0.4003.10009_2","targetRepository":"<image_name>","currentRepository":"<image_name>","currentState":"NoUpgradeInProgress","previousState":"RollingBackController","message":"Controller upgrade stalled.","controllerUpgradeTimeoutInMinutes":5,"componentUpgradeTimeoutInMinutes":30,"totalUpgradeTimeoutInMinutes":30,"stableUptimeThresholdInMinutes":2}'

or

> data: controller-upgrade:
> '{"upgradeInitiatedTimestamp":"2019-12-19T22:12:44.9427392+00:00","lastTransitionTimestamp":"2019-12-19T22:25:13.9526729+00:00","targetVersion":"<image_tag>","currentVersion":"<image_tag>","targetRepository":"<image_name>","currentRepository":"<image_name>","currentState":"NoUpgradeInProgress","previousState":"RollingBackController","message":"ControllerDb upgrade has timed out. Rolling back to version <image_tag>.","controllerUpgradeTimeoutInMinutes":5,"componentUpgradeTimeoutInMinutes":30,"totalUpgradeTimeoutInMinutes":30,"stableUptimeThresholdInMinutes":2}'

This can happen if pulling the image takes too long. By default, the
upgrade allows about 5 minutes (`controllerUpgradeTimeoutInMinutes` is 5
in the examples above). To allow more time, edit the configmap and raise
the `controllerUpgradeTimeoutInMinutes` field.

Recommendations:

-   Increase the `controllerUpgradeTimeoutInMinutes` field to 15 minutes,
    adjusted for network speed.
-   The `componentUpgradeTimeoutInMinutes` field may also need to be
    raised: if the controller image pull is slow, the downloads for the
    Hadoop and mssql-server images are likely to be slow as well.
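
To see which timeouts are currently in effect before changing them, the `controller-upgrade` value can be parsed as JSON. The following is a minimal sketch: the helper name `summarize_upgrade_status` and the abbreviated sample string are illustrative assumptions, while the field names are taken from the examples above.

```python
import json

# Field names as they appear in the configmap examples above.
TIMEOUT_FIELDS = (
    "controllerUpgradeTimeoutInMinutes",
    "componentUpgradeTimeoutInMinutes",
    "totalUpgradeTimeoutInMinutes",
)

def summarize_upgrade_status(raw):
    """Return the message, state, and timeout fields from the raw JSON string."""
    status = json.loads(raw)
    summary = {
        "message": status.get("message"),
        "currentState": status.get("currentState"),
    }
    for field in TIMEOUT_FIELDS:
        summary[field] = status.get(field)
    return summary

# Abbreviated, hypothetical sample modeled on the first example above.
sample = json.dumps({
    "currentState": "NoUpgradeInProgress",
    "previousState": "RollingBackController",
    "message": "Controller upgrade stalled.",
    "controllerUpgradeTimeoutInMinutes": 5,
    "componentUpgradeTimeoutInMinutes": 30,
    "totalUpgradeTimeoutInMinutes": 30,
    "stableUptimeThresholdInMinutes": 2,
})

print(summarize_upgrade_status(sample))
```

Against a live cluster, `raw` would be the `controller-upgrade` entry in the `data` of the `controller-upgrade-configmap` configmap.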

## Steps

Use these steps to troubleshoot the issue.

### Parameters

In [None]:
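# All values are in minutes
controller_timeout=20        # controllerUpgradeTimeoutInMinutes
controller_total_timeout=40  # totalUpgradeTimeoutInMinutes
component_timeout=45         # componentUpgradeTimeoutInMinutes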

### Instantiate Kubernetes client

In [None]:
# Instantiate the Python Kubernetes client into 'api' variable

import os
from IPython.display import Markdown, display

try:
    from kubernetes import client, config
    from kubernetes.stream import stream
except ImportError:

    # Install the Kubernetes module, then retry the import
    import sys
    !{sys.executable} -m pip install kubernetes

    try:
        from kubernetes import client, config
        from kubernetes.stream import stream
    except ImportError:
        display(Markdown('HINT: Use [SOP059 - Install Kubernetes Python module](../install/sop059-install-kubernetes-module.ipynb) to resolve this issue.'))
        raise

if "KUBERNETES_SERVICE_PORT" in os.environ and "KUBERNETES_SERVICE_HOST" in os.environ:
    config.load_incluster_config()
else:
    try:
        config.load_kube_config()
    except Exception:
        display(Markdown('HINT: Use [TSG118 - Configure Kubernetes config](../repair/tsg118-configure-kube-config.ipynb) to resolve this issue.'))
        raise

api = client.CoreV1Api()

print('Kubernetes client instantiated')

### Get the namespace for the big data cluster

Get the namespace of the Big Data Cluster from the Kubernetes API.

**NOTE:**

If there is more than one Big Data Cluster in the target Kubernetes
cluster, then either:

-   change the `[0]` index in the code below so that it selects the
    correct big data cluster, or
-   set the environment variable AZDATA_NAMESPACE before starting Azure
    Data Studio.

In [None]:
# Place Kubernetes namespace name for BDC into 'namespace' variable

if "AZDATA_NAMESPACE" in os.environ:
    namespace = os.environ["AZDATA_NAMESPACE"]
else:
    try:
        namespace = api.list_namespace(label_selector='MSSQL_CLUSTER').items[0].metadata.name
    except IndexError:
        from IPython.display import Markdown
        display(Markdown('HINT: Use [TSG081 - Get namespaces (Kubernetes)](../monitor-k8s/tsg081-get-kubernetes-namespaces.ipynb) to resolve this issue.'))
        display(Markdown('HINT: Use [TSG010 - Get configuration contexts](../monitor-k8s/tsg010-get-kubernetes-contexts.ipynb) to resolve this issue.'))
        display(Markdown('HINT: Use [SOP011 - Set kubernetes configuration context](../common/sop011-set-kubernetes-context.ipynb) to resolve this issue.'))
        raise

print('The kubernetes namespace for your big data cluster is: ' + namespace)
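
When several Big Data Clusters share one Kubernetes cluster, taking item `[0]` may silently pick the wrong namespace. As a stricter alternative, the sketch below refuses to guess. The helper `choose_namespace` is hypothetical and not part of this notebook; it only assumes a list of candidate namespace names and the AZDATA_NAMESPACE variable described in the note above.

```python
import os

def choose_namespace(candidates, env=os.environ):
    """Pick the BDC namespace, refusing to guess when several candidates exist."""
    # An explicit AZDATA_NAMESPACE always wins, as in the cell above.
    if "AZDATA_NAMESPACE" in env:
        return env["AZDATA_NAMESPACE"]
    if len(candidates) == 1:
        return candidates[0]
    raise ValueError(
        "Expected exactly one namespace labeled MSSQL_CLUSTER, found: "
        + (", ".join(candidates) or "none")
    )

# Example with hypothetical namespace names.
print(choose_namespace(["bdc-a", "bdc-b"], env={"AZDATA_NAMESPACE": "bdc-b"}))
```

In this notebook the candidates could be built as `[ns.metadata.name for ns in api.list_namespace(label_selector='MSSQL_CLUSTER').items]`.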

### Set upgrade timeouts

Set the timeouts for upgrades. The timeout settings are as follows:

-   `controllerUpgradeTimeoutInMinutes`: sets the maximum amount of time
    for the controller or controllerdb to finish upgrading
-   `totalUpgradeTimeoutInMinutes`: sets the maximum amount of time to
    wait for both the controller and controllerdb to complete their upgrade
-   `componentUpgradeTimeoutInMinutes`: sets the maximum amount of time
    allowed for subsequent phases of the upgrade to complete
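
Before applying new values, it may help to sanity-check them. The sketch below is illustrative only. It rests on two assumptions that are inferred from the descriptions above rather than documented rules: each timeout is a positive whole number of minutes, and the total timeout should be at least the controller timeout because it covers both the controller and controllerdb. The helper name `check_timeouts` is hypothetical.

```python
def check_timeouts(controller_timeout, controller_total_timeout, component_timeout):
    """Return a list of warnings; an empty list means no problem was spotted."""
    warnings = []
    values = {
        "controller_timeout": controller_timeout,
        "controller_total_timeout": controller_total_timeout,
        "component_timeout": component_timeout,
    }
    # Assumption: timeouts are positive whole minutes, as in the examples.
    for name, value in values.items():
        if not isinstance(value, int) or value <= 0:
            warnings.append(f"{name} should be a positive whole number of minutes, got {value!r}")
    # Assumption: the total covers the controller and controllerdb upgrades,
    # so it should not be shorter than the per-step controller timeout.
    if not warnings and controller_total_timeout < controller_timeout:
        warnings.append("controller_total_timeout is shorter than controller_timeout")
    return warnings

# The values from the Parameters cell pass both checks.
print(check_timeouts(20, 40, 45))
```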

In [None]:
import json

# Read the current upgrade configmap
upgrade_config_map = api.read_namespaced_config_map("controller-upgrade-configmap", namespace)

# The "controller-upgrade" entry is a JSON string: parse it, update the
# three timeout fields, and write it back as a string
upgrade_config = json.loads(upgrade_config_map.data["controller-upgrade"])
upgrade_config["controllerUpgradeTimeoutInMinutes"] = controller_timeout
upgrade_config["totalUpgradeTimeoutInMinutes"] = controller_total_timeout
upgrade_config["componentUpgradeTimeoutInMinutes"] = component_timeout
upgrade_config_map.data["controller-upgrade"] = json.dumps(upgrade_config)

# Apply the modified configmap to the cluster
api.patch_namespaced_config_map("controller-upgrade-configmap", namespace, upgrade_config_map)
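
After patching, one way to confirm that the change took effect is to read the configmap again and compare its timeout fields with the intended values. The following is a minimal sketch of that comparison. The helper `diff_timeouts` and the `stale` sample are hypothetical, and the comparison runs on a JSON string so it can be tried without a cluster.

```python
import json

def diff_timeouts(raw, expected):
    """Map each mismatched field to an (expected, actual) pair."""
    actual = json.loads(raw)
    return {
        field: (want, actual.get(field))
        for field, want in expected.items()
        if actual.get(field) != want
    }

# Intended values, matching the Parameters cell.
expected = {
    "controllerUpgradeTimeoutInMinutes": 20,
    "totalUpgradeTimeoutInMinutes": 40,
    "componentUpgradeTimeoutInMinutes": 45,
}

# Hypothetical stored value in which one field was not updated.
stale = json.dumps({**expected, "componentUpgradeTimeoutInMinutes": 30})

print(diff_timeouts(stale, expected))
```

In the notebook, `raw` would be `api.read_namespaced_config_map("controller-upgrade-configmap", namespace).data["controller-upgrade"]`; an empty result means all three timeouts match.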

In [None]:
print("Notebook execution is complete.")

## Related

- [TSG108 - View the controller upgrade config map](../diagnose/tsg108-controller-failed-to-upgrade.ipynb)
