# Cluster Status Notebook
This notebook allows you to see the status of the controller, master instance, and pools in your SQL Server big data cluster.

In [15]:
import sys, platform

if platform.system()=="Windows":
    user = ' --user'
else:
    user = ''
    
cmd = f'{sys.executable} -m pip uninstall --yes mssqlctl-cli-storage'
cmdOutput = !{cmd}
cmd = f'{sys.executable} -m pip uninstall -r http://helsinki/browse/packages/python/aris-p-release-candidate-gb/mssqlctl/requirements.txt --yes'

cmdOutput = !{cmd}
cmdOutput = ''.join(cmdOutput)
if 'is not installed' in cmdOutput or 'Successfully uninstalled mssqlctl' in cmdOutput:
    print("Uninstalling mssqlctl successful: " + cmd)
else:
    # The check above inspects pip's output text, not its exit code, so report the output
    raise SystemExit(f'Uninstall of mssqlctl failed:\n\n\t{cmd}\n\nreturned unexpected output: {cmdOutput}.\n')

cmd = f'{sys.executable} -m pip install -r http://helsinki/browse/packages/python/aris-p-release-candidate-gb/mssqlctl/requirements.txt{user} --trusted-host helsinki'
print("Installing the latest version of mssqlctl: " + cmd)
cmdOutput = !{cmd}
cmdOutput = ''.join(cmdOutput)
if 'Requirement already satisfied' in cmdOutput or 'Successfully installed mssqlctl' in cmdOutput:
    print('\nSUCCESS: Upgraded the mssqlctl to the latest version')
else:
    # The check above inspects pip's output text, not its exit code, so report the output
    raise SystemExit(f'Installation of mssqlctl failed:\n\n\t{cmd}\n\nreturned unexpected output: {cmdOutput}.\n')

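# Install pandas 0.24.x if it is not already present
cmd = f'{sys.executable} -m pip show pandas'
cmdOutput = !{cmd}
# `pip show` prints "Name: ..." on the first line and "Version: ..." on the second,
# so at least two lines are needed before the version line can be read
if len(cmdOutput) > 1 and '0.24' in cmdOutput[1]:
    print('Pandas required version is already installed!')
else:
    pandasVersion = 'pandas==0.24.2'
    cmd = f'{sys.executable} -m pip install {pandasVersion}'
    cmdOutput = !{cmd}
    print(f'\nSuccess: Installed {pandasVersion}.')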


Uninstalling mssqlctl successful: /Users/madhurikoripalli/azuredatastudio-python/0.0.1/bin/python3.6 -m pip uninstall -r http://helsinki/browse/packages/python/aris-p-release-candidate-gb/mssqlctl/requirements.txt --yes
Installing the latest version of mssqlctl: /Users/madhurikoripalli/azuredatastudio-python/0.0.1/bin/python3.6 -m pip install -r http://helsinki/browse/packages/python/aris-p-release-candidate-gb/mssqlctl/requirements.txt --trusted-host helsinki



SUCCESS: Upgraded the mssqlctl to the latest version


Pandas required version is already installed!
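
The cell above decides success by searching pip's printed output for known phrases. As a hedged alternative, the sketch below reads the process exit code through the standard `subprocess` module instead. The helper names `run_command` and `pip_argv` are hypothetical and not part of the notebook, and the demonstration runs harmless interpreter commands rather than a real install.

```python
import subprocess
import sys

def run_command(argv):
    """Run a command and return (exit_code, combined stdout/stderr text)."""
    completed = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # text mode; also works on Python 3.6
    )
    return completed.returncode, completed.stdout

def pip_argv(*args):
    """Build the argument list for `python -m pip ...` with the current interpreter."""
    return [sys.executable, "-m", "pip", *args]

# Demonstrate with commands that need no network access.
ok_code, ok_output = run_command([sys.executable, "-c", "print('hello')"])
bad_code, _ = run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
print(ok_code, ok_output.strip(), bad_code)
```

With this approach, a call such as `run_command(pip_argv('show', 'pandas'))` would report failure through a non-zero exit code, which does not depend on the exact wording of pip's messages.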


In [11]:
import os, getpass, sys, json
import pandas as pd
import numpy as np
from IPython.display import HTML, display

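# Check if mssqlctl is installed
cmd = f'{sys.executable} -m mssqlctl --version'
mssqlctl_version = !{cmd}

# A missing module reports "No module named mssqlctl"; a missing executable reports "command not found"
versionOutput = ''.join(mssqlctl_version)
if not versionOutput or 'No module named' in versionOutput or 'command not found' in versionOutput:
    raise SystemExit('mssqlctl is required. Please install the latest version before proceeding.\n')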

# Set display.max_colwidth to -1 so that long values such as URLs are not truncated
pd.set_option('display.max_colwidth', -1)
# Prompt user inputs:
cluster_name = input('Please provide your Cluster Name: ')
if cluster_name == "":
    raise SystemExit('Cluster Name is required!\n')
controller_username = input('Please provide your Controller Username for login: ')
if controller_username == "":
    raise SystemExit('Controller Username is required!\n')
controller_password = getpass.getpass(prompt='Controller Password: ')
if controller_password == "":
    raise SystemExit('Password is required!\n')
else:
    # Confirm that a password was entered without echoing it
    print('***********')

# Log in to your big data cluster
cmd = f'mssqlctl login -n {cluster_name} -u {controller_username} -a yes'
print("Start " + cmd)
os.environ['CONTROLLER_USERNAME'] = controller_username
os.environ['CONTROLLER_PASSWORD'] = controller_password
os.environ['ACCEPT_EULA'] = 'yes'

loginResult = !{cmd}
# If the login fails (for example, the controller endpoint cannot be resolved from the kube config),
# ask for the endpoint and retry
if loginResult and 'ERROR' in loginResult[0]:
    controller_ip = input('Please provide your Controller endpoint: ')
    if controller_ip == "":
        raise SystemExit('Controller endpoint is required!\n')
    else:
        cmd = f'mssqlctl login -n {cluster_name} -e {controller_ip} -u {controller_username} -a yes'
        loginResult = !{cmd}
print(loginResult)
# Example commands and expected output:
# mssqlctl login -n test -e https://10.127.22.122:30080 -u controlleradmin -a yes
# mssqlctl login -n june14-bdc -u admin -a yes
# User `admin` logged in successfully to `https://13.68.131.229:30080`

***********
Start mssqlctl login -n june14-bdc -u admin -a yes


['User `admin` logged in successfully to `https://13.68.131.229:30080`']
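
The prompts and the two login variants in the cell above follow one pattern: reject empty input, then add `-e` only when an endpoint is supplied. A minimal sketch of that pattern follows; `require` and `build_login_argv` are hypothetical helper names, and only the flags already used in the notebook (`-n`, `-e`, `-u`, `-a`) appear. The sketch builds the argument list without running `mssqlctl`.

```python
def require(value, label):
    """Return the stripped value, or stop with a clear message if it is empty."""
    value = value.strip()
    if not value:
        raise SystemExit(f"{label} is required!")
    return value

def build_login_argv(cluster_name, username, endpoint=None):
    """Build the `mssqlctl login` argument list used in the cell above."""
    argv = ["mssqlctl", "login", "-n", cluster_name]
    if endpoint:
        # The endpoint is only passed when the first login attempt could not resolve it
        argv += ["-e", endpoint]
    argv += ["-u", username, "-a", "yes"]
    return argv

argv = build_login_argv(
    require(" june14-bdc ", "Cluster Name"),
    require("admin", "Controller Username"),
)
print(" ".join(argv))
```

Keeping the password out of the argument list, as the notebook does by setting `CONTROLLER_PASSWORD` in the environment, avoids echoing it when the command is printed.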


In [12]:
# Display status of big data cluster
def formatColumnNames(column):
    return ' '.join(word[0].upper() + word[1:] for word in column.split())

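def show_results(output):
    # Join the captured output lines and parse them as a JSON list of records
    records = json.loads(''.join(output))
    df = pd.DataFrame(records)
    # Rename the DataFrame's own columns so the labels stay aligned with the data,
    # whatever order pandas chose for them
    df.columns = [formatColumnNames(n) for n in df.columns]
    display(HTML(df.to_html(render_links=True)))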

results = !mssqlctl bdc status show
jsonRes = json.loads(''.join(results))
# Remember whether the cluster has a Spark pool; the Spark pool cell below depends on it
spark_exists = any(x['kind'] == 'Spark' for x in jsonRes)
show_results(results)

| | Kind | Name | State |
|---|---|---|---|
| 0 | BDC | june14-bdc | Ready |
| 1 | Control | default | Ready |
| 2 | Master | default | Ready |
| 3 | Compute | default | Ready |
| 4 | Data | default | Ready |
| 5 | Storage | default | Ready |
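
The cell above joins the captured output lines, parses them as JSON, capitalizes the keys for display, and checks for a Spark pool. The standard-library sketch below walks through those steps on sample data. The sample records are an assumption reconstructed from the table above: the `kind` key appears in the notebook code, while `name` and `state` are inferred from the column headers.

```python
import json

# Captured shell output arrives as a list of lines (assumed sample data)
sample_output = [
    '[{"kind": "BDC", "name": "june14-bdc", "state": "Ready"},',
    ' {"kind": "Master", "name": "default", "state": "Ready"},',
    ' {"kind": "Storage", "name": "default", "state": "Ready"}]',
]

def format_column_name(column):
    """Capitalize the first letter of each word, leaving the rest unchanged."""
    return " ".join(word[0].upper() + word[1:] for word in column.split())

records = json.loads("".join(sample_output))
headers = [format_column_name(key) for key in records[0]]
spark_exists = any(record["kind"] == "Spark" for record in records)

print(headers)
print("Spark pool present:", spark_exists)
```

Because only the first letter of each word is changed, a camel-case key such as `logsUrl` becomes `LogsUrl` rather than `Logsurl`.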


## Controller status
The controller hosts the core logic for deploying and managing a big data cluster. It handles all interactions with Kubernetes, the SQL Server instances that are part of the cluster, and other components such as HDFS and Spark.

To learn more, [read here.](https://docs.microsoft.com/sql/big-data-cluster/concept-controller?view=sql-server-ver15)

In [13]:
# Display status of controller
results = !mssqlctl bdc control status show
show_results(results)

| | Kind | LogsUrl | Name | NodeMetricsUrl | SqlMetricsUrl | State |
|---|---|---|---|---|---|---|
| 0 | DaemonSet | - | metricsdc | https://13.68.131.229:30080/clusters/june14-bdc/pods/metricsdc-5bdk8/nodemetrics/ui | - | Ready |
| 1 | ReplicaSet | - | metricsui | https://13.68.131.229:30080/clusters/june14-bdc/pods/metricsui-sbvdp/nodemetrics/ui | - | Ready |
| 2 | StatefulSet | - | metricsdb | https://13.68.131.229:30080/clusters/june14-bdc/pods/metricsdb-0/nodemetrics/ui | - | Ready |
| 3 | ReplicaSet | - | logsui | https://13.68.131.229:30080/clusters/june14-bdc/pods/logsui-6kcnj/nodemetrics/ui | - | Ready |
| 4 | StatefulSet | - | logsdb | https://13.68.131.229:30080/clusters/june14-bdc/pods/logsdb-0/nodemetrics/ui | - | Ready |
| 5 | StatefulSet | - | nmnode-0 | https://13.68.131.229:30080/clusters/june14-bdc/pods/nmnode-0-0/nodemetrics/ui | - | Ready |
| 6 | StatefulSet | https://13.68.131.229:30080/clusters/june14-bdc/pods/gateway-0/logs/ui | gateway | https://13.68.131.229:30080/clusters/june14-bdc/pods/gateway-0/nodemetrics/ui | - | Ready |
| 7 | ReplicaSet | - | mgmtproxy | https://13.68.131.229:30080/clusters/june14-bdc/pods/mgmtproxy-rntml/nodemetrics/ui | - | Ready |
| 8 | ReplicaSet | https://13.68.131.229:30080/clusters/june14-bdc/pods/control-l7bkg/logs/ui | control | https://13.68.131.229:30080/clusters/june14-bdc/pods/control-l7bkg/nodemetrics/ui | - | Running |
| 9 | StatefulSet | https://13.68.131.229:30080/clusters/june14-bdc/pods/controldb-0/logs/ui | controldb | https://13.68.131.229:30080/clusters/june14-bdc/pods/controldb-0/nodemetrics/ui | - | Running |
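
The controller table mixes `Ready` and `Running` states, which is easy to miss in a long listing. As an illustration only, the sketch below groups component names by state. The records are a hand-copied subset of the table above with assumed lowercase keys, and `group_by_state` is a hypothetical helper, not part of the notebook.

```python
from collections import defaultdict

# Subset of the controller status shown above (keys are assumed)
components = [
    {"kind": "DaemonSet", "name": "metricsdc", "state": "Ready"},
    {"kind": "StatefulSet", "name": "gateway", "state": "Ready"},
    {"kind": "ReplicaSet", "name": "control", "state": "Running"},
    {"kind": "StatefulSet", "name": "controldb", "state": "Running"},
]

def group_by_state(records):
    """Map each state to the list of component names reporting it."""
    groups = defaultdict(list)
    for record in records:
        groups[record["state"]].append(record["name"])
    return dict(groups)

summary = group_by_state(components)
for state, names in sorted(summary.items()):
    print(f"{state}: {', '.join(names)}")
```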


## Master Instance status
The master instance is a SQL Server instance running in a SQL Server big data cluster control plane.

To learn more, [read here.](https://docs.microsoft.com/sql/big-data-cluster/concept-master-instance?view=sqlallproducts-allversions)

In [5]:
# Display status of master instance
results = !mssqlctl bdc pool status show -k master -n default
show_results(results)

## Compute Pool status
Compute pools provide scale-out computational resources for a big data cluster.

To learn more, [read here.](https://docs.microsoft.com/sql/big-data-cluster/concept-compute-pool?view=sqlallproducts-allversions)

In [6]:
# Display status of compute pool
results = !mssqlctl bdc pool status show -k compute -n default
show_results(results)

## Storage Pool status
Storage pools are responsible for:
- Data ingestion through Spark.
- Data storage in HDFS (Parquet format). HDFS also provides data persistency, as HDFS data is spread across all the storage nodes in the SQL big data cluster.
- Data access through HDFS and SQL Server endpoints.

To learn more, [read here.](https://docs.microsoft.com/sql/big-data-cluster/concept-storage-pool?view=sqlallproducts-allversions)

In [7]:
# Display status of storage pools
results = !mssqlctl bdc pool status show -k storage -n default
show_results(results)

## Data Pool status
SQL data pool instances provide persistent SQL Server storage for the cluster. A data pool is used to ingest data from SQL queries or Spark jobs. 

To learn more, [read here.](https://docs.microsoft.com/sql/big-data-cluster/concept-data-pool?view=sqlallproducts-allversions)

In [8]:
# Display status of data pools
results = !mssqlctl bdc pool status show -k data -n default
show_results(results)

## Spark Pool status


In [9]:
# Display status of spark pool
if spark_exists:
    results = !mssqlctl bdc pool status show -k spark -n default
    show_results(results)
else:
    print('No spark pool.')
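
The five pool cells above run the same command with a different `-k` value, and the Spark cell runs only when a Spark pool exists. A minimal sketch of that pattern as a loop follows. `POOL_KINDS`, `pool_status_command`, and `commands_to_run` are hypothetical names; the command text is copied from the cells above, and the sketch prints the commands instead of running them.

```python
# Pool kinds queried by the notebook, in the order the cells appear
POOL_KINDS = ["master", "compute", "storage", "data", "spark"]

def pool_status_command(kind, name="default"):
    """Return the status command the notebook runs for one pool."""
    return f"mssqlctl bdc pool status show -k {kind} -n {name}"

def commands_to_run(spark_exists):
    """List the status commands, skipping Spark when the cluster has no Spark pool."""
    kinds = [kind for kind in POOL_KINDS if kind != "spark" or spark_exists]
    return [pool_status_command(kind) for kind in kinds]

for command in commands_to_run(spark_exists=False):
    print(command)
```

In a notebook, each generated string could be passed to the shell with `!{command}` and the result handed to `show_results`, as the individual cells do.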