### Azure Open-OnDemand Devito SLURM
https://github.com/edwardsp/Azure-OnDemand

Prerequisites:  
 - control_plane_ACR, for example [created via Azure Portal](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-get-started-portal).
 
Running environments (either a conda env or a miniconda3 docker container):
 - a conda env defined by /workspace/apps/devito/azure_ood_devito_conda_control_plane.yml
 - a continuumio/miniconda3 docker container with a conda env based on the above conda .yml file, in which case host dirs are defined by the apps/devito/not_shared/sibling_docker.env [dotenv](https://github.com/theskumar/python-dotenv) file (variable DOCKER_CONTAINER_MOUNT_POINT).  
    
    
 <a id='user_input_requiring_steps'></a>
Repro steps (require user input):
   1. (Optional) [Edit config variables](#az_cli_variables) by editing the local config .env file written by this notebook.
   2. [Fill in and save](#dot_env_description) sensitive and configuration information using python code/variables.  
   3. [Azure CLI login](#Azure_cli_login) is required once; the signed-in state is saved in a local az cli docker image.
   4. (Optional) [Edit](#az_cli_bash_script) Azure_ood resource names (driven by __prefix__) by editing the az cli .sh bash file in this notebook.


In [1]:
# Allow multiple displays per cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [2]:
import sys, os, pathlib, shutil
import platform,  dotenv
import subprocess

In [3]:
platform.platform()
os.getcwd()

'Linux-4.15.0-1091-azure-x86_64-with-debian-10.3'

'/workspace/apps/devito'

<a id='az_cli_variables'></a>
##### 1. Edit config variables saved in local_config_file below. 
__local_config_file__ is a configuration .env file that can be shell sourced, i.e. it does not contain python code.  
Variables __azure_ood_resources_prefix__ and __azure_ood_dir_list__ (joined into __azure_ood_dir__) control the Azure resource names and the local azure_ood directory. Changing them creates a distinct set of Azure resources.  
  
[Back](#user_input_requiring_steps) to list of repro steps.
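
The list-style values in this file (for example `azure_ood_dir_list`) are comma-separated path components that later cells split and join. A minimal sketch of that convention follows; the helper name `path_from_list` is hypothetical and not part of the notebook.

```python
import os

def path_from_list(value):
    """Join a comma-separated list of path components into one path."""
    return os.path.join(*value.split(","))

# Same value as azure_ood_dir_list in the config file
azure_ood_dir = path_from_list(".,azure_ood_temp_01")
secrets_path = os.path.join(azure_ood_dir, "azure_ood_secrets.env")
print(azure_ood_dir)
print(secrets_path)
```

Storing the components rather than a ready-made path keeps the .env file free of OS-specific separators.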

In [4]:
local_config_file='_local_config.env'

In [5]:
%%writefile $local_config_file 

azure_ood_resources_prefix='ghiordanood01'
azure_ood_dir_list=".,azure_ood_temp_01"

not_shared_dir_list=".,not_shared"
general_config_file="general.env"
sibling_docker_file="sibling_docker.env"

docker_files_dir="docker_files"
control_plane_docker_build_dir="control_plane_docker_build"
docker_build_no_cache=""  # '--no-cache' # or '' #

az_cli_container="signed_in_az_cli__container01"
signed_in_az_cli_image="signed_in_az_cli_image"

azure_ood_setup_resource_naming_script_file="azure_ood_setup_resource_naming.sh"
azure_ood_setup_script_step010_file="azure_ood_setup_step010.sh"
azure_ood_setup_script_step020_file="azure_ood_setup_step020.sh"

azure_ood_secrets_file="azure_ood_secrets.env"
azure_ood_resources_file="azure_resources.py"


use_ACR_and_dockerhub=1 # bool('False') is True

Overwriting _local_config.env


In [6]:
dotenv.load_dotenv(dotenv_path=local_config_file, override=True)

not_shared_dir_list=os.getenv('not_shared_dir_list').split(",")
DOTENV_FILE_PATH = not_shared_dir_list + [os.getenv('general_config_file')]
SIBLING_DOCKER_DOTENV_FILE_PATH = not_shared_dir_list + [os.getenv('sibling_docker_file')]

az_cli_container = os.getenv('az_cli_container')
signed_in_az_cli_image = os.getenv('signed_in_az_cli_image')


azure_ood_resources_prefix = os.getenv('azure_ood_resources_prefix')
azure_ood_dir = os.path.join(*( os.getenv('azure_ood_dir_list').split(",")))
azure_ood_setup_resource_naming_script =  os.path.join(*([azure_ood_dir] + [os.getenv('azure_ood_setup_resource_naming_script_file')]))
azure_ood_setup_script_step010 = os.path.join(*([azure_ood_dir] + [os.getenv('azure_ood_setup_script_step010_file')] ))
azure_ood_setup_script_step020 = os.path.join(*([azure_ood_dir] + [os.getenv('azure_ood_setup_script_step020_file')] ))

azure_ood_secrets_file_dir = azure_ood_dir 
azure_ood_secrets_file = os.getenv('azure_ood_secrets_file')

azure_ood_resources_file_dir = azure_ood_dir 
azure_ood_resources_file = os.getenv('azure_ood_resources_file')

docker_files_dir = os.path.join(*([azure_ood_dir]+ [os.getenv('docker_files_dir')]))
control_plane_docker_build_dir = os.getenv('control_plane_docker_build_dir')
docker_build_no_cache = os.getenv('docker_build_no_cache')

use_ACR_and_dockerhub = bool(int(os.getenv('use_ACR_and_dockerhub')))

True
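
The flag `use_ACR_and_dockerhub` is stored as `1` or `0` because any non-empty string, including `'False'`, is truthy in Python. A short illustration, where `parse_flag` is a hypothetical helper that the notebook does not define:

```python
def parse_flag(raw, default=False):
    """Interpret a 0/1 string read from a .env file as a bool."""
    if raw is None or raw.strip() == "":
        return default          # variable missing or empty
    return bool(int(raw))       # "0" -> False, "1" -> True

print(bool("False"))    # a non-empty string, so truthy
print(parse_flag("0"))
print(parse_flag("1"))
```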

In [7]:
os.makedirs(azure_ood_dir, exist_ok=True)
azure_ood_dir

'./azure_ood_temp_01'

In [8]:
def create_empty_file(dotenv_file_path_list):
    created_dotenv_file_path = os.path.join(*(dotenv_file_path_list)) 
    os.makedirs(os.path.join(*(dotenv_file_path_list[:-1])), exist_ok=True)
    pathlib.Path(created_dotenv_file_path).touch()
    return created_dotenv_file_path

dotenv_file_path = create_empty_file(DOTENV_FILE_PATH)
sibling_docker_dotenv_file_path = create_empty_file(SIBLING_DOCKER_DOTENV_FILE_PATH)
azure_ood_secrets_file_path = create_empty_file([azure_ood_secrets_file_dir]+ [azure_ood_secrets_file])
azure_ood_resources_file_path = create_empty_file([azure_ood_resources_file_dir]+ [azure_ood_resources_file])


# # show .env file path
# !pwd
dotenv_file_path
sibling_docker_dotenv_file_path
azure_ood_secrets_file_path

'./not_shared/general.env'

'./not_shared/sibling_docker.env'

'./azure_ood_temp_01/azure_ood_secrets.env'

<a id='dot_env_description'></a>
##### 2. Input here sensitive and configuration information

A [dotenv](https://github.com/theskumar/python-dotenv) file is used to store config info and hide sensitive info. 
  
REQUIRED actions for the 2 cells below: 
- Input the required info in first cell below.  
- Uncomment second cell below.  
- Run both cells once. The second cell saves the sensitive information from the first cell in .env files (__dotenv_file_path__ and __sibling_docker_dotenv_file_path__) that should be git ignored. 
- After running the next 2 cells once, the second cell can be commented out again. Future runs of this notebook will re-use the info saved in the .env files.

[Back](#user_input_requiring_steps) to list of repro steps.
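
Re-using saved values relies on two things: touching an existing .env file leaves its content intact, and the required keys are actually present. The sketch below checks both with the standard library only. `missing_keys`, the key subset, and the value `myacr` are illustrative assumptions, and the tiny parser is not a replacement for python-dotenv.

```python
import pathlib
import tempfile

REQUIRED_KEYS = ["CONTROL_PLANE_ACR", "SUBSCRIPTION_ID"]  # subset, for illustration

def missing_keys(env_path, required):
    """Return required keys that are absent or empty in a simple KEY=value file."""
    found = {}
    for line in pathlib.Path(env_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        found[key.strip()] = value.strip().strip("'\"")
    return [k for k in required if not found.get(k)]

with tempfile.TemporaryDirectory() as tmp:
    env_file = pathlib.Path(tmp) / "general.env"
    env_file.write_text("CONTROL_PLANE_ACR='myacr'\n")
    env_file.touch()                      # touching an existing file keeps its content
    kept = env_file.read_text()
    still_missing = missing_keys(env_file, REQUIRED_KEYS)

print(kept)
print(still_missing)
```

A check like this could run before the cells that call `os.getenv`, so a missing value fails early instead of surfacing as `None` inside a docker command.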

In [9]:
control_plane_ACR= "" # e.g. "fwi01acr" (registry name only, without .azurecr.io)
control_plane_ACR_uname=""
control_plane_ACR_password=""

dockerhub_login =  ""
dockerhub_pwd = ""

SUBSCRIPTION_ID=""

control_plane_docker_image_name = "" 
control_plane_docker_image_tag=""

In [10]:
# dotenv.set_key(dotenv_file_path, 'CONTROL_PLANE_ACR', control_plane_ACR)
# dotenv.set_key(dotenv_file_path, 'CONTROL_PLANE_ACR_USERNAME', control_plane_ACR_uname)
# dotenv.set_key(dotenv_file_path, 'CONTROL_PLANE_ACR_PASSWORD', control_plane_ACR_password)


# dotenv.set_key(dotenv_file_path, 'DOCKERHUB_LOGIN', dockerhub_login)
# dotenv.set_key(dotenv_file_path, 'DOCKERHUB_PWD', dockerhub_pwd)

# dotenv.set_key(dotenv_file_path, 'SUBSCRIPTION_ID', SUBSCRIPTION_ID)

# dotenv.set_key(dotenv_file_path,'control_plane_docker_image_name', control_plane_docker_image_name)
# dotenv.set_key(dotenv_file_path,'control_plane_docker_image_tag', control_plane_docker_image_tag)

In [11]:
dotenv.load_dotenv(dotenv_path=dotenv_file_path, override=True)

# docker_files_dir = os.path.join(*(os.getcwd(), docker_files_dir))

def create_docker_file(docker_file_dir, docker_build_dir, 
                       docker_image_name, docker_image_version, docker_repo_name):
    if docker_image_version=="":
        docker_image_version = 'latest'
    
    docker_file_name = 'Dockerfile'+ '_' + docker_image_name +'_'+ docker_image_version
#     docker_file_dir = os.path.join(*([os.getcwd(), docker_file_base_dir]))
    os.makedirs(docker_file_dir, exist_ok=True)
    docker_file_path = os.path.join(*([docker_file_dir]+[docker_file_name]))
    
#     docker_build_dir = os.path.join(*([os.getcwd(), docker_build_base_dir]))
    os.makedirs(docker_build_dir, exist_ok=True)

    docker_image_name_only = docker_image_name +':'+ docker_image_version
    docker_image_name = docker_repo_name + '.azurecr.io' + '/' + docker_image_name_only
    
    return_dict = {'docker_image_name': docker_image_name, 
            'docker_image_name_only': docker_image_name_only, 
            'docker_file_path': docker_file_path ,
            'docker_build_dir': docker_build_dir}
    for key, value in return_dict.items():
        print(key, value)

    return return_dict

docker_build_assets=create_docker_file(docker_files_dir, 
                                       control_plane_docker_build_dir,
                                       os.getenv('control_plane_docker_image_name'),
                                       os.getenv('control_plane_docker_image_tag'),
                                       os.getenv('CONTROL_PLANE_ACR'))
control_plane_docker_image_name=docker_build_assets['docker_image_name']
control_plane_docker_image_name_only=docker_build_assets['docker_image_name_only']
control_plane_docker_file_path=docker_build_assets['docker_file_path']
control_plane_docker_build_dir=docker_build_assets['docker_build_dir']


True

docker_image_name fwi01acr.azurecr.io/azure_ood:latest
docker_image_name_only azure_ood:latest
docker_file_path ./azure_ood_temp_01/docker_files/Dockerfile_azure_ood_latest
docker_build_dir control_plane_docker_build


###### sibling_docker_dotenv_file_path points to a host dir.  
This differs from pwd/os.getcwd() when this notebook runs in a container, because the path refers to the docker host.

In [12]:
dotenv.load_dotenv(dotenv_path=sibling_docker_dotenv_file_path)
os.getenv('DOCKER_CONTAINER_MOUNT_POINT')

True

'/datadrive01/prj/Azure-OnDemand/apps/devito'
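
Sibling containers are started through the host's Docker socket, so volume paths must be host paths, while scripts inside those containers see the same files under `/workspace` (the mount target used by later cells). A sketch of that mapping, with hypothetical helper names and the mount point shown above:

```python
import posixpath

def host_path(relative_path, mount_point):
    """Path on the Docker host for a path relative to the notebook directory."""
    return posixpath.normpath(posixpath.join(mount_point, relative_path))

def sibling_path(relative_path, container_root="/workspace"):
    """Path of the same file inside a sibling container that mounts
    the host directory at container_root."""
    return posixpath.normpath(posixpath.join(container_root, relative_path))

mount_point = "/datadrive01/prj/Azure-OnDemand/apps/devito"
rel = "./azure_ood_temp_01/azure_ood_secrets.env"
print(host_path(rel, mount_point))
print(sibling_path(rel))
```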

In [13]:
!docker rm -f $az_cli_container

signed_in_az_cli__container01


#### Use mcr.microsoft.com/azure-cli to sign in
Pull the mcr docker image, sign in inside a container, then save (commit) that container as a local docker image.

In [14]:
cli_base_command='(docker run '+ \
'-it '+ \
'--name ' + az_cli_container+ ' ' + \
'mcr.microsoft.com/azure-cli '+ \
'/bin/bash  -c ' 
internal_command = '"'+ \
'apk update && apk add --update --no-cache  openrc docker-cli docker; '+ \
'rc-update add docker boot; '+ \
' az login; '+ \
' az account set --subscription '+os.getenv('SUBSCRIPTION_ID')+ '; '+ \
' az account list -o table; '+\
'"; '+\
'docker commit '+az_cli_container + ' ' + signed_in_az_cli_image + ' ;'+ \
'docker rm -f '+ az_cli_container + ' ;'+ \
') '

cli_command = cli_base_command+internal_command

# cli_command
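
The command above is assembled by string concatenation with hand-placed quotes. As an alternative sketch (not what this notebook does), the arguments can be kept in a list and quoted by `shlex.join`, available in Python 3.8 and later. The function name and the shortened inner command are illustrative.

```python
import shlex

def docker_run_command(container_name, image, inner_command, interactive=True):
    """Build a `docker run` command line with shell-safe quoting."""
    args = ["docker", "run"]
    if interactive:
        args.append("-it")
    args += ["--name", container_name, image, "/bin/bash", "-c", inner_command]
    return shlex.join(args)

inner = "az login; az account list -o table"
cmd = docker_run_command("signed_in_az_cli__container01",
                         "mcr.microsoft.com/azure-cli", inner)
print(cmd)
```

Because the inner command is quoted as a single argument, values such as a subscription ID cannot break out of the `-c` string.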

<a id='Azure_cli_login'></a>
##### 3. Interactive login to Azure may be required in the cell below
The signed-in az cli docker container is saved as a local docker image and re-used if this notebook is run again.

[Back](#user_input_requiring_steps) to list of repro steps.

###### Use local signed in az cli docker image, if exists.

In [15]:
cli_command_base='docker image inspect '+ \
signed_in_az_cli_image+ ' ' + \
 '>/dev/null 2>&1 && ' + \
'echo Found ' + signed_in_az_cli_image + ' docker image, will use it. ' + \
'|| ' +cli_command

# cli_command_base
!$cli_command_base

Found signed_in_az_cli_image docker image, will use it.
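
The shell one-liner above follows an `A && B || C` pattern: inspect the image, report it if found, otherwise run the login command. The same decision can be sketched in Python. The function names are hypothetical, and the runner is injectable so the logic can be exercised without Docker.

```python
import subprocess

def image_exists(image, run=subprocess.run):
    """Return True if `docker image inspect` succeeds for the image."""
    result = run(["docker", "image", "inspect", image],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

def choose_action(image, run=subprocess.run):
    """Reuse the signed-in image when present, otherwise ask for a login."""
    return "reuse" if image_exists(image, run) else "login_and_commit"

# Stand-ins for subprocess.run, used here instead of a real Docker daemon.
class FakeResult:
    def __init__(self, returncode):
        self.returncode = returncode

def fake_run_found(cmd, **kwargs):
    return FakeResult(0)

def fake_run_missing(cmd, **kwargs):
    return FakeResult(1)

print(choose_action("signed_in_az_cli_image", fake_run_found))
print(choose_action("signed_in_az_cli_image", fake_run_missing))
```

One difference from the shell form: with `A && B || C`, `C` also runs if `B` fails, whereas the explicit branch here depends only on the inspect result.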


In [16]:
# mcr.microsoft.com/azure-cli container was started without rm option so it should show up as stopped
!docker container ls -a

CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS                     NAMES
3b5620df7fdc        continuumio/miniconda3   "/bin/bash -c 'ls -l…"   33 minutes ago      Up 33 minutes       0.0.0.0:10002->8888/tcp   miniconda3_container02


<a id='az_cli_bash_script'></a>
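##### 4. (Optional) Edit azure_ood settings (resource group crtresourcegroup, storage account, cluster name, SP) in the az cli .sh file below
The setup script uses these environment variables: subscription_ID, azure_resources_prefix, azure_ood_secrets_file_path, azure_ood_resources_file_path, azure_ood_setup_resource_naming_script.   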
[Back](#user_input_requiring_steps) to list of repro steps.

In [17]:
%%writefile $azure_ood_setup_resource_naming_script 
crtresourcegroup="${azure_resources_prefix}rsg"
crtstorageaccount="${azure_resources_prefix}sa"

azure_ood_slurm_cluster_name="${azure_resources_prefix}slrmclst001"
service_principal_name="${azure_resources_prefix}SP01"
location='southcentralus'

Overwriting ./azure_ood_temp_01/azure_ood_setup_resource_naming.sh
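
Every resource name is the prefix plus a fixed suffix, so the prefix has to satisfy the strictest naming rule among them. Storage account names must be 3 to 24 characters, lowercase letters and digits only. The sketch below mirrors the suffixes in Python and checks that rule; both function names are hypothetical.

```python
import re

def derive_names(prefix):
    """Mirror the suffixes used by the setup scripts in this notebook."""
    return {
        "resource_group": prefix + "rsg",
        "storage_account": prefix + "sa",
        "acr": prefix + "acr",              # suffix used in the step020 script
        "slurm_cluster": prefix + "slrmclst001",
        "service_principal": prefix + "SP01",
    }

def valid_storage_account_name(name):
    """3-24 characters, lowercase letters and digits only."""
    return re.fullmatch(r"[a-z0-9]{3,24}", name) is not None

names = derive_names("ghiordanood01")
print(names["storage_account"])
print(valid_storage_account_name(names["storage_account"]))
print(valid_storage_account_name(derive_names("My_Prefix")["storage_account"]))
```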


In [18]:
%%writefile $azure_ood_setup_script_step020 
# az login
az account set --subscription "$subscription_ID"
az account list --all --refresh -o table 

source "$azure_ood_setup_resource_naming_script"

echo "$crtresourcegroup"
echo "$crtstorageaccount"
# echo "$subscription_ID"
echo "$service_principal_name"

az storage account list --resource-group "$crtresourcegroup"  -o tsv

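# Existence check only: despite its name, this variable holds the SP appId here (empty if the SP does not exist yet).
crt_aad_server_app_secret=$(az ad sp list --display-name "$service_principal_name" --query "[].appId" -o tsv)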
if [ -z "$crt_aad_server_app_secret" ]; then
    az group create --name "$crtresourcegroup" --location "$location"
    az group update -n "$crtresourcegroup" --set tags.'alias'='ghiordan' tags.'project'='Azure_Open_OnDemand_Devito_SLURM'  tags.'expires'='2022_12_30'

    az acr create --resource-group "$crtresourcegroup" --name "${azure_resources_prefix}acr" --sku Basic
    # az acr list --resource-group "$crtresourcegroup" --query "[].{acrLoginServer:loginServer}" --output table
    az acr update -n "${azure_resources_prefix}acr" --admin-enabled true

    acr_username=$(az acr credential show -n "${azure_resources_prefix}acr" --query "username"  -o tsv)
    acr_password=$(az acr credential show -n "${azure_resources_prefix}acr" --query "passwords[0].value"  -o tsv)
    
    az storage account create -n "$crtstorageaccount" -g "$crtresourcegroup" -l "$location" --sku Standard_LRS --kind StorageV2
    sa_key=$(az storage account keys list -g "$crtresourcegroup" -n "$crtstorageaccount" --query "[0].value" -o tsv)
    #sa_conn_str=$(az storage account show-connection-string -n "${crtstorageaccount}" -g "$crtresourcegroup" --query connectionString --output tsv)


    #SP contributor role is not enough to provision a cluster?
    SP_output=$(az ad sp create-for-rbac \
        --name "$service_principal_name" \
        --role owner \
        --scopes /subscriptions/"${subscription_ID}"/resourceGroups/"${crtresourcegroup}")

    crt_aad_server_app_secret=$(echo $SP_output |python3 -c "import sys, json; print(json.load(sys.stdin)['password'])")
    # echo $crt_aad_server_app_secret

    crt_aad_server_app_id=$(az ad sp list --display-name $service_principal_name --query "[].appId" -o tsv)
    # echo $crt_aad_server_app_id

    crt_aad_tenant_id=$(az account show --subscription "${subscription_ID}" --query "tenantId")
    # echo $crt_aad_tenant_id

	cat <<-EOT > "${azure_ood_secrets_file_path}"
	AZURE_OOD_RESOURCES_PREFIX="${azure_resources_prefix}"
	AZURE_OOD_ACR_NAME="${azure_resources_prefix}acr"
	AZURE_OOD_ACR_USERNAME="${acr_username}"
	AZURE_OOD_ACR_PASSWORD="${acr_password}"
	AZURE_OOD_STORAGE_ACC_NAME="${crtstorageaccount}"
	AZURE_OOD_STORAGE_ACC_KEY="${sa_key}"
	AZURE_OOD_AAD_SP_NAME="${service_principal_name}"
	AZURE_OOD_AAD_SP_APP_SECRET='${crt_aad_server_app_secret}'
	AZURE_OOD_AAD_SP_APP_ID="${crt_aad_server_app_id}"
	AZURE_OOD_AAD_SP_TENANT_ID=${crt_aad_tenant_id}
	EOT
    
	cat <<-EOT > "${azure_ood_resources_file_path}"
	def get_azure_resorces():
	    azure_config  = dict();
	    azure_config['account_name'] = "${crtstorageaccount}"
	    azure_config['account_key'] = "${sa_key}"
	    return azure_config
	EOT

else
    echo "Service Principal exists! Will not recreate it, nor the rsg, ACR and storage account."
    # az ad sp delete --id "$service_principal_name" --only-show-errors
fi


Overwriting ./azure_ood_temp_01/azure_ood_setup_step020.sh
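
The script extracts the service principal password by piping the CLI's JSON output into a `python3 -c` one-liner. The same extraction can be sketched as a function. The sample JSON is made up, and the `appId` and `tenant` keys are an assumption about the output shape (the script itself only confirms `password`), so check them against your CLI version.

```python
import json

def parse_sp_output(sp_json):
    """Pull the fields the secrets file needs out of the CLI's JSON output."""
    data = json.loads(sp_json)
    return {
        "AZURE_OOD_AAD_SP_APP_SECRET": data["password"],
        # Assumed keys; .get returns None instead of raising if they are absent.
        "AZURE_OOD_AAD_SP_APP_ID": data.get("appId"),
        "AZURE_OOD_AAD_SP_TENANT_ID": data.get("tenant"),
    }

sample = json.dumps({   # made-up values for illustration only
    "appId": "00000000-0000-0000-0000-000000000001",
    "displayName": "exampleSP01",
    "password": "not-a-real-secret",
    "tenant": "00000000-0000-0000-0000-000000000002",
})
fields = parse_sp_output(sample)
print(sorted(fields))
```

If those keys are present, the separate `az ad sp list` and `az account show` lookups in the script might be avoidable, but that is untested here.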


In [19]:
!docker rm -f $az_cli_container

Error: No such container: signed_in_az_cli__container01


In [20]:
# Use 'docker run -it ' only if the response of !$cli_command is not captured
cli_base_command='docker run '+ \
'--rm '+ \
'--name ' + az_cli_container + ' ' + \
'-v '+ os.getenv('DOCKER_CONTAINER_MOUNT_POINT') +'/:/workspace:rw '+ \
'-v /var/run/docker.sock:/var/run/docker.sock '+ \
signed_in_az_cli_image + ' '+ \
'/bin/bash -c ' 

internal_command = '"'+ \
'export subscription_ID="'+ os.getenv('SUBSCRIPTION_ID') +'" \n '+ \
'export azure_resources_prefix="'+ azure_ood_resources_prefix +'" \n '+ \
'export azure_ood_secrets_file_path=\"'+ os.path.join(*((['/workspace']+[azure_ood_secrets_file_path]))) +'\" \n '+ \
'export azure_ood_resources_file_path=\"'+ os.path.join(*((['/workspace']+[azure_ood_resources_file_path]))) +'\" \n '+ \
'export azure_ood_setup_resource_naming_script=\"'+ os.path.join(*((['/workspace']+\
                                                                    [azure_ood_setup_resource_naming_script]))) +'\" \n '+ \
'/bin/bash ' + os.path.join(*(['/workspace']+[azure_ood_setup_script_step020])) +' '+ \
'"'
                              
cli_command = cli_base_command+internal_command
# cli_command

In [21]:
response = !$cli_command

In [22]:
# response

In [23]:
dotenv.load_dotenv(dotenv_path=azure_ood_secrets_file_path, override=True)
# os.getenv('AZURE_OOD_ACR_USERNAME')
# os.getenv('AZURE_OOD_ACR_PASSWORD')

True

###### Use secrets file created above to log in to ACR and pull docker images if available
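
"If available" implies a fallback: when the registry has no image with the requested tag, the image is built locally instead. A sketch of that decision, assuming a hypothetical `pull_or_build` helper and a fake runner in place of a real Docker daemon; the image name and build arguments are placeholders.

```python
import subprocess

def pull_or_build(image, build_args, run=subprocess.run):
    """Try `docker pull`; if it fails, fall back to `docker build`."""
    pulled = run(["docker", "pull", image])
    if pulled.returncode == 0:
        return "pulled"
    built = run(["docker", "build", "-t", image] + list(build_args))
    return "built" if built.returncode == 0 else "failed"

class FakeResult:
    def __init__(self, returncode):
        self.returncode = returncode

def fake_run(cmd, **kwargs):
    # Pretend the registry has no such tag, but a local build works.
    return FakeResult(1 if cmd[1] == "pull" else 0)

outcome = pull_or_build("example.azurecr.io/azure_ood:latest",
                        ["-f", "Dockerfile", "."], run=fake_run)
print(outcome)
```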


In [24]:
!docker rm -f $az_cli_container

Error: No such container: signed_in_az_cli__container01


In [25]:
cli_base_command=' docker run ' +\
'-it '+\
'--name ' + az_cli_container+ ' ' + \
'-v '+ os.getenv('DOCKER_CONTAINER_MOUNT_POINT') +'/:/workspace:rw '+ \
'-v /var/run/docker.sock:/var/run/docker.sock '+ \
signed_in_az_cli_image+ ' '+\
'/bin/bash  -c ' 

internal_command = '"'+ \
' : az account list -o table; '+\
'az acr login --name '+os.getenv('CONTROL_PLANE_ACR')+ \
' --username '+os.getenv('CONTROL_PLANE_ACR_USERNAME')+ \
' --password ' + os.getenv('CONTROL_PLANE_ACR_PASSWORD')+'; '+\
'docker pull '+control_plane_docker_image_name+'; '+\
'"'

cli_command = cli_base_command+internal_command

# cli_command
!$cli_command

Login Succeeded
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Error response from daemon: manifest for fwi01acr.azurecr.io/azure_ood:latest not found: manifest unknown: manifest tagged by "latest" is not found


In [26]:
%%writefile $control_plane_docker_file_path 

# https://hub.docker.com/r/microsoft/azure-cli/dockerfile   
FROM debian:latest 
MAINTAINER George Iordanescu <ghiordan@microsoft.com>

# os updates
RUN apt-get update --fix-missing && apt-get install -y --no-install-recommends \
    ca-certificates curl wget apt-transport-https lsb-release gnupg \
    python3-pip \
    git && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*  


#https://docs.docker.com/compose/install/
RUN curl -L "https://github.com/docker/compose/releases/download/1.25.5/docker-compose-$(uname -s)-$(uname -m)" \
    -o /usr/local/bin/docker-compose && \
    chmod +x /usr/local/bin/docker-compose && \
    echo $(docker-compose --version)
#     ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose && \
#     curl -L https://raw.githubusercontent.com/docker/compose/1.25.5/contrib/completion/bash/docker-compose \
#     -o /etc/bash_completion.d/docker-compose

RUN pip3 install -U python-dotenv 

# https://docs.microsoft.com/en-us/cli/azure/install-azure-cli-apt?view=azure-cli-latest#no-package-for-your-distribution
# ENV AZ_CLI_REPO=stretch 
RUN echo "deb [arch=amd64] https://packages.microsoft.com/repos/azure-cli/ $(lsb_release -sc) main" | \
    tee /etc/apt/sources.list.d/azure-cli.list && \
    curl -L https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
    azure-cli  && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

#clone devito repo
RUN git clone https://github.com/devitocodes/devito.git && cd devito && git checkout c1c8caf
RUN git clone https://github.com/devitocodes/daks.git
    
RUN git clone https://github.com/Azure/azurehpc.git && \
    azurehpc/install.sh &&\
    chmod -R ugo=rwx azurehpc 
    
RUN git clone https://github.com/edwardsp/Azure-OnDemand.git

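# Absolute path: the repos above are cloned into / (the default WORKDIR), so a relative entry would only resolve from there.
ENV PATH=/azurehpc/bin:$PATH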

Overwriting ./azure_ood_temp_01/docker_files/Dockerfile_azure_ood_latest


In [27]:
cli_command='docker build -t '+ control_plane_docker_image_name + \
' -f ' + control_plane_docker_file_path + \
' ' + control_plane_docker_build_dir + ' ' +\
docker_build_no_cache  + ' ' 

cli_command

'docker build -t fwi01acr.azurecr.io/azure_ood:latest -f ./azure_ood_temp_01/docker_files/Dockerfile_azure_ood_latest control_plane_docker_build  '

In [28]:
! $cli_command

Sending build context to Docker daemon  3.119kB
Step 1/11 : FROM debian:latest
 ---> ae8514941ea4
Step 2/11 : MAINTAINER George Iordanescu <ghiordan@microsoft.com>
 ---> Using cache
 ---> 0fd1613f99f6
Step 3/11 : RUN apt-get update --fix-missing && apt-get install -y --no-install-recommends     ca-certificates curl wget apt-transport-https lsb-release gnupg     python3-pip     git &&     apt-get clean &&     rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 7cd7b7dfdd99
Step 4/11 : RUN curl -L "https://github.com/docker/compose/releases/download/1.25.5/docker-compose-$(uname -s)-$(uname -m)"     -o /usr/local/bin/docker-compose &&     chmod +x /usr/local/bin/docker-compose &&     echo $(docker-compose --version)
 ---> Using cache
 ---> fef982a946b0
Step 5/11 : RUN pip3 install -U python-dotenv
 ---> Using cache
 ---> 055eda2ff394
Step 6/11 : RUN echo "deb [arch=amd64] https://packages.microsoft.com/repos/azure-cli/ $(lsb_release -sc) main" |     tee /etc/apt/sources.list.d/azure-cli.

In [29]:
# docker run -it \
# --rm --name azure_ood_container01 \
# -v /datadrive01/prj/Azure-OnDemand/apps/:/workspace:rw \
# -v /usr/bin/docker:/usr/bin/docker \
# -v /var/run/docker.sock:/var/run/docker.sock \
# fwi01acr.azurecr.io/azure_ood:latest \
# /bin/bash