### Copyright (C) Microsoft Corporation.  
  
## Develop scoring script in a docker container
Purpose: Before operationalization (o16n), we show how to develop and test the containerized scripts (the o16n Python script that invokes the user-provided R scoring script) using the AML SDK experimentation framework.  
  
#### Authors

* **George Iordanescu** [Microsoft AI CAT](https://github.com/Microsoft/AMLSDKRModelsOperationalization) *Initial work*  
* See also the list of [contributors](https://github.com/Microsoft/AMLSDKRModelsOperationalization) who participated in this project.  
  
Here we use the experimentation (__e13n__) infrastructure in AML SDK to build a docker image that tests the score.py script. This docker image allows running R code from Python and is the closest proxy to the operationalization (__o16n__) docker image one will get in the next notebooks, where we operationalize the R model. The created docker image is __not__ identical to the o16n image because AML SDK does not yet support the BYOD (bring your own docker) scenario.

The score.py script is written in Python, but it has an R session created via rpy2. The R model is run via two interactions with the R session:
 - The init() function in score.py passes the R model file name to the R session, which then loads the R model.  
 - The run() function in score.py passes the JSON-encoded data to be scored to the R session that holds the model loaded above. The data are then scored by a full R scoring script run via rpy2.robjects.r().  

Main steps:  
* Run score.py script (and the real R scoring script) in a docker image.
* Create artifacts for deployment:   
  - scoring script file in the project folder (variable __score_script_filename__)  
  - conda dependency file (adds R and desired packages to the base docker image)

* This covers strictly post-e13n steps, so it assumes the existence of an R model file (saved as an rds file on disk). The model will be registered here so that we can use the Model.get_model_path() function inside the init() function of the scoring script. See [azureml.core.model.Model](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py#get-model-path-model-name--version-none---workspace-none-) for details about encapsulating the model path in the o16n docker image.

In [None]:
# Allow multiple displays per cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

#### Check core SDK version number

In [None]:
import azureml.core
import platform
import sys, os
from azureml.core import Workspace
from azureml.core import Experiment
from azureml.core.compute import ComputeTarget, RemoteCompute 
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core import Run
from azureml.core import ScriptRunConfig

experiment_name = 'test_R_scoring_script'

In [None]:
# Check core SDK version number, os info and current wd
print("SDK version:", azureml.core.VERSION)
platform.platform()
# os.getcwd()

In [None]:
# import utility functions like project config params

def add_path_to_sys_path(path_to_append):
    if not (any(path_to_append in paths for paths in sys.path)):
        sys.path.append(path_to_append)

auxiliary_files_dir = os.path.join(*(['.', 'src']))

paths_to_append = [os.path.join(os.getcwd(), auxiliary_files_dir)]
[add_path_to_sys_path(crt_path) for crt_path in paths_to_append]

import o16n_regular_ML_R_models_utils
prj_consts = o16n_regular_ML_R_models_utils.R_models_operationalization_consts()

### Use existing dotenv file (created in previous notebook) to load sensitive info

In [None]:
%load_ext dotenv
dotenv_file_path = os.path.join(*(prj_consts.DOTENV_FILE_PATH))

# #show .env file path
# dotenv_file_path

#### Define filename and directory variables


In [None]:
r_model_file_name = prj_consts.R_MODEL_FILE_NAME
r_model_AML_name = prj_consts.R_MODEL_AML_NAME
conda_dependencies_filename = prj_consts.R_MODEL_CONDA_DEPENDENCIES_FILE_NAME
score_script_filename = prj_consts.SCORE_SCRIPT_FILE_NAME

workspace_config_dir = os.path.join(*(prj_consts.AML_WORKSPACE_CONFIG_DIR))
workspace_config_file = prj_consts.AML_WORKSPACE_CONFIG_FILE_NAME
# workspace_config_dir

experiment_dir = os.path.join(*(prj_consts.AML_EXPERIMENT_DIR))
crt_dir = os.path.join(os.getcwd(), os.path.join(*([experiment_dir])))
os.makedirs(crt_dir, exist_ok=True)

# make sure exp name is within required limits
# len(experiment_name)

R_artifacts_dir = os.path.join(os.getcwd(), os.path.join(*(prj_consts.R_MODEL_DIR)))
print('Will o16n R model from directory {}'.format(R_artifacts_dir))

#### Use the AML SDK workspace (ws) created and documented as a json file in previous notebook

Initialize a workspace object from persisted configuration.
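
For reference, the persisted configuration is a small JSON file. The sketch below uses only the standard library to check that such a file carries the three keys the SDK typically writes (`subscription_id`, `resource_group`, `workspace_name`). The helper name `read_workspace_config` and the sample values are hypothetical, and the check is an illustration rather than something the SDK requires.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def read_workspace_config(path):
    """Load a workspace config file and fail early if a key is missing."""
    with open(path) as f:
        cfg = json.load(f)
    missing = [k for k in REQUIRED_KEYS if k not in cfg]
    if missing:
        raise KeyError("missing keys in {}: {}".format(path, missing))
    return cfg


# Demo with a throwaway file and placeholder values.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "config.json")
with open(demo_path, "w") as f:
    json.dump({"subscription_id": "0000", "resource_group": "my_rg",
               "workspace_name": "my_ws"}, f)

demo_cfg = read_workspace_config(demo_path)
print(demo_cfg["workspace_name"])
```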

In [None]:
ws = Workspace.from_config(path=os.path.join(os.getcwd(), 
                                             os.path.join(*([workspace_config_dir, 'aml_config', workspace_config_file]))))
# print(ws.name, ws.resource_group, ws.location, ws.subscription_id[0], sep = '\n')

## Register Model

We develop here the scoring (operationalization) script that will use a pre-trained model. Operationalized models can be local files or models registered in Azure. We show here the latter way. We __can__ access registered models even when using the AML SDK experimentation framework.

You can add tags and descriptions to your models. Note that you do not need to have the R model .rds file in the current directory. The call below registers that file in the workspace as a model with the name defined by the __r_model_AML_name__ variable.  
  
Using tags, you can track useful information such as the name and version of the machine learning library used to train the model. Note that tags must be alphanumeric.

In [None]:
#show model exists at the expected location 
!ls -l {os.path.join(R_artifacts_dir, r_model_file_name)}

In [None]:
from azureml.core.model import Model
model_tags = {'language': 'R', 'type': 'TC_kSVM'}
if not Model.list(ws, tags=model_tags):
    model = Model.register(model_path = os.path.join(R_artifacts_dir, r_model_file_name),
                           model_name = r_model_AML_name,
                           tags = model_tags,
                           description = 'my R model',
                           workspace = ws)
    
    print(model.name, model.description, model.version, model.tags, sep = '\t')

You can explore the registered models within your workspace and query by tag. Models are versioned. If you call Model.register() many times with the same model name, you will get multiple versions of the model with increasing version numbers.   

For demo purposes, we choose v1 as the model used for deployment.
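
Picking one version out of several registered ones can be sketched as below. `ModelRecord` is a hypothetical stand-in for the objects returned by Model.list(), so the snippet runs without a workspace; `pick_model` is likewise an illustrative helper, not part of the SDK.

```python
from collections import namedtuple

# Hypothetical stand-in for the objects returned by Model.list().
ModelRecord = namedtuple("ModelRecord", ["name", "version", "description"])


def pick_model(models, name, version=None):
    """Return the requested version, or the highest one if version is None."""
    candidates = [m for m in models if m.name == name]
    if version is not None:
        candidates = [m for m in candidates if m.version == version]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.version)


registered = [
    ModelRecord("trained_r_model", 1, "my R model"),
    ModelRecord("trained_r_model", 2, "my R model"),
    ModelRecord("other_model", 1, "something else"),
]

print(pick_model(registered, "trained_r_model", version=1).version)
print(pick_model(registered, "trained_r_model").version)
```

Returning `None` when nothing matches lets the caller test for a failed lookup before the model is used.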

In [None]:
best_r_model = None

for m in Model.list(ws, tags={'type': 'TC_kSVM'}):
# for m in r_models:
    print("Name:", m.name,"\tVersion:", m.version, "\tDescription:", m.description, m.tags)
    if ((m.name==r_model_AML_name) and (m.version==1) and (m.description=='my R model')):
        best_r_model = m

In [None]:
print(best_r_model.name, best_r_model.description, best_r_model.version, sep = '\t')

## Create Experiment


In [None]:
exp = Experiment(workspace = ws, name = experiment_name)

## Create info for the VM compute target

Attach a remote Linux VM. This is usually used as a remote docker compute target for experimentation, but we are using it here to test our dockerized score script used for deployment. 
Create a Linux DSVM in Azure. Make sure you use the Ubuntu flavor, NOT CentOS.      
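
The connection details for the VM are read from environment variables in the cells that follow. A minimal sketch of validating them up front is shown here; the variable names are the ones used in this notebook, while the helper `read_vm_settings` and the placeholder values are hypothetical. Converting the port to an integer is a design choice of this sketch, not a documented SDK requirement.

```python
import os

REQUIRED_VARS = (
    "COMPUTE_CONTEXT_VM_FQDN",
    "COMPUTE_CONTEXT_VM_SSH_PORT",
    "COMPUTE_CONTEXT_VM_USER_NAME",
    "COMPUTE_CONTEXT_VM_PWD",
)


def read_vm_settings(env=None):
    """Collect the VM settings, failing early on missing or malformed values."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: {}".format(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    # Environment values are always strings; the port is more useful as an int.
    settings["COMPUTE_CONTEXT_VM_SSH_PORT"] = int(
        settings["COMPUTE_CONTEXT_VM_SSH_PORT"])
    return settings


# Demo with placeholder values instead of a real .env file.
demo_env = {
    "COMPUTE_CONTEXT_VM_FQDN": "myvm.example.com",
    "COMPUTE_CONTEXT_VM_SSH_PORT": "22",
    "COMPUTE_CONTEXT_VM_USER_NAME": "someuser",
    "COMPUTE_CONTEXT_VM_PWD": "not-a-real-password",
}
print(read_vm_settings(demo_env)["COMPUTE_CONTEXT_VM_SSH_PORT"])
```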

In [None]:
compute_target_name = 'ghiordanXRgpuvm'

In [None]:
%dotenv  $dotenv_file_path

attach_config = RemoteCompute.attach_configuration(address=os.getenv('COMPUTE_CONTEXT_VM_FQDN'),
                                                   ssh_port=os.getenv('COMPUTE_CONTEXT_VM_SSH_PORT'),
                                                   username=os.getenv('COMPUTE_CONTEXT_VM_USER_NAME'),
                                                   password=os.getenv('COMPUTE_CONTEXT_VM_PWD')
                                                   # If using ssh key
                                                   #private_key_file="path_to_a_file",
                                                   #private_key_passphrase="some_key_phrase"
                                                  )
attached_dsvm_compute = ComputeTarget.attach(workspace=ws, name=compute_target_name, attach_configuration=attach_config)

attached_dsvm_compute.wait_for_completion(show_output=True)   

In [None]:
# see if the compute target exists in the workspace

from azureml.core.compute import DsvmCompute

for crt_dsvm in DsvmCompute.list(ws):
    if (compute_target_name==crt_dsvm.name):    
        print(crt_dsvm.name, crt_dsvm.type, crt_dsvm.address)
    else:
        print(crt_dsvm.name, crt_dsvm.type)

## Create scoring script

Use the `%%writefile` magic to write the o16n `score.py` file that embeds the user-provided R scoring script.
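
The script that follows exchanges doubly encoded JSON: the request wraps a JSON string of rows inside an outer JSON object under the `data` key, and the response nests JSON strings under `python_scores` and `python_times`. The sketch below walks through both directions using only the standard library. The numbers in `fake_response` are made up for illustration and are not real model output.

```python
import json

rows = [[0.1, -0.2], [0.3, 0.4]]

# Request: the rows are JSON-encoded first, then wrapped and encoded again.
payload = json.dumps({"data": json.dumps(rows)})

# What run() does on the Python side: one decode yields a JSON *string* ...
inner = json.loads(payload)["data"]
# ... and a second decode (done in R by jsonlite::fromJSON) yields the rows.
decoded_rows = json.loads(inner)
print(type(inner).__name__, decoded_rows)

# Response: same nesting, so the client also decodes twice.
fake_response = json.dumps({
    "python_scores": json.dumps({"r_scores": [{"0": 0.7, "1": 0.3}]}),
    "python_times": json.dumps({"all_p_time": "1.0 ms"}),
})
outer = json.loads(fake_response)
r_scores = json.loads(outer["python_scores"])["r_scores"]
print(r_scores)
```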

In [None]:
%%writefile {os.path.join(experiment_dir, score_script_filename)} 


import pickle
import json
from azureml.core.model import Model
import rpy2
import rpy2.robjects as robjects
import timeit
import logging

R_MODEL_AML_NAME = 'trained_r_model'


def init():
    from rpy2.rinterface import R_VERSION_BUILD
    print('rpy2 version {};  R version {}'.format(rpy2.__version__, R_VERSION_BUILD))
    
    print('R model AML name: {}; path: {}'.format(R_MODEL_AML_NAME, Model.get_model_path(model_name=R_MODEL_AML_NAME)))
    
    global model
    # note here R_MODEL_AML_NAME is the name of the model registered under the workspace
    # this call should return the path to the model .rds file on the local disk.
    model_path = Model.get_model_path(model_name =  R_MODEL_AML_NAME)
    # pass the model path to the R session, which deserializes the model via readRDS()
    robjects.globalenv['model_path'] = model_path    
    # model_path = robjects.StrVector( 'ksvm_model.rds')
    robjects.r('''
            format_proc_time <- function(proc_time_diff){
                
                as.data.frame(t(as.matrix(format(round(proc_time_diff*1000, 2), nsmall = 2))))[, 
                                                            c('user.self', 'sys.self', 'elapsed')]
            }
            library(kernlab)
            library(jsonlite)
            svm_model = readRDS({model_path})
            ''')
    print('AML o16n init() function: SVM model loaded.')

# note you can pass in multiple rows for scoring
def run(aml_jsoned_data):
    logger = logging.getLogger("AML_o16n_run_function")
#     print('Entering run() function')
    try:
        start_time = timeit.default_timer()
#         data = json.loads(raw_data)['data']
        data = json.loads(aml_jsoned_data)['data']
        robjects.globalenv['r_data_to_score'] = data  
        python_to_R_time = timeit.default_timer()
        r_messages = robjects.r('''
                start_time_r = proc.time()
                
                r_data_to_score=jsonlite::fromJSON(r_data_to_score[[1]])
                json_to_df_time_r = proc.time()
                
                scores = kernlab::predict(svm_model,r_data_to_score, type = "p")
                end_time_r = proc.time()
                
                # report total time and json to df time
                time_df = rbind(format_proc_time(end_time_r - start_time_r),
                                format_proc_time(json_to_df_time_r - start_time_r))
                rownames(time_df)=c('all_r_time','json_to_df_time')    
                
                # combine scores and time dataframes in a list
                returned_list = list(as.data.frame(scores),time_df)
                names(returned_list)=c('r_scores', 'r_times')
                
                scores = jsonlite::toJSON(returned_list)
                #print('Exiting R.')
                ''')
        before_R_to_python_time = timeit.default_timer()
        
        jsoned_scores = (robjects.r['scores'])[0]
        end_time = timeit.default_timer()
        
#         logger.info("Predictions: {0}".format(jsoned_scores))
#         print('Exiting run() function')
        return json.dumps({'python_scores': jsoned_scores, 
                           'python_times': json.dumps(
                               {'all_p_time':'{} ms'.format(round((end_time-start_time)*1000, 2)),
                                           'python_to_R_time':'{} ms'.format(round((python_to_R_time-start_time)*1000, 2)),
                                           'R_to_python_time':'{} ms'.format(round((end_time-before_R_to_python_time)*1000, 2))}
                           )
                          })

    except Exception as e:
        result = str(e)
        return json.dumps({"AML o16n run() function: error": result})
    
def main():
    import numpy as np
    import pandas as pd
    
    n_samples = 100

    raw_data = 2 * np.random.random_sample((n_samples, 2)) - 1
    aml_jsoned_data =  json.dumps({'data': json.dumps(raw_data.tolist())})
  
    init()
    response = run(aml_jsoned_data)
#     print(json.loads(response))
#     print( json.loads(json.loads(response)['python_scores']) )
    
    print( pd.DataFrame.from_records(json.loads(json.loads(response)['python_scores'])['r_scores']) )
    print( pd.DataFrame.from_records(json.loads(json.loads(response)['python_scores'])['r_times']) )
    for k, v in json.loads(json.loads(response)['python_times']).items():
        print(v, k)

    print('Exited main() function')
    
if __name__== "__main__":
    main()
    

## Configure a Docker run with new conda environment on the VM  
You can execute the script in a Docker container on the VM. If you choose this route, you don't need to install anything on the VM yourself. The Azure ML execution service will take care of it for you. You can also build a custom Docker image and execute the script in it without building a new conda environment. 

### Configure a run using a custom Docker base image & a system-managed conda environment

In [None]:
run_config = RunConfiguration(framework = "python")
run_config.target = attached_dsvm_compute.name

# Use Docker in the remote VM
run_config.environment.docker.enabled = True

run_config.environment.docker.base_image = 'continuumio/miniconda3:4.5.12'

# Ask system to provision a new one based on the conda_dependencies.yml file
run_config.environment.python.user_managed_dependencies = False

run_config.environment.docker.gpu_support = False
run_config.environment.docker.shared_volumes

#### Below is the crux of the R models o16n project:
> Instead of calling install.packages() in a live R session, we use a conda env .yml file to conda install (or pip install) R itself plus the required R and Python packages into a conda environment that will run in our docker container!
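
Conda packages for R libraries conventionally carry an `r-` prefix and a lowercase name, which is why `kernlab` appears as `r-kernlab` and `pROC` as `r-proc` in the package list used in this notebook. A minimal sketch of that naming rule follows; `cran_to_conda` is a hypothetical helper, and not every CRAN package is available on a conda channel, so availability still has to be checked.

```python
def cran_to_conda(cran_name):
    """Map a CRAN package name to the conventional conda package name."""
    return "r-" + cran_name.strip().lower()


cran_packages = ["kernlab", "jsonlite", "pROC"]
# r-base provides the R interpreter itself.
conda_packages = ["r-base"] + [cran_to_conda(p) for p in cran_packages]
print(conda_packages)
```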

In [None]:
# create the conda env yml file

conda_dep = CondaDependencies()

def add_conda_items(list_of_items, item_add_method):
    for current_item in list_of_items:
        item_add_method(current_item)
    
add_conda_items(['3.7.0'], getattr(conda_dep, 'set_python_version')) #'3.6.5' 
add_conda_items(['r', 'conda-forge', 'anaconda'], getattr(conda_dep, 'add_channel'))
add_conda_items(['r-base', 'r-proc', 'r-jsonlite', 'r-kernlab', 'rpy2', 'pandas', 'gfortran_linux-64'], 
                getattr(conda_dep, 'add_conda_package')) #'rpy2==2.8.6'
# add_conda_items(['some_pip_installable_R_package'], getattr(conda_dep, 'add_pip_package'))

#### For demo purposes we also show how a generic (i.e. not SDK created) conda env file can be used in SDK

First show the content of the conda dep object above, so that we can create a clone of it.

In [None]:
# conda_dep.serialize_to_string()
conda_dep.save_to_file(base_directory = os.getcwd() , 
                       conda_file_path=os.path.join(*[experiment_dir, conda_dependencies_filename]))
! cat {os.path.join(os.getcwd(), os.path.join(*[experiment_dir, conda_dependencies_filename]))}

In [None]:
not_SDK_created_conda_env_file = 'not_SDK_created_conda_env_file.yml'

In [None]:
%%writefile ./{not_SDK_created_conda_env_file}

name: ml_conda_env2

channels:
- r
- conda-forge
- anaconda

dependencies:
- python=3.7.0
- r-base
- r-proc
- r-jsonlite
- r-kernlab
- rpy2
- pandas
- gfortran_linux-64
- pip:
    # Required packages for AzureML execution, history, and data preparation.
  - azureml-defaults

Now one can use either of the commands below to set the conda dependencies: from a yml file, or from the CondaDependencies object conda_dep created in memory above.

In [None]:
# run_config.environment.python.conda_dependencies = CondaDependencies(not_SDK_created_conda_env_file)
run_config.environment.python.conda_dependencies = conda_dep

### Submit the Experiment
Submit script to run in the Docker image in the remote VM. If you run this for the first time, the system will download the base image, layer in packages specified in the `conda_dependencies.yml` file on top of the base image, create a container and then execute the script in the container.

## Configure & Run

In [None]:
src = ScriptRunConfig(source_directory = experiment_dir, script = score_script_filename, run_config = run_config)
run = exp.submit(src)

run

In [None]:
run.get_portal_url()

#### Jupyter widget
Watch the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes.

In [None]:
from azureml.widgets import RunDetails
RunDetails(run).show()

#### Get log results upon completion
The scoring script runs in the background. You can use wait_for_completion to block and wait until all data is scored before running more code.
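
Blocking on a background job amounts to polling its status until it reaches a terminal state. The sketch below is a generic polling loop, not the SDK's implementation of wait_for_completion. The set of terminal state names is an assumption, and `wait_until_done` and the fake status source are hypothetical.

```python
import time

# Assumed terminal states; check the SDK documentation for the full list.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_until_done(get_status, poll_seconds=0.01, timeout_seconds=5.0):
    """Poll get_status() until it returns a terminal state or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "still {} after {} s".format(status, timeout_seconds))
        time.sleep(poll_seconds)


# Demo: a fake status source that finishes on the third poll.
fake_statuses = iter(["Queued", "Running", "Completed"])
final_status = wait_until_done(lambda: next(fake_statuses))
print(final_status)
```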

In [None]:
run.wait_for_completion(show_output = False)

In [None]:
# # to recover a previous run
# run = Run(exp, 'runID')

# to get more details
run.get_details_with_logs()

#### We are now ready for deployment. In the next notebook we will package the scoring script in an o16n image and deploy it as a web service on an Azure Container Instance and an AKS cluster.

In [None]:
!jupyter nbconvert --to html 010_RegularR_RealTime_test_score_in_docker.ipynb