Copyright (c) Microsoft Corporation. All rights reserved.


# Tutorial: CatBoost

Import the necessary packages. This tutorial uses `CatBoostClassifier` from the `catboost` package to train a classifier on a small in-line sample whose first two columns are categorical features.

In [1]:
from catboost import CatBoostClassifier

In [2]:

# Initialize data
cat_features = [0, 1]
train_data = [["a", "b", 1, 4, 5, 6],
              ["a", "b", 4, 5, 6, 7],
              ["c", "d", 30, 40, 50, 60]]
train_labels = [1, 1, -1]
eval_data = [["a", "b", 2, 4, 6, 8],
             ["a", "d", 1, 4, 50, 60]]

# Initialize CatBoostClassifier
model = CatBoostClassifier(iterations=2,
                           learning_rate=1,
                           depth=2)
# Fit model
model.fit(train_data, train_labels, cat_features)
# Get predicted classes
preds_class = model.predict(eval_data)
# Get predicted probabilities for each class
preds_proba = model.predict_proba(eval_data)
# Get predicted RawFormulaVal
preds_raw = model.predict(eval_data, prediction_type='RawFormulaVal')

0:	learn: 0.5800330	total: 48.6ms	remaining: 48.6ms
1:	learn: 0.4935379	total: 49.1ms	remaining: 0us
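The three prediction calls above are related. For a binary classifier trained with CatBoost's default Logloss objective, the `RawFormulaVal` output is a log-odds score, and `predict_proba` applies the logistic (sigmoid) function to it. The plain-Python sketch below illustrates that relationship. The helper names are hypothetical, and it assumes the class labels are sorted so that the raw score refers to the second label (here `1` rather than `-1`).

```python
import math

def raw_to_proba(raw_score):
    """Map a binary log-odds score to [P(first class), P(second class)]."""
    p_second = 1.0 / (1.0 + math.exp(-raw_score))
    return [1.0 - p_second, p_second]

def raw_to_class(raw_score, class_labels=(-1, 1)):
    """Pick the second label when the log-odds score is positive, else the first."""
    return class_labels[1] if raw_score > 0 else class_labels[0]

# A raw score of 0 means the model is undecided between the two classes
print(raw_to_proba(0.0))   # [0.5, 0.5]
print(raw_to_class(2.0))   # 1
print(raw_to_class(-0.5))  # -1
```

Comparing `raw_to_proba` applied to `preds_raw` with `preds_proba` is a quick way to check this assumption against a real model.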


## Configure workspace


Create a workspace object from the existing workspace. A [Workspace](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py) is a class that accepts your Azure subscription and resource information and represents the cloud resource used to monitor and track your model runs. `Workspace.from_config()` reads the file **config.json** and loads the authentication details into an object named `ws`, which is used throughout the rest of this tutorial.
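The **config.json** file that `Workspace.from_config()` reads identifies the workspace with three keys: `subscription_id`, `resource_group`, and `workspace_name`. The sketch below is a hypothetical helper, not part of the Azure ML SDK, that reads such a file and checks that those keys are present. It can help diagnose a `from_config()` failure before any Azure call is made.

```python
import json

# Keys a workspace config.json is expected to contain
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")

def load_workspace_config(path="config.json"):
    """Read a workspace config file and raise KeyError if an expected key is missing."""
    with open(path) as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("config.json is missing: " + ", ".join(missing))
    return config

# Example (placeholder values):
# {"subscription_id": "<subscription-id>", "resource_group": "<resource-group>",
#  "workspace_name": "<workspace-name>"}
```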

In [3]:
from azureml.core.workspace import Workspace
ws = Workspace.from_config()

ws

Workspace.create(name='cla-azure-ml-workspace', subscription_id='5da07161-3770-4a4b-aa43-418cbbb627cf', resource_group='cla-azure-ml-dev-rg')

Next, create an experiment to group the runs for this tutorial. The experiment name is prefixed with your user name so that each user's runs are kept separate in a shared workspace.

In [4]:
user = 'memasanz'
from azureml.core.experiment import Experiment
experiment = Experiment(ws, user + "catboost-experiment")

### Create Training Script

In [5]:
import os
script_folder = os.path.join(os.getcwd(), "train")
print(script_folder)
os.makedirs(script_folder, exist_ok=True)

/mnt/batch/tasks/shared/LS_root/mounts/clusters/mm-cla-compute/code/Users/memasanz/AMLNew2/AMLHack/02_nyc_taxi_linear_regression_python/train


### TODO: ADD PARAMETER FOR DATASET NAME

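Use the cell below to write the `train.py` file. Before running the notebook, set **your user name** in the `user` variable above; it is used to name the experiment, the compute cluster, and the registered model.

This training script trains a model and saves it to the run's `outputs` folder.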

In [15]:
%%writefile $script_folder/train.py

import os
import argparse

from catboost import CatBoostClassifier

from azureml.core.run import Run


def getRuntimeArgs():
    parser = argparse.ArgumentParser()
    parser.add_argument('--data-path', type=str)
    args = parser.parse_args()
    return args


def main():
    args = getRuntimeArgs()
    run = Run.get_context()

    
    dataset_dir = './dataset/'
    os.makedirs(dataset_dir, exist_ok=True)
    ws = run.experiment.workspace
    print(ws)

    # Files written to the "outputs" directory are uploaded to Azure ML automatically
    output_dir = './outputs/'
    os.makedirs(output_dir, exist_ok=True)
    
    model = model_train()
    
    model_name = os.path.join(output_dir, 'cat-model')
    model.save_model(model_name)

def model_train():

    cat_features = [0, 1]
    train_data = [["a", "b", 1, 4, 5, 6],
                  ["a", "b", 4, 5, 6, 7],
                  ["c", "d", 30, 40, 50, 60]]
    train_labels = [1, 1, -1]
    eval_data = [["a", "b", 2, 4, 6, 8],
                 ["a", "d", 1, 4, 50, 60]]

    # Initialize CatBoostClassifier
    model = CatBoostClassifier(iterations=2,
                               learning_rate=1,
                               depth=2)
    # Fit model
    model.fit(train_data, train_labels, cat_features)
    # Get predicted classes
    #preds_class = model.predict(eval_data)
    ## Get predicted probabilities for each class
    #preds_proba = model.predict_proba(eval_data)
    ## Get predicted RawFormulaVal
    #preds_raw = model.predict(eval_data, prediction_type='RawFormulaVal')
    
    # The trained model is returned so that main() can save it to ./outputs/
    return model

if __name__ == "__main__":
    main()

Overwriting /mnt/batch/tasks/shared/LS_root/mounts/clusters/mm-cla-compute/code/Users/memasanz/AMLNew2/AMLHack/02_nyc_taxi_linear_regression_python/train/train.py
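The TODO above asks for a dataset-name parameter. One way to add it, sketched here as an assumption rather than the author's plan, is a second `argparse` option next to `--data-path`. The `--dataset-name` flag and its default are hypothetical. Accepting an explicit argument list lets you check the parsing locally without submitting a run.

```python
import argparse

def get_runtime_args(argv=None):
    """Parse script arguments; argv=None falls back to sys.argv as usual."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--data-path', type=str)
    # Hypothetical: name of a registered dataset to load inside the run
    parser.add_argument('--dataset-name', type=str, default=None)
    return parser.parse_args(argv)

args = get_runtime_args(['--dataset-name', 'my-dataset'])
print(args.dataset_name)  # my-dataset
print(args.data_path)     # None
```

When the script is submitted, the value would be passed through the `arguments` parameter of `ScriptRunConfig`, and inside the run the script could look the dataset up by name, for example with `Dataset.get_by_name`.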


### Create your compute

In [16]:
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.exceptions import ComputeTargetException
print(user)
compute_name = user + "-clus"
print(compute_name)

# checks to see if compute target already exists in workspace, else create it
try:
    compute_target = ComputeTarget(workspace=ws, name=compute_name)
except ComputeTargetException:
    config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D13",
                                                   min_nodes=0, 
                                                   max_nodes=1,
                                                   idle_seconds_before_scaledown=550)

    compute_target = ComputeTarget.create(workspace=ws, name=compute_name, provisioning_configuration=config)
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=40)

memasanz
memasanz-clus


### Create your Run Config

In [17]:
from azureml.core.conda_dependencies import CondaDependencies
dependencies = CondaDependencies()
dependencies.add_pip_package('catboost')


# Create a RunConfiguration; it is attached to the ScriptRunConfig in the next step
from azureml.core.runconfig import RunConfiguration
run_config = RunConfiguration()
run_config.target = compute_name
run_config.environment.python.conda_dependencies = dependencies
run_config.environment.docker.enabled = True

### Select your training script and create a ScriptRunConfig
A ScriptRunConfig object packages together the environment from a RunConfiguration along with your model training script. This object can then be submitted to your experiment and model training will commence on your remote cluster. 

In this sample, the training script lives in a separate directory that is used as the source directory for training. This separation means that only the relevant code is snapshotted and stored with the run in your AML workspace. The <code>train.py</code> file here trains a CatBoost model on a small in-line sample and saves it to the <code>outputs</code> folder; the trained model is registered later from this notebook.

ScriptRunConfig documentation: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.scriptrunconfig?view=azure-ml-py

In [18]:
from azureml.core import ScriptRunConfig
src = ScriptRunConfig(source_directory='./train', script='train.py')
src.run_config = run_config

### Submit the training run
Here, the ScriptRunConfig is submitted as a run, which triggers model training. The cluster defined above scales up automatically, and the training procedure in ./train/train.py begins. That file contains all the code needed to train the model and save it with CatBoost's `save_model`. The code below displays the output logs from the training job; you can also monitor training progress in AML studio.

Note: As you iterate on your model, you should modify the code inside ./train/train.py. The model parameters there were adjusted for rapid training and should not be used for a production scenario.

In [19]:
from azureml.widgets import RunDetails
run = experiment.submit(config=src)
RunDetails(run).show()
run.wait_for_completion(show_output=True)


RunId: memasanzcatboost-experiment_1602782028_0c1f12e0
Web View: https://ml.azure.com/experiments/memasanzcatboost-experiment/runs/memasanzcatboost-experiment_1602782028_0c1f12e0?wsid=/subscriptions/5da07161-3770-4a4b-aa43-418cbbb627cf/resourcegroups/cla-azure-ml-dev-rg/workspaces/cla-azure-ml-workspace

Streaming azureml-logs/55_azureml-execution-tvmps_044c1dc3269a50a131f70c61bf6f05e972a51970e1bc415626bca78fb2436b54_d.txt

2020-10-15T17:14:03Z Executing 'Copy ACR Details file' on 10.0.0.4
2020-10-15T17:14:04Z Copy ACR Details file succeeded on 10.0.0.4. Output: 
>>>   
>>>   
2020-10-15T17:14:04Z Starting output-watcher...
2020-10-15T17:14:04Z IsDedicatedCompute == True, won't poll for Low Pri Preemption
Login Succeeded
Using default tag: latest
latest: Pulling from azureml/azureml_e7a7c331c84bc233f117c47c21af85b5
Digest: sha256:fc19cbc5e206227c161316c1207acb61e2759ece579f8a5f76baa6dcc4510eb8
Status: Image is up to date for 5e00fd35a888491e9f7c3c9bc695a65a.azurecr.io/azureml/azureml_e


Streaming azureml-logs/65_job_prep-tvmps_044c1dc3269a50a131f70c61bf6f05e972a51970e1bc415626bca78fb2436b54_d.txt


Streaming azureml-logs/75_job_post-tvmps_044c1dc3269a50a131f70c61bf6f05e972a51970e1bc415626bca78fb2436b54_d.txt

Entering job release. Current time:2020-10-15T17:14:26.561919
Starting job release. Current time:2020-10-15T17:14:28.257540
Logging experiment finalizing status in history service.
[2020-10-15T17:14:28.269752] job release stage : upload_datastore starting...
[{}] job release stage : start importing azureml.history._tracking in run_history_release.
Starting the daemon thread to refresh tokens in background for process with pid = 202
[2020-10-15T17:14:28.270811] job release stage : copy_batchai_cached_logs starting...
[2020-10-15T17:14:28.270893] job release stage : copy_batchai_cached_logs completed...
[2020-10-15T17:14:28.271449] job release stage : execute_job_release starting...
[2020-10-15T17:14:28.286287] Entering context manager injector.
[2020-10-15T17:14:

{'runId': 'memasanzcatboost-experiment_1602782028_0c1f12e0',
 'target': 'memasanz-clus',
 'status': 'Completed',
 'startTimeUtc': '2020-10-15T17:14:03.481852Z',
 'endTimeUtc': '2020-10-15T17:14:38.403473Z',
 'properties': {'_azureml.ComputeTargetType': 'amlcompute',
  'ContentSnapshotId': '4471ce05-6646-4939-9678-ea845440fc02',
  'azureml.git.repository_uri': 'https://github.com/memasanz/AMLHack.git',
  'mlflow.source.git.repoURL': 'https://github.com/memasanz/AMLHack.git',
  'azureml.git.branch': 'main',
  'mlflow.source.git.branch': 'main',
  'azureml.git.commit': '3e2ab0e1b300d1547a84c30cc9a86e36c7cdb336',
  'mlflow.source.git.commit': '3e2ab0e1b300d1547a84c30cc9a86e36c7cdb336',
  'azureml.git.dirty': 'True',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'script': 'train.py',
  'command': [],
  'useAbsolutePath': False,
  'arguments': [],
  'sourceDire

In [20]:
import os
script_folder = os.path.join(os.getcwd(), "score")
print(script_folder)
os.makedirs(script_folder, exist_ok=True)

/mnt/batch/tasks/shared/LS_root/mounts/clusters/mm-cla-compute/code/Users/memasanz/AMLNew2/AMLHack/02_nyc_taxi_linear_regression_python/score


In [30]:
%%writefile $script_folder/score.py

import json
import os
from catboost import CatBoostClassifier

def init():
    global model

    # File name the model was saved under in train.py
    model_filename = "cat-model"

    # AZUREML_MODEL_DIR is injected by AML and points to the registered model's folder
    model_dir = os.getenv('AZUREML_MODEL_DIR')

    print("Model dir:", model_dir)
    print("Model filename:", model_filename)

    model_path = os.path.join(model_dir, model_filename)

    # Load the model saved with CatBoostClassifier.save_model()
    model = CatBoostClassifier()
    model.load_model(model_path)


def run(data):
    try:
        # Assumes a JSON request body of the form {"data": [[...], [...]]},
        # where each inner list is one row with the same columns as the training data
        rows = json.loads(data)["data"]
        preds = model.predict(rows)
        # Convert the numpy array to a list so the result is JSON-serializable
        return {"predictions": preds.tolist()}
    except Exception as e:
        return {"error": str(e)}

Overwriting /mnt/batch/tasks/shared/LS_root/mounts/clusters/mm-cla-compute/code/Users/memasanz/AMLNew2/AMLHack/02_nyc_taxi_linear_regression_python/score/score.py


In [31]:
from azureml.core.model import Model
model_name = user + 'catboost'
trained_model = run.register_model(model_path='outputs/cat-model', model_name=model_name, tags={'Model Type': 'catboost'})

In [32]:
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig

env = Environment('tutorial-env')
cd = CondaDependencies.create(pip_packages=['azureml-dataprep[pandas,fuse]>=1.1.14', 'azureml-defaults', 'catboost'], conda_packages = ['scikit-learn==0.22.1'])

env.python.conda_dependencies = cd

# Register environment to re-use later
env.register(workspace = ws)

{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/intelmpi2018.3-ubuntu16.04:20200821.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "enabled": false,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "tutorial-env",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "anaconda",
                "conda-forge"

### Model Deployment

You can deploy the registered model to an endpoint by defining an inference configuration and providing a scoring script. Here the model is deployed to an Azure Container Instance (ACI), which provides an API endpoint that can be used to make predictions with your model. Key-based authentication is used, so every request sent to the API must include a key. Keys can be rotated as needed, so that only approved users can access your endpoint.

Azure Container Instance documentation: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-azure-container-instance

Azure Container Instances are typically lower cost and useful for dev/test purposes during model development, though we recommend deploying to an Azure Kubernetes Service cluster for production purposes.

The InferenceConfig created below uses the registered `tutorial-env` environment, which includes CatBoost along with the `azureml-defaults` package needed for serving, and references the scoring script located at <code>./score/score.py</code>. This script loads the trained model on initialization, parses data submitted to the API endpoint, makes predictions with the model, and returns the results to the caller.
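The `init()`/`run()` contract of a scoring script can be exercised locally without Azure or a trained model. This sketch assumes a `{"data": [...]}` request body, which is a convention chosen for this example rather than something Azure ML imposes, and swaps in a hypothetical `StubModel` that predicts `1` for every row, so only the JSON parsing and response formatting are tested.

```python
import json

class StubModel:
    """Hypothetical stand-in for CatBoostClassifier: predicts 1 for every row."""
    def predict(self, rows):
        return [1 for _ in rows]

model = StubModel()

def run(data):
    try:
        # Assumed request body: {"data": [[feature values...], ...]}
        rows = json.loads(data)["data"]
        preds = model.predict(rows)
        return {"predictions": list(preds)}
    except Exception as e:
        # Report malformed requests instead of raising inside the service
        return {"error": str(e)}

request_body = json.dumps({"data": [["a", "b", 2, 4, 6, 8], ["a", "d", 1, 4, 50, 60]]})
print(run(request_body))    # {'predictions': [1, 1]}
print(run('{"rows": []}'))  # {'error': "'data'"}
```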

In [33]:
from azureml.core.webservice import AciWebservice

# auth_enabled=True requires a key with every request to the endpoint
aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
                                               memory_gb=1,
                                               auth_enabled=True,
                                               tags={"data": "sample", "method": "catboost"},
                                               description='catboost')

In [34]:
model_name

'memasanzcatboost'

### Register your model and deploy to an authenticated endpoint 

Model registration documentation: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-and-where

In [35]:
%%time
from azureml.core.webservice import Webservice
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment
from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()
# Look up the model registered earlier in this notebook
model = Model(ws, name=model_name)

# Use the latest registered version of the environment
myenv = Environment.get(workspace=ws, name="tutorial-env")
inference_config = InferenceConfig(source_directory='./score', entry_script="score.py", environment=myenv)

service = Model.deploy(workspace=ws, 
                       name=model_name +'-srv2', 
                       models=[model], 
                       inference_config=inference_config, 
                       deployment_config=aciconfig)

service.wait_for_deployment(show_output=True)

WebserviceException: WebserviceException:
	Message: Service memasanzcatboost-srv with the same name already exists, please use a different service name or delete the existing service.
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "Service memasanzcatboost-srv with the same name already exists, please use a different service name or delete the existing service."
    }
}

In [36]:
print('Scoring API available at: {}'.format(service.serialize()['scoringUri']))

Scoring API available at: None
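The scoring URI printed above is `None` because the deployment in this run failed. Once a deployment succeeds, a client calls the endpoint with a POST request that carries the key in an `Authorization: Bearer` header. In the SDK the URI is available as `service.scoring_uri` and the keys from `service.get_keys()`. The sketch below only builds the request with the standard library; the URI and key are placeholders, and the `{"data": ...}` body assumes the request format sketched for the scoring script.

```python
import json
import urllib.request

def build_scoring_request(scoring_uri, key, rows):
    """Build (but do not send) a POST request for a key-authenticated scoring endpoint."""
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + key,
    }
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")

# Placeholder URI and key; replace with service.scoring_uri and a key from service.get_keys()
req = build_scoring_request("http://example.invalid/score", "<primary-key>",
                            [["a", "b", 2, 4, 6, 8]])
print(req.get_method())  # POST
# To send it: urllib.request.urlopen(req).read()
```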


### Deploy to AKS

In [45]:
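from azureml.core.webservice import AksWebservice, Webservice
from azureml.core.model import Model
from azureml.core.compute import AksCompute, ComputeTarget
from azureml.exceptions import ComputeTargetException

aks_name = 'my-aks-mm'

# Reuse the AKS cluster if it already exists in the workspace, else create it
try:
    aks_target = AksCompute(ws, aks_name)
except ComputeTargetException:
    # A single Standard_D3_v2 node provides 4 vCPUs. Clusters created for production use
    # need more cores (the AML documentation asks for at least 12), so this small
    # cluster is created for dev/test purposes.
    prov_config = AksCompute.provisioning_configuration(vm_size="Standard_D3_v2",
                                                        agent_count=1,
                                                        location="eastus",
                                                        cluster_purpose=AksCompute.ClusterPurpose.DEV_TEST)
    aks_target = ComputeTarget.create(workspace=ws,
                                      name=aks_name,
                                      provisioning_configuration=prov_config)
    aks_target.wait_for_completion(show_output=True)

# Deployment fails unless the provisioning state is "Succeeded"
print(aks_target.provisioning_state)
print(aks_target.provisioning_errors)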


In [46]:
# If deploying to a cluster configured for dev/test, ensure that it was created with enough
# cores and memory to handle this deployment configuration. Note that memory is also used by
# things such as dependencies and AML components.
deployment_config = AksWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)
service = Model.deploy(ws, "myservice", [model], inference_config, deployment_config, aks_target)
service.wait_for_deployment(show_output = True)
print(service.state)
print(service.get_logs())

ERROR - Received bad response from Model Management Service:
Response Code: 400
Headers: {'Date': 'Thu, 15 Oct 2020 19:28:07 GMT', 'Content-Type': 'application/json', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'x-ms-client-request-id': '60fbdb557e6144d18d5bcb834c6e8150', 'x-ms-client-session-id': 'e8cb432e-2934-461c-8edb-f47b7a001a8a', 'api-supported-versions': '1.0, 2018-03-01-preview, 2018-11-19', 'X-Content-Type-Options': 'nosniff', 'x-request-time': '0.200', 'Strict-Transport-Security': 'max-age=15724800; includeSubDomains; preload'}
Content: b'{"code":"BadRequest","statusCode":400,"message":"The request is invalid.","details":[{"code":"ComputeResourceNotCreated","message":"Compute resource with Id: my-aks-mm is not in Succeeded state. Compute provisioning state: Failed"}],"correlation":{"RequestId":"60fbdb557e6144d18d5bcb834c6e8150"}}'



WebserviceException: WebserviceException:
	Message: Received bad response from Model Management Service:
Response Code: 400
Headers: {'Date': 'Thu, 15 Oct 2020 19:28:07 GMT', 'Content-Type': 'application/json', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'x-ms-client-request-id': '60fbdb557e6144d18d5bcb834c6e8150', 'x-ms-client-session-id': 'e8cb432e-2934-461c-8edb-f47b7a001a8a', 'api-supported-versions': '1.0, 2018-03-01-preview, 2018-11-19', 'X-Content-Type-Options': 'nosniff', 'x-request-time': '0.200', 'Strict-Transport-Security': 'max-age=15724800; includeSubDomains; preload'}
Content: b'{"code":"BadRequest","statusCode":400,"message":"The request is invalid.","details":[{"code":"ComputeResourceNotCreated","message":"Compute resource with Id: my-aks-mm is not in Succeeded state. Compute provisioning state: Failed"}],"correlation":{"RequestId":"60fbdb557e6144d18d5bcb834c6e8150"}}'
	InnerException None
	ErrorResponse 
{
    "error": {
        "message": "Received bad response from Model Management Service:\nResponse Code: 400\nHeaders: {'Date': 'Thu, 15 Oct 2020 19:28:07 GMT', 'Content-Type': 'application/json', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'x-ms-client-request-id': '60fbdb557e6144d18d5bcb834c6e8150', 'x-ms-client-session-id': 'e8cb432e-2934-461c-8edb-f47b7a001a8a', 'api-supported-versions': '1.0, 2018-03-01-preview, 2018-11-19', 'X-Content-Type-Options': 'nosniff', 'x-request-time': '0.200', 'Strict-Transport-Security': 'max-age=15724800; includeSubDomains; preload'}\nContent: b'{\"code\":\"BadRequest\",\"statusCode\":400,\"message\":\"The request is invalid.\",\"details\":[{\"code\":\"ComputeResourceNotCreated\",\"message\":\"Compute resource with Id: my-aks-mm is not in Succeeded state. Compute provisioning state: Failed\"}],\"correlation\":{\"RequestId\":\"60fbdb557e6144d18d5bcb834c6e8150\"}}'"
    }
}