# Machine Learning Workflow with Azure Machine Learning (Python SDK)

## Contents
- Connecting to a Workspace
- Registering a Dataset
- Registering an Environment
- Creating a Compute Cluster
- Running model training Experiments
- Registering Models
- Creating the inference environment (Deployments)
- Using the Endpoint

## Connecting to the Workspace
Connect to your Azure Machine Learning workspace from your working environment.

#### Azure Machine Learning Studio
Open [ml.azure.com](https://ml.azure.com). Even when you work mainly with the Python SDK, Azure Machine Learning Studio is often used alongside it.

<img src="docs/images/azureml-workspace.png" width=500>


#### Python SDK
Connect to the Azure Machine Learning workspace with the Azure ML Python SDK installed in the client's Python environment.

```python
# When using a Compute Instance
from azureml.core import Workspace

ws = Workspace.from_config()
```

```python
# When using any other client environment
# from azureml.core import Workspace
#
# ws = Workspace.get(
#     name='name',
#     subscription_id='subscription_id',
#     resource_group='resource_group',
# )
```
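`Workspace.from_config()` reads a `config.json` file that identifies the workspace; it searches from the current directory upward and also checks `.azureml/config.json`. On a Compute Instance this file is normally already present. On any other client, a minimal sketch like the following can create it, assuming you replace the placeholder values with your own subscription ID, resource group, and workspace name. `write_workspace_config` is a hypothetical helper; once you are connected, the SDK's own `ws.write_config()` writes an equivalent file.

```python
import json
from pathlib import Path


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json with the keys Workspace.from_config() reads.

    Hypothetical helper for illustration, not part of the Azure ML SDK.
    """
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


# Placeholder values: replace them with your own workspace details
config_path = write_workspace_config(
    ".", "<subscription_id>", "<resource_group>", "<workspace_name>"
)
```

With the file in place, `Workspace.from_config()` can be used exactly as in the Compute Instance cell.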

## Registering a Dataset
Register data stored in Azure storage services or databases as a dataset (Dataset).

#### Azure Machine Learning Studio
Download the CSV to your local machine and register it as a dataset from Studio.

<img src="docs/images/azureml-dataset.png" width=500>


#### Python SDK
Use a datastore (Datastore), which holds the connection information for a data source, to upload the CSV file to Azure Storage (the workspace's default storage). Then register the uploaded file as a dataset (Dataset).

```python
from azureml.core import Dataset

# Upload the local CSV to the workspace's default datastore (under demo/)
datastore = ws.get_default_datastore()
datastore.upload_files(files=['./data/Titanic.csv'],
                       target_path='demo',
                       overwrite=True)

# Create a TabularDataset from the uploaded file in the datastore and register it
datastore_paths = [(datastore, './demo/Titanic.csv')]
titanic_ds = Dataset.Tabular.from_delimited_files(path=datastore_paths)
titanic_ds.register(ws, "titanic", create_new_version=True)
```


```
Uploading an estimated of 1 files
Uploading ./data/Titanic.csv
Uploaded ./data/Titanic.csv, 1 files out of an estimated total of 1
Uploaded 1 files
```


```json
{
  "source": [
    "('workspaceblobstore', './demo/Titanic.csv')"
  ],
  "definition": [
    "GetDatastoreFiles",
    "ParseDelimited",
    "DropColumns",
    "SetColumnTypes"
  ],
  "registration": {
    "id": "4e1c26d3-1b46-464d-8cb5-1c5dfb1988ec",
    "name": "titanic",
    "version": 2,
    "workspace": "Workspace.create(name='azureml', subscription_id='9c0f91b8-eb2f-484c-979c-15848c098a6b', resource_group='azureml')"
  }
}
```

## Registering an Environment

#### Azure Machine Learning Studio

<img src="docs/images/azureml-environment.png" width=500>


#### Python SDK

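```python
from azureml.core import Environment

# Build an environment from a pip requirements file and register it in the workspace
environment_name = "lightgbm-python-env"
file_path = "./environments/requirements.txt"
env = Environment.from_pip_requirements(name=environment_name, file_path=file_path)
env.register(ws);  # the trailing ";" suppresses the notebook output
```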
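`Environment.from_pip_requirements` expects `./environments/requirements.txt` to exist, but the notebook does not show its contents. The sketch below writes one in standard pip format (one requirement per line). The package list is an assumption: LightGBM, pandas, and scikit-learn for the model, plus `azureml-core` and `azureml-dataset-runtime[pandas]` so that the training script can load the dataset. Adjust it to what `src/train-lgb.py` and `src/score.py` actually import, and pin versions for reproducible image builds.

```python
from pathlib import Path

# Hypothetical package list: adjust it to what the training and scoring
# scripts actually import, and pin versions for reproducibility.
REQUIREMENTS = [
    "lightgbm",
    "pandas",
    "scikit-learn",
    "azureml-core",
    "azureml-dataset-runtime[pandas]",
]

req_path = Path("environments") / "requirements.txt"
req_path.parent.mkdir(parents=True, exist_ok=True)
# pip requirements format: one requirement specifier per line
req_path.write_text("\n".join(REQUIREMENTS) + "\n", encoding="utf-8")
```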

## Creating a Compute Cluster

#### Azure Machine Learning Studio

#### Python SDK

```python
from azureml.core.compute import ComputeTarget, AmlCompute

compute_name = "cpu-clusters"

# Create the cluster only if no compute target with this name exists yet
if compute_name not in ws.compute_targets:
    # Up to 4 nodes; nodes idle for 300 seconds are scaled down
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="Standard_DS3_v2",
        max_nodes=4,
        idle_seconds_before_scaledown=300,
    )
    ct = ComputeTarget.create(ws, compute_name, compute_config)
    ct.wait_for_completion(show_output=True)
```

## Model Training Experiments

#### Azure Machine Learning Studio

<div class="alert alert-info"><h5>Warning</h5><p>
Running jobs from Azure Machine Learning Studio is a <strong>Private Preview</strong> feature.</p></div>

#### Python SDK

```python
from azureml.core import Experiment

# Create (or retrieve) the experiment that groups the training runs
experiment_name = "lgb-test1"
experiment = Experiment(ws, experiment_name)
```

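```python
from azureml.core import ScriptRunConfig

script_dir = "src"
script_name = "train-lgb.py"
# Pass the Titanic dataset to the script as a named input called "titanic"
args = ["--input-data", titanic_ds.as_named_input('titanic')]

# Run src/train-lgb.py on the compute cluster with the registered environment
src = ScriptRunConfig(
    source_directory=script_dir,
    script=script_name,
    environment=env,
    arguments=args,
    compute_target=compute_name,
)
```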
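`src/train-lgb.py` is not included in the notebook. When a dataset is passed as a script argument with `as_named_input(...)` and consumed directly (the run details further down report `'mechanism': 'Direct'`), the script receives the dataset ID as the argument value and loads the dataset itself. A minimal sketch of the argument-handling part, assuming the script follows that pattern; `parse_args` is a hypothetical helper, and the Azure ML calls are left as comments because they only work inside a run.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments passed via ScriptRunConfig(arguments=...)."""
    parser = argparse.ArgumentParser(description="Train a LightGBM model on the Titanic dataset")
    # In direct mode, this value is the ID of the dataset passed with as_named_input()
    parser.add_argument("--input-data", type=str, required=True)
    return parser.parse_args(argv)


# In the actual script, the dataset would then be loaded roughly like this
# (requires azureml-core and only works inside an Azure ML run):
#   from azureml.core import Dataset, Run
#   args = parse_args()
#   run = Run.get_context()
#   ws = run.experiment.workspace
#   df = Dataset.get_by_id(ws, id=args.input_data).to_pandas_dataframe()
```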

```python
# Submit the run to the cluster
run = experiment.submit(src)
```

```python
# Block until the run finishes, streaming its logs
run.wait_for_completion(show_output=True)
```

```
RunId: lgb-test1_1642430502_f34a4ccb
Web View: https://ml.azure.com/runs/lgb-test1_1642430502_f34a4ccb?wsid=/subscriptions/9c0f91b8-eb2f-484c-979c-15848c098a6b/resourcegroups/azureml/workspaces/azureml&tid=72f988bf-86f1-41af-91ab-2d7cd011db47

Execution Summary
RunId: lgb-test1_1642430502_f34a4ccb
Web View: https://ml.azure.com/runs/lgb-test1_1642430502_f34a4ccb?wsid=/subscriptions/9c0f91b8-eb2f-484c-979c-15848c098a6b/resourcegroups/azureml/workspaces/azureml&tid=72f988bf-86f1-41af-91ab-2d7cd011db47

This run might be using a new job runtime with improved performance and error reporting. The logs from your script are in user_logs/std_log.txt. Please let us know if you run into any issues, and if you would like to opt-out, please add the environment variable AZUREML_COMPUTE_USE_COMMON_RUNTIME to the environment variables section of the job and set its value to the string "false"

{'runId': 'lgb-test1_1642430502_f34a4ccb',
 'target': 'cpu-clusters',
 'status': 'Completed',
 'startTimeUtc': '2022-01-17T14:43:12.318002Z',
 'endTimeUtc': '2022-01-17T14:43:56.217557Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'amlcompute',
  'ContentSnapshotId': '17cd4d35-8f8d-4bbf-a6c9-99acc8cb6c98',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [{'dataset': {'id': 'd2c5cd9f-993a-4027-865e-8bb2b68caafb'}, 'consumptionDetails': {'type': 'RunInput', 'inputName': 'titanic', 'mechanism': 'Direct'}}],
 'outputDatasets': [],
 'runDefinition': {'script': 'train-lgb.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': ['--input-data', 'DatasetConsumptionConfig:titanic'],
  'sourceDirectoryDataStore': None,
  'framework': 'Python',
  'communicator': 'None',
  'target': 'cpu-clusters',
  'dataReferences': {},
  'data': {'titanic': {'dataLocation': {'dataset': {'id': 'd2c5c
```

```python
run
```

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| lgb-test1 | lgb-test1_1642430502_f34a4ccb | azureml.scriptrun | Completed | Link to Azure Machine Learning studio | Link to Documentation |


## Registering the Model (Model Registry)

#### Azure Machine Learning Studio

#### Python SDK

```python
from azureml.core import Model

# Register the files under the run's artifact path "model" as the model "lgb-test"
model = run.register_model(model_name="lgb-test", tags={'algorithm': 'lightGBM'}, model_path='model')
```

## Creating the Inference Environment (Deployments)

#### Azure Machine Learning Studio

#### Python SDK

```python
from azureml.core.environment import Environment
from azureml.core.webservice import AciWebservice
from azureml.core.model import InferenceConfig
```

```python
# Reuse the registered environment and add Azure ML's inferencing stack to the image
env = Environment.get(ws, "lightgbm-python-env")
env.inferencing_stack_version = 'latest'
```

```python
# Azure Container Instances deployment with default settings
aciconfig = AciWebservice.deploy_configuration()
```

```python
# Latest version of the registered model, plus the scoring script and environment
model = Model(ws, "lgb-test")
inference_config = InferenceConfig(entry_script="score.py", source_directory="src", environment=env)
```

```python
# Deploy the model as an ACI web service, replacing any service with the same name
service_name = "lgb-aci"
service = Model.deploy(
    workspace=ws,
    name=service_name,
    models=[model],
    inference_config=inference_config,
    deployment_config=aciconfig,
    overwrite=True
)
```

```python
service.wait_for_deployment(show_output=True)
```

```
Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-01-17 14:44:25+00:00 Creating Container Registry if not exists.
2022-01-17 14:44:26+00:00 Registering the environment.
2022-01-17 14:44:28+00:00 Use the existing image.
2022-01-17 14:44:28+00:00 Generating deployment configuration.
2022-01-17 14:44:29+00:00 Submitting deployment to compute..
2022-01-17 14:44:33+00:00 Checking the status of deployment lgb-aci..
2022-01-17 14:44:53+00:00 Checking the status of inference endpoint lgb-aci.
Succeeded
ACI service creation operation finished, operation "Succeeded"
```


## Testing the Model

#### Azure Machine Learning Studio

#### Python SDK

```python
import json
import urllib.request

# One Titanic passenger as a single input row
data = {
    "data": [
        [2, "Kvillner, Mr. Johan Henrik Johannesson", "male", 31, 0, 0, "C.A. 18723", 10.5, "", "S"],
    ]
}
body = json.dumps(data).encode("utf-8")
```
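Each row in `data` is a list of feature values in a fixed order. Judging from the sample row, the order appears to follow the Titanic CSV columns without `PassengerId` and `Survived`; that column list is an inference, not something the notebook states, and it must match what `score.py` expects. A hypothetical helper that builds the body from named fields makes the order explicit and produces the same bytes as the cell above for this passenger.

```python
import json

# Assumed feature order, inferred from the sample row
FEATURE_COLUMNS = [
    "Pclass", "Name", "Sex", "Age", "SibSp",
    "Parch", "Ticket", "Fare", "Cabin", "Embarked",
]


def build_body(passengers):
    """Encode a list of dicts (column name -> value) as the scoring request body."""
    rows = [[p[col] for col in FEATURE_COLUMNS] for p in passengers]
    return json.dumps({"data": rows}).encode("utf-8")


passenger = {
    "Pclass": 2, "Name": "Kvillner, Mr. Johan Henrik Johannesson", "Sex": "male",
    "Age": 31, "SibSp": 0, "Parch": 0, "Ticket": "C.A. 18723",
    "Fare": 10.5, "Cabin": "", "Embarked": "S",
}
example_body = build_body([passenger])
```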

```python
# POST the JSON body to the service's scoring URI
url = service.scoring_uri
headers = {'Content-Type': 'application/json'}
req = urllib.request.Request(url, body, headers)
```
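The ACI service above uses the default deployment configuration, which does not enable authentication, so no key is sent. If key-based authentication is enabled (for example with `AciWebservice.deploy_configuration(auth_enabled=True)`), each request also needs an `Authorization: Bearer <key>` header, and the keys can be retrieved with `service.get_keys()`. A minimal sketch of a request builder covering both cases; `build_scoring_request` is a hypothetical helper.

```python
import urllib.request


def build_scoring_request(url, body, api_key=None):
    """Build a POST request for a scoring URI, adding a bearer key if one is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when key-based auth is enabled on the service
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

For example, `primary_key, _ = service.get_keys()` followed by `req = build_scoring_request(service.scoring_uri, body, primary_key)`.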

```python
import urllib.error

try:
    response = urllib.request.urlopen(req)
    result = response.read()
    print(result)
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))

    # Print the headers; they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(json.loads(error.read().decode("utf8", 'ignore')))
```


```
b'[[0.7252904642292589, 0.2747095357707411]]'
```
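The response body is a JSON array with one row of class probabilities per input row. Assuming `score.py` returns LightGBM's `predict_proba` output for a binary classifier trained on `Survived`, the columns are ordered by class label (0 = did not survive, 1 = survived), so this passenger would be predicted not to survive with probability of about 0.73. That column interpretation is an assumption about the unseen scoring script. A small sketch that decodes the printed bytes; `decode_predictions` is a hypothetical helper.

```python
import json

# Raw bytes as printed by the request cell
sample_result = b'[[0.7252904642292589, 0.2747095357707411]]'


def decode_predictions(raw):
    """Return (predicted_label, probability_of_that_label) for each row.

    Assumes each row is [P(class 0), P(class 1)], as predict_proba returns.
    """
    rows = json.loads(raw.decode("utf-8"))
    return [(max(range(len(p)), key=p.__getitem__), max(p)) for p in rows]


predictions = decode_predictions(sample_result)
```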
