# Azure Machine Learning Studio Notebooks Practice
---

- 당뇨병 데이터를 이용하여 실습을 진행해보자.

In [1]:
from azureml.core import Workspace

ws = Workspace.from_config()    # 워크 스테이션의 상태 확인
print('Workspace name: ' + ws.name, '\n',
      'Azure Region: ' + ws.location, '\n',
      'Subscription ID: ' + ws.subscription_id, '\n',
      'Resource Group: ' + ws.resource_group
)


Workspace name: labuser111ml 
 Azure Region: koreacentral 
 Subscription ID: 27db5ec6-d206-4028-b5e1-6004dca5eeef 
 Resource Group: rg111


- 실험 공간을 준비한다.

In [2]:
from azureml.core import Experiment

experiment = Experiment(workspace=ws, name='diabetes-experiment')    # 워크 스페이스와 시험의 이름 지정

- 데이터를 준비한다.

In [3]:
from azureml.opendatasets import Diabetes
from sklearn.model_selection import train_test_split

x_df = Diabetes.get_tabular_dataset().to_pandas_dataframe().dropna()
y_df = x_df.pop('Y')    # Y를 끄집어 낸다.

X_train, X_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=66)

print(X_train)

     AGE  SEX   BMI     BP   S1     S2    S3    S4      S5   S6
440   36    1  30.0   95.0  201  125.2  42.0  4.79  5.1299   85
389   47    2  26.5   70.0  181  104.8  63.0  3.00  4.1897   70
5     23    1  22.6   89.0  139   64.8  61.0  2.00  4.1897   68
289   28    2  31.5   83.0  228  149.4  38.0  6.00  5.3132   83
101   53    2  22.2  113.0  197  115.2  67.0  3.00  4.3041  100
..   ...  ...   ...    ...  ...    ...   ...   ...     ...  ...
122   62    2  33.9  101.0  221  156.4  35.0  6.00  4.9972  103
51    65    2  27.9  103.0  159   96.8  42.0  4.00  4.6151   86
119   53    1  22.0   94.0  175   88.0  59.0  3.00  4.9416   98
316   53    2  27.7   95.0  190  101.8  41.0  5.00  5.4638  101
20    35    1  21.1   82.0  156   87.8  50.0  3.00  4.5109   95

[353 rows x 10 columns]


- 모델 훈련, 로그, 모델 파일 관리를 해보자.

In [4]:
# 모델 훈련, 로그, 모델 파일 관리
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.externals import joblib
import math

alphas = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]

for alpha in alphas:
    # 실험의 로그 기록
    run = experiment.start_logging()
    run.log('alpha_value', alpha)    # alpha_value에 alpha 값을 대입한다.

    model = Ridge(alpha=alpha)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    rmse = math.sqrt(mean_squared_error(y_test, y_pred))

    run.log('rmse', rmse)    # rmse에 rmse 값을 대입한다.

    print('model_alpha={0}, rmse={1}'.format(alpha, rmse))

    # 모델을 파일로 저장하기
    model_name = 'model_alpha_' + str(alpha) + '.pkl'    # 모델 이름 지정하기
    filename = 'outputs/' + model_name

    joblib.dump(value=model, filename=filename)

    # Azure ML Service에 모델 파일을 업로드 하기
    run.upload_file(name=model_name, path_or_stream=filename)

    run.complete()   # 로그 기록을 끝낸다.

    print(f'{alpha} experiment completed.')



model_alpha=0.1, rmse=56.605203313391435
0.1 experiment completed.
model_alpha=0.2, rmse=56.61060264545031
0.2 experiment completed.
model_alpha=0.3, rmse=56.61624324548362
0.3 experiment completed.
model_alpha=0.4, rmse=56.62210708871013
0.4 experiment completed.
model_alpha=0.5, rmse=56.628177342751385
0.5 experiment completed.
model_alpha=0.6, rmse=56.63443828302744
0.6 experiment completed.
model_alpha=0.7, rmse=56.64087521475942
0.7 experiment completed.
model_alpha=0.8, rmse=56.64747440101076
0.8 experiment completed.
model_alpha=0.9, rmse=56.65422299625313
0.9 experiment completed.
model_alpha=1, rmse=56.661108984990555
1 experiment completed.


- Azure ML Service에 업로드한 작업 공간을 연동한다.

In [5]:
from azureml.core import Experiment

experiment = Experiment(workspace=ws, name="diabetes-experiment")
experiment

Name,Workspace,Report Page,Docs Page
diabetes-experiment,labuser111ml,Link to Azure Machine Learning studio,Link to Documentation


- 최고의 모델을 탐색하고 다운로드 해보자.

In [6]:
# Best Model 탐색 후 다운로드
minimum_rmse = None
minimum_rmse_runid = None

for exp in experiment.get_runs():
    run_metrics = exp.get_metrics()
    run_details = exp.get_details()

    run_rmse = run_metrics['rmse']
    run_id = run_details['runId']

    # 가장 낮은 rmse 값을 가진 실행 ID를 구하기
    if minimum_rmse is None:   # 제일 처음 실행시켰을 경우
        minimum_rmse = run_rmse
        minimum_rmse_runid = run_id
    else:
        if run_rmse < minimum_rmse:
            minimum_rmse = run_rmse
            minimum_rmse_runid = run_id

print('Best run_id: ' + minimum_rmse_runid)
print('Best run_id rmse: ' + str(minimum_rmse))

Best run_id: cb20ac7e-ed7c-476f-b995-5a3aae59c0b8
Best run_id rmse: 56.605203313391435


- Best Model을 다운로드 받아보자.

In [7]:
from azureml.core import Run

best_run = Run(experiment=experiment, run_id=minimum_rmse_runid)
print(best_run.get_file_names())

best_run.download_file(name=str(best_run.get_file_names()[0]))

['model_alpha_0.1.pkl', 'outputs/.amlignore', 'outputs/.amlignore.amltmp', 'outputs/model_alpha_0.1.pkl', 'outputs/model_alpha_0.2.pkl', 'outputs/model_alpha_0.3.pkl', 'outputs/model_alpha_0.4.pkl', 'outputs/model_alpha_0.5.pkl', 'outputs/model_alpha_0.6.pkl', 'outputs/model_alpha_0.7.pkl', 'outputs/model_alpha_0.8.pkl', 'outputs/model_alpha_0.9.pkl', 'outputs/model_alpha_1.pkl']


---

- 실험에서 사용했던 데이터들을 Data Store에 체계적으로 저장해보자. 우선, 모델에서 사용했던 Feature와 Label을 `.csv` 파일로 저장한다. 그리고 Data Store에 업로드해본다.

In [10]:
import numpy as np
from azureml.core import Dataset

# csv 파일로 저장하기
np.savetxt('feature.csv', X_train, delimiter=',')     # Feature 데이터 저장
np.savetxt('label.csv', y_train, delimiter=',')     # Label 데이터 저장

# Data Store에 업로드 하기
datastore = ws.get_default_datastore()     # Data Store 정보 가져오기

datastore.upload_files(files=['./feature.csv', './label.csv'],
                        target_path='diabetes-experiment/',      # 업로드 할 경로 
                        overwrite=True                           # 덮어쓰기 가능
)

"datastore.upload_files" is deprecated after version 1.0.69. Please use "FileDatasetFactory.upload_directory" instead. See Dataset API change notice at https://aka.ms/dataset-deprecation.


Uploading an estimated of 2 files
Uploading ./feature.csv
Uploaded ./feature.csv, 1 files out of an estimated total of 2
Uploading ./label.csv
Uploaded ./label.csv, 2 files out of an estimated total of 2
Uploaded 2 files


$AZUREML_DATAREFERENCE_738d30174f5e49d596d3f18b8e96aa50

- Data Store에 저장되어 있는 파일들을 가져와보자.

In [11]:
feature_dataset = Dataset.Tabular.from_delimited_files(path=[(datastore, 'diabetes-experiment/feature.csv')])    # 해당 경로에 있는 파일들을 가져온다.
label_dataset = Dataset.Tabular.from_delimited_files(path=[(datastore, 'diabetes-experiment/label.csv')])

- 생성했었던 Best Model을 Data Store에 등록해보자.

In [13]:
import sklearn

from azureml.core import Model
from azureml.core.resource_configuration import ResourceConfiguration

In [15]:
# 모델의 등록
model = Model.register(workspace=ws,
                model_name='diabetes-experiment-data',
                model_path=f'./{str(best_run.get_file_names()[0])}',
                model_framework=Model.Framework.SCIKITLEARN,    # 사이킷런
                model_framework_version=sklearn.__version__,
                sample_input_dataset=feature_dataset,
                sample_output_dataset=label_dataset,
                resource_configuration=ResourceConfiguration(cpu=1, memory_in_gb=0.5),    # 실행 환경 지정 : CPU 1개, 0.5GB RAM
                description='Ridge Regression Model to predict diabetes progression',
                tags={'area' : 'diabetes', 'type' : 'regression'}
        )

Registering model diabetes-experiment-data


- Data Store에 모델이 등록되었는지 확인해본다.

In [16]:
print('Model Name: ', model.name)
print('Model Version: ', model.version)

Model Name:  diabetes-experiment-data
Model Version:  1


- 이제 모델을 **배포**해보자.

In [17]:
# 모델의 배포
service_name = 'diabetes-service'

service = Model.deploy(ws, service_name, [model], overwrite=True)   # 모델은 여러 개 배포할 수 있기 때문에 리스트 타입으로 표현한다.
service.wait_for_deployment(show_output=True)

To leverage new model deployment capabilities, AzureML recommends using CLI/SDK v2 to deploy models as online endpoint, 
please refer to respective documentations 
https://docs.microsoft.com/azure/machine-learning/how-to-deploy-managed-online-endpoints /
https://docs.microsoft.com/azure/machine-learning/how-to-attach-kubernetes-anywhere 
For more information on migration, see https://aka.ms/acimoemigration 
  service = Model.deploy(ws, service_name, [model], overwrite=True)   # 모델은 여러 개 배포할 수 있기 때문에 리스트 타입으로 표현한다.


Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2023-06-02 09:32:16+00:00 Creating Container Registry if not exists..
2023-06-02 09:42:16+00:00 Registering the environment..
2023-06-02 09:42:18+00:00 Uploading autogenerated assets for no-code-deployment.
2023-06-02 09:42:22+00:00 Building image..
2023-06-02 09:52:36+00:00 Generating deployment configuration.
2023-06-02 09:52:37+00:00 Submitting deployment to compute..
2023-06-02 09:52:46+00:00 Checking the status of deployment diabetes-service..
2023-06-02 09:54:11+00:00 Checking the status of inference endpoint diabetes-service.
Succeeded
ACI service creation operation finished, operation "Succeeded"


- 배포 완료 후, 서비스 안에서 배포된 모델을 이용하여 예측 작업을 수행해보자.

In [18]:
import json

input_payload = json.dumps({
    'data': X_train[0:2].values.tolist(),
    'method': 'predict'     # 예측 작업 수행 
})

output = service.run(input_payload)
print(output)

{'predict': [204.94506937062147, 74.4641225933554]}
