# Diabetes Model

## Building a Model with scikit-learn

### Import the Data

In this notebook we use the Diabetes dataset from [Azure Open Datasets](https://azure.microsoft.com/en-us/services/open-datasets/#overview).

In [1]:
from azureml.opendatasets import Diabetes

diabetes = Diabetes.get_tabular_dataset()
X = diabetes.drop_columns("Y")
y = diabetes.keep_columns("Y")
X_df = X.to_pandas_dataframe()
y_df = y.to_pandas_dataframe()
X_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 442 entries, 0 to 441
Data columns (total 10 columns):
AGE    442 non-null int64
SEX    442 non-null int64
BMI    442 non-null float64
BP     442 non-null float64
S1     442 non-null int64
S2     442 non-null float64
S3     442 non-null float64
S4     442 non-null float64
S5     442 non-null float64
S6     442 non-null int64
dtypes: float64(6), int64(4)
memory usage: 34.7 KB


The pandas DataFrame `X_df` contains 10 baseline input variables: age, sex, body mass index, average blood pressure, and six blood serum measurements.

The pandas DataFrame `y_df` holds the target variable, a quantitative measure of diabetes progression one year after baseline.

### Train the Model

This code snippet fits a [Ridge regression model](https://scikit-learn.org/stable/modules/linear_model.html#ridge-regression) and serializes it to a file in Python's pickle format (via `joblib`).

In [2]:
import joblib
from sklearn.linear_model import Ridge

model = Ridge().fit(X_df, y_df)
joblib.dump(model, 'diabetes_sklearn_model.pkl')

['diabetes_sklearn_model.pkl']
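
For reference, the objective that ridge regression minimizes can be written as shown below. This is the standard formulation given in the scikit-learn documentation linked above. The symbols $X$ (feature matrix), $y$ (target vector), and $w$ (coefficient vector) are notation introduced here, and `Ridge()` called without arguments uses the default regularization strength $\alpha = 1.0$.

$$
\min_{w} \; \lVert X w - y \rVert_2^2 + \alpha \lVert w \rVert_2^2
$$

Larger values of $\alpha$ shrink the coefficients more strongly toward zero.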

### Register the Model

In addition to the content of the model file itself, a registered model also stores metadata. The metadata includes the model description, tags, and framework information.

Metadata is useful for managing and deploying models in the workspace.

In [3]:
import sklearn

from azureml.core import Workspace
from azureml.core import Model
from azureml.core.resource_configuration import ResourceConfiguration

ws = Workspace.from_config()

model = Model.register(
    workspace=ws,
    model_name='diabetes-sklearn-model',         # Name of the registered model in your workspace.
    model_path='./diabetes_sklearn_model.pkl',    # Local file to upload and register as a model.
    model_framework=Model.Framework.SCIKITLEARN,    # Framework used to create the model.
    model_framework_version=sklearn.__version__,    # Version of scikit-learn used to create the model.
    sample_input_dataset=X,
    sample_output_dataset=y,
    resource_configuration=ResourceConfiguration(cpu=2, memory_in_gb=4),
    description='Ridge regression model to predict diabetes progression.',
    tags={'area': 'diabetes', 'type': 'regression'}
)

print('Name:', model.name)
print('Version:', model.version)

Registering model diabetes-sklearn-model
Name: diabetes-sklearn-model
Version: 1


## Deploy the Model as a Web Service

### Model Scoring Script

The scoring script contains two functions:

* The `init()` function runs when the service starts. It loads the model (which is downloaded automatically from the model registry) and deserializes it.
* The `run(data)` function runs whenever a call to the service contains input data that needs to be scored.
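
The `init()`/`run()` contract can be tried out locally before writing the real script. The following is a minimal sketch using only the standard library, not the script deployed in this notebook. The stub "model" (a dict of linear coefficients), the file name `stub_model.pkl`, and the sample rows are all hypothetical. The sketch also parses the JSON request itself instead of relying on the `inference_schema` decorators.

```python
import json
import os
import pickle
import tempfile


def init():
    """Runs once at service start: load the model from AZUREML_MODEL_DIR."""
    global model
    model_dir = os.environ["AZUREML_MODEL_DIR"]
    with open(os.path.join(model_dir, "stub_model.pkl"), "rb") as f:
        model = pickle.load(f)


def run(raw_data):
    """Runs per request: raw_data is a JSON string with a 'data' key."""
    rows = json.loads(raw_data)["data"]
    # Hypothetical stub model: a plain linear function of the inputs.
    return [
        sum(c * x for c, x in zip(model["coef"], row)) + model["intercept"]
        for row in rows
    ]


# Simulate the service: place a model file, call init() once, then run().
with tempfile.TemporaryDirectory() as model_dir:
    stub = {"coef": [0.5, 2.0], "intercept": 1.0}  # hypothetical values
    with open(os.path.join(model_dir, "stub_model.pkl"), "wb") as f:
        pickle.dump(stub, f)
    os.environ["AZUREML_MODEL_DIR"] = model_dir
    init()

predictions = run(json.dumps({"data": [[2.0, 1.0], [0.0, 0.0]]}))
print(predictions)  # [4.0, 1.0]
```

Because the model is loaded once in `init()` and kept in a module-level variable, each call to `run()` only pays for the prediction itself.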

In [4]:
%%writefile score.py

import os

import joblib
import numpy as np
import pandas as pd

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType


def init():
    global model
    # AZUREML_MODEL_DIR points to the directory that contains the registered model files.
    path = os.getenv('AZUREML_MODEL_DIR')
    # Replace filename if needed.
    model_path = os.path.join(path, 'diabetes_sklearn_model.pkl')
    # Deserialize the model file back into a sklearn model.
    model = joblib.load(model_path)


input_sample = pd.DataFrame(data=[{
    "AGE": 5,
    "SEX": 2,
    "BMI": 3.1,
    "BP": 3.1,
    "S1": 3.1,
    "S2": 3.1,
    "S3": 3.1,
    "S4": 3.1,
    "S5": 3.1,
    "S6": 3.1
}])

# This is an integer type sample. Use the data type that reflects the expected result.
output_sample = np.array([0])

# To indicate that we support a variable length of data input,
# set enforce_shape=False
@input_schema('data', PandasParameterType(input_sample))
@output_schema(NumpyParameterType(output_sample))
def run(data):
    try:
        print("input_data....")
        print(data.columns)
        print(type(data))
        result = model.predict(data)
        print("result.....")
        print(result)

        # You can return any data type, as long as it can be serialized by JSON.
        return result.tolist()
    except Exception as e:
        error = str(e)
        return error

Writing score.py


### Define the Custom Environment

In the environment, define the Python packages, such as `pandas` and `scikit-learn`, that the scoring script (`score.py`) requires.

In [5]:
from azureml.core.model import InferenceConfig
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

environment = Environment('diabetes-sklearn-environment')
environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[
    'azureml-defaults',
    'inference-schema[numpy-support]',
    'joblib',
    'numpy',
    'pandas',
    'scikit-learn=={}'.format(sklearn.__version__)
])

inference_config = InferenceConfig(entry_script='./score.py', environment=environment)

### Deploy the Model

In [6]:
service_name = 'diabetes-sklearn-model'

service = Model.deploy(ws, service_name, [model], inference_config, overwrite=True)
service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-10-25 21:26:56+00:00 Creating Container Registry if not exists.
2021-10-25 21:26:56+00:00 Registering the environment.
2021-10-25 21:26:57+00:00 Building image..
2021-10-25 21:32:58+00:00 Generating deployment configuration.
2021-10-25 21:32:59+00:00 Submitting deployment to compute..
2021-10-25 21:33:10+00:00 Checking the status of deployment diabetes-sklearn-model..
2021-10-25 21:35:21+00:00 Checking the status of inference endpoint diabetes-sklearn-model.
Succeeded
ACI service creation operation finished, operation "Succeeded"


### Test the Web Service

The output should look like the following Python list:

```python
[[205.59094435613133], [68.84146418576978]]
```

In [7]:
import json

input_payload = json.dumps({
    'data': X_df[0:2].values.tolist()
})

output = service.run(input_payload)

print(output)

[[205.59094435613133], [68.84146418576978]]
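
The request above sends each row as a plain list, so the values must appear in the same column order as `X_df`. One way to make that order explicit is a small helper that builds the payload from named records. This is a sketch only: `build_payload` and `FEATURE_COLUMNS` are hypothetical names introduced here, and the sample values are placeholders rather than real patient data.

```python
import json

# Column order as reported by X_df.info(); positional rows must follow it.
FEATURE_COLUMNS = ["AGE", "SEX", "BMI", "BP", "S1", "S2", "S3", "S4", "S5", "S6"]


def build_payload(records):
    """Convert a list of {column: value} dicts to the JSON body {'data': [[...], ...]}."""
    rows = []
    for i, record in enumerate(records):
        missing = [c for c in FEATURE_COLUMNS if c not in record]
        if missing:
            raise ValueError(f"record {i} is missing columns: {missing}")
        # Emit the values in the fixed column order, whatever the dict order is.
        rows.append([record[c] for c in FEATURE_COLUMNS])
    return json.dumps({"data": rows})


# Placeholder values, not real patient data.
example = {"AGE": 5, "SEX": 2, "BMI": 3.1, "BP": 3.1, "S1": 3.1,
           "S2": 3.1, "S3": 3.1, "S4": 3.1, "S5": 3.1, "S6": 3.1}
payload = build_payload([example])
print(payload)
```

The resulting string has the same shape as `input_payload` above and could be passed to `service.run` in its place.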
