# Introduction to Azure ML<br> (experiments, runs, logs)
<br>

<img src='https://github.com/retkowsky/images/blob/master/AzureMLservicebanniere.png?raw=true'>


Documentation: https://docs.microsoft.com/en-us/azure/machine-learning/

In this tutorial, you complete the end-to-end steps to get started with the Azure Machine Learning Python SDK running 
in Jupyter notebooks. 

In this tutorial, you:
- Run some Python code
- Log results into an Azure ML experiment
- Register ML models in the Azure ML model registry
- Use the MLflow integration with Azure ML
- Use Azure Open Datasets

## Architecture and concepts

> https://docs.microsoft.com/en-us/azure/machine-learning/concept-azure-machine-learning-architecture

<img src="https://github.com/retkowsky/images/blob/master/workspace.png?raw=true">

> Azure architectures: https://docs.microsoft.com/en-us/azure/architecture/browse/

## 0. Setup

In [None]:
import numpy as np

In [None]:
import datetime
now = datetime.datetime.now()
print("Today:", now)

In [None]:
import sys
sys.version

In [None]:
import platform,socket,re,uuid,json,psutil,logging

def getSystemInfo():
    try:
        info={}
        info['Platform']=platform.system()
        info['Platform-release']=platform.release()
        info['Platform-version']=platform.version()
        info['Architecture']=platform.machine()
        info['Hostname']=socket.gethostname()
        info['IP-address']=socket.gethostbyname(socket.gethostname())
        info['MAC-address']=':'.join(re.findall('..', '%012x' % uuid.getnode()))
        info['Processor']=platform.processor()
        info['RAM']=str(round(psutil.virtual_memory().total / (1024.0 **3)))+" GB"
        return json.dumps(info)
    except Exception as e:
        logging.exception(e)

json.loads(getSystemInfo())

In [None]:
import azureml.core
from azureml.core import Experiment, Workspace

print("Azure ML SDK version:", azureml.core.VERSION)

In [None]:
# Show the details of the Azure ML workspace
ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Resource group: ' + ws.resource_group, sep='\n')

In [None]:
from azureml.core import ComputeTarget, Datastore, Dataset

print("Compute Targets:")
for compute_name in ws.compute_targets:
    compute = ws.compute_targets[compute_name]
    print("\t", compute.name, ':', compute.type)
    
print("Datastores:")
for datastore_name in ws.datastores:
    datastore = Datastore.get(ws, datastore_name)
    print("\t", datastore.name, ':', datastore.datastore_type)
    
print("Datasets:")
for dataset_name in list(ws.datasets.keys()):
    dataset = Dataset.get_by_name(ws, dataset_name)
    print("\t", dataset.name)

## 1. Loading the data

In [None]:
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

In [None]:
# scikit-learn version
import sklearn
print("scikit-learn version:", sklearn.__version__)

#### Data: DIABETES
From Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) "Least Angle Regression," Annals of Statistics (with discussion), 407-499, we have

"Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442 diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline."
> https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html

In [None]:
X, y = load_diabetes(return_X_y = True)
columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']

In [None]:
# Test set size, as a fraction of the full dataset
testsizepct=0.20

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=testsizepct, random_state=0)

data = {
    "train":{"X": X_train, "y": y_train},        
    "test":{"X": X_test, "y": y_test}
}

In [None]:
nobstrain=len(data['train']['X'])

In [None]:
nobstest=len(data['test']['X'])

In [None]:
print ("Training =", nobstrain, 'observations')
print ("Test =", nobstest, 'observations')
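
The counts printed above can be checked by hand. The sketch below is plain Python and assumes that `train_test_split` rounds the test count up when `test_size` is a fraction and gives the remainder to the training set; `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
import math

def split_sizes(n_samples, test_fraction):
    """Return (n_train, n_test) for a fractional test size.

    Assumption: the test count is rounded up and every remaining
    observation goes to the training set.
    """
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test

# The diabetes dataset has 442 observations and the notebook uses a 20% test set
n_train, n_test = split_sizes(442, 0.20)
print("Training =", n_train, "observations")
print("Test =", n_test, "observations")
```

If the numbers printed by the notebook differ from this calculation, the rounding assumption does not hold for the installed scikit-learn version.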

## 2. Modeling

We will fit a **Ridge** regression model.<br> 
<img src='https://github.com/retkowsky/images/blob/master/ridge.png?raw=true'>
<br>
Ridge is a regularized version of linear regression.
It fits the data while keeping the model's coefficients as small as possible.
- If the regularization parameter is 0, the model reduces to ordinary linear regression.
- If the parameter is very large, the coefficients shrink toward 0, and the fit approaches a horizontal line through the mean of the target.
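
The two limiting cases above can be illustrated with a minimal sketch for a single feature. It is plain Python, not the scikit-learn implementation: it assumes the usual ridge objective (squared error plus `alpha` times the squared slope) with an unpenalized intercept, for which the slope has a closed form. The function name `ridge_fit_1d` and the toy data are made up for illustration.

```python
def ridge_fit_1d(x, y, alpha):
    """Closed-form ridge fit for one feature with an unpenalized intercept.

    Minimizes sum((y - intercept - slope * x)^2) + alpha * slope^2.
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / (sxx + alpha)          # alpha shrinks the slope toward 0
    intercept = y_mean - slope * x_mean  # line always passes through the means
    return slope, intercept

# Toy data, invented for this illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

for alpha in (0.0, 1.0, 100.0, 1e6):
    slope, intercept = ridge_fit_1d(x, y, alpha)
    print(f"alpha={alpha:>9}: slope={slope:.5f}, intercept={intercept:.5f}")
```

With `alpha=0` the slope is the ordinary least-squares slope; as `alpha` grows, the slope tends to 0 and the intercept tends to the mean of `y`, which is the horizontal line described above.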



In [None]:
experiment = Experiment(workspace=ws, name="Exemple1-IntroAzureML")

The steps:
1. Log run information
2. Fit the model
3. Log model results and plots
4. Save the model

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score

def regridge(k):
    
    # 1. Start logging a new run
    run = experiment.start_logging()
    
    print('k =', k)
    print()    
       
    # 2. Log run information to the Azure ML experiment
    run.log('k', k)
    run.log('Test Size', testsizepct)
    run.log('Nobs Training', nobstrain)
    run.log('Nobs Test', nobstest)
    
    # 3. Fit a Ridge regression model with k as the regularization hyperparameter
    regression_model = Ridge(alpha=k)
    regression_model.fit(data['train']['X'], data['train']['y'])
    preds = regression_model.predict(data['test']['X'])

    # 4. Compute the metrics once, then print and log them to the experiment
    mse = mean_squared_error(data['test']['y'], preds)
    r2 = r2_score(data['test']['y'], preds)
    
    print('- MSE =', mse)
    print('- R2 =', r2)
    
    run.log('mse', mse)
    run.log('R2', r2)

    # 5. Export the model for each value of k
    joblib.dump(value=regression_model, filename='RegRidgeModele-k-'+str(k)+'.pkl')

    # 6. Add custom tags to the run (versions are read from the environment)
    run.tag("Language", "Python")
    run.tag("Version_Python", platform.python_version())
    run.tag("Version_AzureML", azureml.core.VERSION)
    run.tag("Team", "DataScience")
    run.tag("Country", "France") 
    run.tag("Author", "Serge") 
    
    # 7. Create and log a plot
    fig = plt.figure(1)
    idx = np.argsort(data['test']['y'])
    plt.plot(data['test']['y'][idx],preds[idx])
    plt.title('Ridge regression fit', fontsize=10) # Set the title before saving so it appears in the file
    
    fig.savefig("RegRidgeGraphique-k-"+str(k)+".png") # Save each plot under its own file name
    
    run.log_image(name='Ridge regression fit', plot=plt) # Log the plot image to the experiment
    
    # 8. End the run
    run.complete()

Reminder: definition of MSE
<img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/e258221518869aa1c6561bb75b99476c4734108e">
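
As a small worked example of the two metrics logged by `regridge`, the sketch below computes MSE and R2 by hand in plain Python. The helper names `mse` and `r2` and the four-point toy data are made up for illustration; the notebook itself uses `mean_squared_error` and `r2_score` from scikit-learn.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy values, invented for this illustration
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print("MSE =", mse(y_true, y_pred))
print("R2  =", r2(y_true, y_pred))
```

A lower MSE and an R2 closer to 1 both indicate a better fit, which is why the two are logged together for each value of `k`.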

### Calling the function

In [None]:
regridge(0.1)

In [None]:
regridge(0.2)

In [None]:
regridge(0.3)

In [None]:
regridge(0.4)

In [None]:
# List the saved model pickle files (on the VM)
%ls RegRidgeModele*.pkl -l

In [None]:
# List the saved plots (on the VM)
%ls RegRidgeGraphique*.png -l

## Registering the model

In [None]:
monmodelepkl='RegRidgeModele-k-0.4.pkl'
k=0.4
MSE=3295.741064355809
R2=0.3572956390661659

In [None]:
from azureml.core.model import Model

model = Model.register(model_path=monmodelepkl, # Path to the pickle file
                       model_name="RegressionRidge", # Name of the registered model
                       model_framework=Model.Framework.SCIKITLEARN,  # Framework
                       model_framework_version=sklearn.__version__,  # Installed scikit-learn version
                       tags={'area': 'Diabetes', # Tags attached to the model
                             'type': 'Regression Ridge', 
                             'k':k, 
                             'MSE' : MSE, 
                             'R2' : R2,
                             'Framework' : 'Azure ML'},
                       description="Ridge regression model", # Model description
                       workspace=ws) # Azure ML workspace

## Model information

In [None]:
print('Model name:', model.name)
print('Description:', model.description)
print('ID =', model.id)
print('Version =', model.version)

In [None]:
# List the registered models
from azureml.core import Model

for model in Model.list(ws):
    print(model.name, '- version =', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print ('\t',tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print ('\t',prop_name, ':', prop)
    print('\n')

## Viewing the metrics for each run

In [None]:
# List the metrics for each run
from azureml.core import Experiment, Run

diabetes_experiment = ws.experiments['Exemple1-IntroAzureML']
for logged_run in diabetes_experiment.get_runs():
    print()
    print('Run ID :', logged_run.id)
    metrics = logged_run.get_metrics()
    for key in metrics.keys():
        print('-', key, metrics.get(key))
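
Once the metrics are listed, a natural next step is to pick the run with the lowest MSE. The sketch below is plain Python and works on a dictionary mapping run IDs to metric dictionaries, the shape that could be built in this notebook with `{r.id: r.get_metrics() for r in diabetes_experiment.get_runs()}`. The helper `best_run`, the run IDs, and the metric values are hypothetical placeholders, not results from the experiment.

```python
def best_run(metrics_by_run, metric="mse"):
    """Return (run_id, metrics) for the run with the lowest value of `metric`.

    Runs that did not log the metric (for example, failed runs) are skipped.
    """
    candidates = {rid: m for rid, m in metrics_by_run.items() if metric in m}
    if not candidates:
        raise ValueError(f"No run logged the metric {metric!r}")
    best_id = min(candidates, key=lambda rid: candidates[rid][metric])
    return best_id, candidates[best_id]

# Placeholder values, invented for this illustration
example_metrics = {
    "run-a": {"k": 0.1, "mse": 3400.0, "R2": 0.33},
    "run-b": {"k": 0.4, "mse": 3300.0, "R2": 0.36},
    "run-c": {"k": 0.2},  # a run that stopped before logging its metrics
}

run_id, metrics = best_run(example_metrics)
print("Best run:", run_id, "with k =", metrics["k"])
```

The same helper can rank runs on any logged metric where lower is better; for R2, where higher is better, `max` would replace `min`.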

> The results can also be viewed in the portal, in the **experiments** section

In [None]:
experiment

## 3. Using MLflow

<img src="https://docs.microsoft.com/en-us/azure/machine-learning/service/media/how-to-use-mlflow/mlflow-diagram-track.png">

Documentation: https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-use-mlflow

> https://mlflow.org/

In [None]:
import mlflow
import mlflow.sklearn
import azureml.core
from azureml.core import Workspace
import matplotlib.pyplot as plt

In [None]:
# Install if needed
#!pip install azureml-mlflow

In [None]:
ws = Workspace.from_config()

mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())

experiment_name = "Exemple1-MLFlow"
mlflow.set_experiment(experiment_name)

In [None]:
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y = True)
columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
data = {
    "train":{"X": X_train, "y": y_train},        
    "test":{"X": X_test, "y": y_test}
}

print("Data:", len(data['train']['X']), "training observations and", len(data['test']['X']), "test observations.")

model_save_path = "model"

k=0.4

with mlflow.start_run() as run:
    
    mlflow.log_metric('k', k)
    print('k =', k)
    
    regression_model = Ridge(alpha=k)
    regression_model.fit(data['train']['X'], data['train']['y'])
    preds = regression_model.predict(data['test']['X'])

    print('Mean Squared Error =', mean_squared_error(data['test']['y'], preds))
    mlflow.log_metric('mse', mean_squared_error(data['test']['y'], preds))
    
    mlflow.sklearn.log_model(regression_model,model_save_path)
    
    fig = plt.figure(1)
    idx = np.argsort(data['test']['y'])
    plt.plot(data['test']['y'][idx],preds[idx])
    fig.savefig("mongraphiqueMLFlow.png")
    mlflow.log_artifact("mongraphiqueMLFlow.png")
    

In [None]:
ws.experiments[experiment_name]

## 4. Other logging examples

In [None]:
experiment = Experiment(workspace=ws, name='Exemple1-Logging')

In [None]:
from tqdm import tqdm

In [None]:
# start logging for the run
run = experiment.start_logging()

# change the scale factor on different runs to see how you can compare multiple runs
scale_factor = 3.14

# change the category on different runs to see how to organize data in reports
category = 'Pi'

In [None]:
run

In [None]:
experiment

In [None]:
# Log a string
run.log(name='Category', value=category)

In [None]:
# Log numeric values
run.log(name="scale factor", value = scale_factor)
run.log(name='Magic Number', value=42 * scale_factor)

In [None]:
fibonacci_values = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
scaled_values = [i * scale_factor for i in fibonacci_values]  # a real list, as log_list expects

# Log a list of values. Note this will generate a single-variable line chart.
run.log_list(name='Fibonacci', value=scaled_values)

for i in tqdm(range(-10, 10)):
    # log a metric value repeatedly, this will generate a single-variable line chart.
    run.log(name='Sigmoid', value=1 / (1 + np.exp(-i)))

In [None]:
# create a dictionary to hold a table of values
sines = {}
sines['angle'] = []
sines['sine'] = []

for i in tqdm(range(-10, 10)):
    angle = i / 2.0 * scale_factor
    
    # Log 2 (or more) values as a metric repeatedly. This will generate a 2-variable line chart if you have 2 numerical columns.
    run.log_row(name='Cosine Wave', angle=angle, cos=np.cos(angle))
        
    sines['angle'].append(angle)
    sines['sine'].append(np.sin(angle))

# log a dictionary as a table, this will generate a 2-variable chart if you have 2 numerical columns
run.log_table(name='Sine Wave', value=sines)

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
angle = np.linspace(-3, 3, 50) * scale_factor
plt.plot(angle,np.tanh(angle), label='tanh')
plt.legend(fontsize=12)
plt.title('Hyperbolic tangent', fontsize=16)
plt.grid(True)

run.log_image(name='Hyperbolic tangent', plot=plt)

In [None]:
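import os

file_name = 'outputs/myfile.txt'

# Make sure the output folder exists before writing to it
os.makedirs(os.path.dirname(file_name), exist_ok=True)

with open(file_name, "w") as f:
    f.write('This is an output file that will be uploaded.\n')

# Upload the file explicitly into artifacts 
run.upload_file(name = file_name, path_or_stream = file_name)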

In [None]:
run.complete()

In [None]:
experiment

In [None]:
from azureml.core import Experiment, Run

logging_experiment = ws.experiments['Exemple1-Logging']
for logged_run in logging_experiment.get_runs():
    print('Run ID:', logged_run.id)
    metrics = logged_run.get_metrics()
    for key in metrics.keys():
        print('-', key, metrics.get(key))


## 5. Azure Open Datasets
<img src="https://github.com/retkowsky/images/blob/master/opendata.jpg?raw=true">

> https://azure.microsoft.com/fr-fr/services/open-datasets/

In [None]:
#!pip install azureml-opendatasets

In [None]:
# Public holidays dataset
from azureml.opendatasets import PublicHolidays

from datetime import datetime
from dateutil import parser
from dateutil.relativedelta import relativedelta

In [None]:
# Window size in months (before and after today)
mois=12

In [None]:
import time
datedujour = time.strftime("%d-%m-%Y")
print("Date:", datedujour)

In [None]:
fin = datetime.today() + relativedelta(months=mois)
debut = datetime.today() - relativedelta(months=mois)

In [None]:
print("Start:", debut)

In [None]:
print("End:", fin)

In [None]:
hol = PublicHolidays(start_date=debut, end_date=fin)
joursferies = hol.to_pandas_dataframe()

In [None]:
joursferies.shape

In [None]:
joursferies.head(15)

In [None]:
# Keep only the public holidays in France
joursferiesFR=joursferies[joursferies.countryRegionCode == 'FR']

In [None]:
# Move the date column to the first position
joursferiesFR = joursferiesFR[ ['date'] + [ col for col in joursferiesFR.columns if col != 'date' ] ]

In [None]:
print("Number of public holidays in the period:", len(joursferiesFR.index), "days.")

In [None]:
joursferiesFR

In [None]:
# Export to a CSV file
joursferiesFR.to_csv(r'exportjoursferies.csv', index = False)

In [None]:
# Export to an Excel file
joursferiesFR.to_excel('exportjoursferies.xlsx')  

In [None]:
%ls exportjoursferies.* -l

In [None]:
# Display the exported CSV file
with open('exportjoursferies.csv', 'r') as f:
    print(f.read())

<img src="https://github.com/retkowsky/images/blob/master/Powered-by-MS-Azure-logo-v2.png?raw=true" height="300" width="300">