# Modeling the return rate of marketing campaigns with AutoML

<img src="https://github.com/retkowsky/images/blob/master/AzureMLservicebanniere.png?raw=true">

> Author: Serge Retkowsky Microsoft<br>
> Date: 03-Sept-2020

## Description
We want to use machine learning to predict whether a customer will respond to a marketing campaign (the campaign return rate).
Azure ML's AutoML will automatically search for the best model.

## Objectives
**Automated machine learning**, also referred to as automated ML or AutoML, is the process of automating the time-consuming, iterative tasks of machine learning model development. It allows data scientists, analysts, and developers to build ML models with high scale, efficiency, and productivity, all while sustaining model quality. Automated ML is based on a breakthrough from our Microsoft Research division.

## Steps
We will access the dataset and apply the AutoML process to it. AutoML will find the best pipeline, which we can then register as our model.

Configuration options available in automated machine learning:

- Select your experiment type: Classification, Regression, or Time Series Forecasting
- Data source, formats, and fetch data
- Choose your compute target: local or remote
- Automated machine learning experiment settings
- Run an automated machine learning experiment
- Explore model metrics
- Register and deploy model

If you prefer a no-code experience, you can also create your automated machine learning experiments in Azure Machine Learning studio.


<img src="https://github.com/retkowsky/images/blob/master/autoML.png?raw=true">

## 0. Settings

```python
import sys
print('You are using Python', sys.version)
```

```
You are using Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) 
[GCC 7.3.0]
```


```python
import datetime
now = datetime.datetime.now()
print('Today is', now)
```

```
Today is 2020-09-03 11:05:22.944741
```


```python
import azureml.core
print("You are using Azure ML", azureml.core.VERSION)
```

```
You are using Azure ML 1.13.0
```


```python
import logging
import os

import pandas as pd

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
```

```python
import os
from azureml.core import Workspace

# Read the workspace coordinates from environment variables, falling back to the defaults below
subscription_id = os.environ.get("SUBSCRIPTION_ID", "70b8f39e-8863-49f7-b6ba-34a80799550c")
resource_group = os.environ.get("RESOURCE_GROUP", "azuremlsynapse-rg")
workspace_name = os.environ.get("WORKSPACE_NAME", "azuremlsynapse")

try:
    ws = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name)
    # Save config.json locally so that later cells can call Workspace.from_config()
    ws.write_config()
    print("OK")
except Exception as e:
    print("Error: Workspace not found:", e)
```

```
OK
```
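The cell above falls back to a hardcoded subscription ID when the environment variables are not set, which makes it easy to connect to the wrong workspace by accident. Below is a minimal alternative sketch in plain Python that requires all three variables and reports every missing one at once. The helper `read_workspace_settings` is hypothetical, and the values in the usage example are placeholders, not real IDs.

```python
import os

REQUIRED_VARS = ("SUBSCRIPTION_ID", "RESOURCE_GROUP", "WORKSPACE_NAME")

def read_workspace_settings(env=None):
    """Hypothetical helper: return the workspace coordinates, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}

# Usage with an explicit mapping instead of os.environ (placeholder values)
settings = read_workspace_settings({
    "SUBSCRIPTION_ID": "00000000-0000-0000-0000-000000000000",
    "RESOURCE_GROUP": "my-rg",
    "WORKSPACE_NAME": "my-workspace",
})
print(settings["WORKSPACE_NAME"])  # my-workspace
```

The returned values could then be passed to `Workspace(subscription_id=..., resource_group=..., workspace_name=...)` as in the cell above.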


## 1. Azure ML experimentation


```python
# Reload the workspace from the config.json written above
ws = Workspace.from_config()

experiment_name = 'AutoMLMarketing'
experiment = Experiment(ws, experiment_name)

# Summarize the workspace and the experiment in a small table
output = {}
output['Workspace'] = ws.name
output['Resource Group'] = ws.resource_group
output['Location'] = ws.location
output['Experiment Name'] = experiment.name
pd.set_option('display.max_colwidth', None)  # None = no truncation (-1 is deprecated since pandas 1.0)
outputDf = pd.DataFrame(data=output, index=[''])
outputDf.T
```

| Setting | Value |
|---|---|
| Workspace | azuremlsynapse |
| Resource Group | azuremlsynapse-rg |
| Location | westeurope |
| Experiment Name | AutoMLMarketing |


## 2. Data Access

```python
from azureml.core import Dataset

# Load the registered tabular dataset 'CampagnesMKT' from the workspace and convert it to pandas
dataset = Dataset.get_by_name(ws, name='CampagnesMKT')
data = dataset.to_pandas_dataframe()
```

```python
data.head()
```

| | CodeClient | Retour | PointsFid | StatutMarital | Nombre_enfant | CanalWeb | MagasinMontparnasse | MagasinWagram | CarteFidGOLD | Segment | ... | CreationClient | Email | Twitter | Facebook | Klout | AppMobile | Magasin | Newsletter | AgeClient | AncienneteClientJours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4094343 | Non | 19416.8 | Oui | 0.0 | Non | Oui | Oui | Non | AMBASSADEUR | ... | 2016-09-15 | GAELLE_OJEDA@outlook.fr | | GAELLE.OJEDA@facebook.com | 17.0 | 0.0 | Montparnasse | 0.0 | 31.08 | 1420.0 |
| 1 | 4096192 | Non | 23638.1 | Oui | 2.0 | Oui | Oui | Oui | Non | STAR | ... | 2016-07-05 | Dominique.BOSSO@gmail.com | http://twitter.com/BOSSO | Dominique.BOSSO@facebook.com | 6.0 | 0.0 | Montparnasse | 0.0 | 33.24 | 1492.0 |
| 2 | 4101924 | Oui | 42378.2 | Oui | 1.0 | Non | Oui | Oui | Non | ABCISTES | ... | 2015-12-14 | MOSER@yahoo.fr | | Coline.MOSER@facebook.com | 148.0 | 1.0 | Montparnasse | 0.0 | 27.5 | 1696.0 |
| 3 | 4101928 | Non | 39745.3 | Oui | 0.0 | Oui | Oui | Non | Oui | STAR | ... | 2016-09-15 | CLAUDE.LEBOUVIER@gmail.com | | LEBOUVIER@facebook.fr | 43.0 | 0.0 | Montparnasse | 0.0 | 39.95 | 1420.0 |
| 4 | 4103571 | Non | 45189.8 | Oui | 0.0 | Non | Oui | Non | Non | AMBASSADEUR | ... | 2015-10-27 | SONNERY COTTET@yahoo.fr | http://twitter.com/SONNERY COTTET | Denise.SONNERY COTTET@facebook.com | 44.0 | 0.0 | Montparnasse | 0.0 | 45.82 | 1744.0 |


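```python
# Drop the customer identifier and the personal/location columns (name, address, email, birth date, town)
df = data.drop(['CodeClient', 'Adresse', 'Email', 'Nom', 'Prenom', 'CodePostal', 'Commune', 'DateNaissance'], axis=1)
```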

### Statistics

```python
df.describe()
```

| | PointsFid | Nombre_enfant | Klout | AppMobile | Newsletter | AgeClient | AncienneteClientJours |
|---|---|---|---|---|---|---|---|
| count | 900.0 | 900.0 | 818.0 | 818.0 | 818.0 | 818.0 | 818.0 |
| mean | 27210.23 | 0.97 | 31.41 | 0.1 | 0.09 | 35.43 | 1661.31 |
| std | 13384.17 | 1.04 | 29.54 | 0.3 | 0.29 | 8.65 | 210.83 |
| min | 137.2 | 0.0 | 0.0 | 0.0 | 0.0 | 20.6 | 1312.0 |
| 25% | 17074.23 | 0.0 | 11.0 | 0.0 | 0.0 | 28.6 | 1480.0 |
| 50% | 25252.85 | 1.0 | 24.0 | 0.0 | 0.0 | 34.47 | 1660.0 |
| 75% | 36610.85 | 2.0 | 41.0 | 0.0 | 0.0 | 41.51 | 1852.0 |
| max | 61554.6 | 3.0 | 197.0 | 1.0 | 1.0 | 62.89 | 2020.0 |
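The `count` row shows 900 non-null values for `PointsFid` and `Nombre_enfant` but only 818 for the five other numeric columns, so those columns contain missing values. The arithmetic below uses the counts from the table and assumes the dataset has 900 rows (the largest count shown). In the notebook itself, `df.isna().sum()` returns the per-column missing counts directly.

```python
# Non-null counts copied from the df.describe() output above
counts = {
    "PointsFid": 900, "Nombre_enfant": 900, "Klout": 818, "AppMobile": 818,
    "Newsletter": 818, "AgeClient": 818, "AncienneteClientJours": 818,
}
n_rows = 900  # assumption: the largest count equals the number of rows

# Missing values per column and the corresponding share of rows
missing = {col: n_rows - n for col, n in counts.items()}
missing_rate = {col: m / n_rows for col, m in missing.items()}

for col in counts:
    print(f"{col:<22} missing={missing[col]:>3}  rate={missing_rate[col]:.1%}")
```

Under that assumption, about 9.1% of the rows lack a value in each of those five columns. According to the AutoML featurization documentation, missing numeric values are imputed automatically, so these rows do not have to be dropped before training.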


### Correlations

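```python
# Correlations between the numeric columns only (text columns such as 'Retour' are excluded)
df.select_dtypes('number').corr()
```

| | PointsFid | Nombre_enfant | Klout | AppMobile | Newsletter | AgeClient | AncienneteClientJours |
|---|---|---|---|---|---|---|---|
| PointsFid | 1.0 | 0.03 | 0.02 | 0.05 | -0.04 | -0.02 | 0.03 |
| Nombre_enfant | 0.03 | 1.0 | 0.02 | -0.03 | 0.05 | 0.01 | 0.07 |
| Klout | 0.02 | 0.02 | 1.0 | 0.05 | -0.03 | 0.05 | 0.04 |
| AppMobile | 0.05 | -0.03 | 0.05 | 1.0 | 0.03 | -0.0 | 0.0 |
| Newsletter | -0.04 | 0.05 | -0.03 | 0.03 | 1.0 | 0.03 | 0.01 |
| AgeClient | -0.02 | 0.01 | 0.05 | -0.0 | 0.03 | 1.0 | 0.01 |
| AncienneteClientJours | 0.03 | 0.07 | 0.04 | 0.0 | 0.01 | 0.01 | 1.0 |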
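`corr()` computes the Pearson correlation coefficient by default. Every off-diagonal value in the table lies between -0.04 and 0.07, so the numeric features are nearly uncorrelated with one another. Here is a stdlib-only sketch of the coefficient itself (covariance divided by the product of the standard deviations), run on toy values rather than on the dataset. Unlike pandas, this sketch does not skip missing values.

```python
import math

def pearson(x, y):
    """Pearson correlation: sum of centered products over the root of the product of squared deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    # A constant column has zero variance: the coefficient is undefined (pandas reports NaN)
    return cov / denom if denom else float("nan")

# Toy values (not from the dataset): y rises linearly with x, z falls linearly
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
z = [10, 8, 6, 4, 2]
print(pearson(x, y))  # 1.0
print(pearson(x, z))  # -1.0
```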


## 3. AutoML

<img src="https://github.com/retkowsky/images/blob/master/autoML4.png?raw=true">

> AutoML documentation: https://docs.microsoft.com/en-us/azure/machine-learning/concept-automated-ml

### We want to predict this variable: `Retour` (whether the customer responded to the campaign, `Oui` = yes, `Non` = no)

```python
label_column_name = 'Retour'
```

### Let's define the AutoML process

```python
automl_settings = {
    "n_cross_validations": 5,           # Number of cross-validation folds
    "primary_metric": 'AUC_weighted',   # Metric used to rank the candidate models
    "enable_early_stopping": True,      # Stop early if the score stops improving
    "max_concurrent_iterations": 2,     # Number of iterations run in parallel
    "iterations": 10,                   # Maximum number of pipelines to try
    "experiment_timeout_minutes": 15,   # Time budget for the whole experiment, in minutes
    "iteration_timeout_minutes": 2,     # Time limit for each iteration, in minutes
    "verbosity": logging.INFO,          # Logging level written to the debug log
}

automl_config = AutoMLConfig(task='classification',                # This is a classification problem
                             debug_log='automl.log',               # AutoML log file
                             training_data=df,                     # Training data (pandas DataFrame)
                             label_column_name=label_column_name,  # Column to predict
                             **automl_settings
                            )
```
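The primary metric `AUC_weighted` averages the one-vs-rest ROC AUC of each class, weighted by the number of samples in that class. For a two-class label such as `Retour`, both one-vs-rest AUCs are identical (scoring `Non` with 1 - P(`Oui`) preserves the ranking of every pair), so `AUC_weighted` reduces to the ordinary AUC. The sketch below computes AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counting one half. The probabilities are toy values, not outputs of the AutoML model.

```python
from itertools import product

def auc(labels, scores, positive):
    """ROC AUC as P(score of a positive > score of a negative), ties counting 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

def auc_weighted(labels, proba_by_class):
    """One-vs-rest AUC per class, averaged with weights equal to class support."""
    total = len(labels)
    result = 0.0
    for cls, scores in proba_by_class.items():
        support = sum(1 for y in labels if y == cls)
        result += support / total * auc(labels, scores, positive=cls)
    return result

# Toy example: predicted probability of 'Oui' for six customers
labels = ["Oui", "Non", "Oui", "Non", "Non", "Oui"]
p_oui = [0.9, 0.3, 0.6, 0.7, 0.1, 0.8]
p_non = [1 - p for p in p_oui]

print(auc(labels, p_oui, positive="Oui"))
print(auc_weighted(labels, {"Oui": p_oui, "Non": p_non}))
```

Both calls return 8/9 here: 8 of the 9 (`Oui`, `Non`) pairs are ranked correctly, whichever class is treated as positive.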

### Let's run the AutoML experiment

```python
# Submit the experiment; with no compute target set, AutoML runs on the local machine
run = experiment.submit(automl_config, show_output=True)
```

```
Running on local machine
Parent Run ID: AutoML_75dcbf87-da30-4631-a5be-67702c814620

Current status: DatasetEvaluation. Gathering dataset statistics.
Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetFeaturization. Beginning to fit featurizers and featurize the dataset.
Current status: DatasetFeaturizationCompleted. Completed fit featurizers and featurizing the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalancedData

****************************************************************************************************

TYPE:         Missing f
```
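The class-balancing guardrail checks whether one value of the label is much rarer than the others. Below is a minimal sketch of such a check with `collections.Counter`. The 0.2 ratio threshold and the toy labels are assumptions for illustration, not the rule AutoML applies. In the notebook, `df['Retour'].value_counts(normalize=True)` shows the actual class shares.

```python
from collections import Counter

def class_proportions(labels):
    """Share of each class in the label column."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

def looks_imbalanced(labels, min_ratio=0.2):
    """Flag the label as imbalanced if the smallest class is below min_ratio of the largest.
    The 0.2 threshold is an illustrative assumption, not AutoML's documented rule."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values()) < min_ratio

toy_labels = ["Oui"] * 45 + ["Non"] * 55
print(class_proportions(toy_labels))                 # {'Oui': 0.45, 'Non': 0.55}
print(looks_imbalanced(toy_labels))                  # False
print(looks_imbalanced(["Oui"] * 5 + ["Non"] * 95))  # True
```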

```python
run
```

| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| AutoMLMarketing | AutoML_0de558c3-a08d-4c8d-abef-02192ae96848 | automl | Completed | Link to Azure Machine Learning studio | Link to Documentation |


```python
# Interactive widget listing the child runs and their scores (renders in Jupyter)
from azureml.widgets import RunDetails
RunDetails(run).show()
```


### The trained models are available in the experiment
<img src='https://github.com/retkowsky/images/blob/master/automlsnapshot1.jpg?raw=true'>

## 4. Let's retrieve the best model

```python
# Retrieve the best child run and its fitted pipeline, ranked by the primary metric
best_run, fitted_model = run.get_output()
```

```python
print("Best model is:", fitted_model)
```

```
Best model is: Pipeline(memory=None,
     steps=[('datatransformer', DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
        feature_sweeping_config=None, feature_sweeping_timeout=None,
        featurization_config=None, force_text_dnn=None,
        is_cross_validation=None, is_onnx_compatible=None, logger=None,
        obser...    silent=True, subsample=1.0, subsample_for_bin=200000,
          subsample_freq=0, verbose=-10))])
Y_transformer(['LabelEncoder', LabelEncoder()])
```


### List of metrics for the best model

```python
best_run_metrics = best_run.get_metrics()
# Print every metric logged by the best child run
for metric_name, metric in best_run_metrics.items():
    print('-', metric_name, '=', metric)
```

```
- balanced_accuracy = 0.8959959594216371
- AUC_micro = 0.9566790123456791
- average_precision_score_micro = 0.9557387736196119
- precision_score_weighted = 0.9023433110736171
- precision_score_macro = 0.9033361776395832
- norm_macro_recall = 0.7919919188432745
- f1_score_micro = 0.9000000000000001
- AUC_weighted = 0.9544953828630774
- recall_score_macro = 0.8959959594216371
- average_precision_score_macro = 0.9538750636635275
- matthews_correlation = 0.7992586318533791
- recall_score_weighted = 0.9
- recall_score_micro = 0.9
- confusion_matrix = aml://artifactId/ExperimentRun/dcid.AutoML_75dcbf87-da30-4631-a5be-67702c814620_0/confusion_matrix
- precision_score_micro = 0.9
- accuracy = 0.9
- f1_score_macro = 0.8979699223839818
- f1_score_weighted = 0.8995292072006347
- weighted_accuracy = 0.9038109174599521
- log_loss = 0.27286860395442303
- average_precision_score_weighted = 0.95431203818766
- accuracy_table = aml://artifactId/ExperimentRun/dcid.AutoML_75dcbf87-da30-4631-a5be-67702c814
```
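Several of these metrics are tied together. Balanced accuracy is the macro-averaged recall, so `balanced_accuracy` equals `recall_score_macro`. `norm_macro_recall` rescales macro recall R so that random guessing scores 0 and a perfect model scores 1, using (R - 1/C) / (1 - 1/C) for C classes. For a single-label problem, the micro-averaged precision, recall and F1 all equal accuracy. A quick consistency check on the values printed above:

```python
import math

# Values copied from the metrics printed above
recall_score_macro = 0.8959959594216371
balanced_accuracy = 0.8959959594216371
norm_macro_recall = 0.7919919188432745
accuracy = 0.9
f1_score_micro = 0.9000000000000001

n_classes = 2                   # 'Oui' and 'Non'
chance_recall = 1 / n_classes   # expected macro recall of random predictions

# Normalized macro recall: 0 for random guessing, 1 for a perfect model
recomputed = (recall_score_macro - chance_recall) / (1 - chance_recall)

print(recomputed)
print(math.isclose(recomputed, norm_macro_recall))          # True
print(math.isclose(balanced_accuracy, recall_score_macro))  # True
print(math.isclose(f1_score_micro, accuracy))               # True
```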

### Let's register the best model

```python
from azureml.core import Model

# Register the pipeline saved by the best child run, with descriptive tags
# and its key metrics stored as properties
best_run.register_model(model_path='outputs/model.pkl',
                        model_name='MarketingCampaignModel',
                        tags={'Training context': 'Azure Auto ML sdk',
                              'Objet': 'Marketing Campaign ML prediction model',
                              'Team': 'Marketing'},
                        properties={'AUC': best_run_metrics['AUC_weighted'],
                                    'Log Loss': best_run_metrics['log_loss'],
                                    'Recall': best_run_metrics['recall_score_weighted'],
                                    'Precision': best_run_metrics['precision_score_weighted'],
                                    'F1': best_run_metrics['f1_score_weighted'],
                                    }
                        )
```

```
Model(workspace=Workspace.create(name='azuremlsynapse', subscription_id='70b8f39e-8863-49f7-b6ba-34a80799550c', resource_group='azuremlsynapse-rg'), name=MarketingCampaignModel, id=MarketingCampaignModel:4, version=4, tags={'Training context': 'Azure Auto ML sdk', 'Objet': 'Marketing Campaign ML prediction model', 'Team': 'Marketing'}, properties={'AUC': '0.9544953828630774', 'Log Loss': '0.27286860395442303', 'Recall': '0.9', 'Precision': '0.9023433110736171', 'F1': '0.8995292072006347'})
```

### We can list all the models registered in the Azure ML model registry

```python
from azureml.core import Model

# List every registered model version with its tags and properties
for model in Model.list(ws):
    print(model.name, '- version =', model.version)
    for tag_name, tag in model.tags.items():
        print('\t', tag_name, ':', tag)
    for prop_name, prop in model.properties.items():
        print('\t', prop_name, ':', prop)
    print('\n')
```

```
MarketingCampaignModel - version = 4
	 Training context : Azure Auto ML sdk
	 Objet : Marketing Campaign ML prediction model
	 Team : Marketing
	 AUC : 0.9544953828630774
	 Log Loss : 0.27286860395442303
	 Recall : 0.9
	 Precision : 0.9023433110736171
	 F1 : 0.8995292072006347


MarketingCampaignModel - version = 3
	 Training context : Azure Auto ML sdk
	 Objet : Marketing Campaign ML prediction model
	 Team : Marketing
	 AUC : 0.9546916508256732
	 Log Loss : 0.26587787285006387
	 Recall : 0.9011111111111113
	 Precision : 0.9033759666421599
	 F1 : 0.9006485931026293


MarketingCampaignModel - version = 2
	 Training context : Azure Auto ML sdk
	 Objet : Marketing Campaign ML prediction model
	 Team : Marketing
	 AUC : 0.9544953828630774
	 Log Loss : 0.27286860395442303
	 Recall : 0.9
	 Precision : 0.9023433110736171
	 F1 : 0.8995292072006347


MarketingCampaignModel - version = 1
	 Training context : Azure Auto ML sdk
	 Objet : Marketing Campaign ML prediction model
	 Team : Marketing
```
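The registry stores property values as strings (the `register_model` output shows `'AUC': '0.9544953828630774'`), so they must be converted to numbers before versions can be compared. Below is a minimal sketch in plain Python that works on the values listed above rather than on live `Model` objects. The helper `best_version` is hypothetical. In the notebook, the same logic could loop over `Model.list(ws, name='MarketingCampaignModel')` and read each `model.version` and `model.properties`.

```python
# Version -> properties, copied from the listing above; version 1 has no stored metrics
registered = {
    4: {"AUC": "0.9544953828630774", "Log Loss": "0.27286860395442303"},
    3: {"AUC": "0.9546916508256732", "Log Loss": "0.26587787285006387"},
    2: {"AUC": "0.9544953828630774", "Log Loss": "0.27286860395442303"},
    1: {},
}

def best_version(models, metric="AUC", higher_is_better=True):
    """Hypothetical helper: version with the best value of `metric`, skipping versions without it."""
    scored = {v: float(p[metric]) for v, p in models.items() if metric in p}
    if not scored:
        return None
    pick = max if higher_is_better else min
    return pick(scored, key=scored.get)

print(best_version(registered))                                      # 3
print(best_version(registered, "Log Loss", higher_is_better=False))  # 3
```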
	

### The models are available in the Azure ML model registry
<img src='https://github.com/retkowsky/images/blob/master/automlsnapshot3.jpg?raw=true'>

### Details
<img src='https://github.com/retkowsky/images/blob/master/automlsnapshot2.jpg?raw=true'>

> End of notebook