**Important: Do not use in production. For demonstration purposes only. Please review the legal notices before continuing.**

# Customer Churn Modeling Using AutoML

Customer churn is the proportion of customers who stop using your product or service over a given period. This notebook builds a churn prediction model for a retail scenario using automated machine learning (AutoML) in Azure Machine Learning.
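To make the definition concrete, here is a minimal sketch of the churn calculation. The function name `churn_rate` and the customer counts are hypothetical and are not taken from the notebook's data.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return the fraction of customers lost during a period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return customers_lost / customers_at_start


# Hypothetical figures: 2,000 customers at the start of the period, 150 of them left.
rate = churn_rate(customers_at_start=2000, customers_lost=150)
print(f"Churn rate: {rate:.1%}")
```

The model built in this notebook works at the level of individual records rather than this aggregate figure, but the aggregate is usually what a business tracks.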


![Picture](https://stretailprod.blob.core.windows.net/notebookimages/customer_churn.jpg?sp=r&st=2022-02-24T21:05:18Z&se=2024-02-25T05:05:18Z&sv=2020-08-04&sr=b&sig=ijbMsd7bZ%2F0ia9z3RiUIATi3qN6qfxryQaYfh07DOII%3D)

### Importing libraries

In [1]:
import azureml.core
from azureml.core import Experiment, Workspace, Dataset, Datastore
from azureml.train.automl import AutoMLConfig
from azureml.data.dataset_factory import TabularDatasetFactory

In [2]:
from pyspark.sql import SparkSession
import matplotlib.pyplot as plt 
import seaborn as sns
import numpy as np
from azure.storage.blob import ContainerClient, BlobClient
import pandas as pd
from io import BytesIO
from copy import deepcopy
import GlobalVariables as gv

### Reading data

In [3]:
# Download the churn CSV from Blob Storage and load it into a DataFrame
blob = BlobClient.from_connection_string(conn_str=gv.CustomerChurnCONNECTIONSTRING, container_name=gv.CustomerChurnCONTAINER_NAME, blob_name=gv.CustomerChurnBLOBNAME)
blob_data = blob.download_blob()
# Read the blob contents once and wrap them in an in-memory buffer for pandas
df = pd.read_csv(BytesIO(blob_data.readall()))
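
The cell above wraps the downloaded bytes in `BytesIO` so that a CSV parser can treat them as a file. The sketch below shows the same idea without any Azure dependency. It uses the standard-library `csv` module in place of pandas, and the sample bytes are a hypothetical stand-in for the blob payload.

```python
import csv
from io import BytesIO, TextIOWrapper

# Hypothetical payload standing in for the bytes returned by the blob download.
raw_bytes = b"Segment,Quantity,Price\n1,6,2.55\n0,126,4.15\n"

# BytesIO exposes the bytes through a file-like interface;
# TextIOWrapper decodes them to text for the CSV reader.
buffer = BytesIO(raw_bytes)
reader = csv.DictReader(TextIOWrapper(buffer, encoding="utf-8", newline=""))
rows = list(reader)

print(len(rows), rows[0]["Price"])
```

Nothing is written to disk in this pattern, which suits notebook environments where local storage is temporary.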

### Exploratory data analysis (EDA)

In [4]:
# Drop the first column and keep everything from Segment onward
df = df.iloc[:, 1:]

In [5]:
df

Unnamed: 0,Segment,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Country
0,1,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,0
1,1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,0
2,1,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,0
3,1,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,0
4,1,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,0
...,...,...,...,...,...,...,...,...
354340,0,581457,23401,RUSTIC MIRROR WITH LACE HEART,126,2011-12-08 18:43:00,4.15,0
354341,0,581566,23404,HOME SWEET HOME BLACKBOARD,144,2011-12-09 11:50:00,3.26,0
354342,0,553573,22980,PANTRY SCRUBBING BRUSH,1,2011-05-18 09:52:00,1.65,0
354343,0,553573,22982,PANTRY PASTRY BRUSH,1,2011-05-18 09:52:00,1.25,0


In [6]:
# All columns in the data
df.columns

Index(['Segment', 'Invoice', 'StockCode', 'Description', 'Quantity',
       'InvoiceDate', 'Price', 'Country'],
      dtype='object')

In [7]:
# Selecting specific columns for our model
df = df[['Segment','Quantity','Price','StockCode']]

### Configuring workspace

In [8]:
# Setting up experiment
experiment_name = "syndreamdemoretailprod-CustomerChurnData-20211231061227"
ws = Workspace.get(name=gv.workspace_name, subscription_id=gv.subscription_id, resource_group=gv.resource_group)
experiment = Experiment(ws, experiment_name)
datastore = Datastore.get_default(ws)


### Creating the dataset for Azure Machine Learning

In [9]:
dataset = TabularDatasetFactory.register_pandas_dataframe(df, datastore, name = experiment_name + "-dataset")

Validating arguments.
Arguments validated.
Successfully obtained datastore reference and path.
Uploading file to managed-dataset/46b36ecb-046d-4b3b-afd4-29978bb37162/
Successfully uploaded file to datastore.
Creating and registering a new dataset.
Successfully created and registered a new dataset.


### Model configuration

In [10]:
# Initializing AutoML Config
automl_config = AutoMLConfig(task = "classification",
                             training_data = df,
                             label_column_name = "Segment",          # target column to predict
                             primary_metric = "accuracy",            # metric used to rank candidate models
                             experiment_timeout_hours = 0.25,        # 15 minutes
                             max_concurrent_iterations = 2,
                             enable_onnx_compatible_models = False)
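
Because the configuration above ranks models by accuracy, it helps to know what accuracy a trivial model would reach by always predicting the most common class. The sketch below computes that baseline. The helper name `majority_class_baseline` and the sample labels are hypothetical and are not part of the notebook or of the AutoML SDK.

```python
from collections import Counter


def majority_class_baseline(labels):
    """Return the accuracy of always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(labels)


# Hypothetical labels: seven records in segment 1, three in segment 0.
sample_labels = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1]
baseline = majority_class_baseline(sample_labels)
print(f"Majority-class baseline accuracy: {baseline:.2f}")
```

In the notebook, the same check could be run as `majority_class_baseline(df["Segment"].tolist())`. A model whose accuracy barely exceeds this value has learned little beyond the class distribution.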

### Submitting the experiment

In [12]:
# Running AutoML
run = experiment.submit(automl_config)

2022-02-23:14:42:24,177 INFO     [modeling_bert.py:226] Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
2022-02-23:14:42:24,191 INFO     [modeling_xlnet.py:339] Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
2022-02-23:14:42:35,36 INFO     [utils.py:159] NumExpr defaulting to 2 threads.


Experiment,Id,Type,Status,Details Page,Docs Page
syndreamdemoretailprod-CustomerChurnData-20211231061227,AutoML_bed650f3-e741-4b42-bd5f-8c2e5ed5423d,automl,Preparing,Link to Azure Machine Learning studio,Link to Documentation


2022-02-23:15:07:56,868 INFO     [explanation_client.py:332] Using default datastore for uploads


### Registering the best model using MLflow

In [13]:
# Wait for the AutoML run to finish before retrieving its output
run.wait_for_completion()

import mlflow

# Get the best run and its fitted model from the AutoML run
best_run, non_onnx_model = run.get_output()

artifact_path = experiment_name + "_artifact"

mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
mlflow.set_experiment(experiment_name)

# Use a separate name so the AutoML `run` object is not overwritten
with mlflow.start_run() as mlflow_run:
    # Log the model as an artifact of the MLflow run
    mlflow.sklearn.log_model(non_onnx_model, artifact_path)

    # Register the logged model in the Azure Machine Learning model registry
    mlflow.register_model("runs:/" + mlflow_run.info.run_id + "/" + artifact_path, "synretailprod-AdobeAnalytics_AdobeAnalyticsWebsiteContacts-20220113085215-Best")

2022-02-23:15:08:30,832 INFO     [utils.py:117] Parsing artifact uri azureml://experiments/syndreamdemoretailprod-CustomerChurnData-20211231061227/runs/1b5cb6a8-643d-45d8-ac72-90cac9289807/artifacts
2022-02-23:15:08:30,834 INFO     [utils.py:128] Artifact uri azureml://experiments/syndreamdemoretailprod-CustomerChurnData-20211231061227/runs/1b5cb6a8-643d-45d8-ac72-90cac9289807/artifacts info: {'experiment': 'syndreamdemoretailprod-CustomerChurnData-20211231061227', 'runid': '1b5cb6a8-643d-45d8-ac72-90cac9289807'}
Successfully registered model 'synretailprod-AdobeAnalytics_AdobeAnalyticsWebsiteContacts-20220113085215-Best'.
2022/02/23 15:08:33 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: synretailprod-AdobeAnalytics_AdobeAnalyticsWebsiteContacts-20220113085215-Best, version 1
Created version '1' of model 'synretailprod-AdobeAnalytics_AdobeAnalyticsWebsiteContacts-20220113085215-Best'.
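
The registration call above identifies the logged model with a URI of the form `runs:/<run_id>/<artifact_path>`. The sketch below isolates that string construction so it can be checked without an MLflow server. The helper `build_runs_uri` is hypothetical and not part of MLflow, and the run ID in the example is made up.

```python
def build_runs_uri(run_id: str, artifact_path: str) -> str:
    """Build a 'runs:/<run_id>/<artifact_path>' model URI."""
    if not run_id:
        raise ValueError("run_id must not be empty")
    # Strip stray slashes so the URI has exactly one separator between parts.
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


# Hypothetical run ID and artifact path, for illustration only.
uri = build_runs_uri("abc123", "churn_artifact")
print(uri)
```

The artifact path in the URI has to match the one passed to `log_model`, otherwise registration cannot find the model files.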
