# <center>OpenScale Tutorial</center>

This notebook uses Python and modern machine learning libraries to predict telco customer churn from an existing dataset stored in IBM Cloud Object Storage (COS). Because CRISP-DM is the industry-accepted methodology for predictive and statistical problems, the notebook follows the same approach.

<h2>Prerequisites</h2>
<ol>
<li>IBM Watson Studio</li>
<li>Watson Machine Learning</li>
<li>IBM OpenScale</li>
</ol>

# 1. Import Data Using IBM COS
This section walks through importing the dataset, originally uploaded as a CSV file, into the notebook.

<p><b>1.1</b> To load data into Watson Studio, select the IO button in the top-right corner of the notebook.</p>
<p><img src="https://i.imgur.com/dQ1Uf3U.png" align="left"/></p><br/>

<p><b>1.2</b> Select Browse to load your data.</p><br/>
<p><img src="https://i.imgur.com/DPg0KhF.png" align="left" width=800 display="inline"/></p>

<p><b>1.3</b> Choose Insert pandas DataFrame to insert a new code cell that loads the data.</p><br/>
<p><img src="https://i.imgur.com/7Air7sc.png" align="left" style="height:250px" display="inline"/></p>

In [None]:
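# Import the CSV from IBM COS using the steps above.
# The inserted cell creates a pandas DataFrame; rename that variable to df
# so the rest of the notebook can refer to it.
import pandas as pd

df.head()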

# 2. Data Understanding
This section covers key parts of the data science process, including reviewing data quality and computing summary statistics.

In [None]:
# View length of data frame
len(df)

In [None]:
# Check columns
df.columns

In [None]:
# Check datatypes
df.dtypes

In [None]:
# View unique values within each column
for col in df.columns:
    print(col, len(df[col].unique()), df[col].unique())

In [None]:
# Calculate summary statistics for numerical variables
df.describe()

In [None]:
# Calculate summary statistics for categorical variables
df.describe(include=['object'])

In [None]:
# Find categorical (object) columns with exactly two unique values, i.e. binary columns
less_than_2 = []
for col in df.columns:
    if len(df[col].unique()) == 2 and df[col].dtype=='object':
        print(col)
        less_than_2.append(col)

In [None]:
# Calculate number of missing values
df.isnull().sum()
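The per-column checks above (data type, unique values, missing values) can also be collected into a single overview table. A minimal sketch, assuming a pandas DataFrame; the helper name `column_profile` and the toy columns are hypothetical. `nunique(dropna=False)` counts `NaN` as a value, just as `len(df[col].unique())` does.

```python
import pandas as pd


def column_profile(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, unique count and missing count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_unique": frame.nunique(dropna=False),  # NaN counted as a value
        "n_missing": frame.isnull().sum(),
    })


# Hypothetical toy data standing in for the churn dataset
toy = pd.DataFrame({
    "gender": ["Female", "Male", "Female"],
    "tenure": [1, 34, None],
})
profile = column_profile(toy)
print(profile)
```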

In [None]:
# Create a pairplot of the numeric columns
# (sns.pairplot creates its own figure, so no plt.figure call is needed)
import seaborn as sns
from matplotlib import pyplot as plt
sns.pairplot(df)
plt.show()

# 3. Data Preparation
Next, prepare the data for modelling by encoding binary columns as 0/1, one-hot encoding the remaining categorical features, and fixing problematic data types.

In [None]:
# Encode binary (two-value) columns as 0/1:
# gender becomes 1 for 'Female'; every other binary column becomes 1 for 'Yes'
for col in less_than_2:
    if col == 'gender':
        df[col] = df[col].apply(lambda x: 1 if x == 'Female' else 0)
    else:
        df[col] = df[col].apply(lambda x: 1 if x == 'Yes' else 0)
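The loop above maps every value other than `'Yes'` (or `'Female'` for `gender`) to 0, so an unexpected value such as a typo or a blank would silently become 0. A stricter alternative, sketched here under the assumption that each binary column holds exactly the two expected labels, uses `Series.map` with an explicit dictionary and fails loudly on anything else. `encode_binary` is a hypothetical helper name.

```python
import pandas as pd


def encode_binary(series: pd.Series, positive: str, negative: str) -> pd.Series:
    """Map a two-valued text column to 1/0, raising if any other value appears."""
    encoded = series.map({positive: 1, negative: 0})  # unmapped values become NaN
    if encoded.isnull().any():
        unexpected = series[encoded.isnull()].unique().tolist()
        raise ValueError(f"{series.name}: unexpected values {unexpected}")
    return encoded.astype(int)


# Hypothetical example column
churn = pd.Series(["Yes", "No", "No"], name="Churn")
print(encode_binary(churn, positive="Yes", negative="No").tolist())  # [1, 0, 0]
```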

In [None]:
# Inspect rows where Total Charges is a blank string
print(df[df['Total Charges'] == ' '].head())
# Replace blank strings with 0 so the column can be converted to float
df['Total Charges'] = df['Total Charges'].replace(' ', 0)

In [None]:
df.columns

In [None]:
# Convert Total Charges column to float
df['Total Charges'] = df['Total Charges'].astype(float)
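Replacing the blanks and then casting works, but `pd.to_numeric` with `errors='coerce'` handles both steps at once: anything that cannot be parsed becomes `NaN`, which can then be counted and filled explicitly. A sketch on a hypothetical toy column; filling with 0 mirrors the choice made above.

```python
import pandas as pd

# Hypothetical toy values; the blank string mimics the gaps found in Total Charges
total_charges = pd.Series(["29.85", "1889.5", " "], name="Total Charges")

# Unparseable entries (here the blank string) become NaN instead of raising
numeric = pd.to_numeric(total_charges, errors="coerce")
print(numeric.isnull().sum())  # 1

# Fill the gaps with 0, matching the replacement used in the notebook
numeric = numeric.fillna(0)
print(numeric.tolist())  # [29.85, 1889.5, 0.0]
```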

In [None]:
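# Calculate pairwise correlations between numeric columns
# (numeric_only=True skips text columns, which pandas >= 2.0 would otherwise reject)
df.corr(numeric_only=True)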

In [None]:
# Plot heatmap from correlation
plt.figure(figsize=(9,9))
sns.heatmap(df.corr(numeric_only=True))
plt.show()

In [None]:
# Drop identifier column
df.drop('Customer ID', axis=1, inplace=True)

In [None]:
# One-hot encode the remaining categorical columns and store the result in a new DataFrame called abt
import pandas as pd
abt = pd.get_dummies(df)
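`pd.get_dummies` only expands text and categorical columns; numeric columns, including the 0/1 columns encoded earlier, pass through unchanged. A toy illustration with hypothetical values (the indicator columns are `bool` in pandas 2.x and integer in older versions):

```python
import pandas as pd

# Hypothetical mini-frame: one text column and one numeric column
frame = pd.DataFrame({
    "Contract": ["Month-to-month", "Two year", "Month-to-month"],
    "Tenure": [1, 24, 5],
})

encoded = pd.get_dummies(frame)
# Numeric columns are kept as-is; each text category becomes its own indicator column
print(sorted(encoded.columns))  # ['Contract_Month-to-month', 'Contract_Two year', 'Tenure']
```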

In [None]:
# Create X and y variables
y = abt['Churn']
X = abt.drop('Churn', axis=1)

In [None]:
# Create training and testing splits
random_state=1234
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=random_state)
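Churn labels are often imbalanced, and a purely random split can leave the training and test sets with different churn rates. Passing `stratify=y` to `train_test_split` keeps the class proportions (approximately) equal in both splits. A sketch on hypothetical labels with 20% positives; whether to stratify in the notebook itself is a judgment call, not something the original split does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 20 churners out of 100 customers
y_demo = np.array([1] * 20 + [0] * 80)
X_demo = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1234, stratify=y_demo
)
# Both splits keep roughly the original 20% churn rate
print(y_tr.mean(), y_te.mean())
```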

In [None]:
# Print out training and testing split lengths
print(len(X_train), len(X_test), len(y_train), len(y_test))

# 4. Modelling
Next comes modelling: hyperparameter optimization is used to tune a range of models on the training data. The best-performing model is then selected and stored in Watson Machine Learning (WML).

In [None]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

In [None]:
from sklearn.linear_model import RidgeClassifier, SGDClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

In [None]:
# Create data pipelines
pipelines = {
    'rf': make_pipeline(StandardScaler(), RandomForestClassifier(random_state=random_state)),
    'gb': make_pipeline(StandardScaler(), GradientBoostingClassifier(random_state=random_state)),
    'sgd': make_pipeline(StandardScaler(), SGDClassifier(random_state=random_state)),
    'ridge': make_pipeline(StandardScaler(), RidgeClassifier(random_state=random_state))
}

In [None]:
# Create HPO tuning grid
grid = {
    'rf': {'randomforestclassifier__n_estimators': [10, 20, 30]},
    'gb': {'gradientboostingclassifier__n_estimators': [10, 20, 30]},
    'sgd': {'sgdclassifier__alpha': [0.5, 0.9, 0.99]},
    'ridge': {'ridgeclassifier__alpha': [0.5, 0.9, 0.99]}
}
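The grid keys follow scikit-learn's `<step name>__<parameter>` convention. `make_pipeline` names each step after its class in lowercase, which is why the random forest's `n_estimators` is addressed as `randomforestclassifier__n_estimators`. The valid keys for any pipeline can be checked directly:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=1234))

# Step names are derived automatically from the lowercased class names
print(list(pipe.named_steps))  # ['standardscaler', 'randomforestclassifier']

# Every tunable parameter is exposed as '<step>__<param>'
tunable = [key for key in pipe.get_params() if key.startswith("randomforestclassifier__")]
print("randomforestclassifier__n_estimators" in tunable)  # True
```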

In [None]:
from sklearn.model_selection import GridSearchCV

In [None]:
# Fit models with Cross Validation
fit_models = {}
for algo, pipeline in pipelines.items():
    # 4.1 Create a Grid Search CV instance
    model = GridSearchCV(pipeline, grid[algo], cv=10, n_jobs=-1)
    # 4.2 Fit the model
    model.fit(X_train, y_train)
    # 4.3 Save it to the fit models dictionary
    fit_models[algo] = model
    print(algo, 'model has been fit.')
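After fitting, each `GridSearchCV` object records the winning parameter combination in `best_params_` and its mean cross-validated score in `best_score_`. A self-contained sketch on synthetic data from `make_classification` (a stand-in for the churn features, with 3-fold CV to keep it fast):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the notebook itself uses X_train / y_train
X_syn, y_syn = make_classification(n_samples=200, n_features=5, random_state=1234)

search = GridSearchCV(
    make_pipeline(StandardScaler(), GradientBoostingClassifier(random_state=1234)),
    {"gradientboostingclassifier__n_estimators": [10, 20, 30]},
    cv=3,
)
search.fit(X_syn, y_syn)

# Winning hyperparameters and their mean cross-validated accuracy
print(search.best_params_)
print(round(search.best_score_, 3))
```

Applied to the notebook, printing these two attributes while looping over `fit_models.items()` gives a quick comparison before the test-set evaluation.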

# 5. Evaluation
Evaluate each tuned model on the test set: calculate accuracy and F1 score, and print the confusion matrix.

In [None]:
fit_models['gb'].predict(X_test)

In [None]:
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

In [None]:
# Score every tuned model on the held-out test set
for algo, model in fit_models.items():
    yhat = model.predict(X_test)
    print(algo)
    print('Accuracy:', accuracy_score(y_test, yhat))
    print('F1:', f1_score(y_test, yhat))
    print(confusion_matrix(y_test, yhat))
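For a binary problem both metrics can be read off the confusion matrix, which scikit-learn lays out as `[[TN, FP], [FN, TP]]`: accuracy is `(TP + TN) / N`, and F1 is the harmonic mean of precision `TP / (TP + FP)` and recall `TP / (TP + FN)`. A plain-Python sketch on hypothetical labels; `binary_metrics` is an illustrative name.

```python
def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix counts, accuracy and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tn": tn, "fp": fp, "fn": fn, "tp": tp, "accuracy": accuracy, "f1": f1}


# Hypothetical labels and predictions
result = binary_metrics([0, 0, 1, 1, 1, 0, 1, 0], [0, 1, 1, 0, 1, 0, 1, 0])
print(result)
```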

# 6. Deployment
This section deploys the model to WML. To complete it, you need the service credentials for your Watson Machine Learning instance, which are listed in that service's Service credentials section.

<img src="https://i.imgur.com/0q6dz1g.png" width=800 align="left"/>

In [None]:
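# Use the tuned gradient boosting pipeline; change the key if another model scored better in Section 5
best_model = fit_models['gb'].best_estimator_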
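Hard-coding `'gb'` assumes gradient boosting won the comparison. One way to make the choice explicit, sketched below, is to take the model with the highest mean cross-validated score (`best_score_`), which also avoids choosing a model on the test set. `pick_best` is a hypothetical helper, and the stand-in objects only mimic the `best_score_` attribute that `GridSearchCV` exposes; their scores are placeholder values.

```python
from types import SimpleNamespace


def pick_best(fitted):
    """Return the key of the fitted search with the highest best_score_."""
    return max(fitted, key=lambda algo: fitted[algo].best_score_)


# Placeholder scores standing in for fitted GridSearchCV objects
demo_models = {
    "rf": SimpleNamespace(best_score_=0.70),
    "gb": SimpleNamespace(best_score_=0.90),
    "ridge": SimpleNamespace(best_score_=0.80),
}
print(pick_best(demo_models))  # gb
```

In the notebook this would read `best_model = fit_models[pick_best(fit_models)].best_estimator_`.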

In [None]:
best_model.predict(X_test)

In [None]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient

In [None]:
WML_CREDENTIALS = {
    "url": "UPDATE CREDENTIALS",
    "apikey": "UPDATE CREDENTIALS",
    "instance_id": "UPDATE CREDENTIALS"
}

In [None]:
client = WatsonMachineLearningAPIClient(WML_CREDENTIALS)

In [None]:
import sklearn

In [None]:
sklearn.__version__

In [None]:
# FRAMEWORK_VERSION should match the installed scikit-learn version shown above
metadata = {
    client.repository.ModelMetaNames.NAME: "Scikit Learn Churn Model",
    client.repository.ModelMetaNames.FRAMEWORK_NAME: "scikit-learn",
    client.repository.ModelMetaNames.FRAMEWORK_VERSION: "0.20"
}
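Rather than typing the version string by hand, it can be derived from the installed library so the metadata matches the pickled model. A sketch that trims a version such as `0.20.3` to its `major.minor` form; whether WML accepts a given version depends on the frameworks the service supports, so treat that as an assumption to check. `major_minor` is a hypothetical helper.

```python
def major_minor(version: str) -> str:
    """Trim a version string like '0.20.3' to its 'major.minor' prefix."""
    return ".".join(version.split(".")[:2])


print(major_minor("0.20.3"))  # 0.20
```

Used in the metadata, this would be `client.repository.ModelMetaNames.FRAMEWORK_VERSION: major_minor(sklearn.__version__)`.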

In [None]:
model_details = client.repository.store_model(model=best_model, meta_props=metadata)

In [None]:
model_details

In [None]:
import json
print(json.dumps(model_details, indent=2))

In [None]:
uid = model_details['metadata']['guid']

In [None]:
loaded_model = client.repository.load(uid)

In [None]:
loaded_model

In [None]:
loaded_model.predict(X_test)

In [None]:
deployment_details = client.deployments.create(uid, "Deployment of Sklearn Churn Model")

In [None]:
client.deployments.list()

In [None]:
scoring_endpoint = client.deployments.get_scoring_url(deployment_details)

print(scoring_endpoint)

In [None]:
url = scoring_endpoint

In [None]:
loaded_model.predict(X_test.values)

In [None]:
values = X_test.to_numpy().tolist()

In [None]:
scoring_payload = {"fields": X_test.columns.to_numpy().tolist(), "values": values}
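The scoring payload pairs one list of column names (`fields`) with a list of rows (`values`), each row in the same column order. Converting through `to_numpy().tolist()` yields plain Python numbers, which `json` can serialise; some NumPy scalar types, such as `numpy.int64`, make `json.dumps` raise a `TypeError`. A sketch on a hypothetical two-row frame; `build_payload` is an illustrative name.

```python
import json

import pandas as pd


def build_payload(frame: pd.DataFrame) -> dict:
    """Build a {'fields': [...], 'values': [[...], ...]} scoring payload."""
    return {
        "fields": frame.columns.tolist(),
        "values": frame.to_numpy().tolist(),  # plain Python numbers, JSON-safe
    }


# Hypothetical feature rows
rows = pd.DataFrame({"Tenure": [1, 24], "Monthly Charges": [29.85, 56.95]})
payload = build_payload(rows)
print(json.dumps(payload))
```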

In [None]:
predictions = client.deployments.score(url, scoring_payload)

# 7. Setup OpenScale
Finally, set up monitors in OpenScale for the deployed WML model. This step requires your IBM Cloud API key, which you can create in your IBM Cloud account as follows:

<p><b>7.1</b> Navigate to <a href="https://cloud.ibm.com">cloud.ibm.com</a> and select Manage Users.</p>
<p><img src="https://i.imgur.com/OqDyije.png" align="left" style="height:400px"></p>

<p><b>7.2</b> Select IBM Cloud API Keys then Create an IBM Cloud API Key.</p>
<p><img src="https://i.imgur.com/II15lyS.png" align="left" style="height:400px"></p>

<p><b>7.3</b> Name your key, then paste its value into the CLOUD_API_KEY variable below.</p>
<p><img src="https://i.imgur.com/w9Yj5IW.png" align="left" style="height:400px"></p>
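Pasting the key directly into the notebook stores it alongside the code. As an alternative, sketched here, the key can be read from an environment variable. The variable name `CLOUD_API_KEY` and the helper `read_api_key` are assumptions for illustration; Watson Studio does not set such a variable for you.

```python
import os


def read_api_key(var_name: str = "CLOUD_API_KEY") -> str:
    """Return the API key from the environment, failing clearly if it is missing."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

With the variable set, the credentials cell could use `CLOUD_API_KEY = read_api_key()`.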

In [None]:
!pip install ibm_ai_openscale

In [None]:
from ibm_ai_openscale import APIClient
from ibm_ai_openscale.engines import *
from ibm_ai_openscale.utils import *
from ibm_ai_openscale.supporting_classes import PayloadRecord, Feature
from ibm_ai_openscale.supporting_classes.enums import *

In [None]:
CLOUD_API_KEY = "UPDATE CREDENTIALS"

In [None]:
# Look up the Watson OpenScale instance GUID associated with the API key
from ibm_ai_openscale.utils import get_instance_guid

WOS_GUID = get_instance_guid(api_key=CLOUD_API_KEY)
WOS_CREDENTIALS = {
    "instance_guid": WOS_GUID,
    "apikey": CLOUD_API_KEY,
    "url": "https://api.aiopenscale.cloud.ibm.com"
}

if WOS_GUID is None:
    print('Watson OpenScale GUID NOT FOUND')
else:
    print(WOS_GUID)

In [None]:
# Create instance of WOS Client
ai_client = APIClient(aios_credentials=WOS_CREDENTIALS)
ai_client.version

# 8. Create OpenScale Datamart

In [None]:
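# Keep the existing internal datamart if there is one
KEEP_MY_INTERNAL_POSTGRES = True
# Credentials for an external PostgreSQL or Db2 datamart; leave as None to use the internal one
DB_CREDENTIALS = None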

In [None]:
# Create Datamart
try:
    data_mart_details = ai_client.data_mart.get_details()
    if 'internal_database' in data_mart_details and data_mart_details['internal_database']:
        if KEEP_MY_INTERNAL_POSTGRES:
            print('Using existing internal datamart.')
        else:
            if DB_CREDENTIALS is None:
                print('No postgres credentials supplied. Using existing internal datamart')
            else:
                print('Switching to external datamart')
                ai_client.data_mart.delete(force=True)
                ai_client.data_mart.setup(db_credentials=DB_CREDENTIALS)
    else:
        print('Using existing external datamart')
except Exception:
    if DB_CREDENTIALS is None:
        print('Setting up internal datamart')
        ai_client.data_mart.setup(internal_db=True)
    else:
        print('Setting up external datamart')
        try:
            ai_client.data_mart.setup(db_credentials=DB_CREDENTIALS)
        except Exception:
            print('Setup failed, trying Db2 setup')
            ai_client.data_mart.setup(db_credentials=DB_CREDENTIALS, schema=DB_CREDENTIALS['username'])
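The datamart cell mixes API calls with a multi-way decision. Pulled out as a pure function (a hypothetical restructuring for readability, not part of the OpenScale client), the branching reads as follows; it returns a label instead of calling the API.

```python
def datamart_action(existing, is_internal, keep_internal, db_credentials):
    """Return which datamart action the setup cell would take.

    existing: whether get_details() succeeded (a datamart is already set up)
    is_internal: whether that datamart uses the internal database
    keep_internal: value of KEEP_MY_INTERNAL_POSTGRES
    db_credentials: DB_CREDENTIALS, or None when no external database is supplied
    """
    if existing:
        if not is_internal:
            return "use existing external"
        if keep_internal or db_credentials is None:
            return "use existing internal"
        return "switch to external"
    if db_credentials is None:
        return "set up internal"
    return "set up external"
```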

In [None]:
data_mart_details = ai_client.data_mart.get_details()

In [None]:
data_mart_details

# 9. Bind WML to OpenScale

In [None]:
# Bind the WML instance to OpenScale
binding_uid = ai_client.data_mart.bindings.add('WML Binding', WatsonMachineLearningInstance(WML_CREDENTIALS))
bindings_details = ai_client.data_mart.bindings.get_details()

# If no GUID was returned (for example, the binding already exists), look it up by the same name
if binding_uid is None:
    binding_uid = [binding['metadata']['guid'] for binding in bindings_details['service_bindings'] if binding['entity']['name'] == 'WML Binding'][0]

ai_client.data_mart.bindings.list()
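The fallback lookup indexes `[0]` into a filtered list, so it raises `IndexError` when no binding has the expected name. A more forgiving version, sketched against the `service_bindings` / `metadata.guid` / `entity.name` structure used in that cell; the sample dictionary and the helper name `find_binding_uid` are hypothetical.

```python
def find_binding_uid(bindings_details, name):
    """Return the GUID of the first service binding called `name`, or None."""
    for binding in bindings_details.get("service_bindings", []):
        if binding["entity"]["name"] == name:
            return binding["metadata"]["guid"]
    return None


# Hypothetical response shape mirroring the keys used in the binding cell
sample = {
    "service_bindings": [
        {"metadata": {"guid": "1111"}, "entity": {"name": "WML Binding"}},
    ]
}
print(find_binding_uid(sample, "WML Binding"))  # 1111
print(find_binding_uid(sample, "Other"))  # None
```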

In [None]:
print(binding_uid)

In [None]:
ai_client.data_mart.bindings.list_assets(binding_uid=binding_uid)

# 10. Create new OpenScale Monitor

In [None]:
MODEL_NAME = "Scikit Learn Churn Model"

In [None]:
# Delete WOS monitor if it already exists
subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
for subscription in subscriptions_uids:
    sub_name = ai_client.data_mart.subscriptions.get_details(subscription)['entity']['asset']['name']
    if sub_name == MODEL_NAME:
        ai_client.data_mart.subscriptions.delete(subscription)
        print('Deleted existing subscription for', MODEL_NAME)

In [None]:
# Create new monitor
subscription = ai_client.data_mart.subscriptions.add(WatsonMachineLearningAsset(
    uid,
    problem_type=ProblemType.BINARY_CLASSIFICATION,
    input_data_type=InputDataType.STRUCTURED,
    prediction_column='prediction',
    label_column='prediction', 
    probability_column='probability',
    categorical_columns=[],
    feature_columns = X_test.columns.to_numpy().tolist()
))

if subscription is None:
    print('Subscription already exists; get the existing one')
    subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
    for sub in subscriptions_uids:
        if ai_client.data_mart.subscriptions.get_details(sub)['entity']['asset']['name'] == MODEL_NAME:
            subscription = ai_client.data_mart.subscriptions.get(sub)

In [None]:
# Review all subscriptions
#subscriptions_uids = ai_client.data_mart.subscriptions.get_uids()
#ai_client.data_mart.subscriptions.list()

In [None]:
subscription_details = subscription.get_details()

In [None]:
subscription.uid

# 11. Send Scoring Request for Payload Logging

In [None]:
predictions = client.deployments.score(url, scoring_payload)

In [None]:
import time

# Give OpenScale time to record the scoring payload, then count the logged records
time.sleep(10)
subscription.payload_logging.get_records_count()
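The fixed `time.sleep(10)` gives OpenScale time to record the payload, but the right delay is not known in advance. A generic polling helper, sketched here, re-checks a condition until it holds or a timeout expires. `wait_until` is hypothetical; in this notebook the condition might be `lambda: subscription.payload_logging.get_records_count() > 0`.

```python
import time


def wait_until(condition, timeout=60.0, interval=2.0):
    """Poll condition() until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```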

# 12. Enable Quality Monitoring

In [None]:
import time

time.sleep(10)
# Enable the quality monitor with a 0.7 threshold and a minimum of 50 feedback records
subscription.quality_monitoring.enable(threshold=0.7, min_records=50)

In [None]:
feedback = X_test.join(y_test).to_numpy().tolist()

In [None]:
subscription.feedback_logging.store(feedback)
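The feedback rows built above place the feature values first and the label last: `DataFrame.join` appends `y_test` (a Series named `Churn`, taken from `abt['Churn']`) as the final column, aligned on the index. A toy check with hypothetical values:

```python
import pandas as pd

# Hypothetical test features and labels sharing the same index
X_demo = pd.DataFrame({"Tenure": [1, 24], "Senior Citizen": [0, 1]}, index=[10, 11])
y_demo = pd.Series([1, 0], index=[10, 11], name="Churn")

joined = X_demo.join(y_demo)
rows = joined.to_numpy().tolist()
print(rows)  # [[1, 0, 1], [24, 1, 0]]
```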

# 13. Enable Explainability

In [None]:
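# Copy the training data so abt is not modified, and rename the label column to
# 'prediction' to match label_column in the subscription
explain = abt.copy()
explain['prediction'] = explain.pop('Churn')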
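Plain assignment such as `explain = abt` does not copy a DataFrame: both names refer to the same object, so dropping a column in place through one name removes it from the other as well. A small demonstration on a hypothetical frame, with `.copy()` keeping the two independent:

```python
import pandas as pd

base = pd.DataFrame({"Tenure": [1, 24], "Churn": [1, 0]})
alias = base  # same object, not a copy
alias.drop("Churn", axis=1, inplace=True)
aliased_has_churn = "Churn" in base.columns
print(aliased_has_churn)  # False: the original lost its column too

base = pd.DataFrame({"Tenure": [1, 24], "Churn": [1, 0]})
independent = base.copy()  # separate data
independent.drop("Churn", axis=1, inplace=True)
copied_has_churn = "Churn" in base.columns
print(copied_has_churn)  # True: the original is untouched
```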

In [None]:
subscription.explainability.enable(training_data=explain)