<table style="border: none" align="left">
   <tr style="border: none">
      <th style="border: none"><font face="verdana" size="5" color="black"><b>Predicting customer churn with IBM Watson Machine Learning</b></font></th>
      <th style="border: none"><img src="https://github.com/pmservice/customer-satisfaction-prediction/blob/master/app/static/images/ml_icon_gray.png?raw=true" alt="Watson Machine Learning icon" height="40" width="40"></th>
   </tr>
   <tr style="border: none">
       <th style="border: none"><img src="https://github.com/pmservice/wml-sample-models/blob/master/spark/customer-satisfaction-prediction/images/users_banner_2-03.png?raw=true" width="600" alt="Icon"> </th>
   </tr>
</table>

This notebook contains the steps and code to develop a predictive model and start scoring new data. It introduces commands for getting data, basic data cleaning and exploration, pipeline creation, model training, model persistence to the Watson Machine Learning repository, model deployment, and scoring.

Some familiarity with Python is helpful. This notebook uses Python 3.6.

You will use the **Telco Customer Churn** data set, which contains anonymized customer data from a telecommunications company. Use this data set to predict customer churn, which is critical to the business because retaining existing customers is easier than acquiring new ones.

## Learning goals

The learning goals of this notebook are:

-  Load a CSV file into a DataFrame.
-  Explore data.
-  Prepare data for training and evaluation.
-  Train and evaluate a model.
-  Persist a pipeline and model in the Watson Machine Learning repository.
-  Deploy a model for online scoring using the Watson Machine Learning API.


## Contents

This notebook contains the following parts:

1.	[Setup](#setup)
2.	[Load and explore data](#load)
3.	[Create a machine learning model](#model)
4.	[Persist model](#persistence)
5.	[Predict locally and visualize](#visualization)
6.	[Deploy and score in the cloud](#scoring)
7.	[Summary and next steps](#summary)

<a id="setup"></a>
## 1. Setup

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Create a [Watson Machine Learning Service](https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/) instance, which is required for storing and deploying models (a free plan is offered).
-  Create a [Cloud Object Storage](https://console.bluemix.net/catalog/infrastructure/cloud-object-storage) instance, which is required for storing scoring data (a free plan is offered).




In [59]:
## Insert data here
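
The cell above is a placeholder for the data-loading code. As a minimal sketch, assuming the data set is available as a CSV file, `pandas.read_csv` can load it into the `df_data` DataFrame used in the rest of the notebook. To stay self-contained, the example reads two sample rows from an in-memory string; in practice you would pass the path to your own file instead.

```python
import io

import pandas as pd

# Two sample rows copied from this notebook's df_data.head() output.
# In practice, replace the StringIO object with the path to your CSV file.
sample_csv = io.StringIO(
    "ID,CHURN,Gender,Status,Children,Est Income,Car Owner,Age,LongDistance,"
    "International,Local,Dropped,Paymethod,LocalBilltype,LongDistanceBilltype,"
    "Usage,RatePlan\n"
    "1,T,F,S,1.0,38000.0,N,24.393333,23.56,0.0,206.08,0.0,CC,Budget,"
    "Intnl_discount,229.64,3.0\n"
    "6,F,M,M,2.0,29616.0,N,49.426667,29.78,0.0,45.5,0.0,CH,FreeLocal,"
    "Standard,75.29,2.0\n"
)

df_data = pd.read_csv(sample_csv)
print(df_data.shape)  # (rows, columns)
```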


<a id="load"></a>
## 2. Load and explore data

In this section you will load the data as a DataFrame and perform a basic exploration. 

In [9]:
df_data.head()



|   | ID | CHURN | Gender | Status | Children | Est Income | Car Owner | Age | LongDistance | International | Local | Dropped | Paymethod | LocalBilltype | LongDistanceBilltype | Usage | RatePlan |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | T | F | S | 1.0 | 38000.0 | N | 24.393333 | 23.56 | 0.0 | 206.08 | 0.0 | CC | Budget | Intnl_discount | 229.64 | 3.0 |
| 1 | 6 | F | M | M | 2.0 | 29616.0 | N | 49.426667 | 29.78 | 0.0 | 45.5 | 0.0 | CH | FreeLocal | Standard | 75.29 | 2.0 |
| 2 | 8 | F | M | M | 0.0 | 19732.8 | N | 50.673333 | 24.81 | 0.0 | 22.44 | 0.0 | CC | FreeLocal | Standard | 47.25 | 3.0 |
| 3 | 11 | F | M | S | 2.0 | 96.33 | N | 56.473333 | 26.13 | 0.0 | 32.88 | 1.0 | CC | Budget | Standard | 59.01 | 1.0 |
| 4 | 14 | F | F | M | 2.0 | 52004.8 | N | 25.14 | 5.03 | 0.0 | 23.11 | 0.0 | CH | Budget | Intnl_discount | 28.14 | 1.0 |


As you can see, the data contains 17 fields. The `CHURN` field is the label that we want to predict.
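
Before modelling, it usually helps to check how the label is distributed and whether any column has missing values. The sketch below does this on a small DataFrame built from the five rows shown above (a subset of columns, chosen only for illustration); on the full data you would call the same methods on `df_data`.

```python
import pandas as pd

# First five rows from the df_data.head() output (subset of columns).
sample = pd.DataFrame({
    "ID": [1, 6, 8, 11, 14],
    "CHURN": ["T", "F", "F", "F", "F"],
    "Gender": ["F", "M", "M", "M", "F"],
    "Est Income": [38000.0, 29616.0, 19732.8, 96.33, 52004.8],
})

# Number of records per label value.
label_counts = sample["CHURN"].value_counts()

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

print(label_counts)
print(missing_per_column)
```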

In [10]:
# len() gives the number of rows; df_data.count() would return non-null counts per column
print("Total number of records: " + str(len(df_data)))

Total number of records: 1799


<a id="model"></a>
## 3. Create a machine learning model

In this section you will learn how to prepare data, create a scikit-learn machine learning pipeline, and train a model.

In [11]:
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

### 3.1: Prepare the pipeline

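Prepare a preprocessing pipeline for the categorical features, then combine it with a classifier in the final pipeline. Only the columns listed in `categorical_features` reach the classifier, because `ColumnTransformer` drops unlisted columns by default (`remainder='drop'`). You may want to add a second preprocessing pipeline that handles the numerical columns and their null values.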

In [12]:

categorical_features = ['Gender','Status','Car Owner','Paymethod','LocalBilltype','LongDistanceBilltype']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, categorical_features)])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', DecisionTreeClassifier())])
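
The text above suggests adding a preprocessing pipeline for numerical null values. One possible way to do that, as a sketch rather than the notebook's own method, is to add a second transformer that imputes the median for numeric columns. The choice of numeric columns, the median strategy, the `random_state`, and the names `numeric_transformer`, `preprocessor_with_numeric`, and `clf_with_numeric` are assumptions made for this example; the toy data reuses the five rows shown earlier, with one income value blanked out on purpose.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Assumed column choices for this illustration.
numeric_features = ['Children', 'Est Income', 'Age', 'Usage']
categorical_features = ['Gender', 'Status']

# Numeric columns: replace missing values with the column median.
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median'))])

# Categorical columns: same steps as the notebook's categorical pipeline.
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor_with_numeric = ColumnTransformer(transformers=[
    ('num', numeric_transformer, numeric_features),
    ('cat', categorical_transformer, categorical_features)])

clf_with_numeric = Pipeline(steps=[
    ('preprocessor', preprocessor_with_numeric),
    ('classifier', DecisionTreeClassifier(random_state=0))])

# Toy data: rows from the head() output, with one 'Est Income' set to NaN
# purely to demonstrate imputation.
toy = pd.DataFrame({
    'Gender': ['F', 'M', 'M', 'M', 'F'],
    'Status': ['S', 'M', 'M', 'S', 'M'],
    'Children': [1.0, 2.0, 0.0, 2.0, 2.0],
    'Est Income': [38000.0, np.nan, 19732.8, 96.33, 52004.8],
    'Age': [24.393333, 49.426667, 50.673333, 56.473333, 25.14],
    'Usage': [229.64, 75.29, 47.25, 59.01, 28.14],
    'CHURN': ['T', 'F', 'F', 'F', 'F'],
})

X_toy = toy.drop('CHURN', axis=1)
clf_with_numeric.fit(X_toy, toy['CHURN'])

# Inspect the preprocessed feature matrix: 4 numeric + 4 one-hot columns.
X_toy_transformed = clf_with_numeric.named_steps['preprocessor'].transform(X_toy)
print(X_toy_transformed.shape)
```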

Split the data into training and test sets.

In [17]:

X = df_data.drop('CHURN', axis=1)
y = df_data['CHURN']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)


As you can see, the data has been split into two data sets:

-  The train data set, which is the largest group, is used for training.
-  The test data set is held out from training and is used to evaluate the model.
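
To check how many records end up in each part, you can compare the lengths of the arrays that `train_test_split` returns. The sketch below uses placeholder arrays with the record count reported earlier in this notebook (1799) and the same `test_size` and `random_state` as the cell above; the names `X_demo` and `y_demo` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_records = 1799  # record count reported earlier in this notebook

# Placeholder features and labels; only the number of rows matters here.
X_demo = np.arange(n_records).reshape(-1, 1)
y_demo = np.zeros(n_records)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=4)

print(len(X_tr), len(X_te))  # 1439 360
```

The 360 test records match the total support shown in the classification report in section 3.2.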

### 3.2: Create pipeline and train a model

Fit the pipeline on the training data, then evaluate it on the test data.

In [18]:
model = clf.fit(X_train, y_train)
res_predict = model.predict(X_test)
print("model score: %.3f" % clf.score(X_test, y_test))
print(classification_report(y_test, res_predict, target_names=["False", "True"]))

model score: 0.756
              precision    recall  f1-score   support

       False       0.79      0.80      0.79       213
        True       0.70      0.69      0.70       147

   micro avg       0.76      0.76      0.76       360
   macro avg       0.75      0.75      0.75       360
weighted avg       0.76      0.76      0.76       360
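
The model score printed above is the accuracy on the test set. Accuracy equals the support-weighted average of the per-class recall, so it can be approximately recovered from the report. The short calculation below uses the rounded recall values from the report, so the result is slightly off because of rounding.

```python
# Per-class recall and support, copied from the classification report above.
recall = {"False": 0.80, "True": 0.69}
support = {"False": 213, "True": 147}

total = sum(support.values())

# Accuracy = sum over classes of (recall * support) / total number of samples.
accuracy_estimate = sum(recall[c] * support[c] for c in recall) / total

print(round(accuracy_estimate, 3))  # about 0.755, close to the reported 0.756
```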



<a id="persistence"></a>
## 4. Persist model

In this section you will learn how to store your pipeline and model in the Watson Machine Learning repository by using the Python client library.

First, you must install and import the Watson Machine Learning client library.

In [19]:
!rm -rf $PIP_BUILD/watson-machine-learning-client

In [20]:
!pip install watson-machine-learning-client --upgrade

Collecting watson-machine-learning-client
  Downloading https://files.pythonhosted.org/packages/12/67/66db412f00d19bfdc5725078bff373787513bfb14320f2804b9db3abb53a/watson_machine_learning_client-1.0.378-py3-none-any.whl (536kB)
     |████████████████████████████████| 542kB 8.1MB/s eta 0:00:01
Installing collected packages: watson-machine-learning-client
  Found existing installation: watson-machine-learning-client 1.0.376
    Uninstalling watson-machine-learning-client-1.0.376:
      Successfully uninstalled watson-machine-learning-client-1.0.376
Successfully installed watson-machine-learning-client-1.0.378


In [21]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient




Authenticate to the Watson Machine Learning service on Bluemix.

**Action**: Put the credentials from your instance of the Watson Machine Learning service on Bluemix here.

In [22]:
wml_credentials = {
  "apikey": "<your-api-key>",  # placeholder: never publish a real API key in a notebook
  "iam_apikey_description": "Auto-generated for key 78322aaf-9975-4524-9dd5-4477683e447e",
  "iam_apikey_name": "Service credentials-1",
  "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
  "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/290f8f438c15a26b2b129419c1c1b952::serviceid:ServiceId-93e75cc2-9264-43b0-af52-3d997b368a34",
  "instance_id": "cb024e31-0448-4b71-8213-4046aa0a1198",
  "url": "https://us-south.ml.cloud.ibm.com"
}
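
Hard-coding credentials in a notebook makes them easy to leak when the notebook is shared. One alternative, sketched below, is to read them from environment variables. The variable names `WML_APIKEY`, `WML_INSTANCE_ID`, and `WML_URL` and the helper `load_wml_credentials` are hypothetical, and the sketch assumes that `apikey`, `instance_id`, and `url` are the fields the client needs.

```python
import os

# Hypothetical mapping from credential fields to environment variable names.
REQUIRED_VARS = {
    "apikey": "WML_APIKEY",
    "instance_id": "WML_INSTANCE_ID",
    "url": "WML_URL",
}


def load_wml_credentials(env=os.environ):
    """Build a credentials dict from environment variables.

    Raises KeyError that lists every missing variable.
    """
    missing = [var for var in REQUIRED_VARS.values() if var not in env]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {field: env[var] for field, var in REQUIRED_VARS.items()}


# Demonstration with a fake environment; real use would call
# load_wml_credentials() with no arguments.
demo_env = {
    "WML_APIKEY": "example-key",
    "WML_INSTANCE_ID": "example-instance",
    "WML_URL": "https://us-south.ml.cloud.ibm.com",
}
demo_credentials = load_wml_credentials(demo_env)
print(sorted(demo_credentials))
```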

In [23]:
client = WatsonMachineLearningAPIClient(wml_credentials)

### 4.1: Save pipeline and model

In this subsection you will learn how to save pipeline and model artifacts to your Watson Machine Learning instance.

In [24]:
# Checking the sklearn version we are using
import sklearn
sklearn.__version__

'0.20.3'

In [27]:
saved_model = client.repository.store_model(model=model, meta_props={'name':'telco churn prediction model'})

Get saved model metadata from Watson Machine Learning.

In [28]:
published_model_ID = client.repository.get_model_uid(saved_model)

print("Model Id: " + str(published_model_ID))

Model Id: 63887535-c01f-495e-9141-e9ea65f2dbc0


**Model Id** can be used to retrieve the latest model version from the Watson Machine Learning instance.

## Get ML instance details

In [29]:
import json

instance_details = client.service_instance.get_details()

print(json.dumps(instance_details, indent=2))

{
  "metadata": {
    "guid": "cb024e31-0448-4b71-8213-4046aa0a1198",
    "url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/cb024e31-0448-4b71-8213-4046aa0a1198",
    "created_at": "2020-04-02T18:04:36.783Z",
    "modified_at": "2020-04-02T18:37:52.095Z"
  },
  "entity": {
    "source": "Bluemix",
    "published_models": {
      "url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/cb024e31-0448-4b71-8213-4046aa0a1198/published_models"
    },
    "usage": {
      "expiration_date": "2020-05-01T00:00:00.000Z",
      "computation_time": {
        "limit": 180000,
        "current": 0
      },
      "gpu_count_k80": {
        "limit": 8,
        "current": 0
      },
      "model_count": {
        "limit": 200,
        "current": 1
      },
      "gpu_count_p100": {
        "limit": 0,
        "current": 0
      },
      "prediction_count": {
        "limit": 5000,
        "current": 0
      },
      "capacity_units": {
        "limit": 180000000,
        "current": 0
      
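
The `usage` object in the instance details reports a limit and a current value for each quota. As a small sketch, the remaining quota can be computed by subtracting one from the other. The dictionary below is an excerpt copied from the output above; in the notebook the same structure is found under `instance_details["entity"]["usage"]`. The helper name `remaining_quota` is hypothetical.

```python
# Excerpt of the "usage" object from the instance details printed above.
usage = {
    "computation_time": {"limit": 180000, "current": 0},
    "model_count": {"limit": 200, "current": 1},
    "prediction_count": {"limit": 5000, "current": 0},
}


def remaining_quota(usage):
    """Return limit minus current for every numeric quota entry."""
    return {
        name: quota["limit"] - quota["current"]
        for name, quota in usage.items()
        if isinstance(quota, dict) and "limit" in quota and "current" in quota
    }


remaining = remaining_quota(usage)
for name, value in remaining.items():
    print("{}: {} remaining".format(name, value))
```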

#### Get published_models details

In [30]:
# Reuse the model ID retrieved above to fetch the full model details
model_details = client.repository.get_details(published_model_ID)
print(json.dumps(model_details, indent=2))

{
  "metadata": {
    "guid": "63887535-c01f-495e-9141-e9ea65f2dbc0",
    "url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/cb024e31-0448-4b71-8213-4046aa0a1198/published_models/63887535-c01f-495e-9141-e9ea65f2dbc0",
    "created_at": "2020-04-02T18:37:51.929Z",
    "modified_at": "2020-04-02T18:37:52.000Z"
  },
  "entity": {
    "runtime_environment": "python-3.6",
    "learning_configuration_url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/cb024e31-0448-4b71-8213-4046aa0a1198/published_models/63887535-c01f-495e-9141-e9ea65f2dbc0/learning_configuration",
    "name": "telco churn prediction model",
    "learning_iterations_url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/cb024e31-0448-4b71-8213-4046aa0a1198/published_models/63887535-c01f-495e-9141-e9ea65f2dbc0/learning_iterations",
    "feedback_url": "https://us-south.ml.cloud.ibm.com/v3/wml_instances/cb024e31-0448-4b71-8213-4046aa0a1198/published_models/63887535-c01f-495e-9141-e9ea65f2dbc0/feedback",
    "l

####  Create deployment

In [44]:
deployment_name   = "Web Service"
deployment_desc  = "Online deployment of Python customer churn"
deployment       = client.deployments.create( published_model_ID, deployment_name, deployment_desc )
scoring_endpoint = client.deployments.get_scoring_url( deployment )




#######################################################################################

Synchronous deployment creation for uid: '63887535-c01f-495e-9141-e9ea65f2dbc0' started

#######################################################################################


INITIALIZING
DEPLOY_SUCCESS


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='847856a3-56be-44da-8c06-b4ef93d2cab5'
------------------------------------------------------------------------------------------------




In [45]:
# See that it was deployed
client.deployments.list()

------------------------------------  -----------  ------  --------------  ------------------------  -----------------  -------------
GUID                                  NAME         TYPE    STATE           CREATED                   FRAMEWORK          ARTIFACT TYPE
847856a3-56be-44da-8c06-b4ef93d2cab5  Web Service  online  DEPLOY_SUCCESS  2020-04-02T18:43:31.966Z  scikit-learn-0.20  model
fadcf709-8594-492e-bceb-bd6de2e21b45  Web Service  online  DEPLOY_SUCCESS  2020-04-02T18:38:12.016Z  scikit-learn-0.20  model
------------------------------------  -----------  ------  --------------  ------------------------  -----------------  -------------


## Accessing deployed model

In [46]:
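# Note: the Gender values (0, 1) and the first row's Status value (0) are not categories
# seen in training (the data uses strings such as 'F'/'M' and 'S'/'M').
# OneHotEncoder(handle_unknown='ignore') encodes them as all zeros instead of raising an error.
scoring_payload = {'fields': ['ID','Gender','Status','Children','Est Income','Car Owner',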
                              'Age','LongDistance','International','Local','Dropped',
                              'Paymethod','LocalBilltype','LongDistanceBilltype',
                              'Usage','RatePlan'],
                   'values': [[1,0,0,1.0,38000.0,'N',24.393333,23.56,0.0,206.08,0.0,'CC','Budget','Intnl_discount',229.64,3.0],
                              [6,1,'M',2.0,29616.0,'N',49.426667,29.78,0.0,45.5,0.0,  'CH','FreeLocal','Standard',75.29,2.0]
                             ]} 
deploy_uid = client.deployments.get_uid(deployment)
predictions = client.deployments.score(scoring_endpoint, scoring_payload)
predictions

{'fields': ['prediction', 'probability'],
 'values': [['T', [0.1935483870967742, 0.8064516129032258]],
  ['T', [0.2857142857142857, 0.7142857142857143]]]}

In [47]:
for prediction in predictions['values'] :
    print("Prediction: {}, probability: {}".format(prediction[0],prediction[1]) )

Prediction: T, probability: [0.1935483870967742, 0.8064516129032258]
Prediction: T, probability: [0.2857142857142857, 0.7142857142857143]
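
The scoring payload used above pairs a `fields` list with rows of `values`. Rather than typing the rows by hand, the payload can be built from DataFrame rows so that each value keeps the type it has in the training data. The helper `build_scoring_payload` below is a hypothetical sketch; it round-trips through `DataFrame.to_json` so that every value becomes a plain JSON-compatible Python type.

```python
import json

import pandas as pd


def build_scoring_payload(df):
    """Convert DataFrame rows into a {'fields': [...], 'values': [...]} dict."""
    # orient="split" yields separate "columns" and "data" entries.
    split = json.loads(df.to_json(orient="split"))
    return {"fields": split["columns"], "values": split["data"]}


# Two rows from the head() output (subset of columns for brevity).
rows = pd.DataFrame({
    "ID": [1, 6],
    "Gender": ["F", "M"],
    "Status": ["S", "M"],
    "Children": [1.0, 2.0],
})

payload = build_scoring_payload(rows)
print(json.dumps(payload))
```

In the notebook, a payload built this way from a few rows of `X_test` could be passed to `client.deployments.score` in the same way as `scoring_payload`.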


## Cleanup
Remove the two deployments listed below and the model we saved. This resets the environment to the state it was in before we started executing the notebook.

In [48]:
# list the existing deployments to see what we currently have
client.deployments.list()


------------------------------------  -----------  ------  --------------  ------------------------  -----------------  -------------
GUID                                  NAME         TYPE    STATE           CREATED                   FRAMEWORK          ARTIFACT TYPE
847856a3-56be-44da-8c06-b4ef93d2cab5  Web Service  online  DEPLOY_SUCCESS  2020-04-02T18:43:31.966Z  scikit-learn-0.20  model
fadcf709-8594-492e-bceb-bd6de2e21b45  Web Service  online  DEPLOY_SUCCESS  2020-04-02T18:38:12.016Z  scikit-learn-0.20  model
------------------------------------  -----------  ------  --------------  ------------------------  -----------------  -------------


In [53]:
# Retrieve the details of every deployment we want to remove
deployments_details = client.deployments.get_details()
web_service_deployments = [item for item in deployments_details['resources']
                           if item['entity']["name"] == "Web Service"]

# Delete each matching deployment (next() would only pick the first match)
for model_deployed_details in web_service_deployments:
    client.deployments.delete(client.deployments.get_uid(model_deployed_details))

# See if the deployments were removed
client.deployments.list()

----  ----  ----  -----  -------  ---------  -------------
GUID  NAME  TYPE  STATE  CREATED  FRAMEWORK  ARTIFACT TYPE
----  ----  ----  -----  -------  ---------  -------------


In [54]:
# list the models currently in our WML service
client.repository.list_models()

------------------------------------  ----------------------------  ------------------------  -----------------
GUID                                  NAME                          CREATED                   FRAMEWORK
63887535-c01f-495e-9141-e9ea65f2dbc0  telco churn prediction model  2020-04-02T18:37:51.929Z  scikit-learn-0.20
------------------------------------  ----------------------------  ------------------------  -----------------


In [55]:
# We still have the saved_model variable that includes the model details.
client.repository.delete(saved_model['metadata']['guid'])
client.repository.list_models()

----  ----  -------  ---------
GUID  NAME  CREATED  FRAMEWORK
----  ----  -------  ---------
