<table style="border: none" align="left">
   <tr style="border: none">
      <th style="border: none"><font face="verdana" size="5" color="black"><b>Lab: Build, Save and Deploy Model to IBM Watson Machine Learning (WML)</b></font></th>
      <th style="border: none"><img src="https://github.com/pmservice/customer-satisfaction-prediction/blob/master/app/static/images/ml_icon_gray.png?raw=true" alt="Watson Machine Learning icon" height="40" width="40"></th>
   </tr>
</table>


This notebook walks you through these steps:
- Build a model
- Save the model in the WML repository
- Create a deployment in WML
- Invoke the deployed model with a REST client to test it

### Step 1: Connect to dashDB and load CUSTOMER table

#### Important: Replace the dashDB connection information used to load the CUSTOMER and CHURN tables before running the cells.
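
One way to keep real credentials out of the notebook is to read them from environment variables before building the connection dictionary. The sketch below is a minimal illustration and not part of the original lab: the variable names (`DASHDB_HOST` and so on) and the helper `load_dashdb_settings` are hypothetical, and the plain string keys merely stand in for the `Connectors.DASHDB.*` constants used in the cells of this lab.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
REQUIRED_VARS = ("DASHDB_HOST", "DASHDB_DATABASE", "DASHDB_USERNAME", "DASHDB_PASSWORD")


def load_dashdb_settings(env=None):
    """Return dashDB connection settings read from environment variables.

    Raises KeyError naming every missing variable, so a misconfigured
    environment fails before any connection attempt is made.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Example with an explicit mapping, so the sketch runs without real credentials.
example_env = {
    "DASHDB_HOST": "example-host",
    "DASHDB_DATABASE": "BLUDB",
    "DASHDB_USERNAME": "example-user",
    "DASHDB_PASSWORD": "example-password",
}
settings = load_dashdb_settings(example_env)
```

The returned values could then be mapped onto the connector keys, for example `Connectors.DASHDB.HOST: settings["DASHDB_HOST"]`.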

In [None]:
# Import the required API and create a Spark session
from ingest.Connectors import Connectors
from pyspark.sql import SparkSession

sparkSession = SparkSession.builder.master("local").appName("spark session example").getOrCreate()

# IMPORTANT: Replace all values with values in your dashDB instance
customerTable = { Connectors.DASHDB.HOST              : 'dashdb-entry-yp-dal09-09.services.dal.bluemix.net',
                      Connectors.DASHDB.DATABASE          : 'BLUDB',
                      Connectors.DASHDB.USERNAME          : 'dash9737',
                      Connectors.DASHDB.PASSWORD          : 'gDO~np@2IKj4',
                      Connectors.DASHDB.SOURCE_TABLE_NAME : 'DASH9737.CUSTOMER'}

customer = sparkSession.read.format("com.ibm.spark.discover").options(**customerTable).load()
customer.printSchema()
customer.show()

### Step 2: Connect to dashDB and load CHURN table

In [None]:
# IMPORTANT: Replace all values with values in your dashDB instance
churnTable = { Connectors.DASHDB.HOST              : 'dashdb-entry-yp-dal09-09.services.dal.bluemix.net',
                      Connectors.DASHDB.DATABASE          : 'BLUDB',
                      Connectors.DASHDB.USERNAME          : 'dash9737',
                      Connectors.DASHDB.PASSWORD          : 'gDO~np@2IKj4',
                      Connectors.DASHDB.SOURCE_TABLE_NAME : 'DASH9737.CHURN'}

customer_churn = sparkSession.read.format("com.ibm.spark.discover").options(**churnTable).load()
customer_churn.printSchema()
customer_churn.show()

### Step 3: Merge the tables

In [None]:
merged=customer.join(customer_churn,customer['ID']==customer_churn['ID']).select(customer['*'],customer_churn['CHURN'])
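
The join above is an inner join on `ID`: a customer appears in `merged` only if the same `ID` exists in both tables. The plain-Python sketch below mimics that behavior on made-up rows. The sample data and the helper `inner_join_on_id` are illustrative only and do not use Spark; the point is to show why `merged` can have fewer rows than `customer`.

```python
def inner_join_on_id(customers, churn):
    """Join two lists of dicts on 'ID', keeping all customer columns plus 'CHURN'."""
    churn_by_id = {}
    for row in churn:
        churn_by_id.setdefault(row["ID"], []).append(row["CHURN"])
    joined = []
    for cust in customers:
        # An ID that is missing from the churn table produces no output row.
        for flag in churn_by_id.get(cust["ID"], []):
            out = dict(cust)
            out["CHURN"] = flag
            joined.append(out)
    return joined


# Made-up sample rows for illustration.
customers = [{"ID": 1, "Gender": "F"}, {"ID": 2, "Gender": "M"}, {"ID": 3, "Gender": "F"}]
churn = [{"ID": 1, "CHURN": "T"}, {"ID": 3, "CHURN": "F"}, {"ID": 4, "CHURN": "T"}]
merged_rows = inner_join_on_id(customers, churn)
```

In this toy data, IDs 2 and 4 each occur in only one table, so neither reaches the joined result.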

### Step 4: Rename some columns
This step removes spaces from the column names.

In [None]:
merged = merged.withColumnRenamed("Est Income", "EstIncome").withColumnRenamed("Car Owner","CarOwner")
merged.toPandas().head()
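
If more columns contained spaces, chaining `withColumnRenamed` by hand would become tedious. A minimal sketch of deriving the renames from a list of column names is shown below; `space_free_renames` is a hypothetical helper, and the sample column list is an assumption rather than the full schema of the tables.

```python
def space_free_renames(columns):
    """Map each column name that contains spaces to the same name without them."""
    return {name: name.replace(" ", "") for name in columns if " " in name}


# Sample column names; with a Spark DataFrame you could pass merged.columns instead.
sample_columns = ["ID", "Gender", "Est Income", "Car Owner", "Age"]
renames = space_free_renames(sample_columns)

# With Spark, the mapping could then be applied in a loop:
#     for old, new in renames.items():
#         merged = merged.withColumnRenamed(old, new)
```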

### Step 5: Build the Spark pipeline and the Random Forest model
`Pipeline` is the Spark ML API for chaining feature transformers and an estimator into a single model-building workflow.
Additional information on SparkML: https://spark.apache.org/docs/2.0.2/ml-guide.html

In [None]:
from pyspark.ml.feature import StringIndexer, VectorIndexer
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

# Encode string variables as numeric indices so that the random forest algorithm can use them
stringIndexer1 = StringIndexer(inputCol='Gender', outputCol='GenderEncoded')
stringIndexer2 = StringIndexer(inputCol='Status',outputCol='StatusEncoded')
stringIndexer3 = StringIndexer(inputCol='CarOwner',outputCol='CarOwnerEncoded')
stringIndexer4 = StringIndexer(inputCol='Paymethod',outputCol='PaymethodEncoded')
stringIndexer5 = StringIndexer(inputCol='LocalBilltype',outputCol='LocalBilltypeEncoded')
stringIndexer6 = StringIndexer(inputCol='LongDistanceBilltype',outputCol='LongDistanceBilltypeEncoded')
stringIndexer7 = StringIndexer(inputCol='CHURN', outputCol='label')

# The Pipelines API requires the input variables to be assembled into a single vector column
assembler = VectorAssembler(inputCols=["GenderEncoded", "StatusEncoded", "CarOwnerEncoded", "PaymethodEncoded", "LocalBilltypeEncoded", \
                                       "LongDistanceBilltypeEncoded", "Children", "EstIncome", "Age", "LongDistance", "International", "Local",\
                                      "Dropped","Usage","RatePlan"], outputCol="features")


# instantiate the algorithm, take the default settings
rf=RandomForestClassifier(labelCol="label", featuresCol="features")

#pipeline = Pipeline(stages=[stringIndexer1, stringIndexer2, stringIndexer3, assembler, rf])
pipeline = Pipeline(stages=[stringIndexer1,stringIndexer2,stringIndexer3,stringIndexer4,stringIndexer5,stringIndexer6,stringIndexer7, assembler, rf])
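
To make the role of the `StringIndexer` stages more concrete: by default, `StringIndexer` maps each distinct string to a numeric index ordered by descending frequency, so the most frequent value gets index 0.0. The plain-Python sketch below imitates that ordering on made-up values. It is an illustration rather than Spark's implementation, and breaking ties alphabetically is an assumption of the sketch.

```python
from collections import Counter


def fit_string_index(values):
    """Return a {value: index} mapping with the most frequent value first."""
    counts = Counter(values)
    # Sort by descending count, then alphabetically so that ties are deterministic.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: float(i) for i, value in enumerate(ordered)}


# Made-up payment methods for illustration.
paymethods = ["CC", "CH", "CC", "Auto", "CC", "CH"]
index = fit_string_index(paymethods)
encoded = [index[v] for v in paymethods]
```

The indices carry no numeric meaning beyond identifying a category, which tree-based models such as random forests can generally tolerate.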

In [None]:
# Split data into train and test datasets
train, test = merged.randomSplit([0.8,0.2], seed=6)
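
`randomSplit` normalizes the weights and assigns rows to the splits at random, so the 80/20 proportion is approximate rather than exact, while a fixed `seed` makes the split repeatable for the same input. The following plain-Python sketch illustrates the idea on a list of integers; it is not Spark's sampling code, and `random_split` is a hypothetical helper.

```python
import random


def random_split(rows, weights, seed):
    """Assign each row independently to one split according to the weights."""
    total = float(sum(weights))
    # Cumulative upper bounds of each split on the [0, 1) interval.
    bounds = []
    running = 0.0
    for w in weights:
        running += w / total
        bounds.append(running)
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()
        for i, upper in enumerate(bounds):
            # The last split also catches any floating-point rounding shortfall.
            if draw < upper or i == len(bounds) - 1:
                splits[i].append(row)
                break
    return splits


rows = list(range(1000))
train_rows, test_rows = random_split(rows, [0.8, 0.2], seed=6)
```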

In [None]:
# Fit the pipeline on the training data
model = pipeline.fit(train)

### Step 6: Score the test data set

In [None]:
results = model.transform(test)

### Step 7: Model Evaluation 

In [None]:
# Fraction of test rows whose prediction equals the label (this is accuracy, not precision)
print('Accuracy model1 = {:.2f}.'.format(results.filter(results.label == results.prediction).count() / float(results.count())))
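
The expression above is the share of test rows whose prediction matches the label, which is normally called accuracy. Precision and recall look only at the positive class. A small plain-Python sketch of the three definitions follows, using made-up label/prediction pairs and treating 1.0 as the positive class (both are assumptions for illustration).

```python
def classification_metrics(pairs, positive=1.0):
    """Compute accuracy, precision and recall from (label, prediction) pairs."""
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    correct = sum(1 for y, p in pairs if y == p)
    return {
        "accuracy": correct / float(len(pairs)),
        # Fall back to 0.0 when a denominator would be zero.
        "precision": tp / float(tp + fp) if tp + fp else 0.0,
        "recall": tp / float(tp + fn) if tp + fn else 0.0,
    }


# Made-up (label, prediction) pairs: 3 true positives, 2 false positives,
# 1 false negative and 4 true negatives.
pairs = ([(1.0, 1.0)] * 3) + ([(0.0, 1.0)] * 2) + [(1.0, 0.0)] + ([(0.0, 0.0)] * 4)
metrics = classification_metrics(pairs)
```

On this toy data the three numbers differ (accuracy 0.7, precision 0.6, recall 0.75), which shows why the name of the metric matters when reporting results.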

In [None]:
from pyspark.ml.evaluation import BinaryClassificationEvaluator

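# Evaluate the model on the rawPrediction column so that the ROC curve is built
# from scores rather than from hard 0/1 predictions
evaluator = BinaryClassificationEvaluator(rawPredictionCol="rawPrediction", labelCol="label", metricName="areaUnderROC")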
print('Area under ROC curve = {:.2f}.'.format(evaluator.evaluate(results)))
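
The area under the ROC curve can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way in plain Python from made-up (label, score) pairs. It is a brute-force illustration of the metric, not how Spark's evaluator is implemented.

```python
def area_under_roc(pairs, positive=1.0):
    """AUC as the fraction of positive/negative pairs that are ranked correctly."""
    pos = [s for y, s in pairs if y == positive]
    neg = [s for y, s in pairs if y != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as one half
    return wins / (len(pos) * len(neg))


# Made-up (label, score) pairs, where the score is a predicted churn probability.
scored = [(1.0, 0.9), (1.0, 0.6), (0.0, 0.7), (0.0, 0.2), (1.0, 0.8), (0.0, 0.1)]
auc = area_under_roc(scored)
```

Feeding hard 0/1 predictions instead of scores collapses the curve to a single operating point, which is why an evaluator of this kind is normally given a score column.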

### Step 8: Save Model in WML repository

In this section you will store your model in the Watson Machine Learning (WML) repository by using Python client libraries.
* <a href="https://console.ng.bluemix.net/docs/services/PredictiveModeling/index.html">WML Documentation</a>
* <a href="http://watson-ml-api.mybluemix.net/">WML API</a> 
<br/>

First, you must import client libraries.

In [None]:
from repository.mlrepositoryclient import MLRepositoryClient
from repository.mlrepositoryartifact import MLRepositoryArtifact

Enter the credentials from your instance of the Watson Machine Learning service in <a href="https://console.ng.bluemix.net/dashboard/apps/" target="_blank">Bluemix</a> in the next cell. You can find them on the **Service Credentials** tab of your service instance in Bluemix.

<span style="color:red">Replace the service_path and credentials with your own information</span>

service_path=[your url]<br/>
username=[your username]<br/>
password=[your password]<br/>

In [None]:
service_path = 'https://ibm-watson-ml.mybluemix.net'
username = 'db7336ae-b258-4b0c-9bd2-57ca9d090f08'
password = 'ff129993-058d-472b-bbcb-edf40568b6c8'

Authorize the repository client:

In [None]:
ml_repository_client = MLRepositoryClient(service_path)
ml_repository_client.authorize(username, password)

Create the model artifact (abstraction layer).

<b>Tip:</b> The MLRepositoryArtifact constructor expects a trained model object, training data, and a model name. The Watson Machine Learning service displays this model name.


In [None]:
model_artifact = MLRepositoryArtifact(model, training_data=train, name="Predict Customer Churn")

Save pipeline and model artifacts to your Watson Machine Learning instance:

In [None]:
saved_model = ml_repository_client.models.save(model_artifact)

In [None]:
# Print the saved model properties
print("modelType: " + saved_model.meta.prop("modelType"))
print("creationTime: " + str(saved_model.meta.prop("creationTime")))
print("modelVersionHref: " + saved_model.meta.prop("modelVersionHref"))
print("label: " + saved_model.meta.prop("label"))

### Step 9:  Generate Authorization Token for Invoking the model

In [None]:
import urllib3, requests, json

headers = urllib3.util.make_headers(basic_auth='{}:{}'.format(username, password))
url = '{}/v2/identity/token'.format(service_path)
response = requests.get(url, headers=headers)
mltoken = json.loads(response.text).get('token')
print(mltoken)
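
`make_headers(basic_auth=...)` builds an HTTP Basic authorization header, which is the string `Basic ` followed by the Base64 encoding of `username:password`. A standard-library sketch of an equivalent header is shown below with placeholder credentials; the helper name `basic_auth_header` is hypothetical.

```python
import base64


def basic_auth_header(username, password):
    """Return a dict holding an HTTP Basic Authorization header."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": "Basic " + token}


# Placeholder credentials, not real ones.
example_headers = basic_auth_header("user", "pass")
```

Because Base64 is an encoding and not encryption, a header like this should only be sent over HTTPS.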

#### Step 9.1 Copy the generated token into your notepad

### Step 10:  Go to WML in Bluemix to create a Deployment Endpoint and Test the Deployed model

* In your <a href="https://console.ng.bluemix.net/dashboard/apps/" target="_blank">Bluemix</a> dashboard, open your WML service and click the **Launch Dashboard** button under Watson Machine Learning.
![WML Launch Dashboard](https://raw.githubusercontent.com/yfphoon/dsx_demo/master/WML_Launch_Dashboard.png)

<br/>
* You should see your saved model in the **Models** tab.


* Under *Actions*, click the ellipsis (three dots) and then click ***Create Deployment***. Give your deployment configuration a unique name, e.g. "Predict Customer Churn Deploy", accept the defaults, and click **Save**.
<br/>
<br/>
* In the **Deployments** tab, under *Actions*, click **View Details**.
<br/>
<br/>
* Scroll down to **API Details** and copy the value of the **Scoring Endpoint** into your notepad (e.g. https://ibm-watson-ml.mybluemix.net/v2/published_models/64fd0462-3f8a-4b42-820b-59a4da9b7dc6/deployments/7d9995ed-7daf-4cfd-b40f-37cb8ab3d88f/online).
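
Judging by the example URL above, the scoring endpoint has the form `/v2/published_models/<model id>/deployments/<deployment id>/online`. A small sketch for sanity-checking a copied endpoint is shown below; the helper `parse_scoring_endpoint` is hypothetical, and the path layout is inferred only from that one example.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def parse_scoring_endpoint(url):
    """Extract the model and deployment IDs from a scoring endpoint URL."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    # Assumed layout: v2 / published_models / <model> / deployments / <deployment> / online
    if (parsed.scheme != "https" or len(parts) != 6
            or parts[0] != "v2" or parts[1] != "published_models"
            or parts[3] != "deployments" or parts[5] != "online"):
        raise ValueError("Unexpected scoring endpoint: " + url)
    return {"model_id": parts[2], "deployment_id": parts[4]}


endpoint = ("https://ibm-watson-ml.mybluemix.net/v2/published_models/"
            "64fd0462-3f8a-4b42-820b-59a4da9b7dc6/deployments/"
            "7d9995ed-7daf-4cfd-b40f-37cb8ab3d88f/online")
ids = parse_scoring_endpoint(endpoint)
```

A check like this catches a truncated or partially copied URL before it is pasted into the REST client.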



### Step 11:  Invoke the model with a REST Client, e.g. https://client.restlet.com/

In the REST client interface enter the following information:

1. Protocol:  **HTTPS**
<br/>
<br/>

2. URI: **your scoring endpoint**  (Step 10)
<br/>
<br/>
3. Method: **POST**
<br/>
<br/>
4. Authorization:  **your generated token** (Step 9). Hint: Add "Basic authorization" with a dummy value of 1 in the userid field. Then replace the value with the token. 
<br/>
<br/>
5. Content Type: **application/json**
<br/>
<br/>
6. JSON Body:<br/>**{
  "fields": [
    "ID","Gender","Status","Children","EstIncome","CarOwner","Age","LongDistance","International","Local","Dropped","Paymethod","LocalBilltype","LongDistanceBilltype","Usage","RatePlan"
  ],
  "values": [ 
  [999,"F","M",2.0,77551.100000,"Y",33.600000,20.530000,0.000000,41.890000,1.000000,"CC","Budget","Intnl_discount",62.420000,2.000000]
  ]
}**
<br/>
<br/>
7. Click **Send**.

Scroll down to the **RESPONSE** section to see the scored results.

**Note:** The values in the JSON body do not include the label.
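
The same request body can also be assembled programmatically. The sketch below builds the JSON payload from a field list and one or more rows, and checks that every row has one value per field. `build_scoring_payload` is a hypothetical helper, and the sample row is the one from item 6 above. Actually sending the payload (for example with `requests.post`) is left out, because it needs your own scoring endpoint and token.

```python
import json

FIELDS = ["ID", "Gender", "Status", "Children", "EstIncome", "CarOwner", "Age",
          "LongDistance", "International", "Local", "Dropped", "Paymethod",
          "LocalBilltype", "LongDistanceBilltype", "Usage", "RatePlan"]


def build_scoring_payload(fields, rows):
    """Serialize fields and rows into a JSON body with 'fields' and 'values' keys."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("Each row needs exactly {} values".format(len(fields)))
    return json.dumps({"fields": list(fields), "values": [list(r) for r in rows]})


# The sample customer from the JSON body above; the CHURN label is not included.
row = [999, "F", "M", 2.0, 77551.1, "Y", 33.6, 20.53, 0.0, 41.89, 1.0,
       "CC", "Budget", "Intnl_discount", 62.42, 2.0]
payload = build_scoring_payload(FIELDS, [row])
```

Validating the row length up front gives a clearer error than a rejected request when a value is accidentally dropped.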


You have reached the end of this notebook.


**Sidney Phoon**
<br/>
yfphoon@us.ibm.com
<br/>
April 25, 2017