
<table style="border: none" align="left">
   <tr style="border: none">
      <th style="border: none"><font face="verdana" size="5" color="black"><b>From scikit-learn Model to Cloud with Watson Machine Learning client</b></font></th>
      <th style="border: none"><img src="https://github.com/pmservice/customer-satisfaction-prediction/blob/master/app/static/images/ml_icon_gray.png?raw=true" alt="Watson Machine Learning icon" height="40" width="40"></th>
   </tr>
   <tr style="border: none">
       <th style="border: none"><img src="https://github.com/pmservice/wml-sample-models/raw/master/scikit-learn/hand-written-digits-recognition/images/numbers_banner-04.png" width="600" alt="Icon"> </th>
   </tr>
</table>

This notebook contains the steps and code to work with the <a href="https://pypi.python.org/pypi/watson-machine-learning-client" target="_blank" rel="noopener noreferrer">watson-machine-learning-client</a> library, which is available in the PyPI repository. It introduces commands for getting data, basic data exploration, pipeline creation, model training and evaluation, model persistence to the Watson Machine Learning repository, model deployment, and scoring.

Some familiarity with Python is helpful. This notebook uses Python 3.5, Spark 2.1, scikit-learn, and the watson-machine-learning-client package.

You will use the sample data set **sklearn.datasets.load_digits**, which ships with scikit-learn and contains images of hand-written digits, to build a model that recognizes hand-written digits.

## Learning goals

The learning goals of this notebook are:

-  Load a sample data set from scikit-learn
-  Explore data
-  Prepare data for training and evaluation
-  Create a scikit-learn machine learning pipeline
-  Train and evaluate a model
-  Persist a model in the Watson Machine Learning repository
-  Deploy a model for online scoring using the client library
-  Score sample records using the client library


## Contents

This notebook contains the following parts:

1.	[Set up the environment](#setup)
2.	[Load and explore data](#load)
3.	[Create scikit-learn model](#model)
4.	[Persist model](#persistence)
5.	[Deploy and score in a Cloud](#scoring)
6.	[Summary and next steps](#summary)

<a id="setup"></a>
## 1. Set up the environment

Before you can use the sample code in this notebook, you must perform the following setup tasks:

- Create a <a href="https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/" target="_blank" rel="noopener noreferrer">Watson Machine Learning Service</a> instance (a free plan is offered). 
- Configure the local Python environment as follows:
  + Python 3.5
  + scikit-learn 0.17.1
  + watson-machine-learning-client

**Tip**: Run the cell below to install libraries from <a href="https://pypi.python.org/pypi/" target="_blank" rel="noopener noreferrer">PyPI</a>.

In [None]:
!pip install watson-machine-learning-client --upgrade

<a id="load"></a>
## 2. Load and explore data

In this section you load and explore the data from scikit-learn sample data sets.

In [2]:
import sklearn
from sklearn import datasets

digits = datasets.load_digits()

The data set is now loaded. It consists of 8x8-pixel images of hand-written digits.

Display the pixel values and the label of the first digit by using the `data` and `target` attributes.

In [3]:
print(digits.data[0].reshape((8, 8)))

[[ 0.  0.  5. 13.  9.  1.  0.  0.]
 [ 0.  0. 13. 15. 10. 15.  5.  0.]
 [ 0.  3. 15.  2.  0. 11.  8.  0.]
 [ 0.  4. 12.  0.  0.  8.  8.  0.]
 [ 0.  5.  8.  0.  0.  9.  8.  0.]
 [ 0.  4. 11.  0.  1. 12.  7.  0.]
 [ 0.  2. 14.  5. 10. 12.  0.  0.]
 [ 0.  0.  6. 13. 10.  0.  0.  0.]]


In [4]:
digits.target[0]

0
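The `reshape((8, 8))` call above works because each row of `data` is one 8x8 image flattened row by row. The sketch below, which assumes only the `digits` object returned by `load_digits()`, checks that `data` and the 2-D `images` attribute (used in the next cell to count samples) hold the same pixels.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# `images` keeps the 2-D layout: (n_samples, 8, 8)
print(digits.images.shape)  # (1797, 8, 8)
# `data` stores the same pixels as one 64-value row per sample
print(digits.data.shape)    # (1797, 64)

# Flattening each 8x8 image reproduces the corresponding row of `data`
flattened = digits.images.reshape((len(digits.images), -1))
print(np.array_equal(flattened, digits.data))  # True
```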

Now, count the number of data samples.

In [5]:
samples_count = len(digits.images)

print("Number of samples: " + str(samples_count))

Number of samples: 1797


<a id="model"></a>
## 3. Create a scikit-learn model

In this section you will learn how to:
- [Prepare the data](#prepdata)
- [Create a scikit-learn machine learning pipeline](#createpipe)
- [Train a model](#trainmodel)

### 3.1: Prepare the data<a id="prepdata"></a>

In this subsection you split your data into train, test, and score data sets.

In [6]:
train_data = digits.data[: int(0.7*samples_count)]
train_labels = digits.target[: int(0.7*samples_count)]

test_data = digits.data[int(0.7*samples_count): int(0.9*samples_count)]
test_labels = digits.target[int(0.7*samples_count): int(0.9*samples_count)]

score_data = digits.data[int(0.9*samples_count): ]

print("Number of training records: " + str(len(train_data)))
print("Number of testing records : " + str(len(test_data)))
print("Number of scoring records : " + str(len(score_data)))

Number of training records: 1257
Number of testing records : 360
Number of scoring records : 180


As you can see, the data has been split into three data sets: 

-  The train data set: The largest group, used for training.
-  The test data set: Used for model evaluation and to test the assumptions of the model.
-  The score data set: Used for scoring in the Cloud.
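The record counts above follow directly from the slice boundaries in step 3.1: roughly 70% for training, 20% for testing, and 10% for scoring. The sketch below repeats that arithmetic with the sample count printed earlier. Note that the slices are taken in order, so the data is not shuffled before it is split.

```python
samples_count = 1797  # value printed by len(digits.images) above

# Boundary indices used by the slicing in step 3.1; int() truncates toward zero
train_end = int(0.7 * samples_count)  # 0.7 * 1797 = 1257.9 -> 1257
test_end = int(0.9 * samples_count)   # 0.9 * 1797 = 1617.3 -> 1617

n_train = train_end
n_test = test_end - train_end
n_score = samples_count - test_end
print(n_train, n_test, n_score)  # 1257 360 180
```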

### 3.2: Create a pipeline <a id="createpipe"></a> 

In this subsection, you create a scikit-learn machine learning pipeline, which you train in the next step.

The first step is to import the scikit-learn machine learning packages that you need in the subsequent steps.

In [7]:
from sklearn.pipeline import Pipeline
from sklearn import preprocessing
from sklearn import svm, metrics

Standardize the features by removing the mean and scaling to unit variance.

In [8]:
scaler = preprocessing.StandardScaler()
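As a rough illustration of what `StandardScaler` computes, the sketch below standardizes the digits features by hand, (x - mean) / std per feature, and compares the result with the scaler's output. The guard for zero-variance columns (pixels that are constant across all samples) mirrors the scaler's behavior of leaving such features at 0 instead of dividing by zero.

```python
import numpy as np
from sklearn import datasets, preprocessing

X = datasets.load_digits().data

# Per-feature mean and population standard deviation (ddof=0)
mean = X.mean(axis=0)
std = X.std(axis=0)

# Constant pixels have zero variance; divide them by 1 so they stay at 0
std_safe = np.where(std == 0, 1.0, std)
X_manual = (X - mean) / std_safe

X_sklearn = preprocessing.StandardScaler().fit_transform(X)
print(np.allclose(X_manual, X_sklearn))  # True
```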

Next, define the estimator to use for classification. The following example uses a Support Vector Machine (SVM) classifier with a radial basis function (RBF) kernel.

In [9]:
clf = svm.SVC(kernel='rbf')
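The RBF kernel scores the similarity of two samples as K(x, x') = exp(-gamma * ||x - x'||^2). The sketch below evaluates this formula directly for a few random 64-feature vectors and compares it with scikit-learn's `rbf_kernel` helper. The value of `gamma` here (1 / n_features, the `'auto'` setting) is chosen for illustration only; `SVC` in scikit-learn 0.22 and later defaults to `gamma='scale'` instead.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.rand(5, 64)          # 5 illustrative samples with 64 features, like flattened digits
gamma = 1.0 / X.shape[1]     # the 'auto' setting: 1 / n_features

# Squared Euclidean distance between every pair of rows
sq_dists = ((X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2).sum(axis=-1)
K_manual = np.exp(-gamma * sq_dists)

K_sklearn = rbf_kernel(X, X, gamma=gamma)
print(np.allclose(K_manual, K_sklearn))  # True
```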

Now, build the pipeline. A pipeline chains one or more transformers with a final estimator; here, it consists of the `scaler` transformer followed by the `clf` estimator.

In [10]:
pipeline = Pipeline([('scaler', scaler), ('svc', clf)])
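To make the pipeline's behavior concrete, the sketch below compares it with running the two steps by hand: fit the scaler, fit the SVM on the scaled training data, and apply the same fitted scaler before predicting. It reuses the 70% training boundary from step 3.1 and an arbitrary slice of the next 40 samples; both paths should give identical predictions.

```python
import numpy as np
from sklearn import datasets, preprocessing, svm
from sklearn.pipeline import Pipeline

digits = datasets.load_digits()
X, y = digits.data[:1257], digits.target[:1257]
X_new = digits.data[1257:1297]

# Pipeline: fit_transform on the scaler, then fit on the SVM; predict chains transform + predict
pipeline = Pipeline([('scaler', preprocessing.StandardScaler()), ('svc', svm.SVC(kernel='rbf'))])
pipe_pred = pipeline.fit(X, y).predict(X_new)

# The same steps performed manually
scaler = preprocessing.StandardScaler().fit(X)
clf = svm.SVC(kernel='rbf').fit(scaler.transform(X), y)
manual_pred = clf.predict(scaler.transform(X_new))

print(np.array_equal(pipe_pred, manual_pred))  # True
```

Bundling the steps this way ensures that scoring data is always transformed with the statistics learned from the training data, which matters once the model is deployed and scored remotely.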

### 3.3: Train the model<a id="trainmodel"></a>

Now, use `train_data` and `train_labels` ([step 3.1](#prepdata)) with `pipeline` ([step 3.2](#createpipe)) to train your SVM model.

In [11]:
model = pipeline.fit(train_data, train_labels)

Check the quality of your model by evaluating it on the **test data**.

In [12]:
predicted = model.predict(test_data)

print("Evaluation report: \n\n%s" % metrics.classification_report(test_labels, predicted))

Evaluation report: 

             precision    recall  f1-score   support

          0       1.00      0.97      0.99        37
          1       0.97      0.97      0.97        34
          2       1.00      0.97      0.99        36
          3       1.00      0.94      0.97        35
          4       0.78      0.97      0.87        37
          5       0.97      0.97      0.97        38
          6       0.97      0.86      0.91        36
          7       0.92      0.97      0.94        35
          8       0.91      0.89      0.90        35
          9       0.97      0.92      0.94        37

avg / total       0.95      0.94      0.95       360
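The per-class columns in the report follow the usual definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), the F1-score is their harmonic mean, and support is the number of true samples of each class. The sketch below derives these values from a confusion matrix on small, made-up label vectors and checks them against `metrics.precision_recall_fscore_support`.

```python
import numpy as np
from sklearn import metrics

# Small illustrative label vectors (not taken from the digits data)
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 0, 2, 2, 1])

cm = metrics.confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
tp = np.diag(cm)
precision = tp / cm.sum(axis=0)  # column sums = all predictions of each class (TP + FP)
recall = tp / cm.sum(axis=1)     # row sums = all true members of each class (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

p, r, f, support = metrics.precision_recall_fscore_support(y_true, y_pred)
print(np.allclose(precision, p), np.allclose(recall, r), np.allclose(f1, f))  # True True True
print(support)  # [2 3 3]
```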



You can now tune your model to achieve better accuracy. For simplicity, the tuning section is omitted in this example.
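As one possible starting point for tuning, the sketch below runs a cross-validated grid search over the SVM's `C` and `gamma` parameters with scikit-learn's `GridSearchCV` (available in `sklearn.model_selection` from scikit-learn 0.18 onward). The parameter values are hypothetical and not tuned for this data set. Parameters of a pipeline step are addressed as `<step name>__<parameter>`, so `svc__C` refers to the `C` parameter of the `'svc'` step.

```python
from sklearn import datasets, preprocessing, svm
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

digits = datasets.load_digits()
samples_count = len(digits.images)
train_data = digits.data[: int(0.7 * samples_count)]
train_labels = digits.target[: int(0.7 * samples_count)]

pipeline = Pipeline([('scaler', preprocessing.StandardScaler()), ('svc', svm.SVC(kernel='rbf'))])

# Hypothetical search space for illustration
param_grid = {
    'svc__C': [1, 10, 100],
    'svc__gamma': [0.001, 0.01],
}

# 3-fold cross-validation on the training data only; test data stays untouched
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(train_data, train_labels)

print(search.best_params_)
print(search.best_score_)
```

Because `GridSearchCV` refits the best configuration on the full training data by default, `search.best_estimator_` is a fitted pipeline that could take the place of `model` in the steps below.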

<a id="persistence"></a>
## 4. Persist model

In this section, you learn how to use the watson-machine-learning-client Python library to store your model in the Watson Machine Learning repository.

- [Work with your instance](#workwithinstance)
- [Publish the model](#pubmodel)
- [Get model details](#getmodel)
- [Load the model](#loadmodel)
- [Delete the model](#deletemodel)

**Tip**: You can find documentation about the watson-machine-learning-client package <a href="https://wml-api-pyclient.mybluemix.net" target="_blank" rel="noopener noreferrer">here</a>.

### 4.1: Work with your instance<a id="workwithinstance"></a>

First, import the client libraries.

In [None]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient

Authenticate to your Watson Machine Learning service instance on IBM Cloud.

**Tip**: Authentication information (your credentials) can be found in the [Service Credentials](https://console.bluemix.net/docs/services/service_credentials.html#service_credentials) tab of the service instance that you created on IBM Cloud. <BR>If you cannot see the **instance_id** field in **Service Credentials**, click **New credential (+)** to generate new authentication information. 

**Action**: Enter your Watson Machine Learning service instance credentials here.

In [16]:
wml_credentials={
  "url": "https://ibm-watson-ml.mybluemix.net",
  "access_key": "***",
  "username": "***",
  "password": "***",
  "instance_id": "***"
}

Run the cell below to create the API client.

In [17]:
client = WatsonMachineLearningAPIClient(wml_credentials)

Get the instance details.

In [None]:
# Output has been removed from this cell for Watson Studio sharing.

import json

instance_details = client.service_instance.get_details()
print(json.dumps(instance_details, indent=2))

### 4.2: Publish the model<a id="pubmodel"></a>

Publish the model in the Watson Machine Learning repository on IBM Cloud.

Define the model name, author name, and author email.

In [21]:
model_props = {client.repository.ModelMetaNames.AUTHOR_NAME: "IBM", 
               client.repository.ModelMetaNames.AUTHOR_EMAIL: "ibm@ibm.com",
               client.repository.ModelMetaNames.NAME: "LOCALLY created Digits prediction model"}

In [22]:
published_model = client.repository.store_model(model=model, meta_props=model_props,
                                                training_data=train_data, training_target=train_labels)

### 4.3: Get model details<a id="getmodel"></a>

Run the code in the cell below to view information about your published model.

In [23]:
published_model_uid = client.repository.get_model_uid(published_model)
model_details = client.repository.get_details(published_model_uid)

print(json.dumps(model_details, indent=2))

{
  "metadata": {
    "created_at": "2018-03-06T12:27:21.070Z",
    "url": "https://ibm-watson-ml.mybluemix.net/v3/wml_instances/8f11cfac-57ca-4a0c-9815-7abbd7a72491/published_models/6ebd93c7-b902-4e04-9827-618b3b21eccc",
    "modified_at": "2018-03-06T12:27:21.133Z",
    "guid": "6ebd93c7-b902-4e04-9827-618b3b21eccc"
  },
  "entity": {
    "latest_version": {
      "created_at": "2018-03-06T12:27:21.133Z",
      "url": "https://ibm-watson-ml.mybluemix.net/v3/ml_assets/models/6ebd93c7-b902-4e04-9827-618b3b21eccc/versions/6fcede0b-4c32-4fa3-9bb7-4d740ee7b685",
      "guid": "6fcede0b-4c32-4fa3-9bb7-4d740ee7b685"
    },
    "runtime_environment": "python-3.5",
    "name": "LOCALLY created Digits prediction model",
    "learning_configuration_url": "https://ibm-watson-ml.mybluemix.net/v3/wml_instances/8f11cfac-57ca-4a0c-9815-7abbd7a72491/published_models/6ebd93c7-b902-4e04-9827-618b3b21eccc/learning_configuration",
    "model_type": "scikit-learn-0.19",
    "input_data_schema": {
      "f

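`get_details()` returns a plain Python dictionary (it is passed directly to `json.dumps` above), so individual fields can be read with ordinary key access. The sketch below uses an abbreviated copy of the output above; the key paths are taken from that output, and other client versions may structure the response differently.

```python
# Abbreviated copy of the model_details output shown above (the real response has more fields)
model_details = {
    "metadata": {
        "guid": "6ebd93c7-b902-4e04-9827-618b3b21eccc",
        "created_at": "2018-03-06T12:27:21.070Z",
    },
    "entity": {
        "name": "LOCALLY created Digits prediction model",
        "model_type": "scikit-learn-0.19",
        "runtime_environment": "python-3.5",
    },
}

# Read nested fields with plain dictionary access
model_guid = model_details["metadata"]["guid"]
model_type = model_details["entity"]["model_type"]
print(model_guid)  # 6ebd93c7-b902-4e04-9827-618b3b21eccc
print(model_type)  # scikit-learn-0.19
```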
List all the models. Your model should be displayed in the list.

In [24]:
models_details = client.repository.list_models()

------------------------------------  ---------------------------------------  ------------------------  -----------------  -----
GUID                                  NAME                                     CREATED                   FRAMEWORK          TYPE
c19ea489-1674-4883-bcbd-68f0089ef7dc  Customer Churn Model                     2018-02-19T15:01:13.485Z  mllib-2.0          model
d9083d24-67f6-418e-869d-4fe344f7d9e5  Customer Churn Model                     2018-02-20T07:20:03.989Z  mllib-2.1          model
7bce0af2-3d3a-4163-9bce-f5303c729d07  Customer Churn Model                     2018-02-20T09:13:27.542Z  mllib-2.1          model
1698755c-ae01-4ab9-80b3-4f5d4d92bcc3  VIOLATIONS_SCALA211_SPARK20              2018-02-20T13:25:01.431Z  mllib-2.1          model
6ebd93c7-b902-4e04-9827-618b3b21eccc  LOCALLY created Digits prediction model  2018-03-06T12:27:21.070Z  scikit-learn-0.19  model
------------------------------------  ---------------------------------------  ------------------------  -----------------  -----

### 4.4: Load a model<a id="loadmodel"></a>

Run the code in the cell below to reload a saved model from a specified instance of Watson Machine Learning.

In [25]:
loaded_model = client.repository.load(published_model_uid)

You can run a test prediction to verify that the model has been loaded correctly.

In [26]:
test_predictions = loaded_model.predict(test_data[:10])

In [27]:
print(test_predictions)

[4 0 5 3 6 9 6 4 7 5]


As you can see, the reloaded model makes predictions, which means that it was loaded successfully. You now know how to save a model to, and load it from, the Watson Machine Learning repository.

### 4.5: Delete the model<a id="deletemodel"></a>

The code in the cell below deletes the published model from the Watson Machine Learning repository. The code is commented out at this stage because you need the model later for deployment.

In [None]:
# client.repository.delete(published_model_uid)

<a id="scoring"></a>
## 5. Deploy and score in a Cloud

In this section, you learn how to use the watson-machine-learning-client library to create an online deployment and to score new data records.

- [Create model deployment](#modeldeploy)
- [Get deployments](#getdeploy)
- [Run a scoring request](#score)
- [Delete the deployment](#deletedeploy)
- [Delete the model](#deletemodel)


### 5.1: Create the model deployment<a id="modeldeploy"></a>

Create an online deployment for the published model.

In [28]:
created_deployment = client.deployments.create(published_model_uid, "Deployment of locally created scikit model")

**Note**: Here, you use the scoring URL stored in the `created_deployment` object. In the next section, you retrieve the deployment URL from the Watson Machine Learning instance.

Now, print the online scoring endpoint.

In [29]:
scoring_endpoint = client.deployments.get_scoring_url(created_deployment)

print(scoring_endpoint)

https://ibm-watson-ml.mybluemix.net/v3/wml_instances/8f11cfac-57ca-4a0c-9815-7abbd7a72491/published_models/6ebd93c7-b902-4e04-9827-618b3b21eccc/deployments/2d8e45a3-a1e3-4395-b40e-0323e943bab4/online


### 5.2: Get deployments<a id="getdeploy"></a>
    
Print information about your deployments.

In [None]:
#Output is hidden for Watson Studio Sharing
deployments = client.deployments.get_details()

print(json.dumps(deployments, indent=2))

You can also get the URL of the deployment you just created from its deployment details.

In [31]:
deployment_url = client.deployments.get_url(created_deployment)

print(deployment_url)

https://ibm-watson-ml.mybluemix.net/v3/wml_instances/8f11cfac-57ca-4a0c-9815-7abbd7a72491/published_models/6ebd93c7-b902-4e04-9827-618b3b21eccc/deployments/2d8e45a3-a1e3-4395-b40e-0323e943bab4
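In the outputs above, the scoring endpoint is the deployment URL with `/online` appended, and the last path segment of the deployment URL is the deployment GUID. The sketch below checks this with the two printed strings. It is an observation about these particular outputs, not a documented contract, so real code should keep using `client.deployments.get_scoring_url()` and `client.deployments.get_uid()`.

```python
# URLs copied from the outputs above
scoring_endpoint = "https://ibm-watson-ml.mybluemix.net/v3/wml_instances/8f11cfac-57ca-4a0c-9815-7abbd7a72491/published_models/6ebd93c7-b902-4e04-9827-618b3b21eccc/deployments/2d8e45a3-a1e3-4395-b40e-0323e943bab4/online"
deployment_url = "https://ibm-watson-ml.mybluemix.net/v3/wml_instances/8f11cfac-57ca-4a0c-9815-7abbd7a72491/published_models/6ebd93c7-b902-4e04-9827-618b3b21eccc/deployments/2d8e45a3-a1e3-4395-b40e-0323e943bab4"

# The online scoring endpoint extends the deployment URL
print(scoring_endpoint == deployment_url + "/online")  # True

# The final path segment of the deployment URL is the deployment GUID
deployment_uid = deployment_url.rstrip("/").split("/")[-1]
print(deployment_uid)  # 2d8e45a3-a1e3-4395-b40e-0323e943bab4
```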


### 5.3: Run a scoring request<a id="score"></a>

Run a test scoring request against the deployed model.

**Action**: Prepare the scoring payload with records to score.

In [32]:
scoring_payload = {"values": [list(score_data[0]), list(score_data[1])]}
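The payload wraps a list of records under the `values` key, where each record is one flattened 8x8 image with 64 pixel values. The sketch below builds the same structure for a slice of `score_data` in one step with NumPy's `tolist()`, which yields plain Python floats that serialize cleanly with `json`. The `values` layout is taken from the cell above; the rest of the setup repeats steps 2 and 3.1 so the sketch runs on its own.

```python
import json
from sklearn import datasets

digits = datasets.load_digits()
samples_count = len(digits.images)
score_data = digits.data[int(0.9 * samples_count):]

# One record per row; .tolist() converts NumPy floats to Python floats
scoring_payload = {"values": score_data[:2].tolist()}

print(len(scoring_payload["values"]), len(scoring_payload["values"][0]))  # 2 64
body = json.dumps(scoring_payload)  # the payload must be JSON-serializable
```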

Use the ``client.deployments.score()`` method to run scoring, and print the output.

In [33]:
predictions = client.deployments.score(scoring_endpoint, scoring_payload)

In [34]:
print(json.dumps(predictions, indent=2))

{
  "fields": [
    "prediction"
  ],
  "values": [
    [
      5
    ],
    [
      2
    ]
  ]
}
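The response pairs a `fields` list with one row per scored record in `values`. The sketch below extracts the predicted digits from a copy of the response shown above by looking up the position of the `prediction` field, so it would still work if the response contained additional columns.

```python
# Response copied from the output above
predictions = {"fields": ["prediction"], "values": [[5], [2]]}

# Each row of "values" lines up with the names in "fields"
col = predictions["fields"].index("prediction")
predicted_digits = [row[col] for row in predictions["values"]]
print(predicted_digits)  # [5, 2]
```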


### 5.4: Delete the deployment<a id="deletedeploy"></a>

Use the following method to delete the deployment.

In [35]:
client.deployments.delete(client.deployments.get_uid(created_deployment))

Run ``client.deployments.list()`` to verify that the deployment has been deleted and no longer appears in the list.

In [36]:
client.deployments.list()

------------------------------------  -------------------------  -----  ------------------------  ---------
GUID                                  NAME                       TYPE   CREATED                   FRAMEWORK
bb3cb635-7d08-4115-9592-113ff94aabde  Customer Churn Prediction  batch  2018-02-19T15:11:40.938Z  mllib-2.0
------------------------------------  -------------------------  -----  ------------------------  ---------


### 5.5: Delete the model<a id="deletemodel"></a>

Run the code in the cell below to delete your model.

In [37]:
client.repository.delete(published_model_uid)

Run ``client.repository.list_models()`` to verify that the model has been deleted.

In [38]:
client.repository.list_models()

------------------------------------  ---------------------------  ------------------------  ---------  -----
GUID                                  NAME                         CREATED                   FRAMEWORK  TYPE
c19ea489-1674-4883-bcbd-68f0089ef7dc  Customer Churn Model         2018-02-19T15:01:13.485Z  mllib-2.0  model
d9083d24-67f6-418e-869d-4fe344f7d9e5  Customer Churn Model         2018-02-20T07:20:03.989Z  mllib-2.1  model
7bce0af2-3d3a-4163-9bce-f5303c729d07  Customer Churn Model         2018-02-20T09:13:27.542Z  mllib-2.1  model
1698755c-ae01-4ab9-80b3-4f5d4d92bcc3  VIOLATIONS_SCALA211_SPARK20  2018-02-20T13:25:01.431Z  mllib-2.1  model
------------------------------------  ---------------------------  ------------------------  ---------  -----


<a id="summary"></a>
## 6. Summary and next steps     

You have successfully completed this notebook!
You learned how to use scikit-learn together with Watson Machine Learning to create and deploy a model.
Check out our [Online Documentation](https://dataplatform.ibm.com/docs/content/analyze-data/wml-setup.html) for more samples, tutorials, documentation, how-tos, and blog posts.

### Authors

**Wojciech Sobala**, Data Scientist at IBM developing enterprise-level applications that substantially increase clients' ability to turn data into actionable knowledge.

Copyright © 2017, 2018 IBM. This notebook and its source code are released under the terms of the MIT License.