<html>
<body>
    <table style="border: none" align="center">
        <tr style="border: none">
            <th style="border: none"><img src="https://github.com/pmservice/customer-satisfaction-prediction/blob/master/app/static/images/ml_icon_gray.png?raw=true" alt="Watson Machine Learning icon" height="45" width="45"></th>
            <th style="border: none"><font face="verdana" size="6" color="black"><b>Watson Machine Learning</b></font></th>
        </tr>
    </table>
</body>
</html>

This notebook shows how to train a scikit-learn model that uses a custom-defined transformer and how to use it with the Watson Machine Learning service. After training, the notebook walks through persisting the model and the custom transformer to the Watson Machine Learning Repository, then deploying and scoring the model with the Watson Machine Learning Python client.

In this notebook, we use the GNFUV dataset, which contains humidity and temperature readings from mobile sensors on Unmanned Surface Vehicles in a test bed in Athens, to train a scikit-learn model that predicts the temperature.

Some familiarity with Python is helpful. This notebook uses Python 3.5 and scikit-learn 0.19.1.

## Learning goals

The learning goals of this notebook are:

- Train a model with a custom-defined transformer
- Persist the custom-defined transformer and the model in the Watson Machine Learning repository
- Deploy the model using the Watson Machine Learning service
- Perform predictions using the deployed model

## Contents
1. [Set up the environment](#setup)
2. [Install the Python library containing the custom transformer implementation](#install_lib)
3. [Prepare training data](#load)
4. [Train the scikit-learn model](#train)
5. [Save the model and library to WML Repository](#persistence)
6. [Deploy and score data in the IBM Cloud](#deploy)
7. [Summary and next steps](#summary)


<a id="setup"></a>
## 1. Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Create a [Watson Machine Learning (WML) Service](https://console.ng.bluemix.net/catalog/services/ibm-watson-machine-learning/) instance (a free plan is offered and information about how to create the instance is [here](https://dataplatform.ibm.com/docs/content/analyze-data/wml-setup.html))

- Configure your local python environment:
  + python 3.5
  + scikit-learn 0.19.1
  + watson-machine-learning-client, version: 1.0.293 or above

**Tip**: Run the cells below to install libraries from <a href="https://pypi.python.org/pypi" target="_blank" rel="noopener noreferrer">PyPI</a>.

In [None]:
!rm -rf $PIP_BUILD/watson-machine-learning-client

In [None]:
!pip install watson-machine-learning-client --upgrade

<a id="install_lib"></a>

## 2. Install the library containing custom transformer

The library `linalgnorm-0.1.zip` is a Python distributable package that contains the implementation of a user-defined scikit-learn transformer, `LNormalizer`. <br>
Any third-party libraries that the custom transformer requires must be declared as dependencies of the library that contains the transformer implementation.


In this section, we download the library and install it in the current notebook environment. 

In [None]:
cd ~/data/libs

In [None]:
!wget https://github.com/pmservice/wml-sample-models/raw/master/scikit-learn/custom-transformer-temperature-prediction/libraries/linalgnorm-0.1.zip --output-document=linalgnorm-0.1.zip

Install the downloaded library using the `pip` command.

In [None]:
ls -ltr

In [None]:
!pip install linalgnorm-0.1.zip

<a id="load"></a>

## 3. Download training dataset and prepare training data

Download the data from the UCI repository: https://archive.ics.uci.edu/ml/machine-learning-databases/00452/GNFUV%20USV%20Dataset.zip

In [None]:
!rm -rf dataset
!mkdir dataset

In [None]:
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00452/GNFUV%20USV%20Dataset.zip --output-document=dataset/gnfuv_dataset.zip

In [None]:
cd dataset

In [None]:
!unzip gnfuv_dataset.zip
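
If the `unzip` command is not available in your environment, the archive can also be unpacked from Python. The following is a minimal sketch that uses only the standard library; the helper name `unpack_archive` is made up for illustration and is not part of the original notebook.

```python
import zipfile
from pathlib import Path


def unpack_archive(archive_path, target_dir="."):
    """Extract every member of a zip archive and return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()


# Example, run from inside the dataset directory:
# unpack_archive("gnfuv_dataset.zip")
```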

Create a pandas DataFrame from the downloaded dataset.

In [None]:
import json
import pandas as pd
import numpy as np
import os
from datetime import datetime
from json import JSONDecodeError

In [None]:
## Get all the entries
home_dir = '.'
pi_dirs = os.listdir(home_dir)

data_list = []
base_time = None
columns = None

for pi_dir in pi_dirs:
    if 'pi' not in pi_dir:
        continue
    curr_dir = os.path.join(home_dir, pi_dir)
    data_file = os.path.join(curr_dir, os.listdir(curr_dir)[0])
    with open(data_file, 'r') as f:
        line = f.readline().strip().replace("'", '"')
        while line != '':
            try:
                input_json = json.loads(line)
                sensor_datetime = datetime.fromtimestamp(input_json['time'])
                if base_time is None:
                    base_time = datetime(sensor_datetime.year, sensor_datetime.month, sensor_datetime.day, 0, 0, 0, 0)
                input_json['time'] = (sensor_datetime - base_time).seconds
                data_list.append(list(input_json.values()))
                if columns is None:
                    columns = list(input_json.keys())
            except JSONDecodeError as je:
                pass
            line = f.readline().strip().replace("'", '"')

data_df = pd.DataFrame(data_list, columns=columns)

In [None]:
data_df.head()
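
To make the parsing step easier to follow, here is a minimal sketch of what happens to a single line: single quotes are replaced so the line becomes valid JSON, and the timestamp is rebased to seconds since midnight of the first record's day. The helper name `parse_record` and the sample line are made up for illustration. The sketch uses UTC so that the result does not depend on the machine's time zone, whereas the notebook cell uses local time.

```python
import json
from datetime import datetime, timezone


def parse_record(line, base_time=None):
    """Parse one single-quoted record and rebase its timestamp.

    Returns (record, base_time); base_time is midnight of the first
    record's day, mirroring the loop in the notebook cell.
    """
    record = json.loads(line.strip().replace("'", '"'))
    # Assumption: UTC is used here only to keep the example reproducible.
    stamp = datetime.fromtimestamp(record["time"], tz=timezone.utc)
    if base_time is None:
        base_time = stamp.replace(hour=0, minute=0, second=0, microsecond=0)
    # timedelta.seconds drops whole days, so this is the time of day in seconds.
    record["time"] = (stamp - base_time).seconds
    return record, base_time


# Made-up sample line in the single-quoted style that the loop expects.
sample = "{'time': 1522966263.0, 'humidity': 47, 'temperature': 23}"
record, base = parse_record(sample)
print(record["time"])  # 79863
```

Because `timedelta.seconds` ignores the day component, a reading taken exactly one day later maps to the same `time` value. Use `total_seconds()` instead if elapsed time across days matters for your model.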

Create training and test datasets from the downloaded GNFUV-USV dataset.

In [None]:
from sklearn.preprocessing import MinMaxScaler
# sklearn.cross_validation is deprecated; model_selection is available in scikit-learn 0.19.1
from sklearn.model_selection import train_test_split

Y = data_df['temperature']
X = data_df.drop('temperature', axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=143)

<a id="train"></a>

## 4. Train a model

In this section, you will use the custom transformer as a stage in the Scikit-Learn `Pipeline` and train a model.

#### Import the custom transformer 
Import the custom transformer defined in `linalgnorm-0.1.zip` and create an instance of it, which will in turn be used as a stage in `sklearn.pipeline.Pipeline`.

In [None]:
from linalg_norm.sklearn_transformers import LNormalizer

In [None]:
lnorm_transf = LNormalizer()
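
The implementation of `LNormalizer` lives inside the library and is not shown in this notebook. To illustrate what a pipeline stage has to provide, the following plain-Python sketch follows the same `fit`/`transform` protocol. The class name `RowNormalizer` is hypothetical, and the row-wise L2 normalization is an assumption; the real `LNormalizer` may behave differently.

```python
import math


class RowNormalizer:
    """Hypothetical stand-in for a custom transformer (not the real LNormalizer).

    It exposes the fit/transform methods that a scikit-learn Pipeline calls
    on each intermediate stage.
    """

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the training data.
        return self

    def transform(self, X):
        # Divide each row by its Euclidean (L2) norm; all-zero rows are kept as-is.
        result = []
        for row in X:
            norm = math.sqrt(sum(value * value for value in row))
            result.append([value / norm for value in row] if norm else list(row))
        return result

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


print(RowNormalizer().fit_transform([[3.0, 4.0], [0.0, 0.0]]))
# [[0.6, 0.8], [0.0, 0.0]]
```

A transformer meant for real use would normally subclass `sklearn.base.BaseEstimator` and `sklearn.base.TransformerMixin` and operate on NumPy arrays, so that it supports parameter inspection and cloning.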

Import other objects required to train a model

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression

Now you can create a `Pipeline` with the user-defined transformer as one of its stages and train the model.

In [None]:

skl_pipeline = Pipeline(steps=[('normalizer', lnorm_transf), ('regression_estimator', LinearRegression())])
skl_pipeline.fit(X_train.loc[:, ['time', 'humidity']].values, y_train)


In [None]:
y_pred = skl_pipeline.predict(X_test.loc[:, ['time', 'humidity']].values)
rmse = np.mean((np.round(y_pred) - y_test.values)**2)**0.5
print('RMSE: {}'.format(rmse))
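
Note that the cell above rounds each prediction to a whole number before computing the root-mean-square error. As a small worked example of that calculation, here is a standard-library sketch; the function name `rounded_rmse` and the input values are made up for illustration.

```python
import math


def rounded_rmse(predicted, actual):
    """RMSE after rounding each prediction to the nearest integer."""
    squared_errors = [(round(p) - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up values: the rounded predictions are 20 and 22, so the errors are 0 and -1.
print(rounded_rmse([20.4, 21.6], [20, 23]))  # sqrt(0.5), roughly 0.707
```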

<a id="persistence"></a>

## 5. Persist the model and custom library to WML Repository

In this section, using `watson_machine_learning_client`, you will:
- save the library `linalgnorm-0.1.zip` in the WML Repository by creating a Library resource
- create a Runtime resource and bind the Library resource to it. This Runtime resource will be used to configure the online deployment runtime environment for a model
- bind the Runtime resource to the model and save the model to the WML Repository

In [None]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient

Authenticate to the Watson Machine Learning service on IBM Cloud.

**Tip**: Authentication information (your credentials) can be found in the [Service Credentials](https://console.bluemix.net/docs/services/service_credentials.html#service_credentials) tab of the service instance that you created on IBM Cloud. <BR>If you cannot see the **instance_id** field in **Service Credentials**, click **New credential (+)** to generate new authentication information. 

**Action**: Enter your Watson Machine Learning service instance credentials here.


In [None]:

wml_credentials = {
    "apikey"    : "value",
    "instance_id" : "instance_id",
    "url"    : "url"
}
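
Before creating the client, it can help to confirm that the placeholder values were actually replaced. The following is a minimal sketch of such a check; the helper name `missing_credentials` is hypothetical, and the placeholder set simply mirrors the template values shown in the cell above.

```python
REQUIRED_KEYS = ("apikey", "instance_id", "url")
# Assumption: these are the template values from the cell above.
PLACEHOLDERS = {"value", "instance_id", "url"}


def missing_credentials(credentials):
    """Return the required keys that are absent, empty, or still placeholders."""
    return [
        key
        for key in REQUIRED_KEYS
        if not credentials.get(key) or credentials[key] in PLACEHOLDERS
    ]


example = {"apikey": "value", "instance_id": "instance_id", "url": "url"}
print(missing_credentials(example))  # ['apikey', 'instance_id', 'url']
```

An empty list means every required field has been filled in.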



**Create WML API client**

In [None]:
client = WatsonMachineLearningAPIClient(wml_credentials)

### 5.1 Save Library in WML Repository

In [None]:
cd ~/data/libs

Define the metadata required to create a Library resource and save the library. <br>

The value of the `client.runtimes.LibraryMetaNames.FILEPATH` metadata property is the path of the library file to save to the WML Repository.

In [None]:
lib_meta = {
        client.runtimes.LibraryMetaNames.NAME: "K_Linag_norm_skl",
        client.runtimes.LibraryMetaNames.DESCRIPTION: "K_Linag_norm_skl",
        client.runtimes.LibraryMetaNames.FILEPATH: "linalgnorm-0.1.zip",
        client.runtimes.LibraryMetaNames.VERSION: "1.0",
        client.runtimes.LibraryMetaNames.PLATFORM: {"name": "python", "versions": ["3.5"]}
    }
custom_library_details = client.runtimes.store_library(lib_meta)
custom_library_uid = client.runtimes.get_library_uid(custom_library_details)
print("Custom Library UID: " + custom_library_uid)

Display the details of the Library resource that was created in the above cell

In [None]:
custom_library_details

### 5.2 Create Runtime and bind library to runtime

Define the metadata required to create a Runtime resource and bind the library to it. This Runtime resource will be used to configure the online deployment runtime environment for a model.

The `client.runtimes.ConfigurationMetaNames.LIBRARIES_UIDS` metadata property specifies the list of Library resource GUIDs that need to be part of the runtime.

In [None]:
runtimes_meta = {
    client.runtimes.ConfigurationMetaNames.NAME: "K_linalg_gnfuv1", 
    client.runtimes.ConfigurationMetaNames.DESCRIPTION: "skl linalg gnfuv model", 
    client.runtimes.ConfigurationMetaNames.PLATFORM: { "name": "python", "version": "3.5" }, 
    client.runtimes.ConfigurationMetaNames.LIBRARIES_UIDS: [custom_library_uid]
}

**Alternate method:** Create library and runtime together by specifying the metadata property below

`client.runtimes.ConfigurationMetaNames.LIBRARIES_DEFINITIONS: [
    LibraryDefinition("my_lib_1", "1.0", "/home/user/my_lib_1.zip", description="t", platform={"name": "python", "versions": ["3.5"]}), 
    LibraryDefinition("my_lib_2", "1.1", "/home/user/my_lib_2.zip") ]`

Create a Runtime resource based on the metadata specified above and display the details

In [None]:
runtime_details = client.runtimes.store(runtimes_meta)
runtime_details

Use the following APIs to retrieve the URL and GUID of a specific Runtime.

In [None]:
runtime_url = client.runtimes.get_url(runtime_details)
runtime_uid = client.runtimes.get_uid(runtime_details)
print("Runtimes URL: " + runtime_url)
print("Runtimes UID: " + runtime_uid)

### 5.3 Save the model

Define the metadata to save the trained model to WML Repository along with the information about the Runtime resource required for the model. 

The `client.repository.ModelMetaNames.RUNTIME_UID` metadata property is used to specify the GUID of the Runtime resource that needs to be associated with the model 

In [None]:
model_props = {client.repository.ModelMetaNames.NAME: "cust norm linalg_norm gnfuv1",
               client.repository.ModelMetaNames.RUNTIME_UID: runtime_uid
              }

Save the model to the WML Repository and display its saved metadata. 

In [None]:
published_model = client.repository.store_model(model=skl_pipeline, meta_props=model_props)

In [None]:
published_model_uid = client.repository.get_model_uid(published_model)
model_details = client.repository.get_details(published_model_uid)
print(json.dumps(model_details, indent=2))

<a id="deploy"></a>

## 6. Deploy and score

In this section, you will deploy the saved model that uses the custom transformer and perform predictions. You will use WML client to perform these tasks.

### 6.1 Deploy the model

In [None]:
created_deployment = client.deployments.create(published_model_uid, name="k_linalg_gnfuv1_skl")


### 6.2 Predict using the deployed model

Get the URL to use for prediction. The prediction URL is obtained from the deployment details of the deployment created above.

In [None]:
scoring_endpoint = client.deployments.get_scoring_url(created_deployment)
print(scoring_endpoint)

Prepare the payload for prediction. The payload contains the input records for which predictions are to be made.

In [None]:
scoring_payload = {'fields': ["time", "humidity"], 
                   'values': [[79863, 47]]}
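
When scoring several records at once, each row in `values` must list its entries in the same order as `fields`, which here matches the `['time', 'humidity']` column order used for training. The following sketch builds such a payload and checks the row lengths; the helper name `build_scoring_payload` and the second row are made up for illustration.

```python
def build_scoring_payload(fields, rows):
    """Build a {'fields': [...], 'values': [[...], ...]} payload."""
    for index, row in enumerate(rows):
        if len(row) != len(fields):
            raise ValueError(
                "row {} has {} values, expected {}".format(index, len(row), len(fields))
            )
    return {"fields": list(fields), "values": [list(row) for row in rows]}


payload = build_scoring_payload(["time", "humidity"], [[79863, 47], [79923, 46]])
print(payload)
```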

Execute the method to perform online predictions and display the prediction results

In [None]:
predictions = client.deployments.score(scoring_endpoint, scoring_payload)

In [None]:
print(json.dumps(predictions, indent=2))
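
If the response uses the same `fields`/`values` layout as the request, each result row can be paired with its column names for easier reading. This is a sketch under that assumption; the helper name `rows_as_dicts` and the example response are made up, so check the output printed above for the actual structure returned by your deployment.

```python
def rows_as_dicts(response):
    """Pair each row in 'values' with the names in 'fields'."""
    fields = response["fields"]
    return [dict(zip(fields, row)) for row in response["values"]]


# Made-up response, used only to illustrate the assumed layout.
example_response = {"fields": ["prediction"], "values": [[23.4]]}
print(rows_as_dicts(example_response))  # [{'prediction': 23.4}]
```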


### 6.3 Delete the deployment

Use the following method to delete the deployment 

In [None]:
client.deployments.delete(client.deployments.get_uid(created_deployment))

<a id="summary"></a>

## 7. Summary and next steps

You successfully completed this notebook! 
 
You learned how to deploy and score a scikit-learn model with a custom transformer using the Watson Machine Learning service.

Check out our [Online Documentation](https://dataplatform.ibm.com/docs/content/analyze-data/wml-setup.html) for more samples, tutorials, documentation, how-tos, and blog posts. 

## Author

**Krishnamurthy Arthanarisamy** is a senior technical lead on the IBM Watson Machine Learning team. Krishna works on developing cloud services that cater to different stages of the machine learning and deep learning modeling life cycle.