# Import a scikit-learn model into IBM Watson Machine Learning

Importing a model into Watson Machine Learning means storing a trained model in your Watson Machine Learning repository and then deploying the stored model. This notebook demonstrates importing a scikit-learn Pipeline object.

See also: <a href="https://dataplatform.cloud.ibm.com/docs/content/analyze-data/ml-import-scikit-learn.html" target="_blank" rel="noopener noreferrer">Importing a scikit-learn model</a>

This notebook runs on Spark Python 3.5.


### Notebook sections

[Step 0: Build, train, and save a model](#step0)

[Step 1: Store the model in your Watson Machine Learning repository](#step1)

[Step 2: Deploy the stored model in your Watson Machine Learning service](#step2)

## <a id="step0"></a> Step 0: Build, train, and save a model

**About the sample model**

The sample model built here is a logistic regression model that predicts whether a customer will purchase a tent from a fictional outdoor equipment store, based on customer characteristics.

The data used to train the model is the "GoSales.csv" training data in the IBM Watson Studio community: <a href="https://dataplatform.cloud.ibm.com/exchange/public/entry/view/aa07a773f71cf1172a349f33e2028e4e" target="_blank" rel="noopener noreferrer">GoSales sample data</a>.

### Get and prepare training data

In [1]:
!pip install wget # Needed to download sample training data


In [2]:
# Download sample training data to notebook working directory
import wget
training_data_url = 'https://dataplatform.cloud.ibm.com/data/exchange-api/v1/entries/aa07a773f71cf1172a349f33e2028e4e/data?accessKey=e98b7315f84e5448aa94c633ca66ea83'
filename = wget.download( training_data_url )
print( filename )

GoSales (1).csv


In [3]:
# Read sample data into a pandas DataFrame
import pandas as pd
df = pd.read_csv( filename )
df[0:5]

| | GENDER | AGE | MARITAL_STATUS | PROFESSION | IS_TENT | PRODUCT_LINE | PURCHASE_AMOUNT |
|---|---|---|---|---|---|---|---|
| 0 | M | 27 | Single | Professional | True | Camping Equipment | 144.78 |
| 1 | F | 39 | Married | Other | False | Outdoor Protection | 144.83 |
| 2 | F | 39 | Married | Other | False | Outdoor Protection | 137.37 |
| 3 | F | 56 | Unspecified | Hospitality | False | Personal Accessories | 92.61 |
| 4 | M | 45 | Married | Retired | False | Golf Equipment | 119.04 |


In [4]:
# Select columns of interest
training_data = df[["GENDER","AGE","MARITAL_STATUS","PROFESSION","IS_TENT"]].copy()
print( training_data[0:5] )

  GENDER  AGE MARITAL_STATUS    PROFESSION  IS_TENT
0      M   27         Single  Professional     True
1      F   39        Married         Other    False
2      F   39        Married         Other    False
3      F   56    Unspecified   Hospitality    False
4      M   45        Married       Retired    False


In [5]:
# Create label encoders for string columns
from sklearn.preprocessing import LabelEncoder
import numpy as np
le_GENDER = LabelEncoder().fit( training_data["GENDER"] )
le_MARITAL_STATUS = LabelEncoder().fit( training_data["MARITAL_STATUS"] )
le_PROFESSION = LabelEncoder().fit( training_data["PROFESSION"] )

print( "le_GENDER:" )
print( np.sort( np.array( [ le_GENDER.transform(le_GENDER.classes_), le_GENDER.classes_ ] ).T, axis=0 ) )
print( "\nle_MARITAL_STATUS:" )
print( np.sort( np.array( [ le_MARITAL_STATUS.transform(le_MARITAL_STATUS.classes_), le_MARITAL_STATUS.classes_ ] ).T, axis=0 ) )
print( "\nle_PROFESSION:" )
print( np.sort( np.array( [ le_PROFESSION.transform(le_PROFESSION.classes_), le_PROFESSION.classes_ ] ).T, axis=0 ) )

le_GENDER:
[[0 'F']
 [1 'M']]

le_MARITAL_STATUS:
[[0 'Married']
 [1 'Single']
 [2 'Unspecified']]

le_PROFESSION:
[[0 'Executive']
 [1 'Hospitality']
 [2 'Other']
 [3 'Professional']
 [4 'Retail']
 [5 'Retired']
 [6 'Sales']
 [7 'Student']
 [8 'Trades']]
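`LabelEncoder` assigns codes by sorting the distinct values and numbering them from 0, which is why `F` maps to 0 and `Executive` maps to 0 in the output above. The following pure-Python sketch reproduces that mapping for illustration only; `fit_label_mapping` is a hypothetical helper, not part of scikit-learn, and the input list is a hand-picked sample of profession values.

```python
def fit_label_mapping(values):
    """Hypothetical helper: map each distinct value to its index in sorted order,
    mirroring the codes LabelEncoder assigns."""
    classes = sorted(set(values))
    return {label: index for index, label in enumerate(classes)}


def invert_mapping(mapping):
    """Map codes back to labels, similar in spirit to LabelEncoder.inverse_transform."""
    return {index: label for label, index in mapping.items()}


# Hand-picked sample covering every profession seen in the encoder output above
professions = ["Professional", "Other", "Other", "Hospitality", "Retired",
               "Executive", "Retail", "Sales", "Student", "Trades"]
profession_mapping = fit_label_mapping(professions)
print(profession_mapping)
print(invert_mapping(profession_mapping)[3])  # Professional
```

Sorting is what makes the codes deterministic: the same set of values always produces the same mapping, regardless of the order in which the values appear in the data.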


In [6]:
# Create encoded columns in the training data
training_data["GENDER_index"] = le_GENDER.transform( training_data["GENDER"] )
training_data["MARITAL_STATUS_index"] = le_MARITAL_STATUS.transform( training_data["MARITAL_STATUS"] )
training_data["PROFESSION_index"] = le_PROFESSION.transform( training_data["PROFESSION"] )
training_data[0:5]

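| | GENDER | AGE | MARITAL_STATUS | PROFESSION | IS_TENT | GENDER_index | MARITAL_STATUS_index | PROFESSION_index |
|---|---|---|---|---|---|---|---|---|
| 0 | M | 27 | Single | Professional | True | 1 | 1 | 3 |
| 1 | F | 39 | Married | Other | False | 0 | 0 | 2 |
| 2 | F | 39 | Married | Other | False | 0 | 0 | 2 |
| 3 | F | 56 | Unspecified | Hospitality | False | 0 | 2 | 1 |
| 4 | M | 45 | Married | Retired | False | 1 | 0 | 5 |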


### Build a Pipeline

In [7]:
# Create a simple Pipeline
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
pipeline = Pipeline( steps = [ ( "classifier", LogisticRegression() ) ] )
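This Pipeline has a single step, but a Pipeline can also chain preprocessing steps ahead of the final estimator. As a simplified mental model (not scikit-learn's actual implementation), `fit` fits and applies each intermediate step in order and then fits the last step, while `predict` runs the already-fitted intermediate steps before calling the last step's `predict`. The classes `AddConstant`, `ThresholdClassifier`, and `MiniPipeline` below are hypothetical toys written only to show that flow.

```python
class AddConstant:
    """Hypothetical transformer: adds a fixed constant to every feature value."""

    def __init__(self, constant):
        self.constant = constant

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return [[value + self.constant for value in row] for row in X]


class ThresholdClassifier:
    """Hypothetical classifier: learns the mean of the first feature
    (ignoring the labels) and predicts True for values above it."""

    def fit(self, X, y):
        self.threshold = sum(row[0] for row in X) / len(X)
        return self

    def predict(self, X):
        return [row[0] > self.threshold for row in X]


class MiniPipeline:
    """Toy stand-in for Pipeline(steps=[(name, estimator), ...])."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        # Fit each intermediate step, then pass its transformed output forward
        for _, step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        # Fit the final estimator on the fully transformed data
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        # Reuse the fitted intermediate steps (transform only, no refitting)
        for _, step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1][1].predict(X)


X = [[20], [30], [40], [50]]
y = [False, False, True, True]
pipe = MiniPipeline([("shift", AddConstant(1)), ("classifier", ThresholdClassifier())])
pipe.fit(X, y)
print(pipe.predict([[25], [45]]))  # [False, True]
```

Because the preprocessing lives inside the same object as the classifier, storing the Pipeline stores both, which is one reason to wrap even a single classifier in a Pipeline.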

### Train and evaluate the model

In [8]:
# Split the training data into a training set and a test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split( training_data[[ "AGE", "GENDER_index", "MARITAL_STATUS_index", "PROFESSION_index" ]], training_data["IS_TENT"] )
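Called without `test_size` or `train_size`, `train_test_split` holds out 25% of the rows for testing, and because no `random_state` is passed here, the split (and therefore the success rate below) can differ from run to run. The sketch below illustrates the same idea with the standard library: shuffle the row indices with a seeded generator and hold out the first quarter, rounded up. `split_indices` is a hypothetical helper, not scikit-learn's implementation.

```python
import math
import random


def split_indices(n_rows, test_size=0.25, seed=None):
    """Hypothetical helper: shuffle row indices and return (train_indices, test_indices)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)    # a fixed seed makes the split repeatable
    n_test = math.ceil(test_size * n_rows)  # round the test share up
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(100, seed=42)
print(len(train_idx), len(test_idx))  # 75 25
```

In the notebook itself, the equivalent change would be to pass `random_state` (for example, `random_state=0`) to `train_test_split` so that reruns produce the same split.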

In [9]:
# Train the model
pipeline.fit( X_train, y_train )

Pipeline(memory=None,
     steps=[('classifier', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False))])

In [10]:
# Evaluate the model performance
predictions = pipeline.predict( X_test )
num_correct = ( predictions == y_test.values ).sum()
print( "Success rate: " + str( round( 100 * ( num_correct / len( predictions ) ) ) ) + "%" )

Success rate: 86.0%
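The success rate is plain accuracy: the fraction of test rows whose predicted label equals the true label. scikit-learn provides the same metric as `sklearn.metrics.accuracy_score`, and a fitted classifier Pipeline's `score` method also returns mean accuracy. The pure-Python sketch below spells out the arithmetic; the two label lists are made-up sample values, not notebook results.

```python
def accuracy(predicted, actual):
    """Return the fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    num_correct = sum(p == a for p, a in zip(predicted, actual))
    return num_correct / len(actual)


# Made-up sample labels for illustration
predicted = [False, True, False, False]
actual = [False, True, True, False]
print("Success rate: {:.1f}%".format(100 * accuracy(predicted, actual)))  # Success rate: 75.0%
```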


In [11]:
# Grab some example data for a quick test
df[13:15]

| | GENDER | AGE | MARITAL_STATUS | PROFESSION | IS_TENT | PRODUCT_LINE | PURCHASE_AMOUNT |
|---|---|---|---|---|---|---|---|
| 13 | F | 35 | Married | Professional | False | Golf Equipment | 152.95 |
| 14 | M | 20 | Single | Sales | True | Mountaineering Equipment | 124.66 |


In [12]:
negative_example_payload = [ 35, le_GENDER.transform( ["F"] )[0], le_MARITAL_STATUS.transform( ["Married"] )[0], le_PROFESSION.transform( ["Professional"] )[0] ]
print( "negative_example: " + str( negative_example_payload ) )

negative_example: [35, 0, 0, 3]


In [13]:
positive_example_payload = [ 20, le_GENDER.transform( ["M"] )[0], le_MARITAL_STATUS.transform( ["Single"] )[0], le_PROFESSION.transform( ["Sales"] )[0] ]
print( "positive_example: " + str( positive_example_payload ) )

positive_example: [20, 1, 1, 6]
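Each payload must list features in the same order as the training columns: `AGE`, `GENDER_index`, `MARITAL_STATUS_index`, `PROFESSION_index`. Once the encoder mappings are known, they can also be written down as plain dictionaries, which may help when building payloads outside this notebook. The helper below is hypothetical; its dictionaries are copied from the encoder output printed earlier.

```python
# Encoder mappings copied from the LabelEncoder output printed earlier in this notebook
GENDER_INDEX = {"F": 0, "M": 1}
MARITAL_STATUS_INDEX = {"Married": 0, "Single": 1, "Unspecified": 2}
PROFESSION_INDEX = {
    "Executive": 0, "Hospitality": 1, "Other": 2, "Professional": 3, "Retail": 4,
    "Retired": 5, "Sales": 6, "Student": 7, "Trades": 8,
}


def build_payload_row(age, gender, marital_status, profession):
    """Hypothetical helper: return one feature row in training-column order,
    [AGE, GENDER_index, MARITAL_STATUS_index, PROFESSION_index].
    An unknown category raises KeyError, much as LabelEncoder rejects unseen labels."""
    return [
        age,
        GENDER_INDEX[gender],
        MARITAL_STATUS_INDEX[marital_status],
        PROFESSION_INDEX[profession],
    ]


print(build_payload_row(35, "F", "Married", "Professional"))  # [35, 0, 0, 3]
print(build_payload_row(20, "M", "Single", "Sales"))          # [20, 1, 1, 6]
```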


In [14]:
pipeline.predict( [ negative_example_payload ] )

array([False])

In [15]:
pipeline.predict( [ positive_example_payload ] )

array([ True])

### Save the Pipeline

You can import your scikit-learn Pipeline into Watson Machine Learning in any of the following formats:
- In-memory Pipeline object
- Pipeline saved in a pickle file
- Pipeline saved in a pickle file in a tar.gz file

In this section of the notebook, the Pipeline object is saved both to a pickle file and to a tar.gz file so that all three options can be demonstrated.

In [16]:
import pickle
# Use a with block so the file is closed (and fully written) after pickling
with open( "tent-prediction-model.pkl", "wb" ) as f:
    pickle.dump( pipeline, f )
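To confirm that a pickle file is usable, it can be loaded back with `pickle.load`. The sketch below shows the round trip using `with` blocks; a plain dictionary stands in for the Pipeline so that the sketch runs without scikit-learn, and the helper names `save_pickle` and `load_pickle` are hypothetical. Only unpickle files from sources you trust, and load a pickled scikit-learn model with the same scikit-learn version that saved it.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Hypothetical helper: pickle obj to path, closing the file even if pickling fails."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Hypothetical helper: load a pickled object from path (trusted files only)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# A plain dictionary stands in for the trained Pipeline
stand_in_model = {"name": "tent-prediction-model", "coef": [0.1, -0.2, 0.3, 0.4]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tent-prediction-model.pkl")
    save_pickle(stand_in_model, path)
    restored = load_pickle(path)

print(restored == stand_in_model)  # True
```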

In [17]:
!mkdir -p model-dir
!cp tent-prediction-model.pkl model-dir


In [18]:
!tar -zcvf tent-prediction-model.tar.gz tent-prediction-model.pkl

a tent-prediction-model.pkl
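The same archive can also be built from Python with the standard `tarfile` module, which may be handy where shell commands are unavailable. This sketch writes a stand-in pickle to a temporary directory, archives it with `arcname` set to the bare file name so that the pickle sits at the top level of the archive (as with the `tar` command above), and then lists the archive members; `make_targz` is a hypothetical helper.

```python
import os
import pickle
import tarfile
import tempfile


def make_targz(archive_path, file_path):
    """Hypothetical helper: create a gzip-compressed tar archive holding one file at its top level."""
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(file_path, arcname=os.path.basename(file_path))


with tempfile.TemporaryDirectory() as tmp:
    # A stand-in object replaces the trained Pipeline so the sketch runs anywhere
    pkl_path = os.path.join(tmp, "tent-prediction-model.pkl")
    with open(pkl_path, "wb") as f:
        pickle.dump({"stand_in": True}, f)

    targz_path = os.path.join(tmp, "tent-prediction-model.tar.gz")
    make_targz(targz_path, pkl_path)

    # Read the archive back to check which files it contains
    with tarfile.open(targz_path, "r:gz") as archive:
        member_names = archive.getnames()

print(member_names)  # ['tent-prediction-model.pkl']
```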


## <a id="step1"></a> Step 1: Store the model in your Watson Machine Learning repository

### Format options

This section of the notebook demonstrates calling the <a href="https://wml-api-pyclient.mybluemix.net/index.html?highlight=store_model#client.Repository.store_model" target="_blank" rel="noopener noreferrer">store_model</a> function, passing the model in three formats:
- Format 1: In-memory Pipeline object
- Format 2: Pipeline saved in a pickle file
- Format 3: Pipeline saved in a pickle file in a tar.gz file

In [19]:
!pip install watson_machine_learning_client # Needed to work with the Watson Machine Learning Python client


Paste your Watson Machine Learning credentials in the following cell.

See: <a href="https://dataplatform.cloud.ibm.com/docs/content/analyze-data/ml-get-wml-credentials.html" target="_blank" rel="noopener noreferrer">Looking up credentials</a>

In [21]:
# Create a Watson Machine Learning client instance
from watson_machine_learning_client import WatsonMachineLearningAPIClient
# Replace the placeholder values with your own Watson Machine Learning credentials
wml_credentials = {
    "apikey"      : "value",
    "instance_id" : "instance_id",
    "url"         : "url"
}
client = WatsonMachineLearningAPIClient( wml_credentials )
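Hard-coding an API key in a notebook makes it easy to leak when the notebook is shared. One alternative, sketched below, is to read the values from environment variables. The variable names `WML_APIKEY`, `WML_INSTANCE_ID`, and `WML_URL` and the helper `credentials_from_env` are assumptions for illustration, not names defined by Watson Machine Learning.

```python
import os

# Hypothetical environment variable names; choose whatever names suit your setup
ENV_VARS = {"apikey": "WML_APIKEY", "instance_id": "WML_INSTANCE_ID", "url": "WML_URL"}


def credentials_from_env(environ=None):
    """Hypothetical helper: build a wml_credentials-style dict from environment variables."""
    environ = os.environ if environ is None else environ
    missing = [var for var in ENV_VARS.values() if var not in environ]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[var] for key, var in ENV_VARS.items()}


# Example with a made-up mapping instead of the real environment
example_env = {"WML_APIKEY": "my-key", "WML_INSTANCE_ID": "my-instance", "WML_URL": "https://example.invalid"}
print(sorted(credentials_from_env(example_env)))  # ['apikey', 'instance_id', 'url']
```

With the variables set, `wml_credentials = credentials_from_env()` produces a dictionary with the same keys as the one above, which can then be passed to `WatsonMachineLearningAPIClient` unchanged.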

In [22]:
# Format 1: In-memory Pipeline object
#
# Parameters:
# 1. The in-memory model object
# 2. A name you choose for the stored model
#
model_details_inmem = client.repository.store_model( pipeline, "scikit-learn model (in-memory object)" )

In [23]:
# Format 2: Pipeline saved in a pickle file
#
# Parameters:
# 1. The directory containing the pickle file
# 2. Metadata, including a name you choose for the stored model, as well as information about the framework
#
metadata = {
    client.repository.ModelMetaNames.NAME: "scikit-learn model (pickle file in a directory)",
    client.repository.ModelMetaNames.FRAMEWORK_NAME: "scikit-learn",
    client.repository.ModelMetaNames.FRAMEWORK_VERSION: "0.19"
}
model_details_pkl = client.repository.store_model( model="model-dir", meta_props=metadata )

In [24]:
# Format 3: Pipeline saved in a pickle file in a tar.gz file
#
# Parameters:
# 1. The tar.gz file containing the pickle file
# 2. Metadata, including a name you choose for the stored model, as well as information about the framework
#
metadata = {
    client.repository.ModelMetaNames.NAME: "scikit-learn model (tar.gz)",
    client.repository.ModelMetaNames.FRAMEWORK_NAME: "scikit-learn",
    client.repository.ModelMetaNames.FRAMEWORK_VERSION: "0.19"
}
model_details_targz = client.repository.store_model( model="tent-prediction-model.tar.gz", meta_props=metadata )

## <a id="step2"></a> Step 2: Deploy the stored model in your Watson Machine Learning service

This section of the notebook demonstrates calling the <a href="https://wml-api-pyclient.mybluemix.net/index.html?highlight=deploy#client.Deployments.create" target="_blank" rel="noopener noreferrer">deployments.create</a> function.

In [25]:
# Deploy the stored model as an online web service deployment
model_id_targz = model_details_targz["metadata"]["guid"]
deployment_details_targz = client.deployments.create( artifact_uid=model_id_targz, name="scikit-learn deployment (tar.gz)" )

#######################################################################################
Synchronous deployment creation for uid: 'd136050c-fc7f-4b31-847c-d26b20f12693' started
#######################################################################################

INITIALIZING
DEPLOY_SUCCESS

------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='86335300-1e4a-4a24-b904-dedf90af0728'
------------------------------------------------------------------------------------------------

In [26]:
# Test the deployment
model_endpoint_url_targz = client.deployments.get_scoring_url( deployment_details_targz )
client.deployments.score( model_endpoint_url_targz, { "values" : [ [ 35, 0, 0, 3 ], [20, 1, 1, 6] ] } )

{'fields': ['prediction', 'probability'],
 'values': [[False, [0.9168509324042778, 0.08314906759572217]],
  [True, [0.09513035380503876, 0.9048696461949612]]]}
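Each row of the scoring response holds a prediction and a list of class probabilities. The sketch below extracts the probability of the positive class for each row. It assumes the probabilities are ordered `[False, True]`, which matches scikit-learn's sorted `classes_` and is consistent with the response above; `summarize_scores` is a hypothetical helper, and the response dictionary is copied from the output above.

```python
# Scoring response copied from the output above
response = {
    "fields": ["prediction", "probability"],
    "values": [[False, [0.9168509324042778, 0.08314906759572217]],
               [True, [0.09513035380503876, 0.9048696461949612]]],
}


def summarize_scores(response):
    """Hypothetical helper: pair each prediction with the probability of the True class."""
    prediction_pos = response["fields"].index("prediction")
    probability_pos = response["fields"].index("probability")
    summary = []
    for row in response["values"]:
        probabilities = row[probability_pos]
        # Assumption: probabilities follow the classifier's class order, [False, True]
        summary.append((row[prediction_pos], probabilities[1]))
    return summary


for prediction, p_true in summarize_scores(response):
    print(prediction, round(p_true, 3))
# False 0.083
# True 0.905
```

Looking up positions through `fields` rather than hard-coding indexes keeps the code working if the response lists its fields in a different order.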

## Summary
In this notebook, you imported a scikit-learn Pipeline into Watson Machine Learning using the Watson Machine Learning Python client.

### <a id="authors"></a>Authors

**Sarah Packowski** is a member of the IBM Watson Studio Content Design team in Canada.


<hr>
Copyright &copy; IBM Corp. 2019. This notebook and its source code are released under the terms of the MIT License.
