# Import a scikit-learn model to IBM Watson Machine Learning

Importing a model into Watson Machine Learning means storing a trained model in your Watson Machine Learning repository and then deploying the stored model. This notebook demonstrates three methods for importing a scikit-learn model using the Watson Machine Learning Python client.

See also: <a href="https://dataplatform.cloud.ibm.com/docs/content/analyze-data/ml-import-scikit-learn.html" target="_blank" rel="noopener noreferrer">Importing a scikit-learn model</a>

This notebook runs on Spark Python 3.5.


### Notebook sections

[Setup](#setup)

1. [Load training data](#loaddata)
2. [Build, train, and evaluate model](#buildmodel)
3. [Store and deploy model](#storedeploymodel)
    - [Method 1: In-memory object](#inmem)
    - [Method 2: Pickle file](#pkl)
    - [Method 3: tar.gz file](#targz)

# <a id="setup"></a> Set up
- Install packages
- Import libraries
- Instantiate a Watson Machine Learning client

In [None]:
!pip install wget # needed to download sample files

In [None]:
!pip install watson_machine_learning_client

Paste your Watson Machine Learning credentials in the following cell.

See: <a href="https://dataplatform.cloud.ibm.com/docs/content/analyze-data/ml-get-wml-credentials.html" target="_blank" rel="noopener noreferrer">Looking up credentials</a>

In [None]:
# Create a Watson Machine Learning client instance
from watson_machine_learning_client import WatsonMachineLearningAPIClient
wml_credentials = {
    "instance_id" : "",
    "password"    : "",
    "url"         : "",
    "username"    : ""
}
client = WatsonMachineLearningAPIClient( wml_credentials )
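
Before creating the client, it can help to confirm that no credential field was left empty. The following is a minimal sketch, not part of the client library: the helper name `missing_credential_fields` is hypothetical, the required keys are simply the four shown in the cell above, and the example values are placeholders.

```python
# Hypothetical helper (not part of watson_machine_learning_client):
# report which of the expected credential fields are missing or blank.
REQUIRED_FIELDS = ("instance_id", "password", "url", "username")


def missing_credential_fields(credentials, required=REQUIRED_FIELDS):
    """Return a sorted list of required keys that are absent or blank."""
    return sorted(
        key for key in required
        if not str(credentials.get(key, "")).strip()
    )


# Placeholder values only; substitute your own credentials.
example_credentials = {
    "instance_id": "my-instance-id",
    "password": "",
    "url": "https://example.com",
    "username": "my-username",
}

# Lists the fields that still need a value before the client is created.
print(missing_credential_fields(example_credentials))
```

If the returned list is not empty, fill in the named fields before calling `WatsonMachineLearningAPIClient`.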

## <a id="loaddata"></a> 1. Load and prepare sample training data

**About the sample model**

The sample model built here is a logistic regression model that predicts whether or not a customer will purchase a tent from a fictional outdoor equipment store, based on the customer's characteristics.

The data used to train the model is the "GoSales.csv" training data in the IBM Watson Studio community: <a href="https://dataplatform.cloud.ibm.com/exchange/public/entry/view/aa07a773f71cf1172a349f33e2028e4e" target="_blank" rel="noopener noreferrer">GoSales sample data</a>.

In [4]:
# Download sample training data to notebook working directory
import wget
training_data_url = 'https://dataplatform.cloud.ibm.com/data/exchange-api/v1/entries/aa07a773f71cf1172a349f33e2028e4e/data?accessKey=e98b7315f84e5448aa94c633ca66ea83'
filename = wget.download( training_data_url )
print( filename )

GoSales.csv


In [5]:
# Read sample data into a pandas DataFrame
import pandas as pd
df = pd.read_csv( filename )
df[0:5]

|   | GENDER | AGE | MARITAL_STATUS | PROFESSION | IS_TENT | PRODUCT_LINE | PURCHASE_AMOUNT |
|---|---|---|---|---|---|---|---|
| 0 | M | 27 | Single | Professional | True | Camping Equipment | 144.78 |
| 1 | F | 39 | Married | Other | False | Outdoor Protection | 144.83 |
| 2 | F | 39 | Married | Other | False | Outdoor Protection | 137.37 |
| 3 | F | 56 | Unspecified | Hospitality | False | Personal Accessories | 92.61 |
| 4 | M | 45 | Married | Retired | False | Golf Equipment | 119.04 |


In [6]:
# Select columns of interest
training_data = df[["GENDER","AGE","MARITAL_STATUS","PROFESSION","IS_TENT"]].copy()
print( training_data[0:5] )

  GENDER  AGE MARITAL_STATUS    PROFESSION  IS_TENT
0      M   27         Single  Professional     True
1      F   39        Married         Other    False
2      F   39        Married         Other    False
3      F   56    Unspecified   Hospitality    False
4      M   45        Married       Retired    False


In [7]:
# Create label encoders for string columns
from sklearn.preprocessing import LabelEncoder
import numpy as np
le_GENDER = LabelEncoder().fit( training_data["GENDER"] )
le_MARITAL_STATUS = LabelEncoder().fit( training_data["MARITAL_STATUS"] )
le_PROFESSION = LabelEncoder().fit( training_data["PROFESSION"] )

print( "le_GENDER:" )
print( np.sort( np.array( [ le_GENDER.transform(le_GENDER.classes_), le_GENDER.classes_ ] ).T, axis=0 ) )
print( "\nle_MARITAL_STATUS:" )
print( np.sort( np.array( [ le_MARITAL_STATUS.transform(le_MARITAL_STATUS.classes_), le_MARITAL_STATUS.classes_ ] ).T, axis=0 ) )
print( "\nle_PROFESSION:" )
print( np.sort( np.array( [ le_PROFESSION.transform(le_PROFESSION.classes_), le_PROFESSION.classes_ ] ).T, axis=0 ) )

le_GENDER:
[[0 'F']
 [1 'M']]

le_MARITAL_STATUS:
[[0 'Married']
 [1 'Single']
 [2 'Unspecified']]

le_PROFESSION:
[[0 'Executive']
 [1 'Hospitality']
 [2 'Other']
 [3 'Professional']
 [4 'Retail']
 [5 'Retired']
 [6 'Sales']
 [7 'Student']
 [8 'Trades']]
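
`LabelEncoder` assigns integer codes to the distinct values of a column in sorted order, which is why `'F'` maps to 0 and `'M'` to 1 in the output above. The following sketch reproduces that mapping in plain Python, without scikit-learn, to make the encoding explicit. The function names `build_label_index` and `encode_customer` are hypothetical, and the category lists are copied from the printed output rather than read from the data file.

```python
def build_label_index(values):
    """Map each distinct value to its position in sorted order."""
    return {label: index for index, label in enumerate(sorted(set(values)))}


# Category values taken from the encoder output printed above.
gender_index = build_label_index(["M", "F"])
marital_index = build_label_index(["Single", "Married", "Unspecified"])
profession_index = build_label_index([
    "Executive", "Hospitality", "Other", "Professional", "Retail",
    "Retired", "Sales", "Student", "Trades",
])


def encode_customer(age, gender, marital_status, profession):
    """Encode one customer as [AGE, GENDER, MARITAL_STATUS, PROFESSION] codes."""
    return [
        age,
        gender_index[gender],
        marital_index[marital_status],
        profession_index[profession],
    ]


print(encode_customer(35, "F", "Married", "Professional"))
print(encode_customer(20, "M", "Single", "Sales"))
```

The column order in `encode_customer` follows the feature order used later in this notebook when the model is trained and scored.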


In [8]:
# Create encoded columns in the training data
training_data["GENDER_index"] = le_GENDER.transform( training_data["GENDER"] )
training_data["MARITAL_STATUS_index"] = le_MARITAL_STATUS.transform( training_data["MARITAL_STATUS"] )
training_data["PROFESSION_index"] = le_PROFESSION.transform( training_data["PROFESSION"] )
training_data[0:5]

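|   | GENDER | AGE | MARITAL_STATUS | PROFESSION | IS_TENT | GENDER_index | MARITAL_STATUS_index | PROFESSION_index |
|---|---|---|---|---|---|---|---|---|
| 0 | M | 27 | Single | Professional | True | 1 | 1 | 3 |
| 1 | F | 39 | Married | Other | False | 0 | 0 | 2 |
| 2 | F | 39 | Married | Other | False | 0 | 0 | 2 |
| 3 | F | 56 | Unspecified | Hospitality | False | 0 | 2 | 1 |
| 4 | M | 45 | Married | Retired | False | 1 | 0 | 5 |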


## <a id="buildmodel"></a> 2. Build, train, and evaluate a model

In [9]:
# Create a simple Pipeline
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
pipeline = Pipeline( steps = [ ( "classifier", LogisticRegression() ) ] )

In [10]:
# Split the training data into a training set and a test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split( training_data[[ "AGE", "GENDER_index", "MARITAL_STATUS_index", "PROFESSION_index" ]], training_data["IS_TENT"] )

In [11]:
# Train the model
pipeline.fit( X_train, y_train )

Pipeline(memory=None,
     steps=[('classifier', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False))])

In [12]:
# Evaluate the model performance
predictions = pipeline.predict( X_test )
# Count the test rows where the prediction matches the true label
num_correct = ( predictions == y_test.values ).sum()
print( "Success rate: " + str( round( 100 * ( num_correct / len( predictions ) ) ) ) + "%" )

Success rate: 85.0%
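
The success rate reported above is the fraction of test rows whose prediction matches the true label. The following sketch shows that calculation in plain Python on made-up labels, not on the notebook's test set. The function name `accuracy` is hypothetical; scikit-learn provides `sklearn.metrics.accuracy_score` for the same calculation.

```python
def accuracy(predicted, actual):
    """Return the fraction of positions where predicted equals actual."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("cannot compute accuracy of an empty sequence")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(predicted)


# Made-up values for illustration only.
example_predictions = [False, False, True, False, True]
example_labels = [False, True, True, False, True]

print("Success rate: {:.1f}%".format(100 * accuracy(example_predictions, example_labels)))
```

Because `train_test_split` shuffles the rows randomly by default, the success rate on the real data can vary slightly from run to run.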


In [13]:
# Grab some example data for quick test
df[13:15]

|   | GENDER | AGE | MARITAL_STATUS | PROFESSION | IS_TENT | PRODUCT_LINE | PURCHASE_AMOUNT |
|---|---|---|---|---|---|---|---|
| 13 | F | 35 | Married | Professional | False | Golf Equipment | 152.95 |
| 14 | M | 20 | Single | Sales | True | Mountaineering Equipment | 124.66 |


In [15]:
negative_example_payload = [ 35, le_GENDER.transform( ["F"] )[0], le_MARITAL_STATUS.transform( ["Married"] )[0], le_PROFESSION.transform( ["Professional"] )[0] ]
print( "negative_example: " + str( negative_example_payload ) )

negative_example: [35, 0, 0, 3]


In [16]:
pipeline.predict( [ negative_example_payload ] )

array([False], dtype=bool)

In [17]:
positive_example_payload = [ 20, le_GENDER.transform( ["M"] )[0], le_MARITAL_STATUS.transform( ["Single"] )[0], le_PROFESSION.transform( ["Sales"] )[0] ]
print( "positive_example: " + str( positive_example_payload ) )

positive_example: [20, 1, 1, 6]


In [18]:
pipeline.predict( [ positive_example_payload ] )

array([ True], dtype=bool)

## <a id="storedeploymodel"></a> 3. Store and deploy the model in Watson Machine Learning

### <a id="inmem"></a> Method 1: In-memory object

In [20]:
# Store the model in the Watson Machine Learning repository.
# Parameters:
# 1. The in-memory model object
# 2. A name you choose for the stored model 
model_details_inmem = client.repository.store_model( pipeline, "scikit-learn model (in-memory object)" )

In [None]:
# Deploy the stored model as an online web service deployment
model_id_inmem = model_details_inmem["metadata"]["guid"]
deployment_details_inmem = client.deployments.create( artifact_uid=model_id_inmem, name="scikit-learn deployment (in-memory object)" )

In [26]:
# Test the deployment
model_endpoint_url_inmem = client.deployments.get_scoring_url( deployment_details_inmem )
client.deployments.score( model_endpoint_url_inmem, { "values" : [ [ 35, 0, 0, 3 ], [20, 1, 1, 6] ] } )

{'fields': ['prediction', 'probability'],
 'values': [[False, [0.9183658244384358, 0.08163417556156413]],
  [True, [0.09219365255696554, 0.9078063474430345]]]}
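
The scoring response pairs a list of field names with one row of values per input row. The following sketch shows one way to turn that structure into a list of dictionaries that are easier to read. The helper name `summarize_scores` is hypothetical, and the sample response is copied from the output above rather than fetched from a live deployment.

```python
def summarize_scores(response):
    """Turn a {'fields': [...], 'values': [[...], ...]} response into dicts."""
    fields = response["fields"]
    return [dict(zip(fields, row)) for row in response["values"]]


# Copied from the scoring output shown above.
sample_response = {
    "fields": ["prediction", "probability"],
    "values": [
        [False, [0.9183658244384358, 0.08163417556156413]],
        [True, [0.09219365255696554, 0.9078063474430345]],
    ],
}

summaries = summarize_scores(sample_response)
for result in summaries:
    # The largest probability belongs to the predicted class.
    print(result["prediction"], round(max(result["probability"]), 3))
```

The first input row, `[35, 0, 0, 3]`, is the negative example from earlier in the notebook, and the second, `[20, 1, 1, 6]`, is the positive example.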

### <a id="pkl"></a> Method 2: Pickle file in a directory

In [33]:
# Save the model to a file in the notebook working directory using pickle
import pickle
pkl_filename = "scikit-learn-model.pkl"
with open( pkl_filename, 'wb') as file:
    pickle.dump( pipeline, file )
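
Before uploading a pickle file, it can be worth loading it back to confirm that the file is readable. The following sketch shows the round trip with a plain dictionary standing in for the trained pipeline, so that it runs without scikit-learn. The stand-in object and the file name are assumptions made for illustration.

```python
import os
import pickle
import tempfile

# Stand-in object for illustration; in the notebook this would be `pipeline`.
stand_in_model = {"steps": ["classifier"], "coefficients": [0.1, -0.2, 0.3]}

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "stand-in-model.pkl")

    # Write the object to disk in binary mode.
    with open(path, "wb") as f:
        pickle.dump(stand_in_model, f)

    # Read it back and keep the restored copy.
    with open(path, "rb") as f:
        restored = pickle.load(f)

# True when the restored object equals the original.
print(restored == stand_in_model)
```

With a real pipeline, a comparable check is to confirm that the restored object returns the same predictions as the original. Only unpickle files from sources you trust, because loading a pickle file can execute arbitrary code.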

In [None]:
# Create a directory called "pkldir" and copy the pickle file into that directory
!mkdir pkldir
!cp scikit-learn-model.pkl pkldir
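
The `mkdir` and `cp` shell commands above assume a Unix-like environment. The following sketch does the same two steps with the Python standard library, which also works where those commands are unavailable. The helper name `copy_into_directory` is hypothetical, and the example writes a placeholder file into a temporary directory instead of touching the notebook working directory.

```python
import os
import shutil
import tempfile


def copy_into_directory(source_file, target_dir):
    """Create target_dir if needed, copy source_file into it, return the new path."""
    os.makedirs(target_dir, exist_ok=True)
    return shutil.copy(source_file, target_dir)


with tempfile.TemporaryDirectory() as workdir:
    # Placeholder file standing in for the real pickle file.
    source = os.path.join(workdir, "scikit-learn-model.pkl")
    with open(source, "wb") as f:
        f.write(b"placeholder bytes, not a real model")

    target_dir = os.path.join(workdir, "pkldir")
    copied = copy_into_directory(source, target_dir)
    copied_name = os.path.basename(copied)
    directory_listing = os.listdir(target_dir)

print(directory_listing)
```

Unlike a plain `mkdir`, `os.makedirs` with `exist_ok=True` does not fail when the directory already exists, so the cell can be rerun safely.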

In [30]:
# Store the model in the Watson Machine Learning repository.
# Parameters:
# 1. The directory containing the pickle file
# 2. Metadata, including a name you choose for the stored model, as well as information about the framework
metadata = {
    client.repository.ModelMetaNames.NAME: "scikit-learn model (pickle file in a directory)",
    client.repository.ModelMetaNames.FRAMEWORK_NAME: "scikit-learn",
    client.repository.ModelMetaNames.FRAMEWORK_VERSION: "0.19"
}
model_details_pkl = client.repository.store_model( model="pkldir", meta_props=metadata )

In [None]:
# Deploy the stored model as an online web service deployment
model_id_pkl = model_details_pkl["metadata"]["guid"]
deployment_details_pkl = client.deployments.create( artifact_uid=model_id_pkl, name="scikit-learn deployment (pickle file in a directory)" )

In [32]:
# Test the deployment
model_endpoint_url_pkl = client.deployments.get_scoring_url( deployment_details_pkl )
client.deployments.score( model_endpoint_url_pkl, { "values" : [ [ 35, 0, 0, 3 ], [20, 1, 1, 6] ] } )

{'fields': ['prediction', 'probability'],
 'values': [[False, [0.9183658244384358, 0.08163417556156413]],
  [True, [0.09219365255696554, 0.9078063474430345]]]}

### <a id="targz"></a> Method 3: Pickle file in a tar.gz file

In [38]:
# Use the tar command to put the pickle file in a tar.gz file.  Parameters:
# (-z) compress with gzip
# (-c) create a new archive
# (-v) verbose output (list files as they are added)
# (-f) write the archive to the named file
!tar -zcvf scikit-learn-model.tar.gz scikit-learn-model.pkl

scikit-learn-model.pkl
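
The same archive can be built with Python's standard `tarfile` module instead of the `tar` command. The following sketch creates a gzip-compressed archive that contains a single file and then lists the archive contents. The helper name `make_tar_gz` is hypothetical, and the example uses a placeholder file in a temporary directory rather than the real pickle file.

```python
import os
import tarfile
import tempfile


def make_tar_gz(archive_path, file_path):
    """Write file_path into a gzip-compressed tar archive at archive_path."""
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname stores the file without its directory prefix.
        archive.add(file_path, arcname=os.path.basename(file_path))


with tempfile.TemporaryDirectory() as workdir:
    # Placeholder file standing in for the real pickle file.
    pkl_path = os.path.join(workdir, "scikit-learn-model.pkl")
    with open(pkl_path, "wb") as f:
        f.write(b"placeholder bytes, not a real model")

    archive_path = os.path.join(workdir, "scikit-learn-model.tar.gz")
    make_tar_gz(archive_path, pkl_path)

    # Reopen the archive and list its members, like `tar -tzf`.
    with tarfile.open(archive_path, "r:gz") as archive:
        member_names = archive.getnames()

print(member_names)
```

Passing `arcname` keeps the pickle file at the top level of the archive, matching the layout that the `tar` command above produces when it is run from the directory containing the file.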


In [40]:
# Store the model in the Watson Machine Learning repository.
# Parameters:
# 1. The tar.gz file containing the pickle file
# 2. Metadata, including a name you choose for the stored model, as well as information about the framework
metadata = {
    client.repository.ModelMetaNames.NAME: "scikit-learn model (tar.gz)",
    client.repository.ModelMetaNames.FRAMEWORK_NAME: "scikit-learn",
    client.repository.ModelMetaNames.FRAMEWORK_VERSION: "0.19"
}
model_details_targz = client.repository.store_model( model="scikit-learn-model.tar.gz", meta_props=metadata )

In [None]:
# Deploy the stored model as an online web service deployment
model_id_targz = model_details_targz["metadata"]["guid"]
deployment_details_targz = client.deployments.create( artifact_uid=model_id_targz, name="scikit-learn deployment (tar.gz)" )

In [42]:
# Test the deployment
model_endpoint_url_targz = client.deployments.get_scoring_url( deployment_details_targz )
client.deployments.score( model_endpoint_url_targz, { "values" : [ [ 35, 0, 0, 3 ], [20, 1, 1, 6] ] } )

{'fields': ['prediction', 'probability'],
 'values': [[False, [0.9183658244384358, 0.08163417556156413]],
  [True, [0.09219365255696554, 0.9078063474430345]]]}

## Summary
In this notebook, you used three methods to import a scikit-learn model into Watson Machine Learning with the Watson Machine Learning Python client: an in-memory object, a pickle file in a directory, and a pickle file in a tar.gz file.

### <a id="authors"></a>Authors

**Sarah Packowski** is a member of the IBM Watson Studio Content Design team in Canada.


<hr>
Copyright &copy; IBM Corp. 2019. This notebook and its source code are released under the terms of the MIT License.
