# Save a scikit-learn model in PMML format

This notebook demonstrates saving a trained scikit-learn model in PMML format.

This notebook runs on Python 3.6.


## Notebook sections

1. [Load and prepare training data](#loaddata)
2. [Train and evaluate model](#trainmodel)
3. [Save model in PMML format](#savemodel)



**About the sample model**

The sample model built here is a logistic regression model that predicts whether a customer will purchase a tent from a fictional outdoor equipment store, based on the customer's characteristics.

The data used to train the model is the "GoSales.csv" training data in the IBM Watson Studio community: <a href="https://dataplatform.cloud.ibm.com/exchange/public/entry/view/aa07a773f71cf1172a349f33e2028e4e" target="_blank" rel="noopener noreferrer">GoSales sample data</a>.
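
To make the idea of a logistic regression prediction concrete, here is a minimal sketch in plain Python. It is not the scikit-learn implementation: the function name, coefficients, intercept, and feature values below are hypothetical and chosen only for illustration. The real weights are learned when the model is trained later in this notebook.

```python
import math

def predict_tent_probability(features, coefficients, intercept):
    """Return P(IS_TENT = 1) for one customer under a logistic regression model."""
    # Linear score: intercept plus the weighted sum of the feature values
    score = intercept + sum(c * x for c, x in zip(coefficients, features))
    # The logistic (sigmoid) function maps the score to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical weights for [AGE, GENDER_index, MARITAL_STATUS_index, PROFESSION_index]
example_coefficients = [-0.05, 0.8, 0.3, 0.1]
example_intercept = 0.5

# Hypothetical encoded customer
probability = predict_tent_probability([20, 1, 2, 6], example_coefficients, example_intercept)
# Predict the positive class when the probability reaches 0.5
predicted_label = 1 if probability >= 0.5 else 0
print(round(probability, 3), predicted_label)
```

A probability of 0.5 or more is treated as the positive class (the customer buys a tent), which mirrors the default decision rule for a binary classifier of this kind.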

### <a id="loaddata"></a> 1. Load and prepare sample training data

In [None]:
!pip install wget

In [None]:
# Download sample training data to notebook working directory
import wget
training_data_url = 'https://api.dataplatform.cloud.ibm.com/v2/gallery-assets/entries/aa07a773f71cf1172a349f33e2028e4e/data?accessKey=ff592aa24479638a4007dfc50e7e3088'
filename = wget.download( training_data_url )
print( filename )

In [None]:
# Read sample data into a pandas DataFrame
import pandas as pd
df = pd.read_csv( filename )
df[0:5]

In [None]:
# Select columns of interest
training_data = df[["GENDER","AGE","MARITAL_STATUS","PROFESSION","IS_TENT"]].copy()
print( training_data[0:5] )

In [None]:
# Create label encoders for string columns
from sklearn.preprocessing import LabelEncoder
import numpy as np
le_GENDER = LabelEncoder().fit( training_data["GENDER"] )
le_MARITAL_STATUS = LabelEncoder().fit( training_data["MARITAL_STATUS"] )
le_PROFESSION = LabelEncoder().fit( training_data["PROFESSION"] )

# Print each encoder's index-to-class mapping. classes_ is already sorted,
# so no extra sort is needed (sorting the columns independently could misalign them).
for name, le in [ ( "le_GENDER", le_GENDER ), ( "le_MARITAL_STATUS", le_MARITAL_STATUS ), ( "le_PROFESSION", le_PROFESSION ) ]:
    print( "\n" + name + ":" )
    print( np.array( [ le.transform( le.classes_ ), le.classes_ ] ).T )

In [None]:
# Create encoded columns in the training data
training_data["GENDER_index"] = le_GENDER.transform( training_data["GENDER"] )
training_data["MARITAL_STATUS_index"] = le_MARITAL_STATUS.transform( training_data["MARITAL_STATUS"] )
training_data["PROFESSION_index"] = le_PROFESSION.transform( training_data["PROFESSION"] )
training_data[0:5]
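
As a rough illustration of what the label encoders above do, the following plain-Python sketch builds the same kind of mapping: distinct string values, sorted, each assigned an integer index. The helper name `fit_label_mapping` and the sample values are made up for this example and are not rows from GoSales.csv.

```python
def fit_label_mapping(values):
    """Map each distinct string to an integer index, in sorted order."""
    classes = sorted(set(values))
    return {label: index for index, label in enumerate(classes)}

# Made-up sample values for illustration only
marital_status = ["Married", "Single", "Unspecified", "Single", "Married"]

mapping = fit_label_mapping(marital_status)
# Replace each string with its integer index
encoded = [mapping[value] for value in marital_status]
print(mapping)
print(encoded)
```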

### <a id="trainmodel"></a> 2. Create a logistic regression model and then train and evaluate the model

In [None]:
!pip install git+https://github.com/jpmml/sklearn2pmml.git

In [None]:
# Create a logistic regression pipeline that can be saved in PMML format
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn.linear_model import LogisticRegression
pmml_pipeline = PMMLPipeline( [ ("classifier", LogisticRegression() ) ] )

In [None]:
# Split the training data into a training set and a test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split( training_data[[ "AGE", "GENDER_index", "MARITAL_STATUS_index", "PROFESSION_index" ]], training_data["IS_TENT"].astype(int) )

In [None]:
# Train the model
pmml_pipeline.fit( X_train, y_train )

In [None]:
# Evaluate the model performance
predictions = pmml_pipeline.predict( X_test )
num_correct = ( predictions == y_test.values ).sum()
print( "Success rate: " + str( round( 100 * ( num_correct / len( predictions ) ) ) ) + "%" )

In [None]:
# Grab some example data for quick test
df[13:15]

In [None]:
negative_example_payload = [ 35, le_GENDER.transform( ["F"] )[0], le_MARITAL_STATUS.transform( ["Married"] )[0], le_PROFESSION.transform( ["Professional"] )[0] ]
print( "Negative example payload (did not buy a tent): " + str( negative_example_payload ) )

In [None]:
pmml_pipeline.predict( [ negative_example_payload ] )

In [None]:
positive_example_payload = [ 20, le_GENDER.transform( ["M"] )[0], le_MARITAL_STATUS.transform( ["Single"] )[0], le_PROFESSION.transform( ["Sales"] )[0] ]
print( "Positive example payload (did buy a tent): " + str( positive_example_payload ) )

In [None]:
pmml_pipeline.predict( [ positive_example_payload ] )

### <a id="savemodel"></a> 3. Save the model in PMML format

In [None]:
# Save the model to a file in PMML format
from sklearn2pmml import sklearn2pmml
pmml_filename = "scikit-learn-lr-model-pmml.xml"
sklearn2pmml( pmml_pipeline, pmml_filename )

In [None]:
!cat scikit-learn-lr-model-pmml.xml

**Tip**

To keep a local copy, select and copy the PMML content printed by the previous cell, paste it into a text editor on your local computer, and save it as "scikit-learn-lr-model-pmml.xml".
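
Before reusing a copied file, it can help to confirm that it is still well-formed XML. The sketch below uses only the Python standard library to parse a PMML document and list its data fields. The inline sample is abbreviated for illustration, and the namespace `http://www.dmg.org/PMML-4_3` is an assumption that holds for PMML 4.3 files; other PMML versions use a different namespace.

```python
import xml.etree.ElementTree as ET

# Abbreviated PMML document for illustration; to check a saved file instead,
# use ET.parse("scikit-learn-lr-model-pmml.xml").getroot()
sample_pmml = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <DataDictionary>
    <DataField name="IS_TENT" optype="categorical" dataType="integer"/>
    <DataField name="AGE" optype="continuous" dataType="double"/>
  </DataDictionary>
</PMML>"""

# Parsing raises xml.etree.ElementTree.ParseError if the XML is malformed
root = ET.fromstring(sample_pmml.encode("utf-8"))

# PMML elements live in a versioned namespace (assumed here to be 4.3)
namespace = {"pmml": "http://www.dmg.org/PMML-4_3"}
fields = [
    (field.get("name"), field.get("optype"), field.get("dataType"))
    for field in root.findall("pmml:DataDictionary/pmml:DataField", namespace)
]

print("PMML version:", root.get("version"))
for name, optype, data_type in fields:
    print(name, optype, data_type)
```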

# Import a model from PMML into IBM Watson Machine Learning 1.1.x

Here, <i>importing a trained model</i> means storing a trained model in your Watson Machine Learning repository and then deploying the stored model in your Watson Machine Learning service. This notebook demonstrates how to use the Watson Machine Learning Python client to import a model that has been saved in PMML format.

This notebook runs on Python 3.6.


### Notebook sections


- [Step 0: Load the PMML file](#load1)
- [Step 1: Connect to the WML server](#store1)
- [Step 2: Save the model](#Save)

In [1]:
!pip install wget
!pip install --upgrade watson-machine-learning-client-V4==1.0.60

Requirement already up-to-date: watson-machine-learning-client-V4==1.0.60 in /opt/conda/envs/Python36/lib/python3.6/site-packages (1.0.60)


# Load sample PMML file

In [9]:
# Download sample PMML file
# In practice, import the PMML file you generated in the previous section; this is a sample PMML file.
import wget
pmml_file_url = 'https://raw.githubusercontent.com/pmservice/wml-sample-models/master/scikit-learn/import-pmml/scikit-learn-lr-model-pmml.xml'
pmml_filename = wget.download( pmml_file_url )
print(pmml_filename)

scikit-learn-lr-model-pmml (5).xml


In [12]:
!pwd

/home/dsxuser/work


In [13]:
# View the PMML
!cat 'scikit-learn-lr-model-pmml.xml'

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_3" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.3">
	<Header>
		<Application name="JPMML-SkLearn" version="1.5.35"/>
		<Timestamp>2020-04-21T02:26:42Z</Timestamp>
	</Header>
	<DataDictionary>
		<DataField name="IS_TENT" optype="categorical" dataType="integer">
			<Value value="0"/>
			<Value value="1"/>
		</DataField>
		<DataField name="AGE" optype="continuous" dataType="double"/>
		<DataField name="GENDER_index" optype="continuous" dataType="double"/>
		<DataField name="MARITAL_STATUS_index" optype="continuous" dataType="double"/>
		<DataField name="PROFESSION_index" optype="continuous" dataType="double"/>
	</DataDictionary>
	<RegressionModel functionName="classification" normalizationMethod="logit">
		<MiningSchema>
			<MiningField name="IS_TENT" usageType="target"/>
			<MiningField name="AGE"/>
			<MiningField name="GENDER_index"/>
			<MiningField n

# Connecting to Watson Machine Learning Server

In [14]:
import urllib3
import json
import requests
from string import Template
from watson_machine_learning_client import WatsonMachineLearningAPIClient
# Enter your credentials here.
wml_credentials = {
    "url": "https://alexbar.ml.test.cloud.ibm.com:31843",
    "username": "admin",
    "password": "password",
    "instance_id": "wml_local",
    "version": "1.1"
}
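
If you prefer not to type credentials directly into the notebook, one option is to read them from environment variables. The sketch below is only an illustration: the helper `build_wml_credentials` and the variable names `WML_URL`, `WML_USERNAME`, `WML_PASSWORD`, `WML_INSTANCE_ID`, and `WML_VERSION` are hypothetical and not part of the Watson Machine Learning client. The dictionary keys match the credentials cell above.

```python
import os

def build_wml_credentials(env=os.environ):
    """Assemble the credentials dictionary from environment variables."""
    return {
        # Required settings: a missing variable raises KeyError
        "url": env["WML_URL"],
        "username": env["WML_USERNAME"],
        "password": env["WML_PASSWORD"],
        # Optional settings fall back to the values used in this notebook
        "instance_id": env.get("WML_INSTANCE_ID", "wml_local"),
        "version": env.get("WML_VERSION", "1.1"),
    }

# Placeholder values for illustration; in practice, omit the argument
# so that the function reads from os.environ
example_env = {
    "WML_URL": "https://wml.example.com:31843",
    "WML_USERNAME": "admin",
    "WML_PASSWORD": "not-a-real-password",
}
example_credentials = build_wml_credentials(example_env)
# Print only the key names so that the password is never displayed
print(sorted(example_credentials))
```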

In [15]:
# Instantiate a WatsonMachineLearningAPIClient object
client = WatsonMachineLearningAPIClient(wml_credentials)
client.version

'1.0.60'

In [16]:
# Obtain the UID of your space
def guid_from_space_name(client, space_name):
    spaces = client.spaces.get_details()
    # Return the GUID of the first space whose name matches space_name
    return next(item for item in spaces['resources'] if item['entity']["name"] == space_name)['metadata']['guid']

In [17]:
# Enter the name of your deployment space here:
space_uid = guid_from_space_name(client, 'PMML-Space')
print("Space UID = " + space_uid)

Space UID = c485f271-d3bc-4bd6-b8e4-654fdc532489


In [18]:
# Setting the default space is mandatory for WML Server.
client.set.default_space(space_uid)

'SUCCESS'

# Save the model

In [None]:
model_metadata = {
    client.repository.ModelMetaNames.NAME: "Model from PMML",
    client.repository.ModelMetaNames.DESCRIPTION: "My PMML Model",   
    client.repository.ModelMetaNames.RUNTIME_UID: "pmml_4.3",
    client.repository.ModelMetaNames.TYPE: "pmml_4.3"
}

# Store the model in the Watson Machine Learning repository.
# Parameters:
# 1. The name of the PMML file
# 2. Metadata that includes a name you choose for the stored model and the framework

model_details2 = client.repository.store_model( model=pmml_filename, meta_props=model_metadata )

In [None]:
client.repository.ModelMetaNames.get() 

In [None]:
client.repository.list_models() 