# Return Propensity using ICP4D and Watson Machine Learning

We'll use this notebook to create a machine learning model that predicts whether an order will be returned (its return propensity).

## 1.0 Import the data set

We need to import the data in the AggregatedOrderData.csv file. 

In [1]:
import os
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.

## Add credentials here!!!!

body = client_8aa33f240a004f5683941ade3d2b1ba6.get_object(Bucket='dsworkshop-donotdelete-pr-7bxfdbxyx7dtjo',Key='AggregatedOrderData.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df = pd.read_csv(body)
df.head()

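Note: in the saved run this cell failed with `ModuleNotFoundError: No module named 'botocore'`, which means the `botocore` package imported above was not available in that runtime. Make sure both `botocore` and `ibm_boto3` can be imported before running the rest of the notebook.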

## 2.0 Clean the data

### 2.1 We will first fill all NA(s) and empty values with 0.

In [None]:
df=df.fillna(0)
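To make the effect of `fillna(0)` concrete, here is a minimal sketch on a made-up three-row CSV (the values are placeholders, not rows from `AggregatedOrderData.csv`). It relies on the fact that `pd.read_csv` parses empty fields as `NaN`, so a single `fillna(0)` covers both NA and empty values.

```python
import io
import pandas as pd

# Toy CSV with two empty fields; the numbers are made up for illustration.
csv_text = "LIST_PRICE,OTHER_CHARGES\n75,0.5\n,1.0\n20,\n"
toy = pd.read_csv(io.StringIO(csv_text))

missing_before = int(toy.isna().sum().sum())   # empty fields were parsed as NaN
filled = toy.fillna(0)                         # same call as in the cell above
missing_after = int(filled.isna().sum().sum())

print(missing_before, missing_after)
print(filled)
```

Keep in mind that 0 is a real value for columns such as prices, so after this step the model cannot tell "missing" apart from "zero".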

### 2.2 Next we will see if we have any columns of dtype=object. These will then be converted to category codes in order to be fed into the model.

In [None]:
df.dtypes

In [None]:
# qual = list( df.loc[:,df.dtypes == 'object'].columns.values )
# for col in qual:
#      df[col] = df[col].astype('category')
# quant = list( df.loc[:,df.dtypes != 'category'].columns.values )
# print(qual,quant)

In [None]:
# cats = list( df.loc[:,df.dtypes == 'category'].columns.values)
# categories={}
# for col in cats:
#     categories[col]= dict(enumerate(df[col].cat.categories))

In [None]:
# categories

In [None]:
#df.dtypes
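The commented-out cells above describe converting `object` columns to pandas categoricals and recording the code-to-label mapping. A minimal sketch of that idea on a toy frame follows; the rows are made up, and `select_dtypes` is used here in place of the boolean mask from the commented code.

```python
import pandas as pd

# Toy frame with made-up values; dtype=object mimics text columns loaded from a CSV.
toy = pd.DataFrame({
    "DAY_OF_WEEK": pd.Series(["Saturday", "Monday", "Saturday"], dtype=object),
    "LIST_PRICE": [75, 20, 40],
})

# Convert every object column to a categorical and remember code -> label.
categories = {}
for col in toy.select_dtypes(include=["object"]).columns:
    toy[col] = toy[col].astype("category")
    categories[col] = dict(enumerate(toy[col].cat.categories))

# Integer codes that could be fed into a model.
codes = toy["DAY_OF_WEEK"].cat.codes.tolist()
print(categories)
print(codes)
```

Categories are sorted alphabetically by default, so the codes carry no meaningful order.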

### 2.3 Next, we find out how many orders were returned and how many were not returned.

In [None]:
df["RETURN_FLAG"].value_counts()

### Here we can see that there are ~24K orders that have been returned and ~128K orders that have not been returned. 
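Because the classes are imbalanced, it helps to know what accuracy a trivial model would reach. The short calculation below uses the rounded counts quoted above (so the results are approximate) to work out the return rate and the accuracy of always predicting the majority class.

```python
# Approximate counts quoted above, rounded to the nearest thousand.
returned = 24_000
not_returned = 128_000

total = returned + not_returned
return_rate = returned / total

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = max(returned, not_returned) / total

print(f"return rate ~ {return_rate:.3f}")
print(f"majority-class baseline accuracy ~ {majority_baseline:.3f}")
```

Any accuracy reported later in the notebook should be read against this baseline of roughly 84%.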

### 2.4 Let's split our data into training and test sets.

In [None]:
from sklearn.model_selection import train_test_split
X=(df.drop(["RETURN_FLAG"], axis=1))
y=df['RETURN_FLAG']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3., random_state=42)
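The split above is purely random. Given the class imbalance, one optional variation (not used in this notebook) is to pass `stratify=y` so both splits keep the same share of each class. The sketch below shows this on made-up toy data with a similar imbalance.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data with roughly the same imbalance as RETURN_FLAG (values are made up).
toy = pd.DataFrame({"LIST_PRICE": range(60), "RETURN_FLAG": [1] * 9 + [0] * 51})
X_toy = toy.drop(["RETURN_FLAG"], axis=1)
y_toy = toy["RETURN_FLAG"]

# stratify=y_toy keeps the class ratio the same in the train and test splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=1/3., random_state=42, stratify=y_toy
)

print(len(X_tr), len(X_te))
print(y_tr.mean(), y_te.mean())
```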

## 3.0 Install Custom Modules for the Pipeline Transformations

### 3.1 Let us now install the custom transformation library that we had uploaded to the project - CustTrans-0.2.zip

In [None]:
from botocore.client import Config
import ibm_boto3
# @hidden_cell
# The following code contains the credentials for a file in your IBM Cloud Object Storage.
# You might want to remove those credentials before you share your notebook.


## Add credentials here!!!!


def downloadFileCos(bucketDetails):
    # Build a Cloud Object Storage client from the credentials dictionary.
    cos = ibm_boto3.client(
        service_name='s3',
        ibm_api_key_id=bucketDetails['IBM_API_KEY_ID'],
        ibm_service_instance_id=bucketDetails['IAM_SERVICE_ID'],
        ibm_auth_endpoint=bucketDetails['IBM_AUTH_ENDPOINT'],
        config=Config(signature_version='oauth'),
        endpoint_url=bucketDetails['ENDPOINT'])
    # Download the custom transformer archive into the working directory.
    cos.download_file(Bucket=bucketDetails['BUCKET'], Key="CustTrans-0.2.zip", Filename="CustTrans-0.2.zip")
    print("CustomTransformer file downloaded")

downloadFileCos(credentials_1)

In [None]:
!ls

In [None]:
!pip install --upgrade CustTrans-0.2.zip

### 3.2 Next, we install the sklearn-pandas library

In [None]:
!pip install sklearn-pandas

## 4.0 Build the model

### 4.1 Now, let us create the custom pipeline transformer which essentially is our model.

In [None]:
from CustomTransformer.CustTrans import TypeSelector,StringIndexer,ConvToCategorical

In [None]:
import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn_pandas import DataFrameMapper


transformer = Pipeline([
   ('features', FeatureUnion(n_jobs=1, transformer_list=[
       # Part 1
       ('boolean', Pipeline([
           ('selector', TypeSelector('bool')),
       ])),  # booleans close
       ('numericals', Pipeline([
           ('selector', TypeSelector(np.number)),
           ('scaler', StandardScaler()),
       ])),
       # Part 2
       ('categoricals', Pipeline([
           ('convertor', ConvToCategorical()),
           ('selector', TypeSelector('category')),
           ('labeler', StringIndexer()),
           ('encoder', OneHotEncoder(handle_unknown='ignore')),
       ]))
       # categoricals close
   ])),  # features close
   ('clf' , RandomForestClassifier(n_estimators=30,criterion="entropy")),
    
])
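The source of `CustTrans` is not shown in this notebook, so the behavior of `TypeSelector` can only be inferred from how it is used: it appears to keep the columns of a given dtype so that each branch of the `FeatureUnion` sees only the columns it can handle. The sketch below is a hypothetical stand-in named `DtypeSelector`, written only to illustrate that idea; it is not the library's implementation.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DtypeSelector(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for TypeSelector: keeps only columns of one dtype."""

    def __init__(self, dtype):
        self.dtype = dtype

    def fit(self, X, y=None):
        # Nothing to learn; the selection depends only on the column dtypes.
        return self

    def transform(self, X):
        return X.select_dtypes(include=[self.dtype])


# Toy frame with made-up values.
toy = pd.DataFrame({
    "LIST_PRICE": [75.0, 20.0],
    "BASKET_SIZE": [3, 1],
    "WEEKEND": [True, False],
})
numeric_cols = list(DtypeSelector(np.number).fit_transform(toy).columns)
bool_cols = list(DtypeSelector("bool").fit_transform(toy).columns)
print(numeric_cols, bool_cols)
```

Note that `select_dtypes` does not treat boolean columns as numeric, which is why the pipeline can route booleans and numericals through separate branches.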

### 4.2 Let's now fit the transformer pipeline on the training data. This step trains the model.

In [None]:
import timeit
start_time = timeit.default_timer()
transformer.fit(X_train, y_train)
print("Time for model training",timeit.default_timer() - start_time)

### 4.3 Once training is complete, we can evaluate the accuracy of the model using the hold-out test data.

In [None]:
from sklearn.metrics import accuracy_score
# predict() returns class labels for the hold-out rows
predictions = transformer.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
accuracy
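Accuracy alone can look good on imbalanced data even when the rare class is mostly missed. As a small illustration, the sketch below uses made-up labels (class 1 is the rare class here, an assumption for the example only) to show how recall and a confusion matrix expose this.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Made-up labels for illustration; class 1 is the rare class.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)       # share of actual class-1 rows that were found
cm = confusion_matrix(y_true, y_pred)    # rows = actual class, columns = predicted class

print(acc, rec)
print(cm)
```

Here accuracy is high while half of the rare class is missed. The same functions can be applied to `y_test` and the predictions from the cell above.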

## 5.0 Save and deploy the model to WML

### 5.1 Create a WML API client.

In [None]:
!pip install ibm-watson-machine-learning

In [None]:
from ibm_watson_machine_learning import APIClient

In [None]:
apikey = "## Add API Key here!!!!"
wml_credentials = {
                   "url": "https://eu-gb.ml.cloud.ibm.com",
                   "apikey":apikey
                  }
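Hard-coding the API key makes it easy to leak when the notebook is shared. One common alternative, sketched below, is to read it from an environment variable. The variable name `WML_API_KEY` is a hypothetical choice for this example, and the placeholder value is set only so the sketch runs on its own.

```python
import os

# For the sketch only: supply a placeholder if the variable is not set.
# In practice, set WML_API_KEY (a hypothetical name) outside the notebook.
os.environ.setdefault("WML_API_KEY", "dummy-key-for-illustration")

apikey_from_env = os.environ["WML_API_KEY"]
wml_credentials_from_env = {
    "url": "https://eu-gb.ml.cloud.ibm.com",
    "apikey": apikey_from_env,
}

# Print only the dictionary keys so the secret never lands in the cell output.
print(sorted(wml_credentials_from_env))
```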

In [None]:
client = APIClient(wml_credentials)

### Use the following cell to perform any clean up of previously created models, deployments and spaces.

In [None]:
client.spaces.list()

In [None]:
# see if any spaces already exist
# client.spaces.list()

# set the default space before moving ahead
# client.set.default_space('<GUID of the space>')

# see if any stored models exist
# client.repository.list_models()
# client.repository.delete('<GUID of model to delete>')

# see if any deployments exist
# client.deployments.list()
# client.deployments.delete('<GUID of deployment to delete>')

# once the deployments and models are deleted, the space can be deleted
# client.spaces.delete('<GUID of the space>')

### Create a deployment space and set it as the default space to be used for deployments. If you would rather use an existing space (that was previously created), skip the code in the cell below and directly use the next cell to set the default space.

In [None]:
# Use this code to create a new deployment space.
# space_details = client.spaces.store(meta_props={client.spaces.ConfigurationMetaNames.NAME: "ReturnPropensity_Space"})
# space_id = client.spaces.get_uid(space_details)
# print(space_id)

In [None]:
# Set default space - if you have a previously created space that you'd like to use,
# use that space's id instead of `space_id`. For example: client.set.default_space('<GUID of the space>')
client.set.default_space("## Add SPACE ID here!!!!")
print(client.deployments.list())

### 5.2 Before we deploy the model, let's create a custom python runtime with our custom transformer library installed.

#### 5.2.1 Create a package extension

In [None]:
meta_prop_pkg_extn = {
    client.package_extensions.ConfigurationMetaNames.NAME: "CustomTransformers_v0.1",
    client.package_extensions.ConfigurationMetaNames.DESCRIPTION: "Pkg extension for custom lib",
    client.package_extensions.ConfigurationMetaNames.TYPE: "pip_zip"
}

pkg_extn_details = client.package_extensions.store(meta_props=meta_prop_pkg_extn, file_path="CustTrans-0.2.zip")
pkg_extn_uid = client.package_extensions.get_uid(pkg_extn_details)
pkg_extn_url = client.package_extensions.get_href(pkg_extn_details)

In [None]:
details = client.package_extensions.get_details(pkg_extn_uid)

#### 5.2.2 Create software specification and add custom library

In [None]:
client.software_specifications.ConfigurationMetaNames.show()

In [None]:
client.software_specifications.list()

In [None]:
base_sw_spec_uid = client.software_specifications.get_uid_by_name("default_py3.7")

In [None]:
meta_prop_sw_spec = {
    client.software_specifications.ConfigurationMetaNames.NAME: "CustomTransformers_v0.1",
    client.software_specifications.ConfigurationMetaNames.DESCRIPTION: "Software specification for CustomTransformers_v0.1",
    client.software_specifications.ConfigurationMetaNames.BASE_SOFTWARE_SPECIFICATION: {"guid": base_sw_spec_uid}
}

sw_spec_details = client.software_specifications.store(meta_props=meta_prop_sw_spec)
sw_spec_uid = client.software_specifications.get_uid(sw_spec_details)


client.software_specifications.add_package_extension(sw_spec_uid, pkg_extn_uid)

### 5.3 Now, let us store our model.

In [None]:
model_props = {
    client.repository.ModelMetaNames.NAME: "ReturnRiskPandas_v0.1",
    client.repository.ModelMetaNames.TYPE: 'scikit-learn_0.23',
    client.repository.ModelMetaNames.SOFTWARE_SPEC_UID: sw_spec_uid
    
}

In [None]:
published_model = client.repository.store_model(model=transformer, meta_props=model_props,training_data=X_train, training_target=y_train)
published_model_uid = client.repository.get_model_uid(published_model)
model_details = client.repository.get_details(published_model_uid)

In [None]:
import json
print(json.dumps(model_details, indent=2))

### 5.4 Finally, let's deploy the model.

In [None]:
metaProps = {
client.deployments.ConfigurationMetaNames.NAME: "ReturnRiskPandas_CustomTransformers_v0.1",
client.deployments.ConfigurationMetaNames.ONLINE: {}
}

In [None]:
created_deployment = client.deployments.create(published_model_uid, metaProps)

## 6.0 Test the model

### 6.1 Obtain the deployment_id and deployment_href for the model.

The deployment_id is required to score the model using the `client.deployments.score()` method in the WML API Client.
The deployment_href can be used to generate the URL to be used to score the model via a cURL command. The scoring_url can be generated as `"<URL for your IBM Cloud Pak for Data cluster>" + <deployment_href>`

In [None]:
deployment_uid = client.deployments.get_uid(created_deployment)

In [None]:
scoring_endpoint = client.deployments.get_scoring_href(created_deployment)
print(scoring_endpoint)
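The concatenation described above can be sketched as a small helper. The function name `build_scoring_url` and both example values are hypothetical; the helper simply avoids doubled or missing slashes and returns the href unchanged if it is already a full URL.

```python
def build_scoring_url(cluster_url, deployment_href):
    """Join the cluster base URL and the deployment href with exactly one slash."""
    # Some environments may already return an absolute URL; leave it untouched.
    if deployment_href.startswith(("http://", "https://")):
        return deployment_href
    return cluster_url.rstrip("/") + "/" + deployment_href.lstrip("/")


# Both values below are placeholders, not real endpoints.
example_url = build_scoring_url(
    "https://cpd.example.com/",
    "/ml/v4/deployments/<deployment_id>/predictions",
)
print(example_url)
```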

### 6.2 Score the model using a sample payload.

In [None]:
scoring_payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [{
        "fields": [
            "BASKET_SIZE", "EXTN_COMPOSITION", "CARRIER_SERVICE_CODE_OL", "CATEGORY",
            "COUNTRY_OF_ORIGIN_OI", "DAY_OF_MONTH", "DAY_OF_WEEK", "DAY_OF_YEAR",
            "EXTN_BRAND", "EXTN_DISCOUNT_ID", "EXTN_IS_GIFT", "EXTN_IS_PREORDER",
            "EXTN_SHIP_TO_CITY", "EXTN_SHIP_TO_COUNTRY", "EXTN_SEASON", "LIST_PRICE",
            "MONTH_OF_YEAR", "OTHER_CHARGES", "OTHER_CHARGES_OL", "REQ_DELIVERY_DATE",
            "TOTAL_AMOUNT_USD", "WEEKEND", "ZIP_CODE", "MTS_CTS", "HOUR_OF_DAY", "LOCKID"
        ],
        "values": [[
            3, '91% Nylon, 9% Elastercell', 'STANDARD', 'Bikini',
            'US', 18, 'Saturday', 322,
            'XYZAI', 'None', 'N', 'N',
            'Los Angeles', 'US', 'FW17', 75,
            11, 0.0, 0.0, 0,
            165.35, 1, 'Zipcode_401', 24, 19, 277
        ]]
    }]
}
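Typing the payload by hand is error-prone because the order of `values` must match the order of `fields`. A minimal sketch of building the same structure from a DataFrame follows. It assumes that `client.deployments.ScoringMetaNames.INPUT_DATA` evaluates to the string `"input_data"`, and it uses a two-column toy frame rather than the full feature set.

```python
import pandas as pd

# Two-column toy frame; a real request would carry every feature column.
rows = pd.DataFrame({"BASKET_SIZE": [3], "CATEGORY": ["Bikini"]})

# "input_data" is assumed to be the value of
# client.deployments.ScoringMetaNames.INPUT_DATA.
toy_payload = {
    "input_data": [{
        "fields": list(rows.columns),     # column names, in order
        "values": rows.values.tolist(),   # one inner list per row, same order
    }]
}
print(toy_payload)
```

With the real data, a few rows of `X_test` could be used in place of `rows`.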

In [None]:
prediction = client.deployments.score(deployment_uid, scoring_payload)

In [None]:
prediction

### The first field - prediction - indicates the model's prediction of whether the items indicated by the sample payload will be returned (value of 0) or not (value of 1). The second field - probability - has 2 numeric values. The first corresponds to the probability of a prediction value of 0 and the second corresponds to the probability of the prediction value of 1.
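To pull the two fields out programmatically, the response can be unpacked as sketched below. The dictionary is an assumed response shape with made-up numbers, not output from this deployment, so compare it with your own `prediction` before relying on it.

```python
# Assumed response shape with made-up numbers; check it against your own `prediction`.
sample_response = {
    "predictions": [{
        "fields": ["prediction", "probability"],
        "values": [[0, [0.9, 0.1]]],
    }]
}

first = sample_response["predictions"][0]
# Pair each field name with the corresponding value of the first scored row.
row = dict(zip(first["fields"], first["values"][0]))

predicted_class = row["prediction"]
prob_class_0, prob_class_1 = row["probability"]
print(predicted_class, prob_class_0, prob_class_1)
```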