# Credit risk prediction with Scikit-learn and custom library

This notebook uses the [Credit risk dataset](https://raw.githubusercontent.com/leonardofurnielis/wml-toolkit/master/datasets/credit_risk_dataset.csv).

The notebook trains a credit risk model, saves it to Watson Machine Learning, and deploys it.

### Contents

1. [Set up the environment](#setup_environment)
1. [Explore and prepare training data](#explore_prepare_data)
1. [Install custom python library](#install_custom_library)
1. [Create train and test dataset](#train_test_set)
1. [Train the model](#train_model)
1. [Persist custom library](#publish_custom_library)
1. [Save the model](#save_model)
1. [Deploy and score](#deploy_model)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json

<a id="setup_environment"></a>
## 1. Set up the environment

To authenticate to Watson Machine Learning on IBM Cloud, you need an API key and the service URL (`location`) of your Watson Machine Learning instance.

You can create an API key with the [IBM Cloud CLI](https://cloud.ibm.com/docs/cli/index.html) or directly in the IBM Cloud portal.

Using IBM Cloud CLI:

```
ibmcloud login
ibmcloud iam api-key-create API_KEY_NAME
```

NOTE: To find the service URL, see the [Endpoint URLs section of the Watson Machine Learning docs](https://cloud.ibm.com/apidocs/machine-learning).

**Action**: Enter your API key and service URL in the following cell.

In [None]:
api_key = 'API_KEY'    # IBM Cloud API key
location = 'LOCATION'  # WML service URL, e.g. 'https://us-south.ml.cloud.ibm.com'

In [None]:
wml_credentials = {
    "apikey": api_key,
    "url": location
}
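
Hard-coding the API key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch, assuming you export the key and service URL as environment variables before starting the kernel; the variable names `WML_API_KEY` and `WML_URL` and the helper `credentials_from_env` are hypothetical, chosen only for this example:

```python
import os


def credentials_from_env(key_var="WML_API_KEY", url_var="WML_URL"):
    """Build the WML credentials dict from environment variables.

    The default variable names are hypothetical. Raises KeyError that
    names any variable that is unset or empty.
    """
    missing = [name for name in (key_var, url_var) if not os.environ.get(name)]
    if missing:
        raise KeyError(f"Missing environment variable(s): {', '.join(missing)}")
    # Same keys as the wml_credentials dict used in this notebook
    return {"apikey": os.environ[key_var], "url": os.environ[url_var]}
```

With the variables set, `wml_credentials = credentials_from_env()` produces the same dictionary as the cell above without the key appearing in the notebook.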

**Action**: Assign your deployment space ID below.

In [None]:
space_id = 'SPACE_ID'

**Action**: Assign your project ID below.

In [None]:
project_id = 'PROJECT_ID'

### 1.1 Installing the IBM Watson Machine Learning library

NOTE: The library documentation is available [here](http://ibm-wml-api-pyclient.mybluemix.net/).

In [None]:
!pip install -U ibm-watson-machine-learning --quiet

In [None]:
from ibm_watson_machine_learning import APIClient

client = APIClient(wml_credentials)
print(client.version)

<a id="explore_prepare_data"></a>
## 2. Explore and prepare training data

### 2.1 Importing training data

In [None]:
# The code was removed by IBM Watson Studio for sharing.
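
The cell above was emptied when the notebook was shared, but the following cells expect a pandas DataFrame named `df_data_1`. A minimal sketch, assuming the CSV at the dataset URL quoted at the top of this notebook is reachable from your environment; the helper name `load_credit_risk_data` is hypothetical:

```python
import pandas as pd

# Dataset URL quoted at the top of this notebook
DATASET_URL = (
    "https://raw.githubusercontent.com/leonardofurnielis/"
    "wml-toolkit/master/datasets/credit_risk_dataset.csv"
)


def load_credit_risk_data(source=DATASET_URL, label="Risk"):
    """Read the credit risk CSV (URL or local path) into a DataFrame.

    Fails early if the label column used later in the notebook is absent.
    """
    data = pd.read_csv(source)
    if label not in data.columns:
        raise ValueError(f"Expected a '{label}' column, found: {list(data.columns)}")
    return data
```

In the notebook, `df_data_1 = load_credit_risk_data()` would then feed the cells below.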

### 2.2 Exploring and preparing data

In [None]:
df = df_data_1

In [None]:
df.describe()

In [None]:
ax = sns.countplot(x="Risk", data=df)
plt.title("Risk label distribution")

<a id="install_custom_library"></a>
## 3. Install custom python library

In this step, you create and install a Python library containing a custom scikit-learn transformer.

In [None]:
!mkdir -p dummiesnorm-0.1/dummies_norm

In [None]:
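%%writefile dummiesnorm-0.1/dummies_norm/sklearn_transformers.py

from sklearn.base import BaseEstimator, TransformerMixin
import pandas as pd

class DNormalizer(BaseEstimator, TransformerMixin):
    """One-hot encode categorical columns with pandas.get_dummies.

    The dummy columns created during fit are stored, so transform always
    returns the training column layout, even for a small scoring batch
    that lacks some categories.
    """

    def fit(self, X, y=None):
        # Remember the one-hot column layout of the training data
        self.columns_ = pd.get_dummies(X).columns
        return self

    def transform(self, X, y=None):
        X_dummy = pd.get_dummies(X.copy())
        # Add missing dummy columns as 0 and drop categories unseen in training
        return X_dummy.reindex(columns=self.columns_, fill_value=0)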
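
`pd.get_dummies` only creates columns for the categories present in the frame it receives, so a small scoring batch can yield a different set of columns than the training data, which a fitted model rejects. The standalone check below illustrates the problem and one common remedy, reindexing to the training-time columns; the toy column `Housing` and its values are made up for this example:

```python
import pandas as pd

# Toy training data with two categories in one categorical column
train = pd.DataFrame({"Housing": ["own", "rent", "own"], "Age": [30, 45, 28]})
# A one-row scoring batch that contains only one of the categories
batch = pd.DataFrame({"Housing": ["rent"], "Age": [52]})

train_cols = pd.get_dummies(train).columns
batch_dummies = pd.get_dummies(batch)
print(list(train_cols))             # ['Age', 'Housing_own', 'Housing_rent']
print(list(batch_dummies.columns))  # ['Age', 'Housing_rent']

# Align the batch to the training layout: missing dummies become 0, extras are dropped
aligned = batch_dummies.reindex(columns=train_cols, fill_value=0)
print(aligned)
```

After `reindex`, the batch has exactly the training columns in the same order, which is what a downstream scikit-learn step expects.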

Wrap the transformer code into a Python source distribution package.

In [None]:
%%writefile dummiesnorm-0.1/dummies_norm/__init__.py

__version__ = "0.1"

In [None]:
%%writefile dummiesnorm-0.1/README.md

A small library containing a custom scikit-learn transformer.

In [None]:
%%writefile dummiesnorm-0.1/setup.py

from setuptools import setup

VERSION='0.1'
setup(name='dummiesnorm',
      version=VERSION,
      author='IBM',
      url='https://github.com/leonardofurnielis',
      author_email='ibm@ibm.com',
      license='IBM',
      packages=[
            'dummies_norm'
      ],
      zip_safe=False
)

In [None]:
%%bash

cd dummiesnorm-0.1
python setup.py sdist --formats=zip
cd ..
mv dummiesnorm-0.1/dist/dummiesnorm-0.1.zip .
rm -rf dummiesnorm-0.1

Install the packaged library with pip.

In [None]:
!pip install dummiesnorm-0.1.zip

<a id="train_test_set"></a>
## 4. Create train and test dataset
NOTE: The data is split into a training set (70%) and a test set (30%).

In [None]:
Y = df['Risk']
df = df.drop(['Risk'], axis=1)
df.head()

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# Hold out 30% of the rows for testing; fixed seed for a reproducible split
X_train, X_test, Y_train, Y_test = train_test_split(df, Y, test_size=0.3, random_state=42)
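
A minimal sketch on synthetic labels (the 70/30 label mix below is made up, not taken from the dataset) showing what `test_size=0.3` does, and how the optional `stratify` argument keeps the class ratio equal in both parts, which matters if the `Risk` classes are imbalanced:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the features and the Risk label (100 rows, 70/30 label mix)
X = pd.DataFrame({"feature": np.arange(100)})
y = pd.Series(["No Risk"] * 70 + ["Risk"] * 30)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

print(len(X_tr), len(X_te))  # 70 30
# Stratification keeps the label proportions the same in both parts
print(y_tr.value_counts(normalize=True).round(2).to_dict())
print(y_te.value_counts(normalize=True).round(2).to_dict())
```

Whether to stratify the real split is a judgment call; the countplot in section 2.2 shows how balanced the `Risk` labels are.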

<a id="train_model"></a>
## 5. Train the model

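Create a scikit-learn `Pipeline` with three steps:

1. one-hot (dummy) encoding of the categorical features with the custom `DNormalizer` transformer
1. row-wise normalization with `Normalizer`
1. training a `LogisticRegression` classifier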
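
Note that scikit-learn's `Normalizer` works row by row: it rescales each sample to unit norm (L2 by default), rather than standardizing each feature column as `StandardScaler` does. A small check on made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0], [1.0, 0.0]])
X_norm = Normalizer().fit_transform(X)  # default norm="l2"

# Row [3, 4] has length 5, so it becomes [0.6, 0.8]
print(X_norm)
# Every row now has Euclidean length 1
print(np.linalg.norm(X_norm, axis=1))
```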

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

In [None]:
from dummies_norm.sklearn_transformers import DNormalizer

In [None]:
pipeline = Pipeline([
    ('dummy_trans', DNormalizer()),  # one-hot encode categorical features
    ('normalize', Normalizer()),     # scale each row to unit L2 norm
    ('lr', LogisticRegression())     # classifier
])

In [None]:
risk_model = pipeline.fit(X_train, Y_train)

### 5.1 Model evaluation

In [None]:
risk_model_predicted = risk_model.predict(X_test)

In [None]:
print(metrics.accuracy_score(Y_test, risk_model_predicted))

In [None]:
print(metrics.classification_report(Y_test, risk_model_predicted))

In [None]:
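class_labels = risk_model.classes_  # rows: true labels, columns: predicted labels
risk_model_conf_matrix = metrics.confusion_matrix(Y_test, risk_model_predicted, labels=class_labels)
sns.heatmap(risk_model_conf_matrix, annot=True, fmt='d', xticklabels=class_labels, yticklabels=class_labels)
plt.xlabel('Predicted label'); plt.ylabel('True label')
plt.title('Confusion matrix, Logistic Regression');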
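
For a binary problem, the confusion matrix holds the counts the metrics above are computed from. A small sketch on made-up labels that recomputes accuracy, precision, and recall from the matrix and cross-checks them against `sklearn.metrics`, treating "Risk" as the positive class (an assumption for this example):

```python
import math

from sklearn import metrics

# Made-up true and predicted labels for six applicants
y_true = ["Risk", "Risk", "No Risk", "No Risk", "No Risk", "Risk"]
y_pred = ["Risk", "No Risk", "No Risk", "Risk", "No Risk", "Risk"]

# With labels ordered [negative, positive], ravel() yields tn, fp, fn, tp
labels = ["No Risk", "Risk"]
tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_pred, labels=labels).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # share of predicted "Risk" that is truly "Risk"
recall = tp / (tp + fn)     # share of true "Risk" cases that were flagged

# Cross-check against scikit-learn's own metric functions
assert math.isclose(accuracy, metrics.accuracy_score(y_true, y_pred))
assert math.isclose(precision, metrics.precision_score(y_true, y_pred, pos_label="Risk"))
assert math.isclose(recall, metrics.recall_score(y_true, y_pred, pos_label="Risk"))
print(tn, fp, fn, tp)  # 2 1 1 2
```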

<a id="publish_custom_library"></a>
## 6. Persist custom library

In this section, using the `ibm-watson-machine-learning` SDK, you will:

- save the library `dummiesnorm-0.1.zip` to the WML repository by creating a package extension resource
- create a software specification resource and bind the package extension to it; this software specification configures the runtime environment of the model's online deployment
- bind the software specification to the model and save the model to the WML repository

### 6.1 Create package extension

Define the metadata required to create the package extension resource.

The `file_path` argument of `client.package_extensions.store()` is the path of the library file to upload to WML.

NOTE: You can also use a conda environment configuration file (YAML) as the package extension input. In that case, set the `TYPE` entry of the metadata to `conda_yml` and `file_path` to the path of the YAML file:

    client.package_extensions.ConfigurationMetaNames.TYPE: "conda_yml"

In [None]:
client.set.default_space(space_id)

In [None]:
!ls

In [None]:
meta_prop_pkg_extn = {
    client.package_extensions.ConfigurationMetaNames.NAME: "dummies_norm_skl",
    client.package_extensions.ConfigurationMetaNames.DESCRIPTION: "Pkg extension for custom lib",
    client.package_extensions.ConfigurationMetaNames.TYPE: "pip_zip"
}

pkg_extn_details = client.package_extensions.store(meta_props=meta_prop_pkg_extn, file_path="dummiesnorm-0.1.zip")
pkg_extn_uid = client.package_extensions.get_uid(pkg_extn_details)
pkg_extn_url = client.package_extensions.get_href(pkg_extn_details)

In [None]:
details = client.package_extensions.get_details(pkg_extn_uid)
details

### 6.2 Create software specification and add custom library

Define the metadata required to create the software specification resource and bind the package extension to it. This software specification configures the runtime environment of the model's online deployment.

In [None]:
client.software_specifications.ConfigurationMetaNames.show()

In [None]:
client.software_specifications.list()

In [None]:
base_sw_spec_uid = client.software_specifications.get_uid_by_name("runtime-22.2-py3.10")

In [None]:
meta_prop_sw_spec = {
    client.software_specifications.ConfigurationMetaNames.NAME: "dummiesnorm-0.1",
    client.software_specifications.ConfigurationMetaNames.DESCRIPTION: "Software specification for dummiesnorm-0.1",
    client.software_specifications.ConfigurationMetaNames.BASE_SOFTWARE_SPECIFICATION: {"guid": base_sw_spec_uid}
}

sw_custom_spec_details = client.software_specifications.store(meta_props=meta_prop_sw_spec)
sw_custom_spec_uid = client.software_specifications.get_uid(sw_custom_spec_details)


client.software_specifications.add_package_extension(sw_custom_spec_uid, pkg_extn_uid)

<a id="save_model"></a>
## 7. Save the model

### 7.1 Save the model to IBM Watson Studio project

In [None]:
client.set.default_project(project_id)

In [None]:
metadata = {
    client.repository.ModelMetaNames.NAME: 'preprod_credit_risk_model_custom_library',
    client.repository.ModelMetaNames.TYPE: 'scikit-learn_1.1',
    client.repository.ModelMetaNames.SOFTWARE_SPEC_UID: sw_custom_spec_uid
}

published_model = client.repository.store_model(model=risk_model, meta_props=metadata, training_data=df, training_target=Y)

### 7.2 Save the model to IBM Watson Studio space

In [None]:
client.set.default_space(space_id)

In [None]:
client.spaces.list(limit=10)

In [None]:
published_model = client.repository.store_model(model=risk_model, meta_props=metadata, training_data=df, training_target=Y)

In [None]:
published_model_uid = client.repository.get_model_id(published_model)
model_details = client.repository.get_details(published_model_uid)
print(json.dumps(model_details, indent=2))

In [None]:
client.repository.list_models()

<a id="deploy_model"></a>
## 8. Deploy and score

NOTE: Create an online deployment of the model stored in the Watson Machine Learning deployment space.

In [None]:
metadata = {
    client.deployments.ConfigurationMetaNames.NAME: "preprod_credit_risk_model_deployment",
    client.deployments.ConfigurationMetaNames.ONLINE: {}
}

created_deployment = client.deployments.create(published_model_uid, meta_props=metadata)

In [None]:
deployment_uid = client.deployments.get_uid(created_deployment)
client.deployments.get_details(deployment_uid)

In [None]:
client.deployments.list()
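
The section title mentions scoring, but no scoring cell is included. A minimal sketch, assuming the deployment created above is ready and that it accepts the `input_data` payload format (a list of objects with `fields` and `values`) used by `client.deployments.score()`; the helper name `build_scoring_payload` is hypothetical:

```python
import json

import pandas as pd


def build_scoring_payload(features: pd.DataFrame) -> dict:
    """Convert feature rows into a WML online-scoring payload.

    The to_json/json.loads round trip turns NumPy scalars into plain
    Python types and NaN into None, so the payload is JSON-serializable.
    """
    return {
        "input_data": [
            {
                "fields": list(features.columns),
                "values": json.loads(features.to_json(orient="values")),
            }
        ]
    }
```

In the notebook, a call such as `client.deployments.score(deployment_uid, build_scoring_payload(X_test.head(5)))` would send five test rows (features only, without `Risk`) to the deployment and return its predictions.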