# Credit risk prediction with Scikit-learn

This notebook uses the [Credit risk dataset](https://raw.githubusercontent.com/leonardofurnielis/wml-toolkit/master/datasets/credit_risk_dataset.csv).

The notebook trains a credit risk model, publishes it to Watson Machine Learning, and deploys it.

### Contents

1. [Import dataset](#import_dataset)
1. [Explore data](#explore_data)
1. [Data preparation](#data_preparation)
1. [Create train and test dataset](#train_test_set)
1. [Create a model](#create_model)
1. [Publish the model](#publish_model)
1. [Deploy and score](#deploy_model)

In [None]:
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import json

<a id="import_dataset"></a>
## 1. Import dataset

In [None]:
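# The original loading code was removed by IBM Watson Studio for sharing; as an assumed equivalent, read the public CSV.
df_data_1 = pd.read_csv('https://raw.githubusercontent.com/leonardofurnielis/wml-toolkit/master/datasets/credit_risk_dataset.csv')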

<a id="explore_data"></a>
## 2. Explore data

In [None]:
df = df_data_1

In [None]:
df.describe()

In [None]:
ax = sns.countplot(x="Risk", data=df)
plt.title("Risk label distribution")
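
To put numbers behind the count plot, the label proportions and the non-numeric columns can be inspected directly. The sketch below uses a small made-up frame (`toy`) in place of `df`; only the `Risk` column name comes from this notebook, and the other column names and all values are illustrative assumptions.

```python
import pandas as pd

# Toy stand-in for df; the values are made up for illustration.
toy = pd.DataFrame({
    "LoanAmount": [250, 4000, 1200, 800],
    "Housing": ["own", "rent", "own", "free"],
    "Risk": ["No Risk", "Risk", "No Risk", "No Risk"],
})

# Share of each label: a strong imbalance makes plain accuracy misleading.
risk_share = toy["Risk"].value_counts(normalize=True)
print(risk_share)

# Non-numeric columns are the ones that dummy encoding will expand later.
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()
print(categorical_cols)
```

On the real data, replacing `toy` with `df` gives the same two summaries.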

<a id="data_preparation"></a>
## 3. Data preparation

In this step you prepare the data for training by encoding the feature columns as numeric values:

1. Transform categorical variables into dummy (one-hot) columns.
1. Normalize the feature values.
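
The two steps above can be sketched on a tiny made-up frame. The column names and values here are assumptions for illustration only; the calls are `pd.get_dummies` and scikit-learn's `Normalizer`, the same building blocks this notebook uses.

```python
import pandas as pd
from sklearn.preprocessing import Normalizer

# Toy frame with one numeric and one categorical column (made-up values).
toy = pd.DataFrame({
    "LoanAmount": [3.0, 0.0],
    "Housing": ["own", "rent"],
})

# Dummy (one-hot) encoding: each category becomes its own 0/1 column.
encoded = pd.get_dummies(toy, dtype=float)
print(encoded.columns.tolist())

# Normalizer rescales each ROW to unit L2 norm; it does not scale columns.
normalized = Normalizer().fit_transform(encoded)
print(normalized)
```

Note that `Normalizer` works per sample (row), not per feature (column). If per-feature scaling is what is wanted, `StandardScaler` or `MinMaxScaler` are the usual scikit-learn choices.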

In [None]:
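from sklearn.base import BaseEstimator, TransformerMixin

class DummyTransformer(BaseEstimator, TransformerMixin):
    """One-hot encode categorical columns with pandas.get_dummies."""

    def fit(self, X, y=None):
        # Remember the dummy columns seen in the training data.
        self.columns_ = pd.get_dummies(X).columns
        return self

    def transform(self, X, y=None):
        # Align to the training columns so that data missing a category
        # (for example a single scoring row) still has the same layout.
        return pd.get_dummies(X).reindex(columns=self.columns_, fill_value=0)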
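
Because `pd.get_dummies` derives its output columns from whatever data it is given, a batch that lacks a category produces fewer columns than the training data did. The sketch below illustrates the mismatch and one way to align the columns with `reindex`; the `Housing` column and its values are made up for illustration.

```python
import pandas as pd

train = pd.DataFrame({"Housing": ["own", "rent", "free"]})
batch = pd.DataFrame({"Housing": ["rent"]})  # only one category present

# Columns produced from the training data.
train_cols = pd.get_dummies(train).columns

# Encoding the batch on its own yields fewer columns.
raw = pd.get_dummies(batch)
print(raw.columns.tolist())

# Reindexing restores the training layout, filling absent categories with 0.
aligned = raw.reindex(columns=train_cols, fill_value=0)
print(aligned.columns.tolist())
```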

<a id="train_test_set"></a>
## 4. Create train and test dataset
NOTE: The data is split into a training dataset (70%) and a test dataset (30%).

In [None]:
Y = df['Risk']
df = df.drop(['Risk'], axis=1)
df.head()

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# Hold out 30% of the rows for testing, as stated in the note above.
X_train, X_test, Y_train, Y_test = train_test_split(df, Y, test_size=0.3)
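
A sketch of the 70/30 split described in the note above, made reproducible and stratified. The `random_state=42` and `stratify` arguments are additions that are not part of the original notebook, and the toy data is made up for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the features and the label: 30 "Risk" and 70 "No Risk" rows.
toy_X = pd.DataFrame({"LoanAmount": range(100)})
toy_y = pd.Series(["Risk"] * 30 + ["No Risk"] * 70)

# random_state fixes the shuffle; stratify keeps the label ratio in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.3, random_state=42, stratify=toy_y
)

print(len(X_tr), len(X_te))
print((y_te == "Risk").sum(), "risky rows in the test part")
```

With the notebook's own data, the equivalent call would pass `df` and `Y` instead of the toy objects.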

<a id="create_model"></a>
## 5. Create a model

Create a Scikit-learn Pipeline containing: 

1. dummy transformation
1. normalization
1. model training

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

In [None]:
pipe = Pipeline([('dummy_trans', DummyTransformer()), ('normalize', Normalizer()), ('lr', LogisticRegression())])

In [None]:
risk_model = pipe.fit(X_train, Y_train)

### 5.1 Model evaluation

In [None]:
risk_model_predicted = risk_model.predict(X_test)

In [None]:
print(metrics.accuracy_score(Y_test, risk_model_predicted))

In [None]:
print(metrics.classification_report(Y_test, risk_model_predicted))

In [None]:
risk_model_conf_matrix = metrics.confusion_matrix(Y_test, risk_model_predicted)
sns.heatmap(risk_model_conf_matrix, annot=True,  fmt='');
plt.title('Confusion matrix, Logistic Regression');
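
Accuracy and the confusion matrix judge hard predictions only. As a complementary check, the predicted probabilities can be scored with ROC AUC. The sketch below runs on synthetic data from `make_classification`, not on the credit risk dataset, and the `"Risk"` / `"No Risk"` label strings are assumed here for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary problem with string labels (assumed label names).
X, y_num = make_classification(n_samples=200, n_features=5, random_state=0)
y = np.where(y_num == 1, "Risk", "No Risk")

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
model = LogisticRegression().fit(X_tr, y_tr)

# predict_proba columns follow model.classes_, so look up the "Risk" column.
risk_col = list(model.classes_).index("Risk")
risk_proba = model.predict_proba(X_te)[:, risk_col]

# Compare the probabilities against a boolean "is Risk" target.
auc = roc_auc_score(y_te == "Risk", risk_proba)
print(f"ROC AUC: {auc:.3f}")
```

The same pattern should apply to `risk_model`, since a scikit-learn `Pipeline` exposes `predict_proba` and `classes_` when its final step does.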

<a id="publish_model"></a>
## 6. Publish the model

To authenticate to Watson Machine Learning on IBM Cloud, you need an API key and the service location (URL).

You can create the API key with the [IBM Cloud CLI](https://cloud.ibm.com/docs/cli/index.html) or directly in the IBM Cloud portal.

Using IBM Cloud CLI:

```
ibmcloud login
ibmcloud iam api-key-create API_KEY_NAME
```

NOTE: To find the service URL, see the [Endpoint URLs section of the Watson Machine Learning docs](https://cloud.ibm.com/apidocs/machine-learning).

In [None]:
api_key = 'API_KEY'
location = 'LOCATION'

In [None]:
wml_credentials = {
    "apikey": api_key,
    "url": location
}
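
To keep the API key out of a shared notebook, the same dictionary can be built from environment variables. This is a minimal sketch: `WML_API_KEY` and `WML_URL` are hypothetical variable names chosen for this example, and `load_wml_credentials` is a helper introduced here, not part of any IBM library.

```python
import os

def load_wml_credentials(env=os.environ):
    """Build the credentials dict from environment variables.

    WML_API_KEY and WML_URL are hypothetical names used for this sketch.
    """
    missing = [name for name in ("WML_API_KEY", "WML_URL") if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {"apikey": env["WML_API_KEY"], "url": env["WML_URL"]}

# Example with an explicit mapping so the sketch runs without real secrets.
example = load_wml_credentials({"WML_API_KEY": "API_KEY", "WML_URL": "LOCATION"})
print(sorted(example))
```

In the notebook, calling `load_wml_credentials()` with no argument would read the real environment and return a dictionary of the same shape as `wml_credentials`.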

### 6.1 Installing IBM Watson Machine Learning library

NOTE: The documentation can be found [here](http://ibm-wml-api-pyclient.mybluemix.net/).

In [None]:
!pip install -U ibm-watson-machine-learning --quiet

In [None]:
from ibm_watson_machine_learning import APIClient

client = APIClient(wml_credentials)
print(client.version)

### 6.2 Publish model to project

In [None]:
project_id = 'PROJECT_ID'

In [None]:
client.set.default_project(project_id)

In [None]:
software_spec_uid = client.software_specifications.get_id_by_name("runtime-22.2-py3.10")
metadata = {
    client.repository.ModelMetaNames.NAME: 'preprod_credit_risk_model',
    client.repository.ModelMetaNames.TYPE: 'scikit-learn_1.1',
    client.repository.ModelMetaNames.SOFTWARE_SPEC_UID: software_spec_uid
}

published_model = client.repository.store_model(model=risk_model, meta_props=metadata, training_data=df, training_target=Y)

### 6.3 Publish model to deployment space

In [None]:
space_id = 'SPACE_ID'

In [None]:
client.set.default_space(space_id)

In [None]:
client.spaces.list(limit=10)

In [None]:
published_model = client.repository.store_model(model=risk_model, meta_props=metadata, training_data=df, training_target=Y)

In [None]:
published_model_uid = client.repository.get_model_id(published_model)
model_details = client.repository.get_details(published_model_uid)
print(json.dumps(model_details, indent=2))

In [None]:
client.repository.list_models()

<a id="deploy_model"></a>
## 7. Deploy and score

NOTE: Create an online deployment of the published model in IBM Watson Machine Learning, then score it.

In [None]:
metadata = {
    client.deployments.ConfigurationMetaNames.NAME: "preprod_credit_risk_model_deployment",
    client.deployments.ConfigurationMetaNames.ONLINE: {}
}

created_deployment = client.deployments.create(published_model_uid, meta_props=metadata)

In [None]:
deployment_uid = client.deployments.get_uid(created_deployment)
client.deployments.get_details(deployment_uid)

In [None]:
client.deployments.list()
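
Scoring an online deployment takes a payload of column names and row values. The sketch below only builds such a payload; `build_scoring_payload` is a hypothetical helper, the field names and values are made up, and the `"input_data"` key is assumed to match `client.deployments.ScoringMetaNames.INPUT_DATA`. The actual scoring call is shown as a comment because it needs a live deployment.

```python
def build_scoring_payload(fields, rows):
    """Wrap column names and row values in a fields/values layout."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("each row needs exactly one value per field")
    return {
        "input_data": [
            {"fields": list(fields), "values": [list(row) for row in rows]}
        ]
    }

# Made-up fields and values, for illustration only.
payload = build_scoring_payload(
    ["LoanAmount", "Housing"],
    [[250, "own"], [4000, "rent"]],
)
print(payload)

# With a live deployment the call would look like this (not executed here):
# predictions = client.deployments.score(deployment_uid, payload)
```

For the notebook's data, `X_test.columns.tolist()` and `X_test.head(2).values.tolist()` would supply the fields and rows.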