In [None]:
import numpy as np
import pandas as pd 
from matplotlib import pyplot as plt
import seaborn as sns
import os
import sklearn
from sklearn.model_selection import train_test_split
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

## Predicting Loan Default - When should credit be approved?

**Problem statement**:

The aim of this notebook is to create a simple program that predicts whether a credit should be approved, based on previous client data. The program should help the bank minimize risk with future clients.

The model predicts whether a client should have their loan approved, based on the client's history, using logistic regression for classification.

**The data:**

The data consists of a CSV file containing client records from a private German bank. It includes the client profile (account balance, number of credits, ...) and a variable **Creditability** (1: credit-worthy, 0: not credit-worthy).
A detailed description of the variables can be found [here](https://newonlinecourses.science.psu.edu/stat508/book/export/html/803).

In [None]:
# read and inspect dataset
data = pd.read_csv('../input/german-credit-risk/german_credit.csv')
data.head()

In [None]:
data.describe() # data summary

In [None]:
data.corr() # check for correlations with target variable

In [None]:
# keep only the variables most strongly correlated
# with the dependent variable (Creditability)
x = data[['Account Balance', 'Duration of Credit (month)', 'Payment Status of Previous Credit']]

y = data['Creditability']
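One way to make this selection explicit is to rank the columns by the absolute value of their correlation with the target. The sketch below is only an illustration on a small synthetic frame: the column names `feature_a`, `feature_b` and `noise`, and the data itself, are made up and not taken from the credit dataset. With the real data, the same pattern would presumably apply to `data` and `'Creditability'`.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only (not the credit dataset)
rng = np.random.default_rng(0)
n = 500
target = rng.integers(0, 2, size=n)
toy = pd.DataFrame({
    'target': target,
    'feature_a': target + rng.normal(0, 0.5, size=n),   # strongly related to target
    'feature_b': -target + rng.normal(0, 2.0, size=n),  # weakly (negatively) related
    'noise': rng.normal(0, 1.0, size=n),                # unrelated
})

# Correlation of every column with the target, ranked by absolute value
ranking = (
    toy.corr()['target']
    .drop('target')
    .abs()
    .sort_values(ascending=False)
)
top_features = ranking.index[:2].tolist()  # keep the two strongest candidates
print(ranking)
```

Ranking by absolute value matters because a strong negative correlation is as informative as a strong positive one.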

In [None]:
# splitting data
# test_size=0.80 holds out 80% of the data for testing,
# so the model is fit on the remaining 20%
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.80, random_state=6)
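To make the effect of `test_size` concrete, the sketch below compares the split sizes for `test_size=0.80` and the more common `test_size=0.20`. It uses a placeholder array of 1000 rows; that row count is an assumption for illustration, not something read from the credit file.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 1000 rows, 2 columns (assumed size, for illustration only)
X_toy = np.arange(2000).reshape(1000, 2)
y_toy = np.arange(1000) % 2

# test_size is the fraction held out for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.80, random_state=6)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_toy, y_toy, test_size=0.20, random_state=6)

print(len(X_tr), len(X_te))    # train/test sizes with test_size=0.80
print(len(X_tr2), len(X_te2))  # train/test sizes with test_size=0.20
```

A larger test set gives a more stable accuracy estimate but leaves less data for fitting the model, so the choice is a trade-off.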

In [None]:
# using statsmodels for model and metrics
import statsmodels.api as sm

# building the model and fitting the training data
# note: sm.Logit does not add an intercept automatically, so this model has none
model = sm.Logit(y_train, x_train).fit()

In [None]:
# summary statistics of the logistic regression
model.summary()

In [None]:
pred_y = model.predict(x_test) # values predicted by model
pred_y.head()
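As a rough illustration of how a logistic model produces such values: it forms a weighted sum of the inputs and passes it through the logistic function, which maps any real number into the interval (0, 1). The coefficients and inputs below (`coefs`, `client`) are hypothetical and are not the fitted values from this notebook.

```python
import math

def logistic(z):
    """Map a real-valued score (log-odds) to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

coefs = (0.5, -0.1)   # hypothetical coefficients
client = (2.0, 4.0)   # hypothetical input values

# Weighted sum of the inputs (the linear predictor, on the log-odds scale)
score = sum(c * v for c, v in zip(coefs, client))

# The logistic function turns the score into a probability
prob = logistic(score)
print(score, prob)
```

A score of 0 corresponds to a probability of exactly 0.5, positive scores to more than 0.5, and negative scores to less.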

Our logistic regression model returns the estimated probability that a credit is worthy, so its values range from 0 to 1.
We'll convert this probability into a binary variable for classification.
If a credit is at least 50% likely to be worthy, it will be labeled worthy.
The function below applies this rule.

In [None]:
# function for turning probabilities into labels
def binary_classify(x): # takes a float probability x, returns an int label (1 or 0)
    x = round(x, 2)
    if x >= 0.50:
        return 1
    return 0

pred = list(map(binary_classify, pred_y)) # apply function to all predictions
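The same thresholding can also be written as a single vectorized comparison. This is a minimal sketch on made-up probabilities (`probs` is hypothetical). Unlike `binary_classify`, it does not round first, so a value such as 0.496 would be labeled 0 here.

```python
import numpy as np

# Hypothetical predicted probabilities
probs = np.array([0.12, 0.50, 0.73, 0.49])

# Compare every value with the threshold at once, then convert True/False to 1/0
labels = (probs >= 0.5).astype(int)
print(labels)
```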

### Classification performance

Now that we have used logistic regression to perform classification, we can check how accurate the predictions are:

In [None]:
# accuracy score
from sklearn.metrics import accuracy_score
accuracy_score(y_test, pred)

The model classifies 73.75% of the test clients correctly.
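Accuracy alone does not show which kind of error the model makes, and for a bank, approving a bad credit and rejecting a good one probably carry different costs. A confusion matrix separates the two. The sketch below uses small made-up label lists (`y_true_toy`, `y_pred_toy`) rather than the notebook's `y_test` and `pred`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration: 1 = credit worthy, 0 = not credit worthy
y_true_toy = [1, 1, 1, 1, 0, 0]
y_pred_toy = [1, 1, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes, both in the order [0, 1]
cm = confusion_matrix(y_true_toy, y_pred_toy)
tn, fp, fn, tp = cm.ravel()

acc = accuracy_score(y_true_toy, y_pred_toy)
print(cm)
print('false positives (bad credit approved):', fp)
print('false negatives (good credit rejected):', fn)
print('accuracy:', acc)
```

With the real data, the same two calls would take `y_test` and `pred` as arguments.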

## Proposed solution

To classify future clients as credit worthy or not, we can build a simple program that takes as inputs their account balance, credit duration and payment status of previous credit. The program then applies the model above and classifies the client, telling the bank whether the credit should be approved.
The coefficients used for classification were obtained previously with *statsmodels*.

In [None]:
# predictive function based on the logistic model - returns the probability of being credit worthy
import math

def log_func(x):
    balance, credit, pay_status = x
    # linear predictor (log-odds) from the fitted coefficients; the model has no intercept
    log_odds = (0.6364 * balance) + (-0.05 * credit) + (0.2374 * pay_status)
    # the logistic function maps the log-odds to a probability between 0 and 1
    return 1 / (1 + math.exp(-log_odds))

def binary_classify(x): # takes float x = probability, returns int label
    x = round(x, 2)
    if x >= 0.50:
        return 1
    return 0

def predict(x): # makes credit predictions
    return binary_classify(log_func(x))

def print_result(x): # returns the message shown to the user
    if predict(x) == 1:
        return "Credit worthy"
    return "Not credit worthy"

Let's apply the program to an example.
We can use the data of a specific client as a test:

In [None]:
test = data.iloc[97] # data from client
test = test[['Creditability', 'Account Balance', 'Duration of Credit (month)',
      'Payment Status of Previous Credit']]
test

Client data:
* Creditability: 0 - was classified not credit worthy
* Account Balance: 2
* Duration of Credit (month): 36
* Payment Status of Previous Credit: 3

In [None]:
# testing

input_test = (2, 36, 3) # client data
print('Result: ' + print_result(input_test))

For this client the log-odds are 0.6364 * 2 - 0.05 * 36 + 0.2374 * 3 = 0.185, which the logistic function maps to a probability of about 0.55. The program therefore labels the client credit worthy, although the recorded Creditability is 0, so this client is misclassified. This is consistent with the model's accuracy of roughly 74%: about one prediction in four is wrong.