## More Loans

In this activity you will pratice using random and SMOTE oversampling in combination with logistic regression to predict whether or not someone is likely to default on their credit card loans in a given month given demographic information. 

ln_balance_limit is the log of the maximum balance they can have on the card; 1 is female, 0 male for sex; the education is denoted: 1 = graduate school; 2 = university; 3 = high school; 4 = others; 1 is married and 0 single for marriage; default_next_month is whether the person defaults in the following month (1 yes, 0 no).

In [52]:
import pandas as pd
from path import Path
import pandas as pd
from collections import Counter

In [35]:
data = Path('../Resources/cc_default.csv')
df = pd.read_excel(data, skiprows=[0])

In [54]:
x_cols = [i for i in df.columns if i != 'default_next_month']
X = df[x_cols]
y = df['default_next_month']

In [55]:
# train-test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

In [56]:
Counter(y_train)

Counter({0: 17532, 1: 4968})

### Random Oversampling

In [57]:
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=1)
X_resampled, y_resampled = ros.fit_resample(X_train, y_train)
Counter(y_resampled)

Counter({0: 17532, 1: 17532})

In [58]:
# Logistic regression using random oversampled data
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(random_state=1)
model.fit(X_resampled, y_resampled)



LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=1, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)

In [59]:
# Display the confusion matrix
from sklearn.metrics import confusion_matrix
y_pred = model.predict(X_test)
confusion_matrix(y_test, y_pred)

array([[3744, 2088],
       [ 734,  934]], dtype=int64)

In [60]:
from sklearn.metrics import balanced_accuracy_score
balanced_accuracy_score(y_test, y_pred)

0.6009636735056398

In [61]:
# Print the imbalanced classification report
from imblearn.metrics import classification_report_imbalanced
print(classification_report_imbalanced(y_test, y_pred))

                   pre       rec       spe        f1       geo       iba       sup

          0       0.84      0.64      0.56      0.73      0.60      0.36      5832
          1       0.31      0.56      0.64      0.40      0.60      0.36      1668

avg / total       0.72      0.62      0.58      0.65      0.60      0.36      7500



### SMOTE Oversampling

In [62]:
from imblearn.over_sampling import SMOTE
X_resampled, y_resampled = SMOTE(random_state=1, ratio=1.0).fit_resample(X_train, y_train)
from collections import Counter
Counter(y_resampled)

Counter({0: 17532, 1: 17532})

In [63]:
model = LogisticRegression(random_state=1)
model.fit(X_resampled, y_resampled)



LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=1, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)

In [64]:
# Calculated the balanced accuracy score
y_pred = model.predict(X_test)
balanced_accuracy_score(y_test, y_pred)

0.5991207034372501

In [65]:
# Display the confusion matrix
confusion_matrix(y_test, y_pred)

array([[3726, 2106],
       [ 735,  933]], dtype=int64)

In [66]:
# Print the imbalanced classification report
print(classification_report_imbalanced(y_test, y_pred))

                   pre       rec       spe        f1       geo       iba       sup

          0       0.84      0.64      0.56      0.72      0.60      0.36      5832
          1       0.31      0.56      0.64      0.40      0.60      0.35      1668

avg / total       0.72      0.62      0.58      0.65      0.60      0.36      7500

