# Predicting Credit Card (CC) Approvals with Machine Learning

## Background tale

Commercial banks receive _a lot_ of applications for credit cards. Many are rejected for reasons such as high loan balances, low income levels, or too many inquiries on an individual's credit report. Manually analyzing these applications is mundane, error-prone, and time-consuming (and time is money!). Luckily, this task can be automated with machine learning, and pretty much every commercial bank does so nowadays. In this workbook, you will build an automatic credit card approval predictor using machine learning techniques, just like real banks do.

## The Data

The data is a small subset of the Credit Card Approval dataset from the UCI Machine Learning Repository showing the credit card applications a bank receives. This dataset has been loaded as a `pandas` DataFrame called `cc_apps`. The last column in the dataset is the target value.

## Goal

Build an ML model that predicts credit card approvals with an accuracy of at least 75%.
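
One way to check the 75% target is to hold out part of the data and score the predictions on it. The sketch below uses a synthetic dataset from `make_classification` as a stand-in for the real applications. Only the 75% threshold comes from the goal above; the split ratio, the seed, the choice of model and the names `TARGET_ACCURACY` and `meets_goal` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the credit card dataset).
X, y = make_classification(n_samples=400, n_features=6, random_state=0)

# Hold out 25% of the rows; stratify keeps the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

TARGET_ACCURACY = 0.75  # threshold taken from the goal statement
meets_goal = accuracy >= TARGET_ACCURACY
print(f"test accuracy: {accuracy:.3f}, meets goal: {meets_goal}")
```

Scoring on held-out rows rather than on the training rows gives a less optimistic estimate of how the model would behave on new applications.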

## Getting to know the data

In [18]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV

# Load the dataset
cc_apps = pd.read_csv("cc_approvals.data", header=None)
print('Getting to know the data:\n')
print('----------------------------------')
print('cc_apps shape:', cc_apps.shape)
print('----------------------------------')
print('cc_apps columns:\n', cc_apps.columns)
print('----------------------------------')
print('cc_apps dtypes:\n', cc_apps.dtypes)
print('----------------------------------')
print('cc_apps info:')
cc_apps.info()  # info() prints its report itself and returns None, so don't wrap it in print()
print('----------------------------------')
print('cc_apps head:\n', cc_apps.head())
print('----------------------------------')


Getting to know the data:

----------------------------------
cc_apps shape: (690, 14)
----------------------------------
cc_apps columns:
 Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], dtype='int64')
----------------------------------
cc_apps dtypes:
 0      object
1      object
2     float64
3      object
4      object
5      object
6      object
7     float64
8      object
9      object
10      int64
11     object
12      int64
13     object
dtype: object
----------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 690 entries, 0 to 689
Data columns (total 14 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       690 non-null    object 
 1   1       690 non-null    object 
 2   2       690 non-null    float64
 3   3       690 non-null    object 
 4   4       690 non-null    object 
 5   5       690 non-null    object 
 6   6       690 non-null    object 
 7   7       690 non-null    float64
 8   8       690 no

In [17]:
# Let's first provide a good name for the columns. The last one is the target
cc_apps.columns = ['col_' + str(i) for i in range(cc_apps.shape[1] - 1)] + ['target']
# Let's also map the target to be 1 for + and 0 for -
cc_apps['target'] = cc_apps['target'].map({'+': 1, '-': 0}) 
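
The pipeline built later needs the feature columns split into numeric and categorical groups. A minimal sketch of one way to derive those lists from the dtypes is shown below, using a made-up toy frame in place of `cc_apps`; the values and the dtype-based rule are assumptions for illustration. Note that under this rule any column stored as `object` is treated as categorical.

```python
import pandas as pd

# Toy frame mimicking the renamed layout (values are made up for illustration).
toy = pd.DataFrame({
    'col_0': ['a', 'b', 'a'],
    'col_1': [1.5, 2.0, 0.25],
    'col_2': [3, 0, 7],
    'target': [1, 0, 1],
})

# Separate the features from the target before inspecting dtypes.
features = toy.drop(columns='target')

# Numeric dtypes (int, float) go to num_cols; everything else to cat_cols.
num_cols = features.select_dtypes(include='number').columns.tolist()
cat_cols = features.select_dtypes(exclude='number').columns.tolist()

print('numeric:', num_cols)
print('categorical:', cat_cols)
```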

## Building the Pipeline

In [None]:
def build_pipeline(model, num_cols: list, cat_cols: list) -> Pipeline:
    """
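    Creates a pipeline for the given model. Numeric columns are standardized only for models that are sensitive to feature scaling; for all other models they are passed through unchanged. Categorical columns are always one-hot encoded.
    
    Args:
        model (estimator): The model instance to be used in the pipeline.
        num_cols (list): Names of the numeric columns.
        cat_cols (list): Names of the categorical columns.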
        
    Returns:
        pipeline (Pipeline): The constructed pipeline.
    """
    # Check the scaling sensitivity
    is_scale_sensitive = isinstance(model, (LogisticRegression, RidgeClassifier, SVC, KNeighborsClassifier))
    
    transformers = []
    
    if is_scale_sensitive:
        transformers.append(
            ('num', StandardScaler(), num_cols)
        )
    else:
        transformers.append(
            ('num', 'passthrough', num_cols)
        )
    
    transformers.append(
        ('cats', OneHotEncoder(handle_unknown='ignore'), cat_cols)
    )
    
    preprocessor = ColumnTransformer(transformers=transformers)
    
    return Pipeline(steps=[
        ('preprocessor', preprocessor),
        ('model', model)
    ])

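def apply_grid_search(models: list, X, y, num_cols: list, cat_cols: list) -> dict:
    """
    Runs a grid search for every (model, param_grid) pair in `models`.
    
    Returns:
        results (dict): Fitted GridSearchCV objects keyed by model class name.
    """
    results = {}
    for model, param_grid in models:
        pipeline = build_pipeline(model, num_cols, cat_cols)
        # Assumption: 5-fold cross-validation scored on accuracy, to match the stated goal
        search = GridSearchCV(pipeline, param_grid, cv=5, scoring='accuracy')
        search.fit(X, y)
        results[type(model).__name__] = search
    return results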
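
To see what the preprocessing step does in isolation, the sketch below builds the same kind of `ColumnTransformer` pipeline on a tiny made-up frame and then transforms a row containing a category that was never seen during fitting. The column names `amount` and `kind` and all values are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up training data: one numeric and one categorical feature.
train = pd.DataFrame({
    'amount': [1.0, 2.0, 3.0, 4.0],
    'kind': ['a', 'b', 'a', 'b'],
})
y = [0, 0, 1, 1]

preprocessor = ColumnTransformer(transformers=[
    ('num', StandardScaler(), ['amount']),
    ('cats', OneHotEncoder(handle_unknown='ignore'), ['kind']),
])
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', LogisticRegression()),
])
pipeline.fit(train, y)

# 'c' was never seen during fit; handle_unknown='ignore' encodes it as
# all zeros in the one-hot columns instead of raising an error.
new = pd.DataFrame({'amount': [2.5], 'kind': ['c']})
encoded = pipeline.named_steps['preprocessor'].transform(new)
print(encoded)
print(pipeline.predict(new))
```

Because the scaler and encoder live inside the pipeline, they are fitted only on the data passed to `fit`, which keeps cross-validation folds from leaking statistics into each other.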

In [None]:
# Applying the GridSearch

models = [
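    # The grid is split per solver: liblinear supports only the l1 and l2 penalties,
    # while lbfgs supports only l2 or no penalty. penalty=None requires
    # scikit-learn >= 1.2 (older versions use the string 'none').
    (LogisticRegression(), [
        {'model__solver': ['liblinear'],
         'model__penalty': ['l1', 'l2'],
         'model__C': [.001, .01, .1, 1, 10, 100],
         'model__max_iter': [100, 500, 1000]},
        {'model__solver': ['lbfgs'],
         'model__penalty': ['l2', None],
         'model__C': [.001, .01, .1, 1, 10, 100],
         'model__max_iter': [100, 500, 1000]},
    ]),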
    (RidgeClassifier(), {}),
    (DecisionTreeClassifier(), {}),
    (RandomForestClassifier(), {}),
    (GradientBoostingClassifier(), {}),
    (SVC(), {}),
    (GaussianNB(), {}),
    (KNeighborsClassifier(), {})
]
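
The `model__` prefix in the grid keys routes each parameter to the pipeline step named `model`, and an empty grid `{}` evaluates the model once with its default settings. The sketch below shows both behaviours on synthetic data; the data, the two candidate models, `cv=5` and the `best_scores` name are assumptions made for illustration, not the notebook's final setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the credit card dataset).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

candidates = [
    (LogisticRegression(max_iter=1000), {'model__C': [0.1, 1, 10]}),
    (KNeighborsClassifier(), {}),  # empty grid: defaults only
]

best_scores = {}
for model, param_grid in candidates:
    # The step name 'model' is what the 'model__' prefix refers to.
    pipeline = Pipeline(steps=[('scaler', StandardScaler()), ('model', model)])
    search = GridSearchCV(pipeline, param_grid, cv=5, scoring='accuracy')
    search.fit(X, y)
    best_scores[type(model).__name__] = (search.best_score_, search.best_params_)

for name, (score, params) in best_scores.items():
    print(f"{name}: mean CV accuracy {score:.3f} with {params}")
```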