# Intro to the Dataset and the Aim
<img src="loantap_logo.png" alt="LoanTap logo banner" style="width: 800px;"/>

**Problem Statement**: LoanTap, an online platform offering customized loan products, is facing challenges in efficiently assessing the creditworthiness of loan applicants. By predicting the likelihood of default, the company aims to minimize risks and improve the decision-making process for loan approvals.

**Objective**: The goal is to develop a machine learning model that can predict whether an applicant will default on a personal loan, based on their financial and credit history attributes. The model should help LoanTap make data-driven decisions, reducing the overall risk of default.

**Dataset Overview**: LoanTap has provided a dataset containing various financial and credit-related features for loan applicants. Below is a summary of the dataset:

| Column               | Description                                                              |
|----------------------|--------------------------------------------------------------------------|
| loan_amnt            | The loan amount applied for by the borrower                              |
| term                 | Loan term in months (36 or 60)                                           |
| int_rate             | Interest rate on the loan                                                |
| installment          | Monthly payment owed if the loan originates                              |
| grade                | LoanTap assigned grade                                                   |
| sub_grade            | LoanTap assigned subgrade                                                |
| emp_title            | Job title supplied by the borrower                                       |
| emp_length           | Employment length in years (0-10)                                        |
| home_ownership       | Home ownership status                                                    |
| annual_inc           | Self-reported annual income                                              |
| verification_status  | Income verification status (verified/not verified)                       |
| issue_d              | Date the loan was funded                                                 |
| loan_status          | Target variable indicating loan repayment status (Default or Fully Paid) |
| purpose              | Purpose of the loan                                                      |
| dti                  | Debt-to-income ratio                                                     |
| earliest_cr_line     | Month the borrower’s earliest credit line was opened                     |
| open_acc             | Number of open credit lines                                              |
| revol_bal            | Total revolving credit balance                                           |
| revol_util           | Revolving line utilization rate                                          |
| total_acc            | Total number of credit lines                                             |
| pub_rec              | Number of derogatory public records                                      |
| application_type     | Individual or joint application                                          |
| mort_acc             | Number of mortgage accounts                                              |
| pub_rec_bankruptcies | Number of public record bankruptcies                                     |
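The `installment` column can be sanity-checked against the other loan terms. This is a minimal sketch that assumes each loan is fully amortizing at a fixed rate and that `int_rate` is an annual percentage (e.g. `12.0` for 12%). The data dictionary states neither assumption. With monthly rate `r = int_rate / 1200` and a term of `n` months, the standard payment on principal `P` is `P * r / (1 - (1 + r) ** -n)`. The helper `expected_installment` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd


def expected_installment(loan_amnt, int_rate, term):
    """Monthly payment of a fully amortizing loan (hypothetical helper).

    Assumes int_rate is an annual percentage and term is in months:
        installment = P * r / (1 - (1 + r) ** -n), with r = int_rate / 1200
    """
    principal = np.asarray(loan_amnt, dtype=float)
    r = np.asarray(int_rate, dtype=float) / 1200.0  # monthly rate as a fraction
    n = np.asarray(term, dtype=float)
    # Suppress the 0/0 warning for zero-rate rows; np.where replaces them below
    with np.errstate(divide="ignore", invalid="ignore"):
        amortized = principal * r / (1.0 - (1.0 + r) ** -n)
    # With a zero rate the payment is simply principal spread evenly over the term
    return np.where(r == 0, principal / n, amortized)


# Toy rows for illustration only (not taken from the LoanTap data)
toy = pd.DataFrame({"loan_amnt": [10000, 24000], "int_rate": [12.0, 0.0], "term": [36, 60]})
toy["expected_installment"] = expected_installment(toy["loan_amnt"], toy["int_rate"], toy["term"])
print(toy.round(2))
```

Large gaps between `installment` and the expected value would point to a different rate convention or a data-quality issue. Either is worth investigating before modeling.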

**Aim**

1. To analyze which factors are critical in determining whether a borrower will default on a personal loan.
2. To develop a predictive model that estimates the likelihood of loan default based on borrower attributes.
3. To ensure the model is interpretable so that LoanTap can understand the key drivers of default.
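One common way to meet the interpretability aim is a logistic regression on standardized features. Its coefficients then rank the drivers of default on a common scale. The sketch below uses synthetic data in which `int_rate` and `dti` are constructed to raise default risk and `annual_inc` has no effect. The data and column choice are hypothetical, and this is not necessarily the model the notebook ends up using.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000
# Synthetic borrowers: names mirror the data dictionary, values are made up
X = pd.DataFrame({
    "int_rate": rng.normal(13, 4, n),
    "dti": rng.normal(18, 8, n),
    "annual_inc": rng.lognormal(11, 0.5, n),
})
# Hypothetical ground truth: higher rate and DTI raise default odds, income has no effect
logit = -1.5 + 0.25 * (X["int_rate"] - 13) + 0.08 * (X["dti"] - 18)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)  # 1 = default

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# On standardized inputs, each coefficient is the change in log-odds of default
# per one-standard-deviation increase in that feature
coefs = pd.Series(model[-1].coef_[0], index=X.columns).sort_values(key=abs, ascending=False)
print(coefs)
```

Because the inputs are standardized, coefficient magnitudes are comparable across features. The ranking is only trustworthy when features are not strongly collinear.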

**Methods and Techniques Used:** exploratory data analysis (EDA), feature engineering, modeling with scikit-learn pipelines, and hyperparameter tuning.
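The following is a minimal sketch of the pipeline-plus-tuning workflow. It assumes a logistic regression baseline, median imputation and scaling for numeric columns, one-hot encoding for categorical columns, and recall as the tuning score. The toy frame, column subset, and parameter grid are illustrative assumptions, and the labels are random, so the score itself is meaningless.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(42)
n = 1000
# Toy data with a few columns from the data dictionary (values are synthetic)
df = pd.DataFrame({
    "loan_amnt": rng.integers(1000, 40000, n),
    "int_rate": rng.normal(13, 4, n),
    "grade": rng.choice(list("ABCDEFG"), n),
    "home_ownership": rng.choice(["RENT", "OWN", "MORTGAGE"], n),
})
df.loc[rng.random(n) < 0.05, "int_rate"] = np.nan  # simulate missing values
y = (rng.random(n) < 0.2).astype(int)  # 1 = default (random labels, demo only)

numeric = ["loan_amnt", "int_rate"]
categorical = ["grade", "home_ownership"]
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
pipe = Pipeline([
    ("prep", preprocess),
    # class_weight="balanced" up-weights the minority (default) class
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, stratify=y, random_state=42)
# The "clf__" prefix routes each grid value to the pipeline step named "clf"
search = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1.0]}, scoring="recall", cv=5)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```

Keeping preprocessing inside the pipeline means imputation and scaling are re-fit on each training fold, which avoids leaking validation data. Tuning on recall alone can reward a model that flags nearly everyone, so the precision floor should still be checked on held-out data.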

**Performance Metrics and Minimum Thresholds for the Business Objective:** recall > 90% and precision > 70%
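These thresholds only make sense for a specific positive class. This minimal sketch assumes default is encoded as `1` and that both metrics refer to the default class. The objective does not say so explicitly. Precision is TP / (TP + FP) and recall is TP / (TP + FN). Because a classifier outputs probabilities, the recall floor is usually met by lowering the decision threshold, and precision is then checked at that threshold. The function names, toy labels, and scores are hypothetical.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive (default) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def meets_business_target(y_true, y_pred, min_recall=0.90, min_precision=0.70):
    """True only if recall > 90% and precision > 70%."""
    precision, recall = precision_recall(y_true, y_pred)
    return recall > min_recall and precision > min_precision


def pick_threshold(y_true, scores, min_recall=0.90):
    """Highest probability cut-off whose recall still exceeds min_recall, else None."""
    for cut in sorted(set(scores), reverse=True):
        y_pred = [int(s >= cut) for s in scores]
        if precision_recall(y_true, y_pred)[1] > min_recall:
            return cut
    return None


# Toy labels (1 = default) and model scores, for illustration only
y_true = [1, 1, 1, 1, 0, 1, 1, 0, 1, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]

cut = pick_threshold(y_true, scores)
y_pred = [int(s >= cut) for s in scores]
print(cut, precision_recall(y_true, y_pred), meets_business_target(y_true, y_pred))
```

The cut-off should be chosen on validation data rather than the test set. If a cut-off exists but `meets_business_target` is still False, recall above 90% is only reachable at the cost of precision, which signals that the model itself needs improvement.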

**Assumptions**
* The dataset is assumed to be representative of LoanTap’s entire customer base.
* The data distribution is assumed to remain stable over time, so the model is not expected to decay rapidly.
* External factors (e.g., economic downturns) are not considered, though they could influence loan repayment behavior.
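The stability assumption can be tested rather than taken on faith. One common credit-risk monitoring statistic is the Population Stability Index (PSI). It compares a feature's binned distribution in a reference period with a later period: PSI = Σ (new_i - ref_i) · ln(new_i / ref_i). This is a swapped-in technique and not something the notebook states it uses. The bin count, the `eps` floor, and the function name are assumptions. A commonly quoted rule of thumb reads PSI below about 0.1 as stable and above about 0.25 as a significant shift.

```python
import numpy as np


def population_stability_index(reference, recent, bins=10, eps=1e-6):
    """PSI of `recent` relative to `reference`, using quantile bins of `reference`."""
    reference = np.asarray(reference, dtype=float)
    recent = np.asarray(recent, dtype=float)
    # Interior cut points at reference quantiles; values outside the
    # reference range fall into the first or last bin
    cuts = np.quantile(reference, np.linspace(0, 1, bins + 1)[1:-1])
    ref_frac = np.bincount(np.searchsorted(cuts, reference, side="right"), minlength=bins) / len(reference)
    new_frac = np.bincount(np.searchsorted(cuts, recent, side="right"), minlength=bins) / len(recent)
    # Floor empty bins so the logarithm stays finite
    ref_frac = np.clip(ref_frac, eps, None)
    new_frac = np.clip(new_frac, eps, None)
    return float(np.sum((new_frac - ref_frac) * np.log(new_frac / ref_frac)))


# Synthetic feature values (e.g. dti) from an older period and two newer ones
rng = np.random.default_rng(0)
older = rng.normal(18, 8, 5000)
stable = rng.normal(18, 8, 5000)
shifted = rng.normal(26, 8, 5000)  # mean moved up by one standard deviation

psi_stable = population_stability_index(older, stable)
psi_shifted = population_stability_index(older, shifted)
print(round(psi_stable, 3), round(psi_shifted, 3))
```

On the real data, one way to apply this would be to split loans by `issue_d` and compute PSI for each key feature between earlier and later periods.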

## Library Setup

```python
# Scientific libraries
import numpy as np
import pandas as pd

# Logging
import logging

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Helper libraries
from tqdm.notebook import tqdm, trange  # Progress bars
import warnings
# warnings.filterwarnings('ignore')  # Ignore all warnings

# Auto-reload imported modules so edits to local .py files take effect without restarting the kernel
# %load_ext autoreload
# %autoreload 2

# Visual setup
%config InlineBackend.figure_format = 'retina'  # High-resolution figures on retina displays

# IPython display options
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'  # Display the result of every expression in a cell, not just the last

# Table styles (CSS rules in the list-of-dicts format accepted by pandas Styler.set_table_styles)
table_styles = {
    'cerulean_palette': [
        dict(selector="th", props=[("color", "#FFFFFF"), ("background", "#004D80")]),
        dict(selector="td", props=[("color", "#333333")]),
        dict(selector="table", props=[("font-family", 'Arial'), ("border-collapse", "collapse")]),
        dict(selector='tr:nth-child(even)', props=[('background', '#D3EEFF')]),
        dict(selector='tr:nth-child(odd)', props=[('background', '#FFFFFF')]),
        dict(selector="th", props=[("border", "1px solid #0070BA")]),
        dict(selector="td", props=[("border", "1px solid #0070BA")]),
        dict(selector="tr:hover", props=[("background", "#80D0FF")]),
        dict(selector="tr", props=[("transition", "background 0.5s ease")]),
        dict(selector="th:hover", props=[("font-size", "1.07rem")]),
        dict(selector="th", props=[("transition", "font-size 0.5s ease-in-out")]),
        dict(selector="td:hover", props=[('font-size', '1.07rem'), ('font-weight', 'bold')]),
        dict(selector="td", props=[("transition", "font-size 0.5s ease-in-out")])
    ]
}

# Seed NumPy's global random number generator so results are reproducible across runs
np.random.seed(42)
```
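`np.random.seed(42)` seeds NumPy's legacy global generator, which scikit-learn also falls back on when `random_state=None`. Its effect therefore depends on how many random draws happened earlier in the notebook. It also does not reach NumPy `Generator` objects or Python's `random` module. The following minimal sketch of these options assumes scikit-learn is installed. `SEED` is an illustrative name.

```python
import random

import numpy as np
from sklearn.model_selection import train_test_split

SEED = 42  # same value used in the setup cell

# Legacy global RandomState: re-seeding replays the same stream
np.random.seed(SEED)
legacy_a = np.random.rand(3)
np.random.seed(SEED)
legacy_b = np.random.rand(3)

# Generator API: seeded independently and unaffected by np.random.seed
gen_a = np.random.default_rng(SEED).random(3)
gen_b = np.random.default_rng(SEED).random(3)

# Python's built-in random module keeps its own state
random.seed(SEED)

# Passing random_state explicitly makes a split reproducible regardless of
# how many global draws happened earlier in the notebook
X = np.arange(20).reshape(10, 2)
train_a, test_a = train_test_split(X, test_size=0.3, random_state=SEED)
np.random.rand(5)  # consume some global state in between
train_b, test_b = train_test_split(X, test_size=0.3, random_state=SEED)

print(np.array_equal(legacy_a, legacy_b), np.array_equal(gen_a, gen_b),
      np.array_equal(test_a, test_b))
```

Passing `random_state=SEED` to every splitter and estimator is usually the more robust habit, because it does not depend on cell execution order.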