The dataset is based on accepted LendingClub loan applications, and the cleaned
dataset includes only loans with a final status of either "Fully Paid" (0) or "Charged Off" (1).

**Project Aim**

To analyze loan and borrower attributes to understand what drives loan default, and to eventually build a predictive model for loan default risk.

| Feature             | Type        | Description                                          |
|---------------------|-------------|------------------------------------------------------|
| loan_amnt           | Numeric     | Loan amount funded by investors                      |
| term                | Numeric     | Duration of loan in months (36 or 60)                |
| int_rate            | Numeric     | Interest rate charged on the loan                    |
| installment         | Numeric     | Monthly installment amount                           |
| grade               | Categorical | Internal credit grade (A-G)                          |
| emp_length          | Numeric     | Years of employment (0–10, or NaN)                   |
| home_ownership      | Categorical | Home ownership status                                |
| annual_inc          | Numeric     | Annual income of the borrower                        |
| verification_status | Categorical | Whether income was verified                          |
| purpose             | Categorical | Purpose of the loan (e.g., debt consolidation, home) |
| dti                 | Numeric     | Debt-to-income ratio                                 |
| delinq_2yrs         | Numeric     | Number of delinquencies in the past 2 years          |
| inq_last_6mths      | Numeric     | Number of credit inquiries in the past 6 months      |
| open_acc            | Numeric     | Number of open credit lines                          |
| pub_rec             | Numeric     | Number of derogatory public records                  |
| revol_bal           | Numeric     | Total revolving balance                              |


**EDA Goals (Before Modeling)**

You’ll want to explore:

1. Class Imbalance
Distribution of loan_status (default vs fully paid).

Why: Helps determine if you need techniques like oversampling/undersampling or weighted loss functions.
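
A minimal sketch of the class-balance check, assuming pandas and a tiny made-up `df` standing in for the cleaned dataset (the values below are illustrative, not real LendingClub figures):

```python
import pandas as pd

# Hypothetical toy data; in practice df would be the cleaned loan DataFrame.
df = pd.DataFrame({"loan_status": [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]})

# Share of each class (0 = Fully Paid, 1 = Charged Off).
class_share = df["loan_status"].value_counts(normalize=True).sort_index()

# Because the target is coded 0/1, its mean is the default rate.
default_rate = df["loan_status"].mean()

# Majority-to-minority ratio, a rough gauge of how imbalanced the target is.
counts = df["loan_status"].value_counts()
imbalance_ratio = counts.max() / counts.min()

print(class_share)
print(f"default rate: {default_rate:.1%}, imbalance ratio: {imbalance_ratio:.1f}:1")
```

A ratio well above 1 is a hint that resampling or class weights may be worth trying later.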

2. Univariate Analysis
Understand distribution of each feature individually.

e.g. loan_amnt, int_rate, dti, emp_length, grade, purpose, etc.

Why: Detect outliers, skewness, or problematic features (e.g., near-zero variance).
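
One way to screen features individually, sketched with pandas on made-up values. The 1.5 × IQR outlier rule and the "share of the most common value" check are common conventions chosen here for illustration, not something the dataset prescribes:

```python
import pandas as pd

# Hypothetical toy data using a few column names from the feature table.
df = pd.DataFrame({
    "loan_amnt": [5000, 10000, 12000, 8000, 35000, 9000, 11000, 10000],
    "int_rate": [7.5, 11.2, 13.1, 9.9, 24.5, 10.4, 12.0, 11.8],
    "pub_rec": [0, 0, 0, 0, 0, 0, 0, 1],
    "grade": ["A", "B", "C", "B", "F", "B", "C", "B"],
})

numeric = df.select_dtypes("number")

# Summary statistics plus skewness for every numeric column.
summary = numeric.describe().T
summary["skew"] = numeric.skew()

# Count values outside the 1.5 * IQR fences (a rule-of-thumb outlier flag).
q1 = numeric.quantile(0.25)
q3 = numeric.quantile(0.75)
iqr = q3 - q1
outlier_counts = ((numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)).sum()

# Share of rows taken by the most common value; close to 1 means near-zero variance.
top_share = df.apply(lambda s: s.value_counts(normalize=True).iloc[0])

print(summary[["mean", "std", "skew"]])
print(outlier_counts)
print(top_share)
```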

3. Bivariate Analysis with Target (loan_status)
How each feature correlates with loan default:

Does higher int_rate lead to more defaults?

Is longer emp_length associated with lower risk?

Does loan grade/purpose affect default rate?

Visuals: Boxplots, bar plots, KDEs, violin plots grouped by loan_status.
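
Before plotting, the same questions can be answered numerically with group-bys. This is a sketch on made-up data, assuming loan_status is coded 0/1 as described at the top:

```python
import pandas as pd

# Hypothetical toy data; real default rates will look different.
df = pd.DataFrame({
    "grade": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "int_rate": [6.0, 7.0, 10.0, 11.0, 14.0, 15.0, 19.0, 20.0],
    "loan_status": [0, 0, 0, 0, 0, 1, 1, 1],
})

# Categorical feature vs target: default rate and loan count per grade.
rate_by_grade = df.groupby("grade")["loan_status"].agg(
    default_rate="mean", n_loans="size"
)

# Numeric feature vs target: distribution of int_rate within each class.
int_rate_by_status = df.groupby("loan_status")["int_rate"].describe()

print(rate_by_grade)
print(int_rate_by_status[["mean", "50%"]])
```

Keeping the loan count next to each rate helps avoid over-reading rates from sparse categories. The grouped tables map directly onto the plots listed above, for example a bar plot of `rate_by_grade["default_rate"]`.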

4. Correlations
Correlation matrix for numeric variables.

Why: Helps spot multicollinearity (e.g., loan_amnt and installment, or fico_range_low and fico_range_high if the FICO columns are kept).
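
A sketch of how strongly correlated pairs could be listed from the correlation matrix. The data is made up, and the 0.9 cutoff is an arbitrary illustrative threshold:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; installment is built to track loan_amnt closely.
df = pd.DataFrame({
    "loan_amnt": [5000, 10000, 15000, 20000, 25000, 30000],
    "installment": [160, 325, 480, 650, 800, 975],
    "dti": [12.0, 30.5, 8.2, 22.1, 15.3, 18.0],
    "open_acc": [4, 9, 7, 12, 5, 10],
})

# Pearson correlation over the numeric columns only.
corr = df.select_dtypes("number").corr()

# Keep the strict upper triangle so each pair appears once and the diagonal is dropped.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()

# Pairs whose absolute correlation exceeds the (assumed) 0.9 threshold.
high_pairs = pairs[pairs.abs() > 0.9]
print(high_pairs)
```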

5. Missing Values
Especially for emp_length, which still has ~6% missing.

Strategy may depend on its relationship with loan_status.
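
To check whether missingness itself carries signal, one option is to compare default rates for rows with and without emp_length. A sketch on made-up data (the toy missing share is not the ~6% mentioned above):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing emp_length values.
df = pd.DataFrame({
    "emp_length": [10, 3, np.nan, 5, np.nan, 1, 7, 2, 10, 0],
    "loan_status": [0, 0, 1, 0, 1, 0, 0, 1, 0, 0],
})

# Fraction of missing values per column, largest first.
missing_share = df.isna().mean().sort_values(ascending=False)

# Flag missing rows (1 = missing) and compare default rates across the flag.
df["emp_length_missing"] = df["emp_length"].isna().astype(int)
default_by_missing = df.groupby("emp_length_missing")["loan_status"].mean()

print(missing_share)
print(default_by_missing)
```

If the two default rates differ noticeably, keeping the missing indicator as a feature may be preferable to silently imputing or dropping the rows.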

6. Feature Importance (Pre-Model Insight)
Simple logistic regression or tree model to get a first glance at influential features.
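
A first-glance sketch using scikit-learn, which is an assumed tool here since the notes do not name a library. The data is synthetic and generated so that int_rate drives default while dti is pure noise. Standardizing first makes the coefficient magnitudes roughly comparable:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: default probability depends on int_rate only (by construction).
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "int_rate": rng.normal(12, 4, n),
    "dti": rng.normal(18, 8, n),
})
logit = 0.5 * (X["int_rate"] - 12) - 2
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Scale features, then fit a plain logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# Coefficients on standardized features, sorted by absolute size.
coefs = pd.Series(
    model.named_steps["logisticregression"].coef_[0], index=X.columns
)
print(coefs.sort_values(key=abs, ascending=False))
```

On the real data, categorical columns such as grade and purpose would need encoding first, and the coefficients should be read as a rough ranking rather than a causal statement.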

| Goal                                     | Why it Matters                                       |
| ---------------------------------------- | ---------------------------------------------------- |
| Understand target distribution           | Imbalanced targets may need resampling or weighting  |
| Understand each feature                  | Detect outliers, transformations, binning needs      |
| Understand feature-target relationship   | Discover useful predictors                           |
| Understand feature-feature relationships | Reduce multicollinearity                             |
| Diagnose missing values                  | Decide whether to impute, drop, or encode            |
