# Loan Approval Prediction
This project predicts whether a loan application will be approved, based on applicant details.  
We compare Logistic Regression, Decision Tree, and Random Forest models.




## 📌 Problem Statement
Banks and financial institutions receive thousands of loan applications.  
Reviewing each one manually is time-consuming and prone to human error.  



**Objective**: Build a machine learning model that predicts whether a loan should be approved, based on applicant details.


## 📊 Dataset Description
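The dataset contains information about loan applicants, including:
- Loan_ID (a unique identifier, dropped before training)
- Gender, Married, Education, Self_Employed, Credit_History
- ApplicantIncome, CoapplicantIncome, LoanAmount, Loan_Amount_Term
- Loan_Status (target variable: the value `Y` marks an approved loan; after one-hot encoding it becomes `Loan_Status_Y`, with Approved = 1 and Not Approved = 0)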
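Because `pd.get_dummies(df, drop_first=True)` drops the first category of every categorical column, the target survives only as its `Y` indicator. A minimal sketch on a made-up three-row frame (illustrative values, not rows from the dataset) shows which columns remain. Unlike the notebook, it passes `dtype=int` so the indicators print as 0/1 instead of the booleans that recent pandas versions return by default.

```python
import pandas as pd

# Toy applicants (made-up values, not rows from the real dataset)
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "ApplicantIncome": [5000, 3000, 4000],
    "Loan_Status": ["Y", "N", "Y"],
})

# drop_first=True drops the first category of each categorical column,
# so "Female" and "N" disappear and only Gender_Male / Loan_Status_Y remain
encoded = pd.get_dummies(toy, drop_first=True, dtype=int)

print(encoded.columns.tolist())          # ['ApplicantIncome', 'Gender_Male', 'Loan_Status_Y']
print(encoded["Loan_Status_Y"].tolist())  # [1, 0, 1]
```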

---


## ⚙️ Methodology
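1. Load the dataset  
2. Handle missing values (fill each column with its most frequent value)  
3. Encode categorical variables (one-hot encoding)  
4. Split data into **training** (80%) and **testing** (20%) sets  
5. Scale features with `StandardScaler`  
6. Train models:
   - Logistic Regression  
   - Decision Tree  
   - Random Forest  
7. Evaluate models using accuracy on the test set  
8. Compare results and save the best model  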
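Step 2 fills each gap with its column's most frequent value via `df.fillna(df.mode().iloc[0])`. The sketch below runs the same call on a made-up frame: `df.mode()` returns one row per tied mode, so `.iloc[0]` keeps the first mode of each column, and `fillna` matches the resulting Series to the columns by name.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value per column (made-up values)
toy = pd.DataFrame({
    "Married": ["Yes", None, "Yes", "No"],
    "LoanAmount": [100.0, np.nan, 120.0, 100.0],
})

# mode() ignores missing values; .iloc[0] gives a Series: column -> most frequent value
modes = toy.mode().iloc[0]
filled = toy.fillna(modes)

print(modes["Married"], modes["LoanAmount"])  # Yes 100.0
print(filled.isna().sum().sum())              # 0
```

One caveat: the notebook computes these modes on the full dataset before splitting, so test rows contribute to the fill values. Learning them from the training rows only (for example with scikit-learn's `SimpleImputer(strategy="most_frequent")`) avoids that small leak.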


```python
from google.colab import drive
drive.mount('/content/drive')
```

```
Mounted at /content/drive
```


```python
# Step 1: Import Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import pickle

# Step 2: Load Dataset
df = pd.read_csv('/content/drive/MyDrive/LoanApprovalPrediction.csv')

# Step 3: Handle Missing Values
df = df.fillna(df.mode().iloc[0])   # replace NaN with each column's most frequent value

# Drop Loan_ID (a unique identifier, not a predictive feature)
df = df.drop("Loan_ID", axis=1)

# Step 4: Convert categorical variables into numbers (one-hot encoding)
df = pd.get_dummies(df, drop_first=True)

# Step 5: Split into features and target
X = df.drop("Loan_Status_Y", axis=1)   # after encoding, the target column is Loan_Status_Y
y = df["Loan_Status_Y"]

# Step 6: Train-Test Split (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 7: Feature Scaling (fit on the training set only, then apply to the test set)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# ==============================
# Logistic Regression
# ==============================
lr = LogisticRegression(max_iter=5000)
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)
acc_lr = accuracy_score(y_test, y_pred_lr)
print("Logistic Regression Accuracy:", round(acc_lr*100, 2), "%")

# ==============================
# Decision Tree
# ==============================
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)
y_pred_dt = dt.predict(X_test)
acc_dt = accuracy_score(y_test, y_pred_dt)
print("Decision Tree Accuracy:", round(acc_dt*100, 2), "%")

# ==============================
# Random Forest
# ==============================
rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)
y_pred_rf = rf.predict(X_test)
acc_rf = accuracy_score(y_test, y_pred_rf)
print("Random Forest Accuracy:", round(acc_rf*100, 2), "%")

# Step 8: Save the Logistic Regression model (highest test accuracy of the three)
# together with the scaler and feature columns needed to score new applicants
with open("loan_model.pkl", "wb") as f:
    pickle.dump({"model": lr, "scaler": scaler, "columns": list(X.columns)}, f)

print("✅ Model saved as loan_model.pkl")
```

```
Logistic Regression Accuracy: 82.5 %
Decision Tree Accuracy: 66.67 %
Random Forest Accuracy: 81.67 %
✅ Model saved as loan_model.pkl
```
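Step 7 standardizes each feature as z = (x - μ) / σ, where μ and σ are the mean and population standard deviation that `fit_transform` learns from the training split; `transform` then reuses those training statistics on the test split instead of recomputing them. The stdlib sketch below reproduces that arithmetic for one made-up feature column. `fit_scaler` and `transform` are hypothetical helpers, not scikit-learn functions.

```python
from statistics import fmean, pstdev

def fit_scaler(values):
    """Learn mean and population std (ddof=0). Like StandardScaler, use 1.0 for zero variance."""
    return fmean(values), (pstdev(values) or 1.0)

def transform(values, mean, std):
    """Standardize values with previously learned statistics."""
    return [(v - mean) / std for v in values]

train_income = [2000.0, 4000.0, 6000.0]   # made-up training values
test_income = [5000.0]                    # made-up test value

mean, std = fit_scaler(train_income)         # statistics come from training data only
train_z = transform(train_income, mean, std)
test_z = transform(test_income, mean, std)   # the test value reuses the training statistics

print(mean, round(std, 2))             # 4000.0 1632.99
print([round(z, 4) for z in train_z])  # [-1.2247, 0.0, 1.2247]
print([round(z, 4) for z in test_z])   # [0.6124]
```

Scaling matters for Logistic Regression, whose regularized coefficients depend on feature magnitudes. The Decision Tree and Random Forest split on per-feature thresholds, so this transformation does not change what they learn.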


```python
print("Summary of Model Accuracies:")
print("Logistic Regression:", round(acc_lr*100, 2), "%")
print("Decision Tree:", round(acc_dt*100, 2), "%")
print("Random Forest:", round(acc_rf*100, 2), "%")

# Pick the model with the highest test accuracy
accuracies = {
    "Logistic Regression": acc_lr,
    "Decision Tree": acc_dt,
    "Random Forest": acc_rf
}

best_model = max(accuracies, key=accuracies.get)
print("\nConclusion: The best model is", best_model,
      "with", round(accuracies[best_model]*100, 2), "% accuracy.")
```

```
Summary of Model Accuracies:
Logistic Regression: 82.5 %
Decision Tree: 66.67 %
Random Forest: 81.67 %

Conclusion: The best model is Logistic Regression with 82.5 % accuracy.
```
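Accuracy alone can flatter a model when one class dominates: a classifier that always predicts the most common label already scores that label's share of the test set. Comparing the accuracies above with this majority-class baseline on the same `y_test` puts them in context. `majority_baseline` is a hypothetical stdlib helper (scikit-learn's `DummyClassifier(strategy="most_frequent")` computes the same thing); in the notebook you would call it as `majority_baseline(list(y_test))`.

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    _, most_common_count = counts.most_common(1)[0]
    return most_common_count / len(labels)

# Made-up labels: 1 = approved, 0 = not approved
example_labels = [1, 1, 1, 0, 1, 0, 1, 1]
print(majority_baseline(example_labels))  # 0.75
```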


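```python
import pickle

# Save the final model together with its scaler and feature columns.
# Pickling `lr` on its own would overwrite the bundle from Step 8 and drop
# the preprocessing objects needed to score new applicants.
with open("loan_model.pkl", "wb") as file:
    pickle.dump({"model": lr, "scaler": scaler, "columns": list(X.columns)}, file)
```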
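The bundle written in Step 8 stores the scaler and column list alongside the model because a new application must pass through the same preprocessing before `predict` is called. The sketch below shows the inference side under assumptions: `prepare_applicant` is a hypothetical helper, and the applicant and column names are made up. It one-hot encodes the raw fields, then uses `DataFrame.reindex` to add any training columns the single row lacks (filled with 0) and drop unseen ones. It deliberately omits `drop_first=True`, which on a one-row frame would drop the only category present.

```python
import pickle  # needed only for the commented-out loading code below

import pandas as pd

def prepare_applicant(raw: dict, columns: list) -> pd.DataFrame:
    """One-hot encode one raw applicant record and align it to the training columns."""
    row = pd.get_dummies(pd.DataFrame([raw]), dtype=int)
    # Training columns missing here become 0; columns never seen in training are dropped
    return row.reindex(columns=columns, fill_value=0)

# Hypothetical subset of training columns, for illustration only
training_columns = ["ApplicantIncome", "LoanAmount", "Gender_Male", "Married_Yes"]

applicant = {"Gender": "Male", "Married": "No", "ApplicantIncome": 4500, "LoanAmount": 120}
X_new = prepare_applicant(applicant, training_columns)
print(X_new.values.tolist())  # [[4500, 120, 1, 0]]

# With the real bundle (not run here):
# with open("loan_model.pkl", "rb") as f:
#     bundle = pickle.load(f)
# X_new = prepare_applicant(applicant, bundle["columns"])
# prediction = bundle["model"].predict(bundle["scaler"].transform(X_new))
```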


```python
from google.colab import files
files.download("loan_model.pkl")
```
