## 📂 Project Setup Instructions

Before running this notebook, **upload the following files** to the notebook's working directory (the code loads them by relative path):

1. **Training Data.csv**  
   → This is the original training dataset used for building the model.  
   📌 It's required to **generate city/state mappings** and perform preprocessing steps such as feature analysis or re-encoding.  
   📁 This file is located in the project repository.

2. **random_forest_model.pkl**  
   → This file contains the trained Random Forest model.  
   🔗 Download it from Google Drive: [Click Here](https://drive.google.com/file/d/1RBZA9K3C8uHZ269RJe1aciK2PU8bs29c/view?usp=drive_link)

3. **scaler.pkl**  
   → This is the fitted `StandardScaler` used during training to standardize numeric features.  
   📁 This file is located in the project repository.

4. **model_columns.pkl**  
   → This contains the list of encoded feature columns used by the model for prediction.  
   📁 This file is also located in the project repository.

> ⚠️ Be sure to upload all four files **before running any prediction or deployment steps**.
> The training dataset is especially important for generating the **city and state mappings** used to interpret user input.

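A quick pre-flight check can confirm the uploads before any cell tries to load them. The sketch below assumes the four files sit in the current working directory under exactly the names listed above; `find_missing_files` is a hypothetical helper, not part of the project.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = [
    "Training Data.csv",
    "random_forest_model.pkl",
    "scaler.pkl",
    "model_columns.pkl",
]


def find_missing_files(directory=".", required=REQUIRED_FILES):
    """Return the required file names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]


missing = find_missing_files()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files are present.")
```

Running this first turns a later `FileNotFoundError` inside `joblib.load` or `pd.read_csv` into an explicit list of what still needs uploading.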

# Model Deployment

In [21]:
import joblib
import pandas as pd

# 🔄 Load the best-performing model (Random Forest)
rf_model = joblib.load("random_forest_model.pkl")

# 🔄 Load the fitted StandardScaler
scaler = joblib.load("scaler.pkl")

# 🔄 Load the encoded feature columns used during training
model_columns = joblib.load("model_columns.pkl")


In [22]:
# Load the training dataset
train_df = pd.read_csv("Training Data.csv")

# Step 1: Get unique city/state values
unique_cities = sorted(train_df['city'].unique())
unique_states = sorted(train_df['state'].unique())

# Step 2: Create mappings (lower-cased original name -> encoded name)
city_mapping = {city.lower(): f"city_{i}" for i, city in enumerate(unique_cities)}
state_mapping = {state.lower(): f"state_{i}" for i, state in enumerate(unique_states)}

# Step 3: Save the mappings to use later in deployment
joblib.dump(city_mapping, "city_mapping.pkl")
joblib.dump(state_mapping, "state_mapping.pkl")


['state_mapping.pkl']
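
To make the mapping step concrete, here is a small sketch of the same dictionary comprehension applied to three made-up city names instead of `train_df`. The toy list and the `encode_city` helper are assumptions for illustration only; the real mapping depends on the sorted unique values in the training data.

```python
# Toy values for illustration; the real list comes from train_df['city'].
toy_cities = sorted(["Surat", "Agra", "Pune"])

# Same comprehension as above: lower-cased name -> positional code.
toy_city_mapping = {city.lower(): f"city_{i}" for i, city in enumerate(toy_cities)}
print(toy_city_mapping)
# {'agra': 'city_0', 'pune': 'city_1', 'surat': 'city_2'}


def encode_city(name, mapping=toy_city_mapping):
    """Normalize user text, then look it up with an 'unknown_city' fallback."""
    return mapping.get(name.strip().lower(), "unknown_city")


print(encode_city("  SURAT "))   # city_2
print(encode_city("Atlantis"))   # unknown_city
```

Because the codes are assigned by sorted position, they stay stable only as long as the mapping is rebuilt from the same training file.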

## 🧠 predict_credit_risk() – Deployment-Ready Function

In [23]:
def predict_credit_risk(input_data: dict) -> int:
    """
    Predict credit risk using the saved Random Forest model.

    Args:
        input_data (dict): Borrower data with original city and state names.

    Returns:
        int: 0 for low risk, 1 for high risk
    """

    import pandas as pd
    import joblib

    # 🔁 Load model components
    model = joblib.load("random_forest_model.pkl")
    scaler = joblib.load("scaler.pkl")
    model_columns = joblib.load("model_columns.pkl")

    # 🔁 Load city and state mapping
    city_mapping = joblib.load("city_mapping.pkl")
    state_mapping = joblib.load("state_mapping.pkl")

    # 🔁 Convert user-inputted city/state to encoded values.
    # Work on a copy so the caller's dictionary is not modified.
    data = dict(input_data)
    data['city'] = city_mapping.get(str(data['city']).strip().lower(), "unknown_city")
    data['state'] = state_mapping.get(str(data['state']).strip().lower(), "unknown_state")

    # 📦 Convert to DataFrame
    df = pd.DataFrame([data])

    # 🔠 Encode and align
    df_encoded = pd.get_dummies(df)
    df_encoded = df_encoded.reindex(columns=model_columns, fill_value=0)

    # 📏 Apply the fitted scaler to the aligned feature matrix
    df_scaled = scaler.transform(df_encoded)

    # 🔍 Make prediction
    prediction = model.predict(df_scaled)

    return int(prediction[0])
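
The encode-and-align step is the least obvious part of the function, so the sketch below shows it on a single toy row. The column list here is hypothetical and only stands in for the contents of `model_columns.pkl`, which this notebook does not display.

```python
import pandas as pd

# Hypothetical training-time columns; the real list lives in model_columns.pkl.
toy_model_columns = [
    "income", "age",
    "married_no", "married_yes",
    "city_city_0", "city_city_1",
]

# One borrower whose city was not found in the mapping.
row = pd.DataFrame(
    [{"income": 50000, "age": 30, "married": "yes", "city": "unknown_city"}]
)

# get_dummies only creates columns for the categories present in this row.
encoded = pd.get_dummies(row)
print(list(encoded.columns))

# reindex adds the missing training columns (filled with 0), drops columns the
# model never saw, and restores the training-time column order.
aligned = encoded.reindex(columns=toy_model_columns, fill_value=0)
print(list(aligned.columns))
```

Note that an unmapped city produces a `city_unknown_city` column, which `reindex` drops silently. The model then sees zeros in every city column, so it may be worth warning the user when the fallback value is hit.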


### 🧪 Example: Predict Credit Risk for a New Borrower

In [24]:
# 📋 Sample input: borrower's profile
user_input = {
    'income': 50000,
    'age': 30,
    'experience': 5,
    'married': 'yes',
    'house_ownership': 'rented',
    'car_ownership': 'no',
    'profession': 'engineer',
    'city': 'Surat',
    'state': 'Gujarat',
    'current_job_years': 2,
    'current_house_years': 4
}


# 🔍 Get the prediction
result = predict_credit_risk(user_input)

# 🧾 Show result
print("🔍 Prediction:", "❌ High Risk" if result == 1 else "✅ Low Risk")


🔍 Prediction: ❌ High Risk


## 🧾 get_user_input() – Interactive CLI Input for Deployment Testing

In [25]:
def get_user_input():
    """
    Collects borrower information from the user via command-line input.

    City and state are returned as typed (stripped and lower-cased).
    They are not encoded here because predict_credit_risk() performs the
    mapping lookup itself; encoding them twice would turn every value
    into "unknown_city" / "unknown_state".

    Returns:
        dict: A dictionary with borrower features for risk prediction.
    """
    print("🔍 Please enter borrower information for credit risk prediction:\n")

    # Ask for raw city/state names; predict_credit_risk() maps them later
    state_input = input("State (e.g., Gujarat): ").strip().lower()
    city_input = input("City (e.g., Surat): ").strip().lower()

    input_data = {
        'income': float(input("Income (e.g., 50000): ")),
        'age': int(input("Age (e.g., 30): ")),
        'experience': int(input("Years of Work Experience: ")),
        'married': input("Married (yes or no): ").strip().lower(),
        'house_ownership': input("House Ownership (owned, rented, norent_noown): ").strip().lower(),
        'car_ownership': input("Car Ownership (yes or no): ").strip().lower(),
        'profession': input("Profession (e.g., software engineer): ").strip().lower(),
        'city': city_input,
        'state': state_input,
        'current_job_years': int(input("Years at Current Job: ")),
        'current_house_years': int(input("Years in Current House: "))
    }

    return input_data
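
The prompts above raise `ValueError` on a non-numeric answer and accept any text for the yes/no fields. One possible hardening is sketched below: two hypothetical helpers, `ask_choice` and `ask_number`, that re-prompt until the answer is usable. They take the prompt function as a parameter (defaulting to `input`) so the demo can run with scripted answers instead of a live terminal.

```python
def ask_choice(prompt, choices, ask=input):
    """Prompt until the answer (case-insensitive) is one of `choices`."""
    allowed = {choice.lower() for choice in choices}
    while True:
        answer = ask(prompt).strip().lower()
        if answer in allowed:
            return answer
        print(f"Please enter one of: {', '.join(sorted(allowed))}")


def ask_number(prompt, cast=int, ask=input):
    """Prompt until the answer can be converted with `cast` (int or float)."""
    while True:
        try:
            return cast(ask(prompt))
        except ValueError:
            print("Please enter a valid number.")


# Demo with scripted answers standing in for keyboard input.
scripted = iter(["maybe", " Yes ", "thirty", "30"])
fake_input = lambda _prompt: next(scripted)

married = ask_choice("Married (yes or no): ", ["yes", "no"], ask=fake_input)
age = ask_number("Age (e.g., 30): ", cast=int, ask=fake_input)
print(married, age)  # yes 30
```

Inside `get_user_input()`, calls such as `int(input(...))` could be swapped for `ask_number(...)` without changing the returned dictionary's keys.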


### 🧪 Run the Full Prediction Flow with Live User Input

In [26]:
# 🔽 Get user input from the terminal
user_input = get_user_input()

# 🔮 Predict
result = predict_credit_risk(user_input)

# ✅ Display prediction
print("\n📊 Prediction:", "❌ High Risk" if result == 1 else "✅ Low Risk")


🔍 Please enter borrower information for credit risk prediction:

State (e.g., Gujarat): Gujarat
City (e.g., Surat): Surat
Income (e.g., 50000): 50000
Age (e.g., 30): 30
Years of Work Experience: 2
Married (yes or no): yes
House Ownership (owned, rented, norent_noown): rented
Car Ownership (yes or no): yes
Profession (e.g., software engineer): engineer
Years at Current Job: 2
Years in Current House: 1

📊 Prediction: ❌ High Risk
