# Alternative Credit Scoring Model

This notebook implements an alternative credit scoring system based on non-traditional data points. It uses the Lending Club dataset as a foundation and engineers features into a scoring system built around four weighted categories:

1. Income Stability (35%)
2. Payment Consistency (30%)
3. Asset Value (20%)
4. Behavioral Factors (15%)
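
The four weights combine as a weighted average. The following is a minimal sketch, assuming each category has already been reduced to a sub-score between 0 and 1 and that the result is scaled to 0-1000. The function name `composite_score` and the applicant values are hypothetical and only illustrate the arithmetic.

```python
# Category weights from the list above; they must sum to 1.
WEIGHTS = {
    "income_stability": 0.35,
    "payment_consistency": 0.30,
    "asset_value": 0.20,
    "behavioral_factors": 0.15,
}


def composite_score(sub_scores, weights=WEIGHTS, scale=1000):
    """Weighted average of sub-scores in [0, 1], scaled to [0, scale]."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = sum(weights[name] * sub_scores[name] for name in weights)
    return round(total * scale, 1)


# Hypothetical applicant, for illustration only.
applicant = {
    "income_stability": 0.8,
    "payment_consistency": 1.0,
    "asset_value": 0.5,
    "behavioral_factors": 0.6,
}
print(composite_score(applicant))  # 0.28 + 0.30 + 0.10 + 0.09 = 0.77 -> 770.0
```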

## Setup and Configuration

In [None]:
# Install required packages
!pip install numpy pandas matplotlib seaborn scikit-learn xgboost

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve, precision_recall_curve
from sklearn.feature_selection import SelectFromModel
import xgboost as xgb
import pickle
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)

## Data Loading and Initial Exploration

### Option 1: Download the Lending Club dataset from Kaggle

To download the dataset directly from Kaggle, you'll need to:
1. Create a Kaggle account if you don't have one
2. Generate an API token from your Kaggle account settings
3. Upload the kaggle.json file to Colab

Alternatively, you can download the dataset manually from Kaggle and upload it to Colab.
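
Before running the Kaggle commands, it can help to confirm that the token file is usable. The sketch below assumes the token is a JSON object with `username` and `key` fields and that, on POSIX systems, it should be readable only by its owner (the `chmod 600` step). The helper name `check_kaggle_token` is hypothetical and is not part of the Kaggle API.

```python
import json
import os
import stat
from pathlib import Path


def check_kaggle_token(path):
    """Return a list of problems found with a kaggle.json token file."""
    path = Path(path)
    if not path.is_file():
        return [f"{path} does not exist"]

    try:
        token = json.loads(path.read_text())
    except json.JSONDecodeError:
        return [f"{path} is not valid JSON"]
    if not isinstance(token, dict):
        return [f"{path} does not contain a JSON object"]

    problems = []
    for field in ("username", "key"):
        if not token.get(field):
            problems.append(f"missing '{field}' field")

    # On POSIX systems, group and other permission bits should be cleared.
    if os.name == "posix":
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & 0o077:
            problems.append(f"permissions are {oct(mode)}; expected 0o600")
    return problems
```

An empty list means the file looks usable, for example `check_kaggle_token(Path.home() / ".kaggle" / "kaggle.json")`.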

In [None]:
# Option 1: Download from Kaggle (uncomment and run if you have Kaggle API credentials)

# # Upload your kaggle.json file
# from google.colab import files
# files.upload()  # Upload your kaggle.json file

# # Set up Kaggle API credentials
# !mkdir -p ~/.kaggle
# !cp kaggle.json ~/.kaggle/
# !chmod 600 ~/.kaggle/kaggle.json

# # Download the dataset
# !kaggle datasets download -d wordsforthewise/lending-club
# !unzip -q lending-club.zip

In [None]:
# Option 2: Upload the dataset manually
# Uncomment and run this cell if you've downloaded the dataset manually

# from google.colab import files
# uploaded = files.upload()  # Upload your dataset file

In [None]:
# Option 3: Use sample data for demonstration
# This is useful if you don't have the full dataset

def create_sample_data(n_samples=5000):
    """
    Create a sample dataset mimicking the Lending Club data structure
    for demonstration purposes
    
    Parameters:
    -----------
    n_samples : int
        Number of samples to generate
    
    Returns:
    --------
    pd.DataFrame
        Sample dataframe
    """
    np.random.seed(42)
    
    # Generate sample data
    data = {
        'loan_amnt': np.random.uniform(1000, 40000, n_samples),
        'term': np.random.choice([' 36 months', ' 60 months'], n_samples),
        'int_rate': np.random.uniform(5, 25, n_samples),
        'installment': np.random.uniform(100, 1500, n_samples),
        'grade': np.random.choice(['A', 'B', 'C', 'D', 'E', 'F', 'G'], n_samples),
        'sub_grade': np.random.choice(['A1', 'A2', 'B1', 'B2', 'C1', 'C2', 'D1', 'D2', 'E1', 'E2', 'F1', 'G1'], n_samples),
        'emp_title': np.random.choice(['Teacher', 'Engineer', 'Manager', 'Developer', 'Nurse'], n_samples),
        'emp_length': np.random.choice(['< 1 year', '1 year', '2 years', '3 years', '4 years', '5 years', '6 years', '7 years', '8 years', '9 years', '10+ years'], n_samples),
        'home_ownership': np.random.choice(['RENT', 'MORTGAGE', 'OWN', 'OTHER'], n_samples),
        'annual_inc': np.random.uniform(20000, 200000, n_samples),
        'verification_status': np.random.choice(['Verified', 'Source Verified', 'Not Verified'], n_samples),
        'issue_d': np.random.choice(['Jan-2019', 'Feb-2019', 'Mar-2019', 'Apr-2019'], n_samples),
        'loan_status': np.random.choice(['Fully Paid', 'Current', 'Charged Off', 'Late (31-120 days)', 'In Grace Period'], n_samples),
        'purpose': np.random.choice(['debt_consolidation', 'credit_card', 'home_improvement', 'major_purchase', 'car'], n_samples),
        'title': np.random.choice(['Debt consolidation', 'Credit card refinancing', 'Home improvement', 'Major purchase', 'Car financing'], n_samples),
        'zip_code': np.random.choice(['123xx', '456xx', '789xx'], n_samples),
        'addr_state': np.random.choice(['CA', 'NY', 'TX', 'FL', 'IL'], n_samples),
        'dti': np.random.uniform(0, 30, n_samples),
        'earliest_cr_line': np.random.choice(['Jan-2000', 'Jan-2005', 'Jan-2010', 'Jan-2015'], n_samples),
        'open_acc': np.random.randint(1, 20, n_samples),
        'pub_rec': np.random.choice([0, 1, 2], n_samples, p=[0.85, 0.1, 0.05]),
        'revol_bal': np.random.uniform(0, 50000, n_samples),
        'revol_util': np.random.uniform(0, 100, n_samples),
        'total_acc': np.random.randint(1, 50, n_samples),
        'initial_list_status': np.random.choice(['w', 'f'], n_samples),
        'application_type': np.random.choice(['Individual', 'Joint'], n_samples, p=[0.9, 0.1]),
        'mort_acc': np.random.choice([0, 1, 2, 3], n_samples),
        'pub_rec_bankruptcies': np.random.choice([0, 1], n_samples, p=[0.95, 0.05]),
        'delinq_2yrs': np.random.choice([0, 1, 2], n_samples, p=[0.8, 0.15, 0.05]),
        'inq_last_6mths': np.random.choice([0, 1, 2, 3, 4], n_samples),
    }
    
    # Create DataFrame
    df = pd.DataFrame(data)
    
    # Create target variable (1 = default, 0 = no default)
    # For simplicity, we'll use a combination of features to determine default
    conditions = [
        (df['grade'].isin(['F', 'G'])) & (df['dti'] > 20),
        (df['pub_rec'] > 0) & (df['revol_util'] > 80),
        (df['delinq_2yrs'] > 0) & (df['inq_last_6mths'] > 2)
    ]
    df['is_default'] = np.where(np.any(conditions, axis=0), 1, 0)
    
    return df

# Create sample data
df = create_sample_data(n_samples=5000)
print(f"Sample data created with {df.shape[0]} rows and {df.shape[1]} columns.")
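
The synthetic target fires when any of three rules holds, and each rule uses its own pair of independently sampled columns. The sketch below restates the rules for a single record and works out the default rate they imply. The function name `synthetic_default` is hypothetical, and the probabilities are read off the sampling distributions in `create_sample_data`.

```python
def synthetic_default(row):
    """Apply the three default rules from create_sample_data to one record."""
    risky_grade = row["grade"] in ("F", "G") and row["dti"] > 20
    stressed_credit = row["pub_rec"] > 0 and row["revol_util"] > 80
    recent_trouble = row["delinq_2yrs"] > 0 and row["inq_last_6mths"] > 2
    return int(risky_grade or stressed_credit or recent_trouble)


# Probability that each rule fires, from the sampling distributions.
p_rule1 = (2 / 7) * (10 / 30)  # grade in {F, G} and dti > 20, with dti ~ U(0, 30)
p_rule2 = 0.15 * 0.20          # pub_rec > 0 and revol_util > 80
p_rule3 = 0.20 * (2 / 5)       # delinq_2yrs > 0 and inq_last_6mths in {3, 4}

# The rules use disjoint, independent columns, so the complements multiply.
expected_default_rate = 1 - (1 - p_rule1) * (1 - p_rule2) * (1 - p_rule3)
print(f"Expected default rate: {expected_default_rate:.1%}")  # 19.3%
```

Note that the first rule depends on `grade`, which is not among the modeling features selected later, so a model trained on this sample can recover the target only partially.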

In [None]:
# Load the dataset (uncomment the appropriate line)
# If you downloaded from Kaggle or uploaded manually:
# df = pd.read_csv('accepted_2007_to_2018Q4.csv', low_memory=False)

# Or use the sample data we created:
# df is already defined above

# Display basic information about the dataset
print(f"Dataset shape: {df.shape}")
print("\nData types:")
print(df.dtypes.value_counts())
print("\nMissing values:")
missing = df.isnull().sum()
print(missing[missing > 0].sort_values(ascending=False).head(10))

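# Check the distribution of the target variable.
# The raw Kaggle file has no 'is_default' column; preprocess_data() derives it from 'loan_status'.
if 'is_default' in df.columns:
    print("\nDefault distribution:")
    print(df['is_default'].value_counts())
    print(f"Default rate: {df['is_default'].mean():.2%}")
else:
    print("\nNo 'is_default' column yet; it will be derived from 'loan_status' during preprocessing.")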

## Data Preprocessing

In [None]:
def preprocess_data(df):
    """
    Preprocess the data for modeling
    
    Parameters:
    -----------
    df : pd.DataFrame
        Raw dataset
    
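    Returns:
    --------
    data : pd.DataFrame
        Copy of the input with the target column(s) removed
    X : pd.DataFrame
        Selected features, with ordinal columns encoded as numbers
    y : pd.Series
        Target variable (1 = default, 0 = no default)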
    """
    print("Preprocessing data...")
    
    # Create a copy to avoid modifying the original dataframe
    data = df.copy()
    
    # Define target variable
    # If 'is_default' exists, use it directly
    if 'is_default' in data.columns:
        y = data['is_default']
        data = data.drop('is_default', axis=1)
    # Otherwise, create it from 'loan_status'
    elif 'loan_status' in data.columns:
        # Define defaults based on loan status
        default_statuses = ['Charged Off', 'Default', 'Late (31-120 days)', 'Late (16-30 days)', 'Does not meet the credit policy. Status:Charged Off']
        data['is_default'] = data['loan_status'].isin(default_statuses).astype(int)
        y = data['is_default']
        data = data.drop(['is_default', 'loan_status'], axis=1)
    else:
        raise ValueError("No target variable found in the dataset")
    
    # Select the features that map to the four scoring categories
    selected_features = [
        # Income Stability features
        'emp_length', 'annual_inc', 'verification_status',
        # Payment Consistency features
        'delinq_2yrs', 'pub_rec', 'revol_util',
        # Asset Value features
        'home_ownership', 'mort_acc',
        # Behavioral Factors features
        'dti', 'open_acc', 'total_acc', 'inq_last_6mths'
    ]
    
    # Filter features that exist in the dataset
    existing_features = [f for f in selected_features if f in data.columns]
    missing_features = [f for f in selected_features if f not in data.columns]
    
    if missing_features:
        print(f"Warning: The following selected features are not in the dataset: {missing_features}")
    
    # Select only the features we need
    X = data[existing_features].copy()
    
    # Handle employment length
    if 'emp_length' in X.columns:
        # Convert employment length to numeric
        emp_length_map = {
            '< 1 year': 0,
            '1 year': 1,
            '2 years': 2,
            '3 years': 3,
            '4 years': 4,
            '5 years': 5,
            '6 years': 6,
            '7 years': 7,
            '8 years': 8,
            '9 years': 9,
            '10+ years': 10,
            'n/a': np.nan
        }
        X['emp_length'] = X['emp_length'].map(lambda x: emp_length_map.get(x, np.nan))
    
    # Handle verification status
    if 'verification_status' in X.columns:
        # Convert verification status to ordinal
        verification_map = {
            'Not Verified': 0,
            'Verified': 1,
            'Source Verified': 2
        }
        X['verification_status'] = X['verification_status'].map(verification_map)
    
    # Home ownership stays categorical here; the modeling pipeline one-hot encodes it later
    
    print(f"Preprocessed data shape: {X.shape}")
    return data, X, y

# Preprocess the data
data, X, y = preprocess_data(df)
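
The employment-length lookup table can also be expressed as a small parser, which tolerates stray whitespace and capitalization. This is a sketch of an alternative, not the notebook's method: the name `parse_emp_length` is hypothetical, and it returns `None` (rather than `np.nan`) for missing or unrecognized values.

```python
import re


def parse_emp_length(value):
    """Convert strings such as '< 1 year' or '10+ years' to a number of years.

    Returns None for missing or unrecognized values.
    """
    if not isinstance(value, str):
        return None
    text = value.strip().lower()
    if text.startswith("<"):
        return 0  # '< 1 year' counts as zero full years
    match = re.match(r"(\d+)\+?\s*years?$", text)
    return int(match.group(1)) if match else None


for raw in ["< 1 year", "1 year", "7 years", "10+ years", "n/a"]:
    print(repr(raw), "->", parse_emp_length(raw))
```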

## Feature Engineering

In [None]:
def engineer_features(X):
    """
    Engineer additional features for the model
    
    Parameters:
    -----------
    X : pd.DataFrame
        Preprocessed features
    
    Returns:
    --------
    pd.DataFrame
        DataFrame with engineered features
    """
    print("Engineering features...")
    
    # Create a copy to avoid modifying the original dataframe
    X_eng = X.copy()
    
    # 1. Income Stability Score
    # Higher score for longer employment and higher income
    if 'emp_length' in X_eng.columns and 'annual_inc' in X_eng.columns:
        # Normalize employment length (0-10 years)
        X_eng['emp_length_norm'] = X_eng['emp_length'] / 10
        
        # Normalize income (assuming max income of 300,000)
        X_eng['annual_inc_norm'] = X_eng['annual_inc'] / 300000
        X_eng['annual_inc_norm'] = X_eng['annual_inc_norm'].clip(0, 1)
        
        # Calculate income stability score (weighted average)
        X_eng['income_stability_score'] = (0.7 * X_eng['emp_length_norm'] + 
                                          0.3 * X_eng['annual_inc_norm'])
    
    # 2. Payment Consistency Score
    # Higher score for fewer delinquencies and public records
    if 'delinq_2yrs' in X_eng.columns and 'pub_rec' in X_eng.columns:
        # Normalize delinquencies (inverse, as fewer is better)
        X_eng['delinq_2yrs_norm'] = 1 - (X_eng['delinq_2yrs'] / 5).clip(0, 1)
        
        # Normalize public records (inverse, as fewer is better)
        X_eng['pub_rec_norm'] = 1 - (X_eng['pub_rec'] / 3).clip(0, 1)
        
        # Calculate payment consistency score
        X_eng['payment_consistency_score'] = (0.6 * X_eng['delinq_2yrs_norm'] + 
                                             0.4 * X_eng['pub_rec_norm'])
    
    # 3. Asset Value Score
    # Higher score for home ownership and more mortgage accounts
    if 'home_ownership' in X_eng.columns:
        # Create home ownership score
        home_ownership_map = {
            'OWN': 1.0,
            'MORTGAGE': 0.7,
            'RENT': 0.3,
            'OTHER': 0.1,
            'NONE': 0.0
        }
        X_eng['home_ownership_score'] = X_eng['home_ownership'].map(
            lambda x: home_ownership_map.get(x, 0.0) if pd.notna(x) else 0.0
        )
        
        # Normalize mortgage accounts
        if 'mort_acc' in X_eng.columns:
            X_eng['mort_acc_norm'] = (X_eng['mort_acc'] / 5).clip(0, 1)
            
            # Calculate asset value score
            X_eng['asset_value_score'] = (0.7 * X_eng['home_ownership_score'] + 
                                         0.3 * X_eng['mort_acc_norm'])
        else:
            X_eng['asset_value_score'] = X_eng['home_ownership_score']
    
    # 4. Behavioral Score
    # Higher score for lower DTI and fewer inquiries
    if 'dti' in X_eng.columns and 'inq_last_6mths' in X_eng.columns:
        # Normalize DTI (inverse, as lower is better)
        X_eng['dti_norm'] = 1 - (X_eng['dti'] / 40).clip(0, 1)
        
        # Normalize inquiries (inverse, as fewer is better)
        X_eng['inq_norm'] = 1 - (X_eng['inq_last_6mths'] / 10).clip(0, 1)
        
        # Calculate behavioral score
        X_eng['behavioral_score'] = (0.7 * X_eng['dti_norm'] + 
                                    0.3 * X_eng['inq_norm'])
    
    # 5. Overall Alternative Credit Score
    # Weighted average of the four component scores
    score_components = [
        ('income_stability_score', 0.35),
        ('payment_consistency_score', 0.30),
        ('asset_value_score', 0.20),
        ('behavioral_score', 0.15)
    ]
    
    # Check which components are available
    available_components = [c for c, _ in score_components if c in X_eng.columns]
    
    if available_components:
        # Normalize weights for available components
        available_weights = [w for c, w in score_components if c in X_eng.columns]
        normalized_weights = [w / sum(available_weights) for w in available_weights]
        
        # Calculate overall score
        X_eng['alternative_credit_score'] = sum(
            X_eng[c] * w for c, w in zip(available_components, normalized_weights)
        )
        
        # Scale to 0-1000 range
        X_eng['alternative_credit_score'] = (X_eng['alternative_credit_score'] * 1000).clip(0, 1000)
    
    print(f"Engineered features shape: {X_eng.shape}")
    return X_eng

# Engineer features
X_eng = engineer_features(X)
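
When a component score cannot be computed, `engineer_features` rescales the remaining weights so they still sum to 1. The sketch below isolates that step in plain Python. The function name `renormalize` is hypothetical, and the example assumes the asset value score is the missing component.

```python
WEIGHTS = {
    "income_stability_score": 0.35,
    "payment_consistency_score": 0.30,
    "asset_value_score": 0.20,
    "behavioral_score": 0.15,
}


def renormalize(weights, available):
    """Rescale the weights of the available components so they sum to 1."""
    kept = {name: w for name, w in weights.items() if name in available}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no scored components available")
    return {name: w / total for name, w in kept.items()}


# Example: the asset value score is missing, so 0.80 of the weight remains.
remaining = renormalize(
    WEIGHTS,
    available={"income_stability_score", "payment_consistency_score", "behavioral_score"},
)
for name, weight in remaining.items():
    print(f"{name}: {weight:.4f}")
```

With the asset component removed, the weights become 0.35/0.80, 0.30/0.80 and 0.15/0.80, so the relative importance of the remaining categories is unchanged.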

## Model Building
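
The numeric branch of the preprocessing applies median imputation followed by standardization. The plain-Python sketch below shows what those two steps compute for one column and why the statistics are fitted on training data only. It assumes missing values are represented as `None` and uses the population standard deviation, as `StandardScaler` does. The function names are hypothetical.

```python
from statistics import mean, median, pstdev


def fit_numeric_transformer(train_values):
    """Learn the imputation value and scaling statistics from training data."""
    observed = [v for v in train_values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in train_values]
    mu = mean(filled)
    sigma = pstdev(filled) or 1.0  # avoid dividing by zero for constant columns
    return fill, mu, sigma


def transform_numeric(values, fill, mu, sigma):
    """Apply the statistics learned on the training data to any data."""
    return [((fill if v is None else v) - mu) / sigma for v in values]


train = [1, 2, None, 3, 9]
fill, mu, sigma = fit_numeric_transformer(train)

# New data reuses the training statistics; nothing is re-estimated here.
print(transform_numeric([None, 3.5], fill, mu, sigma))
```

Here the median of the observed values is 2.5, the mean after imputation is 3.5, and the variance is 8, so a missing value maps to (2.5 - 3.5) / sqrt(8).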

In [None]:
def build_preprocessing_pipeline(X):
    """
    Build a preprocessing pipeline for the model
    
    Parameters:
    -----------
    X : pd.DataFrame
        Features dataframe
    
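    Returns:
    --------
    sklearn.compose.ColumnTransformer
        Unfitted preprocessor for numeric and categorical columns
    """
    # Identify numeric and categorical columns.
    # 'number' covers every integer and float width (e.g. int32 on some platforms),
    # so no numeric column is silently dropped by the ColumnTransformer.
    numeric_features = X.select_dtypes(include=['number']).columns.tolist()
    categorical_features = X.select_dtypes(include=['object', 'category']).columns.tolist()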
    
    # Create preprocessing pipelines for numeric and categorical features
    numeric_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler())
    ])
    
    categorical_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='most_frequent')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ])
    
    # Combine preprocessing steps
    preprocessor = ColumnTransformer(
        transformers=[
            ('num', numeric_transformer, numeric_features),
            ('cat', categorical_transformer, categorical_features)
        ]
    )
    
    return preprocessor

def train_models(X, y):
    """
    Train multiple models and select the best one
    
    Parameters:
    -----------
    X : pd.DataFrame
        Features
    y : pd.Series
        Target variable
    
    Returns:
    --------
    dict
        Dictionary containing trained models, preprocessor, and performance metrics
    """
    print("Training models...")
    
    # Split data into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    
    # Build preprocessing pipeline
    preprocessor = build_preprocessing_pipeline(X)
    
    # Define models to train
    models = {
        'logistic_regression': LogisticRegression(max_iter=1000, random_state=42),
        'random_forest': RandomForestClassifier(n_estimators=100, random_state=42),
        'gradient_boosting': GradientBoostingClassifier(n_estimators=100, random_state=42),
        'neural_network': MLPClassifier(hidden_layer_sizes=(50, 25), max_iter=300, random_state=42)
    }
    
    # Train and evaluate each model
    results = {}
    best_auc = 0
    best_model_name = None
    
    for name, model in models.items():
        print(f"Training {name}...")
        
        # Create pipeline with preprocessing and model
        pipeline = Pipeline(steps=[
            ('preprocessor', preprocessor),
            ('model', model)
        ])
        
        # Train the model
        pipeline.fit(X_train, y_train)
        
        # Evaluate on test set
        y_pred_proba = pipeline.predict_proba(X_test)[:, 1]
        auc = roc_auc_score(y_test, y_pred_proba)
        
        # Store results
        results[name] = {
            'pipeline': pipeline,
            'auc': auc,
            'y_test': y_test,
            'y_pred_proba': y_pred_proba
        }
        
        print(f"{name} - AUC: {auc:.4f}")
        
        # Track best model
        if auc > best_auc:
            best_auc = auc
            best_model_name = name
    
    print(f"\nBest model: {best_model_name} (AUC: {best_auc:.4f})")
    
    # Return all results and the best model
    return {
        'models': results,
        'best_model_name': best_model_name,
        'preprocessor': preprocessor,
        'X_train': X_train,
        'X_test': X_test,
        'y_train': y_train,
        'y_test': y_test
    }

# Train models
model_results = train_models(X_eng, y)
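
Models are compared by ROC AUC, which can be read as the probability that a randomly chosen defaulter receives a higher predicted probability than a randomly chosen non-defaulter, with ties counted as half. The sketch below computes that quantity directly from pairs. The name `pairwise_auc` is hypothetical, and the quadratic loop is meant for illustration on small inputs, not as a replacement for `roc_auc_score`.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Three of the four (positive, negative) pairs are ordered correctly.
print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```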

## Model Evaluation

In [None]:
# Evaluate the best model
best_model_name = model_results['best_model_name']
best_model = model_results['models'][best_model_name]['pipeline']
y_test = model_results['y_test']
y_pred_proba = model_results['models'][best_model_name]['y_pred_proba']
y_pred = (y_pred_proba > 0.5).astype(int)

# Print classification report
print(f"Classification Report for {best_model_name}:")
print(classification_report(y_test, y_pred))

# Plot confusion matrix
plt.figure(figsize=(8, 6))
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False)
plt.title(f'Confusion Matrix - {best_model_name}')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()

# Plot ROC curve
plt.figure(figsize=(8, 6))
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
plt.plot(fpr, tpr, label=f'{best_model_name} (AUC = {roc_auc_score(y_test, y_pred_proba):.3f})')
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()
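
The classification report and confusion matrix depend on the 0.5 cut-off applied to the predicted probabilities. The sketch below shows how the four confusion-matrix counts, precision and recall change with the threshold. The labels and probabilities are made-up illustration values, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_proba, threshold=0.5):
    """Return (tn, fp, fn, tp) for predictions made with proba > threshold."""
    tn = fp = fn = tp = 0
    for actual, proba in zip(y_true, y_proba):
        predicted = int(proba > threshold)
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall(tn, fp, fn, tp):
    """Precision and recall for the positive (default) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels and probabilities, for illustration only.
labels = [0, 0, 0, 1, 1, 1]
probas = [0.1, 0.6, 0.3, 0.4, 0.7, 0.9]

for threshold in (0.5, 0.35):
    counts = confusion_counts(labels, probas, threshold)
    print(threshold, counts, precision_recall(*counts))
```

Lowering the threshold from 0.5 to 0.35 catches the defaulter scored at 0.4, raising recall from 2/3 to 1 in this toy example.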

## Save the Model

In [None]:
def save_model(model_results, file_path):
    """
    Save the trained model to a file
    
    Parameters:
    -----------
    model_results : dict
        Dictionary containing trained models and results
    file_path : str
        Path to save the model
    
    Returns:
    --------
    str
        Path to the saved model
    """
    print(f"Saving model to {file_path}...")
    
    # Get the best model
    best_model_name = model_results['best_model_name']
    best_pipeline = model_results['models'][best_model_name]['pipeline']
    
    # Save the model
    with open(file_path, 'wb') as f:
        pickle.dump({
            'pipeline': best_pipeline,
            'model_name': best_model_name,
            'feature_names': model_results['X_train'].columns.tolist(),
            'auc': model_results['models'][best_model_name]['auc']
        }, f)
    
    print(f"Model saved successfully to {file_path}")
    return file_path

# Save the model
model_path = 'alternative_credit_scoring_model.pkl'
save_model(model_results, model_path)
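
The saved file is a pickled dictionary holding the pipeline and its metadata. The sketch below shows the same round trip with a stand-in bundle and a check that the expected keys are present after loading. The helper names are hypothetical, and `None` is only a placeholder for the fitted pipeline. Pickle files can execute code on load, so only load files you created or trust.

```python
import pickle
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("pipeline", "model_name", "feature_names", "auc")


def save_bundle(bundle, file_path):
    """Write the model bundle to disk with pickle."""
    with open(file_path, "wb") as f:
        pickle.dump(bundle, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_bundle(file_path, required_keys=REQUIRED_KEYS):
    """Load a bundle and fail early if expected metadata is missing."""
    with open(file_path, "rb") as f:
        bundle = pickle.load(f)  # only load files from a trusted source
    missing = [key for key in required_keys if key not in bundle]
    if missing:
        raise KeyError(f"bundle is missing keys: {missing}")
    return bundle


# Round trip with a stand-in bundle (None replaces the fitted pipeline here).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "bundle.pkl"
    save_bundle(
        {"pipeline": None, "model_name": "demo", "feature_names": ["dti"], "auc": 0.5},
        path,
    )
    restored = load_bundle(path)

print(restored["model_name"], restored["feature_names"])
```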

## Inference Function

Now let's create a function to use the model for inference on new data.

In [None]:
def load_model(file_path):
    """
    Load a trained model from a file
    
    Parameters:
    -----------
    file_path : str
        Path to the model file
    
    Returns:
    --------
    dict
        Dictionary containing the loaded model and metadata
    """
    print(f"Loading model from {file_path}...")
    
    try:
        with open(file_path, 'rb') as f:
            model_data = pickle.load(f)
        
        print(f"Model loaded successfully: {model_data['model_name']} (AUC: {model_data['auc']:.4f})")
        return model_data
    except Exception as e:
        print(f"Error loading model: {e}")
        return None

def grade_from_score(score):
    """
    Convert a numeric score to a letter grade
    
    Parameters:
    -----------
    score : int
        Numeric score
    
    Returns:
    --------
    str
        Letter grade
    """
    if score >= 800:
        return "A+"
    elif score >= 750:
        return "A"
    elif score >= 720:
        return "A-"
    elif score >= 700:
        return "B+"
    elif score >= 680:
        return "B"
    elif score >= 660:
        return "B-"
    elif score >= 640:
        return "C+"
    elif score >= 620:
        return "C"
    elif score >= 600:
        return "C-"
    elif score >= 580:
        return "D+"
    elif score >= 560:
        return "D"
    elif score >= 540:
        return "D-"
    elif score >= 520:
        return "E+"
    elif score >= 500:
        return "E"
    elif score >= 480:
        return "E-"
    else:
        return "F"

def predict_credit_score(model_data, input_data):
    """
    Predict credit score for new data
    
    Parameters:
    -----------
    model_data : dict
        Dictionary containing the loaded model and metadata
    input_data : dict or pd.DataFrame
        Input data for prediction
    
    Returns:
    --------
    dict
        Dictionary containing prediction results
    """
    # Convert input to DataFrame if it's a dictionary
    if isinstance(input_data, dict):
        input_df = pd.DataFrame([input_data])
    else:
        input_df = input_data.copy()
    
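    # Recompute the engineered features so inference inputs match the training columns
    input_df = engineer_features(input_df)
    
    # Check for required features
    required_features = model_data['feature_names']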
    missing_features = [f for f in required_features if f not in input_df.columns]
    
    if missing_features:
        print(f"Warning: Missing features: {missing_features}")
        # Add missing features with NaN values
        for feature in missing_features:
            input_df[feature] = np.nan
    
    # Select only the required features in the correct order
    input_df = input_df[required_features]
    
    # Make prediction (this function scores one applicant, so take the
    # scalar probability for the first row)
    pipeline = model_data['pipeline']
    default_probability = float(pipeline.predict_proba(input_df)[0, 1])
    
    # Calculate credit score (inverse of default probability)
    # Scale to 300-850 range (similar to FICO)
    credit_score = 850 - (default_probability * 550)
    
    # Determine credit grade from the lower bound of each band.
    # Checking only lower bounds leaves no gaps for fractional scores such as 719.5.
    grade_floors = [
        ('A', 720),
        ('B', 690),
        ('C', 660),
        ('D', 620),
        ('E', 580),
        ('F', 520)
    ]
    
    credit_grade = 'G'  # Default grade (300-519)
    for grade, min_score in grade_floors:
        if credit_score >= min_score:
            credit_grade = grade
            break
    
    # Determine loan approval recommendation
    if credit_score >= 660:  # Grade C or better
        recommendation = "Approved"
        # Clamp at zero so scores above 720 do not produce an inverted range
        gap = max(0.0, 720 - credit_score)
        rate_range = f"{5 + gap / 40:.2f}% - {6 + gap / 30:.2f}%"
    elif credit_score >= 600:  # Grade D or the upper part of grade E
        recommendation = "Conditionally Approved"
        rate_range = f"{8 + (660 - credit_score) / 20:.2f}% - {10 + (660 - credit_score) / 15:.2f}%"
    else:  # Below 600
        recommendation = "Denied"
        rate_range = "N/A"
    
    # Create component scores (for demonstration)
    # In a real implementation, these would be calculated from the actual features
    component_scores = {
        'income_stability': int(credit_score * (0.9 + np.random.uniform(-0.1, 0.1))),
        'payment_consistency': int(credit_score * (0.85 + np.random.uniform(-0.15, 0.15))),
        'asset_profile': int(credit_score * (0.95 + np.random.uniform(-0.2, 0.1))),
        'behavioral_factors': int(credit_score * (0.9 + np.random.uniform(-0.1, 0.1)))
    }
    
    # Format results
    results = {
        'score': int(credit_score),
        'grade': credit_grade,
        'default_probability': float(default_probability),
        'recommendation': recommendation,
        'rate_range': rate_range,
        'breakdown': {
            'income_stability': grade_from_score(component_scores['income_stability']),
            'payment_consistency': grade_from_score(component_scores['payment_consistency']),
            'asset_profile': grade_from_score(component_scores['asset_profile']),
            'behavioral_factors': grade_from_score(component_scores['behavioral_factors'])
        }
    }
    
    return results

# Load the model
model_data = load_model(model_path)
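
The mapping from default probability to score and grade can be checked in isolation. The sketch below assumes the linear 300-850 scale used in `predict_credit_score` and stores only the lower bound of each grade band, so fractional scores never fall between bands. The names `score_from_probability` and `grade_for` are hypothetical.

```python
from bisect import bisect_right

# Lower bound of each grade band, in ascending order.
GRADE_FLOORS = [300, 520, 580, 620, 660, 690, 720]
GRADES = ["G", "F", "E", "D", "C", "B", "A"]


def score_from_probability(p_default):
    """Map a default probability in [0, 1] to the 300-850 score range."""
    if not 0.0 <= p_default <= 1.0:
        raise ValueError("p_default must be between 0 and 1")
    return 850 - 550 * p_default


def grade_for(score):
    """Return the grade whose band contains the score."""
    index = bisect_right(GRADE_FLOORS, score) - 1
    return GRADES[max(index, 0)]


for p in (0.0, 0.3, 1.0):
    score = score_from_probability(p)
    print(f"p={p:.2f} -> score {score:.0f}, grade {grade_for(score)}")
```

A default probability of 0.3 gives 850 - 165 = 685, which falls in band C (660 up to 690).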

## Test the Inference Function

Let's test the inference function with some sample inputs.

In [None]:
# Example 1: Good credit profile
good_profile = {
    'emp_length': 8,                # 8 years of employment
    'annual_inc': 120000,           # $120,000 annual income
    'verification_status': 2,       # Source Verified
    'delinq_2yrs': 0,               # No delinquencies
    'pub_rec': 0,                   # No public records
    'revol_util': 20,               # 20% revolving utilization
    'home_ownership': 'OWN',        # Owns home
    'mort_acc': 1,                  # 1 mortgage account
    'dti': 10,                      # 10% debt-to-income ratio
    'open_acc': 3,                  # 3 open accounts
    'total_acc': 10,                # 10 total accounts
    'inq_last_6mths': 0             # No inquiries in last 6 months
}

# Example 2: Average credit profile
average_profile = {
    'emp_length': 4,                # 4 years of employment
    'annual_inc': 65000,            # $65,000 annual income
    'verification_status': 1,       # Verified
    'delinq_2yrs': 1,               # 1 delinquency
    'pub_rec': 0,                   # No public records
    'revol_util': 50,               # 50% revolving utilization
    'home_ownership': 'MORTGAGE',   # Mortgage
    'mort_acc': 1,                  # 1 mortgage account
    'dti': 20,                      # 20% debt-to-income ratio
    'open_acc': 5,                  # 5 open accounts
    'total_acc': 15,                # 15 total accounts
    'inq_last_6mths': 2             # 2 inquiries in last 6 months
}

# Example 3: Poor credit profile
poor_profile = {
    'emp_length': 1,                # 1 year of employment
    'annual_inc': 35000,            # $35,000 annual income
    'verification_status': 0,       # Not Verified
    'delinq_2yrs': 3,               # 3 delinquencies
    'pub_rec': 1,                   # 1 public record
    'revol_util': 90,               # 90% revolving utilization
    'home_ownership': 'RENT',       # Rents
    'mort_acc': 0,                  # No mortgage accounts
    'dti': 35,                      # 35% debt-to-income ratio
    'open_acc': 8,                  # 8 open accounts
    'total_acc': 9,                 # 9 total accounts
    'inq_last_6mths': 5             # 5 inquiries in last 6 months
}

# Make predictions
print("Good Credit Profile:")
good_result = predict_credit_score(model_data, good_profile)
print(f"Credit Score: {good_result['score']}")
print(f"Grade: {good_result['grade']}")
print(f"Default Probability: {good_result['default_probability']:.4f}")
print(f"Recommendation: {good_result['recommendation']}")
print(f"Rate Range: {good_result['rate_range']}")
print("Component Breakdown:")
for component, grade in good_result['breakdown'].items():
    print(f"  - {component}: {grade}")

print("\nAverage Credit Profile:")
avg_result = predict_credit_score(model_data, average_profile)
print(f"Credit Score: {avg_result['score']}")
print(f"Grade: {avg_result['grade']}")
print(f"Default Probability: {avg_result['default_probability']:.4f}")
print(f"Recommendation: {avg_result['recommendation']}")
print(f"Rate Range: {avg_result['rate_range']}")
print("Component Breakdown:")
for component, grade in avg_result['breakdown'].items():
    print(f"  - {component}: {grade}")

print("\nPoor Credit Profile:")
poor_result = predict_credit_score(model_data, poor_profile)
print(f"Credit Score: {poor_result['score']}")
print(f"Grade: {poor_result['grade']}")
print(f"Default Probability: {poor_result['default_probability']:.4f}")
print(f"Recommendation: {poor_result['recommendation']}")
print(f"Rate Range: {poor_result['rate_range']}")
print("Component Breakdown:")
for component, grade in poor_result['breakdown'].items():
    print(f"  - {component}: {grade}")
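
The three report blocks above print the same fields for each profile. A small formatter, sketched below, would keep them in sync. The name `format_result` is hypothetical, and the example dictionary is made up but uses the same keys that `predict_credit_score` returns.

```python
def format_result(title, result):
    """Render one prediction result as a multi-line text report."""
    lines = [
        f"{title}:",
        f"Credit Score: {result['score']}",
        f"Grade: {result['grade']}",
        f"Default Probability: {result['default_probability']:.4f}",
        f"Recommendation: {result['recommendation']}",
        f"Rate Range: {result['rate_range']}",
        "Component Breakdown:",
    ]
    lines += [f"  - {name}: {grade}" for name, grade in result["breakdown"].items()]
    return "\n".join(lines)


# Made-up result dictionary, for illustration only.
example = {
    "score": 700,
    "grade": "B",
    "default_probability": 0.2727,
    "recommendation": "Approved",
    "rate_range": "5.50% - 6.67%",
    "breakdown": {"income_stability": "B+", "payment_consistency": "C"},
}
report = format_result("Example Profile", example)
print(report)
```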

## Custom Input for Testing

You can use this section to test the model with your own custom inputs.

In [None]:
# Define your custom profile here
custom_profile = {
    'emp_length': 5,                # Years of employment
    'annual_inc': 80000,            # Annual income
    'verification_status': 1,       # Verification status (0=Not Verified, 1=Verified, 2=Source Verified)
    'delinq_2yrs': 0,               # Number of delinquencies in past 2 years
    'pub_rec': 0,                   # Number of public records
    'revol_util': 30,               # Revolving utilization percentage
    'home_ownership': 'MORTGAGE',   # Home ownership status (OWN, MORTGAGE, RENT, OTHER)
    'mort_acc': 2,                  # Number of mortgage accounts
    'dti': 15,                      # Debt-to-income ratio
    'open_acc': 5,                  # Number of open accounts
    'total_acc': 12,                # Total number of accounts
    'inq_last_6mths': 1             # Number of inquiries in last 6 months
}

# Make prediction
print("Custom Profile:")
custom_result = predict_credit_score(model_data, custom_profile)
print(f"Credit Score: {custom_result['score']}")
print(f"Grade: {custom_result['grade']}")
print(f"Default Probability: {custom_result['default_probability']:.4f}")
print(f"Recommendation: {custom_result['recommendation']}")
print(f"Rate Range: {custom_result['rate_range']}")
print("Component Breakdown:")
for component, grade in custom_result['breakdown'].items():
    print(f"  - {component}: {grade}")

## Conclusion

This notebook has demonstrated how to build an alternative credit scoring model using machine learning techniques. The model incorporates features aligned with the four main categories specified in the project requirements:

1. Income Stability (35%)
2. Payment Consistency (30%)
3. Asset Value (20%)
4. Behavioral Factors (15%)

The model can be used to predict credit scores for individuals with limited or no traditional credit history, providing a more inclusive approach to credit assessment.

### Key Components:

1. **Data Preprocessing**: Handling missing values, encoding categorical variables, and scaling numerical features.
2. **Feature Engineering**: Creating derived features that align with the four main categories.
3. **Model Training**: Training multiple models and selecting the best performer.
4. **Inference**: Providing a function to predict credit scores for new applicants.

### Next Steps:

1. **Model Refinement**: Further tune the model parameters for better performance.
2. **Additional Features**: Incorporate more alternative data sources as they become available.
3. **Fairness Analysis**: Ensure the model is fair and unbiased across different demographic groups.
4. **Deployment**: Integrate the model into a production environment for real-time scoring.
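
As a starting point for the fairness analysis, approval rates can be compared across groups. The sketch below computes the rate per group and the ratio of the lowest to the highest rate. A common rule of thumb treats ratios below 0.8 as a signal to investigate further. The group labels and decisions are made up, and the function names are hypothetical. This is one simple check under these assumptions, not a complete fairness audit.

```python
from collections import defaultdict


def approval_rates(records):
    """Approval rate per group from (group, approved) pairs."""
    counts = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for group, approved in records:
        counts[group][0] += int(approved)
        counts[group][1] += 1
    return {group: approved / total for group, (approved, total) in counts.items()}


def disparity_ratio(rates):
    """Lowest approval rate divided by the highest approval rate."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


# Made-up decisions: 8 of 10 approved in one group, 6 of 10 in the other.
records = (
    [("group_a", True)] * 8 + [("group_a", False)] * 2
    + [("group_b", True)] * 6 + [("group_b", False)] * 4
)
rates = approval_rates(records)
print(rates, f"ratio={disparity_ratio(rates):.2f}")
```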