# Credit Adjustment

## Problem Breakdown

 * **Inputs**: Customer covariates, past financial behavior, payment history (last 6 months, customizable), payment amount, utilization rate, default history, and other relevant features.
 * **Outputs**: New credit limits and interest rates over the prediction period (3 months or longer, as specified by the user).
 * **Constraints**:
     * Interest rate range: 22%-60%.
     * Credit limit range: 10,000 – 500,000.
     * Customers must be notified of an interest rate change a month in advance.
     * Rarely decrease interest rates, only for top-performing customers.
 * **Objective**: Maximize profit, which is a combination of interest received from customers and minimizing losses from defaults.

## Model Structure
### Predicting Interest Rates and Credit Limits
To predict both interest rates and credit limits, we’ll need a model that can handle both time-series forecasting and optimization. Here’s a suitable approach:

 * **Reinforcement Learning (RL)**, specifically a **Deep Q-Learning** or **Proximal Policy Optimization (PPO)** method, which is suitable for scenarios involving continuous decision-making over time):
     * **State**: The state includes all customer data such as covariates, payment history, utilization rates, and credit behavior over time.
     * **Action**: Adjust the interest rate or credit limit. These actions must consider the constraints (e.g., notifying a month in advance for interest rate increases or decreases).
     * **Reward**: The reward function is the profit earned by the bank, calculated as the interest received minus default costs. This encourages the model to maximize profit while managing the risk of defaults.
     * **Policy**: The policy learns when and how much to adjust the interest rate or credit limit based on the customer’s financial behavior and covariates.

### Temporal Aspects (Handling Time Dependency)
Because credit updates and notifications occur monthly, the model must handle time dependencies:

 * Recurrent Neural Networks (RNNs) or LSTMs (Long Short-Term Memory) can be used to model the sequential nature of the data (e.g., monthly updates of financial behavior and credit decisions). LSTMs are particularly useful to capture long-term dependencies (e.g., the effect of payment behavior over several months).
 * Time-series forecasting models like ARIMA can be used for predicting financial variables such as utilization rates and repayment amounts.


### Optimization Component
The model also needs an optimization layer to find the ideal balance between profit and risk. This can be handled through:

 * **Profit Function**:
     * The profit function is based on the following factors:
         * Interest Income: `(credit limit * interest rate * utilization rate * repayment schedule)`
         * Cost of Default: `(default rate * credit limit)`
     * The goal is to maximize the interest income while minimizing the cost of defaulting customers.
     * A penalty term can be added for high-risk customers to ensure that credit limits are not too high, leading to defaults.
 * Penalty for Interest Rate Increases:
     * To factor in the cost of increasing interest rates (which could negatively affect customer retention), you can include a penalty term in the objective function when the interest rate is increased.
     * Interest rate decreases should be applied only in rare cases (for top-performing customers). This can be modeled as a constrained optimization problem where interest rate decreases are limited based on a threshold of past performance (e.g., high repayment rates, low utilization rates).


Profit Maximization Function
Your profit function is based on:

 * **Interest Income**: Interest Income = Credit Limit x Interest rate x Utilization Rate x Repayment Schedule
 * **Cost of Default**: Default cost = Credit Limit x Default Risk
 * **Profit**: Itnerest Income - Default Cost

## Feature Engineering
Key features should be engineered to capture a customer’s financial behavior and repayment risk. These might include:

 * **Utilization Rate**: Ratio of credit used to the total credit limit.
 * **Repayment Behavior**: Average repayment amount over the past 6 months, delinquency counts, etc.
 * **Default Risk**: A predicted probability of default based on past behavior, financial stability, and other risk factors.
 * **Income and Employment**: These may provide insights into the customer’s repayment capacity.
 * **Historical Interest Rates**: To assess how past interest rates have influenced repayment behavior and defaults.
 * **Macro-economic Factors**: Inflation, employment rates, and other factors that might affect repayment behavior and credit demand.
 * **Behavioral Variables**: These could include the frequency of transactions, types of transactions (e.g., large vs. small purchases), and social network data (as a proxy for financial stability).

## Training the Model
### Training Data
You will need a dataset with the following characteristics:
 * Historical payment data (e.g., credit limits, repayment amounts, interest rates).
 * Customer-level information (e.g., income, employment status, default history).
 * Macro-economic variables (e.g., inflation, GDP growth).
### Model Architecture

 * **Input Layer**: Customer covariates, past financial behavior, macroeconomic variables.
 * **Hidden Layers**:
     * LSTMs for capturing sequential patterns in customer behavior.
     * Dense layers to learn non-linear interactions between customer features.
 * **Output Layer**: Predict the interest rate and credit limit for each customer.
 * **Reinforcement Learning Layer**: Use RL to optimize the interest rate and credit limit updates based on the bank’s profit objectives.

### Loss Function
The loss function should include:

 * A profit-maximization component that encourages higher interest income and lower default rates.
 * A regularization term to penalize high-risk actions (e.g., giving too much credit to a risky customer).

## Model Implementation
You can implement this model using a framework like TensorFlow or PyTorch for deep learning and reinforcement learning. For time-series forecasting, you can integrate models like Prophet or ARIMA.
 * RL libraries such as OpenAI's Gym or Stable-Baselines can help in building the reinforcement learning component.
 * The optimization function can be implemented using SciPy’s optimization package or any modern gradient-based optimizers (e.g., Adam).

In [1]:
import numpy as np
import pandas as pd
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO
from sklearn.model_selection import train_test_split, KFold
import matplotlib.pyplot as plt

# Define the environment for Reinforcement Learning
class CreditEnv(gym.Env):
    """
    Custom Environment for predicting credit limit and interest rate, based on customer data.
    This environment is built to maximize profit for the bank.
    """
    def __init__(self, data, n_months=6):
        super(CreditEnv, self).__init__()

        # Customer data
        self.data = data
        self.n_months = n_months
        self.current_step = 0
        
        # Observation space: customer features (normalized between 0 and 1)
        self.observation_space = spaces.Box(low=0, high=1, shape=(self.data.drop(columns=['customer_id']).shape[1],), dtype=np.float32)
        
        # Action space: credit limit and interest rate (continuous values)
        self.action_space = spaces.Box(low=np.array([10000, 22]), high=np.array([500000, 60]), dtype=np.float32)
        
    def reset(self, seed=None, options=None):
        """
        Reset the environment at the beginning of each episode.
        """
        self.current_step = 0
        if seed is not None:
            np.random.seed(seed)
        
        # Return observation and an empty info dictionary
        return self.data.drop(columns=['customer_id']).iloc[self.current_step].values, {}
    
    def step(self, action):
        """
        Take a step in the environment based on the action (credit limit and interest rate).
        """
        # Extract customer data for current step
        customer = self.data.iloc[self.current_step]
        
        # Unpack action (credit_limit, interest_rate)
        credit_limit, interest_rate = action
        
        # --- Custom profit function ---
        utilization_rate = customer['utilization_rate']
        repayment_schedule = customer['repayment_schedule']  # Binary: 1 for payment, 0 for no payment
        default_risk = customer['default_risk']
        
        # Interest income and default cost calculation
        interest_income = credit_limit * interest_rate * utilization_rate * repayment_schedule
        default_cost = credit_limit * default_risk
        
        # Calculate profit as the reward
        profit = interest_income - default_cost
        
        # --- Reward is the calculated profit ---
        reward = profit
        
        # Move to the next customer in the dataset
        self.current_step += 1
        done = self.current_step >= len(self.data)
        
        # Get the next observation (next customer) or reset
        if done:
            obs, _ = self.reset()
        else:
            obs = self.data.drop(columns=['customer_id']).iloc[self.current_step].values
        
        # Return five values: observation, reward, done, truncated (False), and info
        return obs, reward, done, False, {}
    
    def render(self, mode='human'):
        """
        Render the environment (optional for debugging).
        """
        pass

# Generate the toy data with time-varying variables
def generate_toy_data_with_time_varying(n_obs=1000, n_months=6):
    """
    Generate toy data with time-varying variables for n_obs customers over n_months.
    """
    time_varying_data = {
        'customer_id': np.repeat(np.arange(1, n_obs + 1), n_months),
        'month': np.tile(np.arange(1, n_months + 1), n_obs),
        'monthly_income': np.random.randint(5000, 20000, size=n_obs * n_months),
        'credit_limit': np.random.randint(10000, 500000, size=n_obs * n_months),
        'interest_rate': np.random.uniform(22, 60, size=n_obs * n_months),
        'payment_history': np.random.uniform(0.5, 1.5, size=n_obs * n_months),
        'utilization_rate': np.random.uniform(0.2, 0.95, size=n_obs * n_months),
        'default_risk': np.random.uniform(0, 0.5, size=n_obs * n_months),
        'repayment_amount': np.random.randint(10000, 200000, size=n_obs * n_months),
        'repayment_schedule': np.random.choice([0, 1], size=n_obs * n_months),  # Binary repayment schedule
        'financial_behavior_score': np.random.uniform(300, 850, size=n_obs * n_months)
    }
    toy_data = pd.DataFrame(time_varying_data)
    return toy_data

# Split the dataset into training, testing, and holdout samples based on customer_id
def split_full_data_by_customer_id(data, train_size=0.6, test_size=0.2):
    # Split customer ids to ensure all observations of each customer are in one split
    unique_customers = data['customer_id'].unique()
    
    # Split into training and remaining (test + holdout)
    train_customers, remaining_customers = train_test_split(unique_customers, train_size=train_size, random_state=42)
    
    # Split remaining into test and holdout
    test_customers, holdout_customers = train_test_split(remaining_customers, test_size=test_size/(1-train_size), random_state=42)
    
    # Create datasets based on the customer splits
    train_data = data[data['customer_id'].isin(train_customers)]
    test_data = data[data['customer_id'].isin(test_customers)]
    holdout_data = data[data['customer_id'].isin(holdout_customers)]
    
    return train_data, test_data, holdout_data

# Function to normalize data, excluding the 'customer_id' column
def normalize_data(data, features_to_normalize):
    data_normalized = data.copy()
    data_normalized[features_to_normalize] = (data[features_to_normalize] - data[features_to_normalize].min()) / (data[features_to_normalize].max() - data[features_to_normalize].min())
    return data_normalized

# Function to predict for each customer over multiple months
def predict_for_all_customers(model, env, future_months=6):
    """
    Predict credit limit and interest rate for all customers for the next 'future_months'.
    """
    predictions = []
    
    # Loop over each customer
    for customer_id in env.data['customer_id'].unique():
        obs, _ = env.reset()  # Reset for each customer
        for month in range(future_months):
            action, _states = model.predict(obs)
            obs, reward, done, truncated, info = env.step(action)
            predictions.append({
                'Customer ID': customer_id,
                'Month': month + 1,
                'Predicted Credit Limit': action[0],
                'Predicted Interest Rate': action[1],
                'Profit (Reward)': reward
            })
            if done:
                break
            
    return pd.DataFrame(predictions)

# Function for counterfactual analysis
def counterfactual_analysis(model, env, holdout_data, future_months=6):
    """
    Perform counterfactual analysis by comparing predicted values with actual values from the holdout data.
    """
    predicted_df = predict_for_all_customers(model, env, future_months=future_months)
    
    # Get the true values from the holdout dataset
    true_values = holdout_data[['customer_id', 'credit_limit', 'interest_rate']]
    
    # Merge predicted and actual values
    comparison_df = pd.merge(predicted_df, true_values, left_on='Customer ID', right_on='customer_id', how='left')
    
    # Calculate profit difference and drop NaN values (handle missing values)
    comparison_df['Profit Difference'] = comparison_df['Profit (Reward)'] - (
        comparison_df['credit_limit'] * comparison_df['interest_rate'] * 
        holdout_data['utilization_rate'] * holdout_data['repayment_schedule'] - 
        comparison_df['credit_limit'] * holdout_data['default_risk']
    )
    
    # Drop rows with NaN values in 'Profit Difference'
    comparison_df = comparison_df.dropna(subset=['Profit Difference'])
    
    # Plot the profit differences
    plt.figure(figsize=(10, 6))
    plt.hist(comparison_df['Profit Difference'], bins=50, alpha=0.7, color='blue', label='Profit Difference')
    plt.title('Profit Difference Between Predicted and Actual')
    plt.xlabel('Profit Difference')
    plt.ylabel('Frequency')
    plt.legend()
    plt.show()
    
    return comparison_df

# Generate toy data
n_obs = 1000
n_months = 6
data = generate_toy_data_with_time_varying(n_obs=n_obs, n_months=n_months)

# Split into training, testing, and holdout sets based on customer_id
train_data, test_data, holdout_data = split_full_data_by_customer_id(data)

# Select relevant features for RL environment, including 'customer_id'
features = ['customer_id', 'monthly_income', 'payment_history', 'utilization_rate', 'default_risk',
            'repayment_amount', 'repayment_schedule', 'financial_behavior_score']
train_for_rl = train_data[features]

# Normalize the data for RL environment, except for 'customer_id'
features_to_normalize = train_for_rl.columns.drop('customer_id')
train_for_rl = normalize_data(train_for_rl, features_to_normalize)

# Create the environment for training
env = CreditEnv(train_for_rl)

# Initialize the PPO model
model = PPO('MlpPolicy', env, verbose=1)

# Cross-validation setup
kf = KFold(n_splits=5)
for train_index, test_index in kf.split(train_for_rl['customer_id'].unique()):
    train_customers = train_for_rl['customer_id'].unique()[train_index]
    test_customers = train_for_rl['customer_id'].unique()[test_index]
    
    # Create training and test folds based on customer_id
    train_fold = train_for_rl[train_for_rl['customer_id'].isin(train_customers)]
    test_fold = train_for_rl[train_for_rl['customer_id'].isin(test_customers)]
    
    # Create environment for this training fold
    env = CreditEnv(train_fold)
    
    # Train the model on the current fold
    model.learn(total_timesteps=10000)

# After training on all folds, perform counterfactual analysis with the holdout data and predict for 6 months
comparison_df = counterfactual_analysis(model, env, holdout_data, future_months=6)

# Display the comparison DataFrame with predicted and actual values
print(comparison_df)


  gym.logger.warn(f"Box bound precision lowered by casting to {self.dtype}")


Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.


  gym.logger.warn(f"Box bound precision lowered by casting to {self.dtype}")


-----------------------------
| time/              |      |
|    fps             | 767  |
|    iterations      | 1    |
|    time_elapsed    | 2    |
|    total_timesteps | 2048 |
-----------------------------


-------------------------------------------
| rollout/                |               |
|    ep_len_mean          | 3.6e+03       |
|    ep_rew_mean          | 1.78e+08      |
| time/                   |               |
|    fps                  | 524           |
|    iterations           | 2             |
|    time_elapsed         | 7             |
|    total_timesteps      | 4096          |
| train/                  |               |
|    approx_kl            | 1.8044375e-09 |
|    clip_fraction        | 0             |
|    clip_range           | 0.2           |
|    entropy_loss         | -2.84         |
|    explained_variance   | -3.58e-07     |
|    learning_rate        | 0.0003        |
|    loss                 | 3.36e+11      |
|    n_updates            | 10            |
|    policy_gradient_loss | -1.97e-06     |
|    std                  | 1             |
|    value_loss           | 6.66e+11      |
-------------------------------------------


-------------------------------------------
| rollout/                |               |
|    ep_len_mean          | 3.6e+03       |
|    ep_rew_mean          | 1.78e+08      |
| time/                   |               |
|    fps                  | 452           |
|    iterations           | 3             |
|    time_elapsed         | 13            |
|    total_timesteps      | 6144          |
| train/                  |               |
|    approx_kl            | 1.8626451e-09 |
|    clip_fraction        | 0             |
|    clip_range           | 0.2           |
|    entropy_loss         | -2.84         |
|    explained_variance   | 6.56e-07      |
|    learning_rate        | 0.0003        |
|    loss                 | 3.6e+11       |
|    n_updates            | 20            |
|    policy_gradient_loss | -2.46e-06     |
|    std                  | 1             |
|    value_loss           | 7.55e+11      |
-------------------------------------------


-------------------------------------------
| rollout/                |               |
|    ep_len_mean          | 3.6e+03       |
|    ep_rew_mean          | 1.78e+08      |
| time/                   |               |
|    fps                  | 459           |
|    iterations           | 4             |
|    time_elapsed         | 17            |
|    total_timesteps      | 8192          |
| train/                  |               |
|    approx_kl            | 8.1490725e-10 |
|    clip_fraction        | 0             |
|    clip_range           | 0.2           |
|    entropy_loss         | -2.84         |
|    explained_variance   | 5.96e-08      |
|    learning_rate        | 0.0003        |
|    loss                 | 3.44e+11      |
|    n_updates            | 30            |
|    policy_gradient_loss | -1.64e-06     |
|    std                  | 1             |
|    value_loss           | 7.02e+11      |
-------------------------------------------


KeyboardInterrupt: 

1. Soft Actor-Critic (SAC)
Why: SAC is a robust and efficient model for continuous action spaces, and your problem involves continuous variables like credit limits and interest rates. SAC can handle the continuous nature of your objectives while optimizing for a balance between exploration and exploitation using entropy maximization.
How: You can define the credit limit and interest rate as continuous actions and use the reward function to optimize for profit, factoring in credit risk, default rates, and customer retention. You can also incorporate penalty mechanisms for defaults or higher risk.
2. Proximal Policy Optimization (PPO)
Why: PPO is a popular choice for stability and ease of use in multi-objective tasks. It can handle continuous prediction tasks like yours while ensuring policy updates remain stable.
How: You could model credit limit and interest rate predictions as two separate outputs within the same policy. The reward function would encourage balancing risk, maximizing profit, and managing default probabilities.
3. Multi-Objective Reinforcement Learning (MORL) with SAC or PPO
Why: Since you have multiple objectives (profit maximization and minimizing risk of default), a multi-objective approach with SAC or PPO can be adapted to handle both objectives simultaneously. This approach allows you to balance the competing trade-offs, for example, between higher credit limits (higher potential profit but also higher risk) and lower interest rates (to retain customers but maintain profitability).
How: The reward function could use a weighted sum of objectives or a Pareto front approach to optimize both the credit limit and interest rate simultaneously. This would allow you to manage trade-offs effectively.
4. Hierarchical Reinforcement Learning (HRL)
Why: In your case, predicting credit limits and interest rates involves multiple decisions over time (both short-term and long-term). HRL can model this by breaking down tasks into sub-tasks: determining an optimal credit limit first, then optimizing the interest rate based on that decision.
How: HRL can handle your task by learning higher-level decisions (credit limit) and lower-level decisions (interest rate) hierarchically. This approach would allow you to optimize for each decision level while considering long-term profit and default risk.
5. Multi-Agent Reinforcement Learning (MARL)
Why: If you want to model each customer as an agent with individual characteristics and optimize for each agent’s credit limit and interest rate, MARL might be a good approach. It allows for modeling interactions between agents (e.g., customers with different risk profiles).
How: Each customer (or agent) could have their own policy determining credit limits and interest rates, while the global system optimizes overall profit and risk at the portfolio level.


Recommendation

Soft Actor-Critic (SAC) or PPO with a multi-objective reward function is likely the best fit for your use case. These models are well-suited for continuous control problems and can handle both the credit limit and interest rate predictions simultaneously.