
---

### Name: Isidora Gajic  
### Course: AI for Finance  
### Assignment 10: Fine-Tuning PLM for Monetary Policy Stance Classification
### Date: October 29, 2024  

---

### Disclaimer:

This notebook was developed with the assistance of ChatGPT, an AI language model. While the majority of the code was generated with the help of ChatGPT, the conceptualization of the analysis, selection of specific metrics, and the overall approach were directed by me. I reviewed and tweaked the code to ensure it aligns with the objectives of the assignment and meets the required standards.

The analysis and conclusions presented in this document are my own. ChatGPT was used to improve the cohesiveness and fluency of my original writing and to convert text from a Word document into Markdown. All final interpretations, decisions, and the overall approach to the analysis are entirely mine.

---

### **Step 0: Background Research**

#### **2. Defining Hawkish and Dovish in Monetary Policy**

In the context of monetary policy, **"hawkish"** and **"dovish"** describe the stance of a central bank, such as the Federal Reserve, toward inflation, economic growth, and interest rates. Financial markets use these labels to interpret and react to central bank communications.

- **Hawkish**:
  - **Definition**: A hawkish stance refers to a monetary policy approach that prioritizes controlling inflation over stimulating economic growth. Hawks are concerned that excessive economic growth could lead to high inflation.
  - **Characteristics**:
    - **Interest Rates**: Advocates for higher interest rates to cool down an overheating economy.
    - **Policy Actions**: Supports tightening monetary policy by reducing the money supply or ending asset purchase programs.
    - **Implications for Markets**:
      - **Bonds**: Higher interest rates can lead to lower bond prices.
      - **Stocks**: May negatively affect stock markets due to increased borrowing costs.
      - **Currency**: Can strengthen the national currency as higher rates attract foreign investment.

- **Dovish**:
  - **Definition**: A dovish stance emphasizes stimulating economic growth and reducing unemployment over controlling inflation. Doves are less concerned about inflation and more focused on supporting the economy.
  - **Characteristics**:
    - **Interest Rates**: Favors lower interest rates to encourage borrowing and investment.
    - **Policy Actions**: Supports easing monetary policy by increasing the money supply or initiating asset purchase programs.
    - **Implications for Markets**:
      - **Bonds**: Lower interest rates can lead to higher bond prices.
      - **Stocks**: Often boosts stock markets due to cheaper borrowing costs.
      - **Currency**: May weaken the national currency as lower rates can reduce foreign investment appeal.

**Relation to Federal Reserve Communications**:

The Federal Reserve uses various communication tools—such as meeting minutes, speeches, and press releases—to signal its monetary policy stance. Market participants closely analyze this language to gauge future policy actions.

- **Hawkish Communications**:
  - **Signals**: Indicate concerns about inflation and suggest potential interest rate hikes.
  - **Market Reaction**: Can lead to increased volatility, as investors adjust expectations for tighter monetary conditions.

- **Dovish Communications**:
  - **Signals**: Highlight concerns about economic slowdown or unemployment, suggesting that interest rates may remain low or decrease.
  - **Market Reaction**: Often leads to stock market rallies and lower bond yields, as investors anticipate more accommodative monetary policy.

Understanding these terms is crucial for investors, policymakers, and economists, as they directly impact financial market dynamics and economic forecasting.

---

#### **3. Key Monetary Policy Events (1996-September 2024)**

**1. The Dot-Com Bubble Burst and Federal Reserve Actions (Late 1990s - Early 2000s)**

- **Overview**:
  - During the late 1990s, the U.S. economy experienced rapid growth, particularly in technology and internet-related stocks, leading to the "dot-com bubble."
  - The Federal Reserve, led by Chairman Alan Greenspan, increased the federal funds rate multiple times to prevent the economy from overheating.

- **Impact**:
  - **Interest Rate Hikes**: Between June 1999 and May 2000, the Fed raised rates six times, from 4.75% to 6.5%.
  - **Market Reaction**: The higher interest rates contributed to the bursting of the dot-com bubble in 2000, leading to significant stock market losses and an economic slowdown.

**2. Federal Reserve Response to the 9/11 Attacks (2001)**

- **Overview**:
  - The September 11, 2001 terrorist attacks caused immediate economic uncertainty and financial market disruptions.
  - The Federal Reserve acted swiftly to stabilize the economy.

- **Impact**:
  - **Emergency Rate Cuts**: The Fed cut the federal funds rate from 3.5% to 3% shortly after the attacks and continued reducing it to 1.75% by the end of 2001.
  - **Liquidity Measures**: Implemented policies to ensure liquidity in financial markets.
  - **Economic Support**: These actions helped restore confidence and supported economic recovery.

**3. The 2008 Financial Crisis and Quantitative Easing (2007-2009)**

- **Overview**:
  - A collapse in the U.S. housing market led to a global financial crisis, with major financial institutions facing insolvency.
  - The Federal Reserve, under Chairman Ben Bernanke, took unprecedented actions.

- **Impact**:
  - **Interest Rates**: Reduced the federal funds rate to near zero (0-0.25%) by December 2008.
  - **Quantitative Easing (QE)**:
    - **QE1 (2008-2010)**: The Fed began purchasing large amounts of mortgage-backed securities and Treasuries to inject liquidity.
    - **Objective**: Lower long-term interest rates, support mortgage lending, and stimulate the economy.
  - **Market Reaction**: These measures helped stabilize financial markets but also raised concerns about long-term inflation and asset bubbles.

**4. Monetary Policy Response to the COVID-19 Pandemic (2020)**

- **Overview**:
  - The COVID-19 pandemic led to a sudden economic shutdown and a sharp downturn.

- **Impact**:
  - **Emergency Rate Cuts**: In March 2020, the Fed cut the federal funds rate to 0-0.25%.
  - **Quantitative Easing**:
    - Announced unlimited QE to purchase Treasury securities and mortgage-backed securities.
  - **Additional Measures**:
    - **Emergency Lending Facilities**: Established programs to support businesses, municipalities, and financial markets.
    - **Forward Guidance**: Signaled that rates would remain low until the economy showed signs of recovery.
  - **Market Reaction**: Stabilized financial markets and supported economic activity, but raised concerns about long-term debt and inflation.

**5. Inflation Surge and Policy Shift Toward Tightening (2021-2023)**

- **Overview**:
  - Post-pandemic recovery led to supply chain disruptions, labor shortages, and significant fiscal stimulus, contributing to rising inflation.

- **Impact**:
  - **Shift in Stance**: The Fed signaled a move from a dovish to a more hawkish stance to address inflation.
  - **Interest Rate Increases**:
    - Began raising rates in 2022, with multiple hikes throughout the year.
  - **Reducing Balance Sheet**:
    - Initiated plans to reduce the Fed's balance sheet by ceasing reinvestments.
  - **Market Reaction**:
    - Increased market volatility, concerns over potential economic slowdown, and adjustments in asset valuations.

**6. Federal Reserve's Half-Percentage-Point Rate Cut (September 2024)**

- **Overview**:
  - On **September 18, 2024**, the Federal Reserve cut the federal funds rate by **half a percentage point**, bringing it to a range between **4.75% and 5%**.
  - This was the **first rate cut since 2020**, marking a significant shift from the Fed's previous focus on combating inflation to supporting the labor market.
  - The decision was more aggressive than most analysts anticipated; many expected a smaller **quarter-point** reduction.

- **Impact**:
  - **Monetary Policy Shift**:
    - The rate cut signaled the Fed's commitment to preventing a gentle cooling in the labor market from turning into a deeper slowdown.
    - Fed Chair **Jerome Powell** stated that the decision reflects growing confidence in maintaining economic strength with an appropriate policy recalibration.
  - **Market Reaction**:
    - **Immediate Relief**: The cut provided immediate relief to consumers with credit-card balances and small businesses with variable-rate debt.
    - **Stock Market**: Stocks initially rose following the announcement but ended the day lower, indicating mixed investor sentiment.
    - **Bond Market**: Long-term borrowing costs had been declining in anticipation of rate cuts, affecting mortgages and corporate debt.
  - **Economic Indicators**:
    - **Unemployment Rate**: Rose to **4.2%**, up from **3.7%** in January 2024, indicating a softening labor market.
    - **Inflation**: Had fallen over the past year, reducing pressure on the Fed to maintain higher interest rates.
  - **Policy Implications**:
    - **Future Rate Cuts**: Projections indicated potential additional cuts in November and December 2024.
    - **Risk Management**: The larger-than-expected cut was seen as a proactive measure to mitigate the risk of an economic downturn.
    - **Fed Communications**: Emphasized maintaining the strength of the labor market and the overall economy through adjusted monetary policy.

---


### Importing Necessary Libraries

The following libraries are imported for data manipulation, model training, and evaluation:

- **pandas** and **numpy** for data handling.
- **matplotlib** and **seaborn** for data visualization.
- **torch** and **torch.utils.data** for building datasets.
- **transformers** from Hugging Face for model and tokenizer.
- **sklearn** for evaluation metrics and data splitting.


In [1]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# PyTorch libraries
import torch
from torch.utils.data import Dataset

# Hugging Face transformers
from transformers import RobertaTokenizer, RobertaForSequenceClassification, Trainer, TrainingArguments

# Sklearn for evaluation metrics and data splitting
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, f1_score

# Set random seed for reproducibility
import random
import os

def set_seed(seed):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(0)


### Loading the Data

The labeled datasets for hawkish-dovish classification are loaded:

- **Training Data**: `lab-manual-mm-train-5768.xlsx`
- **Testing Data**: `lab-manual-mm-test-5768.xlsx`


In [None]:
# Loading the training data
train_df = pd.read_excel('/content/drive/My Drive/Computers/My Mac/Desktop/AI_for_Finance/Assignment 10/lab-manual-mm-train-5768.xlsx')

# Loading the testing data
test_df = pd.read_excel('/content/drive/My Drive/Computers/My Mac/Desktop/AI_for_Finance/Assignment 10/lab-manual-mm-test-5768.xlsx')

# Display the columns and number of samples
print("Columns in train_df:", train_df.columns.tolist())
print("Columns in test_df:", test_df.columns.tolist())

print("Number of samples in train_df:", len(train_df))
print("Number of samples in test_df:", len(test_df))


### Data Preprocessing

- **Missing Values**: Missing values are counted, and rows with a missing sentence or label are dropped to ensure data integrity.
- **Label Mapping**: Labels are then cast to integers for model compatibility. The cast comes after the drop because `astype(int)` raises an error on missing values.


In [None]:
# Check for missing values in 'sentence' and 'label' columns
print("Training Data Missing Values:\n", train_df[['sentence', 'label']].isnull().sum())
print("Testing Data Missing Values:\n", test_df[['sentence', 'label']].isnull().sum())

# Drop rows with missing values first (astype(int) fails on NaN), then reset the index
train_df = train_df.dropna(subset=['sentence', 'label']).reset_index(drop=True)
test_df = test_df.dropna(subset=['sentence', 'label']).reset_index(drop=True)

# Ensure that 'label' is of integer type
train_df['label'] = train_df['label'].astype(int)
test_df['label'] = test_df['label'].astype(int)

# Verify the number of samples after cleaning
print("Number of samples in train_df after cleaning:", len(train_df))
print("Number of samples in test_df after cleaning:", len(test_df))


### Exploring Label Distribution

- Visualized the distribution of labels in the training data to check for class imbalance.


In [None]:
# Explore label distribution in training data
label_counts = train_df['label'].value_counts()
print("Label distribution in training data:\n", label_counts)

# Plotting label distribution
plt.figure(figsize=(6,4))
sns.barplot(x=label_counts.index, y=label_counts.values)
plt.title('Label Distribution in Training Data')
plt.xlabel('Labels')
plt.ylabel('Count')
plt.show()


### Setting Up the RoBERTa Model

- **Tokenizer**: Loaded `RobertaTokenizer` for tokenizing the text data.
- **Model**: Loaded `RobertaForSequenceClassification` with 3 output labels corresponding to hawkish, dovish, and neutral classes.


In [None]:
# Loading the tokenizer
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

# Loading the pre-trained model for sequence classification
model = RobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=3)


### Tokenization

The tokenizer is configured with the following parameters:

- `padding='max_length'`: Pads all sequences to the maximum length specified.
- `truncation=True`: Truncates sequences longer than the maximum length.
- `max_length=256`: Sets the maximum sequence length.
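
The effect of these three settings on a single sequence can be sketched without loading the tokenizer. The snippet below is a simplified illustration, not the Hugging Face implementation: `pad_or_truncate` is a hypothetical helper, the token IDs are made up, and the real tokenizer also keeps the closing special token when it truncates. The default padding ID of 1 is assumed here because it is the `<pad>` ID in `roberta-base`.

```python
def pad_or_truncate(token_ids, max_length, pad_id=1):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    # truncation=True: drop everything beyond max_length
    kept = token_ids[:max_length]
    # padding='max_length': fill the remainder with the padding ID
    n_pad = max_length - len(kept)
    input_ids = kept + [pad_id] * n_pad
    # The attention mask is 1 for real tokens and 0 for padding
    attention_mask = [1] * len(kept) + [0] * n_pad
    return input_ids, attention_mask


# A short sequence is padded; a long one is cut off (made-up token IDs)
short_ids, short_mask = pad_or_truncate([0, 713, 16, 2], max_length=6)
long_ids, long_mask = pad_or_truncate(list(range(10, 20)), max_length=6)

print(short_ids, short_mask)  # [0, 713, 16, 2, 1, 1] [1, 1, 1, 1, 0, 0]
print(long_ids, long_mask)    # [10, 11, 12, 13, 14, 15] [1, 1, 1, 1, 1, 1]
```

With `max_length=256`, every sentence becomes a 256-element vector regardless of its original length, which is what allows sentences to be stacked into batches.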


In [None]:
# Tokenization parameters
tokenizer_kwargs = {
    'padding': 'max_length',  # Pad sequences to the maximum length
    'truncation': True,       # Truncate sequences longer than the maximum length
    'max_length': 256,        # Maximum sequence length
    'return_tensors': 'pt'    # Return PyTorch tensors
}


**Why is tokenization important for models like RoBERTa?**

Tokenization is crucial for models like RoBERTa because it transforms raw text into the numerical input IDs that the model processes. RoBERTa uses byte-level Byte-Pair Encoding (BPE), which splits rare or unseen words into smaller subword units from its vocabulary, so any input string can be represented. Proper tokenization ensures:

- **Consistency**: Text data is consistently formatted, allowing the model to learn effectively.
- **Subword Coverage**: Splitting a rare word into known pieces keeps part of its meaning (for example, a shared stem or suffix) instead of collapsing it into a single unknown token.
- **Efficiency**: Fixed-length inputs allow for efficient batch processing.
- **Model Compatibility**: The model expects inputs in a specific tokenized format.
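
How BPE breaks a word into known pieces can be shown with a toy example. The merge rules below are invented for illustration and are far smaller than RoBERTa's learned merge table, which also operates on bytes rather than characters. `bpe_encode` is a hypothetical helper, not a library function.

```python
def bpe_encode(word, merges):
    """Apply BPE merge rules to one word, lowest-ranked (earliest) rule first."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        # Pick the adjacent pair whose merge rule was learned earliest
        best = min(pairs, key=lambda pair: ranks.get(pair, float("inf")))
        if best not in ranks:
            break  # no applicable merge rule is left
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical merge table, ordered from most to least frequent pair
toy_merges = [("h", "a"), ("w", "k"), ("ha", "wk"), ("i", "s"), ("is", "h")]

print(bpe_encode("hawkish", toy_merges))  # ['hawk', 'ish']
print(bpe_encode("dovish", toy_merges))   # ['d', 'o', 'v', 'ish']
```

Both words end up sharing the `ish` piece, and the word with no dedicated merge rules still gets a representation by falling back to single characters.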

---

### Preparing the Custom Dataset

A custom `FOMCDataset` class is defined to:

- **Initialize**: Store texts, labels, tokenizer, and tokenization parameters.
- **`__len__`**: Return the total number of samples.
- **`__getitem__`**: Retrieve an item by index, tokenize the text, and return input tensors and labels.


In [None]:
class FOMCDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, tokenizer_kwargs):
        self.texts = texts.reset_index(drop=True)
        self.labels = labels.reset_index(drop=True)
        self.tokenizer = tokenizer
        self.tokenizer_kwargs = tokenizer_kwargs

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        text = str(self.texts[idx])
        label = self.labels[idx]

        encoding = self.tokenizer(text, **self.tokenizer_kwargs)

        # Remove the extra dimension added by 'return_tensors' for input_ids and attention_mask
        item = {key: val.squeeze(0) for key, val in encoding.items()}
        item['labels'] = torch.tensor(label, dtype=torch.long)

        return item


### Splitting the Data

- **Data Split**: The training data is split into training and validation sets (80/20 split).
- **Labels**: Ensured that labels are integers for model compatibility.
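
The idea behind the 80/20 holdout split can be sketched in plain Python. This illustrates the technique rather than reproducing `train_test_split`: `holdout_split` is a hypothetical helper, the sentences are placeholders, and its shuffle order will differ from scikit-learn's even with the same seed.

```python
import math
import random


def holdout_split(items, test_size=0.2, seed=0):
    """Shuffle indices with a fixed seed and hold out a fraction for validation."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    n_val = math.ceil(test_size * len(items))  # size of the held-out part
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return [items[i] for i in train_idx], [items[i] for i in val_idx]


sentences = [f"sentence {i}" for i in range(10)]  # placeholder data
train_part, val_part = holdout_split(sentences)
print(len(train_part), len(val_part))  # 8 2
```

Fixing the seed makes the split repeatable, so every hyperparameter combination is scored on the same validation sentences.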


In [None]:
# Splitting the training data into training and validation sets (80/20 split)
train_texts, val_texts, train_labels, val_labels = train_test_split(
    train_df['sentence'],
    train_df['label'],
    test_size=0.2,
    random_state=0
)

# Ensure labels are integers
train_labels = train_labels.astype(int)
val_labels = val_labels.astype(int)
test_labels = test_df['label'].astype(int)


### Creating Dataset Instances

- **Datasets**: Instances of `FOMCDataset` are created for training, validation, and testing data.


In [None]:
# Creating dataset instances
train_dataset = FOMCDataset(train_texts, train_labels, tokenizer, tokenizer_kwargs)
val_dataset = FOMCDataset(val_texts, val_labels, tokenizer, tokenizer_kwargs)
test_dataset = FOMCDataset(test_df['sentence'], test_labels, tokenizer, tokenizer_kwargs)


### Defining Evaluation Metrics

- **Purpose**: Define a function to compute the weighted F1 score during evaluation.
- **Weighted F1 Score**: The average of the per-class F1 scores, with each class weighted by its share of the true labels (its support), so that frequent classes count more than rare ones.
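
As a worked example, the weighted F1 score can be computed by hand for a small set of predictions. The labels below are made up, and `weighted_f1` is a hypothetical helper meant to mirror what `f1_score(..., average='weighted')` computes. It is a sketch for intuition, not a replacement for the scikit-learn function.

```python
def weighted_f1(y_true, y_pred):
    """Support-weighted average of the per-class F1 scores."""
    total = 0.0
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn  # number of true examples of class c
        total += support * f1
    return total / len(y_true)


# Made-up labels: 0 = Dovish, 1 = Hawkish, 2 = Neutral
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 2, 2, 0]

# Per-class F1 is 2/3, 1/2 and 4/5 with supports 3, 2 and 5,
# so the weighted score is (3 * 2/3 + 2 * 1/2 + 5 * 4/5) / 10 = 0.7
print(round(weighted_f1(y_true, y_pred), 4))  # 0.7
```

The unweighted (macro) average of the same three F1 scores is about 0.66, which shows how the largest class pulls the weighted score toward its own F1.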


In [None]:
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=1)
    f1 = f1_score(labels, predictions, average='weighted')
    return {'weighted_f1': f1}


### Hyperparameter Optimization

A grid search is performed over the following hyperparameters (4 × 3 × 3 = 36 training runs):

- **Learning Rates**: `[5e-5, 3e-5, 2e-5, 1e-5]`
- **Batch Sizes**: `[4, 8, 16]`
- **Epochs**: `[3, 5, 10]`

The `Trainer` from Hugging Face is used to train the model with different hyperparameter combinations, optimizing for the highest weighted F1 score.
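
The size of this search space can be checked with `itertools.product`, which enumerates the same combinations as three nested loops. This is only a sketch of the bookkeeping; it does not train anything, and `total_epochs` is a rough cost proxy introduced here for illustration.

```python
from itertools import product

learning_rates = [5e-5, 3e-5, 2e-5, 1e-5]
batch_sizes = [4, 8, 16]
num_epochs = [3, 5, 10]

# Every (learning rate, batch size, epochs) combination in the grid
grid = list(product(learning_rates, batch_sizes, num_epochs))
print(len(grid))  # 36
print(grid[0])    # (5e-05, 4, 3)

# Total number of training epochs across all runs
total_epochs = sum(epochs for _, _, epochs in grid)
print(total_epochs)  # 216
```

Because each of the 36 runs fine-tunes `roberta-base` from scratch, the search costs 216 epochs of training in total.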


In [None]:
# Hyperparameter ranges
learning_rates = [5e-5, 3e-5, 2e-5, 1e-5]
batch_sizes = [4, 8, 16]
num_epochs = [3, 5, 10]

# For tracking results
results = []

# Loop over hyperparameters
for lr in learning_rates:
    for batch_size in batch_sizes:
        for epochs in num_epochs:
            print(f"\nTraining with learning_rate={lr}, batch_size={batch_size}, epochs={epochs}")

            # Re-initialize the model for each run
            model = RobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=3)

            # Update training arguments
            training_args = TrainingArguments(
                output_dir='./results',
                num_train_epochs=epochs,
                per_device_train_batch_size=batch_size,
                per_device_eval_batch_size=batch_size,
                evaluation_strategy='epoch',
                save_strategy='no',
                logging_strategy='epoch',
                learning_rate=lr,
                load_best_model_at_end=False,
                seed=0
            )

            # Initialize the Trainer
            trainer = Trainer(
                model=model,
                args=training_args,
                train_dataset=train_dataset,
                eval_dataset=val_dataset,
                compute_metrics=compute_metrics,
                tokenizer=tokenizer
            )

            # Train the model
            trainer.train()

            # Evaluate on validation set
            eval_result = trainer.evaluate()
            weighted_f1 = eval_result['eval_weighted_f1']
            print(f"Validation Weighted F1 Score: {weighted_f1:.4f}")

            results.append({
                'learning_rate': lr,
                'batch_size': batch_size,
                'epochs': epochs,
                'weighted_f1': weighted_f1
            })


### Analyzing Hyperparameter Results

- **Best Hyperparameters**: Identified as the combination with the highest validation weighted F1 score in the grid search and stored in `best_learning_rate`, `best_batch_size`, and `best_num_epochs`.
- **Results Summary**: All hyperparameter combinations are displayed with their validation scores, sorted from best to worst.


In [None]:
# Convert results to DataFrame
results_df = pd.DataFrame(results)

# Find the best hyperparameters
best_row = results_df.loc[results_df['weighted_f1'].idxmax()]
best_learning_rate = best_row['learning_rate']
best_batch_size = best_row['batch_size']
best_num_epochs = best_row['epochs']

print("\nBest Hyperparameters:")
print(f"Learning Rate: {best_learning_rate}")
print(f"Batch Size: {best_batch_size}")
print(f"Epochs: {best_num_epochs}")
print(f"Best Weighted F1 Score: {best_row['weighted_f1']:.4f}")

# Display the results DataFrame
print("\nHyperparameter Tuning Results:")
print(results_df.sort_values(by='weighted_f1', ascending=False))


### Training the Final Model

- **Best Hyperparameters**: Used the optimal settings identified earlier to train the final model.
- **Training Data**: Combined training and validation sets for final training.
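- **Model Saving**: A checkpoint is saved after every epoch, and the one with the highest weighted F1 on the evaluation set is reloaded at the end of training. Because the validation data has been merged into the training set, the evaluation set in this step is the test set, so the test score reported afterwards also serves as the checkpoint-selection criterion and is likely optimistic.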


In [None]:
# Combine training and validation data
final_train_texts = pd.concat([train_texts, val_texts]).reset_index(drop=True)
final_train_labels = pd.concat([train_labels, val_labels]).reset_index(drop=True)

# Create final training dataset
final_train_dataset = FOMCDataset(final_train_texts, final_train_labels, tokenizer, tokenizer_kwargs)

# Re-initialize the model
model = RobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=3)

# Update training arguments with best hyperparameters
training_args = TrainingArguments(
    output_dir='./best_model',
    num_train_epochs=int(best_num_epochs),
    per_device_train_batch_size=int(best_batch_size),
    per_device_eval_batch_size=int(best_batch_size),
    evaluation_strategy='epoch',
    save_strategy='epoch',
    logging_strategy='epoch',
    learning_rate=best_learning_rate,
    load_best_model_at_end=True,
    metric_for_best_model='weighted_f1',
    seed=0
)

# Initialize the Trainer with best hyperparameters
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=final_train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer
)

# Train the final model
trainer.train()


### Evaluating the Model

- **Predictions**: Made on the test dataset.
- **Metrics Computed**:
  - **Weighted F1 Score**: Evaluates the overall performance.
  - **Classification Report**: Provides precision, recall, and F1 score for each class.


In [None]:
# Make predictions on the test set
predictions_output = trainer.predict(test_dataset)
predictions = predictions_output.predictions
labels = predictions_output.label_ids
predicted_labels = np.argmax(predictions, axis=1)

# Compute the weighted F1 score
test_f1 = f1_score(labels, predicted_labels, average='weighted')
print(f'\nWeighted F1 Score on Test Set: {test_f1:.4f}')

# Generate classification report
print("\nClassification Report:")
print(classification_report(labels, predicted_labels, target_names=['Dovish', 'Hawkish', 'Neutral']))


### Confusion Matrix Analysis

- **Visualization**: The confusion matrix is plotted to visualize the model's performance across classes.
- **Insights**:
  - **Diagonal Elements**: Represent correct predictions.
  - **Off-Diagonal Elements**: Represent misclassifications.
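- **Interpretation**:
  - Large off-diagonal cells show which pairs of classes are most often confused with each other.
  - A column whose total is much larger than the matching row total indicates that the model is biased towards predicting that class.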
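
The way the matrix is read can be illustrated with a hand-built version on made-up labels. `confusion_counts` is a hypothetical helper that follows the same convention as `confusion_matrix` (rows are true classes, columns are predicted classes); the class order is taken from the `target_names` used in this notebook.

```python
def confusion_counts(y_true, y_pred, n_classes):
    """Count predictions; rows index the true class, columns the predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


class_names = ['Dovish', 'Hawkish', 'Neutral']
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]  # made-up labels
y_pred = [0, 0, 1, 1, 2, 2, 2, 2, 2, 0]  # made-up predictions

cm = confusion_counts(y_true, y_pred, n_classes=3)
for i, name in enumerate(class_names):
    recall = cm[i][i] / sum(cm[i])  # diagonal cell divided by the row total
    print(f"{name:8s} {cm[i]} recall={recall:.2f}")
# Dovish   [2, 1, 0] recall=0.67
# Hawkish  [0, 1, 1] recall=0.50
# Neutral  [1, 0, 4] recall=0.80
```

In this toy case, one dovish sentence was predicted as hawkish (row 0, column 1), one hawkish sentence as neutral, and one neutral sentence as dovish.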


In [None]:
# Confusion matrix
cm = confusion_matrix(labels, predicted_labels)

# Plot the confusion matrix
plt.figure(figsize=(8,6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['Dovish', 'Hawkish', 'Neutral'], yticklabels=['Dovish', 'Hawkish', 'Neutral'])
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()


### Hyperparameter Analysis

- **Learning Rate**:
  - **Higher Learning Rates**: May cause the model to converge quickly but can overshoot optimal solutions.
  - **Lower Learning Rates**: Provide more stable convergence but may require more epochs.
- **Batch Size**:
  - **Smaller Batch Sizes**: Lead to more parameter updates per epoch and can help generalization, but increase training time.
  - **Larger Batch Sizes**: Reduce training time but may generalize less well.
- **Number of Epochs**:
  - **Fewer Epochs**: May result in underfitting.
  - **More Epochs**: Can improve performance but risk overfitting.
- **Observations**:
  - **Optimal Combination**: The best validation performance was achieved with the learning rate, batch size, and number of epochs stored in `best_learning_rate`, `best_batch_size`, and `best_num_epochs`.
  - **Performance Trends**: The sorted `results_df` table shows how the validation weighted F1 score varies with each hyperparameter.
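
The link between batch size and the number of parameter updates is simple arithmetic. The sketch below assumes a single device, no gradient accumulation, and a hypothetical training-set size of 1,000 sentences; `optimizer_steps` is a helper introduced here for illustration.

```python
import math


def optimizer_steps(n_samples, batch_size, epochs):
    """Parameter updates for one run (single device, no gradient accumulation)."""
    steps_per_epoch = math.ceil(n_samples / batch_size)  # last batch may be partial
    return steps_per_epoch * epochs


n_samples = 1000  # hypothetical training-set size
for batch_size in [4, 8, 16]:
    print(batch_size, optimizer_steps(n_samples, batch_size, epochs=3))
# 4 750
# 8 375
# 16 189
```

Halving the batch size roughly doubles the number of updates for the same number of epochs, which is why small batches train longer but see more gradient steps.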
