## When central banks talk too much: quantifying information saturation in monetary policy communication

#### Introduction 

**Objective:**  
Study how the intensity and frequency of central bank communication affect market behavior, focusing on the potential non-linear effects of “information overload.”  
 
**Hypothesis:**  
Beyond a certain threshold, excessive communication (too frequent, repetitive, or inconsistent) leads to *communication fatigue*: markets begin to discount the information, and new speeches lose their marginal impact on volatility and pricing.

#### Installation of the packages

Install the packages listed in the README, if you have not already done so.

In [4]:
# pip install selenium chromedriver-autoinstaller webdriver-manager pandas numpy geopy dill matplotlib scipy seaborn folium scikit-learn xgboost s3fs

In [5]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

#### I. **Data collection & preprocessing**

- Official speeches from the Federal Reserve (2007–2025) 

In [6]:
# The code for scraping the Federal Reserve website is located in the file "scraping.py"
# from scraping2 import main
# main()

- Financial data (VIX, S&P500, US 10Y, USD Index)
- Merge and align both datasets at a weekly or daily frequency
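
The merge-and-align step could be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the column names (`date`, `speaker`, `vix`), the Friday-ending weekly grid, and the helper `to_weekly` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def to_weekly(speeches: pd.DataFrame, market: pd.DataFrame) -> pd.DataFrame:
    """Align speech activity and market data on a Friday-ending weekly grid.

    Assumed (hypothetical) schemas:
      speeches: one row per speech, columns "date" and "speaker"
      market:   one row per trading day, column "date" plus numeric series
    """
    # Count speeches and distinct speakers in each week
    per_week = speeches.set_index("date")["speaker"].resample("W-FRI")
    weekly_speeches = pd.DataFrame(
        {"n_speeches": per_week.size(), "n_speakers": per_week.nunique()}
    )

    # Keep the last available market observation of each week
    weekly_market = market.set_index("date").resample("W-FRI").last()

    # Left join on the market calendar; weeks without speeches get 0
    weekly = weekly_market.join(weekly_speeches, how="left")
    counts = ["n_speeches", "n_speakers"]
    weekly[counts] = weekly[counts].fillna(0).astype(int)
    return weekly


# Toy data, purely illustrative
speeches = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-06", "2020-01-07", "2020-01-15"]),
    "speaker": ["A", "B", "A"],
})
days = pd.bdate_range("2020-01-06", "2020-01-24")
market = pd.DataFrame({"date": days, "vix": np.arange(len(days), dtype=float)})

weekly = to_weekly(speeches, market)
print(weekly)
```

Using the market calendar as the left side of the join keeps quiet weeks in the sample, which matters here because weeks with zero speeches are part of the intensity signal.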

#### II. **NLP analysis (content features)**
- Measure the tone of each speech using FinBERT (positive, negative, neutral)
- Compute lexical and semantic indicators:
    - Lexical richness, sentence length, word count
    - Semantic similarity across speeches (redundancy)
    - Tone dispersion across speakers in the same week (coherence)
- Aggregate these NLP features at a weekly level
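
The lexical and redundancy indicators above could be sketched as follows. The tone step (FinBERT) is left out. Two choices here are assumptions rather than the project's method: lexical richness is approximated by a type-token ratio, and semantic similarity by TF-IDF cosine similarity as a lightweight stand-in for embedding-based similarity. The function names are hypothetical.

```python
import re

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def lexical_features(text: str) -> dict:
    """Word count, average sentence length, and type-token ratio of one speech."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "word_count": len(words),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        "type_token_ratio": len(set(words)) / max(len(words), 1),
    }


def redundancy(texts: list) -> float:
    """Mean pairwise TF-IDF cosine similarity across a set of speeches."""
    if len(texts) < 2:
        return 0.0
    tfidf = TfidfVectorizer().fit_transform(texts)
    sim = cosine_similarity(tfidf)
    # Average only the strict upper triangle (each distinct pair once)
    upper = sim[np.triu_indices(len(texts), k=1)]
    return float(upper.mean())


feats = lexical_features("Inflation is high. Inflation is persistent.")
same = redundancy(["rates will rise soon", "rates will rise soon"])
different = redundancy(["rates rise", "jobs grow"])
print(feats, same, different)
```

Applied per week, `redundancy` would take the list of speeches delivered in that week, so a high value flags weeks in which speakers largely repeat each other.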

#### III. **Building the Communication Intensity Index (CII)**

- Combine:
    - Quantitative metrics (# speeches, # speakers, avg word count)
    - Tonal metrics (average tone, tone dispersion)
    - Semantic metrics (redundancy, overlap)
- Standardize variables (z-scores)
- Optionally, derive weights via PCA or regression importance
- Visualize and interpret temporal patterns of CII (crisis vs calm periods)
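
One possible way to combine the standardized variables is to take the first principal component as the index. The sketch below assumes a weekly feature table with hypothetical column names and uses the PCA option mentioned above; it is one candidate weighting scheme, not a settled definition of the CII.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def build_cii(features: pd.DataFrame, orient_on: str = "n_speeches") -> pd.Series:
    """First principal component of z-scored features, used as a CII candidate.

    `orient_on` names the column used to fix the sign, since the sign of a
    principal component is arbitrary. Constant columns would produce NaNs
    and should be dropped beforehand.
    """
    z = (features - features.mean()) / features.std(ddof=0)
    pca = PCA(n_components=1)
    scores = pca.fit_transform(z.to_numpy())[:, 0]

    # Flip the component so that a higher CII goes with more speeches
    loading = pca.components_[0][list(features.columns).index(orient_on)]
    if loading < 0:
        scores = -scores
    return pd.Series(scores, index=features.index, name="CII")


# Synthetic weekly features driven by one common factor (illustrative only)
rng = np.random.default_rng(1)
common = rng.normal(size=100)
features = pd.DataFrame({
    "n_speeches": common + rng.normal(scale=0.3, size=100),
    "avg_word_count": common + rng.normal(scale=0.5, size=100),
    "redundancy": common + rng.normal(scale=0.8, size=100),
})
cii = build_cii(features)
print(cii.describe())
```

Because the inputs are z-scored, the resulting index has mean zero, and its loadings can be read as data-driven weights on each feature.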

#### IV. **Modeling the market response**
- Link CII_t (and its lags) to changes in:
    - VIX (volatility expectations)
    - Realized volatility of S&P500
    - Bond yields (10Y)
    - USD index (DXY)
- Estimation methods:
    - Baseline OLS or ridge regression
    - Non-linear models: Random Forest, XGBoost, LightGBM
    - Temporal models: LSTM or GRU for sequential effects
- Evaluate predictive performance (R², MAE, MSE)
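
A baseline comparison between a linear and a non-linear model could look like the sketch below. It covers only ridge regression and a random forest, uses a chronological train/test split to avoid look-ahead, and runs on a synthetic inverted-U target invented for the example. In the actual study the target would be a market variable such as the weekly change in VIX. The helper names `add_lags` and `compare_models` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def add_lags(series: pd.Series, n_lags: int) -> pd.DataFrame:
    """Return the series and its first n_lags lags as columns (lag0 = current)."""
    cols = {f"{series.name}_lag{k}": series.shift(k) for k in range(n_lags + 1)}
    return pd.DataFrame(cols).dropna()


def compare_models(X: pd.DataFrame, y: pd.Series, train_frac: float = 0.8) -> pd.DataFrame:
    """Fit each model on the first train_frac of the sample, score on the rest."""
    split = int(len(X) * train_frac)
    X_tr, X_te = X.iloc[:split], X.iloc[split:]
    y_tr, y_te = y.iloc[:split], y.iloc[split:]
    models = {
        "ridge": Ridge(alpha=1.0),
        "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    }
    rows = {}
    for name, model in models.items():
        pred = model.fit(X_tr, y_tr).predict(X_te)
        rows[name] = {
            "R2": r2_score(y_te, pred),
            "MAE": mean_absolute_error(y_te, pred),
            "MSE": mean_squared_error(y_te, pred),
        }
    return pd.DataFrame(rows).T


# Synthetic data: the response peaks at moderate intensity, then declines
rng = np.random.default_rng(0)
cii = pd.Series(rng.normal(size=600), name="cii")
X = add_lags(cii, n_lags=2)
y = -(X["cii_lag0"] ** 2) + rng.normal(scale=0.1, size=len(X))

scores = compare_models(X, y)
print(scores)
```

On data like this the linear model has almost nothing to fit, so a large gap between the two rows is itself a first hint of non-linearity.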

#### V. **Interpretability & non-linear effects**
- Use Partial Dependence Plots (PDP) and SHAP values
    - Identify where the marginal effect of communication intensity turns negative
    - Visualize which features contribute most to volatility
- Compare models (linear vs non-linear) to validate saturation effects
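
To locate where the marginal effect turns negative, one option is to compute a one-dimensional partial dependence curve by hand and look for the first grid point with a negative slope. The sketch below does this with plain NumPy instead of a library PDP routine. `ToyInvertedU` is a made-up stand-in for a fitted model, and the column name `cii` is an assumption.

```python
import numpy as np
import pandas as pd


def partial_dependence_1d(model, X: pd.DataFrame, feature: str, grid) -> np.ndarray:
    """Average prediction when `feature` is forced to each grid value in turn."""
    X_mod = X.copy()
    averages = []
    for value in grid:
        X_mod[feature] = value
        averages.append(model.predict(X_mod).mean())
    return np.asarray(averages)


def first_negative_slope(grid, pd_values, tol: float = 1e-8):
    """First grid value where the partial dependence slope drops below -tol."""
    slope = np.gradient(pd_values, grid)
    below = np.where(slope < -tol)[0]
    return float(grid[below[0]]) if below.size else None


class ToyInvertedU:
    """Stand-in for a fitted model: the response peaks at cii = 1."""

    def predict(self, X: pd.DataFrame) -> np.ndarray:
        return -(X["cii"].to_numpy() - 1.0) ** 2


X = pd.DataFrame({"cii": np.linspace(0, 2, 50), "n_speakers": 3})
grid = np.linspace(0.0, 2.0, 21)
pdp = partial_dependence_1d(ToyInvertedU(), X, "cii", grid)
threshold = first_negative_slope(grid, pdp)
print(threshold)
```

With tree-based models the curve is piecewise constant and noisy, so in practice the curve would likely need smoothing, or a larger `tol`, before the threshold is read off.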

#### VI. **Robustness & extensions**
- Subsample analysis:
    - Crisis vs calm periods (e.g., 2008, 2013, 2020–2022)
    - Hawkish vs dovish speakers
- Granger causality tests:
    - Does communication *drive* volatility or react to it?
- Cross-bank comparison (ECB, BoE) for future extension
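
The Granger causality question above can be illustrated with a hand-rolled bivariate F-test: compare an autoregression of `y` on its own lags with one that also includes lags of `x`. This is a minimal sketch using NumPy and SciPy on simulated series in which `x` leads `y` by construction. The function name `granger_f_test` is hypothetical, and a dedicated econometrics package would be the more usual choice for the real analysis.

```python
import numpy as np
from scipy import stats


def granger_f_test(y, x, lags: int):
    """F-test of H0: lags of x add nothing to an AR(lags) model of y."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    target = y[lags:]

    # Column k holds the series lagged by k periods, aligned with `target`
    y_lags = np.column_stack([y[lags - k:n - k] for k in range(1, lags + 1)])
    x_lags = np.column_stack([x[lags - k:n - k] for k in range(1, lags + 1)])
    const = np.ones((n - lags, 1))

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    restricted = np.hstack([const, y_lags])
    unrestricted = np.hstack([const, y_lags, x_lags])
    rss_r, rss_u = rss(restricted), rss(unrestricted)

    df_num = lags
    df_den = (n - lags) - unrestricted.shape[1]
    f_stat = ((rss_r - rss_u) / df_num) / (rss_u / df_den)
    p_value = float(stats.f.sf(f_stat, df_num, df_den))
    return f_stat, p_value


# Simulated example: x (communication) leads y (volatility) by one period
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=n)
y = np.zeros(n)
y[1:] = 0.8 * x[:-1] + rng.normal(scale=0.5, size=n - 1)

f_xy, p_xy = granger_f_test(y, x, lags=2)  # does x help predict y?
f_yx, p_yx = granger_f_test(x, y, lags=2)  # does y help predict x?
print(p_xy, p_yx)
```

Running the test in both directions is what separates "communication drives volatility" from "communication reacts to volatility": a small p-value in only one direction points to the corresponding lead-lag relationship.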
