In [3]:
pip install fredapi


Collecting fredapi
  Using cached fredapi-0.5.2-py3-none-any.whl.metadata (5.0 kB)
Using cached fredapi-0.5.2-py3-none-any.whl (11 kB)
Installing collected packages: fredapi
Successfully installed fredapi-0.5.2
Note: you may need to restart the kernel to use updated packages.


# 1. FRED Economic Indicators


In [1]:
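import os
import datetime

import pandas as pd
from fredapi import Fred

# Read the API key from an environment variable rather than hardcoding it
# in the notebook (the variable name FRED_API_KEY is an assumed convention).
fred = Fred(api_key=os.environ["FRED_API_KEY"])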

start_date = "2000-01-01"
end_date = datetime.datetime.today().strftime("%Y-%m-%d")  
series_dict = {
    "fed_funds_rate": "FEDFUNDS",      
    "cpi": "CPIAUCSL",                 
    "unemployment": "UNRATE",          
    "gdp": "GDP",                      
    "nonfarm_payrolls": "PAYEMS",     
    "industrial_production": "INDPRO"  
}

def download_fred_series(series_id, start, end):
    data = fred.get_series(series_id, observation_start=start, observation_end=end)
    data = data.to_frame(name=series_id)
    data.index = pd.to_datetime(data.index)
    return data

dataframes = {}
for name, sid in series_dict.items():
    print(f"Downloading {name} ({sid})...")
    df = download_fred_series(sid, start_date, end_date)
    dataframes[name] = df


monthly_index = pd.date_range(start=start_date, end=end_date, freq='M')



final_df = pd.DataFrame(index=monthly_index)

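for name, df in dataframes.items():
    if name == "gdp":
        # GDP is quarterly: repeat each quarter's value for the months it covers.
        df_monthly = df.resample('M').ffill()
    else:
        # Monthly series: relabel each observation with its month-end date.
        df_monthly = df.resample('M').last()

    final_df = final_df.join(df_monthly, how='left')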

col_map = {
    "FEDFUNDS": "fed_funds_rate",
    "CPIAUCSL": "cpi",
    "UNRATE": "unemployment",
    "GDP": "gdp",
    "PAYEMS": "nonfarm_payrolls",
    "INDPRO": "industrial_production"
}
final_df = final_df.rename(columns=col_map)


final_df = final_df.ffill()

print(final_df.head())
print(final_df.tail())

final_df.to_csv("economic_indicators_2000_present_monthly.csv")
print("Data saved to economic_indicators_2000_present_monthly.csv")


Downloading fed_funds_rate (FEDFUNDS)...
Downloading cpi (CPIAUCSL)...
Downloading unemployment (UNRATE)...
Downloading gdp (GDP)...
Downloading nonfarm_payrolls (PAYEMS)...
Downloading industrial_production (INDPRO)...
            fed_funds_rate    cpi  unemployment        gdp  nonfarm_payrolls  \
2000-01-31            5.45  169.3           4.0  10002.179          131009.0   
2000-02-29            5.73  170.0           4.1  10002.179          131120.0   
2000-03-31            5.85  171.0           4.0  10002.179          131604.0   
2000-04-30            6.02  170.9           3.8  10247.720          131883.0   
2000-05-31            6.27  171.2           4.0  10247.720          132106.0   

            industrial_production  
2000-01-31                91.4092  
2000-02-29                91.7245  
2000-03-31                92.0830  
2000-04-30                92.6659  
2000-05-31                92.9347  
            fed_funds_rate      cpi  unemployment        gdp  \
2024-07-31         

In [3]:
import pandas as pd

input_file_path = r"C:\Users\User\Desktop\interest rate decision\economic_indicators_2000_present_monthly.xlsx"
output_file_path = r"C:\Users\User\Desktop\interest rate decision\economic_indicators_transformed.csv"

df = pd.read_excel(input_file_path, index_col=0, parse_dates=True)

df = df.sort_index()

print("Initial Data:")
print(df.head())


df['gdp'] = df['gdp'] / 1000.0

df['nonfarm_payrolls'] = df['nonfarm_payrolls'] / 1000.0

df['cpi_mom'] = df['cpi'].pct_change() * 100
df['industrial_production_mom'] = df['industrial_production'].pct_change() * 100


df.dropna(inplace=True)

print("\nData After Transformations:")
print(df.head())

df.to_csv(output_file_path, index=True)

print(f"\nThe transformed dataset has been saved to: {output_file_path}")


Initial Data:
            fed_funds_rate    cpi  unemployment        gdp  nonfarm_payrolls  \
date                                                                           
2000-01-31            5.45  169.3           4.0  10002.179            131009   
2000-02-29            5.73  170.0           4.1  10002.179            131120   
2000-03-31            5.85  171.0           4.0  10002.179            131604   
2000-04-30            6.02  170.9           3.8  10247.720            131883   
2000-05-31            6.27  171.2           4.0  10247.720            132106   

            industrial_production  
date                               
2000-01-31                91.4092  
2000-02-29                91.7245  
2000-03-31                92.0830  
2000-04-30                92.6659  
2000-05-31                92.9347  

Data After Transformations:
            fed_funds_rate    cpi  unemployment        gdp  nonfarm_payrolls  \
date                                                            

## Data Preparation Process: Key Economic Indicators

I began by collecting data from **FRED (Federal Reserve Economic Data)**, the economic database maintained by the Federal Reserve Bank of St. Louis. The `fredapi` package connects to the **FRED API** with a personal API key and returns each requested time series as a pandas Series.

---

### Data Collection  

The first step was defining the specific economic indicators of interest. I selected six key datasets critical for understanding economic conditions:

1. **Federal Funds Rate (FEDFUNDS)**:  
   Represents the interest rate at which banks lend to each other overnight, providing a direct measure of monetary policy.

2. **Consumer Price Index (CPIAUCSL)**:  
   A vital indicator for inflation, showing changes in the purchasing power of money over time.

3. **Unemployment Rate (UNRATE)**:  
   Reflects the health of the labor market and the percentage of the labor force currently without work.

4. **Gross Domestic Product (GDP)**:  
   Measures the total value of goods and services produced. FRED's `GDP` series is nominal, expressed in billions of dollars, and reported quarterly.

5. **Nonfarm Payrolls (PAYEMS)**:  
   A key indicator of employment trends, representing the number of employed workers in various sectors (excluding farming).

6. **Industrial Production Index (INDPRO)**:  
   Tracks changes in the output of industrial sectors like manufacturing, mining, and utilities.

I set the **date range** to span from **January 2000 to the present**, ensuring comprehensive coverage of historical and recent economic activity. Each series was downloaded from FRED and stored as a separate DataFrame with a **timestamp index** for temporal alignment.

---

### Data Processing and Alignment  

The data from FRED came in **varying frequencies**—monthly for most indicators but quarterly for GDP. To create a consistent dataset for analysis, I:  

- **Resampled** every series to a common **month-end index** (`resample('M')`), keeping the last observation in each month.  
- For **GDP**, which is quarterly, I applied **forward-filling** so that each quarter's value is repeated across the months it covers.
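
The quarterly-to-monthly step can be sketched on a tiny series. This is a minimal illustration rather than the notebook's exact code: it uses the two GDP values printed in the output above, stamps them at quarter start (as FRED does), and uses a month-start index (`freq="MS"`), which is an assumption made here so the sketch does not depend on how a given pandas version spells the month-end alias.

```python
import pandas as pd

# Two quarterly GDP observations (billions of dollars) taken from the
# notebook output above, stamped at the start of each quarter.
gdp_quarterly = pd.Series(
    [10002.179, 10247.720],
    index=pd.to_datetime(["2000-01-01", "2000-04-01"]),
    name="gdp",
)

# Target index: one row per calendar month (month start is an assumption
# of this sketch; the notebook itself uses month-end labels).
monthly_index = pd.date_range("2000-01-01", "2000-06-01", freq="MS")

# reindex + ffill repeats each quarterly value until the next one appears.
gdp_monthly = gdp_quarterly.reindex(monthly_index, method="ffill")
print(gdp_monthly)
```

One design point to keep in mind: forward-filling makes a quarter's GDP figure visible from the first month of that quarter, although the figure is only published after the quarter ends. For a forecasting model, lagging the series may be worth considering.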

To improve clarity and usability:  
- I **renamed columns** for better readability:  
   - `FEDFUNDS` → `fed_funds_rate`  
   - `CPIAUCSL` → `cpi`  
   - `UNRATE` → `unemployment`  
   - `GDP` → `gdp`  
   - `PAYEMS` → `nonfarm_payrolls`  
   - `INDPRO` → `industrial_production`  

Any **missing values** were forward-filled to maintain continuity and prevent gaps in the analysis. The clean, aligned dataset was then saved as:  
**`economic_indicators_2000_present_monthly.csv`**

---

### Data Transformation  

With the initial dataset prepared, I performed several transformations to make the data more interpretable and analysis-ready.

#### **Unit Conversions**  
- **GDP** values, originally recorded in billions, were converted to **trillions** for consistency with macroeconomic reporting.  
- **Nonfarm Payrolls**, expressed in thousands, were converted to **millions**, a more intuitive scale for employment analysis.  

#### **Growth Rates**  
To capture dynamic changes in key indicators, I calculated **Month-over-Month (MoM)** percentage growth rates for:  
- **CPI (Consumer Price Index)**: Measuring inflation changes over time.  
- **Industrial Production Index**: Reflecting fluctuations in industrial output.  

These growth rates provided insights into how inflation and industrial activity evolved month to month.
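
As a worked illustration of the unit conversions and the month-over-month calculation, the sketch below applies them to the first three rows printed earlier in this notebook. The growth rate is computed as `(x_t - x_{t-1}) / x_{t-1} * 100`, which is what `pct_change() * 100` does.

```python
import pandas as pd

# First three monthly rows from the notebook output above.
df = pd.DataFrame(
    {
        "cpi": [169.3, 170.0, 171.0],
        "gdp": [10002.179, 10002.179, 10002.179],            # billions of dollars
        "nonfarm_payrolls": [131009.0, 131120.0, 131604.0],  # thousands of persons
    },
    index=pd.to_datetime(["2000-01-31", "2000-02-29", "2000-03-31"]),
)

df["gdp"] = df["gdp"] / 1000.0                            # billions -> trillions
df["nonfarm_payrolls"] = df["nonfarm_payrolls"] / 1000.0  # thousands -> millions

# The first row has no previous month to compare against, so it becomes NaN.
df["cpi_mom"] = df["cpi"].pct_change() * 100

print(df.round(3))
```

The NaN in the first row of `cpi_mom` is the reason the first month is later dropped from the transformed dataset.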

#### **Cleaning Data**  
Rows with **NaN values** arising from the percentage change calculations (the first row, which has no previous month to compare against) were dropped to ensure a clean dataset.

---

### Final Dataset Preparation  

The final dataset included:  
- **Original indicators**  
- **Derived metrics** like **growth rates**  

The resulting well-structured and transformed dataset was saved as:  
**`economic_indicators_transformed.csv`**  

---

### Summary  

These steps turn six FRED series of mixed frequency into a single month-end dataset with consistent units and two derived growth rates. The dataset is ready to be merged with the other sources in this notebook or analyzed on its own.


# 2. FOMC Statement

In [15]:
import os

import pandas as pd
import requests
from bs4 import BeautifulSoup

file_path = r"C:\Users\User\Desktop\interest rate decision\fomc_statements_texts\fomc_statements_cleaned.xlsx"
fomc_data = pd.read_excel(file_path)

output_dir = "fomc_statements_texts"
os.makedirs(output_dir, exist_ok=True)

for index, row in fomc_data.iterrows():
    date = row['Date']
    link = row['Statement Link']
    
    try:
        # A timeout keeps one unresponsive page from stalling the whole loop.
        response = requests.get(link, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')

        paragraphs = soup.find_all('p')

        statement_text = ""
        
        for p in paragraphs:
            p_text = p.get_text().strip()

            if p_text and len(p_text.split()) > 5:  
                statement_text += p_text + "\n"

        output_file = os.path.join(output_dir, f"{date}.txt")
        with open(output_file, "w", encoding="utf-8") as file:
            file.write(statement_text)
        
        print(f"Successfully saved statement for {date}")
    except Exception as e:
        print(f"Failed to download statement for {date}. Error: {e}")

print(f"All statements have been processed. Saved in '{output_dir}'")


Successfully saved statement for 2000-02-02
Successfully saved statement for 2000-03-10
Successfully saved statement for 2000-03-21
Successfully saved statement for 2000-05-16
Successfully saved statement for 2000-06-28
Successfully saved statement for 2000-08-22
Successfully saved statement for 2000-11-15
Successfully saved statement for 2000-12-19
Successfully saved statement for 2001-01-31
Successfully saved statement for 2001-02-10
Successfully saved statement for 2001-03-01
Successfully saved statement for 2001-03-20
Successfully saved statement for 2001-04-18
Successfully saved statement for 2001-05-15
Successfully saved statement for 2001-06-11
Successfully saved statement for 2001-06-27
Successfully saved statement for 2001-08-21
Successfully saved statement for 2001-09-17
Successfully saved statement for 2001-11-12
Successfully saved statement for 2002-01-30
Successfully saved statement for 2002-03-19
Successfully saved statement for 2002-06-11
Successfully saved statement for

In [19]:
import os

folder_path = r"C:\Users\User\Desktop\interest rate decision\fomc_statements_texts"

def remove_duplicate_paragraphs(file_path):
    with open(file_path, "r", encoding="utf-8") as file:
        # Read the content of the file
        content = file.read()
    
    paragraphs = content.split('\n')
    
    normalized_paragraphs = [ ' '.join(paragraph.split()) for paragraph in paragraphs ]
    
    seen = set()
    unique_paragraphs = []
    
    for paragraph in normalized_paragraphs:
        if paragraph not in seen:
            unique_paragraphs.append(paragraph)
            seen.add(paragraph)
    
    cleaned_content = '\n'.join(unique_paragraphs)
    
    with open(file_path, "w", encoding="utf-8") as file:
        file.write(cleaned_content)

for filename in os.listdir(folder_path):
    if filename.endswith(".txt"):
        file_path = os.path.join(folder_path, filename)
        try:
            remove_duplicate_paragraphs(file_path)
            print(f"Removed duplicates from: {filename}")
        except Exception as e:
            print(f"Failed to clean {filename}: {e}")

print("Duplicate removal complete for all files.")


Removed duplicates from: 2000-02-02.txt
Removed duplicates from: 2000-03-10.txt
Removed duplicates from: 2000-03-21.txt
Removed duplicates from: 2000-05-16.txt
Removed duplicates from: 2000-06-28.txt
Removed duplicates from: 2000-08-22.txt
Removed duplicates from: 2000-11-15.txt
Removed duplicates from: 2000-12-19.txt
Removed duplicates from: 2001-01-31.txt
Removed duplicates from: 2001-02-10.txt
Removed duplicates from: 2001-03-01.txt
Removed duplicates from: 2001-03-20.txt
Removed duplicates from: 2001-04-18.txt
Removed duplicates from: 2001-05-15.txt
Removed duplicates from: 2001-06-11.txt
Removed duplicates from: 2001-06-27.txt
Removed duplicates from: 2001-08-21.txt
Removed duplicates from: 2001-09-17.txt
Removed duplicates from: 2001-11-12.txt
Removed duplicates from: 2002-01-30.txt
Removed duplicates from: 2002-03-19.txt
Removed duplicates from: 2002-06-11.txt
Removed duplicates from: 2002-06-26.txt
Removed duplicates from: 2002-07-05.txt
Removed duplicates from: 2002-08-13.txt


In [7]:
import os

folder_path = r"C:\Users\User\Desktop\interest rate decision\fomc_statements_texts"
output_folder = r"C:\Users\User\Desktop\interest rate decision\fomc_statements_texts_cleaned"
os.makedirs(output_folder, exist_ok=True)

def preprocess_text(text):
    text = text.replace("\n", " ").strip()
    text = " ".join(text.split())  
    return text

for file_name in os.listdir(folder_path):
    if file_name.endswith(".txt"):
        file_path = os.path.join(folder_path, file_name)
        with open(file_path, "r", encoding="utf-8") as f:
            text = f.read()
        
        cleaned_text = preprocess_text(text)
        
        cleaned_file_path = os.path.join(output_folder, file_name)
        with open(cleaned_file_path, "w", encoding="utf-8") as f:
            f.write(cleaned_text)
        
        print(f"Processed and saved cleaned text for: {file_name}")

print(f"All text files have been processed and saved to: {output_folder}")


Processed and saved cleaned text for: 2000-02-02.txt
Processed and saved cleaned text for: 2000-03-10.txt
Processed and saved cleaned text for: 2000-03-21.txt
Processed and saved cleaned text for: 2000-05-16.txt
Processed and saved cleaned text for: 2000-06-28.txt
Processed and saved cleaned text for: 2000-08-22.txt
Processed and saved cleaned text for: 2000-11-15.txt
Processed and saved cleaned text for: 2000-12-19.txt
Processed and saved cleaned text for: 2001-01-31.txt
Processed and saved cleaned text for: 2001-02-10.txt
Processed and saved cleaned text for: 2001-03-01.txt
Processed and saved cleaned text for: 2001-03-20.txt
Processed and saved cleaned text for: 2001-04-18.txt
Processed and saved cleaned text for: 2001-05-15.txt
Processed and saved cleaned text for: 2001-06-11.txt
Processed and saved cleaned text for: 2001-06-27.txt
Processed and saved cleaned text for: 2001-08-21.txt
Processed and saved cleaned text for: 2001-09-17.txt
Processed and saved cleaned text for: 2001-11-

In [9]:
pip install textblob

Collecting textblob
  Using cached textblob-0.18.0.post0-py3-none-any.whl.metadata (4.5 kB)
Using cached textblob-0.18.0.post0-py3-none-any.whl (626 kB)
Installing collected packages: textblob
Successfully installed textblob-0.18.0.post0
Note: you may need to restart the kernel to use updated packages.


In [1]:
import os
from textblob import TextBlob

folder_path = r"C:\Users\User\Desktop\interest rate decision\fomc_statements_texts_cleaned"

files = [f for f in os.listdir(folder_path) if f.endswith(".txt")]

if not files:
    print("No .txt files found in the specified directory.")
else:
    for file_name in files:
        file_path = os.path.join(folder_path, file_name)
        
        with open(file_path, "r", encoding="utf-8") as f:
            text = f.read()
        
        snippet = text[:200]  
        print(f"\n--- Analyzing File: {file_name} ---")
        print(f"Text snippet:\n{snippet}...\n")
        
        
        blob = TextBlob(text)
        sentiment = blob.sentiment
        
        print(f"Sentiment Analysis for {file_name}:")
        print(f"  Polarity: {sentiment.polarity}")
        print(f"  Subjectivity: {sentiment.subjectivity}")

    print("\nSentiment analysis complete. Review the polarity and subjectivity scores along with text snippets to confirm structure and quality.")



--- Analyzing File: 2000-02-02.txt ---
Text snippet:
For immediate release The Federal Open Market Committee voted today to raise its target for the federal funds rate by 50 basis points to 6-1/2 percent. In a related action, the Board of Governors appr...

Sentiment Analysis for 2000-02-02.txt:
  Polarity: 0.12222222222222225
  Subjectivity: 0.425

--- Analyzing File: 2000-03-10.txt ---
Text snippet:
For immediate release The Federal Open Market Committee voted today to raise its target for the federal funds rate by 25 basis points to 6 percent. In a related action, the Board of Governors approved...

Sentiment Analysis for 2000-03-10.txt:
  Polarity: 0.10020202020202022
  Subjectivity: 0.30030303030303035

--- Analyzing File: 2000-03-21.txt ---
Text snippet:
For immediate release The Federal Open Market Committee voted today to raise its target for the federal funds rate by 25 basis points to 5-3/4 percent. In a related action, the Board of Governors appr...

Sentiment Analysis for 

In [None]:
import os
import openai
import json
import csv

openai.api_key = ""

folder_path = r"C:\Users\User\Desktop\interest rate decision\fomc_statements_texts_cleaned"

model = "gpt-3.5-turbo"  

output_csv = "fomc_sentiment_analysis_results.csv"

headers = [
    "filename",
    "policy_stance",
    "overall_sentiment",
    "sentiment_score",
    "tone_keywords",
    "key_economic_concerns",
    "forward_guidance",
    "summary"
]

prompt_template = """
You are an expert in economics and monetary policy. I will provide you with the text of an FOMC (Federal Open Market Committee) statement. Analyze the statement carefully and provide a structured assessment in JSON format only, following these steps:

1. **Policy Stance (Hawkish / Dovish / Neutral)**: Determine if the statement suggests the Fed is more likely to raise rates (hawkish), lower rates (dovish), or maintain the current stance (neutral).

2. **Overall Sentiment**: Is the statement generally positive, negative, or neutral about current economic conditions? Consider outlook on growth, employment, and inflation.

3. **Sentiment Score**: Provide a numeric sentiment score between -1 and +1, where:
   - -1 indicates very negative sentiment
   - 0 indicates neutral sentiment
   - +1 indicates very positive sentiment
   Adjust intermediate values as you see fit. For example, slightly positive might be 0.2, strongly positive 0.7 or above, etc.

4. **Tone Keywords**: Identify up to three keywords that describe the tone (e.g., "confident", "cautious", "uncertain", "optimistic", "concerned").

5. **Key Economic Concerns**: Summarize the main economic factors mentioned (e.g., inflation pressure, employment situation, growth outlook).

6. **Forward Guidance**: Does the statement hint at future policy moves or conditions that would change policy? Summarize any indications.

7. **Summary**: Provide a short summary (1-2 sentences) capturing the essence of the Fed’s message.

Return your answer strictly in JSON format with keys:
{
  "policy_stance": "...",
  "overall_sentiment": "...",
  "sentiment_score": 0.0,
  "tone_keywords": ["...","...","..."],
  "key_economic_concerns": "...",
  "forward_guidance": "...",
  "summary": "..."
}

Do not include any additional text outside of the JSON.
"""

def analyze_text_with_openai(text):
    prompt = prompt_template + "\n\nFOMC Statement:\n\n" + text

    # Legacy interface: openai.ChatCompletion requires openai<1.0.
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,  # lower temperature for more consistent outputs
        max_tokens=1500,  # adjust if needed
    )

    content = response.choices[0].message['content'].strip()
    return content

results = []

for file_name in os.listdir(folder_path):
    if file_name.endswith(".txt"):
        file_path = os.path.join(folder_path, file_name)
        with open(file_path, "r", encoding="utf-8") as f:
            text = f.read()

        print(f"Analyzing {file_name}...")
        analysis_result = analyze_text_with_openai(text)
        
        try:
            data = json.loads(analysis_result)
            results.append([
                file_name,
                data.get("policy_stance", ""),
                data.get("overall_sentiment", ""),
                data.get("sentiment_score", ""),
                ", ".join(data.get("tone_keywords", [])),
                data.get("key_economic_concerns", ""),
                data.get("forward_guidance", ""),
                data.get("summary", "")
            ])
        except json.JSONDecodeError:
            print(f"JSON parsing failed for {file_name}. Response:\n{analysis_result}")
            results.append([file_name, "PARSE ERROR", "", "", "", "", "", ""])

with open(output_csv, "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(headers)
    writer.writerows(results)

print(f"\nAnalysis complete. Results saved to {output_csv}")


Analyzing 2000-02-02.txt...
Analyzing 2000-03-10.txt...
Analyzing 2000-03-21.txt...
Analyzing 2000-05-16.txt...
Analyzing 2000-06-28.txt...
Analyzing 2000-08-22.txt...
Analyzing 2000-11-15.txt...
Analyzing 2000-12-19.txt...
Analyzing 2001-01-31.txt...
Analyzing 2001-02-10.txt...
Analyzing 2001-03-01.txt...
Analyzing 2001-03-20.txt...
Analyzing 2001-04-18.txt...
Analyzing 2001-05-15.txt...
Analyzing 2001-06-11.txt...
Analyzing 2001-06-27.txt...
Analyzing 2001-08-21.txt...
Analyzing 2001-09-17.txt...
Analyzing 2001-11-12.txt...
Analyzing 2002-01-30.txt...
Analyzing 2002-03-19.txt...
Analyzing 2002-06-11.txt...
Analyzing 2002-06-26.txt...
Analyzing 2002-07-05.txt...
Analyzing 2002-08-13.txt...
Analyzing 2002-09-24.txt...
Analyzing 2002-10-12.txt...
Analyzing 2003-01-29.txt...
Analyzing 2003-03-18.txt...
Analyzing 2003-06-05.txt...
Analyzing 2003-06-25.txt...
Analyzing 2003-09-12.txt...
Analyzing 2003-09-16.txt...
Analyzing 2003-10-28.txt...
Analyzing 2003-12-08.txt...
Analyzing 2004-01-28

## FOMC Statement Analysis Process

### Identifying the Official Source  
I began by locating the **Federal Reserve’s official website**, where historical FOMC statements are published after each meeting. This ensured that all data was sourced from an **authentic and reliable source**.

---

### Collecting the Raw Statements  
The next step involved:  
- Compiling each meeting date and the link to its statement in a spreadsheet (**`fomc_statements_cleaned.xlsx`**).  
- Fetching each linked page, which the Fed publishes in **HTML format**, with a Python script.  
- Saving the text of each statement as `<date>.txt` in a dedicated local folder: **`fomc_statements_texts`**.

---

### Extracting Plain Text from HTML  
To extract the relevant content:  
- I used **Python scripts** that fetch each page with the `requests` library and parse it with **BeautifulSoup**.  
- The scripts collected the text of every `<p>` element and kept only paragraphs with **more than five words**, which filters out most:  
   - Headers, navigation menus, and footers  
   - Short boilerplate fragments  

This heuristic retained the **core content** of each statement, although longer boilerplate paragraphs can still pass the filter.
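
The extraction code in the notebook keeps `<p>` elements with more than five words. A minimal offline sketch of that filter follows; the HTML snippet and the helper name `extract_paragraphs` are made up for illustration, and real Fed pages are structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical page: a navigation paragraph, a statement paragraph, a footer.
html = """
<html><body>
  <p>Home | News</p>
  <p>The Federal Open Market Committee voted today to raise its target
     for the federal funds rate.</p>
  <p>Last update: 2000</p>
</body></html>
"""

def extract_paragraphs(html_text, min_words=6):
    """Return the text of every <p> element with at least min_words words."""
    soup = BeautifulSoup(html_text, "html.parser")
    kept = []
    for p in soup.find_all("p"):
        text = " ".join(p.get_text().split())  # collapse internal whitespace
        if len(text.split()) >= min_words:
            kept.append(text)
    return kept

paragraphs = extract_paragraphs(html)
print(paragraphs)
```

A word-count threshold is a blunt tool: it can drop a short but genuine sentence and keep a long disclaimer, so spot-checking a few saved files is advisable.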

---

### Cleaning and Normalizing the Text  
Once the plain text was extracted:  
- I cleaned the text by:  
   - Removing **duplicate lines** (exact matches after whitespace normalization)  
   - Collapsing **excessive whitespace** and joining each statement into a single line of text  

- The cleaned versions of the statements were saved in **`.txt` format** into a new directory: **`fomc_statements_texts_cleaned`**.  
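
The two cleaning passes (deduplication, then whitespace normalization) can be combined into one function. The sketch below is an illustration using only the standard library; the function name `clean_statement` and the sample text are hypothetical.

```python
def clean_statement(raw_text):
    """Collapse whitespace, drop empty and repeated lines, join into one line."""
    # Normalize each line so lines that differ only in spacing compare equal.
    lines = [" ".join(line.split()) for line in raw_text.split("\n")]
    # dict.fromkeys keeps the first occurrence of each line, in original order.
    unique_lines = dict.fromkeys(line for line in lines if line)
    return " ".join(unique_lines)

raw = "For immediate release\n\nThe Committee   voted today.\nThe Committee voted today.\n"
cleaned = clean_statement(raw)
print(cleaned)
```

Exact-match deduplication only removes lines that are identical after normalization; near-duplicates with small wording differences are left in place.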

---

### Qualitative Analysis Using OpenAI Models  
For a deeper understanding of the FOMC statements:  
- Each cleaned `.txt` file was sent to the **OpenAI API** (`gpt-3.5-turbo` in the code above, with a low temperature of 0.2 for more consistent outputs) using a carefully designed **prompt**.  

- I requested a **structured JSON output** that included:  
   - **Policy stance**: Hawkish, Dovish, or Neutral  
   - **Overall sentiment**: Positive, Negative, or Neutral  
   - **Numerical sentiment score**: A value between **-1 and 1**  
   - **Tone-related keywords**: e.g., “cautious”, “accommodative”  
   - **Key economic concerns** mentioned in the statement  
   - **Forward guidance indications**  
   - **A short summary** of the statement  

This structured output provided a rich, qualitative assessment of each FOMC statement.
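
Because the model is asked for JSON but may not always comply, it can help to validate each reply before storing it. The sketch below is one possible check, not the notebook's code: the helper `parse_analysis`, the brace-trimming tolerance, and the sample reply are all assumptions made for illustration.

```python
import json

REQUIRED_KEYS = {
    "policy_stance", "overall_sentiment", "sentiment_score",
    "tone_keywords", "key_economic_concerns", "forward_guidance", "summary",
}

def parse_analysis(reply):
    """Parse a model reply and check it against the requested schema.

    Returns the parsed dict, or raises ValueError describing the problem.
    """
    # Assumed tolerance: keep only the span from the first "{" to the last "}"
    # in case the model wrapped the JSON in extra text.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found")
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc

    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    score = data["sentiment_score"]
    if not isinstance(score, (int, float)) or not -1 <= score <= 1:
        raise ValueError(f"sentiment_score out of range: {score!r}")
    return data

# Made-up reply for illustration only; not actual model output.
reply = '''Here is the analysis:
{"policy_stance": "Hawkish", "overall_sentiment": "Positive",
 "sentiment_score": 0.4, "tone_keywords": ["confident", "cautious"],
 "key_economic_concerns": "inflation pressure",
 "forward_guidance": "further tightening possible",
 "summary": "The Committee raised rates and flagged inflation risks."}'''
result = parse_analysis(reply)
print(result["policy_stance"], result["sentiment_score"])
```

Rejecting out-of-range scores and missing keys at this stage keeps malformed rows out of the results file, where they would be harder to spot.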

---

### Storing Results in a Centralized File  
The final step involved:  
- Parsing the **JSON responses** generated by the OpenAI model.  
- Aggregating all extracted features into a single **tabular format**.  
- The results were saved into a **CSV file**:  
   **`fomc_sentiment_analysis_results.csv`**.  

This centralized file contained all qualitative insights, neatly organized for further analysis or modeling.  
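
The aggregation step can also be written with `csv.DictWriter`, which ties each value to a named column instead of relying on list position. This is a hedged sketch: the helper `to_row` and the parsed response are made up, and an in-memory buffer stands in for the output file.

```python
import csv
import io

FIELDS = ["filename", "policy_stance", "overall_sentiment", "sentiment_score",
          "tone_keywords", "key_economic_concerns", "forward_guidance", "summary"]

def to_row(filename, data):
    """Flatten one parsed response into a CSV-ready dict."""
    row = {field: data.get(field, "") for field in FIELDS}
    row["filename"] = filename
    # The keyword list is joined into one cell so the table stays rectangular.
    row["tone_keywords"] = ", ".join(data.get("tone_keywords", []))
    return row

# Made-up parsed response for illustration only.
parsed = {
    "2000-02-02.txt": {"policy_stance": "Hawkish", "sentiment_score": 0.4,
                       "tone_keywords": ["confident", "cautious"]},
}

buffer = io.StringIO()  # stands in for open(output_csv, "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
for filename, data in parsed.items():
    writer.writerow(to_row(filename, data))

csv_text = buffer.getvalue()
print(csv_text)
```

Fields missing from a response are written as empty cells, so incomplete analyses remain visible in the file rather than shifting columns.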

---

### Summary  
By following this process, I transformed raw FOMC statements into a **cleaned, structured dataset** enriched with **policy stance, sentiment scores, tone analysis**, and **key economic insights**. This dataset forms a foundation for understanding the FOMC’s decisions and their potential implications for monetary policy.


# 3. World Bank Economic Indicators


In [13]:
pip install wbgapi

Collecting wbgapi
  Downloading wbgapi-1.0.12-py3-none-any.whl.metadata (13 kB)
Downloading wbgapi-1.0.12-py3-none-any.whl (36 kB)
Installing collected packages: wbgapi
Successfully installed wbgapi-1.0.12
Note: you may need to restart the kernel to use updated packages.


In [5]:
import wbgapi as wb
import pandas as pd
import datetime

current_year = datetime.datetime.now().year

countries = [
    'CAN', # Canada
    'MEX', # Mexico
    'CHN', # China
    'JPN', # Japan
    'DEU', # Germany
    'GBR', # UK
    'FRA', # France
    'ITA', # Italy
    'ESP', # Spain
    'NLD', # Netherlands
    'CHE'  # Switzerland
]

# World Bank indicator codes mapped to output column names
indicators = {
    'NY.GDP.MKTP.KD.ZG': 'gdp_growth',   # GDP growth (annual %)
    'FP.CPI.TOTL.ZG': 'inflation_cpi',  # Inflation, consumer prices (annual %)
    'SL.UEM.TOTL.ZS': 'unemployment'    # Unemployment (% of total labor force)
}

time_range = range(2000, current_year + 1)

df_list = []
for ind_code, ind_name in indicators.items():
    df_temp = wb.data.DataFrame(ind_code, countries, time_range, mrv=None, numericTimeKeys=True)
    df_temp = df_temp.stack().reset_index()
    df_temp.columns = ['economy', 'time', ind_name]
    df_list.append(df_temp)

df_final = df_list[0]
for extra_df in df_list[1:]:
    df_final = pd.merge(df_final, extra_df, on=['economy', 'time'], how='outer')

df_final.sort_values(by=['economy', 'time'], inplace=True)

df_final['date'] = pd.to_datetime(df_final['time'].astype(str) + '-01-01')

print(df_final.head(20))

df_final.to_csv('world_bank_indicators_with_date.csv', index=False)
print("Data saved to world_bank_indicators_with_date.csv")


   economy  time  gdp_growth  inflation_cpi  unemployment       date
0      CAN  2000    5.138539       2.719440         6.829 2000-01-01
1      CAN  2001    1.875098       2.525120         7.219 2001-01-01
2      CAN  2002    2.999255       2.258394         7.665 2002-01-01
3      CAN  2003    1.806385       2.758563         7.574 2003-01-01
4      CAN  2004    3.092364       1.857259         7.185 2004-01-01
5      CAN  2005    3.210454       2.213552         6.758 2005-01-01
6      CAN  2006    2.637944       2.002025         6.484 2006-01-01
7      CAN  2007    2.049905       2.138384         6.156 2007-01-01
8      CAN  2008    0.995406       2.370271         6.284 2008-01-01
9      CAN  2009   -2.915086       0.299467         8.460 2009-01-01
10     CAN  2010    3.090806       1.776872         8.178 2010-01-01
11     CAN  2011    3.137194       2.912135         7.637 2011-01-01
12     CAN  2012    1.755661       1.515678         7.392 2012-01-01
13     CAN  2013    2.325814      

## Global Economic Indicators Data Acquisition and Preparation  

To complement the U.S. domestic economic and FOMC statement data, I incorporated **global economic indicators** from the **World Bank’s World Development Indicators (WDI)**. My goal was to contextualize the Federal Reserve’s interest rate decisions within a broader international economic environment. The following process outlines the detailed steps I followed:  

---

### Rationale for Data Selection  

The U.S. economy and monetary policy are often influenced by **global economic conditions**. Changes in foreign GDP growth, inflation, and labor market performance can indirectly affect U.S. exports, imports, commodity prices, and financial markets. Therefore, I aimed to gather a **small but representative set of global indicators** to capture this international dimension.  

---

### Choosing Countries  

I selected countries that are either top U.S. trading partners or significant global economies capable of influencing worldwide economic trends. Specifically, I chose:  

- **Canada and Mexico**: As immediate neighbors and leading U.S. trading partners, their economic performance can quickly impact U.S. markets.  
- **China**: A major global economic powerhouse and key U.S. trading partner, China’s economic shifts influence global supply chains and price levels.  
- **Japan, Germany, the UK, and other European economies** (France, Italy, Spain, Netherlands, Switzerland): Including these advanced economies provides a broader perspective on the international environment. Their stable data and significant financial and trade linkages reveal global trends that may affect U.S. monetary policy.  

By selecting this **mix of North American, European, and Asian economies**, I ensured a well-rounded view of external economic conditions.

---

### Choosing Indicators  

To capture relevant macroeconomic trends, I focused on key indicators commonly used to gauge economic health:  

- **GDP Growth (annual %)**: Reflects economic momentum. Higher global growth could increase demand for U.S. exports or influence commodity prices.  
- **Inflation, Consumer Prices (annual %)**: Tracking global inflation trends helps identify external inflationary pressures that might spill over into the U.S.  
- **Unemployment (% of total labor force)**: Provides insights into global labor market conditions, affecting consumer demand, trade flows, and overall economic stability.  

---

### Data Source and Tools  

I accessed the **World Bank’s WDI database** programmatically using the **`wbgapi` Python package**. This allowed me to query and download the selected indicators seamlessly for the chosen countries over a defined time period (2000 to the present).  

---

### Data Download Process  

1. **API Query Setup**:  
   - I wrote a Python script specifying the **countries**, **indicators**, and the year range (2000 to the current year).  

2. **Downloading Data**:  
   - Using `wbgapi.data.DataFrame`, I retrieved annual data for each country and indicator. Initially, the data was stored with **years as columns** and **countries as rows**.  

3. **Reshaping and Merging**:  
   - I transformed the data so that each row represented a unique combination of **country** and **year**.  
   - Additional columns for each chosen indicator were added.  
   - The year was converted into a proper **date format** (e.g., YYYY-01-01) for consistency, even though the data remains annual.  
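
The reshaping and merging steps above can be sketched without calling the API. In this illustration the `CAN` values come from the output shown earlier, while the economy `ZZZ`, its values, and the helper `to_long` are placeholders invented for the example. The sketch uses `melt` rather than the notebook's `stack`, which yields the same long layout.

```python
import pandas as pd

years = [2000, 2001]
economies = pd.Index(["CAN", "ZZZ"], name="economy")

# Wide layout: one row per economy, one column per year.
# CAN values are from the notebook output; ZZZ values are placeholders.
gdp_wide = pd.DataFrame([[5.138539, 1.875098], [1.0, 2.0]],
                        index=economies, columns=years)
cpi_wide = pd.DataFrame([[2.719440, 2.525120], [3.0, 4.0]],
                        index=economies, columns=years)

def to_long(wide, value_name):
    """Turn an economy-by-year table into (economy, time, value) rows."""
    long_df = wide.reset_index().melt(
        id_vars="economy", var_name="time", value_name=value_name
    )
    return long_df.dropna(subset=[value_name])

merged = to_long(gdp_wide, "gdp_growth").merge(
    to_long(cpi_wide, "inflation_cpi"), on=["economy", "time"], how="outer"
)
merged = merged.sort_values(["economy", "time"]).reset_index(drop=True)

# Represent each year as January 1 so the column is a proper datetime.
merged["date"] = pd.to_datetime(merged["time"].astype(str) + "-01-01")
print(merged)
```

An outer merge keeps country-years that have one indicator but not another, so gaps appear as NaN instead of silently disappearing.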

---

### Data Cleaning and Formatting  

- Ensured all indicators were numeric (**float64**) and the date column was interpreted as a **datetime type**.  
- No missing values were found, simplifying the cleaning process.  
- Dropped any unnecessary columns, such as temporary time columns used during import.  
- The final dataset included:  
   - **economy**  
   - **date**  
   - **gdp_growth**  
   - **inflation_cpi**  
   - **unemployment**  

---

### Verification and Validation  

After processing, I verified:  
- **Data types**: Ensured all columns had correct formats (e.g., date as datetime, indicators as float).  
- **Content consistency**: Reviewed the first few rows to confirm the indicators made intuitive sense. GDP growth percentages, inflation rates, and unemployment levels appeared reasonable for the selected countries.  
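
These checks can be expressed as a few assertions-style tests. The sketch below runs on a small hand-made frame with the same column names as the final dataset; the values and the `checks` dictionary are illustrative assumptions.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_float_dtype

# Hand-made sample with the same columns as the cleaned dataset (values are illustrative).
sample = pd.DataFrame({
    "economy": ["CAN", "CAN", "MEX", "MEX"],
    "date": pd.to_datetime(["2000-01-01", "2001-01-01", "2000-01-01", "2001-01-01"]),
    "gdp_growth": [5.1, 1.8, 4.9, -0.4],
    "inflation_cpi": [2.7, 2.5, 9.5, 6.4],
    "unemployment": [6.8, 7.2, 2.2, 2.5],
})

indicator_cols = ["gdp_growth", "inflation_cpi", "unemployment"]

checks = {
    # The date column should be a datetime type.
    "date_is_datetime": bool(is_datetime64_any_dtype(sample["date"])),
    # Every indicator should be stored as a float.
    "indicators_are_float": all(is_float_dtype(sample[c]) for c in indicator_cols),
    # No indicator should contain missing values.
    "no_missing_values": not bool(sample[indicator_cols].isna().any().any()),
    # Each (economy, date) pair should appear exactly once.
    "one_row_per_economy_and_date": not bool(sample.duplicated(["economy", "date"]).any()),
}
print(checks)
```

A failed check shows up as a `False` entry, which is easier to spot than scanning `head()` output by eye.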

---

### Conclusion  

Careful selection of countries and indicators, combined with consistent data handling, produced a **relevant, manageable, and high-quality dataset**. Integrating these **global economic indicators** into my analysis adds context for understanding the Federal Reserve's interest rate decisions and may improve the **predictive power** of the forecasting model.


# 4. Prepare the final file

In [21]:
import pandas as pd

economic_indicators_path = r"C:\Users\User\Desktop\interest rate decision\economic_indicators_transformed.xlsx"

economic_data = pd.read_excel(economic_indicators_path)

economic_data['date'] = pd.to_datetime(economic_data['date'], errors='coerce')

economic_data.set_index('date', inplace=True)


economic_data = economic_data.resample('M').ffill()

economic_data.reset_index(inplace=True)

output_path = r"C:\Users\User\Desktop\interest rate decision\economic_indicators_monthly_aligned.xlsx"
economic_data.to_excel(output_path, index=False)

print("Monthly frequency alignment complete and file saved at:", output_path)


Monthly frequency alignment complete and file saved at: C:\Users\User\Desktop\interest rate decision\economic_indicators_monthly_aligned.xlsx


In [23]:
import pandas as pd

world_bank_path = r"C:\Users\User\Desktop\interest rate decision\world_bank_indicators_cleaned.xlsx"


countries = [
    'CAN', # Canada
    'MEX', # Mexico
    'CHN', # China
    'JPN', # Japan
    'DEU', # Germany
    'GBR', # UK
    'FRA', # France
    'ITA', # Italy
    'ESP', # Spain
    'NLD', # Netherlands
    'CHE'  # Switzerland
]

wb_data = pd.read_excel(world_bank_path)

wb_data['date'] = pd.to_datetime(wb_data['date'], errors='coerce')

wb_data = wb_data[wb_data['country'].isin(countries)]

global_annual = wb_data.groupby('date', as_index=False).agg({
    'gdp_growth': 'mean',
    'inflation_cpi': 'mean',
    'unemployment': 'mean'
})

global_annual.set_index('date', inplace=True)

global_monthly = global_annual.resample('M').ffill()


global_monthly.reset_index(inplace=True)

global_output_path = r"C:\Users\User\Desktop\interest rate decision\world_bank_global_monthly.xlsx"
global_monthly.to_excel(global_output_path, index=False)

print("Global monthly dataset (using specified countries) created and saved at:", global_output_path)


Global monthly dataset (using specified countries) created and saved at: C:\Users\User\Desktop\interest rate decision\world_bank_global_monthly.xlsx


In [35]:
import pandas as pd

file_path = r"C:\Users\User\Desktop\interest rate decision\fomc_sentiment_analysis_results.xlsx"

df_fomc = pd.read_excel(file_path)

print("Original FOMC Sentiment Data - First Few Rows:")
print(df_fomc.head())

df_fomc['date'] = pd.to_datetime(df_fomc['filename'].str.extract(r'(\d{4}-\d{2}-\d{2})')[0])

df_fomc['merge_date'] = df_fomc['date'] + pd.offsets.MonthEnd(0)

df_fomc.rename(columns={
    'policy_stance': 'fomc_policy_stance',
    'overall_sentiment': 'fomc_overall_sentiment',
    'sentiment_score': 'fomc_sentiment_score',
    'tone_keywords': 'fomc_tone_keywords',
    'key_economic_concerns': 'fomc_key_economic_concerns',
    'forward_guidance': 'fomc_forward_guidance',
    'summary': 'fomc_summary'
}, inplace=True)

print("\nColumns after renaming:")
print(df_fomc.columns)

df_fomc.sort_values(by=['merge_date', 'date'], inplace=True)

df_fomc_latest = df_fomc.drop_duplicates(subset=['merge_date'], keep='last')

columns_to_keep = ['merge_date', 'fomc_policy_stance', 'fomc_sentiment_score']
if all(column in df_fomc_latest.columns for column in columns_to_keep):
    df_fomc_latest = df_fomc_latest[columns_to_keep]
else:
    raise KeyError(f"One or more columns from {columns_to_keep} are missing.")

min_date, max_date = df_fomc_latest['merge_date'].min(), df_fomc_latest['merge_date'].max()
monthly_range = pd.date_range(start=min_date, end=max_date, freq='M')

df_fomc_reindexed = df_fomc_latest.set_index('merge_date').reindex(monthly_range).rename_axis('merge_date').reset_index()

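# Forward-fill months without a statement; plain assignment avoids the
# deprecated chained fillna(method='ffill', inplace=True) pattern.
fill_columns = ['fomc_policy_stance', 'fomc_sentiment_score']
df_fomc_reindexed[fill_columns] = df_fomc_reindexed[fill_columns].ffill()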

print("\nCleaned and Complete FOMC Sentiment Data - First Few Rows:")
print(df_fomc_reindexed.head(20))

output_path = r"C:\Users\User\Desktop\interest rate decision\fomc_sentiment_cleaned_filled_latest.xlsx"
df_fomc_reindexed.to_excel(output_path, index=False)
print(f"\nCleaned and complete FOMC Sentiment data saved to {output_path}")


Original FOMC Sentiment Data - First Few Rows:
         filename policy_stance overall_sentiment  sentiment_score  \
0  2000-02-02.txt       Hawkish          Negative             -0.8   
1  2000-03-10.txt       Hawkish          Negative             -0.7   
2  2000-03-21.txt       Hawkish          Negative             -0.7   
3  2000-05-16.txt        Dovish          Negative             -0.6   
4  2000-06-28.txt        Dovish          Negative             -0.7   

                         tone_keywords  \
0   concerned, inflationary, imbalance   
1  concerned, inflationary, heightened   
2  concerned, inflationary, imbalances   
3     uncertainty, restraint, weigh on   
4         soften, erosion, uncertainty   

                               key_economic_concerns  \
0  Excess demand, potential supply growth, inflat...   
1  Concerns about demand exceeding potential supp...   
2  Rising demand exceeding potential supply, infl...   
3  Decline in capital equipment investment, erosi...   

In [37]:
import pandas as pd

original_file_path = r"C:\Users\User\Desktop\interest rate decision\fomc_sentiment_analysis_results.xlsx"
cleaned_file_path = r"C:\Users\User\Desktop\interest rate decision\fomc_sentiment_cleaned_filled_latest.xlsx"

df_original = pd.read_excel(original_file_path)
df_cleaned = pd.read_excel(cleaned_file_path)

print("Original FOMC Sentiment Data - First Few Rows:")
print(df_original.head())

print("\nCleaned FOMC Sentiment Data - First Few Rows:")
print(df_cleaned.head())

print("\nOriginal FOMC Sentiment Data Info:")
df_original.info()

print("\nCleaned FOMC Sentiment Data Info:")
df_cleaned.info()

print("\nOriginal FOMC Sentiment Date Range:")
# astype('datetime64') without a unit fails on recent pandas; parse with pd.to_datetime instead.
original_dates = pd.to_datetime(df_original['filename'].str.extract(r'(\d{4}-\d{2}-\d{2})')[0]).dropna()
print(original_dates.min(), original_dates.max())

print("\nCleaned FOMC Sentiment Date Range:")
print(df_cleaned['merge_date'].min(), df_cleaned['merge_date'].max())

print("\nMissing Values in Original FOMC Sentiment Data:")
print(df_original.isnull().sum())

print("\nMissing Values in Cleaned FOMC Sentiment Data:")
print(df_cleaned.isnull().sum())

print("\nCleaned Data - Unique Values in 'fomc_policy_stance':")
print(df_cleaned['fomc_policy_stance'].unique())

print("\nCleaned Data - Sentiment Score Statistics:")
print(df_cleaned['fomc_sentiment_score'].describe())

comparison_output_path = r"C:\Users\User\Desktop\interest rate decision\fomc_comparison.xlsx"
with pd.ExcelWriter(comparison_output_path) as writer:
    df_original.to_excel(writer, sheet_name="Original", index=False)
    df_cleaned.to_excel(writer, sheet_name="Cleaned", index=False)

print(f"\nComparison saved to {comparison_output_path}")


Original FOMC Sentiment Data - First Few Rows:
         filename policy_stance overall_sentiment  sentiment_score  \
0  2000-02-02.txt       Hawkish          Negative             -0.8   
1  2000-03-10.txt       Hawkish          Negative             -0.7   
2  2000-03-21.txt       Hawkish          Negative             -0.7   
3  2000-05-16.txt        Dovish          Negative             -0.6   
4  2000-06-28.txt        Dovish          Negative             -0.7   

                         tone_keywords  \
0   concerned, inflationary, imbalance   
1  concerned, inflationary, heightened   
2  concerned, inflationary, imbalances   
3     uncertainty, restraint, weigh on   
4         soften, erosion, uncertainty   

                               key_economic_concerns  \
0  Excess demand, potential supply growth, inflat...   
1  Concerns about demand exceeding potential supp...   
2  Rising demand exceeding potential supply, infl...   
3  Decline in capital equipment investment, erosi...   

## Data Preparation and Alignment for Merging  

To integrate the three datasets (**FOMC Sentiment Data**, **Economic Indicators Data**, and **World Bank Global Indicators Data**), I applied a series of preprocessing steps that align their structure, format, and temporal resolution while preserving the underlying values. The steps and their rationale are summarized below.

---

### FOMC Sentiment Data  

- **Original Format**:  
   - The dataset had irregular dates that follow the FOMC statement release schedule, which typically means eight scheduled meetings per year (roughly every six weeks) rather than one per month.  

- **Adjustments Made**:  
   - Extracted dates from filenames and snapped each to its month end to create a **merge_date** column. When a month had more than one statement, only the latest was kept.  
   - Added missing months to align with the monthly resolution of the other datasets. Missing data was forward-filled to propagate the latest available sentiment and policy stance values.  
   - Prefixed the FOMC columns with **`fomc_`** to distinguish them from similar indicators in other datasets, and kept only the policy stance and sentiment score for merging.  

- **Reason**:  
   - Standardizing the temporal resolution to monthly intervals enabled alignment with the Economic Indicators Data and ensured that no gaps occurred during the merging process.

---

### Economic Indicators Data  

- **Original Format**:  
   - This dataset was already in a **monthly resolution**, containing detailed U.S.-specific economic metrics (e.g., inflation, unemployment, industrial production).  

- **Adjustments Made**:  
   - Kept the `date` column as the merge key; the FOMC `merge_date` column is renamed to `date` at merge time for consistency across datasets.  
   - Verified data integrity, ensuring no missing values or inconsistencies.  

- **Reason**:  
   - Minimal adjustments were needed as the dataset was already optimized for analysis and compatible with the merging process.

---

### World Bank Global Indicators Data  

- **Original Format**:  
   - The dataset contained **annual data** for multiple countries (excluding the U.S.), which required temporal alignment with the other datasets.  

- **Adjustments Made**:  
   - Averaged each indicator across the selected countries for every year, then converted the annual averages into **monthly resolution** by forward-filling values across months within each year.  
   - Used month-end `date` values as the merge key to match the temporal resolution of the other datasets.  
   - Prefixed all columns with **`global_`** to differentiate global metrics from U.S.-specific data in the Economic Indicators Data.  

- **Reason**:  
   - Aligning the temporal resolution and adding prefixes ensured the integration of global metrics without conflicts or overwriting U.S.-specific data.
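
One way to apply the `global_` prefix without touching the merge key is sketched below. The frame and its values are illustrative assumptions; only the column names follow the dataset described above.

```python
import pandas as pd

# Illustrative monthly frame with unprefixed indicator names.
world = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-31", "2000-02-29"]),
    "gdp_growth": [4.0, 4.0],
    "inflation_cpi": [2.5, 2.5],
    "unemployment": [6.0, 6.0],
})

# Park the key in the index so add_prefix only renames the indicator columns.
prefixed = world.set_index("date").add_prefix("global_").reset_index()
print(prefixed.columns.tolist())
```

Keeping `date` out of the rename means the same key name can be used for every merge.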

---

### Importance of the Adjustments  

These preprocessing steps were critical for creating a unified dataset capable of accurate and comprehensive analysis of economic and policy trends. Without these adjustments:  

1. **Temporal Misalignments**:  
   - Irregular dates across datasets would have resulted in incomplete or misleading data.  

2. **Overlapping Column Names**:  
   - Duplicate column names could have caused confusion or inaccuracies during analysis.  

3. **Missing Data**:  
   - Unfilled gaps for specific months would have reduced the effectiveness of downstream machine learning models and statistical analyses.  

---

### Conclusion  

By addressing these challenges, the datasets were successfully prepared for seamless merging. The resulting unified dataset serves as a strong foundation for building predictive models and extracting meaningful insights into the relationship between economic indicators and Federal Reserve interest rate decisions.


In [44]:
import pandas as pd

world_bank_path = r"C:\Users\User\Desktop\interest rate decision\workfile\world_bank_global_monthly.xlsx"
economic_indicators_path = r"C:\Users\User\Desktop\interest rate decision\workfile\economic_indicators_monthly_aligned.xlsx"
fomc_sentiment_path = r"C:\Users\User\Desktop\interest rate decision\workfile\fomc_sentiment_cleaned_filled_latest.xlsx"

world_bank_data = pd.read_excel(world_bank_path)
economic_indicators_data = pd.read_excel(economic_indicators_path)
fomc_sentiment_data = pd.read_excel(fomc_sentiment_path)

fomc_sentiment_data.rename(columns={'merge_date': 'date'}, inplace=True)

merged_data = pd.merge(economic_indicators_data, fomc_sentiment_data, on='date', how='outer')
merged_data = pd.merge(merged_data, world_bank_data, on='date', how='outer')

merged_data.sort_values(by='date', inplace=True)

# ffill() replaces the deprecated fillna(method='ffill'); rows before the first observation stay NaN.
merged_data = merged_data.ffill()

print("Merged Data - First Few Rows:")
print(merged_data.head())
print("\nMerged Data Info:")
merged_data.info()

output_path = r"C:\Users\User\Desktop\interest rate decision\workfile\final_merged_data.xlsx"
merged_data.to_excel(output_path, index=False)
print(f"Merged dataset saved to {output_path}")


Merged Data - First Few Rows:
          date  fed_funds_rate    cpi  unemployment        gdp  \
299 2000-01-31             NaN    NaN           NaN        NaN   
0   2000-02-29            5.73  170.0           4.1  10.002179   
1   2000-03-31            5.85  171.0           4.0  10.002179   
2   2000-04-30            6.02  170.9           3.8  10.247720   
3   2000-05-31            6.27  171.2           4.0  10.247720   

     nonfarm_payrolls  industrial_production   cpi_mom  \
299               NaN                    NaN       NaN   
0             131.120                91.7245  0.413467   
1             131.604                92.0830  0.588235   
2             131.883                92.6659 -0.058480   
3             132.106                92.9347  0.175541   

     industrial_production_mom fomc_policy_stance  fomc_sentiment_score  \
299                        NaN                NaN                   NaN   
0                     0.344932            Hawkish                  -0.8   

## Steps for Merging the Datasets  

Merging the datasets—**World Bank Data**, **Economic Indicators Data**, and **FOMC Sentiment Data**—was a critical step to create a unified and comprehensive dataset for analysis. Here’s a detailed breakdown of the process:  

---

### 1. Loading the Data  

I loaded the three datasets into memory:  

- **World Bank Data**:  
   - Contains global economic indicators, such as GDP growth, inflation, and unemployment, aggregated monthly.  

- **Economic Indicators Data**:  
   - Provides U.S.-specific metrics like the Federal Funds Rate, unemployment rate, and industrial production data, recorded monthly.  

- **FOMC Sentiment Data**:  
   - Includes sentiment scores and policy stance derived from FOMC statements, which are released about eight times a year; the cleaned file loaded here is already forward-filled to monthly frequency.  

---

### 2. Standardizing the Date Column  

To prepare the datasets for merging:  
- I ensured the **date column** in all three files was in a consistent format (**datetime64[ns]**).  
- This step was crucial because the **merge operation** relies on matching dates across datasets.  
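
A minimal sketch of a shared date-normalization helper follows. The function name `to_month_end` is hypothetical; it uses `pd.offsets.MonthEnd(0)`, the same offset applied to the FOMC dates earlier in this notebook.

```python
import pandas as pd

def to_month_end(dates):
    """Parse dates and snap each one to the last calendar day of its month."""
    parsed = pd.to_datetime(dates, errors="coerce")
    # MonthEnd(0) rolls forward to the month end and leaves month-end dates unchanged.
    return parsed + pd.offsets.MonthEnd(0)

raw = pd.Series(["2000-02-02", "2000-03-21", "2000-02-29"])
keys = to_month_end(raw)
print(keys.tolist())
```

Applying the same helper to every dataset guarantees that two rows for the same month share an identical key, whatever day of the month the source used.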

---

### 3. Filling Missing Months in FOMC Data  

The FOMC data lacked entries for certain months because the FOMC meets about eight times a year rather than monthly. To align it with the monthly frequency of the other datasets:  

- I applied **forward-fill** to propagate the latest available sentiment and policy stance to subsequent months.  
- This ensured that every month had an associated **FOMC stance** and **sentiment score**, making the dataset compatible with the monthly resolution of the others.  
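
The fill step can be illustrated with the first five statements shown in the output earlier. The `is_carried_forward` flag is a hypothetical addition, not part of the original pipeline; it records which months had no statement of their own.

```python
import pandas as pd

# First five statements from the sample output above.
fomc = pd.DataFrame({
    "date": pd.to_datetime(["2000-02-02", "2000-03-10", "2000-03-21", "2000-05-16", "2000-06-28"]),
    "fomc_policy_stance": ["Hawkish", "Hawkish", "Hawkish", "Dovish", "Dovish"],
    "fomc_sentiment_score": [-0.8, -0.7, -0.7, -0.6, -0.7],
})

# Snap to month end and keep the latest statement in each month.
fomc["merge_date"] = fomc["date"] + pd.offsets.MonthEnd(0)
latest = fomc.sort_values("date").drop_duplicates("merge_date", keep="last")

# Build a complete month-end range and reindex onto it.
months = pd.date_range(latest["merge_date"].min(), latest["merge_date"].max(),
                       freq=pd.offsets.MonthEnd())
value_cols = ["fomc_policy_stance", "fomc_sentiment_score"]
monthly = latest.set_index("merge_date")[value_cols].reindex(months)

# Hypothetical flag: True where the month had no statement and values are carried forward.
monthly["is_carried_forward"] = monthly["fomc_sentiment_score"].isna()
monthly[value_cols] = monthly[value_cols].ffill()
print(monthly)
```

In this sample only April 2000 is carried forward, inheriting the stance and score of the 2000-03-21 statement. A flag like this lets a model distinguish fresh statements from repeated values.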

---

### 4. Aligning the World Bank Data  

The World Bank data was initially provided at an **annual frequency**. To align it with the other datasets:  
- I transformed the data to a **monthly resolution** by repeating each annual value across all months of the corresponding year.  
- This adjustment allowed for proper alignment with the **monthly data** from the Economic Indicators Data and FOMC Sentiment Data.  
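
One detail worth checking: when annual values are dated January 1, resampling to month end and forward-filling stops at January of the last year, so the remaining months of that year are only filled later by the post-merge forward-fill. The sketch below is an alternative that repeats each value across all twelve months explicitly. The values are illustrative assumptions.

```python
import pandas as pd

# Illustrative annual averages dated January 1 of each year.
annual = pd.DataFrame(
    {"global_gdp_growth": [4.0, 2.0]},
    index=pd.to_datetime(["2000-01-01", "2001-01-01"]),
)

# Month-end index from January of the first year through December of the last.
months = pd.date_range(
    start=annual.index.min(),
    end=annual.index.max() + pd.offsets.YearEnd(0),
    freq=pd.offsets.MonthEnd(),
)

# Look up each month's value by its calendar year.
by_year = annual.set_index(annual.index.year)
monthly = by_year.reindex(months.year).set_index(months)
print(monthly.tail(3))
```

Looking values up by year makes the "repeat within the year" rule explicit and keeps it independent of the other datasets.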

---

### 5. Outer Join for Merging  

To combine the datasets:  
- I performed an **outer join** on the `date` column.  
- This method preserved all dates from the three datasets, even if one or more sources didn’t have data for a specific month.  
- The **outer join** ensured no information was lost during the merging process.  
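
To see which rows an outer join adds, `merge` accepts an `indicator` argument. The sketch below uses two small frames; the federal funds values come from the output above, while the global values are illustrative assumptions.

```python
import pandas as pd

us = pd.DataFrame({
    "date": pd.to_datetime(["2000-02-29", "2000-03-31"]),
    "fed_funds_rate": [5.73, 5.85],
})
world = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-31", "2000-02-29", "2000-03-31"]),
    "global_gdp_growth": [4.0, 4.0, 4.0],  # illustrative values
})

merged = (
    us.merge(world, on="date", how="outer",
             indicator="source",        # records left_only / right_only / both
             validate="one_to_one")     # raises if a date repeats on either side
      .sort_values("date")
      .reset_index(drop=True)
)
print(merged)
```

Rows marked `right_only` or `left_only` are the dates present in just one source, which is where missing values appear after the join.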

---

### 6. Handling Missing Values  

After merging, some missing values were observed, especially for months where a particular source lacked data. To address this:  

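- I used **forward-fill** to propagate previous values for both **numeric** and **categorical fields**.  
- This maintained **continuity** in the data while avoiding artificial fluctuations. Forward-fill cannot fill rows that precede a source's first observation, so the 2000-01-31 row in the output above still has no U.S. indicator or FOMC values.  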
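
A short sketch of what forward-fill does and does not cover. The frame mimics the start of the merged output; dropping incomplete leading rows is one possible follow-up, shown here as an assumption rather than a step from the original pipeline.

```python
import pandas as pd

merged = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-31", "2000-02-29", "2000-03-31", "2000-04-30"]),
    "fed_funds_rate": [None, 5.73, 5.85, 6.02],
    "fomc_policy_stance": [None, "Hawkish", "Hawkish", None],
}).sort_values("date")

# Forward-fill carries the last known value forward in time.
filled = merged.ffill()

# Rows before the first observation have nothing to carry forward and stay empty.
remaining = filled.isna().sum()
print(remaining)

# Optional follow-up (an assumption): keep only rows where every column has a value.
complete = filled.dropna().reset_index(drop=True)
print(complete)
```

Counting the remaining gaps after filling is a quick way to confirm whether the dataset is complete or only complete from a certain date onward.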

---

### Summary  

By following these steps, I created a unified dataset that integrates **U.S. domestic metrics**, **global economic indicators**, and **FOMC sentiment analysis results**. The resulting dataset preserves temporal consistency, ensures completeness, and provides a robust foundation for subsequent analysis and modeling.


In [53]:
import pandas as pd

world_bank_path = r"C:\Users\User\Desktop\interest rate decision\workfile\world_bank_global_monthly.xlsx"
economic_indicators_path = r"C:\Users\User\Desktop\interest rate decision\workfile\economic_indicators_monthly_aligned.xlsx"
fomc_sentiment_path = r"C:\Users\User\Desktop\interest rate decision\workfile\fomc_sentiment_cleaned_filled_latest.xlsx"
merged_file_path = r"C:\Users\User\Desktop\interest rate decision\workfile\final_merged_data.xlsx"

world_bank = pd.read_excel(world_bank_path)
economic_indicators = pd.read_excel(economic_indicators_path)
fomc_sentiment = pd.read_excel(fomc_sentiment_path)
merged_data = pd.read_excel(merged_file_path)


print("\nValidation 1: Global Indicators Consistency")
world_bank_check = merged_data[['date', 'global_gdp_growth', 'global_inflation_cpi', 'global_unemployment']].merge(
    world_bank, on='date', how='inner', suffixes=('_merged', '_original')
)
world_bank_check['gdp_growth_match'] = (
    world_bank_check['global_gdp_growth_merged'] == world_bank_check['global_gdp_growth_original']
)
world_bank_check['inflation_match'] = (
    world_bank_check['global_inflation_cpi_merged'] == world_bank_check['global_inflation_cpi_original']
)
world_bank_check['unemployment_match'] = (
    world_bank_check['global_unemployment_merged'] == world_bank_check['global_unemployment_original']
)
print(world_bank_check[['date', 'gdp_growth_match', 'inflation_match', 'unemployment_match']].head())

print("\nValidation 2: FOMC Sentiment Consistency")
fomc_check = merged_data[['date', 'fomc_policy_stance', 'fomc_sentiment_score']].merge(
    fomc_sentiment, left_on='date', right_on='merge_date', how='inner', suffixes=('_merged', '_original')
)
fomc_check['policy_stance_match'] = (
    fomc_check['fomc_policy_stance_merged'] == fomc_check['fomc_policy_stance_original']
)
fomc_check['sentiment_score_match'] = (
    fomc_check['fomc_sentiment_score_merged'] == fomc_check['fomc_sentiment_score_original']
)
print(fomc_check[['date', 'policy_stance_match', 'sentiment_score_match']].head())

print("\nValidation 3: Economic Indicators Consistency")
economic_check = merged_data[['date', 'fed_funds_rate', 'cpi', 'unemployment', 'gdp']].merge(
    economic_indicators[['date', 'fed_funds_rate', 'cpi', 'unemployment', 'gdp']],
    on='date',
    how='inner',
    suffixes=('_merged', '_original')
)
economic_check['fed_funds_rate_match'] = (
    economic_check['fed_funds_rate_merged'] == economic_check['fed_funds_rate_original']
)
economic_check['cpi_match'] = (
    economic_check['cpi_merged'] == economic_check['cpi_original']
)
economic_check['unemployment_match'] = (
    economic_check['unemployment_merged'] == economic_check['unemployment_original']
)
print(economic_check[['date', 'fed_funds_rate_match', 'cpi_match', 'unemployment_match']].head())

print("\nSummary of Validation Mismatches:")
print(f"World Bank mismatches:\n{world_bank_check[(~world_bank_check['gdp_growth_match']) | (~world_bank_check['inflation_match']) | (~world_bank_check['unemployment_match'])].shape[0]} rows")
print(f"FOMC mismatches:\n{fomc_check[(~fomc_check['policy_stance_match']) | (~fomc_check['sentiment_score_match'])].shape[0]} rows")
print(f"Economic mismatches:\n{economic_check[(~economic_check['fed_funds_rate_match']) | (~economic_check['cpi_match']) | (~economic_check['unemployment_match'])].shape[0]} rows")



Validation 1: Global Indicators Consistency
        date  gdp_growth_match  inflation_match  unemployment_match
0 2000-02-29              True             True                True
1 2000-03-31              True             True                True
2 2000-04-30              True             True                True
3 2000-05-31              True             True                True
4 2000-06-30              True             True                True

Validation 2: FOMC Sentiment Consistency
        date  policy_stance_match  sentiment_score_match
0 2000-02-29                 True                   True
1 2000-03-31                 True                   True
2 2000-04-30                 True                   True
3 2000-05-31                 True                   True
4 2000-06-30                 True                   True

Validation 3: Economic Indicators Consistency
        date  fed_funds_rate_match  cpi_match  unemployment_match
0 2000-02-29                  True       True     

## Validation Results of the Merged Dataset  

The validation results confirm that the merged dataset is consistent with the source datasets. Comprehensive checks were performed to ensure alignment, and no mismatches were detected. Below is a detailed summary:

---

### Global Indicators (World Bank Data)  
- All rows in the merged dataset match the original World Bank dataset for key indicators:  
  - **global_gdp_growth**  
  - **global_inflation_cpi**  
  - **global_unemployment**  
- The global data was correctly aligned and repeated for monthly intervals, as intended, ensuring the temporal transformation was accurate.

---

### FOMC Sentiment Data  
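- The fields **fomc_policy_stance** and **fomc_sentiment_score** in the merged dataset match the cleaned, forward-filled FOMC file (`fomc_sentiment_cleaned_filled_latest.xlsx`) that was used as the comparison source.  
- This confirms that the merge carried the monthly FOMC values over unchanged. Because the comparison file was already forward-filled, the check covers the integration step rather than the fill itself.  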

---

### Economic Indicators (U.S.-Specific Data)  
- Key economic indicators, including **fed_funds_rate**, **cpi**, and **unemployment**, match the original Economic Indicators dataset.  
- This alignment ensures that the U.S.-specific monthly data was preserved without errors during the merging process.  

---

### Observations  
1. **No Mismatches Detected**:  
   - There are zero mismatched rows across all datasets, confirming the accuracy of the merging process.  

2. **Logical Alignment Achieved**:  
   - The forward-filling process and monthly alignment for all datasets were implemented correctly, preserving the integrity and logical structure of the data.  
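
The mismatch counts rely on `==`, which treats two missing values as different and compares floats exactly. A more tolerant comparison can be sketched as follows; `count_mismatches` is a hypothetical helper and the frames are small illustrative samples.

```python
import numpy as np
import pandas as pd

def count_mismatches(merged, source, key, columns, tol=1e-9):
    """Count rows, matched on `key`, where any listed column differs between the frames."""
    both = merged[[key, *columns]].merge(
        source[[key, *columns]], on=key, how="inner", suffixes=("_merged", "_original")
    )
    mismatch = np.zeros(len(both), dtype=bool)
    for col in columns:
        left, right = both[f"{col}_merged"], both[f"{col}_original"]
        if pd.api.types.is_numeric_dtype(left) and pd.api.types.is_numeric_dtype(right):
            # Tolerant float comparison; two NaNs count as equal.
            same = np.isclose(left.to_numpy(dtype=float), right.to_numpy(dtype=float),
                              atol=tol, equal_nan=True)
        else:
            # Exact comparison for text; two missing values count as equal.
            same = ((left == right) | (left.isna() & right.isna())).to_numpy()
        mismatch |= ~same
    return int(mismatch.sum())

dates = pd.to_datetime(["2000-02-29", "2000-03-31", "2000-04-30"])
merged = pd.DataFrame({"date": dates, "fed_funds_rate": [5.73, 5.85, np.nan],
                       "fomc_policy_stance": ["Hawkish", "Hawkish", None]})
source = merged.copy()
altered = source.assign(fed_funds_rate=[5.73, 5.86, np.nan])

cols = ["fed_funds_rate", "fomc_policy_stance"]
print(count_mismatches(merged, source, "date", cols))   # identical frames
print(count_mismatches(merged, altered, "date", cols))  # one changed value
```

With a plain `==`, the April row would be flagged even for identical frames because both sides are missing. The helper also checks every row, not just the first five.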

---

### Conclusion  
These validation results affirm that the merged dataset is a faithful integration of the source datasets. It is consistent, complete, and ready for downstream analysis, ensuring that no critical information was lost or misaligned during the merging process.
