---
title: "Data Collection"
format:
    html: 
        code-fold: false
---

{{< include instructions.qmd >}} 


{{< include overview.qmd >}} 

{{< include methods.qmd >}} 

# Code 

Provide the source code used for this section of the project here.

If you're using a package for code organization, you can import it at this point. However, make sure that the **actual workflow steps**—including data processing, analysis, and other key tasks—are conducted and clearly demonstrated on this page. The goal is to show the technical flow of your project, highlighting how the code is executed to achieve your results.

Ensure that the code is well-commented to enhance readability and understanding for others who may review or use it. If relevant, link to additional documentation or external references that explain any complex components. This section should give readers a clear view of how the project is implemented from a technical perspective.

This page is a technical narrative, not just a notebook with a collection of code cells: include in-line prose to describe what is going on.

## Import data

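In the following code, we use three sources:

- The FRED API (Federal Reserve Economic Data) provides the macroeconomic series and the 10-year Treasury yield.
- The BEA API (Bureau of Economic Analysis) provides National Income and Product Accounts (NIPA) tables.
- Yahoo Finance, through the `yfinance` package, provides daily index and commodity prices.

Combining these sources gives us both macroeconomic indicators and market performance data for the analysis.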
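All three sources are reached over HTTP, so transient failures such as rate limiting, timeouts, or server errors are worth planning for. The cells on this page call `requests.get` directly. The following is a sketch under our own assumptions: a shared `requests.Session` can retry such failures automatically using urllib3's `Retry`. The helper name `make_session` and the retry settings are illustrative choices, not requirements of FRED, BEA, or Yahoo Finance.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries=3, backoff_factor=1.0):
    """Hypothetical helper: a requests Session that retries transient HTTP failures."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff_factor,               # wait grows exponentially between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # rate limiting and server errors
        allowed_methods=["GET"],                     # only retry idempotent GET requests
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)  # apply the retry policy to every HTTPS request
    session.mount("http://", adapter)
    return session


session = make_session()
# Usage (needs network access and a valid key), e.g.:
# response = session.get(base_url, params=params, timeout=30)
```

The session is a drop-in replacement for the module-level `requests.get` calls. Only GET requests are retried because all three APIs are queried read-only.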

## Set the path to store data

```python
import os  # Import os module for handling file paths

# Define the target folder for saving CSV files
output_folder = "../../data/raw-data"

# Create the folder (and any missing parent folders) if it doesn't exist
os.makedirs(output_folder, exist_ok=True)
```
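Because `output_folder` is a relative path, the location of the CSV files depends on the notebook's working directory. The following is a minimal sketch, assuming you want to confirm that location before downloading anything. It uses `pathlib` to create the folder and resolve it to an absolute path. The helper name `prepare_output_folder` is hypothetical, and the demonstration runs in a temporary directory so it has no side effects.

```python
import tempfile
from pathlib import Path


def prepare_output_folder(folder):
    """Hypothetical helper: create `folder` (and parents) if needed and return its absolute path."""
    path = Path(folder)
    path.mkdir(parents=True, exist_ok=True)  # same effect as os.makedirs(folder, exist_ok=True)
    return path.resolve()


# Demonstration in a throwaway directory; in the notebook you would call
# prepare_output_folder(output_folder) and print the result instead
with tempfile.TemporaryDirectory() as tmp:
    resolved = prepare_output_folder(Path(tmp) / "data" / "raw-data")
    created = resolved.is_dir()
    print(resolved)
```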



## Importing and Processing Macroeconomic Data from FRED API

```python
import os
import requests
import pandas as pd

# FRED API key, read from the FRED_API_KEY environment variable
# so that a personal key is not published with the page
API_KEY = os.environ["FRED_API_KEY"]

# FRED API endpoint for series observations
base_url = "https://api.stlouisfed.org/fred/series/observations"

# Map each output column name to its FRED series ID
series_ids = {
    "GDP": "GDP",                 # Gross Domestic Product (quarterly)
    "CPI": "CPIAUCSL",            # Consumer Price Index for All Urban Consumers (monthly)
    "Unemployment": "UNRATE",     # Unemployment rate (monthly)
    "FedFundsRate": "FEDFUNDS",   # Effective federal funds rate (monthly)
    "M2": "M2SL",                 # M2 money stock (monthly)
    "Umscent": "UMCSENT",         # University of Michigan consumer sentiment (monthly)
    "real_estate": "CSUSHPINSA",  # S&P CoreLogic Case-Shiller U.S. National Home Price Index (monthly)
    "Exports": "EXPGS",           # Exports of goods and services (quarterly)
    "Imports": "IMPGS"            # Imports of goods and services (quarterly)
}

# Collect one DataFrame per series
data_frames = []

for name, series_id in series_ids.items():
    # Construct the API request parameters
    params = {
        "series_id": series_id,
        "api_key": API_KEY,
        "file_type": "json"
    }
    # Send the API request and stop with an error on a failed HTTP status
    response = requests.get(base_url, params=params, timeout=30)
    response.raise_for_status()
    json_data = response.json()

    # Extract the observations data
    df = pd.DataFrame(json_data["observations"])

    # FRED marks missing values with "."; errors="coerce" converts them to NaN
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    df["date"] = pd.to_datetime(df["date"])  # Ensure the date column is in datetime format
    df = df.rename(columns={"value": name})  # Rename the value column to the series name
    df = df[["date", name]]  # Keep only the date and series value columns
    data_frames.append(df)

# Outer-merge on date so no observation is dropped; series observed less often
# (such as quarterly GDP) have missing values on dates they do not cover
merged_data = data_frames[0]
for df in data_frames[1:]:
    merged_data = pd.merge(merged_data, df, on="date", how="outer")

# Save the raw collected data for inspection or further processing
raw_data_file = os.path.join(output_folder, "macro_series_raw_collection.csv")
merged_data.to_csv(raw_data_file, index=False)
print(f"Raw collected data saved to '{raw_data_file}'.")
```

```
Data for macro_series saved to '/Users/qqmian/Desktop/GU_5000/Stock_Market_Performance/data/raw-data/macro_series_raw.csv'.
          date        GDP    CPI  Unemployment  FedFundsRate      M2  Umscent  \
640 2000-01-01  10002.179  169.3           4.0          5.45  4667.6    112.0   
641 2000-02-01  10250.952  170.0           4.1          5.73  4680.9    111.3   
642 2000-03-01  10250.952  171.0           4.0          5.85  4711.7    107.1   
643 2000-04-01  10247.720  170.9           3.8          6.02  4767.8    109.2   
644 2000-05-01  10250.952  171.2           4.0          6.27  4755.7    110.7   
645 2000-06-01  10250.952  172.2           4.0          6.53  4773.6    106.4   
646 2000-07-01  10318.165  172.7           4.0          6.54  4791.3    108.3   
647 2000-08-01  10250.952  172.7           4.1          6.50  4819.5    107.3   
648 2000-09-01  10250.952  173.6           3.9          6.52  4855.3    106.8   
649 2000-10-01  10435.744  173.9           3.9          6.51  4871
```
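The following sketch makes the parsing and merge steps concrete without calling the API. It passes two small, made-up payloads, shaped like FRED's `series/observations` JSON, through the same cleaning steps as the loop. Real observations also carry `realtime_start` and `realtime_end` fields, which are omitted here because the loop only uses `date` and `value`. The sketch shows two things:

- The `"."` placeholder becomes `NaN`.
- The outer merge keeps every date from either series, leaving the quarterly series empty in the months it does not cover.

The values are invented for illustration, and `to_series_frame` is a hypothetical helper.

```python
import pandas as pd

# Made-up payloads mimicking the structure of FRED's series/observations JSON
gdp_payload = {"observations": [
    {"date": "2000-01-01", "value": "100.0"},
    {"date": "2000-04-01", "value": "101.5"},
]}
cpi_payload = {"observations": [
    {"date": "2000-01-01", "value": "169.3"},
    {"date": "2000-02-01", "value": "."},  # FRED's placeholder for a missing value
    {"date": "2000-03-01", "value": "171.0"},
    {"date": "2000-04-01", "value": "170.9"},
]}


def to_series_frame(payload, name):
    """Hypothetical helper: apply the same cleaning steps as the FRED loop."""
    df = pd.DataFrame(payload["observations"])
    df["value"] = pd.to_numeric(df["value"], errors="coerce")  # "." -> NaN
    df["date"] = pd.to_datetime(df["date"])
    return df.rename(columns={"value": name})[["date", name]]


frames = [to_series_frame(gdp_payload, "GDP"), to_series_frame(cpi_payload, "CPI")]

# Outer merge keeps the union of dates; quarterly GDP is NaN in February and March
merged = frames[0]
for df in frames[1:]:
    merged = pd.merge(merged, df, on="date", how="outer")

print(merged)
```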

## Fetching 10-Year Treasury Yield Data from FRED API

```python
# Define request parameters (reuses API_KEY and the FRED base_url defined above)
params = {
    "series_id": "DGS10",          # 10-Year Treasury constant maturity rate
    "api_key": API_KEY,
    "file_type": "json",
    "observation_start": "2000-01-01",
    "observation_end": "2024-12-31",
    "frequency": "d",              # Daily frequency (the native frequency of DGS10)
    "aggregation_method": "eop",   # End-of-period value; only applies when aggregating to a lower frequency
    "units": "pch",                # Percent change from the previous observation, not the yield level
}

# Send the request
response = requests.get(base_url, params=params, timeout=30)

# Check if the request is successful
if response.status_code == 200:
    print("Data request successful!")
    data = response.json()

    # Extract data and convert to DataFrame; FRED uses "." for missing values.
    # Because units="pch", the "10_year_yield" column holds daily percent changes.
    records = []
    for item in data["observations"]:
        records.append({
            "date": item["date"],
            "10_year_yield": float(item["value"]) if item["value"] != "." else None
        })
    df = pd.DataFrame(records)

    # Process data
    df["date"] = pd.to_datetime(df["date"])
    print(df.head())

    # Save the result to a CSV file (only when the request succeeded)
    file_name = os.path.join(output_folder, "10_year_treasury.csv")
    df.to_csv(file_name, index=False)
    print(f"Data for 10_year_treasury saved to '{file_name}'.")
else:
    print("Data request failed!", response.status_code, response.text)
```

```
Data request successful!
        date  10_year_yield
0 2000-01-03        2.01550
1 2000-01-04       -1.36778
2 2000-01-05        2.00308
3 2000-01-06       -0.75529
4 2000-01-07       -0.76104
Data for 10_year_treasury saved to '/Users/qqmian/Desktop/GU_5000/Stock_Market_Performance/data/raw-data/10_year_treasury.csv'.
```
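Because the request sets `units="pch"`, the `10_year_yield` column holds percent changes rather than yield levels. FRED defines this transformation as $\left(\frac{x_t}{x_{t-1}} - 1\right) \times 100$. The following sketch uses made-up yield levels to show how the same numbers could be reproduced in pandas from levels, for example if the series were requested with `units="lin"` instead:

```python
import pandas as pd

# Made-up daily 10-year yield levels, in percent (illustrative values only)
levels = pd.Series(
    [6.50, 6.45, 6.60, 6.55],
    index=pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05", "2000-01-06"]),
)

# FRED's "pch" units: ((x_t / x_{t-1}) - 1) * 100; the first value has no predecessor
pch = (levels / levels.shift(1) - 1) * 100

# pandas' built-in equivalent
pch_builtin = levels.pct_change() * 100

print(pch.round(5))
```

If the analysis needs the yield level itself, as opposed to its daily change, requesting the levels and deriving changes locally keeps both available.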


## Fetching National Income and Product Accounts (NIPA) Data from BEA API
These data tables include:

- `T20100`: Table 2.1, Personal Income and Its Disposition (includes personal consumption expenditures and personal saving)
- `T10105`: Table 1.1.5, Gross Domestic Product (nominal GDP by expenditure component)

```python
import os
import requests
import pandas as pd

# BEA API basic configuration; the key is read from the BEA_API_KEY environment variable.
# A separate name avoids overwriting the FRED base_url used above.
BEA_API_KEY = os.environ["BEA_API_KEY"]
bea_base_url = "https://apps.bea.gov/api/data/"

def fetch_bea_data(table_name):
    """Fetch all available annual observations for one NIPA table as a DataFrame."""
    params = {
        "UserID": BEA_API_KEY,
        "method": "GetData",
        "datasetname": "NIPA",
        "Frequency": "A",  # Annual data
        "TableName": table_name,
        "Year": "X",  # Retrieve all years
        "ResultFormat": "JSON"
    }
    response = requests.get(bea_base_url, params=params, timeout=60)
    response.raise_for_status()
    data = response.json()
    if "Results" in data["BEAAPI"] and "Data" in data["BEAAPI"]["Results"]:
        return pd.DataFrame(data["BEAAPI"]["Results"]["Data"])
    print(f"Failed to fetch data for table {table_name}, error message:", data)
    return pd.DataFrame()  # Return an empty DataFrame if no data is retrieved


# Fetch both NIPA tables
t20100_data = fetch_bea_data("T20100")
t10105_data = fetch_bea_data("T10105")

# Save each raw table to its own CSV file
for table_name, table_data in [("t20100", t20100_data), ("t10105", t10105_data)]:
    file_name = os.path.join(output_folder, f"{table_name}_raw_data.csv")
    table_data.to_csv(file_name, index=False)
    print(f"Data for {table_name}_raw_data saved to '{file_name}'.")
```

```
Data for t20100_raw_data saved to '/Users/qqmian/Desktop/GU_5000/Stock_Market_Performance/data/raw-data/t20100_raw_data.csv'.
Data for t10105_raw_data saved to '/Users/qqmian/Desktop/GU_5000/Stock_Market_Performance/data/raw-data/t10105_raw_data.csv'.
```
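The BEA API returns each NIPA table in long format: one row per table line and year. Each row has fields such as `LineNumber`, `LineDescription`, `TimePeriod`, and `DataValue`, and `DataValue` is a string that may contain thousands separators. The following is a minimal sketch, assuming those field names and using made-up rows. It shows how a raw table such as `t10105_raw_data.csv` could be reshaped into one column per line item. `LineNumber` is kept in the column label because descriptions like "Goods" can repeat within one table.

```python
import pandas as pd

# Made-up rows shaped like BEA NIPA GetData output (only the fields used here)
rows = [
    ("1", "Gross domestic product", "2022", "1,000.0"),
    ("1", "Gross domestic product", "2023", "1,050.5"),
    ("2", "Goods", "2022", "300.0"),
    ("2", "Goods", "2023", "310.0"),
    ("3", "Goods", "2022", "120.0"),
    ("3", "Goods", "2023", "125.5"),
]
raw = pd.DataFrame(rows, columns=["LineNumber", "LineDescription", "TimePeriod", "DataValue"])

# Strip thousands separators before converting DataValue to numbers
raw["DataValue"] = pd.to_numeric(raw["DataValue"].str.replace(",", "", regex=False), errors="coerce")
raw["TimePeriod"] = raw["TimePeriod"].astype(int)

# Combine line number and description so repeated descriptions stay distinct
raw["column"] = raw["LineNumber"] + ": " + raw["LineDescription"]

# One row per year, one column per table line
wide = raw.pivot(index="TimePeriod", columns="column", values="DataValue")
print(wide)
```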


## Downloading Market Data from Yahoo Finance
This code fetches daily market data for major indices and commodities:

- S&P 500 (`^GSPC`)
- Nasdaq Composite (`^IXIC`)
- Dow Jones Industrial Average (`^DJI`)
- Gold futures (`GC=F`)
- Crude oil futures (`CL=F`)

```python
import os
import yfinance as yf
import pandas as pd

def fetch_data(ticker, start_date, end_date):
    """
    Fetch asset data from Yahoo Finance.

    Args:
        ticker (str): Asset ticker symbol (e.g., "^GSPC").
        start_date (str): Start date (format "YYYY-MM-DD").
        end_date (str): End date (format "YYYY-MM-DD"), exclusive.

    Returns:
        pd.DataFrame or None: DataFrame with Date, Open, High, Low, Close, Volume,
        or None if the download failed or returned no rows.
    """
    print(f"Downloading data for {ticker}...")
    try:
        # Download daily data from Yahoo Finance
        data = yf.download(ticker, start=start_date, end=end_date, interval="1d")
        if data.empty:
            print(f"No data found for {ticker}.")
            return None

        # Newer yfinance versions may return (Price, Ticker) MultiIndex columns even
        # for a single ticker; keep only the price level so the columns are flat
        if isinstance(data.columns, pd.MultiIndex):
            data.columns = data.columns.get_level_values(0)

        # Move the Date index into a column and retain the relevant columns
        data = data.reset_index()
        return data[["Date", "Open", "High", "Low", "Close", "Volume"]]
    except Exception as e:
        print(f"Error fetching data for {ticker}: {e}")
        return None


if __name__ == "__main__":
    # Map each Yahoo Finance ticker to the name used in the output file
    assets = {
        "^GSPC": "S&P_500",
        "^IXIC": "Nasdaq",
        "^DJI": "Dow_Jones",
        "GC=F": "Gold",
        "CL=F": "Crude_Oil",
    }

    # Define the date range (yfinance treats end_date as exclusive)
    start_date = "2000-11-01"
    end_date = "2024-11-30"

    # Iterate through the assets, download the data, and save each asset
    for ticker, name in assets.items():
        data = fetch_data(ticker, start_date, end_date)
        if data is None:
            continue  # Skip assets whose download failed or returned no rows
        file_path = os.path.join(output_folder, f"{name}_raw_a.csv")
        data.to_csv(file_path, index=False)
        print(f"Raw data for {name} saved to '{file_path}'.")
```

```
Downloading data for ^GSPC...
Raw data for S&P_500 saved to '/Users/qqmian/Desktop/GU_5000/Stock_Market_Performance/data/raw-data/S&P_500_raw_a.csv'.
Downloading data for ^IXIC...
Raw data for Nasdaq saved to '/Users/qqmian/Desktop/GU_5000/Stock_Market_Performance/data/raw-data/Nasdaq_raw_a.csv'.
Downloading data for ^DJI...
Raw data for Dow_Jones saved to '/Users/qqmian/Desktop/GU_5000/Stock_Market_Performance/data/raw-data/Dow_Jones_raw_a.csv'.
Downloading data for GC=F...
Raw data for Gold saved to '/Users/qqmian/Desktop/GU_5000/Stock_Market_Performance/data/raw-data/Gold_raw_a.csv'.
Downloading data for CL=F...
Raw data for Crude_Oil saved to '/Users/qqmian/Desktop/GU_5000/Stock_Market_Performance/data/raw-data/Crude_Oil_raw_a.csv'.
```
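Depending on the installed yfinance version, the columns may arrive flat or as a two-level MultiIndex. That is why the download code checks for a MultiIndex before saving. The sketch below builds a small made-up frame with the two-level layout, so it runs without network access, and shows the column flattening and selection step. The `Price`/`Ticker` level names follow recent yfinance output and are an assumption here.

```python
import pandas as pd

# Made-up two-day frame laid out like a single-ticker yf.download result
columns = pd.MultiIndex.from_product(
    [["Close", "High", "Low", "Open", "Volume"], ["^GSPC"]],
    names=["Price", "Ticker"],
)
raw = pd.DataFrame(
    [[101.0, 102.0, 99.0, 100.0, 1000],
     [102.5, 103.0, 100.5, 101.0, 1200]],
    index=pd.DatetimeIndex(["2000-11-01", "2000-11-02"], name="Date"),
    columns=columns,
)

# Keep only the price level, then move the Date index into a column
flat = raw.copy()
if isinstance(flat.columns, pd.MultiIndex):
    flat.columns = flat.columns.get_level_values(0)
flat = flat.reset_index()[["Date", "Open", "High", "Low", "Close", "Volume"]]
print(flat)
```

Flattening before selecting the columns means the saved CSV has a single header row, regardless of which layout the installed yfinance version produces.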


{{< include closing.qmd >}} 