# Data Collection

## 0. Setup

In [1]:
# Automatic reloading
%load_ext autoreload
%autoreload 2

In [2]:
####################
# Required Modules #
####################

# Generic/Built-in
import sys
import os

# Libs
import pandas as pd

In [3]:
# Get the project directory 
current_dir = os.path.abspath('') # Current '\notebooks' directory
project_dir = os.path.abspath(os.path.join(current_dir, '..')) # Move up one level to project root directory

# Add the project directory to sys.path
sys.path.append(project_dir)

# Move up to project directory
os.chdir(project_dir)
os.getcwd()

'c:\\Users\\Ryan Lee\\Desktop\\50.038 Computational Data Science\\Digital-Asset-Prediction'

In [4]:
from dotenv import load_dotenv

# Load in environment variables from `.env` file.
load_dotenv()

True

## 1. FRED Data
The **Federal Reserve Economic Data (FRED)** is an online database maintained by the research department at the Federal Reserve Bank of St. Louis. It provides a wide range of economic time series data.

- The [FRED API](https://fred.stlouisfed.org/docs/api/fred/) will be used to retrieve the necessary datasets. An API key can be requested for free. Set your key in the `FRED_API_KEY` environment variable (for example, in the `.env` file loaded above).
- Alternatively, the datasets can be downloaded directly from the FRED website without creating an account.
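
For orientation, the sketch below shows how a request to the public FRED `series/observations` endpoint could be assembled from the `FRED_API_KEY` environment variable. It is only an illustration: the helper name `build_fred_request` is hypothetical, and it is not necessarily how `fetch_data_from_fred` is implemented in this project.

```python
import os
from urllib.parse import urlencode

# Public FRED endpoint that returns the observations of one series.
FRED_OBSERVATIONS_URL = "https://api.stlouisfed.org/fred/series/observations"


def build_fred_request(series_id, start_date, end_date, api_key=None):
    """Return the request URL for a FRED series (hypothetical helper)."""
    # Fall back to the environment variable populated by load_dotenv().
    api_key = api_key or os.getenv("FRED_API_KEY")
    if not api_key:
        raise RuntimeError("FRED_API_KEY is not set; add it to your .env file.")
    params = {
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
        "observation_start": start_date,  # YYYY-MM-DD
        "observation_end": end_date,      # YYYY-MM-DD
    }
    return f"{FRED_OBSERVATIONS_URL}?{urlencode(params)}"
```

The resulting URL can be passed to any HTTP client; the JSON response lists the data points under the `observations` key.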

In [5]:
from src.data_collection.data_scraper import fetch_data_from_fred

### 1.1. 10-Year Treasury Constant Maturity Minus 2-Year Treasury Constant Maturity
The **10-Year Treasury Constant Maturity Minus 2-Year Treasury Constant Maturity** spread measures the difference between long-term (10-year) and short-term (2-year) U.S. Treasury bond yields. A positive spread indicates a normal yield curve, suggesting confidence in economic growth, while a negative spread (inverted curve) may signal market concerns about an economic slowdown or impending recession.

As a macroeconomic indicator, this spread can inform crypto price prediction because it reflects investor sentiment and economic expectations. A widening spread may indicate optimism, which could drive higher demand for risk assets like cryptocurrencies, while an inverted spread could signal economic uncertainty, potentially leading to market volatility and lower crypto prices.

**Citation**:

Federal Reserve Bank of St. Louis, 10-Year Treasury Constant Maturity Minus 2-Year Treasury Constant Maturity [T10Y2Y], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/T10Y2Y, March 25, 2025. 


In [None]:
output_file = "data/raw/treasury_constant_maturity_spread.csv"

df_t10y2y = fetch_data_from_fred(
    series_id="T10Y2Y",
    start_date="2022-03-24",
    end_date="2025-03-24",
    output_filename=output_file
)

df_t10y2y.head()

Data saved to data/raw/treasury_constant_maturity_spread.csv


| Unnamed: 0 | realtime_start | realtime_end | date | value |
|---|---|---|---|---|
| 0 | 2025-03-25 | 2025-03-25 | 2022-03-24 | 0.21 |
| 1 | 2025-03-25 | 2025-03-25 | 2022-03-25 | 0.18 |
| 2 | 2025-03-25 | 2025-03-25 | 2022-03-28 | 0.11 |
| 3 | 2025-03-25 | 2025-03-25 | 2022-03-29 | 0.06 |
| 4 | 2025-03-25 | 2025-03-25 | 2022-03-30 | 0.04 |
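
Before the spread is used as a feature, the raw frame likely needs a small cleanup step. The sketch below is one way to do it, under stated assumptions: the leading `Unnamed: 0` column is treated as a leftover row index, values may arrive as text, and missing observations may be marked with `"."` (as the FRED API does). The helper name `clean_fred_frame` and the `inverted` column are hypothetical and not part of the project code.

```python
import io

import pandas as pd


def clean_fred_frame(df):
    """Tidy a raw FRED observations frame (hypothetical helper)."""
    # Drop the leftover index column if it is present.
    out = df.drop(columns=["Unnamed: 0"], errors="ignore").copy()
    out["date"] = pd.to_datetime(out["date"])
    # Non-numeric markers such as "." become NaN.
    out["value"] = pd.to_numeric(out["value"], errors="coerce")
    # A negative 10Y-2Y spread means the yield curve is inverted.
    out["inverted"] = out["value"] < 0
    return out.set_index("date").sort_index()


# The first two rows are copied from the preview above.
raw = pd.read_csv(io.StringIO(
    "Unnamed: 0,realtime_start,realtime_end,date,value\n"
    "0,2025-03-25,2025-03-25,2022-03-24,0.21\n"
    "1,2025-03-25,2025-03-25,2022-03-25,0.18\n"
))
clean = clean_fred_frame(raw)
```

Indexing by date makes it straightforward to align the spread with daily crypto prices later on. Note that Treasury data has no observations on weekends or holidays, whereas crypto markets trade every day.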


## 2. CoinGecko Data
CoinGecko is a leading independent cryptocurrency data aggregator that provides comprehensive information on over 17,000 crypto assets and 1,200+ exchanges.
- The data collected here does **not** require an API key.
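
As a rough illustration of what the per-asset daily data can look like, the sketch below converts a payload shaped like the response of CoinGecko's public `/coins/{id}/market_chart` endpoint (lists of `[timestamp_ms, value]` pairs under `prices`, `market_caps`, and `total_volumes`) into a tidy frame. Whether `fetch_top_crypto_data_from_coingecko` uses this endpoint is an assumption, and the helper name `market_chart_to_frame` is hypothetical.

```python
import pandas as pd


def market_chart_to_frame(asset_id, payload):
    """Convert a market_chart-style JSON payload into a frame (sketch)."""
    frames = []
    for key, column in [("prices", "price"),
                        ("market_caps", "market_cap"),
                        ("total_volumes", "total_volume")]:
        # Each entry is a [timestamp_ms, value] pair.
        part = pd.DataFrame(payload.get(key, []), columns=["timestamp_ms", column])
        frames.append(part.set_index("timestamp_ms"))
    # Align the three series on their shared timestamps.
    out = pd.concat(frames, axis=1).reset_index()
    out["date"] = pd.to_datetime(out["timestamp_ms"], unit="ms")
    out.insert(0, "asset_id", asset_id)
    return out.drop(columns="timestamp_ms")
```

Keeping one row per asset and day makes it simple to stack all assets into a single long-format CSV.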

In [8]:
from src.data_collection.data_scraper import fetch_top_crypto_data_from_coingecko

df_top_crypto_data = fetch_top_crypto_data_from_coingecko(
    limit=100,
    vs_currency="usd",
    days=365,
    output_filename="data/raw/top_crypto_daily_data.csv"
)

Fetching top 100 cryptocurrency assets by market cap...
Found 100 assets
[1/100] Fetching daily data for bitcoin...
✅ Successfully fetched 366 days of data for bitcoin
Waiting 6 seconds to avoid API rate limits...
[2/100] Fetching daily data for ethereum...
✅ Successfully fetched 366 days of data for ethereum
Waiting 3 seconds to avoid API rate limits...
[3/100] Fetching daily data for tether...
✅ Successfully fetched 366 days of data for tether
Waiting 3 seconds to avoid API rate limits...
[4/100] Fetching daily data for ripple...
✅ Successfully fetched 366 days of data for ripple
Waiting 3 seconds to avoid API rate limits...
[5/100] Fetching daily data for binancecoin...
✅ Successfully fetched 366 days of data for binancecoin
Waiting 3 seconds to avoid API rate limits...
[6/100] Fetching daily data for solana...
✅ Successfully fetched 366 days of data for solana
Waiting 3 seconds to avoid API rate limits...
[7/100] Fetching daily data for usd-coin...
✅ Successfully fetched 366 days o

KeyboardInterrupt: 
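
The run above was interrupted partway through the asset list. One way to make a long, rate-limited loop safe to interrupt is to save each asset as soon as it is fetched and to skip assets that are already on disk when the cell is rerun. The sketch below illustrates that idea only; `fetch_all_resumable`, `fetch_one`, and `cache_dir` are hypothetical names, and this is not the behaviour of `fetch_top_crypto_data_from_coingecko` as far as this notebook shows.

```python
import time
from pathlib import Path

import pandas as pd


def fetch_all_resumable(asset_ids, fetch_one, cache_dir, delay_seconds=3.0):
    """Fetch each asset once, caching results so a rerun can resume (sketch).

    fetch_one is any callable that maps an asset id to a DataFrame.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    frames = []
    for i, asset_id in enumerate(asset_ids, start=1):
        path = cache / f"{asset_id}.csv"
        if path.exists():
            # Fetched in an earlier run: load from disk, no API call needed.
            frames.append(pd.read_csv(path))
            continue
        print(f"[{i}/{len(asset_ids)}] Fetching daily data for {asset_id}...")
        df = fetch_one(asset_id)
        df.to_csv(path, index=False)  # Persist immediately.
        frames.append(df)
        if i < len(asset_ids):
            time.sleep(delay_seconds)  # Pause between API calls.
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

With this pattern, a `KeyboardInterrupt` loses at most the asset being fetched at that moment, and the next run continues from the first asset without a cached file.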