# 📊 01 - Data Collection

This notebook is the first step in our crypto regime classification and trading strategy project. Here, we collect one year of daily prices for 15 cryptocurrencies from the CoinGecko API and save them to `crypto_prices.csv`, the input for the PCA, clustering, and downstream trading analysis.

---

### 📥 1. Load and Inspect Raw Price Data

We begin by loading daily closing price data for various cryptocurrencies. These prices will serve as the foundation for return and feature calculations.

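- Source: `crypto_prices.csv`, written by the collection cells below from the CoinGecko API
- Content: Historical daily closing prices in USD
- Format: One column per coin (named by CoinGecko ID), one row per date (`timestamp` index)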
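
As a rough sketch of how a table in this layout feeds the return calculations mentioned above, the snippet below parses a tiny made-up CSV, checks the index, and computes daily log returns. The values and the names `csv_text`, `toy_prices`, and `toy_log_returns` are illustrative assumptions, not project data, and log returns are only one common choice; the project may use a different definition.

```python
import io

import numpy as np
import pandas as pd

# Toy stand-in for crypto_prices.csv (values are made up for illustration).
csv_text = """timestamp,bitcoin,ethereum
2024-01-01,100.0,10.0
2024-01-02,110.0,11.0
2024-01-03,99.0,12.1
"""

toy_prices = pd.read_csv(io.StringIO(csv_text), index_col="timestamp", parse_dates=True)

# Basic layout checks: a sorted, unique DatetimeIndex with one column per coin.
assert isinstance(toy_prices.index, pd.DatetimeIndex)
assert toy_prices.index.is_monotonic_increasing
assert toy_prices.index.is_unique

# Daily log returns: log(P_t / P_{t-1}).
# The first row has no predecessor, so it is dropped.
toy_log_returns = np.log(toy_prices / toy_prices.shift(1)).dropna()
print(toy_log_returns.round(4))
```

The same checks can be pointed at the real `price_df` once the file exists; a failing index check usually means the timestamps from different coins did not line up when the columns were combined.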

```python
# Load price data (crypto_prices.csv is written by the collection cells below)
import pandas as pd

price_df = pd.read_csv("crypto_prices.csv", parse_dates=['timestamp'])
price_df.set_index('timestamp', inplace=True)
price_df.head()
```

In [35]:
```python
import requests
import pandas as pd
import os
from dotenv import load_dotenv

# Load API key from .env
load_dotenv()
api_key = os.getenv("COINGECKO_DEMO_KEY")

def fetch_daily_prices_demo(coin_id, api_key, vs_currency="usd", days="365"):
    """Fetch daily prices for one coin from CoinGecko; return a DataFrame indexed by timestamp."""
    url = f"https://api.coingecko.com/api/v3/coins/{coin_id}/market_chart"
    
    headers = {
        "x-cg-demo-api-key": api_key  # DEMO key header
    }
    
    params = {
        "vs_currency": vs_currency,
        "days": days,
        "interval": "daily"
    }

    # A timeout keeps a stalled connection from hanging the whole collection loop
    response = requests.get(url, headers=headers, params=params, timeout=30)
    if response.status_code != 200:
        raise RuntimeError(f"Error {response.status_code}: {response.text}")
    
    data = response.json()
    prices = data.get("prices", [])
    
    df = pd.DataFrame(prices, columns=["timestamp", coin_id])
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms")
    df.set_index("timestamp", inplace=True)
    return df

```

In [38]:
```python
import time

coins = [
    'bitcoin',        # BTC
    'ethereum',       # ETH
    'tether',         # USDT
    'ripple',         # XRP
    'binancecoin',    # BNB
    'usd-coin',       # USDC
    'solana',         # SOL
    'cardano',        # ADA
    'dogecoin',       # DOGE
    'avalanche-2',    # AVAX
    'polkadot',       # DOT
    'tron',           # TRX
    'matic-network',  # MATIC (Polygon)
    'litecoin',       # LTC
    'shiba-inu'       # SHIB
]

all_data = []

for coin in coins:
    success = False
    attempts = 0
    while not success and attempts < 5:
        try:
            df = fetch_daily_prices_demo(coin, api_key)
            all_data.append(df)
            print(f"✅ Fetched data for {coin}")
            time.sleep(6)  # Wait to stay under rate limits
            success = True
        except Exception as e:
            attempts += 1
            print(f"⚠️ Error fetching {coin} (attempt {attempts}): {e}")
            time.sleep(10 * attempts)  # Wait longer with each retry
    if not success:
        # Report coins that were skipped so a missing column is not a silent surprise
        print(f"❌ Giving up on {coin} after {attempts} attempts")

# Combine and save if successful
if all_data:
    price_df = pd.concat(all_data, axis=1)
    price_df.to_csv("crypto_prices.csv")
    print("📁 Saved: crypto_prices.csv")
else:
    print("❌ No data collected.")

```

```text
✅ Fetched data for bitcoin
✅ Fetched data for ethereum
✅ Fetched data for tether
✅ Fetched data for ripple
✅ Fetched data for binancecoin
✅ Fetched data for usd-coin
⚠️ Error fetching solana (attempt 1): Error 429: {"status":{"error_code":429,"error_message":"You've exceeded the Rate Limit. Please visit https://www.coingecko.com/en/api/pricing to subscribe to our API plans for higher rate limits."}}
⚠️ Error fetching solana (attempt 2): Error 429: {"status":{"error_code":429,"error_message":"You've exceeded the Rate Limit. Please visit https://www.coingecko.com/en/api/pricing to subscribe to our API plans for higher rate limits."}}
⚠️ Error fetching solana (attempt 3): Error 429: {"status":{"error_code":429,"error_message":"You've exceeded the Rate Limit. Please visit https://www.coingecko.com/en/api/pricing to subscribe to our API plans for higher rate limits."}}
✅ Fetched data for solana
✅ Fetched data for cardano
✅ Fetched data for dogecoin
✅ Fetched data for avalanche-2
✅ Fetched d

```

In [39]:
```python
# Load and display the collected data
price_df = pd.read_csv("crypto_prices.csv", index_col="timestamp", parse_dates=True)
print(price_df.head())

```

```text
                 bitcoin     ethereum    tether    ripple  binancecoin  \
timestamp                                                                
2024-06-05  70600.011167  3814.932030  1.000157  0.525878   686.510668   
2024-06-06  71184.599431  3871.082091  1.000756  0.526266   699.924112   
2024-06-07  70759.588193  3812.701857  0.999556  0.521673   710.043483   
2024-06-08  69325.362388  3679.376652  0.999594  0.498861   683.338328   
2024-06-09  69315.104123  3683.025380  0.999938  0.493173   682.782750   

            usd-coin      solana   cardano  dogecoin  avalanche-2  polkadot  \
timestamp                                                                     
2024-06-05  1.000233  171.728129  0.461416  0.161543    36.091757  7.191048   
2024-06-06  1.000322  173.769571  0.461678  0.163508    36.535981  7.254035   
2024-06-07  0.999933  170.372720  0.458149  0.160168    35.926170  7.141668   
2024-06-08  0.999985  162.453205  0.449497  0.148307    33.509910  6.658967   
2024-06
```