<a href="https://colab.research.google.com/github/monicalamagt/crypto-momentum-model/blob/main/download_data_GECKO.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Downloading Crypto Data from CoinGecko

*Author: Monica Lama*

This notebook downloads crypto data using the CoinGecko Pro API. It is the preliminary step to running `phase1_factor_model.ipynb`.

In [None]:
!pip install pycoingecko
from google.colab import drive, userdata
import pandas as pd
import requests
import os
import time
from datetime import datetime
from pycoingecko import CoinGeckoAPI

Collecting pycoingecko
  Downloading pycoingecko-3.2.0-py3-none-any.whl.metadata (16 kB)
Downloading pycoingecko-3.2.0-py3-none-any.whl (10 kB)
Installing collected packages: pycoingecko
Successfully installed pycoingecko-3.2.0


In [None]:
drive.mount('/content/drive')
drive_data_path = '/content/drive/MyDrive/crypto_momentum_data'
os.makedirs(drive_data_path, exist_ok=True)
os.makedirs('data', exist_ok=True)

Mounted at /content/drive


### API Client and Helper Functions

In [None]:
# Read the Pro API key from Colab secrets (stored under the name 'COIN_GECKO_API')
API_KEY = userdata.get('COIN_GECKO_API')
cg = CoinGeckoAPI(api_key=API_KEY)

## Data Source Summary

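We use the CoinGecko Pro API to collect daily USD price and trading volume (the `prices` and `total_volumes` fields of the market-chart range endpoint) for a hand-picked list of eight large coins and for the top 100 cryptocurrencies by market cap. Despite the helper's name `get_ohlcv`, this endpoint returns one price per timestamp rather than open/high/low/close bars. The data spans January 1, 2024 to December 31, 2024. Each coin is downloaded once and stored as a Parquet file (`<coin_id>.parquet`) in Google Drive; later runs only fetch dates after the last stored one. The API key is loaded from Colab secrets.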
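
As we understand CoinGecko's documentation, the range endpoint picks its granularity automatically from the length of the requested span: spans longer than about 90 days come back as daily points, shorter ones as hourly points. A full-year request is therefore daily, but an incremental top-up of only a few days could append hourly rows to a daily file. Below is a minimal sketch for normalizing a frame back to one row per UTC day, assuming the layout `get_ohlcv` saves (a naive-UTC `DatetimeIndex` with `price` and `volume` columns). `to_daily` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd


def to_daily(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse a price/volume frame to one row per UTC calendar day.

    Assumes a naive-UTC DatetimeIndex, as produced by
    pd.to_datetime(..., unit='ms'). For each day it keeps the last
    observation of every column. Volume is taken as the last value rather
    than summed because (assumption) CoinGecko's total_volumes is already
    a rolling 24h figure.
    """
    daily = df.sort_index().resample("1D").last()
    # Days with no observations come back as all-NaN rows; drop them
    return daily.dropna(how="all")


# Usage (hypothetical): df = to_daily(pd.read_parquet(filepath))
```

Taking the last observation per day keeps the midnight-UTC convention of the daily points while absorbing any hourly rows from short incremental requests.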


In [None]:
def get_top_coins(quantity):
    """Return (market_data, coin_ids) for the top `quantity` coins by USD market cap.

    On any API error, prints the error and returns (None, None).
    """
    try:
        market_data = cg.get_coins_markets(
            vs_currency='usd',
            order='market_cap_desc',
            per_page=quantity,
            page=1,
            sparkline=False
        )

        coin_ids = [coin['id'] for coin in market_data]
        return market_data, coin_ids

    except Exception as e:
        print(f"Getting market data failed. Error: {e}")
        return None, None
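
`get_top_coins` requests a single page. As far as we know, the `/coins/markets` endpoint returns at most 250 coins per page, so 100 coins fit in one request but a larger universe would need several pages. A minimal sketch of paging until enough ids are collected; `get_top_coin_ids` and the `fetch_page` adapter are hypothetical names, and the page size of 250 is an assumption about the API limit.

```python
from typing import Callable, Dict, List


def get_top_coin_ids(fetch_page: Callable[[int, int], List[Dict]],
                     quantity: int, per_page: int = 250) -> List[str]:
    """Collect ids of the top `quantity` coins, requesting as many pages as needed.

    fetch_page(page, per_page) must return one page of /coins/markets results
    (a list of dicts with an 'id' key), highest market cap first.
    """
    ids: List[str] = []
    page = 1
    while len(ids) < quantity:
        batch = fetch_page(page, per_page)
        if not batch:  # no more coins listed
            break
        ids.extend(coin["id"] for coin in batch)
        page += 1
    return ids[:quantity]


# Usage with the notebook's client (wrapper is hypothetical):
# fetch = lambda page, n: cg.get_coins_markets(vs_currency='usd', order='market_cap_desc',
#                                              per_page=n, page=page, sparkline=False)
# coin_ids = get_top_coin_ids(fetch, 500)
```

Passing the fetch function in keeps the paging logic independent of the network, which makes it easy to check against a fake page source.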


In [None]:
def get_ohlcv(coin_ids, start_date, end_date):
    """Download daily price and volume for each coin and save it as <coin_id>.parquet.

    start_date and end_date are Unix timestamps in seconds (UTC). If a coin's file
    already exists, only the days after its last stored date are fetched and appended.
    Errors are printed and the loop moves on to the next coin.
    """
    # Naive UTC, the same convention as the stored index
    end_time = pd.Timestamp(end_date, unit='s')

    for coin_id in coin_ids:
        try:
            filepath = os.path.join(drive_data_path, f"{coin_id}.parquet")

            if os.path.exists(filepath):
                existing_df = pd.read_parquet(filepath)
                last_date = existing_df.index.max()

                if last_date >= end_time:
                    print(f"{coin_id} is already up to date. Skipping.")
                    continue

                # pd.Timestamp.timestamp() treats a naive timestamp as UTC
                next_date = last_date + pd.Timedelta(days=1)
                from_timestamp = int(next_date.timestamp())
                print(f"{coin_id} exists. Appending new data from {next_date.date()}")

            else:
                existing_df = None
                from_timestamp = start_date

            print(f"Fetching {coin_id}...")
            data = cg.get_coin_market_chart_range_by_id(
                id=coin_id,
                vs_currency='usd',
                from_timestamp=from_timestamp,
                to_timestamp=end_date
            )

            # Each entry is a [timestamp_ms, value] pair
            prices = data.get('prices', [])
            volumes = data.get('total_volumes', [])

            if not prices or not volumes:
                print(f"No new data for {coin_id}. Skipping.")
                continue

            df = pd.DataFrame({
                'timestamp': [p[0] for p in prices],
                'price': [p[1] for p in prices],
                'volume': [v[1] for v in volumes]
            })

            df['date'] = pd.to_datetime(df['timestamp'], unit='ms')
            df.set_index('date', inplace=True)
            df.drop(columns='timestamp', inplace=True)

            if existing_df is not None:
                # Merge with stored history; on overlapping dates keep the new row
                df = pd.concat([existing_df, df])
                df = df[~df.index.duplicated(keep='last')].sort_index()

            df.to_parquet(filepath)
            print(f"Saved {coin_id}.parquet")

        except Exception as e:
            print(f"Failed for {coin_id}: {e}")
            continue
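
`get_ohlcv` builds its DataFrame by reading `prices` and `total_volumes` side by side, which assumes both lists have the same length and order; if they ever differ, the `pd.DataFrame` constructor raises and the coin is reported as failed. A sketch of a more defensive conversion that joins the two series on their millisecond timestamps instead. `market_chart_to_frame` is a hypothetical helper, and the input shape (`{'prices': [[ms, value], ...], 'total_volumes': [[ms, value], ...]}`) is the one the function above already relies on.

```python
import pandas as pd


def market_chart_to_frame(data: dict) -> pd.DataFrame:
    """Turn a market_chart/range response into a price/volume frame indexed by date.

    Joins 'prices' and 'total_volumes' on their millisecond timestamps instead
    of assuming the two lists line up element by element. Timestamps present in
    only one list get NaN in the other column.
    """
    prices = pd.DataFrame(data.get("prices", []), columns=["timestamp", "price"])
    volumes = pd.DataFrame(data.get("total_volumes", []), columns=["timestamp", "volume"])
    df = prices.merge(volumes, on="timestamp", how="outer").sort_values("timestamp")
    # Naive UTC datetimes, matching what get_ohlcv stores
    df["date"] = pd.to_datetime(df["timestamp"], unit="ms")
    return df.set_index("date").drop(columns="timestamp")


# Usage (hypothetical): df = market_chart_to_frame(data)
```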


In [None]:
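# Specify the date range; dates are treated as midnight UTC
start_date = '2024-01-01'
end_date = '2024-12-31'

# A naive pd.Timestamp is converted as UTC by .timestamp(), independent of the machine's time zone
start_ts = int(pd.Timestamp(start_date).timestamp())
end_ts = int(pd.Timestamp(end_date).timestamp())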

### Download the selected coins for the date range

In [None]:
COIN_LIST = ['bitcoin', 'ethereum', 'solana', 'cardano', 'avalanche-2', 'dogecoin', 'polkadot', 'chainlink']
TICKER_MAP = {'bitcoin': 'BTC', 'ethereum': 'ETH', 'solana': 'SOL', 'cardano': 'ADA',
              'avalanche-2': 'AVAX', 'dogecoin': 'DOGE', 'polkadot': 'DOT', 'chainlink': 'LINK'}
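
`TICKER_MAP` is not used elsewhere in this notebook; presumably it labels columns in the downstream model (an assumption). A minimal sketch of combining the saved per-coin frames into one wide price table with ticker column names. `build_price_panel` is hypothetical and takes already-loaded frames so that reading from Drive stays a separate step.

```python
from typing import Dict

import pandas as pd


def build_price_panel(frames: Dict[str, pd.DataFrame],
                      ticker_map: Dict[str, str]) -> pd.DataFrame:
    """Combine per-coin frames (each with a 'price' column) into one wide table.

    Columns are renamed with ticker_map; ids missing from the map keep their
    CoinGecko id. Dates are outer-joined, so a coin with no row on a given
    date shows NaN there.
    """
    prices = {ticker_map.get(coin_id, coin_id): df["price"]
              for coin_id, df in frames.items()}
    return pd.DataFrame(prices).sort_index()


# Usage (hypothetical), reading the files get_ohlcv saved:
# frames = {c: pd.read_parquet(os.path.join(drive_data_path, f"{c}.parquet"))
#           for c in COIN_LIST
#           if os.path.exists(os.path.join(drive_data_path, f"{c}.parquet"))}
# prices = build_price_panel(frames, TICKER_MAP)
```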

In [None]:
get_ohlcv(COIN_LIST, start_ts, end_ts)

Fetching bitcoin...
Saved bitcoin.parquet
Fetching ethereum...
Failed for ethereum: 500 Server Error: Internal Server Error for url: https://pro-api.coingecko.com/api/v3/coins/ethereum/market_chart/range?vs_currency=usd&from=1704067200&to=1735603200&x_cg_pro_api_key=<redacted>
Fetching solana...
Failed for solana: 500 Server Error: Internal Server Error for url: https://pro-api.coingecko.com/api/v3/coins/solana/market_chart/range?vs_currency=usd&from=1704067200&to=1735603200&x_cg_pro_api_key=<redacted>
Fetching cardano...
Saved cardano.parquet
Fetching avalanche-2...
Failed for avalanche-2: 500 Server Error: Internal Server Error for url: https://pro-api.coingecko.com/api/v3/coins/avalanche-2/market_chart/range?vs_currency=usd&from=1704067200&to=1735603200&x_cg_pro_api_key=<redacted>
Fetching dogecoin...
Saved dogecoin.parquet
Fetching polkadot...
Saved polkadot.parquet
Fetching chainlink...
Saved chainlink.parquet
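
In the log above, three of the eight requests failed with HTTP 500 and `get_ohlcv` moved on to the next coin. Server errors like these are often transient, so one option is to retry with exponential backoff; simply rerunning the cell also works, since saved coins are skipped or appended. A sketch, with `with_retries` as a hypothetical helper and the attempt count and delays as arbitrary defaults. It retries on any exception; a stricter version would retry only on 5xx and rate-limit (429) responses.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 4, base_delay: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on any exception, wait and try again with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last exception once all attempts have failed.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Usage inside get_ohlcv (hypothetical change):
# data = with_retries(lambda: cg.get_coin_market_chart_range_by_id(
#     id=coin_id, vs_currency='usd',
#     from_timestamp=from_timestamp, to_timestamp=end_date))
```

The `sleep` parameter defaults to `time.sleep` but can be swapped out, which keeps the backoff schedule checkable without actually waiting.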


### Download the top 100 coins by market cap for the same date range

In [None]:
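top_100, coin_ids = get_top_coins(100)

# get_top_coins returns (None, None) on failure; skip the download in that case
if coin_ids is not None:
    get_ohlcv(coin_ids, start_ts, end_ts)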
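
Because `get_ohlcv` prints failures and continues, some coins may end up with no file at all. A small hypothetical helper that lists them, using the same `<coin_id>.parquet` naming that `get_ohlcv` writes, so that only those coins are requested again:

```python
import os
from typing import Iterable, List


def missing_coins(coin_ids: Iterable[str], data_dir: str) -> List[str]:
    """Return the ids that have no <coin_id>.parquet file in data_dir yet."""
    return [c for c in coin_ids
            if not os.path.exists(os.path.join(data_dir, f"{c}.parquet"))]
```

For example, `get_ohlcv(missing_coins(COIN_LIST, drive_data_path), start_ts, end_ts)` would re-request only the coins whose downloads failed.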