# 📥 01 - Extract Cryptocurrency Market Data

This notebook represents the **extraction stage** of the data pipeline.

We retrieve live cryptocurrency market data from a public API and store it
as a **raw JSON file**. This raw data is preserved exactly as received
and serves as the source of truth for all downstream steps.

### Input
- External cryptocurrency API

### Output
- `../data/raw/crypto_raw_<timestamp>.json`

### Key Principles
- No cleaning or transformation at this stage
- Timestamped files for reproducibility
- JSON format to preserve original structure

In [1]:
import json
import requests
from datetime import datetime
from pathlib import Path

## 📂 Prepare Raw Data Directory

We ensure the raw data directory exists so the pipeline
can run on any machine without manual setup.

In [2]:
RAW_DIR = Path("../data/raw")
RAW_DIR.mkdir(parents=True, exist_ok=True)

## 🌐 Request Cryptocurrency Market Data

We use the CoinGecko API to retrieve Bitcoin price data
for the past 24 hours, denominated in USD.

This endpoint returns three time series (`prices`, `market_caps`, and
`total_volumes`), each a list of `[timestamp_ms, value]` pairs.

In [3]:
url = "https://api.coingecko.com/api/v3/coins/bitcoin/market_chart"
params = {"vs_currency": "usd", "days": 1}

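# Pass the query parameters; without them the API responds with an error payload.
response = requests.get(url, params=params, timeout=10)
response.raise_for_status()  # stop on HTTP errors instead of saving them
raw_data = response.json()

raw_data.keys()

dict_keys(['prices', 'market_caps', 'total_volumes'])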
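Because the API can answer with a well-formed JSON object that contains no market data, it may be worth checking the payload before it is written to disk. The sketch below is one possible guard, not part of the original notebook: `check_market_chart` and `EXPECTED_KEYS` are hypothetical names, and the key set is an assumption based on the documented shape of the `market_chart` response.

```python
# Keys a market_chart payload is expected to carry (assumption, see lead-in).
EXPECTED_KEYS = {"prices", "market_caps", "total_volumes"}


def check_market_chart(payload):
    """Return the payload unchanged if it looks like market-chart data.

    Raises ValueError when the payload is not a JSON object or lacks an
    expected key, for example when the API answered with an error object.
    """
    if not isinstance(payload, dict):
        raise ValueError(f"expected a JSON object, got {type(payload).__name__}")
    missing = EXPECTED_KEYS.difference(payload)
    if missing:
        raise ValueError(
            f"payload is missing {sorted(missing)}; keys present: {sorted(payload)}"
        )
    # No cleaning or transformation: the same object is handed back.
    return payload
```

Calling `check_market_chart(raw_data)` right after `response.json()` would stop the notebook before an error payload is stored as raw data. Whether error responses should instead be saved for auditing is a design choice this sketch does not settle.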

## 💾 Save Raw JSON File

The response is saved without modification using a timestamped filename.

In [4]:
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
raw_path = RAW_DIR / f"crypto_raw_{timestamp}.json"

with open(raw_path, "w") as f:
    json.dump(raw_data, f, indent=4)

raw_path

WindowsPath('../data/raw/crypto_raw_20251217_140226.json')
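The save step could also be wrapped in a small helper that reads the file back and confirms that the stored data equals what was received. This is a minimal sketch under stated assumptions: `save_raw` and its `now` parameter are hypothetical, and it uses a UTC timestamp, whereas the cell above uses local time via `datetime.now()`.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_raw(payload, raw_dir, now=None):
    """Write payload to a timestamped JSON file and return its path."""
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)
    # UTC keeps filenames comparable across machines in different time zones.
    now = now or datetime.now(timezone.utc)
    path = raw_dir / f"crypto_raw_{now.strftime('%Y%m%d_%H%M%S')}.json"
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=4)
    # Read the file back to confirm nothing was altered on the way to disk.
    with open(path, encoding="utf-8") as f:
        if json.load(f) != payload:
            raise IOError(f"round-trip mismatch for {path}")
    return path
```

With this helper, the cell body would reduce to `raw_path = save_raw(raw_data, RAW_DIR)`. Passing `now` explicitly makes the filename deterministic, which is convenient in tests.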

## ✅ Extraction Complete

Raw cryptocurrency data has been successfully saved.

Proceed to:
➡ **02_transform_data.ipynb**