# Cryptocurrency Data Pipeline — Final Presentation
**Author:** Sonia Mannepuli

This notebook summarizes the 6-week ETL project: extraction, cleaning, storage, quality checks, visualization, and final presentation.


## Week-by-week Summary

- **Week 1 — Extraction:** CoinGecko API extraction, retry logic, raw JSON/CSV snapshots.
- **Week 2 — Cleaning:** Normalize column names, handle missing values, deduplicate, and engineer features (price change, moving averages).
- **Week 3 — Storage & Automation:** Master CSV + SQLite DB, automation with schedule/APScheduler, logging.
- **Week 4 — Data Quality & Monitoring:** Data quality checker and daily/weekly reports (CSV + Markdown).
- **Week 5 — EDA & Dashboard:** Jupyter EDA, time-series visuals, Streamlit dashboard deployment.
- **Week 6 — Presentation:** Architecture diagram, presentation notebook, final repo polish.
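
The Week 1 retry logic is not shown in this notebook, so here is a minimal sketch of the idea, assuming exponential backoff between attempts. The names `fetch_with_retry`, `max_attempts`, `base_delay`, and `sleep` are hypothetical, and the `fetch` callable stands in for whatever function actually calls the CoinGecko API in the project.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds or max_attempts is reached (hypothetical helper)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter is a design choice for this sketch only: it lets the backoff be exercised without real waiting.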

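The Week 2 cleaning steps listed above could look roughly like the following. This is a sketch under stated assumptions, not the project's actual code: the function name `clean_snapshots`, the `id` and `date` column names, and the `ma_window` parameter are all assumptions, while `current_price` matches the column used later in this notebook.

```python
import pandas as pd


def clean_snapshots(raw: pd.DataFrame, ma_window: int = 5) -> pd.DataFrame:
    """Normalize, deduplicate, and add per-coin features (hypothetical helper)."""
    df = raw.copy()
    # Normalize column names: trim, lowercase, spaces to underscores.
    df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
    # Coerce types; unparseable prices become NaN.
    df["current_price"] = pd.to_numeric(df["current_price"], errors="coerce")
    df["date"] = pd.to_datetime(df["date"])
    # Handle missing values by dropping rows that lack a key field.
    df = df.dropna(subset=["id", "date", "current_price"])
    # Deduplicate: keep the last row seen for each coin and date.
    df = df.drop_duplicates(subset=["id", "date"], keep="last")
    df = df.sort_values(["id", "date"]).reset_index(drop=True)
    # Feature engineering, computed separately for each coin.
    by_coin = df.groupby("id")["current_price"]
    df["price_change_pct"] = (df["current_price"] / by_coin.shift(1) - 1) * 100
    df["price_ma"] = by_coin.transform(
        lambda s: s.rolling(ma_window, min_periods=1).mean()
    )
    return df
```

Dropping rows with missing key fields is only one way to handle missing values; imputation or forward-filling may suit the real data better.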

## Pipeline Architecture (Week 1 — Extraction view)
![Pipeline Diagram](docs/pipeline_architecture.png)

In [6]:
import os
print('Files in working directory:')
for d in ['data/clean','data/raw','docs','logs']:
    p = os.path.join('.', d)
    if os.path.exists(p):
        print('-', p, 'contains', len(os.listdir(p)))
    else:
        print('-', p, 'MISSING')

Files in working directory:
- .\data/clean contains 7
- .\data/raw contains 3
- .\docs contains 1
- .\logs contains 3


In [None]:

# Example: price bar chart for the top 5 coins by market cap, if a cleaned CSV exists
import os
import pandas as pd
import matplotlib.pyplot as plt

# Candidate locations for the cleaned data; the first one that exists is used.
cands = ['data/clean/crypto_clean.csv', 'crypto_clean.csv', 'data/clean/crypto_master.csv']
csv_path = next((p for p in cands if os.path.exists(p)), None)
if csv_path:
    df = pd.read_csv(csv_path)
    if 'market_cap' in df.columns:
        # Rank by market cap, dropping rows whose value cannot be parsed.
        df['market_cap'] = pd.to_numeric(df['market_cap'], errors='coerce')
        top5 = df.dropna(subset=['market_cap']).nlargest(5, 'market_cap')
        title = 'Current Price of Top 5 Cryptos by Market Cap'
    else:
        # No market cap column: fall back to the first five rows.
        top5 = df.head(5)
        title = 'Current Price of First 5 Rows'
    # Fall back to positional columns when the expected names are absent.
    names = top5['name'] if 'name' in top5.columns else top5.iloc[:, 0]
    prices = top5['current_price'] if 'current_price' in top5.columns else top5.iloc[:, 1]
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.bar(names.astype(str), pd.to_numeric(prices, errors='coerce'), color='gold')
    ax.set_title(title)
    ax.set_ylabel('Price (USD)')
    plt.xticks(rotation=25)
    plt.tight_layout()
    plt.show()
else:
    print('No cleaned CSV found to plot.')


## Key Findings

- Bitcoin and Ethereum lead total market cap.
- The 5-day moving average (MA) smooths day-to-day price volatility and highlights trend direction.
- Altcoins show higher short-term volatility.
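
The moving-average and volatility findings can be checked along these lines. This is a sketch under assumptions: it expects a long-format frame with `name`, `date`, and `current_price` columns (the `date` column and the function name `add_trend_metrics` are hypothetical), and it measures volatility as the rolling standard deviation of daily returns, which may differ from the measure used in the project's EDA.

```python
import pandas as pd


def add_trend_metrics(df: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Add a moving average and rolling volatility per coin (hypothetical helper)."""
    out = df.sort_values(["name", "date"]).copy()
    prices = out.groupby("name")["current_price"]
    # Moving average of price over the last `window` observations of each coin.
    out["ma"] = prices.transform(lambda s: s.rolling(window).mean())
    # Daily return relative to the previous observation of the same coin.
    out["daily_return"] = out["current_price"] / prices.shift(1) - 1
    # Short-term volatility: rolling standard deviation of daily returns.
    out["volatility"] = out.groupby("name")["daily_return"].transform(
        lambda s: s.rolling(window).std()
    )
    return out
```

Comparing the `volatility` column across coins over the same dates gives a direct way to compare short-term volatility between coins.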
