
oussou-dev/cryptoflow-analytics


🚀 CryptoFlow Analytics

Real-time Cryptocurrency Market Intelligence Pipeline


An end-to-end data pipeline that ingests multi-source crypto data, transforms it through a medallion architecture, and produces actionable market intelligence (including a Market Regime Detector, Composite Momentum Signals, and Sentiment Analysis), all orchestrated by Bruin.




🎯 Problem Statement

Crypto markets generate massive amounts of data across hundreds of exchanges, thousands of tokens, and multiple sentiment indicators. Individual investors and analysts face three core challenges:

  1. Data fragmentation: prices, volumes, sentiment, and trending data live in separate APIs with different formats
  2. Signal noise: raw price changes alone are misleading without context (volume confirmation, market breadth, sentiment)
  3. Regime blindness: most dashboards show what happened but fail to classify where we are in the market cycle

CryptoFlow Analytics solves this by building a unified intelligence layer that ingests, cleans, enriches, and analyzes crypto data to produce actionable signals, not just charts.


πŸ— Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        EXTERNAL SOURCES                             │
│  CoinGecko API (Free)  │  Fear & Greed API  │  CSV Seeds (static)   │
└───────────┬─────────────────────┬─────────────────────┬─────────────┘
            │                     │                     │
            ▼                     ▼                     ▼
┌─────────────────────────────────────────────────────────────────────┐
│  🥉 BRONZE: Ingestion (raw.)                                        │
│  ┌────────────────────┐ ┌──────────────────┐ ┌────────────────────┐ │
│  │ fetch_coin_markets │ │ fetch_fear_greed │ │ fetch_global_data  │ │
│  │ (Python asset)     │ │ (Python asset)   │ │ (Python asset)     │ │
│  └────────────────────┘ └──────────────────┘ └────────────────────┘ │
│  ┌────────────────────┐ ┌─────────────────────────────────────────┐ │
│  │ fetch_trending     │ │ fetch_coin_categories                   │ │
│  │ (Python asset)     │ │ (Python asset: reads seeds/categories)  │ │
│  └────────────────────┘ └─────────────────────────────────────────┘ │
└───────────────────────────────┬─────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│  🥈 SILVER: Staging (stg.)                                          │
│  ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐  │
│  │ stg_enriched_     │ │ stg_fear_greed_   │ │ stg_global_       │  │
│  │ coins             │ │ daily             │ │ metrics           │  │
│  │ + market cap tier │ │ + moving averages │ │ + dominance calc  │  │
│  │ + liquidity ratio │ │ + trend direction │ │ + volume ratio    │  │
│  │ + supply scarcity │ │ + sentiment zones │ │                   │  │
│  └───────────────────┘ └───────────────────┘ └───────────────────┘  │
└───────────────────────────────┬─────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────────┐
│  🥇 GOLD: Analytics (analytics.)                                    │
│  ┌───────────────────┐ ┌──────────────────┐ ┌─────────────────────┐ │
│  │ market_dominance  │ │ momentum_signals │ │ market_regime       │ │
│  │ BTC/ETH/Alt share │ │ Composite score  │ │ Bull/Bear/Neutral   │ │
│  │                   │ │ BUY/SELL signals │ │ Breadth + Sentiment │ │
│  └───────────────────┘ └──────────────────┘ └─────────────────────┘ │
│  ┌───────────────────┐ ┌──────────────────┐ ┌─────────────────────┐ │
│  │ volatility_       │ │ top_performers   │ │ fear_greed_impact   │ │
│  │ analysis          │ │ Gainers & Losers │ │ Sentiment zones     │ │
│  │ Volatility tiers  │ │                  │ │                     │ │
│  └───────────────────┘ └──────────────────┘ └─────────────────────┘ │
└───────────────────────────────┬─────────────────────────────────────┘
                                │
                                ▼
                    ┌───────────────────────┐
                    │  🤖 Bruin AI Analyst  │
                    │  Natural language Q&A │
                    │  on all analytics     │
                    └───────────────────────┘

📊 Data Sources

| Source | Type | Endpoint | Frequency | Cost |
|---|---|---|---|---|
| CoinGecko | REST API | /coins/markets | Daily | Free (30 calls/min) |
| CoinGecko | REST API | /global | Daily | Free |
| CoinGecko | REST API | /search/trending | Daily | Free |
| Alternative.me | REST API | /fng/ (Fear & Greed) | Daily | Free, no key |
| CSV Seed | Static file | coin_categories.csv | Manual | - |
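To make the Alternative.me row concrete, here is a minimal sketch of fetching and typing the Fear & Greed feed with only the Python standard library. It assumes the public response shape (a data list whose entries carry string value and value_classification fields plus a Unix timestamp) and passes limit=90 to mirror the 90-day window of fetch_fear_greed.py; fetch_fear_greed_payload and parse_fear_greed are hypothetical names, not the repository's code.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import List

FNG_URL = "https://api.alternative.me/fng/"  # Fear & Greed endpoint, no API key needed


def fetch_fear_greed_payload(limit: int = 90) -> dict:
    """Download the last `limit` daily readings as decoded JSON."""
    with urllib.request.urlopen(f"{FNG_URL}?limit={limit}", timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def parse_fear_greed(payload: dict) -> List[dict]:
    """Turn the API's string fields into typed rows, ordered oldest to newest."""
    rows = []
    for item in payload.get("data", []):
        ts = datetime.fromtimestamp(int(item["timestamp"]), tz=timezone.utc)
        rows.append(
            {
                "date": ts.date().isoformat(),
                "value": int(item["value"]),  # 0 = extreme fear, 100 = extreme greed
                "classification": item["value_classification"],
            }
        )
    # Chronological order is what downstream moving averages expect.
    return sorted(rows, key=lambda r: r["date"])
```

Sorting oldest-first is a deliberate choice: the 7-day and 14-day moving averages computed in the Silver layer need the readings in chronological order, whatever order the API returns them in.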

🔄 Pipeline Layers

🥉 Bronze: Raw Ingestion

Raw data from external APIs, loaded as-is into BigQuery with an ingested_at timestamp. Python assets handle API calls, pagination, and error handling, and return pandas.DataFrame objects that Bruin materializes as tables.
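The Bronze responsibilities described here (API calls, pagination, error handling, an ingested_at stamp) can be sketched as a single standard-library function. This is an illustration under stated assumptions, not the repository's fetch_coin_markets.py: it assumes CoinGecko's public v3 base URL and the /coins/markets query parameters vs_currency, order, per_page, page, and price_change_percentage, while the retry policy (exponential backoff on HTTP 429 only) and the name fetch_coin_markets_rows are hypothetical.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Dict, List

BASE_URL = "https://api.coingecko.com/api/v3"  # public CoinGecko v3 API


def http_get_json(url: str, params: Dict[str, str]) -> list:
    """GET `url` with query `params` and decode the JSON response body."""
    full_url = f"{url}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(full_url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_coin_markets_rows(
    total_coins: int = 100,
    per_page: int = 100,
    get_json: Callable[[str, Dict[str, str]], list] = http_get_json,
    max_retries: int = 3,
    backoff_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Page through /coins/markets until `total_coins` rows are collected."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    rows: List[dict] = []
    page = 1
    while len(rows) < total_coins:
        params = {
            "vs_currency": "usd",
            "order": "market_cap_desc",
            "per_page": str(per_page),
            "page": str(page),
            "price_change_percentage": "24h,7d,30d",
        }
        for attempt in range(max_retries + 1):
            try:
                batch = get_json(f"{BASE_URL}/coins/markets", params)
                break
            except urllib.error.HTTPError as err:
                # Retry only on rate limiting (HTTP 429); give up on anything else.
                if err.code != 429 or attempt == max_retries:
                    raise
                sleep(backoff_seconds * 2 ** attempt)  # backoff: 2s, 4s, 8s
        if not batch:  # an empty page means there is nothing left to fetch
            break
        rows.extend(batch)
        page += 1
    # Bronze keeps the API payload as-is and only stamps the ingestion time.
    return [dict(row, ingested_at=ingested_at) for row in rows[:total_coins]]
```

The HTTP call is injected through get_json so the paging and retry logic can be exercised without network access. In the real asset, the returned list would presumably be wrapped in a pandas.DataFrame for Bruin to materialize.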

🥈 Silver: Staging

Cleaned, typed, deduplicated data with computed enrichments:

  • Market cap tiers: mega_cap (>$100B), large_cap, mid_cap, small_cap, micro_cap
  • Liquidity ratio: 24h volume / market cap
  • Supply scarcity: circulating / max supply percentage
  • Intraday spread: (high - low) / low as volatility proxy
  • Sentiment moving averages: 7-day and 14-day smoothed Fear & Greed
  • Trend direction: MA crossover detection (improving / deteriorating / flat)
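The project computes these enrichments in BigQuery SQL; to illustrate the formulas themselves, here is a Python sketch. Only the mega_cap cutoff (> $100B) comes from this README: the lower tier boundaries, the 1-point dead band in trend_direction, and using the sign of the 7-day minus 14-day gap as a stand-in for crossover detection are assumptions. Input keys follow CoinGecko's /coins/markets field names.

```python
from typing import List, Optional


def market_cap_tier(market_cap: float) -> str:
    # Only the mega_cap cutoff (> $100B) is stated in the README; the rest are assumptions.
    if market_cap > 100e9:
        return "mega_cap"
    if market_cap > 10e9:
        return "large_cap"
    if market_cap > 1e9:
        return "mid_cap"
    if market_cap > 100e6:
        return "small_cap"
    return "micro_cap"


def safe_ratio(num: Optional[float], den: Optional[float]) -> Optional[float]:
    """num / den, or None when an input is missing or the denominator is zero."""
    if num is None or not den:
        return None
    return num / den


def enrich_coin(coin: dict) -> dict:
    """Add the Silver-layer enrichments to one /coins/markets row."""
    scarcity = safe_ratio(coin.get("circulating_supply"), coin.get("max_supply"))
    return {
        **coin,
        "market_cap_tier": market_cap_tier(coin["market_cap"]),
        # 24h volume / market cap
        "liquidity_ratio": safe_ratio(coin["total_volume"], coin["market_cap"]),
        # circulating / max supply, in percent; None when max_supply is unknown or uncapped
        "supply_scarcity_pct": None if scarcity is None else 100 * scarcity,
        # (high - low) / low as an intraday volatility proxy
        "intraday_spread": safe_ratio(coin["high_24h"] - coin["low_24h"], coin["low_24h"]),
    }


def moving_average(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing simple moving average over oldest-to-newest values; None until `window` points exist."""
    return [
        None if i + 1 < window else sum(values[i + 1 - window : i + 1]) / window
        for i in range(len(values))
    ]


def trend_direction(ma7: Optional[float], ma14: Optional[float], tolerance: float = 1.0) -> str:
    """Compare the 7-day and 14-day averages (hypothetical 1-point dead band)."""
    if ma7 is None or ma14 is None:
        return "flat"
    if ma7 - ma14 > tolerance:
        return "improving"
    if ma14 - ma7 > tolerance:
        return "deteriorating"
    return "flat"
```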

🥇 Gold: Analytics

Business-ready analytical tables answering specific questions:

| Table | Question It Answers |
|---|---|
| market_dominance | How is market share distributed across coins and tiers? |
| volatility_analysis | Which coins are most/least volatile and why? |
| momentum_signals | Which coins show strong buying or selling momentum? |
| fear_greed_impact | How does sentiment distribute and what's the current trend? |
| top_performers | Who are the biggest winners and losers across timeframes? |
| market_regime | Are we in a Bull, Bear, or Neutral market right now? |
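As an example of what market_dominance answers, the sketch below splits the total market cap of the ingested coins into BTC, ETH, and Alt shares (the breakdown named in the architecture diagram). It is a hypothetical Python rendering, not the project's SQL, and it measures dominance within the ingested coins rather than across the whole market.

```python
from typing import Dict, List


def dominance_shares(coins: List[dict]) -> Dict[str, float]:
    """BTC / ETH / Alt share of the ingested coins' total market cap, in percent."""
    total = sum(c["market_cap"] for c in coins)
    if total <= 0:
        raise ValueError("total market cap must be positive")
    btc = sum(c["market_cap"] for c in coins if c["id"] == "bitcoin")
    eth = sum(c["market_cap"] for c in coins if c["id"] == "ethereum")
    alts = total - btc - eth
    # By construction the three shares sum to 100%, which is what a
    # "dominances sum to ~100%" check verifies once values are rounded.
    return {
        "btc_pct": 100 * btc / total,
        "eth_pct": 100 * eth / total,
        "alt_pct": 100 * alts / total,
    }
```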

🧠 Key Analytics

Market Regime Detector

The crown jewel of this pipeline, the Market Regime Detector combines three dimensions into a single regime classification:

  1. Market Breadth (60% weight): percentage of the top 50 coins in positive territory across the 24h, 7d, and 30d timeframes
  2. Sentiment (20% weight): Fear & Greed Index mapped to a directional score
  3. Global Metrics (20% weight): total market cap change and volume ratios

Output: STRONG_BULL | BULL | NEUTRAL | BEAR | STRONG_BEAR with a numeric score and human-readable narrative.
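A minimal sketch of the 60/20/20 weighting, assuming each component is first mapped onto a common -100..+100 scale where 0 is neutral. The weights come from this README; the per-component scaling, the use of the 24h market cap change alone for the global component (volume ratios omitted for brevity), and the regime cut-offs are hypothetical choices, not the logic in market_regime.sql. The returned keys match the columns queried in the Quick Start section.

```python
from typing import Dict, List


def clamp(x: float, lo: float = -100.0, hi: float = 100.0) -> float:
    return max(lo, min(hi, x))


def pct_positive(changes: List[float]) -> float:
    """Share of coins (in percent) whose price change is above zero."""
    return 100.0 * sum(1 for c in changes if c > 0) / len(changes)


def market_regime(
    changes_by_timeframe: Dict[str, List[float]],  # e.g. {"24h": [...], "7d": [...], "30d": [...]}
    fear_greed: float,  # Fear & Greed Index, 0-100
    mcap_change_24h_pct: float,  # total market cap change over 24h, in percent
) -> dict:
    # Map every component onto -100..+100, where 0 means neutral.
    breadth = sum(pct_positive(v) for v in changes_by_timeframe.values()) / len(changes_by_timeframe)
    breadth_score = (breadth - 50.0) * 2.0
    sentiment_score = (fear_greed - 50.0) * 2.0
    global_score = clamp(mcap_change_24h_pct * 20.0)  # assumption: a 5% move saturates
    score = 0.6 * breadth_score + 0.2 * sentiment_score + 0.2 * global_score

    # Hypothetical cut-offs for the five regimes.
    if score >= 50:
        regime = "STRONG_BULL"
    elif score >= 15:
        regime = "BULL"
    elif score > -15:
        regime = "NEUTRAL"
    elif score > -50:
        regime = "BEAR"
    else:
        regime = "STRONG_BEAR"

    narrative = (
        f"{regime}: {breadth:.0f}% average breadth, Fear & Greed at {fear_greed:.0f}, "
        f"total market cap {mcap_change_24h_pct:+.1f}% over 24h."
    )
    return {"regime": regime, "regime_score": round(score, 1), "regime_narrative": narrative}
```

Because breadth carries 60% of the weight, a broad rally can classify as BULL even while sentiment is fearful, which is the point of weighting breadth over a single headline indicator.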

Composite Momentum Score

A custom indicator that combines five factors for each coin:

  • Short-term price action (24h change)
  • Medium-term trend (7d change)
  • Long-term trend (30d change)
  • Volume confirmation (volume/market cap ratio)
  • ATH proximity (distance from all-time high)

Each factor contributes to a score from roughly -50 to +100, which is then combined with market sentiment to produce actionable signals: STRONG_BUY, BUY, NEUTRAL, SELL, STRONG_SELL.

The contrarian twist: a high momentum score combined with extreme fear produces a STRONG_BUY, the classic "buy when there's blood in the streets" signal.
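The sketch below shows one way the five factors could add up to the stated range of roughly -50 to +100 and then be combined with sentiment. The caps and weights are hypothetical, chosen only so the range comes out at exactly -50..+100, and the signal thresholds (including Fear & Greed <= 25 as "extreme fear") are assumptions rather than the thresholds in momentum_signals.sql.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def momentum_score(
    change_24h: float,
    change_7d: float,
    change_30d: float,
    volume_to_mcap: float,
    ath_change_pct: float,  # e.g. -20.0 means 20% below the all-time high
) -> float:
    """Hypothetical weighting whose range is exactly -50..+100."""
    short_term = clamp(change_24h, -10, 10)  # -10..+10
    medium_term = clamp(change_7d, -20, 20)  # -20..+20
    long_term = clamp(change_30d, -40, 40) * 0.5  # -20..+20
    volume = min(volume_to_mcap / 0.10, 1.0) * 20  # 0..+20, full credit at 10% turnover
    ath = (100 + clamp(ath_change_pct, -100, 0)) * 0.3  # 0..+30, highest at the ATH
    return short_term + medium_term + long_term + volume + ath


def momentum_signal(score: float, fear_greed: float) -> str:
    """Combine momentum with sentiment; thresholds are illustrative."""
    if score >= 60:
        # Contrarian twist: strong momentum during extreme fear.
        return "STRONG_BUY" if fear_greed <= 25 else "BUY"
    if score >= 30:
        return "BUY"
    if score > 0:
        return "NEUTRAL"
    if score > -25:
        return "SELL"
    return "STRONG_SELL"
```

In this sketch STRONG_BUY is reserved for the contrarian case: the same top score under neutral or greedy sentiment only reaches BUY.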


🛠 Bruin Features Used

| Feature | How It's Used |
|---|---|
| Python Assets | 5 ingestion scripts fetching from CoinGecko, Alternative.me APIs, and CSV seed |
| SQL Assets | 9 BigQuery SQL transformations across staging (3) and analytics (6) layers |
| Seed Assets | CSV-based reference data for coin categories |
| Materialization | table strategy for all assets; merge for incremental ingestion |
| Dependencies | Explicit depends declarations creating a proper DAG |
| Quality Checks | Built-in (not_null, unique, positive, accepted_values) on every asset |
| Custom Checks | Business logic validations (e.g., "Bitcoin must exist in data", "dominances sum to ~100%") |
| Glossary | Structured business term definitions for crypto concepts |
| Pipeline Schedule | Daily schedule via pipeline.yml |
| Bruin Cloud | Deployment, monitoring, and AI analyst |
| AI Data Analyst | Conversational analysis on all analytics tables |
| Lineage | Full column-level lineage via bruin lineage |
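The explicit depends declarations form a DAG, and two behaviours described in this README follow from it: a valid execution order, and the set of downstream assets that a failed check blocks. The sketch below models both with Python's standard graphlib module. The edge list is a hypothetical reading of the architecture diagram, not the repository's actual depends blocks, and it does not reproduce how Bruin schedules assets internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical dependency map inferred from the diagram: asset -> upstream assets.
DEPENDS: Dict[str, Set[str]] = {
    "fetch_coin_markets": set(),
    "fetch_fear_greed": set(),
    "fetch_global_data": set(),
    "fetch_trending": set(),
    "fetch_coin_categories": set(),
    "stg_enriched_coins": {"fetch_coin_markets", "fetch_coin_categories"},
    "stg_fear_greed_daily": {"fetch_fear_greed"},
    "stg_global_metrics": {"fetch_global_data"},
    "market_dominance": {"stg_enriched_coins"},
    "volatility_analysis": {"stg_enriched_coins"},
    "top_performers": {"stg_enriched_coins"},
    "momentum_signals": {"stg_enriched_coins", "stg_fear_greed_daily"},
    "fear_greed_impact": {"stg_fear_greed_daily"},
    "market_regime": {"stg_enriched_coins", "stg_fear_greed_daily", "stg_global_metrics"},
}


def run_order(depends: Dict[str, Set[str]]) -> List[str]:
    """One valid execution order: every asset appears after all of its upstreams."""
    return list(TopologicalSorter(depends).static_order())


def blocked_by(failed: str, depends: Dict[str, Set[str]]) -> Set[str]:
    """Every asset transitively downstream of `failed`, i.e. what a failing check would block."""
    blocked: Set[str] = set()
    frontier = {failed}
    while frontier:
        # Assets with at least one upstream in the current frontier.
        frontier = {a for a, ups in depends.items() if ups & frontier and a not in blocked}
        blocked |= frontier
    return blocked
```

For instance, under this edge list a failure in fetch_fear_greed would block stg_fear_greed_daily and, through it, momentum_signals, fear_greed_impact, and market_regime, while market_dominance could still run.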

⚑ Quick Start

Prerequisites

  • Bruin CLI installed
  • Python 3.10+ with pandas and requests
  • A Google Cloud project with BigQuery enabled
  • A GCP Service Account with BigQuery Data Editor and BigQuery Job User roles
  • (Optional) VS Code Bruin Extension

Installation

# 1. Clone the repository
git clone https://github.com/oussou-dev/cryptoflow-analytics.git
cd cryptoflow-analytics

# 2. Set up your GCP credentials
cp gcp-key.json.example gcp-key.json  # add your service account key

# 3. Create BigQuery datasets
bq mk --dataset --location=US your-project:raw
bq mk --dataset --location=US your-project:stg
bq mk --dataset --location=US your-project:analytics

# 4. Execute the full pipeline
bruin run .

All data is stored in BigQuery in the raw, stg, and analytics datasets.

Verify the results

# Check pipeline lineage
bruin lineage .

# Query via BigQuery console or bq CLI
bq query --use_legacy_sql=false \
  'SELECT regime, regime_score, regime_narrative FROM `your-project.analytics.market_regime`'

bq query --use_legacy_sql=false \
  'SELECT name, signal, momentum_score FROM `your-project.analytics.momentum_signals` ORDER BY momentum_score DESC LIMIT 10'

🤖 AI Analyst Insights

Deployed on Bruin Cloud, the AI Data Analyst answers natural-language questions about the entire pipeline output, with no SQL required.

Market Regime Analysis

"What is the current market regime? Show me the regime classification, score, and narrative."


Top Momentum Signals

"Show me the top 10 coins by momentum score with their BUY/SELL signal and confidence level."


Market Dominance by Tier

"Show me the market dominance breakdown by price tier. What % does mega-cap control?"


Top Performers (7-Day)

"What are the top 5 biggest gainers and losers over the past 7 days?"


Sentiment Analysis (Fear & Greed)

"How has market sentiment evolved recently? Show the distribution across sentiment zones."

Screenshot: Sentiment vs Price


✅ Data Quality

Every asset in the pipeline includes quality checks that run automatically after each execution. Failed checks block downstream assets, ensuring data integrity throughout.

Built-in Checks

  • not_null: no NULL values in critical columns
  • unique: no duplicate records where uniqueness is expected
  • positive: prices, volumes, and market caps are positive
  • accepted_values: enum-like columns contain only valid values

Custom Business Checks

  • At least 50 coins ingested per run
  • Bitcoin always present in the dataset
  • No negative market capitalizations
  • Fear & Greed values within 0-100 range
  • Market dominance percentages sum to approximately 100%
  • All top 50 coins have momentum signals
  • Exactly one market regime row per run
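In the project these rules are declared as Bruin custom checks on the assets. To spell out what each rule tests, here is a hypothetical Python restatement that returns one message per failed rule; the function name, its inputs, and the 0.5-percentage-point tolerance used for "approximately 100%" are assumptions.

```python
from typing import Iterable, List


def run_custom_checks(
    coins: List[dict],  # rows with at least "id" and "market_cap"
    fear_greed_values: Iterable[float],
    dominance_pcts: Iterable[float],
    regime_row_count: int,
    momentum_ids: Iterable[str],
    tolerance_pct: float = 0.5,  # hypothetical slack for "approximately 100%"
) -> List[str]:
    """Return a description of every failed rule (empty list = all checks pass)."""
    failures = []
    if len(coins) < 50:
        failures.append(f"expected >= 50 coins, got {len(coins)}")
    if not any(c["id"] == "bitcoin" for c in coins):
        failures.append("bitcoin missing")
    if any(c["market_cap"] < 0 for c in coins):
        failures.append("negative market cap")
    if any(not 0 <= v <= 100 for v in fear_greed_values):
        failures.append("Fear & Greed value outside 0-100")
    total = sum(dominance_pcts)
    if abs(total - 100) > tolerance_pct:
        failures.append(f"dominance sums to {total:.2f}%")
    # The 50 largest coins by market cap must all have a momentum signal.
    top50 = {c["id"] for c in sorted(coins, key=lambda c: c["market_cap"], reverse=True)[:50]}
    missing = top50 - set(momentum_ids)
    if missing:
        failures.append(f"{len(missing)} top-50 coins lack momentum signals")
    if regime_row_count != 1:
        failures.append(f"expected 1 market_regime row, got {regime_row_count}")
    return failures
```

Collecting every failure instead of stopping at the first one makes a single run report all broken rules at once.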

📁 Project Structure

cryptoflow-analytics/
├── .bruin.yml                              # Project config + BigQuery connection
├── pipeline.yml                            # Daily schedule + start_date
├── glossary.yml                            # Bruin glossary with crypto terms
├── README.md
├── LICENSE
├── .gitignore
│
├── assets/
│   ├── ingestion/                          # 🥉 BRONZE: Raw data
│   │   ├── fetch_coin_markets.py           # Top 100 coins (CoinGecko)
│   │   ├── fetch_fear_greed.py             # 90d sentiment index
│   │   ├── fetch_global_data.py            # Global market metrics
│   │   ├── fetch_trending.py               # Trending coins
│   │   └── fetch_coin_categories.py        # Seed loader: reads seeds/coin_categories.csv
│   │
│   ├── staging/                            # 🥈 SILVER: Cleaned & enriched
│   │   ├── stg_enriched_coins.sql          # Tiers, ratios, spreads
│   │   ├── stg_fear_greed_daily.sql        # Moving averages, trends
│   │   └── stg_global_metrics.sql          # Dominance, volume ratios
│   │
│   └── analytics/                          # 🥇 GOLD: Business intelligence
│       ├── market_dominance.sql            # Market share analysis
│       ├── volatility_analysis.sql         # Volatility scoring
│       ├── momentum_signals.sql            # Buy/Sell signals
│       ├── fear_greed_impact.sql           # Sentiment zone analysis
│       ├── top_performers.sql              # Winners & losers
│       └── market_regime.sql               # Bull/Bear classifier
│
├── seeds/
│   └── coin_categories.csv                 # DeFi, L1, L2, Meme categories (35 coins)
│
└── docs/
    └── ai_analyst_screenshots/             # Bruin AI analyst evidence

🧩 Design Decisions

Why Bruin over dbt + Airflow + Great Expectations?

| Concern | Traditional Stack | Bruin |
|---|---|---|
| Ingestion | Airbyte / custom scripts + separate orchestration | Python assets with built-in materialization |
| Transformation | dbt (separate project, profiles.yml, dbt_project.yml) | SQL assets in the same project |
| Orchestration | Airflow DAGs (Python boilerplate, scheduler infra) | pipeline.yml with schedule + depends |
| Quality | Great Expectations (separate YAML suites, checkpoint configs) | Inline columns.checks + custom_checks |
| Setup time | Hours to days (Docker, Airflow webserver, dbt profiles...) | 3 commands, < 5 minutes |
| Files to manage | 10+ config files across tools | 2 config files (.bruin.yml + pipeline.yml) |

The biggest win: quality checks are embedded in the asset definition, not in a separate tool. This means every change to a transformation automatically includes its quality contract. There's no "forgetting to update the test file."

Why BigQuery?

  • Serverless and fully managed: zero infrastructure to maintain
  • Scales from kilobytes to petabytes with the same SQL
  • Native integration with Bruin Cloud for seamless deployment
  • Production-grade analytics with familiar SQL syntax
  • The free tier (10 GiB of storage and 1 TiB of query processing per month) covers well beyond hackathon data volumes

Why CoinGecko Free API?

  • 30 calls/minute and 10,000/month: more than enough for a daily batch
  • No credit card required
  • Richest free-tier data: prices, volumes, market cap, ATH, supply metrics
  • Well-documented, stable endpoints
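A quick arithmetic check of the "more than enough" claim, assuming each daily run makes one /coins/markets call for the top 100 coins (a single page at per_page=100) plus one call each to /global and /search/trending. Under those assumptions a run needs 3 calls and a 31-day month about 93, under 1% of the 10,000 monthly quota, and spacing calls 2 seconds apart stays within 30 calls per minute. The helper name and its parameters are hypothetical.

```python
import math

CALLS_PER_MINUTE = 30  # CoinGecko free-tier limits quoted above
CALLS_PER_MONTH = 10_000


def monthly_call_budget(
    top_n: int = 100,
    per_page: int = 100,
    other_endpoints: int = 2,  # assumption: /global and /search/trending, one call each
    runs_per_day: int = 1,
    days: int = 31,
):
    """Return (calls per run, calls per month, fraction of the monthly quota used)."""
    calls_per_run = math.ceil(top_n / per_page) + other_endpoints
    used = calls_per_run * runs_per_day * days
    return calls_per_run, used, used / CALLS_PER_MONTH


# Spacing requests at least this far apart never exceeds the per-minute limit.
MIN_SECONDS_BETWEEN_CALLS = 60 / CALLS_PER_MINUTE
```

Even ingesting 250 coins (3 market pages) would only raise a run to 5 calls, so the monthly quota is not the binding constraint for a daily schedule.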

💡 What I Learned

  1. Bruin's single-file asset model is powerful: having the SQL query, its dependencies, materialization strategy, column metadata, and quality checks all in one file eliminates an entire category of "where is the config for this?" problems.

  2. Quality checks as first-class citizens change behavior: when checks are blocking by default, you think about data correctness while writing the transformation, not as an afterthought.

  3. The glossary feature is underrated: defining business terms in glossary.yml forced me to think precisely about what "market dominance" or "momentum score" means before writing SQL. That clarity propagated into better queries.

  4. Python + SQL in the same pipeline is the right abstraction: API ingestion naturally belongs in Python, and analytical transformations naturally belong in SQL. Bruin lets both coexist without fighting about which language "wins."


📜 License

MIT β€” see LICENSE for details.


Built for the Data Engineering Zoomcamp 2026 Project Competition, sponsored by Bruin.
