# Real-time Cryptocurrency Market Intelligence Pipeline
An end-to-end data pipeline that ingests multi-source crypto data, transforms it through a medallion architecture, and produces actionable market intelligence (a Market Regime Detector, Composite Momentum Signals, and Sentiment Analysis), all orchestrated by Bruin.
- Problem Statement
- Architecture
- Data Sources
- Pipeline Layers
- Key Analytics
- Bruin Features Used
- Quick Start
- AI Analyst Insights
- Data Quality
- Project Structure
- Design Decisions
- What I Learned
## Problem Statement

Crypto markets generate massive amounts of data across hundreds of exchanges, thousands of tokens, and multiple sentiment indicators. Individual investors and analysts face three core challenges:

- Data fragmentation: prices, volumes, sentiment, and trending data live in separate APIs with different formats
- Signal noise: raw price changes alone are misleading without context (volume confirmation, market breadth, sentiment)
- Regime blindness: most dashboards show what happened but fail to classify where we are in the market cycle

CryptoFlow Analytics solves this by building a unified intelligence layer that ingests, cleans, enriches, and analyzes crypto data to produce actionable signals, not just charts.
## Architecture

```
EXTERNAL SOURCES
  CoinGecko API (free)  |  Fear & Greed API  |  CSV seeds (static)
        │
        ▼
🥉 BRONZE: Ingestion (raw.)
  fetch_coin_markets       Python asset
  fetch_fear_greed         Python asset
  fetch_global_data        Python asset
  fetch_trending           Python asset
  fetch_coin_categories    Python asset, reads seeds/coin_categories.csv
        │
        ▼
🥈 SILVER: Staging (stg.)
  stg_enriched_coins       + market cap tier, + liquidity ratio, + supply scarcity
  stg_fear_greed_daily     + moving averages, + trend direction, + sentiment zones
  stg_global_metrics       + dominance calc, + volume ratio
        │
        ▼
🥇 GOLD: Analytics (analytics.)
  market_dominance         BTC/ETH/Alt share
  momentum_signals         composite score, BUY/SELL signals
  market_regime            Bull/Bear/Neutral from breadth + sentiment
  volatility_analysis      volatility tiers
  top_performers           gainers & losers
  fear_greed_impact        sentiment zones
        │
        ▼
🤖 Bruin AI Analyst
  natural-language Q&A on all analytics tables
```
## Data Sources

| Source | Type | Endpoint | Frequency | Cost |
|---|---|---|---|---|
| CoinGecko | REST API | `/coins/markets` | Daily | Free (30 calls/min) |
| CoinGecko | REST API | `/global` | Daily | Free |
| CoinGecko | REST API | `/search/trending` | Daily | Free |
| Alternative.me | REST API | `/fng/` (Fear & Greed) | Daily | Free, no key |
| CSV seed | Static file | `coin_categories.csv` | Manual | n/a |
## Pipeline Layers

### 🥉 Bronze: Ingestion (`raw`)

Raw data from external APIs is loaded as-is into BigQuery with an `ingested_at` timestamp. Python assets handle API calls, pagination, and error handling, and return `pandas.DataFrame` objects that Bruin materializes as tables.
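A minimal sketch of what a Bronze asset such as `fetch_coin_markets.py` could look like. It uses only the standard library for HTTP (the real assets use `requests`), and the `materialize()` entry point reflects our understanding of Bruin's Python-materialization convention, with the `@bruin` metadata header omitted. The helper names, query parameters, and page size are assumptions, not the repository's code.

```python
import json
import math
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Public CoinGecko v3 endpoint (free tier); the query parameters below are assumptions.
MARKETS_URL = "https://api.coingecko.com/api/v3/coins/markets"


def fetch_markets_page(page: int, per_page: int = 100) -> list[dict]:
    """Fetch one page of /coins/markets ordered by market cap (hypothetical helper)."""
    query = urllib.parse.urlencode({
        "vs_currency": "usd",
        "order": "market_cap_desc",
        "per_page": per_page,
        "page": page,
        "price_change_percentage": "24h,7d,30d",
    })
    # urlopen raises urllib.error.HTTPError on non-2xx responses.
    with urllib.request.urlopen(f"{MARKETS_URL}?{query}", timeout=30) as resp:
        return json.load(resp)


def fetch_top_coins(n_coins: int = 100, per_page: int = 100) -> list[dict]:
    """Page through /coins/markets until n_coins records are collected."""
    records: list[dict] = []
    for page in range(1, math.ceil(n_coins / per_page) + 1):
        records.extend(fetch_markets_page(page, per_page))
    return records[:n_coins]


def to_raw_rows(payload: list[dict], ingested_at: datetime) -> list[dict]:
    """Keep each API record as-is and stamp it with an ingested_at timestamp."""
    stamp = ingested_at.isoformat()
    return [{**record, "ingested_at": stamp} for record in payload]


def materialize():
    """Assumed Bruin entry point: return a DataFrame for Bruin to write to BigQuery."""
    import pandas as pd  # lazy import so the pure helpers above work without pandas

    rows = to_raw_rows(fetch_top_coins(), datetime.now(timezone.utc))
    return pd.DataFrame(rows)
```

Keeping the timestamping in a pure helper separates it from the network call, so it can be tested without hitting the API.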
### 🥈 Silver: Staging (`stg`)

Cleaned, typed, deduplicated data with computed enrichments:
- Market cap tiers: mega_cap (>$100B), large_cap, mid_cap, small_cap, micro_cap
- Liquidity ratio: 24h volume / market cap
- Supply scarcity: circulating supply as a percentage of max supply
- Intraday spread: (high - low) / low, used as a volatility proxy
- Sentiment moving averages: 7-day and 14-day smoothed Fear & Greed
- Trend direction: MA crossover detection (improving / deteriorating / flat)
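The Silver enrichments can be illustrated in plain Python. This is a sketch, not the contents of the `stg_*.sql` assets: only the `mega_cap` cutoff (> $100B), the 7- and 14-day windows, and the ratio definitions come from this README; the other tier boundaries and the `tolerance` used to call a trend `flat` are hypothetical. The trend check compares the 7-day average against the 14-day average, a simplified stand-in for crossover detection.

```python
from typing import Optional

# Only the mega_cap cutoff (> $100B) is from the README; the other
# boundaries are hypothetical, chosen for illustration.
TIER_THRESHOLDS = [
    (100e9, "mega_cap"),
    (10e9, "large_cap"),
    (1e9, "mid_cap"),
    (100e6, "small_cap"),
]


def market_cap_tier(market_cap: float) -> str:
    """Return the first tier whose lower bound the market cap exceeds."""
    for threshold, tier in TIER_THRESHOLDS:
        if market_cap > threshold:
            return tier
    return "micro_cap"


def liquidity_ratio(volume_24h: float, market_cap: float) -> Optional[float]:
    """24h volume / market cap; undefined when the market cap is zero."""
    return volume_24h / market_cap if market_cap > 0 else None


def supply_scarcity_pct(circulating: float, max_supply: Optional[float]) -> Optional[float]:
    """Circulating supply as a % of max supply; None for coins without a max supply."""
    if not max_supply:
        return None
    return 100.0 * circulating / max_supply


def intraday_spread(high_24h: float, low_24h: float) -> Optional[float]:
    """(high - low) / low, used as a simple volatility proxy."""
    return (high_24h - low_24h) / low_24h if low_24h > 0 else None


def moving_average(values: list[float], window: int) -> list[Optional[float]]:
    """Trailing simple moving average; None until a full window is available."""
    return [
        sum(values[i + 1 - window : i + 1]) / window if i + 1 >= window else None
        for i in range(len(values))
    ]


def trend_direction(ma_7: Optional[float], ma_14: Optional[float],
                    tolerance: float = 1.0) -> Optional[str]:
    """Compare the 7-day MA with the 14-day MA (tolerance in index points is hypothetical)."""
    if ma_7 is None or ma_14 is None:
        return None
    if ma_7 - ma_14 > tolerance:
        return "improving"
    if ma_14 - ma_7 > tolerance:
        return "deteriorating"
    return "flat"
```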
### 🥇 Gold: Analytics (`analytics`)

Business-ready analytical tables answering specific questions:

| Table | Question It Answers |
|---|---|
| `market_dominance` | How is market share distributed across coins and tiers? |
| `volatility_analysis` | Which coins are most/least volatile and why? |
| `momentum_signals` | Which coins show strong buying or selling momentum? |
| `fear_greed_impact` | How does sentiment distribute and what's the current trend? |
| `top_performers` | Who are the biggest winners and losers across timeframes? |
| `market_regime` | Are we in a Bull, Bear, or Neutral market right now? |
## Key Analytics

### Market Regime Detector

The crown jewel of this pipeline. It combines three dimensions into a single regime classification:
- Market Breadth (60% weight): percentage of the top 50 coins in positive territory across the 24h, 7d, and 30d timeframes
- Sentiment (20% weight): Fear & Greed Index mapped to a directional score
- Global Metrics (20% weight): total market cap change and volume ratios
Output: `STRONG_BULL`, `BULL`, `NEUTRAL`, `BEAR`, or `STRONG_BEAR`, with a numeric score and a human-readable narrative.
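A sketch of how the weighted regime score could be computed. The 60/20/20 weights and the output column names come from this README; how each dimension is scaled to [-100, 100], the ±10% cap on the market-cap change, and the label cutoffs are assumptions chosen for illustration. For brevity, the global component here uses only the 24h market-cap change and ignores volume ratios.

```python
from statistics import mean

# Weights from the README; everything else in this block is illustrative.
WEIGHTS = {"breadth": 0.6, "sentiment": 0.2, "global": 0.2}

# (minimum score, label), checked top-down; the cutoffs are hypothetical.
REGIME_CUTOFFS = [(40.0, "STRONG_BULL"), (15.0, "BULL"), (-15.0, "NEUTRAL"), (-40.0, "BEAR")]


def breadth_score(changes_by_timeframe: dict[str, list[float]]) -> float:
    """% of coins with a positive change, averaged over timeframes, mapped to [-100, 100]."""
    pct_positive = mean(
        100.0 * sum(1 for change in changes if change > 0) / len(changes)
        for changes in changes_by_timeframe.values()
    )
    return 2.0 * (pct_positive - 50.0)


def sentiment_score(fear_greed: float) -> float:
    """Map Fear & Greed (0..100) to [-100, 100]; 50 is neutral, greed is positive."""
    return 2.0 * (fear_greed - 50.0)


def global_score(market_cap_change_24h_pct: float, cap_pct: float = 10.0) -> float:
    """Clip the 24h total market-cap change to +/- cap_pct, then map to [-100, 100]."""
    clipped = max(-cap_pct, min(cap_pct, market_cap_change_24h_pct))
    return 100.0 * clipped / cap_pct


def classify_regime(score: float) -> str:
    for minimum, label in REGIME_CUTOFFS:
        if score >= minimum:
            return label
    return "STRONG_BEAR"


def market_regime(changes_by_timeframe: dict[str, list[float]], fear_greed: float,
                  market_cap_change_24h_pct: float) -> dict:
    """Weighted blend of the three dimensions, returned as one regime row."""
    components = {
        "breadth": breadth_score(changes_by_timeframe),
        "sentiment": sentiment_score(fear_greed),
        "global": global_score(market_cap_change_24h_pct),
    }
    score = sum(WEIGHTS[name] * value for name, value in components.items())
    regime = classify_regime(score)
    narrative = (
        f"{regime}: breadth {components['breadth']:+.0f}, "
        f"sentiment {components['sentiment']:+.0f}, global {components['global']:+.0f} "
        f"-> composite {score:+.1f}"
    )
    return {"regime": regime, "regime_score": round(score, 1), "regime_narrative": narrative}
```

Because breadth carries 60% of the weight, a market where most of the top 50 coins are rising can still read as bullish even when sentiment is lukewarm.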
### Composite Momentum Signals

A proprietary indicator combining five factors for each coin:
- Short-term price action (24h change)
- Medium-term trend (7d change)
- Long-term trend (30d change)
- Volume confirmation (volume/market cap ratio)
- ATH proximity (distance from all-time high)
Together, the factors produce a score ranging roughly from -50 to +100, which is then combined with market sentiment to produce actionable signals: `STRONG_BUY`, `BUY`, `NEUTRAL`, `SELL`, `STRONG_SELL`.
The contrarian twist: a high momentum score combined with extreme fear produces a `STRONG_BUY`, the classic "buy when there's blood in the streets" signal.
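A sketch of the composite score and its contrarian signal logic. The five factors and the approximate -50 to +100 range come from this README; the point values, the 10% volume/market-cap threshold, the 20% ATH-proximity window, the extreme-fear bound of 24, and the signal cutoffs are all hypothetical.

```python
# Hypothetical point values: they only reproduce the README's rough
# -50..+100 range, not the pipeline's actual weights.
# Ordered (24h, 7d, 30d); each entry is (points if up, penalty if down).
POINTS = {"24h": (20, 10), "7d": (25, 20), "30d": (25, 20)}
VOLUME_POINTS, VOLUME_MIN_RATIO = 15, 0.10
ATH_POINTS, ATH_MAX_DRAWDOWN_PCT = 15, 20.0
EXTREME_FEAR_MAX = 24  # assumed upper bound of the "extreme fear" zone


def momentum_score(change_24h: float, change_7d: float, change_30d: float,
                   volume_to_mcap: float, ath_change_pct: float) -> int:
    """Sum five factor scores; ath_change_pct is <= 0 (0 means at the all-time high)."""
    score = 0
    for change, (up, down) in zip((change_24h, change_7d, change_30d), POINTS.values()):
        if change > 0:
            score += up
        elif change < 0:
            score -= down
    if volume_to_mcap >= VOLUME_MIN_RATIO:  # volume confirms the move
        score += VOLUME_POINTS
    if ath_change_pct >= -ATH_MAX_DRAWDOWN_PCT:  # trading close to its ATH
        score += ATH_POINTS
    return score


def momentum_signal(score: int, fear_greed: int) -> str:
    """Combine momentum with market sentiment; the cutoffs are illustrative."""
    if score >= 60 and fear_greed <= EXTREME_FEAR_MAX:
        return "STRONG_BUY"  # contrarian: strong coin in a fearful market
    if score >= 40:
        return "BUY"
    if score <= -30:
        return "STRONG_SELL"
    if score < 0:
        return "SELL"
    return "NEUTRAL"
```

The same high score yields `STRONG_BUY` only when sentiment sits in extreme fear; in a calmer market it downgrades to `BUY`.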
## Bruin Features Used

| Feature | How It's Used |
|---|---|
| Python Assets | 5 ingestion scripts fetching from the CoinGecko and Alternative.me APIs and a CSV seed |
| SQL Assets | 9 BigQuery SQL transformations across staging (3) and analytics (6) layers |
| Seed Assets | CSV-based reference data for coin categories |
| Materialization | `table` strategy for all assets; `merge` for incremental ingestion |
| Dependencies | Explicit `depends` declarations creating a proper DAG |
| Quality Checks | Built-in (`not_null`, `unique`, `positive`, `accepted_values`) on every asset |
| Custom Checks | Business logic validations (e.g., "Bitcoin must exist in data", "dominances sum to ~100%") |
| Glossary | Structured business term definitions for crypto concepts |
| Pipeline Schedule | Daily schedule via `pipeline.yml` |
| Bruin Cloud | Deployment, monitoring, and AI analyst |
| AI Data Analyst | Conversational analysis on all analytics tables |
| Lineage | Full column-level lineage via `bruin lineage` |
## Quick Start

Prerequisites:

- Bruin CLI installed
- Python 3.10+ with `pandas` and `requests`
- A Google Cloud project with BigQuery enabled
- A GCP service account with the `BigQuery Data Editor` and `BigQuery Job User` roles
- (Optional) VS Code Bruin Extension
```bash
# 1. Clone the repository
git clone https://github.com/oussou-dev/cryptoflow-analytics.git
cd cryptoflow-analytics

# 2. Set up your GCP credentials
cp gcp-key.json.example gcp-key.json  # then add your service account key to gcp-key.json

# 3. Create the BigQuery datasets
bq mk --dataset --location=US your-project:raw
bq mk --dataset --location=US your-project:stg
bq mk --dataset --location=US your-project:analytics

# 4. Execute the full pipeline
bruin run .
```

All data is stored in BigQuery under the `raw`, `stg`, and `analytics` datasets.

```bash
# Check pipeline lineage
bruin lineage .

# Query via the BigQuery console or the bq CLI
bq query --use_legacy_sql=false \
  'SELECT regime, regime_score, regime_narrative FROM `your-project.analytics.market_regime`'

bq query --use_legacy_sql=false \
  'SELECT name, signal, momentum_score FROM `your-project.analytics.momentum_signals` ORDER BY momentum_score DESC LIMIT 10'
```

## AI Analyst Insights

Deployed on Bruin Cloud, the AI Data Analyst answers natural-language questions about the entire pipeline output, with no SQL required. Example questions:

- "What is the current market regime? Show me the regime classification, score, and narrative."
- "Show me the top 10 coins by momentum score with their BUY/SELL signal and confidence level."
- "Show me the market dominance breakdown by price tier. What % does mega-cap control?"
- "What are the top 5 biggest gainers and losers over the past 7 days?"
- "How has market sentiment evolved recently? Show the distribution across sentiment zones."
## Data Quality

Every asset in the pipeline includes quality checks that run automatically after each execution. Failed checks block downstream assets, ensuring data integrity throughout.

Built-in column checks:

- `not_null`: no NULL values in critical columns
- `unique`: no duplicate records where uniqueness is expected
- `positive`: prices, volumes, and market caps are positive
- `accepted_values`: enum-like columns contain only valid values

Custom checks:
- At least 50 coins ingested per run
- Bitcoin always present in the dataset
- No negative market capitalizations
- Fear & Greed values within 0-100 range
- Market dominance percentages sum to approximately 100%
- All top 50 coins have momentum signals
- Exactly one market regime row per run
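In the pipeline these rules live in Bruin `custom_checks` written in SQL. The Python function below restates the same seven rules against in-memory data, purely to make each invariant explicit; the function name, argument shapes, and the ±1 percentage-point tolerance on the dominance sum are assumptions.

```python
def run_custom_checks(coins: list[dict], fear_greed_values: list[int],
                      dominance_pcts: list[float], signals: dict[str, str],
                      regime_row_count: int, dominance_tolerance: float = 1.0) -> dict[str, bool]:
    """Evaluate the README's business rules; returns check name -> passed (hypothetical helper)."""
    # The "top 50" are taken by market cap, largest first.
    top_50 = sorted(coins, key=lambda c: c["market_cap"], reverse=True)[:50]
    return {
        "at_least_50_coins": len(coins) >= 50,
        "bitcoin_present": any(c["id"] == "bitcoin" for c in coins),
        "no_negative_market_caps": all(c["market_cap"] >= 0 for c in coins),
        "fear_greed_in_range": all(0 <= v <= 100 for v in fear_greed_values),
        "dominance_sums_to_100": abs(sum(dominance_pcts) - 100.0) <= dominance_tolerance,
        "top_50_have_signals": all(c["id"] in signals for c in top_50),
        "one_regime_row": regime_row_count == 1,
    }
```

Any `False` entry corresponds to a failed check, which in Bruin would stop the downstream assets from running.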
## Project Structure

```
cryptoflow-analytics/
├── .bruin.yml                        # Project config + BigQuery connection
├── pipeline.yml                      # Daily schedule + start_date
├── glossary.yml                      # Bruin glossary with crypto terms
├── README.md
├── LICENSE
├── .gitignore
│
├── assets/
│   ├── ingestion/                    # 🥉 BRONZE: raw data
│   │   ├── fetch_coin_markets.py     # Top 100 coins (CoinGecko)
│   │   ├── fetch_fear_greed.py       # 90d sentiment index
│   │   ├── fetch_global_data.py      # Global market metrics
│   │   ├── fetch_trending.py         # Trending coins
│   │   └── fetch_coin_categories.py  # Seed loader: reads seeds/coin_categories.csv
│   │
│   ├── staging/                      # 🥈 SILVER: cleaned & enriched
│   │   ├── stg_enriched_coins.sql    # Tiers, ratios, spreads
│   │   ├── stg_fear_greed_daily.sql  # Moving averages, trends
│   │   └── stg_global_metrics.sql    # Dominance, volume ratios
│   │
│   └── analytics/                    # 🥇 GOLD: business intelligence
│       ├── market_dominance.sql      # Market share analysis
│       ├── volatility_analysis.sql   # Volatility scoring
│       ├── momentum_signals.sql      # Buy/Sell signals
│       ├── fear_greed_impact.sql     # Sentiment zone analysis
│       ├── top_performers.sql        # Winners & losers
│       └── market_regime.sql         # Bull/Bear classifier
│
├── seeds/
│   └── coin_categories.csv           # DeFi, L1, L2, Meme categories (35 coins)
│
└── docs/
    └── ai_analyst_screenshots/       # Bruin AI analyst evidence
```
## Design Decisions

### Why Bruin?

| Concern | Traditional Stack | Bruin |
|---|---|---|
| Ingestion | Airbyte / custom scripts + separate orchestration | Python assets with built-in materialization |
| Transformation | dbt (separate project, profiles.yml, dbt_project.yml) | SQL assets in the same project |
| Orchestration | Airflow DAGs (Python boilerplate, scheduler infra) | pipeline.yml with schedule + depends |
| Quality | Great Expectations (separate YAML suites, checkpoint configs) | Inline columns.checks + custom_checks |
| Setup time | Hours to days (Docker, Airflow webserver, dbt profiles...) | 3 commands, < 5 minutes |
| Files to manage | 10+ config files across tools | 2 config files (.bruin.yml + pipeline.yml) |
The biggest win: quality checks are embedded in the asset definition, not in a separate tool. This means every change to a transformation automatically includes its quality contract. There's no "forgetting to update the test file."
### Why BigQuery?

- Serverless and fully managed: zero infrastructure to maintain
- Scales from kilobytes to petabytes with the same SQL
- Native integration with Bruin Cloud for seamless deployment
- Production-grade analytics with familiar SQL syntax
- The free tier comfortably covers hackathon-scale data volumes

### Why CoinGecko?

- 30 calls/minute and 10,000 calls/month: more than enough for a daily batch
- No credit card required
- Richest free-tier data: prices, volumes, market cap, ATH, supply metrics
- Well-documented, stable endpoints
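The claim that the free tier is ample for a daily batch can be checked with back-of-the-envelope arithmetic. This sketch assumes one `/coins/markets` page of 100 coins plus one call each to `/global` and `/search/trending` per run; the Fear & Greed data comes from Alternative.me and the categories from a CSV seed, so neither counts against CoinGecko. Under those assumptions a daily run uses 3 calls, or 93 in a 31-day month. The function names are hypothetical.

```python
import math

# CoinGecko free-tier limits quoted in this README.
CALLS_PER_MINUTE = 30
CALLS_PER_MONTH = 10_000


def coingecko_calls_per_run(n_coins: int = 100, per_page: int = 100) -> int:
    """/coins/markets pages, plus one call each to /global and /search/trending."""
    return math.ceil(n_coins / per_page) + 2


def fits_free_tier(runs_per_day: int = 1, days_per_month: int = 31, **run_kwargs) -> bool:
    """True if the schedule stays within both the per-minute and per-month limits."""
    per_run = coingecko_calls_per_run(**run_kwargs)
    monthly = per_run * runs_per_day * days_per_month
    # Worst case: all of a run's calls land within the same minute.
    return per_run <= CALLS_PER_MINUTE and monthly <= CALLS_PER_MONTH
```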
## What I Learned

- Bruin's single-file asset model is powerful: having the SQL query, its dependencies, materialization strategy, column metadata, and quality checks all in one file eliminates an entire category of "where is the config for this?" problems.
- Quality checks as first-class citizens change behavior: when checks are blocking by default, you think about data correctness while writing the transformation, not as an afterthought.
- The glossary feature is underrated: defining business terms in `glossary.yml` forced me to think precisely about what "market dominance" or "momentum score" means before writing SQL. That clarity propagated into better queries.
- Python + SQL in the same pipeline is the right abstraction: API ingestion naturally belongs in Python, and analytical transformations naturally belong in SQL. Bruin lets both coexist without fighting about which language "wins."
License: MIT. See `LICENSE` for details.
Built for the Data Engineering Zoomcamp 2026 Project Competition, sponsored by Bruin.




