# K2 Platform - Binance Crypto Demo

**Focus**: Cryptocurrency market data streaming and analytics

---

## What This Demo Shows

This notebook demonstrates a **production-grade cryptocurrency market data platform**:

- **Clear Positioning** - L3 cold path reference data platform (not HFT)
- **Live Streaming** - Binance WebSocket → Kafka → Iceberg
- **Production Patterns** - Circuit breaker, degradation, deduplication
- **Hybrid Queries** - Seamless Kafka + Iceberg merge (last 15 minutes)
- **Observability** - 83 Prometheus metrics, Grafana dashboards
- **Scalable** - Same architecture scales 1000x

---

## Sections

1. **Architecture Context** - Platform positioning and key metrics
2. **Ingestion** - Live Binance streaming with resilience patterns
3. **Storage** - Iceberg lakehouse with ACID and time-travel
4. **Monitoring** - Observability and graceful degradation
5. **Query** - Hybrid queries (Kafka + Iceberg)
6. **Scaling** - Cost model and scaling path

---

## Prerequisites

- Docker services running: `docker compose up -d`
- Infrastructure initialized: `python scripts/init_e2e_demo.py`
- Binance stream active: `docker logs k2-binance-stream` (should show trade messages)

## Setup & Imports

In [None]:
# Standard library
import sys
import subprocess
import re
from pathlib import Path

# Data processing
import pandas as pd
import requests

# Add src to path
sys.path.insert(0, str(Path.cwd().parent.parent / 'src'))

# Pandas display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 50)

print("Imports loaded")

---

# Section 1: Architecture Context

## Platform Positioning

K2 is a **Reference Data Platform for the L3 Cold Path** - optimized for analytics, compliance, and backtesting, not real-time execution.

### Market Data Latency Tiers

| Tier | Latency | Use Case | K2 Position |
|------|---------|----------|-------------|
| **L1 Hot Path** | <10μs | HFT execution, order routing | Not K2 |
| **L2 Warm Path** | <10ms | Real-time risk, positions | Not K2 |
| **L3 Cold Path** | <500ms | Analytics, compliance, backtesting | **K2 Platform** |

### What K2 IS

- High-throughput ingestion (10K-50K msg/sec crypto, scalable to 1M+)
- ACID-compliant lakehouse storage (Apache Iceberg)
- Sub-second analytical queries on historical data
- Compliance and audit trail (time-travel queries)
- Cost-effective ($0.85 per million messages at scale)

### What K2 is NOT

- Ultra-low-latency execution infrastructure (<10μs)
- Real-time position/risk management (<10ms)
- Order routing or market making systems

In [None]:
# Show key platform metrics
print("K2 Platform - Key Metrics\n")
print(f"{'Metric':<30} {'Current (Demo)':<25} {'Production Target':<30}")
print("-" * 85)
print(f"{'Ingestion Throughput':<30} {'138 msg/sec':<25} {'1M msg/sec (distributed)':<30}")
print(f"{'Query Latency (p99)':<30} {'<500ms':<25} {'<500ms':<30}")
print(f"{'Storage Backend':<30} {'Iceberg + MinIO':<25} {'Iceberg + S3':<30}")
print(f"{'Data Sources':<30} {'Binance WebSocket':<25} {'Multi-exchange':<30}")
print(f"{'Crypto Pairs':<30} {'BTC, ETH, BNB, ADA, DOGE':<25} {'100+ pairs':<30}")
print(f"{'Test Coverage':<30} {'95%+':<25} {'95%+':<30}")

print("\nPlatform positioned for L3 cold path analytics and compliance")

---

# Section 2: Ingestion (2 min)

## Live Binance Streaming

Real-time cryptocurrency trade data streaming from Binance spot market:

- **Connection**: `wss://stream.binance.com:9443`
- **Symbols**: BTCUSDT, ETHUSDT, BNBUSDT, ADAUSDT, DOGEUSDT
- **Message Rate**: 10K-50K msg/sec peak (top 10 pairs)
- **Schema**: V2 hybrid (core fields + vendor_data for flexibility)
- **Serialization**: Avro with Schema Registry
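
As a rough illustration of the connection and schema bullets above, the sketch below builds a combined-stream URL and maps one raw trade event onto a record with core fields plus `vendor_data`. The `/stream?streams=` path and the single-letter payload keys follow Binance's public WebSocket documentation; the output field names (`message_id`, `timestamp_ms`, `vendor_data`) and the ID format are assumptions for illustration, not K2's actual V2 schema.

```python
import json
from decimal import Decimal

BASE_URL = "wss://stream.binance.com:9443"
SYMBOLS = ["BTCUSDT", "ETHUSDT", "BNBUSDT", "ADAUSDT", "DOGEUSDT"]


def combined_stream_url(symbols):
    """Build one URL that multiplexes the trade stream of every symbol."""
    streams = "/".join(f"{s.lower()}@trade" for s in symbols)
    return f"{BASE_URL}/stream?streams={streams}"


def to_trade_record(raw_event):
    """Map a raw Binance trade event onto hypothetical core fields.

    Anything not promoted to a core field is kept under vendor_data.
    """
    core_keys = {"s", "t", "p", "q", "T"}
    return {
        "message_id": f"binance-{raw_event['s']}-{raw_event['t']}",  # assumed format
        "exchange": "binance",
        "symbol": raw_event["s"],
        "price": Decimal(raw_event["p"]),   # Binance sends decimals as strings
        "quantity": Decimal(raw_event["q"]),
        "timestamp_ms": raw_event["T"],     # trade time, epoch milliseconds
        "vendor_data": {k: v for k, v in raw_event.items() if k not in core_keys},
    }


sample = json.loads(
    '{"e": "trade", "E": 1700000000100, "s": "BTCUSDT", "t": 12345, '
    '"p": "42000.10", "q": "0.005", "T": 1700000000000, "m": true}'
)
record = to_trade_record(sample)
print(combined_stream_url(SYMBOLS))
print(record["symbol"], record["price"], record["vendor_data"])
```

Parsing prices with `Decimal` rather than `float` avoids rounding drift when the values are later aggregated.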

## Production Patterns

### 1. Resilience
- ✅ Circuit breaker integration (wraps all external calls)
- ✅ Exponential backoff on connection failures
- ✅ Dead Letter Queue (DLQ) with 3 retry attempts
- ✅ Zero data loss on transient failures
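
The retry and dead-letter behaviour listed above can be sketched as follows. This is a minimal illustration, not K2's implementation: it assumes "3 retry attempts" means three retries after the first try, uses an in-memory list as a stand-in for the DLQ, and the function name, delays, and injectable `sleep` parameter are all hypothetical.

```python
import random
import time


def process_with_retry(message, handler, dead_letters, max_retries=3,
                       base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call handler(message); retry with exponential backoff, then dead-letter.

    Returns True on success, False if the message ended up in dead_letters.
    """
    for attempt in range(max_retries + 1):  # one initial try plus max_retries
        try:
            handler(message)
            return True
        except Exception as exc:  # a real consumer would catch narrower errors
            if attempt == max_retries:
                dead_letters.append({"message": message, "error": repr(exc)})
                return False
            # Delay doubles per attempt, is capped, and gets full jitter
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


# Usage: a handler that fails twice, then succeeds on the third call
calls = {"count": 0}


def flaky_handler(msg):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")


dlq = []
ok = process_with_retry({"id": 1}, flaky_handler, dlq, sleep=lambda s: None)
print(ok, calls["count"], dlq)
```

Randomizing the delay (jitter) keeps many clients from reconnecting in lockstep after a shared outage.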

### 2. Data Quality
- ✅ Sequence gap detection (per-symbol validation)
- ✅ Deduplication (1-hour sliding window, in-memory)
- ✅ Schema validation (Avro enforces structure)
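
A minimal sketch of the first two checks, assuming each symbol carries a monotonically increasing sequence number and each message has a unique ID. The class names are hypothetical, and the deduplicator expects `now` values that never decrease.

```python
from collections import OrderedDict


class SequenceTracker:
    """Track the last sequence number seen per symbol and report gaps."""

    def __init__(self):
        self._last = {}

    def check(self, symbol, seq):
        """Return the count of missing sequence numbers before seq (0 if none)."""
        last = self._last.get(symbol)
        if last is not None and seq <= last:
            return 0  # late or repeated message; do not move the marker back
        self._last[symbol] = seq
        return 0 if last is None else seq - last - 1


class SlidingWindowDeduplicator:
    """Remember message IDs for window_seconds; report repeats as duplicates."""

    def __init__(self, window_seconds=3600):
        self.window_seconds = window_seconds
        self._seen = OrderedDict()  # message_id -> first-seen time, oldest first

    def is_duplicate(self, message_id, now):
        # Evict IDs that have aged out of the window
        while self._seen:
            oldest_id, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self.window_seconds:
                break
            self._seen.popitem(last=False)
        if message_id in self._seen:
            return True
        self._seen[message_id] = now
        return False


tracker = SequenceTracker()
gaps = [tracker.check("BTCUSDT", s) for s in (100, 101, 105)]

dedup = SlidingWindowDeduplicator(window_seconds=3600)
dupes = [dedup.is_duplicate("m1", now=t) for t in (0, 10, 3700)]
print(gaps, dupes)
```

Because the window lives in memory, a process restart clears it, so duplicates that straddle a restart would need to be caught downstream.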

### 3. Observability
- ✅ 7 Binance-specific Prometheus metrics
- ✅ Real-time connection health monitoring
- ✅ Structured logging with correlation IDs

In [None]:
# Check Binance stream is running
print("Checking Binance WebSocket stream...\n")

# Get last 50 log lines
result = subprocess.run(
    ['docker', 'logs', 'k2-binance-stream', '--tail', '50'],
    capture_output=True,
    text=True
)

if result.returncode == 0:
    # Strip ANSI color codes from logs. docker logs replays the container's
    # stderr on stderr, so scan both streams.
    ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')
    clean_logs = ansi_escape.sub('', result.stdout + result.stderr)
    
    # Parse streaming metrics
    trade_matches = re.findall(r'trades_streamed=(\d+)', clean_logs)
    streamed_matches = re.findall(r'Streamed (\d+) trades', clean_logs)
    
    # Get the highest count (most recent)
    total_trades = 0
    if trade_matches:
        total_trades = max(int(m) for m in trade_matches)
    elif streamed_matches:
        total_trades = max(int(m) for m in streamed_matches)

    if total_trades > 0:
        print("Binance stream is ACTIVE")
        print(f"  {total_trades:,} total trades streamed")
        
        # Show sample log line with symbol
        symbol_matches = re.findall(r'symbol=(\w+)', clean_logs)
        if symbol_matches:
            last_symbol = symbol_matches[-1]
            print(f"  Last symbol: {last_symbol}")
    else:
        print("Stream running but no recent trades in logs")
        print("  This is normal if markets are slow")
else:
    print("Could not check Binance stream")
    print("  Run: docker compose up -d")

In [None]:
# Show Kafka topic statistics
from confluent_kafka.admin import AdminClient

admin = AdminClient({'bootstrap.servers': 'localhost:9092'})
metadata = admin.list_topics(timeout=10)

# Find trades topic
trades_topic = 'market-data.trades.v2'
if trades_topic in metadata.topics:
    topic_meta = metadata.topics[trades_topic]

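    print(f"Kafka Topic: {trades_topic}\n")
    print(f"  Partitions: {len(topic_meta.partitions)}")
    # Replication factor is read from the first partition's replica list;
    # serialization and retention are expected settings, not queried here
    replication = len(next(iter(topic_meta.partitions.values())).replicas)
    print(f"  Replication Factor: {replication}")
    print("  Serialization: Avro (with Schema Registry)")
    print("  Retention: 7 days (configured)")

    print("\nKafka topic exists and its metadata is readable")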
else:
    print(f"Topic {trades_topic} not found")
    print("  Run: python scripts/init_e2e_demo.py")

---

# Section 3: Storage (2 min)

## Apache Iceberg Lakehouse

K2 uses Apache Iceberg for production-grade storage with:

### ACID Transactions
- ✅ All-or-nothing writes (no partial data)
- ✅ Transaction logging with snapshot IDs
- ✅ Concurrent readers don't block writers (MVCC)

### Time-Travel Queries
- ✅ Query data as-of any historical snapshot
- ✅ Compliance audits without ETL copies
- ✅ Snapshot isolation for consistent reads
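
Conceptually, an as-of query resolves a timestamp to the latest snapshot committed at or before it and then reads only the files that snapshot references. The sketch below shows that lookup over plain dictionaries; the `snapshot_id` and `committed_at` keys mirror the snapshot listing later in this notebook, while the function name and sample values are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def snapshot_as_of(snapshots, as_of):
    """Return the latest snapshot committed at or before as_of, else None.

    snapshots must be sorted by committed_at ascending.
    """
    commit_times = [s["committed_at"] for s in snapshots]
    idx = bisect_right(commit_times, as_of)
    return snapshots[idx - 1] if idx else None


snapshots = [
    {"snapshot_id": 101, "committed_at": datetime(2026, 1, 13, 9, 0, tzinfo=timezone.utc)},
    {"snapshot_id": 102, "committed_at": datetime(2026, 1, 13, 9, 30, tzinfo=timezone.utc)},
    {"snapshot_id": 103, "committed_at": datetime(2026, 1, 13, 10, 15, tzinfo=timezone.utc)},
]
chosen = snapshot_as_of(snapshots, datetime(2026, 1, 13, 10, 0, tzinfo=timezone.utc))
print(chosen["snapshot_id"])  # 102
```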

### Schema Evolution
- ✅ Add columns without rewriting data
- ✅ V1 → V2 migration completed seamlessly
- ✅ Forward and backward compatibility

### Performance
- ✅ Parquet columnar storage (10:1 compression)
- ✅ Hidden partitioning (by date + symbol hash)
- ✅ Partition pruning (scan GBs instead of TBs)
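
To make partition pruning concrete, here is a small sketch that derives a `(date, bucket)` partition key from row values and uses it to skip irrelevant files. It assumes 16 symbol buckets and a hypothetical file layout. Iceberg's real bucket transform uses a 32-bit Murmur3 hash; `zlib.crc32` is only a stand-in here to keep the example dependency-free.

```python
import zlib
from datetime import date

NUM_BUCKETS = 16  # assumed bucket count


def partition_key(exchange_date, symbol):
    """Derive (date, bucket) from row values; crc32 stands in for Iceberg's hash."""
    bucket = zlib.crc32(symbol.encode("utf-8")) % NUM_BUCKETS
    return (exchange_date, bucket)


def prune(partitions, exchange_date, symbol):
    """Keep only the data files whose partition can contain the requested rows."""
    wanted = partition_key(exchange_date, symbol)
    return [path for key, path in partitions.items() if key == wanted]


# Hypothetical layout: one file per (date, bucket) partition over two days
partitions = {
    (d, b): f"data/exchange_date={d}/symbol_bucket={b}/part-0.parquet"
    for d in (date(2026, 1, 12), date(2026, 1, 13))
    for b in range(NUM_BUCKETS)
}
files = prune(partitions, date(2026, 1, 13), "BTCUSDT")
print(f"Scanning {len(files)} of {len(partitions)} files")
```

With "hidden" partitioning the query only filters on `exchange_date` and `symbol`; the engine derives the partition key itself, so callers never reference bucket numbers.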

## Storage Architecture

```
Catalog (PostgreSQL)
    ↓
Metadata Layer (Iceberg tables)
    ↓
Data Files (Parquet on MinIO/S3)
```

In [None]:
# Check Iceberg table
import time

from k2.query.engine import QueryEngine

engine = QueryEngine()

print("Querying Iceberg table: trades_v2\n")

try:
    # Get table stats
    stats = engine.get_stats()

    print("Iceberg Table: trades_v2\n")
    print(f"  Total Rows: {stats.get('trades_count', 0):,}")
    print("  Storage Format: Parquet (columnar)")
    print("  Partitioning: By exchange_date + symbol hash (16 buckets)")
    print("  Catalog: PostgreSQL (ACID metadata)")
    print("  Object Store: MinIO (S3-compatible)")
    print("  Compression: ~10:1 (Parquet Snappy)")

    print("\nIceberg lakehouse operational")

    # Query recent trades, timing the call locally instead of reading the
    # duration from the table stats fetched before the query ran
    print("\nSample query: Last 5 BTCUSDT trades\n")

    query_start = time.perf_counter()
    trades = engine.query_trades(
        symbol='BTCUSDT',
        exchange='binance',
        limit=5
    )
    query_ms = (time.perf_counter() - query_start) * 1000

    if trades:
        df = pd.DataFrame(trades)
        print(df[['symbol', 'timestamp', 'price', 'quantity']].to_string(index=False))
        print(f"\nQuery returned {len(trades)} trades in {query_ms:.0f}ms")
    else:
        print("No trades found (stream may need more time)")

except Exception as e:
    print(f"Error querying Iceberg: {e}")

In [None]:
# Show Iceberg snapshots (time-travel capability)
print("\nTime-Travel: Iceberg Snapshots\n")

try:
    snapshots = engine.get_snapshots(table_name='trades_v2')

    if snapshots:
        print("Recent Snapshots (Time-Travel Points)\n")
        print(f"{'Snapshot ID':<20} {'Timestamp':<20} {'Operation':<15}")
        print("-" * 55)

        # Show last 5 snapshots
        for snap in snapshots[-5:]:
            snap_id = str(snap['snapshot_id'])[:12] + '...'
            timestamp = snap['committed_at'].strftime('%Y-%m-%d %H:%M:%S')
            operation = snap.get('operation', 'append')
            print(f"{snap_id:<20} {timestamp:<20} {operation:<15}")

        print(f"\n{len(snapshots)} snapshots available for time-travel queries")
        print("  Example: SELECT * FROM trades_v2 FOR SYSTEM_TIME AS OF '2026-01-13 10:00:00'")
    else:
        print("No snapshots yet (table recently created)")

except Exception as e:
    print(f"Error listing snapshots: {e}")

---

# Section 4: Monitoring (2 min)

## Observability

K2 exposes comprehensive metrics through Prometheus:

### Metric Categories (83 metrics total)

**Ingestion**:
- `k2_kafka_messages_produced_total` - Messages published to Kafka
- `k2_sequence_gaps_detected_total` - Data quality tracking
- `k2_duplicate_messages_detected_total` - Deduplication stats

**Storage**:
- `k2_iceberg_rows_written_total` - Rows committed to Iceberg
- `k2_iceberg_transactions_total` - ACID transactions
- `k2_iceberg_write_duration_seconds` - Write latency

**Query**:
- `k2_query_executions_total` - Query count
- `k2_query_duration_seconds` - Query latency histogram
- `k2_hybrid_queries_total` - Hybrid query count (new!)

**System Health**:
- `k2_degradation_level` - 0=normal, 4=circuit break
- `k2_circuit_breaker_state` - Per-component state
- `k2_messages_shed_total` - Load shedding stats
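
Counters like these are usually exported as several labelled series, so a single value often has to be summed across series. The sketch below does that for the JSON shape returned by the Prometheus `/api/v1/query` endpoint. The helper name is hypothetical and the `topic` labels in the sample payload are invented for illustration.

```python
def sum_instant_vector(payload):
    """Sum the sample values of every series in an instant-query response.

    Returns None when the query succeeded but matched no series.
    """
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    series = payload["data"]["result"]
    if not series:
        return None
    # Each sample is [unix_timestamp, "value-as-string"]
    return sum(float(s["value"][1]) for s in series)


sample_payload = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "k2_kafka_messages_produced_total", "topic": "trades"},
             "value": [1700000000.0, "1200"]},
            {"metric": {"__name__": "k2_kafka_messages_produced_total", "topic": "quotes"},
             "value": [1700000000.0, "300"]},
        ],
    },
}
print(sum_instant_vector(sample_payload))  # 1500.0
```

The same aggregation can be pushed into the query itself with PromQL's `sum(...)`, which is usually preferable when only the total is needed.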

## Grafana Dashboards

- **URL**: http://localhost:3000 (admin/admin)
- **15 panels** across 5 rows (health, ingestion, storage, query, system)
- **Real-time** visualization of platform health

## Graceful Degradation (5-Level Cascade)

| Level | Name | Triggers | Actions |
|-------|------|----------|----------|
| 0 | NORMAL | Lag <100K | All features enabled |
| 1 | SOFT | Lag 100K-500K | Skip LOW priority data |
| 2 | GRACEFUL | Lag 500K-1M | Drop Tier 3 symbols |
| 3 | AGGRESSIVE | Lag 1M-5M | Only Tier 1 symbols |
| 4 | CIRCUIT_BREAK | Lag >5M | Stop processing |

**Recovery**: Automatic with hysteresis (50% threshold, 30s cooldown)
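
One way to read the cascade and recovery rule is sketched below. The lag thresholds come from the table; the interpretation of the recovery rule is an assumption: the level steps down one at a time, only once lag falls below 50% of the threshold that triggered the current level and at least 30 seconds have passed since the last change. The class name is hypothetical and this is not the code in `degradation_manager.py`.

```python
from bisect import bisect_right

# Lag thresholds at which levels 1-4 are entered (from the table above)
ENTER_THRESHOLDS = [100_000, 500_000, 1_000_000, 5_000_000]
LEVEL_NAMES = ["NORMAL", "SOFT", "GRACEFUL", "AGGRESSIVE", "CIRCUIT_BREAK"]


class DegradationTracker:
    """Escalate immediately; recover one level at a time with hysteresis."""

    def __init__(self, recovery_factor=0.5, cooldown_seconds=30.0):
        self.recovery_factor = recovery_factor
        self.cooldown_seconds = cooldown_seconds
        self.level = 0
        self._last_change = float("-inf")

    def update(self, lag, now):
        target = bisect_right(ENTER_THRESHOLDS, lag)  # level implied by lag alone
        if target > self.level:
            self.level, self._last_change = target, now
        elif target < self.level:
            # Step down only if lag is well below the threshold that triggered
            # the current level and the cooldown has elapsed
            entry = ENTER_THRESHOLDS[self.level - 1]
            cooled = now - self._last_change >= self.cooldown_seconds
            if lag < entry * self.recovery_factor and cooled:
                self.level, self._last_change = self.level - 1, now
        return LEVEL_NAMES[self.level]


tracker = DegradationTracker()
readings = [(0, 50_000), (10, 600_000), (20, 400_000),
            (30, 200_000), (45, 200_000), (80, 40_000)]
history = [tracker.update(lag, now) for now, lag in readings]
print(history)
```

In this trace the tracker stays at GRACEFUL while lag sits at 400K (above the 250K recovery point) and again at 200K until the cooldown expires, which is the flapping protection hysteresis is meant to provide.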

In [None]:
# Query Prometheus metrics
print("Querying Prometheus metrics...\n")

try:
    # Check Prometheus health
    response = requests.get('http://localhost:9090/-/healthy', timeout=5)

    if response.status_code == 200:
        print("Prometheus is healthy\n")

        # Query key metrics
        metrics_to_query = [
            ('k2_kafka_messages_produced_total', 'Total messages produced'),
            ('k2_iceberg_rows_written_total', 'Total rows written to Iceberg'),
            ('k2_degradation_level', 'Current degradation level'),
        ]

        print("Key Metrics (Current Values)\n")
        print(f"{'Metric':<40} {'Value':<20}")
        print("-" * 60)

        for metric, description in metrics_to_query:
            try:
                query_response = requests.get(
                    'http://localhost:9090/api/v1/query',
                    params={'query': metric},
                    timeout=5
                )

                if query_response.status_code == 200:
                    data = query_response.json()
                    results = data.get('data', {}).get('result', [])

                    if results:
                        value = results[0]['value'][1]
                        print(f"{description:<40} {float(value):,.0f}")
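                    else:
                        print(f"{description:<40} No data yet")
                else:
                    print(f"{description:<40} HTTP {query_response.status_code}")
            except (requests.RequestException, ValueError, KeyError, IndexError):
                print(f"{description:<40} Query failed")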

        print("\nView all metrics: http://localhost:9090")
        print("Grafana dashboards: http://localhost:3000")
    else:
        print("Prometheus not responding")

except requests.exceptions.ConnectionError:
    print("Cannot connect to Prometheus")
    print("  Run: docker compose up -d")
except Exception as e:
    print(f"Prometheus check failed: {e}")

## Resilience Demonstration (Interactive)

This section demonstrates how the platform degrades gracefully under load instead of failing outright.

**Scenario**: What happens when the system is overloaded?

The degradation manager responds to system stress automatically:
- **Monitors**: Consumer lag and heap usage in real time
- **Reacts**: Sheds load based on priority tiers
- **Recovers**: Returns to normal when conditions improve (with hysteresis)

The next cell simulates a high-lag scenario and prints the output of the simulation script.

In [None]:
# Resilience Demo: Circuit Breaker in Action
print("\nResilience Demonstration: Circuit Breaker\n")
print("Using scripts/simulate_failure.py to demonstrate failure scenarios\n")

def run_simulation(*args):
    """Run scripts/simulate_failure.py and print its output or its error."""
    result = subprocess.run(
        # sys.executable keeps the script on the same interpreter as the kernel
        [sys.executable, 'scripts/simulate_failure.py', *args],
        capture_output=True,
        text=True,
        cwd='../../..'  # expected to resolve to the repository root
    )
    if result.returncode == 0:
        for line in result.stdout.splitlines():
            if line.strip():
                print(line)
    else:
        print(f"simulate_failure.py exited with code {result.returncode}")
        print(result.stderr.strip())


# Show current system status
print("Current System Status:\n")
run_simulation('--status')

print("\nSimulating High Lag Scenario (600K messages):\n")
run_simulation('--scenario', 'high_lag')

print("\nKey Takeaways:")
print("  - Automatic degradation as lag crosses each threshold (600K lag -> level 2, GRACEFUL)")
print("  - Priority-based load shedding (drop low-value symbols)")
print("  - High-value data continues processing (BTC, ETH, critical symbols)")
print("  - Automatic recovery with hysteresis (prevents flapping)")
print("  - Production-grade resilience: graceful degradation, not cliff-edge failure\n")

print("This is what separates production systems from demos")
print("  See: src/k2/common/degradation_manager.py (304 lines, 34 tests)")

---

# Section 5: Query (2 min)

## Hybrid Query Engine (NEW!)

**The core lakehouse value proposition**: Unified queries spanning streaming (Kafka) + batch (Iceberg).

### Problem: Recent Data Gap

Traditional systems have a gap:
- Iceberg has data up to **T-2 minutes** (commit lag)
- Kafka has data from **T-15 minutes to now**
- User wants: "Give me last 15 minutes of BTCUSDT trades"

### Solution: Hybrid Queries

```python
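from datetime import datetime, timedelta, timezone

# Query: last 15 minutes
# Automatic routing (with a ~2 minute Iceberg commit lag):
#   - Iceberg: 15 to 2 min ago (committed)
#   - Kafka:   last 2 min (not yet committed)
#   - Merge + deduplicate by message_id

now = datetime.now(timezone.utc)

# hybrid_engine is an already-constructed hybrid query engine instance
trades = hybrid_engine.query_trades(
    symbol='BTCUSDT',
    exchange='binance',
    start_time=now - timedelta(minutes=15),
    end_time=now
)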
```

### Performance

- **Iceberg query**: 200-500ms (DuckDB + Parquet)
- **Kafka tail**: <50ms (in-memory buffer)
- **Total**: <500ms p99 for 15-minute window

### REST API

```bash
curl -H 'X-API-Key: k2-dev-api-key-2026' \
  'http://localhost:8000/v1/trades/recent?symbol=BTCUSDT&window_minutes=15'
```

Returns unified results from both sources automatically.
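
The routing and merge steps can be sketched as below, assuming Iceberg holds everything older than a two-minute commit lag and Kafka holds the newer tail. Function names and row shapes are hypothetical; only `message_id` as the deduplication key comes from the description above.

```python
from datetime import datetime, timedelta, timezone


def plan_hybrid_window(start, end, now, commit_lag=timedelta(minutes=2)):
    """Split [start, end] at the commit boundary into (iceberg, kafka) ranges.

    Either range is None when the window does not reach that source.
    """
    boundary = now - commit_lag
    iceberg = (start, min(end, boundary)) if start < boundary else None
    kafka = (max(start, boundary), end) if end > boundary else None
    return iceberg, kafka


def merge_trades(iceberg_rows, kafka_rows):
    """Merge both sources, keeping one row per message_id, sorted by timestamp."""
    by_id = {}
    for row in list(iceberg_rows) + list(kafka_rows):
        by_id.setdefault(row["message_id"], row)  # committed copy wins on overlap
    return sorted(by_id.values(), key=lambda r: r["timestamp"])


now = datetime(2026, 1, 13, 10, 0, tzinfo=timezone.utc)
iceberg_range, kafka_range = plan_hybrid_window(now - timedelta(minutes=15), now, now)

committed = [{"message_id": "a", "timestamp": 1}, {"message_id": "b", "timestamp": 2}]
tail = [{"message_id": "b", "timestamp": 2}, {"message_id": "c", "timestamp": 3}]
merged = merge_trades(committed, tail)
print(iceberg_range, kafka_range)
print([row["message_id"] for row in merged])
```

Deduplication matters because a trade near the boundary may already be committed to Iceberg while still present in the Kafka tail.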

In [None]:
# Demonstrate hybrid query
print("Hybrid Query Demo: Last 15 minutes of BTCUSDT\n")

try:
    # Use the hybrid query endpoint
    response = requests.get(
        'http://localhost:8000/v1/trades/recent',
        params={
            'symbol': 'BTCUSDT',
            'exchange': 'binance',
            'window_minutes': 15
        },
        headers={'X-API-Key': 'k2-dev-api-key-2026'},
        timeout=10
    )

    if response.status_code == 200:
        data = response.json()
        trades = data.get('data', [])
        meta = data.get('meta', {})
        query_info = meta.get('query', {})

        print("Hybrid query successful")
        print(f"  Returned {len(trades)} trades")
        print(f"  Window: {query_info.get('start_time', '')} to {query_info.get('end_time', '')}")

        if trades:
            df = pd.DataFrame(trades)
            print("\nSample Results:\n")
            print(df[['symbol', 'timestamp', 'price', 'quantity']].head(10).to_string(index=False))

            print("\nHow it works:")
            print("  1. Query Iceberg for committed data (15 to ~2 min ago)")
            print("  2. Query Kafka tail for uncommitted data (last ~2 min)")
            print("  3. Merge results and deduplicate by message_id")
            print("  4. Return unified result (<500ms)")
            print("\nUser gets seamless data regardless of source")
        else:
            print("\nNo trades in last 15 minutes")
            print("  Stream may need more time to accumulate data")
    else:
        print(f"API returned {response.status_code}")
        print(f"  {response.text}")

except requests.exceptions.ConnectionError:
    print("Cannot connect to API")
    print("  Run: uvicorn k2.api.main:app --reload")
except Exception as e:
    print(f"Hybrid query failed: {e}")

In [None]:
# Show REST API capabilities
print("K2 REST API Endpoints\n")
print(f"{'Endpoint':<35} {'Description':<45}")
print("-" * 80)
print(f"{'GET /v1/trades':<35} {'Query historical trades (Iceberg only)':<45}")
print(f"{'GET /v1/trades/recent':<35} {'Hybrid query (Kafka + Iceberg)':<45}")
print(f"{'GET /v1/quotes':<35} {'Query bid/ask quotes':<45}")
print(f"{'GET /v1/summary/{symbol}':<35} {'Daily OHLCV summary':<45}")
print(f"{'GET /v1/symbols':<35} {'List available symbols':<45}")
print(f"{'GET /v1/snapshots':<35} {'List Iceberg snapshots':<45}")
print(f"{'GET /health':<35} {'Health check (dependencies)':<45}")
print(f"{'GET /metrics':<35} {'Prometheus metrics':<45}")
print(f"{'GET /docs':<35} {'OpenAPI/Swagger docs':<45}")

print("\nOpenAPI docs: http://localhost:8000/docs")

---

# Section 6: Scaling & Cost Model (1 min)

## Scaling Path

**Same architecture scales 1000x:**

| Scale | Throughput | Deployment | Cost/Month |
|-------|------------|------------|------------|
| **Current (1x)** | 138 msg/sec | Docker Compose (laptop) | $0 |
| **Small (10x)** | 10K msg/sec | AWS - 3 Kafka brokers | ~$600 |
| **Medium (100x)** | 1M msg/sec | AWS - 20 Kafka brokers, Presto cluster | ~$22K |
| **Large (1000x)** | 10M msg/sec | AWS - 50 Kafka brokers, large Presto | ~$165K |

**Cost per message decreases with scale** (economies of scale):
- 10K msg/sec: $2.20 per million messages
- 1M msg/sec: **$0.85 per million messages**
- 10M msg/sec: **$0.63 per million messages**

## Cost Model: 1M msg/sec Scale (AWS us-east-1)

| Component | Resources | Monthly Cost |
|-----------|-----------|-------------|
| **Kafka (MSK)** | 20× m5.2xlarge | $7,200 |
| **Storage (S3)** | 26 TB/month ingestion | $6,000 |
| **Archive (Glacier)** | 5 PB deep archive | $500 |
| **Catalog (RDS)** | db.r5.2xlarge Multi-AZ | $1,200 |
| **Query (Presto)** | 10× r5.4xlarge nodes | $5,760 |
| **Data Transfer** | 10 TB cross-AZ egress | $900 |
| **Ops (CloudWatch)** | Logs + metrics + backups | $500 |
| **Total** | | **$22,060** |
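
The arithmetic behind the table can be checked with a few lines. The sketch sums the component costs and relates monthly cost, average message rate, and cost per million messages. It assumes a 30-day month; the source does not state what average utilization its per-million figures assume, so the helper names and that framing are assumptions.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month

monthly_costs = {
    "Kafka (MSK)": 7_200,
    "Storage (S3)": 6_000,
    "Archive (Glacier)": 500,
    "Catalog (RDS)": 1_200,
    "Query (Presto)": 5_760,
    "Data Transfer": 900,
    "Ops (CloudWatch)": 500,
}


def cost_per_million(monthly_cost, avg_msgs_per_sec):
    """Unit cost in dollars per million messages at a given average rate."""
    messages_per_month = avg_msgs_per_sec * SECONDS_PER_MONTH
    return monthly_cost / (messages_per_month / 1_000_000)


def implied_avg_rate(monthly_cost, dollars_per_million):
    """Average msg/sec at which monthly_cost yields the quoted unit cost."""
    messages_per_month = monthly_cost / dollars_per_million * 1_000_000
    return messages_per_month / SECONDS_PER_MONTH


total = sum(monthly_costs.values())
rate = implied_avg_rate(total, 0.85)
print(f"Total: ${total:,}")
print(f"Implied average rate at $0.85 per million: {rate:,.0f} msg/sec")
```

Under these assumptions the components sum to $22,060, and the $0.85 figure corresponds to an average of roughly 10K msg/sec over the month, about 1% of the 1M msg/sec headline rate. The unit cost would be lower still if the peak rate were sustained around the clock.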

## Cost Optimization

- ✅ S3 lifecycle: Standard → IA → Glacier (40% savings)
- ✅ Iceberg compaction: Reduce file count (faster queries)
- ✅ Partition pruning: Query only relevant data
- ✅ Reserved instances: 30-40% discount for compute

In [None]:
# Show scaling comparison
print("Scaling Comparison\n")
print(f"{'Scale':<15} {'Throughput':<15} {'Monthly Cost':<15} {'Cost per 1M msgs':<18}")
print("-" * 63)
print(f"{'Current (1x)':<15} {'138 msg/s':<15} {'$0':<15} {'$0 (dev)':<18}")
print(f"{'Small (10x)':<15} {'10K msg/s':<15} {'$600':<15} {'$2.20':<18}")
print(f"{'Medium (100x)':<15} {'1M msg/s':<15} {'$22,060':<15} {'$0.85':<18}")
print(f"{'Large (1000x)':<15} {'10M msg/s':<15} {'$165,600':<15} {'$0.63':<18}")

print("\nCost per message decreases as scale increases")
print("Same architecture works at all scales")

---

# Summary & Key Takeaways

## What We Demonstrated

### 1. Clear Positioning
- Reference data platform for L3 cold path (analytics/compliance)
- **Not HFT, not real-time risk** - honest about capabilities
- Target: <500ms queries for backtesting and analysis

### 2. Production Patterns
- Circuit breaker integration (all external calls)
- 5-level graceful degradation cascade
- Sequence tracking and deduplication
- Zero data loss with Dead Letter Queue

### 3. Observable
- **83 Prometheus metrics** (validated by pre-commit hook)
- **21 alert rules** for production monitoring
- Grafana dashboards (15 panels)
- Real-time visibility into platform health

### 4. Queryable
- REST API with <500ms p99 latency
- **Hybrid queries** (Kafka + Iceberg) for recent data
- Time-travel queries for compliance
- Connection pooling (5x throughput improvement)

### 5. Scalable
- Same architecture scales 1000x
- Cost-effective: $0.85 per million messages at scale
- Economies of scale (cost per message decreases)

---

## Performance Summary

| Metric | Current | Target |
|--------|---------|--------|
| Ingestion | 138 msg/sec | 1M msg/sec (distributed) |
| Query (p99) | <500ms | <500ms |
| Test Coverage | 95%+ | 95%+ |
| Uptime | 99%+ | 99.9% (production) |

---

## Questions?

**Documentation:**
- Technical Deep-Dive: `notebooks/binance_e2e_demo.ipynb`
- Architecture: `docs/architecture/`
- OpenAPI: http://localhost:8000/docs

**Local Services:**
- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
- MinIO: http://localhost:9001 (minioadmin/minioadmin)