# Market Data Service

**Real-time WebSocket tick data collection and processing system for algorithmic trading**

---

## Table of Contents

1. [Overview & Architecture](#section-1)
2. [Environment Setup](#section-2)
3. [Data Models with Pydantic](#section-3)
4. [Single WebSocket Connection](#section-4)
5. [Threaded WebSocket Connection](#section-5)
6. [JSONL Storage Implementation](#section-6)
7. [Three-Connection Manager](#section-7)
8. [Subscription Management](#section-8)
9. [Status Monitoring & Endpoints](#section-9)
10. [Complete Service Integration](#section-10)
11. [Real-World Testing](#section-11)
12. [Usage Examples & Utilities](#section-12)

---
<a id="section-1"></a>
## Section 1: Overview & Architecture

### Problem Statement

As traders running an algorithmic trading system, we need real-time market data converted into a usable format and stored in local files for later consumption.

### Success Criteria

- ✅ Receive real-time tick data from broker WebSocket
- ✅ Store data reliably in JSONL files
- ✅ Manage instrument subscription/unsubscription across 3 WebSocket connections
- ✅ Handle network disconnections gracefully with auto-reconnection
- ✅ Process 1000 ticks per second without data loss

### Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                    Market Data Service                          │
│                  (FastAPI Main Thread)                          │
└────────────┬────────────────────────────────────┬───────────────┘
             │                                    │
    ┌────────▼─────────┐              ┌─────────▼──────────┐
    │  Connection 1    │              │   Connection 2     │
    │   (Primary)      │              │   (Secondary)      │
    │   Thread 1       │              │   Thread 2         │
    │                  │              │                    │
    │  MODE: QUOTE     │              │  MODE: FULL        │
    │  Max: 3000 inst  │              │  Max: 3000 inst    │
    │  Trading Universe│              │  On-demand Full    │
    └────────┬─────────┘              └─────────┬──────────┘
             │                                   │
             │        ┌─────────────────┐        │
             │        │  Connection 3   │        │
             └────────►   (Tertiary)    ◄────────┘
                      │   Thread 3      │
                      │                 │
                      │  MODE: Failover │
                      │  Activated when │
                      │  1 or 2 fails   │
                      └────────┬────────┘
                               │
                      ┌────────▼─────────┐
                      │  JSONL Storage   │
                      │  market_data_    │
                      │  YYYY-MM-DD.jsonl│
                      └──────────────────┘
```
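
The storage box at the bottom of the diagram names one file per day, `market_data_YYYY-MM-DD.jsonl`. A minimal sketch of that naming scheme and of appending one tick per line might look like the following. The helper names `daily_jsonl_path` and `append_tick` and the tick fields in the demo are illustrative assumptions, not the service's actual storage code (which Section 6 covers).

```python
import json
import tempfile
from datetime import date, datetime
from pathlib import Path


def daily_jsonl_path(data_dir: Path, day: date) -> Path:
    """Build the per-day file path, e.g. market_data_2024-01-15.jsonl."""
    return data_dir / f"market_data_{day.isoformat()}.jsonl"


def append_tick(data_dir: Path, tick: dict, day: date) -> Path:
    """Append one tick as a single JSON line and return the file written to."""
    path = daily_jsonl_path(data_dir, day)
    data_dir.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        # default=str serialises values such as datetime that json cannot encode
        fh.write(json.dumps(tick, default=str) + "\n")
    return path


# Demo with a throwaway directory and a made-up tick (field names are assumptions)
demo_dir = Path(tempfile.mkdtemp())
demo_tick = {
    "instrument_token": 408065,
    "last_price": 1500.5,
    "received_at": datetime(2024, 1, 15, 9, 15, 0),
}
written = append_tick(demo_dir, demo_tick, date(2024, 1, 15))
print(written.name)
```

Opening the file in append mode for each tick keeps the sketch simple; a real writer would more likely keep the handle open and flush in batches.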

### Key Concepts

**WebSocket Modes:**
- **LTP (Last Traded Price):** Minimal data - only last price
- **QUOTE:** OHLC + Volume + Buy/Sell quantities
- **FULL:** Complete market depth + All quote data
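
Because each mode in the list above is a superset of the previous one, choosing a mode can be treated as picking the lightest one that still carries the data a strategy needs. The sketch below illustrates that idea only; `TickMode`, `MODE_FOR_DATA`, `minimal_mode`, and the data labels are hypothetical names and are not part of KiteTicker.

```python
from enum import IntEnum
from typing import Set


class TickMode(IntEnum):
    """Modes ordered from least to most data, following the list above."""
    LTP = 1
    QUOTE = 2
    FULL = 3


# Hypothetical labels for the kind of data each mode is the first to provide
MODE_FOR_DATA = {
    "last_price": TickMode.LTP,
    "ohlc": TickMode.QUOTE,
    "volume": TickMode.QUOTE,
    "buy_sell_quantity": TickMode.QUOTE,
    "market_depth": TickMode.FULL,
}


def minimal_mode(required: Set[str]) -> TickMode:
    """Return the lightest mode that still carries every required kind of data."""
    return max((MODE_FOR_DATA[name] for name in required), default=TickMode.LTP)


print(minimal_mode({"last_price", "volume"}).name.lower())
```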

**Connection Strategy:**
- **Connection 1:** Primary connection for all trading universe stocks in QUOTE mode
- **Connection 2:** Secondary connection activated for FULL mode data when needed
- **Connection 3:** Redundancy connection for failover scenarios
- Each connection supports a maximum of 3000 instruments
- A single API key can have up to 3 WebSocket connections
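
The connection strategy above can be sketched as a small planning step: QUOTE tokens go to Connection 1, FULL tokens to Connection 2, and Connection 3 stays empty until one of the others fails. This is only an illustration under assumptions: `plan_connections` and `fail_over` are hypothetical helpers, and refusing a second failover while Connection 3 is already in use is an assumed policy, not something the design states.

```python
from typing import Dict, List

MAX_INSTRUMENTS_PER_CONNECTION = 3000


def plan_connections(quote_tokens: List[int], full_tokens: List[int]) -> Dict[int, List[int]]:
    """Assign tokens to Connection 1 (QUOTE) and 2 (FULL); 3 is kept for failover."""
    # Deduplicate while preserving order so a token is never subscribed twice
    quote = list(dict.fromkeys(quote_tokens))
    full = list(dict.fromkeys(full_tokens))
    for label, tokens in (("QUOTE", quote), ("FULL", full)):
        if len(tokens) > MAX_INSTRUMENTS_PER_CONNECTION:
            raise ValueError(
                f"{label} group has {len(tokens)} tokens, "
                f"limit is {MAX_INSTRUMENTS_PER_CONNECTION}"
            )
    return {1: quote, 2: full, 3: []}


def fail_over(plan: Dict[int, List[int]], failed_id: int) -> Dict[int, List[int]]:
    """Return a new plan in which Connection 3 takes over the failed connection."""
    if failed_id not in (1, 2):
        raise ValueError("only Connection 1 or 2 can fail over to Connection 3")
    if plan[3]:
        # Assumed policy: Connection 3 covers one failure at a time
        raise RuntimeError("Connection 3 is already covering a failure")
    new_plan = {cid: list(tokens) for cid, tokens in plan.items()}
    new_plan[3] = new_plan[failed_id]
    new_plan[failed_id] = []
    return new_plan


plan = plan_connections([408065, 738561, 738561], [341249])
print(fail_over(plan, 1))
```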

**Performance Target:** Process 1000 ticks/second across all connections without data loss
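
One common way to approach a target like this is to keep the WebSocket callback cheap by handing ticks to a queue and letting a separate thread do the slower writes in batches. The sketch below shows that pattern using only the standard library. `BufferedTickWriter` is a hypothetical class, and the plain list used as a sink stands in for the JSONL file; it is not the service's actual writer.

```python
import queue
import threading
from typing import List


class BufferedTickWriter:
    """Decouple tick producers from a slower sink using a queue and one worker."""

    def __init__(self, sink: List[dict], batch_size: int = 500) -> None:
        self._queue: queue.Queue = queue.Queue()  # unbounded: never drops ticks
        self._sink = sink
        self._batch_size = batch_size
        self._sentinel = object()  # marks the end of the stream
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def submit(self, ticks: List[dict]) -> None:
        """Called from the tick callback; only enqueues, so it returns quickly."""
        for tick in ticks:
            self._queue.put(tick)

    def close(self) -> None:
        """Signal the worker to finish and wait until everything is written."""
        self._queue.put(self._sentinel)
        self._thread.join()

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is self._sentinel:
                return
            batch = [item]
            stopping = False
            # Drain whatever is already queued, up to one batch
            while len(batch) < self._batch_size:
                try:
                    nxt = self._queue.get_nowait()
                except queue.Empty:
                    break
                if nxt is self._sentinel:
                    stopping = True
                    break
                batch.append(nxt)
            self._sink.extend(batch)  # stand-in for a batched file write
            if stopping:
                return


stored: List[dict] = []
writer = BufferedTickWriter(stored)
writer.start()
for second in range(3):
    writer.submit([{"seq": second * 1000 + i} for i in range(1000)])
writer.close()
print(len(stored))
```

The unbounded queue trades memory for the no-loss guarantee: if the sink stalls, ticks accumulate rather than being dropped, so a production version would also want to monitor queue depth.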

---
<a id="section-2"></a>
## Section 2: Environment Setup

In [None]:
# Cell 1: Imports and Configuration

import logging
import os
import json
import time
import threading
from datetime import datetime, date
from typing import Dict, List, Optional, Any, Set
from pathlib import Path
from collections import defaultdict

# KiteConnect imports
from kiteconnect import KiteTicker

# Data handling
import pandas as pd

# Type validation
from pydantic import BaseModel, Field, validator

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

print("✓ All imports successful")

In [None]:
# Cell 2: Credentials and Constants

# Load credentials from environment variables
API_KEY = os.getenv("KITE_API_KEY")
ACCESS_TOKEN = os.getenv("KITE_ACCESS_TOKEN")

if not API_KEY or not ACCESS_TOKEN:
    logger.warning(
        "Missing credentials. Set KITE_API_KEY and KITE_ACCESS_TOKEN "
        "environment variables before running WebSocket connections"
    )
else:
    logger.info("✓ Credentials loaded from environment")

# Configuration constants
MAX_INSTRUMENTS_PER_CONNECTION = 3000
MAX_WEBSOCKET_CONNECTIONS = 3

# Storage configuration
DATA_DIR = Path("./market_data")
DATA_DIR.mkdir(parents=True, exist_ok=True)  # parents=True also covers nested paths

# WebSocket modes (from KiteTicker)
MODE_LTP = "ltp"
MODE_QUOTE = "quote"
MODE_FULL = "full"

logger.info("✓ Configuration loaded")
logger.info(f"✓ Data directory: {DATA_DIR}")
logger.info(f"✓ Max instruments per connection: {MAX_INSTRUMENTS_PER_CONNECTION}")
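
The credentials cell above only logs a warning when a variable is missing, which suits a notebook where later cells may not need a connection. Code that is about to open a WebSocket may prefer to fail fast instead. A minimal sketch of such a check follows; `load_credentials` is a hypothetical helper, not part of kiteconnect.

```python
import os
from typing import Mapping, Tuple

REQUIRED_VARS = ("KITE_API_KEY", "KITE_ACCESS_TOKEN")


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (api_key, access_token) or raise if either variable is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["KITE_API_KEY"], env["KITE_ACCESS_TOKEN"]


# Demo with an explicit mapping so the example does not depend on the real environment
api_key, access_token = load_credentials(
    {"KITE_API_KEY": "demo-key", "KITE_ACCESS_TOKEN": "demo-token"}
)
print(api_key, access_token)
```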

In [None]:
# Cell 3: Sample Instrument Tokens

# Sample instrument tokens for testing (from Zerodha examples)
SAMPLE_TOKENS = [
    408065, 738561, 341249, 1270529, 779521, 492033, 1510401, 1346049, 3050241,
    579329, 2863105, 261889, 81153, 119553, 5582849, 4774913, 6054401, 175361,
    4268801, 579329, 2953217, 408065, 969473, 1850625, 3465729, 1152769,
    4701441, 4752385, 356865, 424961, 4598529, 140033, 197633, 2585345, 1041153,
    3876097, 878593, 4843777, 857857, 225537, 177665, 2800641, 900609, 303617,
    70401, 418049, 2911489, 40193, 2815745, 884737, 519937, 232961, 2170625,
    345089, 54273, 108033, 558337, 738561, 633601, 415745, 134657, 1207553,
    4464129, 2905857, 2977281, 3834113, 6401, 895745, 3001089, 348929, 3924993,
    758529, 5215745, 758529, 1723649, 2939649, 112129, 951809, 2513665, 108033,
    806401, 3329, 3848705, 486657, 470529, 2714625, 3677697, 738561, 3431425,
    975873, 952577, 3721473
]

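# SAMPLE_TOKENS repeats some tokens (e.g. 408065, 738561, 579329), so deduplicate
# while preserving order before slicing. Slicing the raw list would put 579329
# into the trading universe twice and 408065 into both groups.
UNIQUE_TOKENS = list(dict.fromkeys(SAMPLE_TOKENS))

# Trading universe - all instruments for Connection 1 (QUOTE mode)
TRADING_UNIVERSE = UNIQUE_TOKENS[:20]  # Use first 20 unique tokens for demo

# Full mode instruments for Connection 2
FULL_MODE_INSTRUMENTS = UNIQUE_TOKENS[20:25]  # Use next 5 unique tokens for demo

logger.info(f"✓ Trading universe size: {len(TRADING_UNIVERSE)}")
logger.info(f"✓ Full mode instruments: {len(FULL_MODE_INSTRUMENTS)}")
logger.info(f"✓ Total unique sample tokens available: {len(UNIQUE_TOKENS)}")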
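
Before subscribing, it can help to sanity-check token groups such as `TRADING_UNIVERSE` and `FULL_MODE_INSTRUMENTS` for repeated tokens, overlap between groups, and the per-connection limit. The following is a sketch of such a check; `check_token_groups` is a hypothetical helper and the short lists in the demo are made up from the sample tokens.

```python
from collections import Counter
from typing import Dict, List, Sequence


def check_token_groups(
    quote_tokens: Sequence[int],
    full_tokens: Sequence[int],
    limit: int = 3000,
) -> Dict[str, List[int]]:
    """Report repeated tokens in each group and tokens present in both groups."""
    for label, tokens in (("quote", quote_tokens), ("full", full_tokens)):
        if len(set(tokens)) > limit:
            raise ValueError(f"{label} group exceeds {limit} instruments")
    return {
        "repeated_in_quote": sorted(t for t, n in Counter(quote_tokens).items() if n > 1),
        "repeated_in_full": sorted(t for t, n in Counter(full_tokens).items() if n > 1),
        "in_both_groups": sorted(set(quote_tokens) & set(full_tokens)),
    }


report = check_token_groups(
    quote_tokens=[408065, 738561, 579329, 579329],
    full_tokens=[2953217, 408065],
)
print(report)
```

An empty list under every key means the two groups are safe to subscribe as they are.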