# Kafka Mock Stream Generator

This notebook generates mock stock tick data and publishes it to Kafka for the Trino real-time analytics demo.

In [1]:
# Install the Kafka Python client and pandas
!pip install kafka-python pandas -q

In [2]:
import json
import random
import time
from kafka import KafkaProducer
import pandas as pd
from datetime import datetime

# Kafka producer configuration
producer = KafkaProducer(
    bootstrap_servers=['kafka:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    acks='all',
    retries=3
)

# Stock symbols to generate data for
symbols = ['MSFT', 'AAPL', 'AMZN', 'GOOG', 'SNOW', 'TSLA', 'META', 'NVDA']

# Base prices for realistic simulation
base_prices = {
    'MSFT': 310.0,
    'AAPL': 175.0,
    'AMZN': 145.0,
    'GOOG': 140.0,
    'SNOW': 185.0,
    'TSLA': 250.0,
    'META': 320.0,
    'NVDA': 450.0
}

print(f"Kafka producer connected. Generating data for {len(symbols)} symbols...")

Kafka producer connected. Generating data for 8 symbols...
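
The `value_serializer` above turns each message dict into UTF-8 encoded JSON bytes. As a minimal sketch, the round trip can be checked without a broker; `deserialize` is a hypothetical helper (not part of the notebook) of the kind that could be passed to a `KafkaConsumer` as `value_deserializer`.

```python
import json


def serialize(value):
    """Same transformation as the producer's value_serializer."""
    return json.dumps(value).encode('utf-8')


def deserialize(raw):
    """Hypothetical inverse, e.g. for a consumer's value_deserializer."""
    return json.loads(raw.decode('utf-8'))


# Illustrative message in the shape this notebook publishes
tick = {'symbol': 'MSFT', 'price': 312.45, 'event_ts': '2025-01-01T10:30:15Z'}

payload = serialize(tick)
restored = deserialize(payload)

print(type(payload).__name__, restored == tick)
```

Keeping the two functions symmetric means a consumer sees exactly the dict the producer sent.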


## Generate Real-Time Stock Data

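Each iteration picks a random symbol, moves its price up to 2% above or below the fixed base price, and publishes the result to the Kafka topic `test`. The loop runs until the cell is interrupted.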

In [None]:
# Generate and publish stock data
from datetime import timezone  # for a timezone-aware UTC timestamp

message_count = 0

try:
    while True:
        # Select random symbol
        symbol = random.choice(symbols)
        base_price = base_prices[symbol]
        
        # Generate a price within -2% to +2% of the symbol's base price
        price_change = random.uniform(-0.02, 0.02)
        current_price = round(base_price * (1 + price_change), 2)
        
        # Create message (datetime.utcnow() is deprecated since Python 3.12)
        message = {
            'symbol': symbol,
            'price': current_price,
            'event_ts': datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z')
        }
        
        # Send to Kafka
        producer.send('test', message)
        
        message_count += 1
        
        # Progress update every 10 messages
        if message_count % 10 == 0:
            print(f"Generated {message_count} messages - Latest: {symbol} @ ${current_price}")
        
        # Random delay 0.5-2 seconds
        time.sleep(random.uniform(0.5, 2.0))
        
except KeyboardInterrupt:
    print(f"\nStream generation stopped. Total messages sent: {message_count}")
finally:
    producer.close()
    print("Kafka producer closed.")


Generated 10 messages - Latest: AAPL @ $172.32
Generated 20 messages - Latest: TSLA @ $247.32
Generated 30 messages - Latest: AMZN @ $146.05
Generated 40 messages - Latest: SNOW @ $187.1
Generated 50 messages - Latest: SNOW @ $181.79
Generated 60 messages - Latest: AMZN @ $146.15
Generated 70 messages - Latest: AMZN @ $144.74
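
The per-message logic in the loop above can be factored into a pure function, which makes it testable without a running broker. This is only a sketch: `make_tick`, its `rng` parameter, and `max_move` are hypothetical names introduced here, and the seeded generator is an assumption added for reproducibility.

```python
import random
from datetime import datetime, timezone


def make_tick(symbol, base_price, rng=random, max_move=0.02):
    """Build one mock tick within +/- max_move of base_price."""
    price_change = rng.uniform(-max_move, max_move)
    return {
        'symbol': symbol,
        'price': round(base_price * (1 + price_change), 2),
        # Timezone-aware UTC timestamp with a trailing 'Z'
        'event_ts': datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z'),
    }


# A seeded generator gives repeatable prices for tests
rng = random.Random(42)
tick = make_tick('MSFT', 310.0, rng)
print(tick)
```

Inside the loop, the result could then be passed straight to `producer.send('test', make_tick(symbol, base_prices[symbol]))`.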


## Stream Status

The stream is now generating real-time stock data that can be:

1. **Queried through Trino** as `kafka.default.test`
2. **Joined with historical data** in federated queries
3. **Visualized in real-time** through the demo dashboard
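
As a rough sketch of the first point, the topic could be read from Python with the `trino` client package. Everything environment-specific here is an assumption: the host `trino`, port `8080`, and user `demo` are placeholders, and the query assumes the topic is exposed without a table definition, so the JSON fields are extracted from the Kafka connector's raw `_message` column. If the demo defines typed columns for the topic, a plain `SELECT symbol, price, event_ts` would apply instead.

```python
def build_recent_ticks_query(table='kafka.default.test', limit=10):
    """Return SQL that pulls the tick fields out of the raw JSON message."""
    return (
        "SELECT json_extract_scalar(_message, '$.symbol') AS symbol, "
        "CAST(json_extract_scalar(_message, '$.price') AS DOUBLE) AS price, "
        "json_extract_scalar(_message, '$.event_ts') AS event_ts "
        f"FROM {table} "
        f"LIMIT {int(limit)}"
    )


def fetch_recent_ticks(host='trino', port=8080, user='demo'):
    """Run the query; host, port, and user are assumed placeholder values."""
    from trino.dbapi import connect  # requires: pip install trino

    conn = connect(host=host, port=port, user=user)
    cur = conn.cursor()
    cur.execute(build_recent_ticks_query())
    return cur.fetchall()


print(build_recent_ticks_query(limit=5))
```

Only the query builder runs here; `fetch_recent_ticks()` needs a reachable Trino coordinator with a `kafka` catalog configured.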

### Data Format
```json
{
  "symbol": "MSFT",
  "price": 312.45,
  "event_ts": "2025-01-01T10:30:15.123456Z"
}
```
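
A consumer of this format may want to validate each message before using it. The following is a minimal sketch, not part of the notebook; `parse_tick` is a hypothetical helper that checks the three fields and converts `event_ts` to a timezone-aware `datetime`.

```python
import json
from datetime import datetime

REQUIRED_FIELDS = {'symbol', 'price', 'event_ts'}


def parse_tick(raw):
    """Decode one JSON tick and convert its fields to Python types."""
    record = json.loads(raw)
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    record['event_ts'] = datetime.fromisoformat(
        record['event_ts'].replace('Z', '+00:00')
    )
    record['price'] = float(record['price'])
    return record


sample = '{"symbol": "MSFT", "price": 312.45, "event_ts": "2025-01-01T10:30:15Z"}'
parsed = parse_tick(sample)
print(parsed['symbol'], parsed['price'], parsed['event_ts'].isoformat())
```

Messages with a missing field raise `ValueError`, so malformed records surface early instead of propagating into downstream queries.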