# Send Trades To Kafka

This notebook reads trading events from the `./tradesMarch.csv` file and sends them to Apache Kafka. Kafka Connect then processes the data, which eventually ends up in a QuestDB table.

We first create the QuestDB table. It would be created automatically if it didn't exist, but creating it explicitly lets us see the schema.

In [5]:
# Ignore deprecation warnings in this demo
import warnings
warnings.simplefilter("ignore", category=DeprecationWarning)

In [6]:
import psycopg as pg


conn_str = 'user=admin password=quest host=questdb port=8812 dbname=qdb'
with pg.connect(conn_str, autocommit=True) as connection:
    with connection.cursor() as cur:
        cur.execute(
            """
            CREATE TABLE IF NOT EXISTS 'trades' (
                symbol SYMBOL CAPACITY 256 CACHE,
                side SYMBOL CAPACITY 256 CACHE,
                price DOUBLE,
                amount DOUBLE,
                timestamp TIMESTAMP
            ) TIMESTAMP(timestamp) PARTITION BY DAY WAL
            DEDUP UPSERT KEYS(timestamp, symbol, side);
            """
        )
                    

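The `DEDUP UPSERT KEYS(timestamp, symbol, side)` clause in the schema means that a row arriving with the same timestamp, symbol, and side as an existing row replaces it, so re-sending an identical event does not create a duplicate. The following is a toy in-memory model of that behavior, not QuestDB code; the `upsert` helper and the sample values are hypothetical.

```python
# Toy model of upsert-key deduplication: rows are stored in a dict
# keyed by the upsert key columns, so a matching key overwrites the old row.
UPSERT_KEYS = ("timestamp", "symbol", "side")


def upsert(table: dict, row: dict, keys=UPSERT_KEYS) -> None:
    """Insert a row, replacing any existing row with the same key values."""
    table[tuple(row[k] for k in keys)] = row


table = {}
first = {
    "symbol": "BTC-USD",
    "side": "buy",
    "price": 61196.34,
    "amount": 0.05,
    "timestamp": 1709247600080992,
}

upsert(table, first)
upsert(table, dict(first))                  # exact replay: still one row
upsert(table, {**first, "price": 61200.0})  # same keys: the row is replaced
upsert(table, {**first, "side": "sell"})    # different key: a second row

print(len(table))  # 2
```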

## Sending the data to Kafka

Now we read the `./tradesMarch.csv` file and convert every row to Avro binary format before sending it to a Kafka topic named `trades`.
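
To make "Avro binary format" concrete, here is a hand-rolled sketch of how one trade is laid out on the wire. It follows the Avro specification's binary encoding (a string is a zigzag varint length plus UTF-8 bytes, a double is 8 little-endian bytes, a long is a zigzag varint) and the Confluent framing (a zero magic byte plus a 4-byte big-endian schema ID). It is for illustration only: the notebook's script delegates all of this to `AvroProducer`, the helper names are made up, and the schema ID `7` is a hypothetical value that the Schema Registry would normally assign.

```python
import struct


def encode_long(n: int) -> bytes:
    """Encode a signed 64-bit integer as an Avro long (zigzag + varint)."""
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes become small numbers
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_double(x: float) -> bytes:
    """Avro double: 8 bytes, little-endian IEEE 754."""
    return struct.pack("<d", x)


def encode_trade(trade: dict) -> bytes:
    """Concatenate the fields in schema order; Avro adds no field names or tags."""
    return (
        encode_string(trade["symbol"])
        + encode_string(trade["side"])
        + encode_double(trade["price"])
        + encode_double(trade["amount"])
        + encode_long(trade["timestamp"])
    )


def confluent_frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix the payload with the magic byte and the schema ID."""
    return b"\x00" + struct.pack(">I", schema_id) + payload


trade = {
    "symbol": "BTC-USD",
    "side": "buy",
    "price": 61196.34,
    "amount": 0.05,
    "timestamp": 1709247600080992,
}
payload = encode_trade(trade)
message = confluent_frame(7, payload)  # 7 is a hypothetical schema ID
print(message.hex())
```

Because the record carries no field names, the consumer needs the same schema to decode it, which is why the script registers the schema with the Schema Registry.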

By default, the script replaces each event's original timestamp with the current time and waits 50 ms before sending each event to Kafka, to simulate a real-time stream and provide a nicer visualization. You can change this behavior by editing the constants at the top of `main()` (`TIMESTAMP_FROM_FILE` and `DELAY_MS`).

This script will keep sending data until you click stop or exit the notebook, or until the end of the file is reached.
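
If you keep the original timestamps instead (by setting `TIMESTAMP_FROM_FILE` to `True`), each CSV value has to be turned into microseconds since the Unix epoch. The sketch below shows one way to do that conversion, assuming the CSV timestamps look like `2024-02-29T23:00:00.080992Z`; the `to_epoch_micros` helper is hypothetical. It parses the trailing `Z` as UTC and uses integer arithmetic, since a naive datetime would be interpreted in the local time zone and multiplying a float by `1e6` can be off by a microsecond.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(text: str) -> int:
    """Convert an ISO-like UTC timestamp string to microseconds since the epoch."""
    # %z accepts a trailing "Z" on Python 3.7+ and yields a UTC-aware datetime.
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f%z")
    # Floor-dividing two timedeltas returns an exact integer.
    return (parsed - EPOCH) // timedelta(microseconds=1)


print(to_epoch_micros("2024-02-29T23:00:00.080992Z"))  # 1709247600080992
```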


In [7]:
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer
import csv
from datetime import datetime
import time


def get_delivery_report_func(verbose):
    def delivery_report(err, msg):
        if verbose:
            if err is not None:
                print(f'Message delivery failed: {err}')
            else:
                print(f'Message delivered to {msg.topic()} [{msg.partition()}]')
    return delivery_report

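def main():
    # Configuration constants; edit these to change the script's behavior
    KAFKA_BROKER = 'broker:29092'
    KAFKA_TOPIC = 'trades'
    CSV_FILE = './tradesMarch.csv'
    SCHEMA_REGISTRY = 'http://schema_registry:8081'
    TIMESTAMP_FROM_FILE = False  # True: keep the CSV timestamps; False: use the current time
    VERBOSE = True  # Print a line for every delivery report
    DELAY_MS = 50  # Pause before each send, in milliseconds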

    value_schema = avro.loads("""
    {
        "type": "record",
        "name": "Trade",
        "fields": [
            {"name": "symbol", "type": "string"},
            {"name": "side", "type": "string"},
            {"name": "price", "type": "double"},
            {"name": "amount", "type": "double"},
            {"name": "timestamp", "type": "long", "logicalType": "timestamp-micros"}
        ]
    }
    """)

    avro_producer = AvroProducer({
        'bootstrap.servers': KAFKA_BROKER,
        'schema.registry.url': SCHEMA_REGISTRY,
        'linger.ms': '100',  # Adjust based on your needs
        'batch.size': '65536',  # Adjust based on your needs
    }, default_value_schema=value_schema)

    delivery_report_func = get_delivery_report_func(VERBOSE)

    with open(CSV_FILE, mode='r') as file:
        csv_reader = csv.DictReader(file)
        for row in csv_reader:
            if TIMESTAMP_FROM_FILE:
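                # %z accepts the trailing "Z" (Python 3.7+) and yields a UTC-aware datetime;
                # a naive datetime would be interpreted in the local time zone by .timestamp()
                timestamp_dt = datetime.strptime(row['timestamp'], "%Y-%m-%dT%H:%M:%S.%f%z")
                timestamp_micros = round(timestamp_dt.timestamp() * 1e6)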
            else:
                timestamp_micros = int(time.time() * 1e6)

            value = {
                "symbol": row['symbol'],
                "side": row['side'],
                "price": float(row['price']),
                "amount": float(row['amount']),
                "timestamp": timestamp_micros
            }

            if DELAY_MS > 0:
                time.sleep(DELAY_MS / 1000.0)  # Convert milliseconds to seconds
                
            avro_producer.produce(topic=KAFKA_TOPIC, value=value, on_delivery=delivery_report_func)
            avro_producer.poll(0)  # Serve delivery callback queue

    avro_producer.flush()

if __name__ == '__main__':
    main()




Message delivered to trades [0]
Message delivered to trades [0]
Message delivered to trades [0]


KeyboardInterrupt: 

## Verify we have ingested some data

The data you send to Kafka is processed by Kafka Connect and passed to QuestDB, where it is stored in a table named `trades`. Let's check that we can actually see some data.

In [8]:
import requests

host = 'http://questdb:9000'

sql_query = 'SELECT * FROM trades LIMIT 5;'

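try:
    response = requests.get(host + '/exec', params={'query': sql_query})
    # Surface HTTP errors (e.g. a rejected query) instead of a KeyError below
    response.raise_for_status()
    for row in response.json()['dataset']:
        print(row)
except requests.exceptions.RequestException as e:
    print(f'Error: {e}')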

['DOT-USD', 'buy', 8.278547619047, 39.607455338095, '2024-02-29T23:00:00.080992Z']
['LTC-USD', 'buy', 80.105555555555, 5.080278386, '2024-02-29T23:00:00.080992Z']
['ETH-USD', 'buy', 3342.659019607845, 0.301281981078, '2024-02-29T23:00:00.080992Z']
['BTC-USD', 'buy', 61196.341418918964, 0.0525110666891, '2024-02-29T23:00:00.080992Z']
['XLM-USD', 'buy', 0.122351361445, 564.286784114338, '2024-02-29T23:00:00.080992Z']
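
The rows above are bare lists, so you need to know the column order to read them. The JSON returned by `/exec` also carries a `columns` array with each column's name and type, which can be zipped with every row. A minimal sketch, using a hand-written sample payload in place of a live response (the `rows_as_dicts` helper is hypothetical):

```python
def rows_as_dicts(payload: dict) -> list:
    """Pair each dataset row with the column names from the same response."""
    names = [col["name"] for col in payload["columns"]]
    return [dict(zip(names, row)) for row in payload["dataset"]]


# Hand-written sample shaped like an /exec response; not captured from a server.
sample = {
    "query": "SELECT * FROM trades LIMIT 5;",
    "columns": [
        {"name": "symbol", "type": "SYMBOL"},
        {"name": "side", "type": "SYMBOL"},
        {"name": "price", "type": "DOUBLE"},
        {"name": "amount", "type": "DOUBLE"},
        {"name": "timestamp", "type": "TIMESTAMP"},
    ],
    "dataset": [
        ["DOT-USD", "buy", 8.278547619047, 39.607455338095,
         "2024-02-29T23:00:00.080992Z"],
    ],
    "count": 1,
}

for trade in rows_as_dicts(sample):
    print(trade["symbol"], trade["side"], trade["price"], trade["timestamp"])
```

In the notebook you would pass the parsed JSON from the `requests.get` call instead of `sample`.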
