# Send Trades to QuestDB Directly

This notebook reads trading events from the `./tradesMarch.csv` file and sends them directly to QuestDB using multiple processes in parallel.

We first create the QuestDB table. It would be created automatically if it didn't exist, but creating it explicitly lets us see the schema.

In [1]:
#ignore deprecation warnings in this demo
import warnings
warnings.simplefilter("ignore", category=DeprecationWarning)

In [2]:
import psycopg as pg
import os

# Fetch environment variables with defaults
host = os.getenv('QDB_CLIENT_HOST', 'questdb')
port = os.getenv('QDB_CLIENT_PORT', '8812')
user = os.getenv('QDB_CLIENT_USER', 'admin')
password = os.getenv('QDB_CLIENT_PASSWORD', 'quest')

# Create the connection string using the environment variables or defaults
conn_str = f'user={user} password={password} host={host} port={port} dbname=qdb'

with pg.connect(conn_str, autocommit=True) as connection:
    with connection.cursor() as cur:
        cur.execute(
            """
            CREATE TABLE IF NOT EXISTS 'trades' (
                symbol SYMBOL CAPACITY 256 CACHE,
                side SYMBOL CAPACITY 256 CACHE,
                price DOUBLE,
                amount DOUBLE,
                timestamp TIMESTAMP
            ) TIMESTAMP(timestamp) PARTITION BY DAY WAL
            DEDUP UPSERT KEYS(timestamp, symbol, side);
            """
        )


## Sending the data to QuestDB

Now we read the `./tradesMarch.csv` file and insert its rows into the `trades` table.

By default, the script ignores the original date in the file and lets QuestDB assign the current time on arrival, and it waits 50 ms between events before sending to QuestDB, to simulate a real-time stream and provide a nicer visualization. You can change this behavior by editing the constants in the script.
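
If you set `TIMESTAMP_FROM_FILE` to `True`, the timestamp string from the CSV has to be turned into epoch nanoseconds. The following is a minimal sketch of that conversion, assuming the file uses the `%Y-%m-%dT%H:%M:%S.%fZ` layout that appears in the script and that the trailing `Z` means UTC. The helper name `iso_to_epoch_nanos` is hypothetical and is not part of the QuestDB client.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def iso_to_epoch_nanos(value: str) -> int:
    """Convert a string like 2024-03-01T00:00:00.000001Z to epoch nanoseconds."""
    # %z accepts the literal "Z" suffix (Python 3.7+) and yields a UTC-aware datetime
    dt = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")
    # Integer arithmetic avoids float rounding at nanosecond magnitudes
    micros = (dt - EPOCH) // timedelta(microseconds=1)
    return micros * 1000

print(iso_to_epoch_nanos("2024-03-01T00:00:00.000001Z"))
```

The resulting integer could then be wrapped in `TimestampNanos(...)`, as the script does.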

The script keeps sending data until you click stop, exit the notebook, or the `TOTAL_EVENTS` number is reached. If the CSV contains fewer events than the configured total, the script simply loops over the file again.
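
The looping behavior comes down to indexing the in-memory rows with a modulo. Here is a small sketch of the idea, using a hypothetical `cycle_rows` helper and placeholder strings in place of parsed CSV rows; the empty-file check is an addition that the script itself does not perform.

```python
def cycle_rows(rows, total_events):
    """Yield exactly total_events rows, wrapping to the start when rows run out."""
    if not rows:
        raise ValueError("the CSV file contains no rows")
    for i in range(total_events):
        # i % len(rows) walks 0..len-1 and then starts over
        yield rows[i % len(rows)]

sample_rows = ["row-a", "row-b", "row-c"]  # stand-in for parsed CSV rows
print(list(cycle_rows(sample_rows, 7)))
```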

While the script is running, you can check the data in the table directly at QuestDB's web console at http://localhost:9000 or a live Grafana Dashboard powered by QuestDB at http://localhost:3000/d/trades-crypto-currency/trades-crypto-currency?orgId=1&refresh=250ms (user admin and password quest).


In [4]:
from questdb.ingress import Sender, IngressError, TimestampNanos, ServerTimestamp
import os
import sys
import csv
import time
from multiprocessing import Pool
from datetime import datetime

HTTP_ENDPOINT = os.getenv('QUESTDB_HTTP_ENDPOINT', 'questdb:9000')
REST_TOKEN = os.getenv('QUESTDB_REST_TOKEN')

TOTAL_EVENTS = 20_000_000  # Total events across all senders
DELAY_MS = 50  # Delay between events in milliseconds
NUM_SENDERS = 7  # Number of senders to execute in parallel
CSV_FILE = './tradesMarch.csv'  # Path to the CSV file
TIMESTAMP_FROM_FILE = False  # Whether to use the timestamp from the CSV file

def send(sender_id, total_events, delay_ms=DELAY_MS, csv_file=CSV_FILE, http_endpoint=HTTP_ENDPOINT, auth=REST_TOKEN):
    sys.stdout.write(f"Sender {sender_id} will send {total_events} events\n")

    try:
        if auth is not None:
            conf = f'https::addr={http_endpoint};tls_verify=unsafe_off;token={auth};'
        else:
            conf = f'http::addr={http_endpoint};'
            
        with Sender.from_conf(conf) as sender, open(csv_file, mode='r') as file:
            csv_reader = csv.DictReader(file)
            events_sent = 0
            csv_rows = list(csv_reader)  # Load the CSV data once into memory for looping

            while events_sent < total_events:
                row = csv_rows[events_sent % len(csv_rows)]  # Loop over the CSV rows

                if TIMESTAMP_FROM_FILE:
                    # %z parses the trailing "Z" as UTC; a naive datetime would be read as local time
                    timestamp_dt = datetime.strptime(row['timestamp'], "%Y-%m-%dT%H:%M:%S.%f%z")
                    timestamp_nanos = TimestampNanos(int(timestamp_dt.timestamp() * 1e9))  # Convert to nanoseconds
                else:
                    # Let QuestDB assign the timestamp on arrival
                    # (alternative: TimestampNanos.now() for the client's current time)
                    timestamp_nanos = ServerTimestamp

                # Ingest the row with the timestamp chosen above
                sender.row(
                    'trades',
                    symbols={'symbol': row['symbol'], 'side': row['side']},
                    columns={
                        'price': float(row['price']),
                        'amount': float(row['amount']),
                    },
                    at=timestamp_nanos  # Nanosecond timestamp from the file, or ServerTimestamp
                )

                events_sent += 1

                # Delay after each event
                if delay_ms > 0:
                    time.sleep(delay_ms / 1000.0)  # Convert milliseconds to seconds

            sys.stdout.write(f"Sender {sender_id} finished sending {events_sent} events\n")

    except IngressError as e:
        sys.stderr.write(f'Sender {sender_id} got error: {e}\n')

def parallel_send(total_events, num_senders: int):
    events_per_sender = total_events // num_senders
    remaining_events = total_events % num_senders

    sender_events = [events_per_sender] * num_senders
    for i in range(remaining_events):  # Distribute the remaining events
        sender_events[i] += 1

    with Pool(processes=num_senders) as pool:
        sender_ids = range(num_senders)
        pool.starmap(send, [(sender_id, sender_events[sender_id]) for sender_id in sender_ids])

if __name__ == '__main__':
    sys.stdout.write(f'Ingestion started. Connecting to {HTTP_ENDPOINT}\n')
    parallel_send(TOTAL_EVENTS, NUM_SENDERS)


Ingestion started. Connecting to host.docker.internal:9000
Sender 5 will send 2857143 events
Sender 0 will send 2857143 events
Sender 4 will send 2857143 events
Sender 6 will send 2857142 events
Sender 3 will send 2857143 events
Sender 1 will send 2857143 events
Sender 2 will send 2857143 events
Sender 2 finished sending 2857143 events
Sender 1 finished sending 2857143 events
Sender 6 finished sending 2857142 events
Sender 0 finished sending 2857143 events
Sender 3 finished sending 2857143 events
Sender 4 finished sending 2857143 events
Sender 5 finished sending 2857143 events
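
The per-sender counts in the output follow from how `parallel_send` divides the total: every sender gets the integer quotient, and the first few senders each take one extra event from the remainder. A compact sketch of the same arithmetic, with a hypothetical `split_events` helper that is not part of the script:

```python
def split_events(total_events: int, num_senders: int) -> list[int]:
    """Split total_events as evenly as possible across num_senders."""
    base, remainder = divmod(total_events, num_senders)
    # The first `remainder` senders each take one extra event
    return [base + 1 if i < remainder else base for i in range(num_senders)]

shares = split_events(20_000_000, 7)
print(shares)
print(sum(shares))
```

For 20,000,000 events and 7 senders the remainder is 6, so senders 0 to 5 get 2,857,143 events each and sender 6 gets 2,857,142, which matches the log.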


## Verify we have ingested some data

The data is sent directly to QuestDB, where it is stored in a table named `trades`. Let's check that we can actually see some data.

In [11]:
import requests
import os

HTTP_ENDPOINT = os.getenv('QUESTDB_HTTP_ENDPOINT', 'questdb:9000')
REST_TOKEN = os.getenv('QUESTDB_REST_TOKEN')

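# The admin:quest credentials are hardcoded for this demo
if REST_TOKEN is not None:
    host = f'https://admin:quest@{HTTP_ENDPOINT}'
else:
    host = f'http://admin:quest@{HTTP_ENDPOINT}'

# A negative LIMIT returns the last N rows
sql_query = 'SELECT * FROM trades LIMIT -5;'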

try:
    response = requests.get(
        host + '/exec',
        params={'query': sql_query}, verify=False).json()
    for row in response['dataset']:
        print(row)    
except requests.exceptions.RequestException as e:
    print(f'Error: {e}')

['DOT-USD', 'buy', 8.278547619047, 39.607455338095, '2024-10-28T11:54:19.284340Z']
['DOT-USD', 'buy', 8.278547619047, 39.607455338095, '2024-10-28T11:54:19.285580Z']
['DOT-USD', 'buy', 8.278547619047, 39.607455338095, '2024-10-28T11:56:45.865026Z']
['DOT-USD', 'buy', 8.278547619047, 39.607455338095, '2024-10-28T11:56:46.183816Z']
['DOT-USD', 'buy', 8.278547619047, 39.607455338095, '2024-10-28T11:56:46.360160Z']
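
The rows above come back as positional lists. If named fields are easier to work with, they can be paired with the column names. This sketch assumes the `/exec` response JSON carries a `columns` list of objects with a `name` key next to `dataset`. The `rows_as_dicts` helper is hypothetical, and the sample payload is hand-written from the output above rather than captured from a server.

```python
def rows_as_dicts(response: dict) -> list[dict]:
    """Pair each positional row in `dataset` with the names listed in `columns`."""
    names = [col["name"] for col in response["columns"]]
    return [dict(zip(names, row)) for row in response["dataset"]]

# Hand-written sample in the assumed shape of an /exec response
sample_response = {
    "columns": [
        {"name": "symbol", "type": "SYMBOL"},
        {"name": "side", "type": "SYMBOL"},
        {"name": "price", "type": "DOUBLE"},
        {"name": "amount", "type": "DOUBLE"},
        {"name": "timestamp", "type": "TIMESTAMP"},
    ],
    "dataset": [
        ["DOT-USD", "buy", 8.278547619047, 39.607455338095, "2024-10-28T11:56:46.360160Z"],
    ],
}

for record in rows_as_dicts(sample_response):
    print(record["symbol"], record["side"], record["price"], record["timestamp"])
```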


