# Trades Notebook

## Overview

This notebook connects to a local Postgres database running in a Docker container and loads the trading data stored in the `public.trades` table into a pandas DataFrame for analysis. It then uses that DataFrame to calculate the number of day trades for each account over a specified time range.

## Setup

Sets up the connection to Postgres and loads the `public.trades` data into a DataFrame.

In [1]:
import psycopg2
import pandas as pd

# create a connection to the Postgres database
connection = psycopg2.connect("dbname=postgres host=sp-psql user=postgres password=postgres")
# create a cursor
cursor = connection.cursor()

# execute a query
cursor.execute("SELECT * FROM public.trades")

# retrieve query results, then release the cursor and the connection
records = cursor.fetchall()
cursor.close()
connection.close()

# define the column names
column_names = ['account_id','symbol','side','qty','timestamp']

# create a DataFrame
trades_df = pd.DataFrame(records, columns=column_names)
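
To illustrate the shape of the data without a running database, the following minimal sketch builds the same kind of DataFrame from a few hand-written rows. The rows in `sample_records` are made up for illustration and only mimic what `cursor.fetchall()` is assumed to return (a list of tuples in column order). The explicit `pd.to_datetime` call is a precaution in case the `timestamp` column arrives as strings rather than datetime objects.

```python
from datetime import datetime

import pandas as pd

# hypothetical rows, shaped like the list of tuples returned by cursor.fetchall()
sample_records = [
    (1, 'AAPL', 'buy', 10, datetime(2023, 9, 8, 15, 0, 0)),
    (1, 'AAPL', 'sell', 10, datetime(2023, 9, 8, 15, 30, 0)),
    (2, 'MSFT', 'buy', 5, datetime(2023, 9, 8, 15, 10, 0)),
]
column_names = ['account_id', 'symbol', 'side', 'qty', 'timestamp']

sample_df = pd.DataFrame(sample_records, columns=column_names)
# make sure the timestamp column is a datetime type so it can be sliced by time later
sample_df['timestamp'] = pd.to_datetime(sample_df['timestamp'])
```

A DataFrame built this way has one row per trade and can stand in for `trades_df` when experimenting with the analysis offline.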

## Trades Analysis

Defines a function called `find_day_trades()`, which calculates the number of day trades for each account over a given time window. The function accepts a `start_time` and an `end_time` as the bounds of the window, plus an optional DataFrame `df`. If `df` is not specified, it defaults to the `trades_df` DataFrame created above.

The second function, `write_output_file()`, writes the day trades to a file. It accepts the `start_time` and `end_time` to include in the output file name, and it also takes a DataFrame with the day trades data.

In [5]:
import pandas as pd
import numpy as np


def find_day_trades(start_time, end_time, df=trades_df):
    """
    Given a DataFrame with trading data, calculate the number of day trades
    per account over a given time window.
    """
    # set index on timestamp and sort it so the time-window slice below is well defined
    df = df.set_index('timestamp').sort_index()
    # get trades within the start and end time window
    data_in_range = df.loc[start_time:end_time]

    # group by account id and symbol and find where there are buy and sell side records
    sell_and_buy = data_in_range.groupby(['account_id', 'symbol']).filter(
            lambda x: 'sell' in x['side'].values and 'buy' in x['side'].values)

    # count the buy and sell records per account and halve the count,
    # on the rough assumption that each day trade pairs one buy with one sell
    trades_div_two = sell_and_buy.groupby('account_id')['side'].count() / 2

    # create a DataFrame with the account and trade data
    output = trades_div_two.reset_index()
    output.columns = ['account_id', 'trades_div']
    
    # create a new column which takes floor(trades / 2) and cast it to an integer
    # this rounds down when one side has an unmatched order, e.g. 2 buys and only 1 sell
    # there is probably a better way to do this...
    output['day_trade_count'] = output['trades_div'].apply(np.floor).astype(int)
    
    return output

    
def write_output_file(start_time, end_time, df):
    """
    Writes a DataFrame to a file given a start time and an end time.
    """
    # create the output file name with the date range in the file name
    filename = f"day_trades_{start_time}_to_{end_time}.csv"
    # output the DataFrame to a csv
    df.to_csv(filename, columns=['account_id', 'day_trade_count'], header=True, index=False)
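
The counting heuristic in `find_day_trades()` can be checked by hand on a small example. The sketch below repeats the same filter, halve, and floor steps on a made-up DataFrame named `toy`, whose rows are all assumed to fall inside the time window. The account ids, symbols, and sides are invented for illustration and do not come from `public.trades`.

```python
import numpy as np
import pandas as pd

# hypothetical trades, all assumed to fall inside the time window
toy = pd.DataFrame({
    'account_id': [1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4, 4],
    'symbol': ['AAPL', 'AAPL', 'AAPL', 'MSFT', 'TSLA', 'TSLA',
               'GOOG', 'GOOG', 'AAPL', 'AAPL', 'AAPL', 'AAPL'],
    'side': ['buy', 'sell', 'buy', 'buy', 'buy', 'sell',
             'buy', 'sell', 'buy', 'buy', 'buy', 'sell'],
})

# keep only (account, symbol) groups that contain both a buy and a sell
both_sides = toy.groupby(['account_id', 'symbol']).filter(
    lambda g: 'sell' in g['side'].values and 'buy' in g['side'].values)

# halve the record count per account and round down
counts = both_sides.groupby('account_id')['side'].count()
toy_day_trades = np.floor(counts / 2).astype(int)
```

Account 1 has three matching records and is counted as one day trade, account 2 has only a buy and drops out, and account 3 has two complete buy/sell pairs and is counted as two. Account 4 shows the rough edge of the heuristic: three buys and one sell are counted as two day trades, although only one of the buys can be paired with the sell.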

## Output

Call the `find_day_trades()` function with a `start_time` and `end_time` and save the result. Then pass the result to the `write_output_file()` function to generate a `.csv` file with the count of day trades in that time window for each account.

In [4]:
results = find_day_trades('2023-09-08 14:59:57', '2023-09-08 15:59:57')
write_output_file('2023-09-08 14:59:57', '2023-09-08 15:59:57', results)
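
With the arguments above, the generated file name contains spaces and colons, and colons are not allowed in file names on Windows. If portability matters, one option is to format the timestamps before building the name. The helper `build_output_filename()` below is hypothetical and not part of the notebook. It is a minimal sketch that assumes the inputs can be parsed by `pd.Timestamp`.

```python
import pandas as pd


def build_output_filename(start_time, end_time):
    """Hypothetical helper: build a file name without spaces or colons."""
    fmt = '%Y%m%dT%H%M%S'
    start = pd.Timestamp(start_time).strftime(fmt)
    end = pd.Timestamp(end_time).strftime(fmt)
    return f"day_trades_{start}_to_{end}.csv"


filename = build_output_filename('2023-09-08 14:59:57', '2023-09-08 15:59:57')
```

For the window used above, this produces `day_trades_20230908T145957_to_20230908T155957.csv`.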