# Cleaning data


A trader can place four types of orders:
1. Limit Orders (LO): Placing a new order in the book.
2. Market Orders (MO): Executing an LO already in the book.
3. Cancellations: Cancelling an existing order in the book.
4. Updates: Changing the price of an existing order in the book.

In the simulations only 3 types of orders are allowed: LO, MO and cancellations. Updates therefore need to be modeled as a cancellation followed by a LO.
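As a rough illustration of this decomposition, the sketch below turns one price update into two events. The function name `split_update` and the dictionary keys are hypothetical and chosen only for this example; this is not the implementation of `match.update_df`, which works on the LOB dataframe.

```python
def split_update(side, old_price, new_price, volume):
    """Represent a price update as a cancellation followed by a new LO.

    Hypothetical helper for illustration only, not part of the match module.
    """
    # The order resting at the old price is removed from the book...
    cancellation = {"Type": "Cancellation", "Sign": side,
                    "Price": old_price, "Volume": volume}
    # ...and an order with the same sign and volume is placed at the new price.
    limit_order = {"Type": "LO", "Sign": side,
                   "Price": new_price, "Volume": volume}
    return [cancellation, limit_order]


events = split_update(side=1, old_price=50.0, new_price=50.5, volume=10)
for event in events:
    print(event)
```

The two events keep the sign and volume of the original order, so the book ends up in the same state as after the update.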

To do so I implemented 2 functions:
1. clean_data: This function differentiates between LO, cancellations/MO and updates.
2. update_df: When an update occurs, this function models it as a cancellation + LO.

Lastly, I differentiated cancellations from MO using the function matching.
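One way to picture this last step: an event that removes volume from the book is labelled as a MO if a trade with the same price and volume occurs close enough in time, and as a cancellation otherwise. The sketch below is only an assumption about how such a rule could look. The name `label_events`, the tuple layout, and the "closest trade in time" rule are hypothetical, and the actual criterion used by `match.matching` may differ.

```python
def label_events(events, trades, time_interval=4):
    """Label each event as "MO" or "Cancellation".

    events and trades are lists of (time, price, volume) tuples.
    Hypothetical sketch, not the logic of match.matching.
    """
    used = set()  # indices of trades already assigned to an event
    labels = []
    for ev_time, ev_price, ev_volume in events:
        best = None  # (time gap, trade index) of the closest compatible trade
        for i, (tr_time, tr_price, tr_volume) in enumerate(trades):
            if i in used or tr_price != ev_price or tr_volume != ev_volume:
                continue
            gap = abs(tr_time - ev_time)
            if gap <= time_interval and (best is None or gap < best[0]):
                best = (gap, i)
        if best is None:
            labels.append("Cancellation")
        else:
            used.add(best[1])
            labels.append("MO")
    return labels


events = [(10.0, 50.0, 5), (12.0, 50.0, 5), (30.0, 51.0, 2)]
trades = [(11.0, 50.0, 5)]
labels = label_events(events, trades)
print(labels)
```

In this toy input only the first event finds a trade, so the other two are treated as cancellations. The assignment is greedy in event order, which is a simplification.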

In [1]:
import pandas as pd
import os
import numpy as np
import sys
sys.path.append("../main/")
import match


In [2]:
# Load  LOB data
DIR = "../data/energia/LOB_ottobre21/LOB_ottobre21/"
filepath = os.path.abspath(DIR)
files = os.listdir(filepath)
lst_df = []

# Open all the files in the folder and concatenate them
for file in files:
    lst_df.append(match.load_data(DIR + file, del_time = False,
                                  del_spread = True, start_month = True))
df = pd.concat(lst_df)
df.reset_index(inplace = True, drop = True)

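# Add the columns used by the cleaning step; -999 in "Quote" is the
# placeholder that is later used to drop orders outside the 10 best quotes
df["Quote"] = df["BidVolume_0"] * 0 - 999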
df["Type"] = df["BidVolume_0"] * 0
df["Sign"] = df["Type"]
df["Price"] = df["Type"]
df["Volume"] = df["Type"]

In [3]:
# Load trades
DIR_1 = "../data/energia/trade_ottobre2021_nuovo/trade_ottobre2021/"
filepath = os.path.abspath(DIR_1) + "/"
files = os.listdir(filepath)
lst_df = []

# Open all the files in the folder and concatenate them
for file in files:
    lst_df.append(match.load_trade_data(DIR_1 + file, start_month = True))
df_t = pd.concat(lst_df)
df_t.reset_index(inplace = True, drop = True)

In [4]:
# Differentiate LO, cancellations/MO and updates
match.clean_data(df)
# Model updates as a cancellation and a LO
df_1 = match.update_df(df)
# Repeat the process once again
match.clean_data(df_1)
df_2 = match.update_df(df_1)
match.clean_data(df_2)
# Drop orders placed outside the 10 best quotes of the bid and the ask
df_2 = df_2[df_2["Quote"] != -999]
df_2.reset_index(inplace = True, drop = True)
# Differentiate cancellations and MO
matched_data = match.matching(df_2, df_t, criterion = "best matching", time_interval = 4)
matched_data.to_csv("../data/energia/order/new_best.csv")

Cleaning data...

Modifying update orders...

Cleaning data...

Modifying update orders...

Cleaning data...

Matching orders...
Number of orders without match : 1781, out of : 10293
