In [1]:
import pandas as pd
import sys, os
sys.path.append("../main/")
import match
import ZI
import matplotlib.pyplot as plt

## Order matching

First, I load the DataFrame containing the state of the LOB and the DataFrame of trades,
using the functions load_data and load_trade_data (I will explain how to use them in a separate notebook later on).

In [2]:
# load the LOB DataFrame for 1 October 2021
DIR = "../data/LOB_01_10.csv"
# load_data needs an absolute path to work correctly
filepath = os.path.abspath(DIR)
df_o = match.load_data(filepath, start_month = True)


# load the trades DataFrame for 1 October 2021
DIR_1 = "../data/trade_01_10.csv"
filepath = os.path.abspath(DIR_1)
df_t = match.load_trade_data(filepath, start_month = True)

In [3]:
df_o.head()

,Datetime,BidPrice_0,BidVolume_0,AskPrice_0,AskVolume_0,BidPrice_1,BidVolume_1,AskPrice_1,AskVolume_1,BidPrice_2,...,BidVolume_8,AskPrice_8,AskVolume_8,BidPrice_9,BidVolume_9,AskPrice_9,AskVolume_9,MidPrice,Spread,Seconds
0,2021-10-01 06:00:01.630,10850.0,1.0,14000.0,1.0,5200.0,1.0,0.0,0.0,4800.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,12425.0,3150.0,21601.63
1,2021-10-01 06:00:19.222,10850.0,1.0,14000.0,1.0,9380.0,1.0,0.0,0.0,5200.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,12425.0,3150.0,21619.222
2,2021-10-01 06:02:37.526,13000.0,1.0,14000.0,1.0,10850.0,1.0,0.0,0.0,9380.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,13500.0,1000.0,21757.526
3,2021-10-01 06:03:29.627,13000.0,1.0,14000.0,2.0,10850.0,1.0,0.0,0.0,9380.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,13500.0,1000.0,21809.627
4,2021-10-01 06:03:56.548,13000.0,1.0,13900.0,1.0,10850.0,1.0,14000.0,1.0,9380.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,13450.0,900.0,21836.548
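The MidPrice, Spread and Seconds columns appear to be derived from the best quotes and the timestamp: in the first row, (10850 + 14000) / 2 = 12425, 14000 - 10850 = 3150, and 06:00:01.630 is 21601.63 seconds after midnight. A minimal sketch of that derivation, assuming the column names shown above and that both best quotes are present (the helper add_derived_columns is hypothetical, not the code of load_data):

```python
import pandas as pd

def add_derived_columns(lob: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: recompute MidPrice, Spread and Seconds from the best quotes."""
    out = lob.copy()
    out["MidPrice"] = (out["BidPrice_0"] + out["AskPrice_0"]) / 2
    out["Spread"] = out["AskPrice_0"] - out["BidPrice_0"]
    dt = pd.to_datetime(out["Datetime"])
    # seconds elapsed since midnight of the same day
    out["Seconds"] = (dt - dt.dt.normalize()).dt.total_seconds()
    return out

# two rows copied from the df_o.head() output above
sample = pd.DataFrame({
    "Datetime": ["2021-10-01 06:00:01.630", "2021-10-01 06:03:56.548"],
    "BidPrice_0": [10850.0, 13000.0],
    "AskPrice_0": [14000.0, 13900.0],
})
result = add_derived_columns(sample)
print(result[["MidPrice", "Spread", "Seconds"]])
```

The recomputed values match the MidPrice, Spread and Seconds columns of the corresponding rows of df_o.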


Next, using the clean_data function, I extracted the following information from the state of the book:

1. Order volume
2. Order price
3. Order sign (+1 for orders on the bid side, -1 for orders on the ask side)
4. Quote (book level) at which the order was placed, with 0 being the best quote
5. Order type:
    - NoBest: order falling outside the 10 best ask and bid quotes
    - Limit: limit order
    - Market/Cancel: order that can be either a trade or a cancellation
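To make the idea concrete, here is a simplified sketch of how a single event can be inferred from two consecutive book snapshots. It is not clean_data itself: it assumes each side is given as a list of (price, volume) pairs ordered from the best quote, that at most one visible level changes between the two snapshots, and it uses the sign convention of the clean_data output (+1 bid, -1 ask). A volume increase is read as a limit order, a decrease as an ambiguous Market/Cancel event. The function name infer_event is hypothetical.

```python
from typing import List, Optional, Tuple

Level = Tuple[float, float]  # (price, volume), best quote first

def infer_event(prev_bids: List[Level], prev_asks: List[Level],
                curr_bids: List[Level], curr_asks: List[Level]) -> Optional[dict]:
    """Hypothetical sketch: infer one order event from two snapshots of the visible book."""
    for sign, prev, curr in ((1.0, prev_bids, curr_bids), (-1.0, prev_asks, curr_asks)):
        # ignore empty placeholder levels (volume 0)
        prev_vol = {p: v for p, v in prev if v > 0}
        curr_vol = {p: v for p, v in curr if v > 0}
        for price in sorted(set(prev_vol) | set(curr_vol)):
            dv = curr_vol.get(price, 0.0) - prev_vol.get(price, 0.0)
            if dv == 0:
                continue
            # level index (0 = best quote) in the snapshot where the price is present
            book = curr if dv > 0 else prev
            quote = [p for p, _ in book].index(price)
            return {"Price": price, "Volume": abs(dv), "Sign": sign, "Quote": quote,
                    "Type": "Limit" if dv > 0 else "Market/Cancel"}
    return None  # no change among the visible levels

# ask volume at 14000 goes from 1 to 2 (rows 2 -> 3 of df_o)
ev = infer_event([(13000.0, 1.0), (10850.0, 1.0)], [(14000.0, 1.0)],
                 [(13000.0, 1.0), (10850.0, 1.0)], [(14000.0, 2.0)])
print(ev)
```

For this pair of snapshots the sketch returns a limit order of volume 1 at 14000 on the ask side at quote 0, the same values clean_data reports in row 2 of its output. Real snapshots can contain several simultaneous changes, or levels entering and leaving the top 10, which this sketch does not handle.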

In [4]:
new_df = match.clean_data(df_o)

The clean_data function takes the LOB DataFrame as input and returns the following DataFrame:

In [5]:
new_df.head()

,Price,Volume,Sign,Quote,Type,DateTime,Seconds,Spread,MidPrice,AskVolume_0,BidVolume_0
0,9380.0,1.0,1.0,1,Limit,2021-10-01 06:00:19.222,21619.222,3150.0,12425.0,1.0,1.0
1,13000.0,1.0,1.0,0,Limit,2021-10-01 06:02:37.526,21757.526,1000.0,13500.0,1.0,1.0
2,14000.0,1.0,-1.0,0,Limit,2021-10-01 06:03:29.627,21809.627,1000.0,13500.0,2.0,1.0
3,13900.0,1.0,-1.0,0,Market/Cancel,2021-10-01 06:03:56.548,21836.548,900.0,13450.0,1.0,1.0
4,13700.0,2.0,-1.0,0,Limit,2021-10-01 06:05:42.208,21942.208,700.0,13350.0,2.0,1.0


In [6]:
new_df["Type"].value_counts()

Limit            5921
Market/Cancel    5150
NoBest           1138
Name: Type, dtype: int64

Finally, to match the orders I use the matching function:

### Input

1. order_df: pd.DataFrame
    - DataFrame of orders, cleaned with the clean_data function.
2. trade_df: pd.DataFrame
    - DataFrame of trades.
3. criterion: {"time", "time price", "time volume", "time price sign"} (default = "time"):
    - Criterion used for matching.
4. time_interval: int (default = 5)
    - All orders within ±time_interval seconds of a trade are considered as matching candidates.

### Output
1. match_df: pd.DataFrame
    - DataFrame in which trades and cancellations are distinguished.
2. no_match: int
    - Number of trades for which no match could be found.
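The matching step can be pictured with the following sketch. It is hypothetical and not the implementation of match.matching: it assumes the trades DataFrame has Seconds, Price and Sign columns in the same conventions as the order DataFrame (in particular, that a trade's Sign identifies the book side it consumed). For each trade, it marks as Market the Market/Cancel row closest in time within ±time_interval seconds, optionally requiring equal price and sign; every ambiguous row left unmatched becomes Cancel, and trades with no candidate are counted.

```python
import pandas as pd

def match_trades(orders: pd.DataFrame, trades: pd.DataFrame, time_interval: float = 5.0,
                 use_price: bool = True, use_sign: bool = True):
    """Hypothetical sketch: split Market/Cancel rows into Market and Cancel."""
    out = orders.copy()
    ambiguous = out["Type"] == "Market/Cancel"
    out.loc[ambiguous, "Type"] = "Cancel"  # unmatched ambiguous rows end up as cancellations
    used = set()  # order rows already matched to a trade
    no_match = 0
    for _, trade in trades.iterrows():
        # candidates: ambiguous rows within +/- time_interval seconds of the trade
        mask = ambiguous & ((out["Seconds"] - trade["Seconds"]).abs() <= time_interval)
        if use_price:
            mask &= out["Price"] == trade["Price"]
        if use_sign:
            mask &= out["Sign"] == trade["Sign"]
        candidates = out.index[mask.to_numpy()].difference(list(used))
        if len(candidates) == 0:
            no_match += 1
            continue
        # pick the candidate closest in time to the trade
        best = (out.loc[candidates, "Seconds"] - trade["Seconds"]).abs().idxmin()
        out.loc[best, "Type"] = "Market"
        used.add(best)
    return out, no_match

# toy data: three ambiguous rows and one limit order; one trade matches, one does not
orders = pd.DataFrame({
    "Seconds": [10.0, 12.0, 30.0, 11.0],
    "Price": [100.0, 100.0, 100.0, 100.0],
    "Sign": [-1.0, -1.0, -1.0, -1.0],
    "Type": ["Market/Cancel", "Market/Cancel", "Market/Cancel", "Limit"],
})
trades = pd.DataFrame({"Seconds": [11.5, 100.0], "Price": [100.0, 100.0], "Sign": [-1.0, -1.0]})
result, n_unmatched = match_trades(orders, trades, time_interval=4)
print(result["Type"].tolist(), n_unmatched)
```

Choosing the closest candidate in time and never reusing a row keeps each trade mapped to at most one order; the real function may resolve ties and multiple candidates differently.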

In [7]:
matched_data, no_match = match.matching(new_df, df_t, criterion = "time price sign", time_interval = 4)

In [8]:
matched_data["Type"].value_counts()

Limit     5921
Cancel    4685
NoBest    1138
Market     465
Name: Type, dtype: int64
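Since matching only splits the ambiguous Market/Cancel rows, the two value counts should be consistent: 465 Market + 4685 Cancel = 5150, the number of Market/Cancel rows before matching, while Limit (5921) and NoBest (1138) are unchanged. A small check of this invariant, using the counts printed above (check_split is a hypothetical helper):

```python
import pandas as pd

def check_split(before: pd.Series, after: pd.Series) -> bool:
    """before/after: Type value counts before and after matching."""
    unchanged = all(before[t] == after[t] for t in ("Limit", "NoBest"))
    split_ok = after["Market"] + after["Cancel"] == before["Market/Cancel"]
    return bool(unchanged and split_ok)

before = pd.Series({"Limit": 5921, "Market/Cancel": 5150, "NoBest": 1138})
after = pd.Series({"Limit": 5921, "Cancel": 4685, "NoBest": 1138, "Market": 465})
print(check_split(before, after))  # True
```

In the notebook the same check can be run directly as `check_split(new_df["Type"].value_counts(), matched_data["Type"].value_counts())`.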

In [9]:
print(f"Number of trades without a match: {no_match}")

Number of trades without a match: 245


Once the orders have been matched, I can estimate the parameters of the ZI (zero-intelligence) model:

In [10]:
# masks for events at the best quote (Quote == 0)
best = matched_data["Quote"] == 0
limit_best = best & (matched_data["Type"] == "Limit")
X_lo = matched_data.loc[limit_best, "Volume"]  # limit-order volumes at the best quote
spr = matched_data.loc[limit_best, "Spread"]   # spread recorded with each of those limit orders
X_mo = matched_data.loc[matched_data["Type"] == "Market", "Volume"]  # market-order volumes
X_c = matched_data.loc[best & (matched_data["Type"] == "Cancel"), "Volume"]  # cancellations at the best quote
# V: average volume at the best quotes (mean of best ask and best bid volumes)
V = (matched_data["AskVolume_0"].mean() + matched_data["BidVolume_0"].mean()) / 2
# estimate the ZI parameters
lam, mu, delta = ZI.estimate_parameters(X_lo, X_mo, X_c, spr, V)
print(f"lambda = {lam:.4f}, mu = {mu:.3f}, delta = {delta:.2f}")

lambda = 0.0029, mu = 0.061, delta = 0.14
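For intuition about the three estimates: in a zero-intelligence model, limit orders, market orders and cancellations are independent Poisson flows, so the type of the next event is drawn with probability proportional to its rate. The sketch below assumes a specific convention (limit orders arriving at rate lam on each eligible price level, market orders at rate mu, each unit of resting volume cancelled at rate delta); whether ZI.estimate_parameters uses exactly these units is an assumption, and the function next_event_probabilities is hypothetical. It uses round toy values rather than the estimates above.

```python
import random
from typing import Dict

def next_event_probabilities(lam: float, mu: float, delta: float,
                             n_levels: int, resting_volume: float) -> Dict[str, float]:
    """Hypothetical sketch: probability of each event type under assumed ZI rate conventions."""
    rates = {
        "Limit": lam * n_levels,           # limit orders on each eligible price level
        "Market": mu,                      # market orders
        "Cancel": delta * resting_volume,  # each resting unit can be cancelled
    }
    total = sum(rates.values())
    return {event: rate / total for event, rate in rates.items()}

probs = next_event_probabilities(lam=1.0, mu=2.0, delta=0.5, n_levels=2, resting_volume=4.0)
# sample the type of the next event
event = random.choices(list(probs), weights=list(probs.values()))[0]
print(probs, event)
```

With these toy values all three rates equal 2, so each event type has probability 1/3.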
