# Optimal Split Of Orders Across Liquidity Pools

In this notebook, we address the challenge of optimally distributing orders across dark pools to enhance trading efficiency and reduce market impact. Dark pools, which are private trading venues where large orders can be executed discreetly, offer key benefits such as minimizing price slippage and lowering market impact. However, effectively managing order allocation across multiple dark pools is a complex problem that involves careful evaluation of liquidity, execution costs, and timing.

## Dark Pool Definition / Modelling

A dark pool works as follows:
- It proposes a bid price for an OTC transaction, with no guarantee on the executed quantity.
- This price is usually lower than the bid price offered on the regular market.

The impact of $N$ dark pools ($N>1$) on a given transaction can be modeled as follows. Let:
- $V>0$ be the random volume to be executed,
- $\theta_i \in (0, 1)$ be the discount factor proposed by dark pool $i$,
- $r_i$ denote the fraction of $V$ sent to dark pool $i$ for execution,
- $D_i \geq 0$ be the quantity of securities that can be delivered (or made available) by dark pool $i$ at the discounted price $\theta_i S$.

The remainder of the order is executed on the regular market, at price $S$.

## Cost of the executed order

The **cost** of the whole executed order, across the $N$ dark pools and the regular market, is then given by:



$$C = S \sum_{i=1}^{N} \theta_i \min(r_i V, D_i) + S \left( V - \sum_{i=1}^{N} \min(r_i V, D_i) \right)$$


$$= S \left( V - \sum_{i=1}^{N} \rho_i \min(r_i V, D_i) \right)$$

where

$$\rho_i = 1 - \theta_i \in (0,1), \quad i = 1, \dots, N$$
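As a quick sanity check, the two expressions for $C$ above can be evaluated side by side. The sketch below is only illustrative: the function names and the numerical inputs (three pools, price 100, volume 1000) are assumptions and do not come from the data set used later in the notebook.

```python
import numpy as np

def execution_cost(S, V, r, theta, D):
    """Cost of one order: discounted fills in the pools plus the remainder at price S."""
    r, theta, D = (np.asarray(x, dtype=float) for x in (r, theta, D))
    filled = np.minimum(r * V, D)  # quantity actually executed in each pool
    return S * np.sum(theta * filled) + S * (V - filled.sum())

def execution_cost_compact(S, V, r, theta, D):
    """Same cost, written with rho_i = 1 - theta_i."""
    r, theta, D = (np.asarray(x, dtype=float) for x in (r, theta, D))
    rho = 1.0 - theta
    return S * (V - np.sum(rho * np.minimum(r * V, D)))

# Hypothetical inputs, for illustration only
S, V = 100.0, 1000.0
r = [0.5, 0.3, 0.2]          # fractions sent to each pool, summing to 1
theta = [0.99, 0.98, 0.97]   # discount factors
D = [400.0, 500.0, 100.0]    # quantities each pool can deliver

cost = execution_cost(S, V, r, theta, D)
cost_compact = execution_cost_compact(S, V, r, theta, D)
print(cost, cost_compact)
```

With these inputs the pools fill 400, 300 and 100 units, so 200 units go to the regular market, and both functions return the same cost.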

## Maximization Problem


**Minimizing the mean execution cost**, *given the price $S$*, amounts to solving the following **maximization problem**:


$$\max \left\{ \sum_{i=1}^{N} \rho_i \mathbb{E}(S \min(r_i V, D_i)), r \in \mathcal{P}_N \right\}$$

where

$$\mathcal{P}_N := \left\{ r = (r_i)_{1 \leq i \leq N} \in \mathbb{R}_+^N \mid \sum_{i=1}^{N} r_i = 1 \right\}$$
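The objective above is an expectation, so for a given allocation $r$ it can be estimated by averaging over samples of $(V, D_1, \dots, D_N)$. The following is a minimal sketch under stated assumptions: the function name `mean_discount_gain`, the lognormal samples and the values of $\rho_i$ and $S$ are all hypothetical and only serve to show the computation.

```python
import numpy as np

def mean_discount_gain(r, rho, S, V, D):
    """Monte Carlo estimate of sum_i rho_i * E[S * min(r_i V, D_i)].

    V has shape (n,), D has shape (n, N), r and rho have shape (N,).
    """
    r, rho = np.asarray(r, dtype=float), np.asarray(rho, dtype=float)
    if np.any(r < 0) or not np.isclose(r.sum(), 1.0):
        raise ValueError("r must lie in the simplex P_N")
    filled = np.minimum(V[:, None] * r[None, :], D)  # shape (n, N)
    return float(np.sum(rho * (S * filled).mean(axis=0)))

# Synthetic samples (assumption: lognormal volumes, not the notebook's data)
rng = np.random.default_rng(0)
n, N = 10_000, 3
V_samples = rng.lognormal(mean=7.0, sigma=0.5, size=n)
D_samples = rng.lognormal(mean=6.0, sigma=0.5, size=(n, N))
rho = np.array([0.01, 0.02, 0.03])
S = 100.0

gain_uniform = mean_discount_gain(np.full(N, 1 / N), rho, S, V_samples, D_samples)
gain_single = mean_discount_gain([0, 0, 1], rho, S, V_samples, D_samples)
print(gain_uniform, gain_single)
```

Comparing a few allocations this way gives a feel for the trade-off: sending everything to the pool with the best discount is capped by that pool's $D_i$.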

It is then convenient to absorb the price $S$ into both random variables $V$ and $D_i$ by considering:


$$\widetilde{V} := V S \quad \text{and} \quad \widetilde{D_i} := D_i S$$

instead of $V$ and $D_i$. In what follows the tildes are dropped, so $V$ and $D_i$ denote these rescaled quantities.

## Resolution - Optimal allocation of orders among N dark pools


We set for every $r = (r_1, \dots, r_N) \in \mathcal{P}_N$,


$$\Phi(r_1, \dots, r_N) :=\sum_{i=1}^{N} \varphi_i(r_i)$$

where for every $i \in I_N = \{1, \dots, N\}$,


$$\varphi_i(u) := \rho_i \mathbb{E}[\min(u V, D_i)], \quad u \in [0,1]$$

We can formally extend $\Phi$ to the whole affine hyperplane spanned by $\mathcal{P}_N$, i.e.,


$$\mathcal{H}_N := \left\{ r = (r_1, \dots, r_N) \in \mathbb{R}^N \mid \sum_{i=1}^{N} r_i = 1 \right\}$$
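Each $\varphi_i$ is concave in $u$, since $u \mapsto \min(uV, D_i)$ is a minimum of two affine functions and expectation preserves concavity; hence $\Phi$ is concave on $\mathcal{P}_N$. For $N = 2$ the simplex is a segment, so the maximizer can be located by a plain grid search. The sketch below does this on synthetic samples; it is a brute-force illustration under assumed distributions and parameters, not the resolution method of this notebook.

```python
import numpy as np

def phi(u, rho_i, V, D_i):
    """Empirical phi_i(u) = rho_i * mean(min(u * V, D_i))."""
    return rho_i * np.minimum(u * V, D_i).mean()

# Synthetic samples (assumption: lognormal volumes, two pools)
rng = np.random.default_rng(1)
n = 5_000
V = rng.lognormal(7.0, 0.5, n)
D = rng.lognormal(6.0, 0.5, (n, 2))
rho = np.array([0.01, 0.02])

# Parametrize the simplex P_2 by r = (u, 1 - u), u in [0, 1]
grid = np.linspace(0.0, 1.0, 101)
Phi_vals = np.array([
    phi(u, rho[0], V, D[:, 0]) + phi(1.0 - u, rho[1], V, D[:, 1])
    for u in grid
])

r1_best = grid[np.argmax(Phi_vals)]
r_best = np.array([r1_best, 1.0 - r1_best])

# Discrete second differences of a concave function are non-positive
second_diff = Phi_vals[2:] - 2 * Phi_vals[1:-1] + Phi_vals[:-2]
print(r_best, second_diff.max())
```

A grid search scales poorly with $N$, which is one reason to look for a characterization of the optimum on $\mathcal{H}_N$ instead.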





In [55]:
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
from sklearn.linear_model import LinearRegression

In [56]:
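data_folder = "/Users/antoinehedouis/Documents/Télécom Paris/3A/semestre 2/ML trading /TP Machine Learning & Optimal Trading/StockData"
files = os.listdir(data_folder)
# Load every HDF5 file of the folder, keyed by lowercase stock name
DATAS = {}
for f in files:
    if f.endswith(".h5"):  # skip anything that is not a data file
        DATAS[f.replace(".h5", "").lower()] = pd.read_hdf(os.path.join(data_folder, f))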

DATAS['lvmh'].head()

| Time | TradedPrice | TradedQty | BidPrice | AskPrice | BidQty | AskQty | TradedSign |
|---|---|---|---|---|---|---|---|
| 2011-01-03 09:00:01.616 | 124.5 | 75 | 124.0 | 124.05 | 52600 | 63900 | 2 |
| 2011-01-03 09:00:01.616 | 124.5 | 42 | 124.0 | 124.05 | 52600 | 63900 | 2 |
| 2011-01-03 09:00:01.616 | 124.5 | 12 | 124.0 | 124.05 | 52600 | 63900 | 2 |
| 2011-01-03 09:00:01.616 | 124.5 | 1 | 124.0 | 124.05 | 52600 | 63900 | 2 |
| 2011-01-03 09:00:01.616 | 124.5 | 3 | 124.0 | 124.05 | 52600 | 63900 | 2 |


In [57]:
FR_stocks = {'bouygues','lvmh','sanofi','total'}
US_stocks = {'amazon','apple','google'}
JPN_stocks = {'canon','panasonic','sony'}

FR_schedule = pd.date_range(start = "09:00",end = "17:30",freq = "5min").strftime("%H:%M")
US_schedule = pd.date_range(start = "09:30",end = "16:00",freq = "5min").strftime("%H:%M")
JPN_schedule= pd.date_range(start = "09:00",end = "15:00",freq = "5min").strftime("%H:%M")

In [58]:
# Keep March 1-15 for each French stock; .copy() avoids pandas' SettingWithCopyWarning below
datas_2W = {stk: DATAS[stk][(DATAS[stk].index.month == 3) & (DATAS[stk].index.day <= 15)].copy() for stk in FR_stocks}
for stk in FR_stocks:
    datas_2W[stk]['MidPrice'] = 0.5 * (datas_2W[stk].BidPrice + datas_2W[stk].AskPrice)
datas_2W['lvmh']

| Time | TradedPrice | TradedQty | BidPrice | AskPrice | BidQty | AskQty | TradedSign | MidPrice |
|---|---|---|---|---|---|---|---|---|
| 2011-03-01 09:00:01.220 | 115.20 | 35 | 115.0 | 115.2 | 1900 | 107600 | 1 | 115.1 |
| 2011-03-01 09:00:01.220 | 115.20 | 458 | 115.0 | 115.2 | 1900 | 107600 | 1 | 115.1 |
| 2011-03-01 09:00:01.220 | 115.20 | 213 | 115.0 | 115.2 | 1900 | 107600 | 1 | 115.1 |
| 2011-03-01 09:00:01.220 | 115.20 | 5 | 115.0 | 115.2 | 1900 | 107600 | 1 | 115.1 |
| 2011-03-01 09:00:01.220 | 115.20 | 115 | 115.0 | 115.2 | 1900 | 107600 | 1 | 115.1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2011-03-15 17:29:55.765 | 104.25 | 215 | 104.3 | 104.5 | 113100 | 110900 | -2 | 104.4 |
| 2011-03-15 17:29:55.765 | 104.25 | 200 | 104.3 | 104.5 | 113100 | 110900 | -2 | 104.4 |
| 2011-03-15 17:29:55.770 | 104.25 | 239 | 104.3 | 104.5 | 113100 | 110900 | -2 | 104.4 |
| 2011-03-15 17:29:55.770 | 104.25 | 3665 | 104.3 | 104.5 | 113100 | 110900 | -2 | 104.4 |


In [59]:
# Asset to execute; the other French stocks play the role of the pools
Asset = 'sanofi'
pools = [stk for stk in FR_stocks if stk != Asset]
T = "5Min"
# Total traded quantity and mean trade size per 5-minute interval
traded_qty_T = {stk: datas_2W[stk].TradedQty.resample(T).sum() for stk in FR_stocks}
traded_qty_mean = {stk: datas_2W[stk].TradedQty.resample(T).mean() for stk in FR_stocks}
resampled = {stk: pd.DataFrame({'Traded_Qty_Sum': traded_qty_T[stk], 'Traded_Qty_Mean': traded_qty_mean[stk]}) for stk in FR_stocks}
for key, val in traded_qty_T.items():
    print(key, " : \n", val)

total  : 
 Time
2011-03-01 09:00:00    161729
2011-03-01 09:05:00     25352
2011-03-01 09:10:00     33412
2011-03-01 09:15:00     30037
2011-03-01 09:20:00     22054
                        ...  
2011-03-15 17:05:00    124949
2011-03-15 17:10:00    118922
2011-03-15 17:15:00    170499
2011-03-15 17:20:00    122560
2011-03-15 17:25:00    176210
Freq: 5T, Name: TradedQty, Length: 4134, dtype: int32
bouygues  : 
 Time
2011-03-01 09:00:00     9966
2011-03-01 09:05:00     8318
2011-03-01 09:10:00     5684
2011-03-01 09:15:00    18543
2011-03-01 09:20:00     3108
                       ...  
2011-03-15 17:05:00    26567
2011-03-15 17:10:00    32744
2011-03-15 17:15:00    25606
2011-03-15 17:20:00    34572
2011-03-15 17:25:00    53352
Freq: 5T, Name: TradedQty, Length: 4134, dtype: int32
sanofi  : 
 Time
2011-03-01 09:00:00     54821
2011-03-01 09:05:00     15004
2011-03-01 09:10:00      9765
2011-03-01 09:15:00     17556
2011-03-01 09:20:00     10058
                        ...  
2011-03-15 

In [85]:
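# [alpha, beta] per pool: alpha weights the pool's own volume, beta scales the deliverable quantity
alpha_beta = {}

alpha_beta['bouygues'] = [0.4, 0.1]
alpha_beta['total'] = [0.6, 0.2]
alpha_beta['lvmh'] = [0.8, 0.3]
for stk in pools:
    alpha, beta = alpha_beta[stk]
    print(traded_qty_T[stk], traded_qty_T[Asset])
    # D_stk = beta * ((1 - alpha) * V_asset + alpha * V_stk * mean_asset / mean_stk),
    # where the means are the per-interval mean trade sizes
    d_stk = beta * ((1 - alpha) * traded_qty_T[Asset] + alpha * traded_qty_T[stk] * (traded_qty_mean[Asset] / traded_qty_mean[stk]))
    resampled[stk]['D'] = d_stk

# Total quantity deliverable across the three pools
resampled['sanofi']['D'] = sum(resampled[stk]['D'] for stk in pools)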



Time
2011-03-01 09:00:00    161729
2011-03-01 09:05:00     25352
2011-03-01 09:10:00     33412
2011-03-01 09:15:00     30037
2011-03-01 09:20:00     22054
                        ...  
2011-03-15 17:05:00    124949
2011-03-15 17:10:00    118922
2011-03-15 17:15:00    170499
2011-03-15 17:20:00    122560
2011-03-15 17:25:00    176210
Freq: 5T, Name: TradedQty, Length: 4134, dtype: int32 Time
2011-03-01 09:00:00     54821
2011-03-01 09:05:00     15004
2011-03-01 09:10:00      9765
2011-03-01 09:15:00     17556
2011-03-01 09:20:00     10058
                        ...  
2011-03-15 17:05:00     79219
2011-03-15 17:10:00    105930
2011-03-15 17:15:00     62523
2011-03-15 17:20:00     73409
2011-03-15 17:25:00     78694
Freq: 5T, Name: TradedQty, Length: 4134, dtype: int32
Time
2011-03-01 09:00:00     9966
2011-03-01 09:05:00     8318
2011-03-01 09:10:00     5684
2011-03-01 09:15:00    18543
2011-03-01 09:20:00     3108
                       ...  
2011-03-15 17:05:00    26567
2011-03-15 17:

In [80]:
resampled["sanofi"]

| Time | Traded_Qty_Sum | Traded_Qty_Mean | D |
|---|---|---|---|
| 2011-03-01 09:00:00 | 54821 | 365.473333 | 54733.286400 |
| 2011-03-01 09:05:00 | 15004 | 230.830769 | 11338.407385 |
| 2011-03-01 09:10:00 | 9765 | 191.470588 | 10975.094118 |
| 2011-03-01 09:15:00 | 17556 | 201.793103 | 17806.223448 |
| 2011-03-01 09:20:00 | 10058 | 137.780822 | 7671.636164 |
| ... | ... | ... | ... |
| 2011-03-15 17:05:00 | 79219 | 208.471053 | 39767.938000 |
| 2011-03-15 17:10:00 | 105930 | 208.113949 | 46292.866798 |
| 2011-03-15 17:15:00 | 62523 | 237.730038 | 39548.769125 |
| 2011-03-15 17:20:00 | 73409 | 151.047325 | 32142.870782 |
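The `D` column is a sum of per-pool quantities of the form `beta * ((1 - alpha) * V_asset + alpha * V_pool * ratio)`. The sketch below isolates that formula in a helper, here called `pool_capacity` (a hypothetical name), and runs it on a tiny synthetic series. Note one deliberate difference, which is an assumption of this sketch: the ratio is taken between the overall means of the two volume series rather than between per-interval mean trade sizes as in the cell above. With that choice the mean of `D` equals `beta` times the mean of the asset's volume.

```python
import pandas as pd

def pool_capacity(v_asset, v_other, mean_asset, mean_other, alpha, beta):
    """beta * ((1 - alpha) * v_asset + alpha * v_other * mean_asset / mean_other)."""
    return beta * ((1.0 - alpha) * v_asset + alpha * v_other * (mean_asset / mean_other))

# Synthetic 5-minute volumes, for illustration only
idx = pd.date_range("2011-03-01 09:00", periods=4, freq="5min")
v_asset = pd.Series([100.0, 200.0, 150.0, 50.0], index=idx)
v_other = pd.Series([10.0, 40.0, 30.0, 20.0], index=idx)

# alpha = 0: the pool capacity is just a fixed share beta of the asset's volume
d_alpha0 = pool_capacity(v_asset, v_other, 1.0, 1.0, alpha=0.0, beta=0.2)

# alpha = 0.6: the capacity mixes the asset's volume with the rescaled other volume
d_mixed = pool_capacity(v_asset, v_other, v_asset.mean(), v_other.mean(), alpha=0.6, beta=0.2)
print(d_mixed)
```

Here `alpha` controls how much the pool's capacity follows the other stock rather than the asset itself, and `beta` sets its overall size.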
