# Generating synthetic metaorders from synthetic trade data

<div style="font-size:120%">

We provide below the code used in [Generating realistic metaorders from synthetic data](https://arxiv.org/pdf/2503.18199).  
As highlighted in the article, generating synthetic metaorders from a synthetic price does **not** yield the observed square-root law (SQL). However, replacing the synthetic price with **real trade data**, while keeping the same algorithm, recovers the results presented in the paper. 

When working with real trade data, we recommend the following preprocessing steps:

- Divide each trade size by the total volume traded that day  
- Take the logarithm of each price and divide it by that day's volatility

</div>
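Written as formulas, the two preprocessing steps and the metaorder features computed further down read as follows. The notation ($q_i$, $m_i$, $V_d$, $\sigma_d$, $\epsilon$) is introduced here for illustration only: $q_i$ is the size of trade $i$, $m_i$ its mid price, and $V_d$, $\sigma_d$ the traded volume and volatility of its day $d$.

$$\tilde q_i = \frac{q_i}{V_d}, \qquad \tilde m_i = \frac{\log m_i}{\sigma_d}.$$

For a metaorder of sign $\epsilon = \pm 1$ executed within day $d$ through child trades $i \in \mathcal{M}$:

$$Q = \sum_{i \in \mathcal{M}} \tilde q_i, \qquad \mathcal{I} = \epsilon\,\big(\tilde m_{\text{end}} - \tilde m_{\text{begin}}\big) = \frac{\epsilon\,(\log m_{\text{end}} - \log m_{\text{begin}})}{\sigma_d}.$$

Because both endpoints lie in the same day, the square-root law $\epsilon\,\Delta \log m \approx Y \sigma_d \sqrt{Q V_d / V_d}$ becomes, in these rescaled units, simply $\mathcal{I} \approx Y \sqrt{Q}$, with a prefactor $Y$ of order one in empirical studies.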


In [150]:
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
from scipy.stats import powerlaw

# Formatting the data

We expect the trade data to include at least the following columns:
- `timestamp` (date and time of the trade)
- `sign` (trade direction: +1 for a buy, -1 for a sell; a `side` column must be converted to this convention)
- `quantity` (trade size)
- `midprice` (mid price just before the trade is executed)
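If the raw data come with a textual `side` column, they can be mapped onto this format roughly as follows. This is a sketch: the raw column names `time`, `side`, `size` and `mid`, and the `'buy'`/`'sell'` labels, are hypothetical and should be adapted to the actual data source.

```python
import numpy as np
import pandas as pd

def to_expected_format(raw: pd.DataFrame) -> pd.DataFrame:
    """Map a raw trade table onto the columns expected by this notebook.

    Hypothetical input schema: 'time' (parsable date-time), 'side' ('buy'/'sell'),
    'size' (trade size) and 'mid' (mid price just before the trade).
    """
    side = raw['side'].str.lower()
    if not side.isin(['buy', 'sell']).all():
        raise ValueError("'side' must contain only 'buy' or 'sell'")
    out = pd.DataFrame({
        'timestamp': pd.to_datetime(raw['time']),
        'sign': np.where(side == 'buy', 1, -1),  # +1 for a buy, -1 for a sell
        'quantity': raw['size'].astype(float),
        'midprice': raw['mid'].astype(float),
    })
    return out.sort_values('timestamp').reset_index(drop=True)

# Usage on two made-up trades
raw_example = pd.DataFrame({
    'time': ['2024-01-01 09:30:00', '2024-01-01 09:30:01'],
    'side': ['buy', 'SELL'],
    'size': [100, 50],
    'mid': [10.0, 10.01],
})
formatted = to_expected_format(raw_example)
print(formatted)
```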


If the trade sign is not available, it can be inferred by labelling a trade as a buy (or sell) if its price is higher (or lower) than the midprice. The SQL also holds when trade prices are used directly instead of the midprice.
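The labelling rule above (often called the quote rule) can be sketched in a few lines. This assumes a hypothetical `price` column holding the trade price; trades executed exactly at the mid get sign 0 here, since the rule cannot classify them.

```python
import numpy as np
import pandas as pd

def infer_sign(trades: pd.DataFrame) -> pd.Series:
    """Quote rule: +1 if the trade price is above the mid, -1 if below, 0 if at the mid.

    Assumes a hypothetical 'price' column with the trade price.
    """
    return np.sign(trades['price'] - trades['midprice']).astype(int).rename('sign')

example = pd.DataFrame({'price': [10.02, 9.98, 10.00], 'midprice': [10.00, 10.00, 10.00]})
example['sign'] = infer_sign(example)
print(example['sign'].tolist())  # [1, -1, 0]
```

Trades with sign 0 are ambiguous; they can be dropped or classified with another rule before running the rest of the notebook, which expects signs of +1 or -1.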



In [None]:
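### Generate a synthetic price series as an example
timestamp = pd.date_range(start='2024-01-01', end='2024-01-08', freq='1s')
nb_trades = len(timestamp)
sign = 2*np.random.randint(0, 2, nb_trades) - 1  # random trade signs: +1 (buy) or -1 (sell)
qty = np.random.randint(10, 100, nb_trades)      # random trade sizes
vol = np.random.randint(10, 100, nb_trades)      # random variance of each price increment (illustrative choice)
price = np.cumsum(sign*vol**0.5)                 # random walk driven by the trade signs
price = price - np.min(price) + 1                # shift so that all prices are positive (needed for the log)
trades = pd.DataFrame({'timestamp': timestamp, 'sign': sign, 'quantity': qty, 'midprice': price})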

trades.head()


|   | timestamp | sign | quantity | midprice |
|---|---|---|---|---|
| 0 | 2024-01-01 00:00:00 | 1 | 15 | 2175.223057 |
| 1 | 2024-01-01 00:00:01 | 1 | 64 | 2179.695193 |
| 2 | 2024-01-01 00:00:02 | -1 | 48 | 2176.532915 |
| 3 | 2024-01-01 00:00:03 | -1 | 37 | 2168.917142 |
| 4 | 2024-01-01 00:00:04 | -1 | 27 | 2164.226726 |


Your data should follow the format illustrated above.
Next, we normalize it using the daily volatility and the daily traded volume:

In [197]:
trades['day'] = trades['timestamp'].dt.date

# Calculate daily volatility and volume
daily_stats = trades.groupby('day').agg(
    DailySigma=('midprice', lambda x: (x.max() - x.min()) / x.iloc[0]),  # (max - min) / open
    DailyVolume=('quantity', 'sum')  # Sum of quantity for each day
).reset_index()


trades = trades.merge(daily_stats, on='day', how='left')
trades['BeginMid'] = np.log(trades['midprice']) / trades['DailySigma']  # log mid price in units of daily volatility
trades['RescaledVolume'] = trades['quantity'] / trades['DailyVolume']  # trade size as a fraction of daily volume
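A few quick checks can catch preprocessing problems before generating metaorders. For instance, a day on which the mid price never moves gives `DailySigma = 0` and hence an infinite `BeginMid`. The helper `check_normalization` below is hypothetical, and it is exercised on a made-up two-day dataset that goes through the same normalization as above.

```python
import numpy as np
import pandas as pd

def check_normalization(trades: pd.DataFrame, tol: float = 1e-9) -> None:
    """Sanity checks on the rescaled columns (hypothetical helper)."""
    daily_share = trades.groupby('day')['RescaledVolume'].sum()
    assert np.allclose(daily_share, 1.0, atol=tol), 'rescaled volumes should sum to 1 each day'
    assert (trades['DailySigma'] > 0).all(), 'a zero daily range makes BeginMid infinite'
    assert np.isfinite(trades['BeginMid']).all(), 'BeginMid contains non-finite values'

# Made-up trades over two days
toy_trades = pd.DataFrame({
    'timestamp': pd.to_datetime(['2024-01-01 10:00', '2024-01-01 11:00',
                                 '2024-01-02 10:00', '2024-01-02 12:00']),
    'quantity': [10, 30, 5, 15],
    'midprice': [100.0, 102.0, 101.0, 99.0],
})
toy_trades['day'] = toy_trades['timestamp'].dt.date
stats = toy_trades.groupby('day').agg(
    DailySigma=('midprice', lambda x: (x.max() - x.min()) / x.iloc[0]),
    DailyVolume=('quantity', 'sum'),
).reset_index()
toy_trades = toy_trades.merge(stats, on='day', how='left')
toy_trades['BeginMid'] = np.log(toy_trades['midprice']) / toy_trades['DailySigma']
toy_trades['RescaledVolume'] = toy_trades['quantity'] / toy_trades['DailyVolume']

check_normalization(toy_trades)
print(toy_trades['RescaledVolume'].tolist())  # [0.25, 0.75, 0.25, 0.75]
```

On the notebook's own data, the same call would be `check_normalization(trades)`.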

# Generating synthetic metaorders

In [None]:
def generate_meta(df, nb_traders, kind, exponent):
    '''
    Generate synthetic metaorders from trades.

    Inputs:
    df : DataFrame of trades with columns timestamp, day, sign, BeginMid, RescaledVolume
    nb_traders : number of synthetic traders
    kind : distribution of the traders' trading frequencies ('power' or 'uniform')
    exponent : exponent of the distribution (used only if kind == 'power')

    Outputs:
    metaorders : DataFrame of synthetic metaorders with more than one child order
    '''
    df = df.copy()  # avoid adding a 'trader' column to the caller's DataFrame

    ### Randomly attribute each trade to a trader with the mapping function
    df['trader'] = mapping_function(df, nb_traders, kind, exponent)

    ### A metaorder is a sequence of consecutive same-sign trades from the same trader.
    ### In our definition, a metaorder must be executed within a single day.
    sorted_trades = df.sort_values(['trader', 'timestamp']).reset_index(drop=True)
    new_meta = ((sorted_trades['trader'] != sorted_trades['trader'].shift())
                | (sorted_trades['sign'] != sorted_trades['sign'].shift())
                | (sorted_trades['day'] != sorted_trades['day'].shift()))
    sorted_trades['metaid'] = new_meta.cumsum()

    ### End point of each child: mid price and time of the same trader's next trade on the same day.
    ### Shifting within (trader, day) prevents the end point from coming from another trader or day.
    next_trade = sorted_trades.groupby(['trader', 'day'])
    sorted_trades['EndMid'] = next_trade['BeginMid'].shift(-1)
    sorted_trades['EndTime'] = next_trade['timestamp'].shift(-1)
    sorted_trades = sorted_trades.dropna(subset=['EndMid', 'EndTime'])

    ### Aggregate child trades into metaorder features
    meta = sorted_trades.groupby('metaid').agg(
        sign=('sign', 'first'),
        BeginMid=('BeginMid', 'first'),
        EndMid=('EndMid', 'last'),
        NbChild=('trader', 'count'),
        BeginTime=('timestamp', 'first'),
        EndTime=('EndTime', 'last'),
        MetaVolume=('RescaledVolume', 'sum'),
    ).reset_index(drop=True)
    meta['MetaDuration'] = (meta['EndTime'] - meta['BeginTime']).dt.total_seconds()
    meta['MetaImpact'] = (meta['EndMid'] - meta['BeginMid']) * meta['sign']

    print('Number of metaorders ', len(meta))  # all sequences, including single-child ones

    ### Return only the metaorders with more than one child
    metaorders = meta[meta['NbChild'] > 1].reset_index(drop=True)
    return metaorders


def mapping_function(df, nb_traders, kind, alpha):
    '''
    Given a trading-frequency distribution, randomly assign a trader to each trade.

    Inputs:
    df : DataFrame of trades
    nb_traders : number of traders
    kind : distribution of the traders' frequencies ('power' or 'uniform')
    alpha : exponent of the distribution (used only if kind == 'power')

    Outputs:
    traders : list of trader labels, one per trade
    '''
    ### Draw one (unnormalized) trading frequency per trader
    if kind == 'power':
        # scipy.stats.powerlaw has density alpha * x**(alpha - 1) on [0, 1], with alpha > 0
        samples = powerlaw.rvs(alpha, size=int(nb_traders))
    elif kind == 'uniform':
        samples = np.ones(int(nb_traders))
    else:
        raise ValueError("kind must be 'power' or 'uniform'")
    frequencies = samples / samples.sum()
    cum_freq = np.cumsum(frequencies)
    cum_freq[-1] = 1.0  # guard against rounding so every u in [0, 1) maps to a valid trader

    ### Assign traders to trades by inverse-CDF sampling (vectorized)
    u = np.random.random(len(df))
    trader_index = np.searchsorted(cum_freq, u)
    return [f"Trader {i + 1}" for i in trader_index]
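To make the metaorder definition concrete, the snippet below applies the run-splitting rule of `generate_meta` to a hand-made sequence of trades from a single trader (toy data, for illustration only). A new metaorder starts whenever the trader, the sign, or the day changes.

```python
import pandas as pd

# One trader over two days (made-up sequence, already sorted by time)
toy = pd.DataFrame({
    'trader': ['Trader 1'] * 6,
    'day': ['d1', 'd1', 'd1', 'd1', 'd2', 'd2'],
    'sign': [1, 1, -1, -1, -1, 1],
})
# True where a new metaorder begins: trader, sign or day differs from the previous trade
new_meta = ((toy['trader'] != toy['trader'].shift())
            | (toy['sign'] != toy['sign'].shift())
            | (toy['day'] != toy['day'].shift()))
toy['metaid'] = new_meta.cumsum()
print(toy['metaid'].tolist())  # [1, 1, 2, 2, 3, 4]
```

The sell run spanning the end of day `d1` and the start of day `d2` is split into two metaorders (ids 2 and 3), because a metaorder must be executed within a single day.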


In [200]:
#Example usage
nb_traders = 10
kind = 'power'
exponent = 2.5

synthetic_meta = generate_meta(trades,nb_traders,kind,exponent)
synthetic_meta.head()

Number of metaorders  302204


|   | sign | BeginMid | EndMid | NbChild | BeginTime | EndTime | MetaVolume | MetaDuration | MetaImpact |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -1 | 4.996743 | 4.994272 | 2 | 2024-01-01 00:00:03 | 2024-01-01 00:00:16 | 1.4e-05 | 13.0 | 0.002471 |
| 1 | 1 | 4.99038 | 4.997663 | 2 | 2024-01-01 00:01:09 | 2024-01-01 00:01:32 | 2.3e-05 | 23.0 | 0.007283 |
| 2 | -1 | 4.997663 | 5.01255 | 2 | 2024-01-01 00:01:32 | 2024-01-01 00:02:07 | 1.4e-05 | 35.0 | -0.014887 |
| 3 | 1 | 5.01255 | 5.000845 | 4 | 2024-01-01 00:02:07 | 2024-01-01 00:02:31 | 5.6e-05 | 24.0 | -0.011705 |
| 4 | -1 | 5.000845 | 5.005189 | 2 | 2024-01-01 00:02:31 | 2024-01-01 00:02:37 | 2.3e-05 | 6.0 | -0.004343 |
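One way to look for the square-root law in the output is to bin metaorders by `MetaVolume`, average `MetaImpact` in each bin, and fit a straight line in log-log scale: a slope close to 1/2 indicates square-root impact. As stated at the top of the notebook, no such slope should be expected with the synthetic price used here, only with real trade data. The log-spaced binning and the helper `impact_exponent` are assumptions made for illustration, not the estimation procedure of the paper.

```python
import numpy as np
import pandas as pd

def impact_exponent(metaorders: pd.DataFrame, n_bins: int = 20, min_count: int = 10):
    """Estimate delta and Y in  mean(MetaImpact | MetaVolume) ~ Y * MetaVolume**delta.

    Metaorders are grouped into log-spaced MetaVolume bins; delta is the slope of a
    least-squares line through (log mean volume, log mean impact), using only bins
    with at least min_count metaorders and a positive mean impact.
    """
    q = metaorders['MetaVolume'].to_numpy()
    impact = metaorders['MetaImpact'].to_numpy()
    keep = q > 0
    q, impact = q[keep], impact[keep]
    edges = np.logspace(np.log10(q.min()), np.log10(q.max()), n_bins + 1)
    bin_id = np.clip(np.digitize(q, edges) - 1, 0, n_bins - 1)  # bin index in [0, n_bins)
    x, y = [], []
    for b in range(n_bins):
        in_bin = bin_id == b
        if in_bin.sum() >= min_count and impact[in_bin].mean() > 0:
            x.append(np.log(q[in_bin].mean()))
            y.append(np.log(impact[in_bin].mean()))
    if len(x) < 2:
        raise ValueError('not enough populated bins with a positive mean impact')
    slope, intercept = np.polyfit(x, y, 1)
    return slope, np.exp(intercept)

# Check on made-up metaorders whose impact is exactly 0.8 * sqrt(volume)
q_fake = np.logspace(-5, -2, 2000)
fake = pd.DataFrame({'MetaVolume': q_fake, 'MetaImpact': 0.8 * np.sqrt(q_fake)})
delta, Y = impact_exponent(fake)
print(round(delta, 3))
```

On the notebook's output, the call would be `impact_exponent(synthetic_meta)`. With the synthetic price, many bins may have a non-positive mean impact and are skipped, which is itself a sign that the square-root law is absent.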
