1. Resample the data: keep the last orderbook state of every 1 second. This is denoted as L(t_n). L(t_n) is the limit order book state which can be observed by our hypothetical agent. Set the first state as L(t_0).
2. Set the agent's position (denoted as \xi(t_n)) at L(t_0) as NULL.
3. The next order submitted in messagebook right after L(t_n) is regarded as an action taken by our hypothetical agent, denoted by U(t_n). 
    - If the order type is 1, it means that our agent posts an Limit Order and this will update the agent's position (append tuple (p,v) to our agent's position, where p and v is the price and size of this limit order.).
    - If the order type is 2 or 3, which means it is a Cancel Order, then check the order id to determine whether the canceled order is posted by our hypothetical agent (if yes, the order id should appear in previous actions). 
        - If yes, reduce the canceled volume of the corresponding price from our agent's position. When the volume reduces to 0, delete this tuple.
        - If not, set the agent's action at this time to 'No action'.
    - If the order type is 4, which means it is a Market Order, then check the order id to determine whether it is posted by our hypothetical agent (if yes, the order id should not appear in agent's current position, because our hypothetical agent's market order shouldn't hit its own remaining limit order). 
        - If not in, update our hypothetocal agent's current cash and inventory, record the action.
        - If in, set the action at this time to 'No action', update our agent's current cash, inventory and position (reduce the executed volume of the corresponding price from our agent's position).
4. Besides our hypothetical agent's actions, we also need to deal other orders submitted to the limit order book one by one:
    - If the order type is 1, which means this is a Limit order posted by other participants. Do nothing.
    - If the order type is 2 or 3, which means this is a Cancel Order. Check the order id to determine whether the canceled order is posted by our hypothetical agent (if yes, the order id should appear in previous actions). 
        - If yes, reduce the canceled volume of the corresponding price from our agent's position. When the volume reduces to 0, delete this tuple.
        - If not, do nothing.
    - If the order type is 4, which means this is a Market Order. Check the order id to determine whether it hits the limit order posted by our agent (If yes, the order id should appear in current position).
        - If in, update our agent's current cash, inventory and position (reduce the executed volume of the corresponding price from our agent's position).
        - If not, do nothing.
5. Every time our agent's remaining position is beyond the top 2 levels, it gets canceled immediately.

In [1]:
import pandas as pd

def load_original():
    # Load and preprocess the original data
    message_file = 'LOBSTER_SampleFile_AAPL_2012-06-21_5/AAPL_2012-06-21_34200000_57600000_message_5.csv'
    orderbook_file = 'LOBSTER_SampleFile_AAPL_2012-06-21_5/AAPL_2012-06-21_34200000_57600000_orderbook_5.csv'
    
    message_org = pd.read_csv(
        filepath_or_buffer=message_file,
        names=['Time', 'Type', 'Order ID', 'Size', 'Price', 'Direction']
    )

    orderbook_org = pd.read_csv(
        filepath_or_buffer=orderbook_file,
        names=['A1P', 'A1V', 'B1P', 'B1V', 'A2P', 'A2V', 'B2P', 'B2V', 
               'A3P', 'A3V', 'B3P', 'B3V', 'A4P', 'A4V', 'B4P', 'B4V',
               'A5P', 'A5V', 'B5P', 'B5V']
    ).loc[:, ['A2P', 'A2V', 'A1P', 'A1V', 'B1P', 'B1V', 'B2P', 'B2V']]

    # drop duplicates (Some changes happen outside the top 2 levels)
    orderbook_org_shifted = orderbook_org.shift()
    orderbook_org_unique = orderbook_org[orderbook_org.ne(orderbook_org_shifted).any(axis=1)]
    orderbook_org = orderbook_org_unique
    message_org = message_org.loc[orderbook_org.index]  #Sample the messagebook accordingly 

    # Convert 'Time' column to nanoseconds timestamp
    message_org['Time'] = message_org['Time'] * 1e9
    message_org['Time'] = pd.to_datetime(message_org['Time'], unit='ns')
    # Set 'Time' as index for both dataframes
    message_org.set_index('Time', inplace=True)
    orderbook_org['Time'] = message_org.index
    orderbook_org.set_index('Time', inplace=True)
    data = pd.concat([message_org,orderbook_org],axis=1)
    data['Seq'] = range(0,len(data))
    orderbook_org['Seq'] = range(0,len(orderbook_org))
    message_org['Seq'] = range(0,len(message_org))
    return orderbook_org, message_org, data

In [2]:
def orderbook_resampling(orderbook_org):
    df = orderbook_org.copy()
    # Resample orderbook to 1s intervals by taking the last state in each interval
    orderbook_resampled_1s = df.resample('1s').last().dropna()['Seq']
    # Combine the indices of mid-price changes and 1s intervals
    df.set_index('Seq',inplace=True)
    orderbook_resampled = df.loc[orderbook_resampled_1s,:]
    return orderbook_resampled

In [3]:
def get_actions_positions_cash_invt(orderbook_org, message_org, orderbook_resampled):
    orderbook = orderbook_org.set_index('Seq')
    messagebook = message_org.set_index('Seq')
    # Initialize agent's position, cash, and inventory
    agent_position = {}
    agent_cash = 1e9
    agent_inventory = 1e6

    # Track the agent's actions, positions, cash and inventory
    actions = pd.DataFrame(columns=['Seq', 'Action', 'Order ID', 'Price', 'Volume', 'Type','Direction'])
    positions = []
    cash = []
    inventory = []

    # Function to check and cancel positions beyond the top 2 levels
    def cancel_positions_beyond_top2(lob_state, agent_position):
        level2_ask_price = lob_state['A2P']
        level2_bid_price = lob_state['B2P']
        new_agent_position = agent_position.copy()
        for order_id, (price, _) in agent_position.items():
            if (price < level2_bid_price or price > level2_ask_price):
                del new_agent_position[order_id]
        return new_agent_position

    # Iterate through each resampled time point
    for t,i in zip(orderbook_resampled.head(-1).index, range(len(orderbook_resampled)-1)):
        positions.append(agent_position)
        cash.append(agent_cash)
        inventory.append(agent_inventory)
        
        # Get the next order in the messagebook right after L(t_n)
        next_order = messagebook.loc[t+1]
        seq = next_order.name
        order_id = next_order['Order ID']
        order_type = next_order['Type']
        price = next_order['Price']
        size = next_order['Size']
        direction = next_order['Direction']

        if order_type == 1:
            # Agent posts a limit order
            agent_position[order_id] = (price, size)
            new_action = pd.DataFrame({
                'Seq': seq,
                'Action': 'Limit Order',
                'Order ID': order_id,
                'Price': price,
                'Volume': size,
                'Type': order_type,
                'Direction': direction
            }, index=[0]
            )
            actions = pd.concat([actions,new_action],ignore_index=True)
        elif order_type in [2, 3]:
            # Cancel Order
            if order_id in agent_position:
                current_price, current_volume = agent_position[order_id]
                new_volume = current_volume - size
                if new_volume > 0:
                    agent_position[order_id] = (current_price, new_volume)
                else:
                    del agent_position[order_id]
                new_action = pd.DataFrame({
                    'Seq': seq,
                    'Action': 'Cancel Order',
                    'Order ID': order_id,
                    'Price': price,
                    'Volume': size,
                    'Type': order_type,
                    'Direction': direction
                }, index=[0]
                )
                actions = pd.concat([actions,new_action],ignore_index=True)
            else:
                new_action = pd.DataFrame({
                    'Seq': seq,
                    'Action': 'No action',
                    'Order ID': None,
                    'Price': None,
                    'Volume': None,
                    'Type': None,
                    'Direction': None
                }, index=[0]
                )
                actions = pd.concat([actions,new_action],ignore_index=True)
        elif order_type == 4:
            # Market Order
            if order_id in agent_position:
                # Market order by other participants and it hits our agent's position
                hit_orders = [oid for oid, (p, v) in agent_position.items() if p == price]
                oid = hit_orders[0]
                current_price, current_volume = agent_position[oid]
                new_volume = current_volume - size
                agent_cash -= size * price * direction / 10000  # Update cash (price scaled by 10000)
                agent_inventory += size * direction  # Update inventory
                if new_volume > 0:
                    agent_position[oid] = (current_price, new_volume)
                else:
                    del agent_position[oid]
                new_action = pd.DataFrame({
                    'Seq': seq,
                    'Action': 'No action',
                    'Order ID': None,
                    'Price': None,
                    'Volume': None,
                    'Type': None,
                    'Direction': None
                }, index=[0]
                )
                actions = pd.concat([actions,new_action],ignore_index=True)
            else:
                # Market order by our agent
                agent_cash += size * price * direction / 10000  # Update cash (price scaled by 10000)
                agent_inventory -= size * direction  # Update inventory
                new_action = pd.DataFrame({
                    'Seq': seq,
                    'Action': 'Market Order',
                    'Order ID': order_id,
                    'Price': price,
                    'Volume': size,
                    'Type': order_type,
                    'Direction': direction
                }, index=[0]
                )
                actions = pd.concat([actions,new_action],ignore_index=True)
            
        # Check and cancel positions beyond the top 2 levels
        agent_position = cancel_positions_beyond_top2(orderbook.loc[seq], agent_position)


        # Check other orders in messagebook that may affect the agent's position, cash and inventory
        other_orders = messagebook[(messagebook.index > t+1) & (messagebook.index <= orderbook_resampled.iloc[i+1].name)]
        for seq, order in other_orders.iterrows():
            order_id = order['Order ID']
            order_type = order['Type']
            price = order['Price']
            size = order['Size']
            direction = order['Direction']

            if order_type == 1:
                # Limit order by other participants, do nothing
                pass
            elif order_type in [2, 3]:
                # Cancel Order
                if order_id in agent_position:
                    current_price, current_volume = agent_position[order_id]
                    new_volume = current_volume - size
                    if new_volume > 0:
                        agent_position[order_id] = (current_price, new_volume)
                    else:
                        del agent_position[order_id]
            elif order_type == 4:
                # Market Order
                if order_id in agent_position:
                    # Market order by other participants and it hits our agent's position
                    hit_orders = [oid for oid, (p, v) in agent_position.items() if p == price]
                    oid = hit_orders[0]
                    current_price, current_volume = agent_position[oid]
                    new_volume = current_volume - size
                    agent_cash -= size * price * direction / 10000  # Update cash (price scaled by 10000)
                    agent_inventory += size * direction  # Update inventory
                    if new_volume > 0:
                        agent_position[oid] = (current_price, new_volume)
                    else:
                        del agent_position[oid]
            
            # Check and cancel positions beyond the top 2 levels
            agent_position = cancel_positions_beyond_top2(orderbook.loc[seq], agent_position)

    return actions, positions, cash, inventory

In [4]:
orderbook_org, message_org, _ = load_original()
orderbook_resampled = orderbook_resampling(orderbook_org)
actions, positions, cash, inventory = get_actions_positions_cash_invt(orderbook_org, message_org, orderbook_resampled)

In [5]:
print(orderbook_resampled.reset_index())

          Seq      A2P   A2V      A1P   A1V      B1P  B1V      B2P   B2V
0          53  5858900   100  5858700   100  5857400  100  5857100    18
1          94  5857800    18  5857700    18  5854700   18  5854600     5
2         131  5856900    16  5856800    18  5854700  100  5854400   167
3         160  5858000   100  5857100   100  5854500   18  5854400   167
4         184  5858000   100  5856800   900  5855000   18  5854900    18
...       ...      ...   ...      ...   ...      ...  ...      ...   ...
17215  161465  5776700  1200  5776400   943  5775500  656  5775400   410
17216  161486  5776700  1200  5776400   716  5776000  400  5775500   356
17217  161496  5776800   200  5776700  1300  5776000  500  5775800   100
17218  161514  5776800   200  5776700  1062  5775900  150  5775500   256
17219  161540  5776800   200  5776700   300  5775400  410  5775300  1400

[17220 rows x 9 columns]


In [6]:
print(actions)

          Seq        Action   Order ID    Price Volume  Type Direction
0          54     No action       None     None   None  None      None
1          95  Market Order   16396043  5854700     18     4         1
2         132     No action       None     None   None  None      None
3         161   Limit Order   16632807  5854700     18     1         1
4         185  Market Order   16675969  5856800     50     4        -1
...       ...           ...        ...      ...    ...   ...       ...
17214  161448   Limit Order  286923136  5775900    900     1        -1
17215  161466  Market Order  286914464  5776400    227     4        -1
17216  161487  Market Order  286914464  5776400    200     4        -1
17217  161497  Market Order  287028109  5776000    170     4         1
17218  161515   Limit Order  287112852  5776700    200     1        -1

[17219 rows x 7 columns]


In [7]:
print(pd.Series(positions))

0                                 {}
1                                 {}
2                                 {}
3          {16632807: (5854700, 18)}
4                                 {}
                    ...             
17214    {286923136: (5775900, 900)}
17215                             {}
17216                             {}
17217                             {}
17218    {287112852: (5776700, 200)}
Length: 17219, dtype: object


In [8]:
print(pd.Series(cash))

0        1.000000e+09
1        1.000000e+09
2        1.000011e+09
3        1.000011e+09
4        1.000011e+09
             ...     
17214    9.966351e+08
17215    9.971549e+08
17216    9.970238e+08
17217    9.969082e+08
17218    9.970064e+08
Length: 17219, dtype: float64


In [9]:
print(pd.Series(inventory))

0        1000000.0
1        1000000.0
2         999982.0
3         999982.0
4         999982.0
           ...    
17214    1005731.0
17215    1004831.0
17216    1005058.0
17217    1005258.0
17218    1005088.0
Length: 17219, dtype: float64
