### Detecting insertion attacks by heuristics

According to Frontrunner-Jones, insertion attacks can be detected as follows:

A transfer event is emitted on the blockchain whenever an ERC-20 token is traded.  
Each event combines the following transactional information, E = (s, r, a, c, h, i, g):
- s: sender of tokens
- r: receiver of tokens
- a: number of transferred tokens
- c: contract address of token
- h: transaction hash
- i: transaction index
- g: gas price of transaction
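
The tuple above can be modelled as a small record type. This is only an illustrative sketch: the class name `TransferEvent` and the placeholder values are assumptions and are not used elsewhere in this notebook.

```python
from typing import NamedTuple


class TransferEvent(NamedTuple):
    """One ERC-20 transfer event, E = (s, r, a, c, h, i, g)."""
    s: str  # sender of tokens
    r: str  # receiver of tokens
    a: int  # number of transferred tokens (raw integer units)
    c: str  # contract address of the token
    h: str  # transaction hash
    i: int  # transaction index within the block
    g: int  # gas price of the transaction


# Hypothetical placeholder values, for illustration only.
example = TransferEvent(s="0xSender", r="0xReceiver", a=1000,
                        c="0xToken", h="0xHash", i=7, g=20)
print(example.s, example.a, example.i)
```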

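The detection iterates block by block through all transfer events and checks whether there are three events EA1, EV, EA2 (the attacker's first transaction, the victim's transaction and the attacker's second transaction) for which the following six heuristics hold:  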


#### Heuristic 1

- sender of EA1 must be identical to sender of EV and receiver of EA2
    - sA1 = sV = rA2
- receiver of EA1 must be identical to sender of EA2
    - rA1 = sA2

#### Heuristic 2
- number of tokens bought by EA1 must be similar to the number of tokens sold by EA2 (difference of at most 1%)
    - |aA1 - aA2| / max(aA1, aA2) <= 1%

#### Heuristic 3
- token contract address of EA1, EV and EA2 must be identical
    - cA1 = cV = cA2

#### Heuristic 4
- transaction hashes of EA1, EV and EA2 must be pairwise distinct
    - hA1 != hV, hV != hA2, hA1 != hA2

#### Heuristic 5
- transaction index of EA1 must be smaller than the transaction index of EV
- transaction index of EV must be smaller than the transaction index of EA2
    - iA1 < iV < iA2

#### Heuristic 6
- the gas price of EA1 must be larger than the gas price of EV
- the gas price of EA2 must be less than or equal to the gas price of EV
    - gA1 > gV >= gA2 
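
Taken together, the six heuristics can be sketched as one predicate over three events. This is a minimal illustration written directly from the list above, not the implementation used further down; the `Event` type, the function name `is_insertion` and the sample values are assumptions.

```python
from typing import NamedTuple


class Event(NamedTuple):
    s: str  # sender
    r: str  # receiver
    a: int  # amount
    c: str  # token contract address
    h: str  # transaction hash
    i: int  # transaction index
    g: int  # gas price


def is_insertion(ea1: Event, ev: Event, ea2: Event) -> bool:
    """Return True if (ea1, ev, ea2) satisfies all six heuristics."""
    h1 = ea1.s == ev.s == ea2.r and ea1.r == ea2.s
    larger = max(ea1.a, ea2.a)
    # Integer form of |aA1 - aA2| / max(aA1, aA2) <= 1%
    h2 = larger > 0 and abs(ea1.a - ea2.a) * 100 <= larger
    h3 = ea1.c == ev.c == ea2.c
    h4 = len({ea1.h, ev.h, ea2.h}) == 3  # pairwise distinct hashes
    h5 = ea1.i < ev.i < ea2.i
    h6 = ea1.g > ev.g >= ea2.g
    return all((h1, h2, h3, h4, h5, h6))


# Hypothetical events: the attacker buys before the victim and sells afterwards.
a1 = Event(s="dex", r="attacker", a=1000, c="tok", h="h1", i=3, g=60)
v = Event(s="dex", r="victim", a=5000, c="tok", h="h2", i=4, g=50)
a2 = Event(s="attacker", r="dex", a=995, c="tok", h="h3", i=5, g=50)

print(is_insertion(a1, v, a2))                 # all heuristics hold
print(is_insertion(a1, v, a2._replace(g=55)))  # violates heuristic 6
```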

### Implementation of Heuristics

In [6]:
from web3 import Web3
import pandas as pd


In [2]:
web3 = Web3(Web3.HTTPProvider("https://intensive-sly-mountain.quiknode.pro/a3f5256d7f2af6541d483cce3f1d49c94c01879e/"))
print("\033[92m"+str(web3.is_connected()))

True


In [3]:
BLOCK_NUMBER = 5574870
TRANSFER = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef" # ERC20 "Transfer"

events = web3.eth.filter({"fromBlock": BLOCK_NUMBER, "toBlock": BLOCK_NUMBER, "topics": [TRANSFER]}).get_all_entries()

### Helper Methods

In [19]:
def get_checksum_address_from_topics_hash(topics_hash):
    return web3.to_checksum_address(topics_hash.hex().replace("0x", "")[24:64])

In [22]:
def get_amount_of_tokens_from_data_hash(data_hash):
    return int(data_hash.hex().replace("0x", "")[0:64], 16)
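
To see what the two helpers extract, the same decoding can be reproduced offline on raw bytes. The sketch below makes some assumptions: the function names and sample values are hypothetical, and unlike `web3.to_checksum_address` it returns a plain lowercase address without checksum casing. An indexed address occupies the last 20 bytes of a 32-byte topic, which corresponds to the `[24:64]` slice of the hex string used above.

```python
def address_from_topic(topic: bytes) -> str:
    """Return the 20-byte address stored in the low bytes of a 32-byte topic."""
    return "0x" + topic[-20:].hex()


def amount_from_data(data: bytes) -> int:
    """Interpret the first 32-byte word of the data field as an unsigned integer."""
    return int.from_bytes(data[:32], byteorder="big")


# Hypothetical sample values, not taken from a real block.
topic = bytes(12) + bytes.fromhex("00112233445566778899aabbccddeeff00112233")
data = (16_000_000).to_bytes(32, byteorder="big")

print(address_from_topic(topic))
print(amount_from_data(data))
```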

Create a dictionary with the token's contract address as key and the list of its events as value.

In [144]:
def get_events_by_contract_address(events):

    events_by_address = {}
    
    for event in events:
        
        token_contract_address = event["address"]
        
        if token_contract_address in events_by_address:
            events_by_address[token_contract_address].append(event)
        else:
            events_by_address[token_contract_address] = [event]
            
    return events_by_address

events_by_address = get_events_by_contract_address(events)
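
The same grouping can be written more compactly with `collections.defaultdict`. This is an equivalent sketch under the assumption that each event is a mapping with an `"address"` key; the function name and the sample events are hypothetical.

```python
from collections import defaultdict


def group_by_contract_address(events):
    """Group events by their token contract address, keeping the original order."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["address"]].append(event)
    return dict(grouped)


# Hypothetical events with placeholder addresses.
sample_events = [
    {"address": "0xTokenA", "logIndex": 0},
    {"address": "0xTokenB", "logIndex": 1},
    {"address": "0xTokenA", "logIndex": 2},
]
grouped = group_by_contract_address(sample_events)
print({address: len(items) for address, items in grouped.items()})
```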

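Create a DataFrame with one row per transfer event: contract address | transactionIndex | logIndex | transactionHash | wallet | sender | receiver | gasPrice | amount  

Skip every token contract with fewer than three transfer events in the block, since an insertion needs at least three events (A1, V, A2).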

In [154]:
def create_df_of_events(events_by_address):


    df = pd.DataFrame(columns=['contractAddress',
                               'transactionIndex',
                               'logIndex',
                               'transactionHash',
                               'wallet',
                               'sender',
                               'receiver',
                               'gasPrice',
                               'amount'])
    
    for token_contract_address in events_by_address:
    
        nr_of_transactions_with_same_coin = len(events_by_address[token_contract_address])
        
        # An insertion needs at least 3 transfer events of the same token (A1, V, A2)
        if nr_of_transactions_with_same_coin <= 2:
            continue
            
        for transaction in events_by_address[token_contract_address]:
            
            transaction_hash = transaction["transactionHash"].hex()
            # Fetch the full transaction to get its sender ("from") and gas price
            tx_by_hash = web3.eth.get_transaction(transaction_hash)
            
            record = {
                "contractAddress": token_contract_address,
                "transactionIndex": transaction["transactionIndex"],
                "logIndex": transaction["logIndex"],
                "transactionHash": transaction_hash,
                "wallet": tx_by_hash["from"],
                "sender": get_checksum_address_from_topics_hash(transaction["topics"][1]),
                "receiver": get_checksum_address_from_topics_hash(transaction["topics"][2]),
                "gasPrice": tx_by_hash["gasPrice"] / 10 ** 9,
                "amount": get_amount_of_tokens_from_data_hash(transaction["data"])
            }    
            new_df = pd.DataFrame([record])
            df = pd.concat([df, new_df], ignore_index=True)
    
    return df
        
df_of_events = create_df_of_events(events_by_address)
df_of_events

index,contractAddress,transactionIndex,logIndex,transactionHash,wallet,sender,receiver,gasPrice,amount
0,0x86Fa049857E0209aa7D9e616F7eb3b3B78ECfdb0,14,2,0x1382efde35b60306e4667a8e8c5591175d17f6071a14...,0x03747F06215B44E498831dA019B27f53E483599F,0x010bDcf31074B87627683d83950B3183a10b813f,0x03747F06215B44E498831dA019B27f53E483599F,53.2,558635231770000000000
1,0x86Fa049857E0209aa7D9e616F7eb3b3B78ECfdb0,17,9,0xc99265d20aa2c5fcd82068f048458a2ff8344c5ef0d2...,0x03747F06215B44E498831dA019B27f53E483599F,0xB9BAE58484cABbA0DC6F962d43c32c3E1A6f8730,0x03747F06215B44E498831dA019B27f53E483599F,53.2,13000000000000000000000
2,0x86Fa049857E0209aa7D9e616F7eb3b3B78ECfdb0,18,12,0x99362e748249ea80be0aaa4b146be52a60de16a9cbfb...,0x03747F06215B44E498831dA019B27f53E483599F,0xC7dC674c94BE61C881e26828EEd95AE2f724573B,0x03747F06215B44E498831dA019B27f53E483599F,53.2,91381873940000000000
3,0x86Fa049857E0209aa7D9e616F7eb3b3B78ECfdb0,20,17,0xde24a35414a1e0cf14a65da7cff7c96c4dc53c6f7316...,0x03747F06215B44E498831dA019B27f53E483599F,0x3dB5A42c3E675A127eF417f162F7559Dc252C00b,0x03747F06215B44E498831dA019B27f53E483599F,53.2,23185416250000000000
4,0x86Fa049857E0209aa7D9e616F7eb3b3B78ECfdb0,39,22,0xdc4b97c2940272b8d70e99919f8f23f0bc83803d5a9f...,0x358C7746f7eA9A7c3b26a59Ae5189516840BAE5f,0x358C7746f7eA9A7c3b26a59Ae5189516840BAE5f,0x692DA4782d996DAC7D66B5822f3c504f67dA8493,22.2,50922677800000004096
5,0x86Fa049857E0209aa7D9e616F7eb3b3B78ECfdb0,46,27,0xf8d1ac899bfa0b0f6a124f9b25ffd4844b5c8f281bbd...,0xA5224C410A14255c94B29381Ed96Be873A063FDb,0xA5224C410A14255c94B29381Ed96Be873A063FDb,0x639d9BA0f11Ff73A25c0a26849D4d5A7175169b6,20.0,3209000000000000000000
6,0x86Fa049857E0209aa7D9e616F7eb3b3B78ECfdb0,126,155,0xfe47534383b68c8e6a8b40bc5b95ae4c77ae8b2be755...,0xb795eA294044BBcC4B61c661C3CBe9aCB374EacA,0xb795eA294044BBcC4B61c661C3CBe9aCB374EacA,0x83761c6785427F5A27a07c92a9dcFa99947bC4AD,2.0,3838600000000000000000
7,0xf230b790E05390FC8295F4d3F60332c93BEd42e2,15,4,0xcfdf7e2afaaf5b67516c50ddba27f104420b85aec20b...,0x03747F06215B44E498831dA019B27f53E483599F,0x2Ff2AD5ff29D51771E8B32229319AA4be9096255,0x03747F06215B44E498831dA019B27f53E483599F,53.2,9783370106
8,0xf230b790E05390FC8295F4d3F60332c93BEd42e2,16,6,0x85244a96a6e2a6edcd6d138fbacdc0634b451feb0a0a...,0x03747F06215B44E498831dA019B27f53E483599F,0x49d1B466394D233589bBbF4e88bbbb4E010E1A68,0x03747F06215B44E498831dA019B27f53E483599F,53.2,74863000000
9,0xf230b790E05390FC8295F4d3F60332c93BEd42e2,147,164,0x20293f71b73be88ccae834f58d48baf6d52fe58fb37f...,0x76B2546B4A7f3b96018F88b86232f5044c33807b,0x76B2546B4A7f3b96018F88b86232f5044c33807b,0xFFE7Aa08dA702aCc45A619A3E1800ee8Fed2ba90,1.0,16000000


For each transaction index, get the sender/receiver of the first transfer event (lowest logIndex) and the sender/receiver of the last transfer event (highest logIndex).

In [151]:
def create_df_grouped_by_transaction_index(df_of_events):


    df_final = pd.DataFrame(columns=['contractAddress',
                               'transactionIndex',
                               'transactionHash',
                               'wallet',
                               'first_sender',
                               'first_receiver',
                               'last_sender',
                               'last_receiver',
                               'gasPrice',
                               'amount'])
    
    unique_token_contract_addresses = df_of_events["contractAddress"].unique()
        
    df_grouped_by_contractAddress_and_transactionIndex = df_of_events.groupby(['contractAddress', 'transactionIndex'])['logIndex'].agg(['min', 'max']).reset_index()
    
    for token_contract_address in unique_token_contract_addresses:
        
        df_grouped_subset = df_grouped_by_contractAddress_and_transactionIndex[df_grouped_by_contractAddress_and_transactionIndex["contractAddress"] == token_contract_address]
        
        for index, row in df_grouped_subset.iterrows():
            
            transaction_index = row["transactionIndex"]
            min_log_index = row["min"]
            max_log_index = row["max"]
                        
            min_log_index_row = df_of_events[
                (df_of_events["contractAddress"] == token_contract_address) &
                (df_of_events["transactionIndex"] == transaction_index) &
                (df_of_events["logIndex"] == min_log_index)
            ]
            
            first_sender = min_log_index_row.iloc[0]["sender"]
            first_receiver = min_log_index_row.iloc[0]["receiver"]
            
            transaction_hash = min_log_index_row.iloc[0]["transactionHash"]
            wallet = min_log_index_row.iloc[0]["wallet"]
            gasPrice = min_log_index_row.iloc[0]["gasPrice"]
            amount = min_log_index_row.iloc[0]["amount"]
            
            
            max_log_index_row = df_of_events[
                (df_of_events["contractAddress"] == token_contract_address) &
                (df_of_events["transactionIndex"] == transaction_index) &
                (df_of_events["logIndex"] == max_log_index)
            ]
            
            last_sender = max_log_index_row.iloc[0]["sender"]
            last_receiver = max_log_index_row.iloc[0]["receiver"]
            
            record = {
                "contractAddress": token_contract_address,
                "transactionIndex": transaction_index,
                "transactionHash": transaction_hash,
                "wallet": wallet,
                "first_sender": first_sender,
                "first_receiver": first_receiver,
                "last_sender": last_sender,
                "last_receiver": last_receiver,
                "gasPrice": gasPrice,
                "amount": amount
            }    
            
            new_df = pd.DataFrame([record])
            df_final = pd.concat([df_final, new_df], ignore_index=True)
            
    return df_final

df_grouped_by_transaction_index = create_df_grouped_by_transaction_index(df_of_events)
df_grouped_by_transaction_index

index,contractAddress,transactionIndex,transactionHash,wallet,first_sender,first_receiver,last_sender,last_receiver,gasPrice,amount
0,0x9a0242b7a33DAcbe40eDb927834F96eB39f8fBCB,94,0xedee87cdea91b70805184a1dbd32f689b02ff6f40579...,0x4fCc2FF6c75923D33B4F5aF4C524461014B2EE1C,0x238B7E54DfEE4d8e98b8D1A78AB40dd94349BcFd,0xcF1CC6eD5B653DeF7417E3fA93992c3FFe49139B,0xcF1CC6eD5B653DeF7417E3fA93992c3FFe49139B,0xC9D9c248A71e5573A4f446B825F915C3e1359239,6.0,2052115257102964823611722
1,0x9a0242b7a33DAcbe40eDb927834F96eB39f8fBCB,96,0xff17087b1cde666c6bd5022167a633ddd43b6ee2c929...,0xfF1b9745f68F84F036E5e92c920038d895FB701A,0x238B7E54DfEE4d8e98b8D1A78AB40dd94349BcFd,0xcF1CC6eD5B653DeF7417E3fA93992c3FFe49139B,0xcF1CC6eD5B653DeF7417E3fA93992c3FFe49139B,0xdfb702D463B378DEEf627F29B34fBea0eEE16b63,5.98,2190274940592916726546929
2,0x9a0242b7a33DAcbe40eDb927834F96eB39f8fBCB,98,0xb46caa3c254f3e2050818a1ce3ea6b01f96011b40325...,0x8FB6840a46a8D143DaC1301F560976b953a095C5,0x238B7E54DfEE4d8e98b8D1A78AB40dd94349BcFd,0xcF1CC6eD5B653DeF7417E3fA93992c3FFe49139B,0xcF1CC6eD5B653DeF7417E3fA93992c3FFe49139B,0x8FB6840a46a8D143DaC1301F560976b953a095C5,5.34,1555063898315702240434069
3,0x9a0242b7a33DAcbe40eDb927834F96eB39f8fBCB,100,0x9b4d25a15945cc913732642dc2f9d7b655eac4d8be62...,0x4fCc2FF6c75923D33B4F5aF4C524461014B2EE1C,0xC9D9c248A71e5573A4f446B825F915C3e1359239,0xcF1CC6eD5B653DeF7417E3fA93992c3FFe49139B,0xcF1CC6eD5B653DeF7417E3fA93992c3FFe49139B,0x238B7E54DfEE4d8e98b8D1A78AB40dd94349BcFd,5.0,2052115257102964823611722
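
The first/last reduction can also be illustrated without pandas. The sketch below assumes each row is a plain dict with the same column names as the event DataFrame; the function name and the sample rows are hypothetical. It shows why both ends matter: when a trade is routed through an intermediary, the first transfer carries the original sender and the last transfer carries the final receiver.

```python
from itertools import groupby
from operator import itemgetter


def first_and_last_transfer(rows):
    """Summarise each (contractAddress, transactionIndex) group by its first and last log."""
    key = itemgetter("contractAddress", "transactionIndex")
    summary = []
    # groupby only merges adjacent items, so the rows are sorted by the same key first.
    for (contract, tx_index), group in groupby(sorted(rows, key=key), key=key):
        ordered = sorted(group, key=itemgetter("logIndex"))
        first, last = ordered[0], ordered[-1]
        summary.append({
            "contractAddress": contract,
            "transactionIndex": tx_index,
            "first_sender": first["sender"],
            "first_receiver": first["receiver"],
            "last_sender": last["sender"],
            "last_receiver": last["receiver"],
        })
    return summary


# Hypothetical rows: transaction 94 moves tokens through a router in two transfers.
rows = [
    {"contractAddress": "0xToken", "transactionIndex": 94, "logIndex": 11,
     "sender": "0xRouter", "receiver": "0xBuyer"},
    {"contractAddress": "0xToken", "transactionIndex": 94, "logIndex": 10,
     "sender": "0xExchange", "receiver": "0xRouter"},
    {"contractAddress": "0xToken", "transactionIndex": 96, "logIndex": 20,
     "sender": "0xExchange", "receiver": "0xOther"},
]
summary = first_and_last_transfer(rows)
for entry in summary:
    print(entry)
```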


##### Heuristic 2

In [166]:
def is_similar(value1, value2):
    # Two amounts are similar if their absolute difference is at most 1% of the larger value
    larger = max(value1, value2)
    if larger == 0:
        # Both amounts are zero (amounts are non-negative); avoid dividing by zero
        return True
    diff_percentage = abs(value1 - value2) / larger * 100
    return diff_percentage <= 1

def get_rows_with_similar_amounts(df):
    # Return all index pairs (i, j) with i < j whose "amount" values are similar.
    # Expects df to have a default integer index (e.g. after reset_index(drop=True)).
    rows_with_similar_amount = []
    
    column_name = "amount"
    
    for i in range(len(df)):
        for j in range(i + 1, len(df)):
            value1 = df.at[i, column_name]
            value2 = df.at[j, column_name]
            if is_similar(value1, value2):
                rows_with_similar_amount.append((i, j))
    
    return rows_with_similar_amount
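
As a worked example, the pairwise 1% check can be applied to the four amounts from the grouped table above. This sketch uses integer arithmetic instead of the float division in `is_similar`, which is a choice of this example to avoid rounding on very large token amounts; the function name is an assumption.

```python
from itertools import combinations


def similar_amount_pairs(amounts):
    """Return index pairs (i, j), i < j, whose amounts differ by at most 1% of the larger one."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(amounts), 2):
        # Integer form of |a - b| / max(a, b) <= 1%
        if abs(a - b) * 100 <= max(a, b):
            pairs.append((i, j))
    return pairs


# Amounts of transaction indices 94, 96, 98 and 100 from the grouped table.
amounts = [
    2052115257102964823611722,
    2190274940592916726546929,
    1555063898315702240434069,
    2052115257102964823611722,
]
print(similar_amount_pairs(amounts))  # [(0, 3)]
```

Only rows 0 and 3 (transaction indices 94 and 100) carry similar amounts, which makes them the only candidate pair for EA1 and EA2 in this block.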
    
    

In [167]:
def find_whale_transactions(atk1_idx, atk2_idx, df):
    
    atk1_sender = df.iloc[atk1_idx]["first_sender"]
    possible_whales_df = df.iloc[atk1_idx+1:atk2_idx].sort_values(by="transactionIndex", ascending=False)
    
    for i in range(len(possible_whales_df)):
        
        whale_transaction = possible_whales_df.iloc[i]
        whale_sender = whale_transaction["first_sender"]
        
        # Heuristic 1 (part of it): sA1 = sV
        if not atk1_sender == whale_sender:
            continue
        
        return whale_transaction      
    
    return None
        

In [171]:
def get_attacks_for_contract_address(df, contract_address):
    
    attack_list_by_transaction_index = []
    
    # Heuristics 3 already grouped by contract address   
    # Heuristics 5 (sorting by transaction index)
    df_contract_address = df[df["contractAddress"] == contract_address].sort_values(by="transactionIndex").reset_index(drop=True)
            
    # Heuristics 2
    rows_with_similar_amount = get_rows_with_similar_amounts(df_contract_address)
    
    for combination in rows_with_similar_amount:
        
        row1_idx = combination[0]
        row2_idx = combination[1]
        
        if row2_idx - row1_idx <= 1:
            continue
        
        row1 = df_contract_address.iloc[row1_idx]
        row2 = df_contract_address.iloc[row2_idx]
        
        # Heuristic 1 (part of it)
        if not row1["first_sender"] == row2["last_receiver"]:
            continue
        
        if not row1["first_receiver"] == row2["last_sender"]:
            continue
        
        # Heuristic 6 (part of it): only gA1 > gA2 is checked here, the victim's gas price is not compared
        if not row1["gasPrice"] > row2["gasPrice"]:
            continue
            
        whale_transaction = find_whale_transactions(row1_idx, row2_idx, df_contract_address)
        
        if whale_transaction is None:
            continue
        
        # Heuristic 4: the victim's transaction must differ from both attacker transactions
        if not (row1["transactionHash"] != whale_transaction["transactionHash"] and row2["transactionHash"] != whale_transaction["transactionHash"]):
            continue
        
        attack_list_by_transaction_index.append([row1["wallet"], whale_transaction["wallet"], row2["wallet"]])
        
    return attack_list_by_transaction_index

get_attacks_for_contract_address(df_grouped_by_transaction_index, "0x9a0242b7a33DAcbe40eDb927834F96eB39f8fBCB")

[['0x4fCc2FF6c75923D33B4F5aF4C524461014B2EE1C',
  '0x8FB6840a46a8D143DaC1301F560976b953a095C5',
  '0x4fCc2FF6c75923D33B4F5aF4C524461014B2EE1C']]

### Putting it all together

In [178]:
TRANSFER = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef" # ERC20 "Transfer"

def get_frontrunning_attacks_of_block(block_nr):
    
    events = web3.eth.filter({"fromBlock": block_nr, "toBlock": block_nr, "topics": [TRANSFER]}).get_all_entries()
    
    events_by_address = get_events_by_contract_address(events)

    df_of_events = create_df_of_events(events_by_address)
    df_grouped_by_transaction_index = create_df_grouped_by_transaction_index(df_of_events)

    unique_token_contract_addresses = df_grouped_by_transaction_index["contractAddress"].unique()
    
    attacks_in_block = []

    for token_contract_address in unique_token_contract_addresses:
        attacks = get_attacks_for_contract_address(df_grouped_by_transaction_index, token_contract_address)
        attacks_in_block.extend(attacks)
        
    unique_tuples = {tuple(inner_list) for inner_list in attacks_in_block}
    unique_elements = [list(unique_tuple) for unique_tuple in unique_tuples]

    return unique_elements
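
The function above removes duplicates through a set, which does not guarantee the order of the returned attacks. If a stable order is wanted, one possible alternative is `dict.fromkeys`, which keeps the first occurrence of each entry. The helper name and the sample data below are hypothetical.

```python
def unique_attacks(attacks):
    """Remove duplicate [attacker, victim, attacker] entries, keeping first-seen order."""
    return [list(entry) for entry in dict.fromkeys(tuple(attack) for attack in attacks)]


# Hypothetical wallets; the first entry appears twice.
attacks = [
    ["0xAttackerA", "0xVictimA", "0xAttackerA"],
    ["0xAttackerB", "0xVictimB", "0xAttackerB"],
    ["0xAttackerA", "0xVictimA", "0xAttackerA"],
]
print(unique_attacks(attacks))
```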

#### Test random blocks from insertion data

In [179]:
get_frontrunning_attacks_of_block(5574870)

[['0x4fCc2FF6c75923D33B4F5aF4C524461014B2EE1C',
  '0x8FB6840a46a8D143DaC1301F560976b953a095C5',
  '0x4fCc2FF6c75923D33B4F5aF4C524461014B2EE1C']]

In [180]:
get_frontrunning_attacks_of_block(5599805)

[['0xfF1b9745f68F84F036E5e92c920038d895FB701A',
  '0xc74F67BDa3F0E19071bBa30E24E0827347bCe12f',
  '0xFF28319a7cD2136ea7283E7cDb0675B50AC29Dd2']]

In [181]:
get_frontrunning_attacks_of_block(5599933)

[['0xfF1b9745f68F84F036E5e92c920038d895FB701A',
  '0xEfd0199657B444856e3259ED8e3c39EE43cf51Dc',
  '0xFF28319a7cD2136ea7283E7cDb0675B50AC29Dd2']]

In [182]:
get_frontrunning_attacks_of_block(5303107)

[['0xfF1b9745f68F84F036E5e92c920038d895FB701A',
  '0xD7B09c7B74b3F482E2E1055D495D0e364f84Bbd0',
  '0xFF28319a7cD2136ea7283E7cDb0675B50AC29Dd2']]

In [183]:
get_frontrunning_attacks_of_block(9409988)

[['0x4C6F6fa6ef89e4b44f2e5F9722EF055732A20Ad7',
  '0x8586C28425f6ba3C05C63a3711D45724F0fF4c48',
  '0x4C6F6fa6ef89e4b44f2e5F9722EF055732A20Ad7']]