# METHODOLOGY

The data for the following analysis was obtained from Numia (@NumiaData, https://numia.xyz/) using the following query:


``` sql
SELECT
  *
FROM `immaculate-355716.osmosis_1.swaps` as swaps
INNER JOIN `immaculate-355716.osmosis_1.transactions` as transactions
  ON swaps.tx_id = transactions.tx_id
WHERE swaps.block_height > '4200000' AND
      swaps.block_height <= '4700000';
```

- For more information about the tables, see Numia's Osmosis documentation: https://docs.numia.xyz/using-numia/chains/osmosis

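- Specifically, this query retrieves every swap event on Osmosis for blocks 4200001 through 4700000 (the lower bound is exclusive), joined with data from the transactions table. This block range starts on 2022-04-29 and ends on 2022-06-06. The `block_timestamp` values in the data are in UTC and run from 2022-04-30 01:52 to 2022-06-07 03:20.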

- In particular, this time period encompasses the entirety of May 2022.

Downloadable CSV of the data: https://drive.google.com/file/d/1cR_ANJiGoP1xgvlB0HIRt0mZv9YK1J99/view?usp=sharing

Block 4200000 Mintscan: https://www.mintscan.io/osmosis/blocks/4200000

Block 4700000 Mintscan: https://www.mintscan.io/osmosis/blocks/4700000


## What this analysis doesn't qualify as a potential sandwich attack:

Let A, B, and C be 3 different assets

Let Pool_1, Pool_2, and Pool_3 be 3 different pools with:
- Pool_1: A, B
- Pool_2: B, C
- Pool_3: C, A

Tx_1: A -> B -> C -> A (Cyclic Arb)   # Swap on Pool_1 -> Pool_2 -> Pool_3

Tx_2: A -> B (Regular User Swap)      # Swap on Pool_1 

Tx_3: A -> C -> B -> A (Cyclic Arb)   # Swap on Pool_3 -> Pool_2 -> Pool_1
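The rule that separates these cyclic arbs from ordinary swaps can be sketched in a few lines. This is a minimal illustration, not the notebook's code. The `Hop` type and the `is_cyclic_arb` helper are hypothetical names, and each transaction is modelled as an ordered list of pool-level swaps. The sketch uses the same criterion as the `cyclic_arb_tracker` cell further down: a tx counts as a cyclic arb when its first input denom equals its last output denom.

```python
from typing import List, NamedTuple


class Hop(NamedTuple):
    """One pool-level swap inside a transaction (hypothetical type)."""
    pool: str
    denom_in: str
    denom_out: str


def is_cyclic_arb(hops: List[Hop]) -> bool:
    """A route is a cyclic arb if it starts and ends in the same asset."""
    return bool(hops) and hops[0].denom_in == hops[-1].denom_out


# The three transactions from the example above
tx_1 = [Hop("Pool_1", "A", "B"), Hop("Pool_2", "B", "C"), Hop("Pool_3", "C", "A")]
tx_2 = [Hop("Pool_1", "A", "B")]
tx_3 = [Hop("Pool_3", "A", "C"), Hop("Pool_2", "C", "B"), Hop("Pool_1", "B", "A")]

print([is_cyclic_arb(tx) for tx in (tx_1, tx_2, tx_3)])  # [True, False, True]
```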

## What this analysis does qualify as a potential sandwich attack:
Tx_1: A -> B (Directional Swap, Same Direction)     # Swap on Pool_1

Tx_2: A -> B (Regular User Swap, Same Direction)    # Swap on Pool_1

Tx_3: B -> A (Directional Swap, Opposite Direction) # Swap on Pool_1

This includes multi-hop cases that route through 2 or more pools, such as:

Tx_1: A -> B -> C (Directional Swap, Same Direction)      # Swap on Pool_1 -> Pool_2

Tx_2: A -> B -> C (Regular User Swap, Same Direction)     # Swap on Pool_1 -> Pool_2

Tx_3: C -> B -> A (Directional Swap, Opposite Direction)  # Swap on Pool_2 -> Pool_1
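The sketch below shows the pattern check for a single candidate triple, using hypothetical `Tx` and `Hop` types. Like the detector cell, it needs only one pool shared by all three txs. It is an illustration under stated assumptions, not the notebook's implementation. In particular, it assumes the three txs are already in on-chain order (front, sandwiched, back) and does not check ordering itself.

```python
from typing import List, NamedTuple, Optional


class Hop(NamedTuple):
    pool: str
    denom_in: str
    denom_out: str


class Tx(NamedTuple):
    sender: str
    hops: List[Hop]


def is_cyclic_arb(tx: Tx) -> bool:
    return bool(tx.hops) and tx.hops[0].denom_in == tx.hops[-1].denom_out


def shared_sandwich_pool(front: Tx, victim: Tx, back: Tx) -> Optional[str]:
    """Return a pool on which (front, victim, back) match the pattern, else None.

    Front and back must share a sender that differs from the victim's sender.
    On some pool, front and victim swap in the same direction and back swaps
    in the opposite direction. Cyclic arbs are excluded as front or back.
    """
    if front.sender != back.sender or victim.sender == front.sender:
        return None
    if is_cyclic_arb(front) or is_cyclic_arb(back):
        return None
    # Reverse the back tx's hops so they compare against the front direction
    back_dirs = {(h.pool, h.denom_out, h.denom_in) for h in back.hops}
    victim_dirs = {(h.pool, h.denom_in, h.denom_out) for h in victim.hops}
    for h in front.hops:
        key = (h.pool, h.denom_in, h.denom_out)
        if key in victim_dirs and key in back_dirs:
            return h.pool
    return None


# Single-pool case from the first example
single = shared_sandwich_pool(Tx("osmo1s", [Hop("Pool_1", "A", "B")]),
                              Tx("osmo1u", [Hop("Pool_1", "A", "B")]),
                              Tx("osmo1s", [Hop("Pool_1", "B", "A")]))
# Multi-hop case from the second example
route = [Hop("Pool_1", "A", "B"), Hop("Pool_2", "B", "C")]
multi = shared_sandwich_pool(Tx("osmo1s", route), Tx("osmo1u", route),
                             Tx("osmo1s", [Hop("Pool_2", "C", "B"), Hop("Pool_1", "B", "A")]))
print(single, multi)  # Pool_1 Pool_1
```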

## Potential Sandwich Attack Pattern Matching Process
The most common definition of a sandwich attack is when the above pattern matches 3 consecutive txs. Because our analysis found this pattern so rarely, we expanded our definition of a potential sandwich attack to look deeper. In total, we analyze the following scopes:
1. 3 consecutive txs that match the above sandwich pattern
2. 3 txs that match the above sandwich pattern, within the same block
3. 3 txs that match the above sandwich pattern, within a 2-block interval (essentially combining 2 blocks and assuming they were a single block)
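The three scopes are nested: every consecutive match is also a same-block match, and every same-block match is also a 2-block match. The sketch below classifies an already-matched triple by its `(block_height, tx_index)` positions. `scopes_matched` is a hypothetical helper, and it assumes the triple is in on-chain order. The consecutive test mirrors the one used later in the notebook: consecutive tx indexes, with the first and last tx in the same block.

```python
from typing import Dict, Tuple

# (block_height, tx_index) of a tx
Position = Tuple[int, int]


def scopes_matched(front: Position, victim: Position, back: Position) -> Dict[str, bool]:
    """Report which scopes a pattern-matched (front, victim, back) triple falls into."""
    # Ordered triple: if first and last share a block, the middle tx does too
    same_block = front[0] == back[0]
    consecutive = same_block and front[1] + 1 == victim[1] and victim[1] + 1 == back[1]
    # 2-block interval: blocks h and h + 1 treated as one block
    two_block = back[0] - front[0] <= 1
    return {"consecutive": consecutive, "same_block": same_block, "two_block": two_block}


print(scopes_matched((4366700, 5), (4366700, 6), (4366700, 7)))
print(scopes_matched((100, 3), (100, 8), (101, 1)))
```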

# CONCLUSIONS: 
Our analysis found no sandwich attacks on Osmosis during May 2022.
- Only 1 instance of 3 consecutive txs matched a sandwich pattern in May 2022. On inspection, it is not a sandwich attack, just trading activity.
- Expanding the scope to matches anywhere within a single block yields only 22 instances. On inspection, none of them are sandwich attacks, just trading activity.
- Expanding the scope to two-block intervals yields only 178 instances (the 22 single-block matches plus 156 that span two blocks). On inspection, none of them are sandwich attacks, just trading activity.

## What do we mean by "just trading activity"?
A match counts as "just trading activity" when it shows two or more of the following:
1. The "sandwicher" uses more pools in their swaps than the "sandwichee"
2. The "sandwicher" account consistently trades back and forth on the same pools regardless of other traders (market making, vol strat)
3. The legs of the sandwich have different input/output amounts (a clean sandwich would be expected to just increase the attacker's balances, not change the underlying portfolio distribution)
4. Slippage was left on the table (showing that the first and last swaps have different purposes)
5. The first and last txs of the sandwich are net unprofitable
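The criteria were applied by inspecting each match (for example on Mintscan). The sketch below only encodes the "two or more" decision rule, with each criterion supplied as a hand-filled flag. `MatchEvidence` and `is_just_trading_activity` are hypothetical names, and the threshold parameter is an assumption added for illustration.

```python
from dataclasses import astuple, dataclass


@dataclass
class MatchEvidence:
    """Hypothetical per-match flags, one per criterion in the list above."""
    sandwicher_uses_more_pools: bool
    sandwicher_trades_pools_consistently: bool
    legs_have_different_amounts: bool
    slippage_left_on_table: bool
    front_and_back_net_unprofitable: bool


def is_just_trading_activity(evidence: MatchEvidence, threshold: int = 2) -> bool:
    """True when at least `threshold` of the criteria hold (bools sum as 0/1)."""
    return sum(astuple(evidence)) >= threshold


example = MatchEvidence(True, False, True, False, False)
print(is_just_trading_activity(example))  # True
```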

# Don't Trust Us, Trust the Data and Code:
1. Original Data: https://drive.google.com/file/d/1cR_ANJiGoP1xgvlB0HIRt0mZv9YK1J99/view?usp=sharing
2. Post-Analysis Data of Not Sandwiches: https://drive.google.com/file/d/1pmGOYSKTj1j8ZLVBWgsmDBBvlZtw7ZJs/view?usp=sharing


In [25]:
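```python
# Import libraries
import pandas as pd
import json
import logging

# Configure logging
# Works when run as a .py script, not in a .ipynb notebook
# (basicConfig is a no-op if the root logger already has handlers,
# as it typically does inside Jupyter; `encoding` needs Python 3.9+)
logging.basicConfig(filename='analysis.log', encoding='utf-8', level=logging.INFO)
```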

In [20]:
```python
# Read in data
df = pd.read_csv('4200000_4700000.csv')

# Each row in the dataframe is a swap event,
# so a single tx can span multiple rows

# Keep only the columns we care about, then sort by block height,
# tx index, and event index so that the rows appear in the same
# order in which the swap events were processed on chain
columns = ['block_height', 'tx_index', 'event_index', 'sender', 'pool_id',
           'denom_in', 'denom_out', 'amount_in', 'amount_out', 'tx_id', 'block_timestamp']
df = (df[columns]
      .sort_values(by=['block_height', 'tx_index', 'event_index'], ascending=True)
      .reset_index(drop=True))
```

In [23]:
```python
# The length of the df should be 3888452,
# which can be verified with the Numia query
print("Number of Swap Events within Block Range: ", len(df))
```

Number of Swap Events within Block Range:  3888452


In [26]:
```python
# Get the number of unique txs
print("Number of Unique Transactions: ", df['tx_id'].nunique())
```

Number of Unique Transactions:  2176036


In [24]:
```python
# Peep what the df looks like
df
```

| | block_height | tx_index | event_index | sender | pool_id | denom_in | denom_out | amount_in | amount_out | tx_id | block_timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4200001 | 1 | 12 | osmo1fpjveg0f42adp0hfnwlknh70ej0qtn4cryrw7p | 608 | ibc/8A34AF0C1943FD0DFCDE9ADBF0B2C9959C45E87E60... | uosmo | 4424727887 | 5780088 | 024E7F52630B678B25D5B2EB6265EE5C98D3A5C4C43098... | 2022-04-30 01:52:18.616511 UTC |
| 1 | 4200001 | 2 | 12 | osmo1ec0vqccz3qa2vv9crda7zgzw0r4jka3naue0zq | 608 | uosmo | ibc/8A34AF0C1943FD0DFCDE9ADBF0B2C9959C45E87E60... | 13522798 | 10289490155 | B9E59AF0F2CB916582EFB3AE9BEA69F8F28E6DF40FDC33... | 2022-04-30 01:52:18.616511 UTC |
| 2 | 4200001 | 3 | 12 | osmo1042zdhk96tggc3q0eakj6ngxfae6dm98u5axpv | 608 | uosmo | ibc/8A34AF0C1943FD0DFCDE9ADBF0B2C9959C45E87E60... | 5779857 | 4397507050 | DCAB80823F7767FC21B3A680C19EFC0B7C249F9B7C167E... | 2022-04-30 01:52:18.616511 UTC |
| 3 | 4200001 | 4 | 12 | osmo16rfazfh8urmu5usqlj5muuf0vsk8hw67y8hccw | 608 | ibc/8A34AF0C1943FD0DFCDE9ADBF0B2C9959C45E87E60... | uosmo | 10319170555 | 13481261 | C086365F8053DC1CD280EA674E369484FD2E5569A03248... | 2022-04-30 01:52:18.616511 UTC |
| 4 | 4200001 | 5 | 12 | osmo1zc8p49qvqqc0d5sa60yvracjtn5xhpaalcn2xx | 560 | ibc/BE1BB42D4BE3C30D50B68D7C41DB4DFCE9678E8EF8... | uosmo | 593000000 | 140775201 | 5A687C2D24058972B1AA00127C5C53A44284DCB844E084... | 2022-04-30 01:52:18.616511 UTC |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3888447 | 4699999 | 2 | 30 | osmo1ugrp92e334f83t34xx74cpxck3vlt52jgy6asp | 724 | ibc/27394FB092D2ECCD56123C74F36E4C1F926001CEAD... | ibc/65381C5F3FD21442283D56925E62EA524DED8B6927... | 432 | 2203340927400 | 2E0D37DE04E2B9EFC6910AEC2AC2554D09DC6F01A5820F... | 2022-06-07 03:20:41.392391 UTC |
| 3888448 | 4700000 | 6 | 12 | osmo10dwvdq7lt2vu28e4yskdwrsea3j2rs0ynl83ag | 608 | ibc/8A34AF0C1943FD0DFCDE9ADBF0B2C9959C45E87E60... | uosmo | 5134586443 | 6200590 | 8304D59221F63255A01F9F7555747C6E0997C73F2E46E3... | 2022-06-07 03:20:48.151664 UTC |
| 3888449 | 4700000 | 7 | 12 | osmo1kl40wdd0dk5cxzu0up8jevqqm72yz3jfa5vlmz | 608 | uosmo | ibc/8A34AF0C1943FD0DFCDE9ADBF0B2C9959C45E87E60... | 6364435 | 5238685745 | 54F040D7C229F0A6829B3F2E74DB160D844B13E602D2C5... | 2022-06-07 03:20:48.151664 UTC |
| 3888450 | 4700000 | 8 | 12 | osmo1z70zys0musmv9zln04sltf8u30409vujsuhrf6 | 608 | uosmo | ibc/8A34AF0C1943FD0DFCDE9ADBF0B2C9959C45E87E60... | 1496772 | 1231975961 | B015029AC3C4BDB3319D3BBC2341017ED7173F1AB9F4E2... | 2022-06-07 03:20:48.151664 UTC |


In [48]:
```python
# Obtain a dictionary that flags every cyclic arb tx id (tx hash)
# This will be used in the analysis to filter out cyclic arb txs
cyclic_arb_tracker = {}
# Relate each tx id to the ordered list of denoms involved in its swaps
# (df is sorted, so a tx's swap events appear in execution order)
for tx_id, denom_in, denom_out in zip(df['tx_id'], df['denom_in'], df['denom_out']):
    cyclic_arb_tracker.setdefault(tx_id, {'denoms': []})['denoms'].extend([denom_in, denom_out])
# A tx is a cyclic arb if its first denom in
# and its last denom out are the same
for value in cyclic_arb_tracker.values():
    value['cyclic_arb'] = value['denoms'][0] == value['denoms'][-1]
```
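Looping over roughly 3.9 million swap events row by row in Python is slow. The same per-tx flag can be sketched with a pandas `groupby`, under the same assumption that rows are sorted by block height, tx index, and event index. The sketch runs on a toy frame with hypothetical tx ids. On the real data, `swaps` would be `df`, and `cyclic_arb[tx_id]` would play the role of `cyclic_arb_tracker[tx_id]['cyclic_arb']`.

```python
import pandas as pd

# Toy swap events, already in on-chain order (hypothetical tx ids)
swaps = pd.DataFrame({
    'tx_id':     ['tx1', 'tx1', 'tx1', 'tx2', 'tx3'],
    'denom_in':  ['A',   'B',   'C',   'A',   'B'],
    'denom_out': ['B',   'C',   'A',   'B',   'A'],
})

# Per tx: denom_in of its first swap event and denom_out of its last one
ends = swaps.groupby('tx_id', sort=False).agg(first_in=('denom_in', 'first'),
                                              last_out=('denom_out', 'last'))
cyclic_arb = (ends['first_in'] == ends['last_out']).to_dict()

print([tx for tx, flag in cyclic_arb.items() if flag])  # ['tx1']
```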

In [None]:
```python
# This code is mad slow, but it works, and it's just
# to prove a point, so I ain't gonna optimize it

# Desperate sandwich attack detector
already_used_txs = {}
sandwich_list = []
for i in range(4200001, 4700000):
    print("Analyzing Block Height: ", i)

    # For each block height, create a 2-block interval df
    # (blocks i and i + 1) to run the analysis on.
    # searchsorted is faster but less readable than a boolean mask;
    # it is what was used for the actual analysis
    df_sorted_search = df['block_height'].searchsorted([i, i + 2])
    df_two_blocks = df.iloc[df_sorted_search[0]:df_sorted_search[1]]

    # Maps "<pool_id>_<denom_in>_<denom_out>" to the ordered history of swaps in that direction
    swap_on_pool_dict = {}
    for _, swap in df_two_blocks.iterrows():
        directional_swap = f"{swap['pool_id']}_{swap['denom_in']}_{swap['denom_out']}"
        opposite_directional_swap = f"{swap['pool_id']}_{swap['denom_out']}_{swap['denom_in']}"

        # Add the swap to the history of swaps in its direction on this pool
        entry = swap_on_pool_dict.setdefault(directional_swap, {
            'senders': [], 'tx_ids': [], 'tx_indexes': [], 'block_heights': [],
            'denom_in': swap['denom_in'], 'denom_out': swap['denom_out'],
            'pool_id': swap['pool_id']})
        entry['senders'].append(swap['sender'])
        entry['tx_ids'].append(swap['tx_id'])
        entry['tx_indexes'].append(swap['tx_index'])
        entry['block_heights'].append(swap['block_height'])

        # Treat the current swap as a potential back of a sandwich: look for an
        # earlier opposite-direction swap by the same sender (front), followed
        # by an opposite-direction swap from a different sender (sandwiched tx)
        opposite = swap_on_pool_dict.get(opposite_directional_swap)
        if opposite is None:
            continue
        for index, sender in enumerate(opposite['senders'][:-1]):
            if swap['sender'] != sender:
                continue
            # Front of the sandwich tx
            tx_id = opposite['tx_ids'][index]
            for victim_index in range(index + 1, len(opposite['senders'])):
                sandwiched_user = opposite['senders'][victim_index]
                if sandwiched_user == sender:
                    continue
                # Each front tx is used at most once, and neither the
                # front nor the back tx may be a cyclic arb
                if tx_id in already_used_txs:
                    continue
                if cyclic_arb_tracker[tx_id]['cyclic_arb'] or cyclic_arb_tracker[swap['tx_id']]['cyclic_arb']:
                    continue
                already_used_txs[tx_id] = True
                already_used_txs[swap['tx_id']] = True
                sandwich_list.append({
                    # Front of the sandwich tx, sandwiched tx, back of the sandwich tx
                    'tx_ids': [tx_id, opposite['tx_ids'][victim_index], swap['tx_id']],
                    'tx_indexes': [opposite['tx_indexes'][index], opposite['tx_indexes'][victim_index], swap['tx_index']],
                    'block_heights': [opposite['block_heights'][index], opposite['block_heights'][victim_index], swap['block_height']],
                    'senders': [sender, sandwiched_user, swap['sender']],
                    'sandwicher': swap['sender'],
                    'pool_id': swap['pool_id'],
                    'denom_in': swap['denom_in'],
                    'denom_out': swap['denom_out'],
                    'start_block_height': opposite['block_heights'][index],
                    'end_block_height': swap['block_height']})
```
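The detector's `df['block_height'].searchsorted([i, i + 2])` call relies on `df` being sorted by `block_height`. Because it is a binary search, each window costs O(log n) rather than a full scan of the column. It returns the positions of the first row at height `i` or above and the first row at height `i + 2` or above, so `iloc` between them selects blocks `i` and `i + 1`. Below is a quick check on toy data (hypothetical values) that it selects the same rows as the more readable boolean mask.

```python
import pandas as pd

# Toy data: block heights must be sorted ascending for searchsorted to be valid
swaps = pd.DataFrame({'block_height': [1, 1, 2, 3, 3, 4],
                      'tx_index':     [1, 2, 1, 1, 2, 1]})

i = 2
# First row with height >= i, first row with height >= i + 2
start, end = swaps['block_height'].searchsorted([i, i + 2])
window = swaps.iloc[start:end]

# Equivalent boolean-mask version: heights i and i + 1 (inclusive)
masked = swaps[swaps['block_height'].between(i, i + 1)]

print(start, end)             # 2 5
print(window.equals(masked))  # True
```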

In [72]:
```python
# Length of sandwich list when including arbs
len(sandwich_list)
```

6652

In [76]:
```python
# Length of sandwich list when excluding arbs and only
# allowing a tx to be sandwiched once
len(sandwich_list)
```

168

In [106]:
```python
# Length of sandwich list when excluding arbs and
# allowing a tx to be sandwiched multiple times
len(sandwich_list)
```

178

In [120]:
```python
# This output results from commenting out the already_used_txs
# checks in the detector and allowing all possible combinations
# of sandwich txs to occur, including overlapping sandwiches.
# This makes sure we get all possible sandwich combinations.

# No holds barred: the same tx can be the front of a sandwich multiple times,
# the same tx can be in the middle of a sandwich multiple times,
# and the same tx can be the back of a sandwich multiple times.
# This is the most lenient it could be
print(len(sandwich_list))
print()

# Question: are any of these possible sandwich combinations consecutive?
# That is:
# TX_1: A -> B
# TX_2: A -> B
# TX_3: B -> A

for sandwich in sandwich_list:
    front, middle, back = (int(ix) for ix in sandwich['tx_indexes'])
    # Consecutive tx indexes, all within a single block
    same_block = sandwich['start_block_height'] == sandwich['end_block_height']
    if same_block and front + 1 == middle and middle + 1 == back:
        print(sandwich)
        print()

# Result: NONE of these are sandwiches; look them up on Mintscan.
# These are just traders...trading, with no sign of a deliberate
# sandwich attack:
#   - Plenty of slippage left on the "sandwiched" tx
#   - The sandwicher uses more pools than the sandwiched tx
#   - The sandwicher has a history of using these pools frequently
#     as part of a general trading strategy
# This is noise, not signal

# FYI: the first two printed matches are identical. Without the
# already_used_txs checks, a single-block match is found once in each
# of the two overlapping 2-block windows that contain its block
```

517

{'tx_ids': ['FB02530607E750F178FBD0455E1E03D9753684F39F98D035326D294F3AEDC51F', '099E87FA83B8FBE5BFF6589DC833F5F798BDD1B1F8528524BA57728BEDF08E99', '6AFFFE89FF44DCA842FFAF6A35A30568FC71A793FF4171D53564816EFCD291DF'], 'tx_indexes': [5, 6, 7], 'block_heights': [4366700, 4366700, 4366700], 'senders': ['osmo1m0kvcrhjjk8f6p45krxf6uxechyrt32rcs6tcz', 'osmo1k84s5vmvge68vce2yfh3449sfc6zm9qxv7uvlt', 'osmo1m0kvcrhjjk8f6p45krxf6uxechyrt32rcs6tcz'], 'sandwicher': 'osmo1m0kvcrhjjk8f6p45krxf6uxechyrt32rcs6tcz', 'pool_id': 561, 'denom_in': 'uosmo', 'denom_out': 'ibc/0EF15DF2F02480ADE0BB6E85D9EBB5DAEA2836D3860E9F97F9AADE4F57A31AA0', 'start_block_height': 4366700, 'end_block_height': 4366700}

{'tx_ids': ['FB02530607E750F178FBD0455E1E03D9753684F39F98D035326D294F3AEDC51F', '099E87FA83B8FBE5BFF6589DC833F5F798BDD1B1F8528524BA57728BEDF08E99', '6AFFFE89FF44DCA842FFAF6A35A30568FC71A793FF4171D53564816EFCD291DF'], 'tx_indexes': [5, 6, 7], 'block_heights': [4366700, 4366700, 4366700], 'senders': ['osmo1m0k

In [96]:
```python
# Create a dataframe from the list of sandwich dictionaries
sandwich_df = pd.DataFrame(sandwich_list)
```

In [97]:
```python
sandwich_df.groupby('sandwicher').count()
```

| sandwicher | tx_ids | pool_id | denom_in | denom_out | start_block_height | end_block_height |
|---|---|---|---|---|---|---|
| osmo10g69v2meq377dpzhrx70e28nf8ptc9lfmkd2zk | 1 | 1 | 1 | 1 | 1 | 1 |
| osmo12dvx0nmkya8qe5qslpclx348xejgwmuu0h2xf7 | 1 | 1 | 1 | 1 | 1 | 1 |
| osmo12m3h99h8a5j2c0rt46sdldmqccypc6n2ruspn8 | 1 | 1 | 1 | 1 | 1 | 1 |
| osmo13ues4s9k53a35qqu2k2hqk5fqjyzuqx76trg78 | 27 | 27 | 27 | 27 | 27 | 27 |
| osmo15a5g72967828dqdh029eedk3t7hlqchyd2geer | 1 | 1 | 1 | 1 | 1 | 1 |
| osmo15vdjje8009ly9rudapxjcht5h9vshc7hw3a7zy | 26 | 26 | 26 | 26 | 26 | 26 |
| osmo15wjxnjkcd6szgm53v8ve7ysqgfz63cnzz84qcn | 1 | 1 | 1 | 1 | 1 | 1 |
| osmo1c36e80c032d3s5w5nl0zwcuuvzplyq4cfg0u7j | 1 | 1 | 1 | 1 | 1 | 1 |
| osmo1c8zy6r2shjsaj7ujtushz3gclxgw400rh8d6qa | 1 | 1 | 1 | 1 | 1 | 1 |
| osmo1cjay405tr8d7crtmwtenfrmrs4qny2fj2vtjeh | 4 | 4 | 4 | 4 | 4 | 4 |
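`groupby('sandwicher').count()` repeats the same per-sandwicher count in every column. When only the number of matches per address is needed, `Series.value_counts()` returns a single column sorted from most to least frequent. The sketch uses a toy stand-in for `sandwich_df` with hypothetical addresses.

```python
import pandas as pd

# Toy stand-in for sandwich_df (hypothetical addresses)
toy = pd.DataFrame({'sandwicher': ['osmo1a', 'osmo1b', 'osmo1a', 'osmo1a', 'osmo1c', 'osmo1b']})

# One row per sandwicher, most frequent first
counts = toy['sandwicher'].value_counts()
print(counts.index.tolist())  # ['osmo1a', 'osmo1b', 'osmo1c']
print(counts.tolist())        # [3, 2, 1]
```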


In [101]:
```python
same_block_df = sandwich_df[sandwich_df['start_block_height'] == sandwich_df['end_block_height']]
two_block_df = sandwich_df[sandwich_df['start_block_height'] != sandwich_df['end_block_height']]

print("Number of potential sandwiches in same block: ", len(same_block_df))
print("Number of potential sandwiches in two blocks: ", len(two_block_df))
```

Number of potential sandwiches in same block:  22
Number of potential sandwiches in two blocks:  156


In [110]:
```python
# Example pattern match within same block
# (.iloc[0] takes the first row; label 0 may not exist after filtering)
same_block_df['tx_ids'].iloc[0]
```

['821E3A2518F41069EED1C409F9ED5DB60710B26E0758FA05042F5891309D822C',
 '9C51814B66F387552FF692D501EEAA47AEDEC7A7FF68E79DF27A8C81AFA4CAA5',
 'DBDF184B7AB11BA78285A9805E2AB814800333ED7A3375C11DDE95BC64C4E09D']

In [137]:
```python
# Example pattern match within two blocks
# (.iloc[0] takes the first row; label 0 may not exist after filtering)
two_block_df['tx_ids'].iloc[0]
```

['70FA0937F321A0052947731001F698D45D90BD4FE668B01896A5D466BF82B8D0',
 'BE6CF238133D1D7406E511BF7D5DF0B1B8484B4B5F0DBB09C90E8BBA58EE87E8',
 'D8D976313AFF74016628BB7AE9F466C23B43A544325F250F6D63571EFD805EFB']

In [118]:
```python
sandwich_df.to_csv('not_sandwiches.csv')
```