# Cow + Univ3 DataPipeline

This notebook shows how to build a decentralized data pipeline on top of the CoW subgraph with DataStreams. DataStreams is a GraphQL query manager for subgraphs that lets you compose queries into a fully reproducible data pipeline. The data comes from the following sources:

- CoW Subgraph - https://thegraph.com/hosted-service/subgraph/cowprotocol/cow
- Univ3 Subgraph - https://api.thegraph.com/subgraphs/name/messari/uniswap-v3-ethereum
- Dune Solver Names Query link - https://dune.com/queries/1941061


### Installation Notes
If you haven't already, install DataStreams by running `!pip install git+https://github.com/Evan-Kim2028/DataStreams.git` in a new cell. Drop the `!` if you are installing from a terminal or inside a virtual environment. DataStreams requires Python 3.10 or greater. We also use polars for merges and column mutations; install it with `!pip install polars`.

### Process
First we query the CoW `trades` schema twice for the WETH/USDC pair. Then we query the CoW `settlements` schema for settlement info. Finally we load the results of the Dune solver names query and merge them with the CoW data. We then perform some column mutations with polars to get the final data.

For Univ3 data, we query the `swaps` schema for the USDC/WETH 0.05% and 0.3% fee pools. Then we merge the CoW trades data and Univ3 swaps data on timestamp, keeping the transactions that occur at the same timestamp. We scale the raw amounts by each token's decimals and calculate marginal swap execution prices for comparison.

### Setup Jupyter Environment

In [3]:
from datastreams.datastream import Streamer

import os
import pandas as pd
import polars as pl

# These commands enlarge the column size of the dataframe so things like 0x... are not truncated
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('max_colwidth', None)

### Cowswap Trades

In [4]:
# instantiate Streamer class. Note that we need two separate streamer classes, otherwise the queries will be overwritten. 
cow_ds1 = Streamer('https://api.thegraph.com/subgraphs/name/cowprotocol/cow')
cow_ds2 = Streamer('https://api.thegraph.com/subgraphs/name/cowprotocol/cow')

In [5]:
# DEFINE TIMESTAMP HERE. The timestamp is used for replication quality assurance purposes.
timestamp = 1700000000

# define ethereum token addresses here to be used in cowswap trades query filter
weth_addr = "0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2"
usdc_addr = "0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48"

# we set a fixed query size number. The Cow settlements and Uniswap swaps queries use multiples of this size (5x and 10x respectively).
query_size = 2100
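
To sanity-check the cutoff, it can help to render the Unix timestamp as a UTC date. The following is a minimal sketch using only the standard library; the helper name `to_utc` is illustrative and not part of DataStreams.

```python
from datetime import datetime, timezone


def to_utc(ts: int) -> str:
    """Render a Unix timestamp (in seconds) as an ISO-8601 UTC string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


cutoff = 1700000000  # same value as `timestamp` in the cell above
print(to_utc(cutoff))
```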

In [6]:
token_addr_list = [weth_addr, usdc_addr]

In [7]:
# We make two queries to the cow schema to get all the trades that match weth/usdc and usdc/weth.
# NOTE - both queries below use the same `_in` filters, so each one already matches both directions.
trades_weth_usdc_fp = cow_ds1.queryDict.get('trades')
trades_usdc_weth_fp = cow_ds2.queryDict.get('trades')

# trades query path that gets token a -> token b trades
trades_weth_usdc_qp = trades_weth_usdc_fp(
    first=query_size,
    orderBy='timestamp',
    orderDirection='desc',
    where = {
    'timestamp_lt': timestamp, 
    'buyAmountUsd_gt': 100, 
    'sellAmountUsd_gt': 100, 
    "sellToken_in": token_addr_list, 
    "buyToken_in": token_addr_list
    }
)

# trades query path that gets token b -> token a trades
trades_usdc_weth_qp = trades_usdc_weth_fp(
    first=query_size,
    orderBy='timestamp',
    orderDirection='desc',
    where = {
    'timestamp_lt': timestamp, 
    'buyAmountUsd_gt': 100, 
    'sellAmountUsd_gt': 100, 
    "sellToken_in": token_addr_list, 
    "buyToken_in": token_addr_list
    }
)

# run query
trades_weth_usdc_df = cow_ds1.runQuery(trades_weth_usdc_qp)
trades_usdc_weth_df = cow_ds2.runQuery(trades_usdc_weth_qp)

FIELD - trades
FIELD - trades


In [8]:
# combine the trades queries together
trades_df = pd.concat([trades_weth_usdc_df, trades_usdc_weth_df])

In [9]:
print(f'query returned {len(trades_df)} rows')

query returned 4200 rows
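
Because the two `where` filters above are identical, the two results appear to contain the same rows, which would explain why the combined frame has exactly `2 * query_size` rows. One way to guard against this is to drop duplicates on a unique key after concatenating. The sketch below shows the idea in plain Python on toy records; it assumes `trades_id` uniquely identifies a trade, and the ids are hypothetical. In pandas the equivalent call would be `trades_df.drop_duplicates(subset='trades_id')`.

```python
def dedupe_by_key(records, key):
    """Keep the first record seen for each value of `key`, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique


# Toy stand-ins for the two query results (hypothetical ids).
batch_1 = [
    {"trades_id": "0xaaa", "trades_buyToken_id": "WETH"},
    {"trades_id": "0xbbb", "trades_buyToken_id": "USDC"},
]
batch_2 = list(batch_1)  # identical filters return identical rows

combined = batch_1 + batch_2
deduped = dedupe_by_key(combined, "trades_id")
print(len(combined), len(deduped))
```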


In [10]:
# get unique values in trades_df column to verify the query results.
trades_df['trades_buyToken_id'].unique()

array(['0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2',
       '0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48'], dtype=object)

In [11]:
# replace the above values with symbols
trades_df['trades_buyToken_id'] = trades_df['trades_buyToken_id'].replace(weth_addr, 'WETH')
trades_df['trades_buyToken_id'] = trades_df['trades_buyToken_id'].replace(usdc_addr, 'USDC')

trades_df['trades_sellToken_id'] = trades_df['trades_sellToken_id'].replace(weth_addr, 'WETH')
trades_df['trades_sellToken_id'] = trades_df['trades_sellToken_id'].replace(usdc_addr, 'USDC')
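
The four `replace` calls above apply the same address-to-symbol mapping to two columns. A single lookup table keeps that mapping in one place. This is a minimal sketch in plain Python; `TOKEN_SYMBOLS` and `to_symbol` are illustrative names. pandas `Series.replace` also accepts a dict, so the same table could be passed to it directly.

```python
TOKEN_SYMBOLS = {
    "0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2": "WETH",
    "0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48": "USDC",
}


def to_symbol(address: str) -> str:
    """Return the symbol for a known token address, else the address unchanged."""
    return TOKEN_SYMBOLS.get(address.lower(), address)


print(to_symbol("0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48"))
```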

### Cowswap Trades-Settlement Merge

In [12]:
# get a query field path from the query dictionary which is automatically populated in the Streamer object
settlements_fp = cow_ds1.queryDict.get('settlements')

# add parameters to the settlements_qp.
settlements_qp = settlements_fp(
    first=query_size * 5,
    orderBy='firstTradeTimestamp',
    orderDirection='desc',
    where = {'firstTradeTimestamp_lt': timestamp} 
    )

# run query
settlements_df = cow_ds1.runQuery(settlements_qp)

FIELD - settlements


In [13]:
settlements_df.size

52500

In [14]:
trades_df.dtypes

trades_id                object
trades_timestamp          int64
trades_gasPrice           int64
trades_feeAmount          int64
trades_txHash            object
trades_settlement_id     object
trades_buyAmount         object
trades_sellAmount        object
trades_sellToken_id      object
trades_buyToken_id       object
trades_order_id          object
trades_buyAmountEth     float64
trades_sellAmountEth    float64
trades_buyAmountUsd     float64
trades_sellAmountUsd    float64
endpoint                 object
dtype: object

In [15]:
# enforce trades_df column types. This is necessary because pandas stores mixed-type columns as `object` without enforcing a type, whereas a Polars dataframe uses columnar storage and needs a single type per column.
trades_df['trades_buyAmount'] = trades_df['trades_buyAmount'].astype('float64')
trades_df['trades_sellAmount'] = trades_df['trades_sellAmount'].astype('float64')
trades_df['trades_buyAmountUsd'] = trades_df['trades_buyAmountUsd'].astype('float64')
trades_df['trades_sellAmountUsd'] = trades_df['trades_sellAmountUsd'].astype('float64')
trades_df['trades_timestamp'] = trades_df['trades_timestamp'].astype('int64')
trades_df['trades_buyToken_id'] = trades_df['trades_buyToken_id'].astype('str')
trades_df['trades_sellToken_id'] = trades_df['trades_sellToken_id'].astype('str')

In [16]:
# convert the dataframes into lists of record dictionaries
settlement_dict = settlements_df.to_dict('records')
trades_dict = trades_df.to_dict('records')

In [17]:
# convert dictionaries into polars dataframes
settlement_pl = pl.from_dicts(settlement_dict)
trades_pl = pl.from_dicts(trades_dict)

In [18]:
# merge trades and settlement dataframes on the settlement transaction hash
cow_trades_pl = trades_pl.join(other=settlement_pl, left_on='trades_settlement_id', right_on='settlements_txHash', how='inner')

In [19]:
cow_trades_pl.shape

(1076, 20)
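
The row count drops from 4200 to 1076 because an inner join keeps only trades whose settlement hash also appears in the settlements result. That result is limited to the most recent `query_size * 5` settlements, so older trades likely find no match. The sketch below illustrates the join semantics with a small hash join in plain Python; the rows and ids are hypothetical.

```python
def inner_join(left, right, left_key, right_key):
    """Hash join: index `right` by key, then emit one merged row per match."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[left_key], []):
            joined.append({**lrow, **rrow})
    return joined


trades = [
    {"trades_id": "t1", "trades_settlement_id": "0x01"},
    {"trades_id": "t2", "trades_settlement_id": "0x02"},
    {"trades_id": "t3", "trades_settlement_id": "0x99"},  # settlement not fetched
]
settlements = [
    {"settlements_txHash": "0x01", "settlements_solver_id": "0xs1"},
    {"settlements_txHash": "0x02", "settlements_solver_id": "0xs2"},
]

joined = inner_join(trades, settlements, "trades_settlement_id", "settlements_txHash")
print([row["trades_id"] for row in joined])
```

Trade `t3` is dropped because its settlement is missing from the right-hand side, which mirrors what happens to trades that fall outside the settlements window.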

In [20]:
cow_trades_pl.head(5)

trades_id,trades_timestamp,trades_gasPrice,trades_feeAmount,trades_txHash,trades_settlement_id,trades_buyAmount,trades_sellAmount,trades_sellToken_id,trades_buyToken_id,trades_order_id,trades_buyAmountEth,trades_sellAmountEth,trades_buyAmountUsd,trades_sellAmountUsd,endpoint,settlements_id,settlements_firstTradeTimestamp,settlements_solver_id,endpoint_right
str,i64,i64,i64,str,str,f64,f64,str,str,str,f64,f64,f64,f64,str,str,i64,str,str
"""0xe7091acaae8b...",1676854427,18905989039,0,"""0x15335ac33a70...","""0x15335ac33a70...",1.3112e+19,22000000000.0,"""USDC""","""WETH""","""0xe7091acaae8b...",13.112022,13.0769,22059.0888,22000.0,"""cow""","""0x15335ac33a70...",1676854427,"""0x149d0f928233...","""cow"""
"""0xe7091acaae8b...",1676854427,18905989039,0,"""0x15335ac33a70...","""0x15335ac33a70...",1.3112e+19,22000000000.0,"""USDC""","""WETH""","""0xe7091acaae8b...",13.112022,13.0769,22059.0888,22000.0,"""cow""","""0x15335ac33a70...",1676854427,"""0x149d0f928233...","""cow"""
"""0x0fab05e4cd61...",1676854355,20372316359,0,"""0xea0f54777b51...","""0xea0f54777b51...",1.3103e+19,22000000000.0,"""USDC""","""WETH""","""0x0fab05e4cd61...",13.103208,13.071313,22053.681009,22000.0,"""cow""","""0xea0f54777b51...",1676854355,"""0x149d0f928233...","""cow"""
"""0x0fab05e4cd61...",1676854355,20372316359,0,"""0xea0f54777b51...","""0xea0f54777b51...",1.3103e+19,22000000000.0,"""USDC""","""WETH""","""0x0fab05e4cd61...",13.103208,13.071313,22053.681009,22000.0,"""cow""","""0xea0f54777b51...",1676854355,"""0x149d0f928233...","""cow"""
"""0x5a3aa6f65f83...",1676852495,20834035283,0,"""0x5bccbb4b4c16...","""0x5bccbb4b4c16...",1.3313e+19,22390000000.0,"""USDC""","""WETH""","""0x5a3aa6f65f83...",13.31307,13.303491,22406.472431,22390.349577,"""cow""","""0x5bccbb4b4c16...",1676852495,"""0x149d0f928233...","""cow"""


In [21]:
# get unique values in cow_trades_pl trades_sellToken_id column
cow_trades_pl['trades_sellToken_id'].unique()

trades_sellToken_id
str
"""WETH"""
"""USDC"""


### Cowswap Trades-Solver Merge

In [22]:
solvers = pd.read_csv('data/cowv2_solvers.csv')  # loaded with pandas rather than polars, because replacing the \ character proved troublesome in polars

In [23]:
# rename address to settlements_solver_id in pandas
solvers = solvers.rename(columns={"address": "settlements_solver_id"})

In [24]:
# NOTE - Dune formats addresses as \x..., so we need to convert '\' to '0'
solvers['settlements_solver_id'] = solvers['settlements_solver_id'].str.replace('\\', '0', regex=False)
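
The same normalization can be written as a small standalone function, which is easy to test before applying it to the whole column. This is a sketch under two assumptions: Dune addresses start with a literal `\x`, and subgraph ids are lowercase hex. The name `normalize_address` is illustrative.

```python
def normalize_address(raw: str) -> str:
    r"""Turn a Dune-style '\x...' hex string into a lowercase '0x...' address."""
    if raw.startswith("\\x"):
        raw = "0" + raw[1:]  # swap only the leading backslash for '0'
    return raw.lower()


print(normalize_address("\\xAbC123"))
```

Unlike the `str.replace` call above, this only touches the prefix, so a stray backslash elsewhere in the string is left alone.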

In [25]:
# turn solvers into a dictionary
solvers_dict = solvers.to_dict('records')

# convert dict to polars
solvers_pl = pl.from_dicts(solvers_dict)

In [26]:
# inner join solvers_pl onto cow_trades_pl using the solver address
cow_complete_pl = cow_trades_pl.join(solvers_pl, on="settlements_solver_id", how="inner")

In [27]:
# drop endpoint_right column from cow_complete_pl
cow_complete_pl = cow_complete_pl.drop('endpoint_right')

In [28]:
cow_complete_pl.shape

(1036, 22)

#### Basic Agg

In [29]:
# filter by "prod" environments
filter_df = cow_complete_pl.filter(pl.col("environment") == "prod")

In [31]:
filter_df.shape

(1024, 22)

In [32]:
# group filter_df by solver name. Check solver count
grouped_df = filter_df.groupby("name").agg(
    pl.count("trades_id").alias("total_trades")).sort("total_trades", reverse=True)
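
The filter and group-by in the cells above amount to counting rows per solver name and ranking the counts. A minimal sketch of the same logic in plain Python, using `collections.Counter` on hypothetical rows:

```python
from collections import Counter

rows = [
    {"name": "Otex", "environment": "prod"},
    {"name": "PLM", "environment": "prod"},
    {"name": "Otex", "environment": "prod"},
    {"name": "Otex", "environment": "barn"},  # hypothetical non-prod row
]

# Count only prod rows, then rank solvers by descending trade count.
prod_counts = Counter(r["name"] for r in rows if r["environment"] == "prod")
ranking = prod_counts.most_common()
print(ranking)
```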


In [33]:
grouped_df

name,total_trades
str,u32
"""Otex""",416
"""PLM""",304
"""Laertes""",106
"""Gnosis_1inch""",78
"""Quasilabs""",70
"""Gnosis_0x""",24
"""Seasolver""",8
"""Gnosis_Balance...",8
"""DexCowAgg""",6
"""Naive""",2


### Uniswap V3 Swaps

In [34]:
# instantiate Streamer object. 
# Note - unlike the cow queries, univ3 does not require multiple streamer instantiations because the swaps field path is reset each iteration.
# If the Cow queries were updated to use the same method, we could use the same streamer object for all queries.
univ3_ds = Streamer('https://api.thegraph.com/subgraphs/name/messari/uniswap-v3-ethereum')

In [35]:
# get a query field path from the query dictionary which is automatically populated in the Streamer object
swaps_fp = univ3_ds.queryDict.get('swaps')

In [36]:
weth_usdc_list = [
    "0x88e6a0c2ddd26feeb64f039a2c41296fcb3f5640", # usdc/weth .05%
    "0x8ad599c3a0ff1de082011efddc58f1908eb6e6d8" #usdc/weth .3%
]

In [37]:
swaps_df_list = []

In [38]:
# for loop over the LP list to get the swap data
for lp in weth_usdc_list:
    # add parameters to the query_path
    swaps_qp = swaps_fp(
        first=query_size * 10,
        orderBy='timestamp',
        orderDirection='desc',
        where = {'timestamp_lt': timestamp, 'amountInUSD_gt': 250, 'amountOutUSD_gt': 250, 'pool': lp} 
        )

    # run query
    swaps_df = univ3_ds.runQuery(swaps_qp)
    swaps_df_list.append(swaps_df)

FIELD - swaps
FIELD - swaps


In [39]:
# concat swaps_df_list into a single dataframe.
swaps_df = pd.concat(swaps_df_list)

In [40]:
# replace the pool addresses with LP pool names with fees
swaps_df['swaps_pool_id'] = swaps_df['swaps_pool_id'].replace(weth_usdc_list[0], 'USDC_WETH .05%')
swaps_df['swaps_pool_id'] = swaps_df['swaps_pool_id'].replace(weth_usdc_list[1], 'USDC_WETH .3%')

# replace token addresses with symbols
swaps_df['swaps_tokenIn_id'] = swaps_df['swaps_tokenIn_id'].replace(usdc_addr, 'USDC')
swaps_df['swaps_tokenIn_id'] = swaps_df['swaps_tokenIn_id'].replace(weth_addr, 'WETH')
swaps_df['swaps_tokenOut_id'] = swaps_df['swaps_tokenOut_id'].replace(usdc_addr, 'USDC')
swaps_df['swaps_tokenOut_id'] = swaps_df['swaps_tokenOut_id'].replace(weth_addr, 'WETH')

In [41]:
print(f'query returned {len(swaps_df)} rows\n swaps_df columns are {swaps_df.columns}')

query returned 42000 rows
 swaps_df columns are Index(['swaps_id', 'swaps_hash', 'swaps_logIndex', 'swaps_protocol_id',
       'swaps_to', 'swaps_from', 'swaps_blockNumber', 'swaps_timestamp',
       'swaps_tokenIn_id', 'swaps_amountIn', 'swaps_amountInUSD',
       'swaps_tokenOut_id', 'swaps_amountOut', 'swaps_amountOutUSD',
       'swaps_pool_id', 'endpoint'],
      dtype='object')


In [42]:
# convert swaps_df to pl
swaps_dict = swaps_df.to_dict('records')
swaps_pl = pl.from_dicts(swaps_dict)

### Merge Cow and Univ3 (NOTE - not sure if necessary. It's not used anywhere else in the notebook.)

In [43]:
# merge trades and swaps on timestamp value. We use outer join because we want to keep all trades and swaps data and backfill swap values
cow_uni_outer_pl = cow_complete_pl.join(other=swaps_pl, left_on='trades_timestamp', right_on='swaps_timestamp', how='outer')

In [44]:
cow_uni_outer_pl.columns

['trades_id',
 'trades_timestamp',
 'trades_gasPrice',
 'trades_feeAmount',
 'trades_txHash',
 'trades_settlement_id',
 'trades_buyAmount',
 'trades_sellAmount',
 'trades_sellToken_id',
 'trades_buyToken_id',
 'trades_order_id',
 'trades_buyAmountEth',
 'trades_sellAmountEth',
 'trades_buyAmountUsd',
 'trades_sellAmountUsd',
 'endpoint',
 'settlements_id',
 'settlements_firstTradeTimestamp',
 'settlements_solver_id',
 'environment',
 'name',
 'active',
 'swaps_id',
 'swaps_hash',
 'swaps_logIndex',
 'swaps_protocol_id',
 'swaps_to',
 'swaps_from',
 'swaps_blockNumber',
 'swaps_tokenIn_id',
 'swaps_amountIn',
 'swaps_amountInUSD',
 'swaps_tokenOut_id',
 'swaps_amountOut',
 'swaps_amountOutUSD',
 'swaps_pool_id',
 'endpoint_right']

In [45]:
# not sure if the merge between uni and cow is completely necessary at this point, as we can get the data needed for execution prices without the merge step.
cow_uni_trunc_pl = cow_uni_outer_pl[[
    'trades_timestamp', 
    'trades_txHash',
    'trades_feeAmount',
    'trades_sellToken_id', 
    'trades_buyToken_id', 
    'trades_buyAmount',
    'trades_sellAmount',
    # 'trades_sellAmountUsd', 
    # 'trades_buyAmountUsd', 
    'name',
    'environment',
    'swaps_pool_id', 
    'swaps_tokenIn_id', 
    'swaps_tokenOut_id',
    'swaps_amountIn',
    'swaps_amountOut',  
    # 'swaps_amountInUSD',
    # 'swaps_amountOutUSD',
    'swaps_blockNumber'
    ]]

In [46]:
#check pl dataframe size
cow_uni_outer_pl.shape

(42996, 37)

In [47]:
# get the rows that have 0 null values. This leaves us with the cow/univ3 trades that were executed on the same timestamp.
cow_uni_trunc_no_nulls_pl = cow_uni_trunc_pl.drop_nulls()

In [48]:
# check shape.
cow_uni_trunc_no_nulls_pl.shape

(418, 15)
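
An outer join followed by `drop_nulls()` keeps only rows where both sides are present. Assuming the selected source columns contain no nulls of their own, this amounts to an inner join on the timestamp. Matching on timestamp alone also pairs every trade with every swap in the same second, so one trade can appear in several rows. The sketch below shows that pairing in plain Python on hypothetical records.

```python
def match_on_timestamp(trades, swaps):
    """Pair every trade with every swap that shares its timestamp."""
    swaps_by_ts = {}
    for swap in swaps:
        swaps_by_ts.setdefault(swap["swaps_timestamp"], []).append(swap)
    return [
        {**trade, **swap}
        for trade in trades
        for swap in swaps_by_ts.get(trade["trades_timestamp"], [])
    ]


trades = [{"trades_id": "t1", "trades_timestamp": 100}]
swaps = [
    {"swaps_id": "s1", "swaps_timestamp": 100},
    {"swaps_id": "s2", "swaps_timestamp": 100},
    {"swaps_id": "s3", "swaps_timestamp": 100},
    {"swaps_id": "s4", "swaps_timestamp": 200},  # no trade at this timestamp
]

matched = match_on_timestamp(trades, swaps)
print(len(matched))
```

The single trade shows up once per matching swap, which is the same pattern as the repeated trade rows in the merged output.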

In [49]:
# drop duplicate rows from the polars dataframe. NOTE - the duplicates most likely come from the two CoW trades queries above: both use identical filters, so the concatenated trades_df holds each trade twice.
cow_uni_trunc_no_nulls_pl = cow_uni_trunc_no_nulls_pl.unique()

In [50]:
cow_uni_trunc_no_nulls_pl.shape

(209, 15)

In [51]:
# count how many values in the trades_feeAmount series are 0
cow_uni_trunc_no_nulls_pl['trades_feeAmount'].eq(0).sum()

53

In [52]:
cow_uni_trunc_no_nulls_pl.head(15)

trades_timestamp,trades_txHash,trades_feeAmount,trades_sellToken_id,trades_buyToken_id,trades_buyAmount,trades_sellAmount,name,environment,swaps_pool_id,swaps_tokenIn_id,swaps_tokenOut_id,swaps_amountIn,swaps_amountOut,swaps_blockNumber
i64,str,i64,str,str,f64,f64,str,str,str,str,str,f64,f64,i64
1676852495,"""0x5bccbb4b4c16...",0,"""USDC""","""WETH""",1.3313e+19,22390000000.0,"""PLM""","""prod""","""USDC_WETH .05%...","""USDC""","""WETH""",313409936.0,1.862e+17,16666163
1676851127,"""0x380da7a51274...",2642240106390777,"""WETH""","""USDC""",165382938.0,1e+17,"""Otex""","""prod""","""USDC_WETH .05%...","""USDC""","""WETH""",134710000000.0,8.011e+19,16666049
1676851127,"""0x380da7a51274...",2642240106390777,"""WETH""","""USDC""",165382938.0,1e+17,"""Otex""","""prod""","""USDC_WETH .05%...","""WETH""","""USDC""",3.1065e+17,521887967.0,16666049
1676851127,"""0x380da7a51274...",2642240106390777,"""WETH""","""USDC""",165382938.0,1e+17,"""Otex""","""prod""","""USDC_WETH .05%...","""USDC""","""WETH""",500000000.0,2.9732e+17,16666049
1676850947,"""0x91c42df26c76...",0,"""USDC""","""WETH""",2e+19,33610000000.0,"""PLM""","""prod""","""USDC_WETH .05%...","""WETH""","""USDC""",1.1e+19,18477000000.0,16666034
1676850947,"""0x91c42df26c76...",0,"""USDC""","""WETH""",2e+19,33610000000.0,"""PLM""","""prod""","""USDC_WETH .05%...","""WETH""","""USDC""",3.2e+17,537518732.0,16666034
1676841335,"""0x57101d029a02...",3591326000870283,"""WETH""","""USDC""",921182727.0,5.5e+17,"""Otex""","""prod""","""USDC_WETH .05%...","""USDC""","""WETH""",148390000000.0,8.8015e+19,16665243
1676841335,"""0x57101d029a02...",3591326000870283,"""WETH""","""USDC""",921182727.0,5.5e+17,"""Otex""","""prod""","""USDC_WETH .05%...","""USDC""","""WETH""",6730700000.0,3.9918e+18,16665243
1676838731,"""0x5cfc3e432fa5...",5778398386909872,"""WETH""","""USDC""",150780000000.0,9.01e+19,"""Otex""","""prod""","""USDC_WETH .05%...","""USDC""","""WETH""",93320000000.0,5.5733e+19,16665027
1676838731,"""0x5cfc3e432fa5...",3439675782149555,"""WETH""","""USDC""",19710000000.0,1.1781e+19,"""Otex""","""prod""","""USDC_WETH .05%...","""USDC""","""WETH""",93320000000.0,5.5733e+19,16665027


#### Add Decimal Values to data

In [53]:
# add decimals to cow trades sell tokens
cow_uni_trunc_no_nulls_pl = cow_uni_trunc_no_nulls_pl.with_columns(
    [
        pl.col('trades_sellToken_id'),
        (
            pl.when(pl.col('trades_sellToken_id') == 'WETH')
            .then(18)
            .otherwise(6)
            .cast(pl.UInt8)
        ).alias('trades_sellToken_decimals'),
    ]
)

# add decimals to cow trades buy tokens
cow_uni_trunc_no_nulls_pl = cow_uni_trunc_no_nulls_pl.with_columns(
    [
        pl.col('trades_buyToken_id'),
        (
            pl.when(pl.col('trades_buyToken_id') == 'WETH')
            .then(18)
            .otherwise(6)
            .cast(pl.UInt8)
        ).alias('trades_buyToken_decimals'),
    ]
)

In [54]:
# add decimals to univ3 swaps tokenIn tokens
cow_uni_trunc_no_nulls_pl = cow_uni_trunc_no_nulls_pl.with_columns(
    [
        pl.col('swaps_tokenIn_id'),
        (
            pl.when(pl.col('swaps_tokenIn_id') == 'WETH')
            .then(18)
            .otherwise(6)
            .cast(pl.UInt8)
        ).alias('swaps_tokenIn_decimals'),
    ]
)

# add decimals to univ3 swaps tokenOut tokens
cow_uni_trunc_no_nulls_pl = cow_uni_trunc_no_nulls_pl.with_columns(
    [
        pl.col('swaps_tokenOut_id'),
        (
            pl.when(pl.col('swaps_tokenOut_id') == 'WETH')
            .then(18)
            .otherwise(6)
            .cast(pl.UInt8)
        ).alias('swaps_tokenOut_decimals'),
    ]
)

In [55]:
# Note that polars could also overwrite the original columns in place instead of creating new ones. Creating new `_converted` columns is more verbose, but makes it easy to sanity-check the before/after values.
trades_swaps_converted_pl = cow_uni_trunc_no_nulls_pl.with_columns([
    (pl.col("trades_buyAmount") / (10**pl.col("trades_buyToken_decimals"))).alias('trades_buyAmount_converted'),
    (pl.col("trades_sellAmount") / (10**pl.col("trades_sellToken_decimals"))).alias('trades_sellAmount_converted'),
    (pl.col("swaps_amountIn") / (10**pl.col("swaps_tokenIn_decimals"))).alias('swaps_amountIn_converted'),
    (pl.col("swaps_amountOut") / (10**pl.col("swaps_tokenOut_decimals"))).alias('swaps_amountOut_converted'),
])
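
The notebook casts raw amounts to `float64` before scaling. A 64-bit float cannot represent every integer above 2^53, so 18-decimal WETH amounts may lose their lowest digits. If exact values matter, one alternative is to scale with `decimal.Decimal`. This is a sketch only; `TOKEN_DECIMALS` and `to_token_units` are illustrative names and the raw amounts are hypothetical.

```python
from decimal import Decimal

TOKEN_DECIMALS = {"WETH": 18, "USDC": 6}


def to_token_units(raw_amount: int, symbol: str) -> Decimal:
    """Scale a raw on-chain integer amount by the token's decimals."""
    return Decimal(raw_amount) / Decimal(10) ** TOKEN_DECIMALS[symbol]


print(to_token_units(1_500_000_000_000_000_000, "WETH"))
print(to_token_units(22_000_000_000, "USDC"))
```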

In [56]:
# check that the decimal columns and conversion columns were added
trades_swaps_converted_pl.columns

['trades_timestamp',
 'trades_txHash',
 'trades_feeAmount',
 'trades_sellToken_id',
 'trades_buyToken_id',
 'trades_buyAmount',
 'trades_sellAmount',
 'name',
 'environment',
 'swaps_pool_id',
 'swaps_tokenIn_id',
 'swaps_tokenOut_id',
 'swaps_amountIn',
 'swaps_amountOut',
 'swaps_blockNumber',
 'trades_sellToken_decimals',
 'trades_buyToken_decimals',
 'swaps_tokenIn_decimals',
 'swaps_tokenOut_decimals',
 'trades_buyAmount_converted',
 'trades_sellAmount_converted',
 'swaps_amountIn_converted',
 'swaps_amountOut_converted']

In [57]:
trades_swaps_converted_trunc_pl = trades_swaps_converted_pl.select([
    'trades_timestamp',
    'swaps_blockNumber',
    'trades_sellToken_id',
    'trades_buyToken_id',
    'trades_sellAmount_converted',
    'trades_buyAmount_converted',
    'name',
    'environment',
    'swaps_pool_id',
    'swaps_tokenIn_id',
    'swaps_tokenOut_id',
    'swaps_amountIn_converted',
    'swaps_amountOut_converted',
])

### Add Execution Price Columns "x/y" and "y/x"

We calculate execution prices for cow trades and univ3 swaps. Then we take the timestamp with the largest number of executions and compare the execution prices.

In [58]:
trades_swaps_converted_trunc_pl.head(5)

trades_timestamp,swaps_blockNumber,trades_sellToken_id,trades_buyToken_id,trades_sellAmount_converted,trades_buyAmount_converted,name,environment,swaps_pool_id,swaps_tokenIn_id,swaps_tokenOut_id,swaps_amountIn_converted,swaps_amountOut_converted
i64,i64,str,str,f64,f64,str,str,str,str,str,f64,f64
1676852495,16666163,"""USDC""","""WETH""",22390.349577,13.31307,"""PLM""","""prod""","""USDC_WETH .05%...","""USDC""","""WETH""",313.409936,0.186202
1676851127,16666049,"""WETH""","""USDC""",0.1,165.382938,"""Otex""","""prod""","""USDC_WETH .05%...","""USDC""","""WETH""",134705.092164,80.110311
1676851127,16666049,"""WETH""","""USDC""",0.1,165.382938,"""Otex""","""prod""","""USDC_WETH .05%...","""WETH""","""USDC""",0.310648,521.887967
1676851127,16666049,"""WETH""","""USDC""",0.1,165.382938,"""Otex""","""prod""","""USDC_WETH .05%...","""USDC""","""WETH""",500.0,0.297322
1676850947,16666034,"""USDC""","""WETH""",33609.650423,20.0,"""PLM""","""prod""","""USDC_WETH .05%...","""WETH""","""USDC""",11.0,18477.490432


In [69]:
trades_swaps_converted_trunc_pl = trades_swaps_converted_trunc_pl.with_columns([
    (pl.col("trades_buyAmount_converted") / pl.col("trades_sellAmount_converted")).alias('trades_buy_sell_ratio'),
    (pl.col("trades_sellAmount_converted") / pl.col("trades_buyAmount_converted")).alias('trades_sell_buy_ratio'),
    (pl.col("swaps_amountIn_converted") / pl.col("swaps_amountOut_converted")).alias('swaps_amountIn_amountOut_ratio'),
    (pl.col("swaps_amountOut_converted") / pl.col("swaps_amountIn_converted")).alias('swaps_amountOut_amountIn_ratio'),
])
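
The ratio columns above depend on trade direction: `trades_buy_sell_ratio` is USDC per WETH only when WETH is the sell token, and WETH per USDC otherwise. Averaging such a column over rows with mixed directions would mix units. One option is to normalize every row to a single quote, as in this sketch. The function name `usdc_per_weth` is illustrative, and the sample amounts are taken from the tables in this notebook.

```python
def usdc_per_weth(sell_token: str, sell_amount: float, buy_amount: float) -> float:
    """Execution price in USDC per WETH, regardless of trade direction."""
    if sell_token == "WETH":
        return buy_amount / sell_amount  # sold WETH, received USDC
    return sell_amount / buy_amount  # sold USDC, received WETH


# WETH -> USDC trade
print(usdc_per_weth("WETH", 27.891154, 46606.167074))
# USDC -> WETH trade
print(usdc_per_weth("USDC", 22390.349577, 13.31307))
```

The same function applies to the swap columns by passing `swaps_tokenIn_id`, `swaps_amountIn_converted`, and `swaps_amountOut_converted`.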

In [70]:
# find the juiciest timestamp, i.e. the one with the most activity
trades_swaps_converted_trunc_pl['trades_timestamp'].value_counts().sort('counts', reverse=True)

trades_timestamp,counts
i64,u32
1676645159,7
1676823287,5
1676602211,4
1676766059,4
1676717975,4
1676639207,4
1676651087,4
1676767055,3
1676645735,3
1676665715,3


In [71]:
# get timestamp value with highest count
biggest_timestamp_count = trades_swaps_converted_trunc_pl['trades_timestamp'].value_counts().sort('counts', reverse=True).head(1)

In [72]:
biggest_timestamp_count

trades_timestamp,counts
i64,u32
1676645159,7


In [73]:
biggest_timestamp_count_int = biggest_timestamp_count['trades_timestamp'].head(1).to_numpy()[0]

In [74]:
# filter polars dataframe by the timestamp with the highest count (1676645159)
filter_trades_swaps_pl = trades_swaps_converted_trunc_pl.filter(pl.col('trades_timestamp') == biggest_timestamp_count_int)

In [75]:
filter_trades_swaps_pl

trades_timestamp,swaps_blockNumber,trades_sellToken_id,trades_buyToken_id,trades_sellAmount_converted,trades_buyAmount_converted,name,environment,swaps_pool_id,swaps_tokenIn_id,swaps_tokenOut_id,swaps_amountIn_converted,swaps_amountOut_converted,trades_buy_sell_ratio,swaps_amountIn_amountOut_ratio,trades_sell_buy_ratio,swaps_amountOut_amountIn_ratio
i64,i64,str,str,f64,f64,str,str,str,str,str,f64,f64,f64,f64,f64,f64
1676645159,16649105,"""WETH""","""USDC""",27.891154,46606.167074,"""Seasolver""","""prod""","""USDC_WETH .05%...","""WETH""","""USDC""",347.856688,582759.25735,1671.001743,0.000597,0.000598,1675.285476
1676645159,16649105,"""WETH""","""USDC""",27.891154,46606.167074,"""Seasolver""","""prod""","""USDC_WETH .05%...","""WETH""","""USDC""",7.274375,12176.955005,1671.001743,0.000597,0.000598,1673.952102
1676645159,16649105,"""WETH""","""USDC""",27.891154,46606.167074,"""Seasolver""","""prod""","""USDC_WETH .05%...","""WETH""","""USDC""",5.407001,9051.216855,1671.001743,0.000597,0.000598,1673.981054
1676645159,16649105,"""WETH""","""USDC""",27.891154,46606.167074,"""Seasolver""","""prod""","""USDC_WETH .05%...","""WETH""","""USDC""",3.10509,5197.92201,1671.001743,0.000597,0.000598,1674.000489
1676645159,16649105,"""WETH""","""USDC""",27.891154,46606.167074,"""Seasolver""","""prod""","""USDC_WETH .05%...","""WETH""","""USDC""",3.315188,5549.675704,1671.001743,0.000597,0.000598,1674.015147
1676645159,16649105,"""WETH""","""USDC""",27.891154,46606.167074,"""Seasolver""","""prod""","""USDC_WETH .05%...","""WETH""","""USDC""",71.418334,119567.561199,1671.001743,0.000597,0.000598,1674.185796
1676645159,16649105,"""WETH""","""USDC""",27.891154,46606.167074,"""Seasolver""","""prod""","""USDC_WETH .05%...","""WETH""","""USDC""",27.883131,46687.865393,1671.001743,0.000597,0.000598,1674.412576


In [86]:
print(f'At timestamp {biggest_timestamp_count_int}, there were {filter_trades_swaps_pl.shape[0]} trades and swaps.')

print(f'\
Cow mean execution prices were {filter_trades_swaps_pl["trades_buy_sell_ratio"].mean()} \
and {filter_trades_swaps_pl["trades_sell_buy_ratio"].mean()} \
completed by solver {filter_trades_swaps_pl["name"].head(1).to_numpy()[0]}')

print(f'\
The Univ3 pool {filter_trades_swaps_pl["swaps_pool_id"].head(1).to_numpy()[0]} mean execution price during this time was {filter_trades_swaps_pl["swaps_amountOut_amountIn_ratio"].mean()} \
and {filter_trades_swaps_pl["swaps_amountIn_amountOut_ratio"].mean()}')


At timestamp 1676645159, there were 7 trades and swaps.
Cow mean execution prices were 1671.0017426760564 and 0.0005984434213686288 completed by solver Seasolver
The Univ3 pool USDC_WETH .05% mean execution price during this time was 1674.2618058223286 and 0.0005972781955816646
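
To put the two printed prices on a common scale, the gap can be expressed in basis points relative to the Univ3 price. This is a rough sketch only: it compares the means printed above and ignores gas, fees, and trade size, so it should not be read as a measure of execution quality. The helper name `diff_bps` is illustrative.

```python
def diff_bps(price: float, reference: float) -> float:
    """Signed difference of `price` from `reference`, in basis points."""
    return (price - reference) / reference * 10_000


cow_price = 1671.0017426760564    # CoW mean execution price printed above
univ3_price = 1674.2618058223286  # Univ3 mean execution price printed above

print(round(diff_bps(cow_price, univ3_price), 2))
```

For this timestamp the CoW price sits roughly 19 to 20 basis points below the Univ3 mean.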


### Save Dataset to CSV

In [65]:
# save trades_swaps_converted_trunc_pl to csv
trades_swaps_converted_trunc_pl.write_csv('data/trades_swaps_converted_trunc_pl.csv')