# Cow + Univ3 DataPipeline

The goal of this notebook is to show and explain how to create a decentralized data pipeline powered by CoW subgraphs with DataStreams. DataStreams functions as a GraphQL query manager for subgraphs and lets you compose queries into a fully reproducible data pipeline. Data comes from the following sources:

- CoW Subgraph - https://thegraph.com/hosted-service/subgraph/cowprotocol/cow
- Univ3 Subgraph - https://api.thegraph.com/subgraphs/name/messari/uniswap-v3-ethereum
- Dune Solver Names Query link - https://dune.com/queries/1941061


### Installation Notes
If you haven't already, install DataStreams by running `!pip install git+https://github.com/Evan-Kim2028/DataStreams.git` in a new cell. Drop the `!` if you are installing from a terminal or inside a virtual environment. DataStreams requires Python 3.10 or greater. We also use polars to perform merges and column mutations; install it with `!pip install polars`.

### Process
First we query the CoW trade schemas twice for WETH/USDC. Then we query the CoW schema for settlement info. Finally we download the Dune Solver names query and merge it with the CoW data. We then perform some column mutations with polars to get the final data.

For Univ3 data, we query the swaps schema for the USDC/WETH .05% and .3% fee pools. Then we merge the CoW trades data and Univ3 Swaps data for transactions that occur at the same timestamp. We add decimal places for the amounts and calculate marginal swap execution prices for comparison.
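
As a rough illustration of the last two steps, the sketch below scales raw token amounts by their decimals (18 for WETH, 6 for USDC) and derives an execution price. The raw amounts are rounded from the first row of the sample CoW output further down; `DECIMALS` and `to_units` are names made up for this example and are not part of DataStreams.

```python
from decimal import Decimal

# Token decimals used throughout this notebook
DECIMALS = {"WETH": 18, "USDC": 6}

def to_units(raw_amount: int, symbol: str) -> Decimal:
    """Convert a raw on-chain integer amount into whole-token units."""
    return Decimal(raw_amount) / (Decimal(10) ** DECIMALS[symbol])

# Illustrative trade: sell 100,000 USDC, buy roughly 63.506413 WETH
sell_usdc = to_units(100_000_000_000, "USDC")
buy_weth = to_units(63_506_413_000_000_000_000, "WETH")

# Execution price expressed as USDC paid per WETH received
price_usdc_per_weth = sell_usdc / buy_weth
print(sell_usdc, buy_weth, round(price_usdc_per_weth, 2))
```

Using `Decimal` here avoids the float rounding that appears once 18-decimal amounts are stored as `float64`, which is what the notebook does for convenience.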

### Setup Jupyter Environment

In [1]:
from datastreams.datastream import Streamer

import matplotlib.pyplot as plt  # needed by the plotting cells at the end of the notebook
import pandas as pd
import polars as pl

# These commands enlarge the column size of the dataframe so things like 0x... are not truncated
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', False)
pd.set_option('max_colwidth', None)

### Cowswap Trades

In [2]:
# instantiate Streamer class. Note that we need two separate streamer classes, otherwise the queries will be overwritten. 
cow_ds1 = Streamer('https://api.thegraph.com/subgraphs/name/cowprotocol/cow')
cow_ds2 = Streamer('https://api.thegraph.com/subgraphs/name/cowprotocol/cow')

In [3]:
# DEFINE TIMESTAMP HERE. The timestamp pins the query window so that results can be replicated.
timestamp = 1677891498 # current block timestamp is around 1677891498 on March 3rd, 2023 8:06PM

# define ethereum token addresses here to be used in cowswap trades query filter
weth_addr = "0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2"
usdc_addr = "0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48"

# we set a fixed query size number. The Cow settlements and Uniswap swaps query are multiples larger than this initial query size.
query_size = 100000000

# Filter size - we filter out trades smaller than $1000 USD
filter_usd = 1000
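
Since the cutoff is stored as a raw Unix timestamp, it can help to print it as a UTC datetime when checking a replication run. This is a small standard-library sketch; `cutoff_utc` is a name introduced only for this example.

```python
from datetime import datetime, timezone

timestamp = 1677891498  # same cutoff as defined above

# Interpret the Unix timestamp as a timezone-aware UTC datetime
cutoff_utc = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(cutoff_utc.isoformat())
```

The wall-clock time in the comment above is a local time, so the UTC value printed here will differ from it by the local offset.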

In [4]:
token_addr_list = [weth_addr, usdc_addr]

In [5]:
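# Two queries against the cow schema, intended to cover both weth -> usdc and usdc -> weth trades.
# NOTE: the two filters below are identical and use `_in` on both tokens, so each query already matches both directions
# and the concatenated result contains every trade twice (see the repeated rows in the head() output further down).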
trades_weth_usdc_fp = cow_ds1.queryDict.get('trades')
trades_usdc_weth_fp = cow_ds2.queryDict.get('trades')

# trades query path that gets token a -> token b trades
trades_weth_usdc_qp = trades_weth_usdc_fp(
    first=query_size,
    orderBy='timestamp',
    orderDirection='desc',
    where = {
    'timestamp_lt': timestamp, 
    'buyAmountUsd_gt': filter_usd, 
    'sellAmountUsd_gt': filter_usd, 
    "sellToken_in": token_addr_list, 
    "buyToken_in": token_addr_list
    }
)

# trades query path that gets token b -> token a trades
trades_usdc_weth_qp = trades_usdc_weth_fp(
    first=query_size,
    orderBy='timestamp',
    orderDirection='desc',
    where = {
    'timestamp_lt': timestamp, 
    'buyAmountUsd_gt': filter_usd, 
    'sellAmountUsd_gt': filter_usd, 
    "sellToken_in": token_addr_list, 
    "buyToken_in": token_addr_list
    }
)

# run query
trades_weth_usdc_df = cow_ds1.runQuery(trades_weth_usdc_qp)
trades_usdc_weth_df = cow_ds2.runQuery(trades_usdc_weth_qp)

FIELD - trades
FIELD - trades
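
To see what the `where` filter selects, here is a plain-Python emulation of the three filter suffixes it uses (`_lt`, `_gt`, `_in`). It does not call the subgraph or DataStreams; `matches` and the sample trades are hypothetical, and the field names simply mirror the filter keys above. It suggests that a filter with `_in` on both `sellToken` and `buyToken` accepts either trade direction.

```python
weth_addr = "0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2"
usdc_addr = "0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48"

where = {
    "timestamp_lt": 1677891498,
    "buyAmountUsd_gt": 1000,
    "sellAmountUsd_gt": 1000,
    "sellToken_in": [weth_addr, usdc_addr],
    "buyToken_in": [weth_addr, usdc_addr],
}

def matches(trade: dict, where: dict) -> bool:
    """Emulate the _lt / _gt / _in filter suffixes on a flat dict."""
    for key, expected in where.items():
        field, op = key.rsplit("_", 1)
        value = trade[field]
        if op == "lt" and value >= expected:
            return False
        if op == "gt" and expected >= value:
            return False
        if op == "in" and value not in expected:
            return False
    return True

# Hypothetical trades: one per direction, plus one below the USD threshold
trades = [
    {"timestamp": 1677890000, "buyAmountUsd": 5000.0, "sellAmountUsd": 5010.0,
     "sellToken": usdc_addr, "buyToken": weth_addr},
    {"timestamp": 1677890100, "buyAmountUsd": 2000.0, "sellAmountUsd": 2004.0,
     "sellToken": weth_addr, "buyToken": usdc_addr},
    {"timestamp": 1677890200, "buyAmountUsd": 500.0, "sellAmountUsd": 501.0,
     "sellToken": weth_addr, "buyToken": usdc_addr},
]

selected = [t for t in trades if matches(t, where)]
print(len(selected))
```

Both directions pass the same filter and only the sub-threshold trade is rejected.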


In [6]:
# combine the trades queries together
trades_df = pd.concat([trades_weth_usdc_df, trades_usdc_weth_df])

In [7]:
print(f'query returned {len(trades_df)} rows')

query returned 43696 rows


In [8]:
# get unique values in trades_df column to verify the query results.
trades_df['trades_buyToken_id'].unique()

array(['0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2',
       '0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48'], dtype=object)

In [9]:
# replace the above values with symbols
trades_df['trades_buyToken_id'] = trades_df['trades_buyToken_id'].replace(weth_addr, 'WETH')
trades_df['trades_buyToken_id'] = trades_df['trades_buyToken_id'].replace(usdc_addr, 'USDC')

trades_df['trades_sellToken_id'] = trades_df['trades_sellToken_id'].replace(weth_addr, 'WETH')
trades_df['trades_sellToken_id'] = trades_df['trades_sellToken_id'].replace(usdc_addr, 'USDC')

### Cowswap Trades-Settlement Merge

In [10]:
# get a query field path from the query dictionary which is automatically populated in the Streamer object
settlements_fp = cow_ds1.queryDict.get('settlements')

# add parameters to the settlements_qp.
settlements_qp = settlements_fp(
    first=query_size * 5,
    orderBy='firstTradeTimestamp',
    orderDirection='desc',
    where = {'firstTradeTimestamp_lt': timestamp} 
    )

# run query
settlements_df = cow_ds1.runQuery(settlements_qp)

FIELD - settlements


In [11]:
settlements_df.size

1876815

In [12]:
trades_df.dtypes

trades_id                object
trades_timestamp          int64
trades_gasPrice           int64
trades_feeAmount          int64
trades_txHash            object
trades_settlement_id     object
trades_buyAmount         object
trades_sellAmount        object
trades_sellToken_id      object
trades_buyToken_id       object
trades_order_id          object
trades_buyAmountEth     float64
trades_sellAmountEth    float64
trades_buyAmountUsd     float64
trades_sellAmountUsd    float64
endpoint                 object
dtype: object

In [13]:
# enforce trades_df column types. pandas leaves some columns as the generic `object` dtype (see above), while polars needs one consistent type per column when the dataframe is converted below.
trades_df['trades_buyAmount'] = trades_df['trades_buyAmount'].astype('float64')
trades_df['trades_sellAmount'] = trades_df['trades_sellAmount'].astype('float64')
trades_df['trades_buyAmountUsd'] = trades_df['trades_buyAmountUsd'].astype('float64')
trades_df['trades_sellAmountUsd'] = trades_df['trades_sellAmountUsd'].astype('float64')
trades_df['trades_timestamp'] = trades_df['trades_timestamp'].astype('int64')
trades_df['trades_buyToken_id'] = trades_df['trades_buyToken_id'].astype('str')
trades_df['trades_sellToken_id'] = trades_df['trades_sellToken_id'].astype('str')

In [14]:
# convert dfs into lists of record dictionaries
settlement_dict = settlements_df.to_dict('records')
trades_dict = trades_df.to_dict('records')

In [15]:
# convert dictionaries into polars dataframes
settlement_pl = pl.from_dicts(settlement_dict)
trades_pl = pl.from_dicts(trades_dict)

In [16]:
# merge trades and settlement dataframes on the settlement transaction hash
cow_trades_pl = trades_pl.join(other=settlement_pl, left_on='trades_settlement_id', right_on='settlements_txHash', how='inner')

In [17]:
cow_trades_pl.shape

(43694, 20)
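
The inner join keeps only trades whose `trades_settlement_id` has a matching `settlements_txHash`, which is why the row count drops from 43696 to 43694. One way to list the unmatched rows is a left join with pandas' `indicator=True`; the frames and ids below are made up for illustration.

```python
import pandas as pd

trades = pd.DataFrame({
    "trades_id": ["t1", "t2", "t3"],
    "trades_settlement_id": ["0xaaa", "0xbbb", "0xccc"],
})
settlements = pd.DataFrame({
    "settlements_txHash": ["0xaaa", "0xbbb"],
    "settlements_solver_id": ["0x111", "0x222"],
})

# Left join with an indicator column to see which trades found a settlement
checked = trades.merge(
    settlements,
    left_on="trades_settlement_id",
    right_on="settlements_txHash",
    how="left",
    indicator=True,
)

# Rows marked "left_only" have no matching settlement and would be dropped by an inner join
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched["trades_id"].tolist())
```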

In [18]:
cow_trades_pl.head(5)

trades_id,trades_timestamp,trades_gasPrice,trades_feeAmount,trades_txHash,trades_settlement_id,trades_buyAmount,trades_sellAmount,trades_sellToken_id,trades_buyToken_id,trades_order_id,trades_buyAmountEth,trades_sellAmountEth,trades_buyAmountUsd,trades_sellAmountUsd,endpoint,settlements_id,settlements_firstTradeTimestamp,settlements_solver_id,endpoint_right
str,i64,i64,i64,str,str,f64,f64,str,str,str,f64,f64,f64,f64,str,str,i64,str,str
"""0x2d177cbcc3e2...",1677890687,34031938581,11504283,"""0xdbeb3db4bf01...","""0xdbeb3db4bf01...",6.3506e+19,100000000000.0,"""USDC""","""WETH""","""0x2d177cbcc3e2...",63.506413,63.678801,99729.286519,100000.0,"""cow""","""0xdbeb3db4bf01...",1677890687,"""0x149d0f928233...","""cow"""
"""0x2d177cbcc3e2...",1677890687,34031938581,11504283,"""0xdbeb3db4bf01...","""0xdbeb3db4bf01...",6.3506e+19,100000000000.0,"""USDC""","""WETH""","""0x2d177cbcc3e2...",63.506413,63.678801,99729.286519,100000.0,"""cow""","""0xdbeb3db4bf01...",1677890687,"""0x149d0f928233...","""cow"""
"""0x47ece80491bf...",1677887663,30096110884,39337855,"""0x8810dcd24713...","""0x8810dcd24713...",9.5578e+19,150000000000.0,"""USDC""","""WETH""","""0x47ece80491bf...",95.578018,95.778401,149686.177736,150000.0,"""cow""","""0x8810dcd24713...",1677887663,"""0x149d0f928233...","""cow"""
"""0x47ece80491bf...",1677887663,30096110884,39337855,"""0x8810dcd24713...","""0x8810dcd24713...",9.5578e+19,150000000000.0,"""USDC""","""WETH""","""0x47ece80491bf...",95.578018,95.778401,149686.177736,150000.0,"""cow""","""0x8810dcd24713...",1677887663,"""0x149d0f928233...","""cow"""
"""0x8c129381cf3e...",1677885323,19689355198,27018672,"""0xf8e2a0e1ae13...","""0xf8e2a0e1ae13...",9.5518e+19,150000000000.0,"""USDC""","""WETH""","""0x8c129381cf3e...",95.518272,95.806942,149548.044201,150000.0,"""cow""","""0xf8e2a0e1ae13...",1677885323,"""0xb20b86c4e6de...","""cow"""


In [19]:
# get unique values in cow_trades_pl trades_sellToken_id column
cow_trades_pl['trades_sellToken_id'].unique()

trades_sellToken_id
str
"""USDC"""
"""WETH"""


### Cowswap Trades-Solver Merge

In [20]:
solvers = pd.read_csv('data/cowv2_solvers.csv') # load in pandas instead of polars. Having trouble replacing \ symbol in polars

In [21]:
# rename address to settlements_solver_id in pandas
solvers = solvers.rename(columns={"address": "settlements_solver_id"})

In [22]:
# NOTE - Dune formats addresses as \x... so we need to convert the leading '\' to '0'
solvers['settlements_solver_id'] = solvers['settlements_solver_id'].str.replace('\\', '0', regex=False)
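
The same fix on a single string, as a minimal sketch. The address is a placeholder, not a real solver address.

```python
# Hypothetical address as it might appear in the Dune CSV export
dune_address = "\\x0123456789abcdef0123456789abcdef01234567"

# Swap the leading backslash for "0" so it matches the subgraph's 0x format
fixed = dune_address.replace("\\", "0", 1)
print(fixed)
```

Passing `1` as the count limits the replacement to the first backslash, which is a slightly stricter variant of the column-wide replace above.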

In [23]:
# turn solvers into a dictionary
solvers_dict = solvers.to_dict('records')

# convert dict to polars
solvers_pl = pl.from_dicts(solvers_dict)

In [24]:
# inner join solvers_pl onto cow_trades_pl using the solver address
cow_complete_pl = cow_trades_pl.join(solvers_pl, on="settlements_solver_id", how="inner")

In [25]:
# drop endpoint_right column from cow_complete_pl
cow_complete_pl = cow_complete_pl.drop('endpoint_right')

In [26]:
cow_complete_pl.shape

(43512, 22)

In [27]:
# save polars to parquet
cow_complete_pl.write_parquet('data/cow_complete_pl.parquet')

#### Basic Agg

In [28]:
# filter by "prod" environments
filter_df = cow_complete_pl.filter(pl.col("environment") == "prod")

In [30]:
filter_df.shape

(42310, 22)

In [31]:
# group filter_df by solver name. Check solver count
grouped_df = filter_df.groupby("name").agg(
    pl.count("trades_id").alias("total_trades")).sort("total_trades", reverse=True)
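
Raw counts are easier to compare as shares. The sketch below copies the counts from the `grouped_df` output shown in this section and divides by the number of prod rows (`filter_df.shape[0]`). Only the rows visible in that output are included, and the truncated solver name is kept exactly as displayed.

```python
# Trade counts copied from the grouped_df output in this section (visible rows only)
total_trades = {
    "Otex": 6510, "PLM": 6020, "Gnosis_0x": 5866, "Gnosis_1inch": 4568,
    "QuasiModo": 4556, "Legacy": 4358, "Laertes": 1914, "DexCowAgg": 1774,
    "MIP": 1706, "Gnosis_ParaSwa...": 1504,
}
prod_rows = 42310  # filter_df.shape[0]

# Share of prod trades settled by each listed solver
shares = {name: count / prod_rows for name, count in total_trades.items()}
listed_share = sum(total_trades.values()) / prod_rows

for name, share in shares.items():
    print(name.ljust(20), f"{share:.1%}")
print(f"listed solvers cover {listed_share:.1%} of prod trades")
```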


In [32]:
grouped_df

name,total_trades
str,u32
"""Otex""",6510
"""PLM""",6020
"""Gnosis_0x""",5866
"""Gnosis_1inch""",4568
"""QuasiModo""",4556
"""Legacy""",4358
"""Laertes""",1914
"""DexCowAgg""",1774
"""MIP""",1706
"""Gnosis_ParaSwa...",1504


### Uniswap V3 Swaps

In [33]:
# instantiate Streamer object. 
# Note - unlike the cow queries, univ3 does not require multiple Streamer instantiations because the swaps field path is reset each iteration.
# If the Cow queries were updated to use the same method, we could use the same streamer object for all queries.
univ3_ds = Streamer('https://api.thegraph.com/subgraphs/name/messari/uniswap-v3-ethereum')

In [34]:
# get a query field path from the query dictionary which is automatically populated in the Streamer object
swaps_fp = univ3_ds.queryDict.get('swaps')

In [35]:
weth_usdc_list = [
    "0x88e6a0c2ddd26feeb64f039a2c41296fcb3f5640", # usdc/weth .05%
    "0x8ad599c3a0ff1de082011efddc58f1908eb6e6d8" #usdc/weth .3%
]

In [36]:
swaps_df_list = []

In [37]:
# for loop over the LP list to get the swap data
for lp in weth_usdc_list:
    # add parameters to the query_path
    swaps_qp = swaps_fp(
        first=query_size,
        orderBy='timestamp',
        orderDirection='desc',
        where = {'timestamp_lt': timestamp, 'amountInUSD_gt': filter_usd, 'amountOutUSD_gt': filter_usd, 'pool': lp} 
        )

    # run query
    swaps_df = univ3_ds.runQuery(swaps_qp)
    swaps_df_list.append(swaps_df)

FIELD - swaps


In [None]:
# concat swaps_df_list into a single dataframe.
swaps_df = pd.concat(swaps_df_list)

In [None]:
swaps_df.head(5)

In [None]:
# replace the pool addresses with LP pool names with fees
swaps_df['swaps_pool_id'] = swaps_df['swaps_pool_id'].replace(weth_usdc_list[0], 'USDC_WETH .05%')
swaps_df['swaps_pool_id'] = swaps_df['swaps_pool_id'].replace(weth_usdc_list[1], 'USDC_WETH .3%')

# replace token addresses with symbols
swaps_df['swaps_tokenIn_id'] = swaps_df['swaps_tokenIn_id'].replace(usdc_addr, 'USDC')
swaps_df['swaps_tokenIn_id'] = swaps_df['swaps_tokenIn_id'].replace(weth_addr, 'WETH')
swaps_df['swaps_tokenOut_id'] = swaps_df['swaps_tokenOut_id'].replace(usdc_addr, 'USDC')
swaps_df['swaps_tokenOut_id'] = swaps_df['swaps_tokenOut_id'].replace(weth_addr, 'WETH')

In [None]:
print(f'query returned {len(swaps_df)} rows\n swaps_df columns are {swaps_df.columns}')

In [None]:
# convert swaps_df to pl
swaps_dict = swaps_df.to_dict('records')
swaps_pl = pl.from_dicts(swaps_dict)

### This is a checkpoint to save the data in case there are crashes

In [None]:
# save polars to parquet
swaps_pl.write_parquet('data/swaps_pl.parquet')

### Get the Uniswap Gas Data

In [None]:
univ3_no_messari_ds = Streamer('https://api.thegraph.com/subgraphs/name/uniswap/uniswap-v3')

In [None]:
transactions_fp = univ3_no_messari_ds.queryDict.get('transactions')

In [None]:
transactions_df_list = []

In [None]:
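# NOTE: despite the variable names, weth_txs_qp targets the .05% pool and usdc_txs_qp targets the .3% pool
weth_txs_qp = transactions_fp(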
    first=query_size,
    orderBy='timestamp',
    orderDirection='desc',
    where = {"swaps_": {'timestamp_lt': timestamp, 'pool': weth_usdc_list[0]}}
    )

usdc_txs_qp = transactions_fp(
    first=query_size,
    orderBy='timestamp',
    orderDirection='desc',
    where = {"swaps_": {'timestamp_lt': timestamp, 'pool': weth_usdc_list[1]}}
    )

In [None]:
txs1_df = univ3_no_messari_ds.runQuery(weth_txs_qp)

In [None]:
txs2_df = univ3_no_messari_ds.runQuery(usdc_txs_qp)

In [None]:
# concat txs1 and txs2
transactions_df = pd.concat([txs1_df, txs2_df])

In [None]:
transactions_df.shape

In [None]:
swaps_pl.shape

In [None]:
# convert transactions_df to pl
transactions_dict = transactions_df.to_dict('records')
transactions_pl = pl.from_dicts(transactions_dict)

In [None]:
# save polars to parquet
transactions_pl.write_parquet('data/transactions_pl.parquet')

### Merge Swaps and Transactions for Gas

In [None]:
# merge swaps_pl and transactions_pl polars
uni_complete_pl = swaps_pl.join(transactions_pl, left_on="swaps_hash", right_on="transactions_id", how="inner")

In [None]:
# drop duplicate rows
uni_complete_pl = uni_complete_pl.unique()

In [None]:
# drop endpoint column from uni_complete_pl
uni_complete_pl = uni_complete_pl.drop('endpoint')

In [None]:
# transaction gas fee in wei = gasUsed * gasPrice (converted to ETH in the next cell)
uni_complete_pl = uni_complete_pl.with_columns(
    (pl.col("transactions_gasUsed") * pl.col("transactions_gasPrice")).alias('transaction_gas_fee')
)

In [None]:
uni_complete_pl = uni_complete_pl.with_columns(
    (pl.col("transaction_gas_fee") / 10**18).alias('transaction_gas_fee') # 1 gwei = 10^9 wei, 1 ETH = 10^18 wei
)
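
A worked example of the fee arithmetic, using exact integers. The gas price is the `trades_gasPrice` value from the sample CoW row shown earlier; the `gas_used` figure is hypothetical and chosen only to make the numbers concrete.

```python
from fractions import Fraction

WEI_PER_GWEI = 10**9
WEI_PER_ETH = 10**18

gas_price_wei = 34_031_938_581  # trades_gasPrice from the sample CoW row
gas_used = 150_000              # hypothetical gas consumption for one swap

fee_wei = gas_used * gas_price_wei        # exact integer arithmetic
fee_eth = Fraction(fee_wei, WEI_PER_ETH)  # exact rational, no float rounding

print(f"gas price: {gas_price_wei / WEI_PER_GWEI:.2f} gwei")
print(f"fee: {fee_wei} wei = {float(fee_eth):.6f} ETH")
```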

In [None]:
# sort by largest transaction_gas_fee
uni_complete_pl.sort("transaction_gas_fee", reverse=True).head(5)

### Merge Cow and Univ3

In [None]:
# merge trades and swaps on timestamp value. We use outer join because we want to keep all trades and swaps data and backfill swap values
cow_uni_outer_pl = cow_complete_pl.join(other=uni_complete_pl, left_on='trades_timestamp', right_on='swaps_timestamp', how='outer')
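
One thing to keep in mind with a timestamp join: it is many-to-many, so every trade is paired with every swap that shares its timestamp. The pandas sketch below uses made-up rows to show how the row count grows; the polars join above should behave analogously.

```python
import pandas as pd

cow = pd.DataFrame({"ts": [100, 100, 101], "trade": ["c1", "c2", "c3"]})
uni = pd.DataFrame({"ts": [100, 100, 100, 102], "swap": ["u1", "u2", "u3", "u4"]})

# Outer join on the shared timestamp column
joined = cow.merge(uni, on="ts", how="outer")

# 2 trades x 3 swaps at ts=100, plus one unmatched row on each side
rows_per_ts = joined.groupby("ts").size().to_dict()
print(len(joined), rows_per_ts)
```

This duplication is one reason the later cells average values per timestamp before comparing prices.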

In [None]:
cow_uni_outer_pl.columns

In [None]:
# keep only the columns that are needed downstream
cow_uni_trunc_pl = cow_uni_outer_pl[[
    'trades_timestamp', 
    'trades_txHash',
    'trades_feeAmount',
    'trades_sellToken_id', 
    'trades_buyToken_id', 
    'trades_buyAmount',
    'trades_sellAmount',
    # 'trades_sellAmountUsd', 
    # 'trades_buyAmountUsd', 
    'name',
    'environment',
    'swaps_pool_id', 
    'swaps_tokenIn_id', 
    'swaps_tokenOut_id',
    'swaps_amountIn',
    'swaps_amountOut',  
    # 'swaps_amountInUSD',
    # 'swaps_amountOutUSD',
    'swaps_blockNumber',
    # 'transactions_timestamp',
    # 'transactions_gasUsed',
    # 'transactions_gasPrice',
    'transaction_gas_fee'
    ]]

In [None]:
#check pl dataframe size
cow_uni_trunc_pl.shape

In [None]:
# sort by largest transaction_gas_fee
cow_uni_trunc_pl.sort("transaction_gas_fee", reverse=True).head(5)

### Chainlink

In [None]:
# load streamer class
chain_ds = Streamer('https://api.thegraph.com/subgraphs/name/openpredict/chainlink-prices-subgraph')

In [None]:
chain_price_feed = "ETH/USD"

In [None]:
# get a query field path from the query dictionary which is automatically populated in the Streamer object
chain_fp = chain_ds.queryDict.get('prices')

# add parameters to the chain_qp.
chain_qp = chain_fp(
    first=query_size * 5,
    orderBy='timestamp',
    orderDirection='desc',
    where = {'timestamp_lt': timestamp, 'assetPair': chain_price_feed}
    )

# run query
chain_df = chain_ds.runQuery(chain_qp)

In [None]:
# drop prices_id, endpoint. 
chain_df = chain_df.drop(['prices_id', 'endpoint'], axis=1)
# divide prices_price by 1e8 to get the price in USD
chain_df['prices_price'] = chain_df['prices_price'] / 10 ** 8

In [None]:
chain_df.shape

In [None]:
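# save to parquet. chain_df is a pandas dataframe, which has no write_parquet method, so convert it to polars first
pl.from_dicts(chain_df.to_dict('records')).write_parquet('data/chain_df.parquet')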

### Merge chain_df with cow_uni_trunc_pl

In [None]:
# convert chain_df to a list of record dictionaries and then to a polars dataframe
chain_dict = chain_df.to_dict('records')
chain_pl = pl.from_dicts(chain_dict)

In [None]:
# merge chain_pl with cow_uni_trunc_pl on timestamp
cow_uni_chain_outer_pl = cow_uni_trunc_pl.join(other=chain_pl, left_on='trades_timestamp', right_on='prices_timestamp', how='outer')

In [None]:
cow_uni_chain_outer_pl.shape

### Calculate and Plot Prices

#### Decimal Calculations

In [None]:
# add decimals to cow trades sell tokens
cow_uni_chain_outer_pl = cow_uni_chain_outer_pl.with_columns(
    [
        pl.col('trades_sellToken_id'),
        (
            pl.when(pl.col('trades_sellToken_id') == 'WETH')
            .then(18)
            .otherwise(6)
            .cast(pl.UInt8)
        ).alias('trades_sellToken_decimals'),
    ]
)

# add decimals to cow trades buy tokens
cow_uni_chain_outer_pl = cow_uni_chain_outer_pl.with_columns(
    [
        pl.col('trades_buyToken_id'),
        (
            pl.when(pl.col('trades_buyToken_id') == 'WETH')
            .then(18)
            .otherwise(6)
            .cast(pl.UInt8)
        ).alias('trades_buyToken_decimals'),
    ]
)

# add decimals to univ3 swaps tokenIn
cow_uni_chain_outer_pl = cow_uni_chain_outer_pl.with_columns(
    [
        pl.col('swaps_tokenIn_id'),
        (
            pl.when(pl.col('swaps_tokenIn_id') == 'WETH')
            .then(18)
            .otherwise(6)
            .cast(pl.UInt8)
        ).alias('swaps_tokenIn_decimals'),
    ]
)

# add decimals to univ3 swaps tokenOut
cow_uni_chain_outer_pl = cow_uni_chain_outer_pl.with_columns(
    [
        pl.col('swaps_tokenOut_id'),
        (
            pl.when(pl.col('swaps_tokenOut_id') == 'WETH')
            .then(18)
            .otherwise(6)
            .cast(pl.UInt8)
        ).alias('swaps_tokenOut_decimals'),
    ]
)

In [None]:
cow_uni_chain_outer_pl.columns # has transaction cols

#### Execution Price Calculations

In [None]:
# note that polars can perform these calculations in-column. This means it can convert the values in place without creating a new column. The new column created here is more verbose, but is a good sanity check to see before/after results.
trades_swaps_converted_pl = cow_uni_chain_outer_pl.with_columns([
    (pl.col("trades_buyAmount") / (10**pl.col("trades_buyToken_decimals"))).alias('trades_buyAmount_converted'),
    (pl.col("trades_sellAmount") / (10**pl.col("trades_sellToken_decimals"))).alias('trades_sellAmount_converted'),
    (pl.col("swaps_amountIn") / (10**pl.col("swaps_tokenIn_decimals"))).alias('swaps_amountIn_converted'),
    (pl.col("swaps_amountOut") / (10**pl.col("swaps_tokenOut_decimals"))).alias('swaps_amountOut_converted'),
])

In [None]:
trades_swaps_converted_trunc_pl = trades_swaps_converted_pl.with_columns([
    (pl.col("trades_buyAmount_converted") / pl.col("trades_sellAmount_converted")).alias('trades_buy_sell_ratio'),
    (pl.col("trades_sellAmount_converted") / pl.col("trades_buyAmount_converted")).alias('trades_sell_buy_ratio'),
    (pl.col("swaps_amountIn_converted") / pl.col("swaps_amountOut_converted")).alias('swaps_amountIn_amountOut_ratio'),
    (pl.col("swaps_amountOut_converted") / pl.col("swaps_amountIn_converted")).alias('swaps_amountOut_amountIn_ratio'),
])

In [None]:
# keep only the columns needed for the price comparison
trades_swaps_converted_trunc_pl = trades_swaps_converted_trunc_pl.select([
    'trades_timestamp',
    'swaps_blockNumber',
    'trades_txHash',
    'trades_feeAmount',
    'trades_sellToken_id',
    'trades_buyToken_id',
    'trades_sellAmount_converted',
    'trades_buyAmount_converted',
    'name',
    'environment',
    'swaps_pool_id',
    'swaps_tokenIn_id',
    'swaps_tokenOut_id',
    'swaps_amountIn_converted',
    'swaps_amountOut_converted',
    'transaction_gas_fee',
    'trades_buy_sell_ratio',
    'trades_sell_buy_ratio',
    'swaps_amountIn_amountOut_ratio',
    'swaps_amountOut_amountIn_ratio',
    'prices_assetPair_id',
    'prices_price'
])

In [None]:
trades_swaps_converted_trunc_pl.shape

In [None]:
trades_swaps_converted_trunc_pl.head(10)

In [None]:
# replace null values with a small number, 0.0001, in non-str cols. We can't use 0 because the reciprocal taken in the next step would divide by zero.
trades_swaps_converted_trunc_pl = trades_swaps_converted_trunc_pl.fill_null(0.0001)

In [None]:
trades_swaps_converted_trunc_pl.head(5)

In [None]:
# return the larger value between trades_buy_sell_ratio and trades_sell_buy_ratio in a lambda function
execution_prices_pl = trades_swaps_converted_trunc_pl.with_columns([
    (pl.col("trades_buy_sell_ratio").apply(lambda x: x if x > 1 else 1/x)).alias('trades_buy_sell_ratio'),
    (pl.col("trades_sell_buy_ratio").apply(lambda x: x if x > 1 else 1/x)).alias('trades_sell_buy_ratio'),
    (pl.col("swaps_amountIn_amountOut_ratio").apply(lambda x: x if x > 1 else 1/x)).alias('swaps_amountIn_amountOut_ratio'),
    (pl.col("swaps_amountOut_amountIn_ratio").apply(lambda x: x if x > 1 else 1/x)).alias('swaps_amountOut_amountIn_ratio'),
])
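
The lambda above maps a ratio and its reciprocal to the same value, so both directions of a trade end up quoted the same way (the larger number). A standalone sketch with an illustrative price shows this, and also shows where the 10,000 values discussed below come from: the `0.0001` null placeholder is flipped to roughly `1 / 0.0001`. The `normalize` helper is a name introduced for this example.

```python
def normalize(ratio: float) -> float:
    """Return the ratio itself if it is above 1, otherwise its reciprocal."""
    return ratio if ratio > 1 else 1 / ratio

price = 1574.64  # illustrative USDC-per-WETH ratio

same_direction = normalize(price)       # already above 1, returned unchanged
flipped = normalize(1 / price)          # WETH-per-USDC ratio, flipped back
placeholder = normalize(0.0001)         # null placeholder from fill_null

print(same_direction, flipped, placeholder)
```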

In [None]:
execution_prices_pl.columns

In [None]:
# convert execution_prices_pl to pandas dataframe for easier plotting.
execution_prices_pd = execution_prices_pl.to_pandas()

# set trades_timestamp as index and order by trades_timestamp
execution_prices_pd = execution_prices_pd.set_index('trades_timestamp').sort_index()

In [None]:
# get rows with highest trades_buy_sell_ratio values. We see here some 10,000 values. This is due to the fact that we use a dust amount .0001 to replace null values which is done to avoid division by zero errors.
execution_prices_pd['trades_buy_sell_ratio'].head(5)

In [None]:
execution_prices_pd['name'].unique()

In [None]:
# filter out ratio values that are 10000. These are artifacts of replacing nulls with the dust amount 0.0001, not real prices.

# cow: apply both ratio filters in sequence
cow_execution_prices_pd = execution_prices_pd[execution_prices_pd['trades_buy_sell_ratio'] < 10000]
cow_execution_prices_pd = cow_execution_prices_pd[cow_execution_prices_pd['trades_sell_buy_ratio'] < 10000]

# univ3: apply both ratio filters in sequence
univ3_execution_prices_pd = execution_prices_pd[execution_prices_pd['swaps_amountIn_amountOut_ratio'] < 10000]
univ3_execution_prices_pd = univ3_execution_prices_pd[univ3_execution_prices_pd['swaps_amountOut_amountIn_ratio'] < 10000]

# keep chainlink prices greater than 1 (drops the 0.0001 placeholder rows)
chain_execution_prices_pd = execution_prices_pd[execution_prices_pd['prices_price'] > 1]

In [None]:
# print the shape of each filtered dataframe
print(f"cow_execution_prices_pd shape: {cow_execution_prices_pd.shape}")
print(f"univ3_execution_prices_pd shape: {univ3_execution_prices_pd.shape}")
print(f"chain_execution_prices_pd shape: {chain_execution_prices_pd.shape}")

#### FFILL prices to the timestamp range and plot

In [None]:
start_index = int(cow_execution_prices_pd.index.min())
end_index = int(cow_execution_prices_pd.index.max())

In [None]:
start_index

In [None]:
# create an empty dataframe with one row per second from start_index to end_index (inclusive)
price_diff_df = pd.DataFrame(index=range(start_index, end_index + 1))

In [None]:
price_diff_df.head(5)

In [None]:
# merge univ3 duplicate index values by taking the mean of the values
univ3_execution_prices_pd = univ3_execution_prices_pd.groupby(univ3_execution_prices_pd.index).mean(numeric_only=True)

In [None]:
# do same for cow and chainlink
cow_execution_prices_pd = cow_execution_prices_pd.groupby(cow_execution_prices_pd.index).mean(numeric_only=True)
chain_execution_prices_pd = chain_execution_prices_pd.groupby(chain_execution_prices_pd.index).mean(numeric_only=True)

In [None]:
univ3_execution_prices_pd

In [None]:
# add univ3 prices to price_diff_df. If there is no price for a given timestamp, use the most recent earlier price.
price_diff_df['univ3'] = univ3_execution_prices_pd['swaps_amountIn_amountOut_ratio'].reindex(price_diff_df.index, method='ffill')
# add univ3 transaction gas fee
price_diff_df['univ3_gas_fee'] = univ3_execution_prices_pd['transaction_gas_fee'].reindex(price_diff_df.index, method='ffill')
# add cow
price_diff_df['cow'] = cow_execution_prices_pd['trades_buy_sell_ratio'].reindex(price_diff_df.index, method='ffill')
# add chain
price_diff_df['chain'] = chain_execution_prices_pd['prices_price'].reindex(price_diff_df.index, method='ffill')
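
A minimal sketch of what `reindex(..., method='ffill')` does, using a made-up two-point price series. Timestamps before the first observation have nothing to carry forward and stay `NaN`.

```python
import pandas as pd

# Hypothetical sparse price observations indexed by Unix timestamp
prices = pd.Series([1500.0, 1510.0], index=[10, 13])

# Reindex onto a dense one-second grid and carry the last known price forward
dense = prices.reindex(range(9, 15), method="ffill")
print(dense)
```

Note that `method='ffill'` requires the original index to be sorted, which holds here because the per-timestamp `groupby` above returns a sorted index.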

In [None]:
# add a column of (cow - univ3) prices
price_diff_df['cow_univ3_diff'] = price_diff_df['cow'] - price_diff_df['univ3']
# add a column of (cow - chain) prices
price_diff_df['cow_chain_diff'] = price_diff_df['cow'] - price_diff_df['chain']

In [None]:
# get the relative differences (stored as fractions, not multiplied by 100)
price_diff_df['cow_univ3_diff_percent'] = price_diff_df['cow_univ3_diff'] / price_diff_df['univ3']
price_diff_df['cow_chain_diff_percent'] = price_diff_df['cow_chain_diff'] / price_diff_df['chain']

In [None]:
# add gas tx adjustment to univ3
price_diff_df['univ3_gas_adj_price'] = price_diff_df['univ3'] - (price_diff_df['univ3'] * price_diff_df['univ3_gas_fee'])

In [None]:
# add gas tx adjustment to chain (calculated using univ3 transaction fee)
price_diff_df['chain_gas_adj_price'] = price_diff_df['chain'] - (price_diff_df['chain'] * price_diff_df['univ3_gas_fee'])

In [None]:
# add a column of (cow - gas-adjusted univ3) prices
price_diff_df['cow_univ3_gas_diff'] = price_diff_df['cow'] - price_diff_df['univ3_gas_adj_price']
# add a column of (cow - gas-adjusted chain) prices
price_diff_df['cow_chain_gas_diff'] = price_diff_df['cow'] - price_diff_df['chain_gas_adj_price']

In [None]:
# get the percentages
price_diff_df['cow_univ3_gas_diff_percent'] = price_diff_df['cow_univ3_gas_diff'] / price_diff_df['univ3_gas_adj_price']
price_diff_df['cow_chain_gas_diff_percent'] = price_diff_df['cow_chain_gas_diff'] / price_diff_df['chain_gas_adj_price']
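
In symbols, the cells above compute the following, where $p_{\text{cow}}$ is the CoW execution price, $p_{\text{ref}}$ is the reference price (Univ3 or Chainlink), and $g$ is the forward-filled `univ3_gas_fee` in ETH. The notation is introduced here only to summarize the code.

$$
d = p_{\text{cow}} - p_{\text{ref}}, \qquad d_{\%} = \frac{d}{p_{\text{ref}}}
$$

$$
p_{\text{ref}}^{\text{adj}} = p_{\text{ref}}\,(1 - g), \qquad d_{\%}^{\text{gas}} = \frac{p_{\text{cow}} - p_{\text{ref}}^{\text{adj}}}{p_{\text{ref}}^{\text{adj}}}
$$

The `*_percent` columns hold these ratios as fractions, so a value of 0.01 corresponds to a 1% difference.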

In [None]:
# filter price_diff_df based on cow_execution_price timestamp
price_diff_df_filtered = price_diff_df[price_diff_df.index.isin(cow_execution_prices_pd.index)]

In [None]:
price_diff_df_filtered.shape

In [None]:
price_diff_df_filtered.head(5)

In [None]:
# plot the cow-vs-univ3 and cow-vs-chainlink relative price differences as overlaid histograms on the same axis
fig, ax = plt.subplots()
# price_diff_df_filtered_usdc_pos['cow'].plot(kind='hist', bins=50, ax=ax, alpha=0.45, label='univ3')
price_diff_df_filtered['cow_univ3_diff_percent'].plot(kind='hist', bins=50, ax=ax, alpha=0.6, label='univ3')
price_diff_df_filtered['cow_chain_diff_percent'].plot(kind='hist', bins=50, ax=ax, alpha=0.45, label='chain')
ax.legend()

plt.title('CoW Price Diff')

In [None]:
# remove outliers from gas diff percent and plot histograms
price_diff_df_filtered_gas = price_diff_df_filtered[price_diff_df_filtered['cow_univ3_gas_diff_percent'] < 0.1]
price_diff_df_filtered_gas = price_diff_df_filtered_gas[price_diff_df_filtered_gas['cow_chain_gas_diff_percent'] < 0.1]

# plot the gas-adjusted cow-vs-univ3 and cow-vs-chainlink relative price differences as overlaid histograms
fig, ax = plt.subplots()
# price_diff_df_filtered_usdc_pos['cow'].plot(kind='hist', bins=50, ax=ax, alpha=0.45, label='univ3')
price_diff_df_filtered_gas['cow_univ3_gas_diff_percent'].plot(kind='hist', bins=50, ax=ax, alpha=0.6, label='univ3')
price_diff_df_filtered_gas['cow_chain_gas_diff_percent'].plot(kind='hist', bins=50, ax=ax, alpha=0.45, label='chain')
ax.legend()

plt.title('CoW Price Diff (Gas Adjusted)')

In [None]:
# same gas-adjusted histograms as above, but without removing outliers
fig, ax = plt.subplots()
# price_diff_df_filtered_usdc_pos['cow'].plot(kind='hist', bins=50, ax=ax, alpha=0.45, label='univ3')
price_diff_df_filtered['cow_univ3_gas_diff_percent'].plot(kind='hist', bins=50, ax=ax, alpha=0.6, label='univ3')
price_diff_df_filtered['cow_chain_gas_diff_percent'].plot(kind='hist', bins=50, ax=ax, alpha=0.45, label='chain')
ax.legend()

plt.title('CoW Price Diff (gas adj)')

In [None]:
# split dataframe into cow trades based on buy or sell
cow_buy_trades = cow_execution_prices_pd[cow_execution_prices_pd['trades_buy_sell_ratio'] > 1]

In [None]:
execution_prices_pd['trades_buyToken_id'].unique()

In [None]:
# get an array of timestamps where trades_buyToken_id = USDC
usdc_trades_filter = execution_prices_pd[execution_prices_pd['trades_buyToken_id'] == 'USDC']
usdc_trades_filter = usdc_trades_filter.index

# get an array of timestamps where trades_buyToken_id = WETH
weth_trades_filter = execution_prices_pd[execution_prices_pd['trades_buyToken_id'] == 'WETH']
weth_trades_filter = weth_trades_filter.index

In [None]:
# convert values in filter to integer
usdc_trades_filter = usdc_trades_filter.astype(int)
weth_trades_filter = weth_trades_filter.astype(int)

In [None]:
# filter price_diff_df_filtered based on the usdc trades filter. Copy so the 'name' column can be added later without a SettingWithCopyWarning.
price_diff_df_filtered_usdc = price_diff_df_filtered[price_diff_df_filtered.index.isin(usdc_trades_filter)].copy()

In [None]:
price_diff_df_filtered_usdc.head(5)

In [None]:
# do the same for weth
price_diff_df_filtered_weth = price_diff_df_filtered[price_diff_df_filtered.index.isin(weth_trades_filter)].copy()

In [None]:
# print f string for filtered df sizes
print(f"price_diff_df_filtered_usdc shape: {price_diff_df_filtered_usdc.shape}")
print(f"price_diff_df_filtered_weth shape: {price_diff_df_filtered_weth.shape}")

In [None]:
# merge duplicate index values on execution_prices_pd to get the names
solver_name_filter = execution_prices_pd['name'].groupby(execution_prices_pd.index).first()

In [None]:
# add a second filter that adds 'names' column to the filtered dfs
price_diff_df_filtered_usdc.loc[:, 'name'] = solver_name_filter

In [None]:
# do the same for weth
price_diff_df_filtered_weth.loc[:, 'name'] = solver_name_filter

In [None]:
# plot histogram grouped by names showing the sum of the price diff
# price_diff_df_filtered_usdc.groupby('name')['cow_univ3_diff_percent'].sum().plot(kind='bar', title='cow_univ3_diff_percent')

# Create a Matplotlib figure and axis object
fig, ax = plt.subplots()

# Split the per-solver sums into positive and negative contributions
pos_counts = price_diff_df_filtered_usdc.groupby('name')['cow_univ3_diff_percent'].apply(lambda x: x[x > 0].sum())
neg_counts = price_diff_df_filtered_usdc.groupby('name')['cow_univ3_diff_percent'].apply(lambda x: x[x < 0].sum())

# Plot the first bar chart on the axis
pos_counts.plot(kind='bar', ax=ax, color='b', alpha=0.5, label='Positive Price Diff Sum')

# Plot the second bar chart on the same axis
neg_counts.plot(kind='bar', ax=ax, color='r', alpha=0.5, label='Negative Price Diff Sum')

# Add a legend to the plot
ax.legend()

# Set the title for the plot
ax.set_title('cow_univ3_diff_percent_sum (USDC)')

# Show the plot
plt.show()

In [None]:
# Create a Matplotlib figure and axis object
fig, ax = plt.subplots()

# Create two sets of data using the groupby method
pos_counts = price_diff_df_filtered_usdc.groupby('name')['cow_univ3_diff_percent'].apply(lambda x: (x > 0).sum())
neg_counts = -price_diff_df_filtered_usdc.groupby('name')['cow_univ3_diff_percent'].apply(lambda x: (x < 0).sum())

# Plot the first bar chart on the axis
pos_counts.plot(kind='bar', ax=ax, color='b', alpha=0.5, label='Positive Price Diff')

# Plot the second bar chart on the same axis
neg_counts.plot(kind='bar', ax=ax, color='r', alpha=0.5, label='Negative Price Diff')

# Add a legend to the plot
ax.legend()

# Set the title for the plot
ax.set_title('cow_univ3 Price Diff Count (USDC)')

# Show the plot
plt.show()
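
The per-solver aggregation behind these two bar charts can be checked on a small table before plotting. The sketch below uses made-up solver names and values; it computes the sum and the count of positive and negative diffs per solver.

```python
import pandas as pd

# Hypothetical price diffs for two solvers.
diffs = pd.DataFrame({
    "name": ["solver_a", "solver_a", "solver_b", "solver_b", "solver_b"],
    "cow_univ3_diff_percent": [0.5, -0.25, 1.0, -0.5, -0.25],
})

grouped = diffs.groupby("name")["cow_univ3_diff_percent"]

summary = pd.DataFrame({
    # Sum of the diffs on each side of zero.
    "pos_sum": grouped.apply(lambda x: x[x > 0].sum()),
    "neg_sum": grouped.apply(lambda x: x[x < 0].sum()),
    # Number of trades on each side of zero.
    "pos_count": grouped.apply(lambda x: (x > 0).sum()),
    "neg_count": grouped.apply(lambda x: (x < 0).sum()),
})
print(summary)
```

Negating the count column before plotting, as the cell above does, draws the negative bars below the axis.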

In [None]:
# get new dataframe with only positive cow_univ3_diff_percent values
price_diff_df_filtered_usdc_pos = price_diff_df_filtered_usdc[price_diff_df_filtered_usdc['cow_univ3_diff_percent'] > 0]

In [None]:
# plot the CoW vs Univ3 and CoW vs Chainlink gas diff percentages as histograms on the same axis
fig, ax = plt.subplots()
# price_diff_df_filtered_usdc_pos['cow'].plot(kind='hist', bins=50, ax=ax, alpha=0.45, label='univ3')
price_diff_df_filtered_usdc['cow_univ3_gas_diff_percent'].plot(kind='hist', bins=50, ax=ax, alpha=0.6, label='cow_univ3_gas_diff_percent')
price_diff_df_filtered_usdc['cow_chain_gas_diff_percent'].plot(kind='hist', bins=50, ax=ax, alpha=0.45, label='cow_chain_gas_diff_percent')
ax.legend()

plt.title('USDC Price Diff')

In [None]:
# plot the CoW vs Univ3 and CoW vs Chainlink gas diff percentages as histograms on the same axis
fig, ax = plt.subplots()
# price_diff_df_filtered_weth_pos['cow'].plot(kind='hist', bins=50, ax=ax, alpha=0.45, label='univ3')
price_diff_df_filtered_weth['cow_univ3_gas_diff_percent'].plot(kind='hist', bins=50, ax=ax, alpha=0.6, label='cow_univ3_gas_diff_percent')
price_diff_df_filtered_weth['cow_chain_gas_diff_percent'].plot(kind='hist', bins=50, ax=ax, alpha=0.45, label='cow_chain_gas_diff_percent')
ax.legend()

plt.title('WETH Price Diff')

In [None]:
# use the index of price_diff_df_filtered_usdc_pos as a filter on the execution_prices timestamps to get trade amounts
sample = execution_prices_pd[execution_prices_pd.index.isin(price_diff_df_filtered_usdc_pos.index)]

In [None]:
# plot trades_buyAmount_converted and swaps_amountOut_converted to get a sense of the distribution. Cut off values of 2e5 or more
sample[sample['trades_buyAmount_converted'] < 2e5]['trades_buyAmount_converted'].plot(kind='hist', bins=50, alpha=0.5, label='trades_buyAmount_converted')
sample[sample['swaps_amountOut_converted'] < 2e5]['swaps_amountOut_converted'].plot(kind='hist', bins=50, alpha=0.5, label='swaps_amountOut_converted')
plt.legend()

plt.title("buyToken=USDC, swaps_amountOut=USDC")

#### Aggregate Price Diff Stats

In [None]:
# sum the percent rows
usdc_univ3_sum = price_diff_df_filtered_usdc['cow_univ3_diff_percent'].sum()
usdc_chain_sum = price_diff_df_filtered_usdc['cow_chain_diff_percent'].sum()

# print the sums
print(f"usdc_univ3_sum: {usdc_univ3_sum}")
print(f"usdc_chain_sum: {usdc_chain_sum}")

In [None]:
# plot histograms of the percent differences
price_diff_df_filtered_usdc['cow_univ3_diff_percent'].hist(bins=100, label='cow_univ3_diff_percent', alpha=.5)
price_diff_df_filtered_usdc['cow_chain_diff_percent'].hist(bins=100, label='cow_chain_diff_percent', alpha=.5)

plt.title('cow price diffs (USDC)')
plt.legend()
plt.show();

In [None]:
# sum the percent rows
weth_univ3_sum = price_diff_df_filtered_weth['cow_univ3_diff_percent'].sum()
weth_chain_sum = price_diff_df_filtered_weth['cow_chain_diff_percent'].sum()

# print the sums
print(f"weth_univ3_sum: {weth_univ3_sum}")
print(f"weth_chain_sum: {weth_chain_sum}")

In [None]:
# plot histograms of the percent differences
price_diff_df_filtered_weth['cow_univ3_diff_percent'].hist(bins=100, label='cow_univ3_diff_percent', alpha=.5)
price_diff_df_filtered_weth['cow_chain_diff_percent'].hist(bins=100, label='cow_chain_diff_percent', alpha=.5)

plt.title('cow price diffs (WETH)')
plt.legend()
plt.show();

In [None]:
# plot the percent columns filtered data in same plot axis, add horizontal line at 0
price_diff_df_filtered_usdc[['cow_univ3_diff_percent', 'cow_chain_diff_percent']].plot(figsize=(13,8), title='Cow vs Chainlink vs Uniswap V3 (buyToken=USDC)')
plt.axhline(y=0, color='r', linestyle='-')
price_diff_df_filtered_weth[['cow_univ3_diff_percent', 'cow_chain_diff_percent']].plot(figsize=(13,8), title='Cow vs Chainlink vs Uniswap V3 (buyToken=WETH)')
plt.axhline(y=0, color='r', linestyle='-')

In [None]:
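# deliberate halt: stops "Run All" here so the exploratory cells below only run on demand
raise RuntimeError("STOP")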

#### Plot Prices side by side (without ffill)

In [None]:
# cut the univ3_execution_prices_pd to begin when CoW Data begins
univ3_execution_prices_pd = univ3_execution_prices_pd[univ3_execution_prices_pd.index >= cow_execution_prices_pd.index[0]]

In [None]:
# get the minimum index value for the Univ3 data as a sanity check. This is the starting block.
univ3_execution_prices_pd[univ3_execution_prices_pd.index > cow_execution_prices_pd.index.min()]['swaps_amountIn_amountOut_ratio'].index.min()

In [None]:
# merge cow and univ3
cow_univ3_execution_prices_pd = cow_execution_prices_pd.merge(univ3_execution_prices_pd, left_index=True, right_index=True, how='inner')
cow_univ3_chain_execution_prices_pd = cow_univ3_execution_prices_pd.merge(chain_execution_prices_pd, left_index=True, right_index=True, how='inner')
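
The plotting cells further down read columns such as `trades_buy_sell_ratio_x` and `swaps_amountIn_amountOut_ratio_y`. Those suffixes come from pandas: when both frames in a merge share a column name, the left copy gets `_x` and the right copy gets `_y`. A small sketch with hypothetical prices and timestamps illustrates this, along with the effect of `how='inner'` on the index.

```python
import pandas as pd

# Hypothetical frames that share the column name "price" and are indexed by timestamp.
cow = pd.DataFrame({"price": [1600.0, 1610.0, 1620.0]}, index=[10, 20, 30])
univ3 = pd.DataFrame({"price": [1601.0, 1619.0]}, index=[10, 30])

# Inner join on the index keeps only timestamps present in both frames.
merged = cow.merge(univ3, left_index=True, right_index=True, how="inner")

print(merged.columns.tolist())  # overlapping names receive _x (left) and _y (right)
print(merged.index.tolist())    # timestamp 20 is dropped because univ3 has no row for it
```

Passing `suffixes=('_cow', '_univ3')` to `merge` would give more descriptive column names than the defaults.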

In [None]:
cow_color = '#ff7f0e'
univ3_color = '#1f77b4'
chain_color = '#2ca02c'

In [None]:
plt.figure(figsize=(13,7))

# emphasize CoW price points
plt.scatter(cow_execution_prices_pd.index, cow_execution_prices_pd['trades_buy_sell_ratio'], color='blue', label='COW Data Points', marker="v", s=10)

# plot step prices of univ3 and chain (CoW is shown as scatter points above)
plt.step(univ3_execution_prices_pd.index, univ3_execution_prices_pd['swaps_amountIn_amountOut_ratio'], label='Univ3 ETH Price', color=univ3_color, linestyle='dashed', alpha=.5)
plt.step(chain_execution_prices_pd.index, chain_execution_prices_pd['prices_price'], label='Chainlink ETH Price', color=chain_color, linestyle='dashed', linewidth=2, alpha=.5)


plt.title('Cow vs Univ3 vs Chain ETH Price Comparisons')
plt.xlabel('Timestamp')
plt.ylabel('ETH Price')
plt.legend(loc='best')
plt.show();

# print the shapes as f strings
print(f"cow_execution_prices_pd shape: {cow_execution_prices_pd.shape}")
print(f"univ3_execution_prices_pd shape: {univ3_execution_prices_pd.shape}")
print(f"chain_execution_prices_pd shape: {chain_execution_prices_pd.shape}")

In [None]:
plt.figure(figsize=(13,7))

# emphasize CoW price points
plt.scatter(cow_univ3_execution_prices_pd.index, cow_univ3_execution_prices_pd['trades_buy_sell_ratio_x'], color='blue', label='COW Data Points', marker="v", s=10)

# plot step prices of univ3 and chain (CoW is shown as scatter points above)
plt.step(cow_univ3_execution_prices_pd.index, cow_univ3_execution_prices_pd['swaps_amountIn_amountOut_ratio_y'], label='Univ3 ETH Price', color=univ3_color, linestyle='dashed', alpha=.5)
plt.step(cow_univ3_chain_execution_prices_pd.index, cow_univ3_chain_execution_prices_pd['prices_price_x'], label='Chainlink ETH Price', color=chain_color, linestyle='dashed', linewidth=2, alpha=.5)


plt.title('Cow vs Univ3 vs Chain ETH Price Comparisons')
plt.xlabel('Timestamp')
plt.ylabel('ETH Price')
plt.legend(loc='best')
plt.show();


# print the shapes as f strings
print(f"cow_univ3_execution_prices_pd shape: {cow_univ3_execution_prices_pd.shape}")
print(f"cow_univ3_chain_execution_prices_pd shape: {cow_univ3_chain_execution_prices_pd.shape}")

In [None]:
plt.figure(figsize=(13,7))

# emphasize CoW price points
plt.scatter(cow_execution_prices_pd.index, cow_execution_prices_pd['trades_buy_sell_ratio'], color='blue', label='COW Data Points', marker="v", s=10)
plt.step(cow_execution_prices_pd.index, cow_execution_prices_pd['trades_buy_sell_ratio'], color=cow_color, label='COW ETH Price', linewidth=2)
plt.step(univ3_execution_prices_pd.index, univ3_execution_prices_pd['swaps_amountIn_amountOut_ratio'], label='Univ3 ETH Price', color=univ3_color, linestyle='dashed', alpha=.5)
# plt.step(chain_execution_prices_pd.index, chain_execution_prices_pd['prices_price'], label='Chainlink ETH Price', color='green', linestyle='dashed', linewidth=2, alpha=.5)


plt.title('Cow vs Univ3 ETH Price Comparisons')
plt.xlabel('Timestamp')
plt.ylabel('ETH Price')
plt.legend(loc='best')
plt.show();

In [None]:
plt.figure(figsize=(13,7))

# emphasize CoW price points
plt.scatter(cow_execution_prices_pd.index, cow_execution_prices_pd['trades_buy_sell_ratio'], color='blue', label='COW Data Points', marker="v", s=10)
plt.step(cow_execution_prices_pd.index, cow_execution_prices_pd['trades_buy_sell_ratio'], color=cow_color, label='COW ETH Price', linewidth=2)
plt.step(chain_execution_prices_pd.index, chain_execution_prices_pd['prices_price'], label='Chainlink ETH Price', color=chain_color, linestyle='dashed', linewidth=2, alpha=.9)


plt.title('Cow vs Chain ETH Price Comparisons')
plt.xlabel('Timestamp')
plt.ylabel('ETH Price')
plt.legend(loc='best')
plt.show();

In [None]:
plt.figure(figsize=(13,5))
# plot the merged CoW and Univ3 prices as step plots, using the colors #003664 and #a34d4d
plt.step(cow_univ3_execution_prices_pd.index, cow_univ3_execution_prices_pd['trades_buy_sell_ratio_x'], label='CoW ETH Price', color='#003664', linewidth=2, alpha=0.4)
plt.step(cow_univ3_execution_prices_pd.index, cow_univ3_execution_prices_pd['swaps_amountIn_amountOut_ratio_y'], label='Univ3 ETH Price', color='#a34d4d', linestyle='dashed')

plt.scatter(cow_univ3_execution_prices_pd.index, cow_univ3_execution_prices_pd['trades_buy_sell_ratio_x'], color='blue', label='COW Data Points', marker="v", s=10)
# plt.scatter(cow_univ3_execution_prices_pd.index, cow_univ3_execution_prices_pd['swaps_amountIn_amountOut_ratio_y'], color='blue', marker="^", s=5)


plt.title('Cow ETH vs Uniswap V3 ETH')
plt.legend()
plt.show();

In [None]:
# plot histogram distributions (raw counts, not normalized) of the CoW, Univ3, and Chainlink prices
plt.figure(figsize=(13,5))
plt.hist(cow_execution_prices_pd['trades_buy_sell_ratio'], bins=100, label='CoW ETH Price', color=cow_color, alpha=0.7)
plt.hist(univ3_execution_prices_pd['swaps_amountIn_amountOut_ratio'], bins=100, label='Univ3 ETH Price', color=univ3_color, alpha=0.6)
plt.hist(chain_execution_prices_pd['prices_price'], bins=100, label='Chainlink ETH Price', color=chain_color, alpha=0.6)

plt.title('Cow vs Univ3 vs Chain ETH Price')
plt.xlabel('ETH Price')
plt.ylabel('Count')
plt.legend()
plt.show();

In [None]:
# count how many points are plotted
cow_univ3_execution_prices_pd['trades_buy_sell_ratio_x'].count()

### OLD

In [None]:
# get the rows that have 0 null values. This leaves us with the cow/univ3 trades that were executed on the same timestamp.
cow_uni_trunc_no_nulls_pl = cow_uni_trunc_pl.drop_nulls()

In [None]:
# check shape.
cow_uni_trunc_no_nulls_pl.shape

In [None]:
# drop duplicate rows from the polars dataframe. Note: it is unclear why duplicates are inserted into the dataframe during the transformation
cow_uni_trunc_no_nulls_pl = cow_uni_trunc_no_nulls_pl.unique()

In [None]:
cow_uni_trunc_no_nulls_pl.shape

In [None]:
# count how many values in the trades_feeAmount polars series are 0
cow_uni_trunc_no_nulls_pl['trades_feeAmount'].eq(0).sum()

In [None]:
cow_uni_trunc_no_nulls_pl.head(15)

#### Add Decimal Values to data

In [None]:
# add decimals to cow trades sell tokens
cow_uni_trunc_no_nulls_pl = cow_uni_trunc_no_nulls_pl.with_columns(
    [
        pl.col('trades_sellToken_id'),
        (
            pl.when(pl.col('trades_sellToken_id') == 'WETH')
            .then(18)
            .otherwise(6)
            .cast(pl.UInt8)
        ).alias('trades_sellToken_decimals'),
    ]
)

# add decimals to cow trades buy tokens
cow_uni_trunc_no_nulls_pl = cow_uni_trunc_no_nulls_pl.with_columns(
    [
        pl.col('trades_buyToken_id'),
        (
            pl.when(pl.col('trades_buyToken_id') == 'WETH')
            .then(18)
            .otherwise(6)
            .cast(pl.UInt8)
        ).alias('trades_buyToken_decimals'),
    ]
)

In [None]:
# add decimals to univ3 swaps input tokens
cow_uni_trunc_no_nulls_pl = cow_uni_trunc_no_nulls_pl.with_columns(
    [
        pl.col('swaps_tokenIn_id'),
        (
            pl.when(pl.col('swaps_tokenIn_id') == 'WETH')
            .then(18)
            .otherwise(6)
            .cast(pl.UInt8)
        ).alias('swaps_tokenIn_decimals'),
    ]
)

# add decimals to univ3 swaps output tokens
cow_uni_trunc_no_nulls_pl = cow_uni_trunc_no_nulls_pl.with_columns(
    [
        pl.col('swaps_tokenOut_id'),
        (
            pl.when(pl.col('swaps_tokenOut_id') == 'WETH')
            .then(18)
            .otherwise(6)
            .cast(pl.UInt8)
        ).alias('swaps_tokenOut_decimals'),
    ]
)

In [None]:
# note that polars can perform these calculations in-column. This means it can convert the values in place without creating a new column. The new column created here is more verbose, but is a good sanity check to see before/after results.
trades_swaps_converted_pl = cow_uni_trunc_no_nulls_pl.with_columns([
    (pl.col("trades_buyAmount") / (10**pl.col("trades_buyToken_decimals"))).alias('trades_buyAmount_converted'),
    (pl.col("trades_sellAmount") / (10**pl.col("trades_sellToken_decimals"))).alias('trades_sellAmount_converted'),
    (pl.col("swaps_amountIn") / (10**pl.col("swaps_tokenIn_decimals"))).alias('swaps_amountIn_converted'),
    (pl.col("swaps_amountOut") / (10**pl.col("swaps_tokenOut_decimals"))).alias('swaps_amountOut_converted'),
])
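
The conversion above divides each raw on-chain integer amount by 10 raised to the token's decimals (18 for WETH, 6 for USDC, matching the `when/then/otherwise` cells). A plain-Python sketch of the same arithmetic follows; the helper name `to_token_units` and the sample amounts are illustrative only.

```python
# Decimals used by the notebook: WETH has 18, everything else (USDC) has 6.
TOKEN_DECIMALS = {"WETH": 18, "USDC": 6}


def to_token_units(raw_amount: int, token: str) -> float:
    """Scale a raw integer amount into human-readable token units."""
    return raw_amount / 10 ** TOKEN_DECIMALS[token]


# Hypothetical raw amounts as they would appear in a subgraph response.
raw_weth = 1_500_000_000_000_000_000  # 1.5 WETH expressed in wei
raw_usdc = 2_400_000_000              # 2400 USDC expressed in 6-decimal units

weth_amount = to_token_units(raw_weth, "WETH")
usdc_amount = to_token_units(raw_usdc, "USDC")
print(weth_amount, usdc_amount)
```

Keeping the converted values in new `*_converted` columns, as the notebook does, makes it easy to compare them against the raw amounts.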

In [None]:
# check that the decimal columns and conversion columns were added
trades_swaps_converted_pl.columns

In [None]:
# keep only the columns needed for the price analysis
trades_swaps_converted_trunc_pl = trades_swaps_converted_pl.select([
    'trades_timestamp',
    'swaps_blockNumber',
    'trades_sellToken_id',
    'trades_buyToken_id',
    'trades_sellAmount_converted',
    'trades_buyAmount_converted',
    'name',
    'environment',
    'swaps_pool_id',
    'swaps_tokenIn_id',
    'swaps_tokenOut_id',
    'swaps_amountIn_converted',
    'swaps_amountOut_converted',
])

### Add Execution Price Columns "x/y" and "y/x"

We calculate execution prices for CoW trades and Univ3 swaps in both directions ("x/y" and "y/x"). Then we take the timestamp with the largest number of executions and compare the execution prices at that timestamp.

In [None]:
trades_swaps_converted_trunc_pl = trades_swaps_converted_trunc_pl.with_columns([
    (pl.col("trades_buyAmount_converted") / pl.col("trades_sellAmount_converted")).alias('trades_buy_sell_ratio'),
    (pl.col("trades_sellAmount_converted") / pl.col("trades_buyAmount_converted")).alias('trades_sell_buy_ratio'),
    (pl.col("swaps_amountIn_converted") / pl.col("swaps_amountOut_converted")).alias('swaps_amountIn_amountOut_ratio'),
    (pl.col("swaps_amountOut_converted") / pl.col("swaps_amountIn_converted")).alias('swaps_amountOut_amountIn_ratio'),
])

In [None]:
# preview the first rows to check the new execution price columns
trades_swaps_converted_trunc_pl.head(5)


In [None]:
# keep only the timestamp, block number, and execution price ratio columns
execution_prices_pl = trades_swaps_converted_trunc_pl.select([
    'trades_timestamp',
    'swaps_blockNumber',
    'trades_buy_sell_ratio',
    'trades_sell_buy_ratio',
    'swaps_amountIn_amountOut_ratio',
    'swaps_amountOut_amountIn_ratio',
])

In [None]:
# normalize every ratio so it reads as an ETH price: keep values above 1 and invert the rest.
# Note that after this step each "x/y" column equals its "y/x" counterpart.
ratio_cols = [
    'trades_buy_sell_ratio',
    'trades_sell_buy_ratio',
    'swaps_amountIn_amountOut_ratio',
    'swaps_amountOut_amountIn_ratio',
]
execution_prices_pl = execution_prices_pl.with_columns([
    pl.when(pl.col(c) > 1).then(pl.col(c)).otherwise(1 / pl.col(c)).alias(c)
    for c in ratio_cols
])
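
To see why the ratios are inverted, consider a single WETH/USDC trade. Depending on which token is bought, buy/sell is either the ETH price in USDC or its reciprocal; taking whichever of the two exceeds 1 gives a comparable ETH price in both cases. The sketch below uses plain Python and hypothetical amounts; `as_eth_price` is an illustrative helper, not part of the notebook.

```python
import math


def as_eth_price(ratio: float) -> float:
    """Return the ratio if it is above 1, otherwise its reciprocal."""
    return ratio if ratio > 1 else 1 / ratio


# Hypothetical trade A: buy 3200 USDC, sell 2 WETH.
buy_sell_a = 3200 / 2    # USDC per WETH, already an ETH price
# Hypothetical trade B: buy 2 WETH, sell 3200 USDC.
buy_sell_b = 2 / 3200    # WETH per USDC, the reciprocal of the ETH price

price_a = as_eth_price(buy_sell_a)
price_b = as_eth_price(buy_sell_b)

# Both trades now report approximately the same ETH price.
print(price_a, price_b, math.isclose(price_a, price_b))
```

This heuristic assumes the ETH price in USDC is always well above 1, which holds for the WETH/USDC pair but would not generalize to pairs priced near parity.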

In [None]:
# convert execution_prices_pl to pandas dataframe
execution_prices_pd = execution_prices_pl.to_pandas()

In [None]:
execution_prices_pd.head(5)

In [None]:
# set trades_timestamp as index and order by trades_timestamp
execution_prices_pd = execution_prices_pd.set_index('trades_timestamp').sort_index()

In [None]:
import matplotlib.pyplot as plt

In [None]:
# plot trades_buy_sell_ratio and swaps_amountIn_amountOut_ratio and change legend labels
execution_prices_pd[['trades_buy_sell_ratio', 'swaps_amountIn_amountOut_ratio']].plot()

# change legend labels
plt.legend(['Cow ETH price', 'Univ3 ETH price'])

#### Largest Block Activity Analysis

In [None]:
# find the timestamp with the most activity (largest number of rows)
trades_swaps_converted_trunc_pl['trades_timestamp'].value_counts().sort('counts', reverse=True)

In [None]:
# get timestamp value with highest count
biggest_timestamp_count = trades_swaps_converted_trunc_pl['trades_timestamp'].value_counts().sort('counts', reverse=True).head(1)

In [None]:
biggest_timestamp_count

In [None]:
biggest_timestamp_count_int = biggest_timestamp_count['trades_timestamp'].head(1).to_numpy()[0]

In [None]:
# filter the polars dataframe to the busiest timestamp (1676823287 in the original run)
filter_trades_swaps_pl = trades_swaps_converted_trunc_pl.filter(pl.col('trades_timestamp') == biggest_timestamp_count_int)
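
The same "busiest timestamp" selection can be sketched in pandas, which may be handy once the data has been converted with `to_pandas()`. The values below are made up; only the column name `trades_timestamp` is taken from the notebook.

```python
import pandas as pd

# Hypothetical merged trades/swaps rows.
trades = pd.DataFrame({
    "trades_timestamp": [100, 200, 200, 300, 200],
    "trades_buy_sell_ratio": [1600.0, 1605.0, 1606.0, 1610.0, 1604.0],
})

# Count rows per timestamp and pick the timestamp with the highest count.
counts = trades["trades_timestamp"].value_counts()
busiest_timestamp = counts.idxmax()

# Keep only the rows executed at that timestamp.
busiest_rows = trades[trades["trades_timestamp"] == busiest_timestamp]

print(busiest_timestamp, len(busiest_rows))
print(busiest_rows["trades_buy_sell_ratio"].mean())
```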

In [None]:
filter_trades_swaps_pl

In [None]:
print(f'At timestamp {biggest_timestamp_count_int}, there were {filter_trades_swaps_pl.shape[0]} trades and swaps.')

print(f'\
Cow mean execution prices were {filter_trades_swaps_pl["trades_buy_sell_ratio"].mean()} \
and {filter_trades_swaps_pl["trades_sell_buy_ratio"].mean()} \
completed by solver {filter_trades_swaps_pl["name"].head(1).to_numpy()[0]}')

print(f'\
The Univ3 pool {filter_trades_swaps_pl["swaps_pool_id"].head(1).to_numpy()[0]} mean execution price during this time was {filter_trades_swaps_pl["swaps_amountOut_amountIn_ratio"].mean()} \
and {filter_trades_swaps_pl["swaps_amountIn_amountOut_ratio"].mean()}')


### Save Dataset to CSV

In [None]:
# save trades_swaps_converted_trunc_pl to csv (the data/ directory must already exist)
trades_swaps_converted_trunc_pl.write_csv('data/trades_swaps_converted_trunc_pl.csv')