# py0xcluster

py0xcluster is a package for exploratory data analysis and machine learning on DEX (decentralized exchange) activity and web3 data.

For now, this is an educational project of mine, aimed at fun data-science experiments on blockchain data gathered through The Graph Network (https://thegraph.com/).

## Target objectives

- Establish meaningful groupings of addresses by clustering DEX traders and liquidity providers (LPs)
    - Feature Extraction:
        - TBD, but based on mint/swap/burn data from Messari subgraph entities
        - with or without balances at swap time (balance fetched at the swap's block with web3.py)
        - EOA vs Contracts
    - Dimensionality reduction:
        - UMAP / tSNE or PCA / ICA
    - Clustering:
        - DBSCAN
        - silhouette evaluation
    - Visualization:
        - scatter plot with color-coded returns? (TBD)
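
The steps listed above (feature scaling, dimensionality reduction, DBSCAN, silhouette evaluation) could be chained roughly as follows. This is only a sketch on synthetic data, assuming scikit-learn is available; the feature matrix, `eps`, and `min_samples` are placeholders, since the real features are still TBD.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a per-address feature matrix (rows = addresses).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.3, size=(100, 6))
group_b = rng.normal(loc=5.0, scale=0.3, size=(100, 6))
features = np.vstack([group_a, group_b])

# 1. Standardize so that no single feature dominates the distances.
scaled = StandardScaler().fit_transform(features)

# 2. Reduce to two dimensions (PCA here; UMAP or t-SNE are alternatives).
embedding = PCA(n_components=2).fit_transform(scaled)

# 3. Density-based clustering; eps and min_samples are illustrative values.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(embedding)

# 4. Silhouette score on non-noise points only (DBSCAN labels noise as -1).
mask = labels != -1
n_clusters = len(set(labels[mask]))
score = (
    silhouette_score(embedding[mask], labels[mask])
    if n_clusters > 1
    else float("nan")
)
print(n_clusters, score)
```

Excluding noise points before scoring is a design choice: the silhouette score would otherwise treat the `-1` label as one more cluster.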

## Secondary objective: identify which group has the most profitable activity

- Swap-triggered average of price (swap in / swap out), per group of addresses

- Predict future returns based on the activity of previously clustered groups of addresses
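
One possible reading of the "triggered average" is an event-triggered average: cut a fixed window of prices around each swap, express it relative to the price at the swap, and average the windows. The sketch below assumes prices sampled on a regular grid and uses made-up numbers; `triggered_average` and its parameters are hypothetical names, not part of the package.

```python
import numpy as np

def triggered_average(prices, event_idx, before=2, after=3):
    """Average relative price change in a window around each event.

    prices    : 1-D sequence of prices sampled on a regular grid
    event_idx : integer positions (into prices) of the triggering swaps
    Returns an array of length before + after + 1; entry `before` is the
    event itself and is 0 by construction.
    """
    prices = np.asarray(prices, dtype=float)
    windows = []
    for i in event_idx:
        # Skip events whose window would run past either end of the series.
        if i - before < 0 or i + after >= len(prices):
            continue
        window = prices[i - before : i + after + 1]
        windows.append(window / prices[i] - 1.0)
    if not windows:
        return np.full(before + after + 1, np.nan)
    return np.mean(windows, axis=0)

prices = [100, 101, 102, 104, 108, 110, 111, 115, 120, 118]
avg = triggered_average(prices, event_idx=[2, 6], before=1, after=2)
print(avg)
```

Running this once per cluster of addresses (with that cluster's swaps as `event_idx`) would give one average curve per group to compare.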

## Overall ML approach:

- Decide between a time-series and a tabular approach (preference for the former)
- Compute time-series based on extracted features and certain kernels / windowing
- Begin with a classification of expected future returns (down-bad / neutral / up-strong)
- Extend to regression
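
As a minimal sketch of the classification target, future returns can be bucketed into the three classes with two thresholds. The ±2% thresholds and the one-step horizon are arbitrary placeholders, and both helper functions are hypothetical.

```python
def future_returns(prices, horizon):
    """Relative price change `horizon` steps ahead, where it is defined."""
    return [
        prices[i + horizon] / prices[i] - 1.0
        for i in range(len(prices) - horizon)
    ]

def label_return(future_return, down=-0.02, up=0.02):
    """Map a future return to one of the three classes.

    The +/-2% thresholds are placeholders, not values from the project.
    """
    if future_return <= down:
        return "down-bad"
    if future_return >= up:
        return "up-strong"
    return "neutral"

prices = [100.0, 103.0, 101.0, 96.0, 97.0]
labels = [label_return(r) for r in future_returns(prices, horizon=1)]
print(labels)
```

Extending to regression would then amount to predicting the output of `future_returns` directly instead of its label.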

## Random list of potential features:

### Accounts

### Relative to a pool:
- z-scored (clarify how) difference of price 24h? after swap -> could be the target (dependent) variable

- nb of events (z-scored to other addresses on same pool)
- average swap size (z-scored/pool)
- average deposit size (z-scored/pool)
- average withdraw size (z-scored/pool)

### Account only

- Total nb of positions: swapCount, depositCount, withdrawCount
- ratio? of nb of: swaps / (deposits + withdraws)

### Account - Position
- nb of (liquid) pools interacted with
- % of events (likely swaps) happening in the same block (possibly identical to MEV bots?)
- % of limit orders on uni-v3 (one deposit amount = 0)
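
The "same block" feature could be computed along these lines with pandas. The column names `from` and `blockNumber` follow the swaps tables shown later in this document; the addresses and block numbers are toy values.

```python
import pandas as pd

# Toy events table with placeholder addresses.
events = pd.DataFrame({
    "from": ["0xaaa", "0xaaa", "0xaaa", "0xbbb", "0xbbb"],
    "blockNumber": [100, 100, 101, 200, 201],
})

# Number of events each address has in each block.
per_block = events.groupby(["from", "blockNumber"]).size()

# An event counts as "same block" when its address has more than one
# event in that block.
same_block_events = per_block[per_block > 1].groupby(level="from").sum()
total_events = events.groupby("from").size()

# Addresses with no same-block events are filled with 0 before dividing.
pct_same_block = (
    same_block_events.reindex(total_events.index, fill_value=0)
    / total_events
    * 100
)
print(pct_same_block)
```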

### Account - Web3

- is contract?
- Normalized balance (compared to other users) at time of events
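
A common way to answer "is contract?" is to check whether the address has deployed bytecode. The sketch below assumes a web3.py-like object exposing `w3.eth.get_code`; the `_FakeWeb3` stub is purely hypothetical and only exists so the example runs offline. With a real node, `w3` would be a connected `Web3` instance, and the balance at the time of an event could be fetched similarly with `w3.eth.get_balance(address, block_identifier=...)` (historical blocks generally require an archive node).

```python
def is_contract(w3, address, block_identifier="latest"):
    """Return True if `address` has deployed bytecode at the given block.

    `w3` is expected to behave like a connected web3.py `Web3` instance;
    an externally owned account (EOA) returns empty bytecode.
    """
    code = w3.eth.get_code(address, block_identifier=block_identifier)
    return len(code) > 0


# Offline stand-ins so the sketch runs without a node; not part of web3.py.
class _FakeEth:
    def __init__(self, code_by_address):
        self._code = code_by_address

    def get_code(self, address, block_identifier="latest"):
        return self._code.get(address, b"")


class _FakeWeb3:
    def __init__(self, code_by_address):
        self.eth = _FakeEth(code_by_address)


fake_w3 = _FakeWeb3({"0xContract": b"\x60\x80"})
results = (is_contract(fake_w3, "0xContract"), is_contract(fake_w3, "0xEOA"))
print(results)
```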

# Roadmap:

## Easy / To implement first

### Aggregation / Feature computation

- Aggregate unique addresses
- Implement Account-only query
- First Web3 requests (is_contract / ETH balance)

### First plots

- First features distribution
- PCA/ICA -> t-SNE

## Next, not immediate priority

- Pool clustering / identify easy-best features

### Data Management

- Store/Retrieve to/from SQLite?
- Consider parquet / feather / hdf5
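
For the SQLite option, pandas can write to and read from a standard-library `sqlite3` connection directly. This is a sketch with toy rows; the table name and the `pool_id` column are assumptions (the dotted `pool.id` column name is renamed here because dots are awkward in SQL identifiers).

```python
import sqlite3

import pandas as pd

swaps = pd.DataFrame({
    "pool_id": ["0xpool1", "0xpool1", "0xpool2"],
    "amountInUSD": [4552.9, 640.8, 48855.2],
    "blockNumber": [16394241, 16395485, 16405290],
})

# In-memory database for the demo; pass a file path to persist on disk.
conn = sqlite3.connect(":memory:")
swaps.to_sql("swaps", conn, if_exists="replace", index=False)

# Read back only what is needed, letting SQLite do the filtering.
pool1 = pd.read_sql_query(
    "SELECT * FROM swaps WHERE pool_id = ?", conn, params=("0xpool1",)
)
conn.close()
print(pool1)
```

Columnar files (`DataFrame.to_parquet`, which needs an engine such as pyarrow) are an alternative when whole tables are loaded at once rather than queried row by row.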

## Secondary, nice to do

### Package

- Update and test requirements / setup
- Document classes and methods with well-formatted docstrings, so that the documentation can later be built from them

### Performance

- Evaluate performance, profiling, and try improving inefficient / slow bits

### Imports

In [30]:
%load_ext autoreload
%autoreload 2

import pandas as pd
from py0xcluster.utils.query_utils import *
from py0xcluster.main_classes.pools import *
from py0xcluster.main_classes.pool_events import *

from py0xcluster.features.swaps_feature_extractor import *

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


ModuleNotFoundError: No module named 'py0xcluster.features.swaps_feature_extractor'

### Gathering data about most-active pools
- Needs to be adapted to the refactoring of the query-related methods, which handle multiple entities at once

In [4]:
uni3pools_selector = PoolSelector(
    subgraph_url = 'https://api.thegraph.com/subgraphs/name/messari/uniswap-v3-ethereum',
    min_daily_volume_USD = 500000,
    min_TVL = 1000000, # Not implemented. consider removing
    start_date = (2022,12,31), 
    end_date = (2023,1,16),
    days_batch_size = 20)

uni3_pools = uni3pools_selector.create_pool_selection(stables='exclude', verbose=True)
uni3_pools.pools_df.head(5)

Queriying from 2022-12-31 00:00:00 to 2023-01-16 00:00:00
1484 lquidity pools snapshots retrieved
262 stable pools snapshots (over 1484) have been removed
210 illiquid pools snapshots (over 1222) have been removed 
60 pools were selected


Unnamed: 0,pool.name,dailyVolumeUSD,pool.totalValueLockedUSD,token0.lastPriceUSD,token1.lastPriceUSD,pool.protocol.name,pool.protocol.network,pool.id,token0.symbol,token0.decimals,token1.symbol,token1.decimals
0,Uniswap V3 USD Coin/Wrapped Ether 0.05%,204246200.0,179588200.0,1.0,1564.361029,Uniswap V3,MAINNET,0x88e6a0c2ddd26feeb64f039a2c41296fcb3f5640,USDC,6,WETH,18
1,Uniswap V3 USD Coin/Wrapped Ether 0.01%,40339360.0,18166340.0,1.0,1564.361029,Uniswap V3,MAINNET,0xe0554a476a092703abdb3ef35c80e0d76d32939f,USDC,6,WETH,18
2,Uniswap V3 Wrapped Ether/Tether USD 0.05%,30585760.0,19264760.0,1564.361029,1.0,Uniswap V3,MAINNET,0x11b815efb8f581194ae79006d24e0d814b7697f6,WETH,18,USDT,6
3,Uniswap V3 Wrapped BTC/Wrapped Ether 0.05%,15145850.0,128346100.0,21089.575069,1564.361029,Uniswap V3,MAINNET,0x4585fe77225b41b697c938b018e2ac67ac5a20c0,WBTC,8,WETH,18
4,Uniswap V3 Lido DAO Token/Wrapped Ether 0.3%,12437790.0,11871040.0,2.192731,1564.361029,Uniswap V3,MAINNET,0xa3f558aebaecaf0e11ca4b2199cc5ed341edfd74,LDO,18,WETH,18


## Extracting all events from these pools

### Steps

- Data query:
    - Perform the query for each pool, batched by days to accommodate the response limit
    - Alternatively, the query can be done on multiple pools at once. Ideally, the batch size should be proportional to the volume, but this is rather arbitrary to implement.
    - Consider whether to query swaps / mints / burns separately or jointly
    - Loop and aggregate over days / pools (or batch of pools)
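
The day-batching step can be sketched with the standard library alone. `date_batches` is a hypothetical helper, not the package's actual implementation; it takes `(year, month, day)` tuples and a `days_batch_size`, mirroring the arguments used in this document.

```python
from datetime import datetime, timedelta

def date_batches(start_date, end_date, days_batch_size):
    """Yield (batch_start, batch_end) datetimes covering [start, end).

    start_date / end_date are (year, month, day) tuples.
    The last batch is truncated so it never runs past end_date.
    """
    start = datetime(*start_date)
    end = datetime(*end_date)
    step = timedelta(days=days_batch_size)
    while start < end:
        batch_end = min(start + step, end)
        yield start, batch_end
        start = batch_end

batches = list(date_batches((2023, 1, 13), (2023, 1, 15), days_batch_size=1))
for batch_start, batch_end in batches:
    # Each window would be turned into one subgraph query per pool.
    print(f"Querying from {batch_start} to {batch_end}")
```

Half-open windows (start included, end excluded) avoid fetching an event twice when it falls exactly on a batch boundary.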


In [5]:
uni3_events_getter = PoolEventGetter(
    subgraph_url = 'https://api.thegraph.com/subgraphs/name/messari/uniswap-v3-ethereum',
    pool_ids = uni3_pools.pools_df['pool.id'][13:15],
    start_date = (2023,1,13), 
    end_date = (2023,1,15),
    days_batch_size = 1
    )

# Get swaps, deposits, and withdrawals from the selected pools.
uni3_events = uni3_events_getter.get_events(verbose=True)


pool: 0x69d91b94f0aaf8e8a2586909fa77a5c2c89818d5
Queriying from 2023-01-13 00:00:00 to 2023-01-14 00:00:00
Queriying from 2023-01-14 00:00:00 to 2023-01-15 00:00:00
pool: 0x1d42064fc4beb5f8aaf85f4617ae8b3b5b8bd801
Queriying from 2023-01-13 00:00:00 to 2023-01-14 00:00:00
Queriying from 2023-01-14 00:00:00 to 2023-01-15 00:00:00


In [6]:
uni3_events.swaps


Unnamed: 0,amountInUSD,amountOutUSD,amountIn,amountOut,from,to,timestamp,blockNumber,id,pool.id
0,4552.918006,4537.259165,1.914007e+13,4.537259e+09,0xa85f21a7e7170d7e90e9bc6253be703df07dc9cc,0xa85f21a7e7170d7e90e9bc6253be703df07dc9cc,2023-01-13 00:17:35,16394241,0x001d8052376e07c9d78662b82dd26c9ace7eccbc567a...,0x69d91b94f0aaf8e8a2586909fa77a5c2c89818d5
1,640.857758,638.891597,6.408578e+08,2.626708e+12,0x2fd7d7a58e8cd0d85b76f6b1351b28445c94a44d,0x74de5d4fcbf63e00296fd95d33236b9794016631,2023-01-13 04:27:47,16395485,0x006bb0e4371712bd56d385af9274135450dc19f308dd...,0x69d91b94f0aaf8e8a2586909fa77a5c2c89818d5
2,22598.392328,22481.352329,9.602006e+13,2.248135e+10,0xfea677f3b408700ae4ba6355c4885851867a29e3,0xfea677f3b408700ae4ba6355c4885851867a29e3,2023-01-13 16:42:47,16399136,0x019f2a4566943ffc2630c49f72d8a756ff327dd14a43...,0x69d91b94f0aaf8e8a2586909fa77a5c2c89818d5
3,855.180709,852.544545,8.551807e+08,3.587183e+12,0x6da27eaf92f8e7c4211ae5e664d15a896bd63701,0x6da27eaf92f8e7c4211ae5e664d15a896bd63701,2023-01-13 00:18:11,16394244,0x0351c44fb37671c67a770bdbc88797847bbb5747126a...,0x69d91b94f0aaf8e8a2586909fa77a5c2c89818d5
4,15344.603746,15276.057505,6.078423e+13,1.527606e+10,0xb4070209d9a0ad61c34e6be6d1fcd52a3d75b025,0xb4070209d9a0ad61c34e6be6d1fcd52a3d75b025,2023-01-13 18:57:47,16399808,0x041d7f5c557d0337be08ca1699456d867a54cd30cb70...,0x69d91b94f0aaf8e8a2586909fa77a5c2c89818d5
...,...,...,...,...,...,...,...,...,...,...
3593,48855.246930,48684.270722,7.367340e+21,3.163029e+19,0x0039b625b1d8632c7a0057c964ec58a9f39789d3,0x57c1e0c2adf6eecdb135bcf9ec5f23b319be2c94,2023-01-14 13:17:47,16405290,0xfe9e36da143210fbea5bf7e30325465aa4b87988b44b...,0x1d42064fc4beb5f8aaf85f4617ae8b3b5b8bd801
3594,28727.128461,28633.650951,4.251929e+21,1.836791e+19,0xaf9df30242b311f6c4c8f34a33ada2e783e74ade,0x57c1e0c2adf6eecdb135bcf9ec5f23b319be2c94,2023-01-14 09:10:47,16404062,0xff036b56d5288f89dc4137e20c071fb2e60bd24a5b65...,0x1d42064fc4beb5f8aaf85f4617ae8b3b5b8bd801
3595,20159.274541,20095.065972,3.096254e+21,1.323380e+19,0x2a91d154cdcdf08a553017afdcdea398c8b706a6,0x98c3d3183c4b8a650614ad179a1a98be0a8d6b8e,2023-01-14 09:22:59,16404123,0xff5f44ba10a4968960d6207df49e0ef4036f92f04b4f...,0x1d42064fc4beb5f8aaf85f4617ae8b3b5b8bd801
3596,16954.523195,16901.094588,1.089951e+19,2.512056e+21,0xc4c4d0c3c45a088182c092e173250fcaf0a8685f,0x74de5d4fcbf63e00296fd95d33236b9794016631,2023-01-14 02:32:23,16402074,0xffb65fec1cece280af5585052dee0b7bf366a8975270...,0x1d42064fc4beb5f8aaf85f4617ae8b3b5b8bd801


In [86]:
uni3_events.swaps

Unnamed: 0,amountInUSD,amountOutUSD,amountIn,amountOut,from,to,timestamp,blockNumber,id,pool.id,amountInUSD_zscore,amountOutUSD_zscore,amountIn_zscore,amountOut_zscore,slippage,slippage_zscore
0,5657.032861,5639.334644,2.885114e+21,3.987748e+18,0x36a454aef52938c8637cd4689b2980c1cfd43389,0x56178a0d5f301baf6cf3e1cd53d9863437345bf9,2023-01-13 11:08:11,16397475,0x02329fc97b9ddb108b7f8b5e97512d225054e3508f72...,0xf4ad61db72f114be877e87d62dc5e7bd52df4d9b,-0.564925,-0.569975,-0.166299,-0.450433,0.312853,0.047844
1,13995.903407,13961.956025,9.915001e+18,6.860720e+21,0x87d7476a1309afdf23143bef1967d3fd7e6c64d4,0x87d7476a1309afdf23143bef1967d3fd7e6c64d4,2023-01-13 17:03:11,16399238,0x05072edff9ebabc9d4cc2d885fc5ad080b5948b5cb51...,0xf4ad61db72f114be877e87d62dc5e7bd52df4d9b,-0.063729,-0.062665,-0.651990,0.513844,0.242552,-0.153996
2,9394.431746,9374.735792,4.527541e+21,6.641266e+18,0x34ec9e3a1ac200ea58744ced891015152130e400,0x57c1e0c2adf6eecdb135bcf9ec5f23b319be2c94,2023-01-13 17:13:47,16399291,0x05a8628d93bbe833de34b378de05a6cc12f959884144...,0xf4ad61db72f114be877e87d62dc5e7bd52df4d9b,-0.340294,-0.342282,0.111147,-0.450060,0.209656,-0.248445
3,18863.877559,18777.285126,1.336358e+19,8.984906e+21,0xb18ccf69940177f3ec62920ddb2a08ef7cb16e8f,0xd249942f6d417cbfdcb792b1229353b66c790726,2023-01-13 17:05:23,16399249,0x0942ddb8b8390f57f5ba2e8f3ead2089f6431f0b2c65...,0xf4ad61db72f114be877e87d62dc5e7bd52df4d9b,0.228854,0.230857,-0.651408,0.812572,0.459038,0.467553
4,3039.538115,3031.598715,2.153275e+18,1.519653e+21,0x34ec9e3a1ac200ea58744ced891015152130e400,0x57c1e0c2adf6eecdb135bcf9ec5f23b319be2c94,2023-01-13 17:01:11,16399228,0x09861504d2575edd2987e90fc503079994a4124f3563...,0xf4ad61db72f114be877e87d62dc5e7bd52df4d9b,-0.722246,-0.728932,-0.653302,-0.237282,0.261204,-0.100445
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2105,48855.246930,48684.270722,7.367340e+21,3.163029e+19,0x0039b625b1d8632c7a0057c964ec58a9f39789d3,0x57c1e0c2adf6eecdb135bcf9ec5f23b319be2c94,2023-01-14 13:17:47,16405290,0xfe9e36da143210fbea5bf7e30325465aa4b87988b44b...,0x1d42064fc4beb5f8aaf85f4617ae8b3b5b8bd801,0.480390,0.481566,0.988976,-0.392195,0.349965,0.310878
2106,28727.128461,28633.650951,4.251929e+21,1.836791e+19,0xaf9df30242b311f6c4c8f34a33ada2e783e74ade,0x57c1e0c2adf6eecdb135bcf9ec5f23b319be2c94,2023-01-14 09:10:47,16404062,0xff036b56d5288f89dc4137e20c071fb2e60bd24a5b65...,0x1d42064fc4beb5f8aaf85f4617ae8b3b5b8bd801,0.024686,0.024926,0.393245,-0.394759,0.325398,0.074274
2107,20159.274541,20095.065972,3.096254e+21,1.323380e+19,0x2a91d154cdcdf08a553017afdcdea398c8b706a6,0x98c3d3183c4b8a650614ad179a1a98be0a8d6b8e,2023-01-14 09:22:59,16404123,0xff5f44ba10a4968960d6207df49e0ef4036f92f04b4f...,0x1d42064fc4beb5f8aaf85f4617ae8b3b5b8bd801,-0.169292,-0.169534,0.172256,-0.395752,0.318506,0.007900
2108,16954.523195,16901.094588,1.089951e+19,2.512056e+21,0xc4c4d0c3c45a088182c092e173250fcaf0a8685f,0x74de5d4fcbf63e00296fd95d33236b9794016631,2023-01-14 02:32:23,16402074,0xffb65fec1cece280af5585052dee0b7bf366a8975270...,0x1d42064fc4beb5f8aaf85f4617ae8b3b5b8bd801,-0.241848,-0.242275,-0.417728,0.087396,0.315129,-0.024628


In [11]:
uni3_events.compute_slippage()

uni3_events.swaps = compute_zscore(
    uni3_events.swaps, 
    columns= ['amountInUSD', 'amountOutUSD', 'amountIn', 'amountOut', 'slippage'],
    group_by= 'pool.id')

# uni3_events.deposits = compute_zscore(
#     uni3_events.deposits, 
#     columns=['amountUSD', 'InputTokenAmount0', 'InputTokenAmount1'], 
#     group_by= 'pool.id')

# uni3_events.withdraws = compute_zscore(
#     uni3_events.withdraws, 
#     columns=['amountUSD', 'InputTokenAmount0', 'InputTokenAmount1'], 
#     group_by= 'pool.id')
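
For reference, the `slippage` values in the swaps table displayed in this notebook are consistent with `100 * (amountInUSD - amountOutUSD) / amountInUSD` (for example, 5657.032861 in and 5639.334644 out give about 0.3129). The sketch below shows that formula together with a per-pool z-score in pandas. The function names are hypothetical, and the use of the sample standard deviation (pandas' default, `ddof=1`) is an assumption; the package's own `compute_slippage` and `compute_zscore` may differ.

```python
import pandas as pd

def add_slippage(swaps):
    """Add a `slippage` column: USD value lost between input and output, in %."""
    out = swaps.copy()
    out["slippage"] = (
        100 * (out["amountInUSD"] - out["amountOutUSD"]) / out["amountInUSD"]
    )
    return out

def add_zscores(df, columns, group_by):
    """Append `<col>_zscore` columns, standardized within each group."""
    out = df.copy()
    grouped = out.groupby(group_by)[columns]
    # transform() broadcasts each group's statistic back onto its rows.
    zscores = (out[columns] - grouped.transform("mean")) / grouped.transform("std")
    return out.join(zscores.add_suffix("_zscore"))

swaps = pd.DataFrame({
    "pool.id": ["p1", "p1", "p1", "p2", "p2"],
    "amountInUSD": [100.0, 200.0, 300.0, 10.0, 30.0],
    "amountOutUSD": [99.0, 199.0, 297.0, 9.9, 29.4],
})
swaps = add_slippage(swaps)
swaps = add_zscores(swaps, columns=["amountInUSD", "slippage"], group_by="pool.id")
print(swaps)
```

Standardizing within each pool keeps a swap that is large for a small pool comparable to one that is large for a deep pool.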


In [29]:
aggregate_median(uni3_events.swaps)

from,amountInUSD_zscore_median,amountOutUSD_zscore_median,amountIn_zscore_median,amountOut_zscore_median,slippage_median,slippage_zscore_median
0x00000006e42915a2b6907f8b3faf311b68862f60,-0.613813,-0.615264,-0.419741,-0.381712,0.300518,-0.165345
0x0025500c6a6bcaebde159db6a307f4d38503a079,-0.406518,-0.409241,-0.250851,-0.306725,0.306109,-0.416440
0x002be0becbd31be8da908ffcde0519a075f2b208,-0.203920,-0.204442,0.026924,-0.306656,0.338272,-0.208200
0x0037825fd75af7eeace28889665e3fac8fdb6300,-0.132933,-0.132716,-0.301435,0.138918,0.349789,-0.133636
0x0039b625b1d8632c7a0057c964ec58a9f39789d3,0.203923,0.204536,-0.409849,-0.072089,0.338787,0.203220
...,...,...,...,...,...,...
0xfea677f3b408700ae4ba6355c4885851867a29e3,0.918546,0.927798,1.758448,-0.306272,0.517913,0.954881
0xff2ccbe0d026b9b399b56248405eb3697a2e067b,-0.349277,-0.351365,-0.301512,-0.180251,0.315088,-0.358306
0xff82bf5238637b7e5e345888bab9cd99f5ebe331,-0.328084,-0.328808,-0.174084,-0.396257,0.313421,-0.041075
0xffc458db291b4abce020fe3de4f91f2770e537b1,-0.323545,-0.325350,-0.137918,-0.306696,0.318819,-0.334149
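
A per-address aggregation with this output shape can be obtained with a pandas group-by. This is a guess at what `aggregate_median` does, inferred only from the column names in the output; `median_by_address` is a hypothetical name.

```python
import pandas as pd

def median_by_address(swaps, address_col="from"):
    """Median of every z-score / slippage column, per originating address."""
    feature_cols = [
        c for c in swaps.columns if c.endswith("_zscore") or c == "slippage"
    ]
    medians = swaps.groupby(address_col)[feature_cols].median()
    return medians.add_suffix("_median")

swaps = pd.DataFrame({
    "from": ["0xaaa", "0xaaa", "0xaaa", "0xbbb"],
    "amountInUSD_zscore": [-1.0, 0.0, 2.0, 0.5],
    "slippage": [0.3, 0.2, 0.4, 0.1],
})
per_address = median_by_address(swaps)
print(per_address)
```

The median is less sensitive than the mean to a single very large swap, which matters for addresses with only a handful of events.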


In [15]:
uni3_events.withdraws.id.values

array(['0x017bb4efe9d927193ccc14eb5e9e67952a000f0f1c743c32fa87fc996c9afd2e-112',
       '0x03e71f168168a6d87761a721c8876f0dd34011925ae317ff445b13fba54f20ba-357',
       '0x4b4f114786ea8b42547f5d755361f946a1bafa6ee47c3461b6d5797d4be6c2c0-16',
       '0x4cb03f930de324e3a738ceca96cb109a1eb0d254f13b23a2c933a3275da2aa50-271',
       '0x544cd74a749e3af0a0111f6f46b72dcd894d7f1cf64aebe0dedf2ea20163f694-148',
       '0x57b2d5470d72cb9d5e21c2223edf9fcdfd5ed365861deaa4d4d9cce64fee6a44-213',
       '0x62489c2ed16882546123e330e3405c0edab11602cac4aa91af7ea5c240afc860-157',
       '0x741121b23d77a362acccc82715604b46cd562f2f0d005721eba9fe2f66cc3ad0-107',
       '0x8c01c72814650ded02507c42975ea917fcf0e479836bbd6c3dd09210ddab5959-326',
       '0x97a1507bd8ceace063cbec3c03b576f6e5a19540ab0574a16582250827ee0269-350',
       '0xd2a448533c9e1c6c619ed55b387dc62a87cb6934d242f2ddc374b5dbbd7dfa99-85',
       '0xe222e77a08a049102f76441c39a7b107700c1c423b30b514058ff48ea04382ee-74',
       '0xe373f7617e5bf02130bc7

In [15]:
uni3_events['deposits'].to_pickle('/home/fujiju/Documents/GitHub/py0xcluster/data/20230113_050500_deposits.pkl')

In [17]:
uni3_events['swaps'].nunique()

amountInUSD     1248508
amountOutUSD    1351632
amountIn        1101277
amountOut       1346714
from             254768
to               127814
timestamp        521411
blockNumber      521411
pool.id              68
dtype: int64

In [None]:
data_lengths = [0, 0, 0]
empty_data = [data_length == 0 for data_length in data_lengths]
all(empty_data)

True

In [None]:
dico = {'ac': 0 , 'asfd': 2}
len(dico)

2