# py0xcluster

For now, this is an educational project of my own, aimed at running fun data-science experiments on blockchain data gathered through the Graph Network (https://thegraph.com/).

## Target objectives

- Establish meaningful groupings of addresses by clustering DEX traders and LPs
    - Feature Extraction:
        - TBD, but based on mint/swap/burn data from Messari subgraph entities
        - with or without balances at swap time (balance fetched at the swap block with web3.py)
        - EOA vs Contracts
    - Dimensionality reduction:
        - UMAP / t-SNE or PCA / ICA
    - Clustering:
        - DBSCAN
        - silhouette evaluation
    - Visualization:
        - scatter plot with color-coded returns? (TBD)
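
The chain above (features, dimensionality reduction, clustering, evaluation) can be sketched with scikit-learn. This is only a minimal sketch under assumptions: the feature matrix is synthetic (three made-up trader profiles with hypothetical feature columns), PCA stands in for UMAP / t-SNE, and the `eps` and `min_samples` values are arbitrary and would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for per-address features (all hypothetical):
# [log10 swap volume USD, swaps per day, mint/burn ratio, mean holding time]
centers = np.array([
    [3.0, 1.0, 0.0, 5.0],
    [5.0, 20.0, 0.1, 0.5],
    [6.0, 2.0, 3.0, 30.0],
])
features = np.vstack([c + rng.normal(scale=0.3, size=(100, 4)) for c in centers])

# 1. Standardize so that no single feature dominates the distance metric
scaled = StandardScaler().fit_transform(features)

# 2. Reduce to 2 dimensions (PCA here; UMAP or t-SNE could be swapped in)
embedding = PCA(n_components=2).fit_transform(scaled)

# 3. Density-based clustering; the label -1 marks noise points
labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(embedding)

# 4. Silhouette score on clustered points only (needs at least 2 clusters)
clustered = labels != -1
n_clusters = len(set(labels[clustered]))
if n_clusters >= 2:
    score = silhouette_score(embedding[clustered], labels[clustered])
    print(f"{n_clusters} clusters, silhouette = {score:.2f}")
else:
    print(f"{n_clusters} cluster(s) found; silhouette is undefined")
```

Noise points are excluded before scoring because the silhouette coefficient treats every label as a cluster, so leaving `-1` in would score the noise as if it were a group.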

**Secondary objective: identify which group has the most profitable activity**

- Event-triggered average of price around swaps in/out, per group of addresses

- Predict future returns based on the activity of previously clustered groups of addresses
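
The triggered-average idea can be sketched as follows: for each swap made by a group of addresses, take the price series in a fixed window around the swap, express it relative to the price at swap time, and average across swaps. The sketch is pure Python and purely illustrative; the price series, the event positions, and the function name `triggered_average` are assumptions made for the example, not part of py0xcluster.

```python
from statistics import mean

def triggered_average(prices, event_indices, before, after):
    """Average relative price trajectory around each event.

    prices: list of prices sampled at regular intervals
    event_indices: positions in `prices` where a swap occurred
    before / after: number of samples kept on each side of the event
    Returns a list of length before + after + 1, or [] if no event fits.
    """
    windows = []
    for i in event_indices:
        # Skip events too close to the edges to have a full window
        if i - before < 0 or i + after >= len(prices):
            continue
        ref = prices[i]
        # Express each window relative to the price at swap time
        windows.append([p / ref - 1.0 for p in prices[i - before : i + after + 1]])
    if not windows:
        return []
    # Average sample-by-sample across all event windows
    return [mean(col) for col in zip(*windows)]

# Toy price series (made up) and swap positions for one hypothetical group
prices = [100, 101, 102, 104, 103, 105, 108, 107, 110, 112]
buys = [2, 5, 9]  # the last event is dropped: no room for the "after" window

avg = triggered_average(prices, buys, before=1, after=2)
print([round(x, 4) for x in avg])
```

A trajectory that rises after the trigger for one group but not for another would be a first hint that the group's swaps carry information about future returns.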


### Imports

In [14]:
%load_ext autoreload
%autoreload 2

import pandas as pd
from py0xcluster.utils.query_utils import *
from py0xcluster.main_classes.pools import *
from py0xcluster.main_classes.pool_events import *

### Gathering data about most-active pools

- Needs adapting to the refactored query methods, which handle multiple entities at once

In [8]:
uni3pools_selector = PoolSelector(
    subgraph_url = 'https://api.thegraph.com/subgraphs/name/messari/uniswap-v3-ethereum',
    min_daily_volume_USD = 100000,
    min_TVL = 100000, # Not implemented. consider removing
    start_date = (2022,12,21), 
    end_date = (2023,1,10),
    days_batch_size = 20)

uni3_pools = uni3pools_selector.create_pool_selection()
uni3_pools.pools_df

Queriying from 2022-12-21 00:00:00 to 2023-01-10 00:00:00
1650 lquidity pools snapshots retrieved
197 stable pools snapshots (over 1650) have been removed
217 illiquid pools snapshots (over 1650) have been removed 
221 pools were selected


|  | pool.name | dailyVolumeUSD | pool.totalValueLockedUSD | token0.lastPriceUSD | token1.lastPriceUSD | pool.protocol.name | pool.protocol.network | pool.id | token0.symbol | token1.symbol |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Uniswap V3 USD Coin/Wrapped Ether 0.05% | 1.407573e+08 | 1.790617e+08 | 1.000000 | 1334.218011 | Uniswap V3 | MAINNET | 0x88e6a0c2ddd26feeb64f039a2c41296fcb3f5640 | USDC | WETH |
| 1 | Uniswap V3 USD Coin/Tether USD 0.01% | 7.632963e+07 | 1.200383e+08 | 1.000000 | 1.000000 | Uniswap V3 | MAINNET | 0x3416cf6c708da44db2624d63ea0aaef7113527c6 | USDC | USDT |
| 2 | Uniswap V3 Wrapped Ether/Tether USD 0.05% | 2.748191e+07 | 1.992092e+07 | 1334.218011 | 1.000000 | Uniswap V3 | MAINNET | 0x11b815efb8f581194ae79006d24e0d814b7697f6 | WETH | USDT |
| 3 | Uniswap V3 USD Coin/Wrapped Ether 0.01% | 1.951618e+07 | 5.123749e+06 | 1.000000 | 1334.218011 | Uniswap V3 | MAINNET | 0xe0554a476a092703abdb3ef35c80e0d76d32939f | USDC | WETH |
| 4 | Uniswap V3 Wrapped BTC/Wrapped Ether 0.05% | 1.490067e+07 | 1.114253e+08 | 17414.239686 | 1334.218011 | Uniswap V3 | MAINNET | 0x4585fe77225b41b697c938b018e2ac67ac5a20c0 | WBTC | WETH |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 216 | Uniswap V3 Green/Wrapped Ether 0.05% | 1.030147e+05 | 6.053919e+05 | 0.002071 | 1334.218011 | Uniswap V3 | MAINNET | 0x4efc9e2e3e77732ce2f9612b8f050082c01688bd | GREEN | WETH |
| 217 | Uniswap V3 HuntToken/Tether USD 0.3% | 1.028683e+05 | 9.968453e+05 | 0.248318 | 1.000000 | Uniswap V3 | MAINNET | 0x54578b6f942aeb23b67a8cef24220651306b8e26 | HUNT | USDT |
| 218 | Uniswap V3 Index/Wrapped Ether 1% | 1.026767e+05 | 2.797922e+04 | 0.000000 | 1334.218011 | Uniswap V3 | MAINNET | 0x8c13148228765ba9e84eaf940b0416a5e349a5e7 | INDEX | WETH |
| 219 | Uniswap V3 unification.com/xfund/Wrapped Ether 1% | 1.023294e+05 | 2.742566e+04 | 0.000000 | 1334.218011 | Uniswap V3 | MAINNET | 0xb1223da8a5929bcfa9d26f0c6da8f0a29c3925ff | xFUND | WETH |


## Extracting all events from these pools

### Steps

- Data query:
    - Query each pool separately, batching by days to accommodate the response limit
    - Alternatively, query multiple pools at once. Ideally the batch size should be scaled to each pool's volume, though any such rule is fairly arbitrary to implement.
    - Consider whether to query swaps / mints / burns separately or jointly
    - Loop and aggregate over days and pools (or batches of pools)
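
The day-batching step can be sketched as follows. This is a hedged illustration rather than the actual py0xcluster implementation: the helper `day_batches` is hypothetical, and it only shows how a date range might be cut into consecutive windows so that each query stays under the response limit.

```python
from datetime import datetime, timedelta

def day_batches(start, end, days_batch_size):
    """Yield consecutive (batch_start, batch_end) windows covering [start, end)."""
    if days_batch_size < 1:
        raise ValueError("days_batch_size must be at least 1")
    step = timedelta(days=days_batch_size)
    cursor = start
    while cursor < end:
        batch_end = min(cursor + step, end)  # the last batch may be shorter
        yield cursor, batch_end
        cursor = batch_end

start = datetime(2022, 12, 21)
end = datetime(2023, 1, 10)

batches = list(day_batches(start, end, days_batch_size=7))
for batch_start, batch_end in batches:
    # Event timestamps in the subgraph data are Unix seconds, so a query
    # filter would likely need the window bounds in the same unit
    print(batch_start.date(), "->", batch_end.date(),
          int(batch_start.timestamp()), int(batch_end.timestamp()))
```

Using half-open windows (start included, end excluded) avoids counting an event twice when it falls exactly on a batch boundary.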


In [15]:
uni3_events_getter = PoolEventGetter(
    subgraph_url = 'https://api.thegraph.com/subgraphs/name/messari/uniswap-v3-ethereum',
    pool_id = '0x4585fe77225b41b697c938b018e2ac67ac5a20c0',
    start_date = (2023,1,9), 
    end_date = (2023,1,11)
    )

uni3_events = uni3_events_getter._get_raw_events()
uni3_events = uni3_events_getter._normalize_pool_events(uni3_events)

Queriying from 2023-01-09 00:00:00 to 2023-01-11 00:00:00
 entity: swaps, skip: 0, data length: 1000
dict_keys(['swaps']) 1000
 entity: withdraws, skip: 0, data length: 43
dict_keys(['swaps', 'withdraws']) 43
 entity: deposits, skip: 0, data length: 32
dict_keys(['swaps', 'withdraws', 'deposits']) 32
[2000, 86, 64]
 entity: swaps, skip: 1000, data length: 1000
dict_keys(['swaps', 'withdraws', 'deposits']) 2000
 entity: withdraws, skip: 1000, data length: 0
 entity: deposits, skip: 1000, data length: 0
[1000, 0, 0]
 entity: swaps, skip: 2000, data length: 853
dict_keys(['swaps', 'withdraws', 'deposits']) 3000
 entity: withdraws, skip: 2000, data length: 0
 entity: deposits, skip: 2000, data length: 0
[853, 0, 0]
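
The log above reflects `skip`-based pagination: each entity is requested in pages of up to 1000 rows, and `skip` advances by 1000 per round. The first bracketed line (`[2000, 86, 64]`) is twice the page lengths just fetched, which may indicate that the first page is appended twice. A minimal sketch of a pagination loop that appends each page exactly once follows. The assumptions: `fetch_page` is a hypothetical stand-in for the real GraphQL request, the page size of 1000 is taken from the log, and the stopping rule (stop at the first page that is not full) is a guess that may differ from what `PoolEventGetter` does.

```python
PAGE_SIZE = 1000  # page size seen in the log

def fetch_all(fetch_page, entity, page_size=PAGE_SIZE):
    """Collect every row of `entity` by increasing `skip` page by page."""
    rows, skip = [], 0
    while True:
        page = fetch_page(entity, first=page_size, skip=skip)
        rows.extend(page)
        if len(page) < page_size:  # short or empty page: nothing left
            return rows
        skip += page_size

# Fake backend reproducing the page lengths from the log (hypothetical helper)
FAKE_ROWS = {"swaps": 2853, "withdraws": 43, "deposits": 32}

def fake_fetch_page(entity, first, skip):
    total = FAKE_ROWS[entity]
    return [{"entity": entity, "n": i} for i in range(skip, min(skip + first, total))]

events = {name: fetch_all(fake_fetch_page, name) for name in FAKE_ROWS}
print({name: len(rows) for name, rows in events.items()})
```

Paginating each entity independently also avoids the extra empty requests visible in the log for `withdraws` and `deposits` at `skip: 1000` and `skip: 2000`.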


In [16]:
uni3_events['swaps']

|  | amountInUSD | amountOutUSD | amountIn | amountOut | timestamp | blockNumber |
|---|---|---|---|---|---|---|
| 0 | 4714.88464711765722496183949975948 | 4706.087510460183691845881213781365 | 27250975 | 3582083029912952375 | 1673236871 | 16366716 |
| 1 | 17290.40648701147566083837890313988 | 17269.17792857902167183558667309645 | 100047610 | 13108907282241196617 | 1673258483 | 16368511 |
| 2 | 87.48015824877316770533545221223209 | 87.50761306153281880724035265416962 | 507670 | 66012802926383171 | 1673358575 | 16376812 |
| 3 | 200235.7764893190211507489768511823 | 199949.4594231083965915585281165255 | 1159465723 | 152426768445614236172 | 1673238923 | 16366887 |
| 4 | 514.2053428554663404750024735810239 | 512.6962435007653665105200592762253 | 3007146 | 399027085803254195 | 1673221271 | 16365417 |
| ... | ... | ... | ... | ... | ... | ... |
| 3848 | 1287.857221555046157241748090914744 | 1287.262221147647203365040855414444 | 7487841 | 974709563673794173 | 1673323043 | 16373861 |
| 3849 | 58.65948379663715203125332725234445 | 58.54522260272434570021549319752193 | 44081344541618760 | 338527 | 1673374739 | 16378154 |
| 3850 | 13.17306596745485508320704856426179 | 13.20300889663099466126193955083321 | 10000000000000000 | 76308 | 1673257463 | 16368426 |
| 3851 | 26.38692326464516295088497560549854 | 26.38719754431123016886651141421227 | 20000000000000000 | 153539 | 1673308967 | 16372692 |
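
The `amountIn` / `amountOut` columns hold raw integer token amounts and `timestamp` is in Unix seconds. The sketch below shows one way to make them human-readable. It assumes the usual decimals for this pool's tokens (8 for WBTC, 18 for WETH) and that row 0 is a WBTC-to-WETH swap, which the magnitudes suggest but the table does not state; the helper `to_token_units` is hypothetical.

```python
from datetime import datetime, timezone
from decimal import Decimal

# Assumed token decimals for the WBTC/WETH pool
DECIMALS = {"WBTC": 8, "WETH": 18}

def to_token_units(raw_amount, symbol):
    """Convert a raw on-chain integer amount to whole-token units."""
    return Decimal(raw_amount) / (Decimal(10) ** DECIMALS[symbol])

# Row 0 of the swaps table, read as WBTC in and WETH out (assumption)
amount_in = to_token_units(27250975, "WBTC")
amount_out = to_token_units(3582083029912952375, "WETH")
swap_time = datetime.fromtimestamp(1673236871, tz=timezone.utc)

# Implied execution price in WETH per WBTC
price = amount_out / amount_in

print(amount_in, amount_out, swap_time.isoformat(), round(price, 4))
```

`Decimal` is used instead of `float` because raw amounts with 18 decimals exceed the precision of a 64-bit float.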


In [17]:
uni3_events['deposits']

|  | amountUSD | from | to | inputTokenAmounts | timestamp | blockNumber |
|---|---|---|---|---|---|---|
| 0 | 10050952.38441849668893619314235153 | 0xc36442b4a4522e871399cd717abdd847ab11fe88 | 0x4585fe77225b41b697c938b018e2ac67ac5a20c0 | [0, 7781732621249712256554] | 1673227091 | 16365902 |
| 1 | 534955.4163239834505891129310495455 | 0xc36442b4a4522e871399cd717abdd847ab11fe88 | 0x4585fe77225b41b697c938b018e2ac67ac5a20c0 | [1674475339, 187294220220217246758] | 1673242067 | 16367149 |
| 2 | 731805.911480122585392747289603317 | 0xc36442b4a4522e871399cd717abdd847ab11fe88 | 0x4585fe77225b41b697c938b018e2ac67ac5a20c0 | [3513090810, 96609907696638339360] | 1673320283 | 16373632 |
| 3 | 414949.684426767248589649193928455 | 0xc36442b4a4522e871399cd717abdd847ab11fe88 | 0x4585fe77225b41b697c938b018e2ac67ac5a20c0 | [2400000000, 0] | 1673347091 | 16375859 |
| 4 | 22196.00697064421482002044088122789 | 0xc36442b4a4522e871399cd717abdd847ab11fe88 | 0x4585fe77225b41b697c938b018e2ac67ac5a20c0 | [128513575, 0] | 1673268419 | 16369338 |
| ... | ... | ... | ... | ... | ... | ... |
| 59 | 351033.0574752804455555661440558537 | 0xc36442b4a4522e871399cd717abdd847ab11fe88 | 0x4585fe77225b41b697c938b018e2ac67ac5a20c0 | [1904473007, 19752025143719318958] | 1673221427 | 16365430 |
| 60 | 9251.116691463060447479950247700785 | 0xc36442b4a4522e871399cd717abdd847ab11fe88 | 0x4585fe77225b41b697c938b018e2ac67ac5a20c0 | [0, 6899999999999999982] | 1673385251 | 16379027 |
| 61 | 190390.7308455182047492074877053182 | 0xc36442b4a4522e871399cd717abdd847ab11fe88 | 0x4585fe77225b41b697c938b018e2ac67ac5a20c0 | [353461171, 96499999794997528174] | 1673279819 | 16370282 |
| 62 | 6708.237397128633238281498484562213 | 0xc36442b4a4522e871399cd717abdd847ab11fe88 | 0x4585fe77225b41b697c938b018e2ac67ac5a20c0 | [19636505, 2499999945377070893] | 1673263679 | 16368944 |


In [28]:
data_lengths = [0, 0, 0]
empty_data = [data_length == 0 for data_length in data_lengths]
all(empty_data)

True

In [3]:
dico = {'ac': 0 , 'asfd': 2}
len(dico)

2