### Write well, I'm going to be using this for a long time

#### Data we need:
User input:
- investment amount and trading pair -> amt0, amt1
- start time and end time
- time period over which swap price, swap volume and liquidity positions are assumed fixed
- upper and lower price
- pool_fee_rate

Data from API:
- cprice for each time period (from the tick: price = 1.0001 ** tick)
- L_pool for each time period at the specific pool_fee_rate (liquidity? or simply total X tokens + Y tokens in USD)
- swap volume for each time period at the specific pool_fee_rate (volumeUSD?)
- gas cost to mint at each time period
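
The tick-to-price relation in the list above can be sketched as a small helper. The name `tick_to_price` and the `decimals0` / `decimals1` parameters are assumptions for illustration: the raw pool price is `1.0001 ** tick`, and the decimals shift only converts it into human-readable token units.

```python
def tick_to_price(tick, decimals0=0, decimals1=0):
    """Price of token0 quoted in token1, as implied by a pool tick.

    Assumption: decimals0 / decimals1 are the ERC-20 decimals of the two
    tokens, supplied by the caller. With both left at 0 the raw pool price
    1.0001 ** tick is returned.
    """
    raw_price = 1.0001 ** int(tick)
    return raw_price * 10 ** (decimals0 - decimals1)


# Example: the subgraph returns ticks as strings, so int() is applied inside.
price_raw = tick_to_price("100")
price_usdc_weth = tick_to_price("197000", decimals0=6, decimals1=18)
```

For a USDC/WETH pool (6 and 18 decimals) the result is WETH per USDC; take the reciprocal for USDC per WETH.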

--------------------------------------------------------------------------------------------------------------

#### Fees
The liquidity amount is calculated from the following numbers that describe a position:
- amount of token 0 (amt0) and amount of token 1 (amt1),
- price (quoted as units of token 1 per token 0) at the upper limit of the position (upper),
- price at the lower limit of the position (lower),
- and the current swap price (cprice).

The liquidity of a position (L_you) is then calculated as follows:

Case 1: cprice <= lower
- liquidity = amt0 * (sqrt(upper) * sqrt(lower)) / (sqrt(upper) - sqrt(lower))

Case 2: lower < cprice < upper
- liquidity is the min of the following two calculations:
- amt0 * (sqrt(upper) * sqrt(cprice)) / (sqrt(upper) - sqrt(cprice))
- amt1 / (sqrt(cprice) - sqrt(lower))

Case 3: upper <= cprice
- liquidity = amt1 / (sqrt(upper) - sqrt(lower))
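
The three cases can be written as one function. This is a minimal sketch, not the linked repository's code; the name `liquidity_for_position` is hypothetical. It treats `cprice == upper` as Case 3, because the token 0 expression in Case 2 divides by zero at that boundary.

```python
import math


def liquidity_for_position(amt0, amt1, cprice, lower, upper):
    """Liquidity of a position from token amounts and prices (3 cases)."""
    if not 0 < lower < upper:
        raise ValueError("expected 0 < lower < upper")
    sqrt_lower = math.sqrt(lower)
    sqrt_upper = math.sqrt(upper)

    # Case 1: price at or below the range, position is all token 0
    if cprice <= lower:
        return amt0 * (sqrt_upper * sqrt_lower) / (sqrt_upper - sqrt_lower)

    # Case 3: price at or above the range, position is all token 1
    if cprice >= upper:
        return amt1 / (sqrt_upper - sqrt_lower)

    # Case 2: price inside the range, the scarcer token is the binding one
    sqrt_price = math.sqrt(cprice)
    liq0 = amt0 * (sqrt_upper * sqrt_price) / (sqrt_upper - sqrt_price)
    liq1 = amt1 / (sqrt_price - sqrt_lower)
    return min(liq0, liq1)


# Example: 1 unit of each token, price 1, range [0.25, 4]
example_liquidity = liquidity_for_position(1, 1, cprice=1, lower=0.25, upper=4)
```

Prices and amounts must be in consistent units (either both raw or both decimal-adjusted) for the result to be comparable with the pool's `liquidity` field.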

Resources
- the liquidity calculation can reuse this code: https://github.com/JNP777/UNI_V3-Liquitidy-amounts-calcs/blob/main/UNI_v3_funcs.py

Fee income is calculated as:
- Fee income = (L_you / L_pool) * swap volume over the fixed time period * pool_fee_rate / 100
- L_you should count only the liquidity active at the current tick, not the whole amount provided. It is not linear in the deposit; it comes from the 3 cases above.
- Are the fees in Case 1 and Case 3 always 0? (In both cases cprice is outside the position's range, and an out-of-range Uniswap v3 position earns no swap fees.)
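
The fee formula for a single time period can be sketched as follows. The function name and argument names are assumptions; `pool_fee_rate_pct` is a percentage (0.3 for the 0.3% pool), matching the `/100` in the formula, and the sketch returns 0 whenever cprice is outside the position's range.

```python
def fee_income(l_you, l_pool, swap_volume_usd, pool_fee_rate_pct,
               cprice, lower, upper):
    """Fee income in USD for one period with price and liquidity held fixed."""
    # Out-of-range positions (Case 1 and Case 3) earn nothing
    if not (lower < cprice < upper):
        return 0.0
    if l_pool <= 0:
        return 0.0
    share = l_you / l_pool
    return share * swap_volume_usd * pool_fee_rate_pct / 100


# Example: 1% of pool liquidity, $1M volume, 0.3% fee tier, price in range
example_fee = fee_income(1, 100, 1_000_000, 0.3, cprice=1.0, lower=0.5, upper=2.0)
```

Summing this over all periods gives the accumulated fees used in the PNL section.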


Reference: https://uniswapv3.flipsidecrypto.com/
- check my numbers against the reference on the website

----------------------------------------------------------------------------------------

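#### Impermanent Loss (is this v2 or v3?)
- IL (in %) = (2 * sqrt(p) / (p + 1)) - 1
- where p = r_t1 / r_t2
- and r_t is the price, quoted in token b, at time t
- Net $ loss = total asset value in dollars at stake time * IL (in %)
- This is the constant-product (v2-style, full-range) formula; a concentrated v3 position will not follow it exactly.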
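
A minimal sketch of the formula above, with the hypothetical name `impermanent_loss`. The expression is symmetric in p and 1/p, so it does not matter which of the two prices goes in the numerator. The result is a negative fraction (for example -0.2 for a 4x price move), not a percentage.

```python
import math


def impermanent_loss(price_t1, price_t2):
    """Constant-product impermanent loss as a fraction (<= 0)."""
    if price_t1 <= 0 or price_t2 <= 0:
        raise ValueError("prices must be positive")
    p = price_t1 / price_t2
    return 2 * math.sqrt(p) / (p + 1) - 1


def net_loss_usd(value_at_stake_usd, price_t1, price_t2):
    """Dollar loss (<= 0) relative to simply holding the two tokens."""
    return value_at_stake_usd * impermanent_loss(price_t1, price_t2)


# Example: price quadruples while $10,000 is staked
example_loss = net_loss_usd(10_000, 1.0, 4.0)
```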

reference: https://chainbulletin.com/impermanent-loss-explained-with-examples-math/#:~:text=Impermanent%20loss%20is%20the%20difference,is%20equal%20to%20200%20DAI

--------------------------------------------------------------------------------------------------------------

#### Other cost

Gas_costs_mint = 500000 gas units * gas_price (in gwei) at that time (??? double check actual gas used)

### PNL/APR
- PNL = Accumulated Fees_accrued (dollar value at generation) - IL - Gas_costs_mint

- APR = (PNL / Initial_capital) * (year time / age of the position)
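
The cost and return arithmetic can be sketched as below. All names are hypothetical, and two conventions are assumed: IL is passed as a positive dollar loss, and APR is annualized by scaling the period return by (one year / age of the position). The gas estimate of 500000 units is the unverified placeholder from the notes above.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def gas_cost_usd(gas_used, gas_price_gwei, eth_price_usd):
    """Transaction cost in USD: gas units * price per unit (gwei -> ETH -> USD)."""
    return gas_used * gas_price_gwei * 1e-9 * eth_price_usd


def pnl_usd(fees_usd, il_usd, gas_usd):
    """PNL = accumulated fees - impermanent loss - mint gas cost.

    il_usd is a positive number representing the loss.
    """
    return fees_usd - il_usd - gas_usd


def apr(pnl, initial_capital, age_seconds):
    """Annualized simple return as a fraction (0.2 means 20%)."""
    return (pnl / initial_capital) * (SECONDS_PER_YEAR / age_seconds)


# Example: 500000 gas at 50 gwei with ETH at $2000, position held half a year
gas = gas_cost_usd(500_000, 50, 2000)
example_apr = apr(pnl_usd(200.0, 50.0, gas), 1000.0, SECONDS_PER_YEAR / 2)
```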

--------------------------------------------------------------------------------------------------------------

## Dependencies

In [1]:
import requests
import json
import pandas as pd
import math
import numpy as np

## Main Functions

In [2]:
# function to use requests.post to make an API call to the subgraph url
def run_query(q):

    # endpoint where you are making the request
    request = requests.post('https://api.thegraph.com/subgraphs/name/uniswap/uniswap-v3',
                            json={'query': q})
    if request.status_code == 200:
        return request
    else:
        # include the query text in the error to make failures easier to debug
        raise Exception('Query failed. return code is {}.      {}'.format(request.status_code, q))
        
        
# turns requests into dataframe        
def results_to_df(query_result):
    json_data_ = json.loads(query_result.text)
    df_data_ = json_data_['data']['pools']
    df_ = pd.DataFrame(df_data_)

    return df_

In [3]:
def get_token_id(symbol):
    
    # TODO: should be first:10 to detect more than 1 coin with the same symbol;
    # with first:1 the length check below can only catch a missing token
    query_ = """ 
    {{
      tokens(first:1, where:{{symbol: "{}"}}) {{
        id
        symbol
        name
      }}
    }}""".format(symbol)
    
    # run query
    query_result_ = run_query(query_)
    json_data_ = json.loads(query_result_.text)
    
    print(' ')
    print('get_token_id: {}'.format(symbol))
    print(json_data_)
    
    # make sure only return 1 object
    if len(json_data_['data']['tokens']) == 1:
        token_id_ = json_data_['data']['tokens'][0]['id']
        return token_id_
        
    else:
        print(json_data_['data'])
        raise Exception('Returned number of token_ids != 1')

        
def get_pool_id(token0_id, token1_id, feeTier):
    query_ = """
    {{
      pools(first: 10, 
        where:{{token0: "{}",
        token1: "{}",
        feeTier:"{}" }}) 
      {{
        id
        token0{{symbol}}
        token1{{symbol}}
        feeTier
      }}
    }}""".format(token0_id, token1_id, feeTier)
    
    
    # run query
    query_result_ = run_query(query_)
    json_data_ = json.loads(query_result_.text)
    
    print('\n get_pool_id for feeTier: {}'.format(feeTier))
    print(json_data_)
    
    # make sure there is only 1 pool that matches exactly
    if len(json_data_['data']['pools']) == 1:
        pool_id_ = json_data_['data']['pools'][0]['id']
        return pool_id_
    else:
        print(json_data_['data'])
        raise Exception('Returned number of pool_ids != 1')

In [9]:
def get_poolDayDatas(pool_id, num_datapoints=1000):
    # input: pool_id
    # num_datapoints (must be multiple of max_request_)
    
    max_request_ = 1000
    quotient_ = math.floor(num_datapoints/max_request_)
    remainder_ = num_datapoints%max_request_
            
    query_base_ = '''
    {{
      poolDayDatas(first:{},
      skip: {},
        where:{{ pool: "{}" }},
      orderBy:date,
      orderDirection: desc) 
      {{
        date
        tick
        liquidity
        volumeUSD
        pool{{
            token0{{
                symbol
            }}
            token1{{
                symbol
            }}
        }}
      }}
    }}'''
    
    poolDayDatas_array_ = []
    
    # query loop
    for i in range(quotient_):
        q_first_ = max_request_
        q_next_ = i*max_request_
        query_ = query_base_.format(q_first_, q_next_, pool_id)
        query_result_ = run_query(query_)
        json_data_ = json.loads(query_result_.text)
#         print(json_data_)
        poolDayDatas_array_ += json_data_['data']['poolDayDatas']
    
    print(' ')
    print('\n Queried PoolDayDatas, total of {} datapoints'.format(str(len(poolDayDatas_array_))))
#     print('example:')
#     print(poolDayDatas_array_[0])
    
    # array to dataframe
    df_ = pd.json_normalize(poolDayDatas_array_)
    df_ = df_.drop_duplicates(subset=['date'])  # drop_duplicates returns a new frame
    
    return df_
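
The query loop in `get_poolDayDatas` only requests full pages, so `remainder_` is computed but never used and `num_datapoints` has to be a multiple of `max_request_`. A possible way to lift that restriction is to precompute the `(first, skip)` pairs. The helper name `page_plan` is hypothetical.

```python
def page_plan(num_datapoints, max_request=1000):
    """Return the (first, skip) pairs needed to fetch num_datapoints rows."""
    pages = []
    skip = 0
    while skip < num_datapoints:
        # the last page may be shorter than max_request
        first = min(max_request, num_datapoints - skip)
        pages.append((first, skip))
        skip += first
    return pages


# Example: 2500 rows -> two full pages and one page of 500
example_pages = page_plan(2500)
```

Each pair can then be passed to `query_base_.format(first, skip, pool_id)` in place of the fixed `q_first_` and `q_next_` values.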

In [12]:
def get_poolHourDatas(pool_id, num_datapoints=3000):
    # input: pool_id
    # num_datapoints (must be multiple of max_request_)
    
    max_request_ = 1000
    quotient_ = math.floor(num_datapoints/max_request_)
    remainder_ = num_datapoints%max_request_
            
    query_base_ = '''
    {{
      poolHourDatas(first:{},
      skip: {},
        where:{{ pool: "{}" }},
      orderBy:periodStartUnix,
      orderDirection: desc) 
      {{
        periodStartUnix
        pool{{
            token0{{
                symbol
            }}
            token1{{
                symbol
            }}
        }}
        liquidity
        sqrtPrice
        token0Price
        token1Price
        tick
        feeGrowthGlobal0X128
        feeGrowthGlobal1X128
        tvlUSD
        volumeToken0
        volumeToken1
        volumeUSD
        feesUSD
        txCount
        open
        high
        low
        close
      }}
    }}'''
    
    poolDayDatas_array_ = []
    
    # query loop
    for i in range(quotient_):
        q_first_ = max_request_
        q_next_ = i*max_request_
        query_ = query_base_.format(q_first_, q_next_, pool_id)
        query_result_ = run_query(query_)
        json_data_ = json.loads(query_result_.text)
#         print(json_data_)
        try:
            poolDayDatas_array_ += json_data_['data']['poolHourDatas']
        except Exception:
#             print('.. Pass')
            pass
    
    print(' ')
    print('\n Queried poolHourDatas, total of {} datapoints'.format(str(len(poolDayDatas_array_))))
#     print('example:')
#     print(poolDayDatas_array_[0])
    
    # array to dataframe
    df_ = pd.json_normalize(poolDayDatas_array_)
#     df_.drop_duplicates(subset=['periodStartUnix']) # TODO: BUGGGG ??
    
    return df_

In [6]:
def get_swaps(pool_id, time_start='1627369200', time_end='1623772800', num_datapoints=20000):
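    # input: pool_id
    # time_start: latest timestamp (exclusive upper bound, since results are ordered desc)
    # time_end: earliest timestamp (exclusive lower bound)
    # num_datapoints (must be multiple of max_request_)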
    
    max_request_ = 1000
    quotient_ = math.floor(num_datapoints/max_request_)
    remainder_ = num_datapoints%max_request_
           
    ## TODO: BUGG
    query_base_ = '''
    {{
      swaps(first:{}, skip: {},
            where:{{ pool: "{}",
            timestamp_lt: "{}",
            timestamp_gt: "{}"}},
          orderBy:timestamp,
          orderDirection: desc){{
        transaction {{
          blockNumber
          timestamp
          gasUsed
          gasPrice
        }}
        id
        timestamp
        tick
        amount0
        amount1
        amountUSD
        sqrtPriceX96
      }}
    }}'''
    
    swap_arrays_ = []
    
    # query loop
    for i in range(quotient_):
        q_first_ = max_request_
        q_next_ = i*max_request_
        query_ = query_base_.format(q_first_, q_next_, pool_id, time_start, time_end)
        query_result_ = run_query(query_)
        json_data_ = json.loads(query_result_.text)
#         print(query_)
        
        try:
            swap_arrays_ += json_data_['data']['swaps']
        except Exception:
#             print('.. Pass')
            pass
        
    print(' ')
    print('\n Queried Swaps, total of {} datapoints'.format(str(len(swap_arrays_))))
#     print('example:')
#     print(swap_arrays_[0])
    
    # array to dataframe
    df_ = pd.json_normalize(swap_arrays_)
#     df_.drop_duplicates(subset=['id']) 
    
    return df_
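
The `try / except: pass` blocks in `get_poolHourDatas` and `get_swaps` hide the reason a page is missing. A GraphQL server can answer with HTTP 200 and a top-level `errors` list instead of `data`, so a small unpacking helper makes those failures visible. This is a sketch; the name `extract_rows` is hypothetical.

```python
def extract_rows(payload, key):
    """Return payload['data'][key], raising if the response carries errors."""
    if "errors" in payload:
        messages = [str(e.get("message", e)) for e in payload["errors"]]
        raise RuntimeError("GraphQL query failed: " + "; ".join(messages))
    data = payload.get("data") or {}
    if key not in data:
        raise KeyError("response has no '{}' field".format(key))
    return data[key]


# Example with an in-memory payload shaped like a subgraph response
example_rows = extract_rows({"data": {"swaps": [{"id": "0x1"}]}}, "swaps")
```

In the loops above it would replace the direct `json_data_['data'][...]` lookups, for example `swap_arrays_ += extract_rows(json_data_, 'swaps')`.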

### Test running

In [10]:
# Get token_id > Get pool_id > Get PoolDayDatas > Get swap data > Merge Swap data (VolumeUSD, txCount - for checking)

# Indicate Tokens and FeeTier
token0_id = get_token_id('USDC')
token1_id = get_token_id('WETH')
feeTier = '3000'
pool_id = get_pool_id(token0_id, token1_id, feeTier)

 
get_token_id: USDC
{'data': {'tokens': [{'id': '0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48', 'name': 'USD Coin', 'symbol': 'USDC'}]}}
 
get_token_id: WETH
{'data': {'tokens': [{'id': '0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2', 'name': 'Wrapped Ether', 'symbol': 'WETH'}]}}

 get_pool_id for feeTier: 3000
{'data': {'pools': [{'feeTier': '3000', 'id': '0x8ad599c3a0ff1de082011efddc58f1908eb6e6d8', 'token0': {'symbol': 'USDC'}, 'token1': {'symbol': 'WETH'}}]}}


In [13]:
# Get poolHourDatas
df_poolHourDatas = get_poolHourDatas(pool_id, num_datapoints=10000)

# Get Swap Datas within the poolHourDatas timeframe
time_start = df_poolHourDatas['periodStartUnix'][0]
time_end = df_poolHourDatas['periodStartUnix'][df_poolHourDatas.index[-1]]

# max swaps we can get seems to be 6000 (6 pages of 1000, i.e. skip 0..5000;
# larger skip values appear to be rejected), so we might have to send separate requests for more
# - saving latest timestamp + id to use for next iterations
# combined txCount ~ 147079
df_swaps = get_swaps(pool_id, time_start, time_end, num_datapoints=10000) 

 

 Queried poolHourDatas, total of 1997 datapoints
 

 Queried Swaps, total of 6000 datapoints


In [14]:
# keep untouched copies so the merge cell below can be re-run safely
df_swaps_ori = df_swaps.copy()
df_poolHourDatas_ori = df_poolHourDatas.copy()

In [18]:
df_swaps = df_swaps_ori.copy()
df_poolHourDatas = df_poolHourDatas_ori.copy()

### Merge Data

In [19]:
# Merge Data
# def merge_poolHourDatas_swaps():

# Get period_list = {[periodStartUnix, periodEndUnix]}
df_poolHourDatas = df_poolHourDatas.assign(periodEndUnix=df_poolHourDatas['periodStartUnix'].shift(1))
df_poolHourDatas = df_poolHourDatas.dropna(subset=['periodEndUnix'])  # drop the latest periodStartUnix since most likely incomplete hour
df_poolHourDatas['periodEndUnix'] = df_poolHourDatas['periodEndUnix'].astype(int)
period_list = df_poolHourDatas[['periodStartUnix','periodEndUnix']].values


# Match timestamp with period, and assign to df_swaps['periodStartUnix']
df_swaps['periodStartUnix'] = np.nan
for index, row in df_swaps.iterrows():
    i = 0
    timestamp = int(row['timestamp'])
    
    for period in period_list:
        periodStartUnix_ = int(period[0])
        periodEndUnix_ = int(period[1])
        
        if timestamp >= periodStartUnix_ and timestamp < periodEndUnix_:
            df_swaps.loc[index, 'periodStartUnix'] = periodStartUnix_
            
            # sanity check: each timestamp should match at most one period
            i+=1
            if i>1: 
                raise Exception('timestamp matched more than once! something is wrong!')  

# Drop rows that do not match any periodStartUnix
print('dropping {} nan'.format(df_swaps['periodStartUnix'].isna().sum()))
df_swaps = df_swaps.dropna(subset=['periodStartUnix']).copy()
df_swaps['periodStartUnix'] = df_swaps['periodStartUnix'].astype(int)

# Create swaps_txCount to compare with txCount in poolHourDatas to check integrity
df_swaps['swaps_txCount'] = 1


# Groupby-Sum based on periodStartUnix, specify columns to sum at GROUPBY_COLS
GROUPBY_COLS = ['periodStartUnix', 'amount0', 'amount1', 'amountUSD', 'swaps_txCount']
df_swaps = df_swaps[GROUPBY_COLS].copy()  # copy to avoid SettingWithCopyWarning below
for f in GROUPBY_COLS:
    df_swaps[f] = df_swaps[f].astype(float)
df_swaps['periodStartUnix'] = df_swaps['periodStartUnix'].astype(int)
df_swaps = df_swaps.groupby(by=['periodStartUnix']).sum()


# Merge df_swaps (groupby) with df_poolHourDatas
df_merged = df_poolHourDatas.merge(df_swaps, how='left', on='periodStartUnix')

# Check txCount vs swaps_txCount
df_merged[['periodStartUnix', 'periodEndUnix', 'txCount', 'swaps_txCount', 'amount0', 'amount1', 'amountUSD']]




dropping 0 nan


| | periodStartUnix | periodEndUnix | txCount | swaps_txCount | amount0 | amount1 | amountUSD |
|---|---|---|---|---|---|---|---|
| 0 | 1627390800 | 1627394400 | 69 | 48.0 | -5.803405e+05 | 261.057928 | 7.326168e+06 |
| 1 | 1627387200 | 1627390800 | 65 | 49.0 | 7.407794e+06 | -3221.993390 | 1.027183e+07 |
| 2 | 1627383600 | 1627387200 | 49 | 38.0 | 5.530816e+06 | -2443.345895 | 5.525045e+06 |
| 3 | 1627380000 | 1627383600 | 32 | 17.0 | 1.801318e+06 | -802.095562 | 1.798336e+06 |
| 4 | 1627376400 | 1627380000 | 68 | 38.0 | 4.123740e+06 | -1849.260941 | 4.117119e+06 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1991 | 1620205200 | 1620212400 | 1 | NaN | NaN | NaN | NaN |
| 1992 | 1620201600 | 1620205200 | 1 | NaN | NaN | NaN | NaN |
| 1993 | 1620180000 | 1620201600 | 3 | NaN | NaN | NaN | NaN |
| 1994 | 1620176400 | 1620180000 | 1 | NaN | NaN | NaN | NaN |
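
The nested loop in the merge cell compares every swap against every period, which gets slow as more swaps are fetched. Since the hourly periods in the output start on hour boundaries, a vectorized alternative is to floor each swap timestamp to the hour. This is a sketch under that assumption; unlike the loop, it does not consult `periodEndUnix`, so swaps are kept even if no matching `poolHourDatas` row exists (the later left merge simply ignores them). The name `assign_hour_period` is hypothetical.

```python
import pandas as pd


def assign_hour_period(df_swaps, seconds_per_period=3600):
    """Add a periodStartUnix column by flooring timestamp to the period start."""
    out = df_swaps.copy()
    # the subgraph returns timestamps as strings, so convert first
    ts = pd.to_numeric(out["timestamp"]).astype("int64")
    out["periodStartUnix"] = ts - (ts % seconds_per_period)
    return out


# Example with three swaps around an hour boundary
example = assign_hour_period(
    pd.DataFrame({"timestamp": ["1627390800", "1627394399", "1627394400"]})
)
```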


In [20]:
df_poolHourDatas_ori['txCount'] = df_poolHourDatas_ori['txCount'].astype(int)
print(df_poolHourDatas_ori['txCount'].sum())
print(df_merged['swaps_txCount'].sum())
print('we need to query more data')

147090
6000.0
we need to query more data


In [23]:
df_merged[['periodStartUnix', 'periodEndUnix', 'txCount', 'swaps_txCount', 'amount0', 'amount1', 'amountUSD']].head(10)

| | periodStartUnix | periodEndUnix | txCount | swaps_txCount | amount0 | amount1 | amountUSD |
|---|---|---|---|---|---|---|---|
| 0 | 1627390800 | 1627394400 | 69 | 48.0 | -580340.5 | 261.057928 | 7326168.0 |
| 1 | 1627387200 | 1627390800 | 65 | 49.0 | 7407794.0 | -3221.99339 | 10271830.0 |
| 2 | 1627383600 | 1627387200 | 49 | 38.0 | 5530816.0 | -2443.345895 | 5525045.0 |
| 3 | 1627380000 | 1627383600 | 32 | 17.0 | 1801318.0 | -802.095562 | 1798336.0 |
| 4 | 1627376400 | 1627380000 | 68 | 38.0 | 4123740.0 | -1849.260941 | 4117119.0 |
| 5 | 1627372800 | 1627376400 | 27 | 19.0 | 2855018.0 | -1290.326251 | 2850297.0 |
| 6 | 1627369200 | 1627372800 | 27 | 12.0 | 781833.0 | -353.118292 | 2075686.0 |
| 7 | 1627365600 | 1627369200 | 48 | 20.0 | -631750.2 | 291.654669 | 2914841.0 |
| 8 | 1627362000 | 1627365600 | 39 | 17.0 | 2863650.0 | -1302.818915 | 2859511.0 |
| 9 | 1627358400 | 1627362000 | 55 | 45.0 | 1807922.0 | -818.685412 | 7190849.0 |


##### Query tokens with symbol
{
  tokens(first:10, where:{symbol: "WETH"}) {
    id
    symbol
    name
  }
}

##### Query pools with token0 id, token1 id and feeTier
{
  pools(first:10, 
    where:{token0:"0x6b175474e89094c44da98b954eedeac495271d0f",
    token1: "0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2",
    feeTier:"3000" }) 
  {
    id
    token0{symbol}
    token1{symbol}
    feeTier
  }
}

##### Query poolDayDatas with pool id, order by date - needs to be iterative (max 1000 rows per query)
{
  poolDayDatas(first:1000,
  skip: 1000,
    where:{pool:"0xa80964c5bbd1a0e95777094420555fead1a26c1e"},
  orderBy:date,
  orderDirection: desc) 
  {
    date
    tick
    liquidity
    volumeUSD
  }
}





##### Query examples on filtering

{
  pools
  (first: 10, 
    where: {liquidity_gt: "1000000", 
      feeTier: "10000"}
    orderBy: liquidity, 
    orderDirection: desc)
  {
    token0{symbol}
    token1{symbol}
    liquidity
  }
}

(token0) DAI id = 0x6b175474e89094c44da98b954eedeac495271d0f
(token1) WETH id = 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2
(feeTier) "3000"

(DAI-WETH 500) Pool id = 0x60594a405d53811d3bc4766596efd80fd545a270
(DAI-WETH 3000) Pool id = 0xc2e9f25be6257c210d7adf0d4cd6e3e881ba25f8
(DAI-WETH 10000) Pool id = 0xa80964c5bbd1a0e95777094420555fead1a26c1e


