### Write well, Im going to be using this for a long time

#### Data we need:
user input:
- investment amount, trading pair -> amt0, amt1
- start time and end time
- time period that you assume fixed swap price, swap volumes or liquidity positions
- upper and lower price
- pool_fee_rate

data from api:
- cprice of each time period (tick, 1.0001 ** i)
- L_pool at each time period at specific pool_fee_rate (liquidity?, or simply total X tokens + Y tokens in USD)
- Swap volume at each time period at specific pool_fee_rate (volumeUSD?)
- Gas cost to mint at each time period

--------------------------------------------------------------------------------------------------------------

#### Fees
The liquidity amount is calculated from the following numbers that describe a position: 
- amount of token 0 (amt0), amount of token 1 (amt1), 
- price (as x token 1's per token 0) at the upper limit of the position (upper), 
- price at the lower limit of the position (lower) 
- and the current swap price (cprice). 

Then liquidity (L_you?) for a position is calculated as follows:

Case 1: cprice <= lower
- liquidity = amt0 * (sqrt(upper) * sqrt(lower)) / (sqrt(upper) - sqrt(lower))

Case 2: lower < cprice <= upper
- liquidity is the min of the following two calculations:
- amt0 * (sqrt(upper) * sqrt(cprice)) / (sqrt(upper) - sqrt(cprice))
- amt1 / (sqrt(cprice) - sqrt(lower))

Case 3: upper < cprice
- liquidity = amt1 / (sqrt(upper) - sqrt(lower))

Resources
- liquidity can use this code: https://github.com/JNP777/UNI_V3-Liquitidy-amounts-calcs/blob/main/UNI_v3_funcs.py

Fee is calculated by:
- Fee income = (L_you/L_pool) * swap volume under fixed time period * pool_fee_rate/100
- L_you also should be for that specific ticks only, not the whole amount you provided for. Its not linear, its calculated from the 3 cases above
- Does Case1 and Case3's fee be 0 regardless?


reference: https://uniswapv3.flipsidecrypto.com/
- check my numbers with the reference from the website

----------------------------------------------------------------------------------------

#### Impermanent Loss (is this v2 or v3)
- IL (in %) = (2 sqrt(p) / (p+1) ) - 1
- where p = r_t1/r_t2
- and r_t is a price in b at time 1
- Net $ loss = total asset value in dollars at stake time * IL (in%)

reference: https://chainbulletin.com/impermanent-loss-explained-with-examples-math/#:~:text=Impermanent%20loss%20is%20the%20difference,is%20equal%20to%20200%20DAI

--------------------------------------------------------------------------------------------------------------

#### Other cost

Gas_costs_mint = 500000 gwei * gas_price at that time (??? double check actual cost)

### PNL/APR
-PNL = Acumulated Fees_accrued (dolar value at generation) - IL - Gas_costs_mint

-APR = PNL/Initial_capital*(age of the position / year time)

--------------------------------------------------------------------------------------------------------------

## Dependencies

In [1]:
import requests
import json
import pandas as pd
import math
import numpy as np

## Main Functions

In [2]:
# function to use requests.post to make an API call to the subgraph url
def run_query(q):

    # endpoint where you are making the request
    request = requests.post('https://api.thegraph.com/subgraphs/name/uniswap/uniswap-v3'
                            '',json={'query': q})
    if request.status_code == 200:
        return request
    else:
        raise Exception('Query failed. return code is {}.      {}'.format(request.status_code, query))
        
        
# turns requests into dataframe        
# def results_to_df(query_result):
#     json_data_ = json.loads(query_result.text)
#     df_data_ = json_data_['data']['pools']
#     df_ = pd.DataFrame(df_data_)

#     return df_

In [3]:
def get_token_id(symbol):
    
    # default should be first:10, in case there are more than 1 coins with the same symbol
    query_ = """ 
    {{
      tokens(first:1, where:{{symbol: "{}"}}) {{
        id
        symbol
        name
      }}
    }}""".format(symbol)
    
    # run query
    query_result_ = run_query(query_)
    json_data_ = json.loads(query_result_.text)
    
    print(' ')
    print('get_token_id: {}'.format(symbol))
    print(json_data_)
    
    # make sure only return 1 object
    if len(json_data_['data']['tokens']) == 1:
        token_id_ = json_data_['data']['tokens'][0]['id']
        return token_id_
        
    else:
        print(json_data_['data'])
        raise Exception('Returned number of token_ids != 1')

        
def get_pool_id(token0_id, token1_id, feeTier):
    query_ = """
    {{
      pools(first: 10, 
        where:{{token0: "{}",
        token1: "{}",
        feeTier:"{}" }}) 
      {{
        id
        token0{{symbol}}
        token1{{symbol}}
        feeTier
      }}
    }}""".format(token0_id, token1_id, feeTier)
    
    
    # run query
    query_result_ = run_query(query_)
    json_data_ = json.loads(query_result_.text)
    
    print('\n get_pool_id for feeTier: {}'.format(feeTier))
    print(json_data_)
    
    # make sure there is only 1 pool that matches exactly
    if len(json_data_['data']['pools']) == 1:
        pool_id_ = json_data_['data']['pools'][0]['id']
        return pool_id_
    else:
        print(json_data_['data'])
        raise Exception('Returned number of token_ids != 1')

        
    return json_data_

In [4]:
def get_poolDayDatas(pool_id, num_datapoints=1000):
    # input: pool_id
    # num_datapoints (must be multiple of max_request_)
    
    max_request_ = 1000
    quotient_ = math.floor(num_datapoints/max_request_)
#     remainder_ = num_datapoints%max_request_
            
    query_base_ = '''
    {{
      poolDayDatas(first:{},
      skip: {},
        where:{{ pool: "{}" }},
      orderBy:date,
      orderDirection: desc) 
      {{
        date
        tick
        liquidity
        volumeUSD
        pool{{
            token0{{
                symbol
            }}
            token1{{
                symbol
            }}
        }}
      }}
    }}'''
    
    poolDayDatas_array_ = []
    
    # query loop
    for i in range(quotient_):
        q_first_ = max_request_
        q_next_ = i*max_request_
        query_ = query_base_.format(q_first_, q_next_, pool_id)
        query_result_ = run_query(query_)
        json_data_ = json.loads(query_result_.text)
#         print(json_data_)
        poolDayDatas_array_ += json_data_['data']['poolDayDatas']
    
    print(' ')
    print('\n Queried PoolDayDatas, total of {} datapoints'.format(str(len(poolDayDatas_array_))))
#     print('example:')
#     print(poolDayDatas_array_[0])
    
    # array to dataframe
    df_ = pd.json_normalize(poolDayDatas_array_)
    df_.drop_duplicates(subset=['date']) 
    
    return df_

In [5]:
def get_poolHourDatas(pool_id, num_datapoints=3000):
    # input: pool_id
    # num_datapoints (must be multiple of max_request_)
    
    max_request_ = 1000
    quotient_ = math.floor(num_datapoints/max_request_)
#     remainder_ = num_datapoints%max_request_
            
    query_base_ = '''
    {{
      poolHourDatas(first:{},
      skip: {},
        where:{{ pool: "{}" }},
      orderBy:periodStartUnix,
      orderDirection: desc) 
      {{
        periodStartUnix
        pool{{
            token0{{
                symbol
            }}
            token1{{
                symbol
            }}
        }}
        liquidity
        sqrtPrice
        token0Price
        token1Price
        tick
        feeGrowthGlobal0X128
        feeGrowthGlobal1X128
        tvlUSD
        volumeToken0
        volumeToken1
        volumeUSD
        feesUSD
        txCount
        open
        high
        low
        close
      }}
    }}'''
    
    poolDayDatas_array_ = []
    
    # query loop
    for i in range(quotient_):
        q_first_ = max_request_
        q_next_ = i*max_request_
        query_ = query_base_.format(q_first_, q_next_, pool_id)
        query_result_ = run_query(query_)
        json_data_ = json.loads(query_result_.text)
#         print(json_data_)
        try:
            poolDayDatas_array_ += json_data_['data']['poolHourDatas']
        except Exception:
#             print('.. Pass')
            pass
    
    print(' ')
    print('\n Queried poolHourDatas, total of {} datapoints'.format(str(len(poolDayDatas_array_))))
#     print('example:')
#     print(poolDayDatas_array_[0])
    
    # array to dataframe
    df_ = pd.json_normalize(poolDayDatas_array_)
#     df_.drop_duplicates(subset=['periodStartUnix']) # TODO: BUGGGG ??
    
    return df_

In [6]:
def get_swaps(pool_id, time_start='1627369200', time_end='1623772800', num_datapoints=6000):
    # input: pool_id
    # num_datapoints (must be multiple of max_request_)
    
    max_request_ = 1000
    quotient_ = math.floor(num_datapoints/max_request_)
#     remainder_ = num_datapoints%max_request_
           
    query_base_ = '''
    {{
      swaps(first:{}, skip: {},
            where:{{ pool: "{}",
            timestamp_lt: "{}",
            timestamp_gt: "{}"}},
          orderBy:timestamp,
          orderDirection: desc){{
        transaction {{
          blockNumber
          timestamp
          gasUsed
          gasPrice
        }}
        id
        timestamp
        tick
        amount0
        amount1
        amountUSD
        sqrtPriceX96
      }}
    }}'''
    
    swap_arrays_ = []
    
    # query loop
    for i in range(quotient_):
        q_first_ = max_request_
        q_next_ = i*max_request_
        query_ = query_base_.format(q_first_, q_next_, pool_id, time_start, time_end)
        query_result_ = run_query(query_)
        json_data_ = json.loads(query_result_.text)
        
        try:
            swap_arrays_ += json_data_['data']['swaps']
            
        except Exception:
            pass
        
    print('Queried Swaps, total of {} datapoints'.format(str(len(swap_arrays_))))
    print(' ')
    df_ = pd.json_normalize(swap_arrays_)
    
    # next time start, if at the edge, then we keep looping and skipping
    if len(swap_arrays_) != 0:
        # last element of timestamp, add 1 so next iterations still includes it
        next_time_start_ = str( int(df_['timestamp'][df_.index[-1]]) + 1 )
    else:
        next_time_start_ = time_start
            
            
    return df_, next_time_start_


# get_swap can only request 6000 datapoints at the time. this is to loop get_swap to get more data
def get_swaps_loop(pool_id, time_start='1627369200', time_end='1623772800'): # ,  num_datapoints= 150000
    
    print('time_start: {}, time_end: {}'.format(time_start, time_end))
    
    max_num_query = 6000
#     num_iterations = math.floor(num_datapoints/max_num_query) + 3 # add 3 just in case
    
    next_time_start_ = time_start
    count = 0 # counting number of times that data returns is less than maximum, meaning reaching the end
#     for i in range(num_iterations):
    first_time = True
    while(count<5):
        print('next_time_start_: {}'.format(next_time_start_))
        df_, next_time_start_ = get_swaps(pool_id, next_time_start_, time_end, num_datapoints=max_num_query)
        
        if first_time == True:
            df_all_ = df_
        else:
            df_all_ = df_all_.append(df_)
            first_time = False
            
        if df_.shape[0] < 6000:
            count += 1
            
    # drop duplicates, reset index
    df_all_ = df_all_.drop_duplicates(subset=['id'])
    df_all_ = df_all_.reset_index(drop=True)
    
    return df_all_

In [7]:
def merge_poolHourData_swaps_all(df_poolHourDatas, df_swaps_all):
    
    # Match timestamp with hour period, and assign to df_swaps_all['periodStartUnix']
    def compute_periodStartUnix(row_):
        return row_['timestamp'] - (row_['timestamp'] % 3600)
    def compute_periodEndUnix(row_):
        return row_['periodStartUnix'] + 3600

    df_swaps_all['timestamp'] = df_swaps_all['timestamp'].astype(int)    
    df_swaps_all['periodStartUnix'] = df_swaps_all.apply(lambda row: compute_periodStartUnix(row), axis=1)                       
    df_swaps_all['periodEndUnix'] = df_swaps_all.apply(lambda row: compute_periodEndUnix(row), axis=1)                       

    df_swaps_all['periodStartUnix'] = df_swaps_all['periodStartUnix'].astype(int)        
    df_swaps_all['periodEndUnix'] = df_swaps_all['periodEndUnix'].astype(int)
    
    # Create swaps_txCount to compare with txCount in poolHourDatas to check integrity
    df_swaps_all['swaps_txCount'] = 1

    # Groupby->Sum based on periodStartUnix, specify columns to sum at GROUPBY_COLS
    GROUPBY_COLS = ['periodStartUnix','amount0', 'amount1', 'amountUSD', 'swaps_txCount']
    df_swaps_to_merge = df_swaps_all[GROUPBY_COLS]
    df_swaps_to_merge = df_swaps_to_merge.astype({'periodStartUnix': 'int',
                                                 'amount0':'float','amount1':'float', 
                                                  'amountUSD':'float', 'swaps_txCount':'int'})
    df_swaps_to_merge = df_swaps_to_merge.groupby(by=['periodStartUnix']).sum()

    # Merge df_swaps_all (groupby) with df_poolHourDatas
    df_poolHourDatas['periodStartUnix'] = df_poolHourDatas['periodStartUnix'].astype(int)
    df_merged = df_poolHourDatas.merge(df_swaps_to_merge, how='left', on='periodStartUnix')
    df_merged['txCount'] = df_merged['txCount'].astype(int)
    
    return df_merged

In [8]:
def get_data(token0, token1, feeTier):
    
    # Indicate Tokens and FeeTier
    token0_id = get_token_id(TOKEN0)
    token1_id = get_token_id(TOKEN1)
    pool_id = get_pool_id(token0_id, token1_id, feeTier)

    # Get poolHourDatas
    df_poolHourDatas = get_poolHourDatas(pool_id, num_datapoints=10000)

    # Get Swap Datas within the poolHourDatas timeframe
    time_start = df_poolHourDatas['periodStartUnix'][0]
    time_end = df_poolHourDatas['periodStartUnix'][df_poolHourDatas.index[-1]]
    
    # Get Swaps data
    df_swaps_all = get_swaps_loop(pool_id, time_start, time_end) # ,  num_datapoints= 150000
    
    # Saving settings
    SETTINGS = '{}-{}-{}-timestamp-{}-{}.csv'.format(TOKEN0, TOKEN1, feeTier, time_start, time_end)
    df_swaps_all.to_csv('../data/df_swaps_all_'+SETTINGS)
    df_poolHourDatas.to_csv('../data/df_poolHourDatas_'+SETTINGS)
    
    # Merge data
    df_merged = merge_poolHourData_swaps_all(df_poolHourDatas, df_swaps_all)
    
    df_merged.to_csv('../data/df_merged_tmp_'+SETTINGS)
    
    return df_merged

### Test running

In [9]:
TOKEN0 = 'DAI'
TOKEN1 = 'WETH'
feeTier = '10000' 

df_merged = get_data(TOKEN0, TOKEN1, feeTier)

 
get_token_id: USDC
{'data': {'tokens': [{'id': '0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48', 'name': 'USD Coin', 'symbol': 'USDC'}]}}
 
get_token_id: WETH
{'data': {'tokens': [{'id': '0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2', 'name': 'Wrapped Ether', 'symbol': 'WETH'}]}}

 get_pool_id for feeTier: 500
{'data': {'pools': [{'feeTier': '500', 'id': '0x88e6a0c2ddd26feeb64f039a2c41296fcb3f5640', 'token0': {'symbol': 'USDC'}, 'token1': {'symbol': 'WETH'}}]}}
 

 Queried poolHourDatas, total of 2035 datapoints
time_start: 1627570800, time_end: 1620248400
next_time_start_: 1627570800
Queried Swaps, total of 6000 datapoints
next_time_start_: 1627489674
Queried Swaps, total of 6000 datapoints
next_time_start_: 1627409524
Queried Swaps, total of 6000 datapoints
next_time_start_: 1627341227
Queried Swaps, total of 6000 datapoints
next_time_start_: 1627280471
Queried Swaps, total of 6000 datapoints
next_time_start_: 1627208696
Queried Swaps, total of 6000 datapoints
next_time_start_: 162712348

In [12]:
watch_list = ['periodStartUnix',  
              'txCount', 'swaps_txCount', # check data integrity
              'amount0', 'amount1', 'amountUSD', # swaps data
              'tick', 'liquidity', 'sqrtPrice', 'tvlUSD', # pool data at that hour
              'pool.token0.symbol', 'pool.token1.symbol', # token data
              'token0Price', 'token1Price'
             ]
df_merged[watch_list]

Unnamed: 0,periodStartUnix,txCount,swaps_txCount,amount0,amount1,amountUSD,tick,liquidity,sqrtPrice,tvlUSD,pool.token0.symbol,pool.token1.symbol,token0Price,token1Price
0,1627570800,182,,,,,198830,14217435406485926211,1645151892553601129443369411738074,126891839.1044136223362627965871671,USDC,WETH,2319.24786070317883446109972685119,0.000431174268582404719937353948296732
1,1627567200,315,,,,,198820,14025917622023942223,1644316345397116045287347546862068,126769503.069460643317290985266683,USDC,WETH,2321.605476754829289492076433957798,0.0004307364063414483195519970590016063
2,1627563600,242,,,,,198838,14272132584563510513,1645853950731195101714475688302569,126830071.2325492870556803075613774,USDC,WETH,2317.269678311111008345167426344003,0.0004315423488943363391698603040640706
3,1627560000,249,,,,,198933,14399659125348661471,1653636347772442269891409394750425,126342457.6095666842822957478827964,USDC,WETH,2295.509783132289853921725231752128,0.0004356330813086193513775200040370431
4,1627556400,278,,,,,198936,14399659125348661471,1653889462485417757326257651829770,126358883.6523205036923223774094965,USDC,WETH,2294.807217631692948746876180483792,0.0004357664523262345155702357535437782
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2030,1620262800,33,,,,,194704,9911500681148859,1338533781234103800070785694341665,75303.40204770032678577648543431339,USDC,WETH,3503.48687747330231081310091037641,0.0002854299259488578802155431309739119
2031,1620259200,10,,,,,194771,8819118420039392,1343004819178433089573866726711843,55592.55352477641733443242345374899,USDC,WETH,3480.198578839455148182227712773739,0.0002873399253939902679742194998086739
2032,1620255600,8,,,,,194725,3990726651924459,1339927351757144388712475515525518,15990.5025439940886751925509600602,USDC,WETH,3496.203173435439004591148467358266,0.0002860245673358222042637210492706507
2033,1620252000,4,1.0,-119.744094,0.035,119.288101,194996,34507310469936,1358206768703179146794161129278934,650.8030174301385212244959409688095,USDC,WETH,3402.729189121879123675172116361073,0.0002938817473917351905267260503650961


In [14]:
# Eye check:
print(df_poolHourDatas.shape)
print(df_swaps_all.shape)
print(df_merged.shape)

print(df_merged['txCount'].sum())
print(df_merged['swaps_txCount'].sum())

(2035, 24)
396082
1.0


In [None]:
df_merged.head()

In [21]:
df_poolHourDatas = pd.read_csv('../data/df_poolHourDatas_USDC-WETH-500-timestamp-1627570800-1620248400.csv')
df_swaps_all = pd.read_csv('../data/df_swaps_all_USDC-WETH-500-timestamp-1627570800-1620248400.csv')

In [25]:
df_swaps_all

Unnamed: 0.1,Unnamed: 0,amount0,amount1,amountUSD,id,sqrtPriceX96,tick,timestamp,transaction.blockNumber,transaction.gasPrice,transaction.gasUsed,transaction.timestamp
0,0,-119.744094,0.035,119.288101,0x0804ff007263a885191f23c808a9346e62d502a1fc23...,1358206768703179146794161129278934,194996,1620252901,12376891,72600000000,307677,1620252901


In [24]:
df_poolHourDatas.shape

(2035, 21)

##### Query tokens with symbol
{
  tokens(first:10, where:{symbol: "WETH"}) {
    id
    symbol
    name
  }
}

##### Query pools with token0 id,  token1 ids and feeTiers
{
  pools(first:10, 
    where:{token0:"0x6b175474e89094c44da98b954eedeac495271d0f",
    token1: "0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2",
    feeTier:"3000" }) 
  {
    id
    token0{symbol}
    token1{symbol}
    feeTier
  }
}

##### Query poolDayDatas with pool id, order by date - Needs to be iterative (max 1000 query)
{
  poolDayDatas(first:1000,
  next: 1000,
    where:{pool:"0xa80964c5bbd1a0e95777094420555fead1a26c1e"},
  orderBy:date,
  orderDirection: desc) 
  {
    date
    tick
    liquidity
    volumeUSD
  }
}





##### Query examples on filtering

{
  pools
  (first: 10, 
    where: {liquidity_gt: "1000000", 
      feeTier: "10000"}
    orderBy: liquidity, 
    orderDirection: desc)
  {
    token0{symbol}
    token1{symbol}
    liquidity
  }


(token0) DAI id = 0x6b175474e89094c44da98b954eedeac495271d0f
(token1) WETH id = 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2
(feeTier) "3000"

(DAI-WETH 500) Pool id = 0x60594a405d53811d3bc4766596efd80fd545a270
(DAI-WETH 3000) Pool id = 0xc2e9f25be6257c210d7adf0d4cd6e3e881ba25f8
(DAI-WETH 1000) Pool id = 0xa80964c5bbd1a0e95777094420555fead1a26c1e




In [None]:
pool_id