# Computing daily ONN index on Binance

## Getting all available symbols on Binance

In this section we fetch all assets available on Binance and write them to a file named `all-binance-tokens.csv`. Each line in this file contains the asset symbol twice, separated by a comma (e.g. `ETH,ETH`). The file is consumed later by the __[`bnbFetchCoinData.py`](https://github.com/srmq/bnbfetch-coin-data)__ script, which expects CSV lines in the format `asset_description,asset_symbol`; since we don't care about descriptions here, we simply repeat the symbol.

In [18]:
import httpx
import pandas as pd

In [2]:
binanceBaseURL = 'https://api.binance.com'

In [3]:
async def getExchangeInfo():
    spotExchangeInfoURL = binanceBaseURL + "/api/v3/exchangeInfo"

    async with httpx.AsyncClient() as client:
        r = await client.get(spotExchangeInfoURL)
    return r.json()

In [4]:
exchangeInfo = await getExchangeInfo()

In [5]:
#blacklist = None
#with open('blacklist.txt', 'r') as f:
#    blacklist = f.read().split(',')
# blacklist should be disabled for now, we need coin data for every coin

In [6]:
baseAssets = set()
#for coinSymbol in exchangeInfo['symbols']:
#    if not (coinSymbol['baseAsset'].lower() in blacklist):
#        baseAssets.add(coinSymbol['baseAsset'])

In [7]:
for coinSymbol in exchangeInfo['symbols']:
    baseAssets.add(coinSymbol['baseAsset'])

In [8]:
def notLeveraged(asset):
    return not (asset.endswith('UP') or asset.endswith('DOWN') or asset.endswith('BEAR') or asset.endswith('BULL'))

In [9]:
baseAssets = set(filter(notLeveraged, baseAssets))
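
The suffix test above is a heuristic: it drops leveraged tokens such as `BTCUP` or `ETHDOWN`, but it would equally drop any ordinary asset whose ticker happens to end in one of those suffixes. A minimal sketch of that behaviour follows; `SYRUP` is a made-up ticker used only for illustration, and `not_leveraged` is a compact restatement of `notLeveraged`, not a replacement for it.

```python
LEVERAGED_SUFFIXES = ('UP', 'DOWN', 'BEAR', 'BULL')

def not_leveraged(asset: str) -> bool:
    # str.endswith accepts a tuple, so one call covers all four suffixes
    return not asset.endswith(LEVERAGED_SUFFIXES)

# 'SYRUP' is a hypothetical ordinary ticker that merely ends in 'UP'
sample = {'BTC', 'BTCUP', 'ETHDOWN', 'EOSBULL', 'SYRUP'}
kept = set(filter(not_leveraged, sample))
print(sorted(kept))
```

If such false positives ever matter, an explicit list of leveraged tokens would likely be safer than suffix matching.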

In [10]:
baseAssets

{'1INCH',
 'AAVE',
 'ACA',
 'ACH',
 'ACM',
 'ADA',
 'ADX',
 'AE',
 'AERGO',
 'AGI',
 'AGIX',
 'AGLD',
 'AION',
 'AKRO',
 'ALCX',
 'ALGO',
 'ALICE',
 'ALPACA',
 'ALPHA',
 'ALPINE',
 'AMB',
 'AMP',
 'ANC',
 'ANKR',
 'ANT',
 'ANY',
 'APE',
 'API3',
 'APPC',
 'AR',
 'ARDR',
 'ARK',
 'ARN',
 'ARPA',
 'ASR',
 'AST',
 'ASTR',
 'ATA',
 'ATM',
 'ATOM',
 'AUCTION',
 'AUD',
 'AUDIO',
 'AUTO',
 'AVA',
 'AVAX',
 'AXS',
 'BADGER',
 'BAKE',
 'BAL',
 'BAND',
 'BAR',
 'BAT',
 'BCC',
 'BCD',
 'BCH',
 'BCHA',
 'BCHABC',
 'BCHSV',
 'BCN',
 'BCPT',
 'BDOT',
 'BEAM',
 'BEL',
 'BETA',
 'BETH',
 'BGBP',
 'BICO',
 'BIFI',
 'BKRW',
 'BLZ',
 'BNB',
 'BNT',
 'BNX',
 'BOND',
 'BOT',
 'BQX',
 'BRD',
 'BSW',
 'BTC',
 'BTCB',
 'BTCST',
 'BTG',
 'BTS',
 'BTT',
 'BTTC',
 'BURGER',
 'BUSD',
 'BZRX',
 'C98',
 'CAKE',
 'CDT',
 'CELO',
 'CELR',
 'CFX',
 'CHAT',
 'CHESS',
 'CHR',
 'CHZ',
 'CITY',
 'CKB',
 'CLOAK',
 'CLV',
 'CMT',
 'CND',
 'COCOS',
 'COMP',
 'COS',
 'COTI',
 'COVER',
 'CREAM',
 'CRV',
 'CTK',
 'CTSI',
 'CTXC

In [11]:
with open('all-binance-tokens.csv', 'w') as f:
    for asset in baseAssets:
        f.write(f"{asset},{asset}\n")

Now that we have our `all-binance-tokens.csv` file, we can download, for instance, the complete historical data for all assets on Binance by passing the file to the `bnbFetchCoinData.py` script mentioned above (see its repository for the command-line usage). This will create a `coindata` directory under the current one with everything we need.

## Importing Data to QuestDB

We will use __[QuestDB](https://questdb.io/)__ to make data manipulation easier and to keep memory requirements low (we won't need to load everything into RAM). First we create a table to hold the data, using the following statement:

`CREATE TABLE binance_daily(symb SYMBOL capacity 2048, open_time TIMESTAMP, high DOUBLE, low DOUBLE, close DOUBLE, volume DOUBLE, close_time TIMESTAMP, quote_asset_vol DOUBLE, num_trades LONG, takerbuy_baseasset_vol DOUBLE, takerbuy_quoteasset_vol DOUBLE), index(symb) timestamp(open_time) PARTITION BY YEAR;`

Next we import the data from the `coindata` subdirectory into that table.

In [1]:
import glob

allCSVFiles = glob.glob('./coindata/*.csv')

In [5]:
import httpx
import csv

host = 'http://localhost:9000'

async def questInsert(symbol: str, filename: str, lotSize=300):
    async def executeQuery(valStrings):
        sql_query = "INSERT INTO binance_daily VALUES"
        sql_query += valStrings[0]
        for i in range(1, len(valStrings)):
            sql_query += f",{valStrings[i]}"
        query_params = {'query': sql_query, 'fmt': 'json'}
        async with httpx.AsyncClient() as client:
            await client.get(host + '/exec', params=query_params)
        
    with open(filename, newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        valStrings = []
        for row in reader:
            if len(valStrings) == lotSize:
                await executeQuery(valStrings)
                valStrings = []
            # Binance times are epoch milliseconds; the integers sent here are microseconds
            val = f"('{symbol}', {int(row['Open time'])*1000}, {float(row['High'])}, {float(row['Low'])}, {float(row['Close'])}, {float(row['Volume'])}, {int(row['Close time'] + '999')}, {float(row['Quote asset volume'])}, {int(row['Number of trades'])}, {float(row['Taker buy base asset volume'])}, {float(row['Taker buy quote asset volume'])})"
            valStrings.append(val)
        if len(valStrings) > 0:
            await executeQuery(valStrings)
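
The insert above relies on a unit conversion that is easy to miss: the CSV files hold epoch milliseconds, while the integers written to the `TIMESTAMP` columns are, as far as we understand QuestDB, read as epoch microseconds. The sketch below isolates that step; the helper names are ours and the sample values are simply the first and last millisecond of 2020-12-01 UTC.

```python
from datetime import datetime, timezone

def open_time_us(ms_text: str) -> int:
    # milliseconds -> microseconds
    return int(ms_text) * 1000

def close_time_us(ms_text: str) -> int:
    # Appending '999' turns the last millisecond of the candle
    # into the last microsecond of the candle
    return int(ms_text + '999')

open_us = open_time_us('1606780800000')
close_us = close_time_us('1606867199999')
print(datetime.fromtimestamp(open_us // 1_000_000, tz=timezone.utc).isoformat())
print(open_us, close_us)
```

With this convention the close time of one daily candle is exactly one microsecond before the open time of the next.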

In [6]:
from pathlib import Path
i = 0
for csvFile in allCSVFiles:
    symb = Path(csvFile).name.split('.')[0]
    await questInsert(symb, csvFile)
    i += 1
    if i % 100 == 0:
        print(f"Inserted {i} symbols out of {len(allCSVFiles)}")    

Inserted 100 symbols out of 1821
Inserted 200 symbols out of 1821
Inserted 300 symbols out of 1821
Inserted 400 symbols out of 1821
Inserted 500 symbols out of 1821
Inserted 600 symbols out of 1821
Inserted 700 symbols out of 1821
Inserted 800 symbols out of 1821
Inserted 900 symbols out of 1821
Inserted 1000 symbols out of 1821
Inserted 1100 symbols out of 1821
Inserted 1200 symbols out of 1821
Inserted 1300 symbols out of 1821
Inserted 1400 symbols out of 1821
Inserted 1500 symbols out of 1821
Inserted 1600 symbols out of 1821
Inserted 1700 symbols out of 1821
Inserted 1800 symbols out of 1821


Now QuestDB has ingested all data. Let's play!

## Playing with the data

In [1]:
import pandas as pd
import httpx

In [2]:
host = 'http://localhost:9000'

In [3]:
async def daysTradedInMonthBySymbol(year: int, month: int) -> pd.DataFrame:
    assert(month >= 1 and month <= 12)
    assert(year >= 2000 and year <= 9999)
    if month == 12:
        nextMonth = 1
        nextMonthYear = year + 1
    else:
        nextMonth = month + 1
        nextMonthYear = year
    sql_query = "select symb, count(*) as daystraded from 'binance_daily' where open_time >= '%d-%02d-01T00:00:00.000000Z' and open_time < '%d-%02d-01T00:00:00.000000Z' and num_trades >= 1 group by symb"%(year, month, nextMonthYear, nextMonth)
    query_params = {'query': sql_query, 'fmt': 'json'}
    async with httpx.AsyncClient() as client:
        r = await client.get(host + '/exec', params=query_params)
    jsonR = r.json()
    return pd.DataFrame(columns=['symb', 'daysTraded'], data=jsonR['dataset'])
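
The query above selects a half-open interval, from the first instant of the month up to (but excluding) the first instant of the following month. A small sketch of just that boundary computation, with `month_bounds` as a hypothetical helper name:

```python
from typing import Tuple

def month_bounds(year: int, month: int) -> Tuple[str, str]:
    """Return ISO instants for the first day of a month and of the month after it."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # December rolls over to January of the following year
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    template = '%d-%02d-01T00:00:00.000000Z'
    return template % (year, month), template % (next_year, next_month)

start, end = month_bounds(2020, 12)
print(start, end)
```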

In [4]:
import json
with open('coindata/coin-metadata.json', 'r') as f:
    coinMetadata = json.loads(f.read())
symbols = coinMetadata['exchange-info']['symbols']
symbolDict = {}
symbolsByBaseDict = {}
for symbol in symbols:
    symbolDict[symbol['symbol']] = symbol
    baseSymbols = symbolsByBaseDict.get(symbol['baseAsset'], [])
    baseSymbols.append(symbol)
    symbolsByBaseDict[symbol['baseAsset']] = baseSymbols
del symbols

In [5]:
from typing import Set
# It starts as all assets with TRADING status
async def initialAssetCandidateSet() -> Set[str]:
    candidateAssets = set()
    for symbol in symbolDict:
        baseAsset = symbolDict[symbol]['baseAsset']
        if symbolDict[symbol]['status'] == 'TRADING':
            if not baseAsset.endswith('UP') and not baseAsset.endswith('DOWN') and not baseAsset.endswith('BEAR') and not baseAsset.endswith('BULL'):
                candidateAssets.add(baseAsset)
    
    return candidateAssets

In [6]:
def getTradeSymbols(baseAssetSet):
    symbols = []
    for base in baseAssetSet:
        baseInfo = symbolsByBaseDict[base]
        for el in baseInfo:
            symbols.append(el['symbol'])
    return symbols

In [7]:
# We will use USDT to compare apples to apples
def shortestPathToUSDT(fromBaseAsset, coinGraph):
        
    if fromBaseAsset == 'USDT':
        return [fromBaseAsset]
    parent = {}
    explored = set()
    frontier = [fromBaseAsset]
    def solution(baseAsset):
        if parent.get(baseAsset) is None:
            return [baseAsset]
        else:
            result = solution(parent[baseAsset])
            result.append(baseAsset)
            return result

    while len(frontier) > 0:
        node = frontier.pop(0)
        explored.add(node)
        for coin in coinGraph[node]:
            if coin not in explored and coin not in frontier:
                parent[coin] = node
                if coin == 'USDT':
                    return solution(coin)
                else:
                    frontier.append(coin)
    assert False, "Should never get here"    
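
`shortestPathToUSDT` is a breadth-first search over the graph of tradable pairs. The sketch below shows the same idea on a toy graph; the assets and edges are made up, `collections.deque` replaces the list-based frontier, and, unlike the function above, it returns `None` when USDT is unreachable instead of asserting.

```python
from collections import deque

def shortest_path(graph, start, goal='USDT'):
    """Breadth-first search; returns the list of assets from start to goal, or None."""
    if start == goal:
        return [start]
    parent = {start: None}          # doubles as the "already seen" set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in sorted(graph[node]):   # sorted only to make the result deterministic
            if neighbour in parent:
                continue
            parent[neighbour] = node
            if neighbour == goal:
                # Walk the parent links back to the start, then reverse
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None

# Toy graph: XYZ trades only against BNB, and BNB trades against USDT
toy_graph = {
    'XYZ': {'BNB'},
    'BNB': {'XYZ', 'BTC', 'USDT'},
    'BTC': {'BNB', 'USDT'},
    'USDT': {'BNB', 'BTC'},
}
path = shortest_path(toy_graph, 'XYZ')
print(path)
```

Because breadth-first search explores assets in order of distance, the first time USDT is reached the path uses the fewest possible conversions.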

In [8]:
import psutil
import numpy as np
import concurrent.futures
def splitAddUSDTInfo(df, splittedDf, coinGraph):
    volInUSDT = []
    closePricesInUSDT = []

    for (_, row) in splittedDf.iterrows():
        # print("(Base asset: %s, Quote asset: %s)"%(row['baseAsset'],row['quoteAsset']))
        if row['quoteAsset'] == 'USDT':
            val = row['volume'] * row['close']
            closePriceUSDT = row['close']
        else:
            path = shortestPathToUSDT(row['quoteAsset'], coinGraph)
            multiplier = 1.0
            assert len(path) > 1
            i = 0
            while i + 1 < len(path):
                dfFiltered = df[(df['baseAsset'] == path[i])
                                     & (df['quoteAsset'] == path[i + 1])
                                     & (df['open_time'] == row['open_time'])]
                if dfFiltered.shape[0] > 0:
                    # print(dfFiltered)
                    pathRow = dfFiltered.iloc[0]
                    multiplier *= pathRow['close']
                else:
                    dfFiltered = df[(df['baseAsset'] == path[i + 1]) 
                            & (df['quoteAsset'] == path[i])
                            & (df['open_time'] == row['open_time'])]
                    # print(dfFiltered)
                    pathRow = dfFiltered.iloc[0]
                    multiplier *= 1.0 / pathRow['close']
                i += 1
            assert path[i] == 'USDT'
            val = row['volume'] * row['close'] * multiplier
            closePriceUSDT = row['close'] * multiplier
        volInUSDT.append(val)
        closePricesInUSDT.append(closePriceUSDT)
    splittedDf.insert(splittedDf.shape[1], 'volInUSDT', volInUSDT)
    splittedDf.insert(splittedDf.shape[1], 'closePriceUSDT', closePricesInUSDT)
    del volInUSDT
    del closePricesInUSDT
    return splittedDf
    
def addUSDTInfo(df: pd.DataFrame, coinGraph) -> pd.DataFrame:
    numprocs = psutil.cpu_count(logical=False)
    splittedDfs = np.array_split(df, numprocs)
    dfResults = []
    with concurrent.futures.ProcessPoolExecutor(max_workers=numprocs) as executor:
        results = [ executor.submit(splitAddUSDTInfo, df=df, splittedDf=splittedDf, coinGraph=coinGraph) for splittedDf in splittedDfs ]
        for result in concurrent.futures.as_completed(results):
            try:
                dfResults.append(result.result())
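            except Exception as ex:
                # A failed chunk is reported and skipped, so its rows will be missing from the result
                print("BOOOO! Exception while converting a chunk to USDT!")
                print(ex)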
    
    df = pd.concat(dfResults)
    return df
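
The conversion in `splitAddUSDTInfo` multiplies the close prices found along the path from the quote asset to USDT, inverting the price whenever only the reverse pair exists. A stripped-down sketch of that multiplier for a single day, using made-up prices and a plain dictionary in place of the data frame:

```python
def usdt_multiplier(path, close_prices):
    """Price of one unit of path[0] expressed in path[-1], chaining daily closes."""
    multiplier = 1.0
    for src, dst in zip(path, path[1:]):
        if (src, dst) in close_prices:
            multiplier *= close_prices[(src, dst)]
        else:
            # Only the reverse pair is listed, so invert its price
            multiplier *= 1.0 / close_prices[(dst, src)]
    return multiplier

# (base, quote) -> close price; all numbers are invented for the example
close_prices = {
    ('BNB', 'USDT'): 30.0,
    ('BTC', 'EUR'): 20000.0,
    ('BTC', 'USDT'): 19000.0,
}

# A hypothetical XYZ/BNB candle: 100 XYZ traded, closing at 0.5 BNB
volume, close = 100.0, 0.5
bnb_to_usdt = usdt_multiplier(['BNB', 'USDT'], close_prices)
vol_usdt = volume * close * bnb_to_usdt
close_usdt = close * bnb_to_usdt
eur_to_usdt = usdt_multiplier(['EUR', 'BTC', 'USDT'], close_prices)
print(vol_usdt, close_usdt, eur_to_usdt)
```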

In [9]:
async def getMedianUSDTTradeVol(baseAssetSet: Set[str], isoInstantFrom: str, isoInstantTo: str) -> pd.DataFrame:
    def listToStr(l):
        result = l[0]
        for i in range(1, len(l)):
            result += f", {l[i]}"
        return result
    
    def genCoinGraph():
        from collections import defaultdict
        coinGraph = defaultdict(set)
        for key in symbolDict:
            symbol = symbolDict[key]
            coinGraph[symbol['baseAsset']].add(symbol['quoteAsset'])
            coinGraph[symbol['quoteAsset']].add(symbol['baseAsset'])
        return coinGraph
    
    coinGraph = genCoinGraph()
     
    columns = ['symb', 'open_time', 'close', 'volume', 'close_time', 'quote_asset_vol']
    sql_query = "SELECT %s FROM 'binance_daily' WHERE open_time >= '%s' and open_time < '%s'"%(listToStr(columns), isoInstantFrom, isoInstantTo)    
    #print(sql_query)
    query_params = {'query': sql_query, 'fmt': 'json'}
    async with httpx.AsyncClient() as client:
        r = await client.get(host + '/exec', params=query_params)
    jsonR = r.json()
    df = pd.DataFrame(columns=columns, data=jsonR['dataset'])
    del r
    del jsonR
    df['baseAsset'] = df.apply(lambda row: symbolDict[row['symb']]['baseAsset'], axis = 1)
    df['quoteAsset'] = df.apply(lambda row: symbolDict[row['symb']]['quoteAsset'], axis = 1)
      
    df = addUSDTInfo(df, coinGraph)
    df.sort_values(['close_time'], ascending=[True], ignore_index=True, inplace=True)
    dfSymbLastPrices = df[['symb', 'baseAsset', 'quoteAsset', 'volInUSDT', 'closePriceUSDT']].groupby(['symb', 'baseAsset', 'quoteAsset'], as_index=False).last()
    dfBaseAssetLastPrices = pd.DataFrame(columns= ['baseAsset', 'lastAvgPriceUSDT'])
    for _, group in dfSymbLastPrices.groupby('baseAsset'):
        num = 0.0
        den = 0.0
        for (_, row) in group.iterrows():
            num += row['closePriceUSDT']*row['volInUSDT']
            den += row['volInUSDT']
        avgPrice = num / den
        dfBaseAssetLastPrices.loc[len(dfBaseAssetLastPrices)] = [row['baseAsset'], avgPrice]
    dfBaseAssetLastPrices = dfBaseAssetLastPrices.set_index('baseAsset')

    df = df[['open_time', 'baseAsset', 'volInUSDT']].groupby(['baseAsset', 'open_time'], as_index=False).sum()    
    df = df[['baseAsset', 'volInUSDT']].groupby(['baseAsset'], as_index=True).median()
    df = pd.concat([df, dfBaseAssetLastPrices], axis=1, join='inner')
    df.reset_index(inplace=True)
    df.sort_values(['volInUSDT'], ascending=[False], ignore_index=True, inplace=True)
    return df
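
`getMedianUSDTTradeVol` produces two numbers per base asset: the median of its daily USDT volume (summed over all of its pairs) and a volume-weighted average of the last close price of each pair. The sketch below reproduces both aggregations with plain Python on invented rows, as a way to check the logic without pandas or QuestDB.

```python
from statistics import median

# (symbol, day, volInUSDT, closePriceUSDT) for one hypothetical base asset XYZ
rows = [
    ('XYZUSDT', '2020-11-01', 100.0, 2.0),
    ('XYZBTC',  '2020-11-01',  50.0, 2.2),
    ('XYZUSDT', '2020-11-02', 300.0, 2.1),
    ('XYZBTC',  '2020-11-02', 100.0, 2.5),
    ('XYZUSDT', '2020-11-03', 200.0, 3.0),
    ('XYZBTC',  '2020-11-03', 200.0, 1.0),
]

# 1. Sum the volume of all pairs per day, then take the median over days
daily_volume = {}
for _, day, vol, _ in rows:
    daily_volume[day] = daily_volume.get(day, 0.0) + vol
median_volume = median(daily_volume.values())

# 2. Keep the last row of each pair, then average the prices weighted by volume
last_row = {}
for symb, day, vol, price in sorted(rows, key=lambda r: r[1]):
    last_row[symb] = (vol, price)
num = sum(vol * price for vol, price in last_row.values())
den = sum(vol for vol, _ in last_row.values())
last_avg_price = num / den
print(median_volume, last_avg_price)
```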

In [10]:
def getBlacklistSet(blackListFile) -> Set[str]:
    blacklist = None
    with open(blackListFile, 'r') as f:
        blacklist = f.read().split(',')
    return set(blacklist)

In [11]:
async def daysTradedBetweenBySymbol(isoInstantFrom: str, isoInstantTo: str) -> pd.DataFrame:
    sql_query = "select symb, count(*) as daystraded from 'binance_daily' where open_time >= '%s' and open_time < '%s' and num_trades >= 1 group by symb"%(isoInstantFrom, isoInstantTo)
    query_params = {'query': sql_query, 'fmt': 'json'}
    async with httpx.AsyncClient() as client:
        r = await client.get(host + '/exec', params=query_params)
    jsonR = r.json()
    return pd.DataFrame(columns=['symb', 'daysTraded'], data=jsonR['dataset'])

In [12]:
from datetime import date
from typing import List

# we consider an asset eligible only if at least one of its trading
# pairs was traded on every day of the period
async def symbolsTradedEveryDay(isoInstantFrom: str, isoInstantTo: str) -> List[str]:
    fromDate = date.fromisoformat(isoInstantFrom[:10])
    toDate = date.fromisoformat(isoInstantTo[:10])
    daysBetween = abs(toDate - fromDate).days    
    df = await daysTradedBetweenBySymbol(isoInstantFrom, isoInstantTo)
    return df[df['daysTraded']==daysBetween]['symb'].tolist()
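
The "traded every day" test compares the number of daily candles with at least one trade against the number of days in the half-open period. A short sketch with invented counts (`AAAUSDT` and `BBBUSDT` are hypothetical symbols):

```python
from datetime import date

def days_between(iso_from: str, iso_to: str) -> int:
    # Only the date part (first 10 characters) of each ISO instant is used
    return abs(date.fromisoformat(iso_to[:10]) - date.fromisoformat(iso_from[:10])).days

expected_days = days_between('2020-11-01T00:00:00.000000Z', '2020-12-01T00:00:00.000000Z')

# Hypothetical result of the daysTraded query: symbol -> days with at least one trade
days_traded = {'AAAUSDT': 30, 'BBBUSDT': 29}
traded_every_day = [symb for symb, n in days_traded.items() if n == expected_days]
print(expected_days, traded_every_day)
```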

In [13]:
async def assetsTradedEveryDay(isoInstantFrom: str, isoInstantTo: str) -> Set[str]:
    result = set()
    symbols = await symbolsTradedEveryDay(isoInstantFrom, isoInstantTo)
    for symbol in symbols:
        symbolInfo = symbolDict[symbol]
        result.add(symbolInfo['baseAsset'])
        result.add(symbolInfo['quoteAsset'])
    return result

In [14]:
async def eligibleAssetsAndMedianVolumes(isoInstantFrom: str, isoInstantTo: str, getBlacklistFunc, whitelistFunc=None) -> pd.DataFrame:
    initialAssets = await initialAssetCandidateSet()
    df = await getMedianUSDTTradeVol(initialAssets, isoInstantFrom, isoInstantTo)
    blackListSet = getBlacklistFunc()
    print('blackListSet')
    print(blackListSet)
    df = df[~df['baseAsset'].str.lower().isin(blackListSet)]
    if whitelistFunc is not None:
        print('whitelist is present')
        whitelistSet = whitelistFunc()
        df = df[df['baseAsset'].str.lower().isin(whitelistSet)]
    else:
        print('no whitelist being used')
    everydayAssets = await assetsTradedEveryDay(isoInstantFrom, isoInstantTo)
    df = df[df['baseAsset'].isin(everydayAssets)]
    df.reset_index(drop=True, inplace=True)
    maxVolume = df['volInUSDT'][0]
    df = df[df['volInUSDT'] >= (0.005 * maxVolume)]
    df.reset_index(inplace=True, drop=True)
    return df
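
The last step of `eligibleAssetsAndMedianVolumes` is a liquidity filter: an asset stays eligible only if its median daily volume is at least 0.5% of the largest median volume among the remaining assets. A sketch on invented numbers, with the list already sorted by volume in descending order as in the function above:

```python
# (asset, median daily volume in USDT); hypothetical values, sorted descending
median_volumes = [
    ('AAA', 1_000_000.0),
    ('BBB', 20_000.0),
    ('CCC', 6_000.0),
    ('DDD', 4_000.0),
]

max_volume = median_volumes[0][1]
threshold = 0.005 * max_volume      # 0.5% of the most liquid asset
eligible = [asset for asset, vol in median_volumes if vol >= threshold]
print(threshold, eligible)
```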

Now we have everything we need to compute our eligible assets, and we will proceed to select among them the ones that will become index constituents. For this we need market cap data, which is not available from the Binance API, so we will get it from CoinGecko.

In [15]:
geckoBaseURL = "https://api.coingecko.com:443/api/v3"

In [16]:
from typing import Dict
import asyncio
async def baseAsset2GeckoId(exchangeId = 'binance') -> Dict[str, str]:
    result = {}
    done = False
    page = 1
    while not done:
        query_params = {'page': page, 'order': 'trust_score_desc'}
        print(f"Page {page}. Pausing...")
        await asyncio.sleep(4)
        async with httpx.AsyncClient() as client:
            r = await client.get(geckoBaseURL + f"/exchanges/{exchangeId}/tickers", params=query_params, timeout=30)
        tickerInfo = r.json()
        if len(tickerInfo['tickers']) == 0:
            done = True
        else:
            page += 1
            for ticker in tickerInfo['tickers']:
                baseAsset = ticker['base']
                quoteAsset = ticker['target']
                # the first id seen for a symbol wins; later tickers do not overwrite it
                if "coin_id" in ticker:
                    if baseAsset not in result:
                        result[baseAsset] = ticker['coin_id']
                if "target_coin_id" in ticker:
                    if quoteAsset not in result:
                        result[quoteAsset] = ticker['target_coin_id']
    return result    
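
The loop above pages through the exchange tickers until an empty page comes back, and it keeps the first CoinGecko id seen for each symbol. The sketch below runs the same merging logic offline on fake pages, so no request is made; the field names are the ones used in the function above, while the page contents and the `merge_ticker_pages` helper are invented.

```python
def merge_ticker_pages(pages):
    """Build symbol -> id from ticker pages, stopping at the first empty page."""
    result = {}
    for tickers in pages:
        if not tickers:
            break
        for ticker in tickers:
            for symbol_key, id_key in (('base', 'coin_id'), ('target', 'target_coin_id')):
                if id_key in ticker:
                    # setdefault keeps the first id seen for a symbol
                    result.setdefault(ticker[symbol_key], ticker[id_key])
    return result

fake_pages = [
    [{'base': 'BTC', 'target': 'USDT', 'coin_id': 'bitcoin', 'target_coin_id': 'tether'}],
    [{'base': 'ETH', 'target': 'BTC', 'coin_id': 'ethereum', 'target_coin_id': 'another-btc-id'}],
    [],                                                     # empty page ends the pagination
    [{'base': 'ZZZ', 'target': 'USDT', 'coin_id': 'never-read'}],
]
mapping = merge_ticker_pages(fake_pages)
print(mapping)
```

Since the tickers are requested in descending trust score order, keeping the first id presumably favours the most trusted market for each symbol.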

In [17]:
baseAssets2GeckoDict = await baseAsset2GeckoId()

Page 1. Pausing...
Page 2. Pausing...
Page 3. Pausing...
Page 4. Pausing...
Page 5. Pausing...
Page 6. Pausing...
Page 7. Pausing...
Page 8. Pausing...
Page 9. Pausing...
Page 10. Pausing...
Page 11. Pausing...
Page 12. Pausing...
Page 13. Pausing...
Page 14. Pausing...
Page 15. Pausing...
Page 16. Pausing...


In [18]:
baseAssets2GeckoDict

{'BUSD': 'binance-usd',
 'USDT': 'tether',
 'BTC': 'bitcoin',
 'USDC': 'usd-coin',
 'ETH': 'ethereum',
 'BETH': 'binance-eth',
 'WBTC': 'wrapped-bitcoin',
 'SOL': 'solana',
 'BNB': 'binancecoin',
 'ETC': 'ethereum-classic',
 'XRP': 'ripple',
 'MATIC': 'matic-network',
 'OP': 'optimism',
 'ADA': 'cardano',
 'AVAX': 'avalanche-2',
 'DOGE': 'dogecoin',
 'DOT': 'polkadot',
 'APE': 'apecoin',
 'FLOW': 'flow',
 'TUSD': 'true-usd',
 'FTM': 'fantom',
 'SAND': 'the-sandbox',
 'ATOM': 'cosmos',
 'NEAR': 'near',
 'USDP': 'paxos-standard',
 'WAVES': 'waves',
 'LINK': 'chainlink',
 'FIL': 'filecoin',
 'LTC': 'litecoin',
 'GMT': 'stepn',
 'MANA': 'decentraland',
 'DAI': 'dai',
 'FTT': 'ftx-token',
 'SHIB': 'shiba-inu',
 'TRX': 'tron',
 'EOS': 'eos',
 'BCH': 'bitcoin-cash',
 'AXS': 'axie-infinity',
 'UNI': 'uniswap',
 'BIDR': 'binanceidr',
 'DYDX': 'dydx',
 'AAVE': 'aave',
 'XMR': 'monero',
 'GALA': 'gala',
 'VET': 'vechain',
 'CHZ': 'chiliz',
 'ZEC': 'zcash',
 'RUNE': 'thorchain',
 'PAXG': 'pax-gold

In [19]:
from typing import Tuple
from dateutil.parser import parse
import datetime as dt
import pytz

async def computeMedianDailyMarketCapAndLastCirSupply(geckoCoinId: str, isoInstantFrom: str, isoInstantTo: str, maxDelaySeconds=7200) -> Tuple[float, float]:
    dtFrom = parse(isoInstantFrom)
    dtTo = parse(isoInstantTo)
    epochSecondsFrom = dtFrom.timestamp()
    epochSecondsTo = dtTo.timestamp()
    query_params = {'vs_currency': 'usd', 'from': epochSecondsFrom, 'to': epochSecondsTo}
    async with httpx.AsyncClient() as client:
        r = await client.get(geckoBaseURL + f"/coins/{geckoCoinId}/market_chart/range", params=query_params, timeout=30)
    jsonR = r.json()
    if jsonR is None or 'market_caps' not in jsonR:
        print("WARNING: Could not find market_caps for %s, ignoring (returning 0,0)"%geckoCoinId)
        return (0,0)
    df = pd.DataFrame(columns=['epoch', 'market_cap'], data=jsonR['market_caps'])
    if (df.shape[0] < 1 or df.shape[1] != 2):
        print("WARNING: Unexpected shape for %s, ignoring (returning 0,0)"%geckoCoinId)
        return (0,0)
    if df['epoch'].isnull().values.any():
        print("WARNING: coingecko data for %s, contains empty epochs, ignoring (returning 0,0)"%geckoCoinId)
        return (0,0)

    df['isodate'] = df.apply(lambda row: dt.datetime.fromtimestamp(int(row['epoch'])/1000, tz=pytz.utc).isoformat()[:10] if (len(str(row['epoch'])) >= 13) else dt.datetime.fromtimestamp(int(row['epoch']), tz=pytz.utc).isoformat()[:10], axis = 1)
    df.sort_values(['epoch'], ascending=[True], ignore_index=True, inplace=True)
    lastMarketCap = df.iloc[-1]
    lastMarketCapDT = dt.datetime.fromtimestamp(int(lastMarketCap['epoch']/1000), tz=pytz.utc) if (len(str(lastMarketCap['epoch'])) >= 13) else dt.datetime.fromtimestamp(int(lastMarketCap['epoch']), tz=pytz.utc)
    if (abs(dtTo - lastMarketCapDT).total_seconds() > maxDelaySeconds):
        print("computeMedianDailyMarketCapAndLastCirSupply: WARNING last market cap data for %s is too old, ignoring (returning 0,0)"%geckoCoinId)
        return (0,0)    
    dfPrices = pd.DataFrame(columns=['epoch', 'price'], data=jsonR['prices'])
    dfPrices.sort_values(['epoch'], ascending=[True], ignore_index=True, inplace=True)
    lastPrice = dfPrices.iloc[-1]
    lastPriceDT = dt.datetime.fromtimestamp(int(lastPrice['epoch']/1000), tz=pytz.utc) if (len(str(lastPrice['epoch'])) >= 13) else dt.datetime.fromtimestamp(int(lastPrice['epoch']), tz=pytz.utc)
    if (abs(dtTo - lastPriceDT).total_seconds() >= maxDelaySeconds):
        print("computeMedianDailyMarketCapAndLastCirSupply: WARNING last price data for %s is too old, ignoring (returning 0,0)"%geckoCoinId)
        return (0,0)                                                                                                                                         
    df = df[['isodate', 'market_cap']].groupby(['isodate'], as_index=False).last()
    return (df['market_cap'].median(), float(lastMarketCap['market_cap'])/float(lastPrice['price']))
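
Two details of the function above are worth isolating: the circulating supply is not fetched directly but implied as the last market cap divided by the last price, and a data point only counts if it lies within `maxDelaySeconds` of the end of the window. A minimal offline sketch with invented series in the `[epoch, value]` layout used above; `to_utc_datetime` is our own helper and decides on milliseconds by counting the digits of the integer part.

```python
import datetime as dt

def to_utc_datetime(epoch):
    # 13 or more digits is taken to mean milliseconds, otherwise seconds
    seconds = epoch / 1000 if len(str(int(epoch))) >= 13 else epoch
    return dt.datetime.fromtimestamp(seconds, tz=dt.timezone.utc)

# Invented data points for 2020-11-30 and 2020-12-01 (UTC midnight, milliseconds)
market_caps = [[1606694400000, 900_000.0], [1606780800000, 1_000_000.0]]
prices = [[1606694400000, 3.6], [1606780800000, 4.0]]

last_cap = max(market_caps)     # lists compare by epoch first
last_price = max(prices)
implied_supply = last_cap[1] / last_price[1]

window_end = dt.datetime(2020, 12, 1, tzinfo=dt.timezone.utc)
is_fresh = abs(window_end - to_utc_datetime(last_cap[0])).total_seconds() <= 7200
print(implied_supply, is_fresh)
```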

In [20]:
def addWeights(indexConstituents: pd.DataFrame):
    indexConstituents['lastMarketcap'] = indexConstituents['lastCircSupply']*indexConstituents['lastAvgPriceUSDT']
    totalLastMarketcap = indexConstituents['lastMarketcap'].sum()
    indexConstituents['weight'] = indexConstituents['lastMarketcap']/totalLastMarketcap
    print(indexConstituents)
    assert(abs(indexConstituents['weight'].sum() - 1.0) < 0.0000001) 
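
`addWeights` is plain capitalization weighting: each constituent's weight is its last market cap (circulating supply times last average price) divided by the sum over all constituents. A worked example with two invented assets:

```python
# Hypothetical constituents
supply = {'AAA': 1000.0, 'BBB': 500.0}
price_usdt = {'AAA': 3.0, 'BBB': 2.0}

market_cap = {asset: supply[asset] * price_usdt[asset] for asset in supply}
total_cap = sum(market_cap.values())
weight = {asset: cap / total_cap for asset, cap in market_cap.items()}
print(market_cap, weight)
```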

In [21]:
# as we are severely rate limited by coingecko (poor free user) we will build a cache
# of median daily market cap and last circulating supply data that will benefit
# us over repeated runs
import sqlite3
from sqlite3 import Error
def createCacheDBConnection(dbFile):
    conn = None
    try:
        conn = sqlite3.connect(dbFile)
        print(sqlite3.version)
    except Error as e:
        print(e)
    finally:
        return conn
    
def createTable(conn, sqlStmt):
    try:
        c = conn.cursor()
        c.execute(sqlStmt)
    except Error as e:
        print(e)
        
def createGeckoCacheDataBase(conn=None):
    sqlStmt = """CREATE TABLE IF NOT EXISTS MedianDailyMarketCapAndLastCirSupply (
                    id text PRIMARY KEY,
                    medDailyMarketCap real NOT NULL,
                    lastCirSupply real NOT NULL
                 );"""
    if conn is None:
        conn = createCacheDBConnection("geckoCache.db")
    if conn is not None:
        createTable(conn, sqlStmt)
        return conn
    else:
        print("Error, could not create cache database")
        
def insertCacheData(geckoId: str, isoInstantFrom: str, isoInstantTo: str, marketCap: float, cirSupply: float, conn):
    sql = 'INSERT INTO MedianDailyMarketCapAndLastCirSupply(id, medDailyMarketCap, lastCirSupply) VALUES(?, ?, ?)'
    key = geckoId + "_" + isoInstantFrom + "_" + isoInstantTo
    value = (key, marketCap, cirSupply)
    cur = conn.cursor()
    cur.execute(sql, value)
    conn.commit()
    
def getCachedData(geckoId: str, isoInstantFrom: str, isoInstantTo: str, conn):
    key = geckoId + "_" + isoInstantFrom + "_" + isoInstantTo
    cur = conn.cursor()
    cur.execute("SELECT medDailyMarketCap, lastCirSupply FROM MedianDailyMarketCapAndLastCirSupply WHERE id=?", (key,))
    rows = cur.fetchall()
    if (len(rows) > 0):
        for row in rows:
            return (row[0], row[1])
    else:
        return (None, None)
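
The cache is a single SQLite table keyed by the CoinGecko id plus the two window bounds. The sketch below exercises the same scheme against an in-memory database, so nothing is written to disk. The helper names are ours, and `INSERT OR REPLACE` is a deliberate deviation from the plain `INSERT` used above, which would raise an error if the same key were written twice.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""CREATE TABLE IF NOT EXISTS MedianDailyMarketCapAndLastCirSupply (
                    id text PRIMARY KEY,
                    medDailyMarketCap real NOT NULL,
                    lastCirSupply real NOT NULL
                 );""")

def cache_key(gecko_id: str, iso_from: str, iso_to: str) -> str:
    return gecko_id + "_" + iso_from + "_" + iso_to

def get_cached(conn, key):
    row = conn.execute(
        "SELECT medDailyMarketCap, lastCirSupply "
        "FROM MedianDailyMarketCapAndLastCirSupply WHERE id=?", (key,)).fetchone()
    return row if row is not None else (None, None)

def put_cached(conn, key, market_cap, cir_supply):
    conn.execute(
        "INSERT OR REPLACE INTO MedianDailyMarketCapAndLastCirSupply"
        "(id, medDailyMarketCap, lastCirSupply) VALUES(?, ?, ?)",
        (key, market_cap, cir_supply))
    conn.commit()

key = cache_key('bitcoin', '2020-11-01T00:00:00.000000Z', '2020-12-01T00:00:00.000000Z')
miss = get_cached(conn, key)
put_cached(conn, key, 123.0, 4.5)       # invented values
hit = get_cached(conn, key)
print(miss, hit)
```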

In [22]:
import asyncio
import pandas as pd
async def selectIndexConstituents(isoInstantFrom: str, isoInstantTo: str, eligibleAssets: pd.DataFrame, cacheConn=None) -> pd.DataFrame:
    def getGeckoId(baseAsset: str):
        if (baseAsset == 'LUNA'):
            if isoInstantTo < '2022-05-31':
                return 'terra-luna'
            else:
                return 'terra-luna-2'
        elif (baseAsset == 'ANY'):
            return 'anyswap'
        elif (baseAsset == 'EPS'):
            return 'ellipsis'
        elif (baseAsset == 'NPXS'):
            return 'pundi-x'
        elif (baseAsset == 'BTT'):
            if isoInstantTo < '2022-01-17':
                return 'bittorrent-old'
            else:
                return 'bittorrent'
        elif (baseAsset == 'NANO'):
            return 'nano'
        elif (baseAsset == 'NU'):
            return 'nucypher'
        elif (baseAsset == 'KEEP'):
            return 'keep-network'
        elif (baseAsset == 'GXS'):
            return 'gxchain'
        elif (baseAsset in baseAssets2GeckoDict):
            return baseAssets2GeckoDict[baseAsset]
        else:
            return None
        
    indexConstituents = eligibleAssets.copy()
    eligibleAssets = None
    indexConstituents['geckoId'] = indexConstituents.apply(lambda row: getGeckoId(row['baseAsset']), axis = 1)
    geckoUnsupported = indexConstituents[indexConstituents['geckoId'].isna()]
    if geckoUnsupported.shape[0] > 0:
        print("WARNING: coingecko seems to not support the following eligible assets. Ignoring them")
        print(geckoUnsupported['baseAsset'])
        indexConstituents = indexConstituents[~indexConstituents['geckoId'].isna()]
    medians = []
    lastCircSupplies = []
    for geckoId in indexConstituents['geckoId']:
        print(geckoId)
        (medianDailyMarketCap, lastCircSupply) = (None, None)
        if cacheConn is not None:
            (medianDailyMarketCap, lastCircSupply) = getCachedData(geckoId, isoInstantFrom, isoInstantTo, cacheConn)
        if medianDailyMarketCap is None or lastCircSupply is None:
            await asyncio.sleep(4)
            (medianDailyMarketCap, lastCircSupply) = await computeMedianDailyMarketCapAndLastCirSupply(geckoId, isoInstantFrom, isoInstantTo)
            if (cacheConn is not None):
                insertCacheData(geckoId, isoInstantFrom, isoInstantTo, medianDailyMarketCap, lastCircSupply, cacheConn)
                
        medians.append(medianDailyMarketCap)
        lastCircSupplies.append(lastCircSupply)
    indexConstituents['medianMarketcap'] = medians
    indexConstituents['lastCircSupply'] = lastCircSupplies
    del medians
    del lastCircSupplies
    totalMarketCap = indexConstituents['medianMarketcap'].sum()
    indexConstituents.sort_values(['medianMarketcap'], ascending=[False], ignore_index=True, inplace=True)
    indexConstituents['fracTotalMarketCap'] = indexConstituents.apply(lambda row: row['medianMarketcap']/totalMarketCap, axis = 1)
    accumulatedPerc = []
    cumul = 0.0
    for _, row in indexConstituents.iterrows():
        cumul += row['fracTotalMarketCap']
        accumulatedPerc.append(cumul)
    indexConstituents.insert(indexConstituents.shape[1], 'accumulatedPerc', accumulatedPerc)
    del accumulatedPerc
    lastIndex = 0
    cutFrame = False
    for lastIndex, row in indexConstituents.iterrows():
        if row['fracTotalMarketCap'] < 0.005 and row['accumulatedPerc'] > 0.75:
            cutFrame = True
            break
    if cutFrame:
        # copy the slice so that addWeights can add columns without a pandas SettingWithCopyWarning
        indexConstituents = indexConstituents[:lastIndex].copy()
    addWeights(indexConstituents)
    return indexConstituents        
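
The selection rule at the end of `selectIndexConstituents` walks the assets in descending order of median market cap and cuts the list at the first asset that both represents less than 0.5% of the total and appears after more than 75% of the total has been accumulated. A sketch of that rule on invented market caps:

```python
def select_by_cap(median_caps):
    """Return the assets kept by the 0.5% / 75% cut, largest market cap first."""
    total = sum(median_caps.values())
    ranked = sorted(median_caps.items(), key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0.0
    for asset, cap in ranked:
        fraction = cap / total
        cumulative += fraction          # includes the current asset, as in the loop above
        if fraction < 0.005 and cumulative > 0.75:
            break                       # this asset and all smaller ones are dropped
        selected.append(asset)
    return selected

# Hypothetical median market caps (total = 1000)
caps = {'AAA': 800.0, 'BBB': 150.0, 'CCC': 47.0, 'DDD': 2.0, 'EEE': 1.0}
constituents = select_by_cap(caps)
print(constituents)
```

Here `DDD` holds 0.2% of the total and is reached after 99.9% has been accumulated, so the list is cut just before it.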

In [23]:
import json
import pandas as pd
class IndexData:
    indexId: str
    _indexValue: float
    isoInstant: str
    isRebalanceDay: bool
    _divisor: float
    lastRebalanceISOInstant: str
    indexConstituents: pd.DataFrame
    
    class IndexDataEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, IndexData):
                return {"indexId": obj.indexId, "isoInstant": obj.isoInstant, "indexValue": obj.indexValue, "isRebalanceDay": obj.isRebalanceDay, "divisor": obj.divisor, "lastRebalanceISOInstant": obj.lastRebalanceISOInstant, "indexConstituents": json.loads(obj.indexConstituents.to_json(orient='split', index=False))}
            return json.JSONEncoder.default(self, obj)
    
    def set_indexValue(self, value):
        self._indexValue = round(value, 2)
    
    def get_indexValue(self):
        return self._indexValue
    
    def del_indexValue(self):
        del self._indexValue
        
    indexValue = property(get_indexValue, set_indexValue, del_indexValue)
    
    def set_divisor(self, value):
        self._divisor = round(value, 4)
    
    def get_divisor(self):
        return self._divisor
    
    def del_divisor(self):
        del self._divisor
       
    divisor = property(get_divisor, set_divisor, del_divisor)
    
    def toJSON(self):
        return json.dumps(self, cls=IndexData.IndexDataEncoder)
    
    def __repr__(self):
        return self.toJSON()
    
    @staticmethod
    def __jsonDecode(dct):
        attrs = ["indexId", "isoInstant", "indexValue", "isRebalanceDay", "divisor", "lastRebalanceISOInstant", "indexConstituents"]
        for attr in attrs:
            if attr not in dct:
                return dct
        result = IndexData()
        result.indexId = dct['indexId']
        result.isoInstant = dct['isoInstant']
        result.indexValue = dct['indexValue']
        result.isRebalanceDay = dct['isRebalanceDay']
        result.divisor = dct['divisor']
        result.lastRebalanceISOInstant = dct['lastRebalanceISOInstant']
        result.indexConstituents = pd.read_json(json.dumps(dct['indexConstituents']), orient='split')
        return result
    
    @staticmethod
    def fromJSON(jsStr):
        return json.loads(jsStr, object_hook=IndexData.__jsonDecode)

In [24]:
from functools import partial
async def startIndex(isoInstantFrom: str, isoInstantTo: str, startValue: int, name: str, cacheConn=None, whitelistFunc=None, getBlacklistFunc=partial(getBlacklistSet, 'blacklist.txt')) -> IndexData:
    assert(startValue >= 1)
    eligibleAssets = await eligibleAssetsAndMedianVolumes(isoInstantFrom, isoInstantTo, getBlacklistFunc, whitelistFunc)
    indexConstituents = await selectIndexConstituents(isoInstantFrom, isoInstantTo, eligibleAssets, cacheConn)
    indexData = IndexData()
    indexData.indexId = name
    indexData.isoInstant = isoInstantTo
    indexData.isRebalanceDay = True
    indexData.lastRebalanceISOInstant = isoInstantTo
    indexData.indexValue = startValue
    dInitial = indexConstituents['lastMarketcap'].sum() / float(startValue)
    indexData.indexConstituents = indexConstituents
    indexData.divisor = dInitial
    return indexData
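
The divisor computed in `startIndex` is the scaling constant that makes the index equal to `startValue` on the start date: it is the total market cap of the constituents divided by the start value. A tiny numeric check with invented market caps:

```python
start_value = 1000
last_market_caps = [3000.0, 1000.0]     # hypothetical constituents

divisor = sum(last_market_caps) / float(start_value)
index_value = sum(last_market_caps) / divisor
print(divisor, index_value)
```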

In [25]:
cacheConn = createGeckoCacheDataBase()

2.6.0


In [None]:
firstIndexData = await startIndex('2020-11-01T00:00:00.000000Z', '2020-12-01T00:00:00.000000Z', 1000, 'ONN-B-30dW-30dR', cacheConn, None, partial(getBlacklistSet, 'blacklist.txt'))

In [26]:
jsonstr = firstIndexData.toJSON()
print(jsonstr)

{"indexId": "ONN-B-30dW-30dR", "isoInstant": "2020-12-01T00:00:00.000000Z", "indexValue": 1000, "isRebalanceDay": true, "divisor": 501919066.7321, "lastRebalanceISOInstant": "2020-12-01T00:00:00.000000Z", "indexConstituents": {"columns": ["baseAsset", "volInUSDT", "lastAvgPriceUSDT", "geckoId", "medianMarketcap", "lastCircSupply", "fracTotalMarketCap", "accumulatedPerc", "lastMarketcap", "weight"], "data": [["BTC", 1799611405.9828906, 19695.4272907278, "bitcoin", 306147758877.22736, 18484272.8757997, 0.731884735, 0.731884735, 364055652447.28534, 0.7253274015], ["ETH", 865675388.9247065, 616.6642866534, "ethereum", 53055352961.21584, 113526863.21730871, 0.1268354963, 0.8587202313, 70007962121.89664, 0.1394805792], ["XRP", 125602872.32016027, 0.6642120677, "ripple", 12588025364.86216, 45334295892.0, 0.0300932584, 0.8888134897, 30111586410.965424, 0.059992912], ["LINK", 175712200.03103828, 14.2654541803, "chainlink", 5041331272.175043, 394509556.43444437, 0.0120519367, 0.9008654263, 56278

In [26]:
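async def computeUpdatedPrices(indexData: IndexData, beginIsoInstant: str, endIsoInstant: str):
    """Return the latest average USDT price of each current constituent, indexed by baseAsset."""
    baseAssetSet = set(indexData.indexConstituents['baseAsset'])
    dfNewPrices = await getMedianUSDTTradeVol(baseAssetSet, beginIsoInstant, endIsoInstant)
    # Copy the filtered frame so the in-place operations below do not act on a slice of the original.
    dfNewPrices = dfNewPrices[dfNewPrices['baseAsset'].isin(baseAssetSet)].copy()
    dfNewPrices.set_index('baseAsset', inplace=True)
    # Only the price is needed here, so the volume column is dropped.
    dfNewPrices.drop(columns=['volInUSDT'], inplace=True)
    dfNewPrices.rename(columns={'lastAvgPriceUSDT':'endAvgPriceUSDT'}, inplace=True)
    return dfNewPrices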

In [27]:
import copy
async def lastIndexDataNoRebalance(beginIsoInstant: str, endIsoInstant: str, lastPoint: IndexData) -> IndexData:
    """Value the current constituents at the latest prices, keeping the divisor unchanged."""
    dfNewPrices = await computeUpdatedPrices(lastPoint, beginIsoInstant, endIsoInstant)
    dfLastCircSupplies = lastPoint.indexConstituents.copy()
    dfLastCircSupplies.set_index('baseAsset', inplace=True)
    # Drop the price column left over from the previous day, if any.
    if 'endAvgPriceUSDT' in dfLastCircSupplies.columns:
        dfLastCircSupplies.drop(columns=['endAvgPriceUSDT'], inplace=True)
    # Right merge: keep only the constituents that have a new price for this window.
    dfNew = pd.merge(dfLastCircSupplies, dfNewPrices, how='right', on='baseAsset')
    dfNew.fillna(value=0, inplace=True)
    # A constituent with no new price is dropped, so the row count is checked against dfNewPrices, not dfLastCircSupplies.
    assert(dfNew.shape[0] == dfNewPrices.shape[0])
    del dfNewPrices
    del dfLastCircSupplies
    dfNew.reset_index(inplace=True)
    dfNew.sort_values(['volInUSDT'], ascending=[False], ignore_index=True, inplace=True)
    dfNew['instantDummyMarketCap'] = dfNew['lastCircSupply']*dfNew['endAvgPriceUSDT']
    indexAtEnd = dfNew['instantDummyMarketCap'].sum() / lastPoint.divisor
    endIndex = copy.copy(lastPoint)
    endIndex.indexValue = indexAtEnd
    endIndex.isoInstant = endIsoInstant
    endIndex.isRebalanceDay = False
    dfNew.drop(columns=['instantDummyMarketCap'], inplace=True)
    endIndex.indexConstituents = dfNew
    return endIndex
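
On a day without rebalancing, the index value is just the sum of circulating supply times the new price over the constituents, divided by the unchanged divisor. Here is a small sketch of that step with plain dictionaries instead of DataFrames. The function name `index_value_no_rebalance` and all the numbers are hypothetical.

```python
def index_value_no_rebalance(circ_supply, end_prices, divisor):
    """Sum supply * price over the assets that have a new price, then divide by the divisor."""
    # Assets without a new price are absent from end_prices and so add nothing,
    # similar to what the right merge does in the notebook.
    total = sum(circ_supply.get(asset, 0.0) * price
                for asset, price in end_prices.items())
    return total / divisor

# Hypothetical supplies and prices; 'CCC' has no new price in this window.
toy_supply = {'AAA': 100.0, 'BBB': 50.0, 'CCC': 10.0}
toy_prices = {'AAA': 2.0, 'BBB': 4.0}

value = index_value_no_rebalance(toy_supply, toy_prices, divisor=0.5)
print(value)
```

Note that a constituent that drops out of the price data lowers the index value for that day, since its market cap is no longer counted while the divisor stays the same.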

In [28]:
async def lastIndexDataRebalance(beginIsoInstant: str, endIsoInstant:str, lastPoint: IndexData, cacheConn=None, whitelistFunc=None, getBlacklistFunc=partial(getBlacklistSet, 'blacklist.txt')) -> IndexData:
    updatedPricesOldConstituents = await computeUpdatedPrices(lastPoint, beginIsoInstant, endIsoInstant)
    dfLastCircSupplies = lastPoint.indexConstituents.copy()
    dfLastCircSupplies.set_index('baseAsset', inplace=True)
    if('endAvgPriceUSDT' in dfLastCircSupplies.columns):
        dfLastCircSupplies.drop(columns=['endAvgPriceUSDT'], inplace=True)
    dfNew = pd.concat([dfLastCircSupplies, updatedPricesOldConstituents], axis=1, join='inner')
    assert(dfNew.shape[0] == dfLastCircSupplies.shape[0])
    assert(dfNew.shape[0] == updatedPricesOldConstituents.shape[0])
    del updatedPricesOldConstituents
    del dfLastCircSupplies
    dfNew.reset_index(inplace=True)
    dfNew['instantDummyMarketCap'] = dfNew['lastCircSupply']*dfNew['endAvgPriceUSDT']
    oldConstituentsNewMarketcap = dfNew['instantDummyMarketCap'].sum()
    
    eligibleAssets = await eligibleAssetsAndMedianVolumes(lastPoint.lastRebalanceISOInstant, endIsoInstant, getBlacklistFunc, whitelistFunc)
    indexConstituents = await selectIndexConstituents(lastPoint.lastRebalanceISOInstant, endIsoInstant, eligibleAssets, cacheConn)
    newConstituentsMarketcap = indexConstituents['lastMarketcap'].sum()
    
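    # Scale the divisor by the ratio of new to old basket value, so the index level is continuous across the rebalance.
    newDivisor = (newConstituentsMarketcap/oldConstituentsNewMarketcap)*lastPoint.divisor
    newValue = newConstituentsMarketcap/newDivisor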

    endIndex = copy.copy(lastPoint)
    endIndex.indexValue = newValue
    endIndex.isoInstant = endIsoInstant
    endIndex.isRebalanceDay = True
    endIndex.indexConstituents = indexConstituents
    endIndex.divisor = newDivisor
    endIndex.lastRebalanceISOInstant = endIsoInstant
    return endIndex
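
The divisor update in `lastIndexDataRebalance` is chosen so that swapping the basket does not move the index by itself: the new basket divided by the new divisor equals the old basket divided by the old divisor. A quick check of this with made-up numbers (the helper `rebalance_divisor` is only for illustration):

```python
def rebalance_divisor(old_divisor, old_basket_cap, new_basket_cap):
    """Same formula as newDivisor in the notebook, on plain floats."""
    return (new_basket_cap / old_basket_cap) * old_divisor

# Hypothetical values: both baskets are valued at the same (current) prices.
old_divisor = 2.0
old_basket_cap = 2400.0
new_basket_cap = 3000.0

value_before = old_basket_cap / old_divisor
new_divisor = rebalance_divisor(old_divisor, old_basket_cap, new_basket_cap)
value_after = new_basket_cap / new_divisor
print(value_before, new_divisor, value_after)
```

Both values come out equal, so any change in the index on a rebalance day comes from price moves of the old constituents, not from the change in constituents.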

In [48]:
import datetime
computeDay = parse(firstIndexData.isoInstant) + datetime.timedelta(days = 1)
endDate = parse("2022-06-30T00:00:00.000000Z") + datetime.timedelta(days = 1)
lastIndex = firstIndexData
fileName = f"{firstIndexData.indexId}_{firstIndexData.isoInstant[:10]}_{str(endDate)[:10]}.json"
with open(fileName, 'w') as f:
    print("%s: %f"%(lastIndex.isoInstant, lastIndex.indexValue))
    f.write('[\n')
    f.write(lastIndex.toJSON())
    while computeDay <= endDate:
        if(computeDay.day == 1):
            lastIndex = await lastIndexDataRebalance(datetime.datetime.strftime(computeDay - datetime.timedelta(days = 1), "%Y-%m-%dT%H:%M:%S.%fZ"), datetime.datetime.strftime(computeDay, "%Y-%m-%dT%H:%M:%S.%fZ"), lastIndex, cacheConn, None, partial(getBlacklistSet, 'blacklist.txt'))
        else:
            lastIndex = await lastIndexDataNoRebalance(datetime.datetime.strftime(computeDay - datetime.timedelta(days = 1), "%Y-%m-%dT%H:%M:%S.%fZ"), datetime.datetime.strftime(computeDay, "%Y-%m-%dT%H:%M:%S.%fZ"), lastIndex)
        print("%s: %f"%(lastIndex.isoInstant, lastIndex.indexValue))
        f.write(',\n')
        f.write(lastIndex.toJSON())
        computeDay = computeDay + datetime.timedelta(days = 1)
    f.write('\n]\n')

2020-12-01T00:00:00.000000Z: 1000.000000
2020-12-02T00:00:00.000000Z: 949.040000
2020-12-03T00:00:00.000000Z: 971.890000
2020-12-04T00:00:00.000000Z: 984.790000
2020-12-05T00:00:00.000000Z: 932.790000
2020-12-06T00:00:00.000000Z: 963.710000
2020-12-07T00:00:00.000000Z: 975.780000
2020-12-08T00:00:00.000000Z: 964.340000
2020-12-09T00:00:00.000000Z: 916.170000
2020-12-10T00:00:00.000000Z: 931.640000
2020-12-11T00:00:00.000000Z: 915.200000
2020-12-12T00:00:00.000000Z: 900.560000
2020-12-13T00:00:00.000000Z: 932.960000
2020-12-14T00:00:00.000000Z: 955.370000
2020-12-15T00:00:00.000000Z: 957.050000
2020-12-16T00:00:00.000000Z: 960.870000
2020-12-17T00:00:00.000000Z: 1056.540000
2020-12-18T00:00:00.000000Z: 1112.800000
2020-12-19T00:00:00.000000Z: 1128.720000
2020-12-20T00:00:00.000000Z: 1157.470000
2020-12-21T00:00:00.000000Z: 1137.750000
2020-12-22T00:00:00.000000Z: 1094.240000
2020-12-23T00:00:00.000000Z: 1138.330000
2020-12-24T00:00:00.000000Z: 1079.940000
2020-12-25T00:00:00.000000Z: 11
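
The loop above builds the same pair of ISO instants on every iteration and rebalances when the day of the month is 1. As a sketch only, the date handling could be pulled into a small generator like the one below. `daily_windows` and `ISO_FMT` are hypothetical names, and the notebook itself does not use them.

```python
import datetime

ISO_FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

def daily_windows(first_day, last_day):
    """Yield (beginIso, endIso, isRebalanceDay) for each day from first_day to last_day, inclusive."""
    one_day = datetime.timedelta(days=1)
    day = first_day
    while day <= last_day:
        # Each window covers the 24 hours ending at `day`; rebalance on the 1st of the month.
        yield ((day - one_day).strftime(ISO_FMT), day.strftime(ISO_FMT), day.day == 1)
        day += one_day

windows = list(daily_windows(datetime.datetime(2020, 12, 31), datetime.datetime(2021, 1, 2)))
for begin, end, rebalance in windows:
    print(begin, end, rebalance)
```

With such a helper, the body of the loop would only need to pick `lastIndexDataRebalance` or `lastIndexDataNoRebalance` based on the third element.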

### Now computing the ONN-Alt index

In [42]:
import datetime
firstAltIndexData = await startIndex('2020-11-01T00:00:00.000000Z', '2020-12-01T00:00:00.000000Z', 1000, 'ONN-Alt-B-30dW-30dR', cacheConn, None, partial(getBlacklistSet, 'blacklist-alt.txt'))
computeDay = parse(firstAltIndexData.isoInstant) + datetime.timedelta(days = 1)
endDate = parse("2022-06-30T00:00:00.000000Z") + datetime.timedelta(days = 1)
lastIndex = firstAltIndexData
fileName = f"{firstAltIndexData.indexId}_{firstAltIndexData.isoInstant[:10]}_{str(endDate)[:10]}.json"
with open(fileName, 'w') as f:
    print("%s: %f"%(lastIndex.isoInstant, lastIndex.indexValue))
    f.write('[\n')
    f.write(lastIndex.toJSON())
    while computeDay <= endDate:
        if(computeDay.day == 1):
            lastIndex = await lastIndexDataRebalance(datetime.datetime.strftime(computeDay - datetime.timedelta(days = 1), "%Y-%m-%dT%H:%M:%S.%fZ"), datetime.datetime.strftime(computeDay, "%Y-%m-%dT%H:%M:%S.%fZ"), lastIndex, cacheConn, None, partial(getBlacklistSet, 'blacklist-alt.txt'))
        else:
            lastIndex = await lastIndexDataNoRebalance(datetime.datetime.strftime(computeDay - datetime.timedelta(days = 1), "%Y-%m-%dT%H:%M:%S.%fZ"), datetime.datetime.strftime(computeDay, "%Y-%m-%dT%H:%M:%S.%fZ"), lastIndex)
        print("%s: %f"%(lastIndex.isoInstant, lastIndex.indexValue))
        f.write(',\n')
        f.write(lastIndex.toJSON())
        computeDay = computeDay + datetime.timedelta(days = 1)
    f.write('\n]\n')

ethereum
chainlink
yearn-finance
litecoin
ripple
binancecoin
bitcoin-cash
uniswap
cardano
eos
polkadot
yfii-finance
aave
tron
swipe
sushi
vechain
curve-dao-token
monero
zcash
filecoin
tezos
ethereum-classic
neo
cosmos
stellar
reserve-rights-token
waves
dash
omisego
band-protocol
near
alpha-finance
havven
republic-protocol
civic
tellor
ocean-protocol
kava
theta-token
ontology
qtum
thorchain
algorand
solana
zilliqa
compound-governance-token
matic-network
dia-data
ankr
basic-attention-token
iostoken
bittorrent-old
iota
serum
2020-12-01T00:00:00.000000Z: 1000.000000
2020-12-02T00:00:00.000000Z: 938.540000
2020-12-03T00:00:00.000000Z: 964.350000
2020-12-04T00:00:00.000000Z: 981.630000
2020-12-05T00:00:00.000000Z: 896.070000
2020-12-06T00:00:00.000000Z: 942.470000
2020-12-07T00:00:00.000000Z: 955.830000
2020-12-08T00:00:00.000000Z: 940.690000
2020-12-09T00:00:00.000000Z: 878.010000
2020-12-10T00:00:00.000000Z: 906.050000
2020-12-11T00:00:00.000000Z: 882.580000
2020-12-12T00:00:00.000000Z: 85

### Now computing the ONN-Vapour index

In [45]:
import datetime
firstVapIndexData = await startIndex('2020-11-01T00:00:00.000000Z', '2020-12-01T00:00:00.000000Z', 1000, 'ONN-Vap-B-30dW-30dR', cacheConn, None, partial(getBlacklistSet, 'blacklist-vapour.txt'))
computeDay = parse(firstVapIndexData.isoInstant) + datetime.timedelta(days = 1)
endDate = parse("2022-06-30T00:00:00.000000Z") + datetime.timedelta(days = 1)
lastIndex = firstVapIndexData
fileName = f"{firstVapIndexData.indexId}_{firstVapIndexData.isoInstant[:10]}_{str(endDate)[:10]}.json"
with open(fileName, 'w') as f:
    print("%s: %f"%(lastIndex.isoInstant, lastIndex.indexValue))
    f.write('[\n')
    f.write(lastIndex.toJSON())
    while computeDay <= endDate:
        if(computeDay.day == 1):
            lastIndex = await lastIndexDataRebalance(datetime.datetime.strftime(computeDay - datetime.timedelta(days = 1), "%Y-%m-%dT%H:%M:%S.%fZ"), datetime.datetime.strftime(computeDay, "%Y-%m-%dT%H:%M:%S.%fZ"), lastIndex, cacheConn, None, partial(getBlacklistSet, 'blacklist-vapour.txt'))
        else:
            lastIndex = await lastIndexDataNoRebalance(datetime.datetime.strftime(computeDay - datetime.timedelta(days = 1), "%Y-%m-%dT%H:%M:%S.%fZ"), datetime.datetime.strftime(computeDay, "%Y-%m-%dT%H:%M:%S.%fZ"), lastIndex)
        print("%s: %f"%(lastIndex.isoInstant, lastIndex.indexValue))
        f.write(',\n')
        f.write(lastIndex.toJSON())
        computeDay = computeDay + datetime.timedelta(days = 1)
    f.write('\n]\n')

55    BZRX
94    EASY
Name: baseAsset, dtype: object
chainlink
yearn-finance
litecoin
ripple
binancecoin
bitcoin-cash
uniswap
cardano
eos
polkadot
yfii-finance
aave
tron
swipe
sushi
vechain
curve-dao-token
monero
zcash
filecoin
tezos
ethereum-classic
neo
cosmos
stellar
reserve-rights-token
waves
dash
omisego
band-protocol
near
alpha-finance
havven
republic-protocol
civic
tellor
ocean-protocol
kava
theta-token
ontology
qtum
thorchain
algorand
solana
zilliqa
compound-governance-token
matic-network
dia-data
ankr
basic-attention-token
iostoken
bittorrent-old
iota
serum
venus
district0x
certik
kyber-network-crystal
fantom
nem
elrond-erd-2
kusama
coti
loopring
0x
balancer
flamingo-finance
injective-protocol
bella-protocol
icon
fetch-ai
dogecoin
lto-network
tomochain
arpa-chain
decentraland
avalanche-2
enjincoin
iexec-rlc
maker
harmony
the-sandbox
origin-protocol
aragon
wing-finance
blockstack
wrapped-nxm
storj
ravencoin
audius
terra-luna
digibyte
bluzelle
bancor
utrust
orchid-protocol
hedera

### Testing categories

In [29]:
categories = ['ethereum-ecosystem', 
              'binance-smart-chain', 
              'polygon-ecosystem', 
              'avalanche-ecosystem', 
              'moonriver-ecosystem', 
              'centralized-exchange-token-cex',
              'decentralized-finance-defi',
              'non-fungible-tokens-nft',
              'meme-token',
              'gaming',
              'Other'
             ]

In [32]:
categories = ['smart-contract-platform',
              'centralized-exchange-token-cex',
              'decentralized-finance-defi',
              'non-fungible-tokens-nft',
              'meme-token',
              'gaming',
              'Other']

In [33]:
async def getCategoryInfo(categories):
    """Fetch CoinGecko market data per category, keeping coins with a market cap of at least 1,000,000 USD."""
    result = {}
    perPage = 250
    minMarketCap = 1000000
    for cat in categories:
        page = 1
        done = False
        result[cat] = {}
        result[cat]['totalMarketCap'] = 0
        result[cat]['tickers'] = {}
        print(f"Getting info for {cat}")
        while not done:
            query_params = {'vs_currency': 'usd', 'category': cat, 'order': 'market_cap_desc', 'per_page': perPage, 'page': page, 'sparkline': 'false'}
            print(f"Page {page}. Pausing...")
            # Throttle the requests to the API.
            await asyncio.sleep(4)
            async with httpx.AsyncClient() as client:
                r = await client.get(geckoBaseURL + "/coins/markets", params=query_params, timeout=30)
            catInfo = r.json()
            if len(catInfo) == 0:
                # An empty page means there is nothing left in this category.
                done = True
            else:
                for pt in catInfo:
                    marketCap = None
                    try:
                        if 'market_cap' in pt and pt['market_cap'] is not None:
                            marketCap = float(pt['market_cap'])
                    except ValueError:
                        marketCap = None
                    # Results are ordered by market cap, so a coin below the threshold makes this the last page.
                    if not marketCap or marketCap < minMarketCap:
                        done = True
                    else:
                        result[cat]['tickers'][pt['id']] = pt
                        result[cat]['totalMarketCap'] += pt['market_cap']
                page += 1
    return result

In [34]:
catInfo = await getCategoryInfo(categories)

Getting info for smart-contract-platform
Page 1. Pausing...
Page 2. Pausing...
Page 3. Pausing...
Getting info for centralized-exchange-token-cex
Page 1. Pausing...
Getting info for decentralized-finance-defi
Page 1. Pausing...
Page 2. Pausing...
Page 3. Pausing...
Page 4. Pausing...
Page 5. Pausing...
Page 6. Pausing...
Getting info for non-fungible-tokens-nft
Page 1. Pausing...
Page 2. Pausing...
Page 3. Pausing...
Page 4. Pausing...
Getting info for meme-token
Page 1. Pausing...
Getting info for gaming
Page 1. Pausing...
Page 2. Pausing...
Page 3. Pausing...
Getting info for Other
Page 1. Pausing...


In [35]:
def getDominantCategoryFor(geckoId):
    """Return the category in which the asset holds the largest share of the category's total market cap."""
    maxParticipation = 0.0
    result = 'Other'
    for catName in catInfo:
        # Each category has its own comma-separated blacklist file; strip whitespace so a trailing newline cannot hide an entry.
        with open(f"blacklist-{catName}.txt", 'r') as f:
            blacklist = {s.strip() for s in f.read().split(',')}
        cat = catInfo[catName]
        if geckoId in cat['tickers']:
            assetSymbol = cat['tickers'][geckoId]['symbol'].lower()
            if assetSymbol not in blacklist:
                thisParticipation = cat['tickers'][geckoId]['market_cap']/cat['totalMarketCap']
                if thisParticipation > maxParticipation:
                    maxParticipation = thisParticipation
                    result = catName
    return result

In [36]:
getDominantCategoryFor('matic-network')

'smart-contract-platform'
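
To make the selection rule easier to follow: an asset that appears in several categories is assigned to the one where its market cap is the largest fraction of that category's total. Below is a self-contained sketch of the same rule on toy data. The function `dominant_category`, the `blacklists` argument and the numbers are assumptions for illustration; the notebook reads the blacklists from files instead.

```python
def dominant_category(gecko_id, cat_info, blacklists):
    """Pick the category where the asset has the largest share of total market cap."""
    best_share = 0.0
    best = 'Other'  # fallback when the asset is in no category
    for name, cat in cat_info.items():
        ticker = cat['tickers'].get(gecko_id)
        if ticker is None or ticker['symbol'].lower() in blacklists.get(name, set()):
            continue
        share = ticker['market_cap'] / cat['totalMarketCap']
        if share > best_share:
            best_share, best = share, name
    return best

# Hypothetical data: 'foo' is 10% of 'gaming' but 50% of 'meme-token'.
toy_info = {
    'gaming': {'totalMarketCap': 1000.0,
               'tickers': {'foo': {'symbol': 'FOO', 'market_cap': 100.0}}},
    'meme-token': {'totalMarketCap': 200.0,
                   'tickers': {'foo': {'symbol': 'FOO', 'market_cap': 100.0}}},
}

print(dominant_category('foo', toy_info, {}))
print(dominant_category('foo', toy_info, {'meme-token': {'foo'}}))
print(dominant_category('bar', toy_info, {}))
```

One consequence of this rule is that small categories tend to win: the same market cap is a bigger share of a smaller total.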

In [37]:
categWhiteLists = {}
for baseAsset in baseAssets2GeckoDict:
    geckoId = baseAssets2GeckoDict[baseAsset]
    dominantCateg = getDominantCategoryFor(geckoId)
    if dominantCateg in categWhiteLists:
        categWhiteLists[dominantCateg].append(baseAsset.lower())
    else:
        categWhiteLists[dominantCateg] = [baseAsset.lower()]
categWhiteLists['Other'] += ['luna','any','eps','npxs','btt','nano','nu','keep','gxs']
categWhiteLists

{'Other': ['busd',
  'usdt',
  'btc',
  'usdc',
  'beth',
  'wbtc',
  'xrp',
  'op',
  'tusd',
  'atom',
  'usdp',
  'fil',
  'ltc',
  'dai',
  'bch',
  'bidr',
  'xmr',
  'zec',
  'paxg',
  'lunc',
  'jasmy',
  'bat',
  'people',
  'dash',
  'twt',
  'storj',
  'lever',
  'celo',
  'burger',
  'klay',
  'rvn',
  'drep',
  'pundix',
  'celr',
  'ar',
  'btcst',
  'sxp',
  'ustc',
  'gtc',
  'mask',
  'glmr',
  'ocean',
  'ctk',
  'skl',
  'iotx',
  'hnt',
  'multi',
  'lit',
  'ata',
  'mtl',
  'win',
  'arpa',
  'iost',
  'dar',
  'sys',
  'qnt',
  'xec',
  'psg',
  'sfp',
  'rif',
  'mina',
  'dent',
  'ksm',
  'ankr',
  'rlc',
  'lpt',
  'ethdown',
  't',
  'vgx',
  'zen',
  'alpine',
  'utk',
  'firo',
  'mith',
  'pond',
  'bico',
  'nkn',
  'juv',
  'voxel',
  'xem',
  'bsw',
  'mob',
  'hive',
  'bnx',
  'ant',
  'clv',
  'porto',
  'acm',
  'sun',
  'reef',
  'beta',
  'aca',
  'cvc',
  'oxt',
  'high',
  'ooki',
  'idrt',
  'ethup',
  'forth',
  'nexo',
  'wtc',
  'ach',
  'sa

In [38]:
def categWhiteList(categ):
    return set(categWhiteLists[categ])

In [39]:
import datetime
for categ in categories:
    whiteListFunc = partial(categWhiteList, categ)
    print(f"Doing index for category {categ}")
    firstCategIndexData = await startIndex('2020-11-01T00:00:00.000000Z', '2020-12-01T00:00:00.000000Z', 1000, f"ONN-({categ})-B-30dW-30dR", cacheConn, whiteListFunc, partial(getBlacklistSet, f"blacklist-{categ}.txt"))
    computeDay = parse(firstCategIndexData.isoInstant) + datetime.timedelta(days = 1)
    endDate = parse("2022-06-30T00:00:00.000000Z") + datetime.timedelta(days = 1)
    lastIndex = firstCategIndexData
    fileName = f"{firstCategIndexData.indexId}_{firstCategIndexData.isoInstant[:10]}_{str(endDate)[:10]}.json"
    with open(fileName, 'w') as f:
        print("%s: %f"%(lastIndex.isoInstant, lastIndex.indexValue))
        f.write('[\n')
        f.write(lastIndex.toJSON())
        while computeDay <= endDate:
            if(computeDay.day == 1):
                lastIndex = await lastIndexDataRebalance(datetime.datetime.strftime(computeDay - datetime.timedelta(days = 1), "%Y-%m-%dT%H:%M:%S.%fZ"), datetime.datetime.strftime(computeDay, "%Y-%m-%dT%H:%M:%S.%fZ"), lastIndex, cacheConn, whiteListFunc, partial(getBlacklistSet, f"blacklist-{categ}.txt"))
            else:
                lastIndex = await lastIndexDataNoRebalance(datetime.datetime.strftime(computeDay - datetime.timedelta(days = 1), "%Y-%m-%dT%H:%M:%S.%fZ"), datetime.datetime.strftime(computeDay, "%Y-%m-%dT%H:%M:%S.%fZ"), lastIndex)
            print("%s: %f"%(lastIndex.isoInstant, lastIndex.indexValue))
            f.write(',\n')
            f.write(lastIndex.toJSON())
            computeDay = computeDay + datetime.timedelta(days = 1)
        f.write('\n]\n')

Doing index for category smart-contract-platform
blackListSet
{'pax', 'gbp', 'busd', 'frax', 'tusd', 'dai', 'usdd', 'mim', 'usdp', 'mimatic', 'ust', 'aud', 'eur', 'usdc', 'usdn', 'usdt'}
whitelist is present
ethereum
cardano
eos
polkadot
tron
vechain
tezos
ethereum-classic
neo
stellar
waves
omisego
near
ontology
qtum
algorand
solana
zilliqa
matic-network
iota
   baseAsset     volInUSDT  lastAvgPriceUSDT           geckoId  \
0        ETH  8.656754e+08        616.664287          ethereum   
1        DOT  3.833155e+07          5.376746          polkadot   
2        ADA  5.294544e+07          0.172060           cardano   
3        EOS  4.550082e+07          3.261507               eos   
4        TRX  3.500758e+07          0.032354              tron   
5        XLM  1.469151e+07          0.202894           stellar   
6        XTZ  1.653046e+07          2.491238             tezos   
7        NEO  1.498634e+07         18.576523               neo   
8        VET  2.411100e+07          0.015970