# A Brief Collection Analysis and Monte Carlo Projection for Top NFT Collections


## Importing the Libraries

In [None]:
import pandas as pd
import os
# os.getenv
from dotenv import load_dotenv
import hvplot.pandas
import requests
from utils import *

In [None]:
load_dotenv()

rarify_api_key = os.getenv("RARIFY_API_KEY")
display(type(rarify_api_key))

# Descriptive Data Analysis

# Part 1

## Fetching from the Rarify API
* We get the data for our NFT collections from the Rarify API
* We are targeting the collections data endpoint, which is the following: "https://api.rarify.tech/data/contracts/{network_id}:{contract_id}/insights/90d"
* We supply the network_id as the blockchain, which is Ethereum in our case
* As a first step, we target the Crypto Punks collection by supplying its contract_id to check that our authentication and fetch method work


In [None]:
network_id = "ethereum"
# Crypto Punks
contract_id = "b47e3cd837ddf8e4c57f05d70ab865de6e193bbb"

collections_baseurl = f"https://api.rarify.tech/data/contracts/{network_id}:{contract_id}/insights/90d"

# Use the following code to target a specific token in the collection
token_id = 9620
token_baseurl = f"https://api.rarify.tech/data/tokens/{network_id}:{contract_id}:{token_id}"



In [None]:
def fetch_rarify_data(url, key):
    """
    Base fetch for the collection data, using the authorization key stored in our environment
    variables and the url that we supply to the function.
    The url must contain a valid network_id and contract_id (and a token_id when targeting a single token).
    The function returns the sale history for the targeted collection, found under the 'history' key of the response.
    """
    sale_history_data = requests.get(
        url,
        headers={"Authorization": f"Bearer {key}"}
    ).json()
    return sale_history_data['included'][1]['attributes']['history']
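
The hard-coded `['included'][1]` index assumes the history always sits in the second included item. A slightly more defensive sketch, assuming only the response shape indexed above (an `included` list whose items may carry `attributes` with a `history` key), searches for the first item that has one. The helper name `extract_history` and the sample payload are hypothetical and are not part of the Rarify API.

```python
def extract_history(payload: dict) -> list:
    """Return the first 'history' list found under payload['included']."""
    for item in payload.get("included", []):
        history = item.get("attributes", {}).get("history")
        if history is not None:
            return history
    raise KeyError("no 'history' found in the response payload")


# Hypothetical payload that mimics the shape indexed by fetch_rarify_data
sample_payload = {
    "included": [
        {"attributes": {"name": "example"}},
        {"attributes": {"history": [{"time": "2022-01-01T00:00:00Z", "avg_price": "1"}]}},
    ]
}
history = extract_history(sample_payload)
```

Calling `raise_for_status()` on the `requests` response before `.json()` would also surface authentication errors before any indexing happens.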

## Example Data Object
* We instantiate the punks_return object as a fetch at our api endpoint
* We turn the return into a DataFrame
* We set the 'time' column to a datetime type object
* We set the index of our data to the 'time' column

In [None]:
punks_return = fetch_rarify_data(collections_baseurl, rarify_api_key)
punks_df = pd.DataFrame(punks_return)
punks_df['time'] = pd.to_datetime(punks_df['time'])  # infer_datetime_format is deprecated in pandas 2.x
punks_df = punks_df.set_index('time')

punks_df.head()

## Type Conversion
* Our numeric data is returned as strings so we must process it
* We use a dict to convert the types of each numeric column to a float type using the df.astype() method

In [None]:
convert_dict = {'avg_price': float,
                'max_price': float,
                'min_price': float,
                'trades': float,
                'unique_buyers': float,
                'volume': float,
               }  
  
punks_df = punks_df.astype(convert_dict)  
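
As an alternative sketch, `pd.to_numeric` can do the same conversion while tolerating values that cannot be parsed. The toy frame below is made up for illustration; whether the API ever returns unparseable values is an assumption.

```python
import pandas as pd

# Toy frame with numeric values stored as strings, like the API response
toy = pd.DataFrame({"avg_price": ["1.5", "2.0", "n/a"], "trades": ["3", "4", "5"]})

# errors="coerce" turns unparseable strings into NaN instead of raising
converted = toy.apply(pd.to_numeric, errors="coerce")
```

With `astype(float)`, the `"n/a"` entry would raise a `ValueError`; here it becomes `NaN` and can be handled later with `dropna()` or `fillna()`.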

## Optional Factoring
* We multiply the price and volume columns, which are given to us in wei, by a factor of 10^-18 to convert them to ETH (1 ETH = 10^18 wei; gwei would use a factor of 10^-9)

In [None]:
punks_df[['avg_price', 'max_price', 'min_price', 'volume']] = punks_df[['avg_price', 'max_price', 'min_price', 'volume']] * 10**-18
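
The factor of 10^-18 corresponds to the wei-to-ETH conversion. A minimal sketch of the same arithmetic on a single value, with a hypothetical helper name `wei_to_eth` and a made-up amount:

```python
WEI_PER_ETH = 10**18  # 1 ETH = 10^18 wei (1 gwei = 10^9 wei)


def wei_to_eth(wei: float) -> float:
    """Convert an amount in wei to ETH."""
    return wei / WEI_PER_ETH


price_eth = wei_to_eth(65_500_000_000_000_000_000)  # 65.5 ETH expressed in wei
```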

In [None]:
punks_df.head()

## Preliminary Analysis


### Terms
* Collateral Discount Factor: The percentage by which the collateral's value is discounted to ensure a safe return for the lender should the borrower default on the loan. This value differs by asset type and is somewhat arbitrary, but it is based largely on expert appraisal (e.g., a car used as collateral may grant the borrower a loan of 50% of the car's appraised value, in which case the collateral discount factor is 50%).

* Collateral Coverage Ratio (CCR): The discounted value of the collateralized asset divided by the value of the loan that the borrower is seeking. A CCR above 1.0 indicates that the discounted collateral covers the value of the loan (e.g., John would like a loan of 10,000 and puts up his car, worth 25,000, as collateral. With a 50% collateral discount factor, the discounted value is 12,500 and the resulting CCR is 1.25. This would be a safe loan for the lender, who could cover costs, and profit, should the borrower default).
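
The CCR definition can be written as a small function. This is a sketch with a hypothetical name, `collateral_coverage_ratio`, applied to the car example from the definition above.

```python
def collateral_coverage_ratio(appraised_value, discount_factor, loan_amount):
    """CCR = discounted collateral value / loan amount."""
    discounted_value = appraised_value * (1 - discount_factor)
    return discounted_value / loan_amount


# John's car: appraised at 25,000, 50% discount factor, 10,000 loan
ccr = collateral_coverage_ratio(25_000, 0.50, 10_000)
```

A result above 1.0 means the discounted collateral is worth more than the loan.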


In [None]:
# Standard deviation for the minimum price of the Punks collection
punks_df['min_price'].std()

In [None]:
# Display the minimum, average and maximum price for the collection along time
punks_df[['min_price', 'avg_price', 'max_price']].hvplot()

In [None]:
# This plot is hard to read so we will just plot the average price along with the mean of the average price
punks_df['mean_avg'] = punks_df['avg_price'].mean()
punks_df[['avg_price', 'mean_avg']].hvplot()

In [None]:
# We take a look at the min_price
punks_df["mean_min"] = punks_df['min_price'].mean()
punks_df[['mean_min', 'min_price']].hvplot()


In [None]:
punks_df['min_price'].rolling(window=10).std().hvplot(title="min_price standard deviation rolling window=10 days")

Based on the plot, there appears to be little if any trend in the data over the previous 90 days. This may be a good signal, since it suggests that items from the collection could provide stable collateral. Given the stability and value of this collection, we would apply a relatively low collateral discount factor to it based on its past and projected performance. However, the standard deviation of the price is quite high, largely because of the illiquidity of NFTs and the relatively few sales that occur on a given day. NFTs in general should therefore be assigned a relatively high collateral discount factor compared to other asset classes.

# Part 2

# Analyzing a Series of Collections
* First, we gather a set of reputable collections from OpenSea along with their contract addresses
* I selected the following collections, but any number of collections would work for the analysis:
  * *Bored Ape Yacht Club*, *Crypto Punks*, *Clone X*, *Doodles*, *NeoTokyo*, and *Mfers*
* These are among the highest performers on OpenSea

In [None]:
# list of collection addresses: 
# bape: 0xBC4CA0EdA7647A8aB7C2061c2E118A18a936f13D
# punks: b47e3cd837ddf8e4c57f05d70ab865de6e193bbb
# clone x: 0x49cF6f5d44E70224e2E23fDcdd2C053F30aDA28B
# doodles: 0x8a90CAb2b38dba80c64b7734e58Ee1dB38B8992e
# neotokyo: 0xb668beB1Fa440F6cF2Da0399f8C28caB993Bdd65
# mfers: 0x79FCDEF22feeD20eDDacbB2587640e45491b757f

def get_collections_data(contract_ids: dict, rarify_api_key: str):
    """
    *The following function is quite messy and I will clean it up at a later time but it will work for now.*
    This function aggregates the data from a selection of NFT collections into a double-layered DataFrame which can be used to run a Monte Carlo simulation
    
    :param contract_ids: (type: dict) Houses the contract addresses and the collection names
    :param rarify_api_key: (type: str) Your authentication key from the rarify API
    
    The function iterates through the dictionary of addresses that you supply to it and makes an API call for each address.
    It then takes the relevant data and turns it into a DataFrame object.
    We then preprocess the data like we did before, formatting and setting the index as the 'time' column,
    and converting the string numbers to floats using the df.astype() method. We also convert the prices from wei to ETH using a factor of 10^-18.
    We then append the most recently constructed dataframe to the list that we instantiated at the top of the function
    
    :returns: A concatenation of all the DataFrames that are present in the DataFrame list that we constructed.


    *There is obviously a much more elegant way to conduct this process, so let me know if you have a cleaner way of doing this*

    """
    df_list = []
    network_id = "ethereum"
    convert_dict = {
                    'avg_price': float,
                    'max_price': float,
                    'min_price': float,
                    'trades': float,
                    'unique_buyers': float,
                    'volume': float,
                   }  
    for address in contract_ids.values():
        contract_id = address
        collections_baseurl = f"https://api.rarify.tech/data/contracts/{network_id}:{contract_id}/insights/90d"
        curr_df = pd.DataFrame(fetch_rarify_data(collections_baseurl, rarify_api_key))
        curr_df['time'] = pd.to_datetime(curr_df['time'])  # infer_datetime_format is deprecated in pandas 2.x
        curr_df = curr_df.set_index('time')
        curr_df = curr_df.astype(convert_dict)
        curr_df[['avg_price', 'max_price', 'min_price', 'volume']] = curr_df[['avg_price', 'max_price', 'min_price', 'volume']] * 10**-18
        df_list.append(curr_df)
    sum_df = pd.concat(df_list, axis=1, keys=contract_ids.keys())
    return sum_df
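
To make the "double-layered" result concrete, here is a minimal sketch of what `pd.concat(..., axis=1, keys=...)` produces, using two made-up frames in place of the API data. All values and the variable names `frames`, `combined`, `punks_avg`, and `all_avg` are hypothetical.

```python
import pandas as pd

idx = pd.date_range("2022-01-01", periods=3, freq="D")
frames = {
    "punks": pd.DataFrame({"avg_price": [60.0, 62.0, 61.0], "trades": [5.0, 3.0, 4.0]}, index=idx),
    "bape": pd.DataFrame({"avg_price": [90.0, 95.0, 93.0], "trades": [8.0, 6.0, 7.0]}, index=idx),
}

# keys= adds an outer column level named after each collection
combined = pd.concat(list(frames.values()), axis=1, keys=list(frames.keys()))

punks_avg = combined["punks"]["avg_price"]           # one collection, one metric
all_avg = combined.xs("avg_price", axis=1, level=1)  # one metric, all collections
```

The outer level holds the collection name and the inner level holds the metric, which is the layout the Monte Carlo step later expects.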



In [None]:
# I might use these as functions inside the main function at some point but I will have to restructure the framework
# So for now I will set these functions aside here
def set_time_index(df):
    df['time'] = pd.to_datetime(df['time'])  # infer_datetime_format is deprecated in pandas 2.x
    df = df.set_index('time')
    return df

def convert_str_int(df):
    convert_dict = {'avg_price': float,
                'max_price': float,
                'min_price': float,
                'trades': float,
                'unique_buyers': float,
                'volume': float,
               }  
    df = df.astype(convert_dict) 
    return df

In [None]:
# The collections that we will take a look at with their contract addresses
contract_ids = {
                "bape": "0xBC4CA0EdA7647A8aB7C2061c2E118A18a936f13D", 
                "punks": "b47e3cd837ddf8e4c57f05d70ab865de6e193bbb", 
                "clonex": "0x49cF6f5d44E70224e2E23fDcdd2C053F30aDA28B",
                "doodles": "0x8a90CAb2b38dba80c64b7734e58Ee1dB38B8992e",
                "neotokyo": "0xb668beB1Fa440F6cF2Da0399f8C28caB993Bdd65",
                "mfers": "0x79FCDEF22feeD20eDDacbB2587640e45491b757f",
}

# Store the resulting concatenated DataFrame in a sum_df object

sum_df = get_collections_data(contract_ids, rarify_api_key)

In [None]:
sum_df.head()

## More preprocessing
* To work with the data more easily, it helps to flatten the two-level column names
* We rename each column with its collection key as a prefix, e.g. "punks_avg_price"

In [None]:
cols = ["avg_price", "max_price", "min_price", "trades", "unique_buyers", "volume"]
new_cols = []
for key in contract_ids.keys():
    for c in cols:
        new_cols.append(f"{key}_{c}")
        
new_cols

In [None]:
"""
I create a new object of the sum_df with the new columns applied to it. 
I want to leave sum_df the way it is because I will use it for the Monte Carlo simulation later.
"""

In [None]:
collection_df = sum_df.copy()
collection_df.columns = new_cols
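
The nested loop that builds `new_cols` relies on every collection having the same six columns in the same order. As an alternative sketch, the flat names can be derived from the two-level column index itself. The toy frame below is made up for illustration.

```python
import pandas as pd

cols = pd.MultiIndex.from_product([["bape", "punks"], ["avg_price", "trades"]])
toy = pd.DataFrame([[90.0, 8.0, 60.0, 5.0]], columns=cols)

flat = toy.copy()
# Each MultiIndex column is a (collection, metric) tuple; join it into one name
flat.columns = ["_".join(pair) for pair in toy.columns]
```

Because the names come from the frame itself, they stay aligned with the data even if a collection is added or a column is reordered.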

In [None]:
collection_df.head()

The following is the rolling 30-day standard deviation of each collection's average price, normalized by the average price.

In [None]:
avg_cols = ["bape_avg_price", "clonex_avg_price", "punks_avg_price",
            "neotokyo_avg_price", "doodles_avg_price", "mfers_avg_price"]
rolling_30_std = collection_df[avg_cols].rolling(window=30).std() / collection_df[avg_cols]
rolling_30_std.describe()

We see that doodles and mfers have the highest normalized standard deviations and clonex has the lowest. If we were evaluating a loan based solely on standard deviation, we would apply the greatest collateral discount factor to mfers and doodles.
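
The normalization divides the rolling standard deviation by the price on the same day. A common variant, shown here only as a sketch on a made-up series, divides by the rolling mean instead, which gives a rolling coefficient of variation that is less sensitive to a single day's price.

```python
import pandas as pd

prices = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])
window = 3  # short window for the toy series; the notebook uses 30

rolling_std = prices.rolling(window=window).std()
normalized_by_price = rolling_std / prices  # same-day price as the denominator
normalized_by_mean = rolling_std / prices.rolling(window=window).mean()  # rolling coefficient of variation
```

The first `window - 1` entries are `NaN` in both versions because the window is not yet full.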

# A Monte Carlo Projection for our selected collections

### This projection takes the previous 90 days of data and predicts the next 30 days of returns if we held a basket of these NFTs

In [None]:
from MCForecastTools import MCSimulation

In [None]:
# simulation set to iterate 100 times over the next 30 trading days
# we leave the default weights which will be 1/6 per collection

# in the MCForecastTools.py file the 'close' column was changed to 'avg_price' to fit our data
sim = MCSimulation(sum_df, num_simulation=100, num_trading_days=30)
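
For readers without `MCForecastTools`, the general idea of such a projection can be sketched with NumPy. This is not the code inside `MCForecastTools`; it assumes independent, normally distributed daily returns per collection, and the function name, parameters, and input numbers are all hypothetical.

```python
import numpy as np


def simulate_cumulative_returns(mean_returns, std_returns, weights,
                                num_simulation=100, num_trading_days=30, seed=0):
    """Simulate cumulative portfolio returns from independent normal daily returns."""
    rng = np.random.default_rng(seed)
    mean_returns = np.asarray(mean_returns, dtype=float)
    std_returns = np.asarray(std_returns, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Shape: (days, simulations, assets)
    daily = rng.normal(mean_returns, std_returns,
                       size=(num_trading_days, num_simulation, len(weights)))
    portfolio_daily = daily @ weights  # weighted daily return per simulation
    cumulative = np.cumprod(1 + portfolio_daily, axis=0)
    # Prepend a row of ones so every path starts at 1.0
    return np.vstack([np.ones((1, num_simulation)), cumulative])


paths = simulate_cumulative_returns(
    mean_returns=[0.01, 0.005], std_returns=[0.10, 0.08], weights=[0.5, 0.5]
)
```

Each column of `paths` is one simulated path of the basket's cumulative return, so summary statistics across columns describe the spread of possible outcomes.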

In [None]:
sim.portfolio_data.head()

In [None]:
display(f"bape: {sim.portfolio_data['bape']['daily_return'].mean()}")
display(f"punks: {sim.portfolio_data['punks']['daily_return'].mean()}")
display(f"clonex: {sim.portfolio_data['clonex']['daily_return'].mean()}")
display(f"doodles: {sim.portfolio_data['doodles']['daily_return'].mean()}")
display(f"neotokyo: {sim.portfolio_data['neotokyo']['daily_return'].mean()}")
display(f"mfers: {sim.portfolio_data['mfers']['daily_return'].mean()}")

We see that these collections have all performed strongly over the last 90 days, each with a positive average daily return.

In [None]:
cum_return = sim.calc_cumulative_return()

In [None]:
cum_return.hvplot()

In [None]:
cum_return

In [None]:
cum_return.to_csv('mc_cum_return.csv')

In [None]:
pd.read_csv('mc_cum_return.csv')

In [None]:
cum_return.describe()

In [None]:
cum_return.mean().mean()

Based on the forecasted returns, this basket of NFTs would be a good candidate for collateralization.

# Beta Analysis For NFT versus Basket

* Here we want to find the risk of each asset in the basket relative to the basket as a whole.
* For instance, we will compare Crypto Punks, and each of the other collections, to the basket of six NFT collections that we selected, using beta.


In [None]:
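# Remove a leftover column from an earlier run; errors="ignore" avoids a KeyError when it does not exist yet
collection_df = collection_df.drop(columns="bape_pct_chg", errors="ignore")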

In [None]:
def find_pct_change(df, contract_ids):
    """Add a '<collection>_pct_chg' column with the daily percent change of each collection's avg_price."""
    for name in contract_ids:
        df[f"{name}_pct_chg"] = df[f"{name}_avg_price"].pct_change()
    return df
            

In [None]:
find_pct_change(collection_df, contract_ids)

In [None]:
def basket_pct_chg(df, contract_ids):
    """Return only the '*_pct_chg' columns, dropping rows with missing values.

    contract_ids is unused and kept only so existing calls keep working.
    """
    pct_chg_cols = [col for col in df.columns if "pct_chg" in col]
    return df[pct_chg_cols].dropna()
            
        

In [None]:
basket_df = basket_pct_chg(collection_df, contract_ids)
basket_df["basket_pct_chg"] = basket_df[basket_df.columns].mean(axis=1)

basket_df

In [None]:

bape_beta = basket_df["bape_pct_chg"].cov(basket_df["basket_pct_chg"]) / basket_df["basket_pct_chg"].var()
punks_beta = basket_df["punks_pct_chg"].cov(basket_df["basket_pct_chg"]) / basket_df["basket_pct_chg"].var()
neo_beta = basket_df["neotokyo_pct_chg"].cov(basket_df["basket_pct_chg"]) / basket_df["basket_pct_chg"].var()
clonex_beta = basket_df["clonex_pct_chg"].cov(basket_df["basket_pct_chg"]) / basket_df["basket_pct_chg"].var()
doodles_beta = basket_df["doodles_pct_chg"].cov(basket_df["basket_pct_chg"]) / basket_df["basket_pct_chg"].var()
mfers_beta = basket_df["mfers_pct_chg"].cov(basket_df["basket_pct_chg"]) / basket_df["basket_pct_chg"].var()
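
The six beta lines repeat the same formula, Cov(asset, basket) / Var(basket). A compact sketch of the same calculation on made-up returns, with a hypothetical helper `beta` and hypothetical column names:

```python
import pandas as pd


def beta(asset_returns: pd.Series, basket_returns: pd.Series) -> float:
    """Beta = Cov(asset, basket) / Var(basket)."""
    return asset_returns.cov(basket_returns) / basket_returns.var()


toy = pd.DataFrame({
    "a_pct_chg": [0.02, -0.01, 0.03, 0.00],
    "b_pct_chg": [0.04, -0.02, 0.06, 0.00],
})
toy["basket_pct_chg"] = toy.mean(axis=1)

toy_betas = {col: beta(toy[col], toy["basket_pct_chg"]) for col in ["a_pct_chg", "b_pct_chg"]}
```

Because the basket is an equal-weighted average of its members, the betas average to 1, so a value below 1 marks a collection that moves less than the basket.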


In [None]:
# Keep the betas in the same order as contract_ids so that each value gets the correct label
betas_list = [bape_beta, punks_beta, clonex_beta, doodles_beta, neo_beta, mfers_beta]
betas = pd.DataFrame(betas_list, index=list(contract_ids))
betas.hvplot.bar()

## Conclusions

From the beta analysis data, we would be more inclined to use the collections with the lowest beta values as collateral because they indicate greater security against the market. In this case, Crypto Punks and Doodles serve as the best candidates for collateralization and borrowers would be rewarded with a potentially lower collateral discount factor.

# Epilogue
## Foreshadowing a Collateral Discount Curve Based on Beta Values

Let's say that we want to decide how large a collateral discount factor to apply to a token or collection based on its beta value. A higher beta should correspond to a higher discount factor: if an asset is riskier, we can only safely provide a smaller loan against it. For collateral assets the holy grail is high stability, and stable appreciation is even better (e.g., real estate). If a borrower defaults on a loan, it is reassuring to know that the asset used as collateral is worth at least as much as when we received it.


In [None]:
def discount_factor(betas):
    discount_factors = []
    for beta in betas:
        disc_factor = 1 - (1/(beta + 1.5)) + .1697
        discount_factors.append(disc_factor)
    return discount_factors
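
To see how the curve behaves, here is a vectorized sketch of the same formula evaluated at a few illustrative beta values. The function name `discount_factor_curve` and its `offset` and `shift` parameter names are hypothetical; the constants are the ones used in `discount_factor`.

```python
import numpy as np


def discount_factor_curve(betas, offset=0.1697, shift=1.5):
    """Vectorized form of 1 - 1 / (beta + shift) + offset."""
    betas = np.asarray(betas, dtype=float)
    return 1 - 1 / (betas + shift) + offset


curve = discount_factor_curve([0.5, 1.0, 2.0])
# beta = 0.5 gives 0.6697, beta = 1.0 gives 0.7697
```

The curve rises with beta, as intended, but it is undefined at beta = -1.5 and exceeds 1 once beta is above roughly 4.39, so inputs outside the usual range would need separate handling.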

In [None]:
discount_factors_list = discount_factor(betas_list)

In [None]:
discount_factors = pd.DataFrame(discount_factors_list, index=contract_ids.keys())
discount_factors.hvplot.bar()

In [None]:
discount_factors

## Let's Say
I'm a user with a Doodles NFT that is worth 15 ETH, and I am looking for a loan. Based on the discount curve, what kind of loan could I expect to receive for my NFT?

In [None]:
def find_loan_value(collection: str, value: float, discount_factors):
    loan_value = value - value * discount_factors[0][collection]
    return loan_value
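
The loan calculation is value × (1 − discount factor). A standalone sketch of that arithmetic follows, with a clip added as an assumption so that an out-of-range discount cannot produce a negative loan. The helper name and the discount values are hypothetical, not results from the notebook.

```python
def loan_value_from_discount(value: float, discount: float) -> float:
    """Loan = value * (1 - discount), with the discount clipped to [0, 1]."""
    clipped = min(max(discount, 0.0), 1.0)
    return value * (1 - clipped)


example_loan = loan_value_from_discount(15, 0.7697)  # hypothetical discount for beta = 1
capped_loan = loan_value_from_discount(15, 1.2)      # out-of-range discount gives a zero loan
```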

In [None]:
loan_value = find_loan_value("doodles", 15, discount_factors)

In [None]:
print(f"The loan calculator has determined that you are eligible to receive {loan_value:.2f} ETH for your NFT")