# Ethereum NFT Analysis: Quicker Entropy Calculation

In this notebook, I address the fact that calculating the entropies of the NFT collections takes too long.

The **objective** here is to calculate the same entropies in less time.

The original notebook is [here](https://www.kaggle.com/code/simiotic/ethereum-nft-analysis).

The original dataset is [here](https://www.kaggle.com/datasets/simiotic/ethereum-nfts).

My Kaggle notebook is [here](https://www.kaggle.com/code/sbrar0804/ethereum-nft-analysis-quicker-entropy-calc/notebook).

In [1]:
!pip install nfts

Collecting nfts
  Downloading nfts-0.0.2-py3-none-any.whl (16 kB)
Collecting web3
  Downloading web3-5.30.0-py3-none-any.whl (501 kB)
Collecting moonstreamdb
  Downloading moonstreamdb-0.3.2-py3-none-any.whl (7.4 kB)
Collecting humbug
  Downloading humbug-0.2.7-py3-none-any.whl (11 kB)
Collecting psycopg2-binary
  Downloading psycopg2_binary-2.9.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
Collecting eth-account<0.6.0,>=0.5.7
  Downloading eth_account-0.5.9-py3-none-any.whl (101 kB)
Collecting hexbytes<1.0.0,>=0.1.0
  Downloading hexbytes-0.3.0-py3-none-any.whl (6.4 kB)
Collecting eth-rlp<0.3
  Downloading eth_rlp-0.2.1-py3-none-any.whl (5.0 kB)
Collecting eth-hash[pycryptodome]<1.0.0,>=0.2.0
  Downloading eth_hash-0.5.0-py3-none-any.whl (8.9 kB)
Collecting 

In [2]:
import os
import sqlite3

import matplotlib.pyplot as plt
import nfts.dataset
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit
from scipy.special import zeta

In [3]:
os.listdir("/kaggle/input/ethereum-nfts")
DATASET_PATH = "/kaggle/input/ethereum-nfts/nfts.sqlite"
ds = nfts.dataset.FromSQLite(DATASET_PATH)

In [4]:
current_owners_df = ds.load_dataframe("current_owners")
current_owners_df.head()

Unnamed: 0,nft_address,token_id,owner
0,0x00000000000b7F8E8E8Ad148f9d53303Bfe20796,0,0xb776cAb26B9e6Be821842DC0cc0e8217489a4581
1,0x00000000000b7F8E8E8Ad148f9d53303Bfe20796,1,0x8A73024B39A4477a5Dc43fD6360e446851AD1D28
2,0x00000000000b7F8E8E8Ad148f9d53303Bfe20796,10,0x5e5C817E9264B46cBBB980198684Ad9d14f3e0B4
3,0x00000000000b7F8E8E8Ad148f9d53303Bfe20796,11,0x8376f63c13b99D3eedfA51ddd77Ff375279B3Ba0
4,0x00000000000b7F8E8E8Ad148f9d53303Bfe20796,12,0xb5e34552F32BA9226C987769BF6555a538510BA8


### The shapes of NFT collections

NFTs are released in collections, with a single contract accounting for multiple tokens.

Are there differences between ownership distributions of NFTs like the [Ethereum Name Service (ENS)](https://ens.domains/), which have utility beyond their artistic value, and those that do not currently have such use cases?

One way we can answer this question is to see how much information each NFT collection gives us about individual owners of tokens in that collection. We will do this by treating each collection as a probability distribution over owners of tokens from that collection. If the collection $C$ consists of $n$ tokens and an address $A$ owns $m$ of those tokens, we will assign that address a probability of $p_A = m/n$ in the collection's associated probability distribution. Then we will calculate the entropy:

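$$H(C) = - \sum_{A} p_A \log_2(p_A).$$
Here, the sum is over all addresses $A$ that own at least one token from $C$, and the logarithm is taken base 2 (matching the `np.log2` call in the code below), so $H(C)$ is measured in bits.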

$H(C)$ simultaneously contains information about:
1. How many tokens were issued as part of the collection $C$.
2. How evenly the tokens in $C$ are distributed over the addresses $A$ which own those tokens.
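Both effects can be seen on a few made-up ownership counts. The sketch below is illustrative only: the helper name `shannon_entropy` and the token counts are assumptions, not part of the original notebook or the dataset.

```python
import numpy as np


def shannon_entropy(token_counts):
    """Entropy in bits of an ownership distribution, given tokens held per owner."""
    counts = np.asarray(token_counts, dtype=float)
    p = counts / counts.sum()  # p_A = m / n for each owner A
    return float(-(p * np.log2(p)).sum())


# Hypothetical collections (toy numbers, not taken from the dataset).
even = shannon_entropy([1, 1, 1, 1])           # 4 tokens, 4 owners, evenly spread
larger_even = shannon_entropy([1] * 8)         # 8 tokens, 8 owners, evenly spread
concentrated = shannon_entropy([97, 1, 1, 1])  # 100 tokens, one dominant owner

print(even, larger_even, concentrated)
```

The evenly held collections give $\log_2 4 = 2$ and $\log_2 8 = 3$ bits, so entropy grows with the number of tokens when they are spread out, while the concentrated collection scores far lower despite having many more tokens.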


In [5]:
contract_owners_df = current_owners_df.groupby(["nft_address", "owner"], as_index=False).size().rename(columns={"size": "num_tokens"})
contract_owners_df.head()

Unnamed: 0,nft_address,owner,num_tokens
0,0x00000000000b7F8E8E8Ad148f9d53303Bfe20796,0x429a635eD4DaF9529C07d5406D466B349EC34361,3
1,0x00000000000b7F8E8E8Ad148f9d53303Bfe20796,0x5e5C817E9264B46cBBB980198684Ad9d14f3e0B4,5
2,0x00000000000b7F8E8E8Ad148f9d53303Bfe20796,0x8376f63c13b99D3eedfA51ddd77Ff375279B3Ba0,1
3,0x00000000000b7F8E8E8Ad148f9d53303Bfe20796,0x83D7Da9E572C5ad14caAe36771022C43AF084dbF,5
4,0x00000000000b7F8E8E8Ad148f9d53303Bfe20796,0x8A73024B39A4477a5Dc43fD6360e446851AD1D28,5


In [6]:
contract_owners_groups = contract_owners_df.groupby(["nft_address"])
entropies = {}

## ZOMGLINGS' way of calculating entropies

In [7]:
%%timeit
for contract_address, owners_group in contract_owners_groups:
    total_supply = owners_group["num_tokens"].sum()
    owners_group["p"] = owners_group["num_tokens"]/total_supply
    owners_group["log(p)"] = np.log2(owners_group["p"])
    owners_group["-plog(p)"] = (-1) * owners_group["p"] * owners_group["log(p)"]
    entropy = owners_group["-plog(p)"].sum()
    entropies[contract_address] = entropy

23.8 s ± 63.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


## My way of calculating entropies

In [8]:
# Function to calculate the entropy (in bits) of one contract's group of owners
def group_ops(group):
    total_supply = group["num_tokens"].sum()
    p = group["num_tokens"]/total_supply
    log_p = np.log2(p)
    plog_p = -p*log_p
    return plog_p.sum()

In [9]:
%%timeit
entropies2 = contract_owners_groups.apply(group_ops)

5.94 s ± 75.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


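Running the cell under `%%timeit` does not assign the entropies to `entropies2`. `%%timeit` executes the cell body inside a function that it generates for timing, so variables assigned there are local to that function and are discarded afterwards. The earlier loop kept its results only because it mutated the existing `entropies` dictionary rather than assigning a new variable.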
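The scoping behaviour of timed code can be reproduced with the standard-library `timeit` module, which wraps the timed statement in a generated function; as far as I know, the `%%timeit` magic works in a similar way. This is a minimal sketch, and the names `namespace`, `store` and `assigned` are made up for illustration.

```python
import timeit

# Dictionary standing in for the notebook's global variables.
namespace = {"store": {}}

# A plain assignment inside the timed statement becomes a local variable of
# the function that timeit generates, so it never reaches the namespace.
timeit.timeit("assigned = 42", globals=namespace, number=1)

# Mutating an object that already exists in the namespace is still visible.
timeit.timeit("store['key'] = 42", globals=namespace, number=1)

print("assigned" in namespace)  # False
print(namespace["store"])       # {'key': 42}
```

This mirrors the two timed cells: the loop fills the pre-existing `entropies` dictionary, whereas `entropies2 = ...` is a fresh assignment that is lost.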

To check whether the two approaches give the same values, I rerun the computation without `%%timeit` and compare the results.

In [10]:
entropies2 = contract_owners_groups.apply(group_ops)

In [11]:
(entropies2 == pd.Series(entropies)).all()

True
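Exact `==` comparison of floating-point values succeeds here, presumably because both methods perform the same operations in the same order. If a different method sums the terms in another order, a tolerance-based check may be safer. The sketch below uses made-up values and hypothetical names (`from_loop`, `from_apply`), not the notebook's results.

```python
import numpy as np
import pandas as pd

# Hypothetical results from two methods (toy values, not from the dataset).
from_loop = {"0xA": 2.0, "0xB": 0.1 + 0.2}           # 0.1 + 0.2 is not exactly 0.3
from_apply = pd.Series({"0xB": 0.3, "0xA": 2.0})     # same contracts, other order

loop_series = pd.Series(from_loop)
# Align on the contract address so that row order does not matter;
# comparing differently ordered Series with == would raise an error.
aligned = from_apply.reindex(loop_series.index)

exact_match = bool((aligned == loop_series).all())
close_match = bool(np.allclose(aligned, loop_series))
print(exact_match, close_match)  # False True
```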

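Hence, both methods produce the same entropies, while the `apply` version takes about a quarter of the time (5.94 s versus 23.8 s per run).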
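A further variant, which I have not benchmarked on this dataset, avoids calling a Python function such as `group_ops` once per contract by using `groupby(...).transform("sum")` and a second grouped sum. The sketch below assumes a frame shaped like `contract_owners_df`; the function name `entropies_vectorized` and the toy addresses are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in for contract_owners_df (hypothetical addresses and counts).
toy_df = pd.DataFrame({
    "nft_address": ["0xA", "0xA", "0xA", "0xA", "0xB", "0xB"],
    "owner": ["o1", "o2", "o3", "o4", "o1", "o2"],
    "num_tokens": [1, 1, 1, 1, 3, 1],
})


def entropies_vectorized(df):
    """Entropy in bits per contract, computed with column-wise operations."""
    # Total supply of each contract, broadcast back onto every owner row.
    total_supply = df.groupby("nft_address")["num_tokens"].transform("sum")
    p = df["num_tokens"] / total_supply
    neg_plogp = -p * np.log2(p)
    # Sum the per-owner terms within each contract.
    return neg_plogp.groupby(df["nft_address"]).sum()


result = entropies_vectorized(toy_df)
print(result)
```

Whether this is actually faster than `apply` on the full dataset would need to be measured with `%%timeit` in the same way as above.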