# Exploring Item-to-Item Recommendation for NFTs

## Item Collaborative Filtering (CF)

We score each pair of NFTs by the buyers they have in common. As a first step, we consider only the purchase transactions themselves and ignore everything else, such as price and date; those signals can be added later.

The algorithm can be summarized as follows. For items $i$ and $j$, let $N_{ij}$ denote the number of buyers they have in common, and let $N_i$ and $N_j$ denote the number of buyers of item $i$ and item $j$, respectively. The item CF score for items $i$ and $j$ is then

$$CF(i, j) = \frac{N_{ij}}{\sqrt{N_i} \cdot \sqrt{N_j}}$$

which is the cosine of the angle between two binary vectors: the user purchase vectors of item $i$ and item $j$. The score ranges from 0 (no common buyers) to 1 (identical buyer sets).
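To make the formula concrete, here is a minimal sketch on a toy purchase matrix. The matrix, the helper names `cf_score` and `cosine`, and the buyer and NFT counts are all made up for illustration; they are not part of the dataset used below.

```python
import numpy as np

# Toy 0/1 purchase matrix: rows are 4 hypothetical buyers, columns are 3 hypothetical NFTs.
purchases = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
])

def cf_score(matrix, i, j):
    """Item CF score N_ij / (sqrt(N_i) * sqrt(N_j)) for columns i and j."""
    n_ij = np.sum(matrix[:, i] * matrix[:, j])  # buyers who bought both items
    n_i = matrix[:, i].sum()                    # buyers of item i
    n_j = matrix[:, j].sum()                    # buyers of item j
    return n_ij / (np.sqrt(n_i) * np.sqrt(n_j))

def cosine(matrix, i, j):
    """Plain cosine similarity between columns i and j."""
    a, b = matrix[:, i], matrix[:, j]
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

score_01 = cf_score(purchases, 0, 1)  # N_01 = 2, N_0 = 3, N_1 = 2
score_02 = cf_score(purchases, 0, 2)  # N_02 = 1, N_0 = 3, N_2 = 2
```

For 0/1 vectors the squared norm of a column equals its sum, which is why the two helpers agree.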

In [1]:
import pandas as pd
import numpy as np
from scipy.sparse import csc_matrix
import multiprocess


In [2]:
# build the buyer-by-NFT purchase matrix (rows: buyers, columns: NFTs)
purchase = pd.read_csv('/Users/wenyunyang/dev/nft-recommendation/data_analysis/data/purchases.csv')

buyers = purchase['buyer'].unique()
nft_id = purchase['nft_id'].unique()
print('number of buyers', len(buyers))
print('number of nft', len(nft_id))

full_buyers_list = buyers
full_nft_list = nft_id

full_buyers_index = dict(zip(full_buyers_list, range(len(full_buyers_list))))
full_nft_index = dict(zip(full_nft_list, range(len(full_nft_list))))

n_buyer = len(full_buyers_list)
n_nft = len(full_nft_list)

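# Map every purchase to (buyer row, NFT column) indices without a Python-level loop.
rows = purchase['buyer'].map(full_buyers_index).to_numpy()
cols = purchase['nft_id'].map(full_nft_index).to_numpy()
vals = np.ones(len(purchase), dtype=np.int64)

# Note: csc_matrix sums repeated (buyer, NFT) pairs, so the matrix is binary
# only if each buyer appears at most once per NFT in purchases.csv.
full_u2i_matrix = csc_matrix((vals, (rows, cols)), shape=(n_buyer, n_nft))
# Column norms; for a binary matrix this is sqrt(N_i), the denominator term in CF(i, j).
full_u2i_norm = np.sqrt(full_u2i_matrix.sum(axis=0))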

number of buyers 686727
number of nft 5727563
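The sparse constructor adds up repeated (row, column) entries, so whether the matrix above is truly binary depends on whether a buyer can appear more than once for the same NFT in the data. The sketch below uses a hypothetical four-row purchase log and a hypothetical helper `build_matrix` to show the difference; dropping duplicates first is one possible remedy, not necessarily what was done for the results reported here.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix

# Hypothetical purchase log: buyer "a" bought NFT "x" twice.
toy = pd.DataFrame({
    'buyer':  ['a', 'a', 'b', 'b'],
    'nft_id': ['x', 'x', 'x', 'y'],
})

def build_matrix(df):
    """Build a buyer-by-NFT count matrix from a purchase DataFrame."""
    buyer_codes, buyer_labels = pd.factorize(df['buyer'])
    nft_codes, nft_labels = pd.factorize(df['nft_id'])
    values = np.ones(len(df), dtype=np.int64)
    return csc_matrix((values, (buyer_codes, nft_codes)),
                      shape=(len(buyer_labels), len(nft_labels)))

raw = build_matrix(toy)                                          # repeated pair is summed
deduped = build_matrix(toy.drop_duplicates(['buyer', 'nft_id']))  # strictly 0/1
```

With repeated pairs, the column sums count purchases rather than distinct buyers, so the norm is no longer $\sqrt{N_i}$.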


In [3]:
# filter out NFTs that have only one buyer

single_buyer_nft_index = np.where(full_u2i_norm == 1)[1]
print('number of single buyer nft = %d' % len(single_buyer_nft_index))

keep_nft_index = np.where(full_u2i_norm > 1)[1]
print('number of non-single buyer nft = %d' % len(keep_nft_index))
u2i_matrix = full_u2i_matrix[:, keep_nft_index]
print('u2i_matrix shape = ', u2i_matrix.shape)

buyers_list = full_buyers_list
nft_list = full_nft_list[keep_nft_index]

buyers_index = full_buyers_index
nft_index = dict(zip(nft_list, range(len(nft_list))))

u2i_norm = full_u2i_norm[0, keep_nft_index]

number of single buyer nft = 4069196
number of non-single buyer nft = 1658367
u2i_matrix shape =  (686727, 1658367)


In [4]:
# compute the top-k (default k=20) most similar NFTs for each NFT

def get_top_k(nft_id, k=20):
    i_vec = u2i_matrix[:, nft_index[nft_id]]

    ip = i_vec.multiply(u2i_matrix).sum(axis=0)

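    # divide by both norms, sqrt(N_i) * sqrt(N_j), to match the CF formula;
    # sqrt(N_i) is constant for a given NFT, so the ranking is unaffected
    score = ip / (u2i_norm * u2i_norm[0, nft_index[nft_id]])

    # sort descending; position 0 is assumed to be the query NFT itself and is
    # skipped below (an NFT with an identical buyer set ties and may take that slot)
    sorted_index = np.argsort(-score)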

    similar_tokens = [
        nft_list[idx]
        for idx in sorted_index[0, 1:k+1].tolist()[0]]
    
    top_scores = score[0, sorted_index[0, 1:k+1]]

    n_buyer = i_vec.sum()

    return similar_tokens, top_scores, n_buyer


with multiprocess.Pool(processes=32) as pool:
    results = pool.map(get_top_k, nft_list)
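For intuition, the same top-k ranking can be written as a single sparse matrix product on a toy matrix. This is a sketch on hypothetical data, not a replacement for the per-item loop above: at 1.6 million NFTs an all-pairs product would likely be too large to hold in memory without chunking. The names `toy`, `normalized`, and `similarity` are illustrative.

```python
import numpy as np
from scipy.sparse import csc_matrix, diags

# Toy buyer-by-item matrix: 4 hypothetical buyers, 3 hypothetical NFTs.
toy = csc_matrix(np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
], dtype=np.float64))

# Column norms sqrt(N_j); for a 0/1 matrix, N_j is the column sum.
norms = np.sqrt(np.asarray(toy.sum(axis=0)).ravel())

# Scale each column to unit length, then X^T X holds every pairwise cosine.
normalized = toy @ diags(1.0 / norms)
similarity = (normalized.T @ normalized).toarray()

# Exclude each item from its own neighbour list before ranking.
np.fill_diagonal(similarity, -np.inf)

k = 2
top_k = np.argsort(-similarity, axis=1)[:, :k]  # row i lists the k nearest items to i
```

Masking the diagonal explicitly avoids relying on the query item always sorting into the first position.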


In [9]:

with open('/Users/wenyunyang/dev/nft-recommendation/data_analysis/notebook/results/nft_i2i.csv', 'w') as fp:
    for i, nft_id in enumerate(nft_list):
        fp.write(','.join([
            nft_id,
            str(results[i][2]),   # number of buyers
            *results[i][0],       # top-k similar NFT
            *[str(e) for e in results[i][1].tolist()[0]],   # top-k scores
        ]) + '\n')
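Each row written above has the layout `nft_id, n_buyers, k similar IDs, k scores`. A minimal sketch of reading one such row back is shown here; `parse_i2i_row` and the sample row are hypothetical, and the parser assumes NFT IDs contain no commas and that the ID and score halves have equal length.

```python
def parse_i2i_row(line):
    """Split one row into (nft_id, n_buyers, [(similar_id, score), ...])."""
    fields = line.rstrip('\n').split(',')
    nft_id = fields[0]
    n_buyers = int(float(fields[1]))  # tolerate "3" as well as "3.0"
    rest = fields[2:]
    k = len(rest) // 2                # first half: similar IDs, second half: scores
    similar_ids = rest[:k]
    scores = [float(s) for s in rest[k:]]
    return nft_id, n_buyers, list(zip(similar_ids, scores))

# Hypothetical row with k=2, for illustration only.
sample = 'nft_a,3,nft_b,nft_c,0.8165,0.4082\n'
nft_id, n_buyers, neighbours = parse_i2i_row(sample)
```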
    

## Embedding Approach

We learn embeddings from user-item interactions.