# Extended Stochastic Block Model for Recommendations - Tutorial (draft 1.0)
## This is a preliminary tutorial on how to use the module "esbmb" on bipartite networks to infer the cluster assignments $z$ of users and items, as well as the block interactions $\Theta$.

In [4]:
from esbmb import esbmb

import pandas as pd
import numpy as np 
import random
from numpy import matlib
from scipy.special import betaln
from scipy.special import gammaln
import time
import matplotlib.pyplot as plt

## A closer look at the initialization
If no parameters are passed when instantiating an "esbmb" object, the module automatically places Dirichlet Process priors on the cluster assignments of users and items, and the hyperparameters a and b are both set to 1.
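
For intuition about the default prior, the following minimal sketch draws cluster assignments from a Dirichlet Process through its Chinese restaurant process representation. It is not taken from "esbmb": the function name crp_sample and the concentration parameter alpha are hypothetical and serve only as an illustration.

```python
import numpy as np

def crp_sample(n, alpha=1.0, seed=0):
    """Draw n cluster labels from a Chinese restaurant process (illustrative only)."""
    rng = np.random.default_rng(seed)
    z = np.zeros(n, dtype=int)
    counts = [1]  # the first entity always opens cluster 0
    for i in range(1, n):
        # Existing clusters are chosen proportionally to their size,
        # a new cluster proportionally to alpha (hypothetical parameter name).
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        z[i] = k
    return z

# One prior draw for 27 users, the number of rows in the toy data used below.
z = crp_sample(27, alpha=1.0, seed=0)
print(z)
```

Larger values of alpha make new clusters more likely, so the number of clusters is not fixed in advance but grows with the data.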

## Providing no covariates:
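The method "fit" runs a Gibbs sampler for a user-specified number of iterations.
We use a toy data set in which the first 14 users follow the rating pattern type1 and the remaining 13 follow type2, and we run the sampler for 100 iterations: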

In [9]:
type1 = np.array([5,5,5,5,2,1,0,0,0,0,0,0,0,0,0,0])
type2 = np.array([0,0,0,0,0,0,5,5,2,3,1,4,4,5,5,1])
# 14 users with pattern type1 followed by 13 users with pattern type2 (27 x 16 matrix)
Y = np.array([type1] * 14 + [type2] * 13)


mod = esbmb()
mod.fit(Y,100)

------------------
Initial log-likelihood: -196.32858651238305
------------------
Gibbs Sampling simulation starts.
Iteration 0 complete. Log-likelihood: 250.3045931898321.
Iteration 10 complete. Log-likelihood: 270.24473129650164.
Iteration 20 complete. Log-likelihood: 270.8692641731425.
Iteration 30 complete. Log-likelihood: 271.47930638929.
Iteration 40 complete. Log-likelihood: 274.2400306807452.
Iteration 50 complete. Log-likelihood: 274.39060280271485.
Iteration 60 complete. Log-likelihood: 273.356919770585.
Iteration 70 complete. Log-likelihood: 274.55616561091006.
Iteration 80 complete. Log-likelihood: 271.47930638929.
Iteration 90 complete. Log-likelihood: 272.6497392900243.
Runtime: 2.066733
Block-interactions computed.


## Block interactions:
Once sampling has finished, the block interaction matrix $\Theta$ is computed from the estimated cluster assignments and stored in the attribute "theta_est":


In [6]:
mod.theta_est

array([[0.01176471, 4.92982456, 0.03448276, 1.93333333, 0.03448276,
        1.        ],
       [4.62025316, 0.01886792, 2.44444444, 0.07142857, 1.        ,
        0.07142857]])
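
The values printed above are consistent with a simple posterior-mean formula, (a + sum of the ratings in a block) / (b + number of cells in the block), with a = b = 1. The sketch below checks this on the toy data. It is an inference from the printed output rather than a confirmed description of how "esbmb" works: the function name block_means and the cluster labels z_u and z_i are hypothetical, chosen so that the result matches the matrix shown.

```python
import numpy as np

def block_means(Y, z_u, z_i, a=1.0, b=1.0):
    """Return (a + block sum) / (b + block size) for every user/item cluster pair."""
    z_u, z_i = np.asarray(z_u), np.asarray(z_i)
    n_user_clusters, n_item_clusters = z_u.max() + 1, z_i.max() + 1
    theta = np.empty((n_user_clusters, n_item_clusters))
    for h in range(n_user_clusters):
        for k in range(n_item_clusters):
            # Sub-matrix of ratings between user cluster h and item cluster k.
            block = Y[np.ix_(z_u == h, z_i == k)]
            theta[h, k] = (a + block.sum()) / (b + block.size)
    return theta

type1 = np.array([5, 5, 5, 5, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
type2 = np.array([0, 0, 0, 0, 0, 0, 5, 5, 2, 3, 1, 4, 4, 5, 5, 1])
Y = np.array([type1] * 14 + [type2] * 13)

# Hypothetical assignments: two user clusters and six item clusters.
z_u = np.array([0] * 14 + [1] * 13)
z_i = np.array([1, 1, 1, 1, 3, 5, 0, 0, 2, 2, 4, 0, 0, 0, 0, 4])

theta = block_means(Y, z_u, z_i)
print(np.round(theta, 8))
```

For example, the entry 4.92982456 equals 281/57: 14 users times 4 items with rating 5 gives a block sum of 280 over 56 cells.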

## Providing covariates
Categorical, binary, count or continuous covariates can be provided for both users and items. It suffices to declare the type of each covariate through the "xu_type" and "xi_type" parameters of the "fit" method, as follows:

In [11]:
# Define the ratings matrix first: the covariate array below is sized from Y.shape,
# so referencing Y before this point raises "NameError: name 'Y' is not defined".
type1 = np.array([5,5,5,5,2,1,0,0,0,0,0,0,0,0,0,0])
type2 = np.array([0,0,0,0,0,0,5,5,2,3,1,4,4,5,5,1])
Y = np.array([type1] * 14 + [type2] * 13)

x1 = [100,100,100,100,100,100,100,0,10,50,0,0,0,0]
x2 = [80,90,80,75,80,95,100,20,10,20,30,20,20,15]
# NOTE: x is only allocated here. x1 and x2 are never copied into it, and their
# length (14) matches neither Y.shape[0] (27) nor Y.shape[1] (16). Fill x with
# real covariate values of the appropriate length before relying on the fit.
x = np.empty(shape=(2, Y.shape[0]))

# Pitman-Yor ("PY") priors on both user and item cluster assignments
mod1 = esbmb(prior_u = "PY", prior_i = "PY", beta = 0.1, components = 2, sigma = 0.4, gamma = 0)
# Two continuous item covariates, no user covariates
mod1.fit(Y, 100, xu = None, xi = x, xi_type = ["cont","cont"], xu_type = None, verbose = False)
mod1.theta_est
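
The covariate vectors x1 and x2 in the cell above have 14 entries each, whereas Y has 27 rows and 16 columns, so it may help to validate covariate lengths before calling "fit". The sketch below is one way to do this. The helper stack_covariates is hypothetical, and the layout (number of covariates, number of entities) is assumed from the shape used for x in the cell above rather than taken from the "esbmb" documentation.

```python
import numpy as np

def stack_covariates(columns, n_entities):
    """Stack covariate vectors into an array of shape (n_covariates, n_entities)."""
    arrays = [np.asarray(c, dtype=float) for c in columns]
    for j, arr in enumerate(arrays):
        if arr.shape != (n_entities,):
            raise ValueError(
                f"covariate {j} has shape {arr.shape}, expected ({n_entities},)"
            )
    return np.vstack(arrays)

x1 = [100, 100, 100, 100, 100, 100, 100, 0, 10, 50, 0, 0, 0, 0]
x2 = [80, 90, 80, 75, 80, 95, 100, 20, 10, 20, 30, 20, 20, 15]

# The vectors are consistent with 14 entities ...
x = stack_covariates([x1, x2], n_entities=14)
print(x.shape)

# ... but not with the 16 items (columns) of the toy matrix Y.
try:
    stack_covariates([x1, x2], n_entities=16)
except ValueError as err:
    print("Length mismatch:", err)
```

Failing early with an explicit message is preferable to passing an uninitialized array from np.empty to the sampler, since such an array contains arbitrary values.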