# [Hands-on #1] Reproducibility


<div>
  <img src="https://logconference.org/post/announcement/featured.png" alt="LoG 2023" width="100">
</div>

This is the Google Colab notebook for hands-on session #1 of the tutorial "_[Graph Neural Networks for Recommendation: Reproducibility, Graph Topology, and Node Representation](https://sisinflab.github.io/tutorial-gnns-recsys-log2023/)_", presented at the [2nd Learning on Graphs Conference](https://logconference.org/) (LoG 2023) on November 30 (online).

Credits:
- Daniele Malitesta (daniele.malitesta@poliba.it)
- Claudio Pomo (claudio.pomo@poliba.it)
- Tommaso Di Noia (tommaso.dinoia@poliba.it)

<div>
  <img src="https://www.poliba.it/sites/default/files/logo_5.png" alt="Poliba" width="200">
  <img src="https://swot.sisinflab.poliba.it/img/logo-sisinflab.png" alt="SisInfLab" width="200">
</div>

## Clone the repository

First, let's clone the repository from GitHub...

In [None]:
!git clone https://github.com/sisinflab/Graph-RSs-Reproducibility.git

## Set up the environment

Second, let's set up the environment with the needed (extra) pip packages and the environment variables to ensure reproducibility!

In [None]:
%cd Graph-RSs-Reproducibility/
%env PYTHONPATH=.
%env CUBLAS_WORKSPACE_CONFIG=:16:8
!pip install -r requirements_torch_geometric_colab.txt

## Check if GPU is available

Then, let's check if the GPU is available:

In [None]:
!nvidia-smi
!nvcc --version

## Configure the experiments
Let's set the hyper-parameters for the model to be trained and tested. We begin with NGCF on Gowalla.

In [4]:
import yaml

config_filename = 'hands-on-1-log_2023.yml'
config = {
  'experiment': {
    'backend': 'pytorch',
    'data_config': {
      'strategy': 'fixed',
      'train_path': '../data/{0}/train.tsv',
      'test_path': '../data/{0}/test.tsv'
    },
    'dataset': 'gowalla',
    'top_k': 20,
    'evaluation': {
      'cutoffs': [20],
      'simple_metrics': ['Recall', 'nDCG']
    },
    'gpu': 0,
    'external_models_path': '../external/models/__init__.py',
    'models': {
      'external.NGCF': {
        'meta': {
          'hyper_opt_alg': 'grid',
          'verbose': True,
          'save_weights': False,
          'validation_rate': 10,
          'validation_metric': 'Recall@20',
          'restore': False
        },
        'lr': 0.0001,
        'epochs': 400,
        'factors': 64,
        'batch_size': 1024,
        'l_w': 1e-5,
        'n_layers': 3,
        'weight_size': 64,
        'node_dropout': 0.1,
        'message_dropout': 0.1,
        'normalize': True,
        'seed': 42,
        'early_stopping': {
          'patience': 5,
          'mode': 'auto',
          'monitor': 'Recall@20',
          'verbose': True
        }
      }
    }
  }
}

with open(f'config_files/{config_filename}', 'w') as file:
    documents = yaml.dump(config, file)

## Run experiments
Now we are all set to run an experiment with NGCF on Gowalla.

In [None]:
from elliot.run import run_experiment

run_experiment(f"config_files/{config_filename}")

Then, we overwrite the configuration file to train and test LightGCN on Gowalla.

In [None]:
import yaml

config_filename = 'hands-on-1-log_2023.yml'
config = {
  'experiment': {
    'backend': 'pytorch',
    'data_config': {
      'strategy': 'fixed',
      'train_path': '../data/{0}/train.tsv',
      'test_path': '../data/{0}/test.tsv'
    },
    'dataset': 'gowalla',
    'top_k': 20,
    'evaluation': {
      'cutoffs': [20],
      'simple_metrics': ['Recall', 'nDCG']
    },
    'gpu': 0,
    'external_models_path': '../external/models/__init__.py',
    'models': {
      'external.LightGCN': {
        'meta': {
          'hyper_opt_alg': 'grid',
          'verbose': True,
          'save_weights': False,
          'validation_rate': 20,
          'validation_metric': 'Recall@20',
          'restore': False
        },
        'lr': 0.001,
        'epochs': 1000,
        'factors': 64,
        'batch_size': 2048,
        'l_w': 1e-4,
        'n_layers': 3,
        'normalize': True,
        'seed': 42,
        'early_stopping': {
          'patience': 5,
          'mode': 'auto',
          'monitor': 'Recall@20',
          'verbose': True
        }
      }
    }
  }
}

with open(f'config_files/{config_filename}', 'w') as file:
    documents = yaml.dump(config, file)

run_experiment(f"config_files/{config_filename}")

## Take a look at the code!
Each (GNN-based) recommender system comes with three modules (a base class, a model class, and ```__init__.py```), plus any additional classes it needs:



```
├── model-name
│   ├── base-class
│   ├── model-class
│   ├── __init__.py
│   ├── <additional_class_1>
│   ├── <additional_class_2>
│   ├── ...
```

For instance, let's consider NGCF:

```
├── ngcf
│   ├── NGCF.py
│   ├── NGCFModel.py
│   ├── NGCFLayer.py
│   ├── __init__.py
│   ├── custom_sampler.py
```

and LightGCN:

```
├── lightgcn
│   ├── LightGCN.py
│   ├── LightGCNModel.py
│   ├── __init__.py
│   ├── custom_sampler.py
```

### Neural Graph Collaborative Filtering (NGCF)

Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, Tat-Seng Chua: _Neural Graph Collaborative Filtering_. SIGIR 2019: 165-174

<div>
  <img src="https://miro.medium.com/v2/resize:fit:1400/format:webp/1*6cBxhbNs9acjejFvsiqXMw.png" alt="NGCF" width="600">
</div>

\[[**paper**](https://arxiv.org/abs/1905.08108)\]\[[**code**](https://github.com/huangtinglin/NGCF-PyTorch)\]

#### Base class

Let's take a look at the base-class for NGCF (the file is ```NGCF.py```):

```python
class NGCF(RecMixin, BaseRecommenderModel):
    r"""
    Neural Graph Collaborative Filtering

    For further details, please refer to the `paper <https://dl.acm.org/doi/10.1145/3331184.3331267>`_
    """
    @init_charger
    def __init__(self, data, config, params, *args, **kwargs):

        # parameters list for NGCF
        self._params_list = [
            ("_learning_rate", "lr", "lr", 0.0005, float, None),
            ("_factors", "factors", "factors", 64, int, None),
            ("_l_w", "l_w", "l_w", 0.01, float, None),
            ("_n_layers", "n_layers", "n_layers", 3, int, None),
            ("_weight_size", "weight_size", "weight_size", 64, int, None),
            ("_node_dropout", "node_dropout", "node_dropout", 0.0, float, None),
            ("_message_dropout", "message_dropout", "message_dropout", 0.5, float, None),
            ("_normalize", "normalize", "normalize", True, bool, None)
        ]
        self.autoset_params()

        # set all seeds for reproducibility
        random.seed(self._seed)
        np.random.seed(self._seed)
        torch.manual_seed(self._seed)

        # create the sampler for BPR
        self._sampler = Sampler(self._data.i_train_dict, self._batch_size, self._seed)

        # instantiate the adjacency matrix
        row, col = data.sp_i_train.nonzero()
        col = [c + self._num_users for c in col]
        self.edge_index = np.array([row, col])
        self.adj = SparseTensor(row=torch.cat([torch.tensor(self.edge_index[0], dtype=torch.int64),
                                               torch.tensor(self.edge_index[1], dtype=torch.int64)], dim=0),
                                col=torch.cat([torch.tensor(self.edge_index[1], dtype=torch.int64),
                                               torch.tensor(self.edge_index[0], dtype=torch.int64)], dim=0),
                                sparse_sizes=(self._num_users + self._num_items,
                                              self._num_users + self._num_items))

        # optionally normalize the adjacency matrix
        if self._normalize:
            self.adj = self.apply_norm(self.adj, add_self_loops=True)

        # instantiate the model
        self._model = NGCFModel(
            num_users=self._num_users,
            num_items=self._num_items,
            learning_rate=self._learning_rate,
            embed_k=self._factors,
            l_w=self._l_w,
            weight_size=self._weight_size,
            n_layers=self._n_layers,
            message_dropout=self._message_dropout,
            random_seed=self._seed
        )

    # method to perform node dropout
    def sparse_dropout(self, x, rate, noise_shape):
        random_tensor = 1 - rate
        random_tensor += torch.rand(noise_shape).to(x.device())
        dropout_mask = torch.floor(random_tensor).type(torch.bool)
        i = self.adj.to_torch_sparse_coo_tensor().coalesce().indices()
        v = self.adj.to_torch_sparse_coo_tensor().coalesce().values()

        i = i[:, dropout_mask]
        v = v[dropout_mask]

        out = SparseTensor(row=i[0],
                           col=i[1],
                           value=v * (1. / (1 - rate)),
                           sparse_sizes=(self._num_users + self._num_items,
                                         self._num_users + self._num_items))
        return out

    # method to perform adjacency normalization
    @staticmethod
    def apply_norm(edge_index, add_self_loops=True):
        adj_t = edge_index
        if add_self_loops:
            adj_t = fill_diag(adj_t, 1.)
        deg = sum(adj_t, dim=1)
        deg_inv = deg.pow_(-1)
        deg_inv.masked_fill_(deg_inv == float('inf'), 0.)
        norm_adj_t = mul(adj_t, deg_inv.view(-1, 1))
        return norm_adj_t
    
    def train(self):
        for it in self.iterate(self._epochs):
            loss = 0
            steps = 0
            self._model.train()

            # optionally run node dropout
            if self._node_dropout > 0:
                sampled_adj = self.sparse_dropout(self.adj,
                                                  self._node_dropout,
                                                  self.adj.nnz())

            # train model
            n_batch = int(self._data.transactions / self._batch_size) if self._data.transactions % self._batch_size == 0 \
            else int(self._data.transactions / self._batch_size) + 1
            with tqdm(total=n_batch, disable=not self._verbose) as t:
                for _ in range(n_batch):
                    user, pos, neg = self._sampler.step()
                    steps += 1
                    if self._node_dropout > 0:
                        loss += self._model.train_step((user, pos, neg), sampled_adj)
                    else:
                        loss += self._model.train_step((user, pos, neg), self.adj)

                    if math.isnan(loss) or math.isinf(loss) or (not loss):
                        break

                    t.set_postfix({'loss': f'{loss / steps:.5f}'})
                    t.update()

            # run evaluation
            self.evaluate(it, loss / (it + 1))

```
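The `n_batch` expression in `train()` is a ceiling division: it adds one extra batch whenever the number of training interactions is not a multiple of the batch size. Here is a minimal sketch of the same computation in plain Python. The function names `n_batches_original` and `n_batches_ceil` are ours and do not appear in the repository.

```python
def n_batches_original(transactions, batch_size):
    # same expression used in train()
    if transactions % batch_size == 0:
        return int(transactions / batch_size)
    return int(transactions / batch_size) + 1


def n_batches_ceil(transactions, batch_size):
    # integer ceiling division, no floating point involved
    return -(-transactions // batch_size)


# toy sizes, not the real Gowalla statistics
for transactions in range(1, 50):
    for batch_size in range(1, 10):
        assert n_batches_original(transactions, batch_size) == n_batches_ceil(transactions, batch_size)
```

The integer form avoids going through floating-point division, which may matter only for very large interaction counts.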


#### Model class
Let's take a look at the model-class for NGCF (the file is ```NGCFModel.py```):


```python
class NGCFModel(torch.nn.Module, ABC):
    def __init__(self,
                 num_users,
                 num_items,
                 learning_rate,
                 embed_k,
                 l_w,
                 weight_size,
                 n_layers,
                 message_dropout,
                 random_seed,
                 name="NGCF",
                 **kwargs
                 ):
        super().__init__()

        # set all seeds and deterministic behaviour
        random.seed(random_seed)
        np.random.seed(random_seed)
        torch.manual_seed(random_seed)
        torch.cuda.manual_seed(random_seed)
        torch.cuda.manual_seed_all(random_seed)
        torch.backends.cudnn.deterministic = True
        torch.use_deterministic_algorithms(True)

        # set device
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

        # set all model's parameters
        self.num_users = num_users
        self.num_items = num_items
        self.embed_k = embed_k
        self.learning_rate = learning_rate
        self.l_w = l_w
        self.weight_size = weight_size
        self.n_layers = n_layers
        self.message_dropout = message_dropout
        self.weight_size_list = [self.embed_k] + ([self.weight_size] * self.n_layers)

        # create node embeddings
        self.Gu = torch.nn.Parameter(torch.nn.init.xavier_uniform_(torch.empty(self.num_users, self.embed_k)))
        self.Gu.to(self.device)
        self.Gi = torch.nn.Parameter(torch.nn.init.xavier_uniform_(torch.empty(self.num_items, self.embed_k)))
        self.Gi.to(self.device)

        # create GNN
        propagation_network_list = []
        self.dropout_layers = []
        for layer in range(self.n_layers):
            propagation_network_list.append((NGCFLayer(self.weight_size_list[layer],
                                                       self.weight_size_list[layer + 1]), 'x, edge_index -> x'))
            self.dropout_layers.append(torch.nn.Dropout(p=self.message_dropout))
        self.propagation_network = torch_geometric.nn.Sequential('x, edge_index', propagation_network_list)
        self.propagation_network.to(self.device)

        # instantiate optimizer
        self.optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)

    # train step batch-by-batch
    def train_step(self, batch, adj):

        # run message-passing on the whole GNN
        gu, gi = self.propagate_embeddings(adj)

        # run MF
        user, pos, neg = batch
        xu_pos, gamma_u, gamma_i_pos = self.forward(inputs=(gu[user], gi[pos]))
        xu_neg, _, gamma_i_neg = self.forward(inputs=(gu[user], gi[neg]))

        # compute loss
        maxi = torch.nn.LogSigmoid()(xu_pos - xu_neg)
        mf_loss = -1 * torch.mean(maxi)
        reg_loss = self.l_w * (1 / 2) * (torch.norm(gu[user]) ** 2
                                         + torch.norm(gi[pos]) ** 2
                                         + torch.norm(gi[neg]) ** 2) / len(user)
        mf_loss += reg_loss

        # backward propagation
        self.optimizer.zero_grad()
        mf_loss.backward()
        self.optimizer.step()

        # return batch loss
        return mf_loss.detach().cpu().numpy()
    
    # method to run message-passing
    def propagate_embeddings(self, adj):

        # create a unique embedding representation for all users $\mathbf{E}_u^{(0)}$ and items $\mathbf{E}_i^{(0)}$
        ego_embeddings = torch.cat((self.Gu.to(self.device), self.Gi.to(self.device)), 0)
        all_embeddings = [ego_embeddings]
        embedding_idx = 0

        # run message-passing
        for layer in range(self.n_layers):
            all_embeddings += [torch.nn.functional.normalize(self.dropout_layers[embedding_idx](list(
                self.propagation_network.children()
            )[layer](all_embeddings[embedding_idx].to(self.device), adj.to(self.device))), p=2, dim=1)]
            embedding_idx += 1

        # aggregate all embedding representations from each layer $[\mathbf{E}^{(0)}, \mathbf{E}^{(1)}, \dots, \mathbf{E}^{(L)}]$
        all_embeddings = torch.cat(all_embeddings, 1)

        # get the final embedding representation
        gu, gi = torch.split(all_embeddings, [self.num_users, self.num_items], 0)

        return gu, gi

    # method to run MF
    def forward(self, inputs, **kwargs):
        gu, gi = inputs
        gamma_u = torch.squeeze(gu).to(self.device)
        gamma_i = torch.squeeze(gi).to(self.device)

        xui = torch.sum(gamma_u * gamma_i, 1)

        return xui, gamma_u, gamma_i
```
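The loss computed in `train_step()` is the Bayesian Personalized Ranking (BPR) objective on dot-product scores, plus an L2 penalty. Below is a minimal sketch in plain Python that leaves out the regularization term. The helper names `dot_score` and `bpr_loss` and the toy embeddings are ours, for illustration only.

```python
import math


def dot_score(user_emb, item_emb):
    # predicted preference: inner product between user and item embeddings
    return sum(u * i for u, i in zip(user_emb, item_emb))


def bpr_loss(pos_scores, neg_scores):
    # -mean(log(sigmoid(x_pos - x_neg))), i.e., mf_loss before regularization
    log_sig = [-math.log1p(math.exp(-(p - n))) for p, n in zip(pos_scores, neg_scores)]
    return -sum(log_sig) / len(log_sig)


# toy embeddings, chosen only for illustration
user = [1.0, 0.0]
pos_item = [2.0, 1.0]
neg_item = [-1.0, 3.0]

# the positive item is scored above the negative one: small loss
good_ranking = bpr_loss([dot_score(user, pos_item)], [dot_score(user, neg_item)])
# swapping the two items gives the wrong ranking: large loss
bad_ranking = bpr_loss([dot_score(user, neg_item)], [dot_score(user, pos_item)])
```

The loss decreases as the positive item is scored further above the sampled negative one, which is what the optimizer pushes for.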



#### Layer class

Let's recall the message-passing formulation for NGCF:

$$\mathbf{E}^{(l)} = \text{LeakyReLU}\Big(\underbrace{\underbrace{\hat{A}\mathbf{E}^{(l-1)}}_{(1)}\mathbf{W}_1^{(l)}}_{(2)} + \overbrace{\underbrace{\underbrace{\hat{A}\mathbf{E}^{(l-1)}}_{(1)}\odot\mathbf{E}^{(l-1)}}_{(3)}\mathbf{W}_2^{(l)}}^{(4)}\Big),$$

where $\hat{A}$ is the normalized adjacency matrix. The normalization is handled in the base-class (see above).
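To make the normalization concrete, here is a minimal dense sketch in plain Python of what `apply_norm()` does as we read it: set the diagonal to 1 (self-loops), then divide each row by its degree. The toy graph (2 users, 1 item) and the helper names `build_adjacency` and `row_normalize` are ours. The actual code works on a `SparseTensor`.

```python
def build_adjacency(num_users, num_items, interactions):
    # users occupy indices [0, num_users), items are shifted by num_users
    n = num_users + num_items
    adj = [[0.0] * n for _ in range(n)]
    for u, i in interactions:
        adj[u][num_users + i] = 1.0  # user -> item
        adj[num_users + i][u] = 1.0  # item -> user (symmetric)
    return adj


def row_normalize(adj, add_self_loops=True):
    a = [row[:] for row in adj]
    if add_self_loops:
        for k in range(len(a)):
            a[k][k] = 1.0
    out = []
    for row in a:
        deg = sum(row)
        inv = 1.0 / deg if deg != 0 else 0.0  # isolated nodes stay at zero
        out.append([v * inv for v in row])
    return out


# toy data: both users interacted with the only item
adj = build_adjacency(num_users=2, num_items=1, interactions=[(0, 0), (1, 0)])
adj_hat = row_normalize(adj)
```

After normalization every row of `adj_hat` sums to 1, so each node averages over itself and its neighbours.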

Let's take a look at the layer-class for NGCF (the file is ```NGCFLayer.py```):

```python
import torch
from torch_geometric.nn import MessagePassing
from torch_sparse import matmul

# this class extends the PyG's MessagePassing class
class NGCFLayer(MessagePassing):
    def __init__(self, in_dim, out_dim):
        super(NGCFLayer, self).__init__(aggr='add')
        self.W1 = torch.nn.Parameter(torch.nn.init.xavier_uniform_(torch.empty(in_dim, out_dim)))
        self.b1 = torch.nn.Parameter(torch.nn.init.xavier_uniform_(torch.empty(1, out_dim)))
        self.W2 = torch.nn.Parameter(torch.nn.init.xavier_uniform_(torch.empty(in_dim, out_dim)))
        self.b2 = torch.nn.Parameter(torch.nn.init.xavier_uniform_(torch.empty(1, out_dim)))
        self.leaky_relu = torch.nn.LeakyReLU(negative_slope=0.2)

    def init_weights(self):
        # re-initialize the parameters defined in __init__ (W1, b1, W2, b2)
        torch.nn.init.xavier_uniform_(self.W1)
        torch.nn.init.xavier_uniform_(self.b1)
        torch.nn.init.xavier_uniform_(self.W2)
        torch.nn.init.xavier_uniform_(self.b2)

    def forward(self, x, edge_index):
        return self.propagate(edge_index, x=x)

    # method to perform message-passing
    def message_and_aggregate(self, adj_t, x):

        # we calculate (1)
        side_embeddings = matmul(adj_t, x, reduce=self.aggr)

        # we calculate (2)
        first_addendum = torch.matmul(side_embeddings, self.W1)

        # we calculate (3)
        o_dot = torch.mul(side_embeddings, x)

        # we calculate (4)
        second_addendum = torch.matmul(o_dot, self.W2)

        # note that the authors add the bias terms in their original code, which are not reported in the formulation above
        return self.leaky_relu(first_addendum + self.b1 + second_addendum + self.b2)
```
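To connect the four numbered terms of the formula with actual numbers, here is a tiny dense sketch of one NGCF layer in plain Python. It follows the formulation above, so the bias terms are left out. The helper names and the toy matrices are ours, chosen only so the arithmetic is easy to follow.

```python
def matmul(a, b):
    # dense matrix product on lists of lists
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def leaky_relu(v, slope=0.2):
    return v if v >= 0 else slope * v


def ngcf_layer(adj_hat, e, w1, w2):
    side = matmul(adj_hat, e)                                  # (1)
    first = matmul(side, w1)                                   # (2)
    o_dot = [[s * x for s, x in zip(side_row, e_row)]          # (3)
             for side_row, e_row in zip(side, e)]
    second = matmul(o_dot, w2)                                 # (4)
    return [[leaky_relu(f + s) for f, s in zip(f_row, s_row)]
            for f_row, s_row in zip(first, second)]


# toy inputs: 2 nodes, 2-dimensional embeddings
adj_hat = [[0.5, 0.5], [0.5, 0.5]]
e = [[1.0, -2.0], [3.0, -2.0]]
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[1.0, 0.0], [0.0, -1.0]]

out = ngcf_layer(adj_hat, e, w1, w2)
```

The second output column is negative before the activation, so it is scaled by the LeakyReLU slope of 0.2 used in `NGCFLayer`.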



### Light Graph Convolutional Network (LightGCN)

Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yong-Dong Zhang, Meng Wang:
_LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation_. SIGIR 2020: 639-648

<div>
  <img src="https://recbole.io/docs/_images/lightgcn.png" alt="LightGCN" width="500">
</div>

\[[**paper**](https://arxiv.org/abs/2002.02126)\]\[[**code**](https://github.com/gusye1234/LightGCN-PyTorch)\]

#### Base class

Let's take a look at the base-class for LightGCN (the file is ```LightGCN.py```):

```python
class LightGCN(RecMixin, BaseRecommenderModel):
    r"""
    LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation

    For further details, please refer to the `paper <https://dl.acm.org/doi/10.1145/3397271.3401063>`_
    """
    @init_charger
    def __init__(self, data, config, params, *args, **kwargs):

        # parameters list for LightGCN
        self._params_list = [
            ("_learning_rate", "lr", "lr", 0.0005, float, None),
            ("_factors", "factors", "factors", 64, int, None),
            ("_l_w", "l_w", "l_w", 0.01, float, None),
            ("_n_layers", "n_layers", "n_layers", 1, int, None),
            ("_normalize", "normalize", "normalize", True, bool, None)
        ]
        self.autoset_params()

        # create the sampler for BPR
        self._sampler = Sampler(self._data.i_train_dict, seed=self._seed)

        # instantiate the adjacency matrix
        row, col = data.sp_i_train.nonzero()
        col = [c + self._num_users for c in col]
        edge_index = np.array([row, col])
        edge_index = torch.tensor(edge_index, dtype=torch.int64)
        self.adj = SparseTensor(row=torch.cat([edge_index[0], edge_index[1]], dim=0),
                                col=torch.cat([edge_index[1], edge_index[0]], dim=0),
                                sparse_sizes=(self._num_users + self._num_items,
                                              self._num_users + self._num_items))

        # instantiate the model
        self._model = LightGCNModel(
            num_users=self._num_users,
            num_items=self._num_items,
            learning_rate=self._learning_rate,
            embed_k=self._factors,
            l_w=self._l_w,
            n_layers=self._n_layers,
            adj=self.adj,
            normalize=self._normalize,
            random_seed=self._seed
        )
    
    def train(self):
        for it in self.iterate(self._epochs):
            loss = 0
            steps = 0
            self._model.train()

            # train model
            n_batch = int(self._data.transactions / self._batch_size) if self._data.transactions % self._batch_size == 0 \
            else int(self._data.transactions / self._batch_size) + 1
            with tqdm(total=n_batch, disable=not self._verbose) as t:
                for batch in self._sampler.step(self._data.transactions, self._batch_size):
                    steps += 1
                    loss += self._model.train_step(batch)

                    if math.isnan(loss) or math.isinf(loss) or (not loss):
                        break

                    t.set_postfix({'loss': f'{loss / steps:.5f}'})
                    t.update()

            # run evaluation
            self.evaluate(it, loss / (it + 1))

```


#### Model class
Let's take a look at the model-class for LightGCN (the file is ```LightGCNModel.py```):


```python
class LightGCNModel(torch.nn.Module, ABC):
    def __init__(self,
                 num_users,
                 num_items,
                 learning_rate,
                 embed_k,
                 l_w,
                 n_layers,
                 adj,
                 normalize,
                 random_seed,
                 name="LightGCN",
                 **kwargs
                 ):
        super().__init__()

        # set all seeds and deterministic behaviour
        random.seed(random_seed)
        np.random.seed(random_seed)
        torch.manual_seed(random_seed)
        torch.cuda.manual_seed(random_seed)
        torch.cuda.manual_seed_all(random_seed)
        torch.backends.cudnn.deterministic = True
        torch.use_deterministic_algorithms(True)

        # set device
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

        # set all model's parameters
        self.num_users = num_users
        self.num_items = num_items
        self.embed_k = embed_k
        self.learning_rate = learning_rate
        self.l_w = l_w
        self.n_layers = n_layers
        self.weight_size_list = [self.embed_k] * (self.n_layers + 1)
        self.alpha = torch.tensor([1 / (k + 1) for k in range(len(self.weight_size_list))])
        self.adj = adj
        self.normalize = normalize

        # create node embeddings
        self.Gu = torch.nn.Embedding(
            num_embeddings=self.num_users, embedding_dim=self.embed_k)
        self.Gi = torch.nn.Embedding(
            num_embeddings=self.num_items, embedding_dim=self.embed_k)
        torch.nn.init.normal_(self.Gu.weight, std=0.1)
        torch.nn.init.normal_(self.Gi.weight, std=0.1)

        # create GNN
        propagation_network_list = []
        for _ in range(self.n_layers):
            propagation_network_list.append((LGConv(normalize=self.normalize), 'x, edge_index -> x'))
        self.propagation_network = torch_geometric.nn.Sequential('x, edge_index', propagation_network_list)
        self.propagation_network.to(self.device)
        self.softplus = torch.nn.Softplus()

        # instantiate optimizer
        self.optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)

    # train step batch-by-batch
    def train_step(self, batch):

        # run message-passing on the whole GNN (the adjacency matrix is stored in self.adj)
        gu, gi = self.propagate_embeddings()

        # run MF
        user, pos, neg = batch
        xu_pos = self.forward(inputs=(gu[user[:, 0]], gi[pos[:, 0]]))
        xu_neg = self.forward(inputs=(gu[user[:, 0]], gi[neg[:, 0]]))

        # compute loss
        loss = torch.mean(torch.nn.functional.softplus(xu_neg - xu_pos))
        reg_loss = self.l_w * (1 / 2) * (self.Gu.weight[user[:, 0]].norm(2).pow(2) +
                                         self.Gi.weight[pos[:, 0]].norm(2).pow(2) +
                                         self.Gi.weight[neg[:, 0]].norm(2).pow(2)) / float(batch[0].shape[0])
        loss += reg_loss

        # backward propagation
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        # return batch loss
        return loss.detach().cpu().numpy()

    # method to run message-passing
    def propagate_embeddings(self):

        # create a unique embedding representation for all users $\mathbf{E}_u^{(0)}$ and items $\mathbf{E}_i^{(0)}$
        ego_embeddings = torch.cat((self.Gu.weight.to(self.device), self.Gi.weight.to(self.device)), 0)
        all_embeddings = [ego_embeddings]

        # run message-passing
        for layer in range(self.n_layers):
            all_embeddings += [list(
                        self.propagation_network.children()
                    )[layer](all_embeddings[layer].to(self.device), self.adj.to(self.device))]

        # aggregate all embedding representations from each layer $[\mathbf{E}^{(0)}, \mathbf{E}^{(1)}, \dots, \mathbf{E}^{(L)}]$
        # note that this aggregation is slightly different from the one reported in the paper (see in the next cell)
        all_embeddings = torch.mean(torch.stack(all_embeddings, 0), dim=0)

        # get the final embedding representation
        gu, gi = torch.split(all_embeddings, [self.num_users, self.num_items], 0)

        return gu, gi

    # method to run MF
    def forward(self, inputs, **kwargs):
        gu, gi = inputs
        gamma_u = torch.squeeze(gu).to(self.device)
        gamma_i = torch.squeeze(gi).to(self.device)

        xui = torch.sum(gamma_u * gamma_i, 1)

        # train_step() only needs the predicted scores
        return xui
```
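Note that the LightGCN loss is written as `softplus(xu_neg - xu_pos)`, while the NGCF one is written as `-LogSigmoid(xu_pos - xu_neg)`. The two are the same BPR loss, since $-\log\sigma(d) = \log(1 + e^{-d}) = \text{softplus}(-d)$. Here is a quick numerical check in plain Python. The helper names and the score differences are ours.

```python
import math


def neg_log_sigmoid(d):
    # -log(sigmoid(d)), the form used in the NGCF model class
    return -math.log(1.0 / (1.0 + math.exp(-d)))


def softplus(x):
    # log(1 + exp(x)), the form used in the LightGCN model class
    return math.log1p(math.exp(x))


# d plays the role of xu_pos - xu_neg; values are arbitrary
differences = [-3.0, -0.5, 0.0, 0.5, 3.0]
pairs = [(neg_log_sigmoid(d), softplus(-d)) for d in differences]
```

Each pair in `pairs` agrees up to floating-point rounding.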



#### Layer class

Let's recall the message-passing formulation for LightGCN:

$$\mathbf{E}^{(l)} = \hat{A}\mathbf{E}^{(l-1)},$$

where $\hat{A}$ is the normalized adjacency matrix (the class has an attribute that, when set, normalizes the adjacency matrix before running the message-passing).
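As a minimal dense sketch of this propagation rule, the snippet below applies the symmetric normalization $1/\sqrt{\deg(i)\deg(j)}$ (the one PyG's `LGConv` uses, without self-loops) and runs one message-passing step in plain Python. The toy graph (2 users, 1 item), the one-dimensional embeddings, and the helper names are ours.

```python
import math


def sym_normalize(adj):
    # A_hat[i][j] = A[i][j] / sqrt(deg(i) * deg(j)), no self-loops added
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def propagate(adj_hat, emb):
    # E^(l) = A_hat E^(l-1): no weight matrices, no activation
    dim = len(emb[0])
    return [[sum(a * e[k] for a, e in zip(row, emb)) for k in range(dim)]
            for row in adj_hat]


# users are nodes 0 and 1, the item is node 2
adj = [[0.0, 0.0, 1.0],
       [0.0, 0.0, 1.0],
       [1.0, 1.0, 0.0]]
emb0 = [[1.0], [1.0], [2.0]]

adj_hat = sym_normalize(adj)
emb1 = propagate(adj_hat, emb0)
```

Compared with NGCF, the layer has no trainable parameters: all the learning happens in the initial embeddings.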

Let's take a look at the layer-class for LightGCN. In this case, the layer is already implemented in PyG (see [here](https://pytorch-geometric.readthedocs.io/en/latest/_modules/torch_geometric/nn/conv/lg_conv.html#LGConv) for reference):

```python
from torch import Tensor

from torch_geometric.nn.conv import MessagePassing
from torch_geometric.nn.conv.gcn_conv import gcn_norm
from torch_geometric.typing import Adj, OptTensor, SparseTensor
from torch_geometric.utils import spmm


class LGConv(MessagePassing):
    r"""The Light Graph Convolution (LGC) operator from the `"LightGCN:
    Simplifying and Powering Graph Convolution Network for Recommendation"
    <https://arxiv.org/abs/2002.02126>`_ paper.

    .. math::
        \mathbf{x}^{\prime}_i = \sum_{j \in \mathcal{N}(i)}
        \frac{e_{j,i}}{\sqrt{\deg(i)\deg(j)}} \mathbf{x}_j

    Args:
        normalize (bool, optional): If set to :obj:`False`, output features
            will not be normalized via symmetric normalization.
            (default: :obj:`True`)
        **kwargs (optional): Additional arguments of
            :class:`torch_geometric.nn.conv.MessagePassing`.

    Shapes:
        - **input:**
          node features :math:`(|\mathcal{V}|, F)`,
          edge indices :math:`(2, |\mathcal{E}|)`,
          edge weights :math:`(|\mathcal{E}|)` *(optional)*
        - **output:** node features :math:`(|\mathcal{V}|, F)`
    """
    def __init__(self, normalize: bool = True, **kwargs):
        kwargs.setdefault('aggr', 'add')
        super().__init__(**kwargs)
        self.normalize = normalize

    def forward(self, x: Tensor, edge_index: Adj,
                edge_weight: OptTensor = None) -> Tensor:

        if self.normalize and isinstance(edge_index, Tensor):
            out = gcn_norm(edge_index, edge_weight, x.size(self.node_dim),
                           add_self_loops=False, flow=self.flow, dtype=x.dtype)
            edge_index, edge_weight = out
        elif self.normalize and isinstance(edge_index, SparseTensor):
            edge_index = gcn_norm(edge_index, None, x.size(self.node_dim),
                                  add_self_loops=False, flow=self.flow,
                                  dtype=x.dtype)

        # propagate_type: (x: Tensor, edge_weight: OptTensor)
        return self.propagate(edge_index, x=x, edge_weight=edge_weight,
                              size=None)

    def message(self, x_j: Tensor, edge_weight: OptTensor) -> Tensor:
        return x_j if edge_weight is None else edge_weight.view(-1, 1) * x_j

    def message_and_aggregate(self, adj_t: SparseTensor, x: Tensor) -> Tensor:
        return spmm(adj_t, x, reduce=self.aggr)
```



## Dealing with reproducibility issues

Let's verify that the results obtained with our code are fully reproducible.

### Test LightGCN with reproducibility

To do so, we first run LightGCN for a small number of epochs (i.e., 2 epochs) and inspect the results.

In [None]:
config_filename = 'hands-on-1-log_2023_reproducibility.yml'
config = {
  'experiment': {
    'backend': 'pytorch',
    'data_config': {
      'strategy': 'fixed',
      'train_path': '../data/{0}/train.tsv',
      'test_path': '../data/{0}/test.tsv'
    },
    'dataset': 'gowalla',
    'top_k': 20,
    'evaluation': {
      'cutoffs': [20],
      'simple_metrics': ['Recall', 'nDCG']
    },
    'gpu': 0,
    'external_models_path': '../external/models/__init__.py',
    'models': {
      'external.LightGCN': {
        'meta': {
          'hyper_opt_alg': 'grid',
          'verbose': True,
          'save_weights': False,
          'validation_rate': 2,
          'validation_metric': 'Recall@20',
          'restore': False
        },
        'lr': 0.001,
        'epochs': 2,
        'factors': 64,
        'batch_size': 2048,
        'l_w': 1e-4,
        'n_layers': 3,
        'normalize': True,
        'seed': 42
      }
    }
  }
}

with open(f'config_files/{config_filename}', 'w') as file:
    documents = yaml.dump(config, file)

run_experiment(f"config_files/{config_filename}")

Now, let's run it again:

In [None]:
run_experiment(f"config_files/{config_filename}")

... and again:

In [None]:
run_experiment(f"config_files/{config_filename}")

**The results are fully aligned**. You can try it yourself on your custom experimental configurations!

### How can we seek reproducibility?

We have three rules of thumb to follow:

1. Set all random seeds
2. Use deterministic operations
3. Set the proper environment variables

#### Set all random seeds

This is the piece of code that we use to set all random seeds:

```python
random.seed(random_seed)
np.random.seed(random_seed)
torch.manual_seed(random_seed)
torch.cuda.manual_seed(random_seed)
torch.cuda.manual_seed_all(random_seed)
```
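As a sketch, these calls can be grouped into a single helper and called before each run. The name `set_all_seeds` is ours (it is not part of the repository), and the guarded imports are only there so the snippet also runs where NumPy or PyTorch are not installed.

```python
import random


def set_all_seeds(seed):
    # Python's built-in generator
    random.seed(seed)
    # NumPy, if available
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    # PyTorch (CPU and all GPUs), if available
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


set_all_seeds(42)
first_draw = [random.random() for _ in range(3)]

set_all_seeds(42)
second_draw = [random.random() for _ in range(3)]
```

Re-seeding with the same value replays the same random sequence, so `first_draw` and `second_draw` are identical.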



#### Use deterministic operations

In PyTorch (as in other Python libraries), some operations are deterministic, while others may behave non-deterministically.

For some operations, PyTorch provides both a deterministic and a non-deterministic implementation. To ensure that only the deterministic one is selected at run time, you should include the following:

```python
torch.backends.cudnn.deterministic = True
torch.use_deterministic_algorithms(True)
```

With this setting, PyTorch throws an exception whenever an operation only has a non-deterministic implementation. At this [link](https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html) you may find an updated list of the affected operations.



#### Set the proper environment variables

As indicated in the official [documentation](https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html), one should set the environment variable ```CUBLAS_WORKSPACE_CONFIG=:4096:8``` or ```CUBLAS_WORKSPACE_CONFIG=:16:8``` to use deterministic operations if the CUDA version is >= 10.2.

We do this before launching any script:



```sh
%env CUBLAS_WORKSPACE_CONFIG=:16:8
```
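Outside a notebook, the `%env` magic is not available. A minimal sketch of the equivalent for a plain Python script follows. It assumes, to be on the safe side, that the variable should be set before PyTorch is imported and before any CUDA work starts.

```python
import os

# set the variable at the very top of the script, before importing torch
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"

# import torch  # only after the variable is set
configured_value = os.environ["CUBLAS_WORKSPACE_CONFIG"]
```

Alternatively, the variable can be exported in the shell that launches the script.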





### What about PyTorch Geometric?

Let's take a look at the layer-class of LightGCN once again:

```python
from torch import Tensor

from torch_geometric.nn.conv import MessagePassing
from torch_geometric.nn.conv.gcn_conv import gcn_norm
from torch_geometric.typing import Adj, OptTensor, SparseTensor
from torch_geometric.utils import spmm


class LGConv(MessagePassing):
    # [...]
    def __init__(self, normalize: bool = True, **kwargs):
        kwargs.setdefault('aggr', 'add')
        super().__init__(**kwargs)
        self.normalize = normalize

    def forward(self, x: Tensor, edge_index: Adj,
                edge_weight: OptTensor = None) -> Tensor:

        # [...]
        return self.propagate(edge_index, x=x, edge_weight=edge_weight,
                              size=None)

    def message(self, x_j: Tensor, edge_weight: OptTensor) -> Tensor:
        return x_j if edge_weight is None else edge_weight.view(-1, 1) * x_j

    def message_and_aggregate(self, adj_t: SparseTensor, x: Tensor) -> Tensor:
        return spmm(adj_t, x, reduce=self.aggr)
```

If we go deeper in the class hierarchy, we find that the base version of the ```propagate()``` method is:



```python
def propagate(self, edge_index: Adj, size: Size = None, **kwargs):
  # [...]
  if is_sparse(edge_index) and self.fuse and not self.explain:
    # [...]
    out = self.message_and_aggregate(edge_index, **msg_aggr_kwargs)
    # [...]
  else:  # Otherwise, run both functions in separation.
    # [...]
    out = self.message(**msg_kwargs)
    # [...]
    out = self.aggregate(out, **aggr_kwargs)
```

That is, the ```propagate()``` method calls the method ```message_and_aggregate()``` when the adjacency matrix is formatted as a ```SparseTensor```, while it calls the methods ```message()``` and ```aggregate()``` when the adjacency matrix is formatted as a ```Tensor```.

By going even deeper, we find out that the ```aggregate()``` method may call the ```scatter()``` method, which is one of those operations behaving in a non-deterministic manner.
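One well-known reason why a parallel sum such as a scatter-add can give different results across runs is that floating-point addition is not associative: if the order in which the messages are accumulated changes, the rounding changes too. The snippet below is a plain-Python illustration of this effect, not the PyTorch Geometric implementation.

```python
# Three messages to be summed into the same target node.
messages = [0.1, 0.2, 0.3]

# Same values, two different accumulation orders.
left_to_right = (messages[0] + messages[1]) + messages[2]
right_to_left = messages[0] + (messages[1] + messages[2])

print(left_to_right, right_to_left)
print(left_to_right == right_to_left)
```

The two sums differ in the last digit. Over many layers and training steps, such tiny differences can accumulate into visibly different metrics.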

This is the reason why, **if we aim to reach full reproducibility, we should always format the adjacency matrix as a ```SparseTensor```**.


However, this is possible only when the message-passing formulation of our model can be expressed **at graph level, and not only at node level**.

$$\begin{align}
\text{graph-level} \; \rightarrow \; & \mathbf{E}^{(l)} = \hat{A}\mathbf{E}^{(l - 1)}\\
\text{node-level} \; \rightarrow \; & \mathbf{e}_u^{(l)} = \sum_{i \in \mathcal{N}_u} \frac{\mathbf{e}_i^{(l-1)}}{\sqrt{|\mathcal{N}_u||\mathcal{N}_i|}},\quad \forall u \in \mathcal{U}.
\end{align}
$$
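To see that the two formulations describe the same computation, the sketch below evaluates both on a toy graph in plain Python. The graph, the embedding values, and all variable names are hypothetical and chosen only for illustration; $\hat{A}$ is assumed to be the symmetrically normalized adjacency matrix.

```python
import math

# Toy bipartite graph: nodes 0-1 are users, nodes 2-3 are items.
edges = [(0, 2), (0, 3), (1, 3)]
num_nodes, dim = 4, 2

# Undirected neighbourhoods N_n for every node n.
neighbors = {n: [] for n in range(num_nodes)}
for u, i in edges:
    neighbors[u].append(i)
    neighbors[i].append(u)

# Embeddings at layer l-1 (one row per node).
E = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]


def norm(n: int, j: int) -> float:
    """Symmetric normalization 1 / sqrt(|N_n| |N_j|)."""
    return 1.0 / math.sqrt(len(neighbors[n]) * len(neighbors[j]))


# Graph-level: build the normalized adjacency matrix, then multiply by E.
A_hat = [[0.0] * num_nodes for _ in range(num_nodes)]
for n in range(num_nodes):
    for j in neighbors[n]:
        A_hat[n][j] = norm(n, j)
graph_level = [
    [sum(A_hat[n][j] * E[j][d] for j in range(num_nodes)) for d in range(dim)]
    for n in range(num_nodes)
]

# Node-level: each node sums the normalized embeddings of its neighbours.
node_level = [
    [sum(E[j][d] * norm(n, j) for j in neighbors[n]) for d in range(dim)]
    for n in range(num_nodes)
]

match = all(
    math.isclose(g, v)
    for g_row, v_row in zip(graph_level, node_level)
    for g, v in zip(g_row, v_row)
)
print(match)
```

The graph-level form is a single sparse matrix product, which is what `message_and_aggregate()` computes through `spmm`.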

Take a look at this [link](https://github.com/pytorch/pytorch/issues/50469) and this other [link](https://pytorch-geometric.readthedocs.io/en/latest/notes/sparse_tensor.html) for further references.

### Test LightGCN without reproducibility

Here we test a non-reproducible version of LightGCN, where the adjacency matrix is represented with a ```Tensor```:

```python
# Base class
class LightGCNNoRepr(RecMixin, BaseRecommenderModel):
    @init_charger
    def __init__(self, data, config, params, *args, **kwargs):
        ######################################

        self._params_list = [
            ("_learning_rate", "lr", "lr", 0.0005, float, None),
            ("_factors", "factors", "factors", 64, int, None),
            ("_l_w", "l_w", "l_w", 0.01, float, None),
            ("_n_layers", "n_layers", "n_layers", 1, int, None),
            ("_normalize", "normalize", "normalize", True, bool, None)
        ]
        self.autoset_params()
        # [...]
        row, col = data.sp_i_train.nonzero()
        col = [c + self._num_users for c in col]
        edge_index = np.array([row, col])

        # we use the tensor version instead of the sparse tensor one from above
        edge_index = torch.tensor(edge_index, dtype=torch.int64)

        self._model = LightGCNNoReprModel(
            num_users=self._num_users,
            num_items=self._num_items,
            learning_rate=self._learning_rate,
            embed_k=self._factors,
            l_w=self._l_w,
            n_layers=self._n_layers,
            edge_index=edge_index,
            normalize=self._normalize,
            random_seed=self._seed
        )
        # [...]
```


```python
# Model class
class LightGCNNoReprModel(torch.nn.Module, ABC):
    # [...]
    def propagate_embeddings(self, evaluate=False):
        ego_embeddings = torch.cat((self.Gu.weight.to(self.device), self.Gi.weight.to(self.device)), 0)
        all_embeddings = [ego_embeddings]

        for layer in range(0, self.n_layers):
            if evaluate:
                self.propagation_network.eval()
                with torch.no_grad():
                    all_embeddings += [list(
                        self.propagation_network.children()
                    )[layer](all_embeddings[layer].to(self.device), self.edge_index.to(self.device))]
            else:
                all_embeddings += [list(
                    self.propagation_network.children()
                )[layer](all_embeddings[layer].to(self.device), self.edge_index.to(self.device))]

        if evaluate:
            self.propagation_network.train()

        all_embeddings = torch.mean(torch.stack(all_embeddings, 0), dim=0)
        gu, gi = torch.split(all_embeddings, [self.num_users, self.num_items], 0)

        return gu, gi
    # [...]
```

We run a short experiment now:


In [None]:
config = {
  'experiment': {
    'backend': 'pytorch',
    'data_config': {
      'strategy': 'fixed',
      'train_path': '../data/{0}/train.tsv',
      'test_path': '../data/{0}/test.tsv'
    },
    'dataset': 'gowalla',
    'top_k': 20,
    'evaluation': {
      'cutoffs': [20],
      'simple_metrics': ['Recall', 'nDCG']
    },
    'gpu': 0,
    'external_models_path': '../external/models/__init__.py',
    'models': {
      'external.LightGCNNoRepr': {
        'meta': {
          'hyper_opt_alg': 'grid',
          'verbose': True,
          'save_weights': False,
          'validation_rate': 2,
          'validation_metric': 'Recall@20',
          'restore': False
        },
        'lr': 0.001,
        'epochs': 2,
        'factors': 64,
        'batch_size': 2048,
        'l_w': 1e-4,
        'n_layers': 3,
        'normalize': True,
        'seed': 42
      }
    }
  }
}

with open(f'config_files/{config_filename}', 'w') as file:
    yaml.dump(config, file)

run_experiment(f"config_files/{config_filename}")

**The results are different!**

## Cite us

This notebook depends heavily on [Elliot](https://github.com/sisinflab/elliot), our framework for rigorous and reproducible recommender systems evaluation. If you use it in your work, please consider citing us:



```
@inproceedings{DBLP:conf/sigir/AnelliBFMMPDN21,
  author       = {Vito Walter Anelli and
                  Alejandro Bellog{\'{\i}}n and
                  Antonio Ferrara and
                  Daniele Malitesta and
                  Felice Antonio Merra and
                  Claudio Pomo and
                  Francesco Maria Donini and
                  Tommaso Di Noia},
  title        = {Elliot: {A} Comprehensive and Rigorous Framework for Reproducible
                  Recommender Systems Evaluation},
  booktitle    = {{SIGIR}},
  pages        = {2405--2414},
  publisher    = {{ACM}},
  year         = {2021}
}
```



```
@inproceedings{DBLP:conf/recsys/AnelliMPBSN23,
  author       = {Vito Walter Anelli and
                  Daniele Malitesta and
                  Claudio Pomo and
                  Alejandro Bellog{\'{\i}}n and
                  Eugenio Di Sciascio and
                  Tommaso Di Noia},
  title        = {Challenging the Myth of Graph Collaborative Filtering: a Reasoned
                  and Reproducibility-driven Analysis},
  booktitle    = {RecSys},
  pages        = {350--361},
  publisher    = {{ACM}},
  year         = {2023}
}
```



```
@inproceedings{DBLP:conf/um/MalitestaPANF23,
  author       = {Daniele Malitesta and
                  Claudio Pomo and
                  Vito Walter Anelli and
                  Tommaso Di Noia and
                  Antonio Ferrara},
  title        = {An Out-of-the-Box Application for Reproducible Graph Collaborative
                  Filtering extending the Elliot Framework},
  booktitle    = {{UMAP} (Adjunct Publication)},
  pages        = {12--15},
  publisher    = {{ACM}},
  year         = {2023}
}
```
