# [Hands-on #2] Reproducibility


<div>
  <img src="https://logconference.org/post/announcement/featured.png" alt="LoG 2023" width="100">
</div>

This is the Google Colab notebook for hands-on session #2 of the tutorial "_[Graph Neural Networks for Recommendation: Reproducibility, Graph Topology, and Node Representation](https://sisinflab.github.io/tutorial-gnns-recsys-log2023/)_", presented at the [2nd Learning on Graphs Conference](https://logconference.org/) (LoG 2023) on November 30 (online).

Credits:
- Daniele Malitesta (daniele.malitesta@poliba.it)
- Claudio Pomo (claudio.pomo@poliba.it)
- Tommaso Di Noia (tommaso.dinoia@poliba.it)

<div>
  <img src="https://www.poliba.it/sites/default/files/logo_5.png" alt="Poliba" width="200">
  <img src="https://swot.sisinflab.poliba.it/img/logo-sisinflab.png" alt="SisInfLab" width="200">
</div>

## Clone the repository

First, let's clone the repository from GitHub...

In [None]:
!git clone https://github.com/sisinflab/Multimodal-RSs-Reproducibility.git

## Set up the environment

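Second, let's set up the environment. We install the needed (extra) pip packages and set the environment variables that ensure reproducibility. In particular, PyTorch requires ```CUBLAS_WORKSPACE_CONFIG``` to be set (to ```:16:8``` or ```:4096:8```) when deterministic algorithms are enabled on CUDA 10.2 or later. The models below enable them with ```torch.use_deterministic_algorithms(True)```.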

In [None]:
%cd Multimodal-RSs-Reproducibility/
%env PYTHONPATH=.
%env CUBLAS_WORKSPACE_CONFIG=:16:8
!pip install -r requirements_torch_geometric_colab.txt
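You can also apply the same settings from plain Python, for example when running the code outside this notebook. The helper below is a hypothetical sketch: ```set_reproducible``` is not part of the repository. It mirrors the seeding done inside the model classes and only touches NumPy and PyTorch if they are installed.

```python
import importlib.util
import os
import random


def set_reproducible(seed: int = 123) -> None:
    """Hypothetical helper: fix the random seeds and request deterministic kernels."""
    # cuBLAS reads this variable when it is initialized, so set it before any CUDA work
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":16:8")

    # seed Python's own random number generator
    random.seed(seed)

    # seed NumPy only if it is available
    if importlib.util.find_spec("numpy") is not None:
        import numpy as np
        np.random.seed(seed)

    # seed PyTorch (all devices) and enable deterministic algorithms only if it is available
    if importlib.util.find_spec("torch") is not None:
        import torch
        torch.manual_seed(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        torch.use_deterministic_algorithms(True)


set_reproducible(123)
first = [random.random() for _ in range(3)]
set_reproducible(123)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

Setting the variable with ```setdefault``` keeps any value already exported, such as the one set by ```%env``` above.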

## Check if GPU is available

Then, let's check if the GPU is available:

In [None]:
!nvidia-smi
!nvcc --version

## Configure the experiments
Let's set the hyper-parameters of the model to be trained and tested. We begin with a modified version of LightGCN that adopts multimodal features (we call it LightGCNM), and we train and evaluate it on Amazon Office.

In [4]:
import yaml

config_filename = 'hands-on-2-log_2023.yml'
config = {
  'experiment': {
    'backend': 'pytorch',
    'data_config': {
      'strategy': 'fixed',
      'train_path': '../data/{0}/train.txt',
      'validation_path': '../data/{0}/val.txt',
      'test_path': '../data/{0}/test.txt',
      'side_information': [
        {
            'dataloader': 'VisualAttribute',
            'visual_features': '../data/{0}/image_feat'
        },
        {
            'dataloader': 'TextualAttribute',
            'textual_features': '../data/{0}/text_feat'
        }
      ]
    },
    'dataset': 'office',
    'top_k': 20,
    'evaluation': {
      'cutoffs': [20],
      'simple_metrics': ['Recall', 'nDCG']
    },
    'gpu': 0,
    'external_models_path': '../external/models/__init__.py',
    'models': {
      'external.LightGCNM': {
        'meta': {
          'hyper_opt_alg': 'grid',
          'verbose': True,
          'save_weights': False,
          'save_recs': False,
          'validation_rate': 10,
          'validation_metric': 'Recall@20',
          'restore': False
        },
        'epochs': 200,
        'batch_size': 1024,
        'factors': 64,
        'lr': 0.005,
        'l_w': 1e-5,
        'n_layers': 1,
        'normalize': True,
        'aggregation': 'mean',
        'modalities': "('visual','textual')",
        'loaders': "('VisualAttribute','TextualAttribute')",
        'seed': 123
      }
    }
  }
}

# write the configuration to a YAML file (yaml.dump writes to the stream and returns None)
with open(f'config_files/{config_filename}', 'w') as file:
    yaml.dump(config, file)
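Before launching a run, a quick sanity check on the dictionary can catch typos early. The function below is a hypothetical helper, not part of Elliot. It makes two assumptions:

- the ```{0}``` placeholder in the data paths is filled with the ```dataset``` value;
- ```validation_metric``` must be one of the ```simple_metrics``` at one of the evaluated ```cutoffs```.

```python
def check_config(config: dict) -> list:
    """Hypothetical sanity check for an Elliot-style configuration dictionary."""
    experiment = config['experiment']
    problems = []

    # resolve the {0} placeholder of each data path with the dataset name (assumed convention)
    for key in ('train_path', 'validation_path', 'test_path'):
        path = experiment['data_config'][key].format(experiment['dataset'])
        if experiment['dataset'] not in path:
            problems.append(f'{key} does not point to the dataset folder: {path}')

    # the validation metric must be one of the computed metrics at one of the evaluated cutoffs
    evaluated = {f'{metric}@{cutoff}'
                 for metric in experiment['evaluation']['simple_metrics']
                 for cutoff in experiment['evaluation']['cutoffs']}
    for model_name, model_config in experiment['models'].items():
        metric = model_config['meta']['validation_metric']
        if metric not in evaluated:
            problems.append(f'{model_name}: {metric} is not among {sorted(evaluated)}')
    return problems
```

With the LightGCNM configuration above, ```check_config(config)``` should return an empty list.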

## Download the multimodal recommendation dataset

Before we launch the experiments, we need to download the multimodal dataset (i.e., Amazon Office).

**[DISCLAIMER]** You can find the original version of Amazon Office at this [link](https://cseweb.ucsd.edu/~jmcauley/datasets/amazon/links.html). We downloaded and processed the multimodal features following the official [GitHub repository](https://github.com/CRIPAC-DIG/LATTICE) of LATTICE.

In [None]:
import gdown

# download the preprocessed Amazon Office dataset (zip archive) from Google Drive
gdown.download('https://drive.google.com/uc?id=1u_E3gJgjD-mMCE0GJ_P0slsVY-bO4Ph_', 'office.zip', quiet=False)

!mkdir data
!mv office.zip data
%cd data
!unzip office.zip
!rm office.zip
%cd ..

The downloaded multimodal dataset has the following structure:

```
├── office
│   ├── image_feat
│       ├── 0.npy
│       ├── 1.npy
│       ├── ...
│   ├── text_feat
│       ├── 0.npy
│       ├── 1.npy
│       ├── ...
│   ├── train.txt
│   ├── validation.txt
│   ├── test.txt
```
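Each item's features are stored as one ```<item_id>.npy``` file per modality. A quick way to check that the download is complete is to verify that both feature folders cover the same item ids. Below is a minimal sketch that assumes the folder layout shown above. The helper names are hypothetical.

```python
import os


def feature_item_ids(folder: str) -> set:
    """Collect the integer item ids from files named <item_id>.npy."""
    return {int(name.split('.')[0]) for name in os.listdir(folder) if name.endswith('.npy')}


def compare_modalities(dataset_folder: str) -> dict:
    """Report how many items have both modalities, and which ones have only one."""
    visual = feature_item_ids(os.path.join(dataset_folder, 'image_feat'))
    textual = feature_item_ids(os.path.join(dataset_folder, 'text_feat'))
    return {
        'both': len(visual & textual),
        'visual_only': sorted(visual - textual),
        'textual_only': sorted(textual - visual),
    }
```

For example, ```compare_modalities('data/office')``` from the repository root, after the download cell above, should report empty ```visual_only``` and ```textual_only``` lists if every item has both features.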



## Run experiments
Now we are all set to run an experiment with LightGCNM on Amazon Office.

In [None]:
from elliot.run import run_experiment

run_experiment(f"config_files/{config_filename}")

Then, we overwrite the configuration file to train and test LATTICE on Amazon Office.

In [None]:
config = {
  'experiment': {
    'backend': 'pytorch',
    'data_config': {
      'strategy': 'fixed',
      'train_path': '../data/{0}/train.txt',
      'validation_path': '../data/{0}/val.txt',
      'test_path': '../data/{0}/test.txt',
      'side_information': [
        {
            'dataloader': 'VisualAttribute',
            'visual_features': '../data/{0}/image_feat'
        },
        {
            'dataloader': 'TextualAttribute',
            'textual_features': '../data/{0}/text_feat'
        }
      ]
    },
    'dataset': 'office',
    'top_k': 20,
    'evaluation': {
      'cutoffs': [20],
      'simple_metrics': ['Recall', 'nDCG']
    },
    'gpu': 0,
    'external_models_path': '../external/models/__init__.py',
    'models': {
      'external.LATTICE': {
        'meta': {
          'hyper_opt_alg': 'grid',
          'verbose': True,
          'save_weights': False,
          'save_recs': False,
          'validation_rate': 10,
          'validation_metric': 'Recall@20',
          'restore': False
        },
        'epochs': 200,
        'batch_size': 1024,
        'factors': 64,
        'lr': 0.005,
        'l_w': 1e-5,
        'n_layers': 1,
        'n_ui_layers': 2,
        'top_k': 20,
        'cf': 'lightgcn',
        'l_m': 0.7,
        'factors_multimod': 64,
        'modalities': "('visual','textual')",
        'loaders': "('VisualAttribute','TextualAttribute')",
        'seed': 123
      }
    }
  }
}

# overwrite the configuration file (yaml.dump writes to the stream and returns None)
with open(f'config_files/{config_filename}', 'w') as file:
    yaml.dump(config, file)

run_experiment(f"config_files/{config_filename}")

## Take a look at the code!
Each (multimodal) recommender system comes with three main modules (a base class, a model class, and an ```__init__.py```), plus any additional classes it needs:



```
├── model-name
│   ├── base-class
│   ├── model-class
│   ├── __init__.py
│   ├── <additional_class_1>
│   ├── <additional_class_2>
│   ├── ...
```

For instance, let's consider LightGCNM:

```
├── lightgcn_m
│   ├── LightGCNM.py
│   ├── LightGCNMModel.py
│   ├── __init__.py
│   ├── custom_sampler.py
```

and LATTICE:

```
├── lattice
│   ├── LATTICE.py
│   ├── LATTICEModel.py
│   ├── NGCFLayer.py
│   ├── __init__.py
│   ├── custom_sampler.py
```

Moreover, recommender systems leveraging side information (e.g., multimodal features) require one or more dataset loaders to load the additional features and inject them into the recommendation model.

For instance, in the case of the visual, textual, and audio modalities, we have:

```
├── visual
│   ├── __init__.py
│   ├── visual_attribute.py
```

```
├── textual
│   ├── __init__.py
│   ├── textual_attribute.py
```

```
├── audio
│   ├── __init__.py
│   ├── audio_attribute.py
```

You can find them at ```elliot.dataset.modular_loaders```.

### Multimodal data loader

Loading and processing each modality is handled by a dedicated data loader.

For instance, for the visual modality, we have:

```python
import os
import typing as t
from types import SimpleNamespace

import numpy as np

from elliot.dataset.modular_loaders.abstract_loader import AbstractLoader


class VisualAttribute(AbstractLoader):
    def __init__(self, users: t.Set, items: t.Set, ns: SimpleNamespace, logger: object):

        # folder path for the visual features
        self.visual_feature_folder_path = getattr(ns, "visual_features", None)

        # item mapping dictionary
        self.item_mapping = {}

        # shape of visual features
        self.visual_features_shape = None

        # get items having visual features
        inner_items = self.check_items_in_folder()

        # align items having visual features and items from the training set
        self.items = items & inner_items

    # method to collect the ids of all items having a visual feature file
    def check_items_in_folder(self) -> t.Set[int]:
        items = set()
        if self.visual_feature_folder_path:

            # get the ids of all items having a visual feature (files are named <item_id>.npy)
            items_folder = os.listdir(self.visual_feature_folder_path)
            items = items.union(set([int(f.split('.')[0]) for f in items_folder]))
            self.visual_features_shape = np.load(os.path.join(self.visual_feature_folder_path,
                                                              items_folder[0])).shape[0]
        if items:

            # map the ids of all items having a visual feature to our internal (row) indices
            self.item_mapping = {item: val for val, item in enumerate(items)}
        return items

    # get all visual features as one numpy array
    def get_all_visual_features(self):
        all_features = np.empty((len(self.items), self.visual_features_shape))
        if self.visual_feature_folder_path:
            for key, value in self.item_mapping.items():
                all_features[value] = np.load(os.path.join(self.visual_feature_folder_path, str(key) + '.npy'))
        return all_features

    # this method is actually called from outside
    def get_all_features(self):
        return self.get_all_visual_features()

    # method to create a namespace from this class
    def create_namespace(self) -> SimpleNamespace:
        ns = SimpleNamespace()
        ns.__name__ = "VisualAttribute"
        ns.object = self
        ns.visual_feature_folder_path = self.visual_feature_folder_path
        ns.item_mapping = self.item_mapping
        ns.visual_features_shape = self.visual_features_shape
        return ns
```
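The loader's alignment step comes down to a set intersection plus a dictionary from original item ids to contiguous row indices. The dependency-free sketch below mirrors that logic. It is hypothetical code, not the Elliot API, and it differs from the loader above in two deliberate ways:

- it sorts the ids, so the row order does not depend on set iteration order;
- it builds the mapping only for the aligned items, so every mapped row fits in the ```len(items)```-row matrix.

```python
def align_items(train_items: set, feature_items: set) -> dict:
    """Map each item that has features AND appears in training to a contiguous row index."""
    aligned = train_items & feature_items
    # sorting makes the row order reproducible and independent of set ordering
    return {item: row for row, item in enumerate(sorted(aligned))}


def stack_features(mapping: dict, load_feature) -> list:
    """Build a (num_items x dim) matrix as a list of rows; load_feature(item) returns one vector."""
    rows = [None] * len(mapping)
    for item, row in mapping.items():
        rows[row] = load_feature(item)
    return rows


features_on_disk = {0: [0.1, 0.2], 3: [0.5, 0.5], 7: [0.9, 0.0]}  # toy stand-in for <item_id>.npy files
mapping = align_items(train_items={0, 3, 5}, feature_items=set(features_on_disk))
print(mapping)                                                 # {0: 0, 3: 1}
print(stack_features(mapping, features_on_disk.__getitem__))  # [[0.1, 0.2], [0.5, 0.5]]
```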



### Light Graph Convolutional Network with Multimodal Side Information (LightGCNM)

Inspired by the baseline LightGCN-M, as presented in:

Wei Wei, Chao Huang, Lianghao Xia, Chuxu Zhang: _Multi-Modal Self-Supervised Learning for Recommendation_. WWW 2023: 790-800

\[[**paper**](https://arxiv.org/abs/2302.10632)\]

#### Base class

Let's take a look at the base-class for LightGCNM (the file is ```LightGCNM.py```):

```python
class LightGCNM(RecMixin, BaseRecommenderModel):
    r"""
    Light graph convolutional network with multimodal side information.

    For further details, please refer to the `paper <https://dl.acm.org/doi/10.1145/3543507.3583206>`_
    """
    @init_charger
    def __init__(self, data, config, params, *args, **kwargs):

        # parameters list for LightGCNM
        self._params_list = [
            ("_learning_rate", "lr", "lr", 0.0005, float, None),
            ("_factors", "factors", "factors", 64, int, None),
            ("_l_w", "l_w", "l_w", 0.01, float, None),
            ("_modalities", "modalities", "modalites", "('visual','textual')", lambda x: list(make_tuple(x)),
             lambda x: self._batch_remove(str(x), " []").replace(",", "-")),
            ("_n_layers", "n_layers", "n_layers", 1, int, None),
            ("_normalize", "normalize", "normalize", True, bool, None),
            ("_aggregation", "aggregation", "aggregation", 'mean', str, None),
            ("_loaders", "loaders", "loads", "('VisualAttribute','TextualAttribute')", lambda x: list(make_tuple(x)),
             lambda x: self._batch_remove(str(x), " []").replace(",", "-"))
        ]
        self.autoset_params()

        # create the sampler for BPR
        self._sampler = Sampler(self._data.i_train_dict, seed=self._seed)

        # instantiate the adjacency matrix
        row, col = data.sp_i_train.nonzero()
        col = [c + self._num_users for c in col]
        edge_index = np.array([row, col])
        edge_index = torch.tensor(edge_index, dtype=torch.int64)
        self.adj = SparseTensor(row=torch.cat([edge_index[0], edge_index[1]], dim=0),
                                col=torch.cat([edge_index[1], edge_index[0]], dim=0),
                                sparse_sizes=(self._num_users + self._num_items,
                                              self._num_users + self._num_items))

        # we associate one class attribute for each modality
        for m_id, m in enumerate(self._modalities):
            self.__setattr__(f'''_side_{m}''',
                             self._data.side_information.__getattribute__(f'''{self._loaders[m_id]}'''))

        # load all multimodal features
        all_multimodal_features = []
        for m_id, m in enumerate(self._modalities):
            all_multimodal_features.append(self.__getattribute__(
                f'''_side_{self._modalities[m_id]}''').object.get_all_features())

        # instantiate the model
        self._model = LightGCNMModel(
            num_users=self._num_users,
            num_items=self._num_items,
            learning_rate=self._learning_rate,
            embed_k=self._factors,
            l_w=self._l_w,
            n_layers=self._n_layers,
            adj=self.adj,
            modalities=self._modalities,
            multimodal_features=all_multimodal_features,
            aggregation=self._aggregation,
            normalize=self._normalize,
            random_seed=self._seed
        )
```


#### Model class
Let's take a look at the model-class for LightGCNM (the file is ```LightGCNMModel.py```):


```python
import random
from abc import ABC

import numpy as np
import torch
import torch_geometric
from torch_geometric.nn import LGConv


class LightGCNMModel(torch.nn.Module, ABC):
    def __init__(self,
                 num_users,
                 num_items,
                 learning_rate,
                 embed_k,
                 l_w,
                 n_layers,
                 adj,
                 modalities,
                 multimodal_features,
                 aggregation,
                 normalize,
                 random_seed,
                 name="LightGCNM",
                 **kwargs
                 ):
        super().__init__()

        # set all seeds and deterministic behaviour
        random.seed(random_seed)
        np.random.seed(random_seed)
        torch.manual_seed(random_seed)
        torch.cuda.manual_seed(random_seed)
        torch.cuda.manual_seed_all(random_seed)
        torch.backends.cudnn.deterministic = True
        torch.use_deterministic_algorithms(True)

        # set device
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

        # set all model's parameters
        self.num_users = num_users
        self.num_items = num_items
        self.embed_k = embed_k
        self.learning_rate = learning_rate
        self.l_w = l_w
        self.n_layers = n_layers
        self.weight_size_list = [self.embed_k] * (self.n_layers + 1)
        self.alpha = torch.tensor([1 / (k + 1) for k in range(len(self.weight_size_list))])
        self.adj = adj
        self.normalize = normalize

        # create user embeddings (item node features come from the multimodal features below)
        self.Gu = torch.nn.Embedding(
            num_embeddings=self.num_users, embedding_dim=self.embed_k)
        torch.nn.init.xavier_uniform_(self.Gu.weight)

        # multimodal attributes
        self.modalities = modalities
        self.aggregation = aggregation

        # multimodal features (trainable, initialized with the pre-extracted ones) and projection network;
        # a ModuleList is used because the entries are Embedding modules, not bare Parameters
        self.F = torch.nn.ModuleList()
        if self.aggregation == 'concat':
            total_multimodal_features = 0
            for m_id, m in enumerate(self.modalities):
                self.F.append(torch.nn.Embedding.from_pretrained(torch.tensor(
                    multimodal_features[m_id], device=self.device, dtype=torch.float32),
                    freeze=False))
                total_multimodal_features += multimodal_features[m_id].shape[-1]
            self.proj = torch.nn.Linear(in_features=total_multimodal_features, out_features=self.embed_k)
        else:
            self.proj = torch.nn.ModuleList()
            for m_id, m in enumerate(self.modalities):
                self.F.append(torch.nn.Embedding.from_pretrained(torch.tensor(
                    multimodal_features[m_id], device=self.device, dtype=torch.float32),
                    freeze=False))
                self.proj.append(
                    torch.nn.Linear(in_features=multimodal_features[m_id].shape[-1], out_features=self.embed_k))
        self.F.to(self.device)
        self.proj.to(self.device)

        # create GNN (the same as LightGCN)
        propagation_network_list = []

        for _ in range(self.n_layers):
            propagation_network_list.append((LGConv(normalize=self.normalize), 'x, edge_index -> x'))

        self.propagation_network = torch_geometric.nn.Sequential('x, edge_index', propagation_network_list)
        self.propagation_network.to(self.device)

        # instantiate optimizer
        self.optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)

    # method to run message-passing (evaluate=True runs the GNN without tracking gradients)
    def propagate_embeddings(self, evaluate=False):

        # project multimodal features into the same latent space as the user embeddings, and then fuse them
        if self.aggregation == 'concat':
            F_proj = torch.nn.functional.normalize(
                self.proj(torch.concat([self.F[m_id].weight
                                        for m_id in range(len(self.F))], dim=-1).to(self.device)
                          ), p=2, dim=1).to(self.device)
        elif self.aggregation == 'mean':
            F_proj = [torch.nn.functional.normalize(self.proj[m_id](self.F[m_id].weight).to(self.device), p=2, dim=1)
                      for m_id in range(len(self.F))]
            F_proj = torch.mean(torch.stack(F_proj, dim=-1).to(self.device), dim=-1)
        elif self.aggregation == 'sum':
            F_proj = [torch.nn.functional.normalize(self.proj[m_id](self.F[m_id].weight).to(self.device), p=2, dim=1)
                      for m_id in range(len(self.F))]
            F_proj = torch.sum(torch.stack(F_proj, dim=-1).to(self.device), dim=-1)
        else:
            raise ValueError(f'Unknown aggregation: {self.aggregation}')

        # then, run the same message-passing as LightGCN, but item node features are multimodal
        ego_embeddings = torch.cat((self.Gu.weight.to(self.device), F_proj.to(self.device)), 0)
        all_embeddings = [ego_embeddings]

        for layer in range(0, self.n_layers):
            if evaluate:
                self.propagation_network.eval()
                with torch.no_grad():
                    all_embeddings += [list(
                        self.propagation_network.children()
                    )[layer](all_embeddings[layer].to(self.device), self.adj.to(self.device))]
            else:
                all_embeddings += [list(
                    self.propagation_network.children()
                )[layer](all_embeddings[layer].to(self.device), self.adj.to(self.device))]

        if evaluate:
            self.propagation_network.train()

        # average the outputs of all layers (including layer 0), then split users and items
        all_embeddings = torch.mean(torch.stack(all_embeddings, 0), dim=0)
        gu, fi = torch.split(all_embeddings, [self.num_users, self.num_items], 0)
        return gu, fi
```
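Here is a toy, dependency-free sketch of the ```mean``` aggregation and the LightGCN-style propagation. It uses plain Python lists instead of tensors, and the function names are hypothetical. The sketch:

- assumes the per-modality projections have already been applied;
- L2-normalizes and averages them into item features;
- runs symmetric-normalized propagation without self-loops, which is our reading of what ```LGConv(normalize=True)``` computes;
- takes the mean over all layer outputs.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def fuse_mean(projected_per_modality):
    """'mean' aggregation: average the L2-normalized projected features of each item."""
    n_items = len(projected_per_modality[0])
    fused = []
    for i in range(n_items):
        vectors = [l2_normalize(modality[i]) for modality in projected_per_modality]
        fused.append([sum(column) / len(vectors) for column in zip(*vectors)])
    return fused


def lightgcn_propagate(x, edges, n_layers):
    """Symmetric-normalized propagation without self-loops, then mean over all layer outputs."""
    n_nodes = len(x)
    neighbors = [[] for _ in range(n_nodes)]
    for u, v in edges:  # undirected graph: store both directions
        neighbors[u].append(v)
        neighbors[v].append(u)
    degree = [len(nbrs) for nbrs in neighbors]

    layers = [x]
    for _ in range(n_layers):
        previous = layers[-1]
        current = []
        for i in range(n_nodes):
            out = [0.0] * len(previous[i])
            for j in neighbors[i]:
                weight = 1.0 / math.sqrt(degree[i] * degree[j])
                out = [o + weight * p for o, p in zip(out, previous[j])]
            current.append(out)
        layers.append(current)

    # average the embeddings of layers 0..n_layers, node by node
    return [[sum(values) / len(layers) for values in zip(*(layer[i] for layer in layers))]
            for i in range(n_nodes)]


users = [[1.0, 0.0]]                                 # one toy user embedding
items = fuse_mean([[[3.0, 4.0], [1.0, 0.0]],         # toy "visual" projections of 2 items
                   [[0.0, 2.0], [0.0, 1.0]]])        # toy "textual" projections of 2 items
x = users + items                                    # node 0 is the user, nodes 1-2 are the items
out = lightgcn_propagate(x, edges=[(0, 1), (0, 2)], n_layers=1)
```

As in the model, users come first and items follow. This is why the model can split the result with ```torch.split(..., [num_users, num_items])```.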



### LATent sTructure mining method for multImodal reCommEndation (LATTICE)

Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Shu Wu, Shuhui Wang, Liang Wang: _Mining Latent Structures for Multimedia Recommendation_. ACM Multimedia 2021: 3872-3880

<div>
  <img src="https://dl.acm.org/cms/asset/6d22a6ae-6b80-431c-9092-ef5e8a52736c/3474085.3475259.key.jpg" alt="LATTICE" width="700">
</div>

\[[**paper**](https://arxiv.org/abs/2104.09036)\]\[[**code**](https://github.com/CRIPAC-DIG/LATTICE)\]

#### Base class

Let's take a look at the base-class for LATTICE (the file is ```LATTICE.py```):

```python
class LATTICE(RecMixin, BaseRecommenderModel):
    r"""
    Mining Latent Structures for Multimedia Recommendation

    For further details, please refer to the `paper <https://dl.acm.org/doi/10.1145/3474085.3475259>`_
    """
    @init_charger
    def __init__(self, data, config, params, *args, **kwargs):

        # parameters list for LATTICE
        self._params_list = [
            ("_learning_rate", "lr", "lr", 0.0005, float, None),
            ("_factors", "factors", "factors", 64, int, None),
            ("_l_w", "l_w", "l_w", 0.01, float, None),
            ("_n_layers", "n_layers", "n_layers", 1, int, None),
            ("_n_ui_layers", "n_ui_layers", "n_ui_layers", 3, int, None),
            ("_top_k", "top_k", "top_k", 100, int, None),
            ("_factors_multimod", "factors_multimod", "factors_multimod", 64, int, None),
            ("_cf", "cf", "cf", 'lightgcn', str, None),
            ("_modalities", "modalities", "modalites", "('visual','textual')", lambda x: list(make_tuple(x)),
             lambda x: self._batch_remove(str(x), " []").replace(",", "-")),
            ("_lambda", "l_m", "l_m", 0.1, float, None),
            ("_ws", "ws", "ws", "(64,64,64)", lambda x: list(make_tuple(x)),
             lambda x: self._batch_remove(str(x), " []").replace(",", "-")),
            ("_dl", "dl", "dl", "(0.1,0.1,0.1)", lambda x: list(make_tuple(x)),
             lambda x: self._batch_remove(str(x), " []").replace(",", "-")),
            ("_loaders", "loaders", "loads", "('VisualAttribute','TextualAttribute')", lambda x: list(make_tuple(x)),
             lambda x: self._batch_remove(str(x), " []").replace(",", "-"))
        ]
        self.autoset_params()

        # create the sampler for BPR
        self._sampler = Sampler(self._data.i_train_dict, self._batch_size, self._seed)

        # instantiate the adjacency matrix
        row, col = data.sp_i_train.nonzero()
        col = [c + self._num_users for c in col]
        edge_index = np.array([row, col])
        edge_index = torch.tensor(edge_index, dtype=torch.int64)
        self.adj = SparseTensor(row=torch.cat([edge_index[0], edge_index[1]], dim=0),
                                col=torch.cat([edge_index[1], edge_index[0]], dim=0),
                                sparse_sizes=(self._num_users + self._num_items,
                                              self._num_users + self._num_items))
        
        # we associate one class attribute for each modality
        for m_id, m in enumerate(self._modalities):
            self.__setattr__(f'''_side_{m}''',
                             self._data.side_information.__getattribute__(f'''{self._loaders[m_id]}'''))

        # load all multimodal features
        all_multimodal_features = []
        for m_id, m in enumerate(self._modalities):
            all_multimodal_features.append(self.__getattribute__(
                f'''_side_{self._modalities[m_id]}''').object.get_all_features())


        # instantiate the model
        self._model = LATTICEModel(
            num_users=self._num_users,
            num_items=self._num_items,
            num_layers=self._n_layers,
            num_ui_layers=self._n_ui_layers,
            learning_rate=self._learning_rate,
            embed_k=self._factors,
            embed_k_multimod=self._factors_multimod,
            l_w=self._l_w,
            modalities=self._modalities,
            l_m=self._lambda,
            top_k=self._top_k,
            multimodal_features=all_multimodal_features,
            adj=self.adj,
            cf_model=self._cf,
            weight_size=self._ws,
            dropout_list=self._dl,
            random_seed=self._seed
        )
```
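Both base classes build the same symmetric user-item adjacency. Item indices are shifted by the number of users, so users and items share one node index space, and every interaction is added in both directions. Here is a dependency-free sketch of this indexing. The function name is hypothetical, and the actual code stores the result in a ```torch_sparse.SparseTensor```.

```python
def bipartite_edges(interactions, num_users):
    """Return (row, col) lists of a symmetric user-item graph with items offset by num_users."""
    users = [u for u, _ in interactions]
    items = [i + num_users for _, i in interactions]  # item i becomes node num_users + i
    # concatenate user->item and item->user edges, as the torch.cat calls above do
    row = users + items
    col = items + users
    return row, col


row, col = bipartite_edges([(0, 0), (0, 2), (1, 1)], num_users=2)
print(row)  # [0, 0, 1, 2, 4, 3]
print(col)  # [2, 4, 3, 0, 0, 1]
```

With 2 users and 3 items, the node ids stay below ```num_users + num_items = 5```. This matches the ```sparse_sizes``` passed to ```SparseTensor```.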


#### Model class

---

**Disclaimer #1.** The formulas below may not be fully aligned with the original paper, because they were derived from both the algorithm described in the paper and the original **code**.

---

**Disclaimer #2.** The following operations are performed batch-wise. They require an initialization in which $\hat{\mathbf{S}}^m_{\text{curr}}$ is computed through equations (2) and (3), applied to the raw (unprojected) multimodal features, followed by a final normalization.

---

First, we project the items' multimodal features into another latent space:

$$\tilde{\mathbf{e}}_i^{m} = \mathbf{W}_m\mathbf{e}_i^{m} + \mathbf{b}_m. \quad (1)$$

Second, we compute the item-item similarity matrix on the projected features and apply kNN sparsification:

$$\mathbf{S}_{ij}^{m} = \frac{(\tilde{\mathbf{e}}_i^{m})^\top\tilde{\mathbf{e}}_j^{m}}{||\tilde{\mathbf{e}}_i^m||\;||\tilde{\mathbf{e}}_j^m||}, \quad (2)$$
$$\tilde{\mathbf{S}}_{ij}^{m} = \begin{cases}
\mathbf{S}_{ij}^m, & \text{if } \mathbf{S}_{ij}^m \in \text{top-}k(\mathbf{S}_i^m),\\
0, & \text{otherwise}.
\end{cases} \quad (3)$$
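Equations (2) and (3) can be illustrated on a handful of items with plain Python. This is a toy sketch with hypothetical function names; the model performs the same steps with ```torch.mm``` and ```torch.topk```. Since the cosine similarity of an item with itself is 1, each item typically keeps itself among its top-$k$ neighbours.

```python
import math


def cosine_similarity_matrix(features):
    """Eq. (2): pairwise cosine similarity between item feature vectors."""
    normed = []
    for v in features:
        norm = math.sqrt(sum(x * x for x in v))
        normed.append([x / norm for x in v])
    return [[sum(a * b for a, b in zip(u, v)) for v in normed] for u in normed]


def knn_sparsify(sim, k):
    """Eq. (3): keep the k largest entries of each row, stored as {(i, j): value}."""
    sparse = {}
    for i, row in enumerate(sim):
        top = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        for j in top:
            sparse[(i, j)] = row[j]
    return sparse


items = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
S = cosine_similarity_matrix(items)
S_knn = knn_sparsify(S, k=2)
print(sorted(S_knn))  # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (2, 2)]
```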

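Third, we compute the weighted sum of the similarity matrices across all modalities, both for the learned graph (4a) and for the current one (4b). Note that, following the original code, the weights $\alpha_m$ are obtained by applying a softmax to one learnable importance weight per modality: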

$$\mathbf{A}_{\text{learn}} = \underbrace{\sum_{m \in \mathcal{M}}\underbrace{\alpha_m\tilde{\mathbf{S}}^{m}}_{(4.1a)}}_{(4.2a)}. \quad (4a)$$
$$\mathbf{A}_{\text{curr}} = \underbrace{\sum_{m \in \mathcal{M}}\underbrace{\alpha_m\hat{\mathbf{S}}_{\text{curr}}^{m}}_{(4.1b)}}_{(4.2b)}. \quad (4b)$$

Then, the obtained adjacency matrix is normalized:

$$\hat{\mathbf{A}}_{\text{learn}} = \mathbf{D}^{-1/2}\mathbf{A}_{\text{learn}}\mathbf{D}^{-1/2}, \quad (5)$$

where $\mathbf{D}$ is the diagonal degree matrix (row sums) of $\mathbf{A}_{\text{learn}}$,

and we update the adjacency matrix with the previous one (from the previous iteration):

$$\mathbf{A} = \lambda\mathbf{A}_{\text{curr}} + (1 - \lambda)\hat{\mathbf{A}}_{\text{learn}}, \quad (6)$$

where $\lambda$ is set through the ```l_m``` hyper-parameter (0.7 in the configuration above).
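Equations (4a), (5), and (6) can likewise be sketched on sparse matrices stored as ```{(i, j): value}``` dictionaries. This is a toy illustration with hypothetical names; the model relies on ```torch_sparse``` operations instead. Here ```lam``` plays the role of $\lambda$ and ```a_curr``` stands for $\mathbf{A}_{\text{curr}}$ from Eq. (4b). As in the original LATTICE code, zero-degree rows get a zero normalization factor instead of an infinite one.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of raw importance weights."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(matrices, raw_weights):
    """Eq. (4a): sum over modalities of alpha_m * S_m, with alpha = softmax(raw_weights)."""
    out = {}
    for alpha, matrix in zip(softmax(raw_weights), matrices):
        for key, value in matrix.items():
            out[key] = out.get(key, 0.0) + alpha * value
    return out


def sym_normalize(adj):
    """Eq. (5): D^{-1/2} A D^{-1/2}, where D holds the row sums of A."""
    degree = {}
    for (i, _), value in adj.items():
        degree[i] = degree.get(i, 0.0) + value
    inv_sqrt = {i: 1.0 / math.sqrt(d) if d > 0 else 0.0 for i, d in degree.items()}
    return {(i, j): inv_sqrt.get(i, 0.0) * value * inv_sqrt.get(j, 0.0) for (i, j), value in adj.items()}


def combine(a_curr, a_learn_hat, lam):
    """Eq. (6): lam * A_curr + (1 - lam) * A_learn_hat."""
    keys = set(a_curr) | set(a_learn_hat)
    return {k: lam * a_curr.get(k, 0.0) + (1 - lam) * a_learn_hat.get(k, 0.0) for k in keys}


visual = {(0, 0): 1.0, (0, 1): 0.8, (1, 0): 0.8, (1, 1): 1.0}     # toy kNN graph from Eq. (3)
textual = {(0, 0): 1.0, (1, 1): 1.0}
a_learn = weighted_sum([visual, textual], raw_weights=[0.5, 0.5])  # equal raw weights give alphas [0.5, 0.5]
a_learn_hat = sym_normalize(a_learn)
a_curr = {(0, 0): 1.0, (1, 1): 1.0}                                # toy stand-in for Eq. (4b)
a = combine(a_curr, a_learn_hat, lam=0.7)
```

Equal raw weights reproduce the model's initialization, where every modality starts with the same importance $1/|\mathcal{M}|$.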

Finally, we obtain the item embeddings by propagating messages over the updated item-item graph $\mathbf{A}$:

$$\mathbf{h}_i^{(l)} = \sum_{j \in \mathcal{N}_i}\mathbf{A}_{ij}\mathbf{h}_j^{(l-1)}, \quad (7)$$

where $\mathbf{h}_j^{(l)}$ is the embedding of item $j$ at layer $l$, and $\mathbf{h}^{(0)}$ is initialized with the items' collaborative embeddings.
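Eq. (7) is a plain weighted aggregation over the item-item graph, with no extra normalization because $\mathbf{A}$ is already normalized. This appears consistent with the ```LGConv(normalize=False)``` layers used in the item-item GNN. Here is a toy sketch with the same dictionary representation. The names are hypothetical.

```python
def propagate(adj, h, n_layers):
    """Eq. (7): h_i^(l) = sum_j A_ij * h_j^(l-1), repeated for n_layers."""
    for _ in range(n_layers):
        new_h = [[0.0] * len(h[0]) for _ in h]
        for (i, j), a_ij in adj.items():
            new_h[i] = [x + a_ij * y for x, y in zip(new_h[i], h[j])]
        h = new_h
    return h


adj = {(0, 0): 0.5, (0, 1): 0.5, (1, 1): 1.0}
h0 = [[1.0, 0.0], [0.0, 1.0]]  # toy collaborative item embeddings h^(0)
print(propagate(adj, h0, n_layers=1))  # [[0.5, 0.5], [0.0, 1.0]]
```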

Let's take a look at the model-class for LATTICE (the file is ```LATTICEModel.py```):


```python
class LATTICEModel(torch.nn.Module, ABC):
      def __init__(self,
                 num_users,
                 num_items,
                 num_layers,
                 num_ui_layers,
                 learning_rate,
                 embed_k,
                 embed_k_multimod,
                 l_w,
                 modalities,
                 l_m,
                 top_k,
                 multimodal_features,
                 adj,
                 cf_model,
                 weight_size,
                 dropout_list,
                 random_seed,
                 name="LATTICE",
                 **kwargs
                 ):
        super().__init__()

        # set all seeds and deterministic behaviour
        random.seed(random_seed)
        np.random.seed(random_seed)
        torch.manual_seed(random_seed)
        torch.cuda.manual_seed(random_seed)
        torch.cuda.manual_seed_all(random_seed)
        torch.backends.cudnn.deterministic = True
        torch.use_deterministic_algorithms(True)

        # set device
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

        # set all model's parameters
        self.num_users = num_users
        self.num_items = num_items
        self.embed_k = embed_k
        self.learning_rate = learning_rate
        self.l_w = l_w
        self.n_ui_layers = num_ui_layers
        self.cf_model = cf_model
        self.adj = adj
        self.weight_size = weight_size
        self.weight_size = [self.embed_k] + self.weight_size

        # multimodal attributes
        self.embed_k_multimod = embed_k_multimod
        self.modalities = modalities
        self.l_m = l_m
        self.n_layers = num_layers
        self.top_k = top_k

        # create collaborative node embeddings
        self.Gu = torch.nn.Embedding(self.num_users, self.embed_k)
        self.Gi = torch.nn.Embedding(self.num_items, self.embed_k)
        torch.nn.init.xavier_uniform_(self.Gu.weight)
        torch.nn.init.xavier_uniform_(self.Gi.weight)
        self.Gu.to(self.device)
        self.Gi.to(self.device)

        # multimodal features and networks
        self.Gim = torch.nn.ParameterDict()
        self.Sim = dict()
        self.Si = None
        self.projection_m = torch.nn.ModuleDict()
        self.importance_weights_m = torch.nn.Parameter(
            torch.tensor(len(self.modalities) * [float(1 / len(self.modalities))]))
        self.importance_weights_m.to(self.device)
        self.multimodal_features_shapes = [mf.shape[1] for mf in multimodal_features]
        ir = torch.tensor(list(range(self.num_items)), dtype=torch.int64, device=self.device)
        self.items_rows = torch.repeat_interleave(ir, self.top_k).to(self.device)
        for m_id, m in enumerate(modalities):
            self.Gim[m] = torch.nn.Embedding.from_pretrained(
                torch.tensor(multimodal_features[m_id], dtype=torch.float32, device=self.device),
                freeze=False).weight
            self.Gim[m].to(self.device)
            current_sim = self.build_sim(self.Gim[m].detach())
            weighted_adj = self.build_knn_neighbourhood(current_sim, self.top_k)
            self.Sim[m] = self.compute_normalized_laplacian(weighted_adj, 0.5)
            self.Sim[m].to(self.device)
            self.projection_m[m] = torch.nn.Linear(in_features=self.multimodal_features_shapes[m_id],
                                                   out_features=self.embed_k_multimod)
            self.projection_m[m].to(self.device)

        # item-item multimodal GNN
        propagation_network_list = []
        for layer in range(self.n_layers):
            propagation_network_list.append((LGConv(normalize=False), 'x, edge_index -> x'))
        self.propagation_network = torch_geometric.nn.Sequential('x, edge_index', propagation_network_list)
        self.propagation_network.to(self.device)

        # backbone
        if cf_model == 'ngcf':
            propagation_network_list = []
            self.dropout_layers = []
            for layer in range(self.n_ui_layers):
                propagation_network_list.append((NGCFLayer(self.weight_size[layer],
                                                           self.weight_size[layer + 1]), 'x, edge_index -> x'))
                self.dropout_layers.append(torch.nn.Dropout(p=dropout_list[layer]))
            self.propagation_network_recommend = torch_geometric.nn.Sequential('x, edge_index', propagation_network_list)
            self.propagation_network_recommend.to(self.device)

        elif cf_model == 'lightgcn':
            propagation_network_list = []
            for layer in range(self.n_ui_layers):
                propagation_network_list.append((LGConv(normalize=False), 'x, edge_index -> x'))
            self.propagation_network_recommend = torch_geometric.nn.Sequential('x, edge_index',
                                                                               propagation_network_list)
            self.propagation_network_recommend.to(self.device)

        # instantiate optimizer
        self.optimizer = torch.optim.Adam(self.parameters(), lr=self.learning_rate)
        self.lr_scheduler = self.set_lr_scheduler()

    # method to build the item-item multimodal similarity matrix
    @staticmethod
    def build_sim(context):
        context_norm = context.div(torch.norm(context, p=2, dim=-1, keepdim=True))
        sim = torch.mm(context_norm, context_norm.transpose(1, 0))
        return sim

    # method to perform knn-sparsification
    def build_knn_neighbourhood(self, adj, topk):
        knn_val, knn_ind = torch.topk(adj, topk, dim=-1)
        items_cols = torch.flatten(knn_ind).to(self.device)
        values = torch.flatten(knn_val).to(self.device)
        weighted_adj = SparseTensor(row=self.items_rows,
                                    col=items_cols,
                                    value=values,
                                    sparse_sizes=(self.num_items, self.num_items))
        return weighted_adj

    # method to build the lr_scheduler
    def set_lr_scheduler(self):
      scheduler = torch.optim.lr_scheduler.LambdaLR(self.optimizer, lr_lambda=lambda epoch: 0.96 ** (epoch / 50))
      return scheduler

    # train step batch-by-batch
    def train_step(self, batch, adj):

        # run message-passing on the whole GNN
        gum, gim = self.propagate_embeddings(build_item_graph)

        # run MF
        user, pos, neg = batch
        xu_pos, gamma_u_m, gamma_i_pos_m = self.forward(inputs=(gum[user], gim[pos]))
        xu_neg, _, gamma_i_neg_m = self.forward(inputs=(gum[user], gim[neg]))

        # compute loss
        loss = -torch.mean(torch.nn.functional.logsigmoid(xu_pos - xu_neg))
        reg_loss = self.l_w * (1 / 2) * ((gamma_u_m**2).sum() + (gamma_i_pos_m**2).sum() + (gamma_i_neg_m**2).sum()) / len(user)
        loss += reg_loss

        # backward propagation
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        # return batch loss
        return loss.detach().cpu().numpy()
    
    # method to run message-passing
    def propagate_embeddings(self, build_item_graph=False):

        if build_item_graph:
            weights = torch.cat([torch.unsqueeze(w, 0) for w in self.importance_weights_m], dim=0)
            softmax_weights = self.softmax(weights)
            learned_adj_addendum = []
            original_adj_addendum = []
            for m_id, m in enumerate(self.modalities):
                
                # eq. (1)
                projected_m = self.projection_m[m](self.Gim[m].to(self.device))

                # eq. (2)
                current_sim = self.build_sim(projected_m)

                # eq. (3)
                weighted_adj = self.build_knn_neighbourhood(current_sim, self.top_k)

                # eq. (4.1a)
                learned_adj_addendum.append(mul_nnz(weighted_adj,
                                                    softmax_weights[m_id].repeat((weighted_adj.nnz(),)).to(
                                                        self.device),
                                                    layout='coo'))
                
                # eq. (4.1b)
                original_adj_addendum.append(mul_nnz(self.Sim[m],
                                                     softmax_weights[m_id].repeat((self.Sim[m].nnz(),)).to(
                                                         self.device),
                                                     layout='coo'))
            # eq. (4.2a): sum the modality-weighted learned graphs
            learned_adj = learned_adj_addendum[0]
            for i in range(1, len(learned_adj_addendum)):
                learned_adj = add(learned_adj, learned_adj_addendum[i])

            # eq. (5): normalize the learned graph by node degree
            learned_adj = self.compute_normalized_laplacian(learned_adj, 0.5)

            # eq. (4.2b): sum the modality-weighted original graphs
            original_adj = original_adj_addendum[0]
            for i in range(1, len(original_adj_addendum)):
                original_adj = add(original_adj, original_adj_addendum[i])

            # eq. (6): mix the learned and original item-item graphs with weight l_m
            first = mul_nnz(learned_adj, torch.tensor([1 - self.l_m]).repeat((learned_adj.nnz(),)).to(self.device),
                            layout='coo')
            second = mul_nnz(original_adj, torch.tensor([self.l_m]).repeat((original_adj.nnz(),)).to(self.device),
                             layout='coo')
            self.Si = add(first, second)
        else:
            # reuse the last item-item graph without backpropagating through it
            self.Si = self.Si.detach()

        # eq. (7)
        item_embedding = self.Gi.weight
        for layer in range(self.n_layers):
            item_embedding = list(self.propagation_network.children())[layer](item_embedding.to(self.device),
                                                                              self.Si)

        ego_embeddings = torch.cat((self.Gu.weight.to(self.device), self.Gi.weight.to(self.device)), 0)
        all_embeddings = [ego_embeddings]

        if self.cf_model == 'ngcf':
            embedding_idx = 0
            for layer in range(self.n_ui_layers):
                all_embeddings += [torch.nn.functional.normalize(self.dropout_layers[embedding_idx](list(
                    self.propagation_network_recommend.children()
                )[layer](all_embeddings[embedding_idx].to(self.device), self.adj.to(self.device))), p=2, dim=1)]
                embedding_idx += 1
            all_embeddings = torch.stack(all_embeddings, dim=1)
            all_embeddings = all_embeddings.mean(dim=1, keepdim=False)
            gu, gi = torch.split(all_embeddings, [self.num_users, self.num_items], 0)
            return gu, gi + torch.nn.functional.normalize(item_embedding.to(self.device), p=2, dim=1)

        elif self.cf_model == 'lightgcn':
            for layer in range(self.n_ui_layers):
                all_embeddings += [torch.nn.functional.normalize(list(
                    self.propagation_network_recommend.children()
                )[layer](all_embeddings[layer].to(self.device), self.adj.to(self.device)), p=2, dim=1)]
            all_embeddings = torch.stack(all_embeddings, dim=1)
            all_embeddings = all_embeddings.mean(dim=1, keepdim=False)
            gu, gi = torch.split(all_embeddings, [self.num_users, self.num_items], 0)
            return gu, gi + torch.nn.functional.normalize(item_embedding.to(self.device), p=2, dim=1)

        elif self.cf_model == 'mf':
            return self.Gu.weight, self.Gi.weight + torch.nn.functional.normalize(item_embedding.to(self.device), p=2, dim=1)

    # method to run MF
    def forward(self, inputs, **kwargs):
        gum, gim = inputs
        gamma_u_m = torch.squeeze(gum).to(self.device)
        gamma_i_m = torch.squeeze(gim).to(self.device)

        xui = torch.sum(gamma_u_m * gamma_i_m, 1)

        return xui, gamma_u_m, gamma_i_m
```
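To make the item-item graph construction in `propagate_embeddings` easier to follow, here is a minimal NumPy sketch of eqs. (2) to (6) on dense toy matrices. It is not the notebook's implementation. It replaces the PyTorch/`torch_sparse` operations with dense arrays, uses a fixed random matrix as a stand-in for the learnable projection of eq. (1), assumes the original graphs `self.Sim[m]` are degree-normalized kNN graphs on the raw features, and assumes `compute_normalized_laplacian(adj, 0.5)` computes $D^{-1/2} A D^{-1/2}$. All function names are hypothetical.

```python
import numpy as np


def build_sim(features):
    # Cosine similarity: L2-normalize each row, then take all pairwise dot products (eq. (2))
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    return normed @ normed.T


def knn_sparsify(sim, top_k):
    # Keep the top_k largest similarities per row and zero out the rest (eq. (3))
    out = np.zeros_like(sim)
    cols = np.argsort(-sim, axis=1)[:, :top_k]
    rows = np.arange(sim.shape[0])[:, None]
    out[rows, cols] = sim[rows, cols]
    return out


def degree_normalize(adj, eps=1e-12):
    # Assumed D^{-1/2} A D^{-1/2} with row sums as degrees; non-positive-degree rows become zero
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, eps)), 0.0)
    return d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def fuse_item_graph(learned, original, modality_logits, l_m):
    # Softmax importance weights, shared by learned and original graphs (eqs. (4.1a), (4.1b))
    w = softmax(np.asarray(modality_logits, dtype=float))
    learned_adj = degree_normalize(sum(w_m * a for w_m, a in zip(w, learned)))  # eqs. (4.2a), (5)
    original_adj = sum(w_m * a for w_m, a in zip(w, original))  # eq. (4.2b)
    return (1 - l_m) * learned_adj + l_m * original_adj  # eq. (6)


# Toy usage: 6 items and two hypothetical modalities with random features
rng = np.random.default_rng(0)
n_items, top_k, l_m = 6, 3, 0.7
raw = {"visual": rng.normal(size=(n_items, 8)), "textual": rng.normal(size=(n_items, 5))}
# Hypothetical stand-in for the learnable projection of eq. (1): a fixed random linear map
projected = {m: x @ rng.normal(size=(x.shape[1], 4)) for m, x in raw.items()}

original = [degree_normalize(knn_sparsify(build_sim(x), top_k)) for x in raw.values()]
learned = [knn_sparsify(build_sim(x), top_k) for x in projected.values()]
S = fuse_item_graph(learned, original, modality_logits=[0.0, 0.0], l_m=l_m)
print(S.shape)
```

Each row of a kNN graph keeps exactly `top_k` entries, and since an item is always most similar to itself, the diagonal survives the sparsification.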

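The learning-rate schedule in `set_lr_scheduler` multiplies the initial rate by $0.96^{\text{epoch}/50}$. The sketch below computes the resulting values, assuming `scheduler.step()` is called once per epoch (the training loop is not shown here) and using a hypothetical initial rate of 0.001 in place of `self.learning_rate`.

```python
def lr_factor(epoch, base=0.96, period=50):
    # Same multiplier as lr_lambda in set_lr_scheduler: 0.96 ** (epoch / 50)
    return base ** (epoch / period)


def effective_lr(initial_lr, epoch):
    # LambdaLR sets each parameter group's lr to initial_lr * lr_lambda(epoch)
    return initial_lr * lr_factor(epoch)


initial_lr = 0.001  # hypothetical value for self.learning_rate
for epoch in (0, 25, 50, 100):
    print(f"epoch {epoch:>3}: lr = {effective_lr(initial_lr, epoch):.6g}")
```

Because the exponent uses true division, the decay is smooth rather than a staircase: the factor is 0.96 after 50 epochs and $0.96^2 = 0.9216$ after 100.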

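`train_step` optimizes a BPR loss with an L2 penalty on the embeddings of the sampled batch. The NumPy sketch below restates that objective outside PyTorch so it can be checked by hand; `l_w` mirrors `self.l_w`, and the function names are hypothetical.

```python
import numpy as np


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x))
    return -np.logaddexp(0.0, -x)


def bpr_loss(gamma_u, gamma_i_pos, gamma_i_neg, l_w):
    # Dot-product scores, as computed in forward()
    xu_pos = np.sum(gamma_u * gamma_i_pos, axis=1)
    xu_neg = np.sum(gamma_u * gamma_i_neg, axis=1)
    # BPR term: push positive-item scores above negative-item scores
    loss = -np.mean(log_sigmoid(xu_pos - xu_neg))
    # L2 penalty on the batch embeddings, divided by the batch size
    reg = l_w * 0.5 * ((gamma_u ** 2).sum() + (gamma_i_pos ** 2).sum()
                       + (gamma_i_neg ** 2).sum()) / len(gamma_u)
    return loss + reg


# All-zero embeddings: every score difference is 0
z = np.zeros((4, 8))
print(bpr_loss(z, z, z, l_w=1e-4))
```

With all-zero embeddings the loss reduces to $-\log\sigma(0) = \log 2 \approx 0.693$ and the penalty vanishes.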

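When `cf_model == 'lightgcn'`, `propagate_embeddings` averages the ego embeddings with $K$ L2-normalized propagation layers on the user-item graph, then adds the normalized item embeddings obtained on the item-item graph. A dense NumPy sketch of that branch follows. It assumes `self.adj` is the degree-normalized bipartite adjacency $D^{-1/2} A D^{-1/2}$ (so each `LGConv(normalize=False)` layer reduces to a product with that matrix) and uses an identity matrix as a placeholder item-item graph. Names and toy data are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Row-wise x / max(||x||_2, eps), mirroring torch.nn.functional.normalize(p=2, dim=1)
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def normalized_bipartite_adj(R):
    # Build A = [[0, R], [R^T, 0]] and return D^{-1/2} A D^{-1/2}
    n_u, n_i = R.shape
    A = np.block([[np.zeros((n_u, n_u)), R], [R.T, np.zeros((n_i, n_i))]])
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))  # assumes every node has at least one edge
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def lightgcn_branch(user_emb, item_emb, norm_adj, n_ui_layers, item_graph_emb):
    layers = [np.concatenate([user_emb, item_emb], axis=0)]  # ego embeddings
    for k in range(n_ui_layers):
        # One parameter-free propagation step, then row-wise L2 normalization
        layers.append(l2_normalize(norm_adj @ layers[k]))
    all_emb = np.stack(layers, axis=1).mean(axis=1)  # average over layers 0..K
    gu, gi = all_emb[: len(user_emb)], all_emb[len(user_emb):]
    # Add the normalized item embeddings propagated on the item-item graph (eq. (7))
    return gu, gi + l2_normalize(item_graph_emb)


rng = np.random.default_rng(0)
R = np.array([[1, 1, 0], [0, 1, 1]], dtype=float)  # 2 users x 3 items
user_emb, item_emb = rng.normal(size=(2, 4)), rng.normal(size=(3, 4))
S = np.eye(3)  # placeholder item-item graph
item_graph_emb = S @ item_emb  # one item-item propagation layer
gu, gi = lightgcn_branch(user_emb, item_emb, normalized_bipartite_adj(R), 2, item_graph_emb)
scores = gu @ gi.T  # dot-product scores, as in forward()
print(scores.shape)
```

Setting `n_ui_layers=0` returns the raw user embeddings unchanged, which is a quick sanity check on the layer-averaging step.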
## Cite us

This notebook relies heavily on [Elliot](https://github.com/sisinflab/elliot), our framework for rigorous and reproducible recommender systems evaluation. If you use it in your work, please consider citing us:



```
@inproceedings{DBLP:conf/sigir/AnelliBFMMPDN21,
  author       = {Vito Walter Anelli and
                  Alejandro Bellog{\'{\i}}n and
                  Antonio Ferrara and
                  Daniele Malitesta and
                  Felice Antonio Merra and
                  Claudio Pomo and
                  Francesco Maria Donini and
                  Tommaso Di Noia},
  title        = {Elliot: {A} Comprehensive and Rigorous Framework for Reproducible
                  Recommender Systems Evaluation},
  booktitle    = {{SIGIR}},
  pages        = {2405--2414},
  publisher    = {{ACM}},
  year         = {2021}
}
```



```
@article{DBLP:journals/corr/abs-2309-05273,
  author       = {Daniele Malitesta and
                  Giandomenico Cornacchia and
                  Claudio Pomo and
                  Felice Antonio Merra and
                  Tommaso Di Noia and
                  Eugenio Di Sciascio},
  title        = {Formalizing Multimedia Recommendation through Multimodal Deep Learning},
  journal      = {CoRR},
  volume       = {abs/2309.05273},
  year         = {2023}
}
```


```
@inproceedings{10.1145/3606040.3617441,
  author       = {Daniele Malitesta and
                  Giandomenico Cornacchia and
                  Claudio Pomo and
                  Tommaso Di Noia},
  title        = {On Popularity Bias of Multimodal-Aware Recommender Systems: A Modalities-Driven Analysis},
  booktitle    = {Proceedings of the 1st International Workshop on Deep Multimodal Learning for Information Retrieval},
  series       = {MMIR '23},
  pages        = {59--68},
  numpages     = {10},
  publisher    = {Association for Computing Machinery},
  address      = {New York, NY, USA},
  isbn         = {9798400702716},
  doi          = {10.1145/3606040.3617441},
  url          = {https://doi.org/10.1145/3606040.3617441},
  year         = {2023}
}
```


```
@inproceedings{DBLP:conf/kdd/MalitestaCPN23,
  author       = {Daniele Malitesta and
                  Giandomenico Cornacchia and
                  Claudio Pomo and
                  Tommaso Di Noia},
  title        = {Disentangling the Performance Puzzle of Multimodal-aware Recommender
                  Systems},
  booktitle    = {EvalRS@KDD},
  series       = {{CEUR} Workshop Proceedings},
  volume       = {3450},
  publisher    = {CEUR-WS.org},
  year         = {2023}
}
```