# **Hands-on Part 3/4: Language Models for Path Reasoning**

---

![](https://drive.google.com/uc?id=1k0--CkCfmWKYyYZ9_z4bJIq0ugb-fWDB)

## **Acknowledgment**

---

The code use in this tutorial directly derive from our [PEARLM Library](https://github.com/Chris1nexus/pearlm). If this tutorial is useful for your research, we would appreciate an acknowledgment by citing our paper:

> Balloccu, G., Boratto, L., Cancedda, C., Fenu, G., & Marras, M. (2023). Faithful Path Language Modelling for Explainable Recommendation over Knowledge Graph. ArXiv, abs/2310.16452.



## **Get Started**
---


### This notebook

By now you should already have the Tutorial folder in your google drive. You just need to mount your drive executing the following line.


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


And browse the working directory.

In [4]:
# Your path to hands on
%cd '/content/drive/MyDrive/ExpRecSys Tutorial Series/2024 ECIR/Hands-On'

/content/drive/MyDrive/ExpRecSys Tutorial Series/2024 ECIR/Hands-On


If you followed Part1, you are ready! 🤘 You can skip the next lines.

### Instead, if you joined late

Open the google drive folder [https://tinyurl.com/ecir2024-tutorial1](https://tinyurl.com/ecir2024-tutorial1) containing the material and follow the instrucions inside `GetStarted.ipynb`

# Outline

---





- [ 0 - Packages](#0)
- [ 1 - Prerequisites](#1)
- [ 2 - Path Sampling](#2)
    - [ 2 - Tokenized datasets creation](#2.1)
- [ 3 - PLM pipeline](#3)
    - [ 3 - Train](#3.1)
    - [ 3 - Evaluate](#3.2)
    - [ 3 - Textual explanations](#3.3)
- [ 4 - PEARLM pipeline](#4)
    - [ 3 - Train](#4.1)
    - [ 3 - Evaluate](#4.2)
    - [ 3 - Textual explanations](#4.3)

In the **previous part** we:

1️⃣ Mapped our dataset in standard format to a **PGPR readable format** and **CAFE readable format**.

2️⃣ Trained the **TransE embedding** [[11]](#p11) used by both PGPR [[10]](#p10) and CAFE [[12]](#p12).

3️⃣ **Trained and extracted the predicted paths** from the models.

4️⃣ **Evaluated the models** and converted their path into **texual explanations via templates** [[33]](#p33).

In **this part**, you will learn about how the **causal language modelling** is used for **path reasoning** by PLM [[37]](#p37) and PEARLM [[38]](#p38) pipelines. This will include sample paths from the KG, converting them to tokenized sequences, **train** the models using them and **produce** **recommendations** and **explanation paths**.

In this part, we will:

1️⃣ Sample paths from an existing knowledge graph which will be used as **training data for PLM and PEARLM**.

2️⃣ Train the **PLM** [[37]](#p37) and **PEARLM [[38]](#p38)** models and use their decoding to generate paths.

3️⃣ **Evaluate the models** and convert their path into **texual explanations via templates** [[33]](#p33).

4️⃣ **Measure the hallucination phenomena** [[38]](#p38) in PLM and see how PEARLM's constraint decoding solves it.

<a name="0"></a>

## 0 - Packages

---

In [None]:
!pip install . #Takes around 1 min
!pip install datasets

Processing /content/drive/MyDrive/ExpRecSys Tutorial Series/2024 ECIR/Hands-On
  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: pathlm
  Building wheel for pathlm (setup.py) ... [?25l[?25hdone
  Created wheel for pathlm: filename=pathlm-0.0.0-py3-none-any.whl size=201903 sha256=b316365e4d031dea72319f2ae04f0c160d58519fbc53424d01a12ea29ca9f130
  Stored in directory: /root/.cache/pip/wheels/5d/d6/fa/857159ee5e51c820ba3cb52d5f60b61adc5ab379954711150d
Successfully built pathlm
Installing collected packages: pathlm
  Attempting uninstall: pathlm
    Found existing installation: pathlm 0.0.0
    Uninstalling pathlm-0.0.0:
      Successfully uninstalled pathlm-0.0.0
Successfully installed pathlm-0.0.0


In [15]:
import gzip
import pickle
import random

<a name="1"></a>

## 1 - Prerequisites

---


🔍 Before diving into the core concepts, it's essential to get familiar with some foundational blocks. Here's a brief overview of what you need to know:


### 1. - Transformer Architecture Fundamentals 🛠

The transformer architecture represents a paradigm shift in natural language processing (NLP), setting new standards for a range of tasks from translation to text generation. Central to the transformer's success is its novel attention mechanism. Unlike previous models that processed inputs sequentially, transformers employ attention to simultaneously assess the relevance of all parts of the input data. This enables the model to dynamically prioritize which parts of the input to focus on, dramatically improving its ability to understand the nuanced interplay of context and meaning in language.

![](https://drive.google.com/uc?id=1wKt8Kevq_XRX9fVzNxBRcZYCFdQMT3Qy)

*Image source: [https://lena-voita.github.io/nlp_course/seq2seq_and_attention.html](https://lena-voita.github.io/nlp_course/seq2seq_and_attention.html)*

**Key Concepts:**
- **Attention Mechanism:** The core of the transformer model, enabling dynamic focus on different segments of input data, enhancing the model's interpretability and performance on complex tasks.
- **Self-Attention:** Allows each input component to be contextualized in relation to the whole input, significantly enriching the model's understanding of internal relationships within the data.
- **Positional Encoding:** Injects information about the sequence order of the input data, compensating for the transformer's inherently order-agnostic design and ensuring sensitivity to the sequence dynamics of language.

### 2. Causal Language Modeling (CLM) 📚

Causal Language Modeling (CLM) is implemented as a decoder only architecture and lies at the heart of generative language tasks, teaching models to predict subsequent tokens based on preceding context in a manner analogous to human language processing.

![](https://drive.google.com/uc?id=1RAC0LkRRq-4yCS2dheDBqX86Ogm1OseN)

*Image source: [https://en.rattibha.com/thread/1640446114519474176](https://en.rattibha.com/thread/1640446114519474176)*

Unlike models that treat language as a bag of words or follow strict sequential processing, CLM approaches language with an inherently sequential perspective. This paradigm leverages the temporal nature of language, where the meaning and likelihood of a word are heavily influenced by the words that come before it.

**Foundational Principles:**
- **Sequential Prediction:** The model is trained to anticipate the next word in a sequence, given all the previous words, mirroring the forward-looking nature of human language comprehension and generation.
- **Contextual Awareness:** Through the sequential prediction task, the model develops a nuanced understanding of how context shapes meaning, enabling more accurate and coherent text generation.
- **Supervised Learning Framework:** The training process involves presenting the model with sequences of tokens, where the target for each input sequence is the same sequence shifted by one position to the right. This framework not only teaches the model about language structure but also about the probabilistic nature of language, where multiple continuations can be valid for any given context.

Incorporating these preliminary concepts is crucial for delving into the advanced topic of path language modeling for explainable recommendations. By understanding transformers and CLM, participants will be better equipped to grasp how these models can be adapted and extended to provide not just effective, but also interpretable, recommendations.

![](https://drive.google.com/uc?id=16Gn9XYKEOv1duRnRbNbxZRdMalpGkYWV)




### 3. Tokenizers: The Gateway to Language Understanding 🗝

Tokenizers are foundational to the field of natural language processing (NLP), serving as the bridge between the nuanced, variable world of human language and the structured, numerical realm of machine learning models. By decomposing text into manageable units called tokens and translating these tokens into numerical identifiers, tokenizers effectively prepare raw text for deep learning algorithms. The design and choice of tokenizer can significantly impact the performance and capabilities of an NLP model, making an understanding of different tokenization strategies crucial.

**Key Points:**

- **Tokenization:** This critical preprocessing step involves breaking down complex text into simpler units (tokens) that could be words, characters, or subwords. The nature of these tokens fundamentally influences how a model perceives and processes language data.

- **Vocabulary Management:** The tokenizer constructs a vocabulary, an exhaustive list of unique tokens it recognizes. This vocabulary is the model's linguistic repertoire, determining which tokens can be directly processed and how unseen tokens are handled.


Before the advent of sophisticated tokenization methods, **character-level** and **word-level tokenization** were common. Character-level tokenization offers granularity and a solution to the out-of-vocabulary issue but at the cost of increased sequence length and complexity. Word-level tokenization, on the other hand, is intuitive and aligns closely with human language processing but struggles with languages rich in morphology and unseen words.

**Word-Level Tokenizer:** This approach is straightforward, mapping entire words to numerical identifiers based on a predetermined vocabulary. While intuitive, its main challenge lies in handling out-of-vocabulary (OOV) words, not present in the tokenizer's initial vocabulary list, which can limit its effectiveness.

![](https://drive.google.com/uc?id=1xWUhD13bTTLU6l8ic7FOQlnJQ5pgDcdr)

*Image source: [https://towardsdatascience.com/byte-pair-encoding-for-beginners-708d4472c0c7](https://towardsdatascience.com/byte-pair-encoding-for-beginners-708d4472c0c7)*

To address the limitations of these earlier methods, modern tokenizers frequently utilize **subword tokenization**, a technique that enhances a model's ability to manage diverse linguistic phenomena by splitting unknown or rare words into smaller, recognizable units within the model's vocabulary. The most prominent example of subword tokenization is the Byte Pair Encoding (BPE).

**Byte Pair Encoding (BPE):** Originally a data compression algorithm, BPE iteratively merges the most frequent pairs of bytes or characters in a text corpus until it achieves a predefined vocabulary size. Adapted for NLP, BPE merges characters or sequences of characters to form frequently occurring subwords. This approach allows the model to efficiently process common word parts and decode rare or novel words from these components.

![](https://drive.google.com/uc?id=1dHmWdKD71XOgMX0dfHcj2ZYu2rpk7a5h)

*Image source: [https://medium.com/illuminations-mirror/on-tokenization-in-llms-34309273f238](https://medium.com/illuminations-mirror/on-tokenization-in-llms-34309273f238)*


#### The Special Case of Path Language Modeling

In the context of path language modeling for explainable recommendations, the choice of tokenizer is crucial, given the specific challenges:

1. **No Benefit in Splitting:** For path tokens representing entities and relations (e.g., E202 R1 P2001), splitting into subwords offers no benefit. These tokens derive their meaning from their entirety, not from constituent parts. Splitting them could lead to a loss of semantic information, detrimental to the model's understanding.
   
2. **Loss of Boundary Information:** BPE and similar subword tokenization methods risk losing essential boundary information between entities and relations. This boundary clarity is crucial for accurately interpreting and navigating paths within the knowledge graph.

#### Embracing Word-Level Tokenization

Considering these unique requirements, **Word-Level Tokenization** stands out as the preferred method for path language modeling:

- **Integrity Preservation:** By treating each entity, relation, and path identifier as an indivisible token, it ensures the full meaning and specificity encoded within each token are retained. This is crucial for models that depend on precise token interpretations to generate meaningful recommendations and explanations.

- **Simplified Vocabulary Management:** While potentially larger, the vocabulary under this approach facilitates straightforward and unambiguous token recognition, avoiding the complexities tied to subtoken recombination.

- **Optimized Model Performance:** In path language modeling, where the accurate representation of user-centric paths is key, Word-Level Tokenization directly supports improved model performance, enabling the generation of coherent and relevant paths for effective recommendations and explanations.

With this foundational knowledge, you're well-equipped to delve into the complexities of path reasoning through causal language modeling. Let's embark on this journey together! 🚀


### 4. - Causal Language Modeling for Path Reasoning

Causal Language Modeling (CLM) intricately mirrors human language generation, where sentences are crafted word by word. In human language, CLMs consider words as the building blocks of communication, weaving them into coherent and meaningful sentences. Similarly, in path reasoning, entities (E), products (P), users (U), and relationships (R) within a knowledge graph act as the 'words' of our narrative. These elements form paths that narrate stories about user preferences, item relationships, and the intricate web of connections that define the recommendation landscape.

#### Path Sampling: Crafting User-Centric Narratives

The initial step in preparing our data for causal language modeling and path reasoning involves a meticulous path sampling process from the knowledge graph. This stage is crucial for extracting meaningful, user-centric paths that highlight the user's interactions and preferences.

**Focusing on User-Centric Paths**

Path sampling is deliberately designed to capture sequences that begin with a user and unfold through their interactions. For example, a path might start with "U20 R-1 P20," indicating that user U20 has interacted with product P20. The objective is to extend these paths to include other related products or entities, revealing the intricate patterns and connections that map out the user’s engagement within the graph.

**Illustrative Example**

Consider a path like "U20 R-1 P20 R2 E5 R2 P45," where the user's interaction with P20 leads us through a related entity E5 to another product P45 (also interated). This path not only traces the user's direct interactions but also the relational context that influences these interactions, enriching our understanding of their preferences.


![](https://drive.google.com/uc?id=1KIOKHz85eDgxle-HTsoe92Q8OXYvFBTd)


#### Path Tokenization and Training for Reasoning
Once paths are sampled, we tokenize them using specific prefixes (E for entities, P for products, U for users, and R for relationships), creating a vocabulary that mirrors the knowledge graph’s structure. For example, "U20" denotes user 20. This tokenization allows our model to differentiate between node and edge types, ensuring accurate path interpretation. During training, the model learns to reconstruct and reason with these tokenized paths, predicting subsequent nodes or edges by analyzing the context of preceding elements. This process equips the model with the essential ability to explore the knowledge graph, unveiling patterns and connections critical for generating personalized recommendations.

#### Leveraging Path Reasoning for Recommendations and Explanations

**Path Prediction:** Upon receiving a prompt that begins with a user and their interaction—say, "U2 R-1"—our model is tasked with generating a set of potential paths forward. This is not a mere extrapolation of the most likely next step but a comprehensive prediction of a series of connected actions and relationships, encapsulating a wide range of possible user journeys.

**Ranking and Selection:** The generated paths are then ranked based on their cumulative probability, a measure that reflects the model's confidence in each path's relevance and accuracy in representing potential user behavior. This ranking process is pivotal, as it sifts through the multitude of possibilities to highlight the paths most significant and likely to resonate with the user's preferences and past interactions.

**Recommendations and Explanations**: From the ranked list, we select the top 10 unique paths (in terms of item reached), ensuring diversity in the recommendations by focusing on the uniqueness of the terminal item in each sequence.

These paths like the previous seen path reasoning models serve a dual purpose:

- **Explanations**: The sequence of interactions leading up to the final product in each path provides a narrative explanation for the recommendation.
- **Recommendations**: The last item of each unique path represents a recommended product.


<a name="2"></a>

## 2 - Path Sampling

---

In this part of the tutorial, you can chose to proceed with `ml1m` or `lfm1m`. We will use `ml1m` but you are free to chose. All the pre-trained models for these datasets are available, while for the `cellphones` dataset we will release it soon, in the [offical github tutorial repository](https://github.com/explainablerecsys/ecir24) to reduce the size of this tutorial folder.

In [18]:
dataset_name = "ml1m"

To use **Causal Language Models (CLM) for path reasoning** the first step is to sample **user-centric paths** from the knowledge graph. This paths will be tokenized and used as training sequences by our CLM based path reasoning models.

The sampled paths will start from a user, connect him to a seen product through its interaction in the train and bring to another seen products. This will allow the data to have patterns between the items interacted by each user.

To perform the sampling we can employ our `create_dataset.sh` script giving as positional parameters:
1. `dataset`: the dataset we want to sample for `{ml1m, lfm1m}`
2. `sample_size`: represent the amount of paths sampled for each user
3. `n_hop`: represent the fixed hop size for the paths sampled
4. `n_proc`: number of processors to employ for multiprocessing operations


In [5]:
sample_size = 250
n_hop = 3
n_proc = 2

⚠️ We already have the **sampled paths for all datasets**. So you **don't need** to run this command now. The `paths_end-to-end_250_3.txt` file will be store in `data/<dataset>/path_random_walk/`

⏲️ Estimate time: 20m with `ML1M`

In [None]:
! bash create_dataset.sh {dataset_name} {sample_size} {n_hop} {n_proc}

This code will create the dataset into `data/<dataset>/path_random_walk/paths_end-to-end_<sample_size>_<n_hop>.txt`

In [20]:
! ls data/{dataset_name}/paths_random_walk

paths_end-to-end_250_3.txt


Let's see how the paths look like

In [21]:
! head -10 data/{dataset_name}/paths_random_walk/paths_end-to-end_250_3.txt

U5683 R-1 P2224 R-1 U423 R-1 P2386
U5992 R-1 P2133 R8 E3960 R8 P867
U2255 R-1 P601 R3 E3912 R3 P2712
U2286 R-1 P2619 R8 E4994 R8 P810
U1895 R-1 P226 R9 E3495 R9 P1297
U4887 R-1 P1625 R-1 U2409 R-1 P722
U4423 R-1 P226 R-1 U3617 R-1 P2484
U5906 R-1 P1504 R10 E3702 R10 P871
U2800 R-1 P1242 R10 E8447 R10 P1847
U12 R-1 P2405 R1 E11680 R1 P1289


<a name="2.1"></a>

## 2.1 - Tokenized datasets creation

---

Training PLM or PEARLM requires the learning of a Whitespace tokenizer that possess as vocubary all the entities and relations token. Additionally we need to tokenize the sampled path using this learned tokenizer.

To do this we will execute `pathlm/models/lm/tokenize_dataset.py`. This will create our tokenzier that will be stored in `tokenizers/<dataset_name>/WordLevel.json` and our tokenized dataset as hugginface dataset object in `data/<dataset_name>/WordLevel/end-to-end_{sample_size}_{n_hop}_tokenized_dataset.hf[link text](https://)`

In [None]:
! python pathlm/models/lm/tokenize_dataset.py --data {dataset_name} --sample_size {sample_size} --n_hop {n_hop} --nproc {n_proc}

<a name="3"></a>

# 3 - PLM pipeline

---

*Note: The authors of the paper haven't release the code for this model, our unofficial version is available in our [PEARLM repository]()*

PLM [[37]](#p37) (Path Language Model for Explainable Recommendation) is the first attempt of employing causal language models for path reasoning for explainable recommendation.

Specifically the model uses a **generic CLM architecture** (e.g. distilgpt2) extended with an additional embedding layer named **type embeddings**. Type embeddings hold the meaning of the `i`-th token and can assume value `0` when the token is a special token like `[BOS]` or `[EOS]`, `1` for an entity or `2` for a relation. This additional embedding layer can support the model in learning the path structures.

This model also has an architecture composed by **two heads** one learned for predicting relations and one for entities. These heads are used alternatively during the decoding.

![](https://drive.google.com/uc?id=1w_GwOaNPNsITfSTEX-LgkiGiQuqTsE88)


<a name="3.1"></a>
## 3.1 Train PLM

---

The  hyperparameter list is reported as follow:
- `--num_epochs`: Max number of epochs.
- `--model`: The base huggingface model from where eredit the architecture one from `{distilgpt2, gpt2, gpt2-large}`
- `--batch_size`: Batch size.
- `--sample_size`: Dataset sample size (to dermine which dataset to use)
- `--n_hop`: Dataset hop size (to dermine which dataset to use)
- `--logit_processor_type`: Decoding strategy `gcd` PEARLM, empty for PLM
- `--n_seq_infer`: Number of sequences generated for each user should be `> k`

⚠️ We have already the **precomputed PLM for all datasets**. So you **don't need** to run this command now.

In [None]:
! python pathlm/models/lm/plm_main.py --data {dataset_name} --sample_size {sample_size} --n_hop {n_hop} --nproc {n_proc}

Again like with previous models the train will save during evaluation the topks and topks paths in `results` and the final model in `weights` under the name of `end-to-end@<dataset_name>@plm-rec@<model>@<sample_size>@<n_hop>@<logit_processor_type>`

<a name="3.2"></a>
## 3.2 Evaluate PLM

---

Let's now load the paths from `results/`

In [None]:
model_base = 'distilgpt2'
logit_constraint = ''
curr_model = f'end-to-end@{dataset_name}@plm-rec@{model_base}@{sample_size}@{n_hop}@{logit_constraint}'
plm_results_path = f"results/{dataset_name}/{curr_model}"

This paths are sorted by path score to produce the final topk of predicted item stored in `results/<dataset>/<curr_model>/top{k}_items.pkl`

In [None]:
with open(f"{plm_results_path}/top10_items.pkl", 'rb') as pred_top_items_file:
    plm_item_topks = pickle.load(pred_top_items_file)
pred_top_items_file.close()

In [None]:
list(plm_item_topks.items())[:5]

[(15, [1767, 2238, 2386, 1274, 2585, 525, 322, 1841, 566, 2405]),
 (49, [1892, 1438, 2435, 2151, 827, 1068, 2785, 1110, 1372, 1744]),
 (13, [2011, 2200, 2392, 1274, 960, 525, 599, 2238, 451, 2009]),
 (45, [966, 388, 370, 2551, 2016, 591, 2254, 1219, 649, 2399]),
 (27, [2102, 867, 1462, 355, 2406, 1841, 11, 2196, 2391, 2235])]

And the associated explanations store in `results/<dataset>/<curr_model>/top{k}_paths.pkl`

In [None]:
with open(f"{plm_results_path}/top10_paths.pkl", 'rb') as pred_top_paths_file:
    plm_path_topks = pickle.load(pred_top_paths_file)
pred_top_paths_file.close()

In [None]:
list(plm_path_topks.items())[0]

(15,
 [['[BOS]', 'U15', 'R-1', 'P1113', 'R5', 'E7174', 'R0', 'P1767'],
  ['[BOS]', 'U15', 'R-1', 'P599', 'R9', 'E7629', 'R10', 'P2238'],
  ['[BOS]', 'U15', 'R-1', 'P1274', 'R4', 'E12287', 'R7', 'P2386'],
  ['[BOS]', 'U15', 'R-1', 'P1274', 'R4', 'E12287', 'R2', 'P1274'],
  ['[BOS]', 'U15', 'R-1', 'P599', 'R9', 'E7629', 'R10', 'P2585'],
  ['[BOS]', 'U15', 'R-1', 'P2758', 'R9', 'E5831', 'R5', 'P525'],
  ['[BOS]', 'U15', 'R-1', 'P2758', 'R7', 'E7248', 'R5', 'P322'],
  ['[BOS]', 'U15', 'R-1', 'P1113', 'R2', 'E11914', 'R9', 'P1841'],
  ['[BOS]', 'U15', 'R-1', 'P1274', 'R4', 'E12287', 'R2', 'P566'],
  ['[BOS]', 'U15', 'R-1', 'P1113', 'R2', 'E11914', 'R9', 'P2405']])

In [None]:
command = f'python pathlm/evaluation/evaluate_results.py --dataset {dataset_name} --model plm-rec@{model_base} --k 10'
!$command

Evaluating rec quality for ['ndcg', 'mrr', 'precision', 'recall', 'serendipity', 'diversity', 'novelty']: 100% 6040/6040 [00:00<00:00, 11434.03it/s]
Number of users: 6040, average topk size: 10.00
ndcg: 0.27, mrr: 0.21, precision: 0.1, recall: 0.04, serendipity: 0.8, diversity: 0.84, novelty: 0.93, coverage: 0.29


<a name="3.3"></a>
## 3.3 Textual Explanation generation

---

As done before with PGPR and CAFE to convert the explanation path to textual we need to remap the entities and relations to their names. To do so let's import `get_eid_to_name` and `get_rid_to_name` from `pathlm.datasets.data_utils`.

In [None]:
from pathlm.datasets.data_utils import get_eid_to_name
eid2name = get_eid_to_name(dataset_name)
eid2name['0']

'The Phantom of the Opera (1925 film)'

In [None]:
from pathlm.datasets.data_utils import get_rid_to_name
rid2name = get_rid_to_name(dataset_name)
rid2name['0']

'cinematography_by_cinematographer'

Let's now create the template function to handle the paths in the form `['[BOS]', 'U15', 'R-1', 'P1113', 'R5', 'E7174', 'R0', 'P1767']`

In [None]:
def template(path):
    if path[0] == "[BOS]":
        path = path[1:]
    for i in range(len(path)):
        s = str(path[i])[1:]
        if i % 2 == 0: #Entity
            path[i] = eid2name[s]
        else: #Relation
            if s == "-1":
                path[i] = 'watched'
                continue
            path[i] = rid2name[s]
    u, r, pi, r1, e1, r2, rp  = path
    if e1 == 'user':
        return f"You may be interested in {rp} because you {r} {pi} also {r2} by another user"
    else:
        return f"You may be interested in {rp} because you {r} {pi} also {r2} by {e1}"

Let's convert the explanation paths for a random user to textual explanation

In [None]:
import collections
plm_textual_exps = collections.defaultdict(list)
random_user = random.randint(0, len(plm_path_topks.keys()))
for i, pid_exp_tuple in enumerate(plm_path_topks[random_user]):
    exp = pid_exp_tuple
    pid = pid_exp_tuple[-1][-1]
    plm_textual_exps[random_user].append([pid, template(exp)])

In [None]:
plm_textual_exps[random_user]

[['1',
  'You may be interested in Gladiator (2000 film) because you watched The Perfect Storm (film) also produced_by_prodcompany by United States'],
 ['5',
  'You may be interested in Erin Brockovich (film) because you watched X-Men (film) also composed_by_composer by Newton Thomas Sigel'],
 ['7',
  'You may be interested in Shanghai Noon because you watched The Patriot (2000 film) also belong_to_category by David Brenner (editor)'],
 ['8',
  'You may be interested in Magnolia (film) because you watched The Patriot (2000 film) also produced_by_producer by David Brenner (editor)'],
 ['0',
  'You may be interested in U-571 (film) because you watched The Perfect Storm (film) also related_to_wikipage by Phillip Noyce'],
 ['0',
  'You may be interested in Frequency (film) because you watched The Patriot (2000 film) also produced_by_producer by David Brenner (editor)'],
 ['9',
  'You may be interested in Braveheart (1925 film) because you watched The Perfect Storm (film) also watched by Ph

As you may see many of these explanations contain **hallucinations**. Hallucinations can arise within an explanation when a model incorrectly establishes **incoherent semantic relations** between entities in the KG, e.g., when user-item connections extend beyond mere interaction relations, which would constitute the sole viable option in the KG.

Incorrect semantics may lead to provide explanations connecting the two by a "starred by" relation, which is coherent only between an actor and a movie item as for the KG. Incoherence can also manifest between entities that are semantically linked in the real world but do not have such corresponding relationships in the KG (e.g., entity "Johnny Depp", linked by the relation "starred in", to the item "interstellar", which does not exist in the underlying KG).

Such inaccuracies in explanations compromise the fundamental rationale for utilizing a KG, as the factual truths presented in the explanations become misaligned and incoherent with the underlying KG.

To measure the extend of these phenomena let's calculate what is the rate of corrupted paths among the predicted ones. To do so let's use `` function from `` module

In [None]:
command = f'python pathlm/models/lm/assess_faithfulness.py --dataset {dataset_name} --model plm-rec@{model_base} --k 10'
!$command

Inputing a PLM
Loading KG
Load user of size 6040
Load product of size 2983
Load cinematographer of size 236
Load prodcompany of size 304
Load composer of size 292
Load category of size 1821
Load actor of size 2330
Load country of size 32
Load editor of size 198
Load producer of size 458
Load writter of size 289
Load director of size 117
Load wikipage of size 4700
Load cinematography_by_cinematographer of size 1481
Load produced_by_prodcompany of size 4330
Load composed_by_composer of size 1999
Load belong_to_category of size 40499
Load starred_by_actor of size 9411
Load produced_in_country of size 311
Load edited_by_editor of size 1153
Load produced_by_producer of size 1602
Load wrote_by_writter of size 845
Load directed_by_director of size 367
Load related_to_wikipage of size 143523
Invalid users: 0, invalid items: 0
Load review of size 556989
Loading from  data/ml1m/preprocessed  the dataset  ml1m
(205879, 3)
(193338, 3)
{0: 'cinematography_by_cinematographer', 1: 'produced_by_prodco

<a name="4"></a>

# 4. PEARLM pipeline
---

   

PEARLM [[38]](#p38) (Path-Explainable Accurate Language Model for Explainable Recommendation) is another causual language model for path reasoning and aims to resolve the "hallucination" issue of PLM, its dependency from pretrained KGE embedding and PLM inference scalability issues. Specifically the main difference are:

- **Single Head for Prediction**: Unlike PLM, which uses multiple heads to handle entities and relations, assuming their independence, PEARLM employs a single head. This design choice simplifies the model's complexity and allows for the incorporation of all previous context into the model's learning process.
- **Direct Embedding Learning**: Unlike its predecessors that rely on pretrained KGEs for initializing embeddings, PEARLM learns embeddings directly from the knowledge graph paths. This method not only fully leverages the model's capacity but also supports the concept that sequential learning from paths effectively captures the intricate relationships within the knowledge graph. Furthermore, PEARLM's design draws parallels between Graph Neural Networks (GNNs) and Causal Language Models (CLMs), likening GNNs' breadth-first search (BFS) for capturing immediate neighborhood information to the depth-first search (DFS)-like path generation of CLMs. This analogy underscores PEARLM's ability to delve deeper into the knowledge graph, unearthing distant but relevant relationships essential for generating explainable recommendations.
- **Graph Constraint Decoding**: To ensure the fidelity of generated paths to the underlying knowledge graph, PEARLM introduces "Graph Constraint Decoding" (GCD). This feature guarantees that at the decoding phase, the model adheres to the structure of the knowledge graph, thereby addressing the issue of hallucinated or factually incorrect paths generated by other models.

![](https://drive.google.com/uc?id=1leZB7KU-KgtnZPga2j8EXimktkNFAk1K)


<a name="4.1"></a>
## 4.1 Train PEARLM

---

The  hyperparameter list is reported as follow:
- `--num_epochs`: Max number of epochs.
- `--model`: The base huggingface model from where eredit the architecture one from `{distilgpt2, gpt2, gpt2-large}`
- `--batch_size`: Batch size.
- `--sample_size`: Dataset sample size (to dermine which dataset to use)
- `--n_hop`: Dataset hop size (to dermine which dataset to use)
- `--logit_processor_type`: Decoding strategy `gcd PEARLM, empty for PLM
- `--n_seq_infer`: Number of sequences generated for each user should be `> k`

⚠️ We have already the **precomputed PEARLM for all datasets**. So you **don't need** to run this command now. The hugginface model will be stored into `end-to-end@<dataset_name>@<model>@<sample_size>@<n_hop>@<logit_processor_type>`.

⏲️

In [None]:
! python pathlm/models/lm/pearlm_main.py --data {dataset_name} --sample_size {sample_size} --n_hop {n_hop} --nproc {n_proc}

<a name="4.2"></a>
## 4.2 Evaluate PEARLM

---

In [None]:
model_base = 'distilgpt2'
logit_constraint = 'gcd'
curr_model = f'end-to-end@{dataset_name}@{model_base}@{sample_size}@{n_hop}@{logit_constraint}'
pearlm_results_path = f"results/{dataset_name}/{curr_model}"

This paths are sorted by path score to produce the final topk of predicted item stored in `results/<dataset>/<curr_model>/top{k}_items.pkl`

In [None]:
with open(f"{pearlm_results_path}/top10_items.pkl", 'rb') as pred_top_items_file:
    pearlm_item_topks = pickle.load(pred_top_items_file)
pred_top_items_file.close()

In [None]:
list(pearlm_item_topks.items())[:5]

[(1, [934, 1963, 1313, 1123, 717, 2196, 531, 1457, 763, 1775]),
 (2, [1289, 455, 1372, 2290, 484, 319, 2812, 2151, 316, 272]),
 (4, [2708, 2605, 2037, 1554, 725, 1635, 2673, 2712, 1011, 2240]),
 (7, [2345, 501, 2960, 1411, 60, 44, 2341, 308, 330, 379]),
 (9, [601, 403, 2662, 1613, 118, 466, 2345, 2341, 1113, 1020])]

And the associated explanations store in `results/<dataset>/<curr_model>/top{k}_paths.pkl`

In [None]:
with open(f"{pearlm_results_path}/top10_paths.pkl", 'rb') as pred_top_paths_file:
    pearlm_path_topks = pickle.load(pred_top_paths_file)
pred_top_paths_file.close()

In [None]:
list(pearlm_path_topks.items())[0]

(1,
 [['[BOS]', 'U1', 'R-1', 'P2555', 'R3', 'E7934', 'R3', 'P934'],
  ['[BOS]', 'U1', 'R-1', 'P2444', 'R7', 'E8039', 'R7', 'P1963'],
  ['[BOS]', 'U1', 'R-1', 'P2972', 'R1', 'E8946', 'R1', 'P1313'],
  ['[BOS]', 'U1', 'R-1', 'P2972', 'R1', 'E8946', 'R1', 'P1123'],
  ['[BOS]', 'U1', 'R-1', 'P2555', 'R3', 'E7934', 'R3', 'P717'],
  ['[BOS]', 'U1', 'R-1', 'P1249', 'R3', 'E7934', 'R3', 'P2196'],
  ['[BOS]', 'U1', 'R-1', 'P2151', 'R1', 'E11970', 'R1', 'P531'],
  ['[BOS]', 'U1', 'R-1', 'P2972', 'R1', 'E8946', 'R1', 'P1457'],
  ['[BOS]', 'U1', 'R-1', 'P1249', 'R3', 'E7934', 'R3', 'P763'],
  ['[BOS]', 'U1', 'R-1', 'P1249', 'R3', 'E7934', 'R3', 'P1775']])

In [None]:
command = f'python pathlm/evaluation/evaluate_results.py --dataset {dataset_name} --model {curr_model} --k 10'
!$command

Evaluating rec quality for ['ndcg', 'mrr', 'precision', 'recall', 'serendipity', 'diversity', 'novelty']: 100% 6040/6040 [00:00<00:00, 10770.22it/s]
Number of users: 6040, average topk size: 10.00
ndcg: 0.37, mrr: 0.28, precision: 0.12, recall: 0.07, serendipity: 0.94, diversity: 0.84, novelty: 0.93, coverage: 0.8


<a name="4.3"></a>
## 4.3 Textual explanation generation

---

Let's convert some PEARLM predicted explanation paths to textual explanations. To do so we will leverage the previously defined `template` function and the dictionaries `eid2name` and `rid2name`.

In [None]:
import collections
pearlm_textual_exps = collections.defaultdict(list)
random_user = random.randint(0, len(pearlm_path_topks.keys()))
for i, pid_exp_tuple in enumerate(pearlm_path_topks[random_user]):
    exp = pid_exp_tuple
    pid = pid_exp_tuple[-1][-1]
    pearlm_textual_exps[random_user].append([pid, template(exp)])

In [None]:
pearlm_textual_exps[random_user]

[['9',
  'You may be interested in Breathless (1983 film) because you watched RoboCop also produced_by_prodcompany by Orion Pictures'],
 ['6',
  'You may be interested in Mission: Impossible II because you watched The Rock (film) also composed_by_composer by Hans Zimmer'],
 ['9',
  'You may be interested in Time Bandits because you watched Star Wars Episode I: The Phantom Menace also related_to_wikipage by 1999 in film'],
 ['9',
  'You may be interested in Mission to Mars because you watched Predator (film) also wrote_by_writter by John Thomas (screenwriter)'],
 ['4',
  'You may be interested in The Man with the Golden Gun (film) because you watched Thunderball (film) also produced_by_prodcompany by Eon Productions'],
 ['0',
  'You may be interested in The Mummy (1999 film) because you watched Star Wars Episode I: The Phantom Menace also related_to_wikipage by 1999 in film'],
 ['0',
  'You may be interested in Speed 2: Cruise Control because you watched Godfather (1991 film) also relat

In [None]:
command = f'python pathlm/models/lm/assess_faithfulness.py --dataset {dataset_name} --model {model_base} --k 10'
!$command

Loading KG
Load user of size 6040
Load product of size 2983
Load cinematographer of size 236
Load prodcompany of size 304
Load composer of size 292
Load category of size 1821
Load actor of size 2330
Load country of size 32
Load editor of size 198
Load producer of size 458
Load writter of size 289
Load director of size 117
Load wikipage of size 4700
Load cinematography_by_cinematographer of size 1481
Load produced_by_prodcompany of size 4330
Load composed_by_composer of size 1999
Load belong_to_category of size 40499
Load starred_by_actor of size 9411
Load produced_in_country of size 311
Load edited_by_editor of size 1153
Load produced_by_producer of size 1602
Load wrote_by_writter of size 845
Load directed_by_director of size 367
Load related_to_wikipage of size 143523
Invalid users: 0, invalid items: 0
Load review of size 556989
Loading from  data/ml1m/preprocessed  the dataset  ml1m
(205879, 3)
(193338, 3)
{0: 'cinematography_by_cinematographer', 1: 'produced_by_prodcompany', 2: 'com

# References

<a name="p1">[1]</a> F. Maxwell Harper, Joseph A. Konstan:
The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst. 5(4): 19:1-19:19 (2016)

<a name="p2">[2]</a> Markus Schedl: The LFM-1b Dataset for Music Retrieval and Recommendation. ICMR 2016: 103-110

<a name="p3">[3]</a> Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, Zachary G. Ives:
DBpedia: A Nucleus for a Web of Open Data. ISWC/ASWC 2007: 722-735

<a name="p4">[4]</a> Denny Vrandecic, Markus Krötzsch:
Wikidata: a free collaborative knowledgebase. Commun. ACM 57(10): 78-85 (2014)

<a name="p5">[5]</a> Yixin Cao, Xiang Wang, Xiangnan He, Zikun Hu, Tat-Seng Chua:
Unifying Knowledge Graph Learning and Recommendation: Towards a Better Understanding of User Preferences. WWW 2019: 151-161


<a name="p6">[6]</a> Qingyao Ai, Vahid Azizi, Xu Chen, Yongfeng Zhang:
Learning Heterogeneous Knowledge Base Embeddings for Explainable Recommendation. Algorithms 11(9): 137 (2018)

<a name="p7">[7]</a> Kurt D. Bollacker, Colin Evans, Praveen K. Paritosh, Tim Sturge, Jamie Taylor:
Freebase: a collaboratively created graph database for structuring human knowledge. SIGMOD Conference 2008: 1247-1250

<a name="p8">[8]</a> Wayne Xin Zhao, Gaole He, Kunlin Yang, Hongjian Dou, Jin Huang, Siqi Ouyang, Ji-Rong Wen:
KB4Rec: A Data Set for Linking Knowledge Bases with Recommender Systems. Data Intell. 1(2): 121-136 (2019)

<a name="p9">[9]</a> Yongfeng Zhang, Qingyao Ai, Xu Chen, W. Bruce Croft:
Joint Representation Learning for Top-N Recommendation with Heterogeneous Information Sources. CIKM 2017: 1449-1458

<a name="p10">[10]</a> Yikun Xian, Zuohui Fu, S. Muthukrishnan, Gerard de Melo, Yongfeng Zhang:
Reinforcement Knowledge Graph Reasoning for Explainable Recommendation. SIGIR 2019: 285-294

<a name="p11">[11]</a> Antoine Bordes, Nicolas Usunier, Alberto García-Durán, Jason Weston, Oksana Yakhnenko:
Translating Embeddings for Modeling Multi-relational Data. NIPS 2013: 2787-2795

<a name="p12">[12]</a> Yikun Xian, Zuohui Fu, Handong Zhao, Yingqiang Ge, Xu Chen, Qiaoying Huang, Shijie Geng, Zhou Qin, Gerard de Melo, S. Muthukrishnan, Yongfeng Zhang:
CAFE: Coarse-to-Fine Neural Symbolic Reasoning for Explainable Recommendation. CIKM 2020: 1645-1654

<a name="p13">[13]</a> Zhu Sun, Jie Yang, Jie Zhang, Alessandro Bozzon, Long-Kai Huang, Chi Xu:
Recurrent knowledge graph embedding for effective recommendation. RecSys 2018: 297-305

<a name="p14">[14]</a> Hongwei Wang, Fuzheng Zhang, Miao Zhao, Wenjie Li, Xing Xie, Minyi Guo:
Multi-Task Feature Learning for Knowledge Graph Enhanced Recommendation. CoRR abs/1901.08907 (2019)

<a name="p15">[15]</a> Xiang Wang, Tinglin Huang, Dingxian Wang, Yancheng Yuan, Zhenguang Liu, Xiangnan He, Tat-Seng Chua:
Learning Intents behind Interactions with Knowledge Graph for Recommendation. WWW 2021: 878-887

<a name="p16">[16]</a> Song, Weiping, Zhijian Duan, Ziqing Yang, Hao Zhu, Ming Zhang, and Jian Tang. "Explainable knowledge graph-based recommendation via deep reinforcement learning." arXiv preprint arXiv:1906.09506 (2019).

<a name="p17">[17]</a>	Hongwei Wang, Fuzheng Zhang, Jialin Wang, Miao Zhao, Wenjie Li, Xing Xie, Minyi Guo:
RippleNet: Propagating User Preferences on the Knowledge Graph for Recommender Systems. CIKM 2018: 417-426

<a name="p18">[18]</a> Xiang Wang, Dingxian Wang, Canran Xu, Xiangnan He, Yixin Cao, Tat-Seng Chua:
Explainable Reasoning over Knowledge Graphs for Recommendation. AAAI 2019: 5329-5336

<a name="p19">[19]</a> Binbin Hu, Chuan Shi, Wayne Xin Zhao, Philip S. Yu: Leveraging Meta-path based Context for Top- N Recommendation with A Neural Co-Attention Model. KDD 2018: 1531-1540

<a name="p20">[20]</a>
Chuan Shi, Binbin Hu, Wayne Xin Zhao, Philip S. Yu:
Heterogeneous Information Network Embedding for Recommendation. CoRR abs/1711.10730 (2017)

<a name="p21">[21]</a> Xiaowen Huang, Quan Fang, Shengsheng Qian, Jitao Sang, Yan Li, Changsheng Xu:
Explainable Interaction-driven User Modeling over Knowledge Graph for Sequential Recommendation. ACM Multimedia 2019: 548-556

<a name="p22">[22]</a> Song, Weiping, et al. "Explainable knowledge graph-based recommendation via deep reinforcement learning." arXiv preprint arXiv:1906.09506 (2019).

<a name="p23">[23]</a> Chang-You Tai, Liang-Ying Huang, Chien-Kun Huang, Lun-Wei Ku:
User-Centric Path Reasoning towards Explainable Recommendation. SIGIR 2021: 879-889

<a name="p24">[24]</a> Xiting Wang, Kunpeng Liu, Dongjie Wang, Le Wu, Yanjie Fu, Xing Xie:
Multi-level Recommendation Reasoning over Knowledge Graphs with Reinforcement Learning. WWW 2022: 2098-2108

<a name="p25">[25]</a> Danyang Liu, Jianxun Lian, Zheng Liu, Xiting Wang, Guangzhong Sun, Xing Xie:
Reinforced Anchor Knowledge Graph Generation for News Recommendation Reasoning. KDD 2021: 1055-1065

<a name="p26">[26]</a> Zhen Wang, Jianwen Zhang, Jianlin Feng, Zheng Chen:
Knowledge Graph Embedding by Translating on Hyperplanes. AAAI 2014: 1112-

<a name="p27">[27]</a> Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, Jian Tang:
RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. ICLR (Poster) 2019

<a name="p28">[28]</a>  Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, Xuan Zhu:
Learning Entity and Relation Embeddings for Knowledge Graph Completion. AAAI 2015: 2181-2187

<a name="p29">[29]</a>  Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel:
Convolutional 2D Knowledge Graph Embeddings. AAAI 2018: 1811-1818

<a name="p30">[30]</a> Ni Lao, Tom M. Mitchell, William W. Cohen:
Random Walk Inference and Learning in A Large Scale Knowledge Base. EMNLP 2011: 529-539


<a name="p31">[31]</a> Yining Wang, Liwei Wang, Yuanzhi Li, Di He, Tie-Yan Liu:
A Theoretical Analysis of NDCG Type Ranking Measures. COLT 2013: 25-54

<a name="p32">[32]</a> Giacomo Balloccu, Ludovico Boratto, Gianni Fenu, and Mirko Marras. 2022. Post Processing Recommender Systems with Knowledge Graphs for Recency, Popularity, and Diversity of Explanations. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22). Association for Computing Machinery, New York, NY, USA, 646–656. https://doi.org/10.1145/3477495.3532041

<a name="p33">[33]</a> Balloccu G, Boratto L, Fenu G, Marras M. Reinforcement recommendation reasoning through knowledge graphs for explanation path quality. Knowledge-Based Systems. 2023 Jan 25;260:110098.

<a name="p34">[34]</a> Dessì D, Fenu G, Marras M, Reforgiato Recupero D. Coco: Semantic-enriched collection of online courses at scale with experimental use cases. InTrends and Advances in Information Systems and Technologies: Volume 2 6 2018 (pp. 1386-1396). Springer International Publishing.

<a name="p35">[35]</a>  Ni J, Li J, McAuley J. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. InProceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) 2019 Nov (pp. 188-197).

<a name="p36">[36]</a> Balloccu G, Boratto L, Cancedda C, Fenu G, Marras M. Knowledge is power, understanding is impact: Utility and beyond goals, explanation quality, and fairness in path reasoning recommendation. InEuropean Conference on Information Retrieval 2023 Mar 16 (pp. 3-19). Cham: Springer Nature Switzerland.

<a name="p37">[37]</a> Geng S, Fu Z, Tan J, Ge Y, De Melo G, Zhang Y. Path language modeling over knowledge graphsfor explainable recommendation. InProceedings of the ACM Web Conference 2022 2022 Apr 25 (pp. 946-955).

<a name="p38">[38]</a> Balloccu G, Boratto L, Cancedda C, Fenu G, Marras M. Faithful Path Language Modelling for Explainable Recommendation over Knowledge Graph. arXiv preprint arXiv:2310.16452. 2023 Oct 25.

<a name="p39">[39]</a> Afreen N, Balloccu G, Boratto L, Fenu G, Marras M. Towards explainable educational recommendation through path reasoning methods. InCEUR WORKSHOP PROCEEDINGS 2023 (Vol. 3448, pp. 131-136). CEUR-WS.