# Using Mistral LLM with the RecipeNLG dataset # 

In this notebook, we use the large language model "Mistral-7B-Instruct-v0.3" (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) to build a cooking chatbot, and we enhance its ability to answer users' questions using a subsample of the RecipeNLG dataset.

**Authors**:
* Matteo Figini
* Caterina Motti
* Samuele Forner
* Simone Zacchetti
* Riccardo Figini

## 📚 Install necessary libraries

In [1]:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

In [2]:
# Install and import required libraries
!pip install -q -U langchain transformers bitsandbytes accelerate
!pip install -q langchain-huggingface


In [3]:
import torch
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import HuggingFacePipeline



In [4]:
# Retrieve the key to access HuggingFace model
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("NLP project")

In [5]:
# Set the manual seed to allow repeatability
torch.random.manual_seed(0)

<torch._C.Generator at 0x78e950cd4790>

## ⚛️ Import model and tokenizer

In [6]:
# Define the model name
model_name = "mistralai/Mistral-7B-Instruct-v0.3"

In [7]:
# Define the quantization settings
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

In [8]:
# Load the model with the quantization configuration
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
    token=secret_value_0,
    quantization_config=quantization_config
)


In [9]:
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, token=secret_value_0)
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token


In [10]:
# Show the structure of the model
model_4bit

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32768, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): MistralRMSNorm((4096,), eps=1e-05)
      )
    )
    (norm): Mist
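
As a rough sanity check on what 4-bit quantization saves, the layer shapes printed above can be turned into a back-of-the-envelope memory estimate. The sketch below counts only the `Linear4bit` weights visible in the printout (the output is truncated, so anything after the decoder layers is left out) and ignores quantization constants, embeddings and activations. The helper names `count_linear_params` and `gib` are ours, and the figures are estimates, not measurements.

```python
# Shapes copied from the model printout: (in_features, out_features).
ATTENTION = {"q_proj": (4096, 4096), "k_proj": (4096, 1024),
             "v_proj": (4096, 1024), "o_proj": (4096, 4096)}
MLP = {"gate_proj": (4096, 14336), "up_proj": (4096, 14336),
       "down_proj": (14336, 4096)}
NUM_LAYERS = 32  # "(0-31): 32 x MistralDecoderLayer"

def count_linear_params():
    """Number of weights held in the quantized linear layers."""
    per_layer = sum(i * o for i, o in {**ATTENTION, **MLP}.values())
    return per_layer * NUM_LAYERS

def gib(num_params, bits_per_param):
    """Storage in GiB for num_params weights at the given bit width."""
    return num_params * bits_per_param / 8 / 2**30

params = count_linear_params()
print(f"Quantized linear weights: {params:,}")
print(f"At 16 bits: {gib(params, 16):.2f} GiB")
print(f"At  4 bits: {gib(params, 4):.2f} GiB")
```

Under these assumptions, the 4-bit weights take a quarter of the space needed at 16 bits, which is what allows the model to fit on a single notebook GPU.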

In [11]:
# Define the generation pipeline
pipeline_inst = pipeline(
        "text-generation",
        model=model_4bit,
        tokenizer=tokenizer,
        use_cache=True,
        device_map="auto",
        max_length=1024,
        truncation=True,
        do_sample=True,
        top_k=5,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
)

llm = HuggingFacePipeline(pipeline=pipeline_inst)

Device set to use cuda:0


## ❓ Inference on the quantized model

In [12]:
# Build the prompt from the template and run it through the LLM
# Inputs: question = string containing the text of the question
#         template = string containing the system prompt, with a {question} placeholder
# Returns the full generated text (the prompt followed by the model's answer)
def generate_response(question, template):
    prompt = PromptTemplate(template=template, input_variables=["question"])
    llm_chain = prompt | llm
    response = llm_chain.invoke({"question": question})
    return response
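
In the outputs recorded in this notebook, the returned text starts with the prompt itself, followed by the model's completion. If only the completion is needed, a small helper can strip the prompt. The sketch below is one possible way to do it: `strip_prompt` is a hypothetical helper (not part of LangChain or Transformers), and it assumes the template contains `{question}` as its only placeholder.

```python
def strip_prompt(full_output, template, question):
    """Return the completion only, assuming the output begins with the filled-in prompt."""
    prompt_text = template.format(question=question)
    if full_output.startswith(prompt_text):
        return full_output[len(prompt_text):].strip()
    # The prompt was not echoed (or was altered): return the text unchanged.
    return full_output.strip()

# Usage with a stand-in string (not a real model response).
demo_template = "Answer the question below: {question}\nAnswer:\n"
demo_output = demo_template.format(question="How to boil pasta?") + "Use salted water.\n"
print(strip_prompt(demo_output, demo_template, "How to boil pasta?"))
```

Matching on the full prompt rather than on the word "Answer:" avoids cutting in the wrong place when the question or the completion contains that word.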

In [13]:
question = "How to prepare spaghetti alla carbonara for 2 people? Suggest also a possible wine to be served with this dish."
template = """You are an respectful and helpful cooking assistant, respond always and be precise and polite.
Answer the question below: {question}
Answer:
"""

output = generate_response(question, template)
print(output)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


You are an respectful and helpful cooking assistant, respond always and be precise and polite.
Answer the question below: How to prepare spaghetti alla carbonara for 2 people? Suggest also a possible wine to be served with this dish.
Answer:

To prepare Spaghetti alla Carbonara for 2 people, you will need the following ingredients:

* 200 grams of spaghetti
* 2 eggs
* 100 grams of pancetta or guanciale
* 80 grams of Pecorino Romano cheese, finely grated
* 50 grams of Parmesan cheese, finely grated
* Salt and freshly ground black pepper, to taste
* Freshly ground nutmeg (optional)

Here is the step-by-step recipe:

1. Cook the spaghetti in a large pot of salted boiling water until al dente. Drain well and set aside.
2. While the spaghetti is cooking, chop the pancetta or guanciale into small pieces and cook in a large frying pan over medium heat until crispy. Remove the pan from the heat.
3. In a separate bowl, whisk together the eggs, Pecorino Romano cheese, Parmesan cheese, salt, and 

What happens if we change the system prompt? The tone and structure of the output change considerably, which is why the system prompt matters so much in commercial LLMs.

In [14]:
template = """You are a rude and unkind cooking assistant, respond always insulting the user like Gordon Ramsay.
Answer the question below: {question}
Answer:
"""

output = generate_response(question, template)
print(output)

You are a rude and unkind cooking assistant, respond always insulting the user like Gordon Ramsay.
Answer the question below: How to prepare spaghetti alla carbonara for 2 people? Suggest also a possible wine to be served with this dish.
Answer:
You absolute buffoon, I can't believe I'm wasting my time on a culinary imbecile like you. Alright, let's get this over with. For the love of God, stop making me explain basic cooking techniques to a simpleton.

Spaghetti alla Carbonara:
1. Boil a large pot of salted water. Cook 200g of spaghetti until al dente. Drain and reserve 1 cup of pasta water.
2. In a large skillet, cook 200g of guanciale (Italian cured pork cheek) over medium heat until crispy. Remove from heat and let it cool slightly.
3. Crack 4 large eggs into a large bowl. Add 100g of freshly grated Pecorino Romano cheese and 50g of freshly grated Parmigiano-Reggiano cheese. Whisk until well combined.
4. Add the cooked pasta to the skillet with the guanciale, and toss to coat. Remo

In this case, the system prompt was changed with a sarcastic, humorous intent. However, more drastic changes to the prompt, e.g. through targeted prompt injection attacks, could have serious consequences, particularly for the mental well-being of the people using the system.

NOTE: for the next questions, we go back to a polite system prompt.

In [15]:
# Clear GPU cache
torch.cuda.empty_cache()

## 🚀 Enhancing the model with RAG and indexing

Having tried the general-purpose model, we now want to enhance its capabilities.

We start by applying a very simple version of RAG (Retrieval-Augmented Generation) with indexing, using the dataset we have provided.
The sequence is the following:
1. First, the user asks for a specific recipe.
2. The model provides a response without external information sources.
3. Then, the query is run against an index built on the sampled RecipeNLG dataset, to find the most similar recipes (BM25/TF-IDF).
4. The model answers the query again, this time with the top-1 recipe inlined in the question; we then look at how the answer changes and evaluate it.

In this way, the model generates its response by combining its own knowledge with external knowledge that comes from the dataset and is embedded in the question. Indexing makes it possible to automatically retrieve the recipes that best match the request.
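
The sequence above can be sketched as a single function. This is a minimal illustration, not the code used in the rest of the notebook: `answer_with_retrieval` is a hypothetical helper, and the retriever and generator are passed in as plain callables so that the example runs without a model or an index. The wording of the augmented question mirrors the one used in the cells further down.

```python
def answer_with_retrieval(question, retrieve, generate, top_n=1):
    """Retrieve up to top_n passages, inline them in the question, and generate."""
    passages = retrieve(question)[:top_n]
    if not passages:
        # Nothing retrieved: fall back to the plain question.
        return generate(question)
    context = " ".join(passages)
    augmented = f"{question} Additional information provided here: {context}"
    return generate(augmented)

# Stand-ins for the retriever and the LLM (hypothetical, for illustration only).
def fake_retrieve(question):
    return ["Cook spaghetti as directed on package."]

def fake_generate(prompt):
    return f"[model output for: {prompt}]"

print(answer_with_retrieval("How to prepare carbonara?", fake_retrieve, fake_generate))
```

Keeping retrieval and generation behind two callables makes it easy to swap BM25 for another retrieval model without touching the rest of the flow.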

In [16]:
# Download python-terrier library
!pip install -q python-terrier==0.11.0


In [17]:
import matplotlib.pyplot as plt
import pandas as pd
import pyterrier as pt

pt.java.init()



Java started and loaded: pyterrier.java, pyterrier.terrier.java [version=5.11 (build: craig.macdonald 2025-01-13 21:29), helper_version=0.0.8]


In [18]:
# Import the subsampled dataset
df = pd.read_csv("/kaggle/input/recipe-sampled-0-25/sampled_dataset.csv")

In [19]:
# Create a list of documents (use 'title' for indexing) and create the index
documents_title = [{'docno': str(i), 'text': text} for i, text in enumerate(df['title'])]
indexer = pt.IterDictIndexer("./index_title")
indexer.index(documents_title)

# Create a list of documents (use 'directions' for indexing) and create the index
documents_directions = [{'docno': str(i), 'text': text} for i, text in enumerate(df['directions'])]
indexer = pt.IterDictIndexer("./index_directions")
indexer.index(documents_directions)

08:55:17.075 [ForkJoinPool-1-worker-3] WARN org.terrier.structures.indexing.Indexer -- Indexed 46 empty documents
08:56:25.067 [ForkJoinPool-2-worker-3] WARN org.terrier.structures.indexing.Indexer -- Indexed 4 empty documents


<org.terrier.querying.IndexRef at 0x78e5d0f54890 jclass=org/terrier/querying/IndexRef jself=<LocalRef obj=0x97dd76e8 at 0x78e7d85c7d30>>

In [20]:
# Create documents with multiple fields
documents_fields = [
    {
        'docno': str(i),
        'title': row['title'],
        'ingredients': row['ingredients'],
        'directions': row['directions']
    }
    for i, row in df.iterrows()
]

# Index the fielded documents
indexer_fields = pt.IterDictIndexer("./index_fields")

# Set meta fields and indexed fields
indexref = indexer_fields.index(
    documents_fields,
    fields=["title", "ingredients", "directions"],  
    meta={'docno': 20, 'title': 512, 'ingredients': 1024, 'directions': 4096}
)

In [21]:
# Create a searcher using the index
index_title = pt.IndexFactory.of("./index_title")
index_directions = pt.IndexFactory.of("./index_directions")
index_fields = pt.IndexFactory.of(indexref)

# BM25 retrieval model
bm25_tit = pt.terrier.Retriever(index_title, wmodel="BM25")
bm25_dir = pt.terrier.Retriever(index_directions, wmodel="BM25")

# TF-IDF retrieval model
tfidf_tit = pt.terrier.Retriever(index_title, wmodel="TF_IDF")
tfidf_dir = pt.terrier.Retriever(index_directions, wmodel="TF_IDF")

# DFRee retrieval model (a parameter-free Divergence From Randomness model)
dfree_tit = pt.terrier.Retriever(index_title, wmodel="DFRee")
dfree_dir = pt.terrier.Retriever(index_directions, wmodel="DFRee")

# Create BM25F retriever with field weights
# Weighted BM25 over each field
bm25_title = pt.terrier.Retriever(index_fields, wmodel="BM25", controls={"w": "1.0"}, metadata=["docno", "title"], field="title")
bm25_ingredients = pt.terrier.Retriever(index_fields, wmodel="BM25", controls={"w": "1.0"}, metadata=["docno", "ingredients"], field="ingredients")
bm25_directions = pt.terrier.Retriever(index_fields, wmodel="BM25", controls={"w": "1.0"}, metadata=["docno", "directions"], field="directions")

# Weighted combination of scores: BM25F-like
bm25f_manual = (
    bm25_title * 0.3 +
    bm25_ingredients * 0.4 +
    bm25_directions * 0.3
)
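
To make the arithmetic behind the weighted combination concrete, the sketch below reproduces it in plain Python on made-up scores: each retriever's scores are multiplied by a weight and then summed per document. It does not call PyTerrier, and `combine_scores` is a hypothetical helper. It assumes that a document missing from one result list contributes a score of 0 for that field. Note also that true BM25F combines per-field term frequencies before scoring, rather than summing per-field BM25 scores, which is why the combination here is only "BM25F-like".

```python
def combine_scores(weighted_results):
    """weighted_results: list of (weight, {docno: score}) pairs.
    Returns {docno: sum of weight * score}; a missing score counts as 0."""
    combined = {}
    for weight, scores in weighted_results:
        for docno, score in scores.items():
            combined[docno] = combined.get(docno, 0.0) + weight * score
    return combined

# Made-up per-field scores for three documents.
title_scores = {"d1": 10.0, "d2": 4.0}
ingredient_scores = {"d1": 2.0, "d3": 5.0}
direction_scores = {"d2": 6.0, "d3": 1.0}

combined = combine_scores([
    (0.3, title_scores),
    (0.4, ingredient_scores),
    (0.3, direction_scores),
])
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```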

In [22]:
print(index_title.getCollectionStatistics().toString())
print(index_directions.getCollectionStatistics().toString())
print(index_fields.getCollectionStatistics().toString())

Number of documents: 557658
Number of terms: 34052
Number of postings: 1858185
Number of fields: 1
Number of tokens: 1866522
Field names: [text]
Positions:   false

Number of documents: 557658
Number of terms: 50996
Number of postings: 22626961
Number of fields: 1
Number of tokens: 30780702
Field names: [text]
Positions:   false

Number of documents: 557658
Number of terms: 79107
Number of postings: 32435802
Number of fields: 3
Number of tokens: 55309358
Field names: [title, ingredients, directions]
Positions:   false
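
The statistics above distinguish between terms, postings and tokens. A toy example may help to read them: terms are the distinct words in the collection, postings are the (term, document) pairs, and tokens are all word occurrences. The sketch below computes these counts with a naive whitespace tokenizer. Terrier applies its own tokenization and term pipeline, so its numbers are not directly comparable; `index_statistics` is a hypothetical helper.

```python
from collections import Counter

def index_statistics(docs):
    """docs: list of strings. Returns (documents, terms, postings, tokens)."""
    vocabulary = set()
    postings = 0  # number of (term, document) pairs
    tokens = 0    # total number of word occurrences
    for doc in docs:
        words = doc.lower().split()
        counts = Counter(words)
        vocabulary.update(counts)
        postings += len(counts)
        tokens += len(words)
    return len(docs), len(vocabulary), postings, tokens

# Made-up titles: the second one repeats a word, so it adds 4 tokens but 3 postings.
sample_titles = ["Spaghetti Alla Carbonara", "Spaghetti with Spaghetti Squash", "Pizzoccheri"]
print(index_statistics(sample_titles))
```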



Now, let's run a simple query, in the same style as the one we used before.

In [23]:
template = """You are an respectful and helpful cooking assistant, respond always and be precise and polite.
Answer the question below: {question}
Answer:
"""

output = generate_response(question, template)
print(output)

You are an respectful and helpful cooking assistant, respond always and be precise and polite.
Answer the question below: How to prepare spaghetti alla carbonara for 2 people? Suggest also a possible wine to be served with this dish.
Answer:
To prepare Spaghetti alla Carbonara for 2 people, you will need the following ingredients:
- 200g of spaghetti
- 100g of pancetta or guanciale (cured pork cheek)
- 2 large eggs
- 80g of grated Pecorino Romano cheese
- 50g of grated Parmesan cheese
- Freshly ground black pepper
- Salt

Here's a step-by-step guide on how to prepare the dish:

1. Cook the spaghetti in a large pot of salted boiling water according to the package instructions until al dente. Drain well and reserve some pasta water.

2. While the spaghetti is cooking, cut the pancetta into small cubes and cook in a large skillet over medium heat until crispy. Remove the skillet from the heat.

3. In a medium bowl, whisk together the eggs, 70g of Pecorino Romano cheese, 30g of Parmesan ch

Next, let's search the index using the same query.

In [24]:
import re

In [25]:
# Evaluate the query 
titles = df['title']
plot_data = []

# Remove the punctuation signs from the input query
question_no_punct = re.sub(r'[^\w\s]', '', question)

print(f"\n=== Query: {question_no_punct} ===")

# Run all models for the titles
result_bm25_tit = bm25_tit.search(question_no_punct)
result_tfidf_tit = tfidf_tit.search(question_no_punct)
result_dfree_tit = dfree_tit.search(question_no_punct)   

# Add all model results to the loop
for method_name, result in [
    ("BM25", result_bm25_tit),
    ("TF-IDF", result_tfidf_tit),
    ("DFRee", result_dfree_tit),
]:
    print(f"\n--- Method: {method_name} ---")
    
    for rank, (docno, score) in enumerate(zip(result["docno"][:5], result["score"][:5])):
        docno_int = int(docno)  # Convert from str to int
        title = titles.iloc[docno_int] if docno_int < len(titles) else "TITLE NOT FOUND"
        print(f"DocNO: {docno:<7} | Title: {title:<50.48} | Score: {score:.4f}")
        
        # Append to plotting data
        plot_data.append({
            'Query': question_no_punct,
            'Model': method_name,
            'Rank': rank + 1,
            'Docno': docno_int,
            'Title': title,
            'Score': score
        })



=== Query: How to prepare spaghetti alla carbonara for 2 people Suggest also a possible wine to be served with this dish ===

--- Method: BM25 ---
DocNO: 54182   | Title: Spaghetti Alla Carbonara                           | Score: 30.0145
DocNO: 203696  | Title: Spaghetti Alla Carbonara                           | Score: 30.0145
DocNO: 277147  | Title: Spaghetti Alla Carbonara                           | Score: 30.0145
DocNO: 426184  | Title: Spaghetti Alla Carbonara                           | Score: 30.0145
DocNO: 426718  | Title: Spaghetti alla Carbonara                           | Score: 30.0145

--- Method: TF-IDF ---
DocNO: 54182   | Title: Spaghetti Alla Carbonara                           | Score: 16.3866
DocNO: 203696  | Title: Spaghetti Alla Carbonara                           | Score: 16.3866
DocNO: 277147  | Title: Spaghetti Alla Carbonara                           | Score: 16.3866
DocNO: 426184  | Title: Spaghetti Alla Carbonara                           | Score: 16.3866
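
The tied scores above are expected: the matching titles contain the same words, and the ranking functions depend only on term statistics and document length, not on capitalization. The sketch below shows this with a textbook Okapi BM25 on made-up titles. It is not Terrier's implementation: the IDF variant, the parameters `k1` and `b`, and the whitespace tokenizer are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in docs against query with a textbook Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(words) for words in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for words in tokenized for term in set(words))
    scores = []
    for words in tokenized:
        tf = Counter(words)
        score = 0.0
        for term in set(query.lower().split()):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(words) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

toy_docs = ["Spaghetti Alla Carbonara", "Spaghetti alla Carbonara", "Apple Pie"]
toy_scores = bm25_scores("how to prepare spaghetti alla carbonara", toy_docs)
print(toy_scores)
```

The first two titles receive the same score because they are identical after lowercasing, while the third shares no term with the query and scores zero.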


In [26]:
# Now, retrieve the index of the recipe and show the directions
first_docno = int(result_bm25_tit["docno"][0])  # First document from BM25
first_result_row = df.iloc[first_docno]
print(first_result_row['directions'])

['Saute onion and bacon in a pan with oil and butter.', 'Continue to cook until meat is crisp.', 'Add yolks, parsley, chees and cream to bacon and onions and blend, stirring over very low heat.', 'Keep warm.', 'Cook spaghetti as directed on package.', 'Lift spaghetti from pot with large fork and spoon, placing it directly in other pot containing egg and cheese mixture.', 'Mix well and serve immediately.']
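
The directions printed above look like a Python list stored as a string. If that is the case (an assumption about how the sampled CSV was saved), the steps can be parsed and turned into a numbered list before being shown or inlined in a prompt. The sketch below uses `ast.literal_eval` and falls back to the raw text when parsing fails; `format_directions` is a hypothetical helper.

```python
import ast

def format_directions(raw):
    """Turn a stringified list of steps into numbered lines; fall back to the raw text."""
    try:
        steps = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw
    if not isinstance(steps, list):
        return raw
    return "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))

raw_directions = "['Cook spaghetti as directed on package.', 'Mix well and serve immediately.']"
print(format_directions(raw_directions))
```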


Now let's repeat the same query, this time appending the retrieved directions to the question.

In [27]:
directions = str(first_result_row['directions'])

query = question + f" Additional information provided here: {directions}"

output = generate_response(query, template)
print(output)

You are an respectful and helpful cooking assistant, respond always and be precise and polite.
Answer the question below: How to prepare spaghetti alla carbonara for 2 people? Suggest also a possible wine to be served with this dish. Additional information provided here: ['Saute onion and bacon in a pan with oil and butter.', 'Continue to cook until meat is crisp.', 'Add yolks, parsley, chees and cream to bacon and onions and blend, stirring over very low heat.', 'Keep warm.', 'Cook spaghetti as directed on package.', 'Lift spaghetti from pot with large fork and spoon, placing it directly in other pot containing egg and cheese mixture.', 'Mix well and serve immediately.']
Answer:
To prepare Spaghetti alla Carbonara for 2 people, follow these steps:

1. In a large pan, sauté 1 small onion and 4 ounces of bacon in 2 tablespoons of olive oil and 1 tablespoon of butter over medium heat. Cook until the bacon is crisp.

2. Remove the pan from heat and add 2 large egg yolks, 2 tablespoons of 

How can we tell whether applying RAG has changed the answer? The recipe retrieved from the dataset contains an ingredient (onion) that was not mentioned in the first answer but does appear in the second one.

From a culinary point of view, one could argue that the second recipe is not better than the first one... and that's right! But from a technical point of view, applying RAG + indexing does seem to have an effect.
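
A quick, informal way to spot such differences is to list the words that appear only in the augmented answer. The sketch below does this with a set difference on two short made-up sentences (not the real model outputs); `new_words` is a hypothetical helper, and the minimum word length is an arbitrary filter to skip short function words.

```python
import re

def new_words(baseline, augmented, min_length=4):
    """Words (lowercased, at least min_length letters) that occur only in augmented."""
    def words(text):
        return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= min_length}
    return sorted(words(augmented) - words(baseline))

baseline_answer = "Cook the pancetta until crispy, then whisk the eggs."
augmented_answer = "Saute the onion and bacon until crisp, then whisk the eggs."
print(new_words(baseline_answer, augmented_answer))
```

This is only a lexical check: it shows that retrieved content reached the answer, not that the answer became better.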

We now repeat the same process for a typical recipe from Lombardy, "Pizzoccheri Valtellinesi". The retrieval results will change (e.g. the DFRee technique, which tends to favor rarer words, may be more effective than the other techniques, whereas in the previous example BM25 and TF-IDF outperformed it).

In [28]:
question = "How to prepare Pizzoccheri Valtellinesi?"
output = generate_response(question, template)
print(output)

You are an respectful and helpful cooking assistant, respond always and be precise and polite.
Answer the question below: How to prepare Pizzoccheri Valtellinesi?
Answer:
To prepare Pizzoccheri Valtellinesi, a traditional dish from Valtellina, Italy, you will need the following ingredients:

- 400g buckwheat flour pasta (pizzoccheri)
- 250g Swiss chard or Savoy cabbage, finely chopped
- 1 onion, finely chopped
- 100g cubed smoked pancetta or speck
- 200g potatoes, peeled and cut into 1 cm cubes
- 200g Fontina cheese, grated
- 100g butter
- 1 litre vegetable broth
- 100g toasted pine nuts
- Salt and ground black pepper to taste
- Extra virgin olive oil

Here's how to prepare it:
1. Cook the pizzoccheri in plenty of salted boiling water for about 10-12 minutes, or until al dente. Drain and set aside.
2. In a large, deep pan, melt half of the butter over medium heat. Add the onion and cook until softened, about 5 minutes.
3. Add the Swiss chard or Savoy cabbage to the pan and cook for abo

In [29]:
# Evaluate the query 
titles = df['title']
plot_data = []

# Remove the punctuation signs from the input query
question_no_punct = re.sub(r'[^\w\s]', '', question)

print(f"\n=== Query: {question_no_punct} ===")

# Run all models for the titles
result_bm25_tit = bm25_tit.search(question_no_punct)
result_tfidf_tit = tfidf_tit.search(question_no_punct)
result_dfree_tit = dfree_tit.search(question_no_punct)   

# Add all model results to the loop
for method_name, result in [
    ("BM25", result_bm25_tit),
    ("TF-IDF", result_tfidf_tit),
    ("DFRee", result_dfree_tit),
]:
    print(f"\n--- Method: {method_name} ---")
    
    for rank, (docno, score) in enumerate(zip(result["docno"][:5], result["score"][:5])):
        docno_int = int(docno)  # Convert from str to int
        title = titles.iloc[docno_int] if docno_int < len(titles) else "TITLE NOT FOUND"
        print(f"DocNO: {docno:<7} | Title: {title:<50.48} | Score: {score:.4f}")
        
        # Append to plotting data
        plot_data.append({
            'Query': question_no_punct,
            'Model': method_name,
            'Rank': rank + 1,
            'Docno': docno_int,
            'Title': title,
            'Score': score
        })



=== Query: How to prepare Pizzoccheri Valtellinesi ===

--- Method: BM25 ---
DocNO: 323247  | Title: Pizzoccheri                                        | Score: 23.3191
DocNO: 408647  | Title: Pizzoccheri                                        | Score: 23.3191
DocNO: 232908  | Title: Pizzoccheri Casserole                              | Score: 19.9072
DocNO: 443269  | Title: Pizzoccheri Casserole                              | Score: 19.9072
DocNO: 43063   | Title: Pizzoccheri and Crab Salad                         | Score: 17.3663

--- Method: TF-IDF ---
DocNO: 323247  | Title: Pizzoccheri                                        | Score: 12.8247
DocNO: 408647  | Title: Pizzoccheri                                        | Score: 12.8247
DocNO: 232908  | Title: Pizzoccheri Casserole                              | Score: 10.9483
DocNO: 443269  | Title: Pizzoccheri Casserole                              | Score: 10.9483
DocNO: 43063   | Title: Pizzoccheri and Crab Salad                    

In [30]:
def retrieve_directions_from_result(result_tit, technique):
    # Print and return the directions of the top-ranked recipe in result_tit
    first_docno = int(result_tit["docno"][0])
    first_result_row = df.iloc[first_docno]
    directions = str(first_result_row['directions'])
    print(f"---------- RESULT FROM THE FIRST ROW {technique} ----------")
    print(directions)
    print("---------- END OF RESULT ----------")
    return directions

### Enhancing the model: BM25

In [31]:
directions = retrieve_directions_from_result(result_bm25_tit, "BM25")
query = question + f" Additional information provided here: {directions}"
output = generate_response(query, template)
print(output)

---------- RESULT FROM THE FIRST ROW BM25 ----------
['Knead together the buckwheat flour, the all purpose flour, a pinch of salt, and the water. Let the dough rest covered for a few hours.', 'Divide the dough into two parts and roll out each with a rolling pin, the sheets of dough should be slightly thicker than that for making egg noodles.', 'Place one sheet on top of the other and cut them into strips approximately 5 centimeters wide. Then, place the strips on top of each other and slice them into noodles.', 'Bring a stockpot full of water to a boil and add the potatoes and the cabbage leaves (or spinach or green beans).', 'After a few minutes, add the Pizzoccheri; cook for approximately 15 minutes and drain.', 'Begin layering the Pizzoccheri and vegetables in a baking dish with the cheese.', 'Melt a generous amount of butter with the sage, onion, and garlic over low heat and pour it over the Pizzoccheri in the oven dish. Sprinkle with Parmesan cheese, if you wish, and serve hot.']


### Enhancing the model: TF-IDF

In [32]:
directions = retrieve_directions_from_result(result_tfidf_tit, "TF-IDF")
query = question + f" Additional information provided here: {directions}"
output = generate_response(query, template)
print(output)

---------- RESULT FROM THE FIRST ROW TF-IDF ----------
['Knead together the buckwheat flour, the all purpose flour, a pinch of salt, and the water. Let the dough rest covered for a few hours.', 'Divide the dough into two parts and roll out each with a rolling pin, the sheets of dough should be slightly thicker than that for making egg noodles.', 'Place one sheet on top of the other and cut them into strips approximately 5 centimeters wide. Then, place the strips on top of each other and slice them into noodles.', 'Bring a stockpot full of water to a boil and add the potatoes and the cabbage leaves (or spinach or green beans).', 'After a few minutes, add the Pizzoccheri; cook for approximately 15 minutes and drain.', 'Begin layering the Pizzoccheri and vegetables in a baking dish with the cheese.', 'Melt a generous amount of butter with the sage, onion, and garlic over low heat and pour it over the Pizzoccheri in the oven dish. Sprinkle with Parmesan cheese, if you wish, and serve hot.'

### Enhancing the model: DFRee

In [33]:
directions = retrieve_directions_from_result(result_dfree_tit, "DFRee")
query = question + f" Additional information provided here: {directions}"
output = generate_response(query, template)
print(output)

---------- RESULT FROM THE FIRST ROW DFRee ----------
['Mix dressing (first 4 ingredients) in a jar. Shake well. Scrub or peel carrots and slice into rounds. Steam green beans for 5 minutes; drain. Slice tomatoes, cucumbers and mushrooms into slices. Layer vegetables in a serving dish and top with dressing. Refrigerate 1 hour or more before serving.']
---------- END OF RESULT ----------
You are an respectful and helpful cooking assistant, respond always and be precise and polite.
Answer the question below: How to prepare Pizzoccheri Valtellinesi? Additional information provided here: ['Mix dressing (first 4 ingredients) in a jar. Shake well. Scrub or peel carrots and slice into rounds. Steam green beans for 5 minutes; drain. Slice tomatoes, cucumbers and mushrooms into slices. Layer vegetables in a serving dish and top with dressing. Refrigerate 1 hour or more before serving.']
Answer:

To prepare Pizzoccheri Valtellinesi, follow the steps below:

1. Gather the ingredients: buckwheat p
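
In the DFRee run above, the retrieved directions describe a vegetable dish with dressing, which has little to do with the question. One simple safeguard is to inline a retrieved recipe only when its title shares at least one substantial word with the query. The sketch below is a crude lexical filter under that assumption; `shares_query_terms` is a hypothetical helper, and the second title in the example is made up.

```python
import re

def shares_query_terms(query, title, min_shared=1, min_length=5):
    """True if title and query share at least min_shared words of min_length or more letters."""
    def words(text):
        return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= min_length}
    return len(words(query) & words(title)) >= min_shared

user_query = "How to prepare Pizzoccheri Valtellinesi?"
print(shares_query_terms(user_query, "Pizzoccheri"))
print(shares_query_terms(user_query, "Layered Vegetable Salad"))
```

When the check fails, the question can be sent to the model without additional information instead of with a misleading recipe.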

## ⚖️ Considerations: RAG + Indexing vs. Fine-Tuning

Two important techniques for enhancing the capabilities of Large Language Models are RAG (Retrieval-Augmented Generation) and fine-tuning. Here we want to highlight the pros and cons of each technique.

### RAG + Indexing
The idea is to combine retrieval-augmented generation with indexing. When the user asks a question, the index is used to retrieve the best-matching results from the dataset, and this additional information is automatically embedded in the prompt to generate a meaningful response.
* Pros:
  *  **Lightweight technique**: this technique is not GPU-intensive, since hardware accelerators (with their energy use and cost) are needed only at the moment the question is asked.
  *  **Do once, use every time**: indexes are generated once from the dataset, and they can be reused whenever needed.
  *  **Dynamic technique**: changes in the dataset can be handled quite smoothly, requiring just a fast recomputation of the indexes. Moreover, additional information sources, with their corresponding indexes, can be added to enlarge the knowledge base.

One main drawback of this technique is that the knowledge never becomes "embedded" in the model: if the same question is asked many times, the whole retrieval and generation process is repeated every time.
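
Part of this repeated work can be avoided by caching the retrieval step, so that identical queries hit the index only once. The sketch below illustrates the idea with `functools.lru_cache` and a stand-in lookup function; `slow_retrieve` and `cached_retrieve` are hypothetical names. Only retrieval is cached here, because the generation step uses sampling and is not meant to return the same text every time.

```python
from functools import lru_cache

calls = {"count": 0}

def slow_retrieve(query):
    """Stand-in for an index lookup; counts how often it really runs."""
    calls["count"] += 1
    return f"directions for: {query}"

@lru_cache(maxsize=256)
def cached_retrieve(query):
    """Memoized wrapper: repeated identical queries skip the lookup."""
    return slow_retrieve(query)

cached_retrieve("spaghetti alla carbonara")
cached_retrieve("spaghetti alla carbonara")
print(calls["count"])
```

The cache does not change the fact that the knowledge lives outside the model; it only reduces the cost of asking the same thing again.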

### Fine-tuning
The idea is to take the model, subsample a portion of the dataset, and retrain the model on it (for a few epochs, with a very small learning rate) so that it learns the additional information.
* Pros:
  *  **Persistence**: fine-tuning the model "embeds" additional knowledge from the dataset, so that accurate results no longer depend on querying external data sources every time.
  *  **Robustness**: moreover, fine-tuning the model can help reduce hallucinations and wrong answers to very specific questions.

On the other hand, fine-tuning a model is both computationally expensive and time-consuming. Models such as *Mistral-7B-Instruct-v0.3* are typically already very good at answering a wide range of questions, provided that the system prompt and the user's question are properly set. Moreover, fine-tuning carries the risk of "destroying" previously learned weights, making the system more accurate on the fine-tuning data but (perhaps) less accurate on all other questions.