# IUST Computer Engineering Department 🏫
## Introduction to Natural Language Processing 📚 (The Final Project)
### Course Instructor: Dr. Marzieh Davoodabadi Farahani 👩‍🏫
### Project Teaching Assistant: Erfan Moosavi Monazzah (tel: @ErfanMoosavi2000) 📞
-------------------------------------------------------------------------------<br>
The objective of this project is to acquaint you with the fundamentals of Retrieval Augmented Generation (RAG). Be sure to explore various options and address challenges in a creative manner. 🎯

**Project Guidelines** 📝
- Avoid cheating at all costs. If a set of submissions is found to be [plagiarized](https://translate.google.as/?sl=en&tl=fa&text=Very%20hard%20word%2C%20I%20know%2C%20here%27s%20the%20meaning%3A%0Aplagiarized&op=translate), only one will be randomly chosen for grading. The others will fail the project. ❌
- You are allowed to use any document, article, paper, or video as a resource for writing your code, provided you include a link to the material used. 📖
- The use of Language Learning Models (LLMs), ChatBots, and Copilots is encouraged. If you utilize any of these tools, make sure to attach the chat history that led you to the answer to your question, or the code, to this .ipynb document. (You must provide the entire chat, not just the final answer or your initial prompt.) 💻
- You may not submit any additional documents, files, etc., along with this document. Only solutions, codes, explanations, etc., in this document will be graded. 📄
- You are required to implement everything (except the Language Modeling parts) from scratch. The use of libraries like langchain, llama_index, etc., is not permitted for this purpose. 🚫
- Please adhere to the code guidelines provided throughout the documents. 📝 I’ve spent time in a library 📚 crafting all of this, so if you overlook them, you’ll lose the points allocated for that section. ❌
- We need to use GPUs for this assignment, don't forget to turn on GPU usage for your notebook session.

-------------------------------------------------------------------------------<br>
# Alright, let's get started. 🚀

## What is RAG? 🤔
We've all used ChatGPT and experienced moments when it starts to generate content that is often incorrect or unrelated to our query. Do you know why this happens? These Large Language Models (LLMs) are not magical entities; they are simply models trained on a vast amount of text. 📚 You could even consider a significant portion of the internet. However, this is not all the data available in the world, because data is not a static concept. You yourself generate some data every day through your use of the Internet, Social Media, and so on. 🌐💻📱

So, no matter how much data you use to train your LLM, you always end up encountering new data. This is one of the reasons behind the famous ChatGPT response that tells you it only knows things up to a certain date. 📅 Also, these models tend to hallucinate too. It means they provide incorrect answers but in a very convincing manner. 🎭

On the other hand, we have retrieval techniques. Don't worry if it sounds complicated (it actually isn't easy, you may need to take a course to familiarize yourself with these concepts 😅, but that's not necessary for this project), but you use it on a daily basis. You can think of Search Engines (like Google, for example) as a complex form of information retrieval. 🔍

So, one day, people came up with this idea that it would be cool if ChatGPT could search Google for us, read the articles for us, summarize what it read, and tell us that. 📖 So, this is not exactly what RAG is, but it's something similar. We have a corpus (a large amount of data) and a query (what a user typed as input). Now, we search through this corpus using techniques related to vectors and vector databases, and find the most similar items in our corpus to the query. Then, we pass these items to an LLM and ask for a structured, well-formatted, user-friendly output. 📈📊

## I'm Interested in the Technical Details, What Should I Read? 📚🔍
- I strongly recommend reading the [original RAG paper](https://arxiv.org/abs/2005.11401). If you need help understanding the paper or have any questions about it, feel free to reach out to me via Telegram or find me on the second floor of the department in the NLP lab on Sundays and Tuesdays. 📖
- There appears to be a [comprehensive 2.5-hour course](https://www.freecodecamp.org/news/mastering-rag-from-scratch/) available. I haven't personally watched it, but if you find a better one, let me know so I can update this document. 🎥
- Here is [an article](https://www.smashingmagazine.com/2024/01/guide-retrieval-augmented-generation-language-models/) that explains the concepts very well. Initially, I wanted to use this article as the basis for this project, but unfortunately, the llama_index library used in the article seems to be outdated, so most of the code would need to be rewritten. On second thought, I found it more useful to focus on core concepts rather than learning specific libraries. You might want to check out some libraries like langchain or llama_index which provide a lot of tools for RAG. (But not for this project) 📝💡
- Don't hesitate to use Google, ask chatbots about any new concepts and terms. If you use search engine-aware chatbots like Microsoft Copilot, they provide links for each part of their answers which is useful if you want to delve deeper into that part. 🌐🤖
- Lastly, we have [the article](https://learnbybuilding.ai/tutorials/rag-from-scratch) that serves as the foundation for this project. 📚🔍

# Learn
First, we’re going to go through a simple RAG implementation. It’s going to be similar to the article, except for the (LLM) part. For that, I’m going to use Hugging Face. 🤗 I’ll also try to explain the code in simple terms, but feel free to read the article if you prefer their writing style.

## Let's Install the Necessary Libraries 📚🔧
Did you know that using the `--quiet` or `-q` option with the `pip install` command minimizes the output displayed on your screen? 🖥️ This can make your terminal less cluttered. Also, using `-U` will upgrade the libraries if they were previously installed. This is particularly useful for certain libraries like `transformers` that are frequently updated. 🔄

In [None]:
!pip install -U accelerate transformers --quiet

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m309.4/309.4 kB[0m [31m5.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m9.3/9.3 MB[0m [31m30.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m21.3/21.3 MB[0m [31m42.2 MB/s[0m eta [36m0:00:00[0m
[?25h

## Gather a Corpus 📚
Technically, a corpus refers to a large and structured set of texts. However, for the sake of our discussion, let’s consider our collection as a “corpus”, even though it might not be large in the traditional sense. 😉

In [None]:
corpus_of_documents = [
    "Take a leisurely walk in the park and enjoy the fresh air.",
    "Visit a local museum and discover something new.",
    "Attend a live music concert and feel the rhythm.",
    "Go for a hike and admire the natural scenery.",
    "Have a picnic with friends and share some laughs.",
    "Explore a new cuisine by dining at an ethnic restaurant.",
    "Take a yoga class and stretch your body and mind.",
    "Join a local sports league and enjoy some friendly competition.",
    "Attend a workshop or lecture on a topic you're interested in.",
    "Visit an amusement park and ride the roller coasters."
]

## Create a Retriever 🕵️‍♂️
Now, we’re going to create a simple retriever. The role of the retriever is to compare the user’s query with a large corpus of text and find those that are most similar in context. (You know what context is by now, don’t you? 😊 If you’ve forgotten, refer back to your initial lectures). For now, let’s say we want to find similar text based on simple similarity metrics. The code is straightforward, and I have faith in you, chief! Dive into the code. 👨‍💻

In [None]:
def jaccard_similarity(query, document):
    query = query.lower().split(" ")
    document = document.lower().split(" ")
    intersection = set(query).intersection(set(document))
    union = set(query).union(set(document))
    return len(intersection)/len(union)

Hey, you may want to look at wikipedia page for [Jaccard Similarity](https://en.wikipedia.org/wiki/Jaccard_index).

In [None]:
def return_response(query, corpus):
    similarities = []
    for doc in corpus:
        similarity = jaccard_similarity(user_input, doc)
        similarities.append(similarity)
    return corpus_of_documents[similarities.index(max(similarities))]

## Create a Generator 🖥️
Now, we’re going to create a generator. This will help us compile the information retrieved into a well-structured and user-friendly text.

OK, let's say in a senario, we ask user what they like to do, the their answer is this:

In [None]:
user_input = "I like to hike"

Now by using the retrieval model I find this activity that best fits this user.

In [None]:
relevant_document = return_response(user_input, corpus_of_documents)
print(relevant_document)

Go for a hike and admire the natural scenery.


The answer seems good enough, but we can do better, yeah?

Let’s import a Language Model. I’m going to try out Microsoft Phi-3 because it recently hit the market, and I haven’t had a chance to try it for myself yet. So, I’m seizing this opportunity to do so! 😊👨‍💻

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

Downloading the model gonna take a while, use this time to rest your eyes for a bit. 😊👀💤

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

config.json:   0%|          | 0.00/3.35k [00:00<?, ?B/s]

configuration_phi3.py:   0%|          | 0.00/10.4k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


modeling_phi3.py:   0%|          | 0.00/73.8k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


model.safetensors.index.json:   0%|          | 0.00/16.3k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/2.67G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/172 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/3.18k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/293 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/568 [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

Now we try to get the LLM to become our generator. We simply place the retrieved information and user query in the following prompt and ask the model for well formatted text.

In [None]:
prompt = """You are a bot that makes recommendations for activities. Try to be helpful recommender system.
This is the recommended activity: {relevant_document}
The user input is: {user_input}
Compile a recommendation to the user based on the recommended activity and the user input."""

In [None]:
prompt = prompt.replace("{relevant_document}", relevant_document).replace("{user_input}", user_input)
print(prompt)

You are a bot that makes recommendations for activities. Try to be helpful recommender system.
This is the recommended activity: Go for a hike and admire the natural scenery.
The user input is: I like to hike
Compile a recommendation to the user based on the recommended activity and the user input.


In [None]:
messages = [
    {"role": "user", "content": prompt},
]

Here's the augmented generated text

In [None]:
output = pipe(messages, **generation_args)
print(output[0]['generated_text'])

Based on your interest in hiking and our recommended activity, I suggest you embark on a scenic hike in a beautiful natural environment. This will not only allow you to enjoy the physical benefits of hiking but also provide a wonderful opportunity to admire breathtaking landscapes, observe diverse flora and fauna, and experience the tranquility of nature. Don't forget to bring along essentials like water, snacks, and appropriate hiking gear for a safe and enjoyable adventure!


## Very Cool, but Not Perfect! 😎👌
Alright, you’ve just seen a very basic example of RAG. However, there are some issues present. The corpus is small, and the documents in the corpus are short sentences, which causes the Language Model (LM) to generate some text on its own. 📚🤖

Also, our retriever is not very efficient and it may encounter bugs in some cases. For instance, even when users specify that they are not interested in a certain activity, the retriever might still bring up that activity for them. 🐜🔍

So, in this project, you’re going to address some of these issues. The rest of this document consists of some empty cells and tips for you on how to fill them with code. Let’s get coding! 👨‍💻🚀

# The Project

## Determine Your Task 🎯
What do you aim to implement with RAG? A recommender system? 🎁 A chatbot for a website’s FAQ? 💬 A medical advisor? 🩺 Or perhaps something else entirely?

Specify your objective in this cell.

In [110]:
task_title = "travel-related recommender system"
url_for_more_information = ""

print(f"My task is: {task_title}")
print(f'For more information see: {url_for_more_information}')

My task is: travel-related recommender system
For more information see: 


defining flush function to be able to free up space in RAM and clear cache.

In [None]:
import gc
import torch

def flush():
  gc.collect()
  torch.cuda.empty_cache()
  torch.cuda.reset_peak_memory_stats()

## 🧐 Find or gather a corpus
Remember the fake corpus? 📚 It’s time to switch things up and use something real. 🌐 You need to use a dataset from  [huggingface datasets](https://huggingface.co/datasets) for this project. 🚀 Don’t use files that are outside of this notebook, this notebook should be able to run on its own without depending on anything external. 💻👍


In [None]:
!pip install -U accelerate transformers --quiet

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m309.4/309.4 kB[0m [31m6.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m9.3/9.3 MB[0m [31m37.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m21.3/21.3 MB[0m [31m71.9 MB/s[0m eta [36m0:00:00[0m
[?25h

In [None]:
!pip install datasets

Collecting datasets
  Downloading datasets-2.20.0-py3-none-any.whl (547 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/547.8 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m204.8/547.8 kB[0m [31m5.9 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m547.8/547.8 kB[0m [31m7.9 MB/s[0m eta [36m0:00:00[0m
Collecting pyarrow>=15.0.0 (from datasets)
  Downloading pyarrow-16.1.0-cp310-cp310-manylinux_2_28_x86_64.whl (40.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m40.8/40.8 MB[0m [31m11.9 MB/s[0m eta [36m0:00:00[0m
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl (116 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m116.3/116.3 kB[0m [31m18.2 MB/s[0m eta [36m0:00:00[0m
Collecting requests>=2.32.2 (from datasets)
  Downloading requests-2.32.3-py3-none-any

In [None]:
from datasets import load_dataset
raw_dataset = load_dataset("soniawmeyer/reddit-travel-QA-finetuning")


Downloading readme:   0%|          | 0.00/1.54k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/54.8M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/10482 [00:00<?, ? examples/s]

In [None]:
print(raw_dataset['train'][:5]['comments'])

['I live in Vancouver as well and booking a flight to Colombia recently I noticed its much cheaper to get to MEX on a layover! Annoying., Are you checking bags? If so, dont do this. Itll cause all sorts of problems for you, the airline, and the other passengers on the flight., Just go up to the counter for your next flight and tell them you have diarrhea and you wont be boarding. Theyll be happy to use your seat for standbyers, its only a problem if they dont know if youre going to show up or not. Signed, an experienced skiplagger, Wouldnt be banned, but the practice for many airlines now is if you miss one and dont rebook nearly immediately, they will cancel the rest of your return flights, so best to do with one way bookings., I did it exactly once. International flight, originating in Atlanta. The return connected at Dulles, where I lived. I collected my bags and let the employees know that my company had changed my plans last minute and that I would not be boarding my connection. T

In [None]:
from datasets import load_dataset

# Load the dataset
raw_dataset = load_dataset("soniawmeyer/reddit-travel-QA-finetuning")

# Access the 'comments' data from the training set and convert the comments corpus to a list
comments_corpus = raw_dataset['train']
comments_list = comments_corpus['comments']

# Print the first few comments to verify
for comment in comments_list[:5]:
    print(comment)


I live in Vancouver as well and booking a flight to Colombia recently I noticed its much cheaper to get to MEX on a layover! Annoying., Are you checking bags? If so, dont do this. Itll cause all sorts of problems for you, the airline, and the other passengers on the flight., Just go up to the counter for your next flight and tell them you have diarrhea and you wont be boarding. Theyll be happy to use your seat for standbyers, its only a problem if they dont know if youre going to show up or not. Signed, an experienced skiplagger, Wouldnt be banned, but the practice for many airlines now is if you miss one and dont rebook nearly immediately, they will cancel the rest of your return flights, so best to do with one way bookings., I did it exactly once. International flight, originating in Atlanta. The return connected at Dulles, where I lived. I collected my bags and let the employees know that my company had changed my plans last minute and that I would not be boarding my connection. The

## 📝 Create some queries
I want you to create 20 queries related to your task. You can use any Language Model you want for this matter, or if you’re feeling strong 💪 and have the time, write it yourself. 🖊️

You need to create a Hugging Face account, format your 20 queries into the accepted dataset format for Hugging Face 🤗 and push it to your Hugging Face account. Be sure to make it public and use it for the evaluation task. 👀

In [None]:
from datasets import list_datasets, load_dataset
from pprint import pprint

# My dataset on Hugging Face
dataset_name = "shabika0A/travel-recommender"

# Load the dataset
queries = load_dataset(dataset_name)

with open("queries.txt", "w") as f:
  for item in queries['train']:
      f.write("%s\n" % item)
      pprint(item)


Downloading readme:   0%|          | 0.00/85.0 [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/878 [00:00<?, ?B/s]

Generating train split:   0%|          | 0/20 [00:00<?, ? examples/s]

{'query': 'What is the best time to visit Paris?'}
{'query': 'What are the top attractions in Asia?'}
{'query': 'What are the top attractions in America?'}
{'query': 'What are the top attractions in Africa?'}
{'query': 'What are the top attractions in Europe?'}
{'query': 'What are the top attractions in Australia?'}
{'query': 'What is the currency used in Japan?'}
{'query': 'What is the currency used in Germany?'}
{'query': 'What is the best way to travel between European countries?'}
{'query': 'Is it safe to travel to Egypt?'}
{'query': 'What is the best time to visit New York?'}
{'query': 'What are some must-try foods in Italy?'}
{'query': 'What are some popular souvenirs to buy in India?'}
{'query': 'What is the best time to visit Japan?'}
{'query': 'What are the top attractions in South America?'}
{'query': 'What is the currency used in Australia?'}
{'query': 'Do I need a visa to travel to Brazil?'}
{'query': 'What are some must-try foods in Mexico?'}
{'query': 'What are some popul

## 🛠️ Create a Retriever
To create your retriever, you need to use an encoder model. Something like BERT? Nah, BERT is so yesterday. Find something new and shiny! ✨ The basic idea is to encode every document (sentence) in your corpus into a vector space using the same encoder. Then, encode the user query into that same space. With some similarity metrics like dot product, you can find the most similar document to the user’s input and retrieve it. 🎯 You can train your own encoder if you have enough data and resources, 💪 or you can use one of those [ready-made on Hugging Face](https://huggingface.co/models?pipeline_tag=sentence-similarity&sort=trending), like these ones.

##retriver model

In [None]:
!pip install sentence_transformers

Collecting sentence_transformers
  Downloading sentence_transformers-3.0.1-py3-none-any.whl (227 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/227.1 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m225.3/227.1 kB[0m [31m6.9 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m227.1/227.1 kB[0m [31m5.7 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: sentence_transformers
Successfully installed sentence_transformers-3.0.1


I will add this code to be able to empty cache.

In [None]:
!pip install -i https://pypi.org/simple/ bitsandbytes --upgrade --quiet

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m119.8/119.8 MB[0m [31m8.4 MB/s[0m eta [36m0:00:00[0m
[?25h

In [111]:
from transformers import BitsAndBytesConfig
import torch
from transformers import AutoModel

q_config = BitsAndBytesConfig (
      load_in_4bit = True,
      bnb_4bit_quant_type="nf4",
  )

In [None]:
from sentence_transformers import SentenceTransformer, util
import pandas as pd
import torch

# Load the pre-trained model
model_name = 'sentence-transformers/all-MiniLM-L6-v2'
ret_model = SentenceTransformer(model_name)

# Encode the comments_list into embeddings
corpus_embeddings = ret_model.encode(comments_list)
corpus_embeddings_tensor = torch.tensor(corpus_embeddings)

modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.7k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/612 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/350 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

In [None]:
#loading queries
queries=[]

with open("queries.txt", "r") as f:
    for line in f:
        queries.append(line.strip())

In [None]:
# Create an empty list to store queries and their retrieved answers
results = []

# Function to retrieve the most similar document and return the result
def retrieve(query):
    query_embedding = ret_model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, corpus_embeddings_tensor, top_k=1)
    hit = hits[0][0]
    retrieved_answer = comments_list[hit['corpus_id']]
    return query, retrieved_answer, hit['score']

# Using the retriever for sample queries and store the results
for query in queries:
    query_text, retrieved_text, score = retrieve(query)
    results.append((query_text, retrieved_text, score))

# Print the results
for result in results:
    query_text, retrieved_text, score = result
    print(f"Query: {query_text}")
    print(f"Retrieved: {retrieved_text} (Score: {score})")

with open("retriever_outputs.txt", "w") as f:
  for result in results:
      query_text, retrieved_text, score = result
      print(f"Query: {query_text}")
      print(f"Retrieved: {retrieved_text} (Score: {score})")
      # f.write("%s\n" % result)
      f.write(f"Query: {query_text}\n")
      f.write(f"Retrieved: {retrieved_text} (Score: {score})\n")


Query: {'query': 'What is the best time to visit Paris?'}
Retrieved: Depends on where and when youre going. Paris in spring? Book months out or more Porto in July? Good luck, should have booked last year. Most of Cental America? Just wait until youre on the way., I try to know where Im going at least months in advance. Right now, I have my plan sorted all the way to things can obviously still change tho, so generally I book or months in advance. For example, Ill be staying in Japan in late November, but Im booking the place maybe with AirBnB... maybe not in early July., I have things booked until late November this year and I am about to start looking for December. Summer in Europe I always have booked by February, if not earlier., deleted, Ive booked a few days ahead and had no problems. Some cities like Istanbul have so many places that the prices will be the same, even at the last minute. Look into the supply and demand of Airbnbs in your location., I usually have everything booked 

In [None]:
flush()

## 🎛️ Create a Generator
For this part, I practically handed you the whole code on a silver platter. 🍽️ But since we know you’re an explorer at heart and love trying new things, you can’t use the model I previously used. 😈 You have to try 3 different generators and compare them based on the quality of their answers. 🧪📊 [These might come in handy](https://huggingface.co/models?pipeline_tag=text-generation&sort=trending).

In [None]:
!pip install accelerate



In [None]:
#loading the results for generator input
results = []

with open("retriever_outputs.txt", "r") as f:
    for line in f:
        results.append(line.strip())

results = []

with open("retriever_outputs.txt", "r") as f:
    lines = f.readlines()
    i = 0
    while i < len(lines):
        query = lines[i].strip()[7:]  # Extract the query text
        retrieved = lines[i + 1].strip()[11:].split(" (Score: ")  # Extract the retrieved text and score
        retrieved_text = retrieved[0]
        score = float(retrieved[1][:-1])
        results.append((query, retrieved_text, score))
        i += 2

# Print the loaded results
for result in results:
    print(f"Query: {result[0]}")
    print(f"Retrieved: {result[1]} (Score: {result[2]})")


Query: {'query': 'What is the best time to visit Paris?'}
Retrieved: Depends on where and when youre going. Paris in spring? Book months out or more Porto in July? Good luck, should have booked last year. Most of Cental America? Just wait until youre on the way., I try to know where Im going at least months in advance. Right now, I have my plan sorted all the way to things can obviously still change tho, so generally I book or months in advance. For example, Ill be staying in Japan in late November, but Im booking the place maybe with AirBnB... maybe not in early July., I have things booked until late November this year and I am about to start looking for December. Summer in Europe I always have booked by February, if not earlier., deleted, Ive booked a few days ahead and had no problems. Some cities like Istanbul have so many places that the prices will be the same, even at the last minute. Look into the supply and demand of Airbnbs in your location., I usually have everything booked 

creating a prompt for models.

In [None]:
prompt = """You are a bot that makes recommendations for travellers. Try to be helpful recommender system.
This is the recommended answer: {relevant_document}
The user question is: {user_input}
Compile a recommendation to the user based on the recommended answer and the user question."""

relevant_document = "Visit the Eiffel Tower in Paris during the evening for a beautiful view of the city lights."
user_input = "I am planning a trip to Paris. What should I do?"

prompt = prompt.replace("{relevant_document}", relevant_document).replace("{user_input}", user_input)
print(prompt)


You are a bot that makes recommendations for travellers. Try to be helpful recommender system.
This is the recommended answer: Visit the Eiffel Tower in Paris during the evening for a beautiful view of the city lights.
The user question is: I am planning a trip to Paris. What should I do?
Compile a recommendation to the user based on the recommended answer and the user question.


### first model:
using HuggingFaceH4/zephyr-7b-beta as the first model.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import shutil

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta",
    quantization_config = q_config,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

# Create a text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Define generation arguments
generation_args = {
    "max_new_tokens": 200,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

# Store for model1 outputs
model1_outputs = []

# Process each item in the dataset
for result in results:
    query, retrieved_text, score = result

    prompt = f"""You are a bot that makes recommendations for travellers. Try to be a helpful recommender system.
    This is the recommended answer: {retrieved_text}
    The user question is: {query}
    Compile a recommendation to the user based on the recommended answer and the user question."""

    messages = [{"role": "user", "content": prompt}]
    generated_text = pipe(messages, **generation_args)

    # Append the generated text to model1_outputs
    model1_outputs.append(generated_text)

# Store the model1_outputs to a file if needed
with open("model1_outputs.txt", "w") as f:
    for item in model1_outputs:
        f.write("%s\n" % item)

# Clear the cache for model1
# I also deleted the model using terminal
model_dir = f"../root/.cache/huggingface/hub/models--HuggingFaceH4-zephyr-7b-beta"
shutil.rmtree(model_dir, ignore_errors=True)

print("Model 1 cache cleared successfully.")

config.json:   0%|          | 0.00/638 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/8 [00:00<?, ?it/s]

model-00001-of-00008.safetensors:   0%|          | 0.00/1.89G [00:00<?, ?B/s]

model-00002-of-00008.safetensors:   0%|          | 0.00/1.95G [00:00<?, ?B/s]

model-00003-of-00008.safetensors:   0%|          | 0.00/1.98G [00:00<?, ?B/s]

model-00004-of-00008.safetensors:   0%|          | 0.00/1.95G [00:00<?, ?B/s]

model-00005-of-00008.safetensors:   0%|          | 0.00/1.98G [00:00<?, ?B/s]

model-00006-of-00008.safetensors:   0%|          | 0.00/1.95G [00:00<?, ?B/s]

model-00007-of-00008.safetensors:   0%|          | 0.00/1.98G [00:00<?, ?B/s]

model-00008-of-00008.safetensors:   0%|          | 0.00/816M [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/111 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/1.43k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/42.0 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/168 [00:00<?, ?B/s]



Model 1 cache cleared successfully.


In [None]:
# Read and print the contents of model1_outputs from the file
model1_outputs = []

with open("model1_outputs.txt", "r") as f:
    for line in f:
        model1_outputs.append(line.strip())

# Print the model1_outputs
for output in model1_outputs:
    print(output)


[{'generated_text': "Based on the recommended answer, it's best to book accommodations in Paris months in advance, especially during popular times like spring. While it's possible to find last-minute deals in some cities, supply and demand can vary in Paris, and prices may not be as low as they are further out. To ensure the best possible experience in Paris, we recommend booking your accommodations at least a few months in advance. This will give you plenty of time to research and compare options, as well as secure the best possible rates. Additionally, if you're planning a trip to Paris during peak tourist season, it's especially important to book early to avoid missing out on your preferred accommodations. Happy travels!"}]
[{'generated_text': "Based on your interest in UNESCO sites and your recent travel to Asia, I would highly recommend visiting the Angkor Wat temple complex in Cambodia. It's a stunning example of Khmer architecture and is considered one of the most significant ar

## second model:
using microsoft/Phi-3-mini-128k-instruct as the second model.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import shutil

# Load the model and tokenizer for model2
model2 = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    quantization_config = q_config,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer2 = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

# Create a text generation pipeline for model2
pipe2 = pipeline(
    "text-generation",
    model=model2,
    tokenizer=tokenizer2,
)

# Define generation arguments for model2
generation_args2 = {
    "max_new_tokens": 200,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

# Store for model2 outputs
model2_outputs = []

# Process each item in the dataset for model2
for result in results:
    query, retrieved_text, score = result

    prompt = f"""You are a bot that makes recommendations for travellers. Try to be a helpful recommender system.
    This is the recommended answer: {retrieved_text}
    The user question is: {query}
    Compile a recommendation to the user based on the recommended answer and the user question."""

    messages = [{"role": "user", "content": prompt}]
    generated_text = pipe2(messages, **generation_args2)

    # Append the generated text to model2_outputs
    model2_outputs.append(generated_text)

# Store the model2_outputs to a file if needed
with open("model2_outputs.txt", "w") as f:
    for item in model2_outputs:
        f.write("%s\n" % item)

# Clear the cache for model2
model2_dir = f"../root/.cache/huggingface/hub/models--microsoft-Phi-3-mini-128k-instruct/"
shutil.rmtree(model2_dir, ignore_errors=True)

print("Model 2 cache cleared successfully.")


config.json:   0%|          | 0.00/3.48k [00:00<?, ?B/s]

configuration_phi3.py:   0%|          | 0.00/11.2k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


modeling_phi3.py:   0%|          | 0.00/73.2k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


model.safetensors.index.json:   0%|          | 0.00/16.3k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/2.67G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/181 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/3.44k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.94M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/306 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/599 [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


Model 2 cache cleared successfully.


In [128]:
model2_dir = f"../root/.cache/huggingface/hub/models--microsoft--Phi-3-mini-128k-instruct/"
shutil.rmtree(model2_dir, ignore_errors=True)

In [None]:
flush()

In [None]:
del model2
del tokenizer2
del pipe2
del generation_args2
del model2_outputs

print("Model 2 memory released.")

Model 2 memory released.


In [None]:
model2_outputs = []

with open("model2_outputs.txt", "r") as f:
    for line in f:
        model2_outputs.append(line.strip())

# Print the model1_outputs
for output in model2_outputs:
    print(output)

[{'generated_text': " Based on your query about the best time to visit Paris, I recommend planning your trip for the spring season. Paris in the spring is generally less crowded and the weather is pleasant, making it an ideal time to explore the city'allee des Champs-Elysées,' 'Eiffel Tower,' 'Louvre Museum,' 'Notre-Dame Cathedral,' and 'Montmartre.' To secure the best accommodations and avoid last-minute price hikes, it's advisable to book your flights and accommodations months in advance. However, if you prefer a more spontaneous approach, you can start looking for options a few weeks before your planned travel date. Enjoy your trip to Paris!"}]
[{'generated_text': " Based on your interest in top attractions in Asia, I would recommend the following UNESCO World Heritage Sites that you might find incredible:\n\n1. The Great Wall of China: This iconic landmark is a must-visit for its historical significance and breathtaking views. I can share your experience of using points and flying 

## third model:
using openchat/openchat-3.6-8b-20240522 as the third model.

In [None]:
# from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import shutil

# Load the model and tokenizer for model3
model3 = AutoModelForCausalLM.from_pretrained(
    "openchat/openchat-3.6-8b-20240522",
    quantization_config = q_config,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,

)
tokenizer3 = AutoTokenizer.from_pretrained("openchat/openchat-3.6-8b-20240522")

# Create a text generation pipeline for model3
pipe3 = pipeline(
    "text-generation",
    model=model3,
    tokenizer=tokenizer3,
)

# Define generation arguments for model3
generation_args3 = {
    "max_new_tokens": 200,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
# Store for model3 outputs
model3_outputs = []

# Process each item in the dataset for model3
for result in results:
    query, retrieved_text, score = result

    prompt = f"""You are a bot that makes recommendations for travellers. Try to be a helpful recommender system.
    This is the recommended answer: {retrieved_text}
    The user question is: {query}
    Compile a recommendation to the user based on the recommended answer and the user question."""

    messages = [{"role": "user", "content": prompt}]
    generated_text = pipe3(messages, **generation_args3)

    # Append the generated text to model3_outputs
    model3_outputs.append(generated_text)

# Store the model3_outputs to a file if needed
with open("model3_outputs.txt", "w") as f:
    for item in model3_outputs:
        f.write("%s\n" % item)

# Clear the cache for model3
model3_dir = f"../root/.cache/huggingface/hub/models--openchat-openchat-3.6-8b-20240522"
shutil.rmtree(model3_dir, ignore_errors=True)

print("Model 3 cache cleared successfully.")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for

Model 3 cache cleared successfully.


###removing model3 data from disk:

In [None]:
model3_dir = f"/root/.cache/huggingface/hub/models--openchat-openchat-3.6-8b-20240522"
shutil.rmtree(model3_dir, ignore_errors=True)

In [None]:
cd /root/.cache/huggingface/hub/

/root/.cache/huggingface/hub


In [None]:
ls

[0m[01;34mmodels--microsoft--Phi-3-mini-128k-instruct[0m/  [01;34mmodels--sentence-transformers--all-MiniLM-L6-v2[0m/
[01;34mmodels--openchat--openchat-3.6-8b-20240522[0m/   version.txt


In [None]:
cd models--microsoft--Phi-3-mini-128k-instruct/

/root/.cache/huggingface/hub/models--microsoft--Phi-3-mini-128k-instruct


In [None]:
ls

[0m[01;34mmodels--microsoft--Phi-3-mini-128k-instruct[0m/      version.txt
[01;34mmodels--sentence-transformers--all-MiniLM-L6-v2[0m/


In [None]:
rm -r models--openchat--openchat-3.6-8b-20240522/

### printing output for check:

In [None]:
model3_outputs = []

with open("model3_outputs.txt", "r") as f:
    for line in f:
        model3_outputs.append(line.strip())

# Print the model1_outputs
for output in model3_outputs:
    print(output)

[{'generated_text': 'The best time to visit Paris is generally considered to be in the spring, from March to May, or in the fall, from September to November. During these seasons, the weather is milder, and the city is less crowded compared to the peak tourist season in summer. To secure accommodations and avoid higher prices, it is recommended to book your trip to Paris months in advance.'}]
[{'generated_text': 'Based on the recommended answer and your question about top attractions in Asia, I would recommend the following must-see destinations:\n\n1. The Great Wall of China: A UNESCO World Heritage site and one of the most iconic landmarks in Asia. As mentioned in the recommended answer, you can use points to fly via Qatar Airways and experience the Great Wall through a mix of hostels and couchsurfing.\n\n2. Angkor Wat, Cambodia: Another UNESCO World Heritage site, Angkor Wat is the largest religious monument in the world and a stunning example of Khmer architecture.\n\n3. Taj Mahal,

In [None]:
flush()

In [None]:
del model3
del tokenizer3
del pipe3
del generation_args3
del model3_outputs

## 📊 Evaluate the results
Here, you’ve got to put those 3 models to the test. Use the 20 queries you’ve created on each of the 3 models. Now you’ll have 20 tuples, each containing five items: user input, selected document, and 3 responses from three different models. Use a judge model on each tuple to select the best answer. 🥇 The judge model can be any language model accessible on the internet, whether you find one on Hugging Face or use one through an API. 🌐 Finally, calculate the score for each model, which is how many times the judge picked that model. 🏆

In [None]:
# Load the outputs of each model
model1_outputs = []
model2_outputs = []
model3_outputs = []

with open("model1_outputs.txt", "r") as f:
    for line in f:
        model1_outputs.append(line.strip())

with open("model2_outputs.txt", "r") as f:
    for line in f:
        model2_outputs.append(line.strip())

with open("model3_outputs.txt", "r") as f:
    for line in f:
        model3_outputs.append(line.strip())


# Merge the outputs into a single list for each query
merged_outputs = []
for query_idx in range(len(results)):
    query_text = results[query_idx][0]
    retrieved_text = results[query_idx][1]

    response_model1 = model1_outputs[query_idx][19:]
    response_model2 = model2_outputs[query_idx][19:]
    response_model3 = model3_outputs[query_idx][19:]

    merged_outputs.append((query_text, retrieved_text, response_model1, response_model2, response_model3))

In [None]:
print(model1_outputs[0])

[{'generated_text': "Based on the recommended answer, it's best to book accommodations in Paris months in advance, especially during popular times like spring. While it's possible to find last-minute deals in some cities, supply and demand can vary in Paris, and prices may not be as low as they are further out. To ensure the best possible experience in Paris, we recommend booking your accommodations at least a few months in advance. This will give you plenty of time to research and compare options, as well as secure the best possible rates. Additionally, if you're planning a trip to Paris during peak tourist season, it's especially important to book early to avoid missing out on your preferred accommodations. Happy travels!"}]


In [None]:
print(model1_outputs[0][19:])


 "Based on the recommended answer, it's best to book accommodations in Paris months in advance, especially during popular times like spring. While it's possible to find last-minute deals in some cities, supply and demand can vary in Paris, and prices may not be as low as they are further out. To ensure the best possible experience in Paris, we recommend booking your accommodations at least a few months in advance. This will give you plenty of time to research and compare options, as well as secure the best possible rates. Additionally, if you're planning a trip to Paris during peak tourist season, it's especially important to book early to avoid missing out on your preferred accommodations. Happy travels!"}]


In [None]:
print(merged_outputs[0])

("{'query': 'What is the best time to visit Paris?'}", 'Depends on where and when youre going. Paris in spring? Book months out or more Porto in July? Good luck, should have booked last year. Most of Cental America? Just wait until youre on the way., I try to know where Im going at least months in advance. Right now, I have my plan sorted all the way to things can obviously still change tho, so generally I book or months in advance. For example, Ill be staying in Japan in late November, but Im booking the place maybe with AirBnB... maybe not in early July., I have things booked until late November this year and I am about to start looking for December. Summer in Europe I always have booked by February, if not earlier., deleted, Ive booked a few days ahead and had no problems. Some cities like Istanbul have so many places that the prices will be the same, even at the last minute. Look into the supply and demand of Airbnbs in your location., I usually have everything booked months into t

In [107]:
pip install -U FlagEmbedding

Collecting FlagEmbedding
  Downloading FlagEmbedding-1.2.10.tar.gz (141 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m141.3/141.3 kB[0m [31m4.5 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: FlagEmbedding
  Building wheel for FlagEmbedding (setup.py) ... [?25l[?25hdone
  Created wheel for FlagEmbedding: filename=FlagEmbedding-1.2.10-py3-none-any.whl size=166100 sha256=83b84668fa18febe3c4f5185a9635f3db88b2816ce7c3200267e816c67659a53
  Stored in directory: /root/.cache/pip/wheels/3b/1d/d2/eec38cd59144f4c9767d7c55cfae8e8feec699071aa41ca5da
Successfully built FlagEmbedding
Installing collected packages: FlagEmbedding
Successfully installed FlagEmbedding-1.2.10


In [109]:
from FlagEmbedding import FlagReranker
from transformers import pipeline

# Initialize the FlagReranker model
reranker = FlagReranker('BAAI/bge-reranker-v2-m3', use_fp16=True)

# Initialize the judge model
judge = pipeline('text-classification', model=model, tokenizer=tokenizer)

model1_score = 0
model2_score = 0
model3_score = 0

for query in merged_outputs:
    query, retrieved_text, response_model1, response_model2, response_model3 = query
    responses = [response_model1, response_model2, response_model3]

    query_text = query[9:]

    best_response_idx = 0
    best_score = -float('inf')  # Initialize with negative infinity

    for idx, response in enumerate(responses):
        input_text = query_text + " " + response
        score = reranker.compute_score([query_text, response], normalize=True)

        print(f"Query-Response Pair: {query_text} - {response}")
        print(f"Score: {score}")

        if score > best_score:
            best_score = score
            best_response_idx = idx

    if best_response_idx == 0:
        model1_score += 1
    elif best_response_idx == 1:
        model2_score += 1
    else:
        model3_score += 1

# Print the scores for each model
print(f"Model 1 Score: {model1_score}")
print(f"Model 2 Score: {model2_score}")
print(f"Model 3 Score: {model3_score}")


The model 'LlamaForCausalLM' is not supported for text-classification. Supported models are ['AlbertForSequenceClassification', 'BartForSequenceClassification', 'BertForSequenceClassification', 'BigBirdForSequenceClassification', 'BigBirdPegasusForSequenceClassification', 'BioGptForSequenceClassification', 'BloomForSequenceClassification', 'CamembertForSequenceClassification', 'CanineForSequenceClassification', 'LlamaForSequenceClassification', 'ConvBertForSequenceClassification', 'CTRLForSequenceClassification', 'Data2VecTextForSequenceClassification', 'DebertaForSequenceClassification', 'DebertaV2ForSequenceClassification', 'DistilBertForSequenceClassification', 'ElectraForSequenceClassification', 'ErnieForSequenceClassification', 'ErnieMForSequenceClassification', 'EsmForSequenceClassification', 'FalconForSequenceClassification', 'FlaubertForSequenceClassification', 'FNetForSequenceClassification', 'FunnelForSequenceClassification', 'GemmaForSequenceClassification', 'Gemma2ForSequen

Query-Response Pair:  'What is the best time to visit Paris?'} -  "Based on the recommended answer, it's best to book accommodations in Paris months in advance, especially during popular times like spring. While it's possible to find last-minute deals in some cities, supply and demand can vary in Paris, and prices may not be as low as they are further out. To ensure the best possible experience in Paris, we recommend booking your accommodations at least a few months in advance. This will give you plenty of time to research and compare options, as well as secure the best possible rates. Additionally, if you're planning a trip to Paris during peak tourist season, it's especially important to book early to avoid missing out on your preferred accommodations. Happy travels!"}]
Score: 0.6932653293207841
Query-Response Pair:  'What is the best time to visit Paris?'} -  " Based on your query about the best time to visit Paris, I recommend planning your trip for the spring season. Paris in the 

so we can infer that the third model is the best for my travelling recommender system. =)

## refrences links:

My queries dataset link:

* https://huggingface.co/datasets/shabika0A/travel-recommender

Used models:

* https://huggingface.co/HuggingFaceH4/zephyr-7b-beta.
* https://huggingface.co/microsoft/Phi-3-mini-128k-instruct
* https://huggingface.co/openchat/openchat-3.6-8b-20240522
* https://huggingface.co/BAAI/bge-reranker-v2-m3#usage

my chats with dear GPT-3.5-turbo
* https://shareg.pt/JTLb1c5
* https://shareg.pt/CiuZO6m
* https://shareg.pt/Q5aRxac
* https://shareg.pt/cbJ5925
* https://sharegpt.com/c/tkK6kWK
* https://sharegpt.com/c/zVz2rwg
* https://shareg.pt/TvC2XSK


other links:

* https://learn.deeplearning.ai/courses/quantization-fundamentals/lesson/4/loading-models-by-data-type
* https://www.deeplearning.ai/short-courses/quantization-fundamentals-with-hugging-face/
* https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Getting_the_most_out_of_LLMs.ipynb#scrollTo=yhwHj948GdQy


##Some of the challenges I faced while doing the project were:

- facing with colab GPU usage limits.

- The RAM was getting full while running models, so I quantized the models.

- After running at most two models, the disk was full. So I learned to store the outputs to be able to restart a running session, and then I learned how to remove models and their files from the disk (this cost several hours for me to undo my mistakes in addressing the folders :") )

- I was forgetting to restart the session after some of the installations.