# IUST Computer Engineering Department 🏫
## Introduction to Natural Language Processing 📚 (The Final Project)
### Course Instructor: Dr. Marzieh Davoodabadi Farahani 👩‍🏫
### Project Teaching Assistant: Erfan Moosavi Monazzah (tel: @ErfanMoosavi2000) 📞
-------------------------------------------------------------------------------<br>
The objective of this project is to acquaint you with the fundamentals of Retrieval Augmented Generation (RAG). Be sure to explore various options and address challenges in a creative manner. 🎯

**Project Guidelines** 📝
- Avoid cheating at all costs. If a set of submissions is found to be [plagiarized](https://translate.google.as/?sl=en&tl=fa&text=Very%20hard%20word%2C%20I%20know%2C%20here%27s%20the%20meaning%3A%0Aplagiarized&op=translate), only one will be randomly chosen for grading. The others will fail the project. ❌
- You are allowed to use any document, article, paper, or video as a resource for writing your code, provided you include a link to the material used. 📖
- The use of Large Language Models (LLMs), chatbots, and copilots is encouraged. If you utilize any of these tools, make sure to attach the chat history that led you to the answer to your question, or the code, to this .ipynb document. (You must provide the entire chat, not just the final answer or your initial prompt.) 💻
- You may not submit any additional documents, files, etc., along with this document. Only solutions, code, explanations, etc., in this document will be graded. 📄
- You are required to implement everything (except the Language Modeling parts) from scratch. The use of libraries like langchain, llama_index, etc., is not permitted for this purpose. 🚫
- Please adhere to the code guidelines provided throughout the documents. 📝 I’ve spent time in a library 📚 crafting all of this, so if you overlook them, you’ll lose the points allocated for that section. ❌
- We need to use GPUs for this assignment, so don't forget to turn on GPU usage for your notebook session.

-------------------------------------------------------------------------------<br>
# Alright, let's get started. 🚀

## What is RAG? 🤔
We've all used ChatGPT and experienced moments when it generates content that is incorrect or unrelated to our query. Do you know why this happens? These Large Language Models (LLMs) are not magical entities; they are simply models trained on a vast amount of text, 📚 which you could think of as a significant portion of the internet. However, this is not all the data available in the world, because data is not static. You yourself generate new data every day through your use of the internet, social media, and so on. 🌐💻📱

So, no matter how much data you use to train your LLM, you always end up encountering new data. This is one of the reasons behind the famous ChatGPT response that tells you it only knows things up to a certain date. 📅 These models also tend to hallucinate: they provide incorrect answers in a very convincing manner. 🎭

On the other hand, we have retrieval techniques. Don't worry if this sounds complicated: you already use retrieval on a daily basis. (Doing it well isn't easy, and you may need to take a course to learn the concepts properly 😅, but that's not necessary for this project.) You can think of search engines (like Google, for example) as a complex form of information retrieval. 🔍

So, one day, people came up with this idea that it would be cool if ChatGPT could search Google for us, read the articles for us, summarize what it read, and tell us that. 📖 So, this is not exactly what RAG is, but it's something similar. We have a corpus (a large amount of data) and a query (what a user typed as input). Now, we search through this corpus using techniques related to vectors and vector databases, and find the most similar items in our corpus to the query. Then, we pass these items to an LLM and ask for a structured, well-formatted, user-friendly output. 📈📊
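The retrieve-then-generate loop described above can be sketched in a few lines. This is a toy illustration, not the project solution: `toy_retrieve` scores documents by shared words, and `toy_generate` is a hypothetical stand-in for an LLM call that only fills in a prompt template. All names and the two-sentence corpus are made up for this example.

```python
def toy_retrieve(query, corpus):
    """Return the corpus item that shares the most lowercase words with the query."""
    query_words = set(query.lower().split())
    return max(corpus, key=lambda doc: len(query_words & set(doc.lower().split())))


def toy_generate(query, context):
    """Hypothetical stand-in for an LLM: it only builds the prompt an LLM would receive."""
    return f"Context: {context}\nQuestion: {query}\nAnswer using only the context above."


def toy_rag(query, corpus):
    context = toy_retrieve(query, corpus)  # 1. search the corpus
    return toy_generate(query, context)    # 2. hand the result to the generator


toy_corpus = [
    "The library opens at 8 am on weekdays.",
    "The cafeteria serves lunch from noon to 2 pm.",
]
print(toy_rag("When does the library open?", toy_corpus))
```

In a real system the word-overlap scorer is replaced by vector search and the template is sent to an actual language model, but the two-step shape stays the same.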

## I'm Interested in the Technical Details, What Should I Read? 📚🔍
- I strongly recommend reading the [original RAG paper](https://arxiv.org/abs/2005.11401). If you need help understanding the paper or have any questions about it, feel free to reach out to me via Telegram or find me on the second floor of the department in the NLP lab on Sundays and Tuesdays. 📖
- There appears to be a [comprehensive 2.5-hour course](https://www.freecodecamp.org/news/mastering-rag-from-scratch/) available. I haven't personally watched it, but if you find a better one, let me know so I can update this document. 🎥
- Here is [an article](https://www.smashingmagazine.com/2024/01/guide-retrieval-augmented-generation-language-models/) that explains the concepts very well. Initially, I wanted to use this article as the basis for this project, but unfortunately, the llama_index library used in the article seems to be outdated, so most of the code would need to be rewritten. On second thought, I found it more useful to focus on core concepts rather than learning specific libraries. You might want to check out some libraries like langchain or llama_index which provide a lot of tools for RAG. (But not for this project) 📝💡
- Don't hesitate to use Google or ask chatbots about any new concepts and terms. Search engine-aware chatbots like Microsoft Copilot provide links for each part of their answers, which is useful if you want to delve deeper into that part. 🌐🤖
- Lastly, we have [the article](https://learnbybuilding.ai/tutorials/rag-from-scratch) that serves as the foundation for this project. 📚🔍

# Learn
First, we’re going to go through a simple RAG implementation. It’s going to be similar to the article, except for the (LLM) part. For that, I’m going to use Hugging Face. 🤗 I’ll also try to explain the code in simple terms, but feel free to read the article if you prefer their writing style.

## Let's Install the Necessary Libraries 📚🔧
Did you know that using the `--quiet` or `-q` option with the `pip install` command minimizes the output displayed on your screen? 🖥️ This can make your terminal less cluttered. Also, using `-U` will upgrade the libraries if they were previously installed. This is particularly useful for certain libraries like `transformers` that are frequently updated. 🔄

In [None]:
!pip install -U accelerate transformers datasets --quiet
!pip install -i https://pypi.org/simple/ bitsandbytes

# Restart Session
import os
os.kill(os.getpid(), 9)


## Gather a Corpus 📚
Technically, a corpus refers to a large and structured set of texts. However, for the sake of our discussion, let’s consider our collection as a “corpus”, even though it might not be large in the traditional sense. 😉

In [None]:
corpus_of_documents = [
    "Take a leisurely walk in the park and enjoy the fresh air.",
    "Visit a local museum and discover something new.",
    "Attend a live music concert and feel the rhythm.",
    "Go for a hike and admire the natural scenery.",
    "Have a picnic with friends and share some laughs.",
    "Explore a new cuisine by dining at an ethnic restaurant.",
    "Take a yoga class and stretch your body and mind.",
    "Join a local sports league and enjoy some friendly competition.",
    "Attend a workshop or lecture on a topic you're interested in.",
    "Visit an amusement park and ride the roller coasters."
]

## Create a Retriever 🕵️‍♂️
Now, we’re going to create a simple retriever. The role of the retriever is to compare the user’s query with a large corpus of text and find those that are most similar in context. (You know what context is by now, don’t you? 😊 If you’ve forgotten, refer back to your initial lectures). For now, let’s say we want to find similar text based on simple similarity metrics. The code is straightforward, and I have faith in you, chief! Dive into the code. 👨‍💻

In [None]:
def jaccard_similarity(query, document):
    query = query.lower().split(" ")
    document = document.lower().split(" ")
    intersection = set(query).intersection(set(document))
    union = set(query).union(set(document))
    return len(intersection)/len(union)

Hey, you may want to look at the Wikipedia page for [Jaccard Similarity](https://en.wikipedia.org/wiki/Jaccard_index).
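As a quick check of the definition, J(A, B) = |A ∩ B| / |A ∪ B|, here is the score for the hiking query used in this notebook against one of the corpus sentences. The function is repeated so that the cell runs on its own. Note that splitting on spaces leaves punctuation attached, so `scenery.` counts as a word.

```python
def jaccard_similarity(query, document):
    query_words = set(query.lower().split(" "))
    document_words = set(document.lower().split(" "))
    return len(query_words & document_words) / len(query_words | document_words)


query = "I like to hike"
document = "Go for a hike and admire the natural scenery."

# 1 shared word ("hike") out of 12 distinct words across both texts
score = jaccard_similarity(query, document)
print(round(score, 4))  # 0.0833
```

A score this low still wins in the toy corpus because no other sentence shares any word with the query.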

In [None]:
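def return_response(query, corpus):
    similarities = []
    for doc in corpus:
        # Score each document against the query argument, not a global variable
        similarity = jaccard_similarity(query, doc)
        similarities.append(similarity)
    # Index into the corpus argument so the function works for any corpus
    return corpus[similarities.index(max(similarities))]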

## Create a Generator 🖥️
Now, we’re going to create a generator. This will help us compile the information retrieved into a well-structured and user-friendly text.

OK, let's say that in a scenario we ask users what they like to do, and their answer is this:

In [None]:
user_input = "I like to hike"

Now, using the retriever, I find the activity that best fits this user.

In [None]:
relevant_document = return_response(user_input, corpus_of_documents)
print(relevant_document)

Go for a hike and admire the natural scenery.


The answer seems good enough, but we can do better, yeah?

Let’s import a Language Model. I’m going to try out Microsoft Phi-3 because it recently hit the market, and I haven’t had a chance to try it for myself yet. So, I’m seizing this opportunity to do so! 😊👨‍💻

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

Downloading the model is going to take a while, so use this time to rest your eyes for a bit. 😊👀💤

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")


In [None]:
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

Now we try to get the LLM to become our generator. We simply place the retrieved information and the user query in the following prompt and ask the model for well-formatted text.

In [None]:
prompt = """You are a bot that makes recommendations for activities. Try to be helpful recommender system.
This is the recommended activity: {relevant_document}
The user input is: {user_input}
Compile a recommendation to the user based on the recommended activity and the user input."""

In [None]:
prompt = prompt.replace("{relevant_document}", relevant_document).replace("{user_input}", user_input)
print(prompt)

You are a bot that makes recommendations for activities. Try to be helpful recommender system.
This is the recommended activity: Go for a hike and admire the natural scenery.
The user input is: I like to hike
Compile a recommendation to the user based on the recommended activity and the user input.


In [None]:
messages = [
    {"role": "user", "content": prompt},
]

Here's the augmented generated text:

In [None]:
output = pipe(messages, **generation_args)
print(output[0]['generated_text'])

Based on your interest in hiking and our recommended activity, I suggest you embark on a scenic hike in a beautiful natural environment. This will not only allow you to enjoy the physical benefits of hiking but also provide a wonderful opportunity to admire breathtaking landscapes, observe diverse flora and fauna, and experience the tranquility of nature. Don't forget to bring along essentials like water, snacks, and appropriate hiking gear for a safe and enjoyable adventure!


## Very Cool, but Not Perfect! 😎👌
Alright, you’ve just seen a very basic example of RAG. However, there are some issues present. The corpus is small, and the documents in the corpus are short sentences, which causes the Language Model (LM) to generate some text on its own. 📚🤖

Also, our retriever is not very efficient and it may encounter bugs in some cases. For instance, even when users specify that they are not interested in a certain activity, the retriever might still bring up that activity for them. 🐜🔍
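The negation problem is easy to reproduce with a word-overlap retriever like the one from the Learn section. The sketch below re-declares a Jaccard scorer and uses three sentences from the toy corpus; `toy_docs`, `negated_query`, and `best_match` are names introduced just for this illustration.

```python
def jaccard(a, b):
    a_words, b_words = set(a.lower().split()), set(b.lower().split())
    return len(a_words & b_words) / len(a_words | b_words)


toy_docs = [
    "Go for a hike and admire the natural scenery.",
    "Visit a local museum and discover something new.",
    "Take a yoga class and stretch your body and mind.",
]

negated_query = "I don't like to hike"

# "hike" is the only word shared with any document, so the hiking sentence
# still gets the highest score even though the user rejected hiking.
best_match = max(toy_docs, key=lambda doc: jaccard(negated_query, doc))
print(best_match)
```

Word overlap has no notion of meaning, so "don't" is just another token that happens not to match anything.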

So, in this project, you’re going to address some of these issues. The rest of this document consists of some empty cells and tips for you on how to fill them with code. Let’s get coding! 👨‍💻🚀

# The Project

## Determine Your Task 🎯
What do you aim to implement with RAG? A recommender system? 🎁 A chatbot for a website’s FAQ? 💬 A medical advisor? 🩺 Or perhaps something else entirely?

Specify your objective in this cell.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
!pip install -U "huggingface_hub[cli]" --quiet


In [3]:
from huggingface_hub import notebook_login

notebook_login()


In [None]:
!pip install -U datasets --quiet


In [4]:
import pandas as pd

import torch
import torch.nn.functional as F

from tqdm import tqdm

from transformers import pipeline, AutoTokenizer, AutoModel, AutoModelForCausalLM

from datasets import load_dataset
from datasets import Dataset

In [5]:
task_title = "Medical Advice"
url_for_more_information = "https://huggingface.co/datasets/bigbio/pubmed_qa"

print(f"My task is: {task_title}")
print(f'For more information see: {url_for_more_information}')

My task is: Medical Advice
For more information see: https://huggingface.co/datasets/bigbio/pubmed_qa


## 🧐 Find or gather a corpus
Remember the fake corpus? 📚 It’s time to switch things up and use something real. 🌐 You need to use a dataset from [Hugging Face Datasets](https://huggingface.co/datasets) for this project. 🚀 Don’t use files that are outside of this notebook; it should be able to run on its own without depending on anything external. 💻👍


In [6]:
# Load the PubMedQA dataset
dataset = load_dataset("pubmed_qa", "pqa_labeled")
dataset

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



DatasetDict({
    train: Dataset({
        features: ['pubid', 'question', 'context', 'long_answer', 'final_decision'],
        num_rows: 1000
    })
})

In [7]:
train_ds = dataset['train']
train_ds

Dataset({
    features: ['pubid', 'question', 'context', 'long_answer', 'final_decision'],
    num_rows: 1000
})

In [8]:
corpus_of_documents = train_ds.to_pandas()[['question', 'context', 'long_answer']]
corpus_of_documents.head()

| | question | context | long_answer |
|---|---|---|---|
| 0 | Do mitochondria play a role in remodelling lac... | {'contexts': ['Programmed cell death (PCD) is ... | Results depicted mitochondrial dynamics in viv... |
| 1 | Landolt C and snellen e acuity: differences in... | {'contexts': ['Assessment of visual acuity dep... | Using the charts described, there was only a s... |
| 2 | Syncope during bathing in infants, a pediatric... | {'contexts': ['Apparent life-threatening event... | "Aquagenic maladies" could be a pediatric form... |
| 3 | Are the long-term results of the transanal pul... | {'contexts': ['The transanal endorectal pull-t... | Our long-term study showed significantly bette... |
| 4 | Can tailored interventions increase mammograph... | {'contexts': ['Telephone counseling and tailor... | The effects of the intervention were most pron... |


## 📝 Create some queries
I want you to create 20 queries related to your task. You can use any Language Model you want for this matter, or if you’re feeling strong 💪 and have the time, write it yourself. 🖊️

You need to create a Hugging Face account, format your 20 queries into the accepted dataset format for Hugging Face 🤗 and push it to your Hugging Face account. Be sure to make it public and use it for the evaluation task. 👀

In [9]:
medical_issues_solutions = {
    "issue_1": {
        "question": "What are the common symptoms of diabetes?",
        "response": "Common symptoms of diabetes include increased thirst, frequent urination, extreme fatigue, blurred vision, and slow-healing sores. If you experience any of these symptoms, it's important to consult a healthcare provider for proper diagnosis and treatment."
    },
    "issue_2": {
        "question": "How can I lower my blood pressure naturally?",
        "response": "To lower blood pressure naturally, consider adopting a healthy diet rich in fruits, vegetables, and whole grains, reducing sodium intake, exercising regularly, maintaining a healthy weight, managing stress, and avoiding tobacco and excessive alcohol consumption."
    },
    "issue_3": {
        "question": "What steps can I take to improve my cholesterol levels?",
        "response": "Improving cholesterol levels can be achieved by eating a heart-healthy diet, exercising regularly, quitting smoking, maintaining a healthy weight, and limiting alcohol intake. It's also important to follow any medication regimens prescribed by your doctor."
    },
    "issue_4": {
        "question": "What are the early signs of Alzheimer's disease?",
        "response": "Early signs of Alzheimer's disease include memory loss that disrupts daily life, difficulty planning or solving problems, confusion with time or place, trouble understanding visual images, and problems with speaking or writing. If you notice these symptoms, seek medical advice promptly."
    },
    "issue_5": {
        "question": "How can I manage chronic back pain?",
        "response": "Managing chronic back pain involves a combination of physical activity, stretching exercises, proper posture, ergonomic adjustments at work, over-the-counter pain relievers, and in some cases, physical therapy or medical treatments as advised by your doctor."
    },
    "issue_6": {
        "question": "What are the best ways to prevent heart disease?",
        "response": "Preventing heart disease involves eating a healthy diet, engaging in regular physical activity, maintaining a healthy weight, avoiding tobacco use, managing stress, controlling blood pressure and cholesterol levels, and getting regular medical check-ups."
    },
    "issue_7": {
        "question": "What are the symptoms of a stroke, and what should I do if I suspect one?",
        "response": "Symptoms of a stroke include sudden numbness or weakness in the face, arm, or leg (especially on one side of the body), confusion, trouble speaking, difficulty seeing, dizziness, and severe headache. If you suspect a stroke, seek emergency medical help immediately."
    },
    "issue_8": {
        "question": "How can I boost my immune system naturally?",
        "response": "Boosting your immune system naturally can be achieved by eating a balanced diet rich in fruits and vegetables, getting regular exercise, staying hydrated, getting adequate sleep, managing stress, and practicing good hygiene."
    },
    "issue_9": {
        "question": "What are the causes and treatments for seasonal allergies?",
        "response": "Seasonal allergies are caused by exposure to pollen from trees, grasses, and weeds. Treatments include avoiding allergens, using over-the-counter antihistamines, decongestants, nasal sprays, and in some cases, allergy shots or prescription medications."
    },
    "issue_10": {
        "question": "What lifestyle changes can help with arthritis pain?",
        "response": "Lifestyle changes that can help with arthritis pain include regular physical activity, maintaining a healthy weight, using hot and cold therapies, practicing relaxation techniques, eating an anti-inflammatory diet, and avoiding activities that strain your joints."
    },
    "issue_11": {
        "question": "What should I do if I have frequent headaches?",
        "response": "If you have frequent headaches, keep a headache diary to identify triggers, practice stress management, stay hydrated, get adequate sleep, maintain good posture, and consider over-the-counter pain relievers. Consult a healthcare provider if headaches persist or worsen."
    },
    "issue_12": {
        "question": "How can I tell if I have a food allergy?",
        "response": "Signs of a food allergy include hives, itching, swelling of the lips, face, tongue, or throat, difficulty breathing, abdominal pain, diarrhea, and dizziness. If you suspect a food allergy, consult an allergist for testing and proper diagnosis."
    },
    "issue_13": {
        "question": "What are the symptoms and treatment options for asthma?",
        "response": "Symptoms of asthma include shortness of breath, wheezing, coughing, and chest tightness. Treatment options include inhalers, long-term control medications, avoiding triggers, and in some cases, allergy medications or immunotherapy."
    },
    "issue_14": {
        "question": "How can I maintain good mental health?",
        "response": "Maintaining good mental health involves regular physical activity, a balanced diet, adequate sleep, staying connected with loved ones, practicing mindfulness and relaxation techniques, seeking professional help when needed, and avoiding alcohol and drugs."
    },
    "issue_15": {
        "question": "What should I know about managing diabetes?",
        "response": "Managing diabetes involves monitoring blood sugar levels, following a healthy eating plan, getting regular physical activity, taking prescribed medications, and regularly consulting with your healthcare provider to manage and adjust your treatment plan."
    },
    "issue_16": {
        "question": "How can I prevent osteoporosis?",
        "response": "Preventing osteoporosis involves getting enough calcium and vitamin D, engaging in weight-bearing and muscle-strengthening exercises, avoiding smoking and excessive alcohol consumption, and discussing bone health with your healthcare provider, especially if you have risk factors."
    },
    "issue_17": {
        "question": "What are the best ways to manage stress?",
        "response": "Managing stress can be achieved through regular physical activity, practicing mindfulness and relaxation techniques, maintaining a healthy diet, getting adequate sleep, staying connected with supportive people, and setting aside time for hobbies and interests."
    },
    "issue_18": {
        "question": "What should I do if I experience chest pain?",
        "response": "If you experience chest pain, seek emergency medical help immediately, as it can be a sign of a heart attack. Other potential causes include angina, indigestion, or muscle strain, which should also be evaluated by a healthcare provider."
    },
    "issue_19": {
        "question": "What are the symptoms of dehydration, and how can I prevent it?",
        "response": "Symptoms of dehydration include dry mouth, extreme thirst, dark urine, dizziness, and fatigue. Prevent dehydration by drinking plenty of fluids, especially water, and consuming foods with high water content, especially in hot weather or during physical activity."
    },
    "issue_20": {
        "question": "How can I improve my digestive health?",
        "response": "Improving digestive health involves eating a high-fiber diet, staying hydrated, getting regular physical activity, managing stress, avoiding excessive intake of fatty and processed foods, and incorporating probiotics into your diet."
    }
}

In [10]:
# Convert the nested dict into a dict of lists (one list per dataset column)
dict_data = {
    "question": [item["question"] for item in medical_issues_solutions.values()],
    "response": [item["response"] for item in medical_issues_solutions.values()],
}

data = Dataset.from_dict(dict_data)
data

Dataset({
    features: ['question', 'response'],
    num_rows: 20
})

In [None]:
data.push_to_hub("iMahdiGhazavi/medical-advice-issues")


CommitInfo(commit_url='https://huggingface.co/datasets/iMahdiGhazavi/medical-advice-issues/commit/0888a99951d3756f6adf7440a5eb02ed8f5cd62c', commit_message='Upload dataset', commit_description='', oid='0888a99951d3756f6adf7440a5eb02ed8f5cd62c', pr_url=None, pr_revision=None, pr_num=None)

## 🛠️ Create a Retriever
To create your retriever, you need to use an encoder model. Something like BERT? Nah, BERT is so yesterday. Find something new and shiny! ✨ The basic idea is to encode every document (sentence) in your corpus into a vector space using the same encoder. Then, encode the user query into that same space. With a similarity metric such as the dot product or cosine similarity, you can find the document most similar to the user’s input and retrieve it. 🎯 You can train your own encoder if you have enough data and resources, 💪 or you can use one of the [ready-made models on Hugging Face](https://huggingface.co/models?pipeline_tag=sentence-similarity&sort=trending).
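To make the geometry concrete before bringing in a real encoder, here is a sketch with hand-written three-dimensional vectors. The vectors and the names `toy_embeddings`, `query_vector`, and `cosine` are invented for illustration; a real encoder produces vectors with hundreds of dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical embeddings; in the project these come from the encoder model.
toy_embeddings = {
    "doc about blood pressure": [0.9, 0.1, 0.0],
    "doc about eyesight": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.3, 0.1]  # pretend this encodes a blood-pressure question

# Rank documents from most to least similar to the query
ranked = sorted(
    toy_embeddings,
    key=lambda name: cosine(query_vector, toy_embeddings[name]),
    reverse=True,
)
print(ranked[0])
```

Cosine similarity ignores vector length and compares direction only, which is why it is a common choice when embeddings are not normalized.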

In [11]:
# Call is_available(); the bare function object is always truthy
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L12-v2")
encoder_model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L12-v2")
encoder_model.to(device)


BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(30522, 384, padding_idx=0)
    (position_embeddings): Embedding(512, 384)
    (token_type_embeddings): Embedding(2, 384)
    (LayerNorm): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0-11): 12 x BertLayer(
        (attention): BertAttention(
          (self): BertSdpaSelfAttention(
            (query): Linear(in_features=384, out_features=384, bias=True)
            (key): Linear(in_features=384, out_features=384, bias=True)
            (value): Linear(in_features=384, out_features=384, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=384, out_features=384, bias=True)
            (LayerNorm): LayerNorm((384,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False

In [12]:
def get_query_embeddings(texts):
    # Tokenize the texts using the tokenizer
    inputs = tokenizer(texts, return_tensors='pt', max_length=512, padding=True, truncation=True).to(device)

    with torch.no_grad():
        outputs = encoder_model(**inputs)

    # Extract the last hidden state to get one embedding per token
    hidden_states = outputs.last_hidden_state

    # Mean-pool over real tokens only, so padding does not skew batched inputs
    mask = inputs['attention_mask'].unsqueeze(-1).to(hidden_states.dtype)
    embeddings = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    return embeddings


def return_response(corpus, query):
    similarities = []

    # Embed the query once
    query_embedding = get_query_embeddings([query])

    # Score every document against the query
    for doc in tqdm(corpus):
        doc_embedding = get_query_embeddings([doc])
        similarity = F.cosine_similarity(query_embedding, doc_embedding).item()
        similarities.append(similarity)

    # Return the best-scoring document together with its score
    score = max(similarities)
    relevant_doc = corpus[similarities.index(score)]

    return relevant_doc, score
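`return_response` re-encodes every document for every query, so 20 queries over 1,000 documents means 20,000 document encodings. One possible refinement, sketched below under assumptions, is to embed the corpus once and reuse the cached vectors. `build_index`, `search`, and the letter-counting `toy_embed` are hypothetical names used only to keep the example runnable without a GPU; in the notebook you would pass a function that wraps the encoder instead.

```python
import math


def toy_embed(text):
    """Hypothetical stand-in for the encoder: counts of a few letters."""
    text = text.lower()
    return [float(text.count(ch)) for ch in "aeiost"]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_index(corpus, embed):
    """Embed every document exactly once and keep the vectors."""
    return [(doc, embed(doc)) for doc in corpus]


def search(index, query, embed, top_k=1):
    """Embed the query once and rank the cached document vectors."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, vec), doc) for doc, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


index = build_index(["tea", "coffee", "toast"], toy_embed)
for score, doc in search(index, "toast", toy_embed, top_k=2):
    print(f"{score:.3f}  {doc}")
```

Returning the top `k` results instead of a single best match also makes it easier to inspect near misses when evaluating the retriever.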

In [13]:
corpus = corpus_of_documents['question'].tolist()

In [None]:
# Dataframe containing the scores between queries and documents
doc_query_score = pd.DataFrame(columns=['query', 'relevant_document', 'similarity_score'])

# Find the similarity scores
for i, query in enumerate(data['question']):
  print(f'Processing Query {i+1}')
  relevant_doc, score = return_response(corpus, query)
  doc_query_score.loc[len(doc_query_score.index)] = [query, relevant_doc, score]

doc_query_score

Processing Query 1


100%|██████████| 1000/1000 [00:08<00:00, 117.46it/s]


Processing Query 2


100%|██████████| 1000/1000 [00:09<00:00, 107.50it/s]


Processing Query 3


100%|██████████| 1000/1000 [00:08<00:00, 117.84it/s]


Processing Query 4


100%|██████████| 1000/1000 [00:08<00:00, 116.34it/s]


Processing Query 5


100%|██████████| 1000/1000 [00:10<00:00, 91.39it/s]


Processing Query 6


100%|██████████| 1000/1000 [00:16<00:00, 60.82it/s]


Processing Query 7


100%|██████████| 1000/1000 [00:09<00:00, 109.31it/s]


Processing Query 8


100%|██████████| 1000/1000 [00:12<00:00, 81.74it/s]


Processing Query 9


100%|██████████| 1000/1000 [00:08<00:00, 124.70it/s]


Processing Query 10


100%|██████████| 1000/1000 [00:09<00:00, 109.52it/s]


Processing Query 11


100%|██████████| 1000/1000 [00:08<00:00, 114.75it/s]


Processing Query 12


100%|██████████| 1000/1000 [00:08<00:00, 118.84it/s]


Processing Query 13


100%|██████████| 1000/1000 [00:09<00:00, 109.04it/s]


Processing Query 14


100%|██████████| 1000/1000 [00:08<00:00, 118.98it/s]


Processing Query 15


100%|██████████| 1000/1000 [00:08<00:00, 114.99it/s]


Processing Query 16


100%|██████████| 1000/1000 [00:09<00:00, 109.60it/s]


Processing Query 17


100%|██████████| 1000/1000 [00:08<00:00, 113.55it/s]


Processing Query 18


100%|██████████| 1000/1000 [00:08<00:00, 112.89it/s]


Processing Query 19


100%|██████████| 1000/1000 [00:09<00:00, 108.40it/s]


Processing Query 20


100%|██████████| 1000/1000 [00:07<00:00, 125.40it/s]


Unnamed: 0,query,relevant_document,similarity_score
0,What are the common symptoms of diabetes?,Are complex coronary lesions more frequent in ...,0.546383
1,How can I lower my blood pressure naturally?,Does blood pressure change in treated hyperten...,0.423323
2,What steps can I take to improve my cholestero...,Cholesterol screening in school children: is f...,0.480346
3,What are the early signs of Alzheimer's disease?,Memory-provoked rCBF-SPECT as a diagnostic too...,0.57361
4,How can I manage chronic back pain?,Does high blood pressure reduce the risk of ch...,0.600094
5,What are the best ways to prevent heart disease?,The Omega-3 Index: a new risk factor for death...,0.480054
6,"What are the symptoms of a stroke, and what sh...",Are stroke patients' reports of home blood pre...,0.528878
7,How can I boost my immune system naturally?,Vitamin D supplementation and regulatory T cel...,0.454598
8,What are the causes and treatments for seasona...,Is the atopy patch test with house dust mites ...,0.463463
9,What lifestyle changes can help with arthritis...,Pharmacologic regimens for knee osteoarthritis...,0.517769


In [None]:
# Save the scores to a CSV file so this process does not have to be repeated later
doc_query_score.to_csv('/content/drive/MyDrive/doc_query_score.csv')

In [14]:
# Load the scores saved in the csv file
doc_query_score = pd.read_csv('/content/drive/MyDrive/doc_query_score.csv')
doc_query_score.head()

Unnamed: 0.1,Unnamed: 0,query,relevant_document,similarity_score
0,0,What are the common symptoms of diabetes?,Are complex coronary lesions more frequent in ...,0.546383
1,1,How can I lower my blood pressure naturally?,Does blood pressure change in treated hyperten...,0.423323
2,2,What steps can I take to improve my cholestero...,Cholesterol screening in school children: is f...,0.480346
3,3,What are the early signs of Alzheimer's disease?,Memory-provoked rCBF-SPECT as a diagnostic too...,0.57361
4,4,How can I manage chronic back pain?,Does high blood pressure reduce the risk of ch...,0.600094


## 🎛️ Create a Generator
For this part, I practically handed you the whole code on a silver platter. 🍽️ But since we know you’re an explorer at heart and love trying new things, you can’t use the model I previously used. 😈 You have to try 3 different generators and compare them based on the quality of their answers. 🧪📊 [These might come in handy](https://huggingface.co/models?pipeline_tag=text-generation&sort=trending).

In [15]:
prompt = """You are a bot that makes recommendations for medical advice. Try to be a helpful medical advisor.
This is the retrieved information: {relevant_doc_answer}
The user input is: {user_input}
Provide a comprehensive response based on the retrieved information and the user input."""

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

In [25]:
def get_relevant_doc(query):
  for i, row in doc_query_score.iterrows():
    if query == row['query']:
      return row['relevant_document']

  raise ValueError(f"No relevant document was found for query: {query}")


def get_relevant_document_answer(relevant_doc):
  for i, row in corpus_of_documents.iterrows():
    if relevant_doc == row['question']:
      return row['long_answer']

  raise ValueError(f"No answer was found for document: {relevant_doc}")



def get_all_inferences(user_inputs, pipe, prompt):
  inferences = pd.DataFrame(columns=['query', 'relevant_document', 'relevant_document_answer', 'model_inference'])

  for i, user_input in tqdm(enumerate(user_inputs)):
    relevant_doc = get_relevant_doc(user_input)
    relevant_doc_answer = get_relevant_document_answer(relevant_doc)

    # Fill a fresh copy of the template on every iteration. Reassigning `prompt`
    # itself would remove the placeholders after the first query, so every later
    # query would silently reuse the first query's prompt.
    filled_prompt = prompt.replace("{relevant_doc_answer}", relevant_doc_answer).replace("{user_input}", user_input)

    messages = [
        {"role": "user", "content": filled_prompt},
    ]

    output = pipe(messages, **generation_args)[0]['generated_text']
    inferences.loc[len(inferences.index)] = [user_input, relevant_doc, relevant_doc_answer, output]

  return inferences
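
The two lookup helpers above scan a DataFrame row by row for every query, and the prompt is filled inside a loop. A minimal sketch of the same flow with dictionary lookups and a template that is never reassigned is given below. `PROMPT_TEMPLATE`, `build_prompt`, `doc_by_query`, and `answer_by_doc` are hypothetical names, and the toy strings stand in for the real queries and documents.

```python
PROMPT_TEMPLATE = (
    "This is the retrieved information: {relevant_doc_answer}\n"
    "The user input is: {user_input}"
)


def build_prompt(template, relevant_doc_answer, user_input):
    """Fill both placeholders and return a new string; the template is left untouched."""
    return (
        template
        .replace("{relevant_doc_answer}", relevant_doc_answer)
        .replace("{user_input}", user_input)
    )


# Hypothetical lookup tables; in the notebook they would come from the two DataFrames
doc_by_query = {"q1": "doc A", "q2": "doc B"}
answer_by_doc = {"doc A": "answer A", "doc B": "answer B"}

prompts = []
for query in ["q1", "q2"]:
    # A missing key raises KeyError, which plays the role of the ValueError above
    answer = answer_by_doc[doc_by_query[query]]
    prompts.append(build_prompt(PROMPT_TEMPLATE, answer, query))
```

In the notebook, such tables could be built with something like `dict(zip(doc_query_score['query'], doc_query_score['relevant_document']))`, assuming the queries are unique.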

### 1st Generator

In [17]:
!pip install flash_attn --quiet



In [None]:
import gc
import torch

gc.collect()
torch.cuda.empty_cache()
!nvidia-smi

Sat Jul  6 20:07:53 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   75C    P0              33W /  70W |  15083MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [None]:
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
gen1 = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    device_map="cuda",
    torch_dtype=torch.bfloat16
)

gen1_pipe = pipeline(
    "text-generation",
    model=gen1,
    tokenizer=tokenizer,
)

In [None]:
all_inferences = get_all_inferences(data['question'], gen1_pipe, prompt)
all_inferences.to_csv('/content/drive/MyDrive/TinyLlama-1.1B-Chat-v1.0_all_inferences.csv')

10it [02:02, 13.65s/it]You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
20it [04:11, 12.58s/it]


In [18]:
tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
gen1 = AutoModelForCausalLM.from_pretrained(
    "distilgpt2",
    device_map="cuda",
    torch_dtype=torch.bfloat16
)

gen1_pipe = pipeline(
    "text-generation",
    model=gen1,
    tokenizer=tokenizer,
)


In [26]:
all_inferences = get_all_inferences(data['question'], gen1_pipe, prompt)
all_inferences.to_csv('/content/drive/MyDrive/distilgpt2_all_inferences.csv')

0it [00:00, ?it/s]No chat template is set for this tokenizer, falling back to a default class-level template. This is very error-prone, because models are often trained with templates different from the class default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which point any code depending on them will stop working. We recommend setting a valid chat template before then to ensure that this model continues working without issues.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

### 2nd Generator

In [None]:
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
gen2 = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)

gen2_pipe = pipeline(
    "text-generation",
    model=gen2,
    tokenizer=tokenizer,
)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-4k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



In [None]:
all_inferences = get_all_inferences(data['question'], gen2_pipe, prompt)
all_inferences.to_csv('/content/drive/MyDrive/microsoft-Phi-3-mini-4k-instruct_all_inferences.csv')



### 3rd Generator

In [None]:
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
gen3 = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta", torch_dtype=torch.bfloat16, device_map="auto"
)

gen3_pipe = pipeline(
    "text-generation",
    model=gen3,
    tokenizer=tokenizer,
)

In [None]:
all_inferences = get_all_inferences(data['question'], gen3_pipe, prompt)
all_inferences.to_csv('/content/drive/MyDrive/HuggingFaceH4-zephyr-7b-beta_all_inferences.csv')

## 📊 Evaluate the results
Here, you’ve got to put those 3 models to the test. Use the 20 queries you’ve created on each of the 3 models. Now you’ll have 20 tuples, each containing five items: user input, selected document, and 3 responses from three different models. Use a judge model on each tuple to select the best answer. 🥇 The judge model can be any language model accessible on the internet, whether you find one on Hugging Face or use one through an API. 🌐 Finally, calculate the score for each model, which is how many times the judge picked that model. 🏆

In [41]:
gen1_inferences.loc[0]

Unnamed: 0                                                                  0
query                               What are the common symptoms of diabetes?
relevant_document           Are complex coronary lesions more frequent in ...
relevant_document_answer    Complex coronary lesions such as bifurcation a...
model_inference             The common symptoms of diabetes include:\n\n1....
Name: 0, dtype: object

In [42]:
gen1_inferences = pd.read_csv('/content/drive/MyDrive/TinyLlama-1.1B-Chat-v1.0_all_inferences.csv')
gen2_inferences = pd.read_csv('/content/drive/MyDrive/microsoft-Phi-3-mini-4k-instruct_all_inferences.csv')
gen3_inferences = pd.read_csv('/content/drive/MyDrive/HuggingFaceH4-zephyr-7b-beta_all_inferences.csv')

model_inferences = pd.DataFrame(columns=['query', 'advice', 'gen1_inference', 'gen2_inference', 'gen3_inference'])

for i in range(20):
  query = gen1_inferences.loc[i]['query']
  advice = gen1_inferences.loc[i]['relevant_document_answer']
  gen1_inference = gen1_inferences.loc[i]
  gen2_inference = gen2_inferences.loc[i]
  gen3_inference = gen3_inferences.loc[i]

  model_inferences.loc[len(model_inferences.index)] = [
      query,
      advice,
      gen1_inference['model_inference'],
      gen2_inference['model_inference'],
      gen3_inference['model_inference']
  ]

model_inferences

Unnamed: 0,query,advice,gen1_inference,gen2_inference,gen3_inference
0,What are the common symptoms of diabetes?,Complex coronary lesions such as bifurcation a...,The common symptoms of diabetes include:\n\n1....,"Based on the retrieved information, it is evi...","Diabetes, also known as diabetes mellitus, is ..."
1,How can I lower my blood pressure naturally?,Systolic BP measured by the nurse in treated h...,"As a medical advisor, I would recommend that y...","Based on the retrieved information, it is evi...","Based on the retrieved information, it suggest..."
2,What steps can I take to improve my cholestero...,"We suggest that regardless of family history, ...","As a medical advisor, we suggest that regardle...",Based on the retrieved information and your c...,"Based on the information provided, here are so..."
3,What are the early signs of Alzheimer's disease?,Memory provocation increased the sensitivity o...,"As a medical advisor, I would like to provide ...","Based on the retrieved information, it appear...",While memory loss is a well-known symptom of A...
4,How can I manage chronic back pain?,Results for low back pain are consistent with ...,"As a medical advisor, I would recommend the fo...","I'm sorry, but I cannot provide medical advic...","Based on the retrieved information, it suggest..."
5,What are the best ways to prevent heart disease?,"The Omega-3 Index may represent a novel, physi...","As a medical advisor, I would recommend that y...","Based on the retrieved information, the Omega...","Based on the retrieved information, one of the..."
6,"What are the symptoms of a stroke, and what sh...",This group of elderly stroke patients were abl...,"As a medical advisor, I would like to provide ...","I'm sorry, but I cannot provide medical advic...",Symptoms of a stroke can vary depending on the...
7,How can I boost my immune system naturally?,Vitamin D supplementation was associated with ...,As a bot that makes recommendations for medica...,"Based on the retrieved information, vitamin D...",Based on the retrieved information and your re...
8,What are the causes and treatments for seasona...,These observations lead to conclude that the A...,"As a medical advisor, I would like to provide ...","Seasonal allergies, also known as allergic rh...","Based on your user input, I'd like to inform y..."
9,What lifestyle changes can help with arthritis...,The cost-effectiveness of DMOADs for OA preven...,"As a medical advisor, I would recommend the fo...","Based on the retrieved information, it is imp...",Based on the retrieved information and your us...


In [60]:
judge_model = pipeline("zero-shot-classification", model="facebook/bart-large-mnli", device_map="auto")


In [83]:
import numpy as np

def evaluate_generators(model_inferences):
  # Initialize scores
  scores = {'generator_1': 0, 'generator_2': 0, 'generator_3': 0}

  for idx, row in model_inferences.iterrows():
    row_query = row['query']
    row_gen1 = row['gen1_inference']
    row_gen2 = row['gen2_inference']
    row_gen3 = row['gen3_inference']
    gen1_prompt = f'Question: {row_query}\nAnswer: {row_gen1}'
    gen2_prompt = f'Question: {row_query}\nAnswer: {row_gen2}'
    gen3_prompt = f'Question: {row_query}\nAnswer: {row_gen3}'

    gen1_result = judge_model(gen1_prompt, candidate_labels=['relevant', 'irrelevant'])
    gen1_result = gen1_result['scores'][gen1_result['labels'].index('relevant')]

    gen2_result = judge_model(gen2_prompt, candidate_labels=['relevant', 'irrelevant'])
    gen2_result = gen2_result['scores'][gen2_result['labels'].index('relevant')]

    gen3_result = judge_model(gen3_prompt, candidate_labels=['relevant', 'irrelevant'])
    gen3_result = gen3_result['scores'][gen3_result['labels'].index('relevant')]

    best_inference = ['generator_1', 'generator_2', 'generator_3'][np.argmax([gen1_result, gen2_result, gen3_result])]

    # Credit the generator whose answer the judge rated most relevant
    scores[best_inference] += 1

  return scores
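
One detail worth knowing: `np.argmax` returns the first index of the maximum, so an exact tie between judge scores is credited to the earlier generator. A small sketch of the tallying step with explicit tie handling follows. `tally_wins` and the toy scores are hypothetical and no judge model is involved.

```python
from collections import Counter


def tally_wins(judge_scores, names):
    """Count, per generator, how often it had the highest judge score.

    judge_scores: list of rows, each holding one score per generator.
    Rows with a tie for first place are counted under 'tie' instead of
    being credited to whichever generator is listed first.
    """
    wins = Counter({name: 0 for name in names})
    wins["tie"] = 0
    for row in judge_scores:
        best = max(row)
        leaders = [name for name, score in zip(names, row) if score == best]
        wins[leaders[0] if len(leaders) == 1 else "tie"] += 1
    return dict(wins)


names = ["generator_1", "generator_2", "generator_3"]
toy_scores = [
    [0.91, 0.40, 0.75],  # generator_1 wins
    [0.30, 0.30, 0.10],  # generator_1 and generator_2 tie
    [0.20, 0.55, 0.80],  # generator_3 wins
]
result = tally_wins(toy_scores, names)
```

Exact ties are unlikely with floating-point relevance scores, but counting them separately keeps the final totals honest when they do occur.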

In [84]:
scores = evaluate_generators(model_inferences)
print(f'Generator Scores are as follows: {scores}')

Generator Scores are as follows: {'generator_1': 9, 'generator_2': 1, 'generator_3': 10}


### As I'm writing this message, it's 3 in the morning and I'm tired as fox. So I hope you've learned something from this project and that someday you'll use what you've learned here in a real-world scenario. Good Luck! ✌️