# IUST Computer Engineering Department 🏫
## Introduction to Natural Language Processing 📚 (The Final Project)
-------------------------------------------------------------------------------<br>

## Navid Ebrahimi
The objective of this project is to acquaint you with the fundamentals of Retrieval Augmented Generation (RAG). Be sure to explore various options and address challenges in a creative manner. 🎯

**Project Guidelines** 📝
- Avoid cheating at all costs. If a set of submissions is found to be [plagiarized](https://translate.google.as/?sl=en&tl=fa&text=Very%20hard%20word%2C%20I%20know%2C%20here%27s%20the%20meaning%3A%0Aplagiarized&op=translate), only one will be randomly chosen for grading. The others will fail the project. ❌
- You are allowed to use any document, article, paper, or video as a resource for writing your code, provided you include a link to the material used. 📖
- The use of Large Language Models (LLMs), ChatBots, and Copilots is encouraged. If you utilize any of these tools, make sure to attach the chat history that led you to the answer to your question, or the code, to this .ipynb document. (You must provide the entire chat, not just the final answer or your initial prompt.) 💻
- You may not submit any additional documents, files, etc., along with this document. Only solutions, codes, explanations, etc., in this document will be graded. 📄
- You are required to implement everything (except the Language Modeling parts) from scratch. The use of libraries like langchain, llama_index, etc., is not permitted for this purpose. 🚫
- Please adhere to the code guidelines provided throughout the documents. 📝 I’ve spent time in a library 📚 crafting all of this, so if you overlook them, you’ll lose the points allocated for that section. ❌
- We need to use GPUs for this assignment, so don't forget to turn on GPU usage for your notebook session.

-------------------------------------------------------------------------------<br>
# Alright, let's get started. 🚀

## What is RAG? 🤔
We've all used ChatGPT and experienced moments when it starts to generate content that is incorrect or unrelated to our query. Do you know why this happens? These Large Language Models (LLMs) are not magical entities; they are simply models trained on a vast amount of text. 📚 You can think of it as a significant portion of the internet. However, this is not all the data available in the world, because data is not a static concept. You yourself generate some data every day through your use of the Internet, Social Media, and so on. 🌐💻📱

So, no matter how much data you use to train your LLM, you always end up encountering new data. This is one of the reasons behind the famous ChatGPT response that tells you it only knows things up to a certain date. 📅 These models also tend to hallucinate: they provide incorrect answers in a very convincing manner. 🎭

On the other hand, we have retrieval techniques. Don't worry if that sounds complicated: you use retrieval on a daily basis. (The theory actually isn't easy, and you may need to take a course to familiarize yourself with these concepts 😅, but that's not necessary for this project.) You can think of Search Engines (like Google, for example) as a complex form of information retrieval. 🔍

So, one day, people came up with this idea that it would be cool if ChatGPT could search Google for us, read the articles for us, summarize what it read, and tell us that. 📖 So, this is not exactly what RAG is, but it's something similar. We have a corpus (a large amount of data) and a query (what a user typed as input). Now, we search through this corpus using techniques related to vectors and vector databases, and find the most similar items in our corpus to the query. Then, we pass these items to an LLM and ask for a structured, well-formatted, user-friendly output. 📈📊

## I'm Interested in the Technical Details, What Should I Read? 📚🔍
- I strongly recommend reading the [original RAG paper](https://arxiv.org/abs/2005.11401). If you need help understanding the paper or have any questions about it, feel free to reach out to me via Telegram or find me on the second floor of the department in the NLP lab on Sundays and Tuesdays. 📖
- There appears to be a [comprehensive 2.5-hour course](https://www.freecodecamp.org/news/mastering-rag-from-scratch/) available. I haven't personally watched it, but if you find a better one, let me know so I can update this document. 🎥
- Here is [an article](https://www.smashingmagazine.com/2024/01/guide-retrieval-augmented-generation-language-models/) that explains the concepts very well. Initially, I wanted to use this article as the basis for this project, but unfortunately, the llama_index library used in the article seems to be outdated, so most of the code would need to be rewritten. On second thought, I found it more useful to focus on core concepts rather than learning specific libraries. You might want to check out some libraries like langchain or llama_index which provide a lot of tools for RAG. (But not for this project) 📝💡
- Don't hesitate to use Google or ask chatbots about any new concepts and terms. If you use search engine-aware chatbots like Microsoft Copilot, they provide links for each part of their answers, which is useful if you want to delve deeper into that part. 🌐🤖
- Lastly, we have [the article](https://learnbybuilding.ai/tutorials/rag-from-scratch) that serves as the foundation for this project. 📚🔍

# Learn
First, we’re going to go through a simple RAG implementation. It’s going to be similar to the article, except for the LLM part. For that, I’m going to use Hugging Face. 🤗 I’ll also try to explain the code in simple terms, but feel free to read the article if you prefer their writing style.

## Let's Install the Necessary Libraries 📚🔧
Did you know that using the `--quiet` or `-q` option with the `pip install` command minimizes the output displayed on your screen? 🖥️ This can make your terminal less cluttered. Also, using `-U` will upgrade the libraries if they were previously installed. This is particularly useful for certain libraries like `transformers` that are frequently updated. 🔄

In [None]:
!pip install -U accelerate transformers --quiet

## Gather a Corpus 📚
Technically, a corpus refers to a large and structured set of texts. However, for the sake of our discussion, let’s consider our collection as a “corpus”, even though it might not be large in the traditional sense. 😉

In [None]:
corpus_of_documents = [
    "Take a leisurely walk in the park and enjoy the fresh air.",
    "Visit a local museum and discover something new.",
    "Attend a live music concert and feel the rhythm.",
    "Go for a hike and admire the natural scenery.",
    "Have a picnic with friends and share some laughs.",
    "Explore a new cuisine by dining at an ethnic restaurant.",
    "Take a yoga class and stretch your body and mind.",
    "Join a local sports league and enjoy some friendly competition.",
    "Attend a workshop or lecture on a topic you're interested in.",
    "Visit an amusement park and ride the roller coasters."
]

## Create a Retriever 🕵️‍♂️
Now, we’re going to create a simple retriever. The role of the retriever is to compare the user’s query with a large corpus of text and find those that are most similar in context. (You know what context is by now, don’t you? 😊 If you’ve forgotten, refer back to your initial lectures). For now, let’s say we want to find similar text based on simple similarity metrics. The code is straightforward, and I have faith in you, chief! Dive into the code. 👨‍💻

In [None]:
def jaccard_similarity(query, document):
    query = query.lower().split(" ")
    document = document.lower().split(" ")
    intersection = set(query).intersection(set(document))
    union = set(query).union(set(document))
    return len(intersection)/len(union)

Hey, you may want to look at the Wikipedia page for [Jaccard Similarity](https://en.wikipedia.org/wiki/Jaccard_index).
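
As a quick sanity check, here is a small worked example of the Jaccard score for one query and one document from the corpus above. It is only a sketch that re-declares the function so the snippet runs on its own; the variable names `shared` and `score` are mine. Notice that punctuation stays attached to words (`"scenery."`), because we only split on spaces.

```python
def jaccard_similarity(query, document):
    # Same idea as above: compare the two sets of lowercase, space-separated tokens.
    query_tokens = set(query.lower().split(" "))
    document_tokens = set(document.lower().split(" "))
    return len(query_tokens & document_tokens) / len(query_tokens | document_tokens)

query = "I like to hike"
document = "Go for a hike and admire the natural scenery."

# Only one token is shared; the union holds 4 + 9 - 1 = 12 distinct tokens.
shared = set(query.lower().split(" ")) & set(document.lower().split(" "))
score = jaccard_similarity(query, document)

print(shared)           # {'hike'}
print(round(score, 4))  # 0.0833, i.e. 1/12

# Punctuation matters: "hike" and "hike." are different tokens.
print(jaccard_similarity("hike", "hike."))  # 0.0
```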

In [None]:
def return_response(query, corpus):
    # Score every document against the query and return the best match.
    similarities = []
    for doc in corpus:
        similarity = jaccard_similarity(query, doc)
        similarities.append(similarity)
    return corpus[similarities.index(max(similarities))]

## Create a Generator 🖥️
Now, we’re going to create a generator. This will help us compile the information retrieved into a well-structured and user-friendly text.

OK, let's say that, in a scenario, we ask the user what they like to do, and their answer is this:

In [None]:
user_input = "I like to hike"

Now, using the retriever, I find the activity that best fits this user.

In [None]:
relevant_document = return_response(user_input, corpus_of_documents)
print(relevant_document)

Go for a hike and admire the natural scenery.


The answer seems good enough, but we can do better, yeah?

Let’s import a Language Model. I’m going to try out Microsoft Phi-3 because it recently hit the market, and I haven’t had a chance to try it for myself yet. So, I’m seizing this opportunity to do so! 😊👨‍💻

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

Downloading the model is going to take a while, so use this time to rest your eyes for a bit. 😊👀💤

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

In [None]:
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

Now we try to get the LLM to become our generator. We simply place the retrieved information and the user query in the following prompt and ask the model for well-formatted text.

In [None]:
prompt = """You are a bot that makes recommendations for activities. Try to be helpful recommender system.
This is the recommended activity: {relevant_document}
The user input is: {user_input}
Compile a recommendation to the user based on the recommended activity and the user input."""

In [None]:
prompt = prompt.replace("{relevant_document}", relevant_document).replace("{user_input}", user_input)
print(prompt)

You are a bot that makes recommendations for activities. Try to be helpful recommender system.
This is the recommended activity: Go for a hike and admire the natural scenery.
The user input is: I like to hike
Compile a recommendation to the user based on the recommended activity and the user input.


In [None]:
messages = [
    {"role": "user", "content": prompt},
]

In [None]:
prompt

'You are a bot that makes recommendations for activities. Try to be helpful recommender system.\nThis is the recommended activity: doctor: depends on severity. covid-19 pandemic at this time, so a doctor on video may consult by video instead of requiring an in-person visit. flu-like symptoms can be from a strep throat infection, a cold or influenza, or from some other cause like covid-19. usually, a person calls the doctor if the symptoms are bothersome, serious, recurrent, or persistent. covid-19 testing depends on local availability. (3/22/20)\nThe user input is: What are the symptoms of COVID-19?\nCompile a recommendation to the user based on the recommended activity and the user input.'

Here's the augmented generated text

In [None]:
output = pipe(messages, **generation_args)
print(output[0]['generated_text'])



 Based on your query about the symptoms of COVID-19, it would be beneficial for you to consult with a healthcare professional. Given the current COVID-19 pandemic, many doctors are offering video consultations to ensure everyone's safety. This allows you to discuss your concerns and get accurate information without the need for an in-person visit. Symptoms of COVID-19 can include fever, cough, and difficulty breathing among others. However, it's important to remember that these symptoms can also be associated with other illnesses like the flu or a common cold. Therefore, a healthcare professional can provide the most accurate advice based on your specific situation.


## Very Cool, but Not Perfect! 😎👌
Alright, you’ve just seen a very basic example of RAG. However, there are some issues present. The corpus is small, and the documents in the corpus are short sentences, which causes the Language Model (LM) to generate some text on its own. 📚🤖

Also, our retriever is not very efficient and it may encounter bugs in some cases. For instance, even when users specify that they are not interested in a certain activity, the retriever might still bring up that activity for them. 🐜🔍
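
Here is a minimal sketch of that failure, assuming the same word-overlap retriever as above (re-declared here, with a shortened three-document corpus, so the snippet runs on its own). The word "don't" changes the meaning of the query but barely changes the token overlap, so the same document wins.

```python
def jaccard_similarity(query, document):
    query_tokens = set(query.lower().split(" "))
    document_tokens = set(document.lower().split(" "))
    return len(query_tokens & document_tokens) / len(query_tokens | document_tokens)

def return_response(query, corpus):
    # Return the document with the highest Jaccard score (first one wins on ties).
    return max(corpus, key=lambda doc: jaccard_similarity(query, doc))

# Shortened corpus, for illustration only.
corpus = [
    "Visit a local museum and discover something new.",
    "Go for a hike and admire the natural scenery.",
    "Take a yoga class and stretch your body and mind.",
]

liked = return_response("I like to hike", corpus)
disliked = return_response("I don't like to hike", corpus)

print(liked)     # Go for a hike and admire the natural scenery.
print(disliked)  # Go for a hike and admire the natural scenery.
```

Both queries share only the token `hike` with the corpus, so a bag-of-words metric cannot tell them apart.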

So, in this project, you’re going to address some of these issues. The rest of this document consists of some empty cells and tips for you on how to fill them with code. Let’s get coding! 👨‍💻🚀

# The Project

## Determine Your Task 🎯
What do you aim to implement with RAG? A recommender system? 🎁 A chatbot for a website’s FAQ? 💬 A medical advisor? 🩺 Or perhaps something else entirely?

Specify your objective in this cell.

In [None]:
task_title = "A medical advisor"
url_for_more_information = "https://medium.com/@mohdzeesh2002/dr-insights-build-your-own-llm-rag-medical-advisor-using-langchain-mistral-and-chromadb-9b678143ecbd"

print(f"My task is: {task_title}")
print(f'For more information see: {url_for_more_information}')

My task is: A medical advisor
For more information see: https://medium.com/@mohdzeesh2002/dr-insights-build-your-own-llm-rag-medical-advisor-using-langchain-mistral-and-chromadb-9b678143ecbd


In [None]:
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

## 🧐 Find or gather a corpus
Remember the fake corpus? 📚 It’s time to switch things up and use something real. 🌐 You need to use a dataset from [Hugging Face Datasets](https://huggingface.co/datasets) for this project. 🚀 Don’t use files from outside this notebook; it should be able to run on its own without depending on anything external. 💻👍


In [None]:
!pip install tensorflow-datasets -q
!pip install datasets -q
!pip install tqdm -q
!pip install transformers einops accelerate bitsandbytes -q
!pip install openai -q
!pip install tenacity -q


In [None]:
from datasets import load_dataset, DatasetDict, concatenate_datasets

### Medical Dialog Dataset

In [None]:
# Load your dataset
medical_dialog_dataset = DatasetDict({
    "train": load_dataset('medical_dialog', 'processed.en', split='train'),
    "validation": load_dataset('medical_dialog', 'processed.en', split='validation'),
    "test": load_dataset('medical_dialog', 'processed.en', split='test')
})

You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.


In [None]:
medical_dialog_dataset

DatasetDict({
    train: Dataset({
        features: ['description', 'utterances'],
        num_rows: 482
    })
    validation: Dataset({
        features: ['description', 'utterances'],
        num_rows: 60
    })
    test: Dataset({
        features: ['description', 'utterances'],
        num_rows: 61
    })
})

In [None]:
medical_dialog_dataset['train']['utterances'][0]

['patient: throat a bit sore and want to get a good imune booster, especially in light of the virus. please advise. have not been in contact with nyone with the virus.',
 "doctor: during this pandemic. throat pain can be from a strep throat infection (antibiotics needed), a cold or influenza or other virus, or from some other cause such as allergies or irritants. usually, a person sees the doctor (call first) if the sore throat is bothersome, recurrent, or doesn't go away quickly. covid-19 infections tend to have cough, whereas strep throat usually lacks cough but has more throat pain. (3/21/20)"]

### Medical Meadow Wikidoc Dataset

In [None]:
medical_meadow_wikidoc_dataset = load_dataset('medalpaca/medical_meadow_wikidoc')

In [None]:
medical_meadow_wikidoc_dataset

DatasetDict({
    train: Dataset({
        features: ['input', 'output', 'instruction'],
        num_rows: 10000
    })
})

In [None]:
medical_meadow_wikidoc_dataset['train'][0]

{'input': "Can you provide an overview of the lung's squamous cell carcinoma?",
 'output': 'Squamous cell carcinoma of the lung may be classified according to the WHO histological classification system into 4 main types: papillary, clear cell, small cell, and basaloid.',
 'instruction': 'Answer this question truthfully'}

## 📝 Create some queries
I want you to create 20 queries related to your task. You can use any Language Model you want for this matter, or if you’re feeling strong 💪 and have the time, write it yourself. 🖊️

You need to create a Hugging Face account, format your 20 queries into the accepted dataset format for Hugging Face 🤗 and push it to your Hugging Face account. Be sure to make it public and use it for the evaluation task. 👀

In [2]:
import datasets
from datasets import Dataset
import pandas as pd

# Define the queries
queries = [
    "What are the symptoms of diabetes?",
    "How can I manage high blood pressure?",
    "What are the side effects of taking aspirin daily?",
    "Can you explain the causes of chronic back pain?",
    "What diet should I follow for heart health?",
    "How often should I get a general health check-up?",
    "What are the treatment options for asthma?",
    "Can stress cause physical illnesses?",
    "What are the early signs of Alzheimer's disease?",
    "How can I improve my mental health?",
    "What are the common symptoms of a stroke?",
    "How do I know if I have a food allergy?",
    "What is the best way to quit smoking?",
    "Can you explain the stages of cancer?",
    "What should I do if I have a high fever?",
    "How can I prevent the common cold?",
    "What are the symptoms of COVID-19?",
    "How is arthritis diagnosed and treated?",
    "What are the benefits of regular exercise?",
    "Can you explain the different types of headaches?"
]

# Wrap the queries in a Hugging Face Dataset with a single 'query' column
ds = datasets.Dataset.from_dict({'query': queries})


In [None]:
!sudo apt-get install git-lfs -q

Reading package lists...
Building dependency tree...
Reading state information...
git-lfs is already the newest version (3.0.2-1ubuntu0.2).
0 upgraded, 0 newly installed, 0 to remove and 45 not upgraded.


In [None]:
!git config --global credential.helper store

In [None]:
!huggingface-cli login

In [None]:
ds.push_to_hub('Navidium/Medical_Queries')

## 🛠️ Create a Retriever
To create your retriever, you need to use an encoder model. Something like BERT? Nah, BERT is so yesterday. Find something new and shiny! ✨ The basic idea is to encode every document (sentence) in your corpus into a vector space using the same encoder. Then, encode the user query into that same space. With some similarity metrics like dot product, you can find the most similar document to the user’s input and retrieve it. 🎯 You can train your own encoder if you have enough data and resources, 💪 or you can use one of those [ready-made on Hugging Face](https://huggingface.co/models?pipeline_tag=sentence-similarity&sort=trending), like these ones.
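
To make the "encode, then compare" idea concrete, here is a tiny sketch in plain Python. The three-dimensional vectors are hand-made stand-ins for real encoder outputs, and `cosine_similarity` and `top_k` are hypothetical helper names, not part of any library. Cosine similarity is the dot product of length-normalized vectors, so a longer vector pointing in the same direction scores the same.

```python
import math

def cosine_similarity(u, v):
    # Dot product divided by the product of the two vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query_vector, document_vectors, k=2):
    # Score every document, then keep the k highest-scoring (index, score) pairs.
    scored = [(cosine_similarity(query_vector, vec), idx)
              for idx, vec in enumerate(document_vectors)]
    scored.sort(reverse=True)
    return [(idx, score) for score, idx in scored[:k]]

# Hypothetical "embeddings": in the project these come from your encoder model.
document_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
]
query_vector = [2.0, 0.0, 0.0]

results = top_k(query_vector, document_vectors, k=2)
for idx, score in results:
    print(idx, round(score, 4))
```

Retrieving the top few documents instead of only the single best one is a design choice: it gives the generator more context, at the cost of a longer prompt.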

In [None]:
from transformers import AutoTokenizer, AutoModel
from tqdm import tqdm

### BioBERT Model

In [None]:
# Load the BioBERT tokenizer and model
biobert_tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
biobert_model = AutoModel.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
biobert_model = biobert_model.to(device)

### MiniLML6 Model

In [None]:
# Load the miniLML6 tokenizer and model
miniLML6_tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
miniLML6_model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
miniLML6_model = miniLML6_model.to(device)

### Select Model and Tokenizer

In [None]:
import numpy as np
import torch

batch_size = 32

model = miniLML6_model
tokenizer = miniLML6_tokenizer

### Medical Dialog + Retriever

In [None]:
# Function to process a batch of utterances
def process_batch(model, tokenizer, batch):
    inputs_pa = tokenizer([item[0] for item in batch], padding=True, truncation=True, return_tensors="pt").to(device)
    inputs_do = tokenizer([item[1] for item in batch], padding=True, truncation=True, return_tensors="pt").to(device)

    with torch.no_grad():
        outputs_pa = model(**inputs_pa)
        outputs_do = model(**inputs_do)

    embeddings_pa = outputs_pa.last_hidden_state.mean(dim=1).cpu().numpy()
    embeddings_do = outputs_do.last_hidden_state.mean(dim=1).cpu().numpy()

    return embeddings_pa, embeddings_do

# Iterate over the dataset in batches
embeddings_pa_list = []
embeddings_do_list = []

for i in tqdm(range(0, len(medical_dialog_dataset['train']), batch_size)):
    batch = medical_dialog_dataset['train']['utterances'][i:i + batch_size]
    embeddings_pa, embeddings_do = process_batch(model, tokenizer, batch)
    embeddings_pa_list.extend(embeddings_pa)
    embeddings_do_list.extend(embeddings_do)

# Convert the lists to numpy arrays if needed
embeddings_pa_array = np.array(embeddings_pa_list)
embeddings_do_array = np.array(embeddings_do_list)
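
One caveat about `last_hidden_state.mean(dim=1)`: with `padding=True`, shorter texts in a batch are padded, and a plain mean averages the padding positions in as well. A common alternative is to weight the mean by the attention mask. The sketch below shows the idea with NumPy on toy arrays; the numbers are made up, and `masked_mean_pool` is a hypothetical helper, not what the cell above does. In this notebook the mask would be something like `inputs_pa["attention_mask"]`.

```python
import numpy as np

def masked_mean_pool(token_embeddings, attention_mask):
    # token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len) of 0/1.
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)   # sum over real tokens only
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # number of real tokens, guarded against 0
    return summed / counts

# Toy batch: 2 texts, 3 positions, hidden size 2.
token_embeddings = np.array([
    [[1.0, 1.0], [3.0, 3.0], [9.0, 9.0]],  # third position is padding
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],  # no padding
])
attention_mask = np.array([[1, 1, 0], [1, 1, 1]])

plain = token_embeddings.mean(axis=1)
masked = masked_mean_pool(token_embeddings, attention_mask)

print(plain[0])   # includes the padding vector in the average
print(masked[0])  # averages only the two real tokens
```

For the unpadded second text both methods agree; they differ only where padding is present.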

### Medical Meadow Wikidoc + Retriever

In [None]:
# Embed a batch of questions ('input') and answers ('output') with mean pooling
def process_batch(model, tokenizer, batch):
    inputs_pa = tokenizer([item for item in batch['input']], padding=True, truncation=True, return_tensors="pt").to(device)
    inputs_do = tokenizer([item for item in batch['output']], padding=True, truncation=True, return_tensors="pt").to(device)

    with torch.no_grad():
        outputs_pa = model(**inputs_pa)
        outputs_do = model(**inputs_do)

    embeddings_pa = outputs_pa.last_hidden_state.mean(dim=1).cpu().numpy()
    embeddings_do = outputs_do.last_hidden_state.mean(dim=1).cpu().numpy()

    return embeddings_pa, embeddings_do

# Iterate over the dataset in batches
embeddings_pa_list = []
embeddings_do_list = []

medical_meadow_wikidoc_dataset_size = len(medical_meadow_wikidoc_dataset['train'])
for i in tqdm(range(0, medical_meadow_wikidoc_dataset_size, batch_size)):
    batch = medical_meadow_wikidoc_dataset['train'][i:min(i + batch_size, medical_meadow_wikidoc_dataset_size)]
    embeddings_pa, embeddings_do = process_batch(model, tokenizer, batch)
    embeddings_pa_list.extend(embeddings_pa)
    embeddings_do_list.extend(embeddings_do)

# Convert the lists to numpy arrays if needed
embeddings_pa_array = np.array(embeddings_pa_list)
embeddings_do_array = np.array(embeddings_do_list)

100%|██████████| 313/313 [00:53<00:00,  5.83it/s]


In [None]:
len(embeddings_pa_array)

10000

In [None]:
query_inputs = tokenizer(queries, padding=True, truncation=True, return_tensors="pt").to(device)

with torch.no_grad():
    query_outputs = model(**query_inputs)

query_embeddings = query_outputs.last_hidden_state.mean(dim=1).cpu().numpy()

In [None]:
def compute_similarity(embeddings_pa, query_embeddings):
    embeddings_pa_norm = embeddings_pa / np.linalg.norm(embeddings_pa, axis=1, keepdims=True)
    query_embeddings_norm = query_embeddings / np.linalg.norm(query_embeddings, axis=1, keepdims=True)

    similarities = np.dot(query_embeddings_norm, embeddings_pa_norm.T)
    return similarities

similarity_scores = compute_similarity(embeddings_pa_array, query_embeddings)

most_similar_indices = np.argmax(similarity_scores, axis=1)

for i, query in enumerate(queries):
    most_similar_utterance_index = most_similar_indices[i]
    print(f"Query: {query}")
    print(f"Most similar utterance index: {most_similar_utterance_index}")
    print(f"Similarity score: {similarity_scores[i][most_similar_utterance_index]}")
    print()


Query: What are the symptoms of diabetes?
Most similar utterance index: 9627
Similarity score: 0.7933228015899658

Query: How can I manage high blood pressure?
Most similar utterance index: 3184
Similarity score: 0.6937186121940613

Query: What are the side effects of taking aspirin daily?
Most similar utterance index: 2175
Similarity score: 0.6134620904922485

Query: Can you explain the causes of chronic back pain?
Most similar utterance index: 1718
Similarity score: 0.8491851091384888

Query: What diet should I follow for heart health?
Most similar utterance index: 294
Similarity score: 0.6903603076934814

Query: How often should I get a general health check-up?
Most similar utterance index: 2701
Similarity score: 0.3770182728767395

Query: What are the treatment options for asthma?
Most similar utterance index: 302
Similarity score: 0.7672215104103088

Query: Can stress cause physical illnesses?
Most similar utterance index: 4699
Similarity score: 0.7072659730911255

Query: What are

## 🎛️ Create a Generator
For this part, I practically handed you the whole code on a silver platter. 🍽️ But since we know you’re an explorer at heart and love trying new things, you can’t use the model I previously used. 😈 You have to try 3 different generators and compare them based on the quality of their answers. 🧪📊 [These might come in handy](https://huggingface.co/models?pipeline_tag=text-generation&sort=trending).

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, AutoConfig, BitsAndBytesConfig
import json

### **Structure of Models**

In [None]:
def create_pipeline(model, tokenizer):
  pipe = pipeline(
    "text-generation",
      model=model,
      tokenizer=tokenizer,
      torch_dtype=torch.bfloat16,
      trust_remote_code=True,
      device_map="auto",
  )
  return pipe

In [None]:
def create_prompt(relevant_document, user_input):
    prompt = f"{relevant_document}\n\nQuestion: {user_input}\n\nAs a health advisor, give your advice below:\n"
    return prompt

In [None]:
def generator_model(model_name):
  quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
  )

  tokenizer = AutoTokenizer.from_pretrained(model_name)

  model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config = quant_config,
    trust_remote_code=True
  )

  return model, tokenizer

def process_query(model, tokenizer, file_name):
  # pipe = create_pipeline(model, tokenizer)

  results = []
  for i, query in tqdm(enumerate(queries)):
      most_similar_utterance_index = int(most_similar_indices[i])
      prompt = create_prompt(medical_meadow_wikidoc_dataset['train'][most_similar_utterance_index]['output'], query)

      # Keep the inputs on the same device as the (quantized) model
      input_ids = tokenizer.encode(prompt, return_tensors='pt').to(model.device)

      output = model.generate(input_ids, max_length=2048, num_return_sequences=1, no_repeat_ngram_size=2)

      response = tokenizer.decode(output[0], skip_special_tokens=True)

      start_index = response.find(prompt)
      if start_index != -1:
        response = response[start_index+len(prompt):]

      result = {
          "query": query,
          "prompt": prompt,
          "response": response
      }

      results.append(result)

  with open(f"{file_name}.json", "w") as outfile:
      json.dump(results, outfile)

In [None]:
def json_information(file_name):
  with open(f"{file_name}.json", "r") as infile:
      results = json.load(infile)

  for result in results:
      print(f"Query: {result['query']}")
      print(f"Response: {result['response']}")
      print()

### **Falcon Model**

In [None]:
falcon_jsonFile="falcon_results"

model, tokenizer = generator_model("tiiuae/falcon-7b-instruct")
process_query(model, tokenizer, falcon_jsonFile)

In [None]:
process_query(model, tokenizer, falcon_jsonFile)

0it [00:00, ?it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
1it [00:27, 27.51s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
2it [00:36, 16.56s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
3it [00:40, 10.76s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable r

In [None]:
json_information(falcon_jsonFile)

Query: What are the symptoms of diabetes?
Response: 1. Frequent urinations
2. Increased thirst
3. Blurred vision
4. Fatigue
5. Slow healing of wounds
6. Unintended weight loss
7. Numbness or tingling in the feet or legs
8. Inability to concentrate
9. Persistent infections
10. Skin rashes or dry skin
11. Dark patches of skin around the eyes
12. Swollen feet
13. High blood pressure
14. Low blood sugar
15. Urgent need to urinate
16. Decreased libido
17. Foot pain
18. Itchy skin or rash
19. Unexpected weight changes
20. Headaches
21. Dizziness
22. Difficulty breathing
23. Fainting
24. Heart palpitations
25. Kidney problems
26. Diabetes insulitis
27. Diabetic ketoacidosis
28. Hypoglycemia
29. Allergic reactions
30. Erectile dysfunction
31. Depression
32. Anxiety
33. Sleep disturbances
34. Gum disease
35. Bad breath
36. Poor wound healing
37. Eye problems - blurred vision, cataracts, etc.

Query: How can I manage high blood pressure?
Response: 1. Maintain a healthy weight
2. Exercise regular

### **Qwen Model**

In [None]:
qwen_jsonFile="qwen_results"

model, tokenizer = generator_model("Qwen/CodeQwen1.5-7B-Chat")
process_query(model, tokenizer, qwen_jsonFile)

In [None]:
process_query(model, tokenizer, qwen_jsonFile)

20it [09:02, 27.11s/it]


In [None]:
json_information(qwen_jsonFile)

Query: What are the symptoms of diabetes?
Response: 1. Wearing high heels and/or wearing very revealing clothing can expose you to sweat and make your skin and hair vulnerable to the harm of skin cancer. 

2. Do not eat anything that may have been left over from a previous meal, or any solid waste that has not been flushed. This can cause food poisoning. It is important to properly clean and sanitize all your surfaces before and after eating to prevent contamination.
  
3. Avoid taking high risk sports, such as skydiving or rock climbing. These risks are out of your control and may result in serious injury or death. Instead, consider alternative sports that are safer.  

4. Before going to bed, make sure your bed is comfortable and clean. If you feel uncomfortable, try adjusting your mattress, pillows, blankets, etc. As a rule of thumb, your room should not have more than two feet of empty floor space between your ceiling and your floor. You should also check your airflow and insulatio

### **Aya Model**

In [None]:
from huggingface_hub import login
login()

In [None]:
aya23_jsonFile="aya23_results"

model, tokenizer = generator_model("CohereForAI/aya-23-8B")
process_query(model, tokenizer, aya23_jsonFile)

In [None]:
process_query(model, tokenizer, aya23_jsonFile)

4it [02:20, 35.05s/it]


In [None]:
json_information(aya23_jsonFile)

Query: What are the symptoms of diabetes?
Response: Answer: The symptoms include:

1. Excessive thirst
2. Extreme hunger
3. Weight loss
4. Blurred vision
5. Fatigue
6. Irregular heartbeat
7. Dry mouth
8. Slow healing wounds
9. Itchy skin
10. Dark-colored urine
The symptoms are similar in type to the ones of type 2 diabetes.
Symptom: Excessively thirsty
Diabetes insípidus is characterized by excessive thirst and excessive urinar
. The excessive drinking and urinating is caused by the lack of vasopressin, a hormone that regulates water balance in the body. Vasop ressin is produced by a small area of the brain called the hypothalamus. In diabetes, the vaspressin-producing cells in this area are destroyed. This causes the water to be lost from the blood and is excreted in urine. As a result, you feel thirsty and need to drink more water. You also urinate more often. If you do not drink enough, your blood becomes too concentrated and you can become dehydrated. Dehydration can cause dizzines

## 📊 Evaluate the results
Here, you’ve got to put those 3 models to the test. Use the 20 queries you’ve created on each of the 3 models. Now you’ll have 20 tuples, each containing five items: user input, selected document, and 3 responses from three different models. Use a judge model on each tuple to select the best answer. 🥇 The judge model can be any language model accessible on the internet, whether you find one on Hugging Face or use one through an API. 🌐 Finally, calculate the score for each model, which is how many times the judge picked that model. 🏆

### **Open JSON Files**

In [None]:
import json

In [None]:
# Open the file
with open('/content/falcon_results.json', 'r') as f:
    # Load JSON data from file
    falcon_results = json.load(f)

In [None]:
with open('/content/qwen_results.json', 'r') as f:
    # Load JSON data from file
    qwen_results = json.load(f)

In [None]:
with open('/content/aya23_results.json', 'r') as f:
    # Load JSON data from file
    aya23_results = json.load(f)

In [None]:
falcon_results[0]

{'query': 'What are the symptoms of diabetes?',
 'prompt': 'Symptoms of diabetes insipidus are quite similar to those of untreated diabetes mellitus, with the distinction that the urine is not sweet as it does not contain glucose and there is no hyperglycemia (elevated blood glucose):\nExcessive urination and extreme thirst (especially for cold water) Blurry vision Extreme urination that continues throughout the day and the night\nIn children, DI can interfere with appetite, eating, weight gain, and growth as well. They may present with: \nFever Vomiting Diarrhe\nAdults with untreated DI may remain healthy for decades as long as enough water is drunk to offset the urinary losses. However, there is a continuous risk of dehydration.\n\nQuestion: What are the symptoms of diabetes?\n\nAs a health advisor, give your advice below:\n',
 'response': '1. Frequent urinations\n2. Increased thirst\n3. Blurred vision\n4. Fatigue\n5. Slow healing of wounds\n6. Unintended weight loss\n7. Numbness or 

### **DeepSeek Model**

In [None]:
model, tokenizer = generator_model("deepseek-ai/deepseek-coder-7b-instruct-v1.5")

In [None]:
def judge_responses(query, falcon_result, qwen_result, aya23_result):
    # Show the judge all three answers and ask it to name the best one
    judge_prompt = f"Query: {query}\n\nResponses:\n- Model 1: {falcon_result}\n- Model 2: {qwen_result}\n- Model 3: {aya23_result}\n\nWhich response is the best? (Model 1, Model 2, or Model 3):"

    # Keep the inputs on the same device as the (quantized) judge model
    input_ids = tokenizer.encode(judge_prompt, return_tensors='pt').to(model.device)

    output = model.generate(input_ids, max_length=2048, num_return_sequences=1, no_repeat_ngram_size=2)

    response = tokenizer.decode(output[0], skip_special_tokens=True)

    start_index = response.find(judge_prompt)
    if start_index != -1:
      response = response[start_index+len(judge_prompt):]

    return response

judged_results = []

for i in range(len(falcon_results)):
    best_model = judge_responses(falcon_results[i]['query'], falcon_results[i]['response'], qwen_results[i]['response'], aya23_results[i]['response'])
    judged_results.append(f"Question{i}, Model: {best_model}")
    print(best_model)
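
The judge answers in free text, so the votes still have to be counted. Below is one possible way to tally them, sketched with a regular expression and `collections.Counter`. The `sample_outputs` list is made up for illustration and `extract_vote` is a hypothetical helper; in the notebook you would pass the strings collected from `judge_responses` instead.

```python
import re
from collections import Counter

def extract_vote(judge_output):
    # Take the first "Model 1/2/3" mentioned in the judge's text, or None if there is none.
    match = re.search(r"Model\s*([123])", judge_output)
    return f"Model {match.group(1)}" if match else None

# Hypothetical judge outputs, for illustration only.
sample_outputs = [
    " Model 1 gives the most complete answer.",
    "The best response is Model 3.",
    "Model 3",
    "I cannot decide.",
]

votes = Counter(
    vote for vote in map(extract_vote, sample_outputs) if vote is not None
)

for model_name in ("Model 1", "Model 2", "Model 3"):
    print(f"{model_name}: {votes[model_name]} times")
```

Taking the first mention is a simplification: a judge that discusses several models before deciding would be miscounted, so it is worth reading the raw outputs as well.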


**Model 1: 8 times**

**Model 2: 3 times**

**Model 3: 9 times**

### Now that I'm writing this message, it's 3 in the morning and I'm tired as fox. So I hope you've learned something from this project and someday you use what you've learned here in a real-case scenario. Good Luck! ✌️