# IUST Computer Engineering Department 🏫
## Introduction to Natural Language Processing 📚 (The Final Project)
### Course Instructor: Dr. Marzieh Davoodabadi Farahani 👩‍🏫
### Project Teaching Assistant: Erfan Moosavi Monazzah (tel: @ErfanMoosavi2000) 📞
-------------------------------------------------------------------------------<br>
The objective of this project is to acquaint you with the fundamentals of Retrieval Augmented Generation (RAG). Be sure to explore various options and address challenges in a creative manner. 🎯

**Project Guidelines** 📝
- Avoid cheating at all costs. If a set of submissions is found to be [plagiarized](https://translate.google.as/?sl=en&tl=fa&text=Very%20hard%20word%2C%20I%20know%2C%20here%27s%20the%20meaning%3A%0Aplagiarized&op=translate), only one will be randomly chosen for grading. The others will fail the project. ❌
- You are allowed to use any document, article, paper, or video as a resource for writing your code, provided you include a link to the material used. 📖
- The use of Large Language Models (LLMs), ChatBots, and Copilots is encouraged. If you utilize any of these tools, make sure to attach the chat history that led you to the answer to your question, or the code, to this .ipynb document. (You must provide the entire chat, not just the final answer or your initial prompt.) 💻
- You may not submit any additional documents, files, etc., along with this document. Only solutions, codes, explanations, etc., in this document will be graded. 📄
- You are required to implement everything (except the Language Modeling parts) from scratch. The use of libraries like langchain, llama_index, etc., is not permitted for this purpose. 🚫
- Please adhere to the code guidelines provided throughout the documents. 📝 I’ve spent time in a library 📚 crafting all of this, so if you overlook them, you’ll lose the points allocated for that section. ❌
- We need GPUs for this assignment, so don't forget to turn on GPU usage for your notebook session.

-------------------------------------------------------------------------------<br>
# Alright, let's get started. 🚀

## What is RAG? 🤔
We've all used ChatGPT and experienced moments when it starts to generate content that is often incorrect or unrelated to our query. Do you know why this happens? These Large Language Models (LLMs) are not magical entities; they are simply models trained on a vast amount of text. 📚 You could even consider a significant portion of the internet. However, this is not all the data available in the world, because data is not a static concept. You yourself generate some data every day through your use of the Internet, Social Media, and so on. 🌐💻📱

So, no matter how much data you use to train your LLM, you will always encounter new data. This is one of the reasons behind the famous ChatGPT response that tells you it only knows things up to a certain date. 📅 These models also tend to hallucinate: they give incorrect answers in a very convincing manner. 🎭

On the other hand, we have retrieval techniques. Don't worry if that sounds complicated: you already use them on a daily basis. (The theory actually isn't easy, and you may need to take a course to familiarize yourself with these concepts 😅, but that's not necessary for this project.) You can think of search engines (like Google, for example) as a complex form of information retrieval. 🔍

So, one day, people came up with this idea that it would be cool if ChatGPT could search Google for us, read the articles for us, summarize what it read, and tell us that. 📖 So, this is not exactly what RAG is, but it's something similar. We have a corpus (a large amount of data) and a query (what a user typed as input). Now, we search through this corpus using techniques related to vectors and vector databases, and find the most similar items in our corpus to the query. Then, we pass these items to an LLM and ask for a structured, well-formatted, user-friendly output. 📈📊

## I'm Interested in the Technical Details, What Should I Read? 📚🔍
- I strongly recommend reading the [original RAG paper](https://arxiv.org/abs/2005.11401). If you need help understanding the paper or have any questions about it, feel free to reach out to me via Telegram or find me on the second floor of the department in the NLP lab on Sundays and Tuesdays. 📖
- There appears to be a [comprehensive 2.5-hour course](https://www.freecodecamp.org/news/mastering-rag-from-scratch/) available. I haven't personally watched it, but if you find a better one, let me know so I can update this document. 🎥
- Here is [an article](https://www.smashingmagazine.com/2024/01/guide-retrieval-augmented-generation-language-models/) that explains the concepts very well. Initially, I wanted to use this article as the basis for this project, but unfortunately, the llama_index library used in the article seems to be outdated, so most of the code would need to be rewritten. On second thought, I found it more useful to focus on core concepts rather than learning specific libraries. You might want to check out some libraries like langchain or llama_index which provide a lot of tools for RAG. (But not for this project) 📝💡
- Don't hesitate to use Google or ask chatbots about any new concepts and terms. Search engine-aware chatbots like Microsoft Copilot provide links for each part of their answers, which is useful if you want to delve deeper into that part. 🌐🤖
- Lastly, we have [the article](https://learnbybuilding.ai/tutorials/rag-from-scratch) that serves as the foundation for this project. 📚🔍

# Learn
First, we’re going to go through a simple RAG implementation. It’s going to be similar to the article, except for the LLM part. For that, I’m going to use Hugging Face. 🤗 I’ll also try to explain the code in simple terms, but feel free to read the article if you prefer their writing style.

## Let's Install the Necessary Libraries 📚🔧
Did you know that using the `--quiet` or `-q` option with the `pip install` command minimizes the output displayed on your screen? 🖥️ This can make your terminal less cluttered. Also, using `-U` will upgrade the libraries if they were previously installed. This is particularly useful for certain libraries like `transformers` that are frequently updated. 🔄

In [2]:
!pip install -U accelerate transformers bitsandbytes --quiet

## Gather a Corpus 📚
Technically, a corpus refers to a large and structured set of texts. However, for the sake of our discussion, let’s consider our collection as a “corpus”, even though it might not be large in the traditional sense. 😉

In [None]:
corpus_of_documents = [
    "Take a leisurely walk in the park and enjoy the fresh air.",
    "Visit a local museum and discover something new.",
    "Attend a live music concert and feel the rhythm.",
    "Go for a hike and admire the natural scenery.",
    "Have a picnic with friends and share some laughs.",
    "Explore a new cuisine by dining at an ethnic restaurant.",
    "Take a yoga class and stretch your body and mind.",
    "Join a local sports league and enjoy some friendly competition.",
    "Attend a workshop or lecture on a topic you're interested in.",
    "Visit an amusement park and ride the roller coasters."
]

## Create a Retriever 🕵️‍♂️
Now, we’re going to create a simple retriever. The role of the retriever is to compare the user’s query with a large corpus of text and find those that are most similar in context. (You know what context is by now, don’t you? 😊 If you’ve forgotten, refer back to your initial lectures). For now, let’s say we want to find similar text based on simple similarity metrics. The code is straightforward, and I have faith in you, chief! Dive into the code. 👨‍💻

In [None]:
def jaccard_similarity(query, document):
    query = query.lower().split(" ")
    document = document.lower().split(" ")
    intersection = set(query).intersection(set(document))
    union = set(query).union(set(document))
    return len(intersection)/len(union)

Hey, you may want to look at the Wikipedia page for [Jaccard Similarity](https://en.wikipedia.org/wiki/Jaccard_index).
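To make the metric concrete, here is a small worked example using the same definition as the cell above. It is only a sketch: the function is repeated so the snippet runs on its own, and the query and document are taken from this notebook.

```python
def jaccard_similarity(query, document):
    # Same token-set definition as above: lowercase, then split on spaces.
    query_tokens = set(query.lower().split(" "))
    document_tokens = set(document.lower().split(" "))
    intersection = query_tokens & document_tokens
    union = query_tokens | document_tokens
    return len(intersection) / len(union)

query = "I like to hike"
document = "Go for a hike and admire the natural scenery."

# The only shared token is "hike"; the union holds 12 distinct tokens.
# Note that punctuation stays attached to tokens ("scenery." keeps its period).
score = jaccard_similarity(query, document)
print(f"{score:.4f}")
```

One shared token out of twelve gives a score of 1/12, which is low in absolute terms but still the highest in the corpus for this query.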

In [None]:
def return_response(query, corpus):
    # Score every document against the query and return the best match
    similarities = []
    for doc in corpus:
        similarity = jaccard_similarity(query, doc)
        similarities.append(similarity)
    return corpus[similarities.index(max(similarities))]

## Create a Generator 🖥️
Now, we’re going to create a generator. This will help us compile the information retrieved into a well-structured and user-friendly text.

OK, let's say that in one scenario we ask users what they like to do, and their answer is this:

In [None]:
user_input = "I like to hike"

Now, using the retriever, I find the activity that best fits this user.

In [None]:
relevant_document = return_response(user_input, corpus_of_documents)
print(relevant_document)

Go for a hike and admire the natural scenery.


The answer seems good enough, but we can do better, yeah?

Let’s import a Language Model. I’m going to try out Microsoft Phi-3 because it recently hit the market, and I haven’t had a chance to try it for myself yet. So, I’m seizing this opportunity to do so! 😊👨‍💻

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

Downloading the model is going to take a while, so use this time to rest your eyes for a bit. 😊👀💤

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")



In [None]:
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

Now we get the LLM to act as our generator. We simply place the retrieved information and the user query in the following prompt and ask the model for well-formatted text.

In [None]:
prompt = """You are a bot that makes recommendations for activities. Try to be helpful recommender system.
This is the recommended activity: {relevant_document}
The user input is: {user_input}
Compile a recommendation to the user based on the recommended activity and the user input."""

In [None]:
prompt = prompt.replace("{relevant_document}", relevant_document).replace("{user_input}", user_input)
print(prompt)

You are a bot that makes recommendations for activities. Try to be helpful recommender system.
This is the recommended activity: Go for a hike and admire the natural scenery.
The user input is: I like to hike
Compile a recommendation to the user based on the recommended activity and the user input.


In [None]:
messages = [
    {"role": "user", "content": prompt},
]

Here's the augmented, generated text:

In [None]:
output = pipe(messages, **generation_args)
print(output[0]['generated_text'])



 Based on your interest in hiking and the recommended activity, I suggest you plan a hike in a beautiful natural setting. Look for trails that offer stunning views of mountains, forests, or water bodies. This will allow you to fully enjoy the serenity of nature, appreciate the scenic beauty, and experience the physical benefits of hiking. Don't forget to pack essentials like water, snacks, and appropriate gear for your adventure. Happy hiking!


## Very Cool, but Not Perfect! 😎👌
Alright, you’ve just seen a very basic example of RAG. However, there are some issues present. The corpus is small, and the documents in the corpus are short sentences, which causes the Language Model (LM) to generate some text on its own. 📚🤖

Also, our retriever is not very efficient and it may encounter bugs in some cases. For instance, even when users specify that they are not interested in a certain activity, the retriever might still bring up that activity for them. 🐜🔍
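The negation problem is easy to reproduce. The sketch below repeats the Jaccard retriever from the Learn section so it runs on its own, and uses a three-document subset of the toy corpus (the subset and the second query are my own choices for illustration).

```python
def jaccard_similarity(query, document):
    query_tokens = set(query.lower().split(" "))
    document_tokens = set(document.lower().split(" "))
    return len(query_tokens & document_tokens) / len(query_tokens | document_tokens)

def retrieve(query, corpus):
    # Return the document with the highest Jaccard score.
    return max(corpus, key=lambda doc: jaccard_similarity(query, doc))

toy_corpus = [
    "Visit a local museum and discover something new.",
    "Go for a hike and admire the natural scenery.",
    "Take a yoga class and stretch your body and mind.",
]

liked = retrieve("I like to hike", toy_corpus)
disliked = retrieve("I don't like to hike", toy_corpus)

# Both queries share the token "hike" with the same document,
# so word overlap alone cannot tell a preference from a rejection.
print(liked)
print(disliked)
```

Both queries retrieve the hiking document, because a bag of words carries no notion of negation.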

So, in this project, you’re going to address some of these issues. The rest of this document consists of some empty cells and tips for you on how to fill them with code. Let’s get coding! 👨‍💻🚀

# The Project

## Determine Your Task 🎯
What do you aim to implement with RAG? A recommender system? 🎁 A chatbot for a website’s FAQ? 💬 A medical advisor? 🩺 Or perhaps something else entirely?

Specify your objective in this cell.

In [None]:
task_title = "A medical advisor"
url_for_more_information = ""

print(f"My task is: {task_title}")
print(f'For more information see: {url_for_more_information}')

My task is: A medical advisor
For more information see: 


## 🧐 Find or gather a corpus
Remember the fake corpus? 📚 It’s time to switch things up and use something real. 🌐 You need to use a dataset from [Hugging Face Datasets](https://huggingface.co/datasets) for this project. 🚀 Don’t use files that live outside of this notebook; it should be able to run on its own without depending on anything external. 💻👍


In [None]:
!pip install -U accelerate transformers flash_attn quanto datasets "bitsandbytes>=0.39.0" --quiet

In [None]:
from datasets import load_dataset
dataset = load_dataset("medalpaca/medical_meadow_wikidoc")

In [None]:
dataset

## 📝 Create some queries
I want you to create 20 queries related to your task. You can use any Language Model you want for this purpose, or if you’re feeling strong 💪 and have the time, write them yourself. 🖊️

You need to create a Hugging Face account, format your 20 queries into the accepted dataset format for Hugging Face 🤗 and push it to your Hugging Face account. Be sure to make it public and use it for the evaluation task. 👀

In [11]:
# List of queries
queries = [
    "What are the symptoms of the flu?",
    "How can I lower my blood pressure?",
    "What is the recommended dosage of ibuprofen?",
    "What are the risks of smoking?",
    "How can I prevent diabetes?",
    "What is the difference between a cold and allergies?",
    "What are the benefits of exercise?",
    "How can I manage my stress?",
    "What are the side effects of the COVID-19 vaccine?",
    "What is the best way to treat a sunburn?",
    "How can I improve my sleep?",
    "What are the symptoms of a heart attack?",
    "What is the recommended diet for someone with high cholesterol?",
    "How can I quit smoking?",
    "What are the causes of migraines?",
    "What is the best way to treat a sprained ankle?",
    "How can I prevent skin cancer?",
    "What are the symptoms of depression?",
    "What is the recommended treatment for acid reflux?",
    "How can I maintain a healthy weight?"
]

In [None]:
import datasets

# Create a dataset from the list of queries
ds = datasets.Dataset.from_dict({"query": queries})
ds

Dataset({
    features: ['query'],
    num_rows: 20
})

In [None]:
# Read the token from an environment variable or a Colab secret; never hard-code it in a notebook.
!huggingface-cli login --token "$HF_TOKEN"

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
ds.push_to_hub("Sina-Alinejad-2002/Medical_Advisor_Queries")

CommitInfo(commit_url='https://huggingface.co/datasets/Sina-Alinejad-2002/Medical_Advisor_Queries/commit/4d223e6475be411f30b5dd726e655dc64b907de2', commit_message='Upload dataset', commit_description='', oid='4d223e6475be411f30b5dd726e655dc64b907de2', pr_url=None, pr_revision=None, pr_num=None)

## 🛠️ Create a Retriever
To create your retriever, you need to use an encoder model. Something like BERT? Nah, BERT is so yesterday. Find something new and shiny! ✨ The basic idea is to encode every document (sentence) in your corpus into a vector space using the same encoder. Then, encode the user query into that same space. With a similarity metric such as the dot product, you can find the document most similar to the user’s input and retrieve it. 🎯 You can train your own encoder if you have enough data and resources, 💪 or you can use one of the [ready-made models on Hugging Face](https://huggingface.co/models?pipeline_tag=sentence-similarity&sort=trending).
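Here is a minimal sketch of that idea with hand-made vectors. The three-dimensional "embeddings" and the document names are hypothetical; a real encoder would produce the vectors, and they would have hundreds of dimensions. Once vectors are L2-normalized, the dot product equals cosine similarity.

```python
import math

def l2_normalize(vector):
    # Scale the vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical embeddings, for illustration only.
doc_embeddings = {
    "flu symptoms": [0.9, 0.1, 0.0],
    "blood pressure": [0.1, 0.9, 0.1],
    "sleep hygiene": [0.0, 0.2, 0.9],
}
query_embedding = [0.2, 0.8, 0.0]

normalized_query = l2_normalize(query_embedding)
scores = {
    name: dot(l2_normalize(vector), normalized_query)
    for name, vector in doc_embeddings.items()
}

# The retrieved document is the one whose vector points in the most similar direction.
best_doc = max(scores, key=scores.get)
print(best_doc)
```

The query vector points mostly along the second axis, so the document whose vector does the same gets the highest score.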

In [2]:
import numpy as np
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoModel, pipeline, QuantoConfig, BitsAndBytesConfig
import torch
import torch.nn.functional as F
from tqdm import tqdm

In [None]:
#Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0] #First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

def get_embeddings_sentence_transformers(sentences, tokenizer, model):
    # Tokenize sentences
    encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

    # Compute token embeddings
    with torch.no_grad():
        model_output = model(**encoded_input)

    # Perform pooling
    sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

    # Normalize embeddings
    sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)

    return sentence_embeddings


def cosine_similarity(emb1, emb2):
    # Calculate the dot product of the two vectors
    dot_product = np.dot(emb1, emb2)

    # Calculate the magnitude (length) of the two vectors
    mag_emb1 = np.linalg.norm(emb1)
    mag_emb2 = np.linalg.norm(emb2)

    # Calculate the cosine similarity
    cosine_sim = dot_product / (mag_emb1 * mag_emb2)

    return cosine_sim

def get_embeddings(docs, tokenizer, model, embedding_func):
    docs_embeddings = embedding_func(docs, tokenizer, model)
    return docs_embeddings

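def get_most_similar_doc(docs, query_embedding, embedding_func, embedding_strategy, similarity_strategy, tokenizer, model):
    # Note: every document is re-embedded on each call, which is slow for large corpora.
    similarities = []
    for doc in docs:
        # embedding_func is expected to be get_embeddings; embedding_strategy is the encoder-specific function
        doc_emb = embedding_func([doc], tokenizer, model, embedding_strategy)[0]
        similarities.append(similarity_strategy(doc_emb, query_embedding))
    return similarities.index(max(similarities))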

In [None]:
dataset

DatasetDict({
    train: Dataset({
        features: ['input', 'output', 'instruction'],
        num_rows: 10000
    })
})

In [None]:
# Sentences we want sentence embeddings for
docs = list(dataset['train']['output'])[:800]

In [None]:
# Load model from HuggingFace Hub
emb_tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
emb_model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')


## 🎛️ Create a Generator
For this part, I practically handed you the whole code on a silver platter. 🍽️ But since we know you’re an explorer at heart and love trying new things, you can’t use the model I previously used. 😈 You have to try 3 different generators and compare them based on the quality of their answers. 🧪📊 [These might come in handy](https://huggingface.co/models?pipeline_tag=text-generation&sort=trending).

In [None]:
def load_model_and_tokenizer(model_id, quantization_config=None):
    import torch
    model = AutoModelForCausalLM.from_pretrained(
      model_id,
      device_map="cuda",
      torch_dtype="auto",
      trust_remote_code=True,
      quantization_config = quantization_config
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, tokenizer

def generate_medical_advice(model, tokenizer, messages, generation_args):
    pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer)

    output = pipe(messages, **generation_args)
    return output[0]['generated_text']

def medical_advisor(similar_docs, queries, gen_model, gen_tokenizer, prompt, generation_args):
    results = []

    for i in tqdm(range(len(queries))):
      new_prompt = prompt.replace("{relevant_document}", similar_docs[i]).replace("{user_input}", queries[i])
      messages = [
        {'role': 'user', 'content': new_prompt},
      ]
      output = generate_medical_advice(gen_model, gen_tokenizer, messages, generation_args)
      results.append(output)

    return results

In [None]:
prompt = """You are a bot that makes recommendations for health problems. Try to be helpful medical advisor system.
This is the recommended advice: {relevant_document}
The user input is: {user_input}
Compile an advice to the user based on the recommended advice and the user input."""

generation_args = {
  "max_new_tokens": 500,
  "return_full_text": False,
  "do_sample": False}

## 📊 Evaluate the results
Here, you’ve got to put those 3 models to the test. Use the 20 queries you’ve created on each of the 3 models. Now you’ll have 20 tuples, each containing five items: user input, selected document, and 3 responses from three different models. Use a judge model on each tuple to select the best answer. 🥇 The judge model can be any language model accessible on the internet, whether you find one on Hugging Face or use one through an API. 🌐 Finally, calculate the score for each model, which is how many times the judge picked that model. 🏆
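The scoring step can be sketched as follows, assuming the judge replies with JSON such as `{"best answer": 2}` (the format requested in the judge prompt later in this notebook). The helper name `tally_votes`, the example replies, and the model names are hypothetical.

```python
import json
from collections import Counter

def tally_votes(judge_replies, model_ids):
    # Count how often the judge picked each model (answers are numbered from 1).
    scores = Counter({model_id: 0 for model_id in model_ids})
    for reply in judge_replies:
        try:
            choice = int(json.loads(reply)["best answer"])
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue  # skip replies that are not in the requested format
        if 1 <= choice <= len(model_ids):
            scores[model_ids[choice - 1]] += 1
    return dict(scores)

# Hypothetical judge replies, for illustration only.
example_replies = ['{"best answer": 3}', '{"best answer": 1}', '{"best answer": 3}', "not json"]
example_models = ["model_a", "model_b", "model_c"]
print(tally_votes(example_replies, example_models))
```

Malformed replies are skipped rather than guessed at, so it is worth checking how many of the 20 judgments were actually counted.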

In [None]:
similar_docs = []

query_embeddings = get_embeddings(queries, emb_tokenizer, emb_model, get_embeddings_sentence_transformers)

for query_emb in tqdm(query_embeddings):
  most_similar_doc = get_most_similar_doc(docs, query_emb, get_embeddings, get_embeddings_sentence_transformers, cosine_similarity, emb_tokenizer, emb_model)
  most_similar_doc = docs[most_similar_doc]
  similar_docs.append(most_similar_doc)

100%|██████████| 20/20 [27:06<00:00, 81.34s/it]
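Most of those 27 minutes come from re-embedding all 800 documents for each of the 20 queries. A possible alternative, sketched below, is to embed the documents once and reuse the vectors for every query. The `toy_embed` function is a hypothetical stand-in for a real encoder (it just counts vowels), used only to make the number of encoder calls visible.

```python
calls = {"count": 0}

def toy_embed(text):
    # Hypothetical stand-in for an encoder: a 5-dimensional vowel-count vector.
    calls["count"] += 1
    lowered = text.lower()
    return [lowered.count(vowel) for vowel in "aeiou"]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

toy_docs = ["flu and fever", "blood pressure", "sleep routine"]
toy_queries = ["fever", "pressure"]

# Embed every document once, then reuse the cached vectors for each query.
doc_vectors = [toy_embed(doc) for doc in toy_docs]
best_indices = []
for query in toy_queries:
    query_vector = toy_embed(query)
    scores = [dot(doc_vector, query_vector) for doc_vector in doc_vectors]
    best_indices.append(scores.index(max(scores)))

# Encoder calls: len(toy_docs) + len(toy_queries), not len(toy_docs) * len(toy_queries).
print(calls["count"])
```

With the notebook's sizes, that would mean roughly 820 encoder calls instead of about 16,000.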


In [None]:
import json

# Save the list to a file
with open('similar_docs.json', 'w') as f:
    json.dump(similar_docs, f)

In [None]:
import json

# Load the list from a file
with open('similar_docs.json', 'r') as f:
    similar_docs = json.load(f)

print(similar_docs)

['If you think you have been exposed to avian influenza, call your health care provider before your visit. This will give the staff a chance to take proper precautions that will protect them and other patients during your office visit. Tests to identify the avian flu exist but are not widely available. A test for diagnosing strains of bird flu in people suspected of having the virus gives preliminary results within 4 hours. Older tests took 2 to 3 days. Your doctor might also perform the following tests:\nAuscultation (to detect abnormal breath sounds) Chest x-ray Nasopharyngeal culture White blood cell differential\nOther tests may be done to look at the functions of your heart, kidneys, and liver.', 'The goal of treatment is to reduce your risk of heart disease and diabetes. Your doctor will recommend lifestyle changes or medicines to help reduce your blood pressure, LDL cholesterol, and blood sugar.\nRecommendations include:\nLose weight. The goal is to lose between 7% and 10% of yo

In [None]:
model_id = "microsoft/Phi-3-medium-4k-instruct"  # the other two models were loaded the same way
# quantization_config = QuantoConfig(weights="int8")
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    #bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    #bnb_4bit_compute_dtype=torch.bfloat16
)
gen_model, gen_tokenizer = load_model_and_tokenizer(model_id, quantization_config=quant_config)



In [None]:
result = medical_advisor(similar_docs, queries, gen_model, gen_tokenizer, prompt, generation_args)

  5%|▌         | 1/20 [00:37<11:43, 37.00s/it]

 Based on your inquiry about the symptoms of the flu, it's important to note that avian influenza, or bird flu, can present similar symptoms to the common flu. These symptoms may include fever, cough, sore throat, muscle aches, headache, and fatigue. In some cases, severe respiratory symptoms, gastrointestinal issues, and even neurological complications may occur.

If you suspect you have been exposed to avian influenza and are experiencing flu-like symptoms, it is crucial to contact your healthcare provider before visiting their office. This will allow the staff to take necessary precautions to protect themselves and other patients.

Tests for avian flu are available, with preliminary results typically provided within 4 hours. Your doctor may also perform additional tests, such as auscultation, chest x-ray, nasopharyngeal culture, and white blood cell differential, to help diagnose the infection and assess your overall health.

Remember, early detection and proper medical care are ess

100%|██████████| 20/20 [18:22<00:00, 55.12s/it]


In [None]:
len(result)

20

In [None]:
# Save the list to a file
with open('microsoft_Phi_3_medium_4k_instruct.json', 'w') as f:
    json.dump(result, f)

In [None]:
result[3]

' Based on the recommended advice, smoking is a significant risk factor for developing lung cancer, which is a leading cause of cancer death in both men and women in the United States. The risks of smoking include not only lung cancer but also other types of cancer, as well as chronic obstructive pulmonary disease (COPD), heart disease, stroke, and various other health issues.\n\nHere are some specific risks associated with smoking:\n\n1. Lung cancer: Smoking is the main cause of most lung cancers, with both small cell lung carcinoma and non-small cell lung carcinoma being linked to tobacco use.\n\n2. Other cancers: Smoking increases the risk of developing cancers of the mouth, throat, esophagus, pancreas, liver, stomach, kidney, bladder, and cervix, among others.\n\n3. Respiratory diseases: Smoking can cause chronic bronchitis, emphysema, and COPD, which can lead to difficulty breathing, coughing, and wheezing.\n\n4. Cardiovascular diseases: Smoking increases the risk of heart disease

In [None]:
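# queries, results
# `results` is assumed to hold one list of answers per model,
# e.g. results = [model1_results, model2_results, model3_results]
pretty_results = []
for i in range(len(queries)):
  new_item = [queries[i], similar_docs[i]]
  for model_results in results:
    new_item.append(model_results[i])
  pretty_results.append(tuple(new_item))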

In [None]:
import json

# Convert the list of tuples to a list of lists
# (this is because JSON doesn't support tuples)
my_results_json = [list(item) for item in pretty_results]

# Save the list to a file
with open('results.json', 'w') as f:
    json.dump(my_results_json, f)

In [None]:
import json

# Load the list from the file written in the previous cell
with open('results.json', 'r') as f:
    my_list = json.load(f)

# Convert the list of lists back to a list of tuples
# (JSON arrays always load as Python lists)
my_list_tuples = [tuple(item) for item in my_list]

# Print the list of tuples
print(my_list_tuples)
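
The tuple/list conversion in the two cells above can be checked with a small round trip. This is only an illustrative sketch: the sample row is made up and simply mimics the shape of `pretty_results` (query, retrieved document, one answer per model).

```python
import json

# Hypothetical sample row in the same shape as pretty_results
sample = [("What is the flu?", "retrieved doc text", "answer 1", "answer 2", "answer 3")]

# Tuples are written as JSON arrays...
encoded = json.dumps(sample)

# ...and JSON arrays are always read back as Python lists
decoded = json.loads(encoded)

# Rebuild the tuples after loading
restored = [tuple(item) for item in decoded]

print(type(decoded[0]).__name__)  # list
print(restored == sample)         # True
```

So the tuples only need to be rebuilt on the loading side; the saving side works with either tuples or lists.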

In [8]:
import json
model_ids = ['deepseek_ai_deepseek_math_7b_instruct', 'gorilla_llm_gorilla_openfunctions_v2', 'microsoft_Phi_3_medium_4k_instruct']
# Load the list from a file
with open('deepseek_ai_deepseek_math_7b_instruct.json', 'r') as f:
    model1_results = json.load(f)

with open('gorilla_llm_gorilla_openfunctions_v2.json', 'r') as f:
    model2_results = json.load(f)

with open('microsoft_Phi_3_medium_4k_instruct.json', 'r') as f:
    model3_results = json.load(f)

In [16]:
message = "I have 3 answers to a query for the task of medical advising. I want you to tell me which answer is the best between them. Your answer should be in json format with key of 'best answer' and value of 1, 2 or 3.\nanswer 1: {model1}\nanswer 2: {model2}\nanswer 3: {model3}."

In [17]:
messages = []
for i in range(len(queries)):
  new_message = message.replace('{model1}', model1_results[i]).replace('{model2}', model2_results[i]).replace('{model3}', model3_results[i])
  messages.append(new_message)

In [18]:
messages[1]

'I have 3 answers to a query for the task of medical advising. I want you to tell me which answer is the best between them. Your answer should be in json format with key of \'best answer\' and value of 1, 2 or 3.\nanswer 1:  To lower your blood pressure, you can try to make some lifestyle changes. Here are some tips:\n\n1. Lose weight: If you are overweight, losing weight can help lower your blood pressure. Aim to lose between 7% and 10% of your current weight. This can be achieved through a combination of diet and exercise.\n\n2. Exercise: Regular exercise, such as walking, can help lower your blood pressure. Aim to get 30 minutes of moderate intensity exercise, 5 - 7 days per week.\n\n3. Quit smoking: Smoking can increase your blood pressure, so if you smoke, it\'s important to quit.\n\n4. Manage stress: Stress can also increase blood pressure, so try to manage your stress levels through relaxation techniques, such as meditation or deep breathing.\n\n5. Follow your doctor\'s advice: 
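
Note that the template above shows the judge only the three answers, not the query they respond to. A minimal sketch of one possible variant that also passes the query is given below. The helper `build_judge_prompt`, the `query:` line, and the sample strings are assumptions for illustration; they are not part of the original notebook, and the verdicts printed later were produced with the original template, not with this one.

```python
def build_judge_prompt(query, answers):
    """Build a judging prompt that also shows the query (hypothetical helper)."""
    choices = ", ".join(str(i) for i in range(1, len(answers) + 1))
    lines = [
        f"I have {len(answers)} answers to a query for the task of medical advising. "
        "I want you to tell me which answer is the best between them. "
        "Your answer should be in json format with key of 'best answer' "
        f"and value of one of: {choices}.",
        f"query: {query}",
    ]
    for idx, answer in enumerate(answers, start=1):
        # strip() drops the leading space that the generated answers start with
        lines.append(f"answer {idx}: {answer.strip()}")
    return "\n".join(lines)


# Made-up query and answers, only to show the resulting layout
example = build_judge_prompt(
    "How can I sleep better?",
    [" Go to bed at a fixed time.", " Avoid caffeine late in the day.", " Exercise regularly."],
)
print(example)
```

Joining the pieces directly, instead of calling `str.format` on the whole template, keeps the prompt intact even if an answer happens to contain curly braces.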

In [3]:
eval_model = "tiiuae/falcon-7b-instruct"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
)

In [15]:
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [19]:
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

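generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,  # return only the newly generated text
    "temperature": 0.0,         # has no effect because do_sample=False
    "do_sample": False,         # greedy decoding, so the verdicts are repeatable
}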

In [20]:
outputs = []
for judge_prompt in messages:  # renamed so the `message` template is not overwritten
  output = pipe([{'role': 'user', 'content': judge_prompt}], **generation_args)
  generated = output[0]['generated_text']
  outputs.append(generated)
  print(generated)

The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.


 {
  "best answer": 3
}
 {
  "best answer": 2
}
 {
  "best answer": 3
}
 {
  "best answer": 3
}
 {
  "best answer": 3
}
 {
  "best answer": 1
}
 {
  "best answer": 3
}
 {
  "best answer": 1
}
 {
  "best answer": 3
}
 {
  "best answer": 3
}
 ```json
{
  "best answer": 2
}
```
The second answer provides a more comprehensive and detailed set of recommendations for improving sleep quality. It addresses the user's specific concern about improving sleep quality and includes a wider range of tips, such as creating a relaxing bedtime routine, making the sleep environment comfortable, and managing stress. Additionally, it provides specific advice for people with diabetes, which is not mentioned in the other answers. Overall, the second answer is more tailored to the user's needs and provides a more thorough set of recommendations for improving sleep quality.
 {
  "best answer": 3
}
 {
  "best answer": 2
}
 {
  "best answer": 3
}
 {
  "best answer": 3
}
 {
  "best answer": 1
}
 {
  "best answer"

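The verdicts printed above are not uniform: most are bare JSON objects, one is wrapped in a Markdown code fence and followed by an explanation, and the last one is cut off. A minimal sketch of how they could be parsed and counted per model is shown below. `parse_verdict` and `tally` are hypothetical helpers, and the three sample strings are shortened stand-ins for entries of `outputs`; the answer-to-model mapping follows the order of `model_ids` used when the prompts were built.

```python
import json
import re
from collections import Counter

# Smallest {...} span; the verdict objects are flat, so this is enough
JSON_OBJECT = re.compile(r"\{.*?\}", re.DOTALL)


def parse_verdict(text):
    """Return the 'best answer' value (1, 2 or 3), or None if it cannot be parsed."""
    match = JSON_OBJECT.search(text)
    if match is None:
        return None  # e.g. output truncated before the closing brace
    try:
        verdict = json.loads(match.group(0)).get("best answer")
    except json.JSONDecodeError:
        return None
    return verdict if verdict in (1, 2, 3) else None


def tally(outputs, model_ids):
    """Count how often each model was chosen; unparsable outputs are counted separately."""
    votes = Counter(parse_verdict(text) for text in outputs)
    return {(model_ids[k - 1] if k else "unparsed"): n for k, n in votes.items()}


model_ids = ['deepseek_ai_deepseek_math_7b_instruct',
             'gorilla_llm_gorilla_openfunctions_v2',
             'microsoft_Phi_3_medium_4k_instruct']

# Shortened stand-ins for the three kinds of output seen above
sample_outputs = [
    ' {\n  "best answer": 3\n}',
    ' ```json\n{\n  "best answer": 2\n}\n```\nThe second answer provides a more comprehensive set of recommendations.',
    ' {\n  "best answer"',
]

print(tally(sample_outputs, model_ids))
```

With the real `outputs` list in place of `sample_outputs`, the same call gives one count per model plus an `unparsed` bucket for truncated or malformed verdicts.
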
### As I write this message, it's 3 in the morning and I'm tired as fox. I hope you've learned something from this project and that someday you'll use what you've learned here in a real-world scenario. Good luck! ✌️