RAG configuration in this notebook:

Embedding model: sentence-transformers/all-mpnet-base-v2

Chunk size: 2000 characters

Chunk overlap: 200 characters

Generation model: meta-llama/Meta-Llama-3-8B-Instruct (4-bit quantized)

Retriever: Pinecone vector store (similarity search, k = 5)

Embedding size: 768

In [None]:
import os
import time
import pickle
import getpass

import torch
import bitsandbytes  # noqa: F401  (not used directly; 4-bit loading needs it installed)
from tqdm import tqdm
from pinecone import Pinecone, ServerlessSpec  # Pinecone client; not shadowed by a LangChain class of the same name
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_pinecone import PineconeVectorStore

In [3]:
# Load the document
with open("data_5983_updated.pkl", "rb") as file:
    documents = pickle.load(file)

In [4]:
documents[10]

Document(metadata={'title': 'How to talk about mental health?', 'source': 'https://www.samhsa.gov/mental-health/how-to-talk', 'category': 'Mental Health', 'id': '7b054985-0adf-457c-b0dd-179704748975'}, page_content="Mental health is essential to a person’s life in the same way as physical health. Hesitation to talk about mental health adds to the notion that the topic is taboo. It is important to normalize conversations surrounding mental health so people can feel empowered to seek the help they need. The following resources can help you feel more informed to talk about mental health with the people in your life who may need your support.\nFor People with Mental Health Problems\nIf you have, or believe you may have, a mental health problem, it is helpful to talk about these issues with others. Learn more about building a strong support system and developing a recovery plan.\nFor Young People Looking for Help\nMental health problems don't only affect adults. Children, teenagers, and you

In [2]:
!pip install langchain pinecone-client openai

Collecting pinecone-client
  Downloading pinecone_client-5.0.1-py3-none-any.whl.metadata (19 kB)
Collecting pinecone-plugin-inference<2.0.0,>=1.0.3 (from pinecone-client)
  Downloading pinecone_plugin_inference-1.1.0-py3-none-any.whl.metadata (2.2 kB)
Collecting pinecone-plugin-interface<0.0.8,>=0.0.7 (from pinecone-client)
  Downloading pinecone_plugin_interface-0.0.7-py3-none-any.whl.metadata (1.2 kB)
Downloading pinecone_client-5.0.1-py3-none-any.whl (244 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 244.8/244.8 kB 19.4 MB/s eta 0:00:00
Downloading pinecone_plugin_inference-1.1.0-py3-none-any.whl (85 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 85.4/85.4 kB 8.5 MB/s eta 0:00:00
Downloading pinecone_plugin_interface-0.0.7-py3-none-any.whl (6.2 kB)
Installing collected packages: pinecone-plugin-interface, pinecone-plugin-inference, pinecone-client
Successfully installed pinecone-client-5.0

In [1]:
from pinecone import Pinecone, ServerlessSpec
import getpass
import os
import time

if not os.getenv("PINECONE_API_KEY"):
    os.environ["PINECONE_API_KEY"] = getpass.getpass("Enter your Pinecone API key: ")

pinecone_api_key = os.environ.get("PINECONE_API_KEY")

pc = Pinecone(api_key=pinecone_api_key)

Enter your Pinecone API key: ··········


In [None]:
# Embedding model
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda" if torch.cuda.is_available() else "cpu"}
encode_kwargs = {"normalize_embeddings": False}
hf_embeddings = HuggingFaceEmbeddings(
    model_name=model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
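`normalize_embeddings` is set to `False`, which is harmless here because the index is configured with `metric="cosine"` in the `create_index` call, and cosine similarity already divides by both vector norms. A minimal pure-Python sketch, with toy 3-dimensional vectors standing in for the 768-dimensional mpnet embeddings, shows that unit-normalizing or rescaling a vector leaves its cosine score unchanged:

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def normalize(v):
    """Scale a vector to unit length (what normalize_embeddings=True would do)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

query = [0.2, 0.7, 0.1]  # toy stand-ins for 768-d embeddings
doc = [0.4, 1.4, 0.3]

raw = cosine(query, doc)
unit = cosine(normalize(query), normalize(doc))
scaled = cosine(query, [10 * x for x in doc])
print(math.isclose(raw, unit) and math.isclose(raw, scaled))  # True
```

Normalization would matter if the index used a raw dot-product metric, since the dot product grows with vector length.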


In [2]:
from pinecone import ServerlessSpec

# Create the index only if it doesn't exist yet (768 = all-mpnet-base-v2 embedding size)
if "rag-llm-updated" not in pc.list_indexes().names():
    pc.create_index(
        name="rag-llm-updated",
        dimension=768,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

pc_index = pc.Index("rag-llm-updated")

In [None]:
from huggingface_hub import notebook_login
notebook_login()


In [None]:
type(documents[0])

In [None]:
# Chunking
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,
    chunk_overlap=200,
)

chunks = text_splitter.split_documents(documents)
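`chunk_size` and `chunk_overlap` are measured in characters here, because the splitter's default length function is `len`. The real `RecursiveCharacterTextSplitter` first tries to split on paragraph breaks, then newlines, then spaces, and merges the pieces back up to `chunk_size`. The sketch below deliberately ignores those separators and shows only the fixed-window idea behind the two parameters: each new chunk starts `chunk_size - chunk_overlap` characters after the previous one, so neighbouring chunks share `chunk_overlap` characters. `window_chunks` is a hypothetical helper, not a LangChain function.

```python
def window_chunks(text, chunk_size=2000, chunk_overlap=200):
    """Hypothetical fixed-window splitter: consecutive chunks share chunk_overlap chars."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

sample = "x" * 4500
print([len(p) for p in window_chunks(sample)])  # [2000, 2000, 900]
```

The overlap keeps a sentence that straddles a boundary fully visible in at least one chunk, at the cost of storing some text twice.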


In [None]:
len(chunks)

1904

In [None]:
total_size = sum(len(chunk.page_content) for chunk in chunks)
average_size = total_size / len(chunks) if chunks else 0

print(f"Total size of chunks: {total_size}")
print(f"Number of chunks: {len(chunks)}")
print(f"Average size of chunks: {average_size:.2f} characters")


Total size of chunks: 1603382
Number of chunks: 1904
Average size of chunks: 842.11 characters
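The average chunk (about 842 characters) is well below the 2000-character limit, most likely because many source pages are shorter than one chunk and the splitter prefers to break at paragraph boundaries. The mean alone hides the spread, so a quick summary with Python's standard `statistics` module can help. This sketch runs on a toy list of lengths; in the notebook one would pass `[len(c.page_content) for c in chunks]`.

```python
import statistics

def length_summary(lengths):
    """Min / median / mean / max of chunk lengths in characters."""
    return {
        "min": min(lengths),
        "median": statistics.median(lengths),
        "mean": round(statistics.fmean(lengths), 2),
        "max": max(lengths),
    }

toy_lengths = [120, 640, 1990, 2000, 310]  # toy data, not the real chunk lengths
print(length_summary(toy_lengths))
# {'min': 120, 'median': 640, 'mean': 1012.0, 'max': 2000}
```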


In [None]:
# VectorStore
vectorstore = PineconeVectorStore(
    index_name="rag-llm-updated",
    embedding=hf_embeddings,
)

In [None]:
#for chunk in tqdm(chunks, desc="Adding documents to Pinecone", unit="chunk"):
#    vectorstore.add_documents([chunk])

Adding documents to Pinecone: 100%|██████████| 1904/1904 [04:46<00:00,  6.65chunk/s]
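Adding one chunk per call costs a network round trip per chunk (about 6.65 chunks/s in the run above). `add_documents` accepts a list, so sending chunks in batches is usually much faster. A minimal sketch, assuming a batch size of 100 (a hypothetical, untuned value); `batched` is a small helper written here, and a stand-in function replaces `vectorstore.add_documents` so the example runs offline:

```python
from itertools import islice

def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

# Stand-in for vectorstore.add_documents so the sketch runs without Pinecone.
uploaded = []
def fake_add_documents(batch):
    uploaded.append(len(batch))

for batch in batched(range(1904), 100):  # 1904 = number of chunks in this notebook
    fake_add_documents(batch)

print(len(uploaded), uploaded[-1])  # 20 4
```

In the notebook the loop would become `for batch in tqdm(list(batched(chunks, 100))): vectorstore.add_documents(batch)`.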


In [3]:
index_stats = pc_index.describe_index_stats()

print("Index Stats:", index_stats)

Index Stats: {'dimension': 768,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 1904}},
 'total_vector_count': 1904}


In [None]:
vectorstore.similarity_search("What is Mental Health?")

[Document(id='f1f9d5c1-6fca-4a9a-a97c-4c32da12a547', metadata={'category': 'Mental Health', 'id': '055c9c95-0034-48a2-ab0c-2a024efbcd11', 'source': 'https://www.samhsa.gov/mental-health', 'title': 'What is Mental Health?'}, page_content='Mental health includes our emotional, psychological, and social well-being. It affects how we think, feel, and act, and helps determine how we handle stress, relate to others, and make choices.\nMental health is important at every stage of life, from childhood and adolescence through adulthood. Over the course of your life, if you experience\nmental health problems, your thinking, mood, and behavior could be affected.\nMental Health Conditions\nMental illnesses are disorders, ranging from mild to severe, that affect a person’s thinking, mood, and/or behavior. According to the National Institute of Mental Health, nearly one-in-five adults live with a mental illness.\nMany factors contribute to mental health conditions, including:\nBiological factors, su

In [None]:
vectorstore.similarity_search_with_score("What is Mental Health?")

[(Document(id='f1f9d5c1-6fca-4a9a-a97c-4c32da12a547', metadata={'category': 'Mental Health', 'id': '055c9c95-0034-48a2-ab0c-2a024efbcd11', 'source': 'https://www.samhsa.gov/mental-health', 'title': 'What is Mental Health?'}, page_content='Mental health includes our emotional, psychological, and social well-being. It affects how we think, feel, and act, and helps determine how we handle stress, relate to others, and make choices.\nMental health is important at every stage of life, from childhood and adolescence through adulthood. Over the course of your life, if you experience\nmental health problems, your thinking, mood, and behavior could be affected.\nMental Health Conditions\nMental illnesses are disorders, ranging from mild to severe, that affect a person’s thinking, mood, and/or behavior. According to the National Institute of Mental Health, nearly one-in-five adults live with a mental illness.\nMany factors contribute to mental health conditions, including:\nBiological factors, s
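`similarity_search_with_score` returns `(Document, score)` pairs. With a cosine index a higher score means a closer match, so weak matches can be dropped before they reach the prompt. A minimal sketch with made-up scores and a hypothetical threshold of 0.5; LangChain also offers a built-in route via `as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": ...})`, and a sensible threshold depends on the data:

```python
def filter_by_score(results, threshold=0.5):
    """Keep (doc, score) pairs whose similarity score is at least `threshold`."""
    return [(doc, score) for doc, score in results if score >= threshold]

# Made-up (title, score) pairs standing in for (Document, score) tuples.
results = [
    ("What is Mental Health?", 0.82),
    ("Suicide Prevention Initiatives", 0.41),
    ("How to talk about mental health?", 0.67),
]
kept = filter_by_score(results)
print([title for title, _ in kept])
# ['What is Mental Health?', 'How to talk about mental health?']
```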

In [None]:
vectorstore.similarity_search_with_score("What is PTSD?")

[(Document(id='96bbed64-a1c3-4a6e-b158-b60a6ba69e66', metadata={'category': 'Mental Illness', 'id': 'e0709607-37f7-4085-a9c4-220a95994af5', 'source': 'https://www.samhsa.gov/mental-health/post-traumatic-stress-disorder', 'title': 'Post Traumatic Stress Disorder (PTSD)'}, page_content='Post-traumatic stress disorder (PTSD) is a real disorder that develops when a person has experienced or witnessed a scary, shocking, terrifying, or dangerous event. These stressful or traumatic events usually involve a situation where someone’s life has been threatened or severe injury has occurred. Children and adults with PTSD may feel anxious or stressed even when they are not in present danger.\nCauses\nYou can get PTSD after living through or seeing a traumatic event, such as war, a natural disaster, sexual assault, physical abuse, or a bad accident. PTSD makes you feel stressed and afraid after the danger is over. It affects your life and the people around you.\nPTSD starts at different times for di

In [None]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 5})
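With `search_type="similarity"` and `k=5`, the retriever asks Pinecone for the five stored vectors most similar to the query embedding. Conceptually that is a top-k selection by cosine similarity. Pinecone typically answers with approximate nearest-neighbour search rather than a brute-force scan, so the exhaustive sketch below (toy 2-d vectors, hypothetical `top_k` helper) only illustrates the ranking, not how Pinecone computes it:

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def top_k(query, doc_vectors, k=5):
    """Return the ids of the k documents most similar to `query`, best first."""
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in doc_vectors.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]

docs = {"ptsd": [0.9, 0.1], "bipolar": [0.1, 0.9], "anxiety": [0.7, 0.3]}
print(top_k([1.0, 0.0], docs, k=2))  # ['ptsd', 'anxiety']
```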

In [None]:
retriever.invoke("What is PTSD?")

[Document(id='96bbed64-a1c3-4a6e-b158-b60a6ba69e66', metadata={'category': 'Mental Illness', 'id': 'e0709607-37f7-4085-a9c4-220a95994af5', 'source': 'https://www.samhsa.gov/mental-health/post-traumatic-stress-disorder', 'title': 'Post Traumatic Stress Disorder (PTSD)'}, page_content='Post-traumatic stress disorder (PTSD) is a real disorder that develops when a person has experienced or witnessed a scary, shocking, terrifying, or dangerous event. These stressful or traumatic events usually involve a situation where someone’s life has been threatened or severe injury has occurred. Children and adults with PTSD may feel anxious or stressed even when they are not in present danger.\nCauses\nYou can get PTSD after living through or seeing a traumatic event, such as war, a natural disaster, sexual assault, physical abuse, or a bad accident. PTSD makes you feel stressed and afraid after the danger is over. It affects your life and the people around you.\nPTSD starts at different times for dif

In [None]:
retriever.invoke("What are various ways to prevent suicide?")

 Document(id='f75140ae-cc5b-4cc4-8de2-ca0d7f086f62', metadata={'category': 'Suicide', 'id': '7b743fe8-5283-4397-b1d3-80b58d3ea0db', 'source': ' https://www.samhsa.gov/mental-health/suicide/prevention-initiatives', 'title': 'Suicide Prevention Initiatives'}, page_content='Suicide prevention is a high priority for SAMHSA and a key area of focus in\nSAMHSA’s 2023-2026 Strategic Plan\n. Below is more information about SAMHSA’s suicide prevention initiatives.\nFunding and Grant Programs\nSAMHSA’s Suicide Prevention Branch funds discretionary grant programs focused on suicide prevention, early intervention, crisis support, treatment, recovery, and postvention for youth and adults, including:\nGarrett Lee Smith State/Tribal\n: Community-based suicide prevention for youth and young adults up to age 24. This program supports states and tribes with implementing youth suicide prevention and early intervention strategies in educational settings, juvenile justice and foster care systems, substance 

In [None]:
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

llm_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", quantization_config=bnb_config)
llm_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

`low_cpu_mem_usage` was None, now default to True since model is quantized.



In [None]:
llm_tokenizer.pad_token_id = llm_tokenizer.eos_token_id
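Loading the weights in 4-bit (the `BitsAndBytesConfig` with `load_in_4bit=True`) is what lets an 8-billion-parameter model fit on a single Colab-class GPU. A rough back-of-the-envelope estimate of weight memory is parameters times bits per parameter divided by 8. This sketch ignores activations, the KV cache, quantization constants, and any layers kept in higher precision, so real usage is somewhat higher:

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params = 8e9  # Llama 3 8B, rounded to 8 billion parameters
print(weight_memory_gb(params, 16))  # 16.0  (float16 / bfloat16)
print(weight_memory_gb(params, 4))   # 4.0   (4-bit)
```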

In [None]:
llm_pipeline = pipeline(
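    "text-generation",
    model=llm_model,
    tokenizer=llm_tokenizer,
    do_sample=True,           # sample instead of greedy decoding
    temperature=0.2,          # low temperature keeps answers close to the most likely wording
    top_p=0.9,                # nucleus sampling: smallest token set covering 90% of the probability
    top_k=50,                 # consider at most the 50 most likely tokens
    repetition_penalty=1.1,   # mildly discourage repeated tokens
    max_new_tokens=100,       # cap on answer length; longer answers are cut off
    return_full_text=False,   # return only the generated answer, not the prompt
    eos_token_id=llm_tokenizer.eos_token_id,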
)
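The pipeline samples (`do_sample=True`) instead of decoding greedily, and `temperature`, `top_k`, and `top_p` reshape the next-token distribution before each draw. The sketch below is a simplified, hypothetical re-implementation of that filtering for one step (temperature scaling, then top-k, then nucleus/top-p) that shows what the three settings do. It is not the transformers code, which differs in details such as tie handling and a minimum number of kept tokens, and it ignores `repetition_penalty`.

```python
import math
import random

def filter_and_sample(logits, temperature=0.2, top_k=50, top_p=0.9, rng=None):
    """Simplified one-step sampler: temperature -> top-k -> top-p, then draw.

    Returns (sampled_token_id, kept_token_ids). A sketch, not transformers' code.
    """
    rng = rng or random.Random()
    # 1. Temperature: divide logits; values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # 2. Top-k: keep only the k highest-scoring token ids, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    best = scaled[ranked[0]]
    exps = {i: math.exp(scaled[i] - best) for i in ranked}
    total = sum(exps.values())
    probs = {i: e / total for i, e in exps.items()}
    # 3. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # 4. Draw one token id in proportion to its probability among the kept ids.
    token = rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
    return token, kept

logits = [5.0, 1.0, 0.5, 0.1]  # toy scores for a 4-token vocabulary
print(filter_and_sample(logits, temperature=0.2)[1])                       # [0]
print(filter_and_sample(logits, temperature=1.0, top_p=0.99)[1])           # [0, 1, 2]
print(filter_and_sample(logits, temperature=1.0, top_k=2, top_p=0.99)[1])  # [0, 1]
```

With `temperature=0.2` almost all probability lands on the top token, which is why the notebook's answers are close to deterministic.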

In [None]:
from langchain.llms import HuggingFacePipeline

llm_final_model = HuggingFacePipeline(pipeline=llm_pipeline)

In [None]:
# Prompt Template
template = """
You are a compassionate and knowledgeable mental health assistant that answers questions related to mental health.
Use the following pieces of retrieved context to provide a helpful and empathetic response to the user's question.
Use only the context provided, not any prior knowledge.
If you are unsure of the answer, say that you do not know the answer.
Answer only the question asked, and keep the answer short.
Avoid any additional greetings or elaborations.

Context:
------------------------------------------------------------------------------
{context}
------------------------------------------------------------------------------
Given the context and without any prior knowledge, answer the question below.
Question: {question}
Answer:
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["context", "question"]
)
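`RetrievalQA` uses the "stuff" strategy by default: the retrieved chunks are concatenated into the `{context}` slot and the user query fills `{question}`. The sketch below mimics that with plain `str.format`, assuming the chunks are joined with a blank line (`"\n\n"`), and uses a shortened stand-in template and made-up chunks rather than the real prompt. With `k=5` and chunks of up to 2000 characters, the stuffed context can reach about 10,000 characters.

```python
def build_prompt(template, chunks, question, separator="\n\n"):
    """Fill {context} with the joined chunk texts and {question} with the query."""
    context = separator.join(chunks)
    return template.format(context=context, question=question)

mini_template = "Context:\n{context}\nQuestion: {question}\nAnswer:"
chunks = [
    "Bipolar disorder causes unusual shifts in mood.",
    "Many people find ways to manage their symptoms successfully.",
]
print(build_prompt(mini_template, chunks, "What is bipolar disorder?"))
```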

In [None]:
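qa_chain = RetrievalQA.from_chain_type(
    llm_final_model,
    chain_type="stuff",                   # the default: all retrieved chunks go into one prompt
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True,         # also return the retrieved chunks
)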

In [None]:
question = "What is bipolar disorder?"

qa_chain.invoke(question)['result']

'Bipolar disorder is a serious mental illness that causes unusual shifts in mood, ranging from extreme highs (mania or “manic” episodes) to lows (depression or “depressive” episode). A person who has bipolar disorder also experiences changes in their energy, thinking, behavior, and sleep. During bipolar mood episodes, it is difficult to carry out day-to-day tasks, go to work or school, and maintain relationships.'

In [None]:
result = qa_chain.invoke(question)

In [None]:
answer = result['result']
context = result['source_documents']
print("Context:", context)
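Because the chain is built with `return_source_documents=True`, each answer comes with the chunks it was grounded on. Several chunks can come from the same page, so a short de-duplicated list of titles and URLs taken from each document's `metadata` (the `title` and `source` keys seen in the outputs above) is often easier to read than the full `Document` objects. This sketch uses a small stand-in class instead of LangChain's `Document` so that it runs offline; it also strips whitespace, since at least one `source` value above has a leading space.

```python
from dataclasses import dataclass, field

@dataclass
class FakeDoc:
    """Stand-in for LangChain's Document: only page_content and metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def list_sources(docs):
    """Return unique (title, source) pairs in retrieval order."""
    seen, sources = set(), []
    for doc in docs:
        key = (doc.metadata.get("title"), doc.metadata.get("source", "").strip())
        if key not in seen:
            seen.add(key)
            sources.append(key)
    return sources

docs = [
    FakeDoc("chunk 1", {"title": "Bipolar Disorder", "source": "https://www.samhsa.gov/mental-health/bipolar"}),
    FakeDoc("chunk 2", {"title": "Bipolar Disorder", "source": " https://www.samhsa.gov/mental-health/bipolar"}),
]
for title, url in list_sources(docs):
    print(f"- {title}: {url}")
```

In the notebook, `list_sources(context)` would work the same way on the real `Document` objects.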



In [None]:
answer

'Bipolar disorder is a serious mental illness that causes unusual shifts in mood, ranging from extreme highs (mania or “manic” episodes) to lows (depression or “depressive” episode). A person who has bipolar disorder also experiences changes in their energy, thinking, behavior, and sleep. During bipolar mood episodes, it is difficult to carry out day-to-day tasks, go to work or school, and maintain relationships.'

In [None]:
context

[Document(id='5380558b-cc22-4a89-88db-e5d2ed6ea5b1', metadata={'category': 'Mental Illness', 'id': '160ee4cd-6f02-43bd-9549-24a76ded11a5', 'source': 'https://www.samhsa.gov/mental-health/bipolar', 'title': 'Bipolar Disorder'}, page_content='Bipolar disorder is a serious mental illness that causes unusual shifts in mood, ranging from extreme highs (mania or “manic” episodes) to lows (depression or “depressive” episode).\nA person who has bipolar disorder also experiences changes in their energy, thinking, behavior, and sleep. During bipolar mood episodes, it is difficult to carry out day-to-day tasks, go to work or school, and maintain relationships.\nWhat Causes Bipolar Disorder?\nBipolar disorder affects millions of adults in the U.S. Most people are diagnosed with bipolar disorder in their teens or twenties, however, it can occur at any age and although the symptoms can persist, many find ways to manage their symptoms successfully. People are at a higher risk if they have a family hi