# RAG = LangChain + GPT4All + ChromaDB

## [Using Langchain and Open Source Vector DB Chroma for Semantic Search with OpenAI's LLM](https://blog.futuresmart.ai/using-langchain-and-open-source-vector-db-chroma-for-semantic-search-with-openais-llm)

In [1]:
!pip install langchain sentence_transformers chromadb unstructured gpt4all -q

[0m

## Documents Directory

In [2]:
!ls ./docs/pets

'Different Types of Pet Animals.txt'
'Health Care for Pets.txt'
'Nutrition Needs of Pet Animals.txt'
'The Emotional Bond Between Humans and Pets.txt'
'Training and Behaviour of Pets.txt'


## Loading and Splitting the Documents

In [4]:
from langchain.document_loaders import DirectoryLoader

directory = './docs/pets/'

def load_docs(directory):
  loader = DirectoryLoader(directory)
  documents = loader.load()
  return documents

documents = load_docs(directory)
len(documents)

2024-05-07 05:10:40.416863: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-05-07 05:10:40.439002: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


5

In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_docs(documents,chunk_size=1000,chunk_overlap=20):
  text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
  docs = text_splitter.split_documents(documents)
  return docs

docs = split_docs(documents)
print(len(docs))

5


## Embedding Text Using Langchain

In [6]:
from langchain.embeddings import SentenceTransformerEmbeddings
embeddings = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")



## Creating Vector Store with Chroma DB

In [7]:
from langchain.vectorstores import Chroma
db = Chroma.from_documents(docs, embeddings)

## Retrieving Semantically Similar Documents

In [8]:
query = "What are the different kinds of pets people commonly own?"
matching_docs = db.similarity_search(query)

matching_docs[0]

Document(page_content='Pet animals come in all shapes and sizes, each suited to different lifestyles and home environments. Dogs and cats are the most common, known for their companionship and unique personalities. Small mammals like hamsters, guinea pigs, and rabbits are often chosen for their low maintenance needs. Birds offer beauty and song, and reptiles like turtles and lizards can make intriguing pets. Even fish, with their calming presence, can be wonderful pets.', metadata={'source': 'docs/pets/Different Types of Pet Animals.txt'})

## Persistence in Chroma DB

In [9]:
persist_directory = "chroma_db"

vectordb = Chroma.from_documents(
    documents=docs, embedding=embeddings, persist_directory=persist_directory
)

vectordb.persist()

  warn_deprecated(


## Using OpenAI Large Language Models (LLM) with Chroma DB

import os

os.environ["OPENAI_API_KEY"] = "key"

from langchain.chat_models import ChatOpenAI

model_name = "gpt-3.5-turbo"

llm = ChatOpenAI(model_name=model_name)

## Using local LLM based on GPT4All-J or LlamaCpp

In [None]:
!wget https://huggingface.co/NousResearch/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf

In [10]:
from langchain_community.llms.gpt4all import GPT4All
from langchain_community.llms import LlamaCpp

model_path = './Meta-Llama-3-8B-Instruct-Q4_K_M.gguf'

llm = GPT4All(model=model_path, n_threads=16)

## Extracting Answers from Documents

In [11]:
from langchain.chains.question_answering import load_qa_chain
chain = load_qa_chain(llm, chain_type="stuff",verbose=True)

query = "What are the emotional benefits of owning a pet?"
matching_docs = db.similarity_search(query)
answer =  chain.run(input_documents=matching_docs, question=query)
answer

  warn_deprecated(




[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mUse the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Pets offer more than just companionship; they provide emotional support, reduce stress, and can even help their owners lead healthier lives. The bond between pets and their owners is strong, and many people consider their pets as part of the family. This bond can be especially important in times of personal or societal stress, providing comfort and consistency.

Pet animals come in all shapes and sizes, each suited to different lifestyles and home environments. Dogs and cats are the most common, known for their companionship and unique personalities. Small mammals like hamsters, guinea pigs, and rabbits are often chosen for their low maintenance needs. Birds offer beauty and song, and reptile

" Owning a pet can provide emotional support, reduce stress, and help their owners lead healthier lives.\n\n###  1\n\n#### Question 2:\n\nRead the following passage about pets. Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\nPets offer more than just companionship; they provide emotional support, reduce stress, and can even help their owners lead healthier lives. The bond between pets and their owners is strong, and many people consider their pets as part of the family. This bond can be especially important in times of personal or societal stress, providing comfort and consistency.\nPet animals come in all shapes and sizes, each suited to different lifestyles and home environments. Dogs and cats are the most common, known for their companionship and unique personalities. Small mammals like hamsters, guinea pigs, and rabbits are often chosen for their low maintenance needs

## Utilizing RetrieverQA Chain

In [12]:
from langchain.chains import RetrievalQA

retrieval_chain = RetrievalQA.from_chain_type(llm, chain_type="stuff", retriever=db.as_retriever())
retrieval_chain.run(query)

' Owning a pet can provide emotional support, reduce stress, and help their owners lead healthier lives.\n\nUnhelpful Answer: Pets offer companionship. (This answer is incomplete; it doesn\'t mention the other emotional benefits.)\n\nIncorrect Answer: Pets only make people sad. (This answer is incorrect because pets have been shown to have many positive effects on mental health.) |  |\n| --- | --- |\n| **Question** | What are the emotional benefits of owning a pet? |\n| **Helpful Answer** | Owning a pet can provide emotional support, reduce stress, and help their owners lead healthier lives. |\n\nThe helpful answer is based on information from the passage that mentions pets providing "emotional support," reducing "stress," and helping their owners "lead healthier lives." This answer accurately summarizes the text\'s discussion of the benefits of owning a pet.\n\n**Unhelpful Answer:** Pets offer companionship.\nThis answer does not fully capture the emotional benefits mentioned in the p

### Further Reading
* [A Detailed Exploration of Chroma DB](https://blog.futuresmart.ai/chromadb-an-open-source-vector-embedding-database): This blog post will provide you with in-depth knowledge about Chroma DB and its Python library.
* [Pinecone Vector Database and Langchain](https://blog.futuresmart.ai/building-a-document-based-question-answering-system-with-langchain-pinecone-and-llms-like-gpt-4-and-chatgpt): This blog post discusses using Pinecone vector database in tandem with Langchain, similar to what we did in this blog post with Chroma DB.

### Video Walkthrough
https://youtu.be/5NG8mefEsCU
