<a href="https://colab.research.google.com/github/hhnafis/Simple_Langchain_RAG_app_using_Chroma/blob/main/Getting_started_with_ChromaDB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Install the langchain-chroma integration package

In [12]:
pip install -Uq langchain-chroma langchain-google-genai

## Setting up the environment variable

In [16]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from google.colab import userdata

import os
os.environ['GOOGLE_API_KEY'] = userdata.get("GOOGLE_API_KEY")
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
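Outside Colab, `google.colab.userdata` is not available. A minimal sketch of one fallback, assuming the key has been exported as an environment variable; the helper name `load_api_key` is hypothetical and not part of any library:

```python
import os


def load_api_key(name: str = "GOOGLE_API_KEY", env=None) -> str:
    """Return the API key from the environment, failing loudly if it is missing."""
    # `env` is injectable for testing; by default the real process environment is used
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the notebook")
    return key
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later by the embedding call.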

In [42]:
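!pip install -Uq pypdf langchain-community langchain-text-splitters
# The langchain.text_splitter and langchain.document_loaders import paths are deprecated
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader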

loader = PyPDFLoader("/content/AI_applications.pdf")
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=10)
docs = text_splitter.split_documents(documents)
# print(docs)


for doc in docs:
  doc.metadata["source"] = "AI_applications.pdf"
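To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping character windows. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraphs and sentences); `split_with_overlap` is a hypothetical helper for illustration only:

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 10) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the role the 10-character overlap plays in the cell above.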




In [43]:
from langchain_chroma import Chroma  # provided by the langchain-chroma package installed above

vector_store = Chroma(
    collection_name="AI_introduction",
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",  # Where to save data locally, remove if not necessary
)

from uuid import uuid4
from langchain_core.documents import Document

# Wrap each chunk in a Document tagged with its source, and generate a unique ID per chunk
chunks = [
    Document(page_content=doc.page_content, metadata={"source": "AI_applications.pdf"})
    for doc in docs
]
ids = [str(uuid4()) for _ in chunks]
vector_store.add_documents(documents=chunks, ids=ids)  # Add all chunks to Chroma in one call
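Because `uuid4()` yields a fresh ID on every call, re-running the cell above against the same `persist_directory` stores every chunk again. One possible way around this is to derive IDs from the chunk itself, assuming the store replaces entries whose ID already exists (worth verifying for the Chroma version in use). `stable_id` is a hypothetical helper:

```python
from uuid import NAMESPACE_URL, uuid5


def stable_id(source: str, chunk_index: int, text: str) -> str:
    """Derive a repeatable UUID from the chunk's source, position, and content."""
    return str(uuid5(NAMESPACE_URL, f"{source}#{chunk_index}:{text}"))


first = stable_id("AI_applications.pdf", 0, "some chunk text")
again = stable_id("AI_applications.pdf", 0, "some chunk text")
print(first == again)  # True
```

The resulting strings can be passed as `ids` in place of the `uuid4()` values.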

In [44]:
# Run a similarity search restricted to the source PDF
results = vector_store.similarity_search(
    "What is Artificial Intelligence?",
    k=1,
    filter={"source": "AI_applications.pdf"}
)
for res in results:
    print(f"{res.page_content} [{res.metadata}]")

It is the science and engineering of making intelligent machines, especially intelligent computer programs. It 
is related to the similar task of using computers to understand human intelligence, but AI does not have to 
confine itself to methods that are biologically observable. While no consensual definition of Artificial 
Intelligence (AI) exists, AI is broadly characterized as the study of computations that allow for perception, [{'source': 'AI_applications.pdf'}]
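The roles of `k` and `filter` in the search above can be sketched with a toy in-memory version. This is an illustration only: the two-dimensional vectors are hand-made stand-ins for real embeddings, cosine similarity is an assumption chosen for clarity, and Chroma's actual index and distance metric may differ. `toy_similarity_search` is a hypothetical name:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_similarity_search(query_vec, records, k=1, filter=None):
    """records: list of (vector, text, metadata). Return the k texts closest to query_vec."""
    filter = filter or {}
    # Keep only records whose metadata matches every key/value pair in the filter
    candidates = [
        r for r in records
        if all(r[2].get(key) == value for key, value in filter.items())
    ]
    ranked = sorted(candidates, key=lambda r: cosine(query_vec, r[0]), reverse=True)
    return [text for _, text, _ in ranked[:k]]


records = [
    ([1.0, 0.0], "AI definition", {"source": "AI_applications.pdf"}),
    ([0.0, 1.0], "Cooking tips", {"source": "recipes.pdf"}),
    ([0.9, 0.1], "AI history", {"source": "other.pdf"}),
]
print(toy_similarity_search([1.0, 0.0], records, k=1, filter={"source": "AI_applications.pdf"}))
# ['AI definition']
```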


In [45]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    verbose=True,
)

In [46]:
def answer_to_user(query: str):
    # Retrieve the chunk most similar to the query from the vector store
    vector_results = vector_store.similarity_search(query, k=1)

    # Pass the user query and the retrieved references to the model
    final_answer = llm.invoke(
        f"ANSWER THIS USER QUERY: {query}, Here are some references {vector_results}"
    )

    return final_answer
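The f-string above interpolates the list of retrieved documents directly, so the model receives their Python repr. A possible alternative is to format the references as labeled plain text before calling the model. This is a sketch under assumptions: `Doc` is a hypothetical stand-in for any object exposing `page_content` and `metadata`, and `build_prompt` is not part of LangChain:

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document with text and metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_prompt(query: str, docs: list) -> str:
    """Format the query and the retrieved chunks as labeled plain text."""
    references = "\n\n".join(
        f"[{i}] (source: {d.metadata.get('source', 'unknown')})\n{d.page_content}"
        for i, d in enumerate(docs, start=1)
    )
    return (
        "Answer the user query using only the references below.\n\n"
        f"Query: {query}\n\n"
        f"References:\n{references}"
    )


print(build_prompt("What is AI?", [Doc("AI is ...", {"source": "AI_applications.pdf"})]))
```

The returned string could be passed to `llm.invoke` in place of the f-string; whether it improves answers would need to be checked on real queries.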

In [47]:
answer = answer_to_user("What is artificial intelligence?")
answer.content

'Based on the provided text, Artificial Intelligence (AI) is a system that possesses thinking power and basic skills such as learning, reasoning, self-improvement (through experience), language understanding, and problem-solving.  The document highlights its use in many fields, particularly technology, and its projected job creation potential.'

In [48]:
# Deploy this RAG app as an API using FastAPI
# Install the necessary libraries (only needed once)
!pip install -Uq fastapi uvicorn pypdf langchain-chroma langchain-google-genai langchain-community langchain-text-splitters
import os
from uuid import uuid4

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from google.colab import userdata
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.documents import Document
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

app = FastAPI()

# --- (Existing code from the original notebook) ---
os.environ['GOOGLE_API_KEY'] = userdata.get("GOOGLE_API_KEY")  # Replace with your actual key retrieval method
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

loader = PyPDFLoader("/content/AI_applications.pdf")  # Ensure the file path is correct
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=10)
docs = text_splitter.split_documents(documents)

for doc in docs:
    doc.metadata["source"] = "AI_applications.pdf"

vector_store = Chroma(
    collection_name="AI_introduction",
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",
)

# Note: re-running this against the same persist_directory adds the chunks again under new IDs
chunks = [
    Document(page_content=doc.page_content, metadata={"source": "AI_applications.pdf"})
    for doc in docs
]
vector_store.add_documents(documents=chunks, ids=[str(uuid4()) for _ in chunks])

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    verbose=True,
)
# --- End of existing code ---

def answer_to_user(query: str):
    vector_results = vector_store.similarity_search(query, k=1)
    final_answer = llm.invoke(
        f"ANSWER THIS USER QUERY: {query}, Here are some references {vector_results}"
    )
    return final_answer.content


class Query(BaseModel):
    query: str


@app.post("/query/")
async def query_endpoint(query_data: Query):
    try:
        response = answer_to_user(query_data.query)
        return {"response": response}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# To run the server, save this code as main.py and start uvicorn from a terminal
# (replace main with the name of your Python file if different):
# uvicorn main:app --reload
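Once the server is running, the endpoint expects a JSON body matching the `Query` model. A minimal client sketch using only the standard library, assuming the server listens on uvicorn's default address `http://127.0.0.1:8000`; `build_query_request` is a hypothetical helper:

```python
import json
import urllib.request


def build_query_request(query: str, base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build a POST request whose JSON body matches the Query model ({"query": ...})."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("What is AI?")
# With the server running, send the request and read the {"response": ...} payload:
# with urllib.request.urlopen(request) as reply:
#     print(json.load(reply)["response"])
```

The network call is left commented out so the snippet can be run without a live server.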



In [51]:
test_query = "What is AI?"
test_answer = answer_to_user(test_query)

# Print the question and answer in the desired format
print(f"Question:\n{test_query}\n\nResponse (Generated by Gemini):\n\"{test_answer}\"")

Question:
What is AI?

Response (Generated by Gemini):
"Based on the provided text from `AI_applications.pdf`, Artificial Intelligence (AI) is the science and engineering of creating intelligent machines, primarily intelligent computer programs.  While there's no single, universally agreed-upon definition, AI generally involves studying computations that enable perception.  It's related to using computers to understand human intelligence, but AI isn't limited to methods found in biological systems."
