<a href="https://colab.research.google.com/github/raheelam98/langgraph_guru/blob/main/rag_from_scratch/rag_part_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Leaderboard

1.   [Arena](https://lmarena.ai/)
2.   [Berkeley Function-Calling Leaderboard](https://gorilla.cs.berkeley.edu/leaderboard.html)
3.   Sora video
4.   [Hugging Face: Video Generation Arena Leaderboard](https://huggingface.co/spaces/ArtificialAnalysis/Video-Generation-Arena-Leaderboard)



### RAG From Scratch

**Tutorials**

[Build a Retrieval Augmented Generation (RAG) App: Part 1](https://python.langchain.com/docs/tutorials/rag/)

[Build a Retrieval Augmented Generation (RAG) App: Part 2](https://python.langchain.com/docs/tutorials/qa_chat_history/)


[Making it easier to build human-in-the-loop agents with interrupt](https://blog.langchain.dev/making-it-easier-to-build-human-in-the-loop-agents-with-interrupt/)


[RAG From Scratch - videos](https://www.youtube.com/watch?v=wd7TZ4w1mSw&list=PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x)

[rag-from-scratch - GitHub](https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_1_to_4.ipynb)

**Basic**

* Indexing
* Retrieval
* Generation

**Advanced**

* Query transformation
* Routing
* Query construction
* Indexing
* Retrieval
* Generation




**Google Generative AI Embeddings**

In [1]:
# Install the required packages:
%%capture --no-stderr
%pip install --upgrade --quiet  langchain-google-genai

In [2]:
# Install the required packages:
%%capture --no-stderr
%pip install langchain_community tiktoken langchainhub chromadb langchain


In [3]:
import os
from google.colab import userdata

os.environ["LANGCHAIN_API_KEY"] = userdata.get('LANGCHAIN_API_KEY')

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "rag_ai_agent"

In [4]:
# API Keys
# Get the GEMINI API key from user data
from google.colab import userdata
gemini_api_key = userdata.get('GEMINI_API_KEY')

In [5]:
# Initialize the ChatGoogleGenerativeAI with the Gemini model

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",  # Specify the model to use
    max_retries=2,
    api_key=gemini_api_key    # Provide the Google API key for authentication
)

In [26]:
# Invoke the LLM with a query
result = llm.invoke("hi")
result

AIMessage(content='Hi there! How can I help you today?\n', additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': []}, id='run-d9752be9-fa39-4223-a18d-aecf440e35e2-0', usage_metadata={'input_tokens': 2, 'output_tokens': 11, 'total_tokens': 13, 'input_token_details': {'cache_read': 0}})

In [27]:
response = llm.invoke("Hi")
print(response)

content='Hi there! How can I help you today?\n' additional_kwargs={} response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': []} id='run-bc402e3e-db64-444b-a0de-c3a2b02067a6-0' usage_metadata={'input_tokens': 2, 'output_tokens': 11, 'total_tokens': 13, 'input_token_details': {'cache_read': 0}}


In [None]:
# from langchain_google_genai import GoogleGenerativeAIEmbeddings
# from langchain_core.vectorstores import InMemoryVectorStore

# embeddings = GoogleGenerativeAIEmbeddings(
#     model="models/embedding-001" , # Specify the embedding model to use
#     api_key=gemini_api_key
# )
# vector = embeddings.embed_query("hello, world!")
# vector[:5]

# embeddings.embed_query("Hello")


### **Embedding**

[Vector embeddings](https://platform.openai.com/docs/guides/embeddings#what-are-embeddings)

```
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001" ,
    google_api_key=gemini_api_key  # correct
    )

vector = embeddings.embed_query("hello, world!")
vector[:5]    
```

In [None]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001" , google_api_key=gemini_api_key)

vector = embeddings.embed_query("hello, world!")
vector[:4]

# embeddings.embed_query("Hello")


[0.05168594419956207,
 -0.030764883384108543,
 -0.03062233328819275,
 -0.02802734263241291]

In [None]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
# from langchain_core.vectorstores import InMemoryVectorStore

embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001" ,
    google_api_key=gemini_api_key  # correct
    )

vector = embeddings.embed_query("hello, world!")
vector[:10]


[0.05168594419956207,
 -0.030764883384108543,
 -0.03062233328819275,
 -0.02802734263241291,
 0.01813093200325966,
 -0.0018945280462503433,
 0.028477225452661514,
 -0.007562300190329552,
 0.011064725928008556,
 -0.005353901535272598]

## Part 2: Indexing

In [12]:
# Documents
question = "What kinds of pets do I like?"
document = "My favorite pet is a cat."

**Count the tokens in a string before embedding it (tiktoken)**

[How to count tokens with tiktoken](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb)

Note: `cl100k_base` is an OpenAI encoding, so the count below is only an approximation for Gemini models.



In [13]:
# Count the tokens in a string before embedding it
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

num_tokens_from_string(question, "cl100k_base")


8

**Text embedding models**

[Text embedding models](https://python.langchain.com/docs/integrations/text_embedding/openai/)

In [14]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embd = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001" ,
    google_api_key=gemini_api_key  # correct
    )

query_result = embd.embed_query(question)
document_result = embd.embed_query(document)

len(query_result)

#print(type(query_result))

768

In [15]:
print(type(query_result))

<class 'list'>


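**Cosine similarity is recommended for comparing embeddings (a score of 1 means the two vectors point in the same direction). The linked FAQ covers OpenAI embeddings; the same measure is applied to the Google embeddings here.**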

[Cosine similarity](https://platform.openai.com/docs/guides/embeddings#faq)

In [16]:
import numpy as np

def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)
    return dot_product / (norm_vec1 * norm_vec2)

similarity = cosine_similarity(query_result, document_result)
print("Cosine Similarity:", similarity)

Cosine Similarity: 0.8535652119095083
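
To get a feel for what the score above means, here is a minimal sketch using hand-made 2-D vectors (illustrative values, not real embeddings). The helper name `cosine` is hypothetical and uses only the standard library. It checks three cases: vectors pointing the same way score close to 1 regardless of length, perpendicular vectors score 0, and opposite vectors score close to -1.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

same_direction = cosine([1.0, 2.0], [2.0, 4.0])  # parallel, different lengths
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])      # perpendicular
opposite = cosine([1.0, 2.0], [-1.0, -2.0])      # pointing in opposite directions

print(same_direction, orthogonal, opposite)
```

Because the score depends only on direction, a long document and a short question can still score highly if they are about the same thing.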


**Document Loaders**

[Document loaders](https://python.langchain.com/docs/integrations/document_loaders/)

In [17]:
import os

# Set a custom User-Agent string
os.environ["USER_AGENT"] = "WebBaseLoaderBot/1.0 (https://lilianweng.github.io/posts/2023-06-23-agent/)"

In [18]:
#### INDEXING ####

# Load blog
import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

[How to recursively split text by characters](https://python.langchain.com/docs/how_to/recursive_text_splitter/)

In [19]:
# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300,
    chunk_overlap=50)

# Make splits
splits = text_splitter.split_documents(blog_docs)
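
To illustrate what `chunk_size` and `chunk_overlap` control, here is a simplified sliding-window sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter also tries to break on separators such as paragraphs and newlines, so real chunks vary in length). The function name `chunk_tokens` is hypothetical, and whitespace-separated words stand in for tokenizer output.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of chunk_size that share chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks

# Whitespace "tokens" stand in for real tokenizer output.
words = "a b c d e f g h i j".split()
chunks = chunk_tokens(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last token of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.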

[Vector stores](https://python.langchain.com/docs/integrations/vectorstores/)

In [20]:
# # Index

from langchain_google_genai import GoogleGenerativeAIEmbeddings

from langchain_community.vectorstores import Chroma

embd = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001" ,
    google_api_key=gemini_api_key  # correct
    )

vectorstore = Chroma.from_documents(documents=splits,
                                    embedding=embd)

retriever = vectorstore.as_retriever()


## Part 3: Retrieval

In [21]:
# Retriever
# Reuse the vector store built in Part 2; calling Chroma.from_documents again
# would embed every split a second time.
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})  # k=1: return only the top match

In [22]:
# get_relevant_documents is deprecated; invoke is the current entry point
docs = retriever.invoke("What is Task Decomposition?")


In [23]:
len(docs)

1
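
The `k` in `search_kwargs` sets how many nearest chunks come back. As a rough sketch of the idea (not Chroma's implementation, whose index and distance metric may differ), the code below scores a few hand-made vectors against a query and keeps the best `k`. The names `top_k`, `doc_vecs` and `query_vec` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, doc_vecs, k=1):
    """Return (index, score) pairs for the k vectors most similar to query_vec."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # best score first
    return scored[:k]

# Toy 2-D "embeddings"; real ones have hundreds of dimensions.
doc_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
query_vec = [0.0, 2.0]

hits = top_k(query_vec, doc_vecs, k=2)
print([index for index, _ in hits])
```

With `k=1` only the single closest chunk reaches the prompt, which keeps the context short but leaves no fallback if the top match is off-topic.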

## Part 4: Generation

In [24]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.prompts import ChatPromptTemplate

# Prompt
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
prompt

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='Answer the question based only on the following context:\n{context}\n\nQuestion: {question}\n'), additional_kwargs={})])
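
To see what the model eventually receives, the sketch below fills the same template text with plain `str.format`. This is only a stand-in: `ChatPromptTemplate` additionally wraps the result in a human message. The variable names `demo_template` and `filled` are hypothetical; the example strings are the question and document defined in Part 2.

```python
demo_template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

# Substitute both placeholders, as the prompt template does at invoke time.
filled = demo_template.format(
    context="My favorite pet is a cat.",
    question="What kinds of pets do I like?",
)
print(filled)
```

Anything placed in `context` is pasted into the prompt verbatim, so the quality of the retrieved text directly bounds the quality of the answer.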

In [25]:
# LLM
# Pass the key explicitly; without it the client falls back to the
# GOOGLE_API_KEY environment variable, which this notebook never sets.
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    google_api_key=gemini_api_key,
)

In [26]:
# Chain
chain = prompt | llm

In [27]:
# # Run  (ERROR)
chain.invoke({"context":docs,"question":"What is Task Decomposition?"})

In [None]:
from langchain import hub
prompt_hub_rag = hub.pull("rlm/rag-prompt")

In [None]:
prompt_hub_rag

**RAG chains**

In [44]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is Task Decomposition?")

'Based on the provided text, task decomposition is the process of breaking down a complex task into smaller, simpler steps.  This can be achieved through prompting an LLM (large language model) with instructions like "Steps for XYZ...,"  using task-specific instructions (e.g., "Write a story outline"), or with human input.  Techniques like Chain of Thought (CoT) and Tree of Thoughts further enhance this process by guiding the model\'s reasoning step-by-step or exploring multiple reasoning possibilities, respectively.\n'
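
In the chain above the retriever's output (a list of documents) goes into `{context}` as-is, so the prompt receives the list's string representation, metadata included. A common alternative is to join only the text of each document first. The sketch below shows that step on its own; `Doc` is a hypothetical stand-in for LangChain's `Document`, and `format_docs` is a helper name chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document (only the fields used here)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs):
    """Join the text of each retrieved document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)

retrieved = [
    Doc("Task decomposition splits a big task into steps."),
    Doc("Chain of thought prompts the model to think step by step."),
]
context = format_docs(retrieved)
print(context)
```

In a chain the same idea is usually written as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`; treat that as a sketch to verify against the LangChain version in use.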

**Steps to Debug**

In [29]:
# 1 -  Check gemini_api_key Initialization:

# print(f"gemini_api_key: {gemini_api_key}")


In [28]:
# 2 -  Validate Splits:

print(f"Number of splits: {len(splits)}")
print(splits[0] if splits else "No splits found")


Number of splits: 52
page_content='LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:

Planning

Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.
Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.


Me

In [30]:
# 3 - Check Embeddings Initialization:

print(f"Embedding Model: {embd}")

Embedding Model: client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x7ffaec5f1f00> model='models/embedding-001' task_type=None google_api_key=SecretStr('**********') credentials=None client_options=None transport=None request_options=None


In [31]:
# 4 - Ensure vectorstore Setup:

print(f"VectorStore: {vectorstore}")

VectorStore: <langchain_community.vectorstores.chroma.Chroma object at 0x7ffaecc9c370>


In [32]:
# 5 - Validate Retrieved Documents:

docs = retriever.invoke("What is Task Decomposition?")  # invoke replaces the deprecated get_relevant_documents
print(f"Number of retrieved docs: {len(docs)}")

Number of retrieved docs: 1
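
The five checks above can also be collected into one pass that keeps going after a failure. The following is a minimal sketch under that assumption; `run_checks` is a hypothetical helper, and `demo_splits` and `demo_docs` are stand-in values rather than the notebook's real `splits` and `docs`.

```python
def run_checks(checks):
    """Run (name, zero-argument callable) pairs and collect pass/fail results."""
    results = []
    for name, check in checks:
        try:
            ok = bool(check())
            detail = "" if ok else "check returned a falsy value"
        except Exception as exc:  # record the failure instead of stopping
            ok, detail = False, f"{type(exc).__name__}: {exc}"
        results.append((name, ok, detail))
    return results

# Stand-in values; in the notebook these would be the real pipeline objects.
demo_splits = ["chunk one", "chunk two"]
demo_docs = []

results = run_checks([
    ("splits are populated", lambda: len(demo_splits) > 0),
    ("retriever returned documents", lambda: len(demo_docs) > 0),
    ("first document has text", lambda: demo_docs[0] != ""),
])
for name, ok, detail in results:
    print("PASS" if ok else "FAIL", name, detail)
```

Seeing every failing stage at once makes it easier to tell an upstream problem (no splits) from a downstream one (no retrieved documents).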


**Test code**

In [46]:
# Ensure gemini_api_key is defined
assert gemini_api_key, "API Key is not defined!"

# Check if blog_docs and splits are populated
assert blog_docs, "Failed to load blog documents!"
assert splits, "Document splitting failed!"

# Validate embedding initialization
try:
    embd = GoogleGenerativeAIEmbeddings(
        model="models/embedding-001",
        google_api_key=gemini_api_key
    )
except Exception as e:
    raise ValueError(f"Error initializing embeddings: {e}")

print(f"Embedding Model: {embd}")

# Validate vector store
try:
    vectorstore = Chroma.from_documents(documents=splits, embedding=embd)
except Exception as e:
    raise ValueError(f"Error creating vectorstore: {e}")

print(f"\nVectorStore: {vectorstore}")

# Check retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})

print(f"\nretriever: {retriever}")

# Retrieve documents
try:
    docs = retriever.invoke("What is Task Decomposition?")
    assert docs, "No relevant documents found!"
except Exception as e:
    raise ValueError(f"Error retrieving documents: {e}")

print(f"\nNumber of retrieved docs: {len(docs)}")


# # Chain setup and invocation
# try:
#     response = chain.invoke({"context": docs, "question": "What is Task Decomposition?"})
#     print(response)
# except Exception as e:
#     raise ValueError(f"Error running chain: {e}")


Embedding Model: client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x7ffaec96fd00> model='models/embedding-001' task_type=None google_api_key=SecretStr('**********') credentials=None client_options=None transport=None request_options=None

VectorStore: <langchain_community.vectorstores.chroma.Chroma object at 0x7ffaec96c3a0>

retriever: tags=['Chroma', 'GoogleGenerativeAIEmbeddings'] vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7ffaec96c3a0> search_kwargs={'k': 1}

Number of retrieved docs: 1


In [34]:
# # Chain setup and invocation  (ERROR)
# try:
#     response = chain.invoke({"context": docs, "question": "What is Task Decomposition?"})
#     print(response)
# except Exception as e:
#     raise ValueError(f"Error running chain: {e}")

# print(f"response: {response}")


**Test 2**

In [35]:
#Ensure the API Key Is Correct and Passed Properly:

embd = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key=gemini_api_key
)

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    google_api_key=gemini_api_key
)


In [37]:
# # Test API Key Validity: (ERROR)

# from google.auth.transport.requests import Request
# from google.oauth2.credentials import Credentials

# # Test the key with a simple request
# creds = Credentials(token=gemini_api_key)
# request = Request()
# try:
#     creds.refresh(request)
#     print("API Key is valid.")
# except Exception as e:
#     raise ValueError(f"Invalid API Key: {e}")



In [47]:
print(f"Embedding Model: {embd}")
print(f"\nLLM Model: {llm}")
print(f"\nRetrieved Docs: {docs}")


Embedding Model: client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x7ffaec96fd00> model='models/embedding-001' task_type=None google_api_key=SecretStr('**********') credentials=None client_options=None transport=None request_options=None

LLM Model: model='models/gemini-1.5-flash' google_api_key=SecretStr('**********') client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x7ffaec9b5570> default_metadata=()

Retrieved Docs: [Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The mod

In [43]:
# # Prepare context as plain text.     # ERROR
# context = "\n\n".join([doc.page_content for doc in docs])

# # Chain setup and invocation
# try:
#     response = chain.invoke({"context": context, "question": "What is Task Decomposition?"})
#     print(f"response: {response}")
# except Exception as e:
#     raise ValueError(f"Error running chain: {e}")


In [40]:
# print(f"Context: {context}")  # ERROR