In [1]:
# Basic RAG Based Application

In [None]:
# End to End RAG Based Application -
    # Input Data (External Knowledge Base)
    # Splitting - Text Splitters
    # Vector Stores / Vector DBs
    # Integrate LLMs
    # Q&A System
    # UI

In [2]:
# Step 1 - Generate the external knowledge as sample text (uploading PDFs/docs etc. is problematic here)

In [None]:
# Prompt to LLM =>
# Let's build an end-to-end RAG-based application that starts with text splitters and ends with a Gradio UI. To start, generate some sample text (external knowledge), about 100 lines on Gen-AI and Langchain, and store it in a variable.


In [5]:
# Generate sample text about Gen-AI and Langchain
sample_text = """
Generative AI, or Gen-AI, is a type of artificial intelligence that can create new content, such as text, images, audio, and video. Unlike traditional AI that focuses on analysis and prediction, Gen-AI is about creation and innovation. Large Language Models (LLMs) are a prominent example of Gen-AI, trained on vast amounts of text data to understand and generate human-like language.

Langchain is a framework designed to simplify the development of applications using large language models. It provides a structured way to chain together different components, such as LLMs, data sources, and other tools, to build more complex and powerful applications. Langchain's modular design allows developers to easily swap out components and experiment with different configurations.

One of the core concepts in Langchain is the "chain," which represents a sequence of operations performed by different components. For example, a simple chain might involve taking user input, passing it to an LLM to generate a response, and then formatting the response before presenting it to the user. More complex chains can involve retrieving information from external knowledge bases, performing calculations, or interacting with other APIs.

Retrieval Augmented Generation (RAG) is a technique that combines the power of LLMs with the ability to retrieve relevant information from external sources. In a RAG system, the LLM's generation process is augmented by retrieved information, leading to more accurate, factual, and contextually relevant responses. Langchain provides built-in support for building RAG applications, making it easier to integrate external knowledge into LLM-powered systems.

Text splitting is a crucial step in preparing large documents or datasets for use with LLMs and RAG systems. Since LLMs have limitations on the amount of text they can process at once, large texts need to be split into smaller chunks. Various text splitting strategies exist, such as splitting by characters, words, or sentences, and different strategies may be more suitable for different types of data and applications.

Vector stores, also known as vector databases, are specialized databases designed to store and search for vector representations of data. In the context of LLMs and RAG, text chunks are often converted into numerical vectors (embeddings) using embedding models. These vectors capture the semantic meaning of the text, allowing for efficient similarity search. When a user query is received, it is also converted into a vector, and the vector store is queried to find the most similar text chunks from the external knowledge base.

Integrating LLMs into applications involves selecting an appropriate LLM, configuring its parameters, and handling the input and output. Langchain provides connectors for various LLMs, including popular models from OpenAI, Google, and others. It also offers tools for managing prompts, parsing model outputs, and handling conversational flows.

Building a Question Answering (Q&A) system using LLMs and RAG involves several steps:
1. Loading the external knowledge base.
2. Splitting the knowledge base into smaller chunks.
3. Generating embeddings for the text chunks and storing them in a vector store.
4. Receiving a user query.
5. Generating an embedding for the user query.
6. Searching the vector store for relevant text chunks based on the query embedding.
7. Passing the retrieved text chunks and the user query to an LLM to generate an answer.
8. Presenting the answer to the user.

User interfaces (UIs) are essential for interacting with RAG-based applications. Gradio is a popular Python library for building simple and interactive UIs for machine learning models and demos. It allows developers to quickly create web interfaces with input and output components, such as text boxes, image displays, and audio players. Integrating Gradio with Langchain enables the creation of user-friendly Q&A interfaces that allow users to ask questions and receive answers generated by the RAG system.

Langchain's expressiveness allows for the creation of complex chains involving multiple steps and components. For instance, a chain might involve retrieving information, summarizing it with an LLM, and then using another LLM to generate a response based on the summary. This flexibility makes Langchain suitable for a wide range of natural language processing tasks.

The choice of text splitter can significantly impact the performance of a RAG system. Different splitters have different parameters, such as chunk size and overlap, which can be tuned to optimize the retrieval process. Experimenting with different splitters and parameters is often necessary to find the best approach for a given dataset and application.

Embedding models play a crucial role in converting text into meaningful vector representations. The quality of the embeddings directly affects the accuracy of the similarity search in the vector store. Various embedding models are available, each with its strengths and weaknesses. Selecting an appropriate embedding model is an important consideration when building a RAG system.

Vector stores provide efficient mechanisms for storing and searching high-dimensional vectors. Different vector stores offer different features and performance characteristics. Some popular vector stores include FAISS, Annoy, Pinecone, and Weaviate. The choice of vector store depends on factors such as the size of the knowledge base, the required search performance, and the desired scalability.

Integrating LLMs with external APIs allows for the creation of applications that can interact with real-world services. For example, an LLM-powered application could use an API to retrieve weather information, make a reservation, or send an email. Langchain provides tools for integrating with various APIs, enabling the creation of more powerful and versatile applications.

Handling conversational history is important for building engaging and contextually aware Q&A systems. Langchain provides mechanisms for managing conversational memory, allowing the LLM to remember previous turns in the conversation and generate responses that are consistent with the ongoing dialogue.

Error handling and robustness are important considerations when building RAG-based applications. Potential issues include errors during text loading, splitting, embedding, retrieval, or LLM generation. Implementing proper error handling mechanisms is essential to ensure the application's stability and reliability.

Evaluating the performance of a RAG system involves measuring its ability to retrieve relevant information and generate accurate answers. Various evaluation metrics can be used, such as precision, recall, F1-score, and ROUGE. Evaluating the system's performance on a representative dataset is crucial for identifying areas for improvement.

Fine-tuning LLMs on specific datasets can further improve their performance on particular tasks. While pre-trained LLMs are powerful, fine-tuning them on a domain-specific dataset can lead to more accurate and relevant responses in that domain. Langchain can be used to integrate fine-tuned LLMs into RAG systems.

Deploying RAG-based applications involves packaging the application components and making them accessible to users. Various deployment options are available, such as deploying the application as a web service, a desktop application, or a mobile application.

Security and privacy are important considerations when building RAG-based applications, especially when dealing with sensitive data. Protecting the external knowledge base and ensuring the privacy of user queries are crucial.

The field of Gen-AI and LLMs is constantly evolving, with new models and techniques being developed regularly. Staying updated with the latest advancements is important for building cutting-edge RAG-based applications.

Langchain's community and documentation are valuable resources for developers building RAG-based applications. The community provides support, shares examples, and contributes to the framework's development. The documentation provides detailed information on Langchain's components and features.

Building a successful RAG-based application requires a combination of technical skills, domain expertise, and careful consideration of the application's requirements and constraints.

Langchain simplifies the process of building complex LLM applications by providing a structured and modular framework. Its focus on chains and components makes it easier to reason about and manage the different parts of an application.

The ability to integrate external knowledge is a key advantage of RAG systems. By leveraging external data sources, RAG systems can overcome the limitations of the LLM's internal knowledge and provide more accurate and up-to-date information.

Text splitting strategies can be tailored to the specific characteristics of the text data. For example, splitting by sentences might be suitable for conversational data, while splitting by paragraphs might be more appropriate for documents.

Embedding models are trained on different types of data and have varying levels of performance on different tasks. Choosing an embedding model that is well-suited to the domain of the external knowledge base can improve the retrieval accuracy.

Vector stores offer different indexing techniques and search algorithms, which can impact the search performance. Selecting a vector store that provides efficient search for the desired scale and dimensionality of the vectors is important.

Integrating external APIs can extend the capabilities of RAG-based applications and enable them to interact with the real world. This allows for the creation of applications that can perform actions based on the information retrieved from the external knowledge base.

Managing conversational memory is essential for building engaging and natural-sounding Q&A systems. By remembering the conversational history, the LLM can maintain context and generate responses that are relevant to the ongoing dialogue.

Implementing proper error handling mechanisms is crucial for ensuring the stability and reliability of RAG-based applications. This involves anticipating potential errors and implementing strategies to handle them gracefully.

Evaluating the performance of a RAG system is an iterative process that involves refining the system's components and parameters based on the evaluation results.

Fine-tuning LLMs on specific datasets can improve their performance on particular tasks and domains. This can be especially beneficial when dealing with specialized or technical knowledge bases.

Deploying RAG-based applications requires careful planning and consideration of the deployment environment and infrastructure.

Security and privacy should be considered throughout the development process of RAG-based applications. Implementing appropriate security measures is essential to protect sensitive data and ensure user privacy.

Staying updated with the latest advancements in Gen-AI and LLMs is important for building cutting-edge RAG-based applications. The field is rapidly evolving, with new models and techniques being developed regularly.

Langchain's community and documentation are valuable resources for developers seeking support and information on building RAG-based applications.

Building a successful RAG-based application requires a combination of technical skills, domain expertise, and careful consideration of the application's requirements and constraints.

Langchain simplifies the process of building complex LLM applications by providing a structured and modular framework. Its focus on chains and components makes it easier to reason about and manage the different parts of an application.

The ability to integrate external knowledge is a key advantage of RAG systems. By leveraging external data sources, RAG systems can overcome the limitations of the LLM's internal knowledge and provide more accurate and up-to-date information.

Text splitting strategies can be tailored to the specific characteristics of the text data. For example, splitting by sentences might be suitable for conversational data, while splitting by paragraphs might be more appropriate for documents.

Embedding models are trained on different types of data and have varying levels of performance on different tasks. Choosing an embedding model that is well-suited to the domain of the external knowledge base can improve the retrieval accuracy.

Vector stores offer different indexing techniques and search algorithms, which can impact the search performance. Selecting a vector store that provides efficient search for the desired scale and dimensionality of the vectors is important.

Integrating external APIs can extend the capabilities of RAG-based applications and enable them to interact with the real world. This allows for the creation of applications that can perform actions based on the information retrieved from the external knowledge base.

Managing conversational memory is essential for building engaging and natural-sounding Q&A systems. By remembering the conversational history, the LLM can maintain context and generate responses that are relevant to the ongoing dialogue.

Implementing proper error handling mechanisms is crucial for ensuring the stability and reliability of RAG-based applications. This involves anticipating potential errors and implementing strategies to handle them gracefully.

Evaluating the performance of a RAG system is an iterative process that involves refining the system's components and parameters based on the evaluation results.

Fine-tuning LLMs on specific datasets can improve their performance on particular tasks and domains. This can be especially beneficial when dealing with specialized or technical knowledge bases.

Deploying RAG-based applications requires careful planning and consideration of the deployment environment and infrastructure.

Security and privacy should be considered throughout the development process of RAG-based applications. Implementing appropriate security measures is essential to protect sensitive data and ensure user privacy.

Staying updated with the latest advancements in Gen-AI and LLMs is important for building cutting-edge RAG-based applications. The field is rapidly evolving, with new models and techniques being developed regularly.

Langchain's community and documentation are valuable resources for developers seeking support and information on building RAG-based applications.

Building a successful RAG-based application requires a combination of technical skills, domain expertise, and careful consideration of the application's requirements and constraints.

Langchain simplifies the process of building complex LLM applications by providing a structured and modular framework. Its focus on chains and components makes it easier to reason about and manage the different parts of an application.

The ability to integrate external knowledge is a key advantage of RAG systems. By leveraging external data sources, RAG systems can overcome the limitations of the LLM's internal knowledge and provide more accurate and up-to-date information.

Text splitting strategies can be tailored to the specific characteristics of the text data. For example, splitting by sentences might be suitable for conversational data, while splitting by paragraphs might be more appropriate for documents.

Embedding models are trained on different types of data and have varying levels of performance on different tasks. Choosing an embedding model that is well-suited to the domain of the external knowledge base can improve the retrieval accuracy.

Vector stores offer different indexing techniques and search algorithms, which can impact the search performance. Selecting a vector store that provides efficient search for the desired scale and dimensionality of the vectors is important.

Integrating external APIs can extend the capabilities of RAG-based applications and enable them to interact with the real world. This allows for the creation of applications that can perform actions based on the information retrieved from the external knowledge base.

Managing conversational memory is essential for building engaging and natural-sounding Q&A systems. By remembering the conversational history, the LLM can maintain context and generate responses that are relevant to the ongoing dialogue.

Implementing proper error handling mechanisms is crucial for ensuring the stability and reliability of RAG-based applications. This involves anticipating potential errors and implementing strategies to handle them gracefully.

Evaluating the performance of a RAG system is an iterative process that involves refining the system's components and parameters based on the evaluation results.

Fine-tuning LLMs on specific datasets can improve their performance on particular tasks and domains. This can be especially beneficial when dealing with specialized or technical knowledge bases.

Deploying RAG-based applications requires careful planning and consideration of the deployment environment and infrastructure.

Security and privacy should be considered throughout the development process of RAG-based applications. Implementing appropriate security measures is essential to protect sensitive data and ensure user privacy.

Staying updated with the latest advancements in Gen-AI and LLMs is important for building cutting-edge RAG-based applications. The field is rapidly evolving, with new models and techniques being developed regularly.

Langchain's community and documentation are valuable resources for developers seeking support and information on building RAG-based applications.

Building a successful RAG-based application requires a combination of technical skills, domain expertise, and careful consideration of the application's requirements and constraints.
"""

In [11]:
# Generate sample text about Gen-AI and Langchain with topic headers and chapter names
sample_text_with_chapters = """
# Chapter 1: Introduction to Generative AI
## What is Gen-AI?
Generative AI, or Gen-AI, is a type of artificial intelligence that can create new content, such as text, images, audio, and video. Unlike traditional AI that focuses on analysis and prediction, Gen-AI is about creation and innovation.

## Large Language Models (LLMs)
Large Language Models (LLMs) are a prominent example of Gen-AI, trained on vast amounts of text data to understand and generate human-like language. They are the backbone of many modern AI applications.

# Chapter 2: Understanding Langchain
## What is Langchain?
Langchain is a framework designed to simplify the development of applications using large language models. It provides a structured way to chain together different components, such as LLMs, data sources, and other tools.

## The Concept of Chains
One of the core concepts in Langchain is the "chain," which represents a sequence of operations performed by different components. This allows for building complex workflows.

# Chapter 3: Retrieval Augmented Generation (RAG)
## RAG Explained
Retrieval Augmented Generation (RAG) is a technique that combines the power of LLMs with the ability to retrieve relevant information from external sources. This enhances the accuracy and relevance of generated responses.

## RAG in Langchain
Langchain provides built-in support for building RAG applications, making it easier to integrate external knowledge into LLM-powered systems.

# Chapter 4: Data Preparation for RAG
## Text Splitting
Text splitting is a crucial step in preparing large documents or datasets for use with LLMs and RAG systems. Large texts need to be split into smaller chunks due to LLM limitations.

## Text Splitting Strategies
Various text splitting strategies exist, such as splitting by characters, words, or sentences, and different strategies may be more suitable for different types of data.

# Chapter 5: Vector Stores and Embeddings
## Vector Stores
Vector stores, also known as vector databases, are specialized databases designed to store and search for vector representations of data.

## Embeddings
Text chunks are often converted into numerical vectors (embeddings) using embedding models. These vectors capture the semantic meaning of the text, allowing for efficient similarity search.

# Chapter 6: Integrating LLMs
## Selecting and Configuring LLMs
Integrating LLMs into applications involves selecting an appropriate LLM, configuring its parameters, and handling the input and output.

## Langchain Connectors
Langchain provides connectors for various LLMs, including popular models from OpenAI, Google, and others, simplifying the integration process.

# Chapter 7: Building a Q&A System
## Steps in Building a Q&A System
Building a Question Answering (Q&A) system using LLMs and RAG involves several steps, from loading data to generating answers.

## Q&A Workflow
The workflow typically includes loading data, splitting text, generating embeddings, storing in a vector store, receiving queries, retrieving relevant chunks, and generating answers with an LLM.

# Chapter 8: User Interfaces with Gradio
## Importance of UIs
User interfaces (UIs) are essential for interacting with RAG-based applications, providing a way for users to input queries and receive responses.

## Using Gradio
Gradio is a popular Python library for building simple and interactive UIs for machine learning models and demos. It allows quickly creating web interfaces.

## Gradio and Langchain
Integrating Gradio with Langchain enables the creation of user-friendly Q&A interfaces that allow users to ask questions and receive answers.

# Chapter 9: Advanced Langchain Concepts
## Expressiveness of Chains
Langchain's expressiveness allows for the creation of complex chains involving multiple steps and components, suitable for a wide range of NLP tasks.

## Customizing Splitters
The choice of text splitter can significantly impact RAG performance. Different splitters have parameters that can be tuned for optimization.

# Chapter 10: Embedding Models in Detail
## Role of Embedding Models
Embedding models play a crucial role in converting text into meaningful vector representations. The quality of embeddings affects similarity search accuracy.

## Selecting Embedding Models
Various embedding models are available, each with strengths and weaknesses. Selecting an appropriate model for the domain is important.

# Chapter 11: Exploring Vector Stores
## Vector Store Options
Different vector stores offer varying features and performance. Popular options include FAISS, Annoy, Pinecone, and Weaviate.

## Choosing a Vector Store
The choice of vector store depends on factors like data size, required search performance, and desired scalability.

# Chapter 12: Integrating External APIs
## Extending Capabilities
Integrating LLMs with external APIs allows for creating applications that interact with real-world services, extending their capabilities.

## Langchain API Tools
Langchain provides tools for integrating with various APIs, enabling the creation of more powerful and versatile applications.

# Chapter 13: Conversational Memory
## Handling Conversation History
Handling conversational history is important for building engaging and contextually aware Q&A systems.

## Langchain Memory Mechanisms
Langchain provides mechanisms for managing conversational memory, allowing the LLM to remember previous turns and generate consistent responses.

# Chapter 14: Error Handling and Robustness
## Importance of Error Handling
Error handling and robustness are important considerations when building RAG applications to ensure stability and reliability.

## Implementing Error Handling
Implementing proper error handling mechanisms is essential to anticipate potential errors during different stages of the RAG process.

# Chapter 15: Evaluating RAG Systems
## Evaluating Performance
Evaluating the performance of a RAG system involves measuring its ability to retrieve relevant information and generate accurate answers.

## Evaluation Metrics
Various evaluation metrics can be used, such as precision, recall, F1-score, and ROUGE, to assess system performance.

# Chapter 16: Fine-tuning LLMs
## Improving Performance
Fine-tuning LLMs on specific datasets can further improve their performance on particular tasks and domains.

## Fine-tuning with Langchain
Langchain can be used to integrate fine-tuned LLMs into RAG systems, leveraging their specialized knowledge.

# Chapter 17: Deploying RAG Applications
## Deployment Options
Deploying RAG-based applications involves packaging components and making them accessible to users through various options like web services or desktop applications.

## Planning Deployment
Deploying requires careful planning and consideration of the deployment environment and infrastructure.

# Chapter 18: Security and Privacy
## Protecting Sensitive Data
Security and privacy are important considerations, especially when dealing with sensitive data in RAG applications.

## Implementing Security Measures
Implementing appropriate security measures is essential to protect the external knowledge base and ensure user privacy.

# Chapter 19: Staying Updated
## Evolving Field
The field of Gen-AI and LLMs is constantly evolving, with new models and techniques regularly developed.

## Importance of Staying Updated
Staying updated with the latest advancements is important for building cutting-edge RAG applications.

# Chapter 20: Resources and Community
## Langchain Resources
Langchain's community and documentation are valuable resources for developers seeking support and information.

## Building Successful Applications
Building a successful RAG application requires technical skills, domain expertise, and careful consideration of requirements.
"""

In [7]:
print(sample_text)


Generative AI, or Gen-AI, is a type of artificial intelligence that can create new content, such as text, images, audio, and video. Unlike traditional AI that focuses on analysis and prediction, Gen-AI is about creation and innovation. Large Language Models (LLMs) are a prominent example of Gen-AI, trained on vast amounts of text data to understand and generate human-like language.

Langchain is a framework designed to simplify the development of applications using large language models. It provides a structured way to chain together different components, such as LLMs, data sources, and other tools, to build more complex and powerful applications. Langchain's modular design allows developers to easily swap out components and experiment with different configurations.

One of the core concepts in Langchain is the "chain," which represents a sequence of operations performed by different components. For example, a simple chain might involve taking user input, passing it to an LLM to gener

In [9]:
# Step 2 => Create Chunks of External Knowledge by using Text Splitters

# Prompt => Split the above 'sample_text' using the basic character splitter (CharacterTextSplitter) from Langchain's text splitters and display a few chunks.

In [10]:
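# The splitter classes live in the langchain_text_splitters package (see the class path in the output below)
from langchain_text_splitters import CharacterTextSplitter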

# Initialize the CharacterTextSplitter
# chunk_size and chunk_overlap can be adjusted based on your needs
text_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    is_separator_regex=False,
)

# Split the sample_text
texts = text_splitter.create_documents([sample_text])

# Display a few chunks
print(type(sample_text))
print(type(text_splitter))
print(type(texts))

print(f"Number of chunks: {len(texts)}")
print("First 5 chunks:")
for i, text in enumerate(texts[:5]):
    print(f"--- Chunk {i+1} ---")
    print(text.page_content)

<class 'str'>
<class 'langchain_text_splitters.character.CharacterTextSplitter'>
<class 'list'>
Number of chunks: 21
First 5 chunks:
--- Chunk 1 ---
Generative AI, or Gen-AI, is a type of artificial intelligence that can create new content, such as text, images, audio, and video. Unlike traditional AI that focuses on analysis and prediction, Gen-AI is about creation and innovation. Large Language Models (LLMs) are a prominent example of Gen-AI, trained on vast amounts of text data to understand and generate human-like language.

Langchain is a framework designed to simplify the development of applications using large language models. It provides a structured way to chain together different components, such as LLMs, data sources, and other tools, to build more complex and powerful applications. Langchain's modular design allows developers to easily swap out components and experiment with different configurations.
--- Chunk 2 ---
One of the core concepts in Langchain is the "chain," whic
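
In [None]:
# A minimal sketch (not part of the original run) of how separator, chunk_size and
# chunk_overlap interact. It is a simplified illustration, not LangChain's implementation:
# `split_by_separator` is a hypothetical helper and the toy text is made up. The text is
# cut at the separator, then pieces are merged greedily until the next one would exceed
# chunk_size. That would explain why Chunk 1 above holds two paragraphs: together they
# stay under 1000 characters.
```python
def split_by_separator(text, separator="\n\n", chunk_size=1000, chunk_overlap=200):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    Hypothetical helper for illustration only; not LangChain's implementation.
    """
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks = []
    current = []  # pieces that make up the chunk being built

    def joined_len(parts):
        return len(separator.join(parts))

    for piece in pieces:
        if current and joined_len(current + [piece]) > chunk_size:
            # The next piece does not fit, so close the current chunk.
            chunks.append(separator.join(current))
            # Keep trailing pieces as overlap while they fit in chunk_overlap
            # and still leave room for the incoming piece.
            while current and (
                joined_len(current) > chunk_overlap
                or joined_len(current + [piece]) > chunk_size
            ):
                current.pop(0)
        # A piece longer than chunk_size ends up as its own oversized chunk.
        current.append(piece)

    if current:
        chunks.append(separator.join(current))
    return chunks


# Toy input: four 40-character paragraphs.
toy_text = "\n\n".join(letter * 40 for letter in "abcd")
toy_chunks = split_by_separator(toy_text, chunk_size=100, chunk_overlap=50)
for i, chunk in enumerate(toy_chunks, start=1):
    print(f"--- Chunk {i} ({len(chunk)} chars) ---")
    print(chunk)
```
# With chunk_overlap=50, each new chunk starts with the last paragraph of the previous one,
# so context that straddles a chunk boundary is not lost at retrieval time.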

In [12]:
print(sample_text_with_chapters)


# Chapter 1: Introduction to Generative AI
## What is Gen-AI?
Generative AI, or Gen-AI, is a type of artificial intelligence that can create new content, such as text, images, audio, and video. Unlike traditional AI that focuses on analysis and prediction, Gen-AI is about creation and innovation.

## Large Language Models (LLMs)
Large Language Models (LLMs) are a prominent example of Gen-AI, trained on vast amounts of text data to understand and generate human-like language. They are the backbone of many modern AI applications.

# Chapter 2: Understanding Langchain
## What is Langchain?
Langchain is a framework designed to simplify the development of applications using large language models. It provides a structured way to chain together different components, such as LLMs, data sources, and other tools.

## The Concept of Chains
One of the core concepts in Langchain is the "chain," which represents a sequence of operations performed by different components. This allows for building co

In [None]:
# Prompt => Split the above 'sample_text_with_chapters' using text splitters in Langchain - the chunks/splits should be based on each 'chapter name'.

In [13]:
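# The splitter classes live in the langchain_text_splitters package (see the class path in the output below)
from langchain_text_splitters import MarkdownHeaderTextSplitter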

# Define the headers to split on
headers_to_split_on = [
    ("#", "Chapter"),
    ("##", "Topic"),
]

# Initialize the MarkdownHeaderTextSplitter
markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)

# Split the sample_text_with_chapters
md_header_splits = markdown_splitter.split_text(sample_text_with_chapters)

# Display a few chunks
print(type(headers_to_split_on))
print(type(markdown_splitter))
print(type(md_header_splits))


print(f"Number of chunks: {len(md_header_splits)}")
print("First 5 chunks:")
for i, chunk in enumerate(md_header_splits[:5]):
    print(f"--- Chunk {i+1} ---")
    print(chunk)

<class 'list'>
<class 'langchain_text_splitters.markdown.MarkdownHeaderTextSplitter'>
<class 'list'>
Number of chunks: 41
First 5 chunks:
--- Chunk 1 ---
page_content='Generative AI, or Gen-AI, is a type of artificial intelligence that can create new content, such as text, images, audio, and video. Unlike traditional AI that focuses on analysis and prediction, Gen-AI is about creation and innovation.' metadata={'Chapter': 'Chapter 1: Introduction to Generative AI', 'Topic': 'What is Gen-AI?'}
--- Chunk 2 ---
page_content='Large Language Models (LLMs) are a prominent example of Gen-AI, trained on vast amounts of text data to understand and generate human-like language. They are the backbone of many modern AI applications.' metadata={'Chapter': 'Chapter 1: Introduction to Generative AI', 'Topic': 'Large Language Models (LLMs)'}
--- Chunk 3 ---
page_content='Langchain is a framework designed to simplify the development of applications using large language models. It provides a structured 
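
In [None]:
# Note => A minimal, pure-Python sketch of how header-based splitting attaches 'Chapter' / 'Topic' metadata to each chunk.
# This is NOT the MarkdownHeaderTextSplitter implementation; split_by_headers and demo_text are hypothetical names used only for illustration.

```python
def split_by_headers(text, headers):
    """Split markdown text into chunks tagged with their enclosing headers.

    `headers` maps a marker such as "#" to a metadata key such as "Chapter".
    """
    chunks = []
    current_meta = {}
    buffer = []

    def flush():
        # Emit the buffered body lines as one chunk, if there is any content
        content = "\n".join(buffer).strip()
        if content:
            chunks.append({"page_content": content, "metadata": dict(current_meta)})
        buffer.clear()

    # Check longer markers first so "##" is matched before "#"
    ordered = sorted(headers.items(), key=lambda kv: len(kv[0]), reverse=True)

    for line in text.splitlines():
        stripped = line.strip()
        for marker, key in ordered:
            if stripped.startswith(marker + " "):
                flush()
                # A new higher-level header resets the deeper-level metadata
                for other_marker, other_key in headers.items():
                    if len(other_marker) > len(marker):
                        current_meta.pop(other_key, None)
                current_meta[key] = stripped[len(marker):].strip()
                break
        else:
            buffer.append(line)
    flush()
    return chunks


demo_text = """# Chapter 1: Intro
## What is Gen-AI?
Gen-AI creates new content.
## LLMs
LLMs generate text.
# Chapter 2: Langchain
## What is Langchain?
Langchain is a framework.
"""

demo_chunks = split_by_headers(demo_text, {"#": "Chapter", "##": "Topic"})
for chunk in demo_chunks:
    print(chunk)
```

# Each body paragraph becomes one chunk, and the header lines themselves end up in the metadata rather than in page_content.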

In [14]:
# Prompt => For the above splits, create embeddings and display a few of them. Use OpenAI embedding models to build the embeddings. The embeddings will be stored in a vector store / DB later, so do not write the storage code yet; only create and display the embeddings.

In [15]:
%pip install --upgrade --quiet openai

In [17]:
from openai import OpenAI
import os
from google.colab import userdata

# Set your OpenAI API key
# In Colab, add the key to the secrets manager under the "🔑" in the left panel.
# Give it the name `OPENAI_API_KEY`.
# Then pass the key to the SDK:

openai_api_key = userdata.get('OPENAI_API_KEY')
os.environ["OPENAI_API_KEY"] = openai_api_key

# Initialize the OpenAI client
client = OpenAI()

# Generate embeddings
embeddings = []
for chunk in md_header_splits:
    response = client.embeddings.create(
        input=chunk.page_content,
        model="text-embedding-ada-002",  # Alternative: "text-embedding-3-small"
    )
    embeddings.append(response.data[0].embedding)

# Display a few embeddings
print(f"Number of embeddings generated: {len(embeddings)}")
print("First 3 embeddings:")
for i, embedding in enumerate(embeddings[:3]):
    print(f"--- Embedding {i+1} (first 10 dimensions) ---")
    print(embedding[:10])

Number of embeddings generated: 41
First 3 embeddings:
--- Embedding 1 (first 10 dimensions) ---
[-0.025043487548828125, -0.012731093913316727, -0.016826502978801727, 0.007274910807609558, 0.012358189560472965, 0.020529380068182945, -0.029125811532139778, 0.015177871100604534, -0.01033665332943201, -0.04071856662631035]
--- Embedding 2 (first 10 dimensions) ---
[-0.023705128580331802, 0.007175141014158726, 0.0011254636337980628, -0.010037247091531754, -0.00905671063810587, 0.013416122645139694, -0.007380523718893528, 0.02825004793703556, -0.01583433710038662, -0.0336032472550869]
--- Embedding 3 (first 10 dimensions) ---
[-0.007045831996947527, 0.008539576083421707, -0.015674076974391937, -0.04586270451545715, 0.009446735493838787, 0.0065854317508637905, -0.0030352326575666666, 0.011138280853629112, 0.012529713101685047, -0.026696404442191124]
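
In [None]:
# Note => The loop above makes one API call per chunk. The embeddings endpoint also accepts a list of inputs, so the calls can be batched.
# A minimal sketch, assuming the response preserves input order. batched, embed_in_batches and fake_embed are hypothetical helpers;
# fake_embed stands in for the real client so the cell runs offline.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(texts, embed_fn, batch_size=16):
    """Embed `texts` by calling `embed_fn` once per batch, keeping input order."""
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors


def fake_embed(batch):
    # Stand-in embedder: one fake 2-dimensional vector per text
    return [[float(len(text)), float(text.count(" "))] for text in batch]


# With the real client, embed_fn might look like this (not executed here):
# def openai_embed(batch):
#     response = client.embeddings.create(input=batch, model="text-embedding-3-small")
#     return [item.embedding for item in response.data]

demo_texts = ["Gen-AI creates content", "LLMs", "Langchain chains components", "RAG", "FAISS"]
demo_vectors = embed_in_batches(demo_texts, fake_embed, batch_size=2)
print(f"Number of vectors: {len(demo_vectors)}")
print(demo_vectors[1])
```

# With 41 chunks and a batch size of 16 this would mean 3 requests instead of 41.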


In [18]:
# Note => Use a different embedding model below

In [19]:
from openai import OpenAI
import os
from google.colab import userdata

# Set your OpenAI API key
# In Colab, add the key to the secrets manager under the "🔑" in the left panel.
# Give it the name `OPENAI_API_KEY`.
# Then pass the key to the SDK:

openai_api_key = userdata.get('OPENAI_API_KEY')
os.environ["OPENAI_API_KEY"] = openai_api_key

# Initialize the OpenAI client
client = OpenAI()

# Generate embeddings
embeddings = []
for chunk in md_header_splits:
    response = client.embeddings.create(
        input=chunk.page_content,
        model="text-embedding-3-small",  # Previously: "text-embedding-ada-002"
    )
    embeddings.append(response.data[0].embedding)

# Display a few embeddings
print(f"Number of embeddings generated: {len(embeddings)}")
print("First 3 embeddings:")
for i, embedding in enumerate(embeddings[:3]):
    print(f"--- Embedding {i+1} (first 10 dimensions) ---")
    print(embedding[:10])

Number of embeddings generated: 41
First 3 embeddings:
--- Embedding 1 (first 10 dimensions) ---
[0.014702312648296356, 0.009112956933677197, 0.02872917428612709, -0.007750798016786575, 0.010638350620865822, -0.08983495831489563, 0.011617753654718399, 0.005397977773100138, 0.022233588621020317, -0.02839144878089428]
--- Embedding 2 (first 10 dimensions) ---
[0.0038948608562350273, -0.000697912706527859, 0.053160130977630615, -0.008831404149532318, 0.011985068209469318, -0.04526166245341301, -0.019322631880640984, -0.012088091112673283, -0.013358714990317822, 0.051008082926273346]
--- Embedding 3 (first 10 dimensions) ---
[-0.03487681597471237, -0.04441509768366814, 0.04697801172733307, -0.0356694720685482, 0.016064472496509552, -0.012523947283625603, -0.0033852970227599144, -0.003979788161814213, -0.049223870038986206, 0.0212299395352602]
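
In [None]:
# Note => The raw numbers above are only useful when compared with each other. A common comparison is cosine similarity.
# A minimal sketch with toy 3-dimensional vectors (hypothetical values, not real model output); real embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors: the first two point in a similar direction, the third does not
vec_rag = [0.9, 0.1, 0.0]
vec_retrieval = [0.8, 0.2, 0.1]
vec_cricket = [0.0, 0.1, 0.9]

print(f"rag vs retrieval: {cosine_similarity(vec_rag, vec_retrieval):.3f}")
print(f"rag vs cricket:   {cosine_similarity(vec_rag, vec_cricket):.3f}")
```

# Texts with related meaning should score closer to 1 than unrelated texts; the same function can be applied to any two entries of the 'embeddings' list.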


In [20]:
# Step 4 Build a Vector Store & Vector DB

In [None]:
# Prompt 1 => Create a FAISS vector store and write/store the embeddings for 'sample_text_with_chapters'

In [22]:
%pip install langchain-community



In [24]:
# Prompt => Create a FAISS vector store and write/store the embeddings for 'sample_text_with_chapters' - print a success message once the embeddings are stored.

In [25]:
%pip install --upgrade --quiet faiss-cpu

In [None]:
%pip install --upgrade --quiet  langchain-openai

In [30]:
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
import os
from google.colab import userdata

# Set your OpenAI API key
# In Colab, add the key to the secrets manager under the "🔑" in the left panel.
# Give it the name `OPENAI_API_KEY`.
# Then pass the key to the SDK:

openai_api_key = userdata.get('OPENAI_API_KEY')
os.environ["OPENAI_API_KEY"] = openai_api_key

# Initialize the OpenAI Embedding model
embeddings_model = OpenAIEmbeddings(model="text-embedding-3-small") # Using the same model as before

# Create a FAISS vector store from the text chunks and embeddings
vectorstore = FAISS.from_documents(md_header_splits, embeddings_model)

print(type(vectorstore))
print("Embeddings successfully stored in FAISS vector store.")

<class 'langchain_community.vectorstores.faiss.FAISS'>
Embeddings successfully stored in FAISS vector store.


In [31]:
# Prompt => Display a few embeddings from the above FAISS Vector Store.

In [32]:
# Perform a similarity search to demonstrate the vector store
query = "What is RAG?"
docs = vectorstore.similarity_search(query)

print(f"Number of relevant documents found: {len(docs)}")
print("Relevant documents:")
for i, doc in enumerate(docs):
    print(f"--- Document {i+1} ---")
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}") # Display metadata which includes chapter/topic info

Number of relevant documents found: 4
Relevant documents:
--- Document 1 ---
Content: Retrieval Augmented Generation (RAG) is a technique that combines the power of LLMs with the ability to retrieve relevant information from external sources. This enhances the accuracy and relevance of generated responses.
Metadata: {'Chapter': 'Chapter 3: Retrieval Augmented Generation (RAG)', 'Topic': 'RAG Explained'}
--- Document 2 ---
Content: Building a successful RAG application requires technical skills, domain expertise, and careful consideration of requirements.
Metadata: {'Chapter': 'Chapter 20: Resources and Community', 'Topic': 'Building Successful Applications'}
--- Document 3 ---
Content: Evaluating the performance of a RAG system involves measuring its ability to retrieve relevant information and generate accurate answers.
Metadata: {'Chapter': 'Chapter 15: Evaluating RAG Systems', 'Topic': 'Evaluating Performance'}
--- Document 4 ---
Content: Staying updated with the latest advancements
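
In [None]:
# Note => A minimal sketch of what a similarity search does conceptually: embed the query, then return the k stored vectors closest to it.
# Assumes Euclidean (L2) distance and k=4, which matches the four documents returned above. top_k_nearest and the toy vectors are hypothetical;
# a real index such as FAISS uses optimized data structures rather than this brute-force scan.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_nearest(query_vec, doc_vecs, k=4):
    """Return (index, squared distance) pairs for the k nearest vectors."""
    scored = ((squared_l2(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs))
    return [(idx, dist) for dist, idx in heapq.nsmallest(k, scored)]


# Toy 2-dimensional "document embeddings" and a query vector
toy_doc_vecs = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.5, 0.5],
    [-1.0, 0.0],
]
toy_query = [1.0, 0.0]

nearest = top_k_nearest(toy_query, toy_doc_vecs, k=2)
for idx, dist in nearest:
    print(f"doc {idx}: squared distance {dist:.3f}")
```

# The returned indices map back to the stored chunks, which is how the search result carries both page_content and metadata.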

In [33]:
%pip install --upgrade --quiet chromadb

In [40]:
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
import os
from google.colab import userdata

# Set your OpenAI API key
openai_api_key = userdata.get('OPENAI_API_KEY')
os.environ["OPENAI_API_KEY"] = openai_api_key

# Initialize the OpenAI Embedding model
embeddings_model = OpenAIEmbeddings(model="text-embedding-3-small") # Using the same model as before

# Define the directory to store ChromaDB data
persist_directory = "/content/chroma_db"

# Create a Chroma vector store from the text chunks and embeddings, and persist it to disk
vectorstore_chroma = Chroma.from_documents(
    md_header_splits,
    embeddings_model,
    persist_directory=persist_directory
)

print(type(vectorstore_chroma))
print(f"Embeddings successfully stored in ChromaDB and persisted to {persist_directory}.")

# Perform a similarity search to demonstrate the vector store
query = "What is RAG?"
docs_chroma = vectorstore_chroma.similarity_search(query)

print(f"\nNumber of relevant documents found in ChromaDB: {len(docs_chroma)}")
print("Relevant documents from ChromaDB:")
for i, doc in enumerate(docs_chroma):
    print(f"--- Document {i+1} ---")
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}") # Display metadata which includes chapter/topic info

# To confirm data is on disk, you can list the directory contents
print(f"\nContents of the persistence directory ({persist_directory}):")
print(os.listdir(persist_directory))

<class 'langchain_community.vectorstores.chroma.Chroma'>
Embeddings successfully stored in ChromaDB and persisted to /content/chroma_db.

Number of relevant documents found in ChromaDB: 4
Relevant documents from ChromaDB:
--- Document 1 ---
Content: Retrieval Augmented Generation (RAG) is a technique that combines the power of LLMs with the ability to retrieve relevant information from external sources. This enhances the accuracy and relevance of generated responses.
Metadata: {'Chapter': 'Chapter 3: Retrieval Augmented Generation (RAG)', 'Topic': 'RAG Explained'}
--- Document 2 ---
Content: Retrieval Augmented Generation (RAG) is a technique that combines the power of LLMs with the ability to retrieve relevant information from external sources. This enhances the accuracy and relevance of generated responses.
Metadata: {'Chapter': 'Chapter 3: Retrieval Augmented Generation (RAG)', 'Topic': 'RAG Explained'}
--- Document 3 ---
Content: Retrieval Augmented Generation (RAG) is a technique 
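
In [None]:
# Note => The identical documents above suggest the same chunks were stored more than once, most likely because this cell was re-run
# against the same persist_directory (an assumption; the notebook does not confirm it). A minimal sketch of de-duplicating results after retrieval.
# dedupe_documents is a hypothetical helper, and SimpleNamespace objects stand in for LangChain Document objects.

```python
from types import SimpleNamespace


def dedupe_documents(docs):
    """Drop documents whose content and metadata were already seen, keeping order."""
    seen = set()
    unique = []
    for doc in docs:
        key = (doc.page_content, tuple(sorted(doc.metadata.items())))
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


rag_doc = {
    "page_content": "Retrieval Augmented Generation (RAG) is a technique ...",
    "metadata": {"Chapter": "Chapter 3: Retrieval Augmented Generation (RAG)", "Topic": "RAG Explained"},
}
eval_doc = {
    "page_content": "Evaluating the performance of a RAG system ...",
    "metadata": {"Chapter": "Chapter 15: Evaluating RAG Systems", "Topic": "Evaluating Performance"},
}

# Simulated search result with the same chunk returned three times
demo_results = [
    SimpleNamespace(**rag_doc),
    SimpleNamespace(**rag_doc),
    SimpleNamespace(**rag_doc),
    SimpleNamespace(**eval_doc),
]

unique_results = dedupe_documents(demo_results)
print(f"Before: {len(demo_results)} documents, after: {len(unique_results)} documents")
```

# De-duplicating after retrieval only hides the symptom; starting from an empty persist_directory avoids storing the copies in the first place.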

In [41]:
# Prompt => Integrate an OpenAI LLM with the embeddings stored in the Chroma vector DB above. Only integrate the LLM and do not build retrieval; we will build it in the next step.

In [42]:
from langchain_openai import ChatOpenAI
import os
from google.colab import userdata

# Set your OpenAI API key
openai_api_key = userdata.get('OPENAI_API_KEY')
os.environ["OPENAI_API_KEY"] = openai_api_key

# Initialize the OpenAI LLM
# gpt-4o-mini with temperature=0 to keep the answers as deterministic as possible
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

print("OpenAI LLM integrated.")

OpenAI LLM integrated.


In [43]:
from langchain.chains import RetrievalQA

# Create a retriever from the Chroma vector store built earlier
retriever = vectorstore_chroma.as_retriever()

# Create a RetrievalQA chain using the llm initialized in the previous cell
qa_chain = RetrievalQA.from_chain_type(
    llm,
    chain_type="stuff", # Other chain types include "map_reduce", "refine", "map_rerank"
    retriever=retriever,
    return_source_documents=True # Optional: return the retrieved documents
)

print("RetrievalQA chain created and connected to the LLM and ChromaDB vector store.")

RetrievalQA chain created and connected to the LLM and ChromaDB vector store.
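
In [None]:
# Note => A minimal sketch of the idea behind chain_type="stuff": all retrieved chunks are "stuffed" into a single prompt together with the question.
# build_stuff_prompt and demo_template are hypothetical and only illustrate the concept; the prompt that RetrievalQA actually uses is different.

```python
def build_stuff_prompt(question, docs, template):
    """Join every retrieved chunk into one context block and fill the template."""
    context = "\n\n".join(doc["page_content"] for doc in docs)
    return template.format(context=context, question=question)


demo_template = (
    "Answer the question using only the context below.\n"
    "Context:\n{context}\n"
    "Question: {question}\n"
)

# Plain dicts stand in for the retrieved Document objects
demo_retrieved = [
    {"page_content": "RAG combines LLMs with retrieval from external sources."},
    {"page_content": "Evaluating a RAG system measures retrieval and answer quality."},
]

demo_prompt = build_stuff_prompt("What is RAG?", demo_retrieved, demo_template)
print(demo_prompt)
```

# Because every chunk goes into one prompt, this approach only works while the combined chunks fit in the model's context window.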


In [45]:
# Step 7 -> Build a CLI chatbot

In [46]:
# Prompt 1 => Now let's build a CLI chatbot on the above embeddings.

In [52]:
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

# Define a custom prompt template
custom_prompt_template = """Use the following pieces of context to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
--------------------
{context}
Question: {question}
"""

CUSTOM_PROMPT = PromptTemplate(
    template=custom_prompt_template, input_variables=["context", "question"]
)

# Create a RetrievalQA chain with the custom prompt
# The llm object and vectorstore_chroma object were created in previous steps
qa_chain = RetrievalQA.from_chain_type(
    llm,
    chain_type="stuff", # Other chain types include "map_reduce", "refine", "map_rerank"
    retriever=vectorstore_chroma.as_retriever(),
    return_source_documents=True, # Optional: return the retrieved documents
    chain_type_kwargs={"prompt": CUSTOM_PROMPT} # Pass the custom prompt here
)


# Simple CLI interaction for the chatbot
print("Chatbot ready! Type 'exit' to quit.")

while True:
    query = input("You: ")
    if query.strip().lower() == "exit":
        break

    # Use the RetrievalQA chain to get the answer
    # (.invoke replaces the deprecated direct call qa_chain({...}))
    response = qa_chain.invoke({"query": query})
    answer = response["result"]
    source_documents = response["source_documents"]

    # Print the response
    print(f"Bot: {answer}")

    # Uncomment to inspect the retrieved source documents
    # print("\nSource Documents:")
    # for i, doc in enumerate(source_documents):
    #     print(f"--- Document {i+1} ---")
    #     print(f"Content: {doc.page_content}")
    #     print(f"Metadata: {doc.metadata}")

    print("-" * 20)  # Separator for clarity

Chatbot ready! Type 'exit' to quit.
You: what is Gen-AI
Bot: Generative AI, or Gen-AI, is a type of artificial intelligence that can create new content, such as text, images, audio, and video. Unlike traditional AI that focuses on analysis and prediction, Gen-AI is about creation and innovation.
--------------------
You: WHAT IS cRICKET
Bot: I don't know.
--------------------
You: exit


In [53]:
# Step 8 - Build a UI and deploy the chatbot.

In [54]:
%pip install --upgrade --quiet  gradio

import gradio as gr

def chatbot_response(message, history):
    # Use the RetrievalQA chain to get the answer. `history` is supplied by
    # gr.ChatInterface but is not used, so each question is answered independently.
    response = qa_chain.invoke({"query": message})

    # Return only the answer text; response["source_documents"] holds the
    # retrieved chunks if you want to display them as well
    return response["result"]

# Create the Gradio interface
iface = gr.ChatInterface(
    fn=chatbot_response,
    title="RAG Chatbot",
    description="Ask questions about Gen-AI and Langchain based on the provided text."
)

# Launch the interface
iface.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://7007a35ba947d4096e.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
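
In [None]:
# Note => The chatbot_response function returns only the answer text. A minimal sketch of appending the source chapters/topics to the reply,
# assuming each source document exposes a .metadata dict with 'Chapter' and 'Topic' keys as in the splits above.
# format_with_sources is a hypothetical helper, and SimpleNamespace objects stand in for the retrieved Document objects.

```python
from types import SimpleNamespace


def format_with_sources(answer, source_documents):
    """Append a de-duplicated 'Sources' list to the answer text."""
    labels = []
    for doc in source_documents:
        chapter = doc.metadata.get("Chapter", "Unknown chapter")
        topic = doc.metadata.get("Topic", "Unknown topic")
        label = f"{chapter} / {topic}"
        if label not in labels:
            labels.append(label)
    if not labels:
        return answer
    bullet_list = "\n".join(f"- {label}" for label in labels)
    return f"{answer}\n\nSources:\n{bullet_list}"


demo_sources = [
    SimpleNamespace(metadata={"Chapter": "Chapter 3: Retrieval Augmented Generation (RAG)", "Topic": "RAG Explained"}),
    SimpleNamespace(metadata={"Chapter": "Chapter 3: Retrieval Augmented Generation (RAG)", "Topic": "RAG Explained"}),
    SimpleNamespace(metadata={"Chapter": "Chapter 15: Evaluating RAG Systems", "Topic": "Evaluating Performance"}),
]

demo_reply = format_with_sources("RAG combines retrieval with generation.", demo_sources)
print(demo_reply)
```

# Inside chatbot_response this could be used as: return format_with_sources(response["result"], response["source_documents"])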




In [57]:
# Insurance documents - RAG Based

In [58]:
# RAG
    # 1 Private Data as External Knowledge
    # 2 Chat with LLM - on Private Data
    # 3 Reduce Hallucinations


In [59]:
# Trading application - it can read charts, implement different strategies, and decide whether or not to place a Buy/Sell order.

In [60]:
# Hands-on Assignments -
# QnA Chatbot - > YouTube Video Transcripts.

#Build a RAG Based Gen-AI Application (chat-bot) - Q&A over YouTube Videos.

#Steps:

# 1 Select Topics (e.g. Healthcare, Data Engineering, Gen-AI, or any Podcast videos).

# 2 Get the transcript of the YouTube Video / Videos (Hint - Use YouTube API and Video-IDs to get the transcript)

# 3 Use Text-Splitter to divide huge transcripts into chunks.

# 4 Generate Embeddings for the chunks and store in Vector-DB (eg - Chroma / PineCone)

# 5 Integrate LLM

# 6 Send prompts and generate responses from the external knowledge (transcripts)

# 7 Try to make it dynamic - the bot should keep chatting in a loop until 'exit' is typed.

# 8 Build UI using 'Gradio'

# 9 Optional - Deploy the application on Azure Cloud.
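
In [None]:
# Note => Transcripts have no chapter headers, so step 3 needs a size-based splitter rather than the header-based one used earlier.
# A minimal pure-Python sketch of word-based chunking with overlap; chunk_with_overlap and its default sizes are hypothetical,
# and a library text splitter could be used instead.

```python
def chunk_with_overlap(text, chunk_size=200, overlap=40):
    """Split text into chunks of `chunk_size` words, each sharing `overlap` words with the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


# Toy "transcript" of ten words: w0 w1 ... w9
demo_transcript = " ".join(f"w{i}" for i in range(10))
demo_transcript_chunks = chunk_with_overlap(demo_transcript, chunk_size=4, overlap=1)
for i, chunk in enumerate(demo_transcript_chunks):
    print(f"--- Chunk {i+1} ---")
    print(chunk)
```

# The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.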

In [61]:
# End of the Notebook