<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo"  />
    </a>
</p>


# **Final Project: Build an AI RAG Assistant Using LangChain**

**Scenario**

Imagine you work as a consultant for Quest Analytics, a small but fast-growing research organization.

In today’s fast-paced research environment, the sheer volume of scientific papers can be overwhelming, making it nearly impossible to stay up-to-date with the latest developments. 

The researchers at Quest Analytics have been struggling to find the time to examine countless documents, let alone extract the most relevant and insightful information. 

You have been hired to build an AI RAG assistant that can read, understand, and summarize vast amounts of data, all in real time. Follow the tasks below to build the AI-powered RAG assistant and streamline research at Quest Analytics.

**Project Tasks and Deliverables**

For the following tasks, you can use the ‘mistralai/mixtral-8x7b-instruct-v01’ LLM from the watsonx.ai API.

Task 1: Load document using LangChain for different sources (10 points)

(This task corresponds with Exercise 1 in the lab “Load Documents Using LangChain for Different Sources” from Module 1)

Capture a screenshot (saved as pdf_loader) that displays both the code used and the first 1000 characters of the content after loading the paper at the provided link.

Task 2: Apply text splitting techniques (10 points)

(This task corresponds with Exercise 2 in the lab, “Apply text splitting techniques to enhance model responsiveness.”)

Submit a screenshot (saved as ‘code_splitter.png’) that displays the code used to split the following LaTeX code and its corresponding results.

latex_text = """

    \documentclass{article}

    \begin{document}

    \maketitle

    \section{Introduction}

    Large language models (LLMs) are a type of machine learning model that can be trained on vast amounts of text data to generate human-like language. In recent years, LLMs have made significant advances in various natural language processing tasks, including language translation, text generation, and sentiment analysis.

    \subsection{History of LLMs}

The earliest LLMs were developed in the 1980s and 1990s, but they were limited by the amount of data that could be processed and the computational power available at the time. In the past decade, however, advances in hardware and software have made it possible to train LLMs on massive datasets, leading to significant improvements in performance.

\subsection{Applications of LLMs}

LLMs have many applications in the industry, including chatbots, content creation, and virtual assistants. They can also be used in academia for research in linguistics, psychology, and computational linguistics.

\end{document}

"""

Task 3: Embed documents (10 points)

(This task corresponds with Exercise 1 in the lab “Embed documents using watsonx’s embedding model.”)

Submit a screenshot (saved as ‘embedding.png’) that displays the code used to embed the following sentence and its corresponding results, showing the first five embedding values.
query = "How are you?"

Task 4: Create and configure vector databases to store embeddings (10 points)

(This task corresponds with Exercise 1 in the lab “Create and Configure a Vector Database to Store Document Embeddings”)

Submit a screenshot (saved as ‘vectordb.png’) that displays the code used to create a Chroma vector database that stores the embeddings of the document ‘new-Policies.txt’ and to run a similarity search for the following query, returning the top 5 results.

query = "Smoking policy"

Task 5: Develop a retriever to fetch document segments based on queries (10 points)

(This task corresponds with Exercise 1 in the lab “Develop a Retriever to Fetch Document Segments based on Queries.”)

Submit a screenshot (saved as ‘retriever.png’) that displays the code that uses ChromaDB as a retriever and runs a similarity search returning the top 2 results.

The document you can use is ‘new-Policies.txt’.

The query you can use is:

query = "Email policy"

Task 6: Construct a QA bot that leverages LangChain and an LLM to answer questions (10 points)

(This task corresponds with the lab “Construct a QA Bot That Leverages the LangChain and LLM to Answer Questions from Loaded Documents.”)


Submit a screenshot (saved as ‘QA_bot.png’) that displays the QA bot interface you created based on the lab “Construct a QA Bot That Leverages the LangChain and LLM to Answer Questions from Loaded Documents.” The screenshot should also show that you uploaded a PDF and are asking the bot a query.

The PDF you can use is available here.

The query you can use is:

query = "What this paper is talking about?"

In [1]:
!python --version
!pip list | grep langchain
!pip list | grep ibm-watsonx-ai

Python 3.12.8


In [2]:
!pip install -U langchain ibm-watsonx-ai chromadb sentence-transformers

Collecting langchain
  Downloading langchain-0.3.21-py3-none-any.whl.metadata (7.8 kB)
Collecting ibm-watsonx-ai
  Downloading ibm_watsonx_ai-1.3.1-py3-none-any.whl.metadata (6.5 kB)
Collecting chromadb
  Downloading chromadb-0.6.3-py3-none-any.whl.metadata (6.8 kB)
Collecting sentence-transformers
  Downloading sentence_transformers-3.4.1-py3-none-any.whl.metadata (10 kB)
Collecting langchain-core<1.0.0,>=0.3.45 (from langchain)
  Downloading langchain_core-0.3.47-py3-none-any.whl.metadata (5.9 kB)
Collecting langchain-text-splitters<1.0.0,>=0.3.7 (from langchain)
  Downloading langchain_text_splitters-0.3.7-py3-none-any.whl.metadata (1.9 kB)
Collecting langsmith<0.4,>=0.1.17 (from langchain)
  Downloading langsmith-0.3.18-py3-none-any.whl.metadata (15 kB)
Collecting pandas<2.3.0,>=0.24.2 (from ibm-watsonx-ai)
  Downloading pandas-2.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (89 kB)
Collecting lomond (from ibm-watsonx-ai)
  Downloading lomond-0.3.3-py2.py3

In [3]:
!pip install -U langchain langchain-community pypdf

Collecting langchain-community
  Downloading langchain_community-0.3.20-py3-none-any.whl.metadata (2.4 kB)
Collecting pypdf
  Downloading pypdf-5.4.0-py3-none-any.whl.metadata (7.3 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.8.1-py3-none-any.whl.metadata (3.5 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-jso

In [4]:
from langchain_community.document_loaders import PyPDFLoader

In [5]:
pdf_path = "A Comprehensive Review of Low-Rank.pdf"  # Change this if needed
loader = PyPDFLoader(pdf_path)

In [6]:
# Load pages from the document
pages = loader.load()

In [7]:
# Extract the first 1000 characters
extracted_text = pages[0].page_content[:1000]

In [8]:
# Print the extracted text
print("Extracted Text (First 1000 characters):\n")
print(extracted_text)

Extracted Text (First 1000 characters):

A Comprehensive Review of Low-Rank
Adaptation in Large Language Models for
Efficient Parameter Tuning
September 10, 2024
Abstract
Natural Language Processing (NLP) often involves pre-training large
models on extensive datasets and then adapting them for specific tasks
through fine-tuning. However, as these models grow larger, like GPT-3
with 175 billion parameters, fully fine-tuning them becomes computa-
tionally expensive. We propose a novel method called LoRA (Low-Rank
Adaptation) that significantly reduces the overhead by freezing the orig-
inal model weights and only training small rank decomposition matrices.
This leads to up to 10,000 times fewer trainable parameters and reduces
GPU memory usage by three times. LoRA not only maintains but some-
times surpasses fine-tuning performance on models like RoBERTa, De-
BERTa, GPT-2, and GPT-3. Unlike other methods, LoRA introduces
no extra latency during inference, making it more efficient for pra
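
The extracted text still carries the PDF's line-break hyphenation (for example `computa-` followed by `tionally`). Below is a minimal sketch of one way to rejoin such words before splitting or embedding. The helper name `dehyphenate` is hypothetical, and the regex is a heuristic: it would also merge a genuine hyphenated compound that happens to break across lines.

```python
import re

def dehyphenate(text: str) -> str:
    """Join words that were split by a hyphen at a line break (heuristic)."""
    # Match: word character, hyphen, newline, word character.
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)

sample = "fully fine-tuning them becomes computa-\ntionally expensive."
cleaned = dehyphenate(sample)
print(cleaned)
```

Hyphens that are not followed by a line break, such as the one in `fine-tuning`, are left alone.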

In [9]:
!pip install -U langchain



In [10]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [11]:
# Raw string (r"""...""") so that "\b" in \begin is not read as a backspace escape
latex_text = r"""
    \documentclass{article}
    \begin{document}
    \maketitle
    \section{Introduction}
    Large language models (LLMs) are a type of machine learning model that can be trained on vast amounts of text data to generate human-like language.
    In recent years, LLMs have made significant advances in various natural language processing tasks, including language translation, text generation, and sentiment analysis.
    \subsection{History of LLMs}
    The earliest LLMs were developed in the 1980s and 1990s, but they were limited by the amount of data that could be processed and the computational power available at the time.
    In the past decade, however, advances in hardware and software have made it possible to train LLMs on massive datasets, leading to significant improvements in performance.
    \subsection{Applications of LLMs}
    LLMs have many applications in the industry, including chatbots, content creation, and virtual assistants.
    They can also be used in academia for research in linguistics, psychology, and computational linguistics.
    \end{document}
"""

In [12]:
# Initialize the text splitter
splitter = RecursiveCharacterTextSplitter(chunk_size=150, chunk_overlap=20)

# Split the text into chunks
chunks = splitter.split_text(latex_text)

# Print the split chunks
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}:\n{chunk}\n")

Chunk 1:
\documentclass{article}
    \begin{document}
    \maketitle
    \section{Introduction}

Chunk 2:
Large language models (LLMs) are a type of machine learning model that can be trained on vast amounts of text data to generate human-like

Chunk 3:
generate human-like language.

Chunk 4:
In recent years, LLMs have made significant advances in various natural language processing tasks, including language translation, text

Chunk 5:
translation, text generation, and sentiment analysis.

Chunk 6:
\subsection{History of LLMs}

Chunk 7:
The earliest LLMs were developed in the 1980s and 1990s, but they were limited by the amount of data that could be processed and the

Chunk 8:
processed and the computational power available at the time.

Chunk 9:
In the past decade, however, advances in hardware and software have made it possible to train LLMs on massive datasets, leading to significant

Chunk 10:
to significant improvements in performance.

Chunk 11:
\subsection{Applications of LLMs}
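
LaTeX source is full of backslashes, and in an ordinary Python string literal `\b` is the backspace escape, so `\begin{document}` is stored as a backspace character followed by `egin{document}`. The small check below, independent of LangChain, shows how a raw string (`r"..."`) keeps the text intact. The variable names are chosen only for this illustration.

```python
plain = "\begin{document}"   # "\b" is parsed as a backspace (0x08)
raw = r"\begin{document}"    # raw string: the backslash and "b" are both kept

# repr() makes the invisible backspace character visible
print(repr(plain))
print(repr(raw))
```

The same applies to triple-quoted strings, so LaTeX snippets are safest written as `r"""..."""`.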
   

In [14]:
!pip install -U langchain_ibm ibm-watsonx-ai

Collecting langchain_ibm
  Downloading langchain_ibm-0.3.8-py3-none-any.whl.metadata (5.2 kB)
Downloading langchain_ibm-0.3.8-py3-none-any.whl (27 kB)
Installing collected packages: langchain_ibm
Successfully installed langchain_ibm-0.3.8


In [15]:
from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames
from langchain_ibm import WatsonxEmbeddings

In [16]:
embed_params = {
    EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: 3,
    EmbedTextParamsMetaNames.RETURN_OPTIONS: {"input_text": True},
}

watsonx_embedding = WatsonxEmbeddings(
    model_id="ibm/slate-125m-english-rtrvr",
    url="https://us-south.ml.cloud.ibm.com",
    project_id="skills-network",
    params=embed_params,
)

In [17]:
query = "How are you?"

query_result = watsonx_embedding.embed_query(query)

In [18]:
len(query_result)

768

In [19]:
query_result[:5]

[-0.06722454, -0.023729993, 0.017487843, -0.013195328, -0.039584607]
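
An embedding is mainly useful in comparison with other embeddings: retrieval ranks documents by how close their vectors are to the query vector, commonly using cosine similarity. Below is a minimal sketch with made-up 3-dimensional vectors (the real vector above has 768 dimensions). `cosine_similarity` is a helper written for this illustration, not part of the watsonx API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query_vec = [0.1, 0.3, -0.2]   # made-up values
doc_close = [0.2, 0.6, -0.4]   # points in the same direction as the query
doc_far = [-0.3, 0.1, 0.0]     # orthogonal to the query

print(cosine_similarity(query_vec, doc_close))
print(cosine_similarity(query_vec, doc_far))
```

A value near 1 means the two vectors point in nearly the same direction; a value near 0 means they are unrelated in this sense.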

In [20]:
# Install required packages
!pip install langchain chromadb sentence-transformers



In [21]:
# Import necessary libraries
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import SentenceTransformerEmbeddings

In [22]:
# Load document (new-Policies.txt)
with open("new-Policies.txt", "r", encoding="utf-8") as file:
    document_text = file.read()

In [23]:
# Split document into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
text_chunks = text_splitter.split_text(document_text)
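
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-width sliding window in plain Python. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraphs and lines before falling back to smaller units), and the function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters sharing chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks

chunks = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each window repeats the last four characters of the previous one, which is what lets a sentence that straddles a boundary appear whole in at least one chunk.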

In [24]:
# Initialize embedding model
embedding_model = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")


In [25]:
# Create a Chroma vector database and store document embeddings
vector_db = Chroma.from_texts(text_chunks, embedding_model)

In [26]:
# Conduct a similarity search for "Smoking policy"
query = "Smoking policy"
results = vector_db.similarity_search(query, k=5)  # Top 5 matches

In [27]:
# Display results
print("Top 5 similar results for query:", query)
for idx, result in enumerate(results, 1):
    print(f"{idx}. {result.page_content[:200]}...")  # Displaying first 200 characters

Top 5 similar results for query: Smoking policy
1. This policy encourages the responsible use of mobile devices in line with legal and ethical standards. Employees are expected to understand and follow these guidelines. The policy is regularly reviewe...
2. 4. Mobile Phone Policy

Our Mobile Phone Policy defines standards for responsible use of mobile devices within the organization to ensure alignment with company values and legal requirements.

Accepta...
3. Consequences: Violations of this policy may lead to disciplinary action, including potential termination.

This policy promotes the safe and responsible use of digital communication tools in line with...
4. 3. Internet and Email Policy

Our Internet and Email Policy ensures the responsible and secure use of these tools within our organization, recognizing their importance in daily operations and the need...
5. Safety: We prioritize the safety of our employees, clients, and the community. We encourage a culture of safety, includin
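
`similarity_search` returns the `k` nearest chunks whether or not any of them is really about the query, and none of the excerpts shown above mentions smoking. Below is a minimal sketch of ranking with a minimum-score cutoff. The scores, chunk labels, and the `top_k_above` helper are invented for illustration and are not Chroma output.

```python
import heapq

def top_k_above(scored_chunks, k, min_score):
    """Return up to k (chunk, score) pairs with score >= min_score, best first."""
    kept = [(chunk, score) for chunk, score in scored_chunks if score >= min_score]
    return heapq.nlargest(k, kept, key=lambda pair: pair[1])

# Invented similarity scores (higher means more similar)
scored = [
    ("Mobile Phone Policy", 0.31),
    ("Internet and Email Policy", 0.28),
    ("Smoking Policy", 0.82),
    ("Code of Conduct", 0.40),
]
best = top_k_above(scored, k=2, min_score=0.35)
print(best)
```

Chroma's `similarity_search_with_score` reports distances, where lower means closer, so with real results the comparison would be reversed.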

In [28]:
# Install required libraries
!pip install langchain chromadb sentence-transformers

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)




In [29]:
# Import necessary libraries
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import SentenceTransformerEmbeddings

In [30]:
# Load document (new-Policies.txt)
with open("new-Policies.txt", "r", encoding="utf-8") as file:
    document_text = file.read()

In [31]:
# Split document into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
text_chunks = text_splitter.split_text(document_text)

In [32]:
# Initialize embedding model
embedding_model = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

In [33]:
# Create a Chroma vector database and store document embeddings
vector_db = Chroma.from_texts(text_chunks, embedding_model)

In [34]:
# Create a retriever to fetch document segments
retriever = vector_db.as_retriever(search_type="similarity", search_kwargs={"k": 2})

In [35]:
# Query the retriever
query = "Email policy"
retrieved_docs = retriever.invoke(query)


In [36]:
# Display the retrieved results
print(f"Top 2 document segments for query: {query}")
for idx, doc in enumerate(retrieved_docs, 1):
    print(f"{idx}. {doc.page_content[:200]}...")  # Display first 200 characters of each segment

Top 2 document segments for query: Email policy
1. 3. Internet and Email Policy

Our Internet and Email Policy ensures the responsible and secure use of these tools within our organization, recognizing their importance in daily operations and the need...
2. 3. Internet and Email Policy

Our Internet and Email Policy ensures the responsible and secure use of these tools within our organization, recognizing their importance in daily operations and the need...
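
Both results above are the same chunk. A likely cause, stated here as an assumption, is that `Chroma.from_texts` was called twice in one session (In [25] and In [33]) and both calls wrote to the same default in-memory collection. Whatever the cause, duplicates can be dropped after retrieval. `dedupe_by_content` is a hypothetical helper that works on plain strings.

```python
def dedupe_by_content(texts):
    """Keep the first occurrence of each text, preserving order."""
    seen = set()
    unique = []
    for text in texts:
        if text not in seen:
            seen.add(text)
            unique.append(text)
    return unique

# Shortened stand-ins for retrieved chunk texts
retrieved = [
    "3. Internet and Email Policy ...",
    "3. Internet and Email Policy ...",
    "4. Mobile Phone Policy ...",
]
unique_texts = dedupe_by_content(retrieved)
print(unique_texts)
```

With LangChain documents, the same idea applies to each document's `page_content`.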


In [37]:
# Install required libraries
!pip install langchain chromadb sentence-transformers pypdf gradio



Collecting gradio
  Downloading gradio-5.22.0-py3-none-any.whl.metadata (16 kB)
Collecting aiofiles<24.0,>=22.0 (from gradio)
  Downloading aiofiles-23.2.1-py3-none-any.whl.metadata (9.7 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.5.0-py3-none-any.whl.metadata (3.0 kB)
Collecting gradio-client==1.8.0 (from gradio)
  Downloading gradio_client-1.8.0-py3-none-any.whl.metadata (7.1 kB)
Collecting groovy~=0.1 (from gradio)
  Downloading groovy-0.1.2-py3-none-any.whl.metadata (6.1 kB)
Collecting pydub (from gradio)
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting python-multipart>=0.0.18 (from gradio)
  Downloading python_multipart-0.0.20-py3-none-any.whl.metadata (1.8 kB)
Collecting ruff>=0.9.3 (from gradio)
  Downloading ruff-0.11.1-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (25 kB)
Collecting safehttpx<0.2.0,>=0.1.6 (from gradio)
  Downloading safehttpx-0.1.6-py3-none-any.whl.metadata (4.2 kB)
Collecting semantic-version~=2.0 

In [38]:
# Import necessary libraries
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import SentenceTransformerEmbeddings
import gradio as gr

In [49]:
# Function to process PDF and create retriever
def load_pdf_and_create_retriever(pdf_path):
    loader = PyPDFLoader(pdf_path)
    pages = loader.load()

    # Extract text from PDF
    text = "\n".join([page.page_content for page in pages])

    # Split text into chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
    text_chunks = text_splitter.split_text(text)

    # Initialize embedding model
    embedding_model = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

    # Store embeddings in ChromaDB
    vector_db = Chroma.from_texts(text_chunks, embedding_model)

    # Create retriever
    retriever = vector_db.as_retriever(search_type="similarity", search_kwargs={"k": 3})
    return retriever

In [52]:
# Load PDF document and create retriever
pdf_path = "A Comprehensive Review of Low-Rank.pdf"  # Update with actual PDF path
retriever = load_pdf_and_create_retriever(pdf_path)

In [53]:
# Define QA bot function
def qa_bot(query):
    docs = retriever.invoke(query)
    response = "\n".join([doc.page_content[:300] for doc in docs])  # Return top relevant segments
    return response if response else "No relevant information found."
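
The `qa_bot` function returns the retrieved passages themselves. To have an LLM answer from them, the passages are usually placed into a prompt together with the question. Below is a minimal sketch of that assembly step. The template wording and the `build_prompt` helper are assumptions, and the call that sends the prompt to a model (for example the Mixtral model named in the task description) is deliberately left out.

```python
def build_prompt(question, passages, max_chars=300):
    """Combine retrieved passages and the user question into one prompt string."""
    # Number each passage and cap its length so the prompt stays small
    context = "\n\n".join(
        f"[{i}] {passage[:max_chars]}" for i, passage in enumerate(passages, 1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What this paper is talking about?",
    [
        "LoRA freezes the original model weights.",
        "It trains small rank decomposition matrices.",
    ],
)
print(prompt)
```

The string returned here would be passed to whichever LLM client the bot uses, and the model's reply would be returned in place of the raw passages.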

In [54]:
# Set up Gradio UI
iface = gr.Interface(
    fn=qa_bot,
    inputs="text",
    outputs="text",
    title="RAG-Powered AI Assistant",
    description="Ask questions about the loaded research paper.",
)

In [57]:
# Launch the chatbot
# iface.launch()
iface.launch(share=True)

Rerunning server... use `close()` to stop if you need to change `launch()` parameters.
----
* Running on public URL: https://a6bbc7c61b09af4389.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


