# RAG Retrieval Testing Notebook
Test the complete RAG pipeline with different question types

## Setup

In [1]:
import sys
sys.path.append('..')

from src.main import ResearchAssistant
from src.utils.formatters import ResponseFormatter

## Initialize Research Assistant

In [2]:
# Create assistant instance
assistant = ResearchAssistant()
print("✓ Research Assistant initialized")

✓ Research Assistant initialized


## Load Documents

In [3]:
# Load sample PDF
pdf_path = "../data/samples/sample.pdf"
assistant.load_documents([pdf_path])

📥 Processing documents...
📥 Loading 1 PDFs...
✅ Loaded 9 pages
✂️  Splitting documents into chunks...
✅ Created 18 chunks
🔢 Generating embeddings and storing in vector database...


Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


✅ Processing complete! Vector store ready for search.
✅ Documents loaded and indexed
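
The log above reports 9 pages split into 18 chunks. The notebook does not show how `ResearchAssistant` splits text, so the following is only a minimal sketch of fixed-size chunking with overlap. The names `chunk_text`, `chunk_size`, and `overlap`, and their default values, are hypothetical and are not taken from the project.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share `overlap` characters, so a sentence cut at one
    boundary still appears whole in a neighbouring chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


page = "x" * 1800  # stand-in for the text of one PDF page
pieces = chunk_text(page)
print(len(pieces), [len(p) for p in pieces])  # 2 [1000, 1000]
```

With these assumed settings, a page of roughly 1,800 characters yields two chunks, which would be consistent with the 9 pages and 18 chunks logged above, although the real splitter may work differently.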


## Setup QA Chain

In [4]:
# Initialize QA system (retrieve top 4 chunks)
assistant.setup_qa(k=4)

✅ QA system ready (retrieving top 4 chunks)
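
To make the `k=4` setting concrete, here is a minimal sketch of top-k retrieval by cosine similarity over toy vectors. It is an illustration only: the notebook does not show which similarity metric the vector store uses, and `cosine_similarity` and `top_k` are hypothetical helpers, not part of `ResearchAssistant`.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return the indices of the k chunks most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy 2-D "embeddings"; real embeddings have hundreds of dimensions.
chunks = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0], [0.5, 0.5]]
query = [1.0, 0.0]
print(top_k(query, chunks, k=4))  # [0, 2, 4, 1]
```

A larger `k` gives the model more context at the cost of a longer prompt and more loosely related chunks.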


## Test 1: Factual Question

In [5]:
question = "What is Artificial Intelligence?"
print(f"\n❓ Question: {question}\n")

result = assistant.ask_question(question)
print(ResponseFormatter.format_for_display(result))


❓ Question: What is Artificial Intelligence?



  warn_deprecated(
Failed to send telemetry event CollectionQueryEvent: capture() takes 1 positional argument but 3 were given



ANSWER:
Artificial Intelligence (AI) is a computing concept that helps a machine think and solve complex problems as humans do with our intelligence (A Brief Introduction to Artificial Intelligence, page 1). It is a process that enables machines to achieve a humanlike mental behavior, involving self-correction and improvement from mistakes (A Brief Introduction to Artificial Intelligence, page 1). In essence, AI is a vast and growing field that includes subfields like machine learning and deep learning (A Brief Introduction to Artificial Intelligence, page 2).

SOURCES (2):

[1] sample.pdf (Page 0)
    Preview: A Brief Introduction to Artificial Intelligence
What is AI and how is it going to shape the future 
By Dibbyo Saha, Undergraduate Student, Computer Science,
Ryerson University
What is Artificial Intel...

[2] sample.pdf (Page 2)
    Preview: Intelligence as a process that is going to help machines achieve a humanlike mental behaviour. AI is a vast and growing field which also

## Test 2: Specific Detail Question

In [6]:
question = "What are the subfields of AI mentioned in the document?"
print(f"\n❓ Question: {question}\n")

result = assistant.ask_question(question)
print(ResponseFormatter.format_for_display(result))


❓ Question: What are the subfields of AI mentioned in the document?


ANSWER:
The document mentions two subfields of AI: 

1. Artificial Intelligence (AI) - a computing concept that helps a machine think and solve complex problems (A Brief Introduction to Artificial Intelligence, no page number provided).
2. Deep Learning - a concept of computers simulating the process a human brain takes to analyze, think, and learn (Source: http://datasciencecentral.com, no page number provided).

Note: The document does not provide page numbers, but the information can be found in the provided context.

SOURCES (2):

[1] sample.pdf (Page 0)
    Preview: A Brief Introduction to Artificial Intelligence
What is AI and how is it going to shape the future 
By Dibbyo Saha, Undergraduate Student, Computer Science,
Ryerson University
What is Artificial Intel...

[2] sample.pdf (Page 4)
    Preview: Source: http://datasciencecentral.com
Deep Learning, on the other hand is the concept of computers simul

## Test 3: Comparison Question

In [7]:
question = "What is the difference between Machine Learning and Deep Learning?"
print(f"\n❓ Question: {question}\n")

result = assistant.ask_question(question)
print(ResponseFormatter.format_for_display(result))


❓ Question: What is the difference between Machine Learning and Deep Learning?


ANSWER:
I don't have enough information to provide a specific difference between Machine Learning and Deep Learning as the context only mentions that AI is a vast and growing field which includes subfields like machine learning and deep learning, but does not elaborate on the differences between them. (Source document, no specific page number available)

SOURCES (1):

[1] sample.pdf (Page 2)
    Preview: Intelligence as a process that is going to help machines achieve a humanlike mental behaviour. AI is a vast and growing field which also includes a lot more subfields like machine learning and deep...



## Test 4: Application Question

In [8]:
question = "What are some current applications of AI?"
print(f"\n❓ Question: {question}\n")

result = assistant.ask_question(question)
print(ResponseFormatter.format_for_display(result))


❓ Question: What are some current applications of AI?


ANSWER:
Some current applications of AI include smart personal assistants like Apple's Siri and Amazon's Alexa, as well as the use of algorithms in Netflix to provide accurate and relevant suggestions of movies and TV series (Source: http://datasciencecentral.com, no page number provided). These devices and services use user interactions and data to learn and improve their performance. [No specific page number is available as the context is a continuous text from a website].

SOURCES (2):

[1] sample.pdf (Page 4)
    Preview: Source: http://datasciencecentral.com
Deep Learning, on the other hand is the concept of computers simulating the process a human brain takes to analyze, think and learn. The deep learning process...

[2] sample.pdf (Page 2)
    Preview: complicated and intuitive sense of thinking and problem-solving abilities of the human mind.
A Brief History of AI
The concept of Artificial Intelligence is not as mo

## Test 5: Future/Impact Question

In [9]:
question = "How will AI impact jobs in the future?"
print(f"\n❓ Question: {question}\n")

result = assistant.ask_question(question)
print(ResponseFormatter.format_for_display(result))


❓ Question: How will AI impact jobs in the future?


ANSWER:
According to the context, AI may replace some jobs due to automation, with a predicted loss of 85 million jobs by 2025, as stated in a Forbes article [6]. However, it is also mentioned that AI might create more jobs than it replaces, changing the way humans work by creating new types of jobs (Source: https://hackernoon.com/artificial-intelligence-and-big-data-zys3258). Unfortunately, the page number is not provided in the context.

SOURCES (2):

[1] sample.pdf (Page 6)
    Preview: great tool in the future of education. AI can be used to analyze data from an individual’s personal and intellectual needs, capabilities, choices and limitations to develop customized curriculum, st...

[2] sample.pdf (Page 5)
    Preview: Intelligence is also viewed as a great tool for better cybersecurity. Many banks are using AI as a means to identify unauthorized credit cards uses. From analyzing complex genetic data to perform th...



## Test 6: Concerns/Challenges Question

In [10]:
question = "What concerns exist about AI and automation?"
print(f"\n❓ Question: {question}\n")

result = assistant.ask_question(question)
print(ResponseFormatter.format_for_display(result))


❓ Question: What concerns exist about AI and automation?


ANSWER:
According to the context, concerns about AI and automation include the fear of losing jobs due to automation, with a predicted loss of 85 million jobs by 2025 [1, no page number available]. Additionally, there is a concern that as machines become smarter, they may become opinionated and biased like their human trainers [1, no page number available]. Another concern is that AI might replace jobs, although it may also create new ones [Source: https://hackernoon.com/artificial-intelligence-and-big-data-zys3258, no page number available].

Note: Since the context is provided as a plain text without page numbers, I couldn't include specific page numbers in the citations. The source document is mentioned where available.

SOURCES (2):

[1] sample.pdf (Page 6)
    Preview: great tool in the future of education. AI can be used to analyze data from an individual’s personal and intellectual needs, capabilities, choices and


## Test 7: Historical Question

In [11]:
question = "What is the history of Artificial Intelligence?"
print(f"\n❓ Question: {question}\n")

result = assistant.ask_question(question)
print(ResponseFormatter.format_for_display(result))


❓ Question: What is the history of Artificial Intelligence?


ANSWER:
The concept of Artificial Intelligence (AI) dates back to 1950 when Alan Turing invented the Turing test (A Brief History of AI, no page number). The first chatbot computer program, ELIZA, was created in the 1960s (A Brief History of AI, no page number). In 1977, IBM's deep blue, a chess computer, beat a world chess champion in two out of six games (A Brief History of AI, no page number). Later, in 2011, Siri was announced as a digital assistant by Apple (A Brief History of AI, no page number), and in 2015, Elon Musk and others founded OpenAI (A Brief History of AI, no page number). 

Note: The provided context does not have page numbers, so I couldn't include them in the citations. The information is from the document "A Brief History of AI".

SOURCES (2):

[1] sample.pdf (Page 2)
    Preview: complicated and intuitive sense of thinking and problem-solving abilities of the human mind.
A Brief History of AI
The c

## Test 8: Question Outside Document Scope

In [12]:
question = "What is quantum computing?"
print(f"\n❓ Question: {question}\n")

result = assistant.ask_question(question)
print(ResponseFormatter.format_for_display(result))
print("\n⚠️ Expected: Should say 'I don't have enough information' since quantum computing is not in the document")


❓ Question: What is quantum computing?


ANSWER:
I don't have enough information.

The context provided does not mention quantum computing. The documents discuss Artificial Intelligence, its history, and its subfields, but quantum computing is not mentioned. (Source: All documents, no specific page number as the information is not present)

SOURCES (2):

[1] sample.pdf (Page 2)
    Preview: complicated and intuitive sense of thinking and problem-solving abilities of the human mind.
A Brief History of AI
The concept of Artificial Intelligence is not as modern as we think it is. This trac...

[2] sample.pdf (Page 0)
    Preview: A Brief Introduction to Artificial Intelligence
What is AI and how is it going to shape the future 
By Dibbyo Saha, Undergraduate Student, Computer Science,
Ryerson University
What is Artificial Intel...


⚠️ Expected: Should say 'I don't have enough information' since quantum computing is not in the document
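
The expectation printed above is checked by eye. A small helper can turn it into an assertion. This is a hedged sketch: `is_refusal` and `REFUSAL_MARKERS` are hypothetical names, and the marker strings are assumed from the wording of the answer shown in this test, so they may need adjusting if the prompt template changes.

```python
# Phrases that signal the model declined to answer (assumed from the output above).
REFUSAL_MARKERS = (
    "i don't have enough information",
    "i do not have enough information",
)


def is_refusal(answer: str) -> bool:
    """Return True if the answer contains one of the known refusal phrases."""
    # Lower-case and normalise curly apostrophes so "don’t" matches "don't".
    normalized = answer.lower().replace("\u2019", "'")
    return any(marker in normalized for marker in REFUSAL_MARKERS)


out_of_scope = (
    "I don't have enough information.\n\n"
    "The context provided does not mention quantum computing."
)
in_scope = "Artificial Intelligence (AI) is a computing concept that helps a machine think."
print(is_refusal(out_of_scope), is_refusal(in_scope))  # True False
```

In the notebook this could be used as `assert is_refusal(result["answer"])` directly after the out-of-scope question.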


## Test 9: Custom Question (Your Turn!)

In [13]:
# Try your own question!
question = "What is reinforcement learning?"
print(f"\n❓ Question: {question}\n")

result = assistant.ask_question(question)
print(ResponseFormatter.format_for_display(result))


❓ Question: What is reinforcement learning?


ANSWER:
Reinforcement learning is a feedback-dependent machine learning model, where the machine is given data and made to predict what the data was. If the machine generates an inaccurate conclusion, it is given feedback about its incorrectness, allowing it to learn and improve (Source document, no page number provided, as the context is a continuous text). 

Specifically, it is stated that "Reinforcement learning is a feedback dependent machine learning model. In this process the machine is given a data and made to predict what the data was." (Context, no page number). 

This process enables the machine to learn from its mistakes and eventually make accurate predictions, as illustrated by the example of a machine learning to identify a basketball image (Context, no page number).

SOURCES (2):

[1] sample.pdf (Page 3)
    Preview: which is not apparently comprehendible by the human eyes. The machine looks for patterns and draws conclus

## Inspect Raw Response Structure

In [14]:
# See the raw response structure
question = "What is AI?"
result = assistant.ask_question(question)

print("Response Structure:")
print(f"  - answer: {type(result['answer'])}")
print(f"  - citations: {type(result['citations'])} with {len(result['citations'])} items")
print(f"  - num_sources: {result['num_sources']}")
print("\nFirst citation:")
print(result['citations'][0])

Response Structure:
  - answer: <class 'str'>
  - citations: <class 'list'> with 2 items
  - num_sources: 2

First citation:
{'filename': 'sample.pdf', 'page': 0, 'content_preview': 'A Brief Introduction to Artificial Intelligence\nWhat is AI and how is it going to shape the future \nBy Dibbyo Saha, Undergraduate Student, Computer Science,\nRyerson University\nWhat is Artificial Intel...', 'chunk_id': 0}
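
The citation dict above suggests two display issues seen throughout this notebook: `page` appears to be 0-based (the first page is listed as "Page 0" while the answer in Test 1 cites "page 1"), and `content_preview` keeps the PDF's raw line breaks. The sketch below shows one way to render a citation on a single line. `format_citation` is a hypothetical helper, not part of `ResponseFormatter`, and the 0-based page numbering is an assumption inferred from the output.

```python
def format_citation(citation: dict, index: int) -> str:
    """Render one citation dict as a single readable line."""
    # Collapse newlines and repeated spaces left over from PDF extraction.
    preview = " ".join(citation["content_preview"].split())
    # Assumption: 'page' is 0-based, so add 1 for a human-facing page number.
    page = citation["page"] + 1
    return f"[{index}] {citation['filename']} (page {page}): {preview}"


sample = {
    "filename": "sample.pdf",
    "page": 0,
    "content_preview": (
        "A Brief Introduction to Artificial Intelligence\n"
        "What is AI and how is it going to shape the future \n"
        "By Dibbyo Saha"
    ),
    "chunk_id": 0,
}
print(format_citation(sample, 1))
```

Applied to `result['citations']`, this would print one line per source instead of one word per line for pages whose extracted text puts a line break after every word.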
