---
layout: post
title:  LlamaIndex RAG for Science Community
date:   2025-09-12
maths: true
categories: [AI, LlamaIndex, OpenAI, RAG]
typora-root-url: /Users/ojitha/GitHub/ojitha.github.io
typora-copy-images-to: ../assets/images/${filename}
---

<style>
/* Styles for the two-column layout */
.image-text-container {
    display: flex; /* Enables flexbox */
    flex-wrap: wrap; /* Allows columns to stack on small screens */
    gap: 10px; /* Space between the image and text */
    align-items: center; /* Vertically centers content in columns */
    margin-bottom: 20px; /* Space below this section */
}

.image-column {
    flex: 1; /* Allows this column to grow */
    min-width: 150px; /* Minimum width for the image column before stacking */
    max-width: 20%; /* Maximum width for the image column to not take up too much space initially */
    box-sizing: border-box; /* Include padding/border in element's total width/height */
}

.text-column {
    flex: 2; /* Allows this column to grow more (e.g., twice as much as image-column) */
    min-width: 300px; /* Minimum width for the text column before stacking */
    box-sizing: border-box;
}

</style>

<div class="image-text-container">
    <div class="image-column">
        <img src="/assets/images/2025-09-12-LlamaIndexScienceCommunityChat/llamaindex_ink_drawing.svg" alt="LlamaIndex ink drawing" width="150" height="150">
    </div>
    <div class="text-column">
<p>
 LlamaIndex is a comprehensive data framework for connecting large language models to private data sources through a Retrieval-Augmented Generation (RAG) architecture. The framework operates in two main stages: indexing, where vector embeddings are created from documents, and querying, where relevant information is retrieved and synthesized into an answer. LlamaIndex supports several chat engines, including condense question, context, and condense-plus-context modes, for conversational AI applications. Evaluation is crucial for RAG performance: retrieval accuracy is assessed with metrics such as Mean Reciprocal Rank (MRR) and Hit Rate, while faithfulness and relevance evaluators measure response quality. Together, these capabilities make LlamaIndex well suited to building reliable AI applications that combine domain-specific knowledge with LLMs.
</p>
    </div>
</div>

<!--more-->

📝 source: [Jupyter Notebook](https://github.com/ojitha/rag_llamaindex/blob/main/Science_Community/2025-09-12-LlamaIndexScienceCommunityChat.ipynb){:target="_blank"}

📂 App source: 

------

* TOC
{:toc}
------

## Introduction
LlamaIndex is a data framework designed to help developers build applications that connect large language models (LLMs) with their own private or custom data sources. It serves as a bridge between LLMs and external data, enabling you to create AI applications that can reason over your specific information. For example, I've created an AI chatbot that retrieves information from my data and uses the LLM to answer the user's questions:

![AI Chatbot for Science Community](/assets/images/2025-09-12-LlamaIndexScienceCommunityChat/AI_Chat_Bot.jpg)

1. User prompt for asking questions
2. Previous question
3. Answer to the previous question
4. Files in the RAG store
5. Engine-specific settings, such as the context/system prompt for the Context engine
6. Chat engine selector
7. Common settings shared by all prompts




### What is LlamaIndex?
The main challenge LlamaIndex addresses is that while LLMs like GPT-4 or Claude are trained on vast amounts of general knowledge, they don't have access to your specific data - whether that's company documents, personal files, databases, or real-time information. LlamaIndex provides the infrastructure to make your data "queryable" by LLMs.
This post introduces the LlamaIndex components using OpenAI models.

**Key Features**

- Data Ingestion: LlamaIndex can connect to various data sources including:
    - Documents (PDFs, Word files, text files)
    - Databases (SQL, NoSQL)
    - APIs and web services
    - Vector databases
    - Knowledge graphs
- Indexing and Retrieval: It creates searchable indexes of your data using techniques like:
    - Vector embeddings for semantic search
    - Keyword-based retrieval
    - Hybrid search approaches
    - Hierarchical indexing for large datasets
- Query Processing: When you ask a question, LlamaIndex:
    - Retrieves the most relevant pieces of information from your indexed data
    - Provides that context to the LLM
    - Returns an answer based on your specific data
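
The retrieve-then-answer flow above hinges on packing the retrieved text into the LLM prompt. The sketch below is a minimal, plain-Python illustration of that "provide the context" step; `build_rag_prompt` is a hypothetical helper, and its wording is only modelled loosely on LlamaIndex's default question-answering prompt rather than copied from the library.

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Hypothetical helper: wrap retrieved chunks and the question in one prompt."""
    # Number the chunks so each piece of context stays distinguishable
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {question}\n"
        "Answer: "
    )


# The chunks would normally come from the retriever; here they are hard-coded
prompt = build_rag_prompt(
    "Who attended the 1911 Solvay Conference?",
    [
        "Einstein attended the 1911 Solvay Conference.",
        "Marie Curie attended the 1911 Solvay Conference.",
    ],
)
print(prompt)
```

The LLM only sees what the retriever put into this prompt, which is why retrieval quality matters so much in RAG.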

### Core Architecture Components
The LlamaIndex ecosystem is built around a few fundamental stages and components. As outlined in _Generative AI for Cloud Solutions_[^1], LlamaIndex operates through two main stages:

1. **Indexing stage**: In this stage, LlamaIndex creates a vector index of your private data. This makes it possible to search through your own organization's domain-specific knowledge base. You can input text documents, database records, knowledge graphs, and other data types. 

2. **Querying stage**: In this stage, the RAG pipeline finds the most relevant information based on the user's query. This information is then passed to the LLM, along with the query, to generate a more accurate response.

The framework then implements these stages through five main components:

1. Data Loading: LlamaIndex provides data connectors that pull data from wherever it is stored, such as APIs, PDFs, databases, or external apps.
2. Data Indexes: This step creates vector embeddings and adds metadata so that information can be retrieved.
3. Storing: Once the embeddings have been created, they are stored so they can be queried. LlamaIndex supports multiple storage solutions.
4. Engines: An engine takes a natural-language query, retrieves relevant information from the index, and passes it to an LLM. You send a query as a prompt and get the LLM's answer grounded in the retrieved context. Chat engines support different strategies: Condense Question, Context, and Condense+Context. Query engines combine retrieval and response synthesis[^2].
5. Evaluating: Building a RAG solution is an iterative process that works only if you evaluate the LLM's answers to query prompts. LlamaIndex provides evaluation tooling to measure retrieval quality (for example, Hit Rate and MRR) and response quality (for example, faithfulness and relevance).
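
Hit Rate and MRR are simple to compute once you know, for each test query, which node *should* have been retrieved. A minimal sketch in plain Python, assuming exactly one relevant node id per query; the function names and the evaluation set are hypothetical, and LlamaIndex's own retrieval evaluators are not used here.

```python
def hit_rate(expected, retrieved):
    """Fraction of queries whose expected node id appears anywhere in the results."""
    hits = sum(1 for exp, ret in zip(expected, retrieved) if exp in ret)
    return hits / len(expected)


def mean_reciprocal_rank(expected, retrieved):
    """Average of 1/rank of the expected node id (0 when it is not retrieved)."""
    total = 0.0
    for exp, ret in zip(expected, retrieved):
        if exp in ret:
            total += 1.0 / (ret.index(exp) + 1)  # ranks are 1-based
    return total / len(expected)


# Hypothetical evaluation set: one relevant node id per query
expected = ["n1", "n2", "n3"]
retrieved = [["n1", "n4"], ["n5", "n2"], ["n6", "n7"]]

hr = hit_rate(expected, retrieved)
mrr = mean_reciprocal_rank(expected, retrieved)
print(f"hit rate = {hr:.3f}, MRR = {mrr:.3f}")  # hit rate = 0.667, MRR = 0.500
```

Hit Rate only asks whether the right node was retrieved; MRR also rewards ranking it near the top.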

At the foundational level, LlamaIndex works with Documents and Nodes as its basic building blocks. Documents represent raw data sources, while Nodes are coherent, indexable chunks of information parsed from Documents[^4].
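
To make the Document/Node split concrete, here is a toy model in plain Python. `ToyDocument`, `ToyNode` and `parse_nodes` are hypothetical stand-ins, not LlamaIndex classes, and splitting on a fixed number of words is an assumption chosen for brevity; the point is that every node keeps a copy of its document's metadata and a link back to its source document, much like the `relationships` field in the `TextNode` printed later in this post.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a raw source document (not llama_index's Document)."""
    text: str
    metadata: dict = field(default_factory=dict)
    doc_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class ToyNode:
    """Hypothetical chunk that remembers which document it came from."""
    text: str
    metadata: dict
    source_doc_id: str
    node_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def parse_nodes(doc: ToyDocument, max_words: int) -> list[ToyNode]:
    words = doc.text.split()
    return [
        ToyNode(
            text=" ".join(words[i:i + max_words]),
            metadata=dict(doc.metadata),  # each node gets its own copy of the metadata
            source_doc_id=doc.doc_id,     # back-link, like the SOURCE relationship
        )
        for i in range(0, len(words), max_words)
    ]


doc = ToyDocument(
    text="Albert Einstein and Marie Curie first met at the 1911 Solvay Conference in Brussels",
    metadata={"file_name": "AlbertEinstein.txt"},
)
toy_nodes = parse_nodes(doc, max_words=5)
for node in toy_nodes:
    print(node.source_doc_id == doc.doc_id, node.text)
```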

### Integration
LlamaIndex integrates with popular LLM providers (OpenAI, Anthropic, local models) and vector databases (Pinecone, Chroma, Weaviate), making it quite flexible for different technical stacks. The framework essentially democratizes the ability to create "ChatGPT for your data" applications without needing to build all the retrieval and indexing infrastructure from scratch.

> You have to set up the environment (e.g., the OpenAI API key) and the other housekeeping functions:

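```python
# Load environment variables (for example, OPENAI_API_KEY) from a .env file
%load_ext dotenv
%dotenv ../../../.env

from IPython.display import Markdown, display


def in_md(md_txt):
    """Render an LLM response as Markdown between separator lines."""
    md_formatted_txt = f"--- Response -------<br/>{md_txt}<br/>-------------------"
    display(Markdown(md_formatted_txt))
```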

In RAG systems, data ingestion is the first critical step where LlamaIndex connects to various data sources through its data connectors. The "data" directory in this example contains raw documents that will be processed through LlamaIndex's `SimpleDirectoryReader`, which serves as a <span>universal data loader capable of handling multiple file formats</span>{:gtxt} including 

- PDFs, 
- text files, 
- CSVs, 

and more. This ingestion phase represents the beginning of the RAG indexing pipeline where unstructured data is transformed into a queryable format for retrieval-augmented generation.
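
Conceptually, a directory loader walks the folder, reads each file, and attaches file-level metadata. The simplified sketch below handles only `.txt` files; `load_text_files` is a hypothetical helper and is not how `SimpleDirectoryReader` is implemented, although the metadata keys mirror some of those it records (`file_path`, `file_name`, `file_size`).

```python
import tempfile
from pathlib import Path


def load_text_files(input_dir: str) -> list[dict]:
    """Hypothetical, simplified loader: read every .txt file into text + metadata."""
    records = []
    for path in sorted(Path(input_dir).glob("*.txt")):
        records.append(
            {
                "text": path.read_text(encoding="utf-8"),
                "metadata": {
                    "file_path": str(path.resolve()),
                    "file_name": path.name,
                    "file_size": path.stat().st_size,  # size in bytes
                },
            }
        )
    return records


# Build a throwaway "data" directory so the example runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "AlbertEinstein.txt").write_text("Einstein", encoding="utf-8")
    Path(tmp, "MarieCurie.txt").write_text("Curie", encoding="utf-8")
    docs = load_text_files(tmp)

print([d["metadata"]["file_name"] for d in docs])  # ['AlbertEinstein.txt', 'MarieCurie.txt']
```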

```python
from llama_index.core import SimpleDirectoryReader

# Load every supported file in ./data into a list of Document objects
documents = SimpleDirectoryReader(input_dir="./data").load_data()
```

LlamaIndex uses a centralized Settings system that manages global configurations for both Large Language Models (LLMs) and embedding models across the entire RAG pipeline. The `Settings.llm` defines which language model will be used for text generation and query processing, while `Settings.embed_model` specifies the embedding model for creating vector representations. This unified configuration approach ensures consistency throughout the indexing and querying phases, allowing seamless integration between different components of the RAG system while supporting various LLM providers like OpenAI, Anthropic, or local models.

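```python
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# LLM used for query processing and answer generation
llm = OpenAI(model="gpt-4o", temperature=0.3)
# Embedding model used to vectorise nodes and queries (library default model)
embed_model = OpenAIEmbedding()

# Register both globally so every index and engine below uses them
Settings.llm = llm
Settings.embed_model = embed_model
```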

Node creation is fundamental to RAG systems, where *documents are parsed and split into manageable, coherent chunks called nodes*. LlamaIndex's `TokenTextSplitter` employs a token-based chunking strategy with configurable `chunk_size` and `chunk_overlap` parameters. A `chunk_size` of 1024 tokens gives each node enough context while staying within LLM token limits, and a `chunk_overlap` of 20 tokens maintains semantic continuity between adjacent chunks.

> This chunking strategy is critical for balancing **retrieval precision** and **context preservation**, as smaller chunks provide more precise matches while larger chunks retain more comprehensive context for better answer generation.
{:.green}
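
The effect of `chunk_size` and `chunk_overlap` can be seen with a sliding window. In this minimal sketch, whitespace-separated words stand in for real tokenizer tokens and `split_with_overlap` is a hypothetical function; `TokenTextSplitter` itself uses a tokenizer and separator rules, so its boundaries will differ.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Sliding-window chunking: each chunk repeats the last `chunk_overlap` tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


# Whitespace-separated words stand in for real tokenizer tokens
tokens = "one two three four five six seven eight nine ten".split()
chunks = split_with_overlap(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
# one two three four
# four five six seven
# seven eight nine ten
```

Each boundary word ("four", "seven") appears in two chunks, which is what keeps sentences that straddle a boundary retrievable from either side.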

```python
from llama_index.core.node_parser import TokenTextSplitter

# Split documents into ~1024-token nodes with a 20-token overlap
splitter = TokenTextSplitter(
    chunk_size=1024,
    chunk_overlap=20,
)
nodes = splitter.get_nodes_from_documents(documents)
```

```python
import pprint

pprint.pprint(nodes[0], indent=2)
```

```text
TextNode(id_='cd821c26-64cf-4755-98f2-9e831ab0d306', embedding=None, metadata={'file_path': '/Users/ojitha/GitHub/rag_llamaindex/Science_Community/data/AlbertEinstein.txt', 'file_name': 'AlbertEinstein.txt', 'file_type': 'text/plain', 'file_size': 8235, 'creation_date': '2025-09-12', 'last_modified_date': '2025-09-12'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='2a37d51d-ac93-4091-aad5-06b963941e44', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'file_path': '/Users/ojitha/GitHub/rag_llamaindex/Science_Community/data/AlbertEinstein.txt', 'file_name': 'AlbertEinstein.txt', 'file_type': 'text/plain', 'file_size': 8235, 'creation_date': '2025-09-12', 'last_modified_date': '2025-09-12'}, hash='ebbeab8f1f
```

Vector embeddings form the core of dense retrieval in RAG systems, where text nodes are **encoded** into high-dimensional numerical vectors that capture semantic meaning beyond simple keyword matching. LlamaIndex's `VectorStoreIndex` transforms each node into dense vector representations using the configured embedding model, enabling semantic similarity search.

> These **dense vectors** (typically 768-1536 dimensions) are populated with *non-zero values representing rich semantic relationships* and contextual understanding. 

During the <u>retrieval phase</u>, user queries are embedded into the same vector space, *allowing the system to find semantically similar content using **cosine similarity** or other **distance metrics**, even when exact keywords don't match*. 

The `similarity_top_k` parameter controls how many of the most semantically similar nodes are retrieved for context generation.
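
The retrieval step can be sketched with plain cosine similarity over a handful of vectors. The three-dimensional "embeddings" and the `top_k` helper below are made up for illustration (assumptions, not LlamaIndex internals); `k` plays the role of `similarity_top_k`.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| |b|); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, node_vecs, k):
    """Score every node against the query and keep the k most similar."""
    scored = [(node_id, cosine_similarity(query_vec, vec)) for node_id, vec in node_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 3-dimensional "embeddings"; real ones have hundreds or thousands of dimensions
node_vecs = {
    "einstein_node": [0.9, 0.1, 0.0],
    "curie_node": [0.7, 0.7, 0.0],
    "solvay_node": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]

results = top_k(query_vec, node_vecs, k=2)
print([node_id for node_id, _ in results])  # ['einstein_node', 'curie_node']
```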

```python
from llama_index.core import VectorStoreIndex

# Embed every node with Settings.embed_model and build an in-memory vector index
index = VectorStoreIndex(nodes)
```

```text
2025-09-12 20:21:23,238 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
```


> The `similarity_top_k` parameter is the number of most similar chunks (nodes) retrieved for a given query.
{:.info-box}

```python
# Retrieve the 5 most similar nodes for each query
q_engine = index.as_query_engine(similarity_top_k=5)
```

```python
response = q_engine.query("who are the scientists in these documents?")
```

```text
2025-09-12 20:21:23,746 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:21:26,220 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
```


```python
in_md(response.response)
```

--- Response -------<br/>The documents mention several scientists, including Albert Einstein, Marie Curie, Paul Langevin, Max Planck, Henri Poincaré, Ernest Rutherford, Hendrik Lorentz, and Niels Bohr.<br/>-------------------

```python
len(response.source_nodes)
```

```text
5
```

```python
response.source_nodes[4]
```

```text
NodeWithScore(node=TextNode(id_='cd821c26-64cf-4755-98f2-9e831ab0d306', embedding=None, metadata={'file_path': '/Users/ojitha/GitHub/rag_llamaindex/Science_Community/data/AlbertEinstein.txt', 'file_name': 'AlbertEinstein.txt', 'file_type': 'text/plain', 'file_size': 8235, 'creation_date': '2025-09-12', 'last_modified_date': '2025-09-12'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='2a37d51d-ac93-4091-aad5-06b963941e44', node_type='4', metadata={'file_path': '/Users/ojitha/GitHub/rag_llamaindex/Science_Community/data/AlbertEinstein.txt', 'file_name': 'AlbertEinstein.txt', 'file_type': 'text/plain', 'file_size': 8235, 'creation_date': '2025-09-12', 'last_modified_date': '2025-09-12'}, hash='ebbeab8f1fb4d4
```

### Summarization

LlamaIndex's `SummaryIndex` provides an alternative indexing approach optimized for generating comprehensive summaries rather than precise retrieval. Unlike `VectorStoreIndex`, which uses semantic similarity for selective retrieval, `SummaryIndex` processes all nodes sequentially to create holistic responses.

> This index type is particularly effective for document summarization tasks where you need to synthesize information from multiple sources rather than finding specific facts. 

The `SummaryIndex` leverages the LLM's context window to process large amounts of text systematically, making it ideal for generating executive summaries, research overviews, or comprehensive analyses of entire document collections.
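
One way to process *every* node, rather than a top-k subset, is to refine a running answer chunk by chunk. The sketch below uses an offline `fake_llm` stand-in and a hypothetical `refine_over_all_chunks` helper; it mirrors the idea behind LlamaIndex's `refine` response mode but is not its implementation, and the engine's default synthesis strategy may batch chunks differently.

```python
from typing import Callable


def refine_over_all_chunks(chunks: list[str], llm: Callable[[str], str]) -> str:
    """Visit every chunk in order and ask the LLM to refine a running answer."""
    answer = ""
    for chunk in chunks:
        if not answer:
            prompt = f"Summarise the following text:\n{chunk}"
        else:
            prompt = (
                f"Existing summary:\n{answer}\n"
                f"Refine the summary using this additional text:\n{chunk}"
            )
        answer = llm(prompt)
    return answer


# Offline stand-in for an LLM: records each prompt and returns a versioned label
calls = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"summary v{len(calls)}"


refined = refine_over_all_chunks(["chunk A", "chunk B", "chunk C"], fake_llm)
print(refined, len(calls))  # summary v3 3
```

One LLM call per chunk is why summarising a whole collection costs more than a top-k query.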

```python
from llama_index.core import SummaryIndex

# A SummaryIndex passes every node to the LLM instead of a top-k subset
summary_index = SummaryIndex(nodes)
s_engine = summary_index.as_query_engine()
```

```python
summary = s_engine.query("Provide the summary")
```

```text
2025-09-12 20:21:29,052 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
```


```python
in_md(summary.response)
```

--- Response -------<br/>Albert Einstein and Marie Curie, two of the most influential scientists of the early 20th century, first met at the 1911 Solvay Conference in Brussels. Despite the personal challenges they faced—Curie with a scandal in the French press and Einstein as a Jewish outsider—they formed a mutual respect and friendship. Their initial meeting was marked by deep conversations about their groundbreaking work in physics, with Einstein admiring Curie's dedication to isolating radium and Curie intrigued by Einstein's theory of relativity. This meeting laid the foundation for a lasting intellectual partnership, characterized by shared scientific interests and personal understanding.<br/>-------------------

## Chat Engines
Using LlamaIndex, you can build:

- Q&A
- Summarisation
- Chat Engines
    1. Condense Question
    2. Context
    3. Condense+Context

I use the same set of test questions to compare these three chat engines:

- who are the scientists in these documents?
- Have these scientist met in their lifetime?
- what are the topics they discussed?
- what are the other topics they discussed?
- Which physicist who had a romantic relationship with Marie Curie?
- what happen after that?
- what happen after that?

The dataset is in the `data` folder.


### Condense Question Chat Engine

>The Condense Question chat engine implements a <span>stateless conversational RAG pattern</span>{:gtxt} that transforms multi-turn conversations into standalone queries.
{:.info-box}

This architecture addresses the challenge of maintaining conversation context by condensing the entire chat history and current question into a single, comprehensive query before performing retrieval. 

The engine first analyzes the conversation thread to understand contextual references, then reformulates the current question to be self-contained, including all necessary context from previous exchanges. 

This approach *ensures that each retrieval operation has complete context without requiring complex memory management*, making it <span>ideal for scenarios where conversation history needs to inform the current response</span>{:gtxt} while maintaining simplicity in the RAG pipeline.
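
The condensing step is essentially a prompt that combines the chat history with the follow-up question. A minimal sketch, assuming history is stored as `(role, text)` tuples; the template text is modelled on, but not guaranteed to match, LlamaIndex's default condense prompt, and `build_condense_prompt` is hypothetical. The LLM's reply to such a prompt becomes the standalone query used for retrieval.

```python
CONDENSE_TEMPLATE = (
    "Given a conversation (between Human and Assistant) and a follow up message "
    "from Human, rewrite the message to be a standalone question that captures "
    "all relevant context from the conversation.\n\n"
    "<Chat History>\n{chat_history}\n\n"
    "<Follow Up Message>\n{question}\n\n"
    "<Standalone question>\n"
)


def build_condense_prompt(history: list[tuple[str, str]], question: str) -> str:
    """Flatten (role, text) turns, oldest first, into a single condense prompt."""
    chat_history = "\n".join(f"{role}: {text}" for role, text in history)
    return CONDENSE_TEMPLATE.format(chat_history=chat_history, question=question)


history = [
    ("Human", "who are the scientists in these documents?"),
    ("Assistant", "Albert Einstein, Marie Curie, Max Planck and others."),
]
condense_prompt = build_condense_prompt(history, "Have these scientist met in their lifetime?")
print(condense_prompt)
```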

The chat engine enhances each chat query, as shown in the following conversation. You can find the enhanced query in the `Querying with: ...` log lines.

```python
# verbose=True prints the condensed (standalone) query for each turn
cq_c_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)
```

```python
cq_c_response = cq_c_engine.chat("who are the scientists in these documents?")
```

```text
2025-09-12 20:21:29,089 - INFO - Querying with: who are the scientists in these documents?
Querying with: who are the scientists in these documents?
2025-09-12 20:21:29,925 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:21:32,280 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
```


```python
in_md(cq_c_response.response)
```

--- Response -------<br/>The scientists mentioned in the documents include Albert Einstein, Marie Curie, Paul Langevin, Max Planck, Henri Poincaré, Ernest Rutherford, Hendrik Lorentz, and Niels Bohr. Additionally, the 1911 First Solvay Conference attendees listed are Walther Nernst, Marcel Brillouin, Ernest Solvay, Emil Warburg, Jean Baptiste Perrin, Wilhelm Wien, Robert Goldschmidt, Heinrich Rubens, Arnold Sommerfeld, Frederick Lindemann, Maurice de Broglie, Martin Knudsen, Friedrich Hasenöhrl, Georges Hostelet, Edouard Herzen, James Hopwood Jeans, and Heike Kamerlingh Onnes.<br/>-------------------

```python
cq_c_response = cq_c_engine.chat("Have these scientist met in their lifetime?")
```

```text
2025-09-12 20:21:33,682 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-12 20:21:33,684 - INFO - Querying with: Did the scientists mentioned, such as Albert Einstein, Marie Curie, and the attendees of the 1911 First Solvay Conference, meet each other during their lifetimes?
Querying with: Did the scientists mentioned, such as Albert Einstein, Marie Curie, and the attendees of the 1911 First Solvay Conference, meet each other during their lifetimes?
2025-09-12 20:21:33,989 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:21:35,729 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
```


```python
in_md(cq_c_response.response)
```

--- Response -------<br/>Yes, the scientists mentioned, including Albert Einstein, Marie Curie, and the attendees of the 1911 First Solvay Conference, did meet each other during their lifetimes. They gathered at the Solvay Conferences, which were significant events for discussing advancements in physics and brought together many of the leading scientific minds of the time.<br/>-------------------

This meeting is mentioned in the document `data/AlbertEinstein.txt`.

```python
cq_c_response = cq_c_engine.chat("what are the topics they discussed?")
```

```text
2025-09-12 20:21:37,061 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-12 20:21:37,064 - INFO - Querying with: What topics were discussed by the scientists, including Albert Einstein, Marie Curie, and the attendees of the 1911 First Solvay Conference, when they met during their lifetimes?
Querying with: What topics were discussed by the scientists, including Albert Einstein, Marie Curie, and the attendees of the 1911 First Solvay Conference, when they met during their lifetimes?
2025-09-12 20:21:37,367 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:21:40,329 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
```


```python
in_md(cq_c_response.response)
```

--- Response -------<br/>During the 1911 First Solvay Conference, the scientists, including Albert Einstein, Marie Curie, and other attendees, primarily discussed the revolutionary concept of quantum theory. This gathering of prominent physicists focused on the latest developments and challenges in understanding quantum mechanics, an area that was rapidly reshaping the scientific landscape at the time. Additionally, Einstein and Curie engaged in discussions about the implications of their respective work, such as the behavior of radioactive decay in curved spacetime, reflecting their deep interest in the intersection of their scientific pursuits.<br/>-------------------

```python
cq_c_response = cq_c_engine.chat("what are the other topics they discussed?")
```

```text
2025-09-12 20:21:42,288 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-12 20:21:42,297 - INFO - Querying with: What other topics, besides quantum theory and the implications of Einstein's and Curie's work, did the scientists discuss at the 1911 First Solvay Conference?
Querying with: What other topics, besides quantum theory and the implications of Einstein's and Curie's work, did the scientists discuss at the 1911 First Solvay Conference?
2025-09-12 20:21:42,618 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:21:44,043 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
```


```python
in_md(cq_c_response.response)
```

--- Response -------<br/>At the 1911 First Solvay Conference, besides quantum theory and the implications of Einstein's and Curie's work, the scientists also discussed their shared concerns about the growing tensions in Europe. Additionally, they likely engaged in conversations about their experiences as outsiders in the scientific community, given the challenges both faced—Curie as a woman and Einstein as a Jew.<br/>-------------------

```python
cq_c_response = cq_c_engine.chat("What are the most significant things that have happened among them?")
```

```text
2025-09-12 20:21:45,725 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-12 20:21:45,727 - INFO - Querying with: What are the most significant events or developments that occurred among the scientists who attended the 1911 First Solvay Conference, including Albert Einstein and Marie Curie?
Querying with: What are the most significant events or developments that occurred among the scientists who attended the 1911 First Solvay Conference, including Albert Einstein and Marie Curie?
2025-09-12 20:21:46,093 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:21:53,045 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
```


```python
in_md(cq_c_response.response)
```

--- Response -------<br/>The 1911 First Solvay Conference was a pivotal event in the history of physics, bringing together many of the greatest scientific minds of the time to discuss quantum theory. Among the significant developments and interactions:

1. **Quantum Theory Discussions**: The conference was primarily focused on the revolutionary concept of quantum theory, which was still in its early stages of development. This gathering allowed for the exchange of ideas that would shape the future of physics.

2. **Marie Curie and Albert Einstein's Interaction**: A notable event was the meeting between Marie Curie and Albert Einstein. They engaged in deep discussions about their work, with Curie expressing interest in how radioactive decay might behave in Einstein's curved spacetime.

3. **Mutual Support and Respect**: Einstein expressed his admiration for Curie's work and offered support during a time when she was facing personal scandal. Their mutual respect and understanding of each other's challenges as outsiders in the scientific community were significant.

4. **Networking and Collaboration**: The conference provided an opportunity for scientists like Max Planck, Henri Poincaré, Ernest Rutherford, and Hendrik Lorentz to interact and collaborate, fostering relationships that would lead to further scientific advancements.

5. **Influence on Future Conferences**: The success and impact of the 1911 conference set the stage for future Solvay Conferences, which continued to be a major forum for discussing groundbreaking scientific ideas.

These interactions and discussions at the conference played a crucial role in advancing the understanding of quantum mechanics and relativity, influencing the direction of 20th-century physics.<br/>-------------------

```python
cq_c_response = cq_c_engine.chat("what happen after that?")
```

```text
2025-09-12 20:21:54,563 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-12 20:21:54,569 - INFO - Querying with: What significant events or developments occurred following the 1911 First Solvay Conference, particularly in relation to the scientists who attended and the discussions they had on quantum theory and other topics?
Querying with: What significant events or developments occurred following the 1911 First Solvay Conference, particularly in relation to the scientists who attended and the discussions they had on quantum theory and other topics?
2025-09-12 20:21:54,916 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:22:00,221 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
```


```python
in_md(cq_c_response.response)
```

--- Response -------<br/>Following the 1911 First Solvay Conference, significant developments occurred in the realm of quantum theory and other scientific areas. The conference itself was a pivotal moment for the discussion of quantum theory, bringing together key figures like Marie Curie, Albert Einstein, Max Planck, and others who were instrumental in advancing the field. After the conference, these scientists continued to develop their groundbreaking theories and research.

Albert Einstein further developed his theories on relativity, which continued to reshape the understanding of space and time. Max Planck's work on energy quanta laid the foundation for future quantum mechanics developments. The interactions and debates among these scientists, particularly between Einstein and Niels Bohr in later conferences, further propelled the exploration and understanding of quantum mechanics.

Marie Curie continued her pioneering research on radioactivity, contributing significantly to the field and maintaining her status as a leading scientist despite personal challenges. The conference also fostered collaborations and mutual respect among the attendees, which influenced their subsequent scientific endeavors and the broader scientific community.<br/>-------------------

```python
cq_c_response = cq_c_engine.chat("what happen after that?")
```

```text
2025-09-12 20:22:01,518 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-12 20:22:01,522 - INFO - Querying with: What significant developments and events occurred in the field of physics and among the scientists, such as Albert Einstein, Marie Curie, and others, following the 1911 First Solvay Conference?
Querying with: What significant developments and events occurred in the field of physics and among the scientists, such as Albert Einstein, Marie Curie, and others, following the 1911 First Solvay Conference?
2025-09-12 20:22:01,989 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:22:07,268 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
```


```python
in_md(cq_c_response.response)
```

--- Response -------<br/>Following the 1911 First Solvay Conference, several significant developments and events occurred in the field of physics and among the scientists who attended. Albert Einstein continued to develop his theories, leading to the publication of the general theory of relativity in 1915, which fundamentally changed our understanding of gravity. Marie Curie persisted in her pioneering work on radioactivity, furthering research that would have lasting impacts on both science and medicine.

The Solvay Conferences continued to be a crucial meeting ground for physicists, fostering discussions that would shape the future of quantum mechanics and relativity. Scientists like Max Planck, who had already made significant contributions to quantum theory, continued to influence the field, while figures like Niels Bohr engaged in pivotal debates with Einstein about the nature of quantum mechanics.

The scientific community also faced challenges due to the political tensions in Europe, which affected collaborations and the movement of scientists. Despite these challenges, the relationships and exchanges fostered at events like the Solvay Conferences played a vital role in advancing scientific knowledge during this period.<br/>-------------------

### Context Chat Engine

> The Context chat engine leverages explicit memory management through `ChatMemoryBuffer` to maintain conversational state across multiple interactions.
{:.info-box} 

This architecture retrieves relevant context based on the current query while preserving conversation history up to a specified token limit (3900 tokens in this example). 

The engine combines retrieved documents with chat memory, allowing it to reference previous exchanges while grounding responses in the knowledge base. The `system_prompt` provides domain-specific context that influences how the LLM interprets and responds to queries, making this approach particularly effective for specialized applications where conversation continuity and expert knowledge synthesis are essential for maintaining coherent multi-turn interactions.
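
The token limit behaves like a budget applied from the newest message backwards. A minimal sketch of that idea, assuming one token per word; `trim_history` and `count_words` are hypothetical, and `ChatMemoryBuffer` uses a real tokenizer and its own rules for which messages to keep.

```python
def count_words(text: str) -> int:
    """Crude token counter: one token per whitespace-separated word (an assumption)."""
    return len(text.split())


def trim_history(messages, token_limit, count_tokens=count_words):
    """Keep the newest messages whose combined token count fits within token_limit."""
    kept, total = [], 0
    for role, text in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(text)
        if total + cost > token_limit:
            break  # older messages are dropped once the budget is exhausted
        kept.append((role, text))
        total += cost
    kept.reverse()  # restore chronological order
    return kept


messages = [
    ("user", "one two three"),
    ("assistant", "four five"),
    ("user", "six seven eight nine"),
]
kept = trim_history(messages, token_limit=6)
print(kept)  # [('assistant', 'four five'), ('user', 'six seven eight nine')]
```

With a budget of 3900 tokens, as in the cell below, several full question-and-answer turns typically fit before the oldest ones are dropped.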

```python
from llama_index.core.memory import ChatMemoryBuffer

# Keep up to 3900 tokens of chat history in memory
memory = ChatMemoryBuffer.from_defaults(token_limit=3900)
ctx_c_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt=(
        "You are familiar with biographies of Albert and Marie, as well as "
        "their professional and social friendships and relationships."
    ),
)
```

```python
ctx_c_response = ctx_c_engine.chat("who are the scientists in these documents?")
```

```text
2025-09-12 20:22:07,676 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:22:11,057 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
```


```python
in_md(ctx_c_response.response)
```

--- Response -------<br/>The documents mention several prominent scientists who were part of the early 20th-century scientific community, particularly those who attended the Solvay Conferences. Here are the scientists mentioned:

1. Albert Einstein
2. Marie Curie
3. Paul Langevin
4. Max Planck
5. Henri Poincaré
6. Ernest Rutherford
7. Hendrik Lorentz
8. Niels Bohr
9. Walther Nernst
10. Marcel Brillouin
11. Ernest Solvay
12. Emil Warburg
13. Jean Baptiste Perrin
14. Wilhelm Wien
15. Robert Goldschmidt
16. Heinrich Rubens
17. Arnold Sommerfeld
18. Frederick Lindemann
19. Maurice de Broglie
20. Martin Knudsen
21. Friedrich Hasenöhrl
22. Georges Hostelet
23. Edouard Herzen
24. James Hopwood Jeans
25. Heike Kamerlingh Onnes

These scientists were part of a closely-knit community working on cutting-edge physics and often met at the Solvay Conferences to discuss significant scientific advancements, particularly in the field of quantum mechanics and relativity.<br/>-------------------

In [33]:
ctx_c_response = ctx_c_engine.chat("Have these scientist met in their lifetime?")

2025-09-12 20:22:11,669 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:22:19,349 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [34]:
in_md(ctx_c_response.response)

--- Response -------<br/>Yes, many of these scientists met in their lifetime, particularly at the Solvay Conferences, which were pivotal gatherings for discussing advancements in physics. The Solvay Conferences, starting in 1911, served as a primary meeting ground for these scientists. For example:

- Albert Einstein and Marie Curie attended multiple Solvay Conferences together, including the first one in 1911 and the famous fifth conference in 1927.
- Paul Langevin, Max Planck, Henri Poincaré, Ernest Rutherford, and Hendrik Lorentz were also present at the 1911 Solvay Conference.
- Niels Bohr attended later Solvay Conferences, including the 1927 conference, where he engaged in famous debates with Einstein about quantum mechanics.

These conferences brought together many of the leading scientists of the time, allowing them to collaborate, debate, and advance the field of physics collectively.<br/>-------------------

In [35]:
ctx_c_response = ctx_c_engine.chat("what are the topics they discussed?")

2025-09-12 20:22:19,720 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:22:26,312 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [36]:
in_md(ctx_c_response.response)

--- Response -------<br/>The scientists at the Solvay Conferences discussed several groundbreaking topics in physics, particularly focusing on quantum theory and relativity. Here are some of the key topics they explored:

1. **Quantum Theory:** The early Solvay Conferences, starting with the first one in 1911, were pivotal in discussing the emerging field of quantum mechanics. Scientists debated the nature of atomic and subatomic particles, energy quantization, and the implications of quantum theory on classical physics.

2. **Theory of Relativity:** Albert Einstein's theories of special and general relativity were significant topics of discussion. These theories challenged existing notions of space, time, and gravity, leading to profound implications for physics.

3. **Radioactivity:** Marie Curie's work on radioactivity was a crucial subject, as it opened new realms in understanding atomic structure and nuclear physics.

4. **Atomic Structure:** Discussions on the structure of the atom, including models proposed by scientists like Niels Bohr, were central to these conferences.

5. **Wave-Particle Duality:** The dual nature of light and matter, exhibiting both wave-like and particle-like properties, was a topic of intense debate, particularly between Einstein and Bohr.

6. **Statistical Mechanics and Thermodynamics:** These areas were explored to understand the behavior of systems at the atomic level, contributing to the development of statistical physics.

7. **Philosophical Implications of Physics:** The conferences also delved into the philosophical questions raised by new physical theories, such as determinism, causality, and the nature of reality.

These discussions were instrumental in shaping modern physics, as they brought together the leading minds to collaborate and challenge each other's ideas, leading to significant advancements in the field.<br/>-------------------

In [37]:
ctx_c_response = ctx_c_engine.chat("what are the other topics they discussed?")

2025-09-12 20:22:27,337 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:22:33,928 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [38]:
in_md(ctx_c_response.response)    

--- Response -------<br/>In addition to the major topics previously mentioned, the scientists at the Solvay Conferences and in their personal interactions likely discussed a variety of other scientific and philosophical topics, including:

1. **Electromagnetism:** The integration of electromagnetic theory with quantum mechanics and relativity was a significant area of interest, particularly in understanding light and radiation.

2. **Nuclear Physics:** As the understanding of the atomic nucleus developed, discussions likely included nuclear reactions, isotopes, and the forces within the nucleus.

3. **Chemical Physics:** The intersection of chemistry and physics, particularly in understanding molecular structures and reactions, was an area of exploration, especially relevant to Marie Curie's work on radioactivity.

4. **Experimental Techniques:** Advances in experimental methods and instrumentation for studying atomic and subatomic phenomena were crucial for validating theoretical predictions.

5. **Mathematical Physics:** The development of mathematical tools and frameworks to describe physical phenomena, including tensor calculus and group theory, was essential for advancing theories like relativity and quantum mechanics.

6. **Philosophy of Science:** Broader philosophical discussions about the nature of scientific inquiry, the role of observation and theory, and the limits of scientific knowledge were likely part of their conversations.

7. **Social and Political Issues:** Given the historical context, they may have discussed the impact of political events on scientific collaboration, funding, and the movement of scientists across borders.

These topics reflect the broad and interdisciplinary nature of the discussions among these pioneering scientists, as they sought to understand and explain the fundamental principles governing the natural world.<br/>-------------------

In [39]:
ctx_c_response = ctx_c_engine.chat("What are the most significant things that have happened among them?")

2025-09-12 20:22:34,452 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:22:48,844 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [40]:
in_md(ctx_c_response.response)    

--- Response -------<br/>Several significant events and interactions occurred among these scientists, shaping the course of modern physics and their personal and professional lives. Here are some of the most notable:

1. **Solvay Conferences:** The Solvay Conferences themselves were significant events, bringing together the greatest minds in physics to discuss and debate the latest scientific theories and discoveries. The 1927 conference, in particular, is famous for the debates between Albert Einstein and Niels Bohr on the interpretation of quantum mechanics.

2. **Einstein and Curie's Friendship:** Albert Einstein and Marie Curie developed a close personal and professional friendship. They exchanged letters discussing scientific ideas and personal matters, and Curie supported Einstein against anti-German sentiment in France.

3. **Development of Quantum Mechanics:** The collaboration and debates among these scientists, including figures like Niels Bohr, Werner Heisenberg, and Erwin Schrödinger, were crucial in developing and refining quantum mechanics, a cornerstone of modern physics.

4. **Einstein's Theory of Relativity:** Albert Einstein's development of the theories of special and general relativity revolutionized the understanding of space, time, and gravity, influencing many of his contemporaries and future generations of physicists.

5. **Marie Curie's Work on Radioactivity:** Marie Curie's pioneering research on radioactivity laid the groundwork for nuclear physics and chemistry, earning her two Nobel Prizes and inspiring many scientists in the field.

6. **Einstein's Flight from Nazi Germany:** Einstein's emigration to the United States in 1933, due to the rise of the Nazi regime, was a significant event, impacting his life and career and highlighting the broader challenges faced by scientists during this period.

7. **Philosophical Debates:** The philosophical debates, particularly between Einstein and Bohr, about the nature of reality and the interpretation of quantum mechanics, had a lasting impact on the philosophy of science.

These events and interactions not only advanced scientific knowledge but also demonstrated the importance of collaboration, debate, and friendship in the scientific community.<br/>-------------------

In [41]:
ctx_c_response = ctx_c_engine.chat("what happen after that?")

2025-09-12 20:22:49,292 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:23:01,150 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [42]:
in_md(ctx_c_response.response)    

--- Response -------<br/>After the significant events and interactions among these scientists, several developments unfolded in both their personal lives and the broader scientific community:

1. **Continued Scientific Advancements:** The foundational work laid by these scientists continued to influence and propel advancements in physics. Quantum mechanics and relativity became integral parts of modern physics, leading to new discoveries and technologies.

2. **World War II and the Manhattan Project:** The onset of World War II had a profound impact on the scientific community. Many scientists, including some from the Solvay Conferences, were involved in the development of the atomic bomb through the Manhattan Project. Although Einstein was not directly involved in the project, his letter to President Roosevelt urging atomic research was a catalyst for its initiation.

3. **Post-War Scientific Collaboration:** After the war, there was a renewed emphasis on international scientific collaboration. Organizations like CERN were established to foster cooperative research in Europe, partly as a response to the wartime disruptions.

4. **Einstein's Later Years:** Albert Einstein spent his later years at the Institute for Advanced Study in Princeton, New Jersey, working on a unified field theory, though he was unsuccessful. He remained active in social and political causes, advocating for civil rights and nuclear disarmament until his death in 1955.

5. **Legacy of Marie Curie:** Marie Curie's legacy continued through her contributions to science and her family. Her daughter, Irène Joliot-Curie, followed in her footsteps, winning a Nobel Prize in Chemistry in 1935 for her work on artificial radioactivity.

6. **Development of New Theories:** The groundwork laid by these scientists paved the way for new theories and discoveries in physics, including the development of quantum field theory and the Standard Model of particle physics.

7. **Cultural Impact:** The cultural impact of these scientists, particularly Einstein, continued to grow. Einstein became an enduring symbol of genius and intellectual achievement, influencing both scientific thought and popular culture.

These developments highlight the lasting influence of the early 20th-century scientific community on both the scientific world and society at large.<br/>-------------------

In [43]:
ctx_c_response = ctx_c_engine.chat("what happen after that?")

2025-09-12 20:23:01,738 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:23:09,421 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [44]:
in_md(ctx_c_response.response)

--- Response -------<br/>Following the post-war period and the contributions of these scientists, several key developments and trends emerged in the scientific and global landscape:

1. **Cold War Era Science:** The Cold War period saw significant scientific advancements driven by competition between the United States and the Soviet Union. This included the space race, leading to milestones like the launch of Sputnik and the Apollo moon landings.

2. **Expansion of Physics:** The fields of quantum mechanics and relativity continued to expand, leading to the development of quantum electrodynamics (QED) and later quantum chromodynamics (QCD), which are part of the Standard Model of particle physics.

3. **Technological Innovations:** The foundational work in physics contributed to numerous technological innovations, including the development of semiconductors, transistors, and eventually, computers and the internet.

4. **Nuclear Energy and Weapons:** The development of nuclear energy for both civilian and military purposes continued to evolve, with ongoing debates about its safety, ethical implications, and environmental impact.

5. **Human Rights and Scientific Responsibility:** Influenced by the humanitarian concerns of scientists like Einstein, there was a growing emphasis on the ethical responsibilities of scientists, particularly regarding the use of scientific discoveries for military purposes.

6. **Global Scientific Collaboration:** International collaboration in science became more prominent, with organizations like the United Nations Educational, Scientific and Cultural Organization (UNESCO) promoting scientific cooperation across borders.

7. **Einstein's Enduring Influence:** Albert Einstein's theories continued to be tested and confirmed, with his work remaining a cornerstone of modern physics. His influence extended beyond science, as he became an icon of intellectual curiosity and humanitarian values.

8. **Legacy of the Solvay Conferences:** The tradition of international scientific conferences continued, fostering dialogue and collaboration among scientists worldwide, much like the original Solvay Conferences.

These developments reflect the enduring impact of the early 20th-century scientific community on both the progression of science and its integration into broader societal and global contexts.<br/>-------------------

### Condense Plus Context Chat Engine

>The Condense Plus Context chat engine is a hybrid of the condense question and context chat engines. It first condenses the chat history and the latest user message into a standalone question, as in condense question mode. It then retrieves context for that reformulated question. Finally, it answers using both the retrieved context and the conversation memory held in `ChatMemoryBuffer`.
{:.info-box}

This *dual-phase process* tends to improve retrieval, because the retriever receives a self-contained query instead of a vague follow-up. Meanwhile, the explicit memory keeps the conversation continuous.

The `context_prompt` parameter supplies domain-specific guidance, and `verbose=True` prints each condensed question so you can inspect the rewriting step. The approach suits conversational RAG applications that need both precise retrieval and coherent multi-turn dialogue.
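The condense phase can be pictured with the following sketch. It is a simplified, hypothetical model, not LlamaIndex's implementation. `condense_question` skips the LLM when there is no history, which matches the first turn in the log below. Otherwise it formats a rewrite prompt and sends it to a caller-supplied `llm` function. The prompt wording is illustrative and is not the library's default template.

```python
from typing import Callable

# Illustrative rewrite prompt (not LlamaIndex's actual template).
CONDENSE_TEMPLATE = (
    "Given the conversation below and a follow-up question, rewrite the "
    "follow-up as a standalone question.\n\n"
    "Chat history:\n{chat_history}\n"
    "Follow-up question: {question}\n"
    "Standalone question:"
)


def condense_question(history: list[tuple[str, str]], question: str,
                      llm: Callable[[str], str]) -> str:
    """Phase 1: rewrite a follow-up into a standalone question."""
    if not history:
        # Nothing to condense: use the question as-is and skip the LLM call.
        return question
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    prompt = CONDENSE_TEMPLATE.format(chat_history=transcript, question=question)
    return llm(prompt).strip()


calls = []


def fake_llm(prompt: str) -> str:
    """Stub LLM that records each prompt and returns a canned rewrite."""
    calls.append(prompt)
    return " What happened to Einstein and Curie after the Solvay Conferences? "


first = condense_question([], "who are the scientists in these documents?", fake_llm)
history = [("user", "who are the scientists in these documents?"),
           ("assistant", "Albert Einstein, Marie Curie, Niels Bohr, ...")]
follow_up = condense_question(history, "what happen after that?", fake_llm)
print(first)
print(follow_up)
print(len(calls))
```

In phase 2, the condensed string, not the raw follow-up, is what goes to the retriever. The answer is then generated from the retrieved context plus the memory.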

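> The chat engine rewrites each query before retrieval, as shown in the following conversation. With `verbose=True`, the rewritten query is printed as `Condensed question: ...`. On the first turn there is no chat history, so the question passes through unchanged without an extra LLM call. From the second turn on, an additional `chat/completions` request appears just before each `Condensed question` line.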

In [45]:
from llama_index.core.memory import ChatMemoryBuffer

condenseContext_memory = ChatMemoryBuffer.from_defaults(token_limit=3900)

condenseContext_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    memory=condenseContext_memory,
    llm=llm,  # reuse the LLM configured earlier
    context_prompt="You are familiar with biographies of Albert and Marie, as well as their professional and social friendships and relationships.",
    verbose=True,  # print each condensed question
)
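One detail of this configuration is worth checking. In condense-plus-context mode, LlamaIndex inserts the retrieved text into `context_prompt` through a `{context_str}` placeholder. Both the default context prompt and the documentation examples contain one. The prompt above has no placeholder, so the retrieved nodes are probably not reaching the LLM. That would be consistent with the first answer below, which asks which documents are meant. The plain-Python sketch below illustrates the likely mechanism with `str.format`, which silently ignores unused keyword arguments. LlamaIndex's own prompt classes may differ in detail, so treat this as an illustration of a probable cause rather than a trace of the library. The `persona_with_slot` wording is hypothetical.

```python
persona = (
    "You are familiar with biographies of Albert and Marie, as well as their "
    "professional and social friendships and relationships."
)
# Hypothetical corrected prompt: the same persona plus a slot for retrieved text.
persona_with_slot = persona + (
    "\n\nHere are the relevant documents for the context:\n{context_str}\n\n"
    "Instruction: use the chat history or the context above to answer."
)

retrieved = "Einstein and Curie both attended the first Solvay Conference in 1911."

# str.format ignores keyword arguments that have no matching placeholder.
without_slot = persona.format(context_str=retrieved)
with_slot = persona_with_slot.format(context_str=retrieved)

print("context included without slot:", retrieved in without_slot)
print("context included with slot:   ", retrieved in with_slot)
```

In LlamaIndex terms, the corresponding change would presumably be one of two options. Either pass a `context_prompt` that contains `{context_str}`, such as `persona_with_slot`, or move the persona into the `system_prompt` argument and keep the default context prompt.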

In [46]:
condenseContext_response = condenseContext_engine.chat("who are the scientists in these documents?")

2025-09-12 20:23:09,464 - INFO - Condensed question: who are the scientists in these documents?


Condensed question: who are the scientists in these documents?


2025-09-12 20:23:09,829 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:23:22,937 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [47]:
in_md(condenseContext_response.response)

--- Response -------<br/>To help identify the scientists in the documents, it would be useful to know which documents you are referring to. However, if you are asking about scientists commonly associated with Albert Einstein and Marie Curie, here are a few notable ones:

1. **Albert Einstein**:
   - **Max Planck**: A physicist known for his work on quantum theory, which Einstein built upon.
   - **Niels Bohr**: A physicist who made foundational contributions to understanding atomic structure and quantum theory.
   - **Werner Heisenberg**: Known for the development of quantum mechanics and the uncertainty principle.
   - **Erwin Schrödinger**: A physicist who developed wave mechanics, one of the formulations of quantum mechanics.
   - **Wolfgang Pauli**: Known for the Pauli exclusion principle in quantum physics.
   - **Satyendra Nath Bose**: Collaborated with Einstein on Bose-Einstein statistics.

2. **Marie Curie**:
   - **Henri Becquerel**: Discovered radioactivity, which Curie further investigated.
   - **Pierre Curie**: Marie Curie's husband and collaborator in her research on radioactivity.
   - **Irène Joliot-Curie**: Marie Curie's daughter, who also won a Nobel Prize in Chemistry.
   - **Frédéric Joliot-Curie**: Irène's husband, who collaborated with her on artificial radioactivity.
   - **Albert Einstein**: Although primarily associated with physics, Einstein and Curie had mutual respect and were contemporaries.

If you have specific documents or contexts in mind, please provide more details for a more tailored response.<br/>-------------------

In [48]:
condenseContext_response = condenseContext_engine.chat("Have these scientist met in their lifetime?")

2025-09-12 20:23:24,663 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-12 20:23:24,672 - INFO - Condensed question: Did the scientists associated with Albert Einstein and Marie Curie, such as Max Planck, Niels Bohr, Werner Heisenberg, Erwin Schrödinger, Wolfgang Pauli, Satyendra Nath Bose, Henri Becquerel, Pierre Curie, Irène Joliot-Curie, and Frédéric Joliot-Curie, meet each other during their lifetimes?


Condensed question: Did the scientists associated with Albert Einstein and Marie Curie, such as Max Planck, Niels Bohr, Werner Heisenberg, Erwin Schrödinger, Wolfgang Pauli, Satyendra Nath Bose, Henri Becquerel, Pierre Curie, Irène Joliot-Curie, and Frédéric Joliot-Curie, meet each other during their lifetimes?


2025-09-12 20:23:25,076 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:23:39,422 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
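The request logs show the cost of the extra phase. The short sketch below counts the OpenAI endpoints hit in one turn. It uses lines copied from the second context-mode turn and the second condense-plus-context turn above; the condensed-question line is shortened. The `count_requests` helper is an illustration, not part of LlamaIndex.

```python
import re
from collections import Counter

context_turn = """\
2025-09-12 20:22:11,669 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:22:19,349 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
"""

# The condensed question on the second line is shortened for readability.
condense_plus_context_turn = """\
2025-09-12 20:23:24,663 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-12 20:23:24,672 - INFO - Condensed question: Did the scientists ... meet each other during their lifetimes?
2025-09-12 20:23:25,076 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:23:39,422 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
"""

# Capture the endpoint path after /v1/, e.g. "embeddings" or "chat/completions".
ENDPOINT = re.compile(r"POST https://api\.openai\.com/v1/(\S+)")


def count_requests(log_text: str) -> Counter:
    """Count how often each OpenAI endpoint appears in a block of log lines."""
    return Counter(ENDPOINT.findall(log_text))


print(count_requests(context_turn))
print(count_requests(condense_plus_context_turn))
```

Both modes embed the query once. From the second turn on, however, condense-plus-context mode makes two chat completion calls per turn instead of one: one to condense the question and one to answer it.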


In [49]:
in_md(condenseContext_response.response)

--- Response -------<br/>Yes, many of the scientists mentioned did meet during their lifetimes, particularly those who were contemporaries and active in the early to mid-20th century. Here are some notable interactions:

1. **Albert Einstein and Marie Curie**: They met on several occasions and had a mutual respect for each other's work. They both attended the Solvay Conferences, which were gatherings of prominent scientists to discuss issues in physics. Einstein admired Curie's perseverance and scientific contributions.

2. **Albert Einstein and Niels Bohr**: They had a famous series of debates on the fundamentals of quantum mechanics, particularly at the Solvay Conferences. These debates were pivotal in the development of quantum theory.

3. **Marie Curie and Henri Becquerel**: They collaborated closely after Becquerel discovered radioactivity. Curie's research expanded on his initial findings, leading to her own groundbreaking discoveries.

4. **Marie Curie and Pierre Curie**: As husband and wife, they worked closely together on their research into radioactivity, which led to their joint Nobel Prize in Physics in 1903.

5. **Irène Joliot-Curie and Frédéric Joliot-Curie**: They were married and collaborated on research that led to their Nobel Prize in Chemistry in 1935 for their synthesis of new radioactive elements.

6. **Albert Einstein and Max Planck**: They had a professional relationship and mutual respect. Planck was one of the early supporters of Einstein's theory of relativity.

7. **Albert Einstein and Satyendra Nath Bose**: Although they collaborated on Bose-Einstein statistics, it was primarily through correspondence. It is not well-documented whether they met in person.

These interactions were part of a vibrant scientific community in Europe during the early 20th century, where many of these scientists met through conferences, academic institutions, and collaborative research.<br/>-------------------

In [50]:
condenseContext_response = condenseContext_engine.chat("what are the topics they discussed?")

2025-09-12 20:23:40,852 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-12 20:23:40,854 - INFO - Condensed question: What topics did the scientists mentioned, such as Albert Einstein, Marie Curie, Niels Bohr, and others, discuss during their meetings and interactions?


Condensed question: What topics did the scientists mentioned, such as Albert Einstein, Marie Curie, Niels Bohr, and others, discuss during their meetings and interactions?


2025-09-12 20:23:41,265 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:23:49,963 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [51]:
in_md(condenseContext_response.response)

--- Response -------<br/>The interactions among these scientists often revolved around some of the most groundbreaking and complex topics in physics and chemistry of their time. Here are some of the key topics they discussed:

1. **Albert Einstein and Marie Curie**:
   - **Radioactivity**: Curie's pioneering work in radioactivity was a topic of great interest, and Einstein admired her contributions to the field.
   - **Scientific Collaboration and Recognition**: They both navigated the challenges of scientific recognition and the politics of the Nobel Prize, often discussing the broader implications of their work.

2. **Albert Einstein and Niels Bohr**:
   - **Quantum Mechanics**: Their debates, particularly at the Solvay Conferences, focused on the interpretation of quantum mechanics. Einstein famously challenged the probabilistic nature of quantum mechanics, encapsulated in his quote, "God does not play dice with the universe."
   - **Wave-Particle Duality**: Discussions on the nature of light and matter, which were central to the development of quantum theory.

3. **Marie Curie and Henri Becquerel**:
   - **Radioactivity**: They discussed the nature and properties of radioactivity, which Becquerel discovered and Curie further explored, leading to the identification of new radioactive elements like polonium and radium.

4. **Marie Curie and Pierre Curie**:
   - **Discovery of Radium and Polonium**: Their collaborative work focused on isolating and studying these elements, which were crucial to understanding radioactivity.
   - **Applications of Radioactivity**: They explored the potential uses of radioactive materials in medicine and science.

5. **Irène Joliot-Curie and Frédéric Joliot-Curie**:
   - **Artificial Radioactivity**: They discussed and conducted experiments that led to the discovery of artificial radioactivity, which had significant implications for nuclear physics and chemistry.

6. **Albert Einstein and Max Planck**:
   - **Theory of Relativity**: Planck was an early supporter of Einstein's theory, and they discussed its implications for physics.
   - **Quantum Theory**: While Planck laid the groundwork for quantum theory, Einstein's work on the photoelectric effect furthered its development, leading to discussions on the dual nature of light.

7. **Albert Einstein and Satyendra Nath Bose**:
   - **Bose-Einstein Statistics**: Their collaboration through correspondence led to the development of Bose-Einstein statistics, which describe the behavior of indistinguishable particles and laid the groundwork for the concept of Bose-Einstein condensates.

These discussions were instrumental in advancing scientific understanding and often took place in formal settings like conferences, as well as through personal correspondence and collaborations.<br/>-------------------

In [52]:
condenseContext_response = condenseContext_engine.chat("what are the other topics they discussed?")

2025-09-12 20:23:50,992 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-12 20:23:50,995 - INFO - Condensed question: What additional topics did these scientists discuss during their interactions?


Condensed question: What additional topics did these scientists discuss during their interactions?


2025-09-12 20:23:51,299 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:23:59,728 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [53]:
in_md(condenseContext_response.response)

--- Response -------<br/>Beyond their scientific work, these scientists also engaged in discussions on a variety of other topics, reflecting their broader interests and the socio-political context of their times:

1. **Philosophy of Science**:
   - **Albert Einstein and Niels Bohr**: Their debates on quantum mechanics often touched on philosophical questions about the nature of reality, determinism, and the role of the observer in scientific measurements.

2. **Education and Scientific Collaboration**:
   - **Marie Curie**: As a pioneering female scientist, Curie was deeply involved in discussions about education, particularly for women, and the importance of international scientific collaboration. She helped establish the Radium Institute in Paris, which became a hub for scientific research.

3. **Ethics and Responsibility**:
   - **Albert Einstein**: He was vocal about the ethical responsibilities of scientists, especially in the context of nuclear weapons development during and after World War II. He famously signed a letter to President Franklin D. Roosevelt urging the development of atomic research, which contributed to the Manhattan Project.

4. **Social and Political Issues**:
   - **Albert Einstein**: He was an advocate for civil rights, pacifism, and Zionism. He often discussed these topics in public forums and private correspondences, emphasizing the role of science in promoting peace and understanding.
   - **Marie Curie**: While less politically active than Einstein, Curie was involved in efforts to support scientific research and education, particularly during and after World War I, when she helped equip ambulances with X-ray machines to aid in medical treatment.

5. **Personal and Professional Challenges**:
   - **Marie Curie**: She faced significant challenges as a woman in science, including issues of recognition and funding. Her correspondence and interactions often reflected her efforts to overcome these barriers.
   - **Albert Einstein**: He discussed the challenges of working in a politically turbulent Europe, particularly as a Jewish scientist during the rise of Nazism, which eventually led to his emigration to the United States.

These discussions reveal that these scientists were not only focused on advancing their respective fields but were also deeply engaged with the broader implications of their work and the societal issues of their time.<br/>-------------------

In [54]:
condenseContext_response = condenseContext_engine.chat("What are the most significant things that have happened among them?")

2025-09-12 20:24:00,618 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-12 20:24:00,621 - INFO - Condensed question: What are the most significant events or interactions that occurred among these scientists?


Condensed question: What are the most significant events or interactions that occurred among these scientists?


2025-09-12 20:24:01,027 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:24:09,664 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [55]:
in_md(condenseContext_response.response)

--- Response -------<br/>The interactions and collaborations among these scientists led to several significant developments in science and had a profound impact on the scientific community and beyond. Here are some of the most notable events and achievements:

1. **Albert Einstein and Marie Curie**:
   - **Solvay Conferences**: Both attended these influential meetings, which were critical for the development of modern physics. These conferences facilitated discussions on quantum mechanics and relativity, bringing together the brightest minds of the time.

2. **Albert Einstein and Niels Bohr**:
   - **Einstein-Bohr Debates**: Their debates on the interpretation of quantum mechanics were pivotal in shaping the philosophical foundations of quantum theory. These discussions highlighted the tension between Einstein's preference for deterministic theories and Bohr's acceptance of probabilistic interpretations.

3. **Marie Curie and Henri Becquerel**:
   - **Discovery of Radioactivity**: Becquerel's discovery of radioactivity, followed by Curie's extensive research, led to the identification of polonium and radium. This work laid the foundation for the field of nuclear physics and had significant implications for medicine and industry.

4. **Marie Curie and Pierre Curie**:
   - **Nobel Prize in Physics (1903)**: Marie and Pierre Curie, along with Henri Becquerel, were awarded the Nobel Prize for their joint research on radioactivity. This recognition was a landmark achievement for Marie Curie, making her the first woman to win a Nobel Prize.

5. **Irène Joliot-Curie and Frédéric Joliot-Curie**:
   - **Nobel Prize in Chemistry (1935)**: Their discovery of artificial radioactivity was a significant advancement in nuclear chemistry, leading to further research in nuclear reactions and the development of nuclear energy.

6. **Albert Einstein and Max Planck**:
   - **Support for Relativity**: Planck's support of Einstein's theory of relativity helped gain acceptance for the theory within the scientific community. This collaboration was crucial in establishing relativity as a cornerstone of modern physics.

7. **Albert Einstein and Satyendra Nath Bose**:
   - **Bose-Einstein Statistics**: Their collaboration led to the development of Bose-Einstein statistics, which describe the behavior of particles known as bosons. This work was foundational for later developments in quantum mechanics and the study of Bose-Einstein condensates.

These significant events and achievements not only advanced scientific knowledge but also influenced the direction of research and the development of new technologies in the 20th century. The collaborations and debates among these scientists were instrumental in shaping modern physics and chemistry.<br/>-------------------

In [56]:
condenseContext_response = condenseContext_engine.chat("what happen after that?")

2025-09-12 20:24:10,828 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-12 20:24:10,833 - INFO - Condensed question: What significant developments or events occurred following these achievements and interactions among the scientists?


Condensed question: What significant developments or events occurred following these achievements and interactions among the scientists?


2025-09-12 20:24:11,368 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:24:23,240 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [57]:
in_md(condenseContext_response.response)

--- Response -------<br/>After these significant achievements and interactions, the legacy of these scientists continued to influence the fields of physics and chemistry, as well as broader societal and scientific developments. Here's what happened in the subsequent years:

1. **Albert Einstein**:
   - **Later Years and Legacy**: Einstein continued to work on theoretical physics, focusing on a unified field theory, although he did not achieve a breakthrough comparable to his earlier work. He became an iconic public figure, advocating for peace, civil rights, and scientific collaboration. Einstein's theories, particularly relativity, have continued to be foundational in physics, influencing areas such as cosmology and quantum mechanics.
   - **Impact on Technology**: His work laid the groundwork for technologies like GPS, which relies on the principles of general relativity.

2. **Marie Curie**:
   - **Continued Research and Influence**: Curie continued her research on radioactivity until her death in 1934. Her work paved the way for advancements in medical treatments, such as cancer radiotherapy. She also established research institutes in Paris and Warsaw, which became centers for scientific research.
   - **Legacy and Recognition**: Curie's legacy as a pioneering woman in science has inspired generations of female scientists. She remains the only person to have won Nobel Prizes in two different scientific fields (Physics and Chemistry).

3. **Niels Bohr**:
   - **Contributions to Quantum Mechanics**: Bohr continued to develop quantum theory and played a crucial role in the establishment of the Copenhagen interpretation. He also contributed to the development of nuclear energy and was involved in discussions about the ethical implications of atomic weapons.
   - **Post-War Efforts**: After World War II, Bohr advocated for peaceful uses of nuclear energy and international cooperation in science.

4. **Irène Joliot-Curie and Frédéric Joliot-Curie**:
   - **Scientific and Political Involvement**: Both continued their research in nuclear physics and were involved in the French atomic energy program. They also engaged in political activities, advocating for scientific research and education.
   - **Legacy**: Their work on artificial radioactivity had lasting impacts on nuclear science and technology.

5. **Max Planck**:
   - **Later Life and Influence**: Planck continued to be an influential figure in the scientific community, serving as president of the Kaiser Wilhelm Society (later the Max Planck Society). His contributions to quantum theory remain fundamental to modern physics.

6. **Satyendra Nath Bose**:
   - **Recognition and Influence**: Bose's work on quantum statistics gained recognition over time, and he became a prominent figure in Indian science. The concept of bosons, named in his honor, is central to particle physics.

The collective contributions of these scientists have had a profound and lasting impact on science and technology, shaping the modern world in countless ways. Their work continues to inspire new generations of scientists and researchers.<br/>-------------------

In [58]:
condenseContext_response = condenseContext_engine.chat("what happen after that?")

2025-09-12 20:24:24,372 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-12 20:24:24,374 - INFO - Condensed question: What developments and impacts followed the significant achievements and interactions of these scientists?


Condensed question: What developments and impacts followed the significant achievements and interactions of these scientists?


2025-09-12 20:24:24,885 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 20:24:48,210 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [59]:
in_md(condenseContext_response.response)

--- Response -------<br/>Following the significant contributions and interactions of these scientists, their work continued to influence scientific progress and societal developments in various ways:

1. **Advancements in Physics and Chemistry**:
   - **Quantum Mechanics and Relativity**: The foundational work of Einstein, Bohr, Planck, and others laid the groundwork for further advancements in quantum mechanics and relativity. These theories have been integral to developments in particle physics, cosmology, and quantum computing.
   - **Nuclear Science**: The research initiated by Marie Curie and continued by Irène Joliot-Curie and Frédéric Joliot-Curie contributed to the development of nuclear energy and medical applications, such as cancer treatments using radiation therapy.

2. **Technological Innovations**:
   - **Applications of Relativity**: Einstein's theories have been crucial for technologies like GPS, which requires precise time measurements adjusted for relativistic effects.
   - **Quantum Technologies**: The principles of quantum mechanics have led to the development of technologies such as semiconductors, lasers, and quantum computers.

3. **Scientific Institutions and Collaboration**:
   - **Research Institutes**: Institutions like the Radium Institute in Paris and the Max Planck Society in Germany have continued to be leading centers for scientific research and innovation.
   - **International Collaboration**: The legacy of these scientists has fostered a spirit of international collaboration in science, exemplified by organizations like CERN and the International Atomic Energy Agency (IAEA).

4. **Social and Ethical Implications**:
   - **Nuclear Weapons and Peace Efforts**: The development of nuclear weapons during World War II, which was indirectly influenced by the work on radioactivity and nuclear physics, led to significant ethical debates. Scientists like Einstein and Bohr were involved in discussions about the peaceful use of nuclear energy and the prevention of nuclear proliferation.
   - **Science and Society**: The contributions of these scientists have highlighted the importance of science in addressing global challenges, from energy and health to environmental sustainability.

5. **Inspiration and Education**:
   - **Role Models**: Marie Curie, in particular, has become an enduring symbol of women's contributions to science, inspiring countless women to pursue careers in STEM fields.
   - **Educational Impact**: The discoveries and theories developed by these scientists are integral parts of science education worldwide, forming the basis of curricula in physics and chemistry.

Overall, the work and legacy of these scientists have had a lasting impact on both the scientific community and society at large, continuing to influence research, technology, and policy decisions well into the 21st century.<br/>-------------------

## Retrievers

Retrievers are fundamental components in RAG systems responsible for finding and ranking the most relevant chunks or nodes based on query similarity. LlamaIndex's `VectorIndexRetriever` performs semantic search using vector embeddings, where the `similarity_top_k` parameter determines how many of the most relevant nodes are retrieved for context generation.

> The *retriever acts as the bridge between the query and the knowledge base*, using distance metrics like cosine similarity to identify content that best matches the semantic intent of the user's question. 
{:.green}

This component is crucial for RAG performance as it directly impacts the quality of context provided to the LLM for answer generation.

In [61]:
from llama_index.core.retrievers import VectorIndexRetriever


retriever = VectorIndexRetriever(index=index, similarity_top_k=3)
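To make the ranking step concrete, here is a minimal pure-Python sketch of what a vector retriever does conceptually: score every chunk embedding against the query embedding with cosine similarity and keep the `top_k` best. The chunk names and the toy 3-dimensional embeddings are hypothetical, and this is only an illustration; `VectorIndexRetriever` delegates the actual search to the configured vector store rather than running a loop like this one.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query_embedding: list[float],
    chunk_embeddings: dict[str, list[float]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Return the top_k (chunk_id, score) pairs, highest similarity first."""
    scored = [
        (chunk_id, cosine_similarity(query_embedding, emb))
        for chunk_id, emb in chunk_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings, for illustration only.
chunks = {
    "solvay_1911": [0.9, 0.1, 0.0],
    "curie_biography": [0.7, 0.3, 0.1],
    "einstein_biography": [0.5, 0.5, 0.2],
    "bose_statistics": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]

for chunk_id, score in retrieve_top_k(query, chunks, top_k=3):
    print(f"{chunk_id}: {score:.3f}")
```

With `top_k=3`, the chunk pointing away from the query (`bose_statistics`) is dropped even though it is part of the knowledge base.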

### Configure response synthesizer

Response synthesis is the final stage of the RAG pipeline where the LLM combines retrieved context with the user query to generate coherent, factually-grounded answers. 

LlamaIndex's response synthesizer offers multiple synthesis modes, including:

- `refine`
- `compact`
- `tree_summarize`

Each is optimized for different use cases. The `refine` mode processes retrieved nodes iteratively, refining the answer with each additional piece of context, which makes it well suited to comprehensive responses that must incorporate information from multiple sources.

This component ensures that the LLM's generation capabilities are augmented with relevant, retrieved knowledge while maintaining factual accuracy and minimizing hallucination through grounded response generation.

In [63]:
from llama_index.core import get_response_synthesizer

synthesizer = get_response_synthesizer(response_mode="refine")
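The control flow of the `refine` mode can be sketched in plain Python. This is a simplified illustration, not LlamaIndex's implementation: `fake_llm` is a hypothetical stand-in for a real model call, and the prompt strings are assumptions rather than LlamaIndex's built-in templates. What it shows is that the first chunk produces an initial answer and every later chunk refines that answer, so the number of LLM calls grows with the number of retrieved chunks.

```python
from typing import Callable


def refine_synthesize(
    query: str,
    chunks: list[str],
    llm: Callable[[str], str],
) -> str:
    """Create an answer from the first chunk, then refine it with each
    subsequent chunk (one LLM call per chunk)."""
    answer = ""
    for i, chunk in enumerate(chunks):
        if i == 0:
            # Initial question-answering prompt over the first chunk.
            prompt = f"Context: {chunk}\nQuestion: {query}\nAnswer:"
        else:
            # Refine prompt: the existing answer plus the new context.
            prompt = (
                f"Question: {query}\nExisting answer: {answer}\n"
                f"New context: {chunk}\n"
                "Refine the existing answer if the new context helps; "
                "otherwise repeat it."
            )
        answer = llm(prompt)
    return answer


# Hypothetical stand-in for an LLM: records prompts and returns a canned string.
calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"answer after {len(calls)} call(s)"


chunks = ["chunk about Solvay 1911", "chunk about Curie", "chunk about Einstein"]
print(refine_synthesize("Who met at the 1911 Solvay Conference?", chunks, fake_llm))
```

The sequential dependency (each call needs the previous answer) is what makes `refine` thorough but slower than modes that pack context into fewer calls.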

The *Retriever-Synthesizer pattern* represents the modular architecture of RAG systems in LlamaIndex. The retriever component performs semantic search to identify and rank relevant nodes, while the synthesizer processes these retrieved contexts with the user query to generate the final response. 

This separation of concerns allows for fine-tuned control over both retrieval accuracy and synthesis quality. The `RetrieverQueryEngine` coordinates these components, ensuring that the most relevant information is retrieved first and then effectively synthesized into coherent answers.

This architecture enables experimentation with different retrieval strategies and synthesis modes independently, allowing for optimal RAG performance tuning.

In [64]:
from llama_index.core.query_engine import RetrieverQueryEngine


retriever_query_engine = RetrieverQueryEngine(
    retriever=retriever, response_synthesizer=synthesizer
)
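The separation of concerns can be illustrated with a small, hypothetical class that accepts any retriever function and any synthesizer function. `SketchQueryEngine`, `keyword_retriever` and `join_synthesizer` are illustrative names, not LlamaIndex APIs; the class only mirrors the idea behind `RetrieverQueryEngine` (retrieve first, then synthesize), so that either stage can be swapped without touching the other.

```python
from dataclasses import dataclass
from typing import Callable

# Type aliases for the two pluggable stages.
Retriever = Callable[[str], list[str]]
Synthesizer = Callable[[str, list[str]], str]


@dataclass
class SketchQueryEngine:
    """Hypothetical query engine composing a retriever and a synthesizer."""

    retriever: Retriever
    synthesizer: Synthesizer

    def query(self, question: str) -> str:
        contexts = self.retriever(question)          # stage 1: find relevant chunks
        return self.synthesizer(question, contexts)  # stage 2: generate the answer


# Toy components (hypothetical): a keyword retriever and a joining synthesizer.
docs = [
    "Curie attended Solvay 1911",
    "Einstein attended Solvay 1911",
    "Bose wrote on statistics",
]


def keyword_retriever(question: str) -> list[str]:
    return [d for d in docs if "Solvay" in d and "Solvay" in question]


def join_synthesizer(question: str, contexts: list[str]) -> str:
    return " | ".join(contexts) if contexts else "No relevant context found."


engine = SketchQueryEngine(retriever=keyword_retriever, synthesizer=join_synthesizer)
print(engine.query("Who met at the 1911 Solvay Conference?"))
```

Replacing `keyword_retriever` with an embedding-based retriever would require no change to `join_synthesizer`, which is the kind of independent tuning the pattern is meant to enable.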

In [65]:
retriever_response = retriever_query_engine.query("Who met at the 1911 First Solvay Conference?")

2025-09-12 21:31:31,350 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2025-09-12 21:31:33,319 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-12 21:31:34,622 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
2025-09-12 21:31:35,441 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [66]:
in_md(retriever_response.response)

--- Response -------<br/>At the 1911 First Solvay Conference, Marie Curie and Albert Einstein were among the notable attendees.<br/>-------------------

An alternative is to call a retriever directly (here created with `index.as_retriever()`) instead of going through a query engine. Skipping the synthesis step gives direct access to the retrieved nodes and their similarity scores, which makes it possible to analyze retrieval quality and debug RAG performance.

By examining individual nodes and their relevance scores, developers can:

- fine-tune similarity thresholds,
- analyze retrieval accuracy, and
- optimize how the knowledge base is organized for better semantic matching between queries and relevant content.
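As a sketch of threshold tuning, the snippet below filters retrieved (node ID, score) pairs by a similarity cutoff and shows how the kept set shrinks as the cutoff rises. The node IDs and scores are hypothetical. LlamaIndex offers `SimilarityPostprocessor` (in `llama_index.core.postprocessor`, with a `similarity_cutoff` parameter) for this in a real pipeline; the plain-Python version only shows the logic.

```python
def filter_by_cutoff(
    scored_nodes: list[tuple[str, float]], cutoff: float
) -> list[tuple[str, float]]:
    """Keep only nodes whose similarity score is at least `cutoff`."""
    return [(node_id, score) for node_id, score in scored_nodes if score >= cutoff]


# Hypothetical retrieval results: (node_id, similarity score), best first.
retrieved = [("node-a", 0.84), ("node-b", 0.83), ("node-c", 0.71)]

for cutoff in (0.70, 0.80, 0.85):
    kept = filter_by_cutoff(retrieved, cutoff)
    print(f"cutoff={cutoff:.2f}: kept {[node_id for node_id, _ in kept]}")
```

Scores from the same query can sit close together, so a cutoff is usually tuned by inspecting real retrieval output like the node scores printed in this section.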

In [68]:
retrieved_index = index.as_retriever(similarity_top_k=3)

In [69]:
retrieved_nodes = retrieved_index.retrieve("Who met at the 1911 First Solvay Conference?")

2025-09-12 21:37:21,695 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


In [72]:
from llama_index.core.response.notebook_utils import display_source_node


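for text_node in retrieved_nodes:
    # print() shows the node ID, truncated text and score; in Jupyter,
    # display_source_node(text_node) renders a Markdown view instead.
    print(text_node)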

Node ID: 245c9289-4899-4e04-8909-c6d7b3c9cc25
Text: Based on my research, Einstein and Marie Curie shared many
mutual acquaintances through scientific conferences, particularly the
famous Solvay Conferences, and the broader European scientific
community of the early 20th century. Here are the key people both
scientists knew and how they were connected: Major Mutual
Acquaintances Paul Langevin (18...
Score:  0.841

Node ID: 2e1fb072-65d1-4889-b906-5d499f866244
Text: # When Marie Curie Met Albert Einstein  ## Brussels, October
1911  The first Solvay Conference buzzed with intellectual energy as
the greatest minds in physics gathered to discuss the revolutionary
concept of quantum theory. Among the distinguished attendees, two
figures commanded particular attention: the legendary Marie Curie,
recently awarded...
Score:  0.832

Node ID: eb0fd574-b6ec-402f-b7c0-47d87b103fe3
Text: 29 Legendary Scientists Came Together in the "Most Intelligent
Photo" Ever Taken 1927 Fifth Solvay Conference (t

The `similarity_top_k` parameter directly controls how much retrieved context is passed to the LLM, by determining how many semantically similar nodes are retrieved. This parameter represents a crucial trade-off in RAG systems:

- higher values provide more comprehensive context but may introduce noise;
- lower values keep the context focused but might miss relevant information.

> The optimal `similarity_top_k` setting depends on factors such as document chunk size, query complexity, and the desired balance between context completeness and response coherence.

## Evaluation

Evaluating retrieval requires deciding whether each retrieved document is relevant or irrelevant. The most commonly used evaluation metrics are **precision** \eqref{eq:precision} and **recall** \eqref{eq:recall}: precision is the fraction of retrieved documents that are relevant, while recall is the fraction of relevant documents that are successfully retrieved.

\\(D\\) - a corpus of documents

\\(R\\) - the set of all relevant documents in \\(D\\)

\\(NR\\) - the set of all irrelevant documents in \\(D\\)

\\(D_q\\) - the documents returned by the system for a query \\(q\\)

\\(R_q\\) - the relevant documents found, i.e. the relevant documents among \\(D_q\\)

$$
\begin{equation}
\text{precision}=\dfrac{\lvert R_q \rvert}{\lvert D_q \rvert} \label{eq:precision}
\end{equation}
$$

$$
\begin{equation}
\text{recall}=\dfrac{\lvert R_q \rvert}{\lvert R \rvert} \label{eq:recall}
\end{equation}
$$

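However, these two metrics ignore ranking: they treat the retrieved documents as an unordered set. Because a retriever returns its ***top-k*** documents in ranked order as context, it is more informative to compute precision and recall at each rank position in that list.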

> As we move down the ranked list, **recall** increases whenever a relevant document is found and stays the same otherwise. **Precision**, on the other hand, increases with each relevant document but decreases with each irrelevant document.
{:.info-box}
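The behaviour described above can be checked with a short sketch that walks down a ranked list and reports precision and recall at each cut-off k. The ranked document IDs and the relevance labels are hypothetical.

```python
def precision_recall_at_k(
    ranked: list[str], relevant: set[str]
) -> list[tuple[int, float, float]]:
    """Return (k, precision@k, recall@k) for every prefix of the ranked list."""
    results = []
    hits = 0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
        precision = hits / k           # relevant found / documents returned so far
        recall = hits / len(relevant)  # relevant found / all relevant documents
        results.append((k, precision, recall))
    return results


# Hypothetical ranking: d1 and d3 are relevant, d2 and d4 are not.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}

for k, p, r in precision_recall_at_k(ranked, relevant):
    print(f"k={k}: precision={p:.2f}, recall={r:.2f}")
```

Precision drops at k=2 and k=4 (irrelevant documents) and rises at k=3, while recall only ever moves up, at k=1 and k=3.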

**Hit Rate** in LlamaIndex is a fundamental retrieval evaluation metric that measures how often the correct or relevant document appears within the ***top-k*** retrieved results for a given query. The calculation is straightforward: for each query, you assign a score of 1 if the ground truth (correct) node appears anywhere in the top-k retrieved results, and 0 if it doesn't. The overall hit rate is then the average across all queries. For example, if you have 10 queries and 8 of them have their correct answer node present in the top-k retrieved results, your hit rate would be 8/10 = 0.8.
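The hit-rate calculation can be sketched in a few lines of Python, reproducing the 8-out-of-10 example above. The retrieved node ID lists are hypothetical; `"gt"` stands for the ground-truth node of every query.

```python
def hit_rate(
    retrieved_per_query: list[list[str]],
    expected_per_query: list[str],
    top_k: int,
) -> float:
    """Fraction of queries whose expected node ID appears in the top_k results."""
    hits = [
        1 if expected in retrieved[:top_k] else 0
        for retrieved, expected in zip(retrieved_per_query, expected_per_query)
    ]
    return sum(hits) / len(hits)


# Hypothetical results for 10 queries; the expected node is "gt" each time.
retrieved = (
    [["gt", "x", "y"]] * 5    # ground truth ranked first
    + [["x", "gt", "y"]] * 2  # ranked second
    + [["x", "y", "gt"]]      # ranked third
    + [["x", "y", "z"]] * 2   # missed entirely
)
expected = ["gt"] * 10

print(hit_rate(retrieved, expected, top_k=3))  # 0.8
```

The score is the same whether the ground-truth node is ranked first or third, which is the limitation of hit rate discussed further on in this section.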

You need a way to evaluate whether changes to the system prompt improve hit rates for user queries: is the improvement 1%, 2%, or more[^5]?

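**Mean average precision** (***MAP***) \eqref{eq:map} builds on precision at each rank. For a single query, we record the precision at every rank where a relevant item is retrieved and divide the sum of those values by the total number of relevant items; this is the average precision (AP). MAP is then the mean of the AP values over all queries. For example, if a query has two relevant items and they are retrieved at ranks 1 and 3, the precision values at those ranks are 1/1 and 2/3, so AP = (1 + 2/3)/2 ≈ 0.83.

The average precision (AP) value for a single query is

$$
\begin{equation}
AP = \dfrac{\sum \text{precision at each relevant item}}{\text{Total number of relevant items}} \label{eq:ap}
\end{equation}
$$

and MAP averages the AP values over the set of queries \\(Q\\), where \\(AP_i\\) is the average precision of the \\(i\\)-th query:

$$
\begin{equation}
MAP = \frac{1}{\lvert Q \rvert} \sum_{i = 1}^{\lvert Q \rvert} AP_{i} \label{eq:map}
\end{equation}
$$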
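A minimal sketch of AP and MAP following the definition above: precision is summed at each rank where a relevant item appears and divided by the total number of relevant items, then the per-query values are averaged. The two queries and their rankings are hypothetical.

```python
def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """AP: sum of precision@k at each rank k holding a relevant item,
    divided by the total number of relevant items."""
    hits = 0
    precision_sum = 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(runs: list[tuple[list[str], set[str]]]) -> float:
    """MAP: average of the per-query AP values."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Query 1 (hypothetical): two relevant items, retrieved at ranks 1 and 3.
q1 = (["d1", "d2", "d3"], {"d1", "d3"})
# Query 2 (hypothetical): its single relevant item is ranked second.
q2 = (["d4", "d5", "d6"], {"d5"})

print(round(average_precision(*q1), 4))            # 0.8333
print(round(mean_average_precision([q1, q2]), 4))  # 0.6667
```

A relevant item that is never retrieved contributes nothing to the numerator but still counts in the denominator, so missing items pull AP down.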

> Hit rate is not always the right metric for retrieval evaluation: it ignores *where* the correct node appears, whether in the first position, the second, or towards the k-th position of the retrieved nodes. This is why it is often used alongside Mean Reciprocal Rank (MRR), which considers the position of the correct result.

The **Mean Reciprocal Rank** (***MRR***) \eqref{eq:mrr} assesses the quality of a short ranked list against the correct answers (usually human labels). The reciprocal rank of a query is the reciprocal of the rank of the first item relevant to that query, and 0 if no relevant item is retrieved. For a set of queries \\(Q\\), MRR is the average of the reciprocal ranks, where \\(rank_i\\) is the rank of the first relevant item for the \\(i\\)-th query:

$$
\begin{equation}
MRR = \frac{1}{\lvert Q \rvert} \sum_{i = 1}^{\lvert Q \rvert} \frac{1}{rank_{i}} \label{eq:mrr}
\end{equation}
$$
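The MRR formula translates directly into Python. The three queries below are hypothetical: one finds its relevant item at rank 1, one at rank 2, and one not at all.

```python
def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant item, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """MRR: average reciprocal rank over all queries."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Hypothetical queries: first relevant hit at rank 1, at rank 2, and never.
runs = [
    (["a", "b", "c"], {"a"}),
    (["d", "e", "f"], {"e"}),
    (["g", "h", "i"], {"z"}),
]
print(mean_reciprocal_rank(runs))  # (1 + 0.5 + 0) / 3 = 0.5
```

Unlike hit rate, moving the second query's relevant item from rank 2 to rank 1 would raise the score, because MRR rewards placing the correct result near the top.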

### Response Evaluation

Response evaluation in LlamaIndex represents the critical final assessment phase of RAG system performance, *focusing on the quality of generated answers rather than just retrieval accuracy*. This evaluation paradigm operates on multiple dimensions to ensure comprehensive quality assessment:

**Core Evaluation Dimensions:**

- **Faithfulness**: The cornerstone of RAG evaluation, faithfulness measures whether generated responses are grounded in the retrieved context without hallucination. LlamaIndex's `FaithfulnessEvaluator` asks an evaluation LLM whether the information in the response is supported by the source context, which helps catch answers that contain information not present in the knowledge base.

- **Relevance**: This metric evaluates *semantic alignment between the user's query and the generated response*. LlamaIndex's `RelevancyEvaluator` assesses whether the answer directly addresses the question asked, considering both topical relevance and the appropriateness of the response scope.

- **Coherence**: *Measures the logical flow, readability, and structural quality of generated text*. This includes evaluating sentence transitions, argument consistency, and overall narrative structure, to ensure responses are not only accurate but also well-articulated.

- **Completeness**: Determines whether *responses fully address all aspects of complex queries*. This is particularly important for multi-part questions or requests requiring comprehensive explanations across multiple topics.

**Advanced Evaluation Capabilities:**

LlamaIndex provides automated evaluation workflows through its `DatasetGenerator`, which can autonomously generate question-answer pairs from your knowledge base. This capability enables:

- **Synthetic Dataset Creation**: Automatically generating evaluation datasets that reflect your specific domain and use case
- **Continuous Evaluation Pipelines**: Setting up automated testing as your RAG system evolves
- **Comparative Analysis**: Benchmarking different RAG configurations using consistent evaluation criteria

**Evaluation Methodologies:**

The framework supports both **reference-based evaluation** (comparing against ground truth answers) and **reference-free evaluation** (assessing quality without predefined answers). Reference-free evaluation is particularly valuable for RAG systems where creating comprehensive ground truth datasets is impractical.

**Integration with Retrieval Metrics:**

Response evaluation works synergistically with retrieval metrics (***Hit Rate***, ***MRR***, ***MAP***) to provide end-to-end RAG assessment. While retrieval metrics focus on finding relevant information, response evaluation ensures that retrieved context is effectively synthesized into high-quality answers.

> LlamaIndex's evaluation framework enables iterative RAG improvement through quantitative feedback loops, allowing developers to systematically optimize both retrieval and generation components for production-ready applications.


In [74]:
import nest_asyncio
nest_asyncio.apply()

import logging
import sys

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# clear any existing handlers
logger.handlers = []

# Setup the stream handler
handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.INFO)

logger.addHandler(handler)

In [75]:
import pandas as pd

# The root logger was already given a stdout handler in the previous cell,
# so no further logging configuration is needed here.

from llama_index.core.evaluation import (DatasetGenerator
    , FaithfulnessEvaluator
    , RelevancyEvaluator
    , CorrectnessEvaluator
    , RetrieverEvaluator
    , generate_qa_embedding_pairs
)

from llama_index.core import (SimpleDirectoryReader
    , VectorStoreIndex
    , Response
)



Some important evaluators:

- The `CorrectnessEvaluator` assesses the relevance and correctness of generated answers by comparing them to reference ground-truth answers.

- The `EmbeddingSimilarityEvaluator` is designed to assess the semantic similarity between generated responses and reference answers by comparing their embeddings using cosine similarity.

In [None]:
dataset_generator = DatasetGenerator.from_documents(documents
    , llm=llm
    , show_progress=False
)

eval_dataset = dataset_generator.generate_dataset_from_nodes(num=20)

Here are the generated questions (output truncated):

In [85]:
for i, q in enumerate(eval_dataset.queries.values(), start=1):
    print(f"Q.{i}: {q}")

Q.1: What pivotal childhood experience did Albert Einstein have that contributed to his scientific curiosity?
Q.2: Describe the circumstances under which Einstein left school in Munich and how he eventually gained admission to the Swiss Federal Polytechnic.
Q.3: What role did Einstein's position at the Swiss Patent Office play in his scientific career?
Q.4: List and briefly explain the four groundbreaking papers Einstein published during his "Annus Mirabilis" in 1905.
Q.5: How did Einstein's work on the photoelectric effect contribute to quantum theory, and what recognition did he receive for it?
Q.6: Explain the significance of Einstein's theory of special relativity and its impact on the concepts of space and time.
Q.7: What is the famous equation derived from Einstein's work on special relativity, and what does it signify?
Q.8: Outline Einstein's academic career progression from 1908 to 1914, including the institutions he was affiliated with.
Q.9: How did Sir Arthur Eddington's 1919

Create an evaluation query that is consistent with the questions above.

In [88]:
eval_query = "What is Einstein's miracle year?"

Use GPT-3.5-Turbo to generate responses and GPT-4 for evaluation:

In [86]:
gpt35_llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
gpt4_llm = OpenAI(model="gpt-4", temperature=0)

Create a vector index and a query engine from it for evaluation:

In [87]:
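# Build the index (embeddings only), then attach GPT-3.5-Turbo to the query
# engine so that it, rather than the global default LLM, generates responses.
vector_index = VectorStoreIndex.from_documents(documents)
query_eval_engine = vector_index.as_query_engine(llm=gpt35_llm)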


HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


Now create a retriever:

In [90]:
query_eval_retriever = vector_index.as_retriever(similarity_top_k=3)
eval_nodes = query_eval_retriever.retrieve(eval_query)

HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


In [101]:
import pprint
pprint.pprint(eval_nodes[1].get_text())

('# Albert Einstein: A Biography\n'
 '\n'
 '## Early Life (1879-1896)\n'
 '\n'
 'Albert Einstein was born on March 14, 1879, in Ulm, in the Kingdom of '
 'Württemberg in the German Empire. His parents were Hermann Einstein, a '
 'salesman and engineer, and Pauline Koch Einstein. When Albert was one year '
 'old, the family moved to Munich, where his father and uncle founded '
 'Elektrotechnische Fabrik J. Einstein & Cie, a company that manufactured '
 'electrical equipment.\n'
 '\n'
 'Einstein showed early signs of intellectual curiosity, though contrary to '
 'popular myth, he was not a poor student. He excelled in mathematics and '
 'physics from a young age. When he was five, his father gave him a compass, '
 'and Einstein was fascinated by the invisible forces that moved the needle—an '
 'experience he later described as pivotal in developing his scientific '
 'curiosity.\n'
 '\n'
 "In 1894, Einstein's family moved to Italy for business reasons, but Albert "
 'remained in Munich to

### Faithfulness Evaluator

Measures **hallucination**: whether the response from a query engine is supported by the retrieved source nodes. LlamaIndex's `FaithfulnessEvaluator` takes an LLM-as-judge approach: it passes the generated response together with the retrieved context to an evaluation LLM and asks whether the information in the response is supported by that context.

In [93]:
faithfulness_evaluator = FaithfulnessEvaluator(llm=gpt4_llm)

In [94]:
eval_response_vector = query_eval_engine.query(eval_query)

HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


This evaluation process *uses an LLM (GPT-4 here) as a fact-checker, judging the response against the source material to detect hallucinations, contradictions, or fabricated information*.

In [95]:
eval_response_result = faithfulness_evaluator.evaluate_response(
    response=eval_response_vector
)

HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


The evaluator returns a binary pass/fail result along with detailed feedback, making it an essential tool for ensuring RAG systems maintain factual accuracy and don't generate misleading information that goes beyond what's actually contained in the knowledge base.

If the response is not hallucinated, `passing` is `True`, as shown below:

In [96]:
eval_response_result.passing

True

In [None]:
pprint.pprint(eval_response_result.contexts)

['In 1915, he completed his General Theory of Relativity, which described '
 'gravity not as a force, but as a curvature of spacetime caused by mass and '
 'energy. This theory made several predictions that were later confirmed '
 'experimentally, including the bending of light around massive objects.\n'
 '\n'
 '## International Fame (1915-1933)\n'
 '\n'
 "Einstein's general relativity gained worldwide attention when Sir Arthur "
 "Eddington's 1919 solar eclipse expedition confirmed the theory's prediction "
 'that light would bend around the sun. Overnight, Einstein became an '
 'international celebrity and the face of modern science.\n'
 '\n'
 'During this period, Einstein made significant contributions to quantum '
 'mechanics, statistical mechanics, and cosmology, though he remained '
 "skeptical of quantum mechanics' probabilistic interpretation, famously "
 'stating "God does not play dice with the universe."\n'
 '\n'
 '## Personal Life\n'
 '\n'
 'Einstein married Mileva Marić in

### Relevancy Evaluation

Measures whether the response and source nodes match the query. LlamaIndex's `RelevancyEvaluator` assesses the semantic alignment between user queries and generated responses by analyzing whether the answer directly addresses the question's intent and scope.

This evaluator goes beyond simple keyword matching to understand contextual relevance, evaluating if the response stays on-topic, addresses all parts of multi-faceted questions, and provides information that genuinely helps answer what was asked. 


Create a relevancy evaluator:

In [105]:
relevancy_evaluator = RelevancyEvaluator(llm=gpt4_llm)

The evaluation process considers both the retrieved context relevance and the final response relevance, ensuring that not only are appropriate source materials found, but that the LLM synthesizes them into answers that are meaningfully connected to the user's information need rather than providing tangentially related but ultimately unhelpful responses.

In [106]:
eval_response_result = relevancy_evaluator.evaluate_response(
    query=eval_query, response=eval_response_vector
)

Retrying request to /chat/completions in 0.477346 seconds
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


In [107]:
eval_response_result.query

"What is Einstein's miracle year?"

In [108]:
eval_response_result.response

"Einstein's miracle year is 1905."

In [109]:
eval_response_result.passing

True

### Retrieval Evaluation

Retrieval evaluation forms the foundation of RAG system assessment, specifically targeting the quality and effectiveness of the retriever module in LlamaIndex. This evaluation phase is crucial because retrieval quality directly impacts downstream response generation - poor retrieval leads to irrelevant or incomplete answers regardless of LLM capabilities.

**Core Retrieval Evaluation Paradigm:**

The retrieval evaluation process in LlamaIndex assesses how effectively the retriever identifies and ranks relevant documents from the knowledge base. RAG retrieval evaluation must consider semantic relevance, contextual appropriateness, and the ability to support accurate answer generation.

**Evaluation Components:**

- **Semantic Accuracy**: Measures whether retrieved nodes contain information semantically relevant to the query, even when exact keywords don't match
- **Ranking Quality**: Assesses how well the retriever orders results by relevance, ensuring the most pertinent information appears in top-k results
- **Coverage Analysis**: Evaluates whether the retriever captures all necessary information to answer complex, multi-faceted queries
- **Noise Reduction**: Determines the retriever's ability to exclude irrelevant or tangentially related content that could mislead the generation process

**LlamaIndex RetrieverEvaluator Framework:**

LlamaIndex's `RetrieverEvaluator` provides comprehensive assessment capabilities using industry-standard metrics like Mean Reciprocal Rank (***MRR***) and ***Hit Rate***. This framework enables:

- **Ground Truth Comparison**: Evaluating retrieval performance against manually curated relevant document sets
- **Multi-metric Analysis**: Simultaneous assessment using complementary metrics to provide holistic performance insights  
- **Batch Evaluation**: Processing entire evaluation datasets to measure consistent retriever performance across diverse queries
- **Comparative Benchmarking**: Testing different retriever configurations to optimize `similarity_top_k`, embedding models, and indexing strategies

**Practical Implementation Considerations:**

Effective retrieval evaluation requires careful consideration of evaluation dataset quality, ground truth annotation accuracy, and the alignment between evaluation metrics and actual RAG system performance. The evaluation results guide critical decisions about embedding model selection, chunking strategies, and similarity threshold tuning.

We will reuse `query_eval_retriever` to demonstrate these evaluation principles:

In [111]:
retrieval_eval_nodes = query_eval_retriever.retrieve(eval_query)

HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


In [115]:
for node in retrieval_eval_nodes:
    pprint.pprint(node)

NodeWithScore(node=TextNode(id_='61b7c18b-b156-4e27-9904-c9ff286975af', embedding=None, metadata={'file_path': '/Users/ojitha/GitHub/rag_llamaindex/Science_Community/data/AlbertEinstein.txt', 'file_name': 'AlbertEinstein.txt', 'file_type': 'text/plain', 'file_size': 8235, 'creation_date': '2025-09-12', 'last_modified_date': '2025-09-12'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='2a37d51d-ac93-4091-aad5-06b963941e44', node_type='4', metadata={'file_path': '/Users/ojitha/GitHub/rag_llamaindex/Science_Community/data/AlbertEinstein.txt', 'file_name': 'AlbertEinstein.txt', 'file_type': 'text/plain', 'file_size': 8235, 'creation_date': '2025-09-12', 'last_modified_date': '2025-09-12'}, hash='ebbeab8f1fb4d4

Let's create a dataset of question-context pairs for evaluation:

In [None]:
from llama_index.core.evaluation import generate_question_context_pairs

qa_dataset = generate_question_context_pairs(
    retrieval_eval_nodes, llm=gpt4_llm, num_questions_per_chunk=2
)

  0%|          | 0/3 [00:00<?, ?it/s]

HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


 33%|███▎      | 1/3 [00:01<00:03,  1.98s/it]

HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


 67%|██████▋   | 2/3 [00:06<00:03,  3.32s/it]

HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


100%|██████████| 3/3 [00:08<00:00,  2.97s/it]


Check the results of the QA dataset:

In [119]:
qa_queries = qa_dataset.queries.values()

In [120]:
pprint.pprint(list(qa_queries)[4])

('Describe the initial interaction between Marie Curie and Albert Einstein at '
 'the first Solvay Conference in 1911. What were the key topics they discussed '
 'and how did they express their mutual respect and admiration for each '
 "other's work?")


Create a retriever evaluator that computes MRR and hit rate:

In [121]:
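from llama_index.core.evaluation import RetrieverEvaluator

# Score the retriever with Mean Reciprocal Rank and hit rate
retriever_evaluator = RetrieverEvaluator.from_metric_names(
    ["mrr", "hit_rate"], retriever=query_eval_retriever
)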
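
For a single question, hit rate asks whether any expected node appears among the retrieved nodes, and the reciprocal rank is 1 divided by the 1-based position of the first relevant node (0 if none is retrieved). The plain-Python sketch below re-implements both definitions to make them concrete. It follows what I understand to be the default behaviour of LlamaIndex's `hit_rate` and `mrr` metrics, but it is an illustration, not the library's code, and the node IDs are hypothetical.

```python
def hit_rate(retrieved_ids: list[str], expected_ids: list[str]) -> float:
    """1.0 if any expected node appears in the retrieved list, else 0.0."""
    expected = set(expected_ids)
    return 1.0 if any(node_id in expected for node_id in retrieved_ids) else 0.0


def reciprocal_rank(retrieved_ids: list[str], expected_ids: list[str]) -> float:
    """1 / (1-based rank of the first relevant node), or 0.0 if none is retrieved."""
    expected = set(expected_ids)
    for rank, node_id in enumerate(retrieved_ids, start=1):
        if node_id in expected:
            return 1.0 / rank
    return 0.0


# Hypothetical results from a retriever that returns its top 2 nodes
examples = [
    (["node-a", "node-b"], ["node-a"]),  # relevant node ranked 1st
    (["node-b", "node-a"], ["node-a"]),  # relevant node ranked 2nd
    (["node-b", "node-c"], ["node-a"]),  # relevant node not retrieved
]
for retrieved, expected in examples:
    print(hit_rate(retrieved, expected), reciprocal_rank(retrieved, expected))
# prints "1.0 1.0", then "1.0 0.5", then "0.0 0.0"
```

The second example is the interesting one: it still counts as a hit, but it only earns half the reciprocal-rank credit, which is how MRR rewards putting the right chunk first.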

Test the evaluator on a single sample query:

In [122]:
sample_id, sample_query = list(qa_dataset.queries.items())[0]
sample_expected = qa_dataset.relevant_docs[sample_id]

eval_result = retriever_evaluator.evaluate(sample_query, sample_expected)
print(eval_result)

HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
Query: "Discuss the impact of Einstein's General Theory of Relativity on modern physics and provide examples of its practical applications as mentioned in the text."
Metrics: {'mrr': 1.0, 'hit_rate': 1.0}
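
Over a dataset, both metrics are averages of the per-question scores. Writing N for the number of questions, rank(i) for the 1-based position of the first relevant node retrieved for question i, and k for the number of nodes the retriever returns:

$$
\text{Hit Rate} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\left[\text{rank}(i) \le k\right],
\qquad
\text{MRR} = \frac{1}{N}\sum_{i=1}^{N} \frac{1}{\text{rank}(i)}
$$

where the reciprocal-rank term is taken as 0 when no relevant node is retrieved. For the single sample query above, an MRR of 1.0 means the source chunk came back in first place.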



Evaluate the entire dataset:

In [123]:
# try it out on an entire dataset
qa_dataset_eval_results = await retriever_evaluator.aevaluate_dataset(qa_dataset)

Retrying request to /embeddings in 0.488101 seconds
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
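
`aevaluate_dataset` evaluates the queries concurrently. As far as I can tell from the LlamaIndex source, it takes a `workers` argument (default 2) and caps the number of in-flight evaluations with an `asyncio.Semaphore`, which limits how many embedding requests reach the API at once (note the retry in the log above). The sketch below shows that bounded-concurrency pattern in plain `asyncio`; `fake_evaluate` is a hypothetical stand-in for one retrieve-and-score call, and this is not LlamaIndex's implementation.

```python
import asyncio


async def fake_evaluate(query: str, stats: dict) -> dict:
    """Hypothetical stand-in for one retrieve-and-score call."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)  # simulate waiting on the embeddings API
    stats["in_flight"] -= 1
    return {"query": query, "hit_rate": 1.0, "mrr": 1.0}


async def evaluate_dataset(queries: list[str], workers: int = 2) -> tuple[list[dict], int]:
    semaphore = asyncio.Semaphore(workers)  # at most `workers` evaluations run at once
    stats = {"in_flight": 0, "peak": 0}

    async def worker(query: str) -> dict:
        async with semaphore:
            return await fake_evaluate(query, stats)

    # gather returns results in input order, even if tasks finish out of order
    results = await asyncio.gather(*(worker(q) for q in queries))
    return list(results), stats["peak"]


results, peak = asyncio.run(evaluate_dataset([f"question {i}" for i in range(6)], workers=2))
print(len(results), peak)  # prints "6 2"
```

In a Jupyter notebook an event loop is already running, so `asyncio.run` raises an error there; use a top-level `await evaluate_dataset(...)` instead, as the cell above does with `aevaluate_dataset`.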


In [128]:
import pandas as pd

# Collect the per-query metric values ({"mrr": ..., "hit_rate": ...}) into one DataFrame
full_df = pd.DataFrame(
    [eval_result.metric_vals_dict for eval_result in qa_dataset_eval_results]
)

# Average each metric over all queries
hit_rate = full_df["hit_rate"].mean()
mrr = full_df["mrr"].mean()

metric_df = pd.DataFrame({"Hit Rate": [hit_rate], "MRR": [mrr]})

metric_df



|   | Hit Rate | MRR      |
|---|----------|----------|
| 0 | 1.0      | 0.833333 |


As shown above, the results are very good. This is most likely because I am using only three documents and the generated questions are straightforward, so retrieving the right chunk is easy; the aim here is to demonstrate the concepts rather than to benchmark the retriever.
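
A hit rate of 1.0 with an MRR below 1.0 means every question retrieved its source chunk, but not always in first place. Assuming the dataset holds six questions (three nodes with `num_questions_per_chunk=2`), one split consistent with the table is four questions ranked first and two ranked second. The per-question ranks were not printed, so this breakdown is illustrative only:

$$
\text{MRR} = \frac{1}{6}\left(4 \cdot \frac{1}{1} + 2 \cdot \frac{1}{2}\right) = \frac{5}{6} \approx 0.833333
$$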

[^1]: [Generative AI for Cloud Solutions](https://learning.oreilly.com/library/view/generative-ai-for/9781835084786/){:target="_blank"}

[^2]: [Building Retrieval Augmented Generation (RAG) Applications with LlamaIndex: From Basic Components to Advanced RAG Systems](https://learning.oreilly.com/course/building-retrieval-augmented/0790145860415/){:target="_blank"}

[^3]: [30 Agents Every AI Engineer Must Build](https://learning.oreilly.com/library/view/30-agents-every/9781806109012/){:target="_blank"}

[^4]: [Building Data-Driven Applications with LlamaIndex](https://learning.oreilly.com/library/view/building-data-driven-applications/9781835089507/){:target="_blank"}

[^5]: [Effective Conversational AI](https://learning.oreilly.com/library/view/effective-conversational-ai/9781633436404/){:target="_blank"}

{:gtxt: .message color="green"}

{:rtxt: .message color="red"}