# 1. Setup

## 1.1 Installing Libraries

Reference: [Llama Index Installation and Setup](https://docs.llamaindex.ai/en/stable/getting_started/installation/)

In [None]:
!pip install python-dotenv llama-index chromadb llama-index-vector-stores-chroma llama-index-retrievers-bm25 EbookLib html2text langchain-text-splitters

## 1.2 Importing Libraries

In [1]:
import chromadb

from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, PromptTemplate, get_response_synthesizer
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline
from llama_index.llms.openai import OpenAI
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.core.prompts import SelectorPromptTemplate

from ebooklib import epub
import uuid
import os
from pathlib import Path
from dotenv import load_dotenv
import nest_asyncio
from enum import Enum

nest_asyncio.apply()

## 1.3 Importing Environment Variables

In [2]:
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

## 1.4 Setting up Embedding Model

In [3]:
embed_model = OpenAIEmbedding(api_key=OPENAI_API_KEY)

## 1.5 Setting up LLM

In [4]:
llm = OpenAI(api_key=OPENAI_API_KEY, model_name="gpt-4o-mini", temperature=0.1)

# 2. Setting up IndexedVectorStore

In [4]:
# The name "IndexedVectorStore" emphasizes that the class handles both the vector store and the index

class IndexedVectorStore:
    def __init__(self):
        self.db = chromadb.PersistentClient(path="./db")
        self.chroma_collection = self.db.get_or_create_collection("transcription_project")
        self.vector_store = ChromaVectorStore(chroma_collection=self.chroma_collection)
        self.index = VectorStoreIndex.from_vector_store(
            self.vector_store,
            embed_model=embed_model,
        )

    def add_documents(self, documents: list) -> None:
        # Add the documents to the LlamaIndex and persist them
        for document in documents:
            self.index.insert(document)
        self.index.storage_context.persist(persist_dir="./db")

In [6]:
vectorstore = IndexedVectorStore()

# 3. Loading Data from Directory using `SimpleDirectoryReader`

Reference: [Loaders](https://docs.llamaindex.ai/en/stable/understanding/loading/loading/)

Extracting Metadata Reference: [SimpleDirectoryReader](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/)

We can specify a function that will read each file and extract metadata that gets attached to the resulting Document object for each file by passing the function as `file_metadata`

In [7]:
def extract_epub_metadata(book_path: str) -> dict:
    book_path = Path(book_path)
    if not book_path.exists():
        raise FileNotFoundError(f"EPUB file not found at path: {book_path}")
    book = epub.read_epub(str(book_path))

    return {
        "id": f"epub-{uuid.uuid4().hex}",
        "title": book.get_metadata("DC", "title")[0][0].rstrip(".epub") if book.get_metadata("DC", "title") else "N/A",
        "author": book.get_metadata("DC", "creator")[0][0] if book.get_metadata("DC", "creator") else "",
        "language": book.get_metadata("DC", "language")[0][0] if book.get_metadata("DC", "language") else "",
        "description": book.get_metadata("DC", "description")[0][0] if book.get_metadata("DC", "description") else "",
        "type": "epub",
        "embeddings": "openaiembeddings"
    }

In [8]:
documents = SimpleDirectoryReader(input_dir="./data", file_metadata=extract_epub_metadata).load_data()

  for root_file in tree.findall('//xmlns:rootfile[@media-type]', namespaces={'xmlns': NAMESPACES['CONTAINERNS']}):


In [9]:
print(f"Total Documents: {len(documents)}")

Total Documents: 1


In [10]:
print(documents[0].metadata)

{'id': 'epub-24542662e8fa47f2b48246bcd9c6c2e9', 'title': "Theological Instructions (Amuzish-e Aqa'id)", 'author': 'Muhammad Taqi Misbah Yazdi', 'language': 'en', 'description': '', 'type': 'epub', 'embeddings': 'openaiembeddings'}


Loading a new book

In [15]:
new_book = SimpleDirectoryReader(input_files=["./data/give_and_take.epub"], file_metadata=extract_epub_metadata).load_data()
print(f"Metadata of first element: {new_book[0].metadata}")

# This way, we can load a new book and can use the same VectorStore object to add the new book to the index

Metadata of first element: {'id': 'epub-6d7d7179a80e414c9c18e6715cae457f', 'title': 'Give and Tak', 'author': 'Unknown', 'language': 'en', 'description': '', 'type': 'epub'}


# 4. Transforming

After the data is loaded, you then need to process and transform your data before putting it into a storage system. These transformations include chunking, extracting metadata, and embedding each chunk. This is necessary to make sure that the data can be retrieved, and used optimally by the LLM.

An `IngestionPipeline` uses a concept of Transformations that are applied to input data. These Transformations are applied to your input data

Reference: [IngestionPipeline](https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline/)

In [11]:
chunk_size = 512
overlap_percentage = 0.25
chunk_overlap = int(chunk_size * overlap_percentage)

print(f"Chunk Size: {chunk_size}, Overlap Percentage: {overlap_percentage}, Chunk Overlap: {chunk_overlap}")

Chunk Size: 512, Overlap Percentage: 0.25, Chunk Overlap: 128


In [12]:
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap, paragraph_separator="\n\n\n"),
        embed_model # OpenAIEmbedding
    ],
    vector_store=vectorstore.vector_store,
)

In [13]:
documents = pipeline.run(documents=documents)

In [14]:
print(f"Total Documents: {len(documents)}")

Total Documents: 516


In [15]:
for i, doc in enumerate(documents[:10]):
    print(f"Document {i + 1} Length: {len(doc.text)}")

Document 1 Length: 1957
Document 2 Length: 2062
Document 3 Length: 2055
Document 4 Length: 2015
Document 5 Length: 2221
Document 6 Length: 2143
Document 7 Length: 1927
Document 8 Length: 1398
Document 9 Length: 1524
Document 10 Length: 1553


Function to load and add a new book to vectorstore

In [18]:
def add_book(book_path: str):
    print(f"Loading book from path: {book_path}")
    new_book = SimpleDirectoryReader(input_files=[book_path], file_metadata=extract_epub_metadata).load_data()
    print(f"Loaded book with metadata: {new_book[0].metadata}")
    new_book = pipeline.run(documents=new_book)
    print("Book added successfully!")

In [19]:
add_book("./data/give_and_take.epub")

Loading book from path: ./data/give_and_take.epub


  for root_file in tree.findall('//xmlns:rootfile[@media-type]', namespaces={'xmlns': NAMESPACES['CONTAINERNS']}):


Loaded book with metadata: {'id': 'epub-b09eb4f758f54495a29b6b23763dffb3', 'title': 'Give and Tak', 'author': 'Unknown', 'language': 'en', 'description': '', 'type': 'epub'}
Book added successfully!


# 5. Performing Vector Search

In [16]:
query_gen_prompt = """You are an AI language model assistant specializing in query expansion. Your task is to generate {num_queries} diverse versions of the given user question. These variations will be used to retrieve relevant documents from a vector database, helping to overcome limitations of distance-based similarity search.

Original question: {query}

Instructions:
1. Create {num_queries} unique variations of the original question.
2. Ensure each variation maintains the core intent of the original question.
3. Use different phrasings, synonyms, or perspectives for each variation.
4. Consider potential context or implications not explicitly stated in the original question.
5. Avoid introducing new topics or drastically changing the meaning of the question.

Please provide your {num_queries} question variations, each on a new line:
"""

In [17]:
top_n = 5
num_queries = 5
question = "Why is there only one God?"

vector_retriever = vectorstore.index.as_retriever(similarity_top_k=top_n)

In [18]:
retriever = QueryFusionRetriever(
    [vector_retriever],
    similarity_top_k=top_n,
    num_queries=num_queries,  # set this to 1 to disable query generation
    mode="reciprocal_rerank",
    use_async=True,
    verbose=True,
    query_gen_prompt=query_gen_prompt,
)

In [19]:
# it will genereate n - 1 queries since the original query is also included
nodes_with_scores = retriever.retrieve(question)

Generated queries:
1. What is the rationale behind the belief in a singular deity?
2. How do monotheistic religions justify the existence of a single God?
3. What are the reasons for the concept of a sole divine being in various faith traditions?
4. Can you explain the philosophical basis for the idea of a solitary God in religious teachings?


In [20]:
# Printing first document
print("------- BOOK INFO -------")
print(f"Book Title: {nodes_with_scores[0].metadata['title']}")
print(f"Book ID   : {nodes_with_scores[0].metadata['id']}")
print(f"Author    : {nodes_with_scores[0].metadata['author']}")

print("\n------- TEXT -------")
print(nodes_with_scores[0].text)

------- BOOK INFO -------
Book Title: Theological Instructions (Amuzish-e Aqa'id)
Book ID   : epub-24542662e8fa47f2b48246bcd9c6c2e9
Author    : Muhammad Taqi Misbah Yazdi

------- TEXT -------
There is no power except by Allah!” (Holy
Qur’an,[18:39](/printepub/book/export/html/123734#quran_ref_189052)).** _

### D. The Two Important Results Achieved

The result of the unity of Divine action is that nothing other than God
deserves worship, because as we have indicated before, a being does not
deserve to be worshipped by just being a creator or a lord. In other words,
Divinity _(uluhiyyah)_ is the necessary condition of lordship and creatorship.

From another angle, the result of monotheism in the latter meaning is that the
entirety of human reliance must be upon God, and in all of works He must be
trusted and solely from Him help must be requested. Man’s fear and hope ought
to be from Him, and when the sources for the completion of needs are out of
reach, one must not despair, because G

# 7. Generate Answer

In [21]:
answer_template = """You are a knowledgeable AI assistant tasked with answering questions based on the provided context. Your goal is to provide a comprehensive, accurate, and well-structured response using Chain-of-Thought reasoning.

Context:
{context_str}

Question: {query_str}

Instructions:
1. Carefully analyze the given context and question.
2. Use Chain-of-Thought reasoning to break down your answer into clear steps:
   a. First, identify the key components of the question, such as sub-problems that need to be explained before an answer can be derived
   b. Then, for each component, explain your thought process as you analyze the relevant information from the context.
   c. Show how you're connecting different pieces of information to form your conclusion.
3. Provide a detailed answer using only the information from the context.
4. If the context doesn't contain enough information to fully answer the question, state this clearly and explain why.
5. Organize your response with appropriate headings and subheadings for clarity.
6. Use bullet points or numbered lists where applicable to improve readability.
7. If relevant, include brief examples or analogies to illustrate key points.
8. After your detailed Chain-of-Thought reasoning, summarize your main points at the end of the response.
9. At the end, list all the contexts used in your reasoning. After your response, add a "References" section where you list the full contexts that you used arrive at your answer. Provide as much detail as available from each context (e.g., book title, author, full text of the relevant contexts. For video sources, include the url to the video.

For the references, use the format:

# References:
(for each context:)
## Context Id: title
Context excerpt (print as it is)

Please format your entire response in markdown for optimal readability.
"""

answer_prompt = PromptTemplate(answer_template)
answer_prompt_sel = SelectorPromptTemplate(answer_prompt)

## Response Modes


In [22]:
class ResponseMode(str, Enum):
    """Response modes of the response builder (and synthesizer)."""

    REFINE = "refine"
    """
    Refine is an iterative way of generating a response.
    We first use the context in the first node, along with the query, to generate an \
    initial answer.
    We then pass this answer, the query, and the context of the second node as input \
    into a “refine prompt” to generate a refined answer. We refine through N-1 nodes, \
    where N is the total number of nodes.
    """

    COMPACT = "compact"
    """
    Compact and refine mode first combine text chunks into larger consolidated chunks \
    that more fully utilize the available context window, then refine answers \
    across them.
    This mode is faster than refine since we make fewer calls to the LLM.
    """

    SIMPLE_SUMMARIZE = "simple_summarize"
    """
    Merge all text chunks into one, and make a LLM call.
    This will fail if the merged text chunk exceeds the context window size.
    """

    TREE_SUMMARIZE = "tree_summarize"
    """
    Build a tree index over the set of candidate nodes, with a summary prompt seeded \
    with the query.
    The tree is built in a bottoms-up fashion, and in the end the root node is \
    returned as the response
    """

    GENERATION = "generation"
    """Ignore context, just use LLM to generate a response."""

    NO_TEXT = "no_text"
    """Return the retrieved context nodes, without synthesizing a final response."""

    CONTEXT_ONLY = "context_only"
    """Returns a concatenated string of all text chunks."""

    ACCUMULATE = "accumulate"
    """Synthesize a response for each text chunk, and then return the concatenation."""

    COMPACT_ACCUMULATE = "compact_accumulate"
    """
    Compact and accumulate mode first combine text chunks into larger consolidated \
    chunks that more fully utilize the available context window, then accumulate \
    answers for each of them and finally return the concatenation.
    This mode is faster than accumulate since we make fewer calls to the LLM.
    """

In [23]:
def generate_answer(question, documents, answer_prompt, response_mode, verbose):
    response_synthesizer = get_response_synthesizer(response_mode=response_mode, llm=llm, text_qa_template=answer_prompt, verbose=verbose)

    response = response_synthesizer.synthesize(
        question, nodes=documents
    )

    return response

`REFINE`:

Refine is an iterative way of generating a response. We first use the context in the first node, along with the query, to generate an initial answer. We then pass this answer, the query, and the context of the second node as input into a “refine prompt” to generate a refined answer. We refine through N-1 nodes, where N is the total number of nodes.

In [24]:
answer_md = generate_answer(question=question, documents=nodes_with_scores, answer_prompt=answer_prompt_sel, response_mode=ResponseMode.REFINE, verbose=True)
print(answer_md)

> Refine context: id: epub-24542662e8fa47f2b48246bcd9c6c2e9
title...
> Refine context: id: epub-24542662e8fa47f2b48246bcd9c6c2e9
title...
> Refine context: id: epub-24542662e8fa47f2b48246bcd9c6c2e9
title...
> Refine context: id: epub-24542662e8fa47f2b48246bcd9c6c2e9
title...
Seeking intercession from saints contradicts the concept of monotheism. Monotheism emphasizes exclusive reliance on God for help and guidance. Seeking intercession from saints may undermine the belief in the unique authority of God and the tranquility of the soul derived from monotheistic beliefs.


`COMPACT`:

Compact and refine mode first combine text chunks into larger consolidated chunks that more fully utilize the available context window, then refine answers across them. This mode is faster than refine since we make fewer calls to the LLM.

In [25]:
answer_md = generate_answer(question=question, documents=nodes_with_scores, answer_prompt=answer_prompt_sel, response_mode=ResponseMode.COMPACT, verbose=True)
print(answer_md)

# Answer:

## Analyzing the Question:
The question asks why there is only one God. To answer this question, we need to break it down into several components:
1. Understanding the concept of monotheism and the rejection of polytheism.
2. Exploring the arguments presented in the context that support the idea of one God.
3. Examining the implications of having multiple gods or lords in the context of creation and administration of the universe.

## Chain-of-Thought Reasoning:

### 1. Understanding Monotheism and Rejection of Polytheism:
- Monotheism refers to the belief in the oneness of God and the rejection of multiple gods.
- Polytheism, on the other hand, involves the belief in multiple gods or lords who are considered independent entities.

### 2. Arguments for One God in the Context:
- The context presents arguments against polytheism by demonstrating the impossibility of having multiple gods creating and administering the universe.
- It highlights that the unity of Divine action le

`SIMPLE_SUMMARIZE`:

Merge all text chunks into one, and make a LLM call. This will fail if the merged text chunk exceeds the context window size.

In [26]:
answer_md = generate_answer(question=question, documents=nodes_with_scores, answer_prompt=answer_prompt_sel, response_mode=ResponseMode.SIMPLE_SUMMARIZE, verbose=True)
print(answer_md)

# Answer:

## Analyzing the Question:
To address the question of why there is only one God, we need to delve into the theological and philosophical arguments presented in the provided context. The key components to consider are:
1. The concept of monotheism and the rejection of polytheism.
2. Arguments for the oneness of God and the refutation of multiple gods.
3. The relationship between creatorship and lordship in monotheistic belief.

## Chain-of-Thought Reasoning:

### 1. Monotheism vs. Polytheism:
- **Monotheism**: 
  - Monotheism entails the belief in the unity of God and the rejection of plurality and multiplicity outside God's essence.
  - It opposes polytheism, which involves the belief in two or more independent gods.
- **Polytheism**:
  - Polytheistic beliefs involve recognizing other gods who administer the world independently, alongside the Creator.

### 2. Arguments for the Oneness of God:
- **Unity of Divine Action**:
  - The unity of Divine action implies that only God 

`TREE_SUMMARIZE`:

Build a tree index over the set of candidate nodes, with a summary prompt seeded with the query. The tree is built in a bottoms-up fashion, and in the end the root node is returned as the response

In [27]:
answer_md = generate_answer(question=question, documents=nodes_with_scores, answer_prompt=answer_prompt_sel, response_mode=ResponseMode.TREE_SUMMARIZE, verbose=True)
print(answer_md)

1 text chunks after repacking
There is only one God because the unity of Divine action leads to the conclusion that nothing other than God deserves worship. Additionally, monotheism dictates that human reliance must solely be upon God, and all help must be sought from Him. The interconnectedness and order in the universe also indicate that there can only be one ultimate source of creation and lordship, making the concept of multiple gods creating and administering the universe impossible.


`GENERATION`:

Ignore context, just use LLM to generate a response.

In [28]:
answer_md = generate_answer(question=question, documents=nodes_with_scores, answer_prompt=answer_prompt_sel, response_mode=ResponseMode.GENERATION, verbose=True)
print(answer_md)

There are a variety of beliefs and interpretations regarding the concept of God in different religions and belief systems. In monotheistic religions such as Christianity, Islam, and Judaism, there is the belief in one supreme, all-powerful deity who is the creator and ruler of the universe. This belief is based on the idea that there is only one ultimate source of power, wisdom, and authority.

In these religions, the belief in one God is seen as a fundamental truth that is central to their teachings and practices. It is believed that having multiple gods would lead to confusion, conflict, and a lack of unity in the universe. Additionally, the concept of one God is often associated with the idea of unity, oneness, and wholeness.

Ultimately, the belief in one God is a foundational principle in monotheistic religions, and is seen as a way to understand and connect with the divine in a singular and unified manner.


`NO_TEXT`:

Return the retrieved context nodes, without synthesizing a final response.

In [29]:
answer_md = generate_answer(question=question, documents=nodes_with_scores, answer_prompt=answer_prompt_sel, response_mode=ResponseMode.NO_TEXT, verbose=True)
print(answer_md)

None


`CONTEXT_ONLY`:

Returns a concatenated string of all text chunks.

In [30]:
answer_md = generate_answer(question=question, documents=nodes_with_scores, answer_prompt=answer_prompt_sel, response_mode=ResponseMode.CONTEXT_ONLY, verbose=True)
print(answer_md)

id: epub-24542662e8fa47f2b48246bcd9c6c2e9
title: Theological Instructions (Amuzish-e Aqa'id)
author: Muhammad Taqi Misbah Yazdi
language: en
description: 
type: epub
embeddings: openaiembeddings

There is no power except by Allah!” (Holy
Qur’an,[18:39](/printepub/book/export/html/123734#quran_ref_189052)).** _

### D. The Two Important Results Achieved

The result of the unity of Divine action is that nothing other than God
deserves worship, because as we have indicated before, a being does not
deserve to be worshipped by just being a creator or a lord. In other words,
Divinity _(uluhiyyah)_ is the necessary condition of lordship and creatorship.

From another angle, the result of monotheism in the latter meaning is that the
entirety of human reliance must be upon God, and in all of works He must be
trusted and solely from Him help must be requested. Man’s fear and hope ought
to be from Him, and when the sources for the completion of needs are out of
reach, one must not despair, becaus

`ACCUMULATE`:

Synthesize a response for each text chunk, and then return the concatenation.

In [31]:
answer_md = generate_answer(question=question, documents=nodes_with_scores, answer_prompt=answer_prompt_sel, response_mode=ResponseMode.ACCUMULATE, verbose=True)
print(answer_md)

Response 1: # Answer:

## Key Components of the Question:
1. Why is there only one God?
2. How does seeking help exclusively from God relate to the concept of monotheism and intercession from saints?

## Chain-of-Thought Reasoning:

### Why is there only one God?
1. **Unity of Divine Action**: 
   - The unity of Divine action implies that nothing other than God deserves worship.
   - Divinity is the necessary condition of lordship and creatorship.
   - Monotheism emphasizes that all reliance should be on God, and He should be trusted in all works.

2. **Special Divine Authority and Tranquil Soul**:
   - Living under monotheistic circumstances results in enjoying a special Divine authority and having a tranquil soul.
   - The friends of Allah will have no fear nor grieve, indicating a sense of security and peace.

3. **Recitation in Muslim Prayers**:
   - Muslims recite the verse "You [alone] do we worship, and You [alone] do we turn for help" at least ten times a day.
   - This reinfor

`COMPACT_ACCUMULATE`:

Compact and accumulate mode first combine text chunks into larger consolidated chunks that more fully utilize the available context window, then accumulate answers for each of them and finally return the concatenation. This mode is faster than accumulate since we make fewer calls to the LLM.

In [32]:
answer_md = generate_answer(question=question, documents=nodes_with_scores, answer_prompt=answer_prompt_sel, response_mode=ResponseMode.COMPACT_ACCUMULATE, verbose=True)
print(answer_md)

Response 1: # Answer:

## Analyzing the Question:
The question asks why there is only one God. To answer this, we need to delve into the provided context, which discusses the concept of monotheism, the negation of plurality, and the rejection of polytheistic beliefs. We will break down the answer into key components based on the information provided.

### Key Components:
1. **Monotheism and Oneness of God**
2. **Negation of Plurality**
3. **Polytheistic Beliefs**

## Chain-of-Thought Reasoning:

### 1. Monotheism and Oneness of God:
- The context emphasizes the belief in the unity of God and the rejection of multiple gods.
- Monotheism entails believing in the oneness of God and negating plurality and multiplicity in His essence.
- This belief is in opposition to polytheism, which believes in multiple independent gods.

### 2. Negation of Plurality:
- The concept of monotheism rejects the idea of multiple gods who are independent and separate from each other.
- It asserts that there is