# 🧠 Dynamic Prompt-Aware RAG System using LlamaIndex

This project showcases a Retrieval-Augmented Generation (RAG) pipeline built with [LlamaIndex](https://www.llamaindex.ai/), enhanced with **dynamic prompt routing based on query tone or intent**. It demonstrates how a simple RAG setup can be evolved into an intelligent, user-aware system that responds in varying tones such as comparative, creative, elaborative, etc.

---

## 🔍 What is this project?

A document-based Q&A system that:
- Loads and indexes PDFs using embeddings
- Accepts user queries
- Dynamically detects the *tone* or *intent* of each query
- Routes it to an appropriate **custom prompt template**
- Performs RAG using the selected prompt and LLM
- Returns intelligent, context-aware answers

---

## 📈 Project Evolution

### ✅ Phase 1: Basic RAG with LlamaIndex
- Loaded documents from PDF
- Used `VectorStoreIndex` for retrieval
- Queried with `.query()` without custom prompt control
- Responses were generic and one-tone

### 🔥 Phase 2: Prompt-Augmented RAG
- Introduced a response synthesizer (`get_response_synthesizer`) to control prompt formatting
- Created custom `PromptTemplate`s for:
  - `compare`
  - `summarize`
  - `elaborate`
  - `creative`
  - `chat` (default fallback)

### 🚀 Phase 3: Dynamic Prompt Routing
- Built a lightweight **tone/intent classifier**
- Used LLM to classify user query type
- Dynamically selected the appropriate prompt template
- Created modular query flow: classify → route → synthesize → answer

---

## 🧠 Features

- 📄 PDF ingestion + embedding (Google GenAI or HuggingFace)
- 🔍 Semantic search using Chroma vector store
- 🧩 Custom response synthesis with prompt control
- 🔄 Tone-aware query classification via LLM
- 🎯 Dynamic RAG with prompt routing
- ✨ Clean, modular architecture for easy extension

---

## 🖼️ Prompt Types Used

| Prompt Type | Example Query |
|-------------|---------------|
| `compare`   | "Compare top-k and top-p sampling" |
| `summarize` | "List the key prompting techniques in the paper" |
| `elaborate` | "Explain chain-of-thought reasoning in detail" |
| `creative`  | "Describe prompt engineering using a cooking recipe analogy" |
| `chat`      | "What is prompt engineering?" |

---

## 📦 Tech Stack

- 🧠 **LlamaIndex** – RAG backbone
- ✍️ **Google GenAI** – LLM & embeddings (Gemini)
- 🧠 **PromptTemplate** + `ResponseSynthesizer` – Prompt control
- 📚 **Chroma** – Vector store backend
- 📄 **PyMuPDF** – PDF text parsing
- 🧪 **Python** – Language of choice

---

Built with love, curiosity, and a passion for building better AI apps.
Thanks to [LlamaIndex](https://www.llamaindex.ai/) and open-source tools for making this powerful system possible.




In [None]:
!pip install llama-index llama-index-vector-stores-chroma llama-index-llms-huggingface-api llama-index-embeddings-huggingface -U -q

In [None]:
%pip install llama-index-llms-google-genai llama-index

In [None]:
%pip install llama-index-embeddings-google-genai

**Here I use the Google Gemini API for both the LLM and the embeddings. You can use any combination of LLM and embedding model listed in the LlamaIndex documentation.**

In [2]:
import os
GOOGLE_API_KEY = ""
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY

In [3]:
from llama_index.llms.google_genai import GoogleGenAI
llm = GoogleGenAI(
    model="gemini-2.0-flash",
    api_key=GOOGLE_API_KEY,  # reuse the key set in the previous cell
)

In [4]:
from huggingface_hub import login
login()


In [11]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.google_genai import GoogleGenAI
from llama_index.embeddings.google_genai import GoogleGenAIEmbedding
from google.genai.types import EmbedContentConfig
# Load the document. Put the correct file path here; this example uses a single PDF.
reader = SimpleDirectoryReader(input_files=["Prompt_engineering.pdf"])
documents = reader.load_data()
print(f"Loaded {len(documents)} document(s).")

# Split into chunks
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
# You can change the batch size and model_name to any of the embedding models Google provides;
# see https://ai.google.dev/gemini-api/docs/embeddings
embed_model = GoogleGenAIEmbedding(
    model_name="text-embedding-004",
    embed_batch_size=200,
    api_key=GOOGLE_API_KEY,  # reuse the key set earlier
)
# Set up LLM and embedding model
Settings.llm = llm
Settings.embed_model = embed_model

# Create vector index
vector_index = VectorStoreIndex(nodes)

# Create query engine
query_engine = vector_index.as_query_engine()

Loaded 68 document(s).
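
`SentenceSplitter` packs whole sentences into chunks up to a size limit. The snippet below is a minimal sketch of that idea under simplifying assumptions: size is counted in words rather than tokens, sentences are split on terminal punctuation, and there is no overlap between chunks. `pack_sentences` is a hypothetical helper for illustration, not LlamaIndex's implementation.

```python
import re


def pack_sentences(text: str, max_words: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the limit.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


chunks = pack_sentences("One two three. Four five. Six seven eight nine.", max_words=5)
print(chunks)
```

A sentence longer than the limit is kept whole in this sketch, so a chunk can exceed `max_words` in that case.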


In [13]:
vector_store = vector_index.vector_store

embedding_dict = vector_store.data.embedding_dict
node_dict = vector_store.data.text_id_to_ref_doc_id
print(f"Number of embeddings: {len(embedding_dict)}")
print(f"Number of node references: {len(node_dict)}")

Number of embeddings: 68
Number of node references: 68
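
The `embedding_dict` above maps each node ID to its embedding vector. At query time the retriever embeds the query and returns the most similar nodes. Here is a rough sketch of that ranking step with toy two-dimensional vectors, assuming cosine similarity as the metric; `cosine` and `top_k` are illustrative helpers, not the library's code.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], embedding_dict: dict[str, list[float]], k: int):
    """Return the k (node_id, score) pairs most similar to the query vector."""
    scored = [(node_id, cosine(query_vec, vec)) for node_id, vec in embedding_dict.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for the real 68 node vectors.
toy_embeddings = {
    "node-a": [1.0, 0.0],
    "node-b": [0.0, 1.0],
    "node-c": [1.0, 1.0],
}
ranked = top_k([1.0, 0.0], toy_embeddings, k=2)
print(ranked)
```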


**Here we test the RAG system with a few queries. No prompt augmentation is applied yet: LlamaIndex uses its default prompts, so we don't have much control over the response style. Later we will see how to add custom prompts.**

In [15]:
response = query_engine.query("What is Prompt Engineering ? ")
print(response,end=' ')

Prompt engineering is an iterative process of crafting effective prompts that considers word-choice, style and tone, structure, and context. It is an input that the model uses to predict a specific output. Everyone can write a prompt.
 

In [16]:
response2 = query_engine.query("What is difference between top-k and top-p ?")
print(response2,end=' ')

Top-K sampling selects the top K most likely tokens from the model’s predicted distribution, while top-P sampling selects the top tokens whose cumulative probability does not exceed a certain value (P).
 

In [18]:
response3 = query_engine.query("What are the various Prompting techniques in the paper?")
print(response3,end=' ')

The prompting techniques mentioned in the paper are: System, contextual, role, and step-back prompting.
 

In [19]:
response4 = query_engine.query("What is the ReAct mentioned in the paper ? explain it briefly")
print(response4,end=' ')

ReAct is a prompting paradigm that enables LLMs to solve complex tasks by combining natural language reasoning with external tools, allowing the LLM to take actions like interacting with external APIs to retrieve information. It works by combining reasoning and acting into a thought-action loop. The LLM first reasons about the problem and generates a plan of action, then performs the actions in the plan and observes the results. The LLM then uses the observations to update its reasoning and generate a new plan of action, continuing until the LLM reaches a solution to the problem.
 

In [20]:
response5 = query_engine.query("what is json repair ? explain it")
print(response5,end=' ')

JSON repair is a tool, like the json-repair library on PyPI, that fixes incomplete or malformed JSON objects. It can automatically fix JSON outputs that are cut off due to token limits, which is helpful when working with LLM-generated JSON.
 

**The RAG system correctly identifies what the document is about and declines to answer questions unrelated to it. This is the behavior we want.**

In [21]:
response6 = query_engine.query("what is meditation ? how it benefits us ?")
print(response6,end=' ')

This document is about prompt engineering and does not contain information about meditation.
 

In [22]:
response7 = query_engine.query("what is convolutional neural network(CNN) in deep learning ?")
print(response7,end=' ')

I am sorry, but this document does not contain information about convolutional neural networks.
 

In [25]:
response8 = query_engine.query("explain about the temperature parameter")
print(response8,end=' ')

Temperature regulates the amount of randomness employed when choosing tokens. Utilizing lower temperatures is suitable for prompts requiring a more predictable response, whereas higher temperatures can yield more varied or unexpected outcomes. A temperature of 0 corresponds to greedy decoding.
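
To make the temperature answer above concrete, here is a small sketch of how temperature reshapes a token distribution: logits are divided by the temperature before the softmax. The logits are made-up numbers, and `softmax_with_temperature` is a hypothetical helper. A temperature of exactly 0 is rejected here because the division is undefined; greedy decoding is the limiting case as the temperature approaches 0.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after scaling them by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # illustrative values only
cool = softmax_with_temperature(logits, 0.5)
warm = softmax_with_temperature(logits, 2.0)
print("T=0.5:", [round(p, 3) for p in cool])
print("T=2.0:", [round(p, 3) for p in warm])
```

The lower temperature concentrates probability on the top token, while the higher one spreads it across the alternatives.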
 

**From here onwards we will augment the RAG system with our own custom prompts to get more control over it. LlamaIndex provides this functionality through its prompt and response-synthesizer modules, so let's see how to do it.**

In [29]:
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import get_response_synthesizer
from llama_index.core import PromptTemplate
from llama_index.core.response_synthesizers import ResponseMode

# we will define our own custom prompt template here
custom_prompt = PromptTemplate(
    "You are a helpful research assistant. Based on the following context, answer the user's question.\n\n"
    "----------------\n"
    "{context_str}\n"
    "----------------\n"
    "Question: {query_str}\n"
    "Answer:"
)

# Build a response synthesizer that uses the custom prompt
# Reference: https://docs.llamaindex.ai/en/stable/module_guides/querying/response_synthesizers/
synthesizer = get_response_synthesizer(
    response_mode=ResponseMode.COMPACT,
    text_qa_template=custom_prompt
)

# Get retriever from vector index
retriever = vector_index.as_retriever(similarity_top_k=4)

#   Plug into query engine
custom_query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=synthesizer
)

# Ask a question using the custom prompt
response = custom_query_engine.query("What is the difference between top-k and top-p sampling?")
print(response)


Top-K sampling selects the top K most likely tokens from the model’s predicted distribution, while top-P sampling selects the top tokens whose cumulative probability does not exceed a certain value (P).
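
Conceptually, the synthesizer fills `{context_str}` with the text of the retrieved chunks and `{query_str}` with the user's question before calling the LLM. The sketch below shows only that substitution step using plain `str.format`; the real synthesizer does more (for example, fitting chunks into the context window), and `build_prompt` plus the sample chunks are hypothetical.

```python
TEMPLATE = (
    "You are a helpful research assistant. Based on the following context, "
    "answer the user's question.\n\n"
    "----------------\n"
    "{context_str}\n"
    "----------------\n"
    "Question: {query_str}\n"
    "Answer:"
)


def build_prompt(template: str, chunks: list[str], query: str) -> str:
    """Join retrieved chunks and substitute them, with the query, into the template."""
    context = "\n\n".join(chunks)
    return template.format(context_str=context, query_str=query)


# Made-up chunk texts standing in for the top-4 retrieved nodes.
sample_chunks = [
    "Top-K keeps the K most likely tokens.",
    "Top-P keeps tokens up to a cumulative probability P.",
]
prompt = build_prompt(TEMPLATE, sample_chunks, "What is the difference between top-k and top-p sampling?")
print(prompt)
```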



**The custom_prompt template above may not have made a visible difference, so let's try a bullet_prompt template and test with it. As the results show, the answer comes back in bullets, as expected. So plugging in any kind of prompt works. Now let's play with this to improve our RAG system.**

In [32]:
bullet_prompt = PromptTemplate(
    "Based on the context below, answer the question with a concise bullet-point summary.\n\n"
    "{context_str}\n"
    "Question: {query_str}\n\n"
    "Answer in bullets:"
)
# Note: various_prompts is defined in a later cell (In [40]); run that cell before this one.
various_prompts(bullet_prompt, "What is the difference between top-k and top-p sampling?")

Here's a summary of the differences between Top-K and Top-P sampling:

*   **Top-K:** Selects the K most likely tokens from the model's predicted distribution. Higher K values lead to more creative/varied output, while lower K values result in more restrictive/factual output. A K of 1 is equivalent to greedy decoding.
*   **Top-P:** Selects the top tokens whose cumulative probability does not exceed a certain value (P). P ranges from 0 (greedy decoding) to 1 (all tokens).
*   **Combined Use:** Experimentation is recommended to determine which method (or both together) produces the desired results.
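
The two sampling filters described above can be sketched over a toy next-token distribution. The probabilities are invented for illustration, and both helpers are hypothetical. One assumption to note: the top-P filter here follows the common nucleus-sampling convention of keeping the smallest set of tokens whose cumulative probability reaches P, which always keeps at least one token.

```python
def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep the k most likely tokens and renormalize their probabilities."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the most likely tokens until their cumulative probability reaches p."""
    kept, cumulative = [], 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(pr for _, pr in kept)
    return {tok: pr / total for tok, pr in kept}


# Invented distribution over four candidate tokens.
next_token_probs = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
print(top_k_filter(next_token_probs, k=2))
print(top_p_filter(next_token_probs, p=0.9))
```

With `k=1` the top-K filter leaves a single token, which matches the greedy-decoding case mentioned in the summary.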



**From here onwards we build the dynamic, prompt-aware routing RAG.
First we define several prompt types and test each one individually; each produces the expected style. Naive RAG would not have given us this control, as we saw above. After testing each prompt, we ask the LLM to identify the tone of the query, choose the matching prompt, and generate the answer. This improves the responses.**

In [33]:
elaborative_prompt = PromptTemplate(
    "You are a patient teacher. Use the following context to explain the user's question in simple and clear language.\n\n"
    "{context_str}\n"
    "Question: {query_str}\n"
    "Answer as if teaching a beginner:"
)

In [34]:
json_prompt = PromptTemplate(
    "Extract information based on the following context and user's query. Respond strictly in the following JSON format:\n"
    "{\n"
    "  \"question\": \"...\",\n"
    "  \"summary\": \"...\",\n"
    "  \"key_points\": [\"...\", \"...\", \"...\"]\n"
    "}\n\n"
    "Context:\n"
    "{context_str}\n"
    "Question: {query_str}"
)

In [35]:
compare_prompt = PromptTemplate(
    "Using the following context, compare the two items mentioned in the question. Highlight differences clearly.\n\n"
    "{context_str}\n"
    "Question: {query_str}\n\n"
    "Answer in comparative form:"
)

In [36]:
creative_prompt = PromptTemplate(
    "Explain the concept in the user's question using a real-world analogy or metaphor. Base your answer on the context below.\n\n"
    "{context_str}\n"
    "Question: {query_str}\n\n"
    "Creative analogy:"
)

In [37]:
chat_prompt = PromptTemplate(
    "You are a helpful assistant. Use the context to respond to the user's query in a friendly, conversational tone.\n\n"
    "{context_str}\n"
    "User: {query_str}\n"
    "Assistant:"
)

In [40]:
def various_prompts(template, user_query):
    synthesizer = get_response_synthesizer(
        response_mode=ResponseMode.COMPACT,
        text_qa_template=template,
    )
    # Get retriever from vector index
    retriever = vector_index.as_retriever(similarity_top_k=4)
    # Plug into query engine
    custom_query_engine = RetrieverQueryEngine(
        retriever=retriever,
        response_synthesizer=synthesizer,
    )
    # Ask the question using the given prompt template
    response = custom_query_engine.query(user_query)
    print(response)

**Here you can see the various prompts in action. No single prompt fits every query; we need the right one for each user query. That is why we add dynamic routing later, and it leads to much better results.**

In [41]:
various_prompts(elaborative_prompt,"Explain what temperature parameter means in language models")

Okay, let's break down what this document is saying about how Large Language Models (LLMs) generate text and how you can influence it.

Imagine an LLM is like a super-smart parrot that can predict what words should come next in a sentence. But instead of just knowing one word, it knows how likely *many* different words are to follow. It assigns a probability to each word in its "vocabulary."

**Here's the core idea:**

The document explains that LLMs don't just pick the *single most likely* word every time. Instead, they use a process called "sampling" to choose the next word based on these probabilities. You can control this sampling process using settings like:

*   **Temperature:** This controls how "random" or "creative" the LLM is.

    *   A **low temperature** (close to 0) makes the LLM pick the *most likely* words more often. This leads to more predictable and "safe" outputs. If two tokens have the same highest predicted probability, depending on how tiebreaking is implemented 

In [42]:
various_prompts(json_prompt,"What are the limitations of prompt engineering?")

```json
{
  "question": "What are the limitations of prompt engineering?",
  "summary": "Inadequate prompts can lead to ambiguous, inaccurate responses, and can hinder the model’s ability to provide meaningful output. LLMs aren’t perfect; the clearer your prompt text, the better it is for the LLM to predict the next likely text.",
  "key_points": [
    "Inadequate prompts can lead to ambiguous, inaccurate responses.",
    "Inadequate prompts can hinder the model’s ability to provide meaningful output.",
    "LLMs aren’t perfect; the clearer your prompt text, the better it is for the LLM to predict the next likely text."
  ]
}
```


In [43]:
various_prompts(compare_prompt,"What is the difference between ReAct and CoT ?")

**Tree of Thoughts (ToT) vs. ReAct (Reason & Act)**

Here's a comparison of Tree of Thoughts (ToT) and ReAct, highlighting their key differences based on the provided context:

*   **Reasoning Approach:**

    *   **ToT:** Explores multiple reasoning paths simultaneously by maintaining a tree of thoughts, where each thought is a coherent language sequence. It branches out from different nodes in the tree to explore different reasoning paths. It generalizes the concept of CoT prompting.
    *   **ReAct:** Combines reasoning and acting in a thought-action loop. The LLM reasons about the problem, generates a plan of action, performs the actions, observes the results, and updates its reasoning based on the observations.

*   **Action & External Tools:**

    *   **ToT:** The context doesn't explicitly mention the use of external tools or actions in the same way as ReAct.
    *   **ReAct:** Explicitly designed to use external tools (search, code interpreter, etc.) and perform actions (inter

In [44]:
various_prompts(json_prompt,"What is cost of apple macbook ?")

```json
{
  "question": "What is cost of apple macbook ?",
  "summary": "This document is about prompt engineering techniques and does not contain information about the cost of an Apple Macbook.",
  "key_points": []
}
```


In [46]:
various_prompts(creative_prompt,"Explain zero shot , few shot and one shot prompting ")

Imagine you're teaching a dog a new trick, like "fetch."

*   **Zero-shot prompting** is like telling the dog, "Okay, fetch!" without ever showing it what "fetch" means or giving any prior examples. You're hoping the dog somehow understands what you want based solely on the word itself.

*   **One-shot prompting** is like showing the dog *once* what you want it to do. You throw the ball, the dog brings it back, and you say, "Good, fetch!" Now the dog has one example to learn from.

*   **Few-shot prompting** is like showing the dog the "fetch" action multiple times. You throw the ball, the dog brings it back, you praise it, and repeat this several times. The dog sees the pattern and is more likely to understand and perform the trick correctly.



In [47]:
various_prompts(chat_prompt,"What’s the deal with chain-of-thought prompting?")

Chain of Thought (CoT) prompting is a technique that improves the reasoning capabilities of Large Language Models (LLMs) by having them generate intermediate reasoning steps. This helps the LLM come up with more accurate answers, especially for complex tasks that need some reasoning before responding.

Here's the deal with CoT:

*   **How it works:** Instead of just asking a question, you prompt the LLM to "think step by step" or "let's think step by step" to guide it to break down the problem.
*   **Benefits:**
    *   It's effective and works well with readily available LLMs, so you don't need to fine-tune them.
    *   It allows you to see the LLM's reasoning, which helps you understand how it arrived at the answer and identify any errors in its logic.
    *   It can improve the consistency of your prompts across different LLM versions.
*   **Downsides:** Because the LLM shows its reasoning, the responses can be longer, which means it could cost you more money and time.

There are a

**Here we first detect the type of tone of the user's query and then we proceed to choose the prompt template**

In [66]:
from llama_index.core.llms import ChatMessage, MessageRole
def detect_prompt_type(llm, user_query):
    classification_prompt = PromptTemplate(
        "You are a prompt classifier.\n"
        "Given a user's question, classify the tone or intent. "
        # This rule was added later; try removing it, test, and then add it back.
        "If the user uses phrases like “explain like I’m 5”, “explain to a child”, “use a metaphor”, or “using a story” — classify as creative.\n"
        "Choose one of the following:\n"
        "- compare\n"
        "- summarize\n"
        "- elaborate\n"
        "- creative\n"
        "- chat\n\n"
        "User question: {query_str}\n"
        "Classification:"
    )

    prompt = classification_prompt.format(query_str=user_query)
    message = ChatMessage(role=MessageRole.USER, content=prompt)
    response = llm.chat([message])
    return response.message.content.strip().lower()
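
An LLM reply is free-form text, so it may come back as `Creative.` or `**compare**` rather than the bare label, and an exact string comparison would then silently fall through to the default. One possible way to guard against that is a small normalization step; `normalize_prompt_type` below is a hypothetical helper using simple substring matching, not part of the notebook's pipeline.

```python
ALLOWED_TYPES = ("compare", "summarize", "elaborate", "creative", "chat")


def normalize_prompt_type(raw: str, default: str = "chat") -> str:
    """Map a free-form classifier reply onto one of the allowed labels."""
    cleaned = raw.strip().lower()
    for label in ALLOWED_TYPES:
        # Substring match tolerates punctuation, markdown, or a leading phrase.
        if label in cleaned:
            return label
    return default


print(normalize_prompt_type("Creative."))
print(normalize_prompt_type("**compare**\n"))
print(normalize_prompt_type("I am not sure"))
```

If a reply mentions more than one label, this sketch returns the first match in `ALLOWED_TYPES` order, which may not be what you want for every use case.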


In [52]:
def get_synthesizer_from_type(prompt_type):
    if prompt_type == "compare":
        template = compare_prompt
    elif prompt_type == "summarize":
        template = bullet_prompt
    elif prompt_type == "elaborate":
        template = elaborative_prompt
    elif prompt_type == "creative":
        template = creative_prompt
    else:
        template = chat_prompt  # default

    return get_response_synthesizer(
        response_mode=ResponseMode.COMPACT,
        text_qa_template=template
    )
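
As the number of prompt types grows, the `if`/`elif` chain can be expressed as a lookup table instead. A minimal sketch of that alternative, with plain strings standing in for the `PromptTemplate` objects defined earlier; `PROMPT_BY_TYPE` and `select_template` are illustrative names.

```python
# Strings stand in for the PromptTemplate objects defined in earlier cells.
PROMPT_BY_TYPE = {
    "compare": "compare_prompt",
    "summarize": "bullet_prompt",
    "elaborate": "elaborative_prompt",
    "creative": "creative_prompt",
}
DEFAULT_PROMPT = "chat_prompt"


def select_template(prompt_type: str) -> str:
    """Return the template for a prompt type, falling back to the chat template."""
    return PROMPT_BY_TYPE.get(prompt_type, DEFAULT_PROMPT)


print(select_template("summarize"))
print(select_template("something else"))
```

Adding a new prompt type then only requires a new dictionary entry.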


In [67]:
def dynamic_rag_query(user_query, vector_index, llm):
    # Step 1: Detect tone/intent
    prompt_type = detect_prompt_type(llm, user_query)
    print(f"⤷ Detected prompt type: {prompt_type}")

    # Step 2: Build synthesizer
    synthesizer = get_synthesizer_from_type(prompt_type)

    # Step 3: Create query engine
    retriever = vector_index.as_retriever(similarity_top_k=4)
    query_engine = RetrieverQueryEngine(
        retriever=retriever,
        response_synthesizer=synthesizer
    )

    # Step 4: Query
    return query_engine.query(user_query)
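
The classify → route → synthesize → answer flow can also be exercised without any API calls by passing the classifier, retriever, and generator in as plain functions. The sketch below is one way to do that for quick local testing; `run_pipeline` and the `fake_*` stand-ins are hypothetical and do not call LlamaIndex or an LLM.

```python
from typing import Callable


def run_pipeline(
    query: str,
    classify: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], str],
    templates: dict[str, str],
    default_type: str = "chat",
) -> tuple[str, str]:
    """Classify the query, pick a template, fill it with context, and generate."""
    prompt_type = classify(query)
    if prompt_type not in templates:
        prompt_type = default_type  # unknown labels fall back to the default
    context = "\n\n".join(retrieve(query))
    prompt = templates[prompt_type].format(context_str=context, query_str=query)
    return prompt_type, generate(prompt)


templates = {
    "compare": "Compare using:\n{context_str}\nQuestion: {query_str}",
    "chat": "Chat using:\n{context_str}\nUser: {query_str}",
}


def fake_classify(query: str) -> str:
    return "compare" if "compare" in query.lower() else "other"


def fake_retrieve(query: str) -> list[str]:
    return ["chunk one", "chunk two"]


def fake_generate(prompt: str) -> str:
    # Echo the first prompt line so the chosen template is visible.
    return prompt.splitlines()[0]


result = run_pipeline("Compare top-k and top-p", fake_classify, fake_retrieve, fake_generate, templates)
fallback = run_pipeline("Hello there", fake_classify, fake_retrieve, fake_generate, templates)
print(result)
print(fallback)
```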


**Here are the results. The system detects the prompt type and then generates a response in the matching style. This is useful in applications that must respond in a particular style for a given kind of prompt. Putting all the prompt styles into one big prompt may be costly because the context size grows, so it is better to route like this. This addresses a limitation of naive RAG: the responses fit the query much better, making for a more intelligent, query-aware RAG system that users would prefer to interact with.**

In [56]:
response_dynamic = dynamic_rag_query("Can you compare top-k and top-p sampling techniques?", vector_index, llm)
print(response_dynamic)

⤷ Detected prompt type: compare
Top-K and Top-P sampling are both methods used in LLMs to control the randomness and diversity of generated text, but they differ in how they select the next token.

*   **Selection Criteria:** Top-K selects the top K most likely tokens from the model's predicted distribution, whereas Top-P selects the top tokens whose cumulative probability does not exceed a certain value (P).

*   **Parameter Range:** Top-K uses an integer value (K) to define the number of tokens to consider, while Top-P uses a probability value (P) ranging from 0 to 1.

*   **Output Behavior:** A higher Top-K leads to more creative and varied output, while a lower Top-K results in more restrictive and factual output. Top-P values closer to 1 include more tokens, increasing randomness, while values closer to 0 restrict the selection to the most probable tokens.

*   **Greedy Decoding Equivalence:** A Top-K of 1 is equivalent to greedy decoding, while a Top-P of 0 (or a very small value

In [59]:
response_dynamic_2 = dynamic_rag_query("what is prompt engineering in simple words ?", vector_index, llm)
print(response_dynamic_2)

⤷ Detected prompt type: summarize
*   Prompt engineering is crafting effective text inputs (prompts) for large language models (LLMs) to get accurate outputs.
*   It's an iterative process that involves optimizing word choice, style, structure, and context.
*   The goal is to guide the LLM to predict the right sequence of tokens and produce relevant results.
*   Automatic Prompt Engineering (APE) automates the prompt creation process by using a model to generate and evaluate prompts.



**Here the detected type is "elaborate". The classifier sees only the label names, not the templates, and a request to "explain ... to a child" plausibly reads as a request to elaborate. The response is still good, but later we will see how to get this kind of query classified as creative.**

In [60]:
response_dynamic_3 = dynamic_rag_query("explain chain of thought prompting as if you are explaining it to a child", vector_index, llm)
print(response_dynamic_3)

⤷ Detected prompt type: elaborate




Okay, I'll explain Chain of Thought (CoT) prompting like I'm talking to a child.

Imagine you're trying to solve a puzzle, like figuring out how many cookies your friend has.

**Without Chain of Thought:** Someone just asks the computer, "How many cookies does my friend have?" The computer might guess and get it wrong!

**With Chain of Thought:**  Instead of just guessing, we teach the computer to think step-by-step, like this:

1.  **First, we tell the computer:** "My friend started with 5 cookies."
2.  **Then, we say:** "Then, they got 3 more cookies."
3.  **Finally, we ask:** "So, how many cookies do they have now?"

Now, the computer can think: "Okay, 5 cookies + 3 cookies = 8 cookies!"  It figures it out step-by-step and gets the right answer!

**So, Chain of Thought is like teaching the computer to think through a problem, one step at a time, instead of just guessing. This helps it get the right answer, especially when the problem is a little tricky!**

The document you provided 

In [64]:
response_dynamic_4 = dynamic_rag_query("Describe prompt engineering using a cooking recipe analogy.", vector_index, llm)
print(response_dynamic_4)

⤷ Detected prompt type: creative
Based on the provided context, prompt engineering is like crafting a detailed recipe for a chef (the language model).

*   **The Prompt is the Recipe:** Just like a recipe provides instructions, ingredients, and cooking methods, a prompt provides the language model with the necessary information to generate a specific output.
*   **Inadequate Prompts are like Vague Recipes:** If a recipe is poorly written or missing key information, the resulting dish might be a disaster. Similarly, inadequate prompts can lead to ambiguous, inaccurate, or unhelpful responses from the language model.
*   **Iterative Process is like Refining a Recipe:** A chef might need to experiment and adjust a recipe multiple times to perfect it. Similarly, prompt engineering is an iterative process where you refine your prompts based on the model's output to achieve the desired results.
*   **Examples in Prompts are like Pictures in a Recipe Book:** Including examples in your prompt 

**Below, the detected type is now creative, and the only change was in the classification prompt. You will rarely get the right responses on the first try; you may have to revise the prompt and give it more explanation. That is where the power of prompting lies. So experiment: change the prompt and see for yourself how the results improve.**

In [69]:
response_dynamic_5 = dynamic_rag_query("explain to a child what is chain of thought prompting ?", vector_index, llm)
print(response_dynamic_5)

⤷ Detected prompt type: creative




Imagine you're trying to build a really cool Lego castle.

**Without Chain of Thought:** You just dump all the Lego bricks on the table and try to guess where each one goes. You might get lucky and stick a few together, but it's mostly random and probably won't look like a castle. This is like asking the computer a complicated question without giving it any steps to follow. It might give you an answer, but it's probably wrong or doesn't make sense.

**With Chain of Thought:** Now, imagine you have a set of instructions. First, you build the base. Then, you add the walls. Next, you put on the towers, and finally, you add the flags. Each step helps you get closer to the finished castle. Chain of Thought is like giving the computer those instructions. You tell it to "think step-by-step" and break down the problem into smaller, easier-to-solve pieces. This way, it's much more likely to build the correct "castle" (give the right answer).



**The only thing we changed in the classification prompt was adding the line ```If the user uses phrases like “explain like I’m 5”, “explain to a child”, “use a metaphor”, or “using a story” — classify as creative.```. That is the power of prompting. Try changing it to see what works best for your use case, and build an intelligent system!**

```
classification_prompt = PromptTemplate(
    "You are a prompt classifier.\n"
    "Given a user's question, classify the tone or intent. "
    "If the user uses phrases like “explain like I’m 5”, “explain to a child”, “use a metaphor”, or “using a story” — classify as creative.\n"
    "Choose one of the following:\n"
    "- compare\n"
    "- summarize\n"
    "- elaborate\n"
    "- creative\n"
    "- chat\n\n"
    "User question: {query_str}\n"
    "Classification:"
)

```



In [70]:
response_dynamic_6 = dynamic_rag_query("what is the difference between Tree of Thoughts and  ReAct(reason and act) ?. give the difference in bullet points", vector_index, llm)
print(response_dynamic_6)

⤷ Detected prompt type: compare
Here's a comparison of Tree of Thoughts (ToT) and ReAct (Reason & Act) based on the provided context, highlighting their key differences:

**Tree of Thoughts (ToT)**

*   **Reasoning Approach:** Explores multiple reasoning paths simultaneously by maintaining a tree of thoughts. Each thought represents a coherent language sequence as an intermediate step.
*   **Generalization of CoT:** Generalizes the concept of Chain of Thought (CoT) prompting.
*   **Exploration Focus:** Well-suited for complex tasks that require exploration of different reasoning paths.
*   **Output:** Generates multiple Chains of Thoughts.
*   **Consistency:** Aims to improve accuracy by considering multiple perspectives and selecting the most consistent answer.

**ReAct (Reason & Act)**

*   **Reasoning Approach:** Combines natural language reasoning with external tools (search, code interpreter, etc.) in a thought-action loop.
*   **Action-Oriented:** Enables LLMs to perform actions,

**See the scores of Top-4 nodes. You can see here which chunk(s) were picked to give the answer**

In [72]:
print("Source nodes:")
print("=" * 70)

for i, node in enumerate(response_dynamic_6.source_nodes):
    print(f"Node {i+1}:")
    print(f"Score: {node.score}")
    print(f"Text: {node.text}")
    print(f"Metadata: {node.metadata}")
    print("-" * 60)

Source nodes:
Node 1:
Score: 0.7206183639768341
Text: Prompt Engineering
February 2025
37
This approach makes ToT particularly well-suited for complex tasks that require exploration. It 
works by maintaining a tree of thoughts, where each thought represents a coherent language 
sequence that serves as an intermediate step toward solving a problem. The model can then 
explore different reasoning paths by branching out from different nodes in the tree. 
There’s a great notebook, which goes into a bit more detail showing The Tree of Thought 
(ToT) which is based on the paper ‘Large Language Model Guided Tree-of-Thought’.9
ReAct (reason & act)
Reason and act (ReAct) [10]13 prompting is a paradigm for enabling LLMs to solve complex 
tasks using natural language reasoning combined with external tools (search, code 
interpreter etc.) allowing the LLM to perform certain actions, such as interacting with external 
APIs to retrieve information which is a first step towards agent modeling.
ReAct 