In [None]:
from IPython.display import Markdown, display

def in_md(md_txt):
    """Render a response string as Markdown between two separator lines."""
    md_formatted_txt = f"--- Response -------<br/>{md_txt}<br/>-------------------"
    display(Markdown(md_formatted_txt))

## Connecting to Amazon Bedrock

Amazon Bedrock is a fully managed AWS service that provides access to foundation models from several AI providers through a single API. This notebook shows how to use Bedrock's LLM and embedding models with LlamaIndex for retrieval-augmented generation (RAG) applications.

**Key Benefits:**
- **Enterprise-ready**: Built-in security and governance
- **Multiple models**: Access Claude, Llama, Titan, and Cohere models
- **Managed service**: No infrastructure to maintain
- **Pay-per-use**: Cost-effective scaling

This integration allows you to replace OpenAI models with Bedrock equivalents while maintaining the same RAG functionality demonstrated in the main Science Community chat notebook.

### Discovering Available Bedrock Embedding Models

The following code demonstrates how to programmatically discover which embedding models are available through Amazon Bedrock. This is useful for understanding your options before selecting a specific model for your RAG application.

The `BedrockEmbedding.list_supported_models()` method returns a structured list of embedding models organized by provider (Amazon, Cohere, etc.), helping you choose the most appropriate model for your use case.

In [2]:
from llama_index.embeddings.bedrock import BedrockEmbedding
import json

supported_models = BedrockEmbedding.list_supported_models()
print(json.dumps(supported_models, indent=2))

{
  "amazon": [
    "amazon.titan-embed-text-v1",
    "amazon.titan-embed-text-v2:0",
    "amazon.titan-embed-g1-text-02"
  ],
  "cohere": [
    "cohere.embed-english-v3",
    "cohere.embed-multilingual-v3"
  ]
}
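
Before hard-coding a model ID, it can help to check it against the mapping printed above. The following is a minimal sketch, assuming the provider-to-model-IDs structure shown in that output; `find_provider` is a hypothetical helper written for this illustration and is not part of LlamaIndex.

```python
from typing import Dict, List, Optional

# Copied from the output above. In the notebook, use the value returned by
# BedrockEmbedding.list_supported_models() instead of this literal.
supported_models: Dict[str, List[str]] = {
    "amazon": [
        "amazon.titan-embed-text-v1",
        "amazon.titan-embed-text-v2:0",
        "amazon.titan-embed-g1-text-02",
    ],
    "cohere": [
        "cohere.embed-english-v3",
        "cohere.embed-multilingual-v3",
    ],
}


def find_provider(models: Dict[str, List[str]], model_id: str) -> Optional[str]:
    """Return the provider whose list contains model_id, or None if absent."""
    for provider, model_ids in models.items():
        if model_id in model_ids:
            return provider
    return None


print(find_provider(supported_models, "amazon.titan-embed-text-v2:0"))
print(find_provider(supported_models, "not-a-listed-model"))
```

A `None` result only means the ID is missing from this list; whether a model is enabled for a given AWS account and region is a separate question.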


In [3]:
from llama_index.core import Settings
from llama_index.embeddings.bedrock import BedrockEmbedding
from llama_index.llms.bedrock_converse import BedrockConverse

# LLM for answer generation: temperature 0 for repeatable output,
# responses capped at 3000 tokens
llm = BedrockConverse(
    model="amazon.nova-pro-v1:0",
    temperature=0,
    max_tokens=3000,
)
Settings.llm = llm

# Embedding model for indexing documents and encoding queries
embed_model = BedrockEmbedding(
    model_name="amazon.titan-embed-text-v2:0"
)
Settings.embed_model = embed_model

### Setting Up Bedrock LLM and Embedding Models

The cell above initializes the core Bedrock components for our RAG system:

- **BedrockConverse LLM**: Uses Amazon's Nova Pro model for text generation with controlled temperature (0) and token limit (3000)
- **BedrockEmbedding**: Uses Amazon's Titan v2 embedding model to convert text into vector representations
- **LlamaIndex Settings**: Configures these models globally so all LlamaIndex components use Bedrock instead of OpenAI

In [4]:
embed_model.get_text_embedding("hello world")

[-0.02060231938958168,
 0.05661262199282646,
 0.007168131414800882,
 -0.009022098034620285,
 0.039273928850889206,
 -0.02533888816833496,
 0.03170996159315109,
 -0.012241295538842678,
 0.01955483667552471,
 0.013856952078640461,
 -0.006783066783100367,
 -0.009889166802167892,
 0.0004500277864281088,
 -0.08411110192537308,
 0.021822139620780945,
 -0.01758911833167076,
 -0.034111905843019485,
 -0.012357083149254322,
 0.014474267140030861,
 -0.02697339467704296,
 -0.006888084579259157,
 0.05007998272776604,
 -0.007631286978721619,
 -0.01914822869002819,
 -0.04326527565717697,
 -0.032690126448869705,
 0.0022619199007749557,
 -0.02874455787241459,
 0.013076050207018852,
 -0.0004469984269235283,
 0.0779891163110733,
 -0.006882698740810156,
 0.0029081825632601976,
 -0.016296593472361565,
 0.011040322482585907,
 0.05673918128013611,
 0.029598835855722427,
 -0.013738470152020454,
 0.030342038720846176,
 -0.02437487803399563,
 0.05102480202913284,
 0.0008549518533982337,
 -0.004173780791461468,


### Testing the Embedding Model

The code above tests our Bedrock embedding model by converting the simple text "hello world" into a vector representation. The output is a list of floating-point values (the embedding) that represents the semantic meaning of the text, confirming that our Bedrock embedding setup is working correctly.
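
Rather than reading the raw numbers, two quick checks are usually more informative: how many dimensions the vector has and what its L2 norm is. The sketch below uses a short stand-in vector so that it runs without AWS access; `describe_embedding` is a hypothetical helper, and this notebook does not state the dimension count or normalization behavior of the Titan model, so neither is assumed here.

```python
import math
from typing import Dict, List


def describe_embedding(vector: List[float]) -> Dict[str, float]:
    """Return the number of dimensions and the L2 norm of an embedding."""
    l2_norm = math.sqrt(sum(value * value for value in vector))
    return {"dimensions": len(vector), "l2_norm": l2_norm}


# Stand-in vector for illustration only. In the notebook, pass the result of
# embed_model.get_text_embedding("hello world") instead.
sample_vector = [0.6, 0.0, -0.8]
print(describe_embedding(sample_vector))
```

A norm very close to 1.0 would suggest the model returns unit-length vectors, in which case dot product and cosine similarity rank results identically.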

### Loading Documents

This code loads all documents from the `./data` directory using LlamaIndex's `SimpleDirectoryReader`. This will read the science community documents (Einstein, Marie Curie, friends, and meeting scenarios) and prepare them for processing in our RAG pipeline.

In [5]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader(input_dir="./data").load_data()

### Text Chunking and Node Creation

This code breaks down the loaded documents into smaller, manageable chunks using `TokenTextSplitter`. Each chunk is limited to 1024 tokens with a 20-token overlap between chunks to maintain context continuity. The result is a collection of `TextNode` objects that will be used for embedding and retrieval.

In [6]:
from llama_index.core.node_parser import TokenTextSplitter


splitter = TokenTextSplitter(
    chunk_size=1024,
    chunk_overlap=20
)
nodes = splitter.get_nodes_from_documents(documents)
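
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window sketch over whitespace-separated words. It is not `TokenTextSplitter`'s actual algorithm (which counts tokenizer tokens and prefers natural split points); `chunk_tokens` is a hypothetical function that only illustrates how consecutive chunks share their boundary tokens.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


words = "a b c d e f g h i j".split()
for chunk in chunk_tokens(words, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

With a window of 4 and an overlap of 1, the last word of each chunk reappears as the first word of the next, which is the same continuity the 20-token overlap provides at the 1024-token scale.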

### Inspecting a Text Node

This code displays the structure and content of the first text node to help us understand how the document chunking worked. You'll see the node's metadata (file path, creation date, etc.), relationships to other nodes, and the actual text content from the Einstein biography.

In [7]:
import pprint

pprint.pprint(nodes[0], indent=2)

TextNode(id_='b2bd275e-19c8-419b-a142-cf5c45bf69ba', embedding=None, metadata={'file_path': '/Users/ojitha/GitHub/rag_llamaindex/Science_Community/data/AlbertEinstein.txt', 'file_name': 'AlbertEinstein.txt', 'file_type': 'text/plain', 'file_size': 8235, 'creation_date': '2025-09-12', 'last_modified_date': '2025-09-12'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='64439a70-c6f0-4fec-b986-419aa96bb2c3', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'file_path': '/Users/ojitha/GitHub/rag_llamaindex/Science_Community/data/AlbertEinstein.txt', 'file_name': 'AlbertEinstein.txt', 'file_type': 'text/plain', 'file_size': 8235, 'creation_date': '2025-09-12', 'last_modified_date': '2025-09-12'}, hash='ebbeab8f1f

### Creating Vector Index

This code creates a `VectorStoreIndex` from our text nodes. The index will automatically use our Bedrock Titan embedding model (configured earlier) to convert each text chunk into vector embeddings and store them for efficient similarity-based retrieval during queries.

In [8]:
from llama_index.core import VectorStoreIndex


index = VectorStoreIndex(nodes)

In [9]:
q_engine = index.as_query_engine(similarity_top_k=5)
response = q_engine.query("who are the scientists in these documents?")
response.response

'The scientists mentioned in the documents include Marie Curie, Albert Einstein, Paul Langevin, Max Planck, Henri Poincaré, Ernest Rutherford, Hendrik Lorentz, Niels Bohr, Walther Nernst, Marcel Brillouin, Ernest Solvay, Emil Warburg, Jean Baptiste Perrin, Wilhelm Wien, Robert Goldschmidt, Heinrich Rubens, Arnold Sommerfeld, Frederick Lindemann, Maurice de Broglie, Martin Knudsen, Friedrich Hasenöhrl, Georges Hostelet, Edouard Herzen, James Hopwood Jeans, Heike Kamerlingh Onnes, and Pierre Curie.'

### Testing the RAG System

The cell above (`In [9]`) creates a query engine from our vector index and tests it with the question "who are the scientists in these documents?". The engine retrieves the five text chunks most similar to the question and passes them to our Bedrock Nova Pro LLM, which generates an answer grounded in that retrieved context.
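
The retrieval step can be pictured as ranking every chunk vector against the query vector and keeping the best `k`. The sketch below assumes cosine similarity as the scoring function and uses tiny two-dimensional stand-in vectors; `cosine_similarity` and `top_k` are hypothetical helpers for illustration, not the index's internal code.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], chunks: List[List[float]], k: int) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) for the k chunks most similar to the query."""
    scored = [(i, cosine_similarity(query, vec)) for i, vec in enumerate(chunks)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Stand-in vectors; real embeddings have far more dimensions.
chunk_vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query_vector = [1.0, 0.1]
for index_position, score in top_k(query_vector, chunk_vectors, k=2):
    print(index_position, round(score, 3))
```

Raising `k` gives the LLM more context at the cost of a longer prompt; lowering it does the opposite.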

In [10]:
in_md(response.response)

--- Response -------<br/>The scientists mentioned in the documents include Marie Curie, Albert Einstein, Paul Langevin, Max Planck, Henri Poincaré, Ernest Rutherford, Hendrik Lorentz, Niels Bohr, Walther Nernst, Marcel Brillouin, Ernest Solvay, Emil Warburg, Jean Baptiste Perrin, Wilhelm Wien, Robert Goldschmidt, Heinrich Rubens, Arnold Sommerfeld, Frederick Lindemann, Maurice de Broglie, Martin Knudsen, Friedrich Hasenöhrl, Georges Hostelet, Edouard Herzen, James Hopwood Jeans, Heike Kamerlingh Onnes, and Pierre Curie.<br/>-------------------

### Checking Retrieved Source Nodes

This code displays the number of source nodes (text chunks) that were retrieved and used to generate the response. It is a quick check that the query engine is retrieving the expected number of chunks, which should be 5 to match the `similarity_top_k` setting.

In [11]:
len(response.source_nodes)

5
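
Beyond the count, it is often useful to see which files the chunks came from and how strongly each one matched. The sketch below assumes each retrieved item exposes a `score` attribute and a `node` with a `metadata` dictionary, as the node printed earlier suggests. It runs on stand-in objects with made-up scores, and `MarieCurie.txt` is a hypothetical file name; `summarize_sources` is likewise a helper invented for this example.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def summarize_sources(source_nodes: Iterable) -> List[Tuple[str, float]]:
    """Return (file_name, score) pairs sorted by descending score."""
    rows = []
    for item in source_nodes:
        file_name = item.node.metadata.get("file_name", "unknown")
        score = item.score if item.score is not None else 0.0
        rows.append((file_name, score))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Stand-ins for illustration. In the notebook, pass response.source_nodes.
stand_ins = [
    SimpleNamespace(score=0.41, node=SimpleNamespace(metadata={"file_name": "AlbertEinstein.txt"})),
    SimpleNamespace(score=0.57, node=SimpleNamespace(metadata={"file_name": "MarieCurie.txt"})),
]
for file_name, score in summarize_sources(stand_ins):
    print(f"{score:.2f}  {file_name}")
```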

### Creating Summary Index

This code creates a `SummaryIndex` as an alternative to the vector-based approach. Unlike vector search, which retrieves only the most similar chunks, the summary index processes all document nodes sequentially to generate a response, so its cost and latency grow with the size of the corpus. This is useful for queries that require a complete overview of all documents rather than specific similarity matches.

In [12]:
from llama_index.core import SummaryIndex


summary_index = SummaryIndex(nodes)
s_engine = summary_index.as_query_engine()

In [13]:
summary = s_engine.query("Provide the summary")
in_md(summary.response)

--- Response -------<br/>In October 1911, at the first Solvay Conference in Brussels, Albert Einstein and Marie Curie met for the first time. Einstein, impressed by Curie's work on radioactivity, praised her dedication and methodical approach. Curie, in turn, admired Einstein's ability to visualize complex physical phenomena. They bonded over their shared experiences as outsiders in the scientific community—Curie as a woman and Einstein as a Jew—and discussed both physics and the growing tensions in Europe. Their mutual respect and intellectual kinship endured through correspondence and subsequent meetings, forming a lasting connection between two revolutionary minds.<br/>-------------------

## LlamaIndex Chat Engines Summary

The Bedrock models configured in this notebook also work with LlamaIndex's chat engines. This notebook covers only a basic RAG implementation; the chat engine modes summarized below extend it to multi-turn conversational applications:

### 🔄 **Condense Question Chat Engine**

**Architecture**: *Stateless conversational RAG pattern*

The Condense Question chat engine transforms multi-turn conversations into standalone queries by condensing the entire chat history and current question into a single, comprehensive query before performing retrieval. 

**Key Features:**
- Analyzes conversation thread to understand contextual references
- Reformulates current question to be self-contained
- Includes all necessary context from previous exchanges
- Ideal for maintaining conversation context without explicit memory management

**Use Case**: Best for applications where conversation history needs to be preserved but explicit memory management is not required.
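
The condensation step can be sketched as plain prompt construction: the chat history and the follow-up message are merged into one request asking the LLM for a standalone question. The template text below is illustrative only and is not LlamaIndex's built-in prompt; `build_condense_prompt` is a hypothetical helper.

```python
from typing import List, Tuple

# Illustrative wording, not the library's default condense prompt.
CONDENSE_TEMPLATE = (
    "Given the conversation below and a follow-up message, rewrite the "
    "follow-up as a standalone question.\n\n"
    "Chat history:\n{history}\n\n"
    "Follow-up: {question}\n"
    "Standalone question:"
)


def build_condense_prompt(history: List[Tuple[str, str]], question: str) -> str:
    """Merge (role, text) history pairs and the new question into one prompt."""
    history_text = "\n".join(f"{role}: {text}" for role, text in history)
    return CONDENSE_TEMPLATE.format(history=history_text, question=question)


chat_history = [
    ("user", "Who was Marie Curie?"),
    ("assistant", "A physicist and chemist known for her work on radioactivity."),
]
print(build_condense_prompt(chat_history, "Where did she meet Einstein?"))
```

The LLM's reply to this prompt, for example a question that names Marie Curie explicitly instead of "she", is what gets sent to the retriever.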

---

### 💾 **Context Chat Engine**

**Architecture**: *Explicit memory management through ChatMemoryBuffer*

The Context chat engine leverages explicit memory management to maintain conversational state across multiple interactions, combining retrieved documents with chat memory.

**Key Features:**
- Preserves conversation history up to specified token limit (e.g., 3900 tokens)
- Retrieves relevant context based on current query
- References previous exchanges while grounding responses in knowledge base
- Uses system prompts for domain-specific context guidance

**Use Case**: Particularly effective for specialized applications where conversation continuity and expert knowledge synthesis are essential for coherent multi-turn interactions.
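
The effect of a token limit on conversation memory can be illustrated with a simple trimming routine that keeps only the most recent messages fitting the budget. This sketch counts whitespace-separated words as a rough stand-in for tokens; `ChatMemoryBuffer` uses a real tokenizer and its own trimming rules, and `trim_history` is a hypothetical function.

```python
from typing import List


def trim_history(messages: List[str], token_limit: int) -> List[str]:
    """Keep the newest messages whose combined word count fits token_limit."""
    kept: List[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = len(message.split())
        if used + cost > token_limit:
            break  # this message and everything older are dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


conversation = ["one two three", "four five", "six"]
print(trim_history(conversation, token_limit=3))
```

Older turns fall out of memory first, so a larger limit preserves more of the conversation at the cost of a longer prompt on every turn.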

---

### 🔀 **Condense Plus Context Chat Engine**

**Architecture**: *Hybrid approach combining best features of both engines*

The Condense Plus Context chat engine represents the most sophisticated approach, implementing a dual-phase process that first condenses conversation history into standalone questions, then retrieves relevant context while maintaining conversation memory.

**Key Features:**
- **Phase 1**: Condenses conversation history (condense question mode)
- **Phase 2**: Retrieves relevant context based on reformulated query
- Maintains conversation memory through ChatMemoryBuffer
- Provides domain-specific guidance through context prompts
- Verbose mode reveals the condensation process

**Use Case**: Ideal for complex conversational RAG applications requiring both high retrieval precision and coherent multi-turn dialogue capabilities.

---

### 🎯 **Implementation with Bedrock**

Each of these chat engine modes can run on the Amazon Bedrock models configured in this notebook. For example, with `index` being the `VectorStoreIndex` built earlier:

```python
# Example: Context Chat Engine with Bedrock
from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=3900)
chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt="Your domain-specific instructions here"
)
```

### 📊 **Choosing the Right Engine**

- **Condense Question**: Use when you need conversation context without memory overhead
- **Context**: Use when explicit memory management and domain expertise are crucial  
- **Condense Plus Context**: Use for the most sophisticated conversational experiences requiring optimal retrieval and dialogue quality

For detailed implementation examples and performance comparisons of these chat engines, refer to the main Science Community Chat notebook: `2025-09-12-LlamaIndexScienceCommunityChat.ipynb`.