# 2 - Agentic RAG with Bedrock KB and LlamaIndex SubQuestionQueryEngine

### Installation

In [32]:
%pip install llama-index
%pip install llama-index-llms-bedrock
%pip install llama-index-embeddings-bedrock
%pip install llama-index-retrievers-bedrock


[notice] A new release of pip available: 22.3.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.

### Setup and imports

In [35]:
from llama_index.core import Settings
from llama_index.core.query_engine import RetrieverQueryEngine

from llama_index.retrievers.bedrock import AmazonKnowledgeBasesRetriever
from llama_index.llms.bedrock import Bedrock
from llama_index.embeddings.bedrock import BedrockEmbedding

In [37]:
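# LLM used for sub-question generation and answer synthesis
llm = Bedrock(model="anthropic.claude-v2")
# Embedding model registered as the default in Settings
embed_model = BedrockEmbedding(model="amazon.titan-embed-text-v1")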

In [39]:
Settings.llm = llm
Settings.embed_model = embed_model
Settings.chunk_size = 512

### Download data

In [40]:
!mkdir -p './data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf' -O './data/10k/lyft_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf' -O './data/10k/uber_2021.pdf'

--2024-04-18 13:29:33--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/lyft_2021.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8001::154, 2606:50c0:8002::154, 2606:50c0:8003::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8001::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1440303 (1.4M) [application/octet-stream]
Saving to: ‘./data/10k/lyft_2021.pdf’


2024-04-18 13:29:33 (13.8 MB/s) - ‘./data/10k/lyft_2021.pdf’ saved [1440303/1440303]

--2024-04-18 13:29:33--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10k/uber_2021.pdf
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8001::154, 2606:50c0:8002::154, 2606:50c0:8003::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8001::154|:443... connected.
HTTP request sent, awaiting response...

In [3]:
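# ID of an existing Amazon Bedrock knowledge base
knowledge_base_id = "PO6XCEKGI1"
# Number of chunks to retrieve per query
top_k = 4
# Bedrock accepts "HYBRID" or "SEMANTIC" for overrideSearchType
search_mode = "HYBRID"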

In [4]:
apple_fpath = "data/apple_2019.pdf"
apple_retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id=knowledge_base_id,
    retrieval_config={
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            "overrideSearchType": search_mode,
            "filter": {"equals": {"key": "file_path", "value": apple_fpath}},
        }
    },
)
apple_engine = RetrieverQueryEngine(retriever=apple_retriever)

In [5]:
tesla_fpath = "data/tesla_2019.pdf"
tesla_retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id=knowledge_base_id,
    retrieval_config={
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            "overrideSearchType": search_mode,
            "filter": {"equals": {"key": "file_path", "value": tesla_fpath}},
        }
    },
)
tesla_engine = RetrieverQueryEngine(retriever=tesla_retriever)
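
The two retriever cells above differ only in the file path used for the metadata filter. As a minimal sketch, the shared configuration could be built by a small helper. The name `build_retrieval_config` and the `VALID_SEARCH_MODES` constant are hypothetical and are not part of LlamaIndex or of this notebook; the dictionary layout simply mirrors the one used above.

```python
from typing import Any, Dict

# Search types accepted by Bedrock for overrideSearchType
VALID_SEARCH_MODES = {"HYBRID", "SEMANTIC"}


def build_retrieval_config(
    file_path: str, top_k: int = 4, search_mode: str = "HYBRID"
) -> Dict[str, Any]:
    """Hypothetical helper: build a retrieval config filtered to one file."""
    if search_mode not in VALID_SEARCH_MODES:
        raise ValueError(f"search_mode must be one of {sorted(VALID_SEARCH_MODES)}")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            "overrideSearchType": search_mode,
            # Restrict results to chunks whose file_path metadata matches
            "filter": {"equals": {"key": "file_path", "value": file_path}},
        }
    }


apple_config = build_retrieval_config("data/apple_2019.pdf")
tesla_config = build_retrieval_config("data/tesla_2019.pdf")
```

Each resulting dictionary could then be passed as `retrieval_config=` when constructing an `AmazonKnowledgeBasesRetriever`, which keeps the per-document cells down to the file path alone.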

In [6]:
from llama_index.core.tools import QueryEngineTool, ToolMetadata

In [24]:
tools = [
    QueryEngineTool(
        query_engine=apple_engine,
        metadata=ToolMetadata(
            name="apple_2019",
            description="10K filing for Apple 2019",
        ),
    ),
    QueryEngineTool(
        query_engine=tesla_engine,
        metadata=ToolMetadata(
            name="tesla_2019",
            description="10K filing for Tesla 2019",
        ),
    ),
]

In [25]:
from llama_index.core.query_engine import SubQuestionQueryEngine

In [26]:
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=tools,
)
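
`SubQuestionQueryEngine` breaks a complex query into sub-questions, sends each one to a matching tool, and then synthesizes a final answer from the intermediate answers. The toy sketch below illustrates that decompose, route, and synthesize flow in plain Python. It is not LlamaIndex's implementation: `decompose`, `answer_all`, `synthesize`, and the stub tools are all hypothetical names, and the real engine uses the LLM for both the decomposition and the synthesis steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubQuestion:
    tool_name: str
    question: str


def decompose(query: str, tool_descriptions: Dict[str, str]) -> List[SubQuestion]:
    """Hypothetical stand-in for the LLM step: one sub-question per tool."""
    return [
        SubQuestion(name, f"{query} (source: {description})")
        for name, description in tool_descriptions.items()
    ]


def answer_all(
    sub_questions: List[SubQuestion], tools: Dict[str, Callable[[str], str]]
) -> List[str]:
    """Route each sub-question to the tool it names and collect the answers."""
    return [f"[{sq.tool_name}] {tools[sq.tool_name](sq.question)}" for sq in sub_questions]


def synthesize(answers: List[str]) -> str:
    """Hypothetical stand-in for the LLM synthesis step: join the answers."""
    return "\n".join(answers)


# Stub tools that return fixed strings instead of querying a knowledge base
toy_tools = {
    "apple_2019": lambda question: "stub answer from the Apple tool",
    "tesla_2019": lambda question: "stub answer from the Tesla tool",
}
toy_descriptions = {
    "apple_2019": "10K filing for Apple 2019",
    "tesla_2019": "10K filing for Tesla 2019",
}

sub_questions = decompose("compare risk areas", toy_descriptions)
final_answer = synthesize(answer_all(sub_questions, toy_tools))
```

The tool `name` and `description` in `ToolMetadata` play the role of `toy_descriptions` here: they are what the engine has available when deciding which tool should receive each sub-question.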

In [27]:
response = await query_engine.aquery('compare risk areas for apple and tesla in 2019')

Generated 2 sub questions.
[apple_2019] Q: What were the key risk areas identified in the 10K filing for Apple in 2019?
[tesla_2019] Q: What were the key risk areas identified in the 10K filing for Tesla in 2019?
[apple_2019] A: Key risk areas identified in the 10K filing for Apple in 2019 included competitive markets, credit risk associated with derivative instruments, potential legal and other claims including intellectual property rights infringement, and exposure to complex and changing laws and regulations worldwide.
[tesla_2019] A: The key risk areas identified in the 10K filing for Tesla in 2019 included automotive revenue recognition controls for sales with resale value guarantees or buyback options, sales return reserves, management's estimation of future market values, historical experience evaluation, economic incentives for customers, warranty reserves for new and used vehicl

In [30]:
from llama_index.core.response.pprint_utils import pprint_response

In [29]:
pprint_response(response, show_source=True)

Final Response: The key risk areas identified in the 10K filing for
Apple in 2019 included competitive markets, credit risk associated
with derivative instruments, potential legal and other claims
including intellectual property rights infringement, and exposure to
complex and changing laws and regulations worldwide. On the other
hand, the key risk areas identified in the 10K filing for Tesla in
2019 included automotive revenue recognition controls for sales with
resale value guarantees or buyback options, sales return reserves,
management's estimation of future market values, historical experience
evaluation, economic incentives for customers, warranty reserves for
new and used vehicles, and interest rate risk related to borrowings
with floating rates.
______________________________________________________________________
Source Node 1/10
Node ID: aef1d97a-4824-415a-81e1-a55f62041c40
Similarity: None
Text: Sub question: What were the key risk areas identified in the 10K
filing for App