# ⚠️ Important Notice

This notebook (and repository) is deprecated.

For the latest Python examples, please refer to the `llama-cloud-services` repository examples:
https://github.com/run-llama/llama_cloud_services/tree/main/examples

---

# LlamaParse + LlamaCloud + AWS Bedrock Cookbook

<a href="https://colab.research.google.com/github/run-llama/llamacloud-demo/blob/main/examples/10k_apple_tesla/demo_file_retrieval.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook, we demonstrate how to build a RAG application using LlamaParse, LlamaCloud, and the embedding models and LLMs supported on AWS Bedrock.

Here are the steps involved:

1. Install the packages and set up API keys.
2. Download Apple's 2023 10-K SEC filing.
3. Parse the document using LlamaParse.
4. Create a pipeline/index on LlamaCloud.
5. Upload the document to the index with the `amazon.titan-embed-text-v1` embedding model.
6. Connect to the index.
7. Initialize the LLM.
8. Create a `query_engine`.
9. Query the index using the `query_engine`.

[LlamaCloud](https://docs.cloud.llamaindex.ai/), [LlamaParse](https://docs.llamaindex.ai/en/stable/llama_cloud/llama_parse/)

## Installation and Setup API Keys

### Installation

Here we install the following packages:

1. `llama-index`: Core package for OSS orchestration.
2. `llama-index-llms-bedrock-converse`: To utilize Bedrock LLMs.
3. `llama-index-indices-managed-llama-cloud`: For managing indices on LlamaCloud.
4. `llama-parse`: For parsing documents efficiently.

In [None]:
!pip install llama-index
!pip install llama-index-llms-bedrock-converse
!pip install llama-index-indices-managed-llama-cloud
!pip install llama-parse
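
Before running the remaining cells, it can help to confirm that all four distributions were actually installed. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of any LlamaIndex package.

```python
from importlib import metadata

# Distribution names taken from the pip commands above
REQUIRED = [
    "llama-index",
    "llama-index-llms-bedrock-converse",
    "llama-index-indices-managed-llama-cloud",
    "llama-parse",
]


def missing_packages(names):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


print("Missing packages:", missing_packages(REQUIRED) or "none")
```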

### Setup API Keys

Here we set up `LLAMA_CLOUD_API_KEY` for managing the index on LlamaCloud.

In [1]:
# llama-parse is async-first; running async code in a notebook requires nest_asyncio
import nest_asyncio
nest_asyncio.apply()

import os
# API access to llama-cloud
os.environ["LLAMA_CLOUD_API_KEY"] = "<LLAMACLOUD API KEY>" # Get your API key from https://cloud.llamaindex.ai/
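
Pasting the key into a cell risks committing it along with the notebook. As an alternative, the sketch below reads the key from the environment and prompts for it only when it is missing. The helper name `resolve_api_key` is hypothetical, and the approach assumes you are comfortable typing the key into a hidden prompt.

```python
import os
from getpass import getpass


def resolve_api_key(env_var="LLAMA_CLOUD_API_KEY", prompt=getpass):
    """Return the key from the environment, prompting only if it is not set."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        key = prompt(f"Enter {env_var}: ").strip()
    if not key:
        raise ValueError(f"{env_var} is empty")
    # Store it so libraries that read the environment can find it
    os.environ[env_var] = key
    return key


# Call once near the top of the notebook:
# resolve_api_key()
```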

## Download files

Here we download Apple's 2023 10-K SEC filing and use it for our demonstration.

In [6]:
# Download Apple's 2023 10-K filing
!mkdir -p data
!wget "https://s2.q4cdn.com/470004039/files/doc_earnings/2023/q4/filing/_10-K-Q4-2023-As-Filed.pdf" -O data/apple_2023.pdf

--2024-11-28 16:05:55--  https://s2.q4cdn.com/470004039/files/doc_earnings/2023/q4/filing/_10-K-Q4-2023-As-Filed.pdf
Resolving s2.q4cdn.com (s2.q4cdn.com)... 181.41.142.154
Connecting to s2.q4cdn.com (s2.q4cdn.com)|181.41.142.154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 714094 (697K) [application/pdf]
Saving to: ‘data/apple_2023.pdf’


2024-11-28 16:05:57 (4.37 MB/s) - ‘data/apple_2023.pdf’ saved [714094/714094]
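
A download can return an HTML error page with a `200 OK` status, so a quick sanity check before parsing may save a wasted parse job. This sketch only inspects the leading bytes of the file; the helper name `looks_like_pdf` is hypothetical, and the check does not validate the rest of the PDF.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file exists and starts with the PDF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(5) == b"%PDF-"


pdf_path = "data/apple_2023.pdf"
print(pdf_path, "looks like a PDF:", looks_like_pdf(pdf_path))
```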



## Parse the Document

Here we use `LlamaParse` to parse the downloaded Apple 10-K SEC filing.

In [7]:
from llama_parse import LlamaParse

parser = LlamaParse(
    result_type="markdown",  # "markdown" and "text" are available
    num_workers=4,  # if multiple files passed, split in `num_workers` API calls
    verbose=True,
    language="en",  # Optionally you can define a language, default=en
)

# sync
documents = parser.load_data("data/apple_2023.pdf")

Started parsing the file under job_id d6c339e3-9014-4139-afce-48c1ffbaa098
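
Once parsing finishes, a quick look at the result helps confirm that text was actually extracted. The sketch below assumes only that each parsed document exposes a `.text` attribute, which the upload step later in this notebook also relies on; the helper name `summarize_documents` is hypothetical.

```python
def summarize_documents(docs, preview_chars=200):
    """Return (count, total_characters, preview_of_first_document)."""
    texts = [doc.text or "" for doc in docs]
    preview = texts[0][:preview_chars] if texts else ""
    return len(texts), sum(len(t) for t in texts), preview


# In the notebook, `documents` is the list returned by parser.load_data(...):
# count, total_chars, preview = summarize_documents(documents)
# print(f"{count} documents, {total_chars} characters")
# print(preview)
```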


## Create Pipeline/Index on LlamaCloud

### LlamaCloud Client

We will connect to the `LlamaCloud` client.

In [None]:
from llama_cloud.client import LlamaCloud

client = LlamaCloud(token=os.environ["LLAMA_CLOUD_API_KEY"])

### Create LlamaCloud Pipeline/Index

We need `embedding_config` and `transform_config` to create a pipeline.

`embedding_config` - Sets up the embedding model details required to configure and create the pipeline.

`transform_config` - Configures the `chunk_size` and `chunk_overlap` parameters required for the RAG application.
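
To build intuition for how `chunk_size` and `chunk_overlap` interact, here is a toy sliding-window splitter over whitespace-separated words. It is an illustration only: the function `chunk_tokens` is hypothetical, and LlamaCloud's actual splitter may differ (for example, it may count model tokens rather than words).

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


words = "a b c d e f g h i j".split()
for chunk in chunk_tokens(words, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each window starts `chunk_size - chunk_overlap` items after the previous one, so a larger overlap produces more chunks that repeat more of their neighbors' content.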

We will use the `amazon.titan-embed-text-v1` embedding model available on AWS Bedrock. To access it, you will need the following credentials: `region_name`, `aws_access_key_id`, and `aws_secret_access_key`.

In [8]:
# Transformation auto config
transform_config = {
    'mode': 'auto',
    'config': {
        'chunk_size': 1024, # editable
        'chunk_overlap': 20 # editable
    }
}

embedding_config = {
    'type': 'BEDROCK_EMBEDDING',
    'component': {
        'region_name': '<REGION NAME>',
        'aws_access_key_id': '<AWS ACCESS KEY ID>',
        'aws_secret_access_key': '<AWS SECRET ACCESS KEY>',
        'model': 'amazon.titan-embed-text-v1',
    }
}
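
If you would rather not type AWS credentials into the notebook, the same dictionary can be assembled from environment variables. The sketch below assumes the credentials are exported under the standard AWS variable names (`AWS_DEFAULT_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`); the helper name `bedrock_embedding_config` is hypothetical.

```python
import os


def bedrock_embedding_config(model="amazon.titan-embed-text-v1", env=os.environ):
    """Build the embedding_config dict from AWS environment variables."""
    # Maps each config field to the environment variable assumed to hold it
    required = {
        "region_name": "AWS_DEFAULT_REGION",
        "aws_access_key_id": "AWS_ACCESS_KEY_ID",
        "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
    }
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    component = {field: env[var] for field, var in required.items()}
    component["model"] = model
    return {"type": "BEDROCK_EMBEDDING", "component": component}


# In the notebook:
# embedding_config = bedrock_embedding_config()
```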

In [9]:
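# Request payload for the pipeline/index
pipeline_request = {
    'name': 'apple_2023',  # pipeline/index name
    'transform_config': transform_config,
    'embedding_config': embedding_config,
    'data_sink_id': None,
}

# Keep the request and the returned pipeline object under separate names
pipeline = client.pipelines.upsert_pipeline(request=pipeline_request)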


## Upload the Documents

Here we take the parsed document and upload it to LlamaCloud.

In [10]:
from llama_cloud.types import CloudDocumentCreate

text = "\n\n".join([doc.text for doc in documents])

documents = [
    CloudDocumentCreate(
        text=text,
        metadata={"filename": "apple_2023.pdf", "file_path": "data/apple_2023.pdf"},
    )
]

documents = client.pipelines.create_batch_pipeline_documents(pipeline.id, request=documents)


### Check the Upload Status

In [13]:
status = client.pipelines.get_pipeline_status(pipeline.id)
print(status.status)

ManagedIngestionStatus.SUCCESS
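
Ingestion may still be running the first time you check, so re-running the status cell by hand can be replaced with a small polling loop. The sketch below is generic: `wait_until` is a hypothetical helper that takes any callable returning a status, and the commented usage assumes a failed ingestion is reported as `ManagedIngestionStatus.ERROR`, which this notebook does not confirm.

```python
import time


def wait_until(fetch_status, is_done, is_failed=lambda s: False,
               timeout_s=300.0, interval_s=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() repeatedly until is_done(status) or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        if is_failed(status):
            raise RuntimeError(f"ingestion failed with status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        sleep(interval_s)


# Possible use in this notebook (the ERROR value is an assumption):
# wait_until(
#     lambda: client.pipelines.get_pipeline_status(pipeline.id).status,
#     is_done=lambda s: str(s) == "ManagedIngestionStatus.SUCCESS",
#     is_failed=lambda s: str(s) == "ManagedIngestionStatus.ERROR",
# )
```

Passing `sleep` and `clock` as parameters keeps the loop easy to exercise without real delays.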


## Connect to the Index

We will connect to the LlamaCloud pipeline or index that has been created. You can get the `project_name` and `organization_id` from your LlamaCloud index.

In [14]:
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

index = LlamaCloudIndex(
  name="apple_2023", 
  project_name="<PROJECT NAME>",
  organization_id="<ORGANIZATION ID>",
)

## Define the LLM

Here, we will initialize an LLM supported on AWS Bedrock. You can refer to the [AWS Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/models-supported.html) to explore the available LLMs.

To access it, you will need the following credentials: `region_name`, `aws_access_key_id`, `aws_secret_access_key` and `aws_session_token`.

In [19]:
from llama_index.llms.bedrock_converse import BedrockConverse

llm = BedrockConverse(model="<MODEL ID>", 
                      region_name="<REGION NAME>", 
                      aws_access_key_id="<AWS ACCESS KEY ID>", 
                      aws_secret_access_key="<AWS SECRET ACCESS KEY>", 
                      aws_session_token="<AWS SESSION TOKEN>")

## Create QueryEngine

In [26]:
query_engine = index.as_query_engine(llm=llm)

## Querying

We will test the created `query_engine` with a query.

In [27]:
query = "what is the revenue of Apple in 2023?"
response = query_engine.query(query)

print(response)

The revenue of Apple in 2023 was $383.3 billion.
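
To see which chunks the answer was grounded in, you can inspect the retrieved nodes attached to the response. This sketch assumes the response object exposes a `source_nodes` list whose items have a `score` attribute and a `get_content()` method, as LlamaIndex response objects typically do; the helper name `format_sources` is hypothetical.

```python
def format_sources(response, max_chars=120):
    """Return one line per retrieved chunk: its similarity score and a text preview."""
    lines = []
    nodes = getattr(response, "source_nodes", None) or []
    for i, item in enumerate(nodes, start=1):
        score = getattr(item, "score", None)
        text = " ".join(item.get_content().split())  # collapse whitespace and newlines
        score_str = "n/a" if score is None else f"{score:.3f}"
        lines.append(f"[{i}] score={score_str} {text[:max_chars]}")
    return lines


# In the notebook:
# print("\n".join(format_sources(response)))
```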
