# Effective AI Applications: Combine Both RAG & Long Context Window LLMs

## RAG Application Using LlamaIndex & Long Context Model Jamba-Instruct

### Install LlamaIndex & AI21

In [6]:
!pip install llama-index
!pip install -U ai21
!pip install llama-index-llms-ai21


### Import the libraries & dependencies 

In [9]:
import os
from llama_index.core.llama_dataset import download_llama_dataset
from llama_index.core.llama_pack import download_llama_pack
from llama_index.core import VectorStoreIndex
from llama_index.core import SimpleDirectoryReader
from llama_index.llms.ai21 import AI21

In [11]:
os.environ['OPENAI_API_KEY'] = 'Add Your OpenAI API Key' # For embeddings
os.environ['AI21_API_KEY'] = 'Add Your AI21 API Key' # For the generation
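
A missing or placeholder key usually only shows up later as an authentication error, so it can help to check both variables right after setting them. The following is a minimal sketch, not part of LlamaIndex or the AI21 SDK: `missing_keys` is a hypothetical helper, and treating any value that starts with "Add Your" as a placeholder is an assumption based on the cell above.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "AI21_API_KEY")

def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required keys that are unset, empty, or placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        # Assumption: the notebook's placeholder values start with "Add Your".
        if not value or value.startswith("Add Your"):
            missing.append(name)
    return missing

problems = missing_keys()
if problems:
    print("Set these before continuing:", ", ".join(problems))
```

If nothing is printed, both keys are set to non-placeholder values; whether they are actually valid is only known once the first API call succeeds.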

### Set up Jamba-Instruct as the LLM

In [12]:
llm = AI21(
    model='jamba-instruct',
    temperature=0,
    max_tokens=2000
)

### Get the data - download Amazon (AMZN) 10-K filings from the last five years

In [13]:
os.makedirs("data", exist_ok=True)  # Safe to re-run if the folder already exists
!wget 'https://d18rn0p25nwr6d.cloudfront.net/CIK-0001018724/c7c14359-36fa-40c3-b3ca-5bf7f3fa0b96.pdf' -O 'data/amazon_2023.pdf'
!wget 'https://d18rn0p25nwr6d.cloudfront.net/CIK-0001018724/d2fde7ee-05f7-419d-9ce8-186de4c96e25.pdf' -O 'data/amazon_2022.pdf'
!wget 'https://d18rn0p25nwr6d.cloudfront.net/CIK-0001018724/f965e5c3-fded-45d3-bbdb-f750f156dcc9.pdf' -O 'data/amazon_2021.pdf'
!wget 'https://d18rn0p25nwr6d.cloudfront.net/CIK-0001018724/336d8745-ea82-40a5-9acc-1a89df23d0f3.pdf' -O 'data/amazon_2020.pdf'
!wget 'https://d18rn0p25nwr6d.cloudfront.net/CIK-0001018724/4d39f579-19d8-4119-b087-ee618abf82d6.pdf' -O 'data/amazon_2019.pdf'
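
`wget -O` creates the output file even when the server returns an error page, so a quick check that each download really is a PDF can save a confusing parsing failure later. This is a minimal sketch using only the standard library; `looks_like_pdf` and `check_downloads` are hypothetical helper names, and the check relies on the fact that PDF files begin with the bytes `%PDF-`.

```python
import os

def looks_like_pdf(path):
    """Return True if the file exists and begins with the PDF magic bytes."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"

def check_downloads(directory="data"):
    """Map each file name in `directory` to whether it looks like a PDF."""
    return {
        name: looks_like_pdf(os.path.join(directory, name))
        for name in sorted(os.listdir(directory))
    }
```

Calling `check_downloads("data")` after the downloads should map each of the five file names to `True`; any `False` entry is worth downloading again before indexing.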


### Set up the index

In [14]:
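# Collect every file in the data folder
file_list = [os.path.join("data", f) for f in os.listdir("data")]

# Parse the PDFs into documents, then chunk, embed, and index them in memory
amzn_10k_docs = SimpleDirectoryReader(input_files=file_list).load_data()
index = VectorStoreIndex.from_documents(documents=amzn_10k_docs)

# Build a query engine that generates answers with Jamba-Instruct
default_query_engine = index.as_query_engine(llm=llm)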
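
`os.listdir("data")` returns everything in the folder, including sub-directories such as `.ipynb_checkpoints` and any stray non-PDF files, and passing those to the reader can fail or pollute the index. A minimal sketch of a stricter file list follows; `list_pdfs` is a hypothetical helper, not a LlamaIndex function.

```python
import os

def list_pdfs(directory="data"):
    """Return sorted paths of the regular *.pdf files directly inside `directory`."""
    paths = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip sub-directories (e.g. .ipynb_checkpoints) and non-PDF files.
        if os.path.isfile(path) and name.lower().endswith(".pdf"):
            paths.append(path)
    return paths
```

In the notebook, `file_list = list_pdfs("data")` would be a drop-in replacement for the list comprehension. Sorting also makes the document order reproducible across runs, since `os.listdir` returns entries in arbitrary order.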

### Let’s enter a query to make sure our RAG system is working.

In [17]:
answer = default_query_engine.query("What was the company's revenue in 2021?")
print(answer.response)

The company's revenue in 2021 was $469,822 million.


### We can see there’s a problem: our RAG system doesn’t answer the next query correctly

In [18]:
answer = default_query_engine.query("What was the company's revenue in 2023?")
print(answer.response)

The company's revenue in 2023 was not explicitly mentioned in the provided context. However, it is mentioned that the company's operating income increased to $36.9 billion in 2023, compared to $12.2 billion in 2022.


That’s because the default number of retrieved chunks is small (LlamaIndex retrieves only the top 2 chunks by default). With five long filings in the index, so few chunks can easily miss the passage that holds the answer, even though the information is in the documents.
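
To see why a small retrieval budget misses answers, consider a toy version of similarity search. The sketch below ranks a handful of chunks by cosine similarity to a query and keeps the top `k`. The chunk labels and the 2-D "embeddings" are made up for illustration (real embeddings have hundreds or thousands of dimensions), and this is not LlamaIndex's retriever, only the same ranking idea.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, chunks, k):
    """Return the labels of the k chunks most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [label for label, _ in ranked[:k]]

# Made-up vectors: several filings contain similar-looking revenue passages.
CHUNKS = [
    ("2022 10-K: net sales discussion", (0.9, 0.1)),
    ("2021 10-K: net sales table", (0.8, 0.2)),
    ("2023 10-K: net sales table", (0.7, 0.3)),
    ("2022 10-K: stock split note", (0.1, 0.9)),
]
QUERY = (1.0, 0.0)  # stands in for the embedded 2023 revenue question

small = top_k(QUERY, CHUNKS, k=2)
large = top_k(QUERY, CHUNKS, k=3)
print(small)
print(large)
```

With `k=2` the chunk that actually holds the 2023 figure is outranked by similar passages from other years and never reaches the LLM; with `k=3` it is included. Raising `k` trades a longer prompt for a better chance that the right passage is present.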

### Let’s build a new query engine on top of our existing index and try the query that failed before.
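We can increase the number of retrieved chunks from the small default to 100, which makes it far more likely that the relevant passages reach the model. This is practical because a long context window model such as Jamba-Instruct can take in that many chunks.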
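
A rough token budget shows why this setting depends on a long context window. The sketch below is a back-of-the-envelope estimate, not a LlamaIndex API: `prompt_budget` is a hypothetical helper, 1,024 tokens per chunk assumes LlamaIndex's default chunk size as a worst case, 256,000 tokens assumes the context window AI21 advertises for Jamba-Instruct, 500 tokens of prompt overhead is a guess, and 2,000 reserved output tokens matches `max_tokens` above. Token counts also vary between tokenizers, so treat the numbers as approximate.

```python
def prompt_budget(top_k, chunk_tokens=1024, context_window=256_000,
                  reserved_for_output=2_000, prompt_overhead=500):
    """Estimate the tokens needed for `top_k` chunks and whether they fit."""
    needed = top_k * chunk_tokens + prompt_overhead + reserved_for_output
    return {
        "needed": needed,
        "fits": needed <= context_window,
        "headroom": context_window - needed,
    }

# 100 chunks against a long context window (assumed 256K tokens)
print(prompt_budget(100))

# The same request against a hypothetical 8K-token model
print(prompt_budget(100, context_window=8192))
```

Under these assumptions, 100 chunks need roughly 105K tokens, which leaves ample headroom in a 256K window but is far beyond what an 8K-token model could accept in a single prompt.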

In [19]:
# Retrieve a large number of chunks (100 instead of the small default)
extended_query_engine = index.as_query_engine(llm=llm,
                                              similarity_top_k=100)

answer = extended_query_engine.query("What was the company's revenue in 2023?")
print(answer.response)

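The company's revenue in 2023 was $574.785 million.

The extended engine now finds the figure, although the model wrote it with a decimal point: Amazon's 2023 10-K reports net sales of $574,785 million (about $574.8 billion).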


In [21]:
answer = default_query_engine.query("Was there a stock split in the last five years?")
print(answer.response)

No, there was no stock split in the last five years.


In [22]:
answer = extended_query_engine.query("Was there a stock split in the last five years?")
print(answer.response)

Yes, there was a stock split in the last five years. On May 27, 2022, Amazon.com, Inc. effected a 20-for-1 stock split of its common stock.
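
Running the same questions through both engines side by side makes differences like this easier to spot. Below is a minimal sketch of such a comparison: `compare_engines` is a hypothetical helper, and the two stub functions return canned strings (shortened from the outputs above) so the example runs without API keys.

```python
def compare_engines(engines, questions):
    """Ask every question of every engine.

    `engines` maps a label to a callable that takes a question string and
    returns an answer string. Returns {question: {label: answer}}.
    """
    results = {}
    for question in questions:
        results[question] = {label: ask(question) for label, ask in engines.items()}
    return results

# Stand-in engines with canned answers; replace with real query engines.
def stub_default(question):
    return "No, there was no stock split in the last five years."

def stub_extended(question):
    return "Yes, there was a stock split in the last five years."

report = compare_engines(
    {"top_k=default": stub_default, "top_k=100": stub_extended},
    ["Was there a stock split in the last five years?"],
)
for question, answers in report.items():
    print(question)
    for label, answer in answers.items():
        print(f"  {label}: {answer}")
```

In the notebook, the stubs could be swapped for wrappers around the real engines, for example `lambda q: default_query_engine.query(q).response` and `lambda q: extended_query_engine.query(q).response`.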


This way, by combining RAG with long context window LLMs, we can build highly effective AI applications.