# Making Queries to the RAG Model
In this Python notebook, we will use our RAG model together with an LLM to ask questions about our uploaded documents. If all goes to plan, our RAG model (powered by Atlas Vector Search) should retrieve the portions of the documents that are relevant to our query and feed that information to the LLM, enabling it to answer our query correctly.
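
To make the retrieve-then-answer flow concrete, here is a minimal sketch in plain Python. It is not how LlamaIndex or Atlas Vector Search are implemented: `toy_embed` is a hypothetical stand-in that uses word counts instead of a real embedding model, and the two chunks are made-up sentences. It only illustrates the idea of ranking chunks by similarity to the query and placing the best match into the prompt for the LLM.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=1):
    """Return the top_k chunks most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, context_chunks):
    """Assemble the text an LLM would receive: retrieved context plus the question."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Made-up chunks for illustration only
chunks = [
    "the company reported revenue growth in its mobility segment",
    "the office cafeteria serves lunch at noon",
]
query = "what revenue did the company report"
top = retrieve(query, chunks)
prompt = build_prompt(query, top)
print(prompt)
```

In the real pipeline, the embedding model replaces `toy_embed`, Atlas Vector Search replaces `retrieve`, and the LLM receives a prompt built from the retrieved chunks.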

## Basic Setup
As with the earlier Python notebook, we'll start with some basic setup steps in the next two code cells. Don't worry if your device does not have a GPU; you will still be able to proceed with the Quest.

In [1]:
# Check if GPU is enabled
import os
import torch

# To disable GPU and experiment, uncomment the following line
# Normally, you would want to use GPU, if one is available
# os.environ["CUDA_VISIBLE_DEVICES"]=""

print ("using CUDA/GPU: ", torch.cuda.is_available())

for i in range(torch.cuda.device_count()):
   print("device ", i , torch.cuda.get_device_properties(i).name)

using CUDA/GPU:  False


In [2]:
# Setup logging. To see more logging, set the level to DEBUG

import sys
import logging

# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.basicConfig(stream=sys.stdout, level=logging.WARNING)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

## Step 1: Load Settings

In [3]:
# Load settings from .env file
from dotenv import find_dotenv, dotenv_values

# Change system path to root directory
sys.path.insert(0, '../')

# _ = load_dotenv(find_dotenv()) # read local .env file
config = dotenv_values(find_dotenv())

# For debugging purposes
# print (config)

ATLAS_URI = config.get('ATLAS_URI')

if not ATLAS_URI:
    raise Exception ("'ATLAS_URI' is not set.  Please set it above to continue...")
else:
    print("ATLAS_URI Connection string found:", ATLAS_URI)

## Only uncomment this if you are using OpenAI for embeddings
# OPENAI_API_KEY = config.get("OPENAI_API_KEY")
# if not OPENAI_API_KEY:
#     raise Exception ("'OPENAI_API_KEY' is not set. Please set it above to continue...")
# else:
#     print("OPENAI_API_KEY found")

ATLAS_URI Connection string found: mongodb+srv://yongtaufoo:mucjOuDXLysFfEGA@cluster0.ds8hjdi.mongodb.net/?retryWrites=true&w=majority
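
Note that the cell above prints the full connection string, password included, which then gets saved in the notebook output. If you plan to share your notebook, one option is to mask the credentials before printing. The sketch below uses a hypothetical helper named `mask_uri` (it is not part of `pymongo` or `dotenv`) and a made-up example URI.

```python
from urllib.parse import urlsplit, urlunsplit


def mask_uri(uri):
    """Return the URI with any username/password replaced by placeholders."""
    parts = urlsplit(uri)
    if "@" not in parts.netloc:
        return uri  # no credentials present, nothing to mask
    host = parts.netloc.rsplit("@", 1)[1]
    return urlunsplit(parts._replace(netloc=f"<user>:<password>@{host}"))


# Made-up example URI, not a real cluster
example = "mongodb+srv://alice:s3cret@cluster0.example.mongodb.net/?retryWrites=true&w=majority"
print("ATLAS_URI Connection string found:", mask_uri(example))
```

In the notebook, you would call `mask_uri(ATLAS_URI)` inside the `print` statement while still passing the unmasked `ATLAS_URI` to `MongoClient`.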


In [4]:
# Define our variables
DB_NAME = 'rag1'
COLLECTION_NAME = '10k'
INDEX_NAME = 'idx_embedding'

In [5]:
# LlamaIndex will download embeddings models as needed
# Set llamaindex cache dir to ../cache dir here (Default is system tmp)
# This way, we can easily see downloaded artifacts
os.environ['LLAMA_INDEX_CACHE_DIR'] = os.path.join(os.path.abspath('../'), 'cache')

In [6]:
from pymongo import MongoClient

mongodb_client = MongoClient(ATLAS_URI)

print ("Atlas client initialized")

Atlas client initialized


## Step 2: Setup Embedding Model

Now, we'll need to set up an embedding model to help us generate embeddings for the user query. 

Same as in the previous Python notebook in this Quest, we'll have the option to either use OpenAI models or open source HuggingFace models. We'll be going with the second approach here.

### 2.1: Option A: OpenAI Embeddings

This option utilizes an OpenAI embedding model. As such, you will need to have an OpenAI API key (as defined in env variable `OPENAI_API_KEY`).

In [None]:
## Only uncomment this if you are using OpenAI for Embeddings
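# from llama_index import OpenAIEmbedding
# Uncomment the line above and comment away the line below if you face an import error
# from llama_index.embeddings.openai import OpenAIEmbedding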
# embed_model = OpenAIEmbedding()

### 2.2: Option B: Using Custom Embeddings

This option utilizes a HuggingFace embedding model. Note that this embedding model must be the same as the embedding model you used in the previous Python notebook when you were generating the embeddings for the documents. Unless you changed it, it should be `BAAI/bge-small-en-v1.5` in both Python notebooks.

In [7]:
# from llama_index.embeddings import HuggingFaceEmbedding
# Uncomment the line above and comment away the line below if you face an import error
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
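
Since the query embedding must come from the same model as the stored document embeddings, a cheap sanity check is to compare vector lengths: `BAAI/bge-small-en-v1.5` produces 384-dimensional vectors. The helper `check_embedding_dim` below is a hypothetical name used for illustration, and the vector is a stand-in rather than a real embedding.

```python
EXPECTED_DIM = 384  # output size of BAAI/bge-small-en-v1.5


def check_embedding_dim(vector, expected_dim=EXPECTED_DIM):
    """Raise ValueError if the query vector's length differs from the expected size."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"Query embedding has {len(vector)} dimensions, expected {expected_dim}. "
            "Was a different embedding model used to index the documents?"
        )
    return True


# Stand-in vector for illustration; in the notebook this would come from
# embed_model.get_query_embedding("What was Uber's revenue?")
fake_query_vector = [0.0] * 384
print(check_embedding_dim(fake_query_vector))
```

A matching length does not prove that the same model was used, since different models can share a dimension, but a mismatch is a clear sign that the two notebooks are out of sync.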



## Step 3: Setup LLM
Next, we'll need to set up an LLM that takes the results from Atlas Vector Search and responds to the user query. You'll have two choices here - **Option A** is to use an LLM from OpenAI (this option requires an OpenAI API key with credits) and **Option B** is to use a Llama LLM via the Llama API (this option is free to use).

For users that have an existing OpenAI API key with credits, you're encouraged to use Option A. For those who do not, please go for Option B.

### 3.1: Option A: OpenAI LLM
This option utilizes an LLM from OpenAI. If you have an existing OpenAI key, feel free to use the code cell below.

In [10]:
## Only uncomment this if you are using OpenAI for your LLM
# import openai
# from llama_index.llms.openai import OpenAI

# openai.api_key = config.get("OPENAI_API_KEY")

# llm = OpenAI(model="gpt-3.5-turbo")

# completion_response = llm.complete("To infinity, and")
# print(completion_response)

 beyond!

The Toy Story franchise has been a beloved part of pop culture for over two decades, and it's not slowing down anytime soon. The latest installment, Toy Story 4, is set to hit theaters this summer, and it's already generating buzz.

The movie follows the adventures of Woody, Buzz, and the gang as they embark on a new adventure with a new toy, Forky. The trailer for the movie has been released, and it's already getting fans excited for the film.

One of the most exciting things about Toy Story 4 is the return of some beloved characters. Bo Peep, who was last seen in Toy Story 2, is back and looking better than ever. She's now a modern, independent woman, and her new look has been getting a lot of attention.

Another exciting addition to the movie is the introduction of new characters, including Forky, who is voiced by Tony Hale. Forky is a spork with a popsicle stick for a handle, and he's not exactly thrilled about being a toy.

The trailer for Toy Story 4 has been viewed ove

### 3.2: Option B: Using Llama LLM API
This option utilizes Llama API for your LLM. This is a free service provided by Llama, hence no payment is needed. To start, **head to [https://www.llama-api.com/](https://www.llama-api.com/)** to create an account and **obtain an API key** (refer to image below).

![image.png](https://github.com/jameslimjy/2023-10-Puppy-Raffle/assets/56946413/f3f9549b-d401-4975-bdff-bdea3b8e2cc4)

Then, copy your API key and use it to **replace the placeholder value** in the code cell below.

In [None]:
# Run this cell to install llama-index-llms-llama-api
!pip install llama-index-llms-llama-api

In [None]:
from llama_index.llms.llama_api import LlamaAPI

# replace placeholder value below, e.g. LL-TPLM7PkKGXnvZPHEofx761PJwFItBp1234567894X2dIhhOP57F4HZwVx
api_key = "REPLACE-WITH-YOUR-LLAMA-API-TOKEN"
llm = LlamaAPI(api_key=api_key)

resp = llm.complete("Paul Graham is ")
print(resp)

Awesome! Now that we've initialized both our embedding model and our LLM, let's combine them into a single `service_context` object that we can use later on.

In [11]:
# from llama_index import ServiceContext
# Uncomment the line above and comment away the line below if you face an import error
from llama_index.core import ServiceContext

service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=llm)

## Step 4: Connect Llama-Index and MongoDB Atlas

This is where everything comes together: we combine MongoDB Atlas as our vector store with the `service_context` we just defined. This setup allows us to ask the LLM questions about our uploaded documents; Atlas Vector Search locates the portions of the documents that most closely match our query and passes them to the LLM, which gives us a more accurate response.

In [12]:
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

# from llama_index.storage.storage_context import StorageContext
# Uncomment the line above and comment away the line below if you face an import error
from llama_index.core import StorageContext

# from llama_index.indices.vector_store.base import VectorStoreIndex
# Uncomment the line above and comment away the line below if you face an import error
from llama_index.core import VectorStoreIndex

vector_store = MongoDBAtlasVectorSearch(mongodb_client = mongodb_client,
                                 db_name = DB_NAME, collection_name = COLLECTION_NAME,
                                 index_name = INDEX_NAME,  # 'idx_embedding', defined in Step 1
                                 ## the following fields are set to default values
                                 # embedding_key = 'embedding', text_key = 'text', metadata_key = 'metadata',
                                 )

storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)
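
For intuition about what happens on the Atlas side, the sketch below builds the kind of `$vectorSearch` aggregation pipeline that a vector query corresponds to. It is not a copy of what LlamaIndex sends internally: the helper name `build_vector_search_pipeline`, the `limit` and `num_candidates` values, and the three-element query vector are assumptions for illustration. The field names `embedding` and `text` follow the defaults mentioned in the cell above.

```python
def build_vector_search_pipeline(query_vector, index_name="idx_embedding",
                                 path="embedding", limit=2, num_candidates=100):
    """Build an aggregation pipeline for an Atlas Vector Search query."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,              # name of the Atlas vector index
                "path": path,                     # document field holding the vectors
                "queryVector": query_vector,      # embedding of the user query
                "numCandidates": num_candidates,  # neighbours considered before ranking
                "limit": limit,                   # number of documents returned
            }
        },
        {
            "$project": {
                "_id": 0,
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Placeholder vector; a real query needs a vector of the index's dimension
pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3])
print(pipeline[0]["$vectorSearch"]["index"])
```

With a real query embedding, you could run such a pipeline directly via `mongodb_client[DB_NAME][COLLECTION_NAME].aggregate(pipeline)` to see the raw matches that are handed to the LLM.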

## Step 5: Query Data / Ask Questions

Now, time for the fun part - asking it some questions! Let's start with asking our model 2 questions where the answers can be found in our documents.

In [16]:
from IPython.display import Markdown
from pprint import pprint

response = index.as_query_engine().query("What was Uber's revenue?")
display(Markdown(f"<b>{response}</b>"))
pprint(response, indent=4)

<b>

Uber's revenue for the years ended December 31, 2019, 2020, and 2021, respectively, was $13 billion, $11 billion, and $17.5 billion. This revenue is disaggregated by offering and geographical region, with Mobility revenue being the largest revenue stream, followed by Delivery and Freight revenue. Subscription fees are recognized ratably over the life of the pass, and revenue from New Mobility offerings and products is accounted for as an operating lease as defined under ASC 842.</b>

Response(response='\n'
                  '\n'
                  "Uber's revenue for the years ended December 31, 2019, 2020, "
                  'and 2021, respectively, was $13 billion, $11 billion, and '
                  '$17.5 billion. This revenue is disaggregated by offering '
                  'and geographical region, with Mobility revenue being the '
                  'largest revenue stream, followed by Delivery and Freight '
                  'revenue. Subscription fees are recognized ratably over the '
                  'life of the pass, and revenue from New Mobility offerings '
                  'and products is accounted for as an operating lease as '
                  'defined under ASC 842.',
         source_nodes=[   NodeWithScore(node=TextNode(id_='211fae60-23c4-4fcf-85c0-63223c90ab8a', embedding=None, metadata={'page_label': '54', 'file_name': 'uber_2021.pdf', 'file_path': '../data/10k/uber_2021.pdf', 'file_type': 'application/pdf', 'file_size': 1880483, 'creation_d

In [17]:
response = index.as_query_engine().query("How much money did Lyft make in 2020?")
display(Markdown(f"<b>{response}</b>"))
pprint(response, indent=4)

<b>

Lyft made $2,364,681,000 in revenue in 2020, according to the provided financial statement.</b>

Response(response='\n'
                  '\n'
                  'Lyft made $2,364,681,000 in revenue in 2020, according to '
                  'the provided financial statement.',
         source_nodes=[   NodeWithScore(node=TextNode(id_='b36b2e4e-63f4-4038-bb77-10d9bba9354f', embedding=None, metadata={'page_label': '58', 'file_name': 'lyft_2021.pdf', 'file_path': '../data/10k/lyft_2021.pdf', 'file_type': 'application/pdf', 'file_size': 1440303, 'creation_date': '2024-01-27', 'last_modified_date': '2024-01-27', 'last_accessed_date': '2024-01-27'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='e94ec178-006c-4804-b6ad-2239d97689fe', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '58', 'file_n
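
The `source_nodes` output above is long and hard to read. A small helper can pull out just the file name, page label, and similarity score of each retrieved chunk. `summarize_sources` is a hypothetical helper, and the object passed to it here is a stand-in that mimics the shape of the `Response` printed above; its score is a made-up value.

```python
from types import SimpleNamespace


def summarize_sources(response):
    """Return (file_name, page_label, score) for each retrieved chunk."""
    rows = []
    for item in response.source_nodes:
        meta = item.node.metadata
        rows.append((meta.get("file_name"), meta.get("page_label"), item.score))
    return rows


# Stand-in object mimicking the Response shape; the score is made up
fake_response = SimpleNamespace(
    source_nodes=[
        SimpleNamespace(
            node=SimpleNamespace(metadata={"file_name": "lyft_2021.pdf", "page_label": "58"}),
            score=0.5,
        )
    ]
)
print(summarize_sources(fake_response))
```

In the notebook, calling `summarize_sources(response)` on a real query result should show which pages of which filing the answer was drawn from.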

As you can see from the 2 questions we asked above, our model was able to search for portions within the uploaded documents that most closely matched our queries and responded with the answers. Now, let's try asking it a question where the answer can't be found in the uploaded documents.

In [27]:
# The answer to this question doesn't exist in the Lyft_10k filing!
# Let's see what we get back
response = index.as_query_engine().query("How much money Lyft made in 2018?")
display(Markdown(f"<b>{response}</b>"))
pprint(response, indent=4)

<b>

The given context information is from Lyft's 2021 annual report, and it provides financial statements for the years 2020 and 2019. However, it does not provide information for 2018. Therefore, the answer to the query is not available from the given context information.</b>

Response(response='\n'
                  '\n'
                  "The given context information is from Lyft's 2021 annual "
                  'report, and it provides financial statements for the years '
                  '2020 and 2019. However, it does not provide information for '
                  '2018. Therefore, the answer to the query is not available '
                  'from the given context information.',
         source_nodes=[   NodeWithScore(node=TextNode(id_='dcb49394-d650-418b-920d-eb1c7140335c', embedding=None, metadata={'page_label': '79', 'file_name': 'lyft_2021.pdf', 'file_path': '../data/10k/lyft_2021.pdf', 'file_type': 'application/pdf', 'file_size': 1440303, 'creation_date': '2024-01-27', 'last_modified_date': '2024-01-27', 'last_accessed_date': '2024-01-27'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creati

In [29]:
# The answer to this question doesn't exist in the Uber_10k filing either
# Let's see what we get back
response = index.as_query_engine().query("How many employees did Uber have in 2015?")
display(Markdown(f"<b>{response}</b>"))
pprint(response, indent=4)

<b>

The given context information does not provide information about the number of employees Uber had in 2015. The information provided is about Uber's partnership with Arizona State University, their operating and reportable segments, and their financial partnerships offerings.</b>

Response(response='\n'
                  '\n'
                  'The given context information does not provide information '
                  'about the number of employees Uber had in 2015. The '
                  "information provided is about Uber's partnership with "
                  'Arizona State University, their operating and reportable '
                  'segments, and their financial partnerships offerings.',
         source_nodes=[   NodeWithScore(node=TextNode(id_='fc065d3f-3914-451d-be0a-9adf5cda4b2c', embedding=None, metadata={'page_label': '12', 'file_name': 'uber_2021.pdf', 'file_path': '../data/10k/uber_2021.pdf', 'file_type': 'application/pdf', 'file_size': 1880483, 'creation_date': '2024-01-27', 'last_modified_date': '2024-01-27', 'last_accessed_date': '2024-01-27'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', '

Good job following till the end! Please **head back to the Quest page on StackUp now** and refer to the instructions for how you can prepare your deliverable for this Quest.