# Making Queries to the RAG Model
In this Python notebook, we will use our RAG model together with an LLM to ask questions about our uploaded documents. If all goes to plan, our RAG model (powered by Atlas Vector Search) should retrieve the portions of the documents that are relevant to our query and feed that information to the LLM, enabling it to answer the query correctly.

## Basic Setup
As in the earlier Python notebook, we'll start with some basic setup steps in the next two code cells. Don't worry if your device does not have a GPU; you will still be able to proceed with the Quest.

In [None]:
# Check if GPU is enabled
import os
import torch

# To disable GPU and experiment, uncomment the following line
# Normally, you would want to use GPU, if one is available
# os.environ["CUDA_VISIBLE_DEVICES"]=""

print("using CUDA/GPU: ", torch.cuda.is_available())

for i in range(torch.cuda.device_count()):
    print("device ", i, torch.cuda.get_device_properties(i).name)

In [None]:
# Setup logging. To see more logging, set the level to DEBUG

import sys
import logging

# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.basicConfig(stream=sys.stdout, level=logging.WARNING)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

## Step 1: Load Settings

In [None]:
# Load settings from .env file
from dotenv import find_dotenv, dotenv_values

# Add the project root directory to the system path
sys.path.insert(0, '../')

# _ = load_dotenv(find_dotenv()) # read local .env file
config = dotenv_values(find_dotenv())

# For debugging purposes
# print (config)

ATLAS_URI = config.get('ATLAS_URI')

if not ATLAS_URI:
    raise Exception("'ATLAS_URI' is not set. Please set it in your .env file to continue...")
else:
    # Confirm the value was loaded without echoing the credentials it contains
    print("ATLAS_URI connection string found")

OPENAI_API_KEY = config.get("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    raise Exception("'OPENAI_API_KEY' is not set. Please set it in your .env file to continue...")
else:
    # Confirm the key was loaded without printing the key itself
    print("OPENAI_API_KEY found")
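
Printing a full connection string or API key in notebook output can leak credentials when the notebook is shared. If you want to confirm which value was loaded without exposing it, one option is to mask most of it first. The sketch below is only an illustration; `mask_secret` and its `visible` parameter are hypothetical names, not part of any library used in this Quest.

```python
def mask_secret(value, visible=4):
    """Return the value with everything after the first `visible` characters masked."""
    if not value:
        return "<not set>"
    if len(value) <= visible:
        # Too short to reveal anything safely, so mask it all
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)

# Stand-in value so the snippet runs on its own
masked = mask_secret("abcdefgh")
print(masked)
```

In this notebook you could call it as `print("ATLAS_URI:", mask_secret(ATLAS_URI))`.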

In [None]:
# Define our variables
DB_NAME = 'rag1'
COLLECTION_NAME = '10k'
INDEX_NAME = 'idx_embedding'

In [None]:
# LlamaIndex will download embeddings models as needed
# Set llamaindex cache dir to ../cache dir here (Default is system tmp)
# This way, we can easily see downloaded artifacts
os.environ['LLAMA_INDEX_CACHE_DIR'] = os.path.join(os.path.abspath('../'), 'cache')

In [None]:
from pymongo import MongoClient

mongodb_client = MongoClient(ATLAS_URI)

print ("Atlas client initialized")

## Step 2: Setup Embedding Model

Now, we'll need to set up an embedding model to help us generate embeddings for the user query. 

As in the previous Python notebook in this Quest, we have the option to use either OpenAI models or open source HuggingFace models. We'll be going with the second option (HuggingFace) here.

### 2.1: Option A: OpenAI Embeddings

This option utilizes an OpenAI embedding model. As such, you will need to have an OpenAI API key (as defined in env variable `OPENAI_API_KEY`).

In [None]:
## Only uncomment this if you are using OpenAI for Embeddings
# from llama_index import  OpenAIEmbedding
# embed_model = OpenAIEmbedding()

### 2.2: Option B: Using Custom Embeddings

This option utilizes a HuggingFace embedding model. Note that this embedding model must be the same as the embedding model you used in the previous Python notebook when you were generating the embeddings for the documents. Unless you changed it, it should be `BAAI/bge-small-en-v1.5` in both Python notebooks.

In [None]:
from llama_index.embeddings import HuggingFaceEmbedding
# Uncomment the line below and comment away the line above if you face an import error
# from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
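
One quick way to catch a mismatched embedding model is to compare the length of a freshly generated query embedding with the length of an embedding already stored in the collection. The sketch below is a hedged illustration; `check_embedding_dim` is a hypothetical helper, and the vectors in it are stand-ins so that it runs on its own.

```python
def check_embedding_dim(query_vector, stored_vector):
    """Raise if the two vectors differ in length; otherwise return the shared dimension."""
    if len(query_vector) != len(stored_vector):
        raise ValueError(
            f"Query embedding has {len(query_vector)} dimensions but the stored "
            f"embedding has {len(stored_vector)}; were they produced by the same model?"
        )
    return len(query_vector)

# Stand-in vectors so the snippet runs without a model or a database
dim = check_embedding_dim([0.1, 0.2, 0.3], [0.4, 0.5, 0.6])
print("embedding dimension:", dim)
```

In this notebook, the first argument could come from `embed_model.get_query_embedding("test")` and the second from the `embedding` field of a document returned by `mongodb_client[DB_NAME][COLLECTION_NAME].find_one()`, assuming the default `embedding` key was used when the documents were stored. Matching dimensions are necessary but not sufficient: two different models can share a dimension and still produce incompatible vectors.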

## Step 3: Setup LLM
Next, we'll need to set up an LLM that takes the results from Atlas Vector Search and responds to the user query. We'll be using OpenAI for this purpose, so make sure that the OpenAI key in your .env file is populated.

After doing that, run the code cell. We'll use a simple completion task to see if our LLM has successfully loaded before continuing.

In [None]:
import openai
from llama_index.llms.openai import OpenAI

openai.api_key = config.get("OPENAI_API_KEY")

llm = OpenAI(model="gpt-3.5-turbo")

completion_response = llm.complete("To infinity, and")
print(completion_response)

Awesome! Now that we've initialized both our embedding model as well as our LLM, let's combine them together into a unified interface `service_context` that we can use later on.

In [None]:
from llama_index import ServiceContext
# Uncomment the line below and comment away the line above if you face an import error
# from llama_index.core import ServiceContext

service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=llm)

## Step 4: Connect Llama-Index and MongoDB Atlas

This is where everything comes together: we combine MongoDB Atlas as our vector store with the `service_context` we just defined. This setup allows us to ask the LLM questions about our uploaded documents. Atlas Vector Search locates the portions of the documents that most closely match our query and passes them to the LLM as supplementary context, which helps it give a more accurate response.

In [None]:
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

from llama_index.storage.storage_context import StorageContext
# Uncomment the line below and comment away the line above if you face an import error
# from llama_index.core import StorageContext

from llama_index.indices.vector_store.base import VectorStoreIndex
# Uncomment the line below and comment away the line above if you face an import error
# from llama_index.core import VectorStoreIndex

vector_store = MongoDBAtlasVectorSearch(mongodb_client = mongodb_client,
                                 db_name = DB_NAME, collection_name = COLLECTION_NAME,
                                 index_name = INDEX_NAME,
                                 ## the following keys are left at their default values
                                 # embedding_key = 'embedding', text_key = 'text', metadata_key = 'metadata',
                                 )

storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)
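
Conceptually, the retrieve-then-answer flow described in this step can be sketched in plain Python. This is a toy illustration under stated assumptions, not how LlamaIndex or Atlas Vector Search is implemented: Atlas runs the search on the server, and LlamaIndex uses its own prompt templates. The function names, the two-dimensional vectors, and the prompt layout are all made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vector, chunks, top_k=2):
    """Return the texts of the top_k chunks most similar to the query vector.

    `chunks` is a list of (text, vector) pairs.
    """
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]

def build_prompt(question, context_chunks):
    """Place the retrieved chunks ahead of the question (hypothetical layout)."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Toy data: three chunks with made-up 2-dimensional embeddings
toy_chunks = [
    ("chunk about revenue", [1.0, 0.0]),
    ("chunk about employees", [0.0, 1.0]),
    ("chunk about costs", [0.7, 0.7]),
]

top = retrieve([0.9, 0.1], toy_chunks, top_k=2)
prompt = build_prompt("What was the revenue?", top)
print(prompt)
```

The LLM only ever sees the chunks that made it into the prompt, which is why the quality of the retrieval step largely determines the quality of the answer.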

## Step 5: Query Data / Ask Questions

Now, time for the fun part: asking it some questions! Let's start by asking our model two questions whose answers can be found in our documents.

In [None]:
from IPython.display import Markdown, display
from pprint import pprint

response = index.as_query_engine().query("What was Uber's revenue?")
display(Markdown(f"<b>{response}</b>"))
pprint(response, indent=4)

In [None]:
response = index.as_query_engine().query("How much money did Lyft make in 2020?")
display(Markdown(f"<b>{response}</b>"))
pprint(response, indent=4)
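
The `pprint` output above can be hard to scan. A LlamaIndex query response carries the retrieved chunks in its `source_nodes` list, where each entry has a `score` and a `node`. The sketch below pulls those out into a compact list; `summarize_sources` is a hypothetical helper, and the stand-in objects exist only so the snippet runs without a live index.

```python
from types import SimpleNamespace

def summarize_sources(response, max_chars=80):
    """Return (score, snippet) pairs for the chunks attached to a query response."""
    rows = []
    for item in getattr(response, "source_nodes", []):
        text = item.node.get_content()
        # Collapse whitespace so each snippet fits on one line
        snippet = " ".join(text.split())[:max_chars]
        rows.append((item.score, snippet))
    return rows

# Stand-in response object that mimics the attributes used above
fake_node = SimpleNamespace(get_content=lambda: "alpha  beta\ngamma")
fake_response = SimpleNamespace(
    source_nodes=[SimpleNamespace(node=fake_node, score=0.9)]
)

rows = summarize_sources(fake_response)
for score, snippet in rows:
    print(score, snippet)
```

In this notebook you would pass the real `response` object instead of `fake_response`.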

As you can see from the two questions above, our model searched the uploaded documents for the portions that most closely matched our queries and used them to respond with the answers. Now, let's try asking it a question whose answer can't be found in the uploaded documents.

In [None]:
# The answer to this question doesn't exist in the Lyft_10k filing!
# Let's see what we get back
response = index.as_query_engine().query("How much money did Lyft make in 2018?")
display(Markdown(f"<b>{response}</b>"))
pprint(response, indent=4)

In [None]:
# The answer to this question doesn't exist in the Uber_10k filing either
# Let's see what we get back
response = index.as_query_engine().query("How many employees did Uber have in 2015?")
display(Markdown(f"<b>{response}</b>"))
pprint(response, indent=4)
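
Vector search always returns the nearest chunks it can find, even when none of them actually contains the answer. One possible way to flag such cases is to look at the similarity scores of the retrieved chunks. The sketch below is an assumption-laden illustration: `has_strong_match` is a hypothetical helper, and the `0.7` cutoff is an arbitrary placeholder that would need tuning for your embedding model and data.

```python
from types import SimpleNamespace

def has_strong_match(response, min_score=0.7):
    """Return True if any retrieved chunk scores at or above min_score."""
    scores = [
        item.score
        for item in getattr(response, "source_nodes", [])
        if item.score is not None
    ]
    return bool(scores) and max(scores) >= min_score

# Stand-in responses that mimic the `source_nodes` attribute
strong = SimpleNamespace(source_nodes=[SimpleNamespace(score=0.82)])
weak = SimpleNamespace(source_nodes=[SimpleNamespace(score=0.41)])

print(has_strong_match(strong), has_strong_match(weak))
```

A low score does not prove the answer is missing, and a high score does not prove it is present, so treat this as a rough signal rather than a guarantee.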

Good job following till the end! Please **head back to the Quest page on StackUp now** and refer to the instructions for how you can prepare your deliverable for this Quest.