## Use a `QueryEngine` for retrieval-augmented generation

### Set up the environment first

In [1]:
!pip install -q -r requirements.txt

In [2]:
from my_config import MyConfig
my_config = MyConfig()

### Get the Vector Store and Create Index to Query

In [3]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

db = chromadb.PersistentClient(path="./alfred_chroma_db")
chroma_collection = db.get_or_create_collection(name="alfred")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)



In [4]:
from my_utils import RunPodEmbedding
from llama_index.core import VectorStoreIndex

# Instantiate custom class with your RunPod URL
runpod_url = f"https://{my_config.VLLM_EMBEDDING_MODEL_INFERENCE_NODE_IP}-8000.proxy.runpod.net/v1/embeddings"
embedding_model = RunPodEmbedding(endpoint_url=runpod_url)
# Alternative: a local Hugging Face model instead of the RunPod endpoint
# embedding_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, embed_model=embedding_model
)

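We don't need to persist the index to disk ourselves. The embeddings live in the Chroma collection, and `chromadb.PersistentClient` stores that collection under the directory path we passed (`./alfred_chroma_db`). The index is simply rebuilt on top of the existing `ChromaVectorStore` each time.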
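
As a quick sanity check that the store really is on disk, the sketch below summarizes a directory using only the standard library. The helper name `summarize_store_dir` is hypothetical, and it makes no assumption about which files Chroma writes; it only reports whether the directory exists and how much data it holds.

```python
from pathlib import Path


def summarize_store_dir(path: str) -> dict:
    """Report whether a directory exists and how many files/bytes it contains."""
    root = Path(path)
    if not root.is_dir():
        return {"exists": False, "files": 0, "bytes": 0}
    # Walk the directory recursively and keep regular files only.
    files = [p for p in root.rglob("*") if p.is_file()]
    return {
        "exists": True,
        "files": len(files),
        "bytes": sum(p.stat().st_size for p in files),
    }


# Path taken from the PersistentClient call earlier in this notebook.
print(summarize_store_dir("./alfred_chroma_db"))
```

If the directory is missing or empty, the collection was probably created under a different working directory than the one the notebook is running in.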

### Querying the index
Now that we have an index, we can query the documents through it. We create a `QueryEngine` from the index and pass a specific response mode, which controls how the retrieved chunks are combined into the final answer.

In [5]:
from my_utils import RunPodQwenLLM

# Model served by the vLLM instance on RunPod
model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"
vllm_api_base = f"https://{my_config.VLLM_LLM_INFERENCE_NODE_IP}-8000.proxy.runpod.net/v1/chat/completions"
llm = RunPodQwenLLM(api_url=vllm_api_base, overriden_model_name=model_id)

In [6]:
query_engine = index.as_query_engine(
    llm=llm,
    response_mode="tree_summarize",
)
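
To give a feel for what the `tree_summarize` response mode does, here is a simplified, library-free sketch of the idea: groups of chunks are summarized, and the resulting summaries are summarized again until a single answer remains. This is an illustration under stated assumptions, not LlamaIndex's implementation. The fixed `fan_in` grouping and the `summarize` callback (a stand-in for an LLM call) are hypothetical; the real library, as far as I understand it, packs chunks according to the prompt's token budget.

```python
from typing import Callable, List


def tree_summarize(
    chunks: List[str],
    summarize: Callable[[List[str]], str],
    fan_in: int = 2,
) -> str:
    """Reduce chunks to one answer by summarizing groups level by level."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")
    level = list(chunks)
    while True:
        # Summarize each group of up to `fan_in` items into one string.
        level = [
            summarize(level[i : i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
        if len(level) == 1:
            return level[0]


def toy_summarize(group: List[str]) -> str:
    """Stand-in for an LLM call: it just shows which items were merged."""
    return "(" + " + ".join(group) + ")"


print(tree_summarize(["a", "b", "c"], toy_summarize))
```

The nesting in the printed string shows the tree: `a` and `b` are merged first, and that summary is then merged with `c`.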

In [None]:
response = await query_engine.aquery(
    "Respond using a persona that describes author and travel experiences?"
)

Response(response="Persona 1:\nName: Emma\nOccupation: History and Culture Writer\nLocation: Global Explorer\nEmma is an avid traveler and writer passionate about sharing historical and cultural insights with a general audience. She frequently visits ancient sites, markets, and local communities to capture their essence and share it through her blog posts. With a keen eye for detail and a knack for storytelling, Emma aims to make complex historical events accessible and engaging for readers.\n\nPersona 2:\nName: Alex\nOccupation: Travel Blogger\nSpecialization: Cultural Exploration and Language\nCurrent Location: Eastern Europe\nAlex is a travel blogger with a particular focus on Eastern European culture and history. He has a strong background in linguistics and uses his skills to immerse himself in the local languages and customs of the regions he visits. Alex's goal is to bridge the gap between Western and Eastern cultures through his blog, making it easier for readers to understand 
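
The `await` in the cell above works because notebooks allow top-level `await`. In a plain Python script the coroutine has to be driven by an event loop instead. The sketch below shows that pattern with `asyncio.run`; `FakeQueryEngine` is a hypothetical stand-in so the example runs without a model endpoint, and in practice you would pass the real `query_engine` created earlier.

```python
import asyncio


class FakeQueryEngine:
    """Hypothetical stand-in exposing an async `aquery` method."""

    async def aquery(self, question: str) -> str:
        await asyncio.sleep(0)  # yield to the event loop, as a real call would
        return f"answer to: {question}"


async def main() -> str:
    engine = FakeQueryEngine()  # replace with the real query engine
    return await engine.aquery("example question")


if __name__ == "__main__":
    # asyncio.run creates an event loop, runs main(), and closes the loop.
    print(asyncio.run(main()))
```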

In [14]:
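import os

os.makedirs('output', exist_ok=True)  # open() fails if the directory is missing
with open('output/output.out', 'w', encoding='utf-8') as f:
    f.write(response.response)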