# 2 - Leverage LlamaIndex with Vertex AI Vector Search to perform question answering RAG

## Overview

This notebook shows how to build a RAG pipeline using LlamaIndex and Vertex AI Vector Search.

LlamaIndex is used to parse, chunk, and embed the input data using Gemini Text Embedding models. We store the parsed data in a Vertex AI Vector Search index, which is searched during inference to retrieve context that augments the prompt for the question answering task.
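
To make the retrieval step concrete, here is a minimal sketch of what "searching the index" means conceptually. It is a brute-force, exact search over toy vectors using cosine similarity. Those choices are assumptions for illustration only: a managed vector search service uses approximate nearest-neighbor search, and the distance measure is configured on the index. The function names and the 3-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return (chunk_id, score) pairs for the k most similar chunks."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real embedding vectors have hundreds of dimensions.
toy_chunks = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.7, 0.7, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}
toy_results = top_k([1.0, 0.1, 0.0], toy_chunks, k=2)
print(toy_results)
```

The text of the top-ranked chunks is what gets pasted into the prompt as context.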

### Objectives
This notebook provides a guide to building a question answering system using a retrieval-augmented generation (RAG) framework that leverages LlamaIndex for data ingestion and vector store creation.

You will complete the following tasks:

1. Set up the required Google Cloud resources: a GCS bucket, a Vertex AI Vector Search index, and a deployed index endpoint.
2. Ingest, parse, chunk, and embed data using LlamaIndex with Gemini Text Embedding models.
3. Search the vector store with incoming text queries to find similar text that can be used as context in the prompt.
4. Generate an answer to the user query using the Gemini Pro model.

### Imports

Install any dependencies that are needed.

In [None]:
!pip install llama-index \
  llama-index-embeddings-vertex \
  llama-index-llms-vertex \
  llama-index-vector-stores-vertexaivectorsearch \
  langchain-community \
  termcolor \
  llama-index-llms-langchain \
  llama-index-llms-fireworks \
  langchainhub -q

In [49]:
#Imports
import os
from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    StorageContext,
    Settings,
    PromptTemplate
)
from llama_index.embeddings.vertex import VertexTextEmbedding
from llama_index.llms.vertex import Vertex
from llama_index.core.prompts import LangchainPromptTemplate
from langchain import hub
from llama_index.vector_stores.vertexaivectorsearch import VertexAIVectorStore
from termcolor import colored


In [10]:
PROJECT_ID = "" #TODO - add your project-id here from the console
REGION = ""  #TODO - add your region here from the console
GCS_BUCKET = "llamaindex_gcs_bucket"  # @param {type:"string"}
VS_INDEX_NAME = "llamaindex_doc_index"  # @param {type:"string"}
VS_INDEX_ENDPOINT_NAME = "llamaindex_doc_endpoint"  # @param {type:"string"}
DOC_FOLDER = "./data"  # @param {type:"string"}

### Investigate sample data

Refer to document 04a02.pdf

This document describes the importance of stable power grids in Japan, highlighting the recent failure of a generator step-up transformer at the Nakoso Power Station and the rapid restoration response undertaken to maintain power supply stability.

We will use this PDF for the rest of the notebook.

## Ingest data using Llama-Index into VertexAI Vector Search

The following section leverages LlamaIndex to ingest, chunk, and embed the PDF data so that it can be stored in the Vertex AI vector store.

At the end of this section you will be ready to query against the Vector Store to find relevant context.
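
As a rough illustration of the chunking step, the sketch below splits text into fixed-size, overlapping character windows. This is a simplification and not how LlamaIndex does it: its splitters work on tokens and sentence boundaries. The function name `chunk_text` and the sizes are hypothetical.

```python
def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into character windows of chunk_size that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "Stable power grids depend on rapid restoration after equipment failures."
for i, chunk in enumerate(chunk_text(sample)):
    print(i, repr(chunk))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.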

In [38]:
def initialize_llm_and_storage(vs_index, vs_endpoint):
    """
    Initializes the Vertex AI vector store given a Vertex AI Vector Search index and deployed endpoint.
    Configures the embedding model (text-embedding-004) and the LLM (gemini-pro).
    """
    # setup storage
    vector_store = VertexAIVectorStore(
        project_id=PROJECT_ID,
        region=REGION,
        index_id=vs_index,
        endpoint_id=vs_endpoint,
        gcs_bucket_name=GCS_BUCKET,
    )

    # set storage context
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    gemini_embedding_model = VertexTextEmbedding("text-embedding-004")
    llm = Vertex("gemini-pro")

    Settings.embed_model = gemini_embedding_model
    Settings.llm = llm

    return storage_context

def ingest_document():
    """
    Loads the sample PDF with SimpleDirectoryReader, which is built in to LlamaIndex and
    creates documents out of the files it is given. It can read a variety of formats
    including Markdown, PDFs, Word documents, PowerPoint decks, images, audio and video.
    """
    # Load only the sample PDF rather than every file in DOC_FOLDER
    documents = SimpleDirectoryReader(
        input_files=[os.path.join(DOC_FOLDER, "04a02.pdf")]
    ).load_data()
    return documents

Initialize a LlamaIndex retriever on VertexAI using the resources created in the gcp.setup.ipynb notebook.

Copy the values from the previous notebook and paste them in the cell below.
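
Because an empty ID only fails later, when the vector store is first used, it can help to check the values up front. The sketch below is one way to do that. The helper `require_setting` and the environment variable name `VS_INDEX_ID` are hypothetical and not part of this notebook or of any library.

```python
import os


def require_setting(name, value):
    """Return the stripped value, or raise if it is empty."""
    cleaned = (value or "").strip()
    if not cleaned:
        raise ValueError(f"{name} is not set; copy it from the setup notebook.")
    return cleaned


# VS_INDEX_ID is a hypothetical environment variable used here as an optional source
example_index_id = os.environ.get("VS_INDEX_ID", "")
try:
    require_setting("vs_index_id", example_index_id)
except ValueError as err:
    print(err)
```

The same check can be applied to `PROJECT_ID`, `REGION`, and the endpoint ID before calling `initialize_llm_and_storage`.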

In [40]:
vs_index_id = "" #TODO - add your Vertex AI Vector Search index ID from setup here
vs_endpoint_id = ""  #TODO - add your Vertex AI Vector Search deployed endpoint ID from setup here

In [41]:
storage_context = initialize_llm_and_storage(vs_index_id, vs_endpoint_id)
docs = ingest_document()

In [None]:
# Here we can see the documents that were parsed using LlamaIndex
docs

## Perform Q/A RAG

This section runs the RAG prompt and returns an answer to the user query. To explore RAG frameworks we will look at 3 different options:

1. Using the built-in RAG prompt provided by LlamaIndex
2. Connecting a LangChain RAG template
3. Creating a custom few-shot example RAG template.

For each option, you will see the current text prompt structure and the generated output.
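
All three options share the same mechanics: a text template with a `{context_str}` slot for the retrieved chunks and a `{query_str}` slot for the user question. The sketch below shows that substitution with plain Python string formatting. It is a simplified stand-in rather than LlamaIndex's own implementation, and the template text and the `build_prompt` helper are assumptions for illustration.

```python
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def build_prompt(template, retrieved_chunks, query):
    """Join the retrieved chunks and substitute them, with the query, into the template."""
    context = "\n\n".join(retrieved_chunks)
    return template.format(context_str=context, query_str=query)


# Hypothetical retrieved chunks standing in for the text of the source nodes
example_chunks = ["First retrieved passage.", "Second retrieved passage."]
example_prompt = build_prompt(QA_TEMPLATE, example_chunks, "what is minimum reserve rate of power?")
print(example_prompt)
```

Swapping the prompt option changes only the template text; the retrieval step and the two slots stay the same.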

In [43]:
# Setting up helper functions

def display_prompt_dict(prompts_dict):
    """
    Used to display the underlying text prompt used for RAG.
    """
    for k, p in prompts_dict.items():
        text_md = f"**Prompt Key**: {k}<br>" f"**Text:** <br>"
        print(text_md)
        print(p.get_template())
        print("\n\n")

def display_and_run_prompt(query_engine, query_str, show_prompt = True):
    """
    Displays the current RAG prompt used and runs the query against the RAG workflow.
    """

    if show_prompt:
        print("----Displaying current prompt dictionary----\n")
        prompts_dict = query_engine.get_prompts()
        display_prompt_dict(prompts_dict)

    response = query_engine.query(query_str)
    print("Response:")
    print("-" * 80)
    print(response.response)
    print("-" * 80)
    print("Source Documents:")
    print("-" * 80)
    for source in response.source_nodes:
        print(f"Sample Text: {source.text[:200]}")
        print(f"Relevance score: {source.get_score():.3f}")
        print(f"File Name: {source.metadata.get('file_name')}")
        print(f"Page #: {source.metadata.get('page_label')}")
        print(f"File Path: {source.metadata.get('file_path')}")
        print("-" * 80)

### LlamaIndex Built-in RAG

In [44]:
def llama_built_in_prompt(query_engine, query_str, show_prompt = True):
    display_and_run_prompt(query_engine, query_str, show_prompt)

### Templated RAG through LangChain

In [45]:
def langchain_rag_prompt(query_engine, query_str, show_prompt = True):
    langchain_prompt = hub.pull("rlm/rag-prompt")

    langchain_prompt_template = LangchainPromptTemplate(
        template=langchain_prompt,
        template_var_mappings={"query_str": "question", "context_str": "context"},
    )

    query_engine.update_prompts(
        {"response_synthesizer:text_qa_template": langchain_prompt_template}
    )

    display_and_run_prompt(query_engine, query_str, show_prompt)

### Custom RAG Implementation

This custom RAG prompt highlights two important prompt engineering techniques:

__Few-shot examples:__ Providing the model with a few examples of the desired input-output behavior helps guide the model's response, effectively demonstrating the task and the expected format. This is particularly useful when the task is complex or requires a specific style of output.

__Grounding the output:__ Instructing the model to base its answer on the retrieved documents and to provide justification for the answer ensures that the response is factually grounded and relevant to the context. This is crucial for maintaining accuracy and preventing the model from generating responses that are either irrelevant or factually incorrect.
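
One way to keep few-shot examples consistent is to generate them from data instead of typing them by hand. The sketch below builds the examples with `json.dumps` and doubles the braces so that the result can still be filled in with `str.format`. It assumes a `str.format`-style template, and the helper names are hypothetical. The example answer is taken from the sample document.

```python
import json


def format_example(query, answer, justification):
    """Render one few-shot example as a query followed by a JSON object."""
    payload = json.dumps({"answer": answer, "justification": justification}, indent=2)
    return f"Query: {query}\nJSON Output: {payload}"


def escape_braces(text):
    """Double the braces so str.format treats them as literal characters."""
    return text.replace("{", "{{").replace("}", "}}")


def build_few_shot_template(examples):
    """Build a template that keeps {context_str} and {query_str} as the only live slots."""
    shots = "\n\n".join(escape_braces(format_example(*ex)) for ex in examples)
    return (
        "Context information is below.\n"
        "---------------------\n"
        "{context_str}\n"
        "---------------------\n"
        "Answer using only the context above and justify the answer.\n\n"
        "Example query and JSON output:\n"
        + shots
        + "\n\nQuery: {query_str}\nJSON Output: "
    )


few_shot_examples = [
    (
        "When was there a failure at the Nakoso Power Station?",
        "September 16, 2021",
        "The context states the date of the emergency stop.",
    ),
]
few_shot_template = build_few_shot_template(few_shot_examples)
few_shot_prompt = few_shot_template.format(
    context_str="(retrieved text)", query_str="Which unit failed?"
)
print(few_shot_prompt)
```

Generating the examples this way also guarantees that each one is valid JSON, which gives the model an unambiguous format to imitate.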


In [46]:
def custom_few_shot_prompt(query_engine, query_str, show_prompt = True):
    """
    Builds a custom few-shot prompt that shows the desired output format and reduces hallucination by asking for a justification in the response.
    """

    qa_prompt_custom_string= """\
    Context information is below.
    ---------------------
    {context_str}
    ---------------------
    Given the context information and not prior knowledge, answer the query.

    Please output your answer in the following JSON format:
    JSON Output: [
    "answer": 'This is the answer to the question',
    "justification": 'This is the reasoning or evidence supporting the answer given the provided context'
    ]

    Example query and JSON output:
    Query: Who are the authors of the paper?
    JSON Output: [
    "answer": "The authors are Hikaru Fujita, Masaru Kashiwakura, Akihiro Kawagoe, Hisaki Hamamoto, Tetsuo Niitsuma, and Yuzuru Mitani.",
    "justification": "The authors are listed on the first and last page in order."
    ]

    Query: When was there a failure at the Nakoso Power Station?
    JSON Output: [
    "answer": "September  16,  2021",
    "justification": "In the context provided it states: It was in this context
    that  on  September  16,  2021,  a  failure  due  to  aging  forced  the  emergency  stop  of  the
    unit No. 8 generator step-up transformer (built in 1981) at the Nakoso Power Station of
    Jōban Joint Power Co., Ltd. "
    ]

    Only answer the query provided. Return only one JSON answer/justification pair.
    Query: {query_str}
    Answer:
    """

    custom_RAG_template = PromptTemplate(
        template=qa_prompt_custom_string
    )

    query_engine.update_prompts(
        {"response_synthesizer:text_qa_template": custom_RAG_template}
    )

    display_and_run_prompt(query_engine, query_str, show_prompt)
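
If you want to use the answer and justification downstream, the response text has to be parsed. The sketch below is a defensive parser that assumes the model returns a JSON object in curly braces. Because the prompt above shows square brackets, the model may not return strict JSON, in which case this sketch returns `None` rather than guessing. The function name `parse_answer` is hypothetical.

```python
import json


def parse_answer(raw_text):
    """Extract an answer/justification object from model output, or return None."""
    start = raw_text.find("{")
    end = raw_text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return None
    # Accept only objects that carry both expected keys
    if not isinstance(parsed, dict):
        return None
    if "answer" not in parsed or "justification" not in parsed:
        return None
    return parsed


# Hypothetical model output used only to exercise the parser
example_output = 'JSON Output: {"answer": "September 16, 2021", "justification": "Stated in the context."}'
print(parse_answer(example_output))
```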

In [50]:
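def index_and_query_documents(documents, storage_context, query):
    """
    Builds a vector store index from the documents and runs the query with each of the three RAG prompt options.
    """
    # Embed the documents with the model configured in Settings and store them in the index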
    vector_index = VectorStoreIndex.from_documents(
        documents, storage_context=storage_context
    )

    #Set up a query engine
    query_engine = vector_index.as_query_engine()

    print(colored("*******Option 1: LlamaIndex Built-In Prompt*******", "red"))
    llama_built_in_prompt(query_engine, query, show_prompt = False)
    print(colored("*******Option 2: LangChain Template RAG Prompt*******", "blue"))
    langchain_rag_prompt(query_engine, query, show_prompt = False)
    print(colored("*******Option 3: Custom Few-Shot Prompt*******", "green"))
    custom_few_shot_prompt(query_engine, query, show_prompt=False)

    # Return the index so the caller can reuse it
    return vector_index


In [None]:
# Run the RAG workflow for the llama-index built in prompt, templated LangChain prompt, and custom few-shot prompt

query = "what is minimum reserve rate of power?"

vector_idx = index_and_query_documents(docs, storage_context, query)

## Conclusions

Congratulations! You've used LlamaIndex with Vertex AI Vector Search to build a RAG application with several types of prompting.

Feel free to experiment with different input queries, prompt types, and prompt structures to see how they affect the output.

Happy coding.