# Vertex AI Search - Technical Deep Dive - Lab Exercise

The purpose of this lab is to explore the client libraries and APIs of Vertex AI Search, along with the LangChain LLM integrations and retrievers for Enterprise Search and Vertex AI.

You'll use these tools to build a question and answer service that takes a user query, retrieves relevant documents from a search data store in Gen App Builder, then returns an LLM-generated answer to the original query along with source documents that were used to generate the answer.

Helpful resources for the lab coding exercise:

- [Vertex AI Search Code Samples (Documentation)](https://cloud.google.com/generative-ai-app-builder/docs/samples)
- [Question Answering Over Documents (GitHub)](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gen-app-builder/retrieval-augmented-generation/examples/question_answering.ipynb)
- [Grounding Generative AI using Vertex AI Search Results (Colab)](https://colab.research.google.com/drive/174YYPNNy1rWdIFvV-_LWZ-cueRB7Q6EC?resourcekey=0-9bYTUjXMbEkHIuduaNjNJw&usp=sharing)
- [Vertex AI Search - Search Web App (GitHub)](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/gen-app-builder/search-web-app)

# Coding exercise (Technical asset)

## Step 1

As a first step, create an unstructured data search app that uses PDF data, and note its `data_store_id`, which you will need later.
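The data store ID is how the client libraries address your data store. As a rough sketch, the Discovery Engine API refers to a data store's serving config by a resource path like the one built below. The helper name `serving_config_path`, the default values (`global`, `default_collection`, `default_config`), and the example IDs are assumptions for illustration; check the values shown in your own console.

```python
def serving_config_path(
    project_id: str,
    data_store_id: str,
    location: str = "global",  # assumption: data stores are commonly created in "global"
    collection: str = "default_collection",  # assumption: default collection name
    serving_config: str = "default_config",  # assumption: default serving config name
) -> str:
    """Build the resource path of a data store's serving config."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/collections/{collection}/dataStores/{data_store_id}"
        f"/servingConfigs/{serving_config}"
    )


# Hypothetical IDs, for illustration only
print(serving_config_path("my-project", "my-data-store_123"))
```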


**DONE?** :)

## Step 2

Install the Vertex AI client libraries for Python and LangChain, pinned to version 0.0.236 (newer versions are broken as of 2023-08-10):

In [1]:
# Install packages
# Note: You might need to restart the runtime after installing these packages
!pip install google-cloud-discoveryengine google-cloud-aiplatform langchain==0.0.236 "shapely<2.0.0" -q

In [2]:
!pip install gradio==3.48.0 -q

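After the installation, restart the runtime so that the newly installed packages are loaded, then authenticate with your Google account.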
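A common way to restart a Colab or Jupyter kernel from code is sketched below. The function name `restart_runtime` is hypothetical, and the sketch assumes it runs inside an IPython kernel; restarting from the notebook menu works just as well.

```python
def restart_runtime() -> None:
    """Restart the current IPython kernel (Colab or Jupyter)."""
    # Imported lazily so that defining the function has no side effects
    import IPython

    app = IPython.Application.instance()
    # do_shutdown(True) asks the kernel to shut down and then restart
    app.kernel.do_shutdown(True)
```

Call `restart_runtime()` in its own cell after the `pip install` cells. Variables defined before the restart are lost, so run it before any configuration cells.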



In [3]:
import os
import sys

if "google.colab" in sys.modules:
    from google.colab import auth as google_auth

    google_auth.authenticate_user()

In [5]:
from os.path import basename
from typing import Dict, List, Optional, Tuple, Any

## Step 3

Use the [Enterprise Search document retriever in LangChain](https://python.langchain.com/docs/integrations/retrievers/google_cloud_enterprise_search) to retrieve documents from your data store based on a query.

Sample query: “What is Alphabet's social and environmental impact?”

In [17]:
# @title Configuration Variables
PROJECT_ID = "qwiklabs-gcp-01-1ca9ffcb2245" # @param {type:"string"}
LOCATION = "us-central1" # @param {type:"string"}
DATA_STORE_ID = "nh-nofinanciera_1700442192371" # @param {type:"string"}

In [37]:
from google.cloud import discoveryengine_v1beta as discoveryengine
from langchain.retrievers import GoogleCloudEnterpriseSearchRetriever as EnterpriseSearchRetriever

QUERY = "Quien es el CEO de NH?"  # Spanish for "Who is the CEO of NH?"

# Code your solution here

# Initialise an Enterprise Search Retriever
retriever = EnterpriseSearchRetriever(
    project_id=PROJECT_ID,
    search_engine_id=DATA_STORE_ID,
    # Optional parameters:
    # max_documents=3,
    # max_extractive_answer_count=3,
    # get_extractive_answers=True,
)

# Get relevant documents
result = retriever.get_relevant_documents(QUERY)
for doc in result:
    print(doc)


page_content='ÍNDICE\n\nNUESTRA PRESENCIA EN EL MUNDO 2022\n\n99\n\nSOBRE EL ESTADO DE INFORMACIÓN NO FINANCIERA CONSOLIDADO\n\n100\n\nTAXONOMÍA DE ACTIVIDADES SOSTENIBLES DE LA UNIÓN EUROPEA\n\n102\n\nMENSAJE DEL PRESIDENTE Y DEL CEO\n\n108\n\n• Mensaje del presidente\n\n108\n\n• Mensaje del CEO\n\n110\n\nHITOS 2022\n\n113\n\nNUESTRA VISIÓN Y CULTURA\n\n114\n\nMODELO DE NEGOCIO DE NH HOTEL GROUP\n\n116\n\nESTRATEGIA DE NH HOTEL GROUP\n\n121\n\nGOBIERNO CORPORATIVO\n\n123\n\n• Consejo de Administración de NH Hotel Group\n\n123\n\n• Comisiones del Consejo\n\n124\n\n• Comité de Dirección\n\n126\n\n• Remuneración del Consejo y de la Alta Dirección\n\n127\n\n• Estructura Accionarial\n\n128\n\nCOMPROMISO ÉTICO Y SISTEMA DE CUMPLIMIENTO\n\n130\n\nTRANSPARENCIA FISCAL: BENEFICIOS E IMPUESTOS\n\n136\n\nRELACIÓN CON LOS GOBIERNOS E INFLUENCIA POLÍTICA\n\n140\n\nSEGURIDAD DE LA INFORMACIÓN\n\n141\n\nPROTECCIÓN DE LOS DERECHOS HUMANOS\n\n144\n\n• Debida diligencia de los Derechos Humanos\n\n146\n
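Printing whole documents produces long, hard-to-scan output like the above. A minimal sketch of a more compact listing follows. The `FakeDocument` class is a hypothetical stand-in for a LangChain document (which has `page_content` and `metadata` attributes) so the example runs on its own, and the `"source"` metadata key and the bucket path are assumptions to verify against your own results.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def summarize_documents(docs: List[Any], max_chars: int = 80) -> List[str]:
    """Return one short line per document: its source plus the start of its text."""
    lines = []
    for i, doc in enumerate(docs, start=1):
        # Assumption: the source URI, if present, is stored under "source"
        source = doc.metadata.get("source", "unknown source")
        # Collapse newlines and repeated whitespace so the snippet fits on one line
        snippet = " ".join(doc.page_content.split())[:max_chars]
        lines.append(f"[{i}] {source}: {snippet}")
    return lines


docs = [
    FakeDocument(
        "ÍNDICE\n\nMENSAJE DEL PRESIDENTE Y DEL CEO\n\n108",
        {"source": "gs://my-bucket/report.pdf"},  # hypothetical path
    )
]
for line in summarize_documents(docs):
    print(line)
```

In the notebook, `summarize_documents(result)` can be used in place of the `print(doc)` loop.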

## Step 4

Given a search query, use [LangChain's LLM integration with Vertex AI](https://python.langchain.com/docs/integrations/llms/google_vertex_ai_palm) to send the query to the LLM and return an answer along with its source documents.

Hint: Use [RetrievalQAWithSourcesChain](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gen-app-builder/retrieval-augmented-generation/examples/question_answering.ipynb) and refer to the “Helpful resources” at the top of this notebook!

Sample query: “Who is the CEO of DeepMind?”
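One possible way to follow the hint is sketched below. It assumes the pinned LangChain 0.0.236 and takes an already initialised LLM and retriever as arguments. The function name `answer_with_sources` is hypothetical, and the `"stuff"` chain type (all retrieved text goes into a single prompt) is one choice among several.

```python
from typing import Any, Dict


def answer_with_sources(llm: Any, retriever: Any, query: str) -> Dict[str, Any]:
    """Run a retrieval QA chain and return the answer together with its sources."""
    # Imported lazily so the function can be defined before LangChain is installed
    from langchain.chains import RetrievalQAWithSourcesChain

    chain = RetrievalQAWithSourcesChain.from_chain_type(
        llm=llm,
        chain_type="stuff",  # put all retrieved text into a single prompt
        retriever=retriever,
        return_source_documents=True,
    )
    # The chain expects its input under the "question" key
    return chain({"question": query}, return_only_outputs=True)
```

Once an LLM and a retriever are defined, `answer_with_sources(llm, retriever, "Who is the CEO of DeepMind?")` should return a dictionary whose `answer`, `sources`, and `source_documents` entries hold the generated text and the documents it was based on.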

In [38]:
import vertexai
from langchain import PromptTemplate
from langchain.chains import LLMChain
from langchain.llms import VertexAI
from langchain.retrievers import GoogleCloudEnterpriseSearchRetriever

# Code your solution here

# Initialise the Vertex AI SDK with the project and region configured above
vertexai.init(project=PROJECT_ID, location=LOCATION)

Here we define the parameters for the LLM calls.

In [40]:
# Initialise the LLM
LLM_MODEL = "text-bison@001" #@param {type: "string"}
MAX_OUTPUT_TOKENS = 1024 #@param {type: "integer"}
TEMPERATURE = 0.2 #@param {type: "number"}
TOP_P = 0.8 #@param {type: "number"}
TOP_K = 40 #@param {type: "integer"}
VERBOSE = True #@param {type: "boolean"}
llm_params = dict(
    model_name=LLM_MODEL,
    max_output_tokens=MAX_OUTPUT_TOKENS,
    temperature=TEMPERATURE,
    top_p=TOP_P,
    top_k=TOP_K,
    verbose=VERBOSE,
)

llm = VertexAI(**llm_params)

Sample question for testing

In [45]:
QUERY = "Quien es el CEO de NH?" #@param {type: "string"}

Here we test the results before wrapping the logic in a reusable function.


In [41]:
# Combine the LLM with a prompt to make a simple chain
PROMPT_STRING = "Please parse these search results and summarize them to the answer the following question. Results:{results}. Question:{query}. Answer:"
prompt = PromptTemplate(input_variables=['results', 'query'],
                        template=PROMPT_STRING)
chain = LLMChain(llm=llm, prompt=prompt, verbose=True)

# Get relevant documents
documents = retriever.get_relevant_documents(QUERY)
content = [d.page_content for d in documents]

# Use the LLM-prompt chain to answer the question based on the results
result = chain.run({'results': content, 'query': QUERY})

result.split('\n')



> Entering new LLMChain chain...
Prompt after formatting:
Please parse these search results and summarize them to the answer the following question. Results:['ÍNDICE\n\nNUESTRA PRESENCIA EN EL MUNDO 2022\n\n99\n\nSOBRE EL ESTADO DE INFORMACIÓN NO FINANCIERA CONSOLIDADO\n\n100\n\nTAXONOMÍA DE ACTIVIDADES SOSTENIBLES DE LA UNIÓN EUROPEA\n\n102\n\nMENSAJE DEL PRESIDENTE Y DEL CEO\n\n108\n\n• Mensaje del presidente\n\n108\n\n• Mensaje del CEO\n\n110\n\nHITOS 2022\n\n113\n\nNUESTRA VISIÓN Y CULTURA\n\n114\n\nMODELO DE NEGOCIO DE NH HOTEL GROUP\n\n116\n\nESTRATEGIA DE NH HOTEL GROUP\n\n121\n\nGOBIERNO CORPORATIVO\n\n123\n\n• Consejo de Administración de NH Hotel Group\n\n123\n\n• Comisiones del Consejo\n\n124\n\n• Comité de Dirección\n\n126\n\n• Remuneración del Consejo y de la Alta Dirección\n\n127\n\n• Estructura Accionarial\n\n128\n\nCOMPROMISO ÉTICO Y SISTEMA DE CUMPLIMIENTO\n\n130\n\nTRANSPARENCIA FISCAL: BENEFICIOS E IMPUESTOS\n\n136\n\nRELACIÓN CON LOS GOBIERNOS

['The CEO of NH Hotel Group is Ramón Aragonés.']
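Note that the formatted prompt above contains the Python list representation of the results, including brackets, quotes, and escaped `\n` sequences. A possible refinement, sketched here with a hypothetical helper `build_prompt`, is to join the snippets into plain text before filling in the template. The numbering scheme, the `max_chars` cap, and the slightly reworded template are assumptions for illustration, not part of the lab solution.

```python
from typing import List

PROMPT_TEMPLATE = (
    "Please parse these search results and summarize them to answer "
    "the following question. Results:{results}. Question:{query}. Answer:"
)


def build_prompt(snippets: List[str], query: str, max_chars: int = 8000) -> str:
    """Join retrieved snippets into plain text and fill in the prompt template."""
    # Number each snippet and separate snippets with blank lines
    joined = "\n\n".join(
        f"[{i}] {text.strip()}" for i, text in enumerate(snippets, start=1)
    )
    # Assumption: a simple character cap stands in for a real token limit
    return PROMPT_TEMPLATE.format(results=joined[:max_chars], query=query)


print(build_prompt(["Mensaje del CEO\n", "Comité de Dirección"], "Quien es el CEO de NH?"))
```

The resulting string can be passed directly to the LLM, or the joined text can be supplied as the `results` variable of the existing chain.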

Next, we wrap this logic in a function that Gradio will call.

In [46]:
def data_store_qna(query):
    """Answer a question using documents retrieved from the data store."""
    PROMPT_STRING = "Please parse these search results and summarize them to the answer the following question. Results:{results}. Question:{query}. Answer:"
    prompt = PromptTemplate(input_variables=['results', 'query'],
                            template=PROMPT_STRING)
    chain = LLMChain(llm=llm, prompt=prompt, verbose=True)

    # Get relevant documents
    documents = retriever.get_relevant_documents(query)
    content = [d.page_content for d in documents]

    # Use the LLM-prompt chain to answer the question based on the results
    return chain.run({'results': content, 'query': query})

Testing the function

In [48]:
output=data_store_qna(QUERY)
print(output)



> Entering new LLMChain chain...
Prompt after formatting:
Please parse these search results and summarize them to the answer the following question. Results:['ÍNDICE\n\nNUESTRA PRESENCIA EN EL MUNDO 2022\n\n99\n\nSOBRE EL ESTADO DE INFORMACIÓN NO FINANCIERA CONSOLIDADO\n\n100\n\nTAXONOMÍA DE ACTIVIDADES SOSTENIBLES DE LA UNIÓN EUROPEA\n\n102\n\nMENSAJE DEL PRESIDENTE Y DEL CEO\n\n108\n\n• Mensaje del presidente\n\n108\n\n• Mensaje del CEO\n\n110\n\nHITOS 2022\n\n113\n\nNUESTRA VISIÓN Y CULTURA\n\n114\n\nMODELO DE NEGOCIO DE NH HOTEL GROUP\n\n116\n\nESTRATEGIA DE NH HOTEL GROUP\n\n121\n\nGOBIERNO CORPORATIVO\n\n123\n\n• Consejo de Administración de NH Hotel Group\n\n123\n\n• Comisiones del Consejo\n\n124\n\n• Comité de Dirección\n\n126\n\n• Remuneración del Consejo y de la Alta Dirección\n\n127\n\n• Estructura Accionarial\n\n128\n\nCOMPROMISO ÉTICO Y SISTEMA DE CUMPLIMIENTO\n\n130\n\nTRANSPARENCIA FISCAL: BENEFICIOS E IMPUESTOS\n\n136\n\nRELACIÓN CON LOS GOBIERNOS

Finally, we import Gradio and create a simple web app. Launching it prints a public URL that we can use to test the app.

In [49]:
import gradio as gr

with gr.Blocks() as demo:
    gr.Markdown(
    """
    ## Ask Vertex AI Search

    This app uses Vertex AI Search Results to answer questions.

    ## How to use

    1. Enter a question
    2. Click on "Ask the Question"
    3. The answer will be displayed

    """)
    with gr.Row():
        with gr.Column():
            input_text = gr.Textbox(label="Question", placeholder="Quién es el CEO de NH?")
    with gr.Row():
        generate = gr.Button("Ask the Question")
    with gr.Row():
        label = gr.Textbox(label="Output")
    # Call data_store_qna with the question and show the answer in the output box
    generate.click(data_store_qna, input_text, [label])

# share=True creates a temporary public URL for the app
demo.launch(share=True, debug=False)



Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://d90a11d0d846255f46.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


