### Reference
- [Build your own Grounded RAG application using Vertex AI APIs for RAG and Langchain](https://github.com/GoogleCloudPlatform/applied-ai-engineering-samples/blob/main/genai-on-vertex-ai/retrieval_augmented_generation/diy_rag_with_vertexai_apis/build_grounded_rag_app_with_vertex.ipynb)

```python
import os
from typing import List

from dotenv import load_dotenv
from rich import print  # replaces the built-in print with rich's pretty printer

# Retrieval, reranking, and grounding-check integrations for Vertex AI
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_google_community import (
    VertexAICheckGroundingWrapper,
    VertexAIRank,
    VertexAISearchRetriever,
)
from langchain_google_vertexai import VertexAI

# LangChain core primitives (PromptTemplate was previously imported twice)
from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, chain

# Local notebook helpers (not shown) for side-by-side and grounded-output display
from util.helper import display_grounded_generation, get_sxs_comparison
```

```python
load_dotenv()
```

```
True
```

```python
# Configuration read from the .env file loaded above
PROJECT_ID = os.environ.get("PROJECT_ID")
LOCATION = os.environ.get("LOCATION")
COLLECTION = os.environ.get("COLLECTION")
DATA_STORE_ID = os.environ.get("DATA_STORE_ID")
APP_ENGINE_DISPLAY_NAME = os.environ.get("APP_ENGINE_DISPLAY_NAME")
APP_ENGINE_ID = os.environ.get("APP_ENGINE_ID")
FIELD = "file_name"  # metadata field the reranker uses as each record's title
RERANK_TOP_K = 5  # number of documents kept after reranking
query = (
    "How does Google Cloud leverage AI to enhance its products and services, "
    "and what is its commitment to privacy and security in this context?"
)
```
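`os.environ.get` returns `None` for a missing variable, so a typo in `.env` only surfaces later as an opaque API error. A minimal sketch of failing fast instead; `require_env` is a hypothetical helper, not part of the original notebook.

```python
import os
from typing import Dict


def require_env(*names: str) -> Dict[str, str]:
    """Return the requested environment variables, raising if any is unset or empty.

    Hypothetical helper: the notebook itself uses os.environ.get, which silently
    returns None for missing variables.
    """
    values = {name: os.environ.get(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return values
```

For example, `config = require_env("PROJECT_ID", "LOCATION", "DATA_STORE_ID")` stops the notebook at the configuration cell rather than at the first API call.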

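```python
# Base retriever over the Vertex AI Search data store
retriever = VertexAISearchRetriever(
    project_id=PROJECT_ID,
    data_store_id=DATA_STORE_ID,
    location_id=LOCATION,
    engine_data_type=0,  # 0 = unstructured data (e.g., PDFs)
    max_documents=10,  # documents returned per query
    max_extractive_segment_count=5,  # extractive segments returned per document
)
```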

```python
# Reranker backed by the Vertex AI Ranking API
reranker = VertexAIRank(
    project_id=PROJECT_ID,
    location_id=LOCATION,
    ranking_config="default_ranking_config",
    title_field=FIELD,  # metadata field used as each record's title
    top_n=RERANK_TOP_K,
)

# Wrap the base retriever so that its results are reranked by VertexAIRank
retriever_with_reranker = ContextualCompressionRetriever(
    base_compressor=reranker, base_retriever=retriever
)
```
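`ContextualCompressionRetriever` first runs the base retriever and then passes its documents to the compressor (here `VertexAIRank`), which reorders them by relevance and keeps the top `top_n`. A minimal pure-Python sketch of that two-stage pattern follows; the token-overlap scorer is a toy stand-in assumed purely for illustration, whereas the Ranking API scores query/document pairs with its own model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Doc:
    page_content: str
    metadata: dict = field(default_factory=dict)


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query tokens that also appear in the text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    return len(query_tokens & text_tokens) / max(len(query_tokens), 1)


def retrieve_then_rerank(
    query: str,
    base_retrieve: Callable[[str], List[Doc]],
    top_n: int,
    score: Callable[[str, str], float] = overlap_score,
) -> List[Doc]:
    """Stage 1: fetch candidates. Stage 2: sort by score (stable for ties) and keep top_n."""
    candidates = base_retrieve(query)
    ranked = sorted(candidates, key=lambda d: score(query, d.page_content), reverse=True)
    return ranked[:top_n]
```

Keeping the base retriever broad (`max_documents=10`) and letting the reranker narrow to `RERANK_TOP_K` is the usual motivation for this split: recall comes from stage 1, precision from stage 2.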

```python
# Compare plain retrieval with reranked retrieval for the same query
reranked_results = get_sxs_comparison(
    simple_retriever=retriever,
    reranking_api_retriever=retriever_with_reranker,
    query="What were alphabet revenues in 2021?",
    search_kwargs={"k": RERANK_TOP_K},
)
```

Output (top three results per column, all from `20230203-alphabet-10K.pdf`; each distinct chunk is reproduced once below the table):

| Rank | Retriever results | Reranked results |
|---|---|---|
| 1 | A. Revenues by type | A. Revenues by type |
| 2 | B. Paid clicks, impressions, Google other, Google Cloud, revenues by geography | C. Cost of revenues, R&D, sales and marketing |
| 3 | C. Cost of revenues, R&D, sales and marketing | B. Paid clicks, impressions, Google other, Google Cloud, revenues by geography |

**A. Revenues by type**

Other information: As AI is critical to delivering our mission of bringing our breakthrough innovations into the real world, beginning in January 2023, we will update our segment reporting relating to certain of Alphabet’s AI activities. DeepMind, previously reported within Other Bets, will be reported as part of Alphabet’s corporate costs, reflecting its increasing collaboration with Google Services, Google Cloud, and Other Bets. Prior periods will be recast to conform to the revised presentation. See Note 15 of the Notes to Consolidated Financial Statements included in Item 8 of this Annual Report on Form 10-K for information relating to our segments.

Financial Results: Revenues. The following table presents revenues by type (in millions):

| Year ended December 31 | 2021 | 2022 |
|---|---:|---:|
| Google Search & other | $148,951 | $162,450 |
| YouTube ads | 28,845 | 29,243 |
| Google Network | 31,701 | 32,780 |
| Google advertising | 209,497 | 224,473 |
| Google other | 28,032 | 29,055 |
| Google Services total | 237,529 | 253,528 |
| Google Cloud | 19,206 | 26,280 |
| Other Bets | 753 | 1,068 |
| Hedging gains (losses) | 149 | 1,960 |
| Total revenues | $257,637 | $282,836 |

Google Search & other revenues increased $13.5 billion from 2021 to 2022. The growth was driven by interrelated factors including increases in search queries resulting from growth in user adoption and usage, primarily on mobile devices; growth in advertiser spending; and improvements we have made in ad formats and delivery. Growth was adversely affected by the unfavorable effect of foreign currency exchange rates.

YouTube ads revenues increased $398 million from 2021 to 2022. The growth was driven by our brand advertising products followed by direct response products, both of which benefited from increased spending by our advertisers as well as improvements to ad formats and delivery. Growth was adversely affected by the unfavorable effect of foreign currency exchange rates.

Google Network revenues increased $1.1 billion from 2021 to 2022. The growth was primarily driven by strength in AdSense and AdMob. Growth was adversely affected by the unfavorable effect of foreign currency exchange rates.

**B. Paid clicks and cost-per-click**

The following table presents changes in paid clicks and cost-per-click (expressed as a percentage) from 2021 to 2022:

| Metric | Change |
|---|---:|
| Paid clicks change | 10% |
| Cost-per-click change | (1)% |

Paid clicks increased from 2021 to 2022 driven by a number of interrelated factors, including an increase in search queries resulting from growth in user adoption and usage, primarily on mobile devices; growth in advertiser spending; and improvements we have made in ad formats and delivery. Cost-per-click decreased from 2021 to 2022 driven by a number of interrelated factors including changes in device mix, geographic mix, advertiser spending, ongoing product changes, and property mix, as well as the unfavorable effect of foreign currency exchange rates.

Impressions and cost-per-impression. The following table presents changes in impressions and cost-per-impression (expressed as a percentage) from 2021 to 2022:

| Metric | Change |
|---|---:|
| Impressions change | 3% |
| Cost-per-impression change | 1% |

Impressions increased from 2021 to 2022 primarily driven by Google Ad Manager and AdMob. The increase in cost-per-impression from 2021 to 2022 was driven by a number of interrelated factors including ongoing product and policy changes, improvements we have made in ad formats and delivery, changes in device mix, geographic mix, product mix, and property mix, partially offset by the unfavorable effect of foreign currency exchange rates.

Google other revenues increased $1.0 billion from 2021 to 2022 primarily driven by growth in YouTube non-advertising and hardware revenues, partially offset by a decrease in Google Play revenues. The growth in YouTube non-advertising was largely due to an increase in paid subscribers. The growth in hardware was primarily driven by increased sales of Pixel devices. The decrease in Google Play revenues was primarily driven by the fee structure changes we announced in 2021 as well as a decrease in buyer spending. Additionally, the overall increase in Google other revenues was adversely affected by the unfavorable effect of foreign currency exchange rates.

Google Cloud revenues increased $7.1 billion from 2021 to 2022. The growth was primarily driven by Google Cloud Platform followed by Google Workspace offerings. Google Cloud’s infrastructure and platform services were the largest drivers of growth in Google Cloud Platform.

Revenues by geography. The following table presents revenues by geography as a percentage of revenues, determined based on the addresses of our customers:

| Year ended December 31 | 2021 | 2022 |
|---|---:|---:|
| United States | 46% | 48% |
| EMEA | 31% | 29% |
| APAC | 18% | 16% |
| Other Americas | 5% | 6% |
| Hedging gains (losses) | 0% | 1% |

For further details on revenues by geography, see Note 2 of the Notes to Consolidated Financial Statements included in Item 8 of this Annual Report on Form 10-K.

**C. Cost of revenues**

The following table presents cost of revenues, including TAC (in millions, except percentages):

| Year ended December 31 | 2021 | 2022 |
|---|---:|---:|
| TAC | $45,566 | $48,955 |
| Other cost of revenues | 65,373 | 77,248 |
| Total cost of revenues | $110,939 | $126,203 |
| Total cost of revenues as a percentage of revenues | 43% | 45% |

Cost of revenues increased $15.3 billion from 2021 to 2022. The increase was due to an increase in other cost of revenues and TAC of $11.9 billion and $3.4 billion, respectively. The increase in TAC from 2021 to 2022 was due to an increase in TAC paid to distribution partners and to Google Network partners, primarily driven by growth in revenues subject to TAC. The TAC rate was 22% in both 2021 and 2022. The TAC rate on Google Search & other revenues and the TAC rate on Google Network revenues were both substantially consistent from 2021 to 2022. The increase in other cost of revenues from 2021 to 2022 was primarily due to increases in data center costs and other operations costs as well as hardware costs.

Research and development. The following table presents R&D expenses (in millions, except percentages):

| Year ended December 31 | 2021 | 2022 |
|---|---:|---:|
| Research and development expenses | $31,562 | $39,500 |
| Research and development expenses as a percentage of revenues | 12% | 14% |

R&D expenses increased $7.9 billion from 2021 to 2022 primarily driven by an increase in compensation expenses of $5.4 billion, largely resulting from a 21% increase in average headcount, and an increase in third-party service fees of $704 million.

Sales and marketing. The following table presents sales and marketing expenses (in millions, except percentages):

| Year ended December 31 | 2021 | 2022 |
|---|---:|---:|
| Sales and marketing expenses | $22,912 | $26,567 |
| Sales and marketing expenses as a percentage of revenues | 9% | 9% |

Sales and marketing expenses increased $3.7 billion from 2021 to 2022, primarily driven by an increase in compensation expenses of $1.8 billion, largely resulting from a 19% increase in average headcount, and an increase in advertising and promotional activities of $1.3 billion.
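The top-ranked chunk in both columns contains the answer to the test query: Alphabet's total revenues for 2021 were $257,637 million. A quick arithmetic check, using only figures from that revenue table, confirms that the segment lines sum to the reported totals and gives the year-over-year change.

```python
# Figures (in millions of USD) copied from the revenue table in the top-ranked chunk
revenues = {
    "Google Services total": (237_529, 253_528),
    "Google Cloud": (19_206, 26_280),
    "Other Bets": (753, 1_068),
    "Hedging gains (losses)": (149, 1_960),
}
reported_total = (257_637, 282_836)  # (2021, 2022)

# Segment lines should add up to the reported total for each year
computed_total = tuple(sum(values[i] for values in revenues.values()) for i in range(2))
assert computed_total == reported_total

# Year-over-year growth of total revenues, 2021 -> 2022
growth_pct = (reported_total[1] - reported_total[0]) / reported_total[0] * 100
print(f"2021 total revenues: ${reported_total[0]:,}M; growth to 2022: {growth_pct:.1f}%")
```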


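```python
# Gemini model that drafts the answer. Model versions are retired over time,
# so substitute a currently available Gemini model if this one is unavailable.
llm = VertexAI(model_name="gemini-1.5-pro-001", max_output_tokens=1024)
template = """
Answer the question based only on the following context:
{context}

Question:
{query}
"""
prompt = PromptTemplate.from_template(template)

# Prompt -> LLM chain; returns the answer text as a string
create_answer = prompt | llm
```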
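`PromptTemplate` fills `{context}` with whatever is passed under that key. In `qa_with_check_grounding` that is a list of `Document` objects, so the model sees their Python `repr` (metadata included). A minimal sketch of formatting the documents into plain text first; `format_docs`, `build_prompt`, and the `[n] (source)` layout are hypothetical choices, and plain `str.format` is used here to mirror the template's default f-string formatting.

```python
from dataclasses import dataclass, field
from typing import List

TEMPLATE = """
Answer the question based only on the following context:
{context}

Question:
{query}
"""


@dataclass
class Doc:
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs: List[Doc]) -> str:
    """Join document texts with blank lines, tagging each with its file_name if present."""
    parts = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("file_name", "unknown source")
        parts.append(f"[{i}] ({source})\n{doc.page_content}")
    return "\n\n".join(parts)


def build_prompt(query: str, docs: List[Doc]) -> str:
    """Fill the template with formatted context; braces inside document text are safe here."""
    return TEMPLATE.format(context=format_docs(docs), query=query)
```

Numbering the chunks also makes it easier to compare the generated answer against the citations that the grounding check returns.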

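```python
# Check Grounding API wrapper: scores how well an answer is supported by the supplied documents
output_parser = VertexAICheckGroundingWrapper(
    project_id=PROJECT_ID,
    location_id=LOCATION,
    grounding_config="default_grounding_config",
    top_n=3,
)
```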
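The Check Grounding API compares an answer candidate against a set of facts and returns a support score together with citations for the claims in the answer. The toy sketch below only illustrates that idea and is explicitly not the API's algorithm: it treats each sentence as a claim, cites a fact when the fact covers at least `threshold` of the claim's tokens, and reports the fraction of supported claims. `toy_check_grounding` and its token-overlap rule are assumptions for illustration.

```python
import re
from typing import List, Set, Tuple


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_check_grounding(
    answer: str, facts: List[str], threshold: float = 0.6
) -> Tuple[float, List[Tuple[str, List[int]]]]:
    """Toy grounding check returning (support_score, [(claim, cited_fact_indices)])."""
    claims = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    results = []
    for claim in claims:
        claim_tokens = _tokens(claim)
        cited = [
            i
            for i, fact in enumerate(facts)
            if claim_tokens
            and len(claim_tokens & _tokens(fact)) / len(claim_tokens) >= threshold
        ]
        results.append((claim, cited))
    supported = sum(1 for _, cited in results if cited)
    score = supported / len(results) if results else 0.0
    return score, results
```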

```python
@chain
def check_grounding_output_parser(answer_candidate: str, documents: List[Document]):
    """Run the Check Grounding API on an answer against the given documents."""
    return output_parser.with_config(configurable={"documents": documents}).invoke(
        answer_candidate
    )


# Retrieve context for the query and pass the query through unchanged
setup_and_retrieval = RunnableParallel(
    {"context": retriever, "query": RunnablePassthrough()}
)


@chain
def qa_with_check_grounding(query):
    docs = setup_and_retrieval.invoke(query)
    # GroundingFact attributes in the grounding API accept only string values and
    # at most six attributes, so drop non-string and unneeded metadata fields.
    for doc in docs["context"]:
        doc.metadata = {
            k: v
            for k, v in doc.metadata.items()
            if isinstance(v, str) and k not in ["file_name_2", "created_unix_time"]
        }
    answer_candidate = create_answer.invoke(
        input={"query": query, "context": docs["context"]}
    )
    check_grounding_output = check_grounding_output_parser.invoke(
        answer_candidate, documents=docs["context"]
    )
    return check_grounding_output
```
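The metadata loop in `qa_with_check_grounding` keeps string values and drops two specific keys, which stays under the six-attribute limit only if the data store's metadata happens to fit. A minimal sketch of a filter that enforces both stated constraints directly; `sanitize_attributes` and its `preferred` ordering parameter are hypothetical, and the limit of six is taken from the notebook's comment.

```python
from typing import Dict, Iterable, Optional

MAX_ATTRIBUTES = 6  # limit stated in the notebook's comment on GroundingFact attributes


def sanitize_attributes(
    metadata: Dict[str, object],
    exclude: Iterable[str] = (),
    preferred: Optional[Iterable[str]] = None,
    max_attributes: int = MAX_ATTRIBUTES,
) -> Dict[str, str]:
    """Keep string-valued entries not in `exclude`, capped at `max_attributes`.

    Keys listed in `preferred` are kept first (deduplicated); the remaining keys
    follow in their original insertion order.
    """
    excluded = set(exclude)
    candidates = {
        k: v for k, v in metadata.items() if isinstance(v, str) and k not in excluded
    }
    order = list(dict.fromkeys(k for k in (preferred or []) if k in candidates))
    order += [k for k in candidates if k not in order]
    return {k: candidates[k] for k in order[:max_attributes]}
```

Inside the loop, `doc.metadata = sanitize_attributes(doc.metadata, exclude=["file_name_2"], preferred=[FIELD])` would keep the title field even when a document carries many string attributes.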

```python
# Run retrieval, answer generation, and the grounding check end to end
result = qa_with_check_grounding.invoke(query)
print(result)
```

```python
display_grounded_generation(result)
```
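`display_grounded_generation` is a local helper whose code is not shown. A minimal sketch of one way to render a grounded answer: insert numbered citation markers after each cited claim. The input shape (claim dicts with an exclusive character offset `end_pos` and zero-based `citation_indices`) is an assumption loosely modeled on the claim fields of the Check Grounding response; inspect the actual `result` object before relying on these keys.

```python
from typing import Dict, List


def add_citation_markers(answer: str, claims: List[Dict]) -> str:
    """Insert markers like "[1][3]" right after each cited claim span.

    Assumes each claim dict has an integer "end_pos" (exclusive character offset
    into `answer`) and an optional list "citation_indices" of zero-based fact indices.
    """
    # Insert from the end of the string backwards so earlier offsets stay valid
    for claim in sorted(claims, key=lambda c: c["end_pos"], reverse=True):
        indices = claim.get("citation_indices") or []
        if not indices:
            continue  # uncited claims are left unmarked
        marker = "".join(f"[{i + 1}]" for i in indices)
        end = claim["end_pos"]
        answer = answer[:end] + marker + answer[end:]
    return answer
```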

```python
# Re-run the same query; the generated answer and its grounding results may differ between runs
result = qa_with_check_grounding.invoke(query)
display_grounded_generation(result)
```