# RAG with Vertex AI Demo

### Process of RAG application
<img src="image.png" alt="RAG" style="height: 700px;"/>
<p> Image Source: <a href=https://python.langchain.com/assets/images/rag_retrieval_generation-1046a4668d6bb08786ef73c56d4f228a.png> Langchain</a> </p>

## Single sentence as data source

Install dependencies

In [1]:
%pip install --upgrade --quiet  langchain langchain-openai faiss-cpu langchain-google-vertexai==0.0.5 bs4 huggingface_hub

Note: you may need to restart the kernel to use updated packages.


Import necessary modules

In [2]:
import vertexai
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_vertexai import ChatVertexAI, VertexAIEmbeddings


Set up Google Cloud credentials and initialize the Vertex AI Python package

In [3]:
import os

# Initialize Vertex
vertexai.init(project='pic-gen-ai-project', location="asia-southeast1")

Create embedding and LLM objects

In [4]:
# Using self-hosted embeddings
from langchain_community.embeddings import HuggingFaceHubEmbeddings
embeddings = HuggingFaceHubEmbeddings(model="http://embeddings.genai-pic.com:8080")



In [5]:
from langchain_google_vertexai import VertexAI
llm = VertexAI(model_name="gemini-pro", max_output_tokens=2000)

Embed the sentence

In [6]:
vectorstore = FAISS.from_texts(
    ["John worked at PIC"], embedding=embeddings
)
retriever = vectorstore.as_retriever()
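
To make the retrieval step concrete, the following is a minimal sketch of what a vector store lookup does conceptually: embed the stored texts, embed the query, and rank by similarity. The `toy_embed`, `cosine`, and `retrieve` functions are hypothetical stand-ins written for illustration. They are not the self-hosted embedding model or FAISS internals, and the sketch uses cosine similarity for simplicity, which may differ from the metric FAISS is configured with.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, texts: list[str], k: int = 1) -> list[str]:
    """Return the k stored texts most similar to the query."""
    query_vec = toy_embed(query)
    ranked = sorted(texts, key=lambda t: cosine(query_vec, toy_embed(t)), reverse=True)
    return ranked[:k]


texts = ["John worked at PIC", "The weather is sunny today"]
top = retrieve("Who works at PIC?", texts)
print(top)
```

A real embedding model captures meaning rather than word overlap, so it can also match "works" with "worked"; the ranking logic is the part this sketch is meant to show.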

Prepare the prompt for RAG

In [7]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

Create chain from the components

In [8]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
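
The data flow through this chain can be sketched in plain Python. This is only an illustration of the order of operations, assuming a retriever that returns the stored sentence and a model that echoes its prompt; `fake_retriever`, `format_prompt`, `fake_llm`, and `run_chain` are hypothetical names, not LangChain APIs.

```python
def fake_retriever(question: str) -> list[str]:
    """Hypothetical retriever: always returns the single stored sentence."""
    return ["John worked at PIC"]


def format_prompt(inputs: dict) -> str:
    """Fill the RAG template with the retrieved context and the question."""
    return (
        "Answer the question based only on the following context:\n"
        f"{inputs['context']}\n\n"
        f"Question: {inputs['question']}\n"
    )


def fake_llm(prompt_text: str) -> str:
    """Hypothetical model: echoes the prompt so the data flow is visible."""
    return f"[model saw]\n{prompt_text}"


def run_chain(question: str) -> str:
    # Step 1: the dict sends the same input to the retriever and, unchanged,
    # to the "question" slot (the role RunnablePassthrough plays above).
    inputs = {"context": fake_retriever(question), "question": question}
    # Steps 2-4: prompt -> model -> plain string.
    return fake_llm(format_prompt(inputs))


answer = run_chain("Who works at PIC?")
print(answer)
```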

Run the chain

In [9]:
from langchain.globals import set_debug
set_debug(False)
chain.invoke("Who works at PIC?")

'John'

## Creating vectorstore from a website

### Data Source Preparation 
<img src="image-1.png" alt="RAG" style="height: 700px;"/>
<p> Image Source: <a href=https://python.langchain.com/assets/images/rag_indexing-8160f90a90a33253d0154659cf7d453f.png> Langchain</a> </p>

In [10]:
from langchain_community.document_loaders import WebBaseLoader

# Load the Fitbit Charge 6 review from the Google blog
loader = WebBaseLoader("https://blog.google/products/fitbit/fitbit-charge-6-overview/")
documents = loader.load()

# Preview the contents of the webpage
page_content = documents[0].page_content[12107:19000]
print(page_content[1000:5000])


Fitbit’s Charge 6 boasts plenty of new and improved capabilities, including our most accurate heart rate on a fitness tracker yet. With my New Years’ resolutions fresh on my mind, I was given the chance to try one out for myself. Over the course of a week, I wore my Charge 6 everywhere from the gym to my favorite outdoor running routes. Just as important, I also kept it strapped to my wrist at the office and even at night to help track the quality of my sleep. Here are a few of my favorite ways Fitbit’s most advanced tracker yet helped me take my health and fitness to the next level, while also giving me the ability to tell when it might be a better idea to take it easy.1. The heart rate tracking gave me new insights into my overall health.Whether I was furiously pedaling on my gym’s exercise bike or taking a break between sets on the weight bench, Charge 6 did an impressive job of keeping track of my heart rate. Part of the credit goes to its more accurate heart tracking thanks to an

Manually trim the page content to drop the text before and after the article body

In [11]:
documents[0].page_content = documents[0].page_content[12707:-1500]
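
Hard-coded offsets such as `12707` and `-1500` stop working as soon as the page layout changes. One possible alternative, shown here only as a sketch, is to cut the text between two marker strings. The `trim_between` helper, the sample `page` string, and the `"Related stories"` end marker are all hypothetical; the real page may need different markers.

```python
def trim_between(text: str, start_marker: str, end_marker: str) -> str:
    """Keep the text from start_marker up to (not including) end_marker."""
    start = text.find(start_marker)
    if start == -1:
        raise ValueError(f"start marker not found: {start_marker!r}")
    end = text.find(end_marker, start)
    if end == -1:
        # No end marker: keep everything after the start marker.
        end = len(text)
    return text[start:end].strip()


# Hypothetical page text standing in for documents[0].page_content.
page = (
    "Menu\nShare\nTwitter\n"
    "Fitbit's Charge 6 boasts plenty of new capabilities.\n"
    "Related stories\nFooter"
)
body = trim_between(page, "Fitbit's Charge 6", "Related stories")
print(body)
```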

Split the website document into smaller chunks

In [12]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=10)
all_splits = text_splitter.split_documents(documents)
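
To show what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-width chunker. It is a sketch, not the `RecursiveCharacterTextSplitter` algorithm, which prefers to break on separators such as paragraph and line breaks and so produces chunks of varying length. The `chunk_text` function and the sample string are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 300, chunk_overlap: int = 10) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# 700 characters of repeating digits as stand-in content.
sample = "".join(str(i % 10) for i in range(700))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [300, 300, 120]
```

The last 10 characters of each chunk reappear at the start of the next one, so a sentence cut at a boundary is less likely to lose its context entirely.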

Preview chunks

In [13]:
for split in all_splits:
    print(split)
print(f"Total chunks: {len(all_splits)}")

page_content='read\n\n\n\n\n\n\nShare\n\n\n\n\n\n\nTwitter\n\n\n\n\n\nFacebook\n\n\n\n\n\nLinkedIn\n\n\n\n\n\nMail\n\n\n\n\n\n\nCopy link\n\n\n\n\n\n\n\n\n\n\n          Fitbit’s latest wearable helped give me plenty of new insights to keep my health and workout goals on track.\n        \n\n\n\n\n\n\n\n\n\n\n\n\n\nMike Darling\n\n      Contributor, The Keyword' metadata={'source': 'https://blog.google/products/fitbit/fitbit-charge-6-overview/', 'title': 'Fitbit Charge 6: What I learned after a week of workouts', 'description': 'Over the course of a week, I wore my Fitbit Charge 6 everywhere from the gym to my favorite outdoor running routes. Here are a few of my favorite ways Fitbit’s most advanced tracker yet helped me take my health and fitness to the next level.', 'language': 'en-us'}
page_content='Share\n\n\n\n\n\n\nTwitter\n\n\n\n\n\nFacebook\n\n\n\n\n\nLinkedIn\n\n\n\n\n\nMail\n\n\n\n\n\n\nCopy link' metadata={'source': 'https://blog.google/products/fitbit/fitbit-charge-6-overview

Create Vector Store from the chunked documents

In [14]:
web_vectorstore = FAISS.from_documents(
    all_splits, embedding=embeddings
)

Create Prompt template

In [15]:
prompt = ChatPromptTemplate.from_messages([
    ("human", "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\
                Question: {question} \
                Context: {context} \
                Answer:")
])

Create RAG chain

In [16]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=web_vectorstore.as_retriever(search_kwargs={"k": 15}),
    chain_type_kwargs={"prompt": prompt}
)
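
With `k` set to 15, up to 15 retrieved chunks are placed into the `{context}` slot of the prompt before the model is called. The sketch below illustrates that idea under the assumption that the chunks are joined with blank lines; `stuff_prompt` and the sample chunks are hypothetical and the prompt wording is shortened.

```python
def stuff_prompt(question: str, chunks: list[str], k: int = 15) -> str:
    """Join the top-k chunks into one context string and fill the prompt."""
    context = "\n\n".join(chunks[:k])
    return (
        "Use the following pieces of retrieved context to answer the question.\n"
        f"Question: {question}\n"
        f"Context: {context}\n"
        "Answer:"
    )


# 20 placeholder chunks, already ordered from most to least similar.
ranked_chunks = [f"chunk {i}" for i in range(20)]
prompt_text = stuff_prompt("How's the battery life?", ranked_chunks)
print(prompt_text)
```

A larger `k` gives the model more material to draw on but also lengthens the prompt, so the value is a trade-off between recall and prompt size.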

Set the LangChain debug setting to `True` to see the retrieved context that is passed to the model (it is `False` here)

In [17]:
set_debug(False)

In [18]:
question = "How's the battery life?"
result = qa_chain.invoke({"query": question})
result["result"]



'The battery life of the Fitbit Charge 6 is up to a week on a full charge and can be fully charged in under 2 hours.'

In [19]:
question = "exercise modes?"
result = qa_chain.invoke({"query": question})
result["result"]

'I apologize, but the provided context does not mention anything about exercise modes. Therefore, I cannot answer this question from the provided context.'

In [20]:
question = "Does it have oxygen saturation (SpO2) monitor?"
result = qa_chain.invoke({"query": question})
result["result"]

'Yes, it has an oxygen saturation (SpO2) monitor. It is one of the features that can help you keep an eye out for other irregularities.'

## Using LLMs for named-entity recognition
Create a prompt that asks the LLM to detect the entities in an input sentence. The prompt requests JSON output so that the result can be integrated easily with other components of the application.

In [21]:
ner_prompt = ChatPromptTemplate.from_template(
"""You are a powerful language model trained to understand the world. Today, you want to become an expert in identifying named entities in sentences. 
Tell me what kind of things are considered named entities (e.g., people, places, organizations, etc.). Tell me which words are the named entities and what type of entity they are. 
Your output must be valid JSON.
Input: {query}
Output:""")

Create an LLM chain with LangChain

In [22]:
from langchain_core.output_parsers import StrOutputParser
output_parser = StrOutputParser()
set_debug(False)

ner_chain = ner_prompt | llm | output_parser

Run the chain

In [23]:
ner_chain.invoke({"query": "Tomorrow, I'm flying to Paris with my friend Sarah to visit the Eiffel Tower."})

'```json\n{\n  "named_entities": [\n    {\n      "text": "Sarah",\n      "type": "PERSON"\n    },\n    {\n      "text": "Paris",\n      "type": "GPE"\n    },\n    {\n      "text": "Eiffel Tower",\n      "type": "FAC"\n    }\n  ]\n}\n```'

In [24]:
ner_chain.invoke({"query":"明天我和朋友莎拉一起去巴黎看埃菲尔铁塔，我们会搭飞机过去"})

'```JSON\n{\n  "entities": [\n    {\n      "text": "莎拉",\n      "type": "PERSON"\n    },\n    {\n      "text": "巴黎",\n      "type": "LOCATION"\n    },\n    {\n      "text": "埃菲尔铁塔",\n      "type": "LOCATION"\n    }\n  ]\n}\n```'
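
Both responses wrap the JSON in a Markdown code fence, and the top-level key differs between runs (`named_entities` versus `entities`), so the raw string cannot be handed to `json.loads` directly. The following is a minimal sketch of one way to post-process it, assuming the fence is the only wrapper; `parse_fenced_json` is a hypothetical helper, not part of LangChain.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def parse_fenced_json(raw: str) -> dict:
    """Strip an optional Markdown code fence and parse the JSON inside."""
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    match = re.search(pattern, raw, flags=re.DOTALL | re.IGNORECASE)
    payload = match.group(1) if match else raw
    return json.loads(payload)


# Shortened version of the first response above.
raw = FENCE + 'json\n{\n  "named_entities": [\n    {"text": "Sarah", "type": "PERSON"}\n  ]\n}\n' + FENCE
parsed = parse_fenced_json(raw)

# The key name varied between the two runs, so accept either one.
entities = parsed.get("named_entities") or parsed.get("entities") or []
print(entities)
```

Naming the expected key and entity types explicitly in the prompt would likely make the output more consistent, which reduces the amount of defensive parsing needed downstream.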