# Week 2: RAG

Two methods to equip a model with new knowledge:
1. **RAG (Retrieval-Augmented Generation)**
2. Fine-tuning

---

In this tutorial, we will:  
1. Demonstrate the limitations of LLMs with examples.
2. Build a RAG pipeline using LangChain and LangGraph.
3. Enable the LLM's web search functionality.  

## RAG workflow

<img src="https://miro.medium.com/v2/resize:fit:720/format:webp/1*8_4t5Rno_9lQaMpmR33g7g.jpeg">




## Table of Contents

**It is recommended to use the TOC in the sidebar in Colab.**

- [Install dependencies](#install-dependencies)
- [Setup Google Gemini API Key](#setup-google-gemini-api-key)
  - [Register](#register)
  - [Enter your API Key for this Colab](#enter-your-api-key-for-this-colab)
  - [The LLM Model we're going to use](#the-llm-model-were-going-to-use)
- [Limitations of Language Models](#limitations-of-language-models)
  - [Hallucinations](#hallucinations)
    - [Let's try in legal fields](#lets-try-in-legal-fields)
  - [Knowledge cut-off](#knowledge-cut-off)
    - [LLMs won't know about things that happened recently](#llms-wont-know-about-things-that-happened-recently)
- [Let's build RAG to solve it!](#lets-build-rag-to-solve-it)
  - [The Knowledge we want LLM to know](#the-knowledge-we-want-llm-to-know)
  - [Web Scraping these Knowledge](#web-scraping-these-knowledge)
  - [Splitting and Chunking Data](#splitting-and-chunking-data)
  - [Setup Embedding Model](#setup-embedding-model)
  - [Vector Database](#vector-database)
  - [Save embedding into Vector DB](#save-embedding-into-vector-db)
  - [Define Prompt](#define-prompt)
  - [Building RAG Workflow](#building-rag-workflow)
  - [Done! Let's try it!](#done-lets-try-it)
    - [Some question you can try](#some-question-you-can-try)
    - [Trying harder question](#trying-harder-question)
- [RAG Use Case Overview](#rag-use-case-overview)
- [Enable Web Search Functionalities](#enable-web-search-functionalities)
  - [Gemini Now Performs Web Searches Before Answering](#gemini-now-performs-web-searches-before-answering)

# 1.Install dependencies

In [1]:
%%capture

!pip install langchain langchain-google-genai
!pip install langchain-text-splitters langchain-community langgraph

# 2.Setup Google Gemini API Key

## 2.1 Register

1. Visit the website: [https://aistudio.google.com/](https://aistudio.google.com/) and log in with your Google account.

2. Click on "Get API key."

<img src="https://lh3.google.com/u/0/d/16x6gM2WAvmbOkayzKGtfnavNPFp3Pgz4">

3. Agree to the terms of use by selecting only the first checkbox.

<img src="https://lh3.google.com/u/0/d/1bN1iR64XS-ibE-L47Dy_nQUu-idh24eS">

4. Generate API Keys  

  Click the "Create API key" button. You will see two options:  
- The first option, **"Create API key in new project"**, will create a new GCP (Google Cloud Platform) project and generate a new key.  
- The second option allows you to select an existing GCP project where the key will be created, if you've used GCP before.  

If this is your first time using GCP, select the first option.

<img src="https://lh3.google.com/u/0/d/1yhNB5BT6Wtobxjhlb9cAJ4FGUv75CG3p">

5. Copy the Google Gemini API key.

<img src="https://lh3.google.com/u/0/d/1J9sO5UMCz_ylNO27ku8vJr9_rKp63k5T">

## 2.2 Enter your API Key for this Colab

In [2]:
import os
import getpass

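try:
    # In Colab, read the key from the "Secrets" panel if it is stored there.
    from google.colab import userdata
    api_key = userdata.get('GOOGLE_API_KEY')
except Exception:
    # Not running in Colab, or the secret is missing or inaccessible.
    api_key = None

if not api_key:
    # Fall back to an interactive prompt that does not echo the key.
    api_key = getpass.getpass("Enter your Google AI API key: ")

os.environ["GOOGLE_API_KEY"] = api_key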

## 2.3 The LLM Model we're going to use

In [3]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=1.0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # other params...
)

# 3.Limitations of Language Models

## 3.1 Hallucinations

Language models sometimes generate responses that appear plausible but are factually incorrect or entirely fabricated. This phenomenon, known as "hallucination," can mislead users, especially in contexts requiring high accuracy, such as medical, legal, or technical fields.

### 3.1.1 Let's try in legal fields

In fact, **there is no Article 1226 in the Civil Code**. You can verify this with the official search pages below, yet the model confidently produces the text of one.

[Civil Code Search(Chinese ver)](https://law.moj.gov.tw/LawClass/LawSearchCNKey.aspx?BTNType=NO&pcode=B0000001)

[Civil Code Search(English ver)](https://law.moj.gov.tw/ENG/LawClass/LawSearchCNKey.aspx?BTNType=NO&pcode=B0000001)

In [4]:
messages = [
    ("human", "What is the specific content of Article 1226 in the Civil Code(民法) of Taiwan? Reply in Chinese."),
]
ai_msg = llm.invoke(messages)

print(ai_msg.content)

台灣民法第1226條的內容如下：

**第1226條**

特留分受侵害時，繼承人得行使扣減權。

前項扣減權之行使，以扣減至特留分應得之額為限。

被繼承人以遺囑處分其財產，如侵害繼承人之特留分，繼承人得向受遺贈人請求返還，及向因遺贈取得權利之人請求塗銷其權利之登記。

**簡而言之， Article 1226 規定了以下重點：**

*   **特留分受侵害，繼承人有扣減權：** 當遺囑或贈與使繼承人應得的特留分受到損害時，該繼承人可以行使扣減權。
*   **扣減權的行使範圍：** 扣減權的行使以扣減到繼承人應得的特留分金額為限。 換句話說， 繼承人不能透過扣減權取得超過其特留分的金額。
*   **侵害特留分的處理方式：** 如果被繼承人透過遺囑處分財產侵害了繼承人的特留分，繼承人可以：
    *   向受遺贈人請求返還財產。
    *   向因遺贈取得權利的人請求塗銷其權利登記。

**解釋：**

*   **特留分 (tèliúfèn):**  法律保障給特定繼承人的最低繼承比例，即使遺囑內容不同，這些繼承人仍然有權繼承這部分財產。  通常是直系血親卑親屬（例如子女）、父母和配偶。
*   **扣減權 (kòujiǎn quán):** 繼承人為保護其特留分所擁有的權利，可以要求超過特留分部分的遺贈或贈與進行扣減。
*   **遺贈 (yízhèng):** 遺囑中指定將特定財產贈與給某人的行為。

因此， Article 1226 主要目的是保障繼承人的特留分，並賦予他們扣減權以對抗侵害特留分的遺囑或贈與。


## 3.2 Knowledge cut-off

The knowledge of a language model is limited to the data it was trained on, up to a specific cut-off date. As a result, it cannot provide information about events, discoveries, or updates that occurred after that point, making it less reliable for addressing recent developments.

### 3.2.1 LLMs won't know about things that happened recently

The 47th United States presidential election took place on **November 5, 2024**.

The model we're using, `gemini-2.0-flash`, has a knowledge cutoff of **June 2024** ([see more about the model](https://deepmind.google/technologies/gemini/flash/)), so the election took place after its training data ends.

In [5]:
messages = [
    ("human", "Who won the 47th US President election?"),
]
ai_msg = llm.invoke(messages)

print(ai_msg.content)

There was no 47th US Presidential election. Joe Biden is the 46th and current US President.


# 4.Let's build RAG to solve it!

## 4.1 The Knowledge we want LLM to know

These news reports cover events from late 2024 and early 2025, after the model's knowledge cutoff, so the LLM does not know about them.

1. [Trump seeks to force TSMC negotiations, experts say](https://www.taipeitimes.com/News/biz/archives/2025/02/10/2003831601) - 專家：川普試圖迫使台積電進行談判
2. [Instagram ‘Teen Accounts’ go live in Taiwan today](https://www.taipeitimes.com/News/taiwan/archives/2025/02/11/2003831708) - Instagram「青少年帳號」今日在台灣上線
3. [Donald Trump wins US presidency](https://www.taipeitimes.com/News/front/archives/2024/11/07/2003826511) - 川普當選美國總統
4. [Team Taiwan claim U-12 Asian baseball title](https://www.taipeitimes.com/News/front/archives/2024/11/30/2003827724) - 台灣隊奪得U-12亞洲棒球錦標賽冠軍
5. [What is DeepSeek and why is it disrupting the AI sector?](https://www.taipeitimes.com/News/lang/archives/2025/02/11/2003831649) - 什麼是DeepSeek？為何它正在顛覆AI產業？

## 4.2 Web Scraping these Knowledge

Scrape the five news articles listed above to extract the text of each report.

In [6]:
import bs4
os.environ['USER_AGENT'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
from langchain_community.document_loaders import WebBaseLoader


# Load the five articles (chunking happens in the next section)
loader = WebBaseLoader(
    web_paths=(
        "https://www.taipeitimes.com/News/biz/archives/2025/02/10/2003831601",
        "https://www.taipeitimes.com/News/taiwan/archives/2025/02/11/2003831708",
        "https://www.taipeitimes.com/News/front/archives/2024/11/07/2003826511",
        "https://www.taipeitimes.com/News/front/archives/2024/11/30/2003827724",
        "https://www.taipeitimes.com/News/lang/archives/2025/02/11/2003831649"
        ),
    )

docs = loader.load()

assert len(docs) == 5

print("Finished crawling news from URLs.")

for i, doc in enumerate(docs):
    print(f"News {i + 1}:")
    print(f"  Source URL: {doc.metadata['source']}")
    print(f"  Total Characters: {len(doc.page_content.strip())}")
    print("-" * 30)

Finished crawling news from URLs.
News 1:
  Source URL: https://www.taipeitimes.com/News/biz/archives/2025/02/10/2003831601
  Total Characters: 8429
------------------------------
News 2:
  Source URL: https://www.taipeitimes.com/News/taiwan/archives/2025/02/11/2003831708
  Total Characters: 6647
------------------------------
News 3:
  Source URL: https://www.taipeitimes.com/News/front/archives/2024/11/07/2003826511
  Total Characters: 8275
------------------------------
News 4:
  Source URL: https://www.taipeitimes.com/News/front/archives/2024/11/30/2003827724
  Total Characters: 6967
------------------------------
News 5:
  Source URL: https://www.taipeitimes.com/News/lang/archives/2025/02/11/2003831649
  Total Characters: 10854
------------------------------


## 4.3 Splitting and Chunking Data

Each article contains thousands of characters, and passing all of that text to the LLM at once is not a good choice. Therefore, we break the articles into smaller chunks while preserving their meaning.

- `chunk_size`: The maximum size of each chunk, in characters.
- `chunk_overlap`: The number of characters shared between consecutive chunks. The overlap means text near a chunk boundary keeps some of its surrounding context.

In [7]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # chunk size (characters)
    chunk_overlap=200,  # chunk overlap (characters)
    add_start_index=True,  # track index in original document
)
all_splits = text_splitter.split_documents(docs)

print(f"Split all news into {len(all_splits)} sub-documents.")

Split all news into 58 sub-documents.
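
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-size sliding-window splitter. It is a simplification for illustration only: `sliding_window_chunks` is a hypothetical helper, and `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences rather than at arbitrary character positions.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character windows of at most chunk_size that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


demo = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, which is the role `chunk_overlap=200` plays at a larger scale in the cell above.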


## 4.4 Setup Embedding Model

The embedding model converts a piece of text into a high-dimensional vector that captures its meaning, so that texts with similar meanings map to vectors that are close together.

In [8]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

print(embeddings.embed_query("What's our Q1 revenue?"))

[0.040674030780792236, 0.006255019456148148, -0.013568978756666183, -0.0003686861018650234, 0.04303165152668953, 0.04935013875365257, -0.013514830730855465, -0.027903610840439796, -0.03995805233716965, -0.006844368763267994, 0.0013024156214669347, -0.009539234451949596, 0.0705987736582756, -0.009862210601568222, 0.03167127072811127, -0.02663198858499527, -0.018167555332183838, -0.005245935637503862, -0.14866198599338531, -0.01596848852932453, 0.02811194583773613, -0.0018506837077438831, -0.025303209200501442, -0.01434125192463398, -0.03104301728308201, -0.07088255137205124, 0.011673162691295147, 0.008746510371565819, 0.003015926806256175, -0.010475549846887589, -6.184780795592815e-05, -0.0014338439796119928, -0.03641575202345848, -0.0519932359457016, -0.02123081497848034, 0.03613690286874771, -0.03694721683859825, 0.06530386954545975, 0.031148776412010193, -0.05865824222564697, -0.033094197511672974, -0.002400598954409361, -0.039360735565423965, 0.001522441511042416, 0.0348703004419803
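
Vectors like the one above are usually compared with cosine similarity. The sketch below shows the calculation on made-up three-dimensional vectors; the numbers are invented for illustration and are not real embeddings from `text-embedding-004`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


# Toy vectors (assumed values, purely illustrative).
revenue_question = [0.9, 0.1, 0.0]
revenue_report = [0.8, 0.2, 0.1]
baseball_report = [0.0, 0.1, 0.9]

closer_to_revenue = (
    cosine_similarity(revenue_question, revenue_report)
    > cosine_similarity(revenue_question, baseball_report)
)
print(closer_to_revenue)  # True
```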

## 4.5 Vector Database

In this tutorial, for the sake of convenience and speed, we keep the vectors in memory (RAM) instead of a persistent vector database. The stored vectors are lost when the runtime restarts.

In [9]:
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)

## 4.6 Save embedding into Vector DB

In [10]:
# Index chunks
_ = vector_store.add_documents(documents=all_splits)
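
Conceptually, an in-memory vector store is a list of (vector, text) pairs plus a nearest-neighbour search. The following is a minimal sketch of that idea, not how `InMemoryVectorStore` is actually implemented; `ToyVectorStore` and its toy two-dimensional vectors are hypothetical.

```python
import heapq
import math


class ToyVectorStore:
    """Keeps (vector, text) rows in a Python list and searches them by cosine similarity."""

    def __init__(self):
        self._rows = []

    def add(self, vector, text):
        self._rows.append((list(vector), text))

    def search(self, query_vector, k=2):
        def score(row):
            vector, _ = row
            dot = sum(q * v for q, v in zip(query_vector, vector))
            norms = math.hypot(*query_vector) * math.hypot(*vector)
            return dot / norms if norms else 0.0

        # nlargest returns the k best rows, highest score first.
        best = heapq.nlargest(k, self._rows, key=score)
        return [text for _, text in best]


store = ToyVectorStore()
store.add([1.0, 0.0], "election")
store.add([0.0, 1.0], "baseball")
store.add([0.9, 0.1], "tariffs")

print(store.search([1.0, 0.0], k=2))  # ['election', 'tariffs']
```

In the real pipeline, the vectors come from the embedding model and the texts are the chunks produced by the splitter.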

## 4.7 Define Prompt

This prompt is used in the generation stage of RAG. It instructs the LLM to answer the question using the retrieved context, to admit when it does not know the answer, and to keep the answer concise.

In [11]:
# Define prompt for question-answering
from langchain import hub

prompt = hub.pull("rlm/rag-prompt")

example_messages = prompt.invoke(
    {"context": "(context goes here)", "question": "(question goes here)"}
).to_messages()

assert len(example_messages) == 1
print(example_messages[0].content)

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: (question goes here) 
Context: (context goes here) 
Answer:
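
The template itself is simple string substitution. As a rough plain-Python equivalent, assuming the template text printed above, the sketch below fills the two placeholders; `RAG_TEMPLATE` and `build_prompt` are hypothetical names and are not part of LangChain.

```python
# Template text copied from the output above; {question} and {context} are placeholders.
RAG_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.\n"
    "Question: {question} \n"
    "Context: {context} \n"
    "Answer:"
)


def build_prompt(question: str, chunks: list[str]) -> str:
    """Join the retrieved chunks and substitute them into the template."""
    context = "\n\n".join(chunks)
    return RAG_TEMPLATE.format(question=question, context=context)


filled = build_prompt(
    "Who won the final?",
    ["(first retrieved chunk)", "(second retrieved chunk)"],
)
print(filled)
```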




## 4.8 Building RAG Workflow

The RAG we are building will primarily consist of two steps: Retrieval and Generation.

 - **Retrieval:** The question is used to perform a vector search, retrieving the most semantically relevant sub-documents from the chunks indexed earlier.

 - **Generation:** The content of the retrieved sub-documents is provided to the LLM, which generates a complete answer based on the question.

In [12]:
from langchain_core.documents import Document
from typing_extensions import List, TypedDict

from langgraph.graph import START, StateGraph

# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
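
Note that `retrieve` and `generate` each return only the keys they update, and those updates are merged into the shared state before the next step runs. The sketch below imitates that behaviour with plain dictionaries. It is an illustration of the idea under simplifying assumptions, not LangGraph's implementation; `run_sequence`, `fake_retrieve`, and `fake_generate` are hypothetical stand-ins.

```python
def run_sequence(initial_state: dict, steps) -> dict:
    """Run each step in order, merging its partial update into the state."""
    state = dict(initial_state)
    for step in steps:
        update = step(state)   # a step returns only the keys it changes
        state.update(update)   # later steps see the merged state
    return state


def fake_retrieve(state: dict) -> dict:
    # Stand-in for the vector search: returns a placeholder chunk.
    return {"context": [f"(chunk relevant to: {state['question']})"]}


def fake_generate(state: dict) -> dict:
    # Stand-in for the LLM call: reports how much context it received.
    return {"answer": f"Answer based on {len(state['context'])} chunk(s)."}


final_state = run_sequence({"question": "Who won?"}, [fake_retrieve, fake_generate])
print(final_state["answer"])  # Answer based on 1 chunk(s).
```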

## 4.9 Done! Let's try it!

### 4.9.1 Some question you can try

```plaintext
Who was elected the 47th president of the United States?
```

```plaintext
What is the purpose of Trump's proposed tariffs on Taiwanese semiconductors?
```

```plaintext
What is DeepSeek, and why is it significant in the AI sector?
```

```plaintext
What impact has DeepSeek’s success had on big tech companies like Nvidia?
```

In [13]:
response = graph.invoke({"question": "Who was elected the 47th president of the United States?"})
print(response["answer"])

Donald Trump was elected the 47th president of the United States. He secured 277 electoral votes, surpassing the 270 needed to win the presidency, while Kamala Harris had 224. Trump received 71,289,216 votes nationwide, representing 51 percent of the total.


### 4.9.2 Trying harder question

The RAG pipeline we built can also handle more detailed questions; you can find more questions to try in the `news-QA-dataset.json` file.

#### About the 2024 U-12 Asian Baseball Championship final

```plaintext
What was the score of Taiwan’s loss to South Korea in division B?
```

```plaintext
What action by South Korea’s pitcher allowed Su Yu-hsiang to steal second base in the final?
```

In [14]:
response = graph.invoke({"question": "What was the score of Taiwan’s loss to South Korea in division B?"})
print(response["answer"])

Taiwan lost to South Korea 0-1 in division B of the U-12 Asian Baseball Championship. However, Taiwan later defeated South Korea 5-1 in the final to win the championship. It was Taiwan's eighth championship in total.
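
Because the final state also carries the retrieved `context`, it is possible to check which articles an answer was based on. The helper below is a small sketch of that check. `list_sources` is a hypothetical function, and the stand-in objects only mimic the `metadata` attribute of a LangChain `Document`.

```python
from types import SimpleNamespace


def list_sources(context):
    """Return the distinct source URLs of the retrieved chunks, in retrieval order."""
    seen = []
    for doc in context:
        source = doc.metadata.get("source", "(unknown source)")
        if source not in seen:
            seen.append(source)
    return seen


# Stand-ins for retrieved chunks (placeholder URLs, for illustration only).
fake_context = [
    SimpleNamespace(metadata={"source": "https://example.com/article-a"}),
    SimpleNamespace(metadata={"source": "https://example.com/article-b"}),
    SimpleNamespace(metadata={"source": "https://example.com/article-a"}),
]
print(list_sources(fake_context))
```

With the graph above, the same helper could be called as `list_sources(response["context"])`.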


# 5.RAG Use Case Overview

At this stage, we have successfully enabled the LLM to access the database we created and respond to queries based on the retrieved data.

However, for **public data** such as news searches, implementing RAG is not always necessary. Existing LLM products, such as ChatGPT, already integrate web search functionalities, allowing them to directly retrieve real-time public information.

### Key Insight: When to Use RAG
The **primary use case for RAG** lies in handling **enterprise data** or **offline data**. These types of **non-public data** are more suitable for RAG as a supplementary source for LLMs.

# 6.Enable Web Search Functionalities

Google's LLM also comes with built-in web search functionality.

You can try this option directly in [Google AI Studio](https://aistudio.google.com/). Here, however, we will enable it through code.

In [15]:
%%capture
#@markdown # 6.1 Building Library
#@markdown Enable the Google Search tool using the Google Gen AI SDK.

!pip install google-genai

from google import genai
from google.genai import types

def query_google_genai(question):
    """
    Queries Google GenAI with a given question and returns the result.
    This function uses the Google Search tool to ground the answer in web results.

    Parameters:
        question (str): The question to query.

    Returns:
        str: The text content of the query result.
    """
    # Initialize the GenAI client (reads the API key from the GOOGLE_API_KEY environment variable)
    client = genai.Client()

    # Send the query request with the Google Search tool enabled
    response = client.models.generate_content(
        model='gemini-2.0-flash',
        contents=question,
        config=types.GenerateContentConfig(
            tools=[types.Tool(
                google_search=types.GoogleSearch()
            )]
        )
    )

    # Process the response and return the result
    if response.candidates:
        first_candidate = response.candidates[0]
        if first_candidate.content and first_candidate.content.parts:
            return first_candidate.content.parts[0].text

    # Return a default message if no valid response is received
    return "No valid response received."
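
The function above returns only the first part of the first candidate. A response may be split across several parts, so a slightly more defensive variant could join all text parts. The sketch below tests that logic on stand-in objects that only mimic the `candidates[0].content.parts[i].text` shape used above; `extract_text` is a hypothetical helper and is not part of the SDK.

```python
from types import SimpleNamespace

FALLBACK = "No valid response received."


def extract_text(response) -> str:
    """Concatenate the text of every part in the first candidate."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return FALLBACK
    content = getattr(candidates[0], "content", None)
    parts = getattr(content, "parts", None) or []
    texts = [part.text for part in parts if getattr(part, "text", None)]
    return "".join(texts) if texts else FALLBACK


# Stand-in response with two text parts (illustrative only).
fake_response = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            content=SimpleNamespace(
                parts=[SimpleNamespace(text="Hello, "), SimpleNamespace(text="world")]
            )
        )
    ]
)
print(extract_text(fake_response))  # Hello, world
```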

## 6.2 Gemini Now Performs Web Searches Before Answering

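With search grounding enabled, the model now says it cannot find the article instead of fabricating a non-existent law.  

This illustrates a key benefit of grounding answers in retrieved content, whether through RAG or web search: it can **significantly reduce hallucinations**.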

In [16]:
question = "What is the specific content of Article 1226 in the Civil Code(民法) of Taiwan? Reply in Chinese."

result = query_google_genai(question)
print(result)


很抱歉，我找不到台灣民法第1226條的具體內容。我找到了一些關於其他國家民法典第1226條的資訊，以及一些提到台灣民法的網站，但沒有一個提供你所要求的特定條文。
