## **Self RAG**

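Self-RAG is a technique for improving the accuracy and quality of text generated by a language model (LM). It does this by retrieving relevant information and letting the model reflect on its own output.

The model generates text with the help of the retrieved passages, then checks its own answer by emitting reflection tokens. These tokens tell the model whether more information is needed, or whether the answer is complete and supported by the retrieved data.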

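This Self-RAG implementation uses three graders:
- document grader: whether a retrieved document can be used as evidence
- hallucination grader: whether the answer is backed by evidence
- answer grader: whether the answer is a valid response to the question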

Research Paper: [Self RAG](https://arxiv.org/pdf/2310.11511)
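
The three graders fit together roughly as follows. This is a minimal sketch in plain Python, assuming hypothetical stand-in callables (`retrieve`, `is_relevant`, `generate`, `is_grounded`, `addresses_question`) in place of the retriever, the LLM, and the LLM-based graders built later in this notebook. The toy lambdas exist only so the block runs on its own.

```python
from typing import Callable, List, Optional, Tuple

def self_rag_once(
    question: str,
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],         # document grader
    generate: Callable[[str, List[str]], str],
    is_grounded: Callable[[List[str], str], bool],   # hallucination grader
    addresses_question: Callable[[str, str], bool],  # answer grader
) -> Tuple[str, Optional[str]]:
    """Run one retrieve -> grade -> generate -> grade pass and return (verdict, answer)."""
    docs = [d for d in retrieve(question) if is_relevant(question, d)]
    if not docs:
        return "no_relevant_documents", None
    answer = generate(question, docs)
    if not is_grounded(docs, answer):
        return "not supported", answer
    if not addresses_question(question, answer):
        return "not useful", answer
    return "useful", answer

# Toy stand-ins (hypothetical; keyword checks instead of LLM calls)
corpus = ["The program budget is 10 units.", "Bananas are yellow."]
verdict, answer = self_rag_once(
    "What is the program budget?",
    retrieve=lambda q: corpus,
    is_relevant=lambda q, d: "budget" in d.lower(),
    generate=lambda q, docs: docs[0],
    is_grounded=lambda docs, a: any(a in d for d in docs),
    addresses_question=lambda q, a: "budget" in a.lower(),
)
print(verdict, "|", answer)
```

The verdict strings mirror the labels returned by the graph's routing functions further down.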

In [1]:
# !pip install --upgrade pip


In [55]:
import os
from dotenv import load_dotenv
load_dotenv()

# # for web search (not used in this notebook)
# os.environ['TAVILY_API_KEY'] = os.getenv('TAVILY_API_KEY')

# Vertex AI settings
import vertexai
import google.generativeai as genai

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.expanduser("~/.config/gcloud/application_default_credentials.json")
vertexai.init(project=os.getenv("gcp_project_id"), location="us-central1")

# load llm
from langchain_google_vertexai import ChatVertexAI
llm = ChatVertexAI(
    model_name="gemini-2.0-flash-exp",
    project=os.getenv("gcp_project_id"),
    location="us-central1",
    temperature=0
)

# # load data
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("../data/pdf/57_public_スタートアップ育成に向けた政府の取組_file_name=kaisetsushiryou_2024.pdf")
documents = loader.load()

# split documents
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
documents = text_splitter.split_documents(documents)

# huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using tokenizers before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
os.environ["TOKENIZERS_PARALLELISM"] = "true"  # silences the warning above by explicitly turning tokenizers parallelism ON

# load embedding model
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(
    model_name="intfloat/multilingual-e5-base",
    encode_kwargs={"normalize_embeddings": True}
)

# For some reason Chroma failed today with the error below, even though it had worked before (cause unknown), so FAISS is used instead.
# from langchain_community.vectorstores import Chroma
# RuntimeError: Chroma is running in http-only client mode, and can only be run with 'chromadb.api.fastapi.FastAPI' or 'chromadb.api.async_fastapi.AsyncFastAPI' as the chroma_api_impl.             see https://docs.trychroma.com/guides#using-the-python-http-only-client for more information.

from langchain_community.vectorstores import FAISS
vectorstore = FAISS.from_documents(documents, embeddings)
# vectorstore = Chroma.from_documents(documents, embeddings, persist_directory="../data/chroma_db_57")
# vectorstore = Chroma(persist_directory="../data/chroma_db_57", embedding_function=embeddings, client_settings=settings)
# vectorstore = Chroma(persist_directory="../data/chroma_db_57", embedding_function=embeddings)

# create retriever
retriever = vectorstore.as_retriever()
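
To picture what `chunk_size=500` and `chunk_overlap=50` mean, here is a simplified fixed-window chunker. It is only an illustration: `RecursiveCharacterTextSplitter` prefers to cut at separators such as paragraph and line breaks, so its real chunks will generally differ. `chunk_text` is a hypothetical helper, not part of LangChain.

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Split text into fixed-size windows where neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous window's overlap
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]

# 1000 characters of dummy text
sample = "".join(chr(ord("a") + i % 26) for i in range(1000))
chunks = chunk_text(sample)
print([len(c) for c in chunks])
print(chunks[0][-50:] == chunks[1][:50])  # neighbouring chunks share 50 characters
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated text in the index.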

## **Document Grader**
The document grader assesses whether a retrieved document is relevant to the given query.

In [56]:
# create grader for doc retriever
from langchain_core.prompts import ChatPromptTemplate
# from langchain_core.pydantic_v1 import BaseModel, Field
from pydantic import BaseModel, Field

# define a data class
class GradeDocuments(BaseModel): 
    """Schema for grading retrieved documents for relevance.
    The field 'binary_score' is expected to be either "yes" or "no" indicating
    whether the document is relevant to the user's question.
    """
    binary_score: str = Field(
        description="Documents are relevant to the question, 'yes' or 'no'"
    )

# LLM with function call
structured_llm_grader = llm.with_structured_output(GradeDocuments)

# Prompt for the grader
system = """You are a grader assessing relevance of a retrieved document to a user question. \n
    If the document contains keyword(s) or semantic meaning related to the question, grade it as relevant. \n
    Give a binary score, 'yes' or 'no', to indicate whether the document is relevant to the question."""
grade_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "Retrieved document: \n\n {document} \n\n User question: {question}"),
    ]
)

retrieval_grader = grade_prompt | structured_llm_grader

In [57]:
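# testing grader
# Question (English gloss): "What is the budget for startup support at the Okinawa Institute of Science and Technology (OIST)?"
question = "沖縄科学技術大学院大学(OIST)のスタートアップ支援に関する予算額は？"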
docs = retriever.invoke(question)
print(retrieval_grader.invoke({"question": question, "document": docs}))

binary_score='yes'
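
The graph code later compares `binary_score` with the exact string `"yes"`. Because the schema types the field as a plain `str`, the model could plausibly return variants such as `Yes` or `'yes'`. A small normaliser like the hypothetical `is_yes` below is one way to guard against that; whether it is needed depends on how strictly the model follows the schema.

```python
def is_yes(binary_score: str) -> bool:
    """Return True only for a 'yes' verdict, ignoring case, whitespace, and quotes."""
    return binary_score.strip().strip("'\"").strip().lower() == "yes"

for raw in ["yes", " Yes", "'YES'", "no", "yes, partly"]:
    print(repr(raw), "->", is_yes(raw))
```

Anything other than a clean yes is treated as a no, which errs on the side of filtering a document out rather than keeping a doubtful one.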


## **RAG Chain**

In [58]:
# create document chain
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

template = """
You are a helpful assistant that answers questions based on the following context
Context: {context}
Question: {question}
Answer:

"""

prompt = ChatPromptTemplate.from_template(template)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = prompt | llm | StrOutputParser()

In [7]:
# response (format_docs joins the page contents into a single context string)
generation = rag_chain.invoke({"context": format_docs(docs), "question": question})
generation

'沖縄科学技術大学院大学(OIST)のスタートアップ支援に関する予算額は、R4補正で23億円の内数、R5補正で26億円の内数、R6当初で196億円の内数(内閣府)です。\n'

## **Hallucination Grader**
Checks whether the answer is based on, or supported by, the given facts.

(By contrast, the Document Grader above assessed whether the retrieved documents are relevant to the given query.)

In [59]:
# create grader for hallucination
# define a data class
class GradeHallucinations(BaseModel):
    """Binary score for hallucination present in generation answer."""

    binary_score: str = Field(
        description="Answer is grounded in the facts, 'yes' or 'no'"
    )


# LLM with function call
structured_llm_grader = llm.with_structured_output(GradeHallucinations)

# prompt for the grader
system = """You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts. \n
     Give a binary score 'yes' or 'no'. 'Yes' means that the answer is grounded in / supported by the set of facts."""
hallucination_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "Set of facts: \n\n {documents} \n\n LLM generation: {generation}"),
    ]
)

hallucination_grader = hallucination_prompt | structured_llm_grader
hallucination_grader.invoke({"documents": docs, "generation": generation})

GradeHallucinations(binary_score='no')
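
For budget questions like the one above, a cheap sanity check can complement the LLM grader: every figure quoted in the answer should also occur in the retrieved text. The sketch below is a toy heuristic with a hypothetical helper name and made-up sample strings. It is not the grader used in this notebook, and it says nothing about non-numeric claims.

```python
import re
from typing import List

# Digits, optionally grouped with '.' or ',' (for example 1,000 or 2.5)
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def unsupported_numbers(answer: str, context: str) -> List[str]:
    """Return the numbers that appear in the answer but nowhere in the context."""
    context_numbers = set(NUMBER.findall(context))
    return [n for n in NUMBER.findall(answer) if n not in context_numbers]

# Made-up example strings
context = "Budget: 23 billion yen in FY2022 and 196 billion yen in FY2024."
answer = "The budget was 23 billion yen, then 26 billion yen."
print(unsupported_numbers(answer, context))
```

An empty list does not prove the answer is grounded; a non-empty list is simply a hint that a figure deserves a closer look.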

## **Answer Grader**
Assesses whether the answer effectively addresses the given question.

In [60]:
# create grader for answer
# define a data class
class GradeAnswer(BaseModel):
    """Binary score to assess answer addresses question."""

    binary_score: str = Field(
        description="Answer addresses the question, 'yes' or 'no'"
    )


# LLM with function call
structured_llm_grader = llm.with_structured_output(GradeAnswer)

# prompt for the grader
system = """You are a grader assessing whether an answer addresses / resolves a question \n
     Give a binary score 'yes' or 'no'. 'Yes' means that the answer resolves the question."""
answer_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "User question: \n\n {question} \n\n LLM generation: {generation}"),
    ]
)

answer_grader = answer_prompt | structured_llm_grader
answer_grader.invoke({"question": question, "generation": generation})

GradeAnswer(binary_score='no')

# Create Graph

## Define Graph State

In [61]:
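# define a data class for state
from typing import List
from typing_extensions import TypedDict
from langchain_core.documents import Document

class GraphState(TypedDict):
    """State passed between the graph nodes."""
    question: str              # user question
    generation: str            # latest LLM answer
    documents: List[Document]  # retrieved (and later filtered) documents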
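
Each node defined below returns a dictionary containing the keys it wants to update, and LangGraph merges that dictionary into the current state (by default a returned key overwrites the stored value). The plain-Python sketch below imitates that merge with hypothetical stub nodes so the data flow is easy to follow; it does not use LangGraph itself.

```python
from typing import Callable, Dict, List

State = Dict[str, object]

def run_nodes(state: State, nodes: List[Callable[[State], State]]) -> State:
    """Apply nodes in order, merging each node's partial update into the state."""
    for node in nodes:
        update = node(state)
        state = {**state, **update}  # keys in update overwrite, other keys are kept
    return state

# Hypothetical stub nodes standing in for the real retrieve/generate nodes
def fake_retrieve(state: State) -> State:
    return {"documents": ["doc A", "doc B"]}

def fake_generate(state: State) -> State:
    return {"generation": f"answer built from {len(state['documents'])} documents"}

final_state = run_nodes({"question": "example question"}, [fake_retrieve, fake_generate])
print(final_state)
```

Because untouched keys survive the merge, a node only needs to return the keys it actually changes.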

In [62]:
# # define graph steps
# from langchain.schema import Document
from pprint import pprint

# all nodes
def retrieve(state):
    """Fetch documents for the question from the vector store."""
    print("---RETRIEVE---")
    question = state["question"]

    # Retrieval
    documents = retriever.invoke(question)
    return {"documents": documents, "question": question}


def generate(state):
    """Generate an answer from the (filtered) documents."""
    print("---GENERATE---")
    question = state["question"]
    documents = state["documents"]

    # RAG generation: pass the joined page contents as the context
    generation = rag_chain.invoke({"context": format_docs(documents), "question": question})
    return {"documents": documents, "question": question, "generation": generation}


def grade_documents(state):

    print("---CHECK DOCUMENT RELEVANCE TO QUESTION---")
    question = state["question"]
    documents = state["documents"]

    # Score each doc
    filtered_docs = []
    for d in documents:
        score = retrieval_grader.invoke(
            {"question": question, "document": d.page_content}
        )
        grade = score.binary_score
        if grade == "yes":
            print("---GRADE: DOCUMENT RELEVANT---")
            filtered_docs.append(d)
        else:
            print("---GRADE: DOCUMENT NOT RELEVANT---")
            continue
    return {"documents": filtered_docs, "question": question}


# edges
def decide_to_generate(state):
    """Route to generation if any document survived grading, otherwise stop."""
    print("---ASSESS GRADED DOCUMENTS---")
    filtered_documents = state["documents"]

    if not filtered_documents:
        # Every retrieved document was filtered out by the relevance check
        print(
            "---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION---"
        )
        return "no_relevant_documents"
    else:
        # We have relevant documents, so generate answer
        print("---DECISION: GENERATE---")
        return "generate"


def grade_generation_v_documents_and_question(state):

    print("---CHECK HALLUCINATIONS---")
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]

    score = hallucination_grader.invoke(
        {"documents": documents, "generation": generation}
    )
    grade = score.binary_score

    # Check hallucination
    if grade == "yes":
        print("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---")
        # Check question-answering
        print("---GRADE GENERATION vs QUESTION---")
        score = answer_grader.invoke({"question": question, "generation": generation})
        grade = score.binary_score
        if grade == "yes":
            print("---DECISION: GENERATION ADDRESSES QUESTION---")
            return "useful"
        else:
            print("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---")
            return "not useful"
    else:
        print("---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---")
        return "not supported"

## Build Graph

In [63]:
# Build graph
from langgraph.graph import END, StateGraph, START

workflow = StateGraph(GraphState)

# Define the nodes
workflow.add_node("retrieve", retrieve)
workflow.add_node("grade_documents", grade_documents)
workflow.add_node("generate", generate)

# Build graph
workflow.add_edge(START, "retrieve")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,
    {
        "generate": "generate",
        "no_relevant_documents": END,
    },
)

workflow.add_conditional_edges(
    "generate",
    grade_generation_v_documents_and_question,
    {
        "not supported": "generate",
        "not useful": "generate",
        "useful": END,
    },
) # Both "not supported" and "not useful" route back to "generate", so the answer is regenerated in either case (with no retry cap).

# Compile
app = workflow.compile()
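
As wired above, both failure verdicts loop back to `generate` with no upper bound, so a question the model cannot answer keeps regenerating. LangGraph's `recursion_limit` setting should eventually abort such a run with an error, but an explicit counter lets the graph stop cleanly. The sketch below shows only the routing decision in plain Python; `MAX_RETRIES`, `route_with_retry_cap`, and the `give_up` label are hypothetical names, not part of this notebook or of LangGraph.

```python
from typing import List, Tuple

MAX_RETRIES = 3  # hypothetical cap on regeneration attempts

def route_with_retry_cap(verdict: str, retries: int, max_retries: int = MAX_RETRIES) -> str:
    """Map a grader verdict and the retry count so far to the next step."""
    if verdict == "useful":
        return "end"
    if retries >= max_retries:
        return "give_up"
    return "generate"  # "not supported" or "not useful": try again

def simulate(verdicts: List[str], max_retries: int = MAX_RETRIES) -> Tuple[str, int]:
    """Replay a sequence of verdicts and report the final decision and retries used."""
    retries = 0
    for verdict in verdicts:
        decision = route_with_retry_cap(verdict, retries, max_retries)
        if decision != "generate":
            return decision, retries
        retries += 1
    return "generate", retries

print(simulate(["not useful"] * 10))
print(simulate(["not supported", "useful"]))
```

In the graph itself, the counter would need to live in `GraphState` (for example a hypothetical `retries: int` field incremented inside `generate`), and the `give_up` label would map to `END` in `add_conditional_edges`.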

In [None]:
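# Final generation example 1 (relevant documents)
from pprint import pprint

# Question (English gloss): "What is the budget for the deep-tech startup founder and management talent support program?"
inputs = {"question": "ディープテック・スタートアップの起業・経営人材確保等支援事業の予算額は？"}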
for output in app.stream(inputs):
    for key, value in output.items():
        pprint(f"Node '{key}':")
    pprint("\n---\n")

if "generation" in value:
    pprint(value["generation"])
else:
    pprint("No relevant documents found or no generation produced.")

---RETRIEVE---
"Node 'retrieve':"
'\n---\n'
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: GENERATE---
"Node 'grade_documents':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION DOES NOT ADDRESS QUESTION---
"Node 'generate':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION DOES NOT ADDRESS QUESTION---
"Node 'generate':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---


Retrying langchain_google_vertexai.chat_models._completion_with_retry.<locals>._completion_with_retry_inner in 4.0 seconds as it raised ResourceExhausted: 429 Quota exceeded for aiplatform.googleapis.com/generate_content_requests_per_minute_per_project_per_base_model with base model: gemini-experimental. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/generative-ai/quotas-genai..


---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION DOES NOT ADDRESS QUESTION---
"Node 'generate':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION DOES NOT ADDRESS QUESTION---
"Node 'generate':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION DOES NOT ADDRESS QUESTION---
"Node 'generate':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---




---DECISION: GENERATION DOES NOT ADDRESS QUESTION---
"Node 'generate':"
'\n---\n'
---GENERATE---



---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION DOES NOT ADDRESS QUESTION---
"Node 'generate':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION DOES NOT ADDRESS QUESTION---
"Node 'generate':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION DOES NOT ADDRESS QUESTION---
"Node 'generate':"
'\n---\n'
---GENERATE---




---CHECK HALLUCINATIONS---




---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION DOES NOT ADDRESS QUESTION---
"Node 'generate':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION DOES NOT ADDRESS QUESTION---
"Node 'generate':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION DOES NOT ADDRESS QUESTION---
"Node 'generate':"
'\n---\n'
---GENERATE---




---CHECK HALLUCINATIONS---


In [None]:
# example 2 (no relevant documents)
from pprint import pprint

# Question (English gloss): "How does the interlibrary loan system work?"
inputs = {"question": "図書館間貸借制度はどのように機能していますか？"}

for output in app.stream(inputs):
    for key, value in output.items():
        pprint(f"Node '{key}':")
    pprint("\n---\n")

if "generation" in value:
    pprint(value["generation"])
else:
    pprint("No relevant documents found or no generation produced.")
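
`app.stream` yields one dictionary per executed node, keyed by node name, whose value is that node's state update. The loops above rely on the leftover `value` variable once the loop has finished; a small helper (hypothetical name `last_generation`) makes that intent explicit. The fake stream below only mimics the shape of the real output.

```python
from typing import Dict, Iterable, Optional

def last_generation(stream: Iterable[Dict[str, dict]]) -> Optional[str]:
    """Return the most recent 'generation' seen in a stream of {node_name: update} dicts."""
    generation = None
    for output in stream:
        for _node_name, update in output.items():
            if update and "generation" in update:
                generation = update["generation"]
    return generation

# Fake stream with the same shape as the graph's streamed output
fake_stream = [
    {"retrieve": {"documents": ["doc A"], "question": "q"}},
    {"grade_documents": {"documents": ["doc A"], "question": "q"}},
    {"generate": {"documents": ["doc A"], "question": "q", "generation": "first try"}},
    {"generate": {"documents": ["doc A"], "question": "q", "generation": "second try"}},
]
print(last_generation(fake_stream) or "No relevant documents found or no generation produced.")
```

When the graph ends right after `grade_documents`, no update contains a generation and the helper returns `None`, which corresponds to the `else` branch in the cells above.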

## **Preparing Data for Evaluation**

In [None]:
# Collect the question, contexts, and response for each generation
import time

inputs = [
# "What is the budget for the Global Startup Acceleration Program?"
{"question": "グローバル・スタートアップ・アクセラレーションプログラムの予算額は？"},
# "What is the budget for the deep-tech startup founder and management talent support program?"
{"question": "ディープテック・スタートアップの起業・経営人材確保等支援事業の予算額は？"},
# "How does the interlibrary loan system work?"
{"question": "図書館間貸借制度はどのように機能していますか？"},
]

outputs = []

for inp in inputs:
    try:
        for output in app.stream(inp):
            for key, value in output.items():
                if key != "generate":
                    continue
                question = value["question"]
                documents = value["documents"]
                generation = value["generation"]

                # Keep the contexts as a list of strings
                contexts = [doc.page_content for doc in documents]

                # The "---DECISION---" messages are only printed and are not part of the
                # streamed output, so the graders are called again to record their verdicts.
                grounded = hallucination_grader.invoke(
                    {"documents": documents, "generation": generation}
                ).binary_score == "yes"
                addresses = answer_grader.invoke(
                    {"question": question, "generation": generation}
                ).binary_score == "yes"

                evaluation_results = {
                    # Documents that reach "generate" have already passed the relevance grader
                    "document_relevance": contexts,
                    # Hallucination check result
                    "hallucination_check": grounded,
                    # Whether the answer addresses the question
                    "answers_question": addresses,
                }

                # Append the result in ragas format, together with the evaluation results
                outputs.append({
                    "question": question,
                    "contexts": contexts,
                    "answer": generation,
                    "evaluation": evaluation_results
                })
    except Exception as e:
        if "Quota exceeded" in str(e):
            print("The generation request quota has been reached. Re-run later or request a Vertex AI quota increase.")
        else:
            print(f"An error occurred: {e}")
        # Wait briefly before moving on to the next question
        time.sleep(10)
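
The log above shows that `langchain_google_vertexai` already retries 429 responses on its own; the `except` branch covers the case where those retries run out. If a whole question should be retried rather than skipped, an exponential backoff wrapper is one option. The sketch below is plain Python with a simulated failure; `call_with_backoff` and its parameters are hypothetical, and matching on the text "Quota exceeded" simply follows the check used in the cell above.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponentially growing waits while it raises quota errors."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            is_quota_error = "Quota exceeded" in str(exc)
            # Re-raise non-quota errors immediately, and quota errors on the last attempt
            if not is_quota_error or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("max_attempts must be at least 1")

# Simulated call that fails twice with a quota error, then succeeds
calls = {"count": 0}

def flaky() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("429 Quota exceeded (simulated)")
    return "ok"

waits: List[float] = []
result = call_with_backoff(flaky, sleep=waits.append)  # record waits instead of sleeping
print(result, waits)
```

Passing `sleep` as a parameter keeps the example fast to run; in the notebook the default `time.sleep` would be used.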

In [None]:
# Convert to DataFrame
import pandas as pd
df = pd.DataFrame(outputs)
df

In [None]:
df['evaluation'][1]['hallucination_check']

In [None]:
# Convert to the ragas format
from typing import Dict, List
import pandas as pd

def create_ragas_dataset(df: pd.DataFrame) -> List[Dict]:
    ragas_dataset = []
    for _, row in df.iterrows():
        ragas_dataset.append({
            "question": row['question'],
            "answer": row['answer'],
            "contexts": row['contexts']  # assumed to already be a list of strings
        })
    return ragas_dataset

# Usage example
eval_dataset = create_ragas_dataset(df)
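
Since `create_ragas_dataset` assumes that every row already has the right shape, a quick check before evaluation can catch empty answers or malformed contexts early. This is a sketch with a hypothetical helper name and made-up rows; the required fields simply follow the three keys built above.

```python
from typing import Dict, List

def find_invalid_rows(rows: List[Dict]) -> List[str]:
    """Describe rows that lack a non-empty question/answer or a list of string contexts."""
    problems = []
    for i, row in enumerate(rows):
        for field in ("question", "answer"):
            value = row.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"row {i}: '{field}' must be a non-empty string")
        contexts = row.get("contexts")
        if not isinstance(contexts, list) or not all(isinstance(c, str) for c in contexts):
            problems.append(f"row {i}: 'contexts' must be a list of strings")
    return problems

# Made-up rows: the first is valid, the second has two problems
sample_rows = [
    {"question": "q1", "answer": "a1", "contexts": ["c1", "c2"]},
    {"question": "q2", "answer": "", "contexts": "not a list"},
]
print(find_invalid_rows(sample_rows))
```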

## **Evaluation in RAGAS**

In [None]:
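# Run the evaluation
# Assumes the ragas and datasets packages are installed; exact imports may vary by ragas version.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import AnswerRelevancy

metrics = [
    AnswerRelevancy(),
    # ContextRelevancy(),
    # Faithfulness(),
    # Conciseness()
]

results = evaluate(
    Dataset.from_list(eval_dataset),  # evaluate expects a dataset object, not a plain list
    metrics=metrics,
    llm=llm,                # assumption: reuse the Gemini model; ragas otherwise defaults to OpenAI
    embeddings=embeddings,  # assumption: reuse the e5 embeddings loaded above
)

# Show the results
print(results.to_pandas())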