<a href="https://colab.research.google.com/github/hyrule-coder/langchain-book-learning/blob/main/chapter7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 7. Evaluating RAG Applications with LangSmith

## 7.4 Generating synthetic test data with Ragas

### Installing the packages

In [1]:
!pip install langchain-core==0.2.30 langchain-openai==0.1.21 \
     langchain-community==0.2.12 GitPython==3.1.43 \
     langchain-chroma==0.1.2 chromadb==0.5.3 \
     ragas==0.1.14 nest-asyncio==1.6.0



In [2]:
import os
from google.colab import userdata

os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = userdata.get("LANGCHAIN_API_KEY")
os.environ["LANGCHAIN_PROJECT"] = "agent-book"

### Loading the documents to search

In [3]:
from langchain_community.document_loaders import GitLoader

def file_filter(file_path: str) -> bool:
  return file_path.endswith(".mdx")

loader = GitLoader(
    clone_url="https://github.com/langchain-ai/langchain",
    repo_path="langchain",
    branch="master",
    file_filter=file_filter,
)

documents = loader.load()
print(len(documents))

390


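### Implementing synthetic test data generation with Ragas

Running the code below with gpt-4o fails with a rate-limit error during the embedding stage.
I chose gpt-4o-mini instead: the output quality is lower, but the run completes.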

In [4]:
for document in documents:
  document.metadata["filename"] = document.metadata["source"]

In [6]:
import nest_asyncio
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

nest_asyncio.apply()

generator = TestsetGenerator.from_langchain(
    generator_llm=ChatOpenAI(model="gpt-4o-mini"),
    critic_llm=ChatOpenAI(model="gpt-4o-mini"),
    embeddings=OpenAIEmbeddings(),
)

testset = generator.generate_with_langchain_docs(
    documents,
    test_size=4,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)

embedding nodes:   0%|          | 0/1224 [00:00<?, ?it/s]

Generating:   0%|          | 0/4 [00:00<?, ?it/s]

In [8]:
testset.to_pandas()

| | question | contexts | ground_truth | evolution_type | metadata | episode_done |
|---|---|---|---|---|---|---|
| 0 | What are the steps involved in the installatio... | `[# Runhouse\n\nThis page covers how to use the...` | The steps involved in the installation and set... | simple | `[{'source': 'docs/docs/integrations/providers/...` | True |
| 1 | What is required to initialize the loader for ... | `[# Reddit\n\n>[Reddit](https://www.reddit.com)...` | To initialize the loader for Reddit, you need ... | simple | `[{'source': 'docs/docs/integrations/providers/...` | True |
| 2 | What env var is needed for VoyageAI? | `[# VoyageAI\n\nAll functionality related to Vo...` | The environment variable needed for VoyageAI i... | reasoning | `[{'source': 'docs/docs/integrations/providers/...` | True |
| 3 | How does reranking improve retrieval for embed... | `[# VoyageAI\n\nAll functionality related to Vo...` | The answer to given question is not present in... | multi_context | `[{'source': 'docs/docs/integrations/providers/...` | True |
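
The `distributions` argument assigns a share of the test set to each evolution type. The sketch below only illustrates that arithmetic for `test_size=4`; it uses plain strings as hypothetical stand-ins for the Ragas evolution objects and does not claim to reproduce how Ragas allocates questions internally.

```python
# Hypothetical stand-ins: plain strings replace the Ragas evolution objects
# (simple, reasoning, multi_context) so the sketch has no dependencies.
distributions = {"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25}
test_size = 4


def expected_counts(shares: dict[str, float], size: int) -> dict[str, int]:
    """Return how many questions each type gets if the shares apply exactly."""
    total = sum(shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"shares must sum to 1.0, got {total}")
    return {name: round(share * size) for name, share in shares.items()}


counts = expected_counts(distributions, test_size)
print(counts)  # {'simple': 2, 'reasoning': 1, 'multi_context': 1}
```

This matches the `evolution_type` column of the generated test set above: two `simple` rows, one `reasoning` row, and one `multi_context` row.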




### Creating a LangSmith Dataset

In [27]:
from langsmith import Client

dataset_name = "agent-book"

client = Client()

if client.has_dataset(dataset_name=dataset_name):
  client.delete_dataset(dataset_name=dataset_name)

dataset = client.create_dataset(dataset_name=dataset_name)

### Saving the synthetic test data

In [28]:
inputs = []
outputs = []
metadatas = []

for testset_record in testset.test_data:
  inputs.append(
      {
          "question": testset_record.question,
      }
  )

  outputs.append(
      {
          "contexts": testset_record.contexts,
          "ground_truth": testset_record.ground_truth,
      }
  )

  metadatas.append(
      {
          "source": testset_record.metadata[0]["source"],
          "evolution_type": testset_record.evolution_type,
      }
  )


In [29]:
client.create_examples(
    inputs=inputs,
    outputs=outputs,
    metadatas=metadatas,
    dataset_id=dataset.id,
)
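
The three lists passed to `create_examples` are parallel: the entry at index `i` of each list describes the same example. A minimal sketch of that mapping follows, using a hypothetical `FakeRecord` dataclass in place of a Ragas test-set row and made-up placeholder values, so it runs without Ragas or LangSmith.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeRecord:
    """Hypothetical stand-in for one Ragas test-set row (not a Ragas class)."""
    question: str
    contexts: list[str]
    ground_truth: str
    evolution_type: str
    metadata: list[dict[str, Any]]


def to_parallel_lists(records: list[FakeRecord]):
    """Split records into the inputs / outputs / metadatas lists, index-aligned."""
    inputs, outputs, metadatas = [], [], []
    for record in records:
        inputs.append({"question": record.question})
        outputs.append(
            {"contexts": record.contexts, "ground_truth": record.ground_truth}
        )
        metadatas.append(
            {
                # Only the first source document's path is kept, as in the cell above.
                "source": record.metadata[0]["source"],
                "evolution_type": record.evolution_type,
            }
        )
    return inputs, outputs, metadatas


# Made-up placeholder values, for illustration only.
records = [
    FakeRecord(
        question="placeholder question",
        contexts=["placeholder context"],
        ground_truth="placeholder answer",
        evolution_type="simple",
        metadata=[{"source": "placeholder/path.mdx"}],
    )
]
inputs, outputs, metadatas = to_parallel_lists(records)
print(inputs[0], outputs[0], metadatas[0])
```

Keeping `question` in the inputs and `ground_truth` in the outputs is what lets an evaluator later read them from `example.inputs` and `example.outputs`.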

## 7.5 Implementing offline evaluation with LangSmith and Ragas

In [13]:
from typing import Any

from langchain_core.embeddings import Embeddings
from langchain_core.language_models import BaseLanguageModel
from langsmith.schemas import Example, Run
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.metrics.base import Metric, MetricWithEmbeddings, MetricWithLLM

class RagasMetricEvaluator:
  def __init__(
      self,
      metric: Metric,
      llm: BaseLanguageModel,
      embeddings: Embeddings,
  ):
    self.metric = metric

    if isinstance(self.metric, MetricWithLLM):
      self.metric.llm = LangchainLLMWrapper(llm)
    if isinstance(self.metric, MetricWithEmbeddings):
      self.metric.embeddings = LangchainEmbeddingsWrapper(embeddings)

  def evaluate(self, run: Run, example: Example) -> dict[str, Any]:
    # The chain returns Document objects; Ragas expects plain strings
    context_strs = [doc.page_content for doc in run.outputs["contexts"]]

    # Compute the score with the Ragas metric's score method
    score = self.metric.score(
        {
          "question": example.inputs["question"],  # the question
          "answer": run.outputs["answer"],  # the actual answer
          "contexts": context_strs,  # the actual retrieval results
          "ground_truth": example.outputs["ground_truth"],  # the expected answer
        }
    )
    return {"key": self.metric.name, "score": score}
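
The evaluator above follows a simple contract: it receives a run and a reference example, and returns a dict with a `key` and a `score`. The sketch below exercises that contract without calling an LLM. `ExactMatchMetric` and `ToyMetricEvaluator` are hypothetical toy classes, not part of Ragas or LangSmith, and `SimpleNamespace` objects stand in for `Run`, `Example`, and retrieved documents.

```python
from types import SimpleNamespace
from typing import Any


class ExactMatchMetric:
    """Hypothetical toy metric (not a Ragas metric): 1.0 on exact match, else 0.0."""
    name = "exact_match"

    def score(self, row: dict[str, Any]) -> float:
        return 1.0 if row["answer"].strip() == row["ground_truth"].strip() else 0.0


class ToyMetricEvaluator:
    """Same shape as RagasMetricEvaluator, minus the LLM and embeddings wiring."""

    def __init__(self, metric: ExactMatchMetric):
        self.metric = metric

    def evaluate(self, run: Any, example: Any) -> dict[str, Any]:
        # Convert document-like objects to plain strings before scoring.
        context_strs = [doc.page_content for doc in run.outputs["contexts"]]
        score = self.metric.score(
            {
                "question": example.inputs["question"],
                "answer": run.outputs["answer"],
                "contexts": context_strs,
                "ground_truth": example.outputs["ground_truth"],
            }
        )
        return {"key": self.metric.name, "score": score}


# Made-up stand-ins for a LangSmith Run and Example.
run = SimpleNamespace(
    outputs={
        "contexts": [SimpleNamespace(page_content="made-up context")],
        "answer": "42",
    }
)
example = SimpleNamespace(
    inputs={"question": "made-up question"},
    outputs={"ground_truth": "42"},
)

result = ToyMetricEvaluator(ExactMatchMetric()).evaluate(run, example)
print(result)  # {'key': 'exact_match', 'score': 1.0}
```

The `key` becomes the feedback name in the results, which is why the evaluation output later has one `feedback.<metric name>` column per metric.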

In [15]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.metrics import answer_relevancy, context_precision

metrics = [context_precision, answer_relevancy]

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

evaluators = [
    RagasMetricEvaluator(metric, llm, embeddings).evaluate
    for metric in metrics
]


### Implementing the inference function

In [16]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
db = Chroma.from_documents(documents, embeddings)


In [17]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template('''\
Answer the question based only on the context below.

Context: """
{context}
"""

Question: {question}
''')

model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

retriever = db.as_retriever()

chain = RunnableParallel(
    {
        "question": RunnablePassthrough(),
        "context": retriever,
    }
).assign(answer=prompt | model | StrOutputParser())


In [18]:
def predict(inputs: dict[str, Any]) -> dict[str, Any]:
  question = inputs["question"]
  output = chain.invoke(question)
  return {
      "contexts": output["context"],
      "answer": output["answer"],
  }

### Implementing and running the offline evaluation

In [30]:
from langsmith.evaluation import evaluate

evaluate(
  predict,
  data="agent-book",
  evaluators=evaluators,
)


View the evaluation results for experiment: 'notable-potato-79' at:
https://smith.langchain.com/o/e8857daa-a6d4-4374-9d4f-071e0dabcf2e/datasets/3082d07e-0445-43ea-8536-6fdea5ceba7a/compare?selectedSessions=0e15ceed-b9ab-43ed-9603-c58cdc4a59f8




0it [00:00, ?it/s]

| | inputs.question | outputs.contexts | outputs.answer | error | reference.contexts | reference.ground_truth | feedback.context_precision | feedback.answer_relevancy | execution_time | example_id | id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | What is required to initialize the loader for ... | `[page_content='# Reddit\n\n>[Reddit](https://w...` | To initialize the loader for Reddit, you need ... | | `[# Reddit\n\n>[Reddit](https://www.reddit.com)...` | To initialize the loader for Reddit, you need ... | 1.0 | 0.855926 | 1.852851 | 3ce16d08-9349-4806-8a71-b1ec9cfaa71d | b37f655a-84c3-439a-895f-ce896a86653e |
| 1 | How does reranking improve retrieval for embed... | `[page_content='# VoyageAI\n\nAll functionality...` | Reranking improves retrieval for embedding mod... | | `[# VoyageAI\n\nAll functionality related to Vo...` | The answer to given question is not present in... | 0.0 | 0.999701 | 4.716979 | a9cf0e92-2963-4b87-ac18-84ed0e1eedbc | 60e40ef0-9fac-47b0-b134-cefb37df0d30 |
| 2 | What are the steps involved in the installatio... | `[page_content='# Runhouse\n\nThis page covers ...` | The steps involved in the installation and set... | | `[# Runhouse\n\nThis page covers how to use the...` | The steps involved in the installation and set... | 1.0 | 1.0 | 5.460566 | f65ee58a-e598-434e-99fa-74ec868d6190 | 529f444c-205e-48cb-833e-38e05c83100a |
| 3 | What env var is needed for VoyageAI? | `[page_content='# VoyageAI\n\nAll functionality...` | For VoyageAI, you need to set the environment ... | | `[# VoyageAI\n\nAll functionality related to Vo...` | The environment variable needed for VoyageAI i... | 1.0 | 0.855135 | 1.913356 | 6a21bd6c-3245-450e-994e-323c839ec877 | e808a060-b9ce-40fd-8ab4-8bae54a03f24 |
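
To compare experiments it helps to reduce the per-example feedback to one number per metric. The sketch below averages the two feedback columns; the scores are copied by hand from the result table above, and the choice of a plain mean is an assumption of this sketch rather than something the evaluation run reports.

```python
from statistics import mean

# Feedback scores copied from the result table above, in row order.
context_precision = [1.0, 0.0, 1.0, 1.0]
answer_relevancy = [0.855926, 0.999701, 1.0, 0.855135]

summary = {
    "context_precision": mean(context_precision),
    "answer_relevancy": mean(answer_relevancy),
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
# context_precision: 0.750
# answer_relevancy: 0.928
```

The single `context_precision` score of 0.0 belongs to the reranking question, whose reference `ground_truth` itself says the answer is not present in the context.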


## 7.6 Collecting feedback with LangSmith

### Implementing a function that displays feedback buttons

In [34]:
from uuid import UUID

import ipywidgets as widgets
from IPython.display import display
from langsmith import Client

def display_feedback_buttons(run_id: UUID) -> None:
  # Prepare the Good and Bad buttons
  good_button = widgets.Button(
    description="Good",
    button_style="success",
    icon="thumbs-up",
  )
  bad_button = widgets.Button(
    description="Bad",
    button_style="danger",
    icon="thumbs-down",
  )

  # Define the callback that runs when a button is clicked
  def on_button_clicked(button: widgets.Button) -> None:
    if button == good_button:
      score = 1
    elif button == bad_button:
      score = 0
    else:
      raise ValueError(f"Unknown button: {button}")

    client = Client()
    client.create_feedback(run_id=run_id, key="thumbs", score=score)
    print("Feedback sent")

  # Register the callback on both buttons; without this, clicks do nothing
  good_button.on_click(on_button_clicked)
  bad_button.on_click(on_button_clicked)

  # Render the buttons in the cell output
  display(good_button, bad_button)


### Displaying the feedback buttons

In [35]:
from langchain_core.tracers.context import collect_runs

# Use collect_runs to get the LangSmith trace ID (run ID)
with collect_runs() as runs_cb:
  output = chain.invoke("Give me an overview of LangChain")
  print(output["answer"])
  run_id = runs_cb.traced_runs[0].id

display_feedback_buttons(run_id)

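LangChain is a framework for developing applications powered by large language models (LLMs). It simplifies each stage of the LLM application lifecycle. Specifically, it provides the following capabilities.

1. **Development**: Build applications using LangChain's open-source components and third-party integrations. With LangGraph, you can build stateful agents with support for streaming and human-in-the-loop.

2. **Productionization**: Use LangSmith to inspect, monitor, and evaluate your applications so that you can continuously optimize and deploy with confidence.

3. **Deployment**: Turn LangGraph applications into production-ready APIs and assistants.

LangChain implements standard interfaces for LLMs and related technologies (embedding models and vector stores) and integrates with hundreds of providers. It also consists of multiple open-source libraries, so users can select and use components according to their specific needs.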


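↑ The buttons were not displayed in this run. They appear only when `display_feedback_buttons` registers the click handler with `on_click` and passes both buttons to `display()`.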