# Day 3-1
https://colab.research.google.com/drive/1Nvq7ogHRZNglsdjoNsS9xc3QQoUQoV2T#scrollTo=a9c910c1-738c-4bf7-bf9e-801862b227eb

https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_rag_agent_llama3_local.ipynb

Search engine: Tavily

In [1]:
# Install the required libraries
# !pip install langchain-nomic
# !pip install -U langchain-nomic langchain_community tiktoken langchainhub chromadb langchain langgraph tavily-python langchain-text-splitters 'nomic[local]'

In [2]:
from tavily import TavilyClient
import getpass
tavily = TavilyClient(api_key=getpass.getpass())

In [3]:
response = tavily.search(query="Where does Messi play right now?", max_results=3)

In [4]:
response

{'query': 'Where does Messi play right now?',
 'follow_up_questions': None,
 'answer': None,
 'images': None,
 'results': [{'title': 'Is Lionel Messi playing today? Status for next Inter Miami game in 2024 ...',
   'url': 'https://www.sportingnews.com/us/soccer/news/lionel-messi-playing-today-inter-miami-game-2024/129c2c378fee4d1f0102aa9d',
   'content': '* Lionel Messi did not participate. Inter Miami schedule for Leagues Cup. The 2024 Leagues Cup is scheduled to begin on July 26, running for a month while the MLS season pauses play.. The final ...',
   'score': 0.98773,
   'raw_content': None},
  {'title': 'Where will Lionel Messi play in 2024? Cities, stadiums Inter Miami ...',
   'url': 'https://www.sportingnews.com/us/soccer/news/where-lionel-messi-play-2024-inter-miami-cities-road-schedule/23334c5768cebee9021e71d0',
   'content': "Here is how Inter Miami's road schedule will look for the coming regular season:\nInter Miami home stadium for 2024 MLS season\nFor their home matches 

In [5]:
# Pull url and content out of response['results'] and collect them in the context list
context = [{"url": obj["url"], "content": obj["content"]} for obj in response['results']]
context

[{'url': 'https://www.sportingnews.com/us/soccer/news/lionel-messi-playing-today-inter-miami-game-2024/129c2c378fee4d1f0102aa9d',
  'content': '* Lionel Messi did not participate. Inter Miami schedule for Leagues Cup. The 2024 Leagues Cup is scheduled to begin on July 26, running for a month while the MLS season pauses play.. The final ...'},
 {'url': 'https://www.sportingnews.com/us/soccer/news/where-lionel-messi-play-2024-inter-miami-cities-road-schedule/23334c5768cebee9021e71d0',
  'content': "Here is how Inter Miami's road schedule will look for the coming regular season:\nInter Miami home stadium for 2024 MLS season\nFor their home matches through the 2024 campaign, Inter Miami will once again play at\xa0DRV PNK Stadium in Fort Lauderdale, Florida.\n Cities, stadiums Inter Miami visit on road MLS schedule for new season\nWith Lionel Messi set to embark on his first full season with Inter Miami, fans across the United States will be clamoring to see when the Argentine superstar wil
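
The list comprehension above keeps every result. A minimal sketch of trimming the same kind of list to a size budget before it goes into a prompt is shown below. The helper `build_context`, the `max_chars` budget, and the sample data are hypothetical and not part of the Tavily client; only the `url`, `content`, and `score` keys come from the response shown above.

```python
import json

# Hypothetical sample shaped like the entries of response['results'] above.
sample_results = [
    {"url": "https://example.com/a", "content": "Messi plays for Inter Miami.", "score": 0.98},
    {"url": "https://example.com/b", "content": "Inter Miami play at home in Fort Lauderdale.", "score": 0.91},
]


def build_context(results, max_chars=200):
    """Keep url/content pairs, highest score first, until the character budget is used up."""
    picked, used = [], 0
    for item in sorted(results, key=lambda r: r.get("score", 0.0), reverse=True):
        entry = {"url": item["url"], "content": item["content"]}
        size = len(json.dumps(entry))  # rough size of the entry once serialized
        if used + size > max_chars:
            break
        picked.append(entry)
        used += size
    return picked


demo_context = build_context(sample_results, max_chars=200)
print(demo_context)
```

A character budget is only a rough stand-in for a token budget; a tokenizer-based count would be closer to what the model actually sees.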

In [6]:
# You can easily get search result context based on any max tokens straight into your RAG.
# The response is a string of the context within the max_token limit.

response_context = tavily.get_search_context(query="Where does Messi play right now?", search_depth="advanced", max_tokens=500)
response_context

'"[\\"{\\\\\\"url\\\\\\": \\\\\\"https://www.sportingnews.com/us/soccer/news/where-lionel-messi-play-2024-inter-miami-cities-road-schedule/23334c5768cebee9021e71d0\\\\\\", \\\\\\"content\\\\\\": \\\\\\"Here is how Inter Miami\'s road schedule will look for the coming regular season:\\\\\\\\nInter Miami home stadium for 2024 MLS season\\\\\\\\nFor their home matches through the 2024 campaign, Inter Miami will once again play at\\\\\\\\u00a0DRV PNK Stadium in Fort Lauderdale, Florida.\\\\\\\\n Cities, stadiums Inter Miami visit on road MLS schedule for new season\\\\\\\\nWith Lionel Messi set to embark on his first full season with Inter Miami, fans across the United States will be clamoring to see when the Argentine superstar will visit their city in 2024.\\\\\\\\n MLS Season Pass is separate from Apple TV+, meaning those with Apple TV+ would still need an MLS Season Pass subscription to access the complete slate of games, while those without Apple TV+ can still sign up for MLS Season P

In [7]:
# You can also get a simple answer to a question including relevant sources all with a simple function call:
# You can use it as a baseline
response_qna = tavily.qna_search(query="Where does Messi play right now?")
response_qna

'Lionel Messi currently plays for Inter Miami in the MLS. He is expected to make his full MLS debut against Charlotte in Fort Lauderdale.'

In [8]:
import os
import getpass

os.environ['OPENAI_API_KEY'] = getpass.getpass()

In [10]:
# will use llama3 model with Ollama
local_llm = 'llama3'

In [11]:
### Index
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import GPT4AllEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]

docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)

# Add to vectorDB
vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma-2",
    embedding=GPT4AllEmbeddings(model_name="all-MiniLM-L6-v2.gguf2.f16.gguf"),
)
retriever = vectorstore.as_retriever()

USER_AGENT environment variable not set, consider setting it to identify your requests.
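
The warning above asks for a `USER_AGENT` environment variable. A minimal sketch of setting it is shown below, assuming the variable is read when the loader module is imported, so the assignment should run before the `WebBaseLoader` import. The value is a hypothetical placeholder.

```python
import os

# Set a User-Agent so that web requests made by the loader are identifiable.
# "day3-rag-notebook/0.1" is a hypothetical placeholder; any descriptive string works.
os.environ.setdefault("USER_AGENT", "day3-rag-notebook/0.1")

print(os.environ["USER_AGENT"])
```

`setdefault` leaves an existing value untouched, so a value exported in the shell still takes precedence.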


In [12]:
### Router
# Build and run a routing model that decides whether a question should be answered from the vector store or from web search
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate

# LLM
llm = ChatOllama(model=local_llm, format="json", temperature=0)

# An expert that routes the user's question to the vectorstore or to web search
# Use vectorstore: questions on LLM agents, prompt engineering, adversarial attacks. Keywords do not need to match strictly
# Use web_search: everything else
prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an expert at routing a
    user question to a vectorstore or web search. Use the vectorstore for questions on LLM  agents,
    prompt engineering, and adversarial attacks. You do not need to be stringent with the keywords
    in the question related to these topics. Otherwise, use web-search. Give a binary choice 'web_search'
    or 'vectorstore' based on the question. Return a JSON with a single key 'datasource' and
    no preamble or explanation. Question to route: {question} <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["question"],
)

question_router = prompt | llm | JsonOutputParser()
question = "llm agent memory"
docs = retriever.invoke(question)  # get_relevant_documents is deprecated in favor of invoke
doc_txt = docs[1].page_content
print(question_router.invoke({"question": question}))

  warn_deprecated(


{'datasource': 'vectorstore'}
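
The router returns whatever JSON the model produces, so a value other than the two expected strings is possible. A minimal sketch of normalizing that output with a fallback is shown below. The helper `pick_route`, the `VALID_ROUTES` table, and the choice of web search as the default are assumptions made for illustration, not part of the original notebook.

```python
# Map the router's 'datasource' values to the node names used later in the graph.
VALID_ROUTES = {"web_search": "websearch", "vectorstore": "vectorstore"}


def pick_route(source, default="websearch"):
    """Return a node name for the router output, falling back on anything unexpected."""
    if not isinstance(source, dict):
        return default
    value = str(source.get("datasource", "")).strip().lower()
    return VALID_ROUTES.get(value, default)


print(pick_route({"datasource": "vectorstore"}))
print(pick_route({"datasource": "something else"}))
```

Falling back to web search is one possible design choice; it keeps the graph moving when the model returns a malformed answer.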


In [13]:
### Retrieval Grader
# Build and run a grader that judges whether a document is relevant to a given question
# Here "relevance" is judged by whether the document contains keywords from the question
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate

# Model Initialization
# LLM
llm = ChatOllama(model=local_llm, format="json", temperature=0)

# PromptTemplate fills free-form text inputs into a fixed template
# This template has two input variables, 'question' and 'document'
# Both are supplied through the invoke method
prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a grader assessing relevance 
    of a retrieved document to a user question. If the document contains keywords related to the user question, 
    grade it as relevant. It does not need to be a stringent test. The goal is to filter out erroneous retrievals. \n
    Give a binary score 'yes' or 'no' to indicate whether the document is relevant to the question. \n
    Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.
     <|eot_id|><|start_header_id|>user<|end_header_id|>
    Here is the retrieved document: \n\n {document} \n\n
    Here is the user question: {question} \n <|eot_id|><|start_header_id|>assistant<|end_header_id|>
    """,
    input_variables=["question", "document"],
)

# Retrieval Grader Pipeline Creation
# Build the model pipeline
# Step 1: format the question and document with the template
# Step 2: pass the formatted prompt through the model
# Step 3: parse the output into a usable form
retrieval_grader = prompt | llm | JsonOutputParser()

question = "agent memory"
docs = retriever.invoke(question)
doc_txt = docs[1].page_content
print(retrieval_grader.invoke({"question": question, "document": doc_txt}))

{'score': 'yes'}


In [14]:
### Generate
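# Generate an answer to the user's question
# The answer is based on the content of the retrieved documents (context);
# if the model does not know the answer it should say so, and the answer is limited to three sentences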
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

# Prompt
prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an assistant for question-answering tasks. 
    Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. 
    Use three sentences maximum and keep the answer concise <|eot_id|><|start_header_id|>user<|end_header_id|>
    Question: {question} 
    Context: {context} 
    Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["question", "context"],
)

llm = ChatOllama(model=local_llm, temperature=0)

# Post-processing
# Join the page content of the retrieved documents into a single string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Chain
rag_chain = prompt | llm | StrOutputParser()

# Run
# Generate and print an answer to the question "agent memory"
question = "agent memory"
docs = retriever.invoke(question)
generation = rag_chain.invoke({"context": docs, "question": question})
print(generation)

The context mentions that the agent's memory is a long-term memory module (external database) that records a comprehensive list of agents' experience in natural language. It also discusses the reflection mechanism, which synthesizes memories into higher-level inferences over time and guides the agent's future behavior.
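
The cell above defines `format_docs` but passes `docs` to the chain directly, so the prompt receives the string form of the whole list, metadata included. A minimal sketch of what `format_docs` produces is shown below. `FakeDoc` is a hypothetical stand-in for `Document`, used only so the example runs without LangChain.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str


def format_docs(docs):
    # Same join as the helper defined in the cell above.
    return "\n\n".join(doc.page_content for doc in docs)


demo_docs = [
    FakeDoc("Short-term memory is in-context learning."),
    FakeDoc("Long-term memory uses an external vector store."),
]
context_text = format_docs(demo_docs)
print(context_text)
```

Passing `format_docs(docs)` as the `context` value would give the model only the text of each chunk, separated by blank lines.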


In [15]:
### Hallucination Grader
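# Evaluates whether the generated answer is grounded in the given set of facts
# Important for assessing fact-grounded generation
# The result indicates how well the model's output is supported by those facts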

# LLM
llm = ChatOllama(model=local_llm, format="json", temperature=0)

# Prompt
prompt = PromptTemplate(
    template=""" <|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a grader assessing whether 
    an answer is grounded in / supported by a set of facts. Give a binary 'yes' or 'no' score to indicate 
    whether the answer is grounded in / supported by a set of facts. Provide the binary score as a JSON with a 
    single key 'score' and no preamble or explanation. <|eot_id|><|start_header_id|>user<|end_header_id|>
    Here are the facts:
    \n ------- \n
    {documents} 
    \n ------- \n
    Here is the answer: {generation}  <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["generation", "documents"],
)

# The LLM processes the prompt, and JsonOutputParser parses the result
hallucination_grader = prompt | llm | JsonOutputParser()

# Pass 'generation' (the answer) and 'documents' (the set of facts) to the grader, which scores them
hallucination_grader.invoke({"documents": docs, "generation": generation})

{'score': 'yes'}

In [16]:
### Answer Grader
# Build and run a grader that judges whether the generated answer is useful for resolving the question
# This matters for judging whether the model's output actually answers the question

# LLM
llm = ChatOllama(model=local_llm, format="json", temperature=0)

# Prompt
prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a grader assessing whether an 
    answer is useful to resolve a question. Give a binary score 'yes' or 'no' to indicate whether the answer is 
    useful to resolve a question. Provide the binary score as a JSON with a single key 'score' and no preamble or explanation.
     <|eot_id|><|start_header_id|>user<|end_header_id|> Here is the answer:
    \n ------- \n
    {generation} 
    \n ------- \n
    Here is the question: {question} <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["generation", "question"],
)

answer_grader = prompt | llm | JsonOutputParser()

# The final output is a JSON object with a single "score" key
answer_grader.invoke({"question": question, "generation": generation})

{'score': 'yes'}
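
All three graders return a dict with a `score` key, but the model may vary the casing or add whitespace. A minimal sketch of a tolerant check is shown below. The helper `is_yes` is hypothetical and not part of the original notebook.

```python
def is_yes(score):
    """Return True only when a grader's JSON output carries score == 'yes' (case-insensitive)."""
    if not isinstance(score, dict):
        return False
    return str(score.get("score", "")).strip().lower() == "yes"


print(is_yes({"score": "yes"}))
print(is_yes({"score": "No"}))
```

Treating anything that is not clearly "yes" as a failure errs on the side of re-checking rather than accepting an unverified answer.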

In [24]:
from pprint import pprint
from typing import List

from langchain_core.documents import Document
from typing_extensions import TypedDict

from langgraph.graph import END, StateGraph

### State
class GraphState(TypedDict):
    """
    Represents the state of our graph, i.e. the schema passed to StateGraph.

    Attributes:
        question: question
        generation: LLM generation
        web_search: whether to add search
        documents: list of documents
        urls: list of url
    """

    question: str
    generation: str
    web_search: str
    documents: List[Document]
    urls: List[str]


### Nodes
# Node function definitions
# Functions that define the nodes of the state graph
def retrieve(state):
    """
    Retrieve documents from the vectorstore for the given question.
    The retrieved documents are stored under `documents` in the graph state.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, documents, that contains retrieved documents
    """
    print("---RETRIEVE---")
    question = state["question"]

    # Retrieval
    documents = retriever.invoke(question)
    print(documents)
    return {
        "documents": documents,
        "question": question,
    }


def generate(state):
    """
    Generate an answer using RAG on the retrieved documents.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """
    print("---GENERATE---")
    question = state["question"]
    documents = state["documents"]

    # RAG generation
    generation = rag_chain.invoke({"context": documents, "question": question})
    return {
        "documents": documents,
        "question": question,
        "generation": generation,
    }


def grade_documents(state):
    """
    Determines whether the retrieved documents are relevant to the question.
    If any document is not relevant, we will set a flag to run web search.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Filtered out irrelevant documents and updated web_search state
    """

    print("---CHECK DOCUMENT RELEVANCE TO QUESTION---")
    question = state["question"]
    documents = state["documents"]

    # Score each doc
    filtered_docs = []
    web_search = "No"
    for d in documents:
        score = retrieval_grader.invoke(
            {"question": question, "document": d.page_content}
        )
        grade = score["score"]
        # Document relevant
        if grade.lower() == "yes":
            print("---GRADE: DOCUMENT RELEVANT---")
            filtered_docs.append(d)
        # Document not relevant
        else:
            print("---GRADE: DOCUMENT NOT RELEVANT---")
            # We do not include the document in filtered_docs
            # We set a flag to indicate that we want to run web search
            web_search = "Yes"
            continue
    return {"documents": filtered_docs, "question": question, "web_search": web_search}


def web_search(state):
    """
    Run a web search for the question and fetch related documents.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended web results to documents
    """

    print("---WEB SEARCH---")
    question = state["question"]
    documents = state["documents"]

    # Web search
    # docs = web_search_tool.invoke({"query": question})
    docs = tavily.search(query=question)['results']
    urls = [d["url"] for d in docs]
    
    web_results = "\n".join([d["content"] for d in docs])
    web_results = Document(page_content=web_results)
    if documents is not None:
        documents.append(web_results)
    else:
        documents = [web_results]
    return {
        "documents": documents,
        "question": question,
        "urls": urls,
    }


### Conditional edge
def route_question(state):
    """
    Route question to web search or RAG.
    Decides whether the question goes to web search or the vectorstore and returns "websearch" or "vectorstore".

    Args:
        state (dict): The current graph state

    Returns:
        str: Next node to call
    """

    print("---ROUTE QUESTION---")
    question = state["question"]
    print(question)
    source = question_router.invoke({"question": question})
    print(source)
    print(source["datasource"])
    if source["datasource"] == "web_search":
        print("---ROUTE QUESTION TO WEB SEARCH---")
        return "websearch"
    elif source["datasource"] == "vectorstore":
        print("---ROUTE QUESTION TO RAG---")
        return "vectorstore"


def decide_to_generate(state):
    """
    Determines whether to generate an answer or to add web search first.

    Args:
        state (dict): The current graph state

    Returns:
        str: Binary decision for next node to call
    """

    print("---ASSESS GRADED DOCUMENTS---")
    web_search = state["web_search"]

    if web_search == "Yes":
        # At least one document was graded as not relevant
        # Supplement the remaining documents with a web search
        print(
            "---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, INCLUDE WEB SEARCH---"
        )
        return "websearch"
    else:
        # We have relevant documents, so generate answer
        print("---DECISION: GENERATE---")
        return "generate"


### Conditional edge
def grade_generation_v_documents_and_question(state):
    """
    Determines whether the generation is grounded in the documents and answers the question.

    Args:
        state (dict): The current graph state

    Returns:
        str: Decision for next node to call
    """

    print("---CHECK HALLUCINATIONS---")
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]

    score = hallucination_grader.invoke(
        {"documents": documents, "generation": generation}
    )
    grade = score["score"]

    # Check hallucination
    if grade == "yes":
        print("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---")
        # Check question-answering
        print("---GRADE GENERATION vs QUESTION---")
        score = answer_grader.invoke({"question": question, "generation": generation})
        grade = score["score"]
        if grade == "yes":
            print("---DECISION: GENERATION ADDRESSES QUESTION---")
            return "useful"
        else:
            print("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---")
            return "not useful"
    else:
        pprint("---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---")
        return "not supported"

# Create the state graph
workflow = StateGraph(GraphState)

# Define the nodes
# Add the node functions defined above to the graph
workflow.add_node("websearch", web_search)  # web search
workflow.add_node("retrieve", retrieve)  # retrieve
workflow.add_node("grade_documents", grade_documents)  # grade documents
workflow.add_node("generate", generate)  # generate

print(workflow.nodes)

{'websearch': websearch(recurse=True), 'retrieve': retrieve(recurse=True), 'grade_documents': grade_documents(recurse=True), 'generate': generate(recurse=True)}
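
Each node above returns only the keys it wants to change. A minimal sketch of how such partial updates combine is shown below, assuming plain overwrite semantics for state keys that have no custom reducer. `DemoState` and `apply_update` are hypothetical names for illustration and are not LangGraph APIs.

```python
from typing import List, TypedDict


class DemoState(TypedDict, total=False):
    # Hypothetical, trimmed-down version of GraphState for illustration.
    question: str
    generation: str
    web_search: str
    documents: List[str]


def apply_update(state, update):
    """Overwrite only the keys a node returned; keep every other key as it was."""
    merged = dict(state)
    merged.update(update)
    return merged


state = {"question": "agent memory"}
state = apply_update(state, {"documents": ["doc-1", "doc-2"]})  # like retrieve
state = apply_update(state, {"documents": ["doc-1"], "web_search": "Yes"})  # like grade_documents
print(state)
```

Under this reading, `grade_documents` replaces the `documents` list with the filtered one rather than appending to it.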


In [25]:
# Build graph

# Define the flow of the workflow state graph created above
# Set up the connections from one node to another
# This fixes the graph's entry point and the branching at each node

# Start with document retrieval
workflow.set_entry_point("retrieve")

# Add edges
# add_edge adds a direct connection between two nodes
# Connect the retrieve node to the grade_documents node
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,
    {
        "websearch": "websearch",
        "generate": "generate",
    },
)

# Connect the websearch node to the generate node
workflow.add_edge("websearch", "generate")
workflow.add_conditional_edges(
    "generate",
    grade_generation_v_documents_and_question,
    {
        "not supported": "generate",
        "useful": END,
        "not useful": "websearch",
    },
)
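
The edges above can be traced in plain Python to see which paths are possible. The sketch below replaces every LLM decision with a canned value. `trace_flow` and its `max_steps` cap are hypothetical and only mirror the edge table; note that "not supported" loops straight back to `generate`, so without a cap (or LangGraph's own recursion limit) a run could keep cycling.

```python
def trace_flow(web_search_flag, verdicts, max_steps=10):
    """Follow the edges defined above using canned decisions instead of LLM calls.

    web_search_flag: "Yes" or "No", as set by grade_documents.
    verdicts: outcomes of the generation check, consumed one per generate step.
    """
    path = ["retrieve", "grade_documents"]
    node = "websearch" if web_search_flag == "Yes" else "generate"
    remaining = list(verdicts)
    while len(path) < max_steps:
        path.append(node)
        if node == "websearch":
            node = "generate"  # websearch always feeds generate
            continue
        # node is "generate": apply the conditional edge
        verdict = remaining.pop(0) if remaining else "not supported"
        if verdict == "useful":
            path.append("END")
            return path
        node = "generate" if verdict == "not supported" else "websearch"
    path.append("STOPPED")  # cap reached before a useful answer
    return path


print(trace_flow("No", ["useful"]))
print(trace_flow("Yes", ["not useful", "useful"]))
print(trace_flow("No", [], max_steps=5))
```

The third call shows the cycle: with no "useful" verdict the trace stays on `generate` until the cap stops it.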

In [20]:
# Compile: the compile method turns the state graph (workflow) into a runnable application (app)
app = workflow.compile()

# Test
inputs = {"question": "What are the types of agent memory?"}
for output in app.stream(inputs):
    for key, value in output.items():
        pprint(f"Finished running: {key}:")
pprint(value["generation"])

---RETRIEVE---
[Document(page_content='They also discussed the risks, especially with illicit drugs and bioweapons. They developed a test set containing a list of known chemical weapon agents and asked the agent to synthesize them. 4 out of 11 requests (36%) were accepted to obtain a synthesis solution and the agent attempted to consult documentation to execute the procedure. 7 out of 11 were rejected and among these 7 rejected cases, 5 happened after a Web search while 2 were rejected based on prompt only.\nGenerative Agents Simulation#\nGenerative Agents (Park, et al. 2023) is super fun experiment where 25 virtual characters, each controlled by a LLM-powered agent, are living and interacting in a sandbox environment, inspired by The Sims. Generative agents create believable simulacra of human behavior for interactive applications.\nThe design of generative agents combines LLM with memory, planning and reflection mechanisms to enable agents to behave conditioned on past experience, as

KeyboardInterrupt: 

In [26]:
from pprint import pprint

# Compile
app = workflow.compile()
inputs = {"question": "Who are the Bears expected to draft first in the NFL draft?"}
for output in app.stream(inputs):
    for key, value in output.items():
        pprint(f"Finished running: {key}:")
pprint(value["generation"])

---RETRIEVE---
[Document(page_content='Zero-shot generation: This is to find a number of prompts that can trigger harmful output conditioned on a preset prompt.\nStochastic few-shot generation: The red team prompts found from the above step are then used as few-shot examples to generate more similar cases. Each zero-shot test case might be selected in few-shot examples with a probability $\\propto \\exp(r(\\mathbf{x}, \\mathbf{y}) / \\tau)$\nSupervised learning: The red team model can be fine-tuned on failing, zero-shot test cases. The training only runs lightly for one epoch to avoid overfitting and preserve sample diversity.', metadata={'description': 'The use of large language models in the real world has strongly accelerated by the launch of ChatGPT. We (including my team at OpenAI, shoutout to them) have invested a lot of effort to build default safe behavior into the model during the alignment process (e.g. via RLHF). However, adversarial attacks or jailbreak prompts could potent