# Assignment5 (Agent)

Challenge:
(EN)

1. In a new Jupyter notebook create a research AI agent and give it custom tools.
2. The agent should be able to do the following tasks:

- Search in Wikipedia
- Search in DuckDuckGo
- Scrape and extract the text of any website.
- Save the research to a .txt file

3. Run the agent with this query:
   "Research about the XZ backdoor". The agent should try to search in Wikipedia or DuckDuckGo. If it finds a website in DuckDuckGo, it should enter the website and extract its content. It should finish by saving the research to a .txt file.

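The "scrape and extract the text" task can be illustrated without any third-party dependency. The following is a minimal sketch, assuming only the Python standard library; `extract_text` and `_TextExtractor` are hypothetical names introduced here for illustration, and fetching the page over the network is deliberately left out. It is not the method the assignment prescribes, only one way to turn raw HTML into plain text.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML string, one text chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Made-up sample page used only to demonstrate the helper
sample_html = (
    "<html><head><title>Demo</title><style>p {color: red}</style></head>"
    "<body><h1>XZ backdoor</h1><script>var x = 1;</script>"
    "<p>Affected versions: 5.6.0 and 5.6.1.</p></body></html>"
)
print(extract_text(sample_html))
```

A tool built this way returns a plain string, which is generally easier for an agent to consume than a list of document objects.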


In [10]:
from typing import Type
from pydantic import BaseModel
from pydantic import Field
from langchain.chat_models import ChatOpenAI
from langchain.tools import BaseTool
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.utilities.duckduckgo_search import DuckDuckGoSearchAPIWrapper
from langchain.utilities.wikipedia import WikipediaAPIWrapper
from langchain.prompts import PromptTemplate
from langchain.document_loaders.web_base import WebBaseLoader
from langchain.schema.runnable import RunnablePassthrough
from datetime import datetime

llm = ChatOpenAI(
    temperature=0.1,
    model="gpt-4o-mini",
)


class WikipediaSearchTool(BaseTool):
    name = "WikipediaSearchTool"
    description = """
    Use this tool to search Wikipedia and get a summary of the pages matching the given query.
    """

    class WikipediaSearchToolArgsSchema(BaseModel):
        query: str = Field(
            description="The query you will search for. Example query: Research about the XZ backdoor",
        )

    args_schema: Type[WikipediaSearchToolArgsSchema] = WikipediaSearchToolArgsSchema

    def _run(self, query):
        w = WikipediaAPIWrapper()
        return w.run(query)


class DuckDuckGoSearchTool(BaseTool):
    name = "DuckDuckGoTool"
    description = """
    Use this tool to find the website for the given query.
    """

    class DuckDuckGoSearchToolArgsSchema(BaseModel):
        query: str = Field(
            description="The query you will search for. Example query: Research about the XZ backdoor",
        )

    args_schema: Type[DuckDuckGoSearchToolArgsSchema] = DuckDuckGoSearchToolArgsSchema

    def _run(self, query):
        try:
            ddg = DuckDuckGoSearchAPIWrapper()
            return ddg.run(query)
        except Exception as e:
            return f"Error occurred while searching DuckDuckGo: {str(e)}"


class LoadWebsiteTool(BaseTool):
    name = "LoadWebsiteTool"
    description = """
    Use this tool to load the website at the given url and extract its text.
    """

    class LoadWebsiteToolArgsSchema(BaseModel):
        url: str = Field(
            description="The url you will load. Example url: https://en.wikipedia.org/wiki/Backdoor_(computing)",
        )

    args_schema: Type[LoadWebsiteToolArgsSchema] = LoadWebsiteToolArgsSchema

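    def _run(self, url):
        # Load the page and return its text so the agent receives a plain string
        loader = WebBaseLoader([url])
        docs = loader.load()
        return "\n\n".join(doc.page_content for doc in docs)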


class SaveToFileTool(BaseTool):
    name = "SaveToFileTool"
    description = """
    Use this tool to save the text to a file.
    """

    class SaveToFileToolArgsSchema(BaseModel):
        text: str = Field(
            description="The text you will save to a file.",
        )
        file_path: str = Field(
            description="Path of the file to save the text to.",
        )

    args_schema: Type[SaveToFileToolArgsSchema] = SaveToFileToolArgsSchema

    def _run(self, text, file_path):
        # Prefix the file name with a timestamp so earlier research is not overwritten
        research_dt = datetime.now().strftime("%Y%m%d_%H%M%S")
        output_path = f"{research_dt}_{file_path}"
        with open(output_path, "w", encoding="utf-8") as f:
            f.write(text)
        return f"Text saved to {output_path}"


def agent_invoke(query):
    # Build an OpenAI function-calling agent with the four custom tools
    agent = initialize_agent(
        llm=llm,
        verbose=True,
        agent=AgentType.OPENAI_MULTI_FUNCTIONS,
        handle_parsing_errors=True,
        tools=[
            WikipediaSearchTool(),
            DuckDuckGoSearchTool(),
            LoadWebsiteTool(),
            SaveToFileTool(),
        ],
    )

    # Instructions for the agent; {query} is filled in with the research topic
    prompt = PromptTemplate.from_template(
        """
        - Search query in Wikipedia, DuckDuckGo
        - Scrape and extract the text of any website.
        - Save the research to a .txt file

        query: {query}
        """,
    )

    # Pass the raw query through the prompt and hand the result to the agent
    chain = {"query": RunnablePassthrough()} | prompt | agent
    return chain.invoke(query)


query = "Research about the XZ backdoor"

agent_invoke(query)
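
`SaveToFileTool` prepends the timestamp to the whole `file_path` string, so a path that contains a directory (for example `out/research.txt`) would end up with the stamp on the directory name instead of the file name. The following is a minimal sketch of one way to handle that, assuming only the standard library; `timestamped_path` and `save_text` are hypothetical helpers, not part of the notebook, and the fixed date in the demo is arbitrary.

```python
import os
import tempfile
from datetime import datetime


def timestamped_path(file_path, now=None):
    """Prefix the file name (not the directory) with a YYYYmmdd_HHMMSS stamp."""
    now = now or datetime.now()
    directory, file_name = os.path.split(file_path)
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return os.path.join(directory, f"{stamp}_{file_name}")


def save_text(text, file_path, now=None):
    """Write text to the timestamped path, creating the directory if needed."""
    target = timestamped_path(file_path, now)
    directory = os.path.dirname(target)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(target, "w", encoding="utf-8") as f:
        f.write(text)
    return target


# Demo in a temporary directory so nothing is left behind on disk
with tempfile.TemporaryDirectory() as tmp_dir:
    saved = save_text(
        "XZ backdoor notes",
        os.path.join(tmp_dir, "out", "research.txt"),
        now=datetime(2024, 3, 29, 12, 0, 0),  # arbitrary fixed time for a stable name
    )
    print(os.path.basename(saved))  # 20240329_120000_research.txt
```

Passing `now` explicitly keeps the helper deterministic, which makes it easy to check the generated name without depending on the clock.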



> Entering new AgentExecutor chain...

Invoking: `WikipediaSearchTool` with `{'query': 'Research about the XZ backdoor'}`


Page: XZ Utils backdoor
Summary: In February 2024, a malicious backdoor was introduced to the Linux utility xz within the liblzma library in versions 5.6.0 and 5.6.1 by an account using the name "Jia Tan". The backdoor gives an attacker who possesses a specific Ed448 private key remote code execution capabilities on the affected Linux system. The issue has been given the Common Vulnerabilities and Exposures number CVE-2024-3094 and has been assigned a CVSS score of 10.0, the highest possible score.
While xz is commonly present in most Linux distributions, at the time of discovery the backdoored version had not yet been widely deployed to production systems, but was present in development versions of major distributions. The backdoor was discovered by the software developer Andres Freund, who announced his findings on 29 Marc