## AutoGPT

* Implementation of https://github.com/Significant-Gravitas/Auto-GPT 
* With LangChain primitives (LLMs, PromptTemplates, VectorStores, Embeddings, Tools)

### Workflow

[Temporary] Ensure you have the correct LangChain version installed locally.
  
*  `export SERPAPI_API_KEY`
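
If you use the SerpAPI-based search, the key has to be present in the environment before the notebook starts. A minimal sketch of an early check follows; the helper name `require_env` and the variable `EXAMPLE_API_KEY` are hypothetical and not part of LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the notebook")
    return value


# Throwaway variable so the sketch runs anywhere; in the notebook you would
# call require_env("SERPAPI_API_KEY") instead.
os.environ["EXAMPLE_API_KEY"] = "dummy-value"
print(require_env("EXAMPLE_API_KEY"))
```

Failing at the top of the notebook is usually easier to debug than a missing-key error raised in the middle of an agent run.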

In [None]:
# !pip install playwright
# !playwright install  

In [None]:
# General 
from datetime import datetime
import os
import pandas as pd
from typing import Optional
import matplotlib.pyplot as plt
from langchain.experimental.autonomous_agents.autogpt.agent import AutoGPT
from langchain.chat_models import ChatOpenAI
from langchain.utilities import SerpAPIWrapper
from langchain.tools.human.tool import HumanInputRun
from langchain.tools.file_management.read import ReadFileTool
from langchain.tools.file_management.write import WriteFileTool
from langchain.tools.wikipedia.tool import WikipediaQueryRun
from langchain.utilities import WikipediaAPIWrapper

from langchain.document_loaders.url_selenium import SeleniumURLLoader
from langchain.agents.agent_toolkits.pandas.base import create_pandas_dataframe_agent

In [None]:
from langchain.document_loaders.url_playwright import PlaywrightURLLoader, Document

### Set up tools

* We'll set up an AutoGPT with a `search` tool, a `write-file` tool, and a `read-file` tool

Define any `tools` you want to use here

In [None]:
# Tools
from typing import Optional
from langchain.agents import Tool, tool
from langchain.utilities import SerpAPIWrapper
from langchain.chains.llm_requests import LLMRequestsChain
from langchain.tools.file_management.read import ReadFileTool
from langchain.tools.file_management.write import WriteFileTool
from langchain.tools.requests.tool import RequestsGetTool, TextRequestsWrapper

@tool
def process_csv(csv_file_path: str, instructions: str, output_path: Optional[str] = None) -> str:
    """Process a CSV with pandas in a limited REPL. Only use this after writing data to disk as a CSV file. Any figures must be saved to disk to be viewed by the human. Instructions should be written in natural language, not code. Assume the dataframe is already loaded."""
    try:
        df = pd.read_csv(csv_file_path)
    except Exception as e:
        return f"Error: {e}"
    agent = create_pandas_dataframe_agent(llm, df, max_iterations=30, verbose=True)
    if output_path is not None:
        instructions += f" Save output to disk at {output_path}"
    try:
        return agent.run(instructions)
    except Exception as e:
        return f"Error: {e}"
    
@tool
def show_image(image_path: str) -> str:
    """Show an image from disk"""
    try:
        img = plt.imread(image_path)
    except Exception as e:
        return f"Error: {e}"
    plt.imshow(img)
    return f"Showed image at {image_path}"

@tool
def current_time() -> str:
    """Show the current time"""
    # `datetime` is the class imported via `from datetime import datetime`
    return f"Current time: {datetime.now()}"

@tool
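def scrape_links(url: str) -> str:
    """Scrape links from a webpage."""
    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except Exception as e:
        return f"Error: {e}"
    soup = BeautifulSoup(response.text, "html.parser")

    # Drop script and style elements before collecting links
    for script in soup(["script", "style"]):
        script.extract()

    # Resolve relative hrefs against the page URL
    hyperlinks = [
        (a.get_text(strip=True), urljoin(url, a["href"]))
        for a in soup.find_all("a", href=True)
    ]

    return "\n".join(f"{text} ({link})" for text, link in hyperlinks)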


### Free DuckDuckGo (DDG) search

In [None]:
# !pip install duckduckgo_search

In [None]:
import json
from duckduckgo_search import ddg

In [None]:
@tool
def search(query: str, num_results: int = 8) -> str:
    """Useful for general internet search queries."""
    search_results = []
    if not query:
        return json.dumps(search_results)

    results = ddg(query, max_results=num_results)
    if not results:
        return json.dumps(search_results)

    for j in results:
        search_results.append(j)

    return json.dumps(search_results, ensure_ascii=False, indent=4)

### Define Web Browser Tool

This tool is optional, but it can give the model more information than the other tools because it returns the full text of a page.

In [None]:
# !pip install playwright
# !playwright install  >/dev/null
# !pip install bs4
# !pip install nest_asyncio

In [None]:
from langchain.docstore.document import Document
from langchain.chains.summarize import load_summarize_chain
from langchain.chains import RetrievalQA

# chain = load_summarize_chain(llm)

# @tool
# def summarize_text(text: str) -> str:
#     """Summarize the provided text. You have a limited memory, so summarization is useful."""
#     docs = [Document(page_content=text)]
#     return chain.run(docs).strip()

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import WebBaseLoader

text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just for demonstration.
    chunk_size=500,
    chunk_overlap=20,
    length_function=len,
)

loader = WebBaseLoader("https://beta.ruff.rs/docs/faq/")
docs = loader.load()
ruff_texts = text_splitter.split_documents(docs)
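
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that class tries separators before falling back to raw characters); the function `split_with_overlap` is a hypothetical illustration of the two parameters only.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters sharing chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the role the overlap plays when a sentence straddles a chunk boundary.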

In [None]:
async def async_load_playwright(url: str) -> str:
    """Load the specified URL using Playwright and parse it using BeautifulSoup."""
    from bs4 import BeautifulSoup
    from playwright.async_api import async_playwright

    results = ""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            page = await browser.new_page()
            await page.goto(url)

            page_source = await page.content()
            soup = BeautifulSoup(page_source, "html.parser")

            for script in soup(["script", "style"]):
                script.extract()

            text = soup.get_text()
            lines = (line.strip() for line in text.splitlines())
            chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
            results = "\n".join(chunk for chunk in chunks if chunk)
        except Exception as e:
            results = f"Error: {e}"
        await browser.close()
    return results

def run_async(coro):
    import asyncio
    import nest_asyncio
    nest_asyncio.apply()
    event_loop = asyncio.get_event_loop()
    return event_loop.run_until_complete(coro)

@tool
def browse_web_page(url: str) -> str:
    """Useful for browsing websites and scraping the text information."""
    return run_async(async_load_playwright(url))
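
The whitespace clean-up inside `async_load_playwright` can be tried offline on a plain string, without a browser. The sketch below reuses the same three expressions; the wrapper name `clean_text` and the sample input are made up for illustration.

```python
def clean_text(text: str) -> str:
    """Strip each line, split on double spaces, and drop empty fragments."""
    lines = (line.strip() for line in text.splitlines())
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    return "\n".join(chunk for chunk in chunks if chunk)


# Made-up page text with blank lines and runs of spaces
raw = "  Title  \n\n   First item    Second item \n\n\n Footer"
print(clean_text(raw))
```

Splitting on a double space puts phrases that were laid out side by side on separate lines, which keeps the text sent to the model compact.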



In [None]:
# !pip install wikipedia >/dev/null

In [None]:
wikipedia_tool = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

### Set up memory

* The memory here is used for the agent's intermediate steps

In [None]:
# Memory
import faiss
from langchain.vectorstores import FAISS
from langchain.docstore import InMemoryDocstore
from langchain.embeddings import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings()
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})
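
`faiss.IndexFlatL2` performs exact, brute-force nearest-neighbour search under squared Euclidean distance. The pure-Python sketch below shows the idea on toy vectors; the function `l2_search` is hypothetical and stands in for the FAISS lookup only conceptually.

```python
from typing import List, Tuple


def l2_search(index: List[List[float]], query: List[float], k: int) -> List[Tuple[int, float]]:
    """Return the k stored vectors closest to query as (position, squared distance) pairs."""
    scored = []
    for position, vector in enumerate(index):
        distance = sum((a - b) ** 2 for a, b in zip(vector, query))
        scored.append((position, distance))
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy 3-dimensional "embeddings"; the notebook uses an embedding size of 1536
stored = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
print(l2_search(stored, query=[0.9, 0.0, 0.0], k=2))
```

The `k` argument plays the same role as `search_kwargs={"k": ...}` on the retriever: it caps how many stored memories come back for each query.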

### Set up model and AutoGPT

`Model set-up`

In [None]:
from typing import Any
from langchain.schema import BaseRetriever, BaseLanguageModel
from functools import partial
from langchain.tools import BaseTool
from langchain.chains.retrieval_qa.base import BaseRetrievalQA
from langchain.chains.conversational_retrieval.base import ConversationalRetrievalChain

# def query_memory(query: str, retriever: Any, llm: BaseLanguageModel) -> str:
#     """Query your memories."""
#     # qa_chain = 
#     return qa_chain.run(
#         query
#     )


In [None]:
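llm = ChatOpenAI(model_name="gpt-4", temperature=1.0)  # defined here because the chain needs it; the agent cell creates the same model again
retrieval_qa_chain = ConversationalRetrievalChain.from_llm(llm=llm, retriever=vectorstore.as_retriever(search_kwargs={"k": 1}))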

In [None]:
class QueryMemoryTool(BaseTool):
    """Query your memories."""
    name = "query memory"
    description = "Query your memories of past actions and thoughts."
    retrieval_qa_chain: ConversationalRetrievalChain
    
    def _run(self, query: str) -> str:
        # The chain returns a dict; the tool should return only the answer text
        return self.retrieval_qa_chain({"question": query, "chat_history": []})["answer"]
    
    async def _arun(self, query: str) -> str:
        raise NotImplementedError


query_memory_tool = QueryMemoryTool(
    name = "query memory",
    description="Query your memories of past actions and thoughts.",
    retrieval_qa_chain=retrieval_qa_chain,
)

In [None]:
query_memory_tool("What do you remember?")

In [None]:
tools = [
    # Tool(
    #     name = "google search",
    #     func=search.run,
    #     description="Simple search method useful for answering simple questions about current events. You should ask targeted questions"
    # ),
    search,
    WriteFileTool(),
    ReadFileTool(),
    process_csv,
    show_image,
    RequestsGetTool(requests_wrapper=TextRequestsWrapper()),
    current_time,
    browse_web_page,
    wikipedia_tool,
    query_memory_tool
    # huggingface_image_generation,
]

In [None]:
llm = ChatOpenAI(model_name="gpt-4", temperature=1.0)
agent = AutoGPT.from_llm_and_tools(
    ai_name="Tom",
    ai_role="Assistant",
    tools=tools,
    llm=llm,
    memory=vectorstore.as_retriever(search_kwargs={"k": 8}),
    # human_in_the_loop=True, # Set to True if you want to add feedback at each step.
)
# agent.chain.verbose = True

### AutoGPT as a researcher / data munger

#### `inflation` and `college tuition`
 
Let's use AutoGPT as a researcher and data munger / cleaner.
  
I spent a lot of time over the years crawling data sources and cleaning data. 

Let's see if AutoGPT can do all of this for us!

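Here is the prompt comparing `inflation` and `college tuition`. It is the second, commented-out call below; the first call runs a different prompt (a marathon training plan), so uncomment the second one to reproduce the CSV used in the plot.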

In [None]:
agent.run(["Using the 'data/' folder as scratch, find a marathon in Northern California in July and come up with a training program for me as a table."])

In [None]:
# agent.run(["Using the 'data/' folder as scratch, get me the yearly % change in US college tuition and the yearly % change in US inflation (US CPI) every year since 1980."])

The command runs and writes output to the `data/` folder.
   
`cleaned_college_tuition_inflation_percent_change.csv` is written.

We write some simple code to plot this.

In [None]:
# Read
d = pd.read_csv("data/cleaned_college_tuition_inflation_percent_change.csv")
d.set_index("Year", inplace=True)
# Compute cumulative percent change
d['College Tuition % Change Cumulative'] = (1 + d['College Tuition % Change']).cumprod() * 100
d['Inflation % Change Cumulative'] = (1 + d['Inflation % Change']).cumprod() * 100
# Plot
d[['College Tuition % Change Cumulative','Inflation % Change Cumulative']].plot(color=['blue','green'])
plt.ylabel("Cumulative Percent Change")

Of course, we would want to inspect and verify the results.
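
One check worth making first is the unit of the `% Change` columns. The `(1 + x).cumprod()` pattern assumes fractional changes (0.05 for 5%); if the CSV stores percent values (5.0), they need to be divided by 100 before compounding. The sketch below uses made-up numbers, not the agent's output, and the function `cumulative_index` is hypothetical.

```python
from typing import List


def cumulative_index(changes: List[float], in_percent: bool) -> List[float]:
    """Compound yearly changes into an index that starts from a base of 100."""
    index = []
    level = 100.0
    for change in changes:
        rate = change / 100.0 if in_percent else change
        level *= 1.0 + rate
        index.append(level)
    return index


# Made-up yearly changes of 10% and 20%, expressed both ways
as_percent = cumulative_index([10.0, 20.0], in_percent=True)
as_fraction = cumulative_index([0.10, 0.20], in_percent=False)
print(as_percent)
print(as_fraction)
```

Both calls should produce roughly the same index (about 110, then about 132). Treating percent values as fractions would instead compound far too quickly, which is easy to spot in the plot.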