## AutoGPT

* Implementation of https://github.com/Significant-Gravitas/Auto-GPT 
* With LangChain primitives (LLMs, PromptTemplates, VectorStores, Embeddings, Tools)

### Workflow

[Temporary] Ensure you have the correct LangChain version installed locally
  
*  `export SERPAPI_API_KEY`
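
The tools and models in this notebook read their API keys from the environment. As a minimal sketch, the check below reports which keys are missing before any tool is built. `SERPAPI_API_KEY` comes from the step above; `OPENAI_API_KEY` is included on the assumption that the OpenAI chat model and embeddings used later read their key from that variable, and `missing_keys` is a hypothetical helper name.

```python
import os

# Keys this notebook is assumed to need; OPENAI_API_KEY is an assumption
# based on the OpenAI chat model and embeddings used later on.
REQUIRED_KEYS = ["SERPAPI_API_KEY", "OPENAI_API_KEY"]


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys(REQUIRED_KEYS)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required keys are set.")
```

Running this before the setup cells surfaces a missing key early, rather than partway through an agent run.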

In [None]:
# !pip install playwright
# !playwright install  

In [None]:
# General 
import os
import pandas as pd
from typing import Optional
import matplotlib.pyplot as plt
from langchain.experimental.autonomous_agents.autogpt.agent import AutoGPT
from langchain.chat_models import ChatOpenAI
from langchain.utilities import SerpAPIWrapper
from langchain.tools.human.tool import HumanInputRun
from langchain.tools.file_management.read import ReadFileTool
from langchain.tools.file_management.write import WriteFileTool
from langchain.document_loaders.url_selenium import SeleniumURLLoader
from langchain.agents.agent_toolkits.pandas.base import create_pandas_dataframe_agent

In [None]:
from langchain.document_loaders.url_playwright import PlaywrightURLLoader, Document

### Set up tools

* We'll set up an AutoGPT with a `search` tool, a `write-file` tool, and a `read-file` tool, plus a few custom tools defined below

Define any `tools` you want to use here

In [None]:
# Tools
from typing import Optional
from langchain.agents import Tool, tool
from langchain.utilities import SerpAPIWrapper
from langchain.chains.llm_requests import LLMRequestsChain
from langchain.tools.file_management.read import ReadFileTool
from langchain.tools.file_management.write import WriteFileTool
from langchain.tools.requests.tool import RequestsGetTool, TextRequestsWrapper

@tool
def process_csv(csv_file_path: str, instructions: str, output_path: Optional[str] = None) -> str:
    """Process a CSV with pandas in a limited REPL. Only use this after writing data to disk as a csv file. Any figures must be saved to disk to be viewed by the human. Instructions should be written in natural language, not code. Assume the dataframe is already loaded."""
    try:
        df = pd.read_csv(csv_file_path)
    except Exception as e:
        return f"Error: {e}"
    # `llm` is the module-level ChatOpenAI instance defined in a later cell;
    # it must exist by the time the agent calls this tool.
    agent = create_pandas_dataframe_agent(llm, df, max_iterations=30, verbose=True)
    if output_path is not None:
        instructions += f" Save output to disk at {output_path}"
    try:
        return agent.run(instructions)
    except Exception as e:
        return f"Error: {e}"
    
@tool
def show_image(image_path: str) -> str:
    """Show an image from disk"""
    try:
        img = plt.imread(image_path)
    except Exception as e:
        return f"Error: {e}"
    plt.imshow(img)
    return f"Showed image at {image_path}"

import datetime  # needed by current_time; not imported elsewhere in the notebook

@tool
def current_time() -> str:
    """Show the current time"""
    return f"Current time: {datetime.datetime.now()}"

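@tool
def scrape_links(url: str) -> str:
    """Scrape links from a webpage."""
    # The original cell called get_response, extract_hyperlinks and
    # format_hyperlinks, which are not defined in this notebook. The inline
    # logic below is a stand-in built on requests and BeautifulSoup.
    import requests
    from urllib.parse import urljoin
    from bs4 import BeautifulSoup

    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except Exception as e:
        return f"Error: {e}"
    soup = BeautifulSoup(response.text, "html.parser")

    # Drop script and style elements before collecting links.
    for script in soup(["script", "style"]):
        script.extract()

    # Resolve each href against the page URL and format it as "text (url)".
    hyperlinks = [
        (a.get_text(strip=True), urljoin(url, a["href"]))
        for a in soup.find_all("a", href=True)
    ]
    return "\n".join(f"{text} ({link})" for text, link in hyperlinks)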



### Define Web Browser Tool

This tool is optional, but it can give the model more information than the other tools.

In [None]:
# !pip install playwright
# !playwright install  >/dev/null
# !pip install bs4
# !pip install nest_asyncio

In [None]:
async def async_load_playwright(url: str) -> str:
    """Load the specified URL using Playwright and parse it with BeautifulSoup."""
    from bs4 import BeautifulSoup
    from playwright.async_api import async_playwright

    results = ""
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            page = await browser.new_page()
            await page.goto(url)

            page_source = await page.content()
            soup = BeautifulSoup(page_source, "html.parser")

            for script in soup(["script", "style"]):
                script.extract()

            text = soup.get_text()
            lines = (line.strip() for line in text.splitlines())
            chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
            results = "\n".join(chunk for chunk in chunks if chunk)
        except Exception as e:
            results = f"Error: {e}"
        await browser.close()
    return results

def run_async(coro):
    import asyncio
    import nest_asyncio
    nest_asyncio.apply()
    event_loop = asyncio.get_event_loop()
    return event_loop.run_until_complete(coro)

@tool
def browse_web_page(url: str) -> str:
    """Useful for browsing websites and scraping the text information."""
    return run_async(async_load_playwright(url))
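
The text-cleaning step inside `async_load_playwright` can be tried on its own, without a browser. The sketch below copies that logic into a hypothetical helper named `clean_text` and applies it to a made-up string: each line is stripped, split on double spaces, and empty chunks are dropped.

```python
def clean_text(text: str) -> str:
    """Collapse extracted page text into one non-empty chunk per line."""
    lines = (line.strip() for line in text.splitlines())
    # Runs of two or more spaces are treated as separators between phrases.
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    return "\n".join(chunk for chunk in chunks if chunk)


# Made-up input standing in for the output of soup.get_text().
sample = "  Title   Subtitle \n\n   Body text  \n"
print(clean_text(sample))
```

Here the blank line disappears and `Title` and `Subtitle` land on separate lines, because they are separated by more than one space.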

### Set up memory

* The memory here is used for the agent's intermediate steps

In [None]:
# Memory
import faiss
from langchain.vectorstores import FAISS
from langchain.docstore import InMemoryDocstore
from langchain.embeddings import OpenAIEmbeddings

embeddings_model = OpenAIEmbeddings()
embedding_size = 1536
index = faiss.IndexFlatL2(embedding_size)
vectorstore = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})
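
`faiss.IndexFlatL2` performs an exact, brute-force nearest-neighbour search by L2 distance. The pure-Python sketch below illustrates that idea on toy 2-dimensional vectors. It is not how FAISS is implemented, and `flat_l2_search` is a hypothetical name; the real index in this notebook holds 1536-dimensional embeddings.

```python
from typing import List, Sequence


def flat_l2_search(
    vectors: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[int]:
    """Return the indices of the k stored vectors closest to `query`."""
    # Squared L2 distance preserves the ranking, so no square root is needed.
    scored = [
        (sum((a - b) ** 2 for a, b in zip(vector, query)), position)
        for position, vector in enumerate(vectors)
    ]
    return [position for _, position in sorted(scored)[:k]]


# Toy vectors for illustration only.
stored = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
print(flat_l2_search(stored, [0.9, 1.0], k=2))  # prints [1, 0]
```

The retriever built from the vector store does the same kind of lookup, with the query vector produced by the embedding model.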

### Set up model and AutoGPT

`Model set-up`

In [None]:
search = SerpAPIWrapper()
tools = [
    Tool(
        name="search",
        func=search.run,
        description="Useful for when you need to answer questions about current events. You should ask targeted questions"
    ),
    WriteFileTool(),
    ReadFileTool(),
    process_csv,
    show_image,
    RequestsGetTool(requests_wrapper=TextRequestsWrapper()),
    current_time,
    browse_web_page,
    # huggingface_image_generation,
]
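
Each entry in `tools` carries a name, and the agent picks a tool by that name when the model proposes a command. The sketch below is a simplified illustration of name-based dispatch using plain functions. It is an assumption about the general pattern, not LangChain's actual code, and `dispatch`, `fake_search`, and `fake_time` are hypothetical.

```python
from typing import Callable, Dict


def fake_search(query: str) -> str:
    """Stand-in for a search tool; returns a canned string."""
    return f"results for {query!r}"


def fake_time(_: str = "") -> str:
    """Stand-in for a clock tool with a fixed answer."""
    return "Current time: (fixed for the example)"


def dispatch(registry: Dict[str, Callable[[str], str]], name: str, arg: str) -> str:
    """Run the tool registered under `name`, or report an unknown tool."""
    tool_fn = registry.get(name)
    if tool_fn is None:
        available = ", ".join(sorted(registry))
        return f"Error: unknown tool {name!r}. Available: {available}"
    try:
        return tool_fn(arg)
    except Exception as e:  # like the notebook's tools, return errors as text
        return f"Error: {e}"


registry = {"search": fake_search, "current_time": fake_time}
print(dispatch(registry, "search", "US CPI 1980"))
print(dispatch(registry, "plot", ""))
```

Returning errors as strings, as the notebook's own tools do, lets the model read the failure and try a different command instead of stopping the run.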

In [None]:
llm = ChatOpenAI(model_name="gpt-4", temperature=0.3)
agent = AutoGPT.from_llm_and_tools(
    ai_name="Tom",
    ai_role="Assistant",
    tools=tools,
    llm=llm,
    memory=vectorstore.as_retriever(),
    # human_in_the_loop=True, # Set to True if you want to add feedback at each step.
)
agent.chain.verbose = True

### AutoGPT as a researcher / data munger

#### `inflation` and `college tuition`
 
Let's use AutoGPT as a researcher and data munger / cleaner.

I have spent a lot of time over the years crawling data sources and cleaning data.

Let's see if AutoGPT can do all of this for us!

Here is the prompt comparing `inflation` and `college tuition`.

In [None]:
agent.run(["Using the 'data/' folder as scratch, get me the yearly % change in US college tuition and the yearly % change in US inflation (US CPI) every year since 1980."])

The command runs and writes `cleaned_college_tuition_inflation_percent_change.csv` to the `data/` folder.

We write some simple code to plot this.

In [None]:
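# Read
d = pd.read_csv("data/cleaned_college_tuition_inflation_percent_change.csv")
d.set_index("Year", inplace=True)
# Compute a cumulative index (base 100) by compounding the yearly changes.
# Note: this assumes the columns hold fractional changes (0.05 for 5%);
# if they hold percentages, divide by 100 before compounding.
d['College Tuition % Change Cumulative'] = (1 + d['College Tuition % Change']).cumprod() * 100
d['Inflation % Change Cumulative'] = (1 + d['Inflation % Change']).cumprod() * 100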
# Plot
d[['College Tuition % Change Cumulative','Inflation % Change Cumulative']].plot(color=['blue','green'])
plt.ylabel("Cumulative Percent Change")
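
The cumulative columns compound the yearly changes into an index that starts from a base of 100. Here is a small worked sketch, assuming the changes are stored as fractions (0.10 means 10%). The numbers are made up for illustration, and `cumulative_index` is a hypothetical helper that mirrors `(1 + x).cumprod() * 100`.

```python
from itertools import accumulate
from operator import mul


def cumulative_index(yearly_changes, base=100.0):
    """Compound fractional yearly changes into an index relative to `base`."""
    growth_factors = [1.0 + change for change in yearly_changes]
    return [base * factor for factor in accumulate(growth_factors, mul)]


# Two made-up years of 10% growth compound to roughly 110 and then 121.
print([round(value, 2) for value in cumulative_index([0.10, 0.10])])
```

If the CSV stored percentages such as 10 instead of 0.10, the same formula would multiply by 11 each year, so the units are worth checking before trusting the plot.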

Of course, we would want to inspect and verify the results.
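
As one possible starting point for that inspection, the sketch below checks a yearly table for non-numeric values and for gaps or duplicates in the year sequence. The column names match the plotting code above. The sample rows are made up, and `check_yearly_table` is a hypothetical helper, not part of the notebook.

```python
import csv
import io


def check_yearly_table(rows, value_columns, year_column="Year"):
    """Return a list of human-readable problems found in the table."""
    problems = []
    years = []
    for row in rows:
        try:
            years.append(int(row[year_column]))
        except (KeyError, TypeError, ValueError):
            problems.append(f"bad year: {row.get(year_column)!r}")
            continue
        for column in value_columns:
            try:
                float(row[column])
            except (KeyError, TypeError, ValueError):
                problems.append(f"{years[-1]}: non-numeric {column!r}")
    # Every year between the first and last should appear exactly once.
    if years:
        expected = list(range(min(years), max(years) + 1))
        if sorted(years) != expected:
            problems.append("years are not consecutive or contain duplicates")
    return problems


# Made-up rows for illustration; 1982 is missing and one value is blank.
sample = io.StringIO(
    "Year,College Tuition % Change,Inflation % Change\n"
    "1980,0.09,0.13\n"
    "1981,,0.10\n"
    "1983,0.10,0.03\n"
)
columns = ["College Tuition % Change", "Inflation % Change"]
for problem in check_yearly_table(csv.DictReader(sample), columns):
    print(problem)
```

Structural checks like these do not confirm that the numbers are correct; comparing a few years against the original source is still needed.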

### AutoGPT as a data scientist (research, cleaning, visualization)

#### `hey jude` 

Let's try end-to-end data retrieval and plotting.

* See my full notes here: https://rlancemartin.notion.site/Auto-GPT-Notes-9481cbd0cb364580bf97d5f41144e5cf
* Writing files works
* However, reading does not seem to work
* Even if I simply ask it to read from the saved file, it gets stuck and appears to loop

`agent.run(["Read the file hey_jude_lyrics.txt and plot the frequency of the top 5 words."])`

In [None]:
llm = ChatOpenAI(model_name="gpt-4", temperature=0.3)
agent = AutoGPT.from_llm_and_tools(
    ai_name="Tom",
    ai_role="Look up song lyrics",
    tools=tools,
    llm=llm,
    memory=vectorstore.as_retriever()
)
agent.chain.verbose = True

agent.run(["Get the lyrics to the Beatles song 'Hey Jude' and plot the frequency of the top 5 words."])

In [None]:
llm = ChatOpenAI(model_name="gpt-4", temperature=0.3)
agent = AutoGPT.from_llm_and_tools(
    ai_name="Tom",
    ai_role="Look up song lyrics",
    tools=tools,
    llm=llm,
    memory=vectorstore.as_retriever()
)
agent.chain.verbose = True

agent.run(["Read the file hey_jude_lyrics.txt and plot the frequency of the top 5 words."])
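
To check the agent's result independently, here is a minimal sketch of the word count the prompt asks for, using only the standard library. It runs on a placeholder string rather than the real lyrics, `top_words` is a hypothetical helper, and the simple regex tokenizer is an assumption about what counts as a word.

```python
import re
from collections import Counter


def top_words(text: str, n: int = 5):
    """Return the n most common lowercase words in `text` with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Placeholder text, not the actual lyrics.
sample = "la la la hey hey you"
print(top_words(sample, n=2))  # prints [('la', 3), ('hey', 2)]
```

To apply it to the agent's output, read `hey_jude_lyrics.txt` (assuming the agent saved it under that name) and pass its contents to `top_words`.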