# WebSurfer Agent with Headless GUI-based Browsing

This notebook is derived from the standard [WebSurferAgent Notebook](https://github.com/microsoft/autogen/blob/main/notebook/agentchat_surfer.ipynb). Its purpose is to demonstrate that the same agent workflow is covered when browsing with a headless, GUI-based browser.

# Notebook Setup

## Requirements

AutoGen requires `Python>=3.8`. To run this notebook example, please install:
```bash
pip install pyautogen selenium markdownify pillow pdfminer.six beautifulsoup4 arxiv
```
or
```bash
pip install "pyautogen[websurfer]"
```

## Ensure that we have the WebDrivers present for Selenium

*EDIT*:
[Selenium Manager](https://www.selenium.dev/documentation/selenium_manager/) states:
"Selenium Manager is a command-line tool implemented in Rust that provides automated driver and browser management for Selenium. Selenium bindings use this tool by default, so you do not need to download it or add anything to your code or do anything else to use it."

Therefore, the following instructions should no longer be needed:

~~Following the instructions in the [Selenium Documentation](https://www.selenium.dev/documentation/webdriver/troubleshooting/errors/driver_location/#download-the-driver), we first download the web driver for our browser of choice, or all three: [Edge](https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/?form=MA13LH#downloads), [Firefox](https://github.com/mozilla/geckodriver/releases), [Chrome](https://chromedriver.chromium.org/downloads).~~
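
If you are curious whether a driver is already installed locally, a minimal sketch like the one below can check the `PATH`. The helper name `find_local_drivers` and the `DRIVER_BINARIES` mapping are illustrative assumptions, not part of AutoGen or Selenium. A missing driver is not an error here, since Selenium Manager is expected to handle that case.

```python
import shutil

# Executable names the vendors publish their drivers under (assumed to be
# unchanged if a driver was installed manually).
DRIVER_BINARIES = {
    "edge": "msedgedriver",
    "firefox": "geckodriver",
    "chrome": "chromedriver",
}


def find_local_drivers():
    """Map each browser name to the driver path found on PATH, or None."""
    return {browser: shutil.which(binary) for browser, binary in DRIVER_BINARIES.items()}


if __name__ == "__main__":
    for browser, path in find_local_drivers().items():
        status = path if path else "not on PATH (Selenium Manager should fetch it)"
        print(f"{browser}: {status}")
```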

In [3]:
# %%capture --no-stderr
import os
import logging
import autogen
from time import sleep

from autogen.agentchat.contrib.web_surfer import WebSurferAgent
from autogen.agentchat.conversable_agent import ConversableAgent
from autogen.agentchat.user_proxy_agent import UserProxyAgent
from autogen.oai import config_list_from_json
from autogen.browser_utils import display_binary_image

# Get the logger instance for the current module (__name__).
logger = logging.getLogger(__name__)



## Set your API Endpoint

The [`config_list_from_json`](https://microsoft.github.io/autogen/docs/reference/oai/openai_utils#config_list_from_json) function loads a list of configurations from an environment variable or a JSON file.

It first looks for the environment variable `OAI_CONFIG_LIST`, which must contain a valid JSON string. If that variable is not found, it looks for a JSON file named `OAI_CONFIG_LIST`. It then filters the configs by model (you can filter by other keys as well).

The WebSurferAgent uses a combination of models. GPT-4 and GPT-3.5-turbo-16k are recommended. Note that the cells below filter for `gpt-3.5-turbo` only, which is presumably why the summarizer later warns that its token limit (4096) is below the recommended 16k.

Your JSON config should look something like the following:
```json
[
    {
        "model": "gpt-4",
        "api_key": "<your OpenAI API key here>"
    },
    {
        "model": "gpt-3.5-turbo-16k",
        "api_key": "<your OpenAI API key here>"
    }
]
```
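
The lookup order and filtering described above can be sketched with the standard library alone. This is a simplified illustration, not AutoGen's implementation: the function name `load_config_list` is hypothetical, and the exact-match filtering is an assumption of the sketch.

```python
import json
import os


def load_config_list(env_or_file, filter_dict=None):
    """Simplified sketch: env var first, then a file of the same name, then filter."""
    raw = os.environ.get(env_or_file)
    if raw is None:
        # Fall back to a JSON file with the same name in the working directory.
        with open(env_or_file, encoding="utf-8") as f:
            raw = f.read()
    configs = json.loads(raw)
    if not filter_dict:
        return configs
    # Keep a config only if, for every filter key, its value is one of the accepted values.
    return [
        cfg for cfg in configs
        if all(cfg.get(key) in accepted for key, accepted in filter_dict.items())
    ]


if __name__ == "__main__":
    os.environ["OAI_CONFIG_LIST"] = json.dumps([
        {"model": "gpt-4", "api_key": "<your OpenAI API key here>"},
        {"model": "gpt-3.5-turbo-16k", "api_key": "<your OpenAI API key here>"},
    ])
    selected = load_config_list("OAI_CONFIG_LIST", filter_dict={"model": ["gpt-3.5-turbo-16k"]})
    print([cfg["model"] for cfg in selected])
```

Because the sketch matches model names exactly, filtering for `gpt-3.5-turbo` against the example JSON above would return an empty list. Make sure the model names in your filter appear verbatim in your config.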

If you open this notebook in Colab, you can upload your files by clicking the file icon on the left panel and then choosing the "upload file" icon.


In [5]:
llm_config = {
    "timeout": 600,
    "cache_seed": 44,  # change the seed for different trials
    "config_list": config_list_from_json(
        "OAI_CONFIG_LIST",
        filter_dict={"model": ["gpt-3.5-turbo"]},
    ),
    "temperature": 0,
}

summarizer_llm_config = {
    "timeout": 600,
    "cache_seed": 44,  # change the seed for different trials
    "config_list": config_list_from_json(
        "OAI_CONFIG_LIST",
        filter_dict={"model": ["gpt-3.5-turbo"]},
    ),
    "temperature": 0,
}

## Configure Bing

For WebSurferAgent to be reasonably useful, it needs to be able to search the web, which means it needs a Bing API key.
You can read more about how to get an API key on the [Bing Web Search API](https://www.microsoft.com/en-us/bing/apis/bing-web-search-api) page.

Once you have your key, either set it as the `BING_API_KEY` system environment variable, or simply input your key below.

In [6]:
bing_api_key = os.environ.get("BING_API_KEY", "")
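
An empty key is easy to miss, so it can help to resolve it in one place and report when it is absent. The sketch below is one way to do that. The helper `resolve_bing_api_key` and its `fallback` parameter are hypothetical names introduced for illustration.

```python
import os


def resolve_bing_api_key(environ=None, fallback=""):
    """Return the key from BING_API_KEY if set and non-empty, else the fallback."""
    environ = os.environ if environ is None else environ
    key = environ.get("BING_API_KEY", "").strip()
    return key or fallback


if __name__ == "__main__":
    key = resolve_bing_api_key()
    if not key:
        print("BING_API_KEY is not set; web search is unlikely to work.")
```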

## Construct Agents

We now create our WebSurferAgent and a UserProxyAgent to surf the web, this time using the graphical browser that many use cases require.

In [7]:
web_surfer = WebSurferAgent(
    "web_surfer",
    llm_config=llm_config,
    summarizer_llm_config=summarizer_llm_config,
    browser_config={
        "type": "selenium",  # *NEW* Here we specify that we intend to use our headless GUI browser. The default setting is "text".
        "browser": "edge",  # *NEW* We'll use the edge browser for these tests.  Choices include 'edge', 'firefox', and 'chrome'
        "resolution": (1400, 900),  # *NEW* we specify the browser window size.  The default is (1920,5200)
        "render_text": False,  # *NEW* We still have the option to convert the output to text and render it in the browser
        "bing_api_key": bing_api_key,
    },
)

user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
    default_auto_reply="",
    is_termination_msg=lambda x: True,
)

# Notebook Content

### Part 1: Search, summarize
- Search for information about Microsoft AutoGen
- Summarize the results
- Visit the Getting Started Docs page

In [9]:
# Bing search is a special case and we return the text in the same way as the SimpleTextBrowser

task1 = """
Search the web for information about Microsoft AutoGen
"""

user_proxy.initiate_chat(web_surfer, message=task1)

# Note that these results are also accessible in JSON format with `web_surfer.browser.bing_results`

user_proxy (to web_surfer):


Search the web for information about Microsoft AutoGen


--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...



>>>>>>>> EXECUTING FUNCTION informational_web_search...
web_surfer (to user_proxy):

Address: bing: Microsoft AutoGen
Title: Microsoft AutoGen - Search
Viewport position: Showing page 1 of 1.
A Bing search for 'Microsoft AutoGen' found 8 results:

## Web Results
1. [AutoGen: Enabling next-generation large language model applications](https://www.microsoft.com/en-us/research/blog/autogen-enabling-next-generation-large-language-model-applications/)
AutoGen is a Python package that simplifies the orchestration, optimization, and automation of large language model applications. It enables customizable and conversable agents that integrate with humans, tools, and other agents to solve tasks using GPT-4 and other advanced LLMs. Learn how to use AutoGen for code-based question answering, supply-chain optimization, conversational chess, and more.

2. [GitHub - microsoft/autogen: Enable Next-Gen Large Language Model ...](https://github.com/microsoft/autogen)
AutoGen is a fram



In [10]:
task2 = "Summarize these results"
user_proxy.initiate_chat(web_surfer, message=task2, clear_history=False)

user_proxy (to web_surfer):

Summarize these results

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...



>>>>>>>> EXECUTING FUNCTION summarize_page...


The token limit (4096) of the WebSurferAgent.summarizer_llm_config, is below the recommended 16k.


web_surfer (to user_proxy):

AutoGen is a framework developed by Microsoft Research to simplify the orchestration, optimization, and automation of large language model (LLM) workflows. The framework offers customizable and conversable agents that utilize advanced LLM capabilities, such as GPT-4, while also integrating with humans and tools to address limitations and enhance performance. As developers create more complex LLM-based applications, the workflows become intricate, requiring significant effort and expertise to design and implement. Automating these workflows using AutoGen can streamline the process and improve efficiency, enabling the creation of next-generation applications that leverage the full potential of LLMs. The framework supports conversations between multiple agents through automated chat, providing a solution to the challenge of orchestrating optimal workflows in a vast and complex design space.

------------------------------------------------------------



In [11]:
task3 = "Click the 'Getting Started' result"
user_proxy.initiate_chat(web_surfer, message=task3, clear_history=False)

user_proxy (to web_surfer):

Click the 'Getting Started' result

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...



>>>>>>>> EXECUTING FUNCTION visit_page...
web_surfer (to user_proxy):

Address: https://microsoft.github.io/autogen/docs/Getting-Started/
Title: Getting Started | AutoGen
Viewport position: Showing page 1 of 1.



Getting Started | AutoGen







[Skip to main content](#__docusaurus_skipToContent_fallback)[![AutoGen](/autogen/img/ag.svg)**AutoGen**](/autogen/)[Docs](/autogen/docs/Getting-Started)[SDK](/autogen/docs/reference/agentchat/conversable_agent)[Blog](/autogen/blog)[FAQ](/autogen/docs/FAQ)[Examples](/autogen/docs/Examples)[Resources](#)* [Ecosystem](/autogen/docs/Ecosystem)
* [Gallery](/autogen/docs/Gallery)
[Other Languages](#)* [Dotnet](https://microsoft.github.io/autogen-for-net/)
[GitHub](https://github.com/microsoft/autogen)`⌘``K`* [Getting Started](/autogen/docs/Getting-Started)
* [Installation](/autogen/docs/installation/)
* [LLM Configuration](/autogen/docs/llm_configuration)
* [Use Cases](#)
* [Contributing](/autogen/docs/Contribute)
* [Research](/au



### Part 2: Let's look at the actual page rendered

In [None]:
display_binary_image(web_surfer.browser.driver.get_screenshot_as_png())

### Let's scroll down and look again

In [None]:
task5 = """Scroll down."""
user_proxy.initiate_chat(web_surfer, message=task5, clear_history=False)

# We give it a few seconds before viewing the browser
sleep(3)
display_binary_image(web_surfer.browser.driver.get_screenshot_as_png())
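
A fixed `sleep(3)` is simple but either wastes time or is too short on a slow page. As an alternative, here is a minimal polling sketch. The `wait_until` helper is a hypothetical name, and the condition to poll for is an assumption you would need to choose for your page (for example, that `window.scrollY` has changed).

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.25):
    """Poll predicate() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Stand-in condition for illustration only. With the Selenium driver used in
    # this notebook, one might instead poll something like
    # web_surfer.browser.driver.execute_script("return window.scrollY") > 0.
    start = time.monotonic()
    print(wait_until(lambda: time.monotonic() - start > 0.5, timeout=2.0))
```

The helper returns `False` on timeout rather than raising, so the caller can decide whether to take the screenshot anyway.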

### Let's test our navigation using the rendered page
Note: this does require vision capabilities

In [15]:
task6 = "Click the 'research studies' link"
user_proxy.initiate_chat(web_surfer, message=task6, clear_history=False)

user_proxy (to web_surfer):

Click the 'research studies' link

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...

>>>>>>>> EXECUTING FUNCTION visit_page...
web_surfer (to user_proxy):

Address: https://microsoft.github.io/autogen/docs/Research
Title: Research | AutoGen
Viewport position: Showing page 1 of 1.



Research | AutoGen







[Skip to main content](#__docusaurus_skipToContent_fallback)[![AutoGen](/autogen/img/ag.svg)**AutoGen**](/autogen/)[Docs](/autogen/docs/Getting-Started)[SDK](/autogen/docs/reference/agentchat/conversable_agent)[Blog](/autogen/blog)[FAQ](/autogen/docs/FAQ)[Examples](/autogen/docs/Examples)[Resources](#)* [Ecosystem](/autogen/docs/Ecosystem)
* [Gallery](/autogen/docs/Gallery)
[Other Languages](#)* [Dotnet](https://microsoft.github.io/autogen-for-net/)
[GitHub](https://github.com/microsoft/autogen)`⌘``K`* [Getting Started](/autogen/docs/Getting-Started)
* [Ins



### Show us the results of that action

In [None]:
display_binary_image(web_surfer.browser.driver.get_screenshot_as_png())

### Amazing! Agent navigation on the web still works with the full desktop browser, which is great news!
### And we can still display the text on screen if our use case benefits from that

In [None]:
display_binary_image(web_surfer.browser.driver.get_screenshot_as_png())

## Cleanup process
To ensure that no processes are left lingering in the background, we can shut down the browser.

In [19]:
# Gracefully shut down our headless desktop browser
web_surfer.close_the_browser()