## Interface Agents

Interface agents accomplish tasks by interacting with a user interface. They are useful for building systems around tasks that are not easily accessible through an API.

This notebook demonstrates how to use the `InterfaceAgent` package to complete entire tasks by interacting with a user interface. The package is built on top of the `Playwright` library, which provides a high-level API for interacting with web pages.





In [1]:
from interfaceagent import WebBrowser, Planner, OpenAIPlannerModel 

In [2]:
browser = WebBrowser(start_url="http://google.com/", headless=False)
model = OpenAIPlannerModel(model="gpt-4o-mini-2024-07-18")
task = "What is the website for the Manning Book - Multi-Agent Systems with AutoGen. Navigate to the book website and find the author of the book." 
planner = Planner(model=model, web_browser=browser, task=task)
result = await planner.run(task=task)

2024-09-05 21:34:27.376 | INFO     | interfaceagent.interface.planner:run:254 - WebBrowser not initialized. Initializing now.
2024-09-05 21:34:29.950 | INFO     | interfaceagent.interface.webbrowser:initialize:39 - WebBrowser successfully initialized.
2024-09-05 21:34:30.781 | INFO     | interfaceagent.interface.planner:generate_plan:58 - High-level plan: ["Search for 'Multi-Agent Systems with AutoGen Manning Book' on Google", 'Look for the official Manning Publications website link in the search results', "Navigate to the Manning book page for 'Multi-Agent Systems with AutoGen'", "Locate the author's name on the book page"]
2024-09-05 21:34:32.042 | INFO     | interfaceagent.interface.webbrowser:get_interactive_elements:167 - Total interactive elements found: 20
2024-09-05 21

In [12]:
print(result)

{'task': 'What is the website for the Manning Book - Multi-Agent Systems with AutoGen. Navigate to the book website and find the author of the book.', 'page_content': {'content': 'Accessibility Links\nSkip to main content\nAccessibility help\nAccessibility feedback\nSign in\nFilters and Topics\nAll\nShopping\nImages\nVideos\nForums\nBooks\nWeb\nMore\nTools\nSearch Results\n\nMulti-Agent Systems with AutoGen\nManning\nhttps://www.manning.com › books › multi-agent-syste...\nMulti-Agent Systems with AutoGen teaches you how to build collaborative teams of AI agents that can tackle tasks far beyond the capabilities of the standard\xa0...\n$23.99 · \u200e30-day returns\n\nAnnouncing A New Manning Book — Multi-Agent Systems ...\nMedium\xa0·\xa0Victor Dibia\n70+ likes · 1 month ago\nIn this book, you\'ll learn about: Core components of multi-agent systems and their implementation using tools like AutoGen and AutoGen Studio\xa0...\nVideos\n4:01\nAutomate What Once Was Impossible to Automate\nYo
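The printed dictionary is long because `page_content['content']` holds the full text of the last page visited. A minimal sketch for getting a compact view, assuming only the shape visible in the output above (a `task` string and a nested `page_content` dict). The helper name `summarize` and its `max_len` parameter are hypothetical and not part of `interfaceagent`.

```python
def summarize(value, max_len=80):
    """Return a copy of a nested dict/list with long strings truncated."""
    if isinstance(value, dict):
        return {key: summarize(item, max_len) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [summarize(item, max_len) for item in value]
    if isinstance(value, str) and len(value) > max_len:
        return value[:max_len] + "..."
    return value


# Hypothetical stand-in with the same shape as the `result` printed above.
sample_result = {
    "task": "Find the author of the book.",
    "page_content": {"content": "Accessibility Links\n" * 50},
}
print(summarize(sample_result, max_len=40))
```

In the notebook, `print(summarize(result))` would be the equivalent call on the real result.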

In [3]:
import base64
from IPython.display import HTML

def display_image(file_path):
    # Read the image file
    with open(file_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode()
    
    # Create the HTML to display the image
    html = f'<img src="data:image/png;base64,{encoded_string}" />'
    
    # Display the HTML
    return HTML(html)

# Usage
display_image('screenshot.png')

## Manually Stepping Through the Task

Instead of running the whole task in a single `planner.run` call, we can step through it manually:

- Initialize a browser object 
- Use the planner to plan the next steps to take
- Manually execute each step and view responses 
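The steps above can be sketched as a small loop that runs one action at a time and stops when the caller declines to continue. This is only an illustration: `step_through`, `confirm`, and `fake_execute` are hypothetical names, and the stand-in executor replaces the real browser so the sketch runs on its own. With the real objects from this notebook, `planner.execute_action` would be passed as `execute`.

```python
import asyncio


async def step_through(actions, execute, confirm=lambda action: True):
    """Run actions one at a time, asking `confirm` before each.

    `execute` is any async callable that takes one action dict.
    Returns the list of actions that were actually executed.
    """
    executed = []
    for action in actions:
        if not confirm(action):
            break  # stop at the first action the caller rejects
        await execute(action)
        executed.append(action)
    return executed


# Stand-in executor so the sketch runs without a browser.
async def fake_execute(action):
    print(f"executing {action['action']} on {action['selector']}")


demo_actions = [
    {"action": "type", "selector": "#sb_form_q", "value": "Multi-Agent Systems with AutoGen"},
    {"action": "press", "selector": "#sb_form_q", "value": "Enter"},
]
# In a plain script use asyncio.run; inside a notebook cell, use
# `await step_through(...)` instead, since an event loop is already running.
done = asyncio.run(step_through(demo_actions, fake_execute))
```

Passing a `confirm` function that inspects each action (or prompts with `input`) gives a simple approval gate before anything touches the page.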

In [9]:
browser = WebBrowser(start_url="http://bing.com/", headless=False)
model = OpenAIPlannerModel(model="gpt-4o-mini-2024-07-18")
task = "What is the website for the Manning Book - Multi-Agent Systems with AutoGen" 

planner = Planner(model=model, web_browser=browser, task=task)

next_actions = await planner.next_actions() 
print(next_actions) 

2024-09-05 20:02:14.994 | INFO     | interfaceagent.interface.planner:next_actions:70 - WebBrowser not initialized. Initializing now.
2024-09-05 20:02:16.373 | INFO     | interfaceagent.interface.webbrowser:initialize:39 - WebBrowser successfully initialized.
2024-09-05 20:02:17.197 | INFO     | interfaceagent.interface.webbrowser:get_interactive_elements:167 - Total interactive elements found: 10
2024-09-05 20:02:18.438 | INFO     | interfaceagent.interface.planner:next_actions:106 - Next actions: [{'action': 'type', 'selector': '#sb_form_q', 'selector_type': 'css_selector', 'value': 'Multi-Agent Systems with AutoGen', 'url': ''}, {'action': 'press', 'selector': '#sb_form_q', 'selector_type': 'css_selector', 'value': 'Enter', 'url': ''}]


[{'action': 'type', 'selector': '#sb_form_q', 'selector_type': 'css_selector', 'value': 'Multi-Agent Systems with AutoGen', 'url': ''}, {'action': 'press', 'selector': '#sb_form_q', 'selector_type': 'css_selector', 'value': 'Enter', 'url': ''}]
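Each planned action is a plain dictionary, so it can be inspected before it is executed. A minimal sketch that turns the two actions printed above into short readable lines. The helper name `describe_action` is hypothetical, and only the `type` and `press` action kinds are taken from the output; any other kind falls through to a generic description.

```python
def describe_action(action):
    """Turn one planner action dict into a short human-readable line."""
    kind = action.get("action", "unknown")
    selector = action.get("selector", "")
    value = action.get("value", "")
    if kind == "type":
        return f"type {value!r} into {selector}"
    if kind == "press":
        return f"press {value!r} on {selector}"
    # Generic description for action kinds not seen in the output above.
    return f"{kind} {selector}".strip()


# The two actions copied from the planner output above.
actions = [
    {"action": "type", "selector": "#sb_form_q", "selector_type": "css_selector",
     "value": "Multi-Agent Systems with AutoGen", "url": ""},
    {"action": "press", "selector": "#sb_form_q", "selector_type": "css_selector",
     "value": "Enter", "url": ""},
]
for number, action in enumerate(actions, start=1):
    print(f"{number}. {describe_action(action)}")
```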


In [10]:
await planner.execute_action(next_actions[0]) 
await browser.screenshot("screenshot.png")
display_image('screenshot.png')


2024-09-05 20:02:28.435 | INFO     | interfaceagent.interface.planner:execute_action:152 - Executing: action='type' selector='#sb_form_q' value='Multi-Agent Systems with AutoGen'


In [11]:
await planner.execute_action(next_actions[0])
await planner.execute_action(next_actions[1])
# Take the screenshot after pressing Enter so it reflects the result of both actions
await browser.screenshot("screenshot.png")
display_image('screenshot.png')



2024-09-05 20:02:35.214 | INFO     | interfaceagent.interface.planner:execute_action:152 - Executing: action='type' selector='#sb_form_q' value='Multi-Agent Systems with AutoGen'
2024-09-05 20:02:35.390 | INFO     | interfaceagent.interface.planner:execute_action:152 - Executing: action='press' selector='#sb_form_q' value='Enter'


In [8]:
await browser.close()

2024-09-05 20:02:11.624 | INFO     | interfaceagent.interface.webbrowser:close:317 - WebBrowser successfully closed and resources cleaned up.
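When stepping through actions by hand, an exception in one cell can leave the browser open. One way to guard against that, sketched here under the assumption that the browser object exposes an awaitable `close()` as in the cell above, is a small async context manager. The names `closing` and `FakeBrowser` are hypothetical, and the stand-in class is used so the sketch runs without a real browser.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def closing(resource):
    """Yield `resource` and always await its `close()` on exit."""
    try:
        yield resource
    finally:
        await resource.close()


class FakeBrowser:
    """Stand-in with an async `close()` method."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


async def demo():
    browser = FakeBrowser()
    try:
        async with closing(browser):
            raise RuntimeError("step failed")  # simulate a failing action
    except RuntimeError:
        pass
    return browser.closed


# In a notebook cell, `await demo()` replaces asyncio.run(demo()).
closed_after_error = asyncio.run(demo())
```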
