<a href="https://colab.research.google.com/github/singhsrj/Google-Collab-Notebooks/blob/main/docs/docs/integrations/tools/playwright.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PlayWright Browser Toolkit

>[Playwright](https://github.com/microsoft/playwright) is an open-source automation tool developed by `Microsoft` that allows you to programmatically control and automate web browsers. It is designed for end-to-end testing, scraping, and automating tasks across various web browsers such as `Chromium`, `Firefox`, and `WebKit`.

This toolkit is used to interact with the browser. While other tools (like the `Requests` tools) are fine for static sites, `PlayWright Browser` toolkits let your agent navigate the web and interact with dynamically rendered sites.

Some tools bundled within the `PlayWright Browser` toolkit include:

- `NavigateTool` (navigate_browser) - navigate to a URL
- `NavigateBackTool` (previous_webpage) - navigate back to the previous page in the browser history
- `ClickTool` (click_element) - click on an element (specified by selector)
- `ExtractTextTool` (extract_text) - use beautiful soup to extract text from the current web page
- `ExtractHyperlinksTool` (extract_hyperlinks) - use beautiful soup to extract hyperlinks from the current web page
- `GetElementsTool` (get_elements) - select elements by CSS selector
- `CurrentWebPageTool` (current_webpage) - get the current page URL
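
To make the extraction tools in the list above more concrete, here is a minimal sketch of the kind of work a hyperlink extractor does: collect the `href` of every anchor tag and resolve it against the page URL. It uses Python's standard-library `html.parser` as a stand-in for Beautiful Soup, and the `LinkCollector` class, the sample HTML, and the base URL are all hypothetical. The toolkit's actual implementation may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect absolute URLs from the href of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry hyperlinks we care about here.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


# Made-up page content and URL, used only for illustration.
sample_html = (
    '<p><a href="/world">World</a> '
    '<a href="https://example.org/about">About</a> '
    "<a>no link</a></p>"
)
collector = LinkCollector("https://example.com/news/")
collector.feed(sample_html)
links = collector.links
print(links)
```

Anchors without an `href` are skipped, and relative links come back as absolute URLs so an agent can pass them straight to a navigation step.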


In [1]:
%pip install --upgrade --quiet  playwright > /dev/null
%pip install --upgrade --quiet  lxml

# If this is your first time using playwright, you'll have to install a browser executable.
# Running `playwright install` by default installs a chromium browser executable.
# playwright install

In [3]:
!pip install langchain_community
from langchain_community.agent_toolkits import PlayWrightBrowserToolkit


Async function to create context and launch browser:

In [4]:
from langchain_community.tools.playwright.utils import (
    create_async_playwright_browser,  # A synchronous browser is available, though it isn't compatible with Jupyter.
)

In [5]:
# This import is required only for Jupyter notebooks, since they run their own event loop
import nest_asyncio

nest_asyncio.apply()
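
The reason for the patch above can be seen with the standard library alone. The sketch below assumes nothing about Playwright: it starts one event loop and then tries to start a second one from inside it, which is roughly the situation in a notebook cell, where a loop is already running. The `inner` and `outer` coroutines are hypothetical names used only for this illustration.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    # We are already inside a running event loop here, much like code in a Jupyter cell.
    coro = inner()
    try:
        asyncio.run(coro)  # Attempts to start a second, nested loop.
    except RuntimeError as exc:
        coro.close()  # Avoid a "coroutine was never awaited" warning.
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```

Without `nest_asyncio`, the nested call is rejected with a `RuntimeError`. Applying `nest_asyncio` patches the loop so that such re-entrant calls are allowed.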

In [8]:
!playwright install


In [10]:
# Shared libraries such as libwoff2dec, libgstgl, libavif, and libharfbuzz-icu are
# system packages, not shell commands, so `!<library> install` fails with "command not found".
# Playwright's own dependency installer pulls in the host libraries its browsers need.
!playwright install-deps


## Instantiating a Browser Toolkit

We recommend instantiating the toolkit with the `from_browser` method, so that every tool is created against the same, already initialized browser instance.

In [11]:
async_browser = create_async_playwright_browser()
toolkit = PlayWrightBrowserToolkit.from_browser(async_browser=async_browser)
tools = toolkit.get_tools()
tools

[ClickTool(async_browser=<Browser type=<BrowserType name=chromium executable_path=/root/.cache/ms-playwright/chromium-1161/chrome-linux/chrome> version=134.0.6998.35>),
 NavigateTool(async_browser=<Browser type=<BrowserType name=chromium executable_path=/root/.cache/ms-playwright/chromium-1161/chrome-linux/chrome> version=134.0.6998.35>),
 NavigateBackTool(async_browser=<Browser type=<BrowserType name=chromium executable_path=/root/.cache/ms-playwright/chromium-1161/chrome-linux/chrome> version=134.0.6998.35>),
 ExtractTextTool(async_browser=<Browser type=<BrowserType name=chromium executable_path=/root/.cache/ms-playwright/chromium-1161/chrome-linux/chrome> version=134.0.6998.35>),
 ExtractHyperlinksTool(async_browser=<Browser type=<BrowserType name=chromium executable_path=/root/.cache/ms-playwright/chromium-1161/chrome-linux/chrome> version=134.0.6998.35>),
 GetElementsTool(async_browser=<Browser type=<BrowserType name=chromium executable_path=/root/.cache/ms-playwright/chromium-116

In [12]:
tools_by_name = {tool.name: tool for tool in tools}
navigate_tool = tools_by_name["navigate_browser"]
get_elements_tool = tools_by_name["get_elements"]

In [15]:
await navigate_tool.arun(
    {
        "url": "https://web.archive.org/web/20230428133211/https://cnn.com/world",
        "timeout": 60000  # Increased timeout to 60 seconds
    }
)
# You may need to pass `timeout` as an additional parameter to `arun` that will be subsequently passed to `page.goto()`
await navigate_tool.arun(
    {
        "url": "https://web.archive.org/web/20230428133211/https://cnn.com/world"
    },
    timeout=60000
)

'Navigating to https://web.archive.org/web/20230428133211/https://cnn.com/world returned status code 200'

In [16]:
# The browser is shared across tools, so the agent can interact in a stateful manner
await get_elements_tool.arun(
    {"selector": ".container__headline", "attributes": ["innerText"]}
)



In [17]:
# If the agent wants to remember the current webpage, it can use the `current_webpage` tool
await tools_by_name["current_webpage"].arun({})

'https://web.archive.org/web/20230428133211/https://cnn.com/world'

## Use within an Agent

Several of the browser tools are `StructuredTool`s, meaning they expect multiple arguments. These aren't compatible (out of the box) with agents older than `STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION`.
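
As a rough illustration of why multi-argument tools need a structured agent, the sketch below uses a plain Python function as a hypothetical stand-in for a tool that takes a selector and a list of attributes. It is not LangChain code. It only shows that a dictionary of named inputs maps cleanly onto several parameters, while a single string cannot fill them all.

```python
def get_elements(selector: str, attributes: list[str]) -> str:
    """Hypothetical stand-in for a tool that needs two arguments."""
    return f"selector={selector!r}, attributes={attributes!r}"


# A structured agent can emit a dictionary with one entry per argument.
structured_input = {"selector": ".container__headline", "attributes": ["innerText"]}
structured_result = get_elements(**structured_input)

# An agent limited to a single string has only one value to hand over.
try:
    get_elements(".container__headline")
    single_string_error = None
except TypeError as exc:
    single_string_error = str(exc)

print(structured_result)
print(single_string_error)
```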

In [18]:
!pip install langchain_groq



In [19]:
import os

from langchain.agents import AgentType, initialize_agent
from langchain_groq import ChatGroq

# Read the Groq API key from the environment instead of hard-coding a secret in the notebook.
groq_api_key = os.environ["GROQ_API_KEY"]

llm = ChatGroq(model="Gemma2-9b-It", groq_api_key=groq_api_key)

agent_chain = initialize_agent(
    tools,
    llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)



In [21]:
result = await agent_chain.arun("What are the headers on langchain.com?")
print(result)



> Entering new AgentExecutor chain...
Action:
```json
{
  "action": "extract_text",
  "action_input": ""
}
```
Observation: World news - breaking news, video, headlines and opinion | CNN The Wayback Machine - https://web.archive.org/web/20230428133211/https://www.cnn.com/world CNN values your feedback 1. How relevant is this ad to you? 2. Did you encounter any technical issues? No Video player was slow to load content Video content never loaded Ad froze or did not finish loading Video content did not start after ad Audio on ad was too loud Other issues Ad never loaded Ad prevented/slowed the page from loading Content moved around while ad loaded Ad was repetitive to ads I've seen previously Other issues Cancel Submit Thank You! Your effort and contribution in providing this feedback is much
Thought:Action:
```json
{
  "action": "extract_text",
  "action_input": ""
}
```
Observation: World news - breaking news, video
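
The action blobs in the trace above are plain JSON inside a fenced block, so they are easy to inspect by hand. The following is a minimal sketch of pulling the tool name and input out of such a blob using only `re` and `json`. The `parse_action` helper and its regular expression are hypothetical and are not LangChain's own output parser.

```python
import json
import re

FENCE = "`" * 3  # Three backticks, built here so this example stays readable.

# A reply shaped like the ones in the trace above.
llm_output = "\n".join(
    [
        "Action:",
        FENCE + "json",
        "{",
        '  "action": "extract_text",',
        '  "action_input": ""',
        "}",
        FENCE,
    ]
)

# Match a JSON object between an opening fence (optionally tagged json) and a closing fence.
ACTION_PATTERN = re.compile(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, re.DOTALL)


def parse_action(text: str) -> tuple[str, object]:
    """Hypothetical helper: return (tool name, tool input) from a fenced JSON blob."""
    match = ACTION_PATTERN.search(text)
    if match is None:
        raise ValueError("no action blob found")
    blob = json.loads(match.group(1))
    return blob["action"], blob.get("action_input", {})


action, action_input = parse_action(llm_output)
print(action, repr(action_input))
```

Here the agent chose `extract_text` with an empty input, which matches the trace: it read the text of the page that was already open rather than navigating anywhere first.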