# Playwright Agentic Tutorial

This notebook demonstrates how to create an AI agent that can autonomously search Google and summarize web content using Microsoft Semantic Kernel and Playwright.

## What You'll Learn
- Create browser automation functions as Semantic Kernel plugins
- Build an AI agent that can control a web browser
- Implement automated Google search and content summarization
- Handle real-world web automation scenarios

## 1. Package Installation

First, we'll install the NuGet packages used in this tutorial:

- Microsoft.SemanticKernel: Provides the kernel, plugins, and the OpenAI chat completion connector.
- Microsoft.SemanticKernel.Agents.Core: Provides `ChatCompletionAgent` and `ChatHistoryAgentThread`, used to build and run the research agent. The `*-*` version specifier loads the latest version, including prereleases.
- Microsoft.Playwright: Enables browser automation for web interactions.
- Microsoft.Extensions.DependencyInjection: Dependency injection container.
- Microsoft.Extensions.Hosting: Hosting and application lifetime management.
- Microsoft.Extensions.Logging: Logging abstractions for tracking and debugging application behavior.

The three Microsoft.Extensions packages are imported in the next cell but are not otherwise used by the code in this notebook.


In [1]:
#r "nuget: Microsoft.SemanticKernel"
#r "nuget: Microsoft.SemanticKernel.Agents.Core, *-*"
#r "nuget: Microsoft.Playwright"
#r "nuget: Microsoft.Extensions.DependencyInjection"
#r "nuget: Microsoft.Extensions.Hosting"
#r "nuget: Microsoft.Extensions.Logging"

## 2. Imports and Configuration

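Import the required namespaces. The `#pragma` directive suppresses the SKEXP0110 diagnostic, which Semantic Kernel reports because its agent APIs are marked experimental.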

In [12]:
#pragma warning disable SKEXP0110 // Agents are experimental

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.Agents;
using Microsoft.Playwright;
using System;
using System.ComponentModel;
using System.Text.Json;

## 3. Playwright Setup

Download the Chromium browser that Playwright drives. The `install` command skips the download if Chromium is already present.

In [3]:
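// Run the Playwright CLI to download Chromium if it is not already installed;
// Main returns the CLI's exit code, where non-zero means the install failed
var installExitCode = Microsoft.Playwright.Program.Main(new[] { "install", "chromium" });
if (installExitCode != 0) throw new InvalidOperationException($"Playwright install failed with exit code {installExitCode}");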

Console.WriteLine("✅ Playwright initialized successfully");

✅ Playwright initialized successfully


## 4. Browser Plugin Definition

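Create a plugin that exposes browser automation to our AI agent. Each method marked with `[KernelFunction]` becomes a function the model can call, and its `[Description]` text is sent to the model to help it decide when to call it. The methods catch exceptions and return error messages as strings, so the model receives a readable description of any step that failed.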

In [4]:
public class BrowserPlugin : IDisposable
{
    private IPlaywright? _playwright;
    private IBrowser? _browser;
    private IPage? _page;

    [KernelFunction(nameof(LaunchBrowser))]
    [Description("Launch a web browser for automation")]
    public async Task<string> LaunchBrowser()
    {
        try
        {
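            // Reuse the existing session if the agent calls this again, e.g. on a follow-up question
            if (_page != null) return "Browser already launched";

            _playwright = await Playwright.CreateAsync();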
            _browser = await _playwright.Chromium.LaunchAsync(new BrowserTypeLaunchOptions
            {
                Headless = false // Set to true for headless mode
            });
            _page = await _browser.NewPageAsync();
            
            return "Browser launched successfully";
        }
        catch (Exception ex)
        {
            return $"Failed to launch browser: {ex.Message}";
        }
    }

    [KernelFunction(nameof(NavigateToGoogle))]
    [Description("Navigate to Google search page")]
    public async Task<string> NavigateToGoogle()
    {
        try
        {
            if (_page == null) return "Browser not initialized. Call LaunchBrowser first.";
            
            await _page.GotoAsync("https://www.google.com");
            await _page.WaitForLoadStateAsync(LoadState.NetworkIdle);
            
            return "Successfully navigated to Google";
        }
        catch (Exception ex)
        {
            return $"Failed to navigate to Google: {ex.Message}";
        }
    }

    [KernelFunction(nameof(SearchFor))]
    [Description("Perform a search on Google")]
    public async Task<string> SearchFor(string query)
    {
        try
        {
            if (_page == null) return "Browser not initialized. Call LaunchBrowser first.";
            
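            // Find search box and enter query. Google currently uses a <textarea name="q">;
            // a consent page or CAPTCHA shown to automated browsers can make this selector time out.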
            await _page.FillAsync("textarea[name='q']", query);
            await _page.PressAsync("textarea[name='q']", "Enter");
            
            // Wait for search results
            await _page.WaitForSelectorAsync("#search");
            
            return $"Search completed for: {query}";
        }
        catch (Exception ex)
        {
            return $"Failed to search: {ex.Message}";
        }
    }

    [KernelFunction(nameof(GetSearchResults))]
    [Description("Extract search result titles and URLs from Google search results")]
    public async Task<string> GetSearchResults()
    {
        try
        {
            if (_page == null) return "Browser not initialized. Call LaunchBrowser first.";
            
            // Extract search results
            var results = await _page.EvaluateAsync<object[]>(@"
                () => {
                    const results = [];
                    const searchResults = document.querySelectorAll('div[data-ved] h3');
                    
                    for (let i = 0; i < Math.min(5, searchResults.length); i++) {
                        const element = searchResults[i];
                        const link = element.closest('a');
                        if (link) {
                            results.push({
                                title: element.textContent,
                                url: link.href
                            });
                        }
                    }
                    
                    return results;
                }
            ");
            
            return JsonSerializer.Serialize(results, new JsonSerializerOptions { WriteIndented = true });
        }
        catch (Exception ex)
        {
            return $"Failed to extract search results: {ex.Message}";
        }
    }

    [KernelFunction(nameof(GetPageContent))]
    [Description("Navigate to a URL and extract the main content of the page")]
    public async Task<string> GetPageContent(string url)
    {
        try
        {
            if (_page == null) return "Browser not initialized. Call LaunchBrowser first.";
            
            await _page.GotoAsync(url);
            await _page.WaitForLoadStateAsync(LoadState.NetworkIdle);
            
            // Extract main content (trying common content selectors)
            var content = await _page.EvaluateAsync<string>(@"
                () => {
                    // Try various content selectors
                    const selectors = ['article', 'main', '.content', '.post-content', '.entry-content', 'body'];
                    
                    for (const selector of selectors) {
                        const element = document.querySelector(selector);
                        if (element && element.textContent.trim().length > 100) {
                            return element.textContent.trim().substring(0, 2000); // Limit to 2000 chars
                        }
                    }
                    
                    return document.body.textContent.trim().substring(0, 2000);
                }
            ");
            
            return content ?? "No content found";
        }
        catch (Exception ex)
        {
            return $"Failed to get page content: {ex.Message}";
        }
    }

    public void Dispose()
    {
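        // GetAwaiter().GetResult() rethrows the original exception instead of wrapping it in an AggregateException
        _page?.CloseAsync().GetAwaiter().GetResult();
        _browser?.CloseAsync().GetAwaiter().GetResult();
        _playwright?.Dispose();
        _page = null;
        _browser = null;
        _playwright = null;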
    }
}

Console.WriteLine("✅ BrowserPlugin defined successfully");

✅ BrowserPlugin defined successfully
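`GetPageContent` returns raw `textContent`, which typically contains long runs of spaces and newlines left over from the page's HTML layout. The sketch below shows one way to tidy that text before it reaches the model. `NormalizePageText` is a hypothetical helper, not part of Playwright or Semantic Kernel, and it assumes the extracted text arrives as a plain string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not a Playwright or Semantic Kernel API): collapses whitespace in
// extracted page text and caps its length before the text is returned to the model.
string NormalizePageText(string? raw, int maxChars = 2000)
{
    // Mirror GetPageContent's fallback message for empty pages
    if (string.IsNullOrWhiteSpace(raw)) return "No content found";

    // Replace every run of spaces, tabs, and newlines with a single space
    string collapsed = Regex.Replace(raw, @"\s+", " ").Trim();

    // Truncate after collapsing so the character budget is spent on words, not layout whitespace
    return collapsed.Length <= maxChars ? collapsed : collapsed.Substring(0, maxChars);
}

Console.WriteLine(NormalizePageText("  Primary\n\n   constructors\t and more  "));
// prints: Primary constructors and more
```

Inside the plugin, the same logic could live as a private static method of `BrowserPlugin` and be applied to `content` in `GetPageContent`. To spend the whole budget on words, the `substring(0, 2000)` call in the JavaScript would need a larger limit, leaving the final truncation to the helper.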


## 5. Kernel and Agent Setup

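Configure a kernel that uses OpenAI's `gpt-4o-mini` model, reading the API key from the `OPENAI_API_KEY` environment variable, and register our browser plugin under the name `Browser`.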

In [5]:
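// Read the API key from the environment and fail early with a clear message if it is missing
var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
    ?? throw new InvalidOperationException("Set the OPENAI_API_KEY environment variable before running this cell.");

// Create kernel with OpenAI
var builder = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion("gpt-4o-mini", apiKey);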

var kernel = builder.Build();

// Register browser plugin
var browserPlugin = new BrowserPlugin();
kernel.Plugins.AddFromObject(browserPlugin, "Browser");

Console.WriteLine("✅ Kernel configured with OpenAI and BrowserPlugin");

✅ Kernel configured with OpenAI and BrowserPlugin


## 6. Function Calling Configuration

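Configure execution settings to enable automatic function calling. `FunctionChoiceBehavior.Auto()` advertises the kernel's functions to the model and lets it decide which ones to call; Semantic Kernel then invokes the chosen functions and sends their results back to the model. A low `Temperature` makes the agent's choices more deterministic, and `MaxTokens` caps the length of each response.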

In [13]:
// Configure function calling behavior
var executionSettings = new OpenAIPromptExecutionSettings()
{
    FunctionChoiceBehavior = FunctionChoiceBehavior.Auto(),
    MaxTokens = 2000,
    Temperature = 0.1
};

Console.WriteLine("✅ Function calling configuration set");

✅ Function calling configuration set


## 7. Research Agent Creation

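Create an AI agent that can autonomously use the browser functions to research topics. The instructions describe the intended sequence of function calls, and passing `executionSettings` through `Arguments` gives the agent automatic function calling.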

In [15]:
// Create the research agent with proper function calling configuration
var researchAgent = new ChatCompletionAgent()
{
    Name = "ResearchAgent",
    Instructions = """
        You are a research assistant that can search Google and analyze web content.
        
        When asked to research a topic:
        1. Launch the browser using LaunchBrowser
        2. Navigate to Google using NavigateToGoogle
        3. Search for the requested topic using SearchFor
        4. Get the search results using GetSearchResults
        5. Visit the most relevant pages using GetPageContent
        6. Provide a comprehensive summary of your findings
        
        Always use the browser functions to gather real information. Be thorough and provide useful insights.
        """,
    Kernel = kernel,
    Arguments = new KernelArguments(executionSettings)
};

Console.WriteLine("✅ ResearchAgent created successfully with function calling enabled");

✅ ResearchAgent created successfully with function calling enabled


## 8. Output Data Structure

Define a simple record for organizing research results. The cells below do not populate it yet; it serves as a starting point for structured output.

In [8]:
public record SearchSummary(string Query, string[] ResultTitles, string Summary);

Console.WriteLine("✅ Output structure defined");

✅ Output structure defined
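`GetSearchResults` returns a JSON array of objects with `title` and `url` properties, while `SearchSummary` expects a `string[]` of titles. Here is a minimal sketch of that conversion, assuming that JSON shape. The hypothetical `ExtractResultTitles` helper keeps only absolute `http`/`https` links, skips duplicate URLs, and returns at most `maxResults` titles. Because `GetSearchResults` returns a plain error message on failure, `JsonDocument.Parse` throws a `JsonException` for that input, so callers should check for the error case first.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: converts the JSON produced by GetSearchResults into a title array
// for SearchSummary.ResultTitles. Assumes each element has "title" and "url" string properties.
string[] ExtractResultTitles(string resultsJson, int maxResults = 5)
{
    var titles = new List<string>();
    var seenUrls = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    using JsonDocument doc = JsonDocument.Parse(resultsJson);
    foreach (JsonElement item in doc.RootElement.EnumerateArray())
    {
        // Stop once enough titles have been collected
        if (titles.Count >= maxResults) break;

        // Skip entries missing either property
        if (!item.TryGetProperty("title", out JsonElement title) ||
            !item.TryGetProperty("url", out JsonElement url))
            continue;

        // Keep only absolute http(s) links and ignore URLs already seen
        string? href = url.GetString();
        if (!Uri.TryCreate(href, UriKind.Absolute, out Uri? uri) ||
            (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps) ||
            !seenUrls.Add(uri.AbsoluteUri))
            continue;

        string? text = title.GetString();
        if (!string.IsNullOrWhiteSpace(text)) titles.Add(text.Trim());
    }
    return titles.ToArray();
}

// Sample input in the shape GetSearchResults produces (URLs are placeholders)
string sample = """
[
  { "title": "C# 12 overview", "url": "https://example.com/csharp-12" },
  { "title": "Duplicate result", "url": "https://example.com/csharp-12" },
  { "title": "Script link", "url": "javascript:void(0)" },
  { "title": "Feature list", "url": "https://example.org/features" }
]
""";

Console.WriteLine(string.Join(" | ", ExtractResultTitles(sample)));
// prints: C# 12 overview | Feature list
```

The returned array can be passed as the `ResultTitles` argument, for example `new SearchSummary(query, titles, summaryText)`, where `summaryText` would hold the agent's final answer.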


## 9. Example: Simple Search and Summarization

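Let's ask our agent to research a technical topic. The `ChatHistoryAgentThread` stores the conversation, so reusing the same `thread` in a later `InvokeAsync` call lets the agent answer follow-up questions with the earlier turns in context.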

In [17]:
// Create conversation thread
var thread = new ChatHistoryAgentThread();

// Research query
string query = "What is C# 12 and what are its new features?";

Console.WriteLine($"🔍 Researching: {query}\n");

// Let the agent research the topic
await foreach (var message in researchAgent.InvokeAsync(query, thread))
{
    Console.WriteLine($"Agent: {message.Message.Content}");
}

Console.WriteLine("\n✅ Research completed");

🔍 Researching: What is C# 12 and what are its new features?

Agent: C# 12, released in 2024, introduces several new features aimed at enhancing the language's capabilities and improving developer productivity. Here’s a summary of the key features:

1. **Primary Constructors**: This feature allows developers to define constructors directly in the class declaration, simplifying the syntax and reducing boilerplate code. It enables the initialization of properties directly from the constructor parameters.

2. **Collection Expressions**: C# 12 introduces a new syntax for creating collections, making it easier to initialize collections in a more concise manner. This feature enhances readability and reduces the amount of code needed to create and populate collections.

3. **Ref Readonly Parameters**: This feature allows parameters to be passed by reference while ensuring they cannot be modified within the method. This is particularly useful for performance optimization when dealing with large

## 10. Cleanup

Properly dispose of browser resources when finished.

In [None]:
// Clean up browser resources
browserPlugin.Dispose();

Console.WriteLine("✅ Browser resources cleaned up");

## Summary

This tutorial demonstrated how to create an AI agent that can:

1. **Control a web browser** using Playwright automation
2. **Search Google autonomously** based on user queries
3. **Extract and analyze web content** from search results
4. **Provide intelligent summaries** of research findings
5. **Support follow-up questions** by reusing the same conversation thread

### Key Components:
- **BrowserPlugin**: Provides web automation capabilities as Kernel functions
- **ResearchAgent**: AI agent that orchestrates browser operations
- **Function Calling**: Automatic selection and execution of browser functions
- **Execution Settings**: `OpenAIPromptExecutionSettings` with `FunctionChoiceBehavior.Auto()` to enable automatic function calling

### Technical Highlights:
- **Automatic Function Selection**: Agent chooses which browser functions to call
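- **Error Handling**: Browser functions catch exceptions and return error messages to the model instead of throwing
- **Resource Management**: `BrowserPlugin.Dispose` closes the page and browser and disposes the Playwright instance
- **Conversation Threading**: `ChatHistoryAgentThread` maintains context across multiple interactions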

### Practical Applications:
- Automated research and fact-checking
- Real-time information gathering
- Competitive analysis
- Content discovery and summarization

### Next Steps:
- Add more sophisticated content extraction
- Implement result filtering and ranking
- Add support for different search engines
- Create specialized research workflows for different domains