# 🚀 Semantic Kernel + Playwright NLP Web Automation

Welcome to this interactive Jupyter notebook! 👋 Here, we'll explore how to combine Microsoft's Semantic Kernel (SK) with Playwright to build a web automation tool driven by natural language processing (NLP).

## 🎯 What You'll Learn

In this notebook, we'll cover:

1. 📦 Setting up the necessary packages and dependencies
2. 🔧 Creating a custom Playwright plugin for Semantic Kernel
3. 🌐 Initializing Playwright and launching a browser
4. 🧠 Configuring Semantic Kernel with OpenAI
5. 🖥️ Performing web automation tasks using natural language commands
6. 📸 Capturing screenshots of web pages

## 🛠️ How It Works

We'll use Semantic Kernel to interpret natural language commands and translate them into Playwright actions. This allows us to control web browsers using simple English phrases, making web automation more intuitive and accessible.

## 🏗️ Structure of the Notebook

1. Package installation and imports
2. Definition of the PlaywrightPlugin class
3. Browser initialization
4. Semantic Kernel setup
5. Execution of web automation tasks
6. Demonstration of results

Let's dive in and start automating the web with the power of NLP! 🌊🏄‍♂️

- - -

# 📦 Package Setup and Imports

This cell sets up our development environment by installing necessary NuGet packages and importing required namespaces.

## 🛠️ NuGet Packages

We're using the `#r` directive to reference and install two crucial NuGet packages:

1. `Microsoft.SemanticKernel`: This package provides the Semantic Kernel framework, which we'll use for natural language processing and AI orchestration.
2. `Microsoft.Playwright`: This package gives us the ability to automate web browsers.

## 📚 Namespace Imports

We're importing several namespaces to access the functionality we need:

- `System`, `System.Diagnostics`, `System.IO`: Basic .NET functionality
- `Microsoft.AspNetCore.Html`: HTML helpers (though not used in this example)
- `System.ComponentModel`: Provides the `[Description]` attribute used to annotate plugin functions
- `System.Threading.Tasks`: For asynchronous programming
- `Microsoft.Playwright`: For web automation
- `Microsoft.SemanticKernel`: For AI orchestration
- `Microsoft.SemanticKernel.Connectors.OpenAI`: For connecting to OpenAI's language models

## 🔧 Configuration

The last line sets a boolean flag `useAzureOpenAI` to `false`, indicating that we use the standard OpenAI service rather than Azure OpenAI. None of the later cells in this notebook reference the flag, so it serves only as a marker of intent.

## 🚀 Next Steps

With these packages and imports in place, we're ready to start building our NLP-powered web automation tool. In the next cells, we'll define our custom Playwright plugin and set up our Semantic Kernel instance.

In [2]:
#r "nuget:Microsoft.SemanticKernel"
#r "nuget:Microsoft.Playwright"

using System;
using System.Diagnostics;
using System.IO;
using Microsoft.AspNetCore.Html;

using System.ComponentModel;
using System.Threading.Tasks;

using Microsoft.Playwright;
using Microsoft.SemanticKernel;
// using Kernel = Microsoft.SemanticKernel.Kernel;
using Microsoft.SemanticKernel.Connectors.OpenAI;

bool useAzureOpenAI = false;

# 🎭 PlaywrightPlugin Class

This code cell defines the `PlaywrightPlugin` class, which is the heart of our NLP-powered web automation tool. It encapsulates Playwright's web automation capabilities and exposes them as functions that Semantic Kernel can understand and execute.

## 🏗️ Class Structure

- The class is initialized with an `IPage` object, representing a browser page.
- It provides a static `CreateInstance` method for easy instantiation.

## 🛠️ Key Methods

The class defines numerous methods, each corresponding to a specific web automation action. Here are some highlights:

1. 🌐 **Navigation**: `GoToAsync` for navigating to URLs
2. 📝 **Form Interaction**: 
   - `FillAsync` for filling form fields
   - `ClearAsync` for clearing form fields
   - `CheckAsync` and `UncheckAsync` for checkboxes
3. 🖱️ **Mouse Actions**: 
   - `ClickAsync` for clicking
   - `DblClickAsync` for double-clicking
   - `HoverAsync` for hovering
4. ⌨️ **Keyboard Actions**: `PressAsync` for pressing keys
5. 📱 **Touch Actions**: `TapAsync` for tapping (on touch devices)
6. 🕰️ **Waiting**: 
   - `WaitForAsync` for waiting for elements
   - `WaitForLoadStateAsync` for waiting for page load states
7. 📸 **Screenshot**: `ScreenshotAsync` for taking screenshots

## 🧠 Integration with Semantic Kernel

Each method is decorated with two important attributes:

1. `[Description]`: Provides a human-readable description of what the method does.
2. `[KernelFunction]`: Marks the method as a function that Semantic Kernel can call.

These attributes allow Semantic Kernel to understand and invoke these methods based on natural language prompts.
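
To make the two attributes concrete, here is a minimal sketch of what such a plugin might look like. The real `PlaywrightPlugin.cs` is imported from a file and is not shown in this notebook, so the class name `MiniPlaywrightPlugin`, the method bodies, the `GetPageInfoAsync` function, and all description strings are assumptions for illustration only. The Playwright calls (`GotoAsync`, `FillAsync`, `TitleAsync`, `Url`) are standard `IPage` members.

```csharp
// Hypothetical sketch: NOT the notebook's actual PlaywrightPlugin.cs.
// Requires the Microsoft.SemanticKernel and Microsoft.Playwright packages.
using System.ComponentModel;
using System.Threading.Tasks;
using Microsoft.Playwright;
using Microsoft.SemanticKernel;

public sealed class MiniPlaywrightPlugin
{
    private readonly IPage _page;

    public MiniPlaywrightPlugin(IPage page) => _page = page;

    // Factory method mirroring the CreateInstance pattern described above.
    public static MiniPlaywrightPlugin CreateInstance(IPage page) => new(page);

    [KernelFunction]
    [Description("Navigates the browser to the given URL.")]
    public async Task<string> GoToAsync(
        [Description("Absolute URL to open")] string url)
    {
        await _page.GotoAsync(url);
        return $"Navigated to {url}";
    }

    [KernelFunction]
    [Description("Fills the form field matched by a selector with a value.")]
    public async Task<string> FillAsync(
        [Description("Playwright selector of the field")] string selector,
        [Description("Text to enter")] string value)
    {
        await _page.FillAsync(selector, value);
        return $"Filled {selector}";
    }

    [KernelFunction]
    [Description("Returns the title and URL of the current page.")]
    public async Task<string> GetPageInfoAsync()
    {
        var title = await _page.TitleAsync();
        return $"{title} ({_page.Url})";
    }
}
```

Returning a short status string from each function is a design choice in this sketch: it gives the model some feedback about what happened, which it can use when composing its final answer.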

## 🚀 Next Steps

With this `PlaywrightPlugin` class defined, we've created a powerful bridge between natural language commands and web automation actions. In the upcoming cells, we'll see how to initialize this plugin and use it with Semantic Kernel to perform web automation tasks using simple English prompts.

In [3]:
#!import SemanticKernel-Playwright-NLPTests/PlaywrightPlugin.cs

# 🌐 Browser Setup for Playwright

These two code blocks work together to set up a clean browser environment for our web automation tasks. Let's break down what each block does:

## 🧹 Cleaning Up Existing Browser Processes

```csharp
var processes = Process.GetProcessesByName("chromium");

foreach (var process in processes)
{
    process.Kill();
}
```

This block ensures a clean slate by terminating any existing Chromium processes:

1. 🔍 It searches for all running processes named "chromium".
2. 🗑️ It then iterates through these processes and forcefully terminates each one.

👉 **Why do this?** This step helps prevent conflicts with existing browser instances and ensures we're starting with a fresh browser environment.
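
`Process.Kill()` can throw if a process exits between enumeration and the kill call, or if access is denied. A more defensive variant of the cleanup could look like the sketch below. The helper name `BrowserCleanup.KillAll` and the five-second wait are assumptions, not part of the notebook. Depending on the platform, the Playwright-managed browser may also run under a different process name (for example `chrome`), so verify the name locally.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

// Hypothetical helper; not part of the original notebook.
public static class BrowserCleanup
{
    // Kills every process with the given name and returns how many were terminated.
    public static int KillAll(string processName)
    {
        int killed = 0;
        foreach (var process in Process.GetProcessesByName(processName))
        {
            using (process)
            {
                try
                {
                    process.Kill();
                    process.WaitForExit(5000); // wait up to 5 seconds for the process to exit
                    killed++;
                }
                catch (InvalidOperationException)
                {
                    // The process exited between enumeration and Kill().
                }
                catch (Win32Exception)
                {
                    // Access was denied or the process is already terminating.
                }
            }
        }
        return killed;
    }
}
```

Calling `BrowserCleanup.KillAll("chromium")` would then replace the loop. Keep in mind that this terminates every process with that name, including browser windows unrelated to the notebook.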

## 🚀 Launching a New Browser Instance

```csharp
var playwright = await Playwright.CreateAsync();
var browser = await playwright.Chromium.LaunchAsync(new (){
    Headless = false
});
var context = await browser.NewContextAsync(new BrowserNewContextOptions
{
    RecordVideoDir = "videos/",
    RecordVideoSize = new RecordVideoSize { Width = 1280, Height = 720 }
});
var page = await context.NewPageAsync();
```

This block initializes Playwright and launches a new browser instance:

1. 🎭 `Playwright.CreateAsync()`: Initializes the Playwright automation framework.
2. 🌐 `playwright.Chromium.LaunchAsync()`: Launches a new Chromium browser instance.
   - `Headless = false`: This option makes the browser visible (not headless), useful for debugging.
3. 🖼️ `browser.NewContextAsync()`: Creates a new browser context, similar to an incognito window.
   - `RecordVideoDir` and `RecordVideoSize`: Record a 1280×720 video of each page in this context into the `videos/` folder.
4. 📄 `context.NewPageAsync()`: Opens a new page within the browser context.

👉 **Key Points:**
- We're using Chromium, but Playwright also supports Firefox and WebKit.
- The browser is launched in non-headless mode, allowing you to see the automation in action.
- Each step (playwright, browser, context, page) represents a level in Playwright's hierarchy.

## 🎯 Next Steps

With these steps completed, we now have:
1. A clean slate with no conflicting Chromium processes.
2. A new, visible Chromium browser instance ready for automation.
3. A fresh page where we can start our web automation tasks.

In the upcoming cells, we'll see how to use this `page` object with our `PlaywrightPlugin` and Semantic Kernel to perform NLP-driven web automation.

In [4]:
var processes = Process.GetProcessesByName("chromium");

foreach (var process in processes)
{
    process.Kill();
}

In [5]:
var playwright = await Playwright.CreateAsync();
var browser = await playwright.Chromium.LaunchAsync(new (){
    Headless = false
});
var context = await browser.NewContextAsync(new BrowserNewContextOptions
{
    RecordVideoDir = "videos/",
    RecordVideoSize = new RecordVideoSize { Width = 1280, Height = 720 }
});
var page = await context.NewPageAsync();

# 🧠 Semantic Kernel Setup

This code block configures Semantic Kernel with OpenAI's language models and our custom PlaywrightPlugin. Let's break it down step by step:

## 1. 🛠️ Execution Settings

```csharp
var executionSettings = new OpenAIPromptExecutionSettings(){
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};
```

This creates settings for OpenAI prompt execution:
- `AutoInvokeKernelFunctions`: This setting advertises the kernel's registered functions to the model and automatically invokes whichever ones the model requests in its tool calls.

## 2. 🏗️ Kernel Builder

```csharp
var builder = Kernel.CreateBuilder();
```

This initializes a new Kernel builder, which we'll use to configure our Semantic Kernel instance.

## 3. 🔌 Adding OpenAI Services

```csharp
builder
    .AddOpenAIChatCompletion(
        "gpt-4",
        "sk-proj-"
    )
    .AddOpenAITextGeneration(
        "gpt-4",
        "sk-proj-"
    );
```

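This adds two OpenAI services to our Kernel:
1. Chat Completion: For generating conversational responses. Automatic function calling is a chat completion feature, so this is the service the later prompts depend on.
2. Text Generation: For generating non-conversational text.

Both are configured with the `gpt-4` model ID. Because `gpt-4` is a chat model, the text generation registration is likely unnecessary here, and it may not be available in every Semantic Kernel version.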

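⚠️ **Security Note**: The API key is passed as a string literal in this code (shown here as the truncated placeholder `"sk-proj-"`). Never commit a real key to a notebook. In a production environment, always use secure methods to store and retrieve API keys, such as environment variables or a secure key vault.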
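
One simple way to keep the key out of the notebook is to read it from an environment variable. The sketch below assumes a helper named `ApiKeys.Require` and a variable named `OPENAI_API_KEY`; both names are illustrative and not part of the original code.

```csharp
using System;

// Hypothetical helper; not part of the original notebook.
public static class ApiKeys
{
    // Reads a secret from an environment variable and fails fast if it is missing.
    public static string Require(string variableName)
    {
        var value = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                $"Environment variable '{variableName}' is not set.");
        }
        return value;
    }
}

// Possible usage in the setup cell (OPENAI_API_KEY is an assumed variable name):
// var apiKey = ApiKeys.Require("OPENAI_API_KEY");
// builder.AddOpenAIChatCompletion("gpt-4", apiKey);
```

Failing fast with a clear message avoids a harder-to-diagnose authentication error later, when the first prompt is sent.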

## 4. 🔗 Adding PlaywrightPlugin

```csharp
builder.Plugins.AddFromObject(PlaywrightPlugin.CreateInstance(page));
```

This line adds our custom PlaywrightPlugin to the Kernel. It creates a new instance of the plugin using the `page` object we set up earlier.

## 5. 🚀 Building the Kernel

```csharp
Kernel kernel = builder.Build();
```

Finally, this line builds the Kernel with all our configurations, creating a ready-to-use Semantic Kernel instance.
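
After building the kernel, it can help to confirm which functions were actually registered, since these names and descriptions are what the model sees. The sketch below uses a hypothetical stand-in plugin (`DemoPlugin`) so that it runs without a browser or an API key; `KernelInspector` is likewise an illustrative helper name. In the notebook you would pass the real `kernel` to `PrintFunctions` instead.

```csharp
// Sketch under stated assumptions; requires the Microsoft.SemanticKernel package.
using System;
using System.ComponentModel;
using Microsoft.SemanticKernel;

// Hypothetical stand-in for PlaywrightPlugin.
public sealed class DemoPlugin
{
    [KernelFunction]
    [Description("Returns a message describing the navigation that would happen.")]
    public string GoTo([Description("Absolute URL to open")] string url)
        => $"Would navigate to {url}";
}

public static class KernelInspector
{
    // Builds a kernel that contains only the stand-in plugin (no AI service).
    public static Kernel BuildDemoKernel()
    {
        var builder = Kernel.CreateBuilder();
        builder.Plugins.AddFromObject(new DemoPlugin(), "Demo");
        return builder.Build();
    }

    // Prints one line per registered function: plugin name, function name, description.
    public static void PrintFunctions(Kernel kernel)
    {
        foreach (var plugin in kernel.Plugins)
        {
            foreach (var function in plugin)
            {
                Console.WriteLine($"{plugin.Name}.{function.Name}: {function.Description}");
            }
        }
    }
}
```

If a function is missing from the printed list, the usual suspects are a missing `[KernelFunction]` attribute or a non-public method.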

## 🎯 What's Next?

With this setup complete, we now have a powerful Semantic Kernel instance that can:
1. Understand and generate natural language using GPT-4.
2. Automatically invoke our PlaywrightPlugin functions based on natural language prompts.

In the upcoming cells, we'll use this `kernel` to perform web automation tasks using simple English commands!

In [6]:
var executionSettings = new OpenAIPromptExecutionSettings(){
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};

var builder = Kernel.CreateBuilder();

builder
    .AddOpenAIChatCompletion(
        "gpt-4",
        "sk-proj-"
    )
    .AddOpenAITextGeneration(
        "gpt-4",
        "sk-proj-"
    );

builder.Plugins.AddFromObject(PlaywrightPlugin.CreateInstance(page));


Kernel kernel = builder.Build();

# 🚀 NLP-Powered Web Automation in Action

This final code block demonstrates the power of our Semantic Kernel and PlaywrightPlugin integration. We'll perform a series of web actions using natural language commands and then capture the results.

## 🔍 Step-by-Step Breakdown

1. **Navigate to Google**
   ```csharp
   await kernel.InvokePromptAsync("Navigate to 'https://google.com/'", new (executionSettings));
   ```
   This command tells the browser to go to Google's homepage.

2. **Perform a Search**
   ```csharp
   await kernel.InvokePromptAsync("Fill '[aria-label=\"Search\"]' with 'Semantic Kernel'.", new (executionSettings));
   await kernel.InvokePromptAsync("Press '[aria-label=\"Search\"]' with \"Enter\".", new (executionSettings));
   ```
   These two commands simulate typing "Semantic Kernel" into the search box and pressing Enter.

3. **Click a Search Result**
   ```csharp
   await kernel.InvokePromptAsync("Click link \"text=Introduction to Semantic Kernel\"", new (executionSettings));
   ```
   This command finds and clicks on a link containing the text "Introduction to Semantic Kernel".

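4. **Get Page Information**
   ```csharp
   var pageTitleAndUrlCommands = await kernel.InvokePromptAsync("What is the current page title and url?", new (executionSettings));
   ```
   This command asks for the title and URL of the current page. The result is stored in `pageTitleAndUrlCommands` rather than `page`, because `page` already refers to the Playwright page created earlier.

5. **Capture and Display Results**
   ```csharp
   Console.WriteLine($"Image of the page has been saved to './ss.png'.\n\n{Convert.ToBase64String(File.ReadAllBytes("./ss.png"))}");
   ```
   This line prints the screenshot path and the contents of `./ss.png` as a base64-encoded string. It assumes the file has already been written, presumably by the plugin's `ScreenshotAsync` function; none of the prompts above explicitly requests a screenshot.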

6. **Close the Browser**
   ```csharp
   await browser.CloseAsync();
   ```
   Finally, we close the browser to clean up resources.

## 🎯 Key Takeaways

- We're using natural language commands to control the web browser.
- The Semantic Kernel interprets these commands and invokes the appropriate PlaywrightPlugin functions.
- We can perform complex web actions (navigation, form filling, clicking) with simple English phrases.
- We can also retrieve information about the current page using natural language queries.
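
When a prompt does not trigger the function you expect, it can be useful to bypass the model and call a plugin function directly. The sketch below shows this with a hypothetical stand-in plugin (`EchoPlugin`) so that it runs without a browser or an API key; the class names and the `url` parameter are assumptions for illustration. With the real plugin, the plugin and function names would be whatever the kernel registered. Semantic Kernel typically drops an `Async` suffix from method names, so check the registered names first.

```csharp
// Sketch under stated assumptions; requires the Microsoft.SemanticKernel package.
using System;
using System.ComponentModel;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

// Hypothetical stand-in for PlaywrightPlugin.
public sealed class EchoPlugin
{
    [KernelFunction]
    [Description("Returns a message describing the navigation that would happen.")]
    public string GoTo([Description("Absolute URL to open")] string url)
        => $"Would navigate to {url}";
}

public static class DirectInvocation
{
    public static async Task<string?> RunAsync()
    {
        var builder = Kernel.CreateBuilder();
        builder.Plugins.AddFromObject(new EchoPlugin(), "Echo");
        Kernel kernel = builder.Build();

        // Call the function by plugin and function name; no model is involved.
        FunctionResult result = await kernel.InvokeAsync(
            "Echo",
            "GoTo",
            new KernelArguments { ["url"] = "https://example.com/" });

        return result.GetValue<string>();
    }
}
```

Direct invocation is deterministic, which makes it a convenient way to separate plugin bugs from prompt interpretation issues.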

## 🚀 Next Steps

Now that you've seen the basics, try experimenting with different commands and websites. You can:
- Automate more complex web interactions
- Extract data from web pages
- Perform multi-step processes across different websites

The possibilities are endless with NLP-powered web automation!

In [6]:
var navigateCommand = await kernel.InvokePromptAsync("Navigate to 'https://google.com/'", new (executionSettings));
var fillCommand = await kernel.InvokePromptAsync("Fill '[aria-label=\"Search\"]' with 'Semantic Kernel'.", new (executionSettings));
var pressCommand = await kernel.InvokePromptAsync("Press '[aria-label=\"Search\"]' with \"Enter\".", new (executionSettings));
var clickLinkCommand = await kernel.InvokePromptAsync("Click link \"text=Introduction to Semantic Kernel\"", new (executionSettings));
var pageTitleAndUrlCommands = await kernel.InvokePromptAsync("What is the current page title and url?", new (executionSettings));

Console.WriteLine($"Image of the page has been saved to './ss.png'.\n\n{Convert.ToBase64String(File.ReadAllBytes("./ss.png"))}");

await browser.CloseAsync();

new {
    navigateCommand,
    fillCommand,
    pressCommand,
    clickLinkCommand,
    pageTitleAndUrlCommands
}

Image of the page has been saved to './ss.png'.

(base64-encoded PNG data omitted)

# 🎉 Congratulations! You've Mastered NLP-Powered Web Automation!

You've just witnessed the power of combining Semantic Kernel with Playwright to create an intuitive, natural language-driven web automation tool. But the journey doesn't end here! 🚀

## 🌐 Take Your Skills to the Next Level: Semantic Kernel and HTTP Endpoints

If you're excited about what you've learned and want to explore more applications of Semantic Kernel, we've got something special for you!

### 🔗 Check out: Semantic Kernel API Execution

We invite you to dive into another fascinating aspect of Semantic Kernel: executing HTTP endpoints. This opens up a whole new world of possibilities for AI-driven automation and integration.

👉 [Explore the Semantic Kernel API Execution Demo](https://github.com/montraydavis/MDLabs-LLM-Demos/blob/main/dotnet/src/SemanticKernel-Api-Execution/SemanticKernel-Api-Execution.ipynb)

In this demo, you'll learn how to:
- 🔧 Set up Semantic Kernel for API execution
- 🧠 Use natural language to interact with RESTful APIs
- 🔄 Process API responses using AI
- 🛠️ Integrate external services into your AI workflows

## 🚀 Why Explore This?

1. **Expand Your Toolkit**: Combine web automation with API interactions for more powerful applications.
2. **Real-World Applications**: Learn how to integrate AI with existing web services and APIs.
3. **Enhanced Automation**: Take your automation skills beyond web browsing to interact with a wide range of services.

## 🎓 Keep Learning, Keep Growing!

The world of AI and automation is vast and exciting. By combining different tools and techniques, you're setting yourself up to create truly innovative solutions.

Happy coding, and may your AI adventures be ever fruitful! 🌟💻🤖