<a href="https://colab.research.google.com/github/krMaynard/genAI/blob/main/Chrome_Extension_On_Device_AI_Q%26A.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup

Please ensure you have imported a Gemini API key from AI Studio.
You can do this directly in the Secrets tab on the left; the setup cell below reads it from a secret named `GOOGLE_API_KEY`.

After doing so, please run the setup cell below.

In [1]:
!pip install -U -q "google-genai"

import os
from google.colab import userdata
from google.colab import drive
os.environ["GEMINI_API_KEY"] = userdata.get("GOOGLE_API_KEY")

drive.mount("/content/drive")
# Please ensure that uploaded files are available in the AI Studio folder or change the working folder.
os.chdir("/content/drive/MyDrive/Google AI Studio")

Mounted at /content/drive
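
Before moving on, it can help to confirm that the key actually reached the environment. The following is a minimal sketch, not part of the original notebook: the helper name `require_env` is hypothetical, and it assumes only that the setup cell stores the key under `GEMINI_API_KEY` as shown above.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Hypothetical message; adjust to match where your key is stored
        raise RuntimeError(
            f"{name} is not set; add the key in the Colab Secrets tab and rerun the setup cell."
        )
    return value
```

Calling `require_env("GEMINI_API_KEY")` in a fresh cell should fail fast with a readable message instead of surfacing later as an authentication error.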


# Code Generation

In [2]:
# To run this code you need to install the following dependencies:
# pip install google-genai

import os
from google import genai
from google.genai import types


def generate():
    client = genai.Client(
        api_key=os.environ.get("GEMINI_API_KEY"),
    )

    model = "gemini-2.5-pro"
    contents = [
        types.Content(
            role="user",
            parts=[
                types.Part.from_text(text="""create a chrome extension that uses the built in Chrome LLM API to answer questions about the page"""),
            ],
        ),
        types.Content(
            role="model",
            parts=[
                types.Part.from_text(text="""**Beginning Extension Research**

I've started digging into the Chrome extension documentation and tutorials. My focus is on understanding the LLM API's capabilities and access methods. Next, I'll explore extraction techniques.


**Reviewing API Access**

I'm now focusing on how the Chrome LLM API works, especially how it's accessed within an extension. My primary concern is the most effective approach for text extraction from the active webpage, followed by incorporating the user's question. This leads to the architecture of the extension, which I am starting to formalize in my mind.


**Initiating Extension Development**

I've made headway! I'm now exploring the Chrome extension architecture and Gemini Nano API. Initial research has been completed and the basic extension structure is now taking shape. I'm focusing on integrating the LLM for initial testing.


**Investigating Content Extraction**

I'm now investigating efficient methods to extract the core textual content from web pages, excluding unwanted elements. Current focus is on robust DOM traversal and potential libraries to improve content filtering beyond `document.body.innerText`. I am also researching how to best structure the interaction with the user, and how to handle pages with a large quantity of text.


**Clarifying Extension Scope**

I'm now zeroing in on the finer points. I've successfully navigated resources on extension structure, including the manifest, background scripts, and content scripts, and I have a firm understanding of the `chrome.ai` API. My next steps are now focusing on addressing specific design considerations. I am focusing on real-time content extraction, user interaction, handling extensive text, and error handling. I plan to refine the scope of these thoughts and develop new search queries to address these problems, which I will test once they are designed.


**Exploring Extension Architecture**

I've made significant progress, gathering a solid foundation for building a Chrome extension. I'm now clear on the core components: manifest, background scripts, and content scripts. The Chrome LLM API is also starting to become clearer, which is encouraging.


**Investigating Content Extraction & UI**

I'm expanding my exploration of key areas. I've uncovered several JavaScript libraries for content extraction, like `readability.js`, as a major upgrade for content extraction. Also, I've got a grasp of general UI best practices. I'm actively researching strategies for managing large content chunks within the API's constraints. The `simple-chromium-ai` wrapper and other materials have clarified the need for API availability checks and error management. I'm currently figuring out how to handle extremely large documents.


**Clarifying Implementation Details**

I'm now diving into the implementation phase. I have a firm grasp of the extension's core components and am confident in how to use them. The JavaScript libraries for content extraction are also clearer. My next step involves solidifying the UI design and defining a basic user interaction flow. Specifically, I'll structure a straightforward popup, as well as a more sophisticated system to summarize in chunks. Then, I will address prompt engineering and error handling.


**Analyzing Content Extraction Options**

I've made headway! I have a concrete plan and a wealth of info, and I'm homing in on Mozilla's Readability.js as a top pick for content extraction. Furthermore, I've secured resources on incorporating it into a Chrome extension. I'm feeling confident about this direction.


**Defining the Chrome Extension's Structure**

I've made great progress! I've solidified my initial plan and have ample information at my fingertips. Readability.js is still my leading choice for content extraction. I now have a clear understanding of the UI, including a simple popup with a question input and an answer display. Furthermore, I've incorporated robust strategies for handling API unavailability, ensuring the extension remains functional even if the LLM API experiences issues. I'm feeling confident about generating the code and a detailed explanation, as I have sufficient information.


"""),
                types.Part.from_text(text="""## Harnessing On-Device AI: A Chrome Extension to Answer Your Web Page Questions

**In an era of information overload, a new Chrome extension leverages the power of Google's built-in Large Language Model (LLM) to provide concise answers to your questions about the content of any webpage. This tool, built with standard web technologies, offers a glimpse into the future of AI-powered browsing, where understanding complex information is just a query away.**

This new extension adds a simple interface to your Chrome browser. With the click of a button, you can pose a question about the current page, and the on-device AI will analyze the text and provide a relevant answer. This functionality is made possible by the recently introduced `chrome.ai` API, which grants developers access to Google's Gemini Nano model directly within the browser.

### How it Works: A Look Under the Hood

The extension is comprised of a few key components that work in tandem to deliver its question-answering capabilities:

*   **Manifest File (`manifest.json`):** This file serves as the blueprint for the extension, defining its name, version, permissions, and the various scripts it utilizes. Crucially, it requests the `activeTab` permission to access the content of the current page and the `scripting` permission to inject code.

*   **Popup Interface (`popup.html` and `popup.js`):** When you click on the extension's icon, a small popup window appears. This is the primary user interface, containing a text area for your question and a button to submit it. The accompanying JavaScript file (`popup.js`) handles the user interaction, capturing the question and initiating the process of retrieving the page content and querying the LLM.

*   **Content Script (`content.js`):** This script is the bridge between the extension and the webpage. It is programmatically injected into the active tab when the user asks a question. Its primary role is to extract the main readable content of the page, stripping away unnecessary elements like navigation bars, advertisements, and footers. To achieve this, the extension utilizes Mozilla's Readability.js library, a robust open-source tool designed for this very purpose.

*   **Background Script (`background.js`):** The background script acts as the central coordinator. It listens for messages from the popup, containing the user's question and the extracted page content. Upon receiving this information, it checks for the availability of the `chrome.ai` API. If the API is accessible, it constructs a carefully worded prompt, combining the page's text with the user's query, and sends it to the built-in LLM for processing.

### The Power of On-Device AI

A key advantage of this extension is its use of an on-device LLM. This means that all the processing happens locally on your machine, ensuring privacy and a snappy user experience without the need to send data to a server. The `chrome.ai.createGenericSession()` method is used to initiate a session with the LLM, and the `prompt()` method sends the combined text and question for analysis.

The extension also incorporates error handling to gracefully manage situations where the LLM might not be available. This could be due to the user's Chrome version not supporting the API or the necessary experimental flags not being enabled. In such cases, the user is notified that the feature is unavailable.

### Prompt Engineering for Better Answers

The quality of the answers provided by the LLM heavily depends on the quality of the prompt. The extension employs basic prompt engineering techniques to guide the model towards providing the most relevant and accurate response. The prompt is structured to clearly delineate the context (the webpage content) and the specific question being asked.

### Building Your Own Page-Aware Assistant

This Chrome extension serves as a practical example of how developers can start building with the built-in AI capabilities of modern browsers. By combining content extraction libraries like Readability.js with the power of on-device LLMs, the possibilities for creating intelligent and privacy-preserving browser enhancements are vast.

For those interested in building a similar extension, the core steps involve:

1.  **Setting up the basic Chrome extension structure:** This includes creating the `manifest.json`, `popup.html`, and the necessary JavaScript files.
2.  **Integrating a content extraction library:** Mozilla's Readability.js is a strong choice and can be included as a bundled script.
3.  **Utilizing the `chrome.ai` API:** Check for its availability and use it to process the extracted content and user queries.
4.  **Designing a user-friendly interface:** A simple and intuitive popup is often the most effective approach for this type of interaction.
5.  **Implementing robust error handling:** Ensure the extension provides clear feedback to the user if the AI functionality is not available.

As on-device AI becomes more prevalent in browsers, we can expect to see a new wave of intelligent extensions that can understand and interact with web content in more sophisticated ways, ultimately making our online experiences more efficient and insightful."""),
            ],
        ),
        types.Content(
            role="user",
            parts=[
                types.Part.from_text(text="""INSERT_INPUT_HERE"""),
            ],
        ),
    ]
    tools = [
        types.Tool(url_context=types.UrlContext()),
        types.Tool(google_search=types.GoogleSearch()),
    ]
    generate_content_config = types.GenerateContentConfig(
        temperature=0,
        thinking_config=types.ThinkingConfig(
            thinking_budget=-1,
        ),
        tools=tools,
        response_mime_type="text/plain",
    )

    for chunk in client.models.generate_content_stream(
        model=model,
        contents=contents,
        config=generate_content_config,
    ):
        # Chunks that carry only thought or tool-call parts have no text; skip them
        if chunk.text:
            print(chunk.text, end="")

if __name__ == "__main__":
    generate()


Of course. Here are the contents for a Chrome extension that uses the built-in Chrome LLM API to answer questions about the current page.

### **How it Works**

This extension operates in a simple, powerful loop:
1.  You click the extension icon and ask a question in the popup.
2.  A content script is injected into the page to extract the main, readable text, using Mozilla's `Readability.js` library to strip out ads, navigation, and other clutter.
3.  The cleaned text and your question are sent to the background script.
4.  The background script forwards this information to the built-in `chrome.ai` LLM.
5.  The LLM's response is streamed back to the popup, displaying the answer in real-time.

This entire process happens locally on your device, ensuring your data remains private.
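
The extension itself would be written in JavaScript. Purely to illustrate steps 3 and 4, here is a minimal Python sketch of how the cleaned page text and the question might be combined into a single prompt. The function name `build_prompt`, the delimiter strings, and the `max_chars` budget are all assumptions made for illustration and are not taken from the extension code.

```python
def build_prompt(page_text: str, question: str, max_chars: int = 4000) -> str:
    """Combine extracted page text and a user question into one prompt string."""
    # Collapse runs of whitespace so the character budget is spent on content
    cleaned = " ".join(page_text.split())
    # Truncate long pages; 4000 is an assumed budget, not a documented model limit
    if len(cleaned) > max_chars:
        cleaned = cleaned[:max_chars]
    # Clearly separate the context from the question
    return (
        "Answer the question using only the page content below.\n\n"
        f"--- PAGE CONTENT ---\n{cleaned}\n--- END PAGE CONTENT ---\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

Simple truncation keeps the sketch short; a real extension might instead split the page into chunks and summarize each one before asking the question.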

---

### **1. `manifest.json`**

This file defines the extension's permissions and structure. It needs access to the AI, permission to run scripts on the active tab, and the ability to store the `Readability.js` library.
