<a href="https://colab.research.google.com/github/rmit-ir/Tutotrial-Practical-LLMs/blob/main/LLM_Tutorial_Challenge2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Installing Required Python Packages

This notebook demonstrates how to use the OpenRouter and SerpApi APIs to perform web search and content analysis. Before we begin, we need to install several essential Python packages:

1. **serpapi** - A client library for the SerpApi service, used to access Google search results programmatically
2. **selenium** - For web browser automation and content scraping
3. **webdriver-manager** - To help manage browser drivers for Selenium

These packages are necessary for:
- Retrieving search results from Google
- Scraping web content
- Processing and analyzing the retrieved data

In [1]:
# Install the SerpApi library, used to scrape search engine results pages (SERPs)
%pip install serpapi

# Install the Selenium library, used for web browser automation and the WebDriver Manager, which helps manage browser drivers
%pip install selenium webdriver-manager

Collecting serpapi
  Downloading serpapi-0.1.5-py2.py3-none-any.whl.metadata (10 kB)
Downloading serpapi-0.1.5-py2.py3-none-any.whl (10 kB)
Installing collected packages: serpapi
Successfully installed serpapi-0.1.5
Collecting selenium
  Downloading selenium-4.30.0-py3-none-any.whl.metadata (7.5 kB)
Collecting webdriver-manager
  Downloading webdriver_manager-4.0.2-py2.py3-none-any.whl.metadata (12 kB)
Collecting trio~=0.17 (from selenium)
  Downloading trio-0.29.0-py3-none-any.whl.metadata (8.5 kB)
Collecting trio-websocket~=0.9 (from selenium)
  Downloading trio_websocket-0.12.2-py3-none-any.whl.metadata (5.1 kB)
Collecting python-dotenv (from webdriver-manager)
  Downloading python_dotenv-1.1.0-py3-none-any.whl.metadata (24 kB)
Collecting outcome (from trio~=0.17->selenium)
  Downloading outcome-1.3.0.post0-py2.py3-none-any.whl.metadata (2.6 kB)
Collecting wsproto>=0.14 (from trio-websocket~=0.9->selenium)
  Downloading wsproto-1.2.0-py3-none-any.whl.metadata (5.6 kB)
Downloading se

# Importing Essential Libraries for API & Data Handling

This script imports essential libraries for handling JSON data, making API requests, and working with Google Colab.

## Libraries Used:
- **json**: Handles JSON data, commonly used in APIs.
- **textwrap**: Formats and wraps text for better readability.
- **pandas**: Facilitates data manipulation and analysis.
- **requests**: Sends HTTP requests to fetch data from APIs.
- **serpapi**: Retrieves Google search results via SerpApi.
- **google.colab.userdata**: Accesses user-specific data in Google Colab (e.g., secret API keys).


In [3]:
# Import the JSON module for handling JSON data
import json
import textwrap  # Used for formatting and wrapping text, useful for displaying text in a readable way

# Import pandas for data manipulation and analysis
import pandas as pd
# Import the requests library for making HTTP requests to APIs
import requests
# Import the SerpAPI client for Google search results
import serpapi
# Provides access to user-specific information in Google Colab, used to access the user's secret API key
from google.colab import (
    userdata,
)

# Text Wrapping  

This code sets a verbosity level for controlling output display and defines a text-wrapping function for better readability. The `printw` lambda function ensures printed text does not exceed the specified line width.  


In [4]:
VERBOSE = 0  # 'VERBOSE' controls the level of logging or output that is displayed (0: no output, 1: some output, 2: all output)

# set line wrap for print, lower for smaller screens
WRAP = 100  # Defines the maximum line width for wrapping text
printw = lambda x: print(
    textwrap.fill(x, WRAP)
)  # Create a lambda function that wraps text to fit within the specified width (WRAP)
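Note that `textwrap.fill` collapses all whitespace, including newlines, so multi-paragraph text is merged into a single block. A minimal sketch of a variant that keeps existing line breaks follows; the name `wrap_paragraphs` is hypothetical and not part of the notebook.

```python
import textwrap

WRAP = 100  # Same width as the notebook's WRAP setting


def wrap_paragraphs(text: str, width: int = WRAP) -> str:
    """Wrap each line of `text` separately so existing line breaks survive."""
    wrapped = [textwrap.fill(line, width) for line in text.split("\n")]
    return "\n".join(wrapped)


# Two sample paragraphs separated by a blank line
sample = "First paragraph. " * 10 + "\n\n" + "Second paragraph. " * 10
print(wrap_paragraphs(sample, width=60))
```

This is the same idea as calling `printw` once per line of a multi-line string.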

# Test that the API keys are set and accessible

In [7]:
# Test that the API keys are set and accessible
assert (
        userdata.get("OPENROUTER_API_KEY") is not None
), "Please set your OPENROUTER_API_KEY key in user secrets and allow access to it."

assert userdata.get("SERP_API_KEY"), "Please set your SERP_API_KEY in user secrets."

OPENROUTER_API_KEY = userdata.get("OPENROUTER_API_KEY")
SERP_API_KEY = userdata.get("SERP_API_KEY")
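If the notebook is also run outside Colab, the keys could be read from environment variables as a fallback. The sketch below is one possible approach under that assumption; `load_secret` and `DEMO_API_KEY` are hypothetical names used only for illustration.

```python
import os


def load_secret(name: str) -> str:
    """Return a secret from Colab user secrets, else from an environment variable."""
    try:
        from google.colab import userdata  # Only importable inside Colab
        value = userdata.get(name)
    except Exception:
        # Not running in Colab, or the secret is missing or not shared with this notebook
        value = None
    value = value or os.environ.get(name)
    if not value:
        raise RuntimeError(f"Please set {name} in Colab secrets or as an environment variable.")
    return value


# Demonstration with a placeholder name and value, not a real key
os.environ["DEMO_API_KEY"] = "placeholder-value"
print(load_secret("DEMO_API_KEY"))
```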

# Fetching Search Results with SerpApi  

This function queries **SerpApi's Google Light API** to fetch search results and returns them as a **pandas DataFrame**. It allows specifying the maximum number of results and provides logging options for debugging.  


In [8]:
def fetch_documents_with_serpapi(query, max_results=5, verbose=VERBOSE):
    """
    Fetch documents from SerpApi using the Google Light API.
    Args:
        query (str): The search query.
        max_results (int): The number of results to retrieve.
    Returns:
        pd.DataFrame: A DataFrame containing the search results.
    """

    # Define the search parameters, more info at https://serpapi.com/google-light-api#api-parameters
    params = {
        "engine": "google_light",
        "q": query,  # Search query
        "num": max_results,  # Max number of results to retrieve
        "google_domain": "google.com",  # Google domain to use for the search
        "hl": "en",  # Language code
        "gl": "us",  # Country code
        "api_key": SERP_API_KEY  # Your SerpApi API key
    }

    serp = serpapi.search(params)  # Perform the search using the SerpApi client
    # Extract the organic results; default to an empty list so the code below still works
    organic_results = serp.get("organic_results", [])

    if verbose > 0:
        printw(f"SerpApi returned {len(organic_results)} results for query: {query}")
    if verbose > 1:
        print(f"SerpApi results: {json.dumps(organic_results, indent=2)}")

    # Naming the columns keeps the DataFrame shape stable even when no results come back
    columns = ['position', 'title', 'link', 'snippet']
    return pd.DataFrame(organic_results, columns=columns).set_index('position')
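Individual results may lack some fields (a result without a snippet, for example). The following sketch shows one way to normalize a list of results before building a table from it. The sample payload is hypothetical and only mirrors the keys the function above reads; `normalize_results` is not part of the notebook.

```python
FIELDS = ("position", "title", "link", "snippet")


def normalize_results(organic_results):
    """Keep only the expected fields, filling missing ones with None."""
    return [{field: item.get(field) for field in FIELDS} for item in (organic_results or [])]


# Hypothetical payload in the shape read from "organic_results"
sample = [
    {"position": 1, "title": "Example A", "link": "https://example.com/a", "snippet": "Text A", "extra": 1},
    {"position": 2, "title": "Example B", "link": "https://example.com/b"},  # no snippet
]
rows = normalize_results(sample)
print(rows[1])
```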

## Code Explanation

1. **Define Search Query:**  
   `query` is set to "Do antioxidants help female subfertility?", which will be searched.

2. **Fetch Documents:**  
   `fetch_documents_with_serpapi(query, max_results=7)` sends the query to SerpAPI, fetching the top 7 results. The results are stored in `documents_df`.

3. **Display Results:**  
   `documents_df` displays the search results in a DataFrame format.

In short, the code searches for the query, retrieves the top 7 results, and shows them in a DataFrame.


In [9]:
# Example usage: Search for the query:
query = "Do antioxidants help female subfertility?"
documents_df = fetch_documents_with_serpapi(query, max_results=7)

# Display the results
documents_df

| position | title | link | snippet |
|---|---|---|---|
| 1 | The Importance of Natural Antioxidants in Fema... | https://pmc.ncbi.nlm.nih.gov/articles/PMC10135... | Compensation for low levels of antioxidants th... |
| 2 | Impact of dietary antioxidants on female infer... | https://www.nature.com/articles/s41598-024-724... | The results highlight the role of increased di... |
| 3 | Antioxidants in fertility: impact on male and ... | https://www.sciencedirect.com/science/article/... | Antioxidants could be an inexpensive treatment... |
| 4 | Towards Personalized Antioxidant Use in Female... | https://pmc.ncbi.nlm.nih.gov/articles/PMC8698668/ | Many studies showed an improvement of fertilit... |
| 5 | 6 Antioxidants to Know: Your Guide to Boosting... | https://fullwellfertility.com/blogs/knowledgew... | Antioxidants have been shown to improve and pr... |
| 6 | Female infertility and dietary antioxidant ind... | https://bmcwomenshealth.biomedcentral.com/arti... | Adequate intake of natural antioxidants may im... |
| 7 | Antioxidants and Fertility in Women with Ovari... | https://www.sciencedirect.com/science/article/... | A Cochrane review conducted in the subfertilit... |


# Webpage Fetching and Parsing with Selenium

This code defines a function `fetch_and_parse_webpage` that automates fetching and parsing webpage content using Selenium, with retries and timeout handling. It removes non-content elements like headers and footers, then extracts the main content using predefined CSS selectors.

If no main content is found, it retrieves all text from the page's body. The function is applied to each URL in the `documents_df['link']` column, storing the parsed content in a new column `documents_df['content']`.
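The retry behavior described above can be sketched independently of Selenium. The example below is a simplified illustration, not the notebook's implementation: `with_retries` and `flaky_fetch` are hypothetical names, and the fetcher only simulates a timeout.

```python
import time


def with_retries(fetch, url, max_retries=2, delay=0.0):
    """Call fetch(url) up to max_retries times; return a failure message if all attempts fail."""
    for attempt in range(1, max_retries + 1):
        try:
            return fetch(url)
        except Exception as error:
            print(f"Attempt {attempt}/{max_retries} failed for {url}: {error}")
            if attempt < max_retries:
                time.sleep(delay)  # Pause before the next attempt
    return "Failed to fetch the webpage."


# Hypothetical fetcher that fails once and then succeeds
calls = {"count": 0}


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] == 1:
        raise TimeoutError("simulated timeout")
    return f"content of {url}"


print(with_retries(flaky_fetch, "https://example.com"))
```

Returning a message instead of raising keeps a later `DataFrame.apply` over many URLs from stopping at the first failure, at the cost of having to filter out failed rows afterwards.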

In [10]:
# Core Selenium package for browser automation
from selenium import webdriver
# Chrome-specific options for configuring the browser session
from selenium.webdriver.chrome.options import Options
# Locator strategies for finding elements on the page
from selenium.webdriver.common.by import By
# Import WebDriverWait and expected conditions
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Import time module for sleep functionality
import time


def fetch_and_parse_webpage(url, max_retries=2, timeout=30):
    """
    Fetch and parse webpage with improved timeout handling and retries.

    Args:
        url (str): The webpage URL to fetch
        max_retries (int): Number of retry attempts
        timeout (int): Page load timeout in seconds
    """
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    # Add performance options
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--disable-extensions")
    chrome_options.page_load_strategy = 'eager'  # Return once the DOM is ready, without waiting for images and stylesheets

    for attempt in range(max_retries):
        try:
            driver = webdriver.Chrome(options=chrome_options)
            driver.set_page_load_timeout(timeout)
            driver.set_script_timeout(timeout)

            # Load page with explicit wait
            driver.get(url)
            WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.TAG_NAME, "body"))
            )
            # List of CSS selectors for elements to remove (navigation, menus, etc.)
            ignore_elements = [
                "nav",  # Navigation bars
                "header",  # Site headers
                "footer",  # Site footers
                "menu",  # Menu elements
                '[role="navigation"]',  # ARIA navigation roles
                '[role="banner"]',  # ARIA banner roles (headers)
                '[role="complementary"]',  # ARIA sidebars/complementary content
                ".sidebar",  # Sidebar classes
                "#navigation",  # Navigation IDs
                ".menu",  # Menu classes
                ".nav",  # Nav classes
            ]

            # Remove all non-content elements from the page
            for selector in ignore_elements:
                elements = driver.find_elements(By.CSS_SELECTOR, selector)
                for element in elements:
                    try:
                        driver.execute_script("arguments[0].remove()", element)
                    except Exception:  # Element may already be gone; skip it
                        continue

            # Prioritized list of selectors for main content areas
            content_selectors = [
                "article",  # Standard article tag
                '[role="main"]',  # ARIA main content role
                ".post-content",  # Common content class
                ".article-content",  # Common article class
                "main",  # HTML5 main tag
                "#content",  # Common content ID
            ]

            # Try each content selector in order until we find content
            content = None
            for selector in content_selectors:
                elements = driver.find_elements(By.CSS_SELECTOR, selector)
                if elements:
                    content = elements[0].text
                    break

            # If no content found with specific selectors, get all body text
            if not content:
                content = driver.find_element(By.TAG_NAME, "body").text

            driver.quit()

            # Clean and format the extracted text
            lines = [line.strip() for line in content.split("\n")]
            # Remove short lines (likely UI elements) and empty lines
            lines = [
                line for line in lines if line and len(line) > 20
            ]
            return "\n".join(lines)

        except Exception as e:
            # Selenium reports page-load timeouts as TimeoutException (requests.exceptions.Timeout
            # is never raised here), so timeouts and other errors share one handler.
            from selenium.common.exceptions import TimeoutException
            is_timeout = isinstance(e, TimeoutException)
            print(f"Attempt {attempt + 1}/{max_retries} failed for the URL: {url}")
            print("Error:", "page load timed out" if is_timeout else str(e))
            # The driver does not exist if webdriver.Chrome() itself failed on the first attempt
            if "driver" in locals():
                driver.quit()
            if attempt == max_retries - 1:
                if is_timeout:
                    return "Failed to fetch the webpage due to timeout."
                return "Failed to fetch the webpage."
            time.sleep(3)  # Wait before retrying
            print(f"Retrying... ({attempt + 1}/{max_retries})")
            continue

    return "Failed to fetch the webpage."


documents_df['content'] = documents_df['link'].apply(fetch_and_parse_webpage)
documents_df

| position | title | link | snippet | content |
|---|---|---|---|---|
| 1 | The Importance of Natural Antioxidants in Fema... | https://pmc.ncbi.nlm.nih.gov/articles/PMC10135... | Compensation for low levels of antioxidants th... | Antioxidants (Basel). 2023 Apr 11;12(4):907. d... |
| 2 | Impact of dietary antioxidants on female infer... | https://www.nature.com/articles/s41598-024-724... | The results highlight the role of increased di... | The composite dietary antioxidant index (CDAI)... |
| 3 | Antioxidants in fertility: impact on male and ... | https://www.sciencedirect.com/science/article/... | Antioxidants could be an inexpensive treatment... | There was a problem providing the content you ... |
| 4 | Towards Personalized Antioxidant Use in Female... | https://pmc.ncbi.nlm.nih.gov/articles/PMC8698668/ | Many studies showed an improvement of fertilit... | Biomedicines. 2021 Dec 17;9(12):1933. doi: 10.... |
| 5 | 6 Antioxidants to Know: Your Guide to Boosting... | https://fullwellfertility.com/blogs/knowledgew... | Antioxidants have been shown to improve and pr... | 6 Antioxidants to Know: Your Guide to Boosting... |
| 6 | Female infertility and dietary antioxidant ind... | https://bmcwomenshealth.biomedcentral.com/arti... | Adequate intake of natural antioxidants may im... | Published: 16 November 2023\nFemale infertilit... |
| 7 | Antioxidants and Fertility in Women with Ovari... | https://www.sciencedirect.com/science/article/... | A Cochrane review conducted in the subfertilit... | There was a problem providing the content you ... |


# Display the first 1000 characters of the content of the first 5 documents


In [11]:
for text in documents_df["content"].head():
    printw(text[:1000])  # Show first 1000 characters
    print("-" * 80)

Antioxidants (Basel). 2023 Apr 11;12(4):907. doi: 10.3390/antiox12040907 The Importance of Natural
Antioxidants in Female Reproduction 2, Miroslava Rabajdová Editor: Stanley Omaye Copyright and
License information PMCID: PMC10135990  PMID: 37107282 Oxidative stress (OS) has an important role
in female reproduction, whether it is ovulation, endometrium decidualization, menstruation, oocyte
fertilization, or development andimplantation of an embryo in the uterus. The menstrual cycle is
regulated by the physiological concentration of reactive forms of oxygen and nitrogen as redox
signal molecules, which trigger and regulate the length of individual phases of the menstrual cycle.
It has been suggested that the decline in female fertility is modulated by pathological OS. The
pathological excess of OS compared to antioxidants triggers many disorders of female reproduction
which could lead to gynecological diseases and to infertility. Therefore, antioxidants are crucial
for proper female repr
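Slicing with `text[:1000]` can cut a word in half, as at the end of the output above. As an alternative sketch, the standard library's `textwrap.shorten` truncates at a word boundary and appends a placeholder; note that it also collapses newlines and repeated spaces. The helper name `preview` and the sample text are assumptions for illustration.

```python
import textwrap


def preview(text: str, max_chars: int = 1000) -> str:
    """Collapse whitespace and cut at a word boundary, marking the cut with '...'."""
    return textwrap.shorten(text, width=max_chars, placeholder="...")


# Hypothetical sample text, repeated to exceed the limit
sample = "Oxidative stress has an important role in female reproduction. " * 5
print(preview(sample, max_chars=80))
```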

# `get_response` function overview

The `get_response` function sends a prompt to the OpenRouter API to get a model's response. It accepts a prompt, model, and optional parameters (e.g., `top_p`, `temperature`) for model configuration.

The function constructs and sends a `POST` request to the API with the given parameters, then returns the response in JSON format. If `verbose` is enabled, it prints status and usage details for debugging.

In [None]:
def get_response(
        prompt: str, model: str, verbose: int = VERBOSE, **model_kwargs
) -> dict:
    """
    Get a response from the OpenRouter API using the given prompt and model.
    Make sure to set your OpenRouter API key in the Colab user secret
    OPENROUTER_API_KEY. OpenRouter normalizes requests and responses across
    providers. That is, you can use the same code to call different models from
    different providers.
    Args:
        prompt (str): The prompt to send to the model.
        model (str): The model to use.
        verbose (int): Verbosity level for debugging.
        **model_kwargs: Additional keyword arguments for the model.
            - top_p: Top-p sampling parameter.
            - temperature: Temperature parameter for sampling.
            - frequency_penalty: Frequency penalty parameter.
            - presence_penalty: Presence penalty parameter.
            - repetition_penalty: Repetition penalty parameter.
            - top_k: Top-k sampling parameter.
            - max_tokens: Maximum number of tokens to generate.
    Note: The model_kwargs parameters are optional and will be set to default values if not provided.
    Returns:
        dict: The response from the model.
    """
    # Check if model parameter is provided, if not, set a default value.
    # More information about the parameters can be found in the OpenRouter API documentation.
    # https://openrouter.ai/docs/api-reference/parameters
    top_p = model_kwargs.get("top_p", 1)
    temperature = model_kwargs.get("temperature", 0.9)
    frequency_penalty = model_kwargs.get("frequency_penalty", 0)
    presence_penalty = model_kwargs.get("presence_penalty", 0)
    repetition_penalty = model_kwargs.get("repetition_penalty", 1)
    top_k = model_kwargs.get("top_k", 0)
    max_tokens = model_kwargs.get("max_tokens", 50000)

    messages = [{"role": "user", "content": prompt}]

    response = requests.post(
        url="https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {userdata.get('OPENROUTER_API_KEY')}"},
        data=json.dumps(
            {
                "model": model,
                "messages": messages,
                "top_p": top_p,
                "temperature": temperature,
                "frequency_penalty": frequency_penalty,
                "presence_penalty": presence_penalty,
                "repetition_penalty": repetition_penalty,
                "top_k": top_k,
                "max_tokens": max_tokens,
            }
        ),
    )
    if verbose > 0:
        print(f"Response status code: {response.status_code}")
    response_json = response.json()
    # let's print how many tokens we used, it can be useful for cost estimation
    if verbose > 0:
        print(f"Response usage: {response_json.get('usage')}")
    return response_json
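The returned JSON either carries the model output under `choices[0]["message"]` or an `error` entry. A minimal sketch of reading it defensively follows; the two sample responses are hypothetical and only mimic the keys this notebook accesses, and `extract_message` is not part of the notebook.

```python
def extract_message(response_json: dict):
    """Return (content, reasoning, error) from a chat-completion style response."""
    choices = response_json.get("choices") or []
    if not choices:
        return None, None, response_json.get("error")
    message = choices[0].get("message") or {}
    return message.get("content"), message.get("reasoning"), None


# Hypothetical responses: one successful, one with an error entry
ok = {"choices": [{"message": {"role": "assistant", "content": "A short answer."}}]}
failed = {"error": {"code": 429, "message": "Rate limit exceeded"}}

print(extract_message(ok))
print(extract_message(failed))
```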

## Available models in OpenRouter
- Models are categorized as free (marked with ":free") or paid with varying pricing
- Models may change over time - check current availability and pricing at: https://openrouter.ai/models

Note: Different models have varying token limits and optimal hyperparameter settings

In [None]:
MODEL = {
    "llama-free": "meta-llama/llama-3.3-70b-instruct:free",
    "deepseek-r1-free": "deepseek/deepseek-r1-distill-llama-70b:free",
    "deepseek-r1-qwen": "deepseek/deepseek-r1-distill-qwen-1.5b",
    "gemini-flash-2": "google/gemini-2.0-flash-001",
    "gemini-pro-2": "google/gemini-2.0-pro-exp-02-05:free",
    "gemini-flash-2free": "google/gemini-2.0-flash-exp:free",
    "gemma-3-4b": "google/gemma-3-4b-it:free",
    "llama-3.2-1b": "meta-llama/llama-3.2-1b-instruct",
    "gpt-4o-mini": "openai/gpt-4o-mini",
}
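Since free models are marked with the `:free` suffix, the mapping can be filtered by that suffix. The sketch below uses a small copy of the mapping under the hypothetical name `MODEL_SUBSET` so that it runs on its own and does not overwrite `MODEL`.

```python
# Subset of the MODEL mapping above, repeated so the snippet is self-contained
MODEL_SUBSET = {
    "llama-free": "meta-llama/llama-3.3-70b-instruct:free",
    "gemma-3-4b": "google/gemma-3-4b-it:free",
    "gpt-4o-mini": "openai/gpt-4o-mini",
}


def is_free(model_id: str) -> bool:
    """Treat a model ID as free when it carries the ':free' suffix."""
    return model_id.endswith(":free")


free_models = {alias: model_id for alias, model_id in MODEL_SUBSET.items() if is_free(model_id)}
print(sorted(free_models))
```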

In [None]:
def generate_summary(
        text: str,
        prompt: str,
        model: str,
        verbose: int = VERBOSE,
        **model_kwargs
) -> tuple:
    """
    Generate a summary for the given text using the specified model and prompt.
    Args:
        text (str): The text to summarize.
        prompt (str): The prompt template for the model.
        model (str): The model to use for summarization.
        verbose (int): Verbosity level for debugging.
    Returns:
        tuple: A tuple containing the summary, reasoning, and response JSON.
    Note: The prompt should be formatted with the text to summarize.
    """

    prompt_text = prompt.format(text=text)
    if verbose > 0:
        print(f"Running for {text[:50]}... with prompt:")
        if verbose > 1:
            print(f"Prompt:")
            printw(prompt_text)
    response_json = get_response(prompt=prompt_text, model=model, verbose=verbose, **model_kwargs)
    response_message = response_json.get("choices", {0: {"message": None}})[0]["message"]
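    if response_message is None:
        print(f"No response message for {text[:50]}...")
        print(response_json.get("error"))
        # Return early: there is no message to read content or reasoning from
        return None, None, response_json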
    if verbose > 1:
        print(f"Response for {text[:50]}...:")
        printw(f"Response message: {response_message}")
        print("\n", "-=" * 5, " End of response ", "=-" * 5, "\n")
    response_content = response_message.get("content", None)
    reasoning_result = response_message.get("reasoning", None)
    return response_content, reasoning_result, response_json

# Example usage
prompt = '''
You are an expert in the field of IIR. You have been asked to summarize the content of this document.
Your task is to provide a concise summary of the main points and key takeaways from the document.
Document:
{text}
Summary:
'''
model = MODEL["gpt-4o-mini"]  # Select the model to use for generating summaries
# Generate summary for the first document
text = documents_df["content"].iloc[0]
summary, reasoning, response_json = generate_summary(
    text=text,
    prompt=prompt,
    model=model,
    verbose=VERBOSE,
)
# Display the summary
printw(summary)
# Display the reasoning
if reasoning is not None:
    printw(reasoning)
# Display the response JSON for debugging
print(json.dumps(response_json, indent=2))

The document titled "The Importance of Natural Antioxidants in Female Reproduction" highlights the
significant role of antioxidants in female reproductive health, emphasizing their protective effects
against oxidative stress (OS).   Key points include:  1. **Oxidative Stress and Reproductive
Health**: OS plays a critical role in various reproductive processes including ovulation,
fertilization, and embryo implantation. A pathological excess of OS can lead to disorders such as
infertility and gynecological diseases.  2. **Role of Antioxidants**: Natural antioxidants,
including vitamins (A, C, E, and B9), melatonin, L-carnitine, flavonoids like quercetin and
resveratrol, and trace elements such as zinc and selenium, help counteract OS, regulate hormonal
functions, and promote reproductive health.  3. **Mechanisms of Action**: The review discusses the
biochemical pathways through which antioxidants operate, primarily through modulation of signaling
pathways such as Nrf2 (which promotes an
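One caveat with `prompt.format(text=text)`: if the prompt template itself contains literal braces (for example, a JSON output example), `str.format` tries to interpret them as fields and fails. A possible workaround, sketched below with the hypothetical helper `fill_prompt`, is to replace only the `{text}` placeholder.

```python
def fill_prompt(template: str, text: str) -> str:
    """Insert `text` at the {text} placeholder without interpreting other braces."""
    return template.replace("{text}", text)


# Hypothetical template that contains literal braces in addition to the placeholder
json_prompt = 'Summarize the document as JSON like {"summary": "..."}.\nDocument:\n{text}\nSummary:'

try:
    json_prompt.format(text="Some document text.")
except (KeyError, ValueError, IndexError) as error:
    print(f"str.format failed: {error!r}")

print(fill_prompt(json_prompt, "Some document text."))
```

Doubling the literal braces in the template (`{{` and `}}`) is the other common option and keeps `str.format` usable.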

In [None]:
summary_prompt = '''
You are an expert in the field of IIR. You have been asked to summarize the content of this document.
Your task is to provide a concise summary of the main points and key takeaways from the document.
Document:
{text}
Summary:
'''
# Generate a summary for each document; the function returns a tuple of (summary, reasoning, response_json)
documents_df["summary"] = documents_df["content"].apply(
    lambda x: generate_summary(
        text=x,
        prompt=summary_prompt,
        model=model,
        verbose=VERBOSE,
        max_tokens=5000, # Decrease the max tokens to avoid exceeding the limit
    )[0]
)
# Display the summaries
documents_df[["title", "link", "content", "summary"]].head()

| position | title | link | content | summary |
|---|---|---|---|---|
| 1 | The Importance of Natural Antioxidants in Fema... | https://pmc.ncbi.nlm.nih.gov/articles/PMC10135... | Antioxidants (Basel). 2023 Apr 11;12(4):907. d... | The document published in "Antioxidants" discu... |
| 2 | Impact of dietary antioxidants on female infer... | https://www.nature.com/articles/s41598-024-724... | The composite dietary antioxidant index (CDAI)... | The document presents a study that investigate... |
| 3 | Antioxidants in fertility: impact on male and ... | https://www.sciencedirect.com/science/article/... | There was a problem providing the content you ... | The document indicates that there was an issue... |
| 4 | Towards Personalized Antioxidant Use in Female... | https://pmc.ncbi.nlm.nih.gov/articles/PMC8698668/ | Biomedicines. 2021 Dec 17;9(12):1933. doi: 10.... | The document discusses the increasing use of a... |
| 5 | 6 Antioxidants to Know: Your Guide to Boosting... | https://fullwellfertility.com/blogs/knowledgew... | 6 Antioxidants to Know: Your Guide to Boosting... | The document discusses the role of antioxidant... |


In [None]:
# Concatenate all document titles and contents into a single string, and summarize it
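# The element-wise concatenation yields a pandas Series; ''.join turns it into one string
all_documents = ''.join(
    'Document title: ' + documents_df['title'] + '\n' + 'Document Content: ' + documents_df['content'] + '\n\n'
)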
response_content, reasoning_result, response_json = generate_summary(
    text=all_documents,
    prompt=summary_prompt,
    model=model,
    verbose=VERBOSE,
    temperature=0.5 # Adjust the temperature for more deterministic output
)
# Display the summary of all documents
for row in response_content.split('\n'):
    printw(row)

The document appears to be a collection of titles related to the role of antioxidants in health,
particularly focusing on fertility and dietary impacts. Here are the main points and key takeaways:

1. **Natural Antioxidants**: The importance of natural antioxidants in promoting overall health and
their potential role in preventing various diseases.

2. **Dietary Antioxidants**: An exploration of how dietary antioxidants can influence health
outcomes, particularly in relation to fertility.

3. **Fertility and Antioxidants**: A discussion on the impact of antioxidants on fertility,
emphasizing their significance in reproductive health.

4. **Personalized Antioxidant Approaches**: The potential for personalized nutrition strategies that
incorporate antioxidants to optimize health and fertility.

5. **Key Antioxidants**: An overview of specific antioxidants that are beneficial for health, with a
focus on their relevance to fertility.

6. **Female Infertility**: An examination of the relati
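When several documents are combined into one prompt, the result can grow beyond what a model accepts. A minimal sketch of building the combined string with a per-document character cap follows; the cap value, the helper name `combine_documents`, and the sample data are assumptions. With the DataFrame from this notebook, the pairs could come from `zip(documents_df["title"], documents_df["content"])`.

```python
def combine_documents(documents, max_chars_per_doc=4000):
    """Join (title, content) pairs into one string, truncating each content to a character cap."""
    parts = []
    for title, content in documents:
        parts.append(f"Document title: {title}\nDocument Content: {content[:max_chars_per_doc]}\n\n")
    return "".join(parts)


# Hypothetical stand-ins for (title, content) rows
docs = [("Title A", "a" * 10), ("Title B", "b" * 10)]
combined = combine_documents(docs, max_chars_per_doc=5)
print(combined)
```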

# Display results in a Google-like way

In [None]:
from IPython.display import HTML

def create_serp_page(documents_df):
    """
    Display a Google-like search results page from the fetched documents.

    Args:
        documents_df (pd.DataFrame): DataFrame containing the document titles, links, snippets, and summaries.
    """

    # Create an HTML structure for displaying the search results in a Google-like layout
    html_content = """
    <html>
    <head>
        <title>Search Results</title>
        <style>
            body {
                font-family: Arial, sans-serif;
                margin: 20px;
                background-color: #f9f9f9;
            }
            .search-results {
                max-width: 800px;
                margin: auto;
                background-color: white;
                padding: 20px;
                box-shadow: 0px 4px 6px rgba(0, 0, 0, 0.1);
                border-radius: 8px;
            }
            .result-item {
                margin-bottom: 20px;
            }
            .result-title {
                font-size: 20px;
                color: #1a0dab;
                text-decoration: none;
            }
            .result-title:hover {
                text-decoration: underline;
            }
            .result-snippet {
                color: #4d5156;
                font-size: 14px;
                margin-top: 5px;
            }
            .result-summary {
                color: #4d5156;
                font-size: 14px;
                margin-top: 5px;
            }
            .result-link {
                color: #006621;
                font-size: 14px;
            }
            .result-link:hover {
                text-decoration: underline;
            }
            .search-bar {
                background-color: #f8f9fa;
                padding: 10px;
                margin-bottom: 20px;
                border-radius: 8px;
                box-shadow: 0px 2px 4px rgba(0, 0, 0, 0.1);
            }
            .search-bar input {
                width: 100%;
                padding: 10px;
                font-size: 16px;
                border-radius: 4px;
                border: 1px solid #ddd;
            }
        </style>
    </head>
    <body>
        <div class="search-results">

    """

    # Loop through each document and create a search result item
    for index, row in documents_df.iterrows():
        title = row["title"]
        link = row["link"]
        snippet = row["snippet"]
        summary = row["summary"]

        html_content += f"""
            <div class="result-item">
                <a class="result-title" href="{link}" target="_blank">{title}</a>
                <div class="result-snippet">{snippet}</div>
                <a class="result-link" href="{link}" target="_blank">{link}</a>
                <div class="result-summary">{summary}</div>
            </div>
        """

    # Close the HTML tags
    html_content += """
        </div>
    </body>
    </html>
    """

    # Display the HTML content in the notebook
    display(HTML(html_content))


# Example usage: Display the results from the fetched documents
create_serp_page(documents_df)
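Titles, snippets, and summaries come from scraped pages and model output, so they may contain characters such as `<` or `&` that would be interpreted as markup when inserted into HTML. The sketch below shows one way to escape them with the standard library's `html.escape`; `render_result` is a hypothetical helper, not part of the notebook.

```python
from html import escape


def render_result(title: str, link: str, snippet: str) -> str:
    """Build one result item, escaping text so markup in scraped data is shown literally."""
    safe_link = escape(link, quote=True)
    return (
        '<div class="result-item">'
        f'<a class="result-title" href="{safe_link}" target="_blank">{escape(title)}</a>'
        f'<div class="result-snippet">{escape(snippet)}</div>'
        "</div>"
    )


# Hypothetical result whose title and snippet contain markup
item = render_result("A <b>bold</b> title", "https://example.com/?a=1&b=2", "<script>alert(1)</script>")
print(item)
```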