# Building a News Analysis Agent with Hyperbrowser and GPT-4o-mini

In this cookbook, we'll build a News Analysis Agent that summarizes news on any topic by searching a news aggregator, analyzing multiple sources, and generating a summary with source-specific insights.

Our agent will:

1. Access news aggregator websites like Google News
2. Extract articles from multiple news sources on a specific topic
3. Analyze content across different publications
4. Generate a comprehensive summary with source-specific details
5. Answer user questions based on the aggregated news content

We'll use these tools to build our agent:

- **[Hyperbrowser](https://hyperbrowser.ai)** for web extraction and accessing news sources
- **OpenAI's GPT-4o-mini** for intelligent analysis and report generation

By the end of this cookbook, you'll have a versatile news analysis tool that can keep you informed on any topic with balanced, multi-source perspectives!


## Prerequisites

To follow along you'll need the following:

1. A Hyperbrowser API key (sign up at [hyperbrowser.ai](https://hyperbrowser.ai) if you don't have one; it's free)
2. An OpenAI API key (sign up at [openai.com](https://openai.com) if you don't have one; signing up is free, but API usage may be billed)

Both API keys should be stored in a `.env` file in the same directory as this notebook with the following format:

```
HYPERBROWSER_API_KEY=your_hyperbrowser_key_here
OPENAI_API_KEY=your_openai_key_here
```
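
As a quick sanity check before running the rest of the notebook, a small helper along these lines can report which keys are missing. This is only a sketch: the name `missing_keys` is hypothetical and not part of either SDK, and it simply inspects a mapping such as `os.environ`.

```python
import os
from typing import Iterable, List, Mapping

REQUIRED_KEYS = ("HYPERBROWSER_API_KEY", "OPENAI_API_KEY")


def missing_keys(
    env: Mapping[str, str], required: Iterable[str] = REQUIRED_KEYS
) -> List[str]:
    """Return the required variable names that are absent or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


# Once load_dotenv() has run, os.environ also holds the values from `.env`.
missing = missing_keys(os.environ)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```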


## Step 1: Set up imports and load environment variables


In [None]:
import asyncio
import json
import os

from dotenv import load_dotenv
from hyperbrowser import AsyncHyperbrowser
from hyperbrowser.tools import WebsiteExtractTool
from openai import AsyncOpenAI
from openai.types.chat import (
    ChatCompletionMessageParam,
    ChatCompletionMessageToolCall,
    ChatCompletionToolMessageParam,
)
from typing import List
from IPython.display import Markdown, display
import urllib.parse

load_dotenv()

## Step 2: Initialize API clients

Next, we'll initialize our API clients for Hyperbrowser and OpenAI using the environment variables we loaded. These clients will handle web scraping and AI-powered analysis respectively.


In [2]:
hb = AsyncHyperbrowser(api_key=os.getenv("HYPERBROWSER_API_KEY"))
llm = AsyncOpenAI()

## Step 3: Create a tool handler function

The `handle_tool_call` function processes requests from the LLM to interact with external tools. In this case, it handles the WebsiteExtractTool from Hyperbrowser, which allows our agent to extract structured data from news websites.

This function:

1. Takes a tool call request from the LLM
2. Checks if it's for a supported tool (in this case, the `extract_data` tool)
3. Parses the parameters and executes the tool
4. Returns the results back to the LLM
5. Catches any error and returns it as the tool result, so that the LLM can retry the tool with different arguments

This is essential for enabling our agent to dynamically access and extract news content.


In [3]:
async def handle_tool_call(
    tc: ChatCompletionMessageToolCall,
) -> ChatCompletionToolMessageParam:
    print(f"Handling tool call: {tc.function.name}")

    try:
        if (
            tc.function.name
            == WebsiteExtractTool.openai_tool_definition["function"]["name"]
        ):
            args = json.loads(tc.function.arguments)
            content = await WebsiteExtractTool.async_runnable(hb=hb, params=args)

            return {"role": "tool", "tool_call_id": tc.id, "content": content}
        else:
            raise ValueError(f"Tool not found: {tc.function.name}")

    except Exception as e:
        err_msg = f"Error handling tool call: {e}"
        print(err_msg)
        return {
            "role": "tool",
            "tool_call_id": tc.id,
            "content": err_msg,
            "is_error": True,  # type: ignore
        }
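
The dispatch-and-report-errors pattern used by `handle_tool_call` can be tried without any network access. The sketch below is an illustration only: `FakeToolCall`, `fake_extract`, `TOOLS`, and `dispatch` are hypothetical stand-ins, and the `urls` argument is an assumed parameter name rather than the documented schema of the Hyperbrowser tool.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict


@dataclass
class FakeToolCall:
    """Hypothetical stand-in for a tool call: id, tool name, JSON arguments."""

    id: str
    name: str
    arguments: str


async def fake_extract(params: Dict[str, Any]) -> str:
    """Stub tool: echoes the requested URLs instead of fetching them."""
    return json.dumps({"urls": params["urls"]})


# Registry mapping tool names to async callables.
TOOLS: Dict[str, Callable[[Dict[str, Any]], Awaitable[str]]] = {
    "extract_data": fake_extract,
}


async def dispatch(tc: FakeToolCall) -> Dict[str, str]:
    """Run the named tool; on any failure, return the error text as the result."""
    try:
        if tc.name not in TOOLS:
            raise ValueError(f"Tool not found: {tc.name}")
        content = await TOOLS[tc.name](json.loads(tc.arguments))
    except Exception as e:
        content = f"Error handling tool call: {e}"
    return {"role": "tool", "tool_call_id": tc.id, "content": content}
```

Because failures come back as ordinary tool messages, the model sees the error text on its next turn and can adjust its arguments instead of the whole run crashing.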

## Step 4: Implement the agent loop

The agent loop is the heart of our news analysis system. It implements an iterative conversation pattern where:

1. The current state of the conversation is sent to the LLM (GPT-4o-mini in this case)
2. The LLM either provides a final answer or requests more information via tool calls
3. If tool calls are made, they're processed and their results are added to the conversation
4. This process repeats until the LLM provides a final analysis

This architecture allows the agent to gather information from multiple news sources iteratively before generating a comprehensive summary. We're using GPT-4o-mini here for efficient processing while maintaining high-quality analysis.


In [4]:
async def agent_loop(messages: list[ChatCompletionMessageParam]) -> str:
    while True:
        response = await llm.chat.completions.create(
            messages=messages,
            model="gpt-4o-mini",
            tools=[
                WebsiteExtractTool.openai_tool_definition,
            ],
            max_completion_tokens=8000,
        )

        choice = response.choices[0]

        # Append response to messages
        messages.append(choice.message)  # type: ignore

        # Handle tool calls
        if (
            choice.finish_reason == "tool_calls"
            and choice.message.tool_calls is not None
        ):
            tool_result_messages = await asyncio.gather(
                *[handle_tool_call(tc) for tc in choice.message.tool_calls]
            )
            messages.extend(tool_result_messages)

        elif choice.finish_reason == "stop" and choice.message.content is not None:
            return choice.message.content

        else:
            print(choice)
            raise ValueError(f"Unhandled finish reason: {choice.finish_reason}")
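
The loop above runs until the model stops requesting tools. One possible variation, sketched here with a scripted stand-in for the model, is to cap the number of turns so a model that keeps requesting tools cannot loop indefinitely. The names `bounded_loop`, `scripted_model`, and `max_turns` are hypothetical, and the reply dictionaries are a simplification of the real response objects.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Message = Dict[str, str]
# A "model" here is any async callable mapping the conversation to a reply dict.
Model = Callable[[List[Message]], Awaitable[Dict[str, str]]]


async def bounded_loop(
    model: Model, messages: List[Message], max_turns: int = 5
) -> str:
    """Call the model until it stops asking for tools, or give up after max_turns."""
    for _ in range(max_turns):
        reply = await model(messages)
        if reply["finish_reason"] == "stop":
            return reply["content"]
        # Stand-in for running the requested tool and recording its result.
        messages.append({"role": "tool", "content": f"result for {reply['content']}"})
    raise RuntimeError(f"No final answer after {max_turns} turns")


async def scripted_model(messages: List[Message]) -> Dict[str, str]:
    """Hypothetical model: requests one tool call, then gives a final answer."""
    if any(m["role"] == "tool" for m in messages):
        return {"finish_reason": "stop", "content": "final summary"}
    return {"finish_reason": "tool_calls", "content": "extract_data"}
```

The cap trades completeness for predictability: a low `max_turns` bounds cost, but may cut off an agent that legitimately needs several extraction rounds.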

## Step 5: Design the system prompt

The system prompt is crucial for guiding the LLM's behavior. Our prompt establishes the agent as an expert news analyst and provides detailed instructions on how to:

1. Access and extract information from news sources
2. Generate an overall summary based on multiple sources
3. Provide source-specific analysis with attribution
4. Answer user questions based on the aggregated news content

This structured approach ensures that our news summaries are comprehensive, balanced, and properly sourced.
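
The prompt defined in this step contains a `{link}` placeholder that is later filled in with `str.format`. A minimal illustration of that mechanism, using a shortened, made-up template rather than the real prompt:

```python
TEMPLATE = "This is the link to a news aggregator {link}. Summarize the news."

prompt = TEMPLATE.format(link="https://news.google.com/search?q=bird+flu")
print(prompt)

# Literal braces in a template must be doubled; otherwise str.format treats
# them as placeholders and raises KeyError or ValueError.
json_hint = 'Reply as {{"summary": "..."}} for {link}'.format(
    link="https://example.com"
)
print(json_hint)
```

This matters if you extend the prompt with JSON examples: any `{` or `}` meant literally needs to be written as `{{` or `}}`.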


In [5]:
SYSTEM_PROMPT = """
You are an expert news analyst. You have access to an 'extract_data' tool which can be used to get structured data from a webpage.

This is the link to a news aggregator {link}. You are required to summarize the overall news in a concise manner. In addition, you are required to provide the list of what the analysis is from the various news sources in detail. Your overall analysis should be based on the summary from the various individual news sources. In addition, if the user asks a question, you are required to provide the answer based on your overall summary.

In summary, you are required to provide the following:
1. Overall summary of the news, based on the individual news sources
2. List of what the analysis is from the various news sources in detail. This analysis should be based on the particular news source itself.
3. If the user asks any questions, you are required to provide the answer based on your overall summary.
""".strip()

## Step 6: Create a factory function for generating news analysis agents

Now we'll create a factory function that generates specialized news analysis agents for specific topics or sources. This approach provides several benefits:

1. **Reusability**: We can create multiple agents for different news topics
2. **Configurability**: Each agent can be configured with a specific news source URL
3. **Flexibility**: The function handles optional user questions for interactive analysis

The factory returns an async function that can be called to generate news summaries or answer topic-specific questions.


In [6]:
from typing import Coroutine, Any, Callable, Optional


def make_news_analyst(
    link_to_aggregator: str,
) -> Callable[..., Coroutine[Any, Any, str]]:
    # Default to HTTPS when the aggregator link is given without a scheme.
    if not (
        link_to_aggregator.startswith("http://")
        or link_to_aggregator.startswith("https://")
    ):
        link_to_aggregator = f"https://{link_to_aggregator}"

    sysprompt = SYSTEM_PROMPT.format(
        link=link_to_aggregator,
    )

    async def news_agent(question: Optional[str] = None) -> str:
        messages: List[ChatCompletionMessageParam] = [
            {"role": "system", "content": sysprompt},
        ]
        if question is not None:
            messages.append(
                {
                    "role": "user",
                    "content": f"The user asked the following question: {question}",
                }
            )
        return await agent_loop(messages)

    return news_agent
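
The two ideas inside the factory, scheme normalization and a closure that remembers its configuration, can be looked at in isolation. This is a simplified sketch with hypothetical names (`ensure_scheme`, `make_prompt_builder`); it makes no LLM calls.

```python
from typing import Callable


def ensure_scheme(link: str) -> str:
    """Prefix `https://` when the link has no http(s) scheme."""
    if link.startswith(("http://", "https://")):
        return link
    return f"https://{link}"


def make_prompt_builder(link: str) -> Callable[[str], str]:
    """Return a function that remembers its own normalized link (a closure)."""
    link = ensure_scheme(link)

    def build(question: str) -> str:
        return f"[{link}] {question}"

    return build


# Each builder keeps its own link, independent of the others.
google = make_prompt_builder("news.google.com")
example = make_prompt_builder("http://example.com/news")
print(google("What happened today?"))
print(example("What happened today?"))
```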

## Step 7: Get the news topic from the user's question

We can now summarize news and answer questions, but we can't yet derive what the user wants to search for. So we'll run a minimal LLM call that converts the user's question into a news search query, then build the news aggregator link from that query.

The query is encoded with the `quote_plus` function so that spaces and special characters are valid inside the generated URL.
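
To see what `quote_plus` does to a query before it is placed in the URL, here is a short standalone example. The query strings are illustrative.

```python
import urllib.parse

query = "bird flu human concern news"
encoded = urllib.parse.quote_plus(query)
print(encoded)  # bird+flu+human+concern+news

# Reserved characters such as "&" are percent-encoded so they cannot be
# mistaken for a separator between URL parameters.
ampersand = urllib.parse.quote_plus("AI & regulation")
print(ampersand)  # AI+%26+regulation

url = f"https://news.google.com/search?q={encoded}"
print(url)
```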


In [19]:
async def convert_to_news_query(question: str) -> Optional[str]:
    messages: List[ChatCompletionMessageParam] = [
        {
            "role": "system",
            "content": """Convert the user's question into a concise news search query. Focus on key terms and remove unnecessary words. The query should be suitable for searching news articles.
            For example:
            "What are the latest developments in artificial intelligence regulation?" -> AI regulation news
            "How is the current state of the NBA?" -> NBA news""",
        },
        {
            "role": "user",
            "content": f"Convert this question to a news search query: {question}",
        },
    ]

    response = await llm.chat.completions.create(
        model="gpt-4o-mini", messages=messages, temperature=0.3, max_tokens=50
    )

    return response.choices[0].message.content


async def generate_news_url(question: str) -> Optional[str]:
    query = await convert_to_news_query(question)
    print(query)
    if query is not None:
        return f"https://news.google.com/search?q={urllib.parse.quote_plus(query)}"
    return None


async def get_news_summary(query: str) -> Optional[str]:
    news_url = await generate_news_url(query)
    if news_url is not None:
        news_agent = make_news_analyst(news_url)
        response = await news_agent(query)
        if response:
            return response
        else:
            return "**No response from the agent when summarizing news**"
    else:
        return "**Could not generate news url**"

## Step 8: Set up the news topic and user question

Now we'll define the question we want the agent to answer. In this example:

1. The question is "Is bird flu a serious concern for humans?"
2. The Google News search for bird flu articles is derived from that question by `generate_news_url`

This demonstrates how the agent can not only summarize news but also answer specific questions about the topic.


In [16]:
query = "Is bird flu a serious concern for humans?"

## Step 9: Run the news analyst agent

Finally, we'll create an instance of our news analysis agent for the specified topic and run it with our question. This demonstrates the full workflow:

1. The agent accesses Google News for "bird flu" articles
2. It extracts content from multiple news sources on this topic
3. It analyzes and summarizes the content from different publications
4. It generates a comprehensive summary with source-specific details
5. It answers our specific question about human health concerns

The formatted output will include an overall summary, source-by-source analysis, and an answer to our question.


In [18]:
news_summary = await get_news_summary(query)
display(Markdown(news_summary))

bird flu human concern news
Handling tool call: extract_data


### Overall Summary of the News
Recent news highlights growing concerns about bird flu (H5N1) and its potential threat to human health. Experts warn that the risk of transmission to humans is increasing, especially with recent cases reported in the U.S. and the detection of the virus in new animal hosts, such as pigs. The World Health Organization (WHO) has expressed "enormous concern" over the situation, emphasizing the need for vigilance and preparedness. Mutations in the virus have raised alarms about the potential for a pandemic, prompting calls for enhanced public health measures and vaccine development.

### Detailed Analysis from Various News Sources
1. **How worried should you be about bird flu?**  
   - **Source:** [Link](https://news.google.com/read/CBMimAFBVV95cUxPakMwOW1ia0RSaEJQMjNWNFliNlliaEs3a19PZEo4RkZfV0x1bm00NnB3MXZ4R1BBWC1BOVdYN1JQc1UyVHlUeEJvTkM3THFURFFseFVnNnpLRzRWWnlqOFpoc29JOGlxSFpUdVNsU24tbzRVQzkxaS1oaE8tUDB2Z2xRYTMxdkUxb1hteE5DMDJQbzFKbERqbg?hl=en-US&gl=US&ceid=US%3Aen)
   - **Analysis:** Experts express significant concern over the rising risk of bird flu (H5N1) spreading to humans, especially with recent cases reported in the U.S. and other countries. The potential for mutations that could facilitate human-to-human transmission is alarming.

2. **Will bird flu spark a human pandemic? Scientists say the risk is rising**  
   - **Source:** [Link](https://news.google.com/read/CBMiX0FVX3lxTE9sSElUMG1MZFNfUElUUlJNX2RDRVdIUUZxSk92OXRMdVJlTDlSRE9tOWxxZEZVUTViZUY4V3h1MW5kOUx1UWJLaUs1aVFMc2FaNC02bERGdUFzNHp4b0J3?hl=en-US&gl=US&ceid=US%3Aen)
   - **Analysis:** Recent studies indicate that the risk of a human pandemic due to bird flu is increasing, particularly as the virus has been detected in new animal hosts, including pigs. Experts warn that the virus's ability to mutate poses a significant threat to public health.

3. **Will Bird Flu Lead To A Pandemic? First U.S. Patient Dies From H5N1**  
   - **Source:** [Link](https://news.google.com/read/CBMic0FVX3lxTE8tLVN6aHFKYUJpQXpTcXgxbUVhVTgtblg3eUVEcnJBRVc2SkVVVFlqcGhoZGFscThRdzdheElSajNVcDhrbGJkMk1tTHNHWXlXRk14ck04NUhhckE2QmxGNHVsSTkxOFByN1h5NzU0bkdtX0nSAVRBVV95cUxPN3lHLTNpYmtpUjdiV3g2Q3FKcXFZT1M0VlhGNmcxMWRxUzhhNjNxZFY1c0owN0JuWFY1RXBld3RJaVJXSG9rdzNOWVZFS0R4b29HZEg?hl=en-US&gl=US&ceid=US%3Aen)
   - **Analysis:** The first reported death of a U.S. patient from H5N1 bird flu has heightened concerns about the virus's potential to cause a pandemic. Health officials emphasize the need for preparedness and monitoring of the virus's spread.

4. **Expert concerned U.S. not prepared for bird flu spread in humans**  
   - **Source:** [Link](https://news.google.com/read/CBMic0FVX3lxTE8tLVN6aHFKYUJpQXpTcXgxbUVhVTgtblg3eUVEcnJBRVc2SkVVVFlqcGhoZGFscThRdzdheElSajNVcDhrbGJkMk1tTHNHWXlXRk14ck04NUhhckE2QmxGNHVsSTkxOFByN1h5NzU0bkdtX0nSAVRBVV95cUxPN3lHLTNpYmtpUjdiV3g2Q3FKcXFZT1M0VlhGNmcxMWRxUzhhNjNxZFY1c0owN0JuWFY1RXBld3RJaVJXSG9rdzNOWVZFS0R4b29HZEg?hl=en-US&gl=US&ceid=US%3Aen)
   - **Analysis:** Experts are worried that the U.S. is not adequately prepared for a potential outbreak of bird flu among humans. They stress the importance of developing vaccines and public health strategies to mitigate the risks.

### Answer to the User's Question
Yes, bird flu is considered a serious concern for humans. There has been an increase in reported cases, including the first death of a U.S. patient, which has raised alarms about the virus's potential to mutate and lead to a pandemic. Experts emphasize the need for preparedness, monitoring, and public health strategies to address this growing threat.

## Step 10: Try it with your own questions

Now that we've seen the agent in action, you can have it analyze your own queries.

```python
# Example: Get the news summary for another topic
# news_topic = "How's the crypto market doing?"
# news_summary = await get_news_summary(news_topic)
# print(news_summary)
```

Feel free to experiment with different questions and news aggregation sites!
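
If you want to compare several topics in one go, the summaries can be requested concurrently. The sketch below assumes a hypothetical helper, `summarize_many`, and uses a stub (`fake_summary`) in place of `get_news_summary` so that it runs without API keys.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

Summarizer = Callable[[str], Awaitable[Optional[str]]]


async def summarize_many(
    questions: List[str], summarize: Summarizer
) -> Dict[str, Optional[str]]:
    """Run one summary per question concurrently and map question -> summary."""
    results = await asyncio.gather(*(summarize(q) for q in questions))
    return dict(zip(questions, results))


async def fake_summary(question: str) -> Optional[str]:
    """Stub used for illustration; swap in get_news_summary in the notebook."""
    await asyncio.sleep(0)
    return f"summary for: {question}"
```

In the notebook you would call `await summarize_many([...], get_news_summary)`. Each question triggers its own extraction and LLM calls, so cost and rate-limit pressure grow with the number of questions.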


## Conclusion

In this cookbook, we built a powerful news analysis agent using Hyperbrowser and OpenAI's GPT-4o-mini. Our agent can:

1. Access and extract information from news aggregators like Google News
2. Process content from multiple news sources on any topic
3. Generate comprehensive, balanced summaries with source attribution
4. Answer specific questions based on the aggregated news content
5. Present information in a well-structured, easy-to-understand format

This tool can be invaluable for:

- Staying informed on complex news topics without relying on a single outlet
- Quickly understanding the consensus and divergent views across publications
- Getting concise answers to specific questions about current events
- Saving time by consolidating information from multiple sources

### Next Steps

To enhance this news analysis system further, you could:

- Add sentiment analysis to detect media bias
- Implement tracking of topics over time to identify trend changes
- Create a web interface for easier access to news summaries
- Add support for specialized news sources in specific domains (finance, technology, etc.)
- Implement translation for international news coverage

Happy news analyzing!


## Relevant Links

- [Hyperbrowser Documentation](https://docs.hyperbrowser.ai)
- [OpenAI API Documentation](https://platform.openai.com/docs/introduction)
- [Google News](https://news.google.com)
