# AI weekly news reporter

The artificial intelligence landscape evolves rapidly, with new breakthroughs, research findings, and industry developments emerging daily. For professionals, researchers, and enthusiasts trying to stay informed, this creates a significant challenge: how can one efficiently filter countless news sources, understand complex technical content, and synthesize the information into actionable insights?

This notebook presents a solution through an automated news aggregation and summarization system built on a multi-agent architecture. Rather than manually scouring multiple news sources and struggling with technical jargon, our system orchestrates three specialized AI agents that work together to collect, process, and present AI/ML news in an accessible format.

The system demonstrates key concepts in modern AI development: multi-agent coordination, state management, and workflow orchestration.

In [1]:
import os
from typing import Dict, List, Any, TypedDict, Optional
from datetime import datetime
from pydantic import BaseModel
from tavily import TavilyClient
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
from langgraph.graph import StateGraph
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

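# Fail early if the OpenAI API key is missing (ChatOpenAI reads it from the environment)
if not os.getenv("OPENAI_API_KEY"):
    raise EnvironmentError("OPENAI_API_KEY is not set; add it to your .env file")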

### LLM & web search service initialization
We initialize the external services that power our multi-agent system. This includes setting up our web search capabilities and language model interface.

In [2]:
# Initialize Tavily client for web search functionality
tavily = TavilyClient(api_key=os.getenv("TAVILY_API_KEY"))

# Configure language model with optimized parameters
llm = ChatOpenAI(
    model="gpt-4o-mini-2024-07-18",
    temperature=0.1,  # Low temperature for consistent, focused outputs
    max_tokens=600  # Reasonable limit for summary generation
)

Here we are creating our service clients that will be used throughout the workflow.
- The Tavily client provides web search capabilities specifically designed for AI applications.
- The ChatOpenAI client gives us access to GPT models. The configuration parameters are chosen to balance performance, cost, and output quality - low temperature ensures consistent summarization while the token limit prevents overly verbose responses.
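
Both clients read their API keys from the environment, so a missing key tends to surface only later as an authentication error. A minimal sketch of a fail-fast check, assuming a hypothetical helper named `require_env` (it is not part of Tavily, LangChain, or python-dotenv) and a placeholder variable name used only for the demonstration:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    `require_env` is a hypothetical helper for illustration only.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Placeholder variable set only for this demonstration
os.environ["EXAMPLE_API_KEY"] = "placeholder-value"
print(require_env("EXAMPLE_API_KEY"))
```

In the notebook, the same idea could be applied to `TAVILY_API_KEY` and `OPENAI_API_KEY` before the clients are constructed.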

### Data models and state management
Effective multi-agent systems require well-defined data structures to ensure type safety and clear communication between components. We will define our data models using Pydantic for validation and TypedDict for state management.

In [3]:
class Article(BaseModel):
    """
    Represents a single news article

    Attributes:
        title (str): Article headline
        url (str): Source URL for reference
        content (str): Full article content for summarization
    """
    title: str
    url: str
    content: str

class Summary(TypedDict):
    """
    Represents a processed article summary

    Attributes:
        title (str): Original article title
        summary (str): AI-generated summary
        url (str): Source URL for reference
    """
    title: str
    summary: str
    url: str

# This defines what information we can store and pass between nodes later
class GraphState(TypedDict):
    """
    Maintains workflow state between agents

    Attributes:
        articles (Optional[List[Article]]): Found articles from search phase
        summaries (Optional[List[Summary]]): Generated summaries from processing phase
        report (Optional[str]): Final compiled report
    """
    articles: Optional[List[Article]]
    summaries: Optional[List[Summary]]
    report: Optional[str]

These data models serve as the backbone of our system's communication. The `Article` class uses Pydantic's validation to ensure data integrity when an instance is created, while the `Summary` and `GraphState` classes use TypedDict, which adds type hints for static checkers and editors but performs no validation at runtime. The `GraphState` class is particularly important as it defines the "memory" of our workflow: each agent can access what previous agents have accomplished and add its own contributions.
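
To make the difference between the two approaches concrete, the sketch below shows that a TypedDict is a plain `dict` at runtime and accepts a wrongly typed field without complaint. The `check_summary` function is a hypothetical helper written for this illustration, not something the notebook or any library provides:

```python
from typing import List, TypedDict


class Summary(TypedDict):
    title: str
    summary: str
    url: str


def check_summary(item: dict) -> List[str]:
    """Return a list of problems found in a summary dict (hypothetical helper)."""
    problems = []
    for key in ("title", "summary", "url"):
        if key not in item:
            problems.append(f"missing key: {key}")
        elif not isinstance(item[key], str):
            problems.append(f"{key} should be str, got {type(item[key]).__name__}")
    return problems


# TypedDict accepts this at runtime even though `url` has the wrong type;
# only a static type checker would flag it.
bad = Summary(title="Example", summary="Short text", url=123)  # type: ignore[typeddict-item]
print(type(bad))
print(check_summary(bad))
```

If runtime guarantees matter for summaries as well, one option is to model them with Pydantic in the same way as `Article`.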

### Agent implementation
Now we will implement our three specialized agents. Each agent encapsulates specific functionality and can be tested and maintained independently.

#### 1. News searcher agent
The NewsSearcher agent is responsible for discovering relevant AI/ML content from web sources. It interfaces with the Tavily API to perform intelligent search queries and retrieve high-quality articles.

In [4]:
class NewsSearcher:
    """
    Agent responsible for finding relevant AI/ML news articles using the Tavily search API
    """

    def search(self) -> List[Article]:
        """
        Performs news search with configured parameters

        Returns:
            List[Article]: Collection of found articles
        """
        # Perform advanced search with specific parameters
        response = tavily.search(
            query="artificial intelligence and machine learning news",  # Targeted query
            topic="news",  # Focus on news content specifically
            time_range="week",  # Recent articles only (last week)
            search_depth="advanced",  # Comprehensive search algorithm
            max_results=5  # Manageable number for processing
        )

        # Transform search results into validated Article objects
        articles = []
        for result in response['results']:
            articles.append(Article(
                title=result['title'],
                url=result['url'],
                content=result['content']
            ))

        return articles

The `NewsSearcher` agent encapsulates all the complexity of web search behind a simple interface. It uses Tavily's advanced search capabilities to find recent, relevant AI/ML news. The search parameters are carefully chosen: we focus on news content from the past week using advanced search depth for comprehensive results. The agent transforms raw search results into validated `Article` objects, ensuring data consistency for downstream processing.
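
The transformation step can be exercised without calling the search API. The sketch below assumes the response shape used above (a `results` list of dicts with `title`, `url`, and `content`) and adds one defensive choice that the original agent does not make: it skips incomplete results. `ArticleStub` and `parse_results` are hypothetical names, and a dataclass stands in for the Pydantic `Article` model so the example runs with the standard library alone:

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ArticleStub:
    """Stand-in for the notebook's Pydantic `Article` model (illustration only)."""
    title: str
    url: str
    content: str


def parse_results(response: Dict[str, Any]) -> List[ArticleStub]:
    """Convert raw search results into article objects, skipping incomplete entries."""
    articles = []
    for result in response.get("results", []):
        # Skip results that lack any of the three required fields
        if not all(result.get(key) for key in ("title", "url", "content")):
            continue
        articles.append(ArticleStub(result["title"], result["url"], result["content"]))
    return articles


# Made-up response for demonstration; not real search output
sample = {
    "results": [
        {"title": "Example headline", "url": "https://example.com/a", "content": "Body text"},
        {"title": "No content", "url": "https://example.com/b"},
    ]
}
print(len(parse_results(sample)))
```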

#### 2. Summarizer agent
The Summarizer agent transforms complex technical content into accessible summaries. It leverages language models to simplify jargon and present key insights in a format suitable for general audiences.

In [5]:
class Summarizer:
    """
    Agent that processes articles and specializes in content simplification and summarization using the configured language model (gpt-4o-mini)
    """

    def __init__(self):
        # Define the agent's core instruction for consistent behavior
        self.system_prompt = """
        You are an AI expert who makes complex topics accessible
        to general audiences. Summarize this article in 2-3 sentences, focusing on the key points
        and explaining any technical terms simply.
        """

    def summarize(self, article: Article) -> str:
        """
        Generates an accessible summary of a single article

        Args:
            article (Article): Article to summarize

        Returns:
            str: Generated summary
        """
        # Construct conversation with system prompt and article content
        response = llm.invoke([
            SystemMessage(content=self.system_prompt),
            HumanMessage(content=f"Title: {article.title}\n\nContent: {article.content}")
        ])
        return response.content

The `Summarizer` agent addresses a central part of technical content curation: making complex information accessible. The system prompt is written to produce consistent behavior across all articles, emphasizing clarity and accessibility. The agent processes each article individually, so each summary reflects only the content and technical concepts of that piece.
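
Because the agent only needs an object with an `invoke` method whose result has a `content` attribute, it can be tested without network access. The sketch below is one way to do that, under the assumption that the model is passed in as a constructor argument (the notebook's `Summarizer` uses the global `llm` instead). `FakeLLM`, `FakeResponse`, and `SummarizerSketch` are hypothetical names, and plain tuples stand in for `SystemMessage` and `HumanMessage`:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeResponse:
    content: str


class FakeLLM:
    """Hypothetical test double exposing the `invoke` shape the agent relies on."""

    def __init__(self) -> None:
        self.calls: List[list] = []

    def invoke(self, messages: list) -> FakeResponse:
        # Record the messages so a test can inspect what the agent sent
        self.calls.append(messages)
        return FakeResponse(content="stub summary")


class SummarizerSketch:
    """Variant of the agent that receives its model as a constructor argument."""

    def __init__(self, model) -> None:
        self.model = model
        self.system_prompt = "Summarize this article in 2-3 sentences."

    def summarize(self, title: str, content: str) -> str:
        # Plain (role, text) tuples stand in for SystemMessage / HumanMessage here
        response = self.model.invoke([
            ("system", self.system_prompt),
            ("human", f"Title: {title}\n\nContent: {content}"),
        ])
        return response.content


fake = FakeLLM()
agent = SummarizerSketch(fake)
print(agent.summarize("Example title", "Example body"))
```

Injecting the model this way keeps the prompt-building logic testable on its own, at the cost of one extra constructor parameter.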

#### 3. Publisher agent
The Publisher agent serves as the final step in our workflow, compiling individual summaries into a cohesive, professional report. It handles formatting, organization, and persistence of the final output.


In [6]:
class Publisher:
    """
    Agent that compiles summaries into a formatted report and saves it to disk
    """

    def create_report(self, summaries: List[Dict]) -> str:
        """
        Creates and saves a formatted markdown report

        Args:
            summaries (List[Dict]): Collection of article summaries

        Returns:
            str: Generated report content
        """
        # Define report structure and formatting requirements
        prompt = """
        Create a weekly AI/ML news report for the general public.
        Format it with:
        1. A brief introduction
        2. The main news items with their summaries
        3. Links for further reading

        Make it engaging and accessible to non-technical readers.
        """

        # Prepare summary data for the language model
        summaries_text = "\n\n".join([
            f"Title: {item['title']}\nSummary: {item['summary']}\nSource: {item['url']}"
            for item in summaries
        ])

        # Generate the complete report using the language model
        response = llm.invoke([
            SystemMessage(content=prompt),
            HumanMessage(content=summaries_text)
        ])

        # Add metadata and persist the report
        current_date = datetime.now().strftime("%Y-%m-%d")
        # Build the file content without leading indentation, which markdown
        # renderers would otherwise treat as an indented code block
        markdown_content = f"Generated on: {current_date}\n\n{response.content}\n"

        # Save report to disk for future reference
        filename = f"ai_news_report_{current_date}.md"
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(markdown_content)

        return response.content

The `Publisher` agent transforms individual summaries into a cohesive narrative. It uses the language model not just for summarization, but for editorial decision-making about structure, flow, and presentation. The agent handles both the creative aspects (report formatting and narrative flow) and the practical aspects (file persistence and metadata management). This demonstrates how agents can serve multiple roles within a single specialized function.
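
The persistence step can also be separated from the language model call, which makes it easy to test on its own. A minimal sketch, assuming two hypothetical helpers (`build_report_file` and `save_report`) and placeholder report text; it writes to a temporary directory rather than the working directory:

```python
import tempfile
from datetime import date
from pathlib import Path


def build_report_file(body: str, day: date) -> str:
    """Prepend a generation-date line to the report body, with no leading indentation."""
    return f"Generated on: {day.isoformat()}\n\n{body.strip()}\n"


def save_report(body: str, day: date, directory: Path) -> Path:
    """Write the report as UTF-8 markdown and return the path of the new file."""
    path = directory / f"ai_news_report_{day.isoformat()}.md"
    path.write_text(build_report_file(body, day), encoding="utf-8")
    return path


# Demonstration with placeholder text and a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    saved = save_report("# Weekly report\n\nPlaceholder body.", date(2025, 1, 6), Path(tmp))
    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")

print(saved_name)
```

Passing the date and directory as arguments, instead of reading the clock and the current directory inside the function, keeps the output deterministic in tests.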

### Workflow implementation
With our agents implemented, we need to orchestrate their interaction through a workflow system. LangGraph provides the infrastructure for managing state flow and agent coordination.

#### State management nodes
In LangGraph, nodes represent individual processing steps that transform the workflow state and delegate their tasks to dedicated agents. Each node takes the current state, performs its specialized function, and returns an updated state. The `NewsSearcher`, `Summarizer`, and `Publisher` agents encapsulate the business logic, while nodes act as connectors between these agents and the LangGraph system.

In [7]:
# Node for searching news articles using the Tavily API
def search_node(state: Dict[str, Any]) -> Dict[str, Any]:
    """
    Node for article search. This node initializes the workflow by finding relevant articles and updating the state with search results

    Args:
        state (Dict[str, Any]): Current workflow state

    Returns:
        Dict[str, Any]: Updated state with found articles
    """
    searcher = NewsSearcher()  # Instantiate the news search agent
    # Use the agent to fetch relevant AI/ML news articles from Tavily
    state['articles'] = searcher.search()  # Add articles to workflow state
    return state

# Node for summarizing each retrieved article using the Summarizer agent
def summarize_node(state: Dict[str, Any]) -> Dict[str, Any]:
    """
    Node for article summarization. This node processes articles from the search phase and generates summaries for each piece of content

    Args:
        state (Dict[str, Any]): Current workflow state

    Returns:
        Dict[str, Any]: Updated state with summaries
    """
    summarizer = Summarizer() # Instantiate the summarizer agent
    state['summaries'] = []  # Initialize summaries list in the state

    # Process each article individually to generate summaries
    for article in state['articles']: # Uses articles from previous node
        summary = summarizer.summarize(article)

        # Add the structured summary information back to the state
        state['summaries'].append({
            'title': article.title,
            'summary': summary,
            'url': article.url
        })
    return state

# Node for compiling all summaries into a final markdown report
def publish_node(state: Dict[str, Any]) -> Dict[str, Any]:
    """
    Node for report generation. This node compiles summaries into a final report and handles persistence and formatting

    Args:
        state (Dict[str, Any]): Current workflow state

    Returns:
        Dict[str, Any]: Updated state with final report
    """
    publisher = Publisher()  # Instantiate the publisher agent

    # Generate a full markdown report using the summaries
    report_content = publisher.create_report(state['summaries'])

    # Save the report to state for inspection or downstream use
    state['report'] = report_content
    return state

These node functions serve as adapters between our agent classes and the LangGraph workflow system. Each node follows the same pattern: instantiate the appropriate agent, execute its primary function, update the state with results, and return the modified state. This approach maintains clear separation between business logic (in the agents) and workflow coordination (in the nodes).
* The `search_node` is responsible for initiating the entire workflow. It uses the `NewsSearcher` agent to query the Tavily API for the latest AI/ML news, and stores the resulting articles in the state under the key `"articles"`.
* The `summarize_node` consumes the articles from the previous node, then loops through them one-by-one, passing each to the `Summarizer` agent. Each result is appended as a new summary in a structured format (`title`, `summary`, `url`) under the `"summaries"` key.
* Finally, the `publish_node` aggregates all the generated summaries into a well-formatted markdown report using the `Publisher` agent. It stores the result in the state under `"report"`, and also handles saving the report to disk.

Each node adheres to the same contract:
1. Accept the current state (`Dict[str, Any]`),
2. Operate on relevant parts of it,
3. Update the state with new information,
4. Return the updated state to be passed to the next node.

This consistent structure makes the system easy to reason about, maintain, and extend: adding more nodes (e.g., for fact-checking, trend analysis, or translation) would follow the same pattern.
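
As an illustration of that contract, here is a sketch of one additional node. `deduplicate_node` is a hypothetical example that is not part of the notebook: it drops summaries whose URL has already been seen, and otherwise follows the same accept, operate, update, return steps:

```python
from typing import Any, Dict


def deduplicate_node(state: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical extra node: drop summaries whose URL was already seen."""
    seen = set()
    unique = []
    for item in state.get("summaries") or []:
        if item["url"] in seen:
            continue
        seen.add(item["url"])
        unique.append(item)
    state["summaries"] = unique  # Update the state with the filtered list
    return state


# Placeholder state for demonstration
state = {
    "articles": None,
    "summaries": [
        {"title": "A", "summary": "first", "url": "https://example.com/a"},
        {"title": "A (copy)", "summary": "again", "url": "https://example.com/a"},
        {"title": "B", "summary": "second", "url": "https://example.com/b"},
    ],
    "report": None,
}
result = deduplicate_node(state)
print([item["title"] for item in result["summaries"]])
```

Wiring such a node into the graph would presumably mean registering it with `add_node` and routing the `summarize` edge through it before `publish`.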

#### Workflow graph creation
The final step is constructing the workflow graph that defines how our nodes interact and in what sequence they execute.

In [8]:
def create_workflow() -> StateGraph:
    """
    Constructs and configures the workflow graph
    search -> summarize -> publish

    Returns:
        StateGraph: Compiled workflow ready for execution
    """

    # Create a workflow (graph) initialized with our state schema
    workflow = StateGraph(state_schema=GraphState)

    # Add processing nodes that we will flow between
    workflow.add_node("search", search_node)
    workflow.add_node("summarize", summarize_node)
    workflow.add_node("publish", publish_node)

    # Define the flow with edges
    workflow.add_edge("search", "summarize") # search results flow to summarizer
    workflow.add_edge("summarize", "publish") # summaries flow to publisher

    # Set where to start and where to finish
    workflow.set_entry_point("search")
    workflow.set_finish_point("publish")

    # Compile the workflow for execution
    return workflow.compile()

The workflow graph construction creates a directed acyclic graph (DAG) that defines our processing pipeline. The linear flow (search → summarize → publish) reflects the dependencies between our processing steps: each step builds upon the previous one, creating a clear, predictable execution path. The compiled workflow can be invoked repeatedly, which makes it reusable and testable.
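
For intuition, a linear pipeline of this kind behaves much like a loop that hands the state from one function to the next. The sketch below is a plain-Python stand-in with toy nodes and placeholder data. It is not how LangGraph is implemented, and `run_linear` is a hypothetical name:

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]


def run_linear(nodes: List[Node], state: State) -> State:
    """Apply each node in order, passing the state returned by one to the next."""
    for node in nodes:
        state = node(state)
    return state


# Toy nodes with placeholder data standing in for search, summarize, and publish
def toy_search(state: State) -> State:
    state["articles"] = ["article one", "article two"]
    return state


def toy_summarize(state: State) -> State:
    state["summaries"] = [text.upper() for text in state["articles"]]
    return state


def toy_publish(state: State) -> State:
    state["report"] = "\n".join(state["summaries"])
    return state


final = run_linear(
    [toy_search, toy_summarize, toy_publish],
    {"articles": None, "summaries": None, "report": None},
)
print(final["report"])
```

A graph framework becomes more valuable once the flow stops being linear, for example with conditional branches or retries, which a simple loop like this does not express.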

### Usage example
With all components implemented, we can now execute our complete news processing system. Here, we demonstrate how to run the workflow and access the results.

In [9]:
if __name__ == "__main__":
    # Initialize the workflow system
    workflow = create_workflow()

    # Execute the workflow with initial empty state
    final_state = workflow.invoke({
        "articles": None,  # Will be populated by search node
        "summaries": None,  # Will be populated by summarize node
        "report": None  # Will be populated by publish node
    })

    # Display the final generated report
    print("\n=== AI/ML Weekly News Report ===\n")
    print(final_state['report'])


=== AI/ML Weekly News Report ===

# Weekly AI/ML News Report

**Introduction:**
Welcome to this week's AI and Machine Learning news report! As technology continues to evolve, artificial intelligence (AI) and machine learning (ML) are making waves across various industries, from healthcare to cybersecurity. In this report, we’ll explore some of the latest developments that are shaping our world, making it more efficient, sustainable, and secure. Let’s dive in!

---

### 1. Merck's Tailored Approach to AI in Drug Development
**Summary:**  
Merck is taking a unique approach to artificial intelligence and machine learning by moving away from a one-size-fits-all strategy in drug development. By leveraging AI, they have enhanced the safety and effectiveness of their medications, notably with the drug MK-1084, which was optimized using both historical and new data. This tailored approach allows Merck to create solutions that meet their specific needs and those of their customers, paving the 

Here, we demonstrate the simplicity of running our complex multi-agent system. We start with an empty state and let each agent contribute its specialized functionality. The final state contains the complete processing history: the original articles, their summaries, and the final report. This provides full transparency into the system's operation and allows for debugging and quality assessment at each stage.
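
Since the final state keeps the output of every stage, a quick sanity check can be run on it after each execution. A minimal sketch, assuming a hypothetical helper `stage_report` and a placeholder state in place of a real run:

```python
from typing import Any, Dict, List


def stage_report(state: Dict[str, Any]) -> List[str]:
    """Summarize what each stage left in the state (hypothetical helper)."""
    articles = state.get("articles") or []
    summaries = state.get("summaries") or []
    report = state.get("report") or ""
    lines = [
        f"articles found: {len(articles)}",
        f"summaries written: {len(summaries)}",
        f"report length: {len(report)} characters",
    ]
    # A mismatch suggests that summarization failed for some articles
    if len(summaries) != len(articles):
        lines.append("warning: article and summary counts differ")
    return lines


# Placeholder state for demonstration; a real run would pass `final_state`
example_state = {
    "articles": ["a1", "a2"],
    "summaries": [{"title": "t", "summary": "s", "url": "https://example.com"}],
    "report": "placeholder",
}
for line in stage_report(example_state):
    print(line)
```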
