# 1 Project Overview & Architecture

## NewsFindr: Agentic GenAI for Personalized News Discovery

NewsFindr is an agentic AI system that delivers real-time, trustworthy,
and personalized news using a structured multi-agent architecture.

The system uses:

- A SQL Agent to securely verify users and retrieve interests
- A Search Agent to fetch real-time news
- A Reasoning LLM with structured outputs to filter, rank, and summarize news

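Structured outputs (Pydantic schemas enforced by output parsers) make each
step's result machine-checkable, narrow the room for hallucinated or malformed
answers, and make agent decisions easier to explain and audit.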


# 2 Setting Up the LLM

## 2.1 Install the required libraries


In [1]:
!pip install -q langchain langchain-community langchain-groq langgraph duckduckgo-search ddgs pydantic python-dotenv

In [2]:
import langchain

print(langchain.__version__)

1.2.0


## 2.2 Load an LLM using Groq


In [3]:
from dotenv import load_dotenv
import os

load_dotenv()

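# load_dotenv() copies GROQ_API_KEY from .env into os.environ;
# fail early with a clear message if it is still missing.
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set; add it to .env or export it.")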

In [4]:
import os
from langchain_groq import ChatGroq

llm = ChatGroq(model="openai/gpt-oss-120b", temperature=0)

## 2.3 Check the LLM response on a simple query


In [5]:
llm.invoke("Explain Agentic AI in one sentence.")

AIMessage(content='Agentic AI refers to artificial intelligence systems designed to autonomously set, pursue, and adapt goals or actions—behaving as self‑directed agents rather than merely executing pre‑programmed instructions.', additional_kwargs={'reasoning_content': 'The user asks: "Explain Agentic AI in one sentence." Provide a concise definition. Should be one sentence. Provide clear explanation.'}, response_metadata={'token_usage': {'completion_tokens': 77, 'prompt_tokens': 79, 'total_tokens': 156, 'completion_time': 0.159590666, 'completion_tokens_details': {'reasoning_tokens': 28}, 'prompt_time': 0.003026, 'prompt_tokens_details': None, 'queue_time': 0.054469619, 'total_time': 0.162616666}, 'model_name': 'openai/gpt-oss-120b', 'system_fingerprint': 'fp_e10890e4b9', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None, 'model_provider': 'groq'}, id='lc_run--019b5fa2-df15-7603-9b6b-d21ba6ba6b7a-0', usage_metadata={'input_tokens': 79, 'output_tokens': 77, 'total_

# 3 SQL Agent for Data Retrieval

## 3.1 Initialize the SQL agent to fetch the data from the database


In [6]:
from langchain_community.utilities import SQLDatabase


db_path = "./data/customer.db"
db = SQLDatabase.from_uri(f"sqlite:///{db_path}")

print(f"Dialect: {db.dialect}")
print(f"Available tables: {db.get_usable_table_names()}")
print(f'Sample output: {db.run("SELECT * FROM customers LIMIT 5;")}')

Dialect: sqlite
Available tables: ['customers']
Sample output: [(1, 'F8641860-7', 'Kevin', 'kevin.f8641860-7@gmail.com', '["Politics", "Startups", "Travel"]', '2025-03-26 06:46:15'), (2, '203631A0-B', 'Ian', 'ian.203631a0-b@gmail.com', '["Startups", "Travel"]', '2025-03-26 06:46:15'), (3, 'D77D96F3-3', 'Julia', 'julia.d77d96f3-3@gmail.com', '["India", "Automobile", "Business"]', '2025-03-26 06:46:15'), (4, '6EB33C45-5', 'Alice', 'alice.6eb33c45-5@gmail.com', '["Politics", "Technology", "Business"]', '2025-03-26 06:46:15'), (5, 'EDD38E10-6', 'Oscar', 'oscar.edd38e10-6@gmail.com', '["Automobile", "India", "Sports"]', '2025-03-26 06:46:15')]
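The `interests` column holds a JSON-encoded list, so a known customer's interests can also be read deterministically, without an LLM in the loop. The sketch below is a standard-library illustration under stated assumptions: it builds an in-memory table seeded with Kevin's sample row, takes the column names from the `customers` table as printed by the SQL agent in this section, assumes the column types, and looks customers up by email through a hypothetical helper, `get_interests_by_email`. Binding the email as a `?` parameter keeps user input out of the SQL text.

```python
import json
import sqlite3
from typing import Optional


def get_interests_by_email(conn: sqlite3.Connection, email: str) -> Optional[list[str]]:
    """Return the customer's interests as a list, or None if the email is unknown."""
    # The "?" placeholder binds the email as data, so it is never parsed as SQL.
    row = conn.execute(
        "SELECT interests FROM customers WHERE lower(email) = lower(?)",
        (email.strip(),),
    ).fetchone()
    if row is None:
        return None
    # The interests column holds a JSON array such as '["Politics", "Travel"]'.
    return json.loads(row[0])


# In-memory stand-in for ./data/customer.db, seeded with the first sample row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, customer_id TEXT, name TEXT, "
    "email TEXT, interests TEXT, last_updated TEXT)"
)
conn.execute(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?, ?)",
    (1, "F8641860-7", "Kevin", "kevin.f8641860-7@gmail.com",
     '["Politics", "Startups", "Travel"]', "2025-03-26 06:46:15"),
)

print(get_interests_by_email(conn, "Kevin.F8641860-7@gmail.com"))  # ['Politics', 'Startups', 'Travel']
print(get_interests_by_email(conn, "x' OR '1'='1"))               # None
```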


In [7]:
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain.agents import create_agent

toolkit = SQLDatabaseToolkit(db=db, llm=llm, top_k=3)
sql_agent = create_agent(
    llm,
    tools=toolkit.get_tools(),
)

response = sql_agent.invoke(
    {
        "messages": [
            {
                "role": "user",
                "content": "find details of kevin",
            },
        ]
    }
)
response['messages'][-1].pretty_print()


Here are the details for **Kevin**:

| id | customer_id | name  | email                         | interests                                 | last_updated          |
|----|-------------|-------|-------------------------------|-------------------------------------------|-----------------------|
| 1  | F8641860-7  | Kevin | kevin.f8641860-7@gmail.com    | ["Politics", "Startups", "Travel"]        | 2025‑03‑26 06:46:15   |

Kevin’s interests are Politics, Startups, and Travel.


## 3.2 Initialize the system message


In [8]:
# General role description; the SQL agent and the LLM chains below define their own prompts.
system_message = """
You are a news article finder based on customer interests.
"""

## 3.3 Look up the customer and retrieve their interests


In [9]:
def find_customer_interests(user_query: str) -> str:
    """Find customer interests from the database based on user query."""

    sql_agent_prompt = """
    You are a secure SQL agent that interacts with a customer database.

    Your task is to:
    - Find the customer interests based on the user query.
    - Only return the interests as a comma separated list.
    """

    sql_agent = create_agent(
        llm,
        tools=toolkit.get_tools(),
        system_prompt=sql_agent_prompt,
    )

    results = sql_agent.invoke(
        {
            "messages": [
                {
                    "role": "user",
                    "content": f"Find the interests of the customer related to: {user_query}",
                },
            ]
        }
    )

    customer_interests = results['messages'][-1].content
    return customer_interests

customer_interests = find_customer_interests("latest news for kevin")
customer_interests

'Politics, Startups, Travel'
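The agent answers in free text, and that string is passed straight into the next prompt. A small post-processing step can turn it into a clean list and drop anything outside the known interest labels. In this sketch, `parse_interests` is a hypothetical helper and `KNOWN_INTERESTS` is collected only from the five sample rows printed in 3.1, so the real vocabulary may be larger.

```python
from collections.abc import Iterable
from typing import Optional

# Interest labels seen in the five sample rows printed in 3.1; the full table
# may contain others, so this set is illustrative only.
KNOWN_INTERESTS = {"Politics", "Startups", "Travel", "India",
                   "Automobile", "Business", "Technology", "Sports"}


def parse_interests(agent_output: str,
                    known: Optional[Iterable[str]] = None) -> list[str]:
    """Turn a comma-separated answer into a clean, de-duplicated list.

    If `known` is given, labels are matched case-insensitively, returned in
    their canonical spelling, and anything outside the vocabulary is dropped.
    """
    canonical = {k.lower(): k for k in known} if known is not None else None
    labels: list[str] = []
    for part in agent_output.split(","):
        label = part.strip().rstrip(".").strip()
        if not label:
            continue
        if canonical is not None:
            label = canonical.get(label.lower(), "")
            if not label:
                continue
        if label not in labels:
            labels.append(label)
    return labels


print(parse_interests("Politics, Startups, Travel", KNOWN_INTERESTS))
# ['Politics', 'Startups', 'Travel']
print(parse_interests("politics, Travel., travel, Cooking", KNOWN_INTERESTS))
# ['Politics', 'Travel']
```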

# 4 Interface between SQL and Search Agents

## 4.1 Expand the customer's interests into precise search queries for the latest news


In [10]:
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from pydantic import BaseModel, Field

def generate_search_query(customer_interests: str) -> BaseModel:
    """Expand customer interests into search queries; returns a `Queries` instance."""
    class SearchQuery(BaseModel):
        query: str = Field(
            description="A precise single topic search query for duckduckgo search"
        )

    class Queries(BaseModel):
        queries: list[SearchQuery] = Field(
            description="A list of search queries"
        )

    parser = PydanticOutputParser(pydantic_object=Queries)

    prompt = PromptTemplate(
        template="""
        You are a search query generator based on customer interests.
        Your task is to:
        - Expand each of the customer's interests into a precise, single-topic search query.
        - Ensure each query captures the essence of its interest for effective news retrieval.

        Customer Interests: {customer_interests}

        {format_instructions}
        """,
        input_variables=["customer_interests"],
        partial_variables={"format_instructions": parser.get_format_instructions()},
    )

    chain = prompt | llm | parser

    search_query = chain.invoke(
        {
            "customer_interests": customer_interests,
        }
    )

    return search_query

search_query = generate_search_query(customer_interests)
search_query.queries

[SearchQuery(query='latest political news and analysis'),
 SearchQuery(query='startup ecosystem trends and funding news'),
 SearchQuery(query='travel news updates including restrictions and destination guides')]
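`PydanticOutputParser` asks the model for JSON that matches the `Queries` schema and raises an exception when the reply cannot be parsed. To make that contract concrete without LangChain or Pydantic, here is a standard-library sketch of an equivalent check for the same `{"queries": [{"query": ...}]}` shape. The `parse_queries` helper and the `SearchQuery` dataclass are illustrative stand-ins, not the parser's internals; every failure surfaces as a `ValueError` (including `json.JSONDecodeError`, which subclasses it).

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchQuery:
    query: str


def parse_queries(llm_text: str) -> list[SearchQuery]:
    """Validate model output shaped like {"queries": [{"query": "..."}]}."""
    # Tolerate prose or a Markdown fence around the JSON by keeping only the
    # span from the first "{" to the last "}".
    start, end = llm_text.find("{"), llm_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(llm_text[start:end + 1])
    items = data.get("queries") if isinstance(data, dict) else None
    if not isinstance(items, list) or not items:
        raise ValueError("'queries' must be a non-empty list")
    queries = []
    for item in items:
        text = item.get("query") if isinstance(item, dict) else None
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"invalid query entry: {item!r}")
        queries.append(SearchQuery(text.strip()))
    return queries


raw = ('Here are the queries: {"queries": ['
       '{"query": "latest political news and analysis"}, '
       '{"query": "startup ecosystem trends and funding news"}]}')
print([q.query for q in parse_queries(raw)])
# ['latest political news and analysis', 'startup ecosystem trends and funding news']
```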

## 4.2 Fetch the news results using DuckDuckGo


In [11]:
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper
from langchain_community.tools import DuckDuckGoSearchResults


def perform_news_search(queries: list[str]) -> list:
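    search = DuckDuckGoSearchResults(
        # time="y" restricts results to the past year; "m" or "w" yields fresher news.
        api_wrapper=DuckDuckGoSearchAPIWrapper(time="y", max_results=4),
        # The tool selects DuckDuckGo's news vertical through `backend`, not `source`.
        backend="news",
        output_format="list",
    )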

    searched_articles = []
    for query in queries:
        searched_articles.extend(search.invoke(query))
    return searched_articles

searched_articles = perform_news_search([q.query for q in search_query.queries])
searched_articles

[{'snippet': 'Home World- news Latest Political News Today: Key Updates and Analysis from Around the World ... and analysis on the latest political developments, ...',
  'title': 'Latest Political News Today: Key Updates and Analysis from',
  'link': 'https://lafoxmedia.com/latest-political-news-today-key-updates-and-analysis-from-around-the-world/'},
 {'snippet': '... The Latest Political News and Analysis ... Posted in Service Tagged parliamentary elections Stay Updated: The Latest Political News and Analysis',
  'title': 'Stay Updated: The Latest Political News and Analysis –',
  'link': 'https://www.satanic-kindred.org/stay-updated-the-latest-political-news-and-analysis/'},
 {'snippet': 'Stick to TAG24 NEWS for all the most up-to-date reports on politics , including detailed analysis on US foreign and domestic policy, social movements ...',
  'title': 'Politics news: All news on politics and politicians in the US',
  'link': 'https://www.tag24.com/politics'},
 {'snippet': 'Vadra ca
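Each query contributes up to four results and the lists are concatenated as-is, so the same page can appear more than once and take up space in the filtering prompt. A possible remedy, sketched below, is order-preserving de-duplication by a normalized URL before filtering. `normalize_link` and `dedupe_articles` are hypothetical helpers, and the normalization rules (case-insensitive scheme and host, no fragment, no trailing slash, query string kept) are a judgment call.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_link(link: str) -> str:
    """Canonical form used only for comparison: lowercase scheme and host,
    no fragment, no trailing slash; the query string is kept."""
    parts = urlsplit(link.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe_articles(articles: list[dict]) -> list[dict]:
    """Keep the first result for each normalized link, preserving order."""
    seen: set[str] = set()
    unique = []
    for article in articles:
        key = normalize_link(article["link"])
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique


results = [
    {"title": "Politics news", "link": "https://www.tag24.com/politics"},
    {"title": "Politics news (dup)", "link": "https://WWW.tag24.com/politics/#latest"},
    {"title": "CBS Politics", "link": "https://www.cbsnews.com/politics/"},
]
print([a["title"] for a in dedupe_articles(results)])
# ['Politics news', 'CBS Politics']
```

In the notebook this could run as `dedupe_articles(searched_articles)`, since each search result is a dict with a `link` key.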

## 4.3 Filter relevant and trustworthy news URLs based on the user's interests


In [12]:
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from pydantic import BaseModel, Field

def filter_relevant_articles(customer_interests: str, searched_articles: list, top: int = 3) -> BaseModel:
    """Keep the `top` most relevant articles; returns a `FilteredNewsArticles` instance."""
    class NewsArticle(BaseModel):
        title: str = Field(description="The title of the news article")
        link: str = Field(description="The URL link to the news article")
        snippet: str = Field(description="A snippet of the news article")


    class FilteredNewsArticles(BaseModel):
        articles: list[NewsArticle] = Field(
            description="A list of news articles that are relevant to the user's interests"
        )


    parser = PydanticOutputParser(pydantic_object=FilteredNewsArticles)

    prompt = PromptTemplate(
        template="""
        You are filtering news articles based on relevance to the customer interests.
        - Only keep articles that are highly relevant to the customer interests.
        - Prefer articles from established, reputable news outlets.
        - Discard articles that are off-topic, irrelevant, or from untrustworthy sources.
        - Only keep the top {top} most relevant articles.

        News Articles:
        {articles}

        Customer Interests:
        {customer_interests}


        {format_instructions}
        """,
        input_variables=["articles", "customer_interests", "top"],
        partial_variables={"format_instructions": parser.get_format_instructions()},
    )

    chain = prompt | llm | parser

    results = chain.invoke(
        {
            "articles": searched_articles,
            "customer_interests": customer_interests,
            "top": top,
        }
    )

    return results

filtered_articles = filter_relevant_articles(customer_interests, searched_articles, top=3)
filtered_articles

FilteredNewsArticles(articles=[NewsArticle(title='Politics news: All news on politics and politicians in the US', link='https://www.tag24.com/politics', snippet='Stick to TAG24 NEWS for all the most up-to-date reports on politics , including detailed analysis on US foreign and domestic policy, social movements ...'), NewsArticle(title='U.S. Startup Ecosystem 2025 – Funding & Innovation Insights', link='https://beststartup.us/u-s-startup-ecosystem-2025-funding-insights/', snippet='4 days ago · The U.S. startup ecosystem 2025 saw transformative growth across AI, fintech, quantum, crypto, chip breakthroughs, and Space Cloud innovations. Startups in Silicon Valley, Austin, New York, Boston, and Seattle captured investor attention through billion-dollar funding rounds and strategic partnerships with Amazon, Microsoft, Google, NVIDIA, Starlink, and Space Cloud. This month-wise recap ...'), NewsArticle(title='Travel News - International Travel Info & World Events', link='https://www.traveland
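Whether an article comes from a trustworthy source is otherwise left to the LLM's judgment. A deterministic domain check applied to `searched_articles` before the LLM call is one way to make that step auditable. The allowlist below is purely illustrative (deciding which outlets count as trustworthy is an editorial choice this notebook does not make), and `is_trusted` and `keep_trusted` are hypothetical helpers.

```python
from urllib.parse import urlsplit

# Purely illustrative allowlist: which outlets count as trustworthy is an
# editorial decision that this notebook does not make.
TRUSTED_DOMAINS = frozenset({"cbsnews.com", "forbes.com", "travelandleisure.com", "crunchbase.com"})


def is_trusted(link: str, trusted: frozenset[str] = TRUSTED_DOMAINS) -> bool:
    """True if the link's host is a trusted domain or a subdomain of one."""
    host = urlsplit(link).hostname or ""  # .hostname is already lowercase
    # Compare whole labels so that e.g. "evilcbsnews.com" does not match "cbsnews.com".
    return any(host == d or host.endswith("." + d) for d in trusted)


def keep_trusted(articles: list[dict], trusted: frozenset[str] = TRUSTED_DOMAINS) -> list[dict]:
    """Drop search results whose link is not on a trusted domain."""
    return [a for a in articles if is_trusted(a["link"], trusted)]


print(is_trusted("https://www.cbsnews.com/politics/"))         # True
print(is_trusted("https://news.crunchbase.com/venture/"))      # True
print(is_trusted("https://cbsnews.com.example.net/politics"))  # False
```

In the notebook this could run as `keep_trusted(searched_articles)` before calling `filter_relevant_articles`.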

# 5 Final Output from the LLM

- Use the LLM to:
  - Select the final URL(s) for the latest news based on the customer’s interests
  - Summarize the articles behind those URLs


In [13]:
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from pydantic import BaseModel, Field


def final_news_summary(customer_interests: str, filtered_articles: BaseModel) -> BaseModel:
    """Summarize the filtered articles; returns a `FinalNewsSummary` instance."""
    class FinalNewsSummary(BaseModel):
        title: str = Field(description="The title based on customer interests and relevant news articles")
        summary: str = Field(description="A brief summary of selected news articles relevant to customer interests")
        links: list[str] = Field(description="A list of URL links to the relevant news articles")

    parser = PydanticOutputParser(pydantic_object=FinalNewsSummary)

    prompt = PromptTemplate(
        template="""
        You are generating a final news summary based on customer interests and relevant news articles.
        - Create a concise and informative title that reflects the customer's interests and the content of the relevant articles.
        - Summarize the key points from the selected news articles that are most relevant to the customer's interests.
        - Provide a list of URL links to the relevant news articles.

        News Articles:
        {articles}

        Customer Interests:
        {customer_interests}


        {format_instructions}
        """,
        input_variables=["articles", "customer_interests"],
        partial_variables={"format_instructions": parser.get_format_instructions()},
    )

    chain = prompt | llm | parser

    results = chain.invoke(
        {
            "articles": filtered_articles,
            "customer_interests": customer_interests,
        }
    )

    return results

final_summary = final_news_summary(customer_interests, filtered_articles)
final_summary

FinalNewsSummary(title='Latest Highlights in US Politics, Startup Funding, and Travel Advisories', summary='Stay informed with the newest developments across your interests: a comprehensive roundup of US political news covering foreign and domestic policy and social movements; an in‑depth look at the 2025 US startup ecosystem, showcasing rapid growth in AI, fintech, quantum, crypto, chip and space‑cloud sectors, major funding rounds, and partnerships with tech giants; and essential travel updates, including current advisories affecting travelers from the US, Canada, Australia, and China to a popular destination.', links=['https://www.tag24.com/politics', 'https://beststartup.us/u-s-startup-ecosystem-2025-funding-insights/', 'https://www.travelandleisure.com/travel-news'])
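The final step should only cite links that came from the filtered articles. A cheap post-check, sketched below under the assumption that exact string matching is good enough, flags links the model invented, more links than requested, or duplicates. `check_summary_links` is a hypothetical helper; in this notebook it could be called as `check_summary_links(final_summary.links, [a.link for a in filtered_articles.articles], top=3)`.

```python
def check_summary_links(summary_links: list[str], source_links: list[str], top: int) -> list[str]:
    """Return human-readable problems with the summary's links (empty list means OK)."""
    allowed = set(source_links)
    problems = [f"link not in source articles: {link}"
                for link in summary_links if link not in allowed]
    if len(summary_links) > top:
        problems.append(f"too many links: expected at most {top}, got {len(summary_links)}")
    if len(set(summary_links)) != len(summary_links):
        problems.append("duplicate links in summary")
    return problems


source_links = [
    "https://www.tag24.com/politics",
    "https://beststartup.us/u-s-startup-ecosystem-2025-funding-insights/",
]
print(check_summary_links(["https://www.tag24.com/politics"], source_links, top=3))
# []
print(check_summary_links(["https://example.com/invented"], source_links, top=3))
# ['link not in source articles: https://example.com/invented']
```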

# 6 Querying with the Agent


## 6.1 Create a LangGraph for the Final Agent Using a Sequential Execution Plan

In [20]:
from typing import Any, Optional, TypedDict

class NewsState(TypedDict):
    query: str
    top: int

    customer_interests: Optional[str]        # comma-separated string from the SQL agent
    search_queries: Optional[list[str]]
    searched_articles: Optional[list[dict]]
    filtered_articles: Optional[Any]         # FilteredNewsArticles instance
    final_summary: Optional[Any]             # FinalNewsSummary instance

def get_customer_interests(state: NewsState):
    interests = find_customer_interests(state["query"])
    return {"customer_interests": interests}


def build_search_queries(state: NewsState):
    search_query = generate_search_query(state["customer_interests"])
    queries = [q.query for q in search_query.queries]
    return {"search_queries": queries}


def search_news(state: NewsState):
    articles = perform_news_search(state["search_queries"])
    return {"searched_articles": articles}


def filter_articles(state: NewsState):
    filtered = filter_relevant_articles(
        state["customer_interests"],
        state["searched_articles"],
        top=state["top"]
    )
    return {"filtered_articles": filtered}


def summarize_news(state: NewsState):
    summary = final_news_summary(
        state["customer_interests"],
        state["filtered_articles"]
    )
    return {"final_summary": summary}

from langgraph.graph import StateGraph, END

graph = StateGraph(NewsState)

graph.add_node("get_customer_interests", get_customer_interests)
graph.add_node("build_search_queries", build_search_queries)
graph.add_node("search_news", search_news)
graph.add_node("filter_articles", filter_articles)
graph.add_node("summarize_news", summarize_news)

graph.set_entry_point("get_customer_interests")

graph.add_edge("get_customer_interests", "build_search_queries")
graph.add_edge("build_search_queries", "search_news")
graph.add_edge("search_news", "filter_articles")
graph.add_edge("filter_articles", "summarize_news")
graph.add_edge("summarize_news", END)

news_graph = graph.compile()
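The compiled graph runs the five nodes in edge order, and each node returns only the keys it produces. For state keys without a reducer, LangGraph keeps the most recent value, so the flow behaves much like the plain loop below. This is a simplified standard-library model for illustration, not LangGraph's implementation; `run_sequential` is hypothetical, its stub nodes return hard-coded values, and the rejection of keys outside the schema is an extra safeguard added for this sketch.

```python
from collections.abc import Callable

Node = Callable[[dict], dict]


def run_sequential(initial: dict, plan: list[tuple[str, Node]], allowed_keys: set[str]) -> dict:
    """Run (name, node) pairs in order, merging each partial update into the state.

    A key returned by a later node overwrites the earlier value, mirroring a
    LangGraph state key that has no reducer.
    """
    state = dict(initial)
    for name, node in plan:
        update = node(state)
        # Extra safeguard for this sketch: reject keys outside the schema.
        unknown = set(update) - allowed_keys
        if unknown:
            raise KeyError(f"{name} returned keys outside the state schema: {sorted(unknown)}")
        state.update(update)
    return state


# Stub nodes with hard-coded behaviour, standing in for the real agents.
plan = [
    ("get_customer_interests", lambda s: {"customer_interests": "Politics, Travel"}),
    ("build_search_queries", lambda s: {"search_queries": [
        f"latest {i.strip().lower()} news" for i in s["customer_interests"].split(",")]}),
]
keys = {"query", "top", "customer_interests", "search_queries"}

final_state = run_sequential({"query": "find news for kevin", "top": 3}, plan, keys)
print(final_state["search_queries"])  # ['latest politics news', 'latest travel news']
```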

## 6.2 Provide any 3 queries to the agentic AI system

- Retrieve the top 3 latest news articles based on the customer’s interests
- Generate a single combined summary of the selected articles

In [34]:
def print_result(result):
    print("Title:", result["final_summary"].title + "\n")
    print("Summary:", result["final_summary"].summary + "\n")
    print("Links:")
    for link in result["final_summary"].links:
        print(link)

In [35]:
result = news_graph.invoke({
    "query": "find news for kevin",
    "top": 3
})

print_result(result)

Title: Latest Politics Headlines, Startup Funding Roundup, and Travel Tech Trends

Summary: Today's briefing covers three areas of interest: (1) CBS Politics delivers the newest political headlines, election updates, and policy news. (2) TechStartups reports a major funding surge, highlighting that startups secured a combined $405 million in capital during the week ending August 22, 2025. (3) NewsonFloor highlights travel and tourism innovations, noting the rise of AI-driven tools that personalize destination suggestions, itineraries, and travel optimization.

Links:
https://www.cbsnews.com/politics/
https://techstartups.com/2025/08/22/top-startup-and-tech-funding-news-roundup-week-ending-august-22-2025/
https://newsonfloor.com/travel-tourism


In [36]:
result = news_graph.invoke({
    "query": "find news for Ian",
    "top": 3
})

print_result(result)

Title: 2025 Startup Funding Surge, Travel Tech Innovations, and Autonomous Vehicle Advances

Summary: Global startup funding hit $91 billion in Q2 2025, up 11% year‑over‑year but down 20% from the previous quarter, highlighting a robust yet volatile investment climate. A wave of travel‑tech startups is reshaping how travelers book and experience trips, driving sustainability and personalization in the industry. Meanwhile, autonomous‑vehicle leaders such as Waymo are expanding operations, with Waymo set to launch a fully autonomous ride‑hailing service in Denver, underscoring rapid progress in self‑driving technology.

Links:
https://news.crunchbase.com/venture/state-of-startups-q2-h1-2025-ai-ma-charts-data/
https://www.forbes.com/sites/dariashunina/2025/01/31/how-tech-startups-are-changing-travel-for-good/
https://www.autoconnectedcar.com/2025/09/autonomous-self-driving-vehicle-news-qualcomm-bmw-waymo-carteav-aeye-flasheye-pony-ai-mowasalat-valeo-momenta-nhtsa/


In [37]:
result = news_graph.invoke({
    "query": "find news for Oscar",
    "top": 3
})

print_result(result)

Title: India Motorsports and Automobile News Highlights

Summary: Recent coverage in India features the latest motorsport action—including Formula 1, MotoGP and other racing results—alongside broader automobile updates such as new car launches, EV developments, and sports car news.

Links:
https://www.autocarindia.com/motor-sports-news
https://timesofindia.indiatimes.com/auto/motorsports
https://auto.timesofindia.com/news
