<center><p float="center">
  <img src="https://upload.wikimedia.org/wikipedia/commons/e/e9/4_RGB_McCombs_School_Brand_Branded.png" width="300" height="100"/>
  <img src="https://mma.prnewswire.com/media/1458111/Great_Learning_Logo.jpg?p=facebook" width="200" height="100"/>
</p></center>

<center><font size=10>Generative AI for Business Applications</font></center>
<center><font size=6>Agentic AI Workflows - Week 2</font></center>

<center><p float="center">
  <img src="https://images.pexels.com/photos/17706646/pexels-photo-17706646.jpeg" width=720>
</p></center>
<center><font size=6>NewsFindr: AI-Powered Personalized News Discovery Agent</font></center>

# **Problem Statement**

## Business Context

In today’s fast-paced digital environment, users are inundated with endless streams of news from traditional feeds and search engines. This often results in information overload, difficulty in identifying trustworthy sources, and inefficiencies in discovering relevant updates. To address this challenge, NewsFindr is building an AI-powered news retrieval agent that leverages Agentic AI to deliver personalized, credible, and real-time news recommendations. By applying a structured multi-step approach, the system ensures secure, fair, and explainable recommendations that improve user experience, engagement, and trust in news discovery.

## Objective

The goal is to develop a prototype that demonstrates how Agentic AI can transform the way users access timely news while ensuring credibility and reducing misinformation.

Specifically, the system aims to:

* Provide real-time, personalized news retrieval tailored to user interests.
* Ensure accuracy and credibility by sourcing from trusted platforms and minimizing misinformation.
* Improve user engagement through seamless content discovery and reduced information overload.
* Streamline news consumption by eliminating outdated and irrelevant content, offering a refined reading experience.

This case study focuses on building a small part of that broader solution: a system that expands user interests into search queries, retrieves credible news links, and generates concise, trustworthy summaries. By successfully implementing this approach, NewsFindr seeks to redefine news discovery, enhance user satisfaction, and improve access to reliable information in an era of rapid content generation.


## Data Description


The database consists of the following columns:

* **Customer ID** - A unique identifier assigned to each customer.
* **Name** - The customer’s full name.
* **Email** - The customer’s registered email address.
* **Interests** - Key topics or domains the customer is interested in (e.g., startups, travel).
* **Last Updated** - Timestamp indicating the most recent update of the customer’s information.
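
As a rough illustration of this schema, the sketch below builds an in-memory SQLite table with the five columns and looks up one customer's interests. The table name `customers` and the snake_case column names are assumptions made for this example; the actual `customer.db` may name them differently. The sample row mirrors the record retrieved by the SQL agent further down in this notebook.

```python
import sqlite3

# Hypothetical schema: table and column names are assumed, not taken from customer.db
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        customer_id  TEXT PRIMARY KEY,
        name         TEXT,
        email        TEXT,
        interests    TEXT,   -- comma-separated topics, e.g. 'Startups, Travel'
        last_updated TEXT    -- timestamp stored as text
    )
    """
)
conn.execute(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    ("203631A0-B", "Ian", "ian.203631a0-b@gmail.com", "Startups, Travel", "2025-03-26 06:46:15"),
)

# Parameterized lookup of one customer's interests by email
row = conn.execute(
    "SELECT interests FROM customers WHERE email = ?",
    ("ian.203631a0-b@gmail.com",),
).fetchone()
interests = [topic.strip() for topic in row[0].split(",")]
print(interests)
```

Storing interests as a single comma-separated string matches the sample output in this notebook, which is why the lookup splits the value after reading it.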


# **Installing and Importing Necessary Libraries**

In [None]:
!pip install -q openai==1.93.0 \
             langchain==0.3.26 \
             langchain-openai==0.3.27 \
             langchainhub==0.1.21 \
             langchain-experimental==0.3.4 \
             pandas==2.2.2 \
             numpy==2.0.2 \
             ddgs==9.5.5



**Note:**

- After running the above cell, kindly restart the runtime (for Google Colab) or notebook kernel (for Jupyter Notebook), and run all cells sequentially from the next cell.
- When you run the above cell, you might see a warning about package dependencies. This warning can be ignored, as the pinned versions above install the libraries and dependencies needed to run this notebook.

In [1]:
# Import standard libraries for JSON handling, OS operations, and data manipulation
import json
import os
import pandas as pd

# Import LangChain tools for creating agents, initializing tools, and defining agent types
from langchain.agents import initialize_agent, Tool, AgentType
from langchain_community.agent_toolkits import create_sql_agent
from langchain_openai import ChatOpenAI
from langchain.schema import SystemMessage, HumanMessage

# Import SQLite support for database operations
import sqlite3
from langchain_community.utilities import SQLDatabase

# Import DDGS for DuckDuckGo search queries
from ddgs import DDGS


# **Loading and Setting Up the LLM**

In [2]:
# Load the JSON file and extract values
file_name = 'config.json'
with open(file_name, 'r') as file:
    config = json.load(file)
    OPENAI_API_KEY = config.get("OPENAI_API_KEY") # Loading the API Key
    OPENAI_API_BASE = config.get("OPENAI_API_BASE") # Loading the API Base Url


# Storing API credentials in environment variables
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
os.environ["OPENAI_BASE_URL"] = OPENAI_API_BASE
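
`config.get` returns `None` for a missing key, and assigning `None` to `os.environ` raises a `TypeError`. A minimal sketch of a stricter loader follows; the helper name `load_required_config` is hypothetical, and the two key names are the ones used above.

```python
import json

def load_required_config(path, required_keys=("OPENAI_API_KEY", "OPENAI_API_BASE")):
    """Load a JSON config file and fail early if any required key is missing or empty."""
    with open(path, "r") as file:
        config = json.load(file)
    # Collect every key that is absent or has an empty value
    missing = [key for key in required_keys if not config.get(key)]
    if missing:
        raise KeyError(f"Missing config keys: {', '.join(missing)}")
    return {key: config[key] for key in required_keys}
```

With this helper, `config = load_required_config("config.json")` would fail with a readable message before any environment variable is set.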

In [3]:
# Initialise the LLM
llm = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)



# **SQL Agent for Data Retrieval**

In [4]:
# Connect to the customer SQLite database using LangChain's SQLDatabase wrapper
customer_db = SQLDatabase.from_uri("sqlite:////content/customer.db")

In [5]:
# Initialize a SQL agent to interact with the customer database using the LLM
sqlite_agent = create_sql_agent(
    llm,
    db=customer_db,
    agent_type="openai-tools",
    verbose=False
)


## Testing SQL Agent on some prompts

In [6]:
# Define the user's email and create a query to fetch all records for that email
email = "ian.203631a0-b@gmail.com"
query = f"Fetch all records with email_id {email}"

# Execute the query using the SQL agent and store the output
output = sqlite_agent.invoke(query)

# Display the retrieved records
output


{'input': 'Fetch all records with email_id ian.203631a0-b@gmail.com',
 'output': 'Here is the record for the email id `ian.203631a0-b@gmail.com`:\n\n- **Customer ID:** 203631A0-B\n- **Name:** Ian\n- **Email:** ian.203631a0-b@gmail.com\n- **Interests:** Startups, Travel\n- **Last Updated:** 2025-03-26 06:46:15'}

In [7]:
# Ask the SQL agent for the distinct email addresses in the database
query = "Fetch all the unique email_id"

output = sqlite_agent.invoke(query)

output

{'input': 'Fetch all the unique email_id',
 'output': 'Here are some unique email IDs from the database:\n\n1. alice.6eb33c45-5@gmail.com\n2. alice.d56e53b6-1@gmail.com\n3. emma.a88fec03-c@gmail.com\n4. george.5483cb53-8@gmail.com\n5. hannah.4770b814-2@gmail.com\n6. ian.203631a0-b@gmail.com\n7. ian.aee37571-7@gmail.com\n8. julia.b5a14512-c@gmail.com\n9. julia.cf4d1edb-9@gmail.com\n10. julia.d77d96f3-3@gmail.com\n\nIf you need more information or additional examples, let me know!'}

In [8]:
# Ask the SQL agent for one customer's interests as a list
email = "ian.203631a0-b@gmail.com"

query = f"Fetch all the Interests with email_id {email} in a list"

output = sqlite_agent.invoke(query)

output

{'input': 'Fetch all the Interests with email_id ian.203631a0-b@gmail.com in a list',
 'output': 'The interests for the email id ian.203631a0-b@gmail.com are: Startups, Travel.'}

# **Tool: Expanded Search Queries**

The `expand_search_queries` function turns a comma-separated string of user interests into up-to-date search queries. For each interest, it calls the LLM with a system prompt focused on current events and asks for one precise query about breaking news or trending topics. The queries are returned as a newline-separated string for use in the later article-retrieval step.


In [9]:
# Function to expand user interests into current, time-sensitive news search queries
def expand_search_queries(interests: str) -> str:
    """
    Generate one up-to-date news search query per interest.
    Expects a comma-separated string of interests and returns one query per line.
    """
    system_prompt = """You are an expert news analyst specializing in crafting precise search queries
    to retrieve the most recent, up-to-date information. Focus only on breaking news, trending topics,
    or recent developments without specifying any year."""

    expanded_queries = []
    for interest in interests.split(","):
        interest = interest.strip()
        if not interest:
            continue  # Skip empty entries such as a trailing comma
        prompt = f"Generate one search query related to: '{interest}'"
        # invoke() replaces the deprecated predict_messages() and returns an AIMessage
        response = llm.invoke(
            [
                SystemMessage(content=system_prompt),
                HumanMessage(content=prompt)
            ]
        )
        query = response.content.strip()
        if query:
            expanded_queries.append(query)

    return "\n".join(expanded_queries)

# Wrap the function as a LangChain tool
expand_tool = Tool(
    name="ExpandSearchQueries",
    func=expand_search_queries,
    description="Generates precise, time-sensitive news search queries from user interests."
)
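
The agent may pass the interests to this tool wrapped in quotes (for example `"Startups, Travel"`), and spacing around commas can vary. A minimal sketch of a more tolerant splitter follows; `split_interests` is a hypothetical helper, not part of the notebook's code.

```python
def split_interests(raw: str) -> list:
    """Split a comma-separated interest string into clean topic names."""
    # Drop surrounding whitespace and quote characters the agent may add
    cleaned = raw.strip().strip("\"'")
    # Split on commas, trim each piece, and skip empty entries
    return [part.strip() for part in cleaned.split(",") if part.strip()]

print(split_interests('"Startups, Travel"'))   # ['Startups', 'Travel']
print(split_interests("Startups,Travel , "))   # ['Startups', 'Travel']
```

Cleaning the input before it reaches the LLM keeps stray quotes out of the generated search queries.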


# **Tool: Fetch News Results Using DuckDuckGo**



This function retrieves news results from DuckDuckGo based on the user's expanded search queries.

In [10]:
# Function to search DuckDuckGo for recent news based on a query
def ddg_search(query: str) -> list:
    """
    Perform a DuckDuckGo text search and return up to 5 results,
    each a dict with "title", "url", and "body" (snippet) keys.
    """
    results = []
    with DDGS() as ddgs:
        for r in ddgs.text(query, max_results=5):
            title = r.get("title", "")
            url = r.get("href", "")
            body = r.get("body", "")
            res = {"title": title, "url": url, "body": body}
            results.append(res)
    return results

# Wrap the search function as a LangChain tool
ddg_search_tool = Tool(
    name="DuckDuckGoSearch",
    func=ddg_search,
    description="Searches DuckDuckGo for recent news and returns top 5 results with URLs."
)
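
Search results can repeat the same page or carry tracking parameters in the URL (the agent trace in this notebook shows a `?msockid=...` suffix). The sketch below, which uses only the standard library, strips query strings and fragments and drops duplicates. `normalize_url` and `dedupe_results` are hypothetical helpers, and discarding the query string is an assumption that would break sites that encode the article ID there.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url: str) -> str:
    """Return the URL without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def dedupe_results(results: list) -> list:
    """Keep the first result for each normalized URL, preserving order."""
    seen = set()
    unique = []
    for item in results:
        key = normalize_url(item.get("url", ""))
        # Skip results with no URL and any URL already kept
        if key and key not in seen:
            seen.add(key)
            unique.append({**item, "url": key})
    return unique
```

A call such as `dedupe_results(ddg_search(query))` would apply this cleanup to the tool's output before the credibility step.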


# **Tool: Filter URLs**


This function asks the LLM to evaluate the search results and keep only those from credible sources.

It passes each result's URL and body (snippet) to the model.

It returns the model's refined list of credible URLs with their descriptions.

In [11]:
# Function to filter news results by credibility using an LLM
def filter_with_llm(search_results) -> str:
    """
    Analyze news results and return only those from credible sources, as text.
    Accepts a list of result dicts or, when called by the agent, a plain string.
    Asks the LLM to include at least two links per topic (if available).
    Each returned item includes the URL and its snippet/body.
    """
    # Serialize structured results to JSON; pass strings through unchanged
    if not isinstance(search_results, str):
        search_results = json.dumps(search_results, indent=2)

    # System prompt instructing the LLM on how to evaluate credibility
    system_prompt = """You are a news credibility analyst. Review the provided news URLs and evaluate
    their trustworthiness. Select and return only credible sources, ensuring diversity
    by including at least two different links for each topic (if available).
    For each credible link, provide the URL along with its body."""

    # Combine prompt and data for LLM (plain concatenation; joining a string
    # with "\n".join would insert a newline between every character)
    prompt = "Evaluate the following URLs for credibility:\n" + search_results

    # Get LLM response; invoke() replaces the deprecated predict_messages()
    response = llm.invoke(
        [
            SystemMessage(content=system_prompt),
            HumanMessage(content=prompt)
        ]
    )

    return response.content

# Wrap the credibility function as a LangChain tool
credibility_tool = Tool(
    name="CredibilityFilter",
    func=filter_with_llm,
    description="Filters search results by evaluating URL credibility using LLM."
)
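
An LLM's credibility judgment is hard to audit, so a deterministic pre-filter is one possible complement. The sketch below keeps only results whose host matches an allowlist. The `TRUSTED_DOMAINS` set and the `filter_by_domain` helper are hypothetical; the domains are taken from the sample outputs in this notebook purely for illustration and are not a vetted list.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration only
TRUSTED_DOMAINS = {"apnews.com", "cnn.com", "nbcnews.com"}

def filter_by_domain(results: list, trusted=TRUSTED_DOMAINS) -> list:
    """Keep results whose host is a trusted domain or a subdomain of one."""
    kept = []
    for item in results:
        host = (urlsplit(item.get("url", "")).hostname or "").lower()
        # Match the domain itself or any subdomain such as www.apnews.com
        if any(host == domain or host.endswith("." + domain) for domain in trusted):
            kept.append(item)
    return kept
```

Matching on `"." + domain` rather than a bare suffix keeps look-alike hosts such as `fakeapnews.com` out of the result.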

# **Tool: Generate Summary for the URLs**

This function generates summaries for the credible URLs.

It returns each URL with a summary relevant to the user's interests.

In [12]:
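# Function to summarize news articles from a list of URLs using an LLM
def summarize_news(url_list) -> str:
    """
    Generate high-level summaries of news sources given their URLs.
    Uses the LLM to infer main topics and general coverage for each link.
    Accepts a list of strings or, when called by the agent, a single string.
    """

    system_prompt = """You are a professional news summarizer.
    Summarize the following news sources, focusing on the general news coverage
    and main topics likely reported by each source with their links."""

    # Join list items into one block; a plain string is used as-is
    if not isinstance(url_list, str):
        url_list = "\n\n".join(str(item) for item in url_list)
    prompt = "Summarize the following news \n\n" + url_list

    # invoke() replaces the deprecated predict_messages()
    response = llm.invoke(
        [
            SystemMessage(content=system_prompt),
            HumanMessage(content=prompt)
        ]
    )

    # Prefix the text with the "Final Answer:" marker used in ReAct-style agent output
    return "Final Answer: " + response.content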

# Wrap summarization function as a LangChain tool
summarize_tool = Tool(
    name="SummarizeNews",
    func=summarize_news,
    description="Generates summaries of news sources and includes their links."
)
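
When the credible results are available as a list of dictionaries rather than free text, the summarization input can be assembled explicitly so that each title, URL, and snippet stays together. A minimal sketch, assuming items shaped like the output of `ddg_search`; `format_results_for_summary` is a hypothetical helper.

```python
def format_results_for_summary(results: list) -> str:
    """Render search results as numbered blocks of title, URL, and snippet."""
    blocks = []
    for index, item in enumerate(results, start=1):
        blocks.append(
            f"{index}. {item.get('title', '')}\n"
            f"   URL: {item.get('url', '')}\n"
            f"   Snippet: {item.get('body', '')}"
        )
    # Separate entries with a blank line so the LLM can tell them apart
    return "\n\n".join(blocks)
```

The returned string could be passed to `summarize_news` in place of a raw list.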


# **Building the Agent Using Tools**

In [13]:
def agent_call(tools_list, user_email, verbose=True):
    # Initialize a zero-shot ReAct agent with the supplied tools and the LLM
    agent = initialize_agent(
        tools=tools_list,  # Tools the agent can use
        llm=llm,      # Language model for reasoning and tool usage
        agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,  # Uses reasoning + tool descriptions
        verbose=verbose
    )

    # Look up the user's interests with the SQL agent
    interest_result = sqlite_agent.invoke(
        f"Fetch all the Interests with email_id {user_email} in a list"
    )["output"]

    agent_prompt = f"""
    The user's interest is: {interest_result}
    1. Expand the user's interest into news search queries using 'ExpandSearchQueries'.
    2. Search for recent news on all queries using 'DuckDuckGoSearch'.
    3. Combine results from all queries and filter them with 'CredibilityFilter'.
    4. Summarize all credible results using 'SummarizeNews'.
    5. Provide the final summary to the user.
    """

    # Execute the agent workflow and return the final summarized news
    response = agent.invoke({"input": agent_prompt})["output"]

    return response
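
The five numbered steps in the prompt leave the ordering to the agent's reasoning. For comparison, the same flow can be written as a fixed pipeline. The sketch below takes the four stages as callables so it can be exercised with stand-in functions; `run_pipeline` and the stubs are hypothetical and make no LLM or network calls.

```python
def run_pipeline(interests, expand, search, filter_credible, summarize):
    """Run expand -> search -> filter -> summarize in a fixed order."""
    # Step 1: one search query per line, as returned by the expansion stage
    queries = [q for q in expand(interests).splitlines() if q.strip()]
    # Step 2: gather results for every query into one list
    results = []
    for query in queries:
        results.extend(search(query))
    # Steps 3 and 4: filter the combined results, then summarize what remains
    return summarize(filter_credible(results))

# Stand-in stages for a dry run
summary = run_pipeline(
    "Startups, Travel",
    expand=lambda text: "\n".join(f"latest {t.strip()} news" for t in text.split(",")),
    search=lambda q: [{"title": q, "url": "https://example.com", "body": ""}],
    filter_credible=lambda results: results,
    summarize=lambda results: f"{len(results)} results summarized",
)
print(summary)  # 2 results summarized
```

A fixed pipeline trades the agent's flexibility for predictable ordering, which can make failures easier to trace.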

In [14]:
tools_list = [expand_tool, ddg_search_tool, credibility_tool, summarize_tool]
user_email = "ian.203631a0-b@gmail.com"

response = agent_call(tools_list, user_email)
print("\n======= FINAL RESPONSE =======")
print(response)

> Entering new AgentExecutor chain...
I need to expand the user's interests into specific news search queries related to startups and travel. After that, I will search for recent news articles based on those queries, filter the results for credibility, and summarize the credible articles.

Action: ExpandSearchQueries
Action Input: "Startups, Travel"

Observation: [36;1m[1;3m"latest startup funding news October 2023"
"latest travel restrictions updates October 2023"[0m
Thought:[32;1m[1;3mI have expanded the user's interests into specific news search queries related to startups and travel. The queries generated are focused on the latest startup funding news and travel restrictions updates for October 2023.

Action: DuckDuckGoSearch  
Action Input: "latest startup funding news October 2023"  [0m
Observation: [33;1m[1;3m[{'title': 'Breaking News, Latest News and Videos | CNN', 'url': 'https://www.cnn.com/', 'body': 'View the latest news and breaking news today for U.S., world, weather, entertainment, politics and health at CNN.com.'}, {'title': 'Fox News - Breaking News Updates | Latest News Headlines ...', 'url': 'https://www.foxnews.com/?msockid=138bf22ec101636e37eee44cc08e628d', 'body': 'Latest Current News: U.S., World, Entertainment, Health, Business, Technology, Politics, Sports.'}, {'title': 'NBC News - Breaking Headlin

In [15]:
tools_list = [expand_tool, ddg_search_tool, credibility_tool, summarize_tool]

user_email = "alice.6eb33c45-5@gmail.com"

response = agent_call(tools_list, user_email, verbose=False)
print("\n======= FINAL RESPONSE =======")
print(response)


Here’s a summary of the general news coverage and main topics likely reported by each of the provided sources:

1. **AP News (Politics)**: 
   - Focuses on national and international political news, including elections, legislative developments, and significant political events. Coverage often includes analysis of political trends and implications for governance.
   - Link: [AP News Politics](https://apnews.com/politics)

2. **CNN**: 
   - Offers a wide range of news, including breaking news, politics, health, and technology. CNN is known for its in-depth reporting and analysis of current events, often featuring interviews with key figures and experts.
   - Link: [CNN](https://www.cnn.com)

3. **Washington Post**: 
   - Provides comprehensive coverage of U.S. and global news, with a strong emphasis on political reporting, investigative journalism, and opinion pieces. The Washington Post often delves into the implications of political decisions and societal issues.
   - Link: [Washingt

In [16]:
tools_list = [expand_tool, ddg_search_tool, credibility_tool, summarize_tool]

user_email = "nora.c63fa237-e@gmail.com"

response = agent_call(tools_list, user_email, verbose=False)
print("\n======= FINAL RESPONSE =======")
print(response)


Here’s a summary of the general news coverage and main topics likely reported by each of the specified sources:

1. **CNN** (https://www.cnn.com/)
   - **General Coverage**: CNN provides comprehensive news coverage, including breaking news, politics, world events, and health.
   - **Main Topics**: Current political developments, international relations, social issues, and major events in the U.S. and globally. They often feature in-depth analysis and opinion pieces.

2. **NBC News** (https://www.nbcnews.com/)
   - **General Coverage**: NBC News covers a wide range of topics including national and international news, politics, and human interest stories.
   - **Main Topics**: Focus on major political events, social justice issues, health updates, and significant cultural stories. They also provide investigative journalism and feature segments.

3. **Associated Press (AP) News** (https://apnews.com/)
   - **General Coverage**: AP News is known for its straightforward reporting and is a 

# **Conclusion**

- NewsFindr leverages Agentic AI to deliver personalized, real-time news updates tailored to individual user interests.
- The system promotes accuracy and credibility by using an LLM-based filter to keep results from trusted sources, reducing exposure to misinformation.
- Automated query expansion, search, and summarization help users access relevant information efficiently, minimizing information overload.
- Users benefit from a refined news consumption experience, focusing only on timely and meaningful developments.
- Overall, NewsFindr demonstrates how AI can enhance engagement, improve productivity, and streamline the discovery of important news.
