# **NewsFinder: Advanced GenAI**

**Problem Statement**

NewsFindr is redefining news discovery by delivering real-time news updates tailored to user interests. Traditional search methods and generic news feeds often lead to information overload and inefficiencies, making it challenging for users to access relevant and trustworthy content efficiently.

To address this, NewsFindr wants to leverage Agentic AI to build an AI-powered news retrieval agent that ensures accuracy and credibility. By utilizing a structured, multi-step approach, the system will provide secure, fair, and explainable recommendations - enhancing user engagement, optimizing content discovery, and improving access to timely and relevant news.



**Objective**

1. Provide real-time, personalized news retrieval to help users discover relevant content effortlessly.
2. Ensure accuracy and credibility by sourcing news from trusted platforms and minimizing misinformation.
3. Improve user engagement through seamless content discovery, reducing information overload.
4. Streamline the news consumption process by eliminating outdated and irrelevant content, providing a refined reading experience.

# Mounting Google drive

In [21]:
from google.colab import drive
drive.mount('/content/drive/')

%cd "/content/drive/MyDrive/PGP_AI_ML_GREAT_LEARNING/11_Advance_GenAI_NLP/Final_Project/main/"

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).
/content/drive/MyDrive/PGP_AI_ML_GREAT_LEARNING/11_Advance_GenAI_NLP/Final_Project/main


In [22]:
!ls

Database  main.py  NewsFinder.py  NewsFinder_version_1_0.ipynb	__pycache__


# PACKAGE INSTALLATION

In [5]:
!pip install groq
!pip install python-dotenv
!pip install langchain
!pip install langchain-groq
!pip install sqlalchemy
# sqlite3 ships with the Python standard library, so it needs no pip install
!pip install langchain_community
!pip install ddgs


# Solution

        --> class NewsFinder (NewsFinder.py)
            --> def __init__(self,base_path,groq_api)
                * Stores the Groq API key and the model name
                * Loads the customer DB from the base location

            --> def Load_LLM(self,userQuery):
                * Sends a query to the LLM llama-3.3-70b-versatile and returns the streamed response

            --> def SQLAgents(self):
                * Initializes the SQL agent

            --> def Get_Customer_Details(self,emailID):
                * Returns the full customer record(s) for the given email ID(s)

            --> def Verify_Customer_Email(self,emailID):
                * If the user's email ID is present in the DB, returns that customer's interests

            --> def Search_News(self,interests,no_of_urls):
                * For each interest, searches DuckDuckGo and returns up to no_of_urls news URLs

            --> def Fetch_News_URLs(self, emailIDs,no_of_urls):
                * Invokes Verify_Customer_Email and then Search_News

            --> def Get_News_Summary(self, urls, interest):
                * Summarises the news from the URLs found for a customer's interest

            --> def Generate_News_Summaries(self, news_urls):
                * Invokes Get_News_Summary for each interest
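
The data flow between these methods can be sketched without any network or LLM calls. The snippet below is a minimal illustration, not the notebook's implementation: `run_news_pipeline` and the three `fake_*` stages are hypothetical stand-ins that only mimic the dictionary shapes passed between `Verify_Customer_Email`, `Search_News` and `Get_News_Summary`.

```python
from typing import Callable, Dict, List


def run_news_pipeline(
    email_ids: str,
    no_of_urls: int,
    lookup_interests: Callable[[str], Dict[str, List[str]]],
    search_urls: Callable[[Dict[str, List[str]], int], Dict[str, List[str]]],
    summarise: Callable[[List[str], str], str],
) -> Dict[str, str]:
    """Chain the three stages: interests -> URLs -> summaries."""
    interests = lookup_interests(email_ids)  # {email: [interest, ...]}
    if not interests:
        return {}  # unknown email: nothing to search for
    urls_by_interest = search_urls(interests, no_of_urls)  # {interest: [url, ...]}
    return {
        interest: summarise(urls, interest)
        for interest, urls in urls_by_interest.items()
    }


# Hypothetical stand-in stages so the sketch runs offline.
def fake_lookup(email_ids: str) -> Dict[str, List[str]]:
    return {e.strip(): ["Politics"] for e in email_ids.split(",") if e.strip()}


def fake_search(interests: Dict[str, List[str]], n: int) -> Dict[str, List[str]]:
    topics = sorted({t for items in interests.values() for t in items})
    return {t: [f"https://example.com/{t.lower()}/{i}" for i in range(n)] for t in topics}


def fake_summarise(urls: List[str], interest: str) -> str:
    return f"{len(urls)} article(s) on {interest}"


result = run_news_pipeline("a@example.com", 2, fake_lookup, fake_search, fake_summarise)
print(result)
```

Passing the stages in as callables keeps each one replaceable, which is one way to test the orchestration separately from the SQL agent and the search tool.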


In [36]:
%%writefile NewsFinder.py
import os
import traceback
import sqlite3
import warnings
import json
import re
from groq import Groq
from langchain_groq.chat_models import ChatGroq
from langchain_community.agent_toolkits.sql.toolkit import SQLDatabaseToolkit
from langchain_community.utilities import SQLDatabase
from langchain_community.agent_toolkits.sql.base import create_sql_agent
from langchain_community.tools import DuckDuckGoSearchRun


warnings.filterwarnings("ignore",category=Warning,module="langchain.*")

class NewsFinder:
  def __init__(self,base_path,groq_api):
    self.agent = None
    self.grok_api_key = groq_api
    self.base_path = base_path
    # Build the path first: nesting double quotes inside an f-string is a syntax error before Python 3.12
    db_path = os.path.join(self.base_path, "Database", "customer.db")
    self.sql_DB = SQLDatabase.from_uri(f"sqlite:///{db_path}")

    self.SystemMessage = "You are a helpful assistant designed to respond to user queries with accurate and concise information"
    self.model = "llama-3.3-70b-versatile"
    #self.model = 'llama-3.1-8b-instant'

  def Load_LLM(self,userQuery):
    try:
      client = Groq(api_key=self.grok_api_key)
      completion = client.chat.completions.create(
        model=self.model,
        messages=[
            {"role": "system", "content": self.SystemMessage},
            {"role": "user","content": userQuery}

        ],
        temperature = 0.1,
        max_completion_tokens=None,
        top_p=1,
        stream=True,
        stop=None )

      full_response = ""
      for chunk in completion:
        content = chunk.choices[0].delta.content or ""
        full_response += content
        print(content, end ="")
      print(f"\n Complete LLM response: {full_response}")
      return full_response
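    except Exception as ex:
      print(f"Exception: {ex}")
      traceback.print_exc()  # print_exc() prints the traceback itself and returns None
      return ""  # keep the return type a string so callers can still run regexes on it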

  def SQLAgents(self):
    try:

      llm_Obj = ChatGroq(api_key=self.grok_api_key,
                         model_name = self.model)

      toolkit = SQLDatabaseToolkit(db=self.sql_DB, llm=llm_Obj)

      self.agent = create_sql_agent(llm= llm_Obj,
                                    toolkit=toolkit,
                                    verbose=True)
      return self.agent

    except Exception as ex:
      print(f"Exception: {ex}")
      traceback.print_exc()

  def Get_Customer_Details(self,emailID):
    try:
      if emailID is None or not emailID.strip():
          raise ValueError("Email ID is not provided")
      if self.agent is None:
        self.SQLAgents()

      email_list = [email.strip() for email in emailID.split(',')]

      emailID_list = ", ".join(f" '{email}' " for email in email_list)
      print(emailID_list)
      query = f"SELECT * FROM customers  WHERE email IN ({emailID_list})"
      result = self.agent.run(query)
      # Compare by value: `is` tests object identity and raises a SyntaxWarning with a string literal
      if result.strip() == "I don't know":
        result = f"Email ID {emailID} not found in DB"
      print(f"Raw Result: {result}")
      return result
    except Exception as ex:
      print(f"Exception: {ex}")
      traceback.print_exc()

  def Verify_Customer_Email(self,emailID):
    try:
      if emailID is None or not emailID.strip():
          raise ValueError("Email ID is not provided")
      if self.agent is None:
        self.SQLAgents()

      email_list = [email.strip() for email in emailID.split(',')]

      emailID_list = ", ".join(f" '{email}' " for email in email_list)
      print(emailID_list)
      query = f"SELECT interests FROM customers WHERE email IN ({emailID_list})"
      result = self.agent.run(query)

      print(f"Raw Result: {result}")
      print(type(result))
      interests = {}
      if isinstance(result,str):
        try:
          result_patters = re.findall(r'\["[^"\]]*(?:"[^"\]]*"[^"\]]*)*"\]', result)
          for indx, interest_string in enumerate(result_patters):
            if indx < len(email_list):
              cleaned_strings = interest_string.strip('[]')
              items = re.findall(r'"([^"]*)"',cleaned_strings)
              interests[email_list[indx]] = items
        except Exception as ex:
          print(f"Exception: {ex}")
          traceback.print_exc()

      return interests

    except Exception as ex:
      print(f"Exception: {ex}")
      traceback.print_exc()
      return {}

  def Search_News(self,interests,no_of_urls):
    try:
      search_tool = DuckDuckGoSearchRun()

      filtered_URLS = {}

      for email,interest_list in interests.items():
        for indvidual_interest in interest_list:
          expand_query = f"latest news article on {indvidual_interest} after:2025-01-01 site:bbc.com OR site:reuters.com OR site:nytimes.com OR site:apnews.com OR site:theguardian.com"

          search_results = search_tool.run(expand_query)
          #print(f"Search Results for {interest}")

          filter_query = f"From the following search results, extract 3-5 trustworthy news URLs (not homepages) relevant to '{indvidual_interest}' published after December 31 2024. Prioritize recent, credible sources like BBC, Reuters, NYT, AP, Guardian. List only the URLs: \n {search_results}"

          query_results = self.Load_LLM(filter_query)

          #print(f"Query Results: {query_results}")
          urls = re.findall(r'https?://[^\s]+',query_results)

          urls = list(dict.fromkeys(url for url in urls if url.strip().startswith("http")))

          filtered_URLS[indvidual_interest] = urls[:no_of_urls]
      #print(f"Final Filtered URLs {filtered_URLS}")
      return filtered_URLS

    except Exception as ex:
      print(f"Exception: {ex}")
      traceback.print_exc()
      return {}  # callers iterate over the result, so never return None

  def Fetch_News_URLs(self, emailIDs,no_of_urls):
    try:
      results_intersets = self.Verify_Customer_Email(emailIDs)
      news_urls = self.Search_News(results_intersets,no_of_urls)
      print(f"Filtered news URLs for {emailIDs}:{news_urls}")
      return news_urls
    except Exception as ex:
      print(f"Exception: {ex}")
      traceback.print_exc()
      return {}

  def Get_News_Summary(self, urls, interest):
    try:
      summary = f"Relevant Latest news on {interest}"

      for indx,url in enumerate(urls[:4],1):
        content_query = f"Extract a concise summary (4-5 sentences) of the latest news from {url}. Focus only on key facts and relevance to {interest}. Return the response in the format \n\n\n Title:[title] \n\n Key Points:[points] \n\n Source: [source] \n\n Date: [date] \n\n Summary:[summary]\n\n URL:[url]"

        summary_content = self.Load_LLM(content_query)

        summary += f"\n\n{summary_content}\n\n"

        # print(f"summary for {interest} Item {indx} {summary_content}")

      return summary
    except Exception as ex:
      print(f"Exception: {ex}")
      traceback.print_exc()
      return f"No Summary Available for {interest}"

  def Generate_News_Summaries(self, news_urls):
    try:
      news_summary = {}
      for interest, urls in news_urls.items():
        news_summary[interest] = self.Get_News_Summary(urls,interest)
        #print(f"News Summary: {news_summary}")
      # Return after the loop so every interest is summarised, not just the first one
      return news_summary
    except Exception as ex:
      print(f"Exception: {ex}")
      traceback.print_exc()
      return {}


Overwriting NewsFinder.py
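
For comparison, the interests lookup can also be done without the LLM agent. The sketch below is an alternative technique, not what `Verify_Customer_Email` does: it queries SQLite directly with bound parameters, which avoids interpolating user input into the query string and returns a real email-to-interests mapping. It assumes the `interests` column holds a JSON array as text, as the sample row in the agent logs suggests. The helper name `get_interests` and the in-memory database are hypothetical and used only for illustration.

```python
import json
import sqlite3


def get_interests(conn, emails):
    """Return {email: [interest, ...]} for the emails found in the customers table."""
    if not emails:
        return {}
    # One "?" per email; the values are bound by sqlite3, never formatted into the SQL.
    placeholders = ", ".join("?" for _ in emails)
    rows = conn.execute(
        f"SELECT email, interests FROM customers WHERE email IN ({placeholders})",
        list(emails),
    ).fetchall()
    # Assumption: interests is stored as JSON text, e.g. '["Politics", "Travel"]'.
    return {email: json.loads(interests) for email, interests in rows}


# In-memory table modelled on the customers schema shown in the agent logs.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        customer_id TEXT NOT NULL UNIQUE,
        name TEXT NOT NULL,
        email TEXT NOT NULL UNIQUE,
        interests TEXT NOT NULL,
        last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )"""
)
conn.execute(
    "INSERT INTO customers (customer_id, name, email, interests) VALUES (?, ?, ?, ?)",
    ("F8641860-7", "Kevin", "kevin.f8641860-7@gmail.com", '["Politics", "Startups", "Travel"]'),
)

found = get_interests(conn, ["kevin.f8641860-7@gmail.com", "karthik@abc.com"])
print(found)
```

An email that is not in the table is simply absent from the returned dictionary, so the caller can detect an invalid email without parsing free-form agent text.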


In [23]:
%%writefile main.py

import os
import warnings
os.environ["PYTHONWARNINGS"] = "ignore"
import sys
import argparse
from dotenv import load_dotenv

try:
  base_path = os.path.abspath((os.path.dirname(__file__)))
except NameError:
  # __file__ is undefined in interactive sessions; fall back to the working directory
  base_path = os.getcwd()

print(f"Base Path: {base_path}")
sys.path.append(base_path)
from NewsFinder import NewsFinder
parser = argparse.ArgumentParser(description="To run a specific job in pipeline")
parser.add_argument('--job', type=str, required=True,
                    choices=['test-llm','get-all-data','verify-email','get-urls','get-summary','test-agent'],
                    help='Jobs to execute')

args = parser.parse_args()

load_dotenv(dotenv_path=os.path.join(base_path,".env"))
groq_api = os.getenv('GROQ_API_KEY')
open_api = os.getenv('OPEN_API_KEY')

# Validate the keys before constructing NewsFinder so a missing key fails fast
if groq_api is None:
  raise ValueError("GROQ_API_KEY not found in .env file")

if open_api is None:
  raise ValueError("OPEN_API_KEY not found in .env file")

newFindr_Obj = NewsFinder(base_path,groq_api)


if args.job == 'test-llm':
  userQuery = input("Enter your query:")
  newFindr_Obj.Load_LLM(userQuery)

elif args.job == 'get-all-data':
  emailID = input("Enter your email ID: ")
  result = newFindr_Obj.Get_Customer_Details(emailID)

elif args.job == 'verify-email':
  emailID = input("Enter your email ID: ")
  interset = newFindr_Obj.Verify_Customer_Email(emailID)
  print(f"interset: {interset}")

elif args.job == 'get-urls':
  newFindr_Obj.SQLAgents()
  emailID = input("Enter your Email ID")
  no_of_urls = int(input("Enter the number of URLs required"))
  news_urls = newFindr_Obj.Fetch_News_URLs(emailID,no_of_urls)
  print(f"News URLs: {news_urls}")

elif args.job == 'get-summary':
  newFindr_Obj.SQLAgents()
  emailID = input("Enter your Email ID")
  no_of_urls = int(input("Enter the number of URLs required"))
  news_urls = newFindr_Obj.Fetch_News_URLs(emailID,no_of_urls)
  news_summary = newFindr_Obj.Generate_News_Summaries(news_urls)
  for interest, summary in news_summary.items():
    print('-'*50)
    print(f"Summary for {interest}:\n\n {summary}")

elif args.job == 'test-agent':
  newFindr_Obj.SQLAgents()
  query = input(f"Enter the query:")
  print(f"Query: {query}")
  result = newFindr_Obj.agent.run(query)
  print(f"Result: {result}")
  print('-'*50)







Overwriting main.py


# Question 01: Load the LLM and test it with a simple query

In [66]:
!python main.py --job test-llm

Base Path: /content/drive/MyDrive/PGP_AI_ML_GREAT_LEARNING/11_Advance_GenAI_NLP/Final_Project/main
Enter your query:What is RAG in GenAI ? list out all its types
In GenAI (General Artificial Intelligence), RAG stands for "Retrieval-Augmented Generation". It's a technique used in natural language processing (NLP) and language models to improve the accuracy and relevance of generated text.

RAG involves two main components:

1. **Retrieval**: This component is responsible for searching and retrieving relevant information from a large database or knowledge base.
2. **Generation**: This component uses the retrieved information to generate text that is contextually relevant and accurate.

There are several types of RAG models, including:

1. **RAG-T5**: This is a type of RAG model that uses the T5 (Text-to-Text Transfer Transformer) architecture. It's a popular model that has achieved state-of-the-art results in various NLP tasks.
2. **RAG-Seq2Seq**: This type of RAG model uses a sequence-t

# Question: Verify the customer's email and retrieve their details

## For Valid Email

In [33]:
!export PYTHONWARNINGS="ignore"
!python main.py --job get-all-data

Base Path: /content/drive/MyDrive/PGP_AI_ML_GREAT_LEARNING/11_Advance_GenAI_NLP/Final_Project/main
Enter your email ID: george.5483cb53-8@gmail.com
 'george.5483cb53-8@gmail.com' 
  result = self.agent.run(query)


> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input:
customers
I have the list of tables in the database, which is just "customers". I should now query the schema of this table to see what columns are available.

Action: sql_db_schema
Action Input: customers
CREATE TABLE customers (
	id INTEGER, 
	customer_id TEXT NOT NULL, 
	name TEXT NOT NULL, 
	email TEXT NOT NULL, 
	interests TEXT NOT NULL, 
	last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP, 
	PRIMARY KEY (id), 
	UNIQUE (customer_id), 
	UNIQUE (email)
)

/*
3 rows from customers table:
id	customer_id	name	email	interests	last_updated
1	F8641860-7	Kevin	kevin.f8641860-7@gmail.com	["Politics", "Startups", "Travel"]	202

## For Invalid Email

In [37]:
!export PYTHONWARNINGS="ignore"
!python main.py --job get-all-data

Base Path: /content/drive/MyDrive/PGP_AI_ML_GREAT_LEARNING/11_Advance_GenAI_NLP/Final_Project/main
  if result is "I don't know":
Enter your email ID: karthik@abc.com
 'karthik@abc.com' 
  result = self.agent.run(query)


> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input:
customers
I have the list of tables in the database, and I see that the "customers" table is present. Now, I should check the schema of the "customers" table to see what columns are available.

Action: sql_db_schema
Action Input: customers
CREATE TABLE customers (
	id INTEGER, 
	customer_id TEXT NOT NULL, 
	name TEXT NOT NULL, 
	email TEXT NOT NULL, 
	interests TEXT NOT NULL, 
	last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP, 
	PRIMARY KEY (id), 
	UNIQUE (customer_id), 
	UNIQUE (email)
)

/*
3 rows from customers table:
id	customer_id	name	email	interests	last_updated
1	F8641860-7	Kevin	kevin.f8641860-7@gmail.c

# Get the user's interests for a single email ID

In [34]:
!export PYTHONWARNINGS="ignore"
!python main.py --job verify-email

Base Path: /content/drive/MyDrive/PGP_AI_ML_GREAT_LEARNING/11_Advance_GenAI_NLP/Final_Project/main
Enter your email ID: ian.203631a0-b@gmail.com
 'ian.203631a0-b@gmail.com' 
  result = self.agent.run(query)


> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input:
customers
I have the list of tables in the database, and I see that the "customers" table is one of them. Now, I should check the schema of the "customers" table to see what columns it has.

Action: sql_db_schema
Action Input: customers
CREATE TABLE customers (
	id INTEGER, 
	customer_id TEXT NOT NULL, 
	name TEXT NOT NULL, 
	email TEXT NOT NULL, 
	interests TEXT NOT NULL, 
	last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP, 
	PRIMARY KEY (id), 
	UNIQUE (customer_id), 
	UNIQUE (email)
)

/*
3 rows from customers table:
id	customer_id	name	email	interests	last_updated
1	F8641860-7	Kevin	kevin.f8641860-7@gmail.com	["Politics", 

# Get the users' interests for multiple email IDs

In [68]:
!export PYTHONWARNINGS="ignore"
!python main.py --job verify-email

Base Path: /content/drive/MyDrive/PGP_AI_ML_GREAT_LEARNING/11_Advance_GenAI_NLP/Final_Project/main
Enter your email ID: ian.203631a0-b@gmail.com,julia.d77d96f3-3@gmail.com
 'ian.203631a0-b@gmail.com' ,  'julia.d77d96f3-3@gmail.com' 
  result = self.agent.run(query)


> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input:
customers
Since the observation only returned one table, "customers", I will query the schema of this table to see what columns are available.

Action: sql_db_schema
Action Input: customers
CREATE TABLE customers (
	id INTEGER, 
	customer_id TEXT NOT NULL, 
	name TEXT NOT NULL, 
	email TEXT NOT NULL, 
	interests TEXT NOT NULL, 
	last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP, 
	PRIMARY KEY (id), 
	UNIQUE (customer_id), 
	UNIQUE (email)
)

/*
3 rows from customers table:
id	customer_id	name	email	interests	last_updated
1	F8641860-7	Kevin	kevin.f8641860-7@gmail.com	[

# Interface between SQL and Search Agents

## Get the news URLs for the customer's interests from DuckDuckGo, based on the email ID supplied as input

In [8]:
!export PYTHONWARNINGS="ignore"
!python main.py --job get-urls

Base Path: /content/drive/MyDrive/PGP_AI_ML_GREAT_LEARNING/11_Advance_GenAI_NLP/Final_Project/main
Enter your Email IDnora.c63fa237-e@gmail.com
Enter the number of URLs required1
 'nora.c63fa237-e@gmail.com' 
  result = self.agent.run(query)


> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input:
customers
Now that I have the list of tables, I can see that there is a "customers" table. However, the question mentions a "users" table and an "email" column, which doesn't seem to match the "customers" table. I should query the schema of the "customers" table to see if it has any relevant columns.

Action: sql_db_schema
Action Input: customers
CREATE TABLE customers (
	id INTEGER, 
	customer_id TEXT NOT NULL, 
	name TEXT NOT NULL, 
	email TEXT NOT NULL, 
	interests TEXT NOT NULL, 
	last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP, 
	PRIMARY KEY (id), 
	UNIQUE (customer_id), 
	UNIQUE (ema
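
The URL filtering step in `Search_News` relies on a regex over the LLM's answer, which can pick up trailing punctuation, homepages, or links outside the requested sites. A minimal sketch of a stricter post-filter follows. The helper `extract_article_urls` is hypothetical, the domain list is copied from the `site:` filters in the search query, and the sample text is made up for illustration.

```python
import re
from urllib.parse import urlparse

# Domains taken from the site: filters used in the search query.
TRUSTED_DOMAINS = ("bbc.com", "reuters.com", "nytimes.com", "apnews.com", "theguardian.com")


def extract_article_urls(text, limit):
    """Return up to `limit` unique article URLs on trusted domains, in order of appearance."""
    urls = []
    for raw in re.findall(r"https?://\S+", text):
        url = raw.rstrip(".,;:)]}>\"'")  # drop punctuation glued to the end of the link
        parsed = urlparse(url)
        host = parsed.netloc.lower()
        on_trusted_site = any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)
        is_homepage = parsed.path in ("", "/")
        if on_trusted_site and not is_homepage and url not in urls:
            urls.append(url)
    return urls[:limit]


# Made-up LLM answer containing a trailing dot, a homepage, a duplicate and an untrusted site.
sample = (
    "1. https://www.bbc.com/news/articles/abc123.\n"
    "2. https://www.reuters.com/\n"
    "3. (https://www.theguardian.com/world/2025/jan/02/example)\n"
    "4. https://www.bbc.com/news/articles/abc123\n"
    "5. https://blog.example.org/post"
)
print(extract_article_urls(sample, 3))
```

Checking the host in code means the trusted-source guarantee no longer depends only on the prompt being followed.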

# Output from LLMs

## Retrieve the final URL(s) for the latest news based on the customer’s interest and create a summary of the retrieved links

In [9]:
!export PYTHONWARNINGS="ignore"
!python main.py --job get-summary

Base Path: /content/drive/MyDrive/PGP_AI_ML_GREAT_LEARNING/11_Advance_GenAI_NLP/Final_Project/main
Enter your Email IDgeorge.5483cb53-8@gmail.com
Enter the number of URLs required1
 'george.5483cb53-8@gmail.com' 
  result = self.agent.run(query)


> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input:
customers
There is only one table in the database, which is 'customers'. I should query the schema of this table to see what columns are available.

Action: sql_db_schema
Action Input: customers
CREATE TABLE customers (
	id INTEGER, 
	customer_id TEXT NOT NULL, 
	name TEXT NOT NULL, 
	email TEXT NOT NULL, 
	interests TEXT NOT NULL, 
	last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP, 
	PRIMARY KEY (id), 
	UNIQUE (customer_id), 
	UNIQUE (email)
)

/*
3 rows from customers table:
id	customer_id	name	email	interests	last_updated
1	F8641860-7	Kevin	kevin.f8641860-7@gmail.com	["Politics", "St

# Querying with the Agent

## Provide any 3 queries to the agentic AI system

### Query#01  Get all the records from the customer table

In [60]:
!export PYTHONWARNINGS="ignore"
!python main.py --job test-agent

Base Path: /content/drive/MyDrive/PGP_AI_ML_GREAT_LEARNING/11_Advance_GenAI_NLP/Final_Project/main
Enter the query:get the all records from the customer db table and provide the record result in new row
Query: get the all records from the customer db table and provide the record result in new row
  result = newFindr_Obj.agent.run(query)


> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input:
customers
It seems like there is a table named "customers" which is likely the table I need to query. I should now check the schema of this table to see what columns are available.

Action: sql_db_schema
Action Input: customers
CREATE TABLE customers (
	id INTEGER, 
	customer_id TEXT NOT NULL, 
	name TEXT NOT NULL, 
	email TEXT NOT NULL, 
	interests TEXT NOT NULL, 
	last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP, 
	PRIMARY KEY (id), 
	UNIQUE (customer_id), 
	UNIQUE (email)
)

/*
3 rows from cu

### Query#02  List the customers who are interested in Politics

In [62]:
!export PYTHONWARNINGS="ignore"
!python main.py --job test-agent

Base Path: /content/drive/MyDrive/PGP_AI_ML_GREAT_LEARNING/11_Advance_GenAI_NLP/Final_Project/main
Enter the query:List out the customers who have the interest in Politics and provide the record result in new rows display only the name and interests
Query: List out the customers who have the interest in Politics and provide the record result in new rows display only the name and interests
  result = newFindr_Obj.agent.run(query)


> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input:
customers
Now that I have a list of tables, I can query the schema of the most relevant table, which appears to be "customers". This will allow me to understand the columns available in this table and construct a query to retrieve the required information.

Action: sql_db_schema
Action Input: customers
CREATE TABLE customers (
	id INTEGER, 
	customer_id TEXT NOT NULL, 
	name TEXT NOT NULL, 
	email TEXT 

### Query#03  When were the records for Ian, Hannah and Oscar last updated

In [63]:
!export PYTHONWARNINGS="ignore"
!python main.py --job test-agent

Base Path: /content/drive/MyDrive/PGP_AI_ML_GREAT_LEARNING/11_Advance_GenAI_NLP/Final_Project/main
Enter the query:when was the Ian,Hannah and Oscar records was last updated and provide the result records in new rows display all columns
Query: when was the Ian,Hannah and Oscar records was last updated and provide the result records in new rows display all columns
  result = newFindr_Obj.agent.run(query)


> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input:
customers
Now that I have the list of tables, I can query the schema of the 'customers' table to see what columns are available.

Action: sql_db_schema
Action Input: customers
CREATE TABLE customers (
	id INTEGER, 
	customer_id TEXT NOT NULL, 
	name TEXT NOT NULL, 
	email TEXT NOT NULL, 
	interests TEXT NOT NULL, 
	last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP, 
	PRIMARY KEY (id), 
	UNIQUE (customer_id), 
	UNIQUE (email)
)

/

## Retrieve the top 3 latest news items based on the customer’s interest and generate a summary of each result

## Get the latest 3 news items and their summaries for the interests of oscar.edd38e10-6@gmail.com

In [13]:
!export PYTHONWARNINGS="ignore"
!python main.py --job get-summary

Base Path: /content/drive/MyDrive/PGP_AI_ML_GREAT_LEARNING/11_Advance_GenAI_NLP/Final_Project/main
Enter your Email IDoscar.edd38e10-6@gmail.com
Enter the number of URLs required3
 'oscar.edd38e10-6@gmail.com' 
  result = self.agent.run(query)


> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input:
customers
It seems like there's only one table in the database, which is "customers". I should query the schema of this table to see what columns are available.

Action: sql_db_schema
Action Input: customers
CREATE TABLE customers (
	id INTEGER, 
	customer_id TEXT NOT NULL, 
	name TEXT NOT NULL, 
	email TEXT NOT NULL, 
	interests TEXT NOT NULL, 
	last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP, 
	PRIMARY KEY (id), 
	UNIQUE (customer_id), 
	UNIQUE (email)
)

/*
3 rows from customers table:
id	customer_id	name	email	interests	last_updated
1	F8641860-7	Kevin	kevin.f8641860-7@gmail.com	["Pol

## Get the latest 3 news items and their summaries for the interests of emma.a88fec03-c@gmail.com

In [18]:
!export PYTHONWARNINGS="ignore"
!python main.py --job get-summary

Base Path: /content/drive/MyDrive/PGP_AI_ML_GREAT_LEARNING/11_Advance_GenAI_NLP/Final_Project/main
Enter your Email IDemma.a88fec03-c@gmail.com
Enter the number of URLs required3
 'emma.a88fec03-c@gmail.com' 
  result = self.agent.run(query)


> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input:
customers
Now that I have the list of tables, I can see that there is a table named "customers". I should query the schema of this table to see what columns are available.

Action: sql_db_schema
Action Input: customers
CREATE TABLE customers (
	id INTEGER, 
	customer_id TEXT NOT NULL, 
	name TEXT NOT NULL, 
	email TEXT NOT NULL, 
	interests TEXT NOT NULL, 
	last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP, 
	PRIMARY KEY (id), 
	UNIQUE (customer_id), 
	UNIQUE (email)
)

/*
3 rows from customers table:
id	customer_id	name	email	interests	last_updated
1	F8641860-7	Kevin	kevin.f8641860-7@gmail.

## Get the latest 3 news items and their summaries for the interests of kevin.f8641860-7@gmail.com

In [19]:
!export PYTHONWARNINGS="ignore"
!python main.py --job get-summary

Base Path: /content/drive/MyDrive/PGP_AI_ML_GREAT_LEARNING/11_Advance_GenAI_NLP/Final_Project/main
Enter your Email IDkevin.f8641860-7@gmail.com
Enter the number of URLs required3
 'kevin.f8641860-7@gmail.com' 
  result = self.agent.run(query)


> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input:
customers
That's a good start, but it seems like the table "users" is not in the database. The only table available is "customers".

Action: sql_db_schema
Action Input: customers
CREATE TABLE customers (
	id INTEGER, 
	customer_id TEXT NOT NULL, 
	name TEXT NOT NULL, 
	email TEXT NOT NULL, 
	interests TEXT NOT NULL, 
	last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP, 
	PRIMARY KEY (id), 
	UNIQUE (customer_id), 
	UNIQUE (email)
)

/*
3 rows from customers table:
id	customer_id	name	email	interests	last_updated
1	F8641860-7	Kevin	kevin.f8641860-7@gmail.com	["Politics", "Startups", "Travel"

## Get the latest 3 news items and their summaries for the interests of nora.c63fa237-e@gmail.com

In [20]:
!export PYTHONWARNINGS="ignore"
!python main.py --job get-summary

Base Path: /content/drive/MyDrive/PGP_AI_ML_GREAT_LEARNING/11_Advance_GenAI_NLP/Final_Project/main
Enter your Email IDnora.c63fa237-e@gmail.com
Enter the number of URLs required3
 'nora.c63fa237-e@gmail.com' 
  result = self.agent.run(query)


> Entering new SQL Agent Executor chain...
Action: sql_db_list_tables
Action Input:
customers
I have the list of tables in the database, which is "customers". Now, I should query the schema of this table to see what columns are available and to check if the column "interset" and "email" exist in this table.

Action: sql_db_schema
Action Input: customers
CREATE TABLE customers (
	id INTEGER, 
	customer_id TEXT NOT NULL, 
	name TEXT NOT NULL, 
	email TEXT NOT NULL, 
	interests TEXT NOT NULL, 
	last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP, 
	PRIMARY KEY (id), 
	UNIQUE (customer_id), 
	UNIQUE (email)
)

/*
3 rows from customers table:
id	customer_id	name	email	interests	la
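
One caveat on the summary step: `Get_News_Summary` passes only the URL to the LLM. Assuming the chat model has no browsing tool, it cannot read the page, so the summary may not reflect the actual article. A possible refinement is to fetch the page, reduce it to plain text, and put that text in the prompt. The sketch below covers only the offline part, using the standard library. `html_to_text` and `build_summary_prompt` are hypothetical helpers and the HTML sample is made up; the page itself could be downloaded with `urllib.request.urlopen`, which is left out so the example runs without network access.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the bodies of <script> and <style> tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html, max_chars=4000):
    """Strip markup and truncate so the prompt stays within a token budget."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)[:max_chars]


def build_summary_prompt(article_text, url, interest):
    """Build a prompt that contains the article text, not just its URL."""
    return (
        f"Summarise the article below in 4-5 sentences, focusing on key facts relevant to {interest}.\n"
        f"URL: {url}\n\nArticle text:\n{article_text}"
    )


# Made-up page used only to exercise the helpers.
page = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Title</h1><script>var x=1;</script><p>Body text.</p></body></html>"
)
text = html_to_text(page)
prompt = build_summary_prompt(text, "https://example.com/article", "Politics")
print(text)
```

The resulting `prompt` string could then be passed to `Load_LLM` in place of the URL-only query.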

# Conclusion

1. The application delivers real-time, personalized news based on each user's interests.

2. News is retrieved from established, reputable platforms such as BBC News, Reuters, and The Guardian, which improves the credibility of the results.

3. The application lets the caller limit the number of URLs retrieved, and it summarises the retrieved news.

4. To keep the news current, the search prompt itself restricts results to articles published from 2025 onwards.

5. The application currently calls llama-3.3-70b-versatile through a free-tier API key from groq.com, which limits the number of tokens and API calls. An enterprise subscription would raise these limits, and other LLMs could be used as well.

6. A front-end wrapper could also be built and hosted for centralized use.
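
Because the free tier limits the number of API calls, one common mitigation is to retry a failed call after an exponentially growing delay. The sketch below is a generic illustration, not part of the notebook: `call_with_backoff` and `flaky` are hypothetical names, and the sleep function is injectable so the example runs instantly. In real use the `except` clause should be narrowed to the client's rate-limit exception rather than catching every exception.

```python
import time


def call_with_backoff(func, *args, retries=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func; on failure wait base_delay * 2**attempt seconds and retry, up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error to the caller
            sleep(base_delay * (2 ** attempt))


# Hypothetical function that fails twice before succeeding, standing in for a rate-limited API.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


delays = []  # record the requested delays instead of actually sleeping
result = call_with_backoff(flaky, retries=3, base_delay=0.5, sleep=delays.append)
print(result, delays)
```

A call such as `call_with_backoff(news_finder.Load_LLM, query)` would be one way to apply it, where `news_finder` is a `NewsFinder` instance. Note that `Load_LLM` would then need to let exceptions propagate instead of catching them internally.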





