<a href="https://www.kaggle.com/code/rachitss/gemeni-newsbot?scriptVersionId=244077675" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

## 📰 Interactive, Reliable NewsBot in Kaggle Notebooks using Gemini

#### Problem Statement:
Finding up-to-date, trustworthy news summaries on specific topics is time-consuming and often requires visiting multiple sources. Users need a way to quickly access accurate headlines, summaries, and original references, all in a structured, machine-readable format—especially when working within data science environments like Kaggle.

#### Solution with GenAI:
By leveraging Google’s Gemini models and real-time search tools, this notebook enables users to interactively request news summaries, article digests, or trending headlines on any topic. The GenAI-powered NewsBot fetches the latest information from credible sources, presents it in a standardized JSON format, and even explains or evaluates the quality of its own responses. This approach streamlines research, supports fact-checking, and empowers data-driven decision-making—all within a secure, reproducible Kaggle workflow.


## GenAI capabilities demonstrated in the project:

- ✅ **Structured output / JSON mode / controlled generation**  
  News responses are generated and parsed in a strict JSON schema.

- ✅ **Few-shot prompting**  
  Multiple few-shot examples guide the model's output structure and style.

- ✅ **Function Calling**  
  Integrates and utilizes the Google Search tool as a callable function.

- ✅ **Agents**  
  Implements an agent with tool use, multi-turn chat, and session history.

- ✅ **Gen AI evaluation**  
  Uses LLM-based evaluation with explicit rubrics, rationales, and structured scores.

- ✅ **Grounding**  
  Fetches and cites real, up-to-date sources using the Google Search tool.


In [None]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

## 🛠️ Kaggle Environment Setup

## Resolving Conflicts and Installing Dependencies

In [None]:
# Remove conflicting packages from the Kaggle base environment.
!pip uninstall -q -y kfp jupyterlab jupyterlab-lsp

# Install the Gemini SDK (google-genai) and json-repair, the packages used in this notebook.
!pip install -q -U "google-genai==1.7.0" 'json-repair'

## Python Imports

In [None]:
from typing import Literal
from typing import Annotated
from typing_extensions import TypedDict
from google import genai
from google.genai import types
from IPython.display import HTML, Markdown, Image, display
import json
import json_repair
import re
import enum

In [None]:
# Define a retry policy. For a complex query the model may make several consecutive calls,
# so this makes the client retry automatically when it hits quota limits (429) or a 503.
from google.api_core import retry

is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

if not hasattr(genai.models.Models.generate_content, '__wrapped__'):
  genai.models.Models.generate_content = retry.Retry(
      predicate=is_retriable)(genai.models.Models.generate_content)
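
To make the behaviour of the retry wrapper concrete, here is a minimal, dependency-free sketch of the same idea: retry a call only when a predicate says the error is transient, waiting longer after each failure. The names `FakeAPIError`, `is_transient`, and `retry_call`, as well as the delay values, are illustrative assumptions. This is not how `google.api_core.retry` is implemented.

```python
import time


class FakeAPIError(Exception):
    """Hypothetical stand-in for an API error that carries an HTTP status code."""

    def __init__(self, code):
        super().__init__(f"API error {code}")
        self.code = code


def is_transient(exc):
    """Mirror the notebook's predicate: retry only on 429 (quota) and 503 (unavailable)."""
    return isinstance(exc, FakeAPIError) and exc.code in {429, 503}


def retry_call(func, predicate, max_attempts=4, initial_delay=0.01,
               multiplier=2.0, sleep=time.sleep):
    """Call func(); on a retriable error wait and try again, up to max_attempts."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            # Re-raise immediately for non-retriable errors or when attempts run out.
            if not predicate(exc) or attempt == max_attempts:
                raise
            sleep(delay)
            delay *= multiplier


# Usage: a function that fails twice with 429 and then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeAPIError(429)
    return "ok"


result = retry_call(flaky, is_transient, sleep=lambda _: None)
```

Errors with any other status code are raised on the first attempt, which keeps genuine request problems visible instead of hiding them behind retries.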

## 🔑 Securely Load Google API Key

In [None]:
import os
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY
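
Outside Kaggle the `kaggle_secrets` module cannot be imported, so the cell above fails. The sketch below shows one possible fallback that reads the key from an environment variable instead. The helper name `load_api_key` and its `environ` parameter are assumptions made for illustration.

```python
import os


def load_api_key(name="GOOGLE_API_KEY", environ=None):
    """Return the API key from Kaggle Secrets if available, else from the environment."""
    environ = os.environ if environ is None else environ
    try:
        # Only importable inside a Kaggle notebook session.
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret(name)
    except Exception:
        # Not on Kaggle, or the secret is not attached: fall back to the environment.
        key = environ.get(name)
        if not key:
            raise RuntimeError(f"{name} not found in Kaggle Secrets or environment variables")
        return key
```

Passing a plain dictionary as `environ` makes the fallback easy to exercise without touching the real environment.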

## 📝 NewsBot System Instructions and Welcome Message
The system instructions guide the model to answer only news-related queries with JSON-formatted, search-grounded results.

In [None]:
# System instructions

SYS_INT="""You are NewsBot, an interactive news assistant. A human will interact with you to get the latest news, daily headlines, and summaries of news articles. You will answer any questions strictly related to news topics, headlines, summaries, and source information—no off-topic discussion, but you may chat about news stories, their background, and context.

    If you think the user's request is served,  ask user to enter 'q', 'quit', or 'bye' to quit.

    Most important: ALWAYS PROVIDE THE REFERENCES SUPPLIED BY THE TOOL. NEVER PROVIDE FAKE REFERENCES. IF THERE ARE NO REFERENCES, DO NOT PROVIDE ANY.
    
    The user may request:
    
    Daily headlines on a specific topic or region.
    
    Summaries of online articles (by providing a link or headline).
    
    Other news-related queries, such as trending topics or news source reliability.
    
    For each request:
    
    Use the appropriate tool or API to search for and retrieve up-to-date, reliable news information.

    ALWAYS USE SEARCH_TOOL FOR PROVIDING NEWS, DO NOT USE YOUR OWN KNOWLEDGE, SEARCH_TOOL ONLY
    
    Make sure to provide real actual working links through google search tool only !!!, do not generate or hallucinate links

    Always prioritise new news posted on popular and reliable sources

    Whenever you provide news to the user, make sure to output it in the following JSON format:
    
    news_schema = {
    "type": "object",
    "properties": {
        "news": {
            "type": "array",
            "description": "A list of news items.",
            "items": {
                "type": "object",
                "properties": {
                    "headline": {
                        "type": "string",
                        "description": "The main headline or title of the news article."
                    },
                    "content": {
                        "type": "string",
                        "description": "The main body or summary of the news article."
                    },
                    "references": {
                        "type": "array",
                        "description": "A list of URLs or citations for the news source.",
                        "items": {
                            "type": "string",
                            "format": "uri"
                        }
                    }
                },
                "required": ["headline", "content", "references"]
            }
        }
    },
    "required": ["news"]
}

    provide about 4-5 lines of content for each headline.
    
    once you provide news in the json format, no need to provide it in free text, just continue the chat and ask user to enter 'q', 'quit', or 'bye' to quit.
    
    
    When summarizing articles, ensure your summary is detailed, factually accurate, and includes a citation to the original source, and suggest similar news stories for further reading.
    
    If the user asks for features not yet implemented (e.g., image or video analysis), inform them politely that the feature is not available yet.
    
    You may guide users on how to interact with you (e.g., 'Type a topic to get headlines,' or 'Paste a link to summarize an article'). If a request is unclear or too broad, ask clarifying questions to narrow down the topic.
    
    If you are unable to retrieve information (e.g., due to tool/API unavailability), break the fourth wall and let the user know the feature is not implemented yet and suggest they continue reading or try another request.
    
    Always maintain a helpful, concise, and factual tone, and thank the user at the end of each session.

    If you think the user's request is served, ask user to enter 'q', 'quit', or 'bye' to quit.
    """


# Welcome Message

WELCOME_MSG = """**Welcome to NewsBot**  \n Please tell me what topic you are interested in  \n *Hint: I can also Summarize an article! Just type summarize to know more*"""
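
The schema embedded in the system instructions is only a request to the model; nothing enforces it. A small standard-library check such as the sketch below can confirm that a parsed response has the required `news`, `headline`, `content`, and `references` fields before it is rendered. The function name `validate_news` is hypothetical and not part of the notebook.

```python
def validate_news(payload):
    """Return a list of problems found in a parsed NewsBot response (empty list = valid)."""
    problems = []
    if not isinstance(payload, dict) or not isinstance(payload.get("news"), list):
        return ["top-level object must contain a 'news' array"]
    for i, item in enumerate(payload["news"]):
        if not isinstance(item, dict):
            problems.append(f"news[{i}] is not an object")
            continue
        for field in ("headline", "content"):
            if not isinstance(item.get(field), str):
                problems.append(f"news[{i}].{field} must be a string")
        refs = item.get("references")
        if not isinstance(refs, list) or not all(isinstance(r, str) for r in refs):
            problems.append(f"news[{i}].references must be an array of strings")
    return problems


sample = {"news": [{"headline": "Example headline", "content": "Example summary.", "references": []}]}
print(validate_news(sample))  # []
```

Returning a list of problems rather than raising lets the caller decide whether to show a partial result or ask the model again.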

## 📝 Few-shot Prompt Examples for NewsBot
These are structured few-shot prompt examples demonstrating how NewsBot should respond to various user queries. Each example pairs a user prompt with a model response in JSON format, showing the expected structure for news headlines, summaries, and source references.

In [None]:
# Examples

example1=""" Prompt
AI in Healthcare


Response

```json
{
"news": [
{
"headline": "AI Detects Rare Diseases Earlier Than Ever Before",
"content": "A new AI-powered diagnostic tool is helping doctors identify rare diseases in patients months earlier than traditional methods. Clinical trials show the system can analyze patient data and medical images to flag potential issues, leading to faster treatment and improved outcomes.",
"references": [
]
},
{
"headline": "Hospitals Adopt AI to Streamline Patient Care",
"content": "Major hospital networks have begun deploying AI systems to optimize patient scheduling, predict staffing needs, and reduce wait times. Early reports indicate a 20% improvement in efficiency and higher patient satisfaction scores.",
"references": [
]
},
{
"headline": "AI-Driven Drug Discovery Accelerates New Treatments",
"content": "Pharmaceutical companies are increasingly using AI to analyze massive datasets and identify promising compounds for drug development. Recent breakthroughs include the discovery of a new antibiotic candidate in record time, now entering clinical trials.",
"references": [
]
}
]
}
```

Is there anything else I can help you with? Type 'q', 'quit', or 'bye' to quit.

"""

example2=""" Prompt
Space Exploration


Response

```json
{
"news": [
{
"headline": "NASA’s Artemis II Mission Set for 2025 Launch",
"content": "NASA has confirmed that Artemis II, the first crewed mission to orbit the Moon since Apollo, is scheduled for launch in 2025. The mission will test new spacecraft systems and pave the way for a lunar landing in the coming years.",
"references": [
]
},
{
"headline": "India’s Chandrayaan-4 to Explore Lunar South Pole",
"content": "The Indian Space Research Organisation (ISRO) announced Chandrayaan-4, a mission aiming to collect samples from the Moon’s south pole. This region is believed to contain water ice, which could be vital for future lunar bases.",
"references": [
]
},
{
"headline": "SpaceX Starship Completes First Full-Orbit Test Flight",
"content": "SpaceX’s Starship has successfully completed its first full-orbit test flight around Earth, demonstrating the vehicle’s potential for deep space missions. The company plans to use Starship for Mars colonization and commercial satellite launches.",
"references": [
]
},
{
"headline": "James Webb Telescope Discovers Earth-Like Exoplanet",
"content": "Astronomers using the James Webb Space Telescope have identified an exoplanet with Earth-like conditions in the habitable zone of a nearby star. The discovery raises hopes of finding signs of life beyond our solar system.",
"references": [
]
},
{
"headline": "International Space Station to Host Commercial Research Lab",
"content": "A new commercial research laboratory will be added to the International Space Station in late 2025. The facility will support pharmaceutical research, materials science, and technology demonstrations in microgravity.",
"references": [
]
}
]
}
```

Type 'q', 'quit', or 'bye' to quit.
"""

example3=""" Prompt
Advances in Renewable Energy


Response

```json
{
"news": [
{
"headline": "World’s Largest Offshore Wind Farm Begins Operation",
"content": "The North Sea now hosts the world’s largest offshore wind farm, generating enough electricity to power over 2 million homes. The project is a joint venture between several European energy companies and is expected to significantly reduce carbon emissions in the region.",
"references": [
]
},
{
"headline": "Breakthrough in Solar Panel Efficiency Announced",
"content": "Scientists have developed a new type of solar panel that converts over 30% of sunlight into electricity, setting a new world record. The innovation uses multi-junction cells and could accelerate the adoption of solar energy globally.",
"references": [
]
},
{
"headline": "Battery Storage Costs Fall, Enabling 24/7 Renewable Power",
"content": "A sharp decline in battery storage costs has made it feasible for utilities to provide round-the-clock renewable energy. Several regions have already reported days powered entirely by wind and solar, thanks to advanced grid-scale batteries.",
"references": [
]
},
{
"headline": "Green Hydrogen Production Scales Up Worldwide",
"content": "Major energy firms have launched large-scale green hydrogen plants in Australia, Germany, and the Middle East. Produced using renewable electricity, green hydrogen is expected to play a key role in decarbonizing heavy industry and transportation.",
"references": [
]
}
]
}
```

Is there anything else I can help you with? Type 'q', 'quit', or 'bye' to quit.
"""

In [None]:
# # Uncomment to delete an existing chat session before creating a new one
# del chat
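
Because the few-shot examples defined earlier are written by hand, a quick parse check can catch a missing bracket before the examples reach the model. The sketch below pulls the JSON object out of an example string and parses it. `extract_json_block` is a hypothetical helper and `sample_example` is a shortened stand-in, not one of the notebook's examples.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so this snippet contains no literal code fence


def extract_json_block(example_text):
    """Return the parsed JSON object embedded in a few-shot example string."""
    match = re.search(r"\{.*\}", example_text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in example")
    return json.loads(match.group(0))  # raises json.JSONDecodeError if malformed


sample_example = (
    "Prompt\nSpace Exploration\n\nResponse\n\n"
    + FENCE + "json\n"
    + '{"news": [{"headline": "Sample headline", "content": "Sample text.", "references": []}]}\n'
    + FENCE + "\n\nType 'q', 'quit', or 'bye' to quit.\n"
)

parsed = extract_json_block(sample_example)
print(len(parsed["news"]))  # 1
```

In the notebook, the same check could be run as `for ex in (example1, example2, example3): extract_json_block(ex)`.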

## 🔗 Configuring Gemini agent in Kaggle to answer news queries using real-time Google Search.

- **Registers the Google Search tool** so the agent can fetch up-to-date information.
- **Combines system instructions and few-shot examples** to guide the model’s behavior and output format.
- **Creates a chat session** with the Gemini 2.5 Flash preview model (`gemini-2.5-flash-preview-04-17`), enabling interactive, grounded, and structured news responses.

In [None]:
# Client config

# Register the Google Search tool
search_tool = types.Tool(google_search=types.GoogleSearch())

# Create the client and a config that combines the search tool, system instructions, and few-shot examples
client = genai.Client(api_key=GOOGLE_API_KEY)
sys_config = types.GenerateContentConfig(
    tools=[search_tool], system_instruction=SYS_INT + example1 + example2 + example3
)

chat = client.chats.create(model='gemini-2.5-flash-preview-04-17', config=sys_config, history=[])
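
With the Google Search tool registered, a response can carry grounding metadata alongside its text. The sketch below collects the source titles and URIs from such a response. It assumes the attribute layout used by the `google-genai` response types (`candidates[*].grounding_metadata.grounding_chunks[*].web` with `title` and `uri`), and reads every level defensively so it returns an empty list when the metadata is absent. `grounded_sources` is a hypothetical helper and `fake_response` is a hand-built stand-in.

```python
from types import SimpleNamespace


def grounded_sources(response):
    """Collect (title, uri) pairs from a response's grounding metadata, if any."""
    sources = []
    for candidate in getattr(response, "candidates", None) or []:
        metadata = getattr(candidate, "grounding_metadata", None)
        for chunk in getattr(metadata, "grounding_chunks", None) or []:
            web = getattr(chunk, "web", None)
            uri = getattr(web, "uri", None)
            if uri:
                sources.append((getattr(web, "title", None), uri))
    return sources


# A hand-built stand-in for a response object, used only to exercise the function.
fake_response = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            grounding_metadata=SimpleNamespace(
                grounding_chunks=[
                    SimpleNamespace(web=SimpleNamespace(title="example.com", uri="https://example.com/a")),
                    SimpleNamespace(web=None),
                ]
            )
        )
    ]
)

print(grounded_sources(fake_response))  # [('example.com', 'https://example.com/a')]
```

Comparing these sources with the `references` the model writes into its JSON is one way to spot links that did not come from the search tool.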

## 💬 Interactive NewsBot Chat Loop

This cell starts an interactive chat loop with NewsBot:
- **Displays a welcome message** and waits for user input.
- **Processes each user query** using Gemini, returning up-to-date news in a structured, readable Markdown format.
- **Automatically parses and displays news in sections** with headlines, summaries, and references.
- **Handles both structured (JSON) and unstructured responses**, ensuring clear output for every query.

<div class="alert alert-block alert-danger">
<b>Warning:</b>  
This cell contains an interactive chat loop that requires human input.  
To use it, <b>fully uncomment the code (Ctrl+/)</b>.  
It is commented by default to prevent errors when running all cells in Kaggle Notebooks.
</div>


In [None]:
# # Print welcome message
# display(Markdown(WELCOME_MSG))

# # Initialize user_input and prompt
# user_input=0
# prompt=0

# # Start the chat loop
# while True:
#     prompt = user_input
#     user_input = input("You: ").strip()
#     if user_input.lower() in ['q', 'quit', 'bye']:
#         print("bye!")
#         break
#     response = chat.send_message(user_input)

#     # Print JSON File if any
    
#     # response.text holds the model output as a string; it can be None when no text part is returned
#     raw_output = response.text or ""

#     # Check if Json Struct is present
#     json_match = re.search(r'\{.*\}', raw_output, re.DOTALL)
#     if json_match:
#         # Parse only the matched JSON object; the full reply also contains code fences and chat text
#         json_str = json_match.group(0)
#         try:
#             news_data = json.loads(json_str)
#         except json.JSONDecodeError:
#             # Fall back to json-repair (https://pypi.org/project/json-repair/) for malformed output
#             news_data = json_repair.loads(json_str)
        
#         list_md = "### 📰 **News Highlights**\n\n"
        
#         for idx, item in enumerate(news_data["news"], 1):
#             # No leading spaces, just a newline for the references
#             refs = "".join(f"\n- [Reference {i+1}]({ref})" for i, ref in enumerate(item["references"]))
#             list_md += (
#                 f"---\n"
#                 f"**{idx}. {item['headline']}**\n\n"
#                 f"> *{item['content']}*\n\n"
#                 f"**References:**{refs}\n\n"
#             )
        
#         display(Markdown(list_md))
    
#         # Remove the JSON block and leftover code-fence markers, keeping any surrounding chat text
#         json_str = json_match.group(0)
#         remaining_text = raw_output.replace(json_str, '')
#         remaining_text = re.sub(r"'?`{3}(?:json)?", '', remaining_text).strip()
        
#         # Display the remaining text if any
#         if remaining_text:
#             display(Markdown(remaining_text))

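#         # Keep the query that produced this response so the evaluation cell scores a matching pair
#         eval_prompt = user_input
#         eval_response = list_md + remaining_text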

#     else: display(Markdown(raw_output))
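
The rendering step of the loop can also be pulled out into a pure function, which makes it testable without a live chat session. The sketch below mirrors the Markdown layout used in the commented loop. `render_news_markdown` is a hypothetical name, and the "none provided" text for empty reference lists is an addition made here for readability, not behaviour of the original loop.

```python
def render_news_markdown(news_data):
    """Build the Markdown shown for a parsed NewsBot response."""
    parts = ["### 📰 **News Highlights**\n\n"]
    for idx, item in enumerate(news_data.get("news", []), 1):
        # One bullet per reference, numbered from 1.
        refs = "".join(
            f"\n- [Reference {i}]({ref})"
            for i, ref in enumerate(item.get("references", []), 1)
        )
        parts.append(
            f"---\n"
            f"**{idx}. {item.get('headline', '')}**\n\n"
            f"> *{item.get('content', '')}*\n\n"
            f"**References:**{refs or ' none provided'}\n\n"
        )
    return "".join(parts)
```

Inside the loop, the result would be passed to `display(Markdown(...))` exactly as `list_md` is.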

## 📝 Evaluation Prompt for AI Response Assessment

This cell defines a comprehensive, criteria-based evaluation prompt.  
It guides the evaluator to rate AI-generated responses using a structured rubric, ensuring consistent and objective assessment of relevance, accuracy, completeness, and clarity.

In [None]:
EVAL_PROMPT="""

# Instruction

You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models.
We will provide you with the user input and an AI-generated response.
You should first read the user input carefully for analyzing the task, and then evaluate the quality of the responses based on the Criteria provided in the Evaluation section below.
You will assign the response a rating following the Rating Rubric and Evaluation Steps. Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.


## Criteria

1. Relevance

Does the output directly address the prompt?
Are the news items or summaries on-topic and responsive to the user’s request?

2. Factual Accuracy / Groundedness

Are the statements factually correct and verifiable?
Are references present, real, and do they support the claims made?
Is the information up-to-date and grounded in credible sources?

3. Completeness / Coverage

Does the response cover all major aspects of the prompt?
For news: Are multiple facets or subtopics included, or is the answer too narrow?

4. Originality / Non-Redundancy

Are the news items or points unique, or are there duplicates/repetitions?
Are references unique for each news item, or are they reused inappropriately?

5. Clarity and Fluency

Is the output well-written, clear, and easy to understand?
Is the language appropriate for the intended audience?

6. Structure and Formatting

Is the output in the required structured format (e.g., JSON, bullet points)?
Are all required fields present (headline, content, references)?
Is the output parsable and consistent with the prompt’s instructions?

7. Faithfulness to Source

Does the summary or news accurately reflect the source material (if summarizing)?
Are there hallucinations or distortions?

8. Helpfulness / User Value

Does the output provide actionable, useful, or insightful information for the user?
Does it answer follow-up questions or guide the user appropriately?


## Rating Rubric

5: (Very good)
4: (Good)
3: (Ok)
2: (Bad)
1: (Very bad)

## Evaluation Steps

STEP 1: Assess the response based on criteria.
STEP 2: Score based on the rubric from 1-5.


# User Inputs and AI-generated Response
## User Inputs

### Prompt
{prompt}

## AI-generated Response
{response}
"""

## 🔍 LLM-Based Evaluation with Structured Scoring

This function uses a Gemini chat session to evaluate AI-generated summaries against a given prompt.  
It returns both a detailed, step-by-step rationale and a structured rating (from 1 to 5) using a custom enum.

In [None]:
# Define a structured enum class to capture the result.
class SummaryRating(enum.Enum):
  VERY_GOOD = '5'
  GOOD = '4'
  OK = '3'
  BAD = '2'
  VERY_BAD = '1'


def evaluate(prompt, ai_response):
  """Evaluate the generated summary against the prompt used."""

  evl = client.chats.create(model='gemini-2.0-flash')

  # Generate the full text response.
  response = evl.send_message(
      message=EVAL_PROMPT.format(prompt=prompt, response=ai_response)
  )
  verbose_eval = response.text

  # Coerce into the desired structure.
  structured_output_config = types.GenerateContentConfig(
      response_mime_type="text/x.enum",
      response_schema=SummaryRating,
  )
  response = evl.send_message(
      message="Convert the final score.",
      config=structured_output_config,
  )
  structured_eval = response.parsed

  return verbose_eval, structured_eval
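
If the structured rating ever comes back empty, or the evaluator is swapped for one that returns plain text, the score can still be recovered from the written evaluation. The sketch below maps free text to the rating enum. It redefines `SummaryRating` only to stay self-contained, and `to_rating` is a hypothetical helper, not part of the SDK.

```python
import enum
import re


class SummaryRating(enum.Enum):
    VERY_GOOD = '5'
    GOOD = '4'
    OK = '3'
    BAD = '2'
    VERY_BAD = '1'


def to_rating(text):
    """Map free text such as 'Final score: 4' to a SummaryRating, or None if no 1-5 digit is found."""
    if not text:
        return None
    # Take the first standalone digit between 1 and 5.
    match = re.search(r"\b([1-5])\b", text)
    return SummaryRating(match.group(1)) if match else None


print(to_rating("Final score: 4"))  # SummaryRating.GOOD
```

Because the first standalone digit wins, this works best on a short final line such as the reply to "Convert the final score." rather than on the full step-by-step rationale.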

## Calling Evaluation Function

<div class="alert alert-block alert-danger">
<b>Warning:</b>  
These cells depend on the interactive chat loop, which requires human input.  
To use them, <b>fully uncomment the code (Ctrl+/) and run it after the chat loop</b>.  
It is commented by default to prevent errors when running all cells in Kaggle Notebooks.
</div>


In [None]:
# text_eval, struct_eval = evaluate(prompt=eval_prompt, ai_response=eval_response)

## Displaying the Evaluation Results

In [None]:
# display(Markdown(text_eval))
# print('\n',struct_eval)