<a href="https://colab.research.google.com/github/kekubhai/Build-/blob/main/Copy_of_genai_workshop_(AI_Newsroom).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# News Summarization and Citation Pipeline

This notebook demonstrates a multi-agent workflow that summarizes news articles, generates citations, and redacts uncited content. It uses `crewai` for agent orchestration and Google `Gemini` as the LLM. The workflow includes:

- Fetching news articles using NewsAPI
- Summarizing news content
- Generating citations for each factual statement
- Redacting any uncited content for information integrity
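
The stages above form a simple chain in which each stage consumes the previous stage's output. The following is a minimal sketch of that data flow, assuming each stage can be treated as a plain function; `run_pipeline` and the `stub_*` functions are hypothetical names used only for illustration and stand in for the LLM-backed agents defined later in the notebook.

```python
from typing import Callable


def run_pipeline(
    articles: str,
    summarize: Callable[[str], str],
    cite: Callable[[str, str], str],
    redact: Callable[[str], str],
) -> str:
    """Chain the three stages: summarize, then cite, then redact."""
    summary = summarize(articles)
    # The citation stage needs both the summary and the original articles.
    cited = cite(summary, articles)
    return redact(cited)


# Hypothetical stubs that only record the order in which stages ran.
def stub_summarize(articles: str) -> str:
    return f"summary({articles})"


def stub_cite(summary: str, articles: str) -> str:
    return f"cited({summary})"


def stub_redact(cited: str) -> str:
    return f"redacted({cited})"


result = run_pipeline("articles", stub_summarize, stub_cite, stub_redact)
print(result)  # redacted(cited(summary(articles)))
```

Passing the stages in as callables keeps the wiring testable without any API calls.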

## Environment Setup and API Keys

Install required packages and set up API keys for NewsAPI and Google Gemini.

In [1]:
!pip install -q "crewai>=0.165.1" "google-generativeai==0.8.5" "newsapi-python>=0.2.7" "pyparsing>=3.2.3"

In [2]:
import os
from getpass import getpass

# Never commit real keys to a notebook; prompt for them at runtime instead.
for key in ("GEMINI_API_KEY", "NEWS_API_KEY"):
    if not os.environ.get(key):
        os.environ[key] = getpass(f"Enter {key}: ")

In [None]:
from newsapi import NewsApiClient


def denewline(line: str) -> str:
    """Remove newlines from a string."""
    return line.replace("\n", "")


def search_news(query):
    """Search for news articles related to the given query."""
    # Initialize the News API client
    api = NewsApiClient(api_key=os.environ["NEWS_API_KEY"])

    # Search for news articles
    results = api.get_everything(q=query)

    # Process the results
    return "\n".join(
        [
            f'- {denewline(article["title"])}: {denewline(article["description"])} ({article["url"]})'
            for article in results["articles"]
            if article["title"]
            and article["description"]
            and article["title"] != "[Removed]"
        ][:10]
    )
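
The filtering and formatting step in `search_news` can be tried without a network call. The sketch below applies the same rules (skip articles with no title or description, skip `[Removed]` entries, keep at most ten) to a hand-written list. `format_articles` and the sample data are hypothetical and exist only for this illustration.

```python
def format_articles(articles: list[dict], limit: int = 10) -> str:
    """Format article dicts as a bulleted list, mirroring search_news."""
    lines = []
    for article in articles:
        title = article.get("title")
        description = article.get("description")
        # Skip incomplete entries and NewsAPI's "[Removed]" placeholders.
        if not title or not description or title == "[Removed]":
            continue
        title = title.replace("\n", "")
        description = description.replace("\n", "")
        lines.append(f"- {title}: {description} ({article.get('url')})")
    return "\n".join(lines[:limit])


# Made-up sample data shaped like the "articles" list in a NewsAPI response.
sample = [
    {"title": "Phone launch\n", "description": "A new phone.", "url": "https://example.com/a"},
    {"title": "[Removed]", "description": "gone", "url": "https://example.com/b"},
    {"title": "No description", "description": None, "url": "https://example.com/c"},
]

formatted = format_articles(sample)
print(formatted)  # - Phone launch: A new phone. (https://example.com/a)
```

Only the first sample entry survives the filter. The other two are dropped for the same reasons `search_news` would drop them.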

## Fetching and Summarizing News Articles

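The `search_news` helper above queries NewsAPI and returns up to ten articles as a bulleted list of title, description, and URL. The next cell sets up the Gemini LLM that the agents use to summarize them.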

In [None]:
from datetime import datetime
import logging
from textwrap import dedent

from crewai import Agent, Crew, Task, LLM


# Initialize the LLM
llm = LLM(
    model="gemini/gemini-2.5-flash",
    api_key=os.environ["GEMINI_API_KEY"],
    temperature=0.0,
)

## Multi-Agent Workflow Setup

The `main` function below defines three `crewai` agents (a summarizer, a citation manager, and a redactor), one task for each, and runs them in sequence on the Gemini LLM initialized above.

In [None]:
def main():
    """Main entry point for the application."""
    # Define the news summarizer agent
    news_summarizer = Agent(
        role="News Summarizer",
        goal=(
            dedent(
                """\
                Summarize current news articles and headlines into clear,
                concise paragraphs that provide an accurate and
                comprehensive overview of events. The summaries should
                be objective, well-structured, and highlight the most
                important information from multiple sources on a given
                topic."""
            )
        ),
        backstory=(
            dedent(
                """\
                Designed to process large volumes of news data, this
                agent was built in response to the growing demand for
                quick, reliable news summaries. Drawing from a wide
                range of reputable news sources, the agent uses its
                expertise in language processing and summarization to
                distill key facts and narratives from headlines and
                articles. It has a deep understanding of journalistic
                integrity and strives to present information that is
                balanced, accurate, and easy to read. Equipped with
                advanced natural language understanding, the agent
                ensures that every summary is both informative and
                engaging, keeping readers well-informed about current
                events in a fraction of the time."""
            )
        ),
        llm=llm,
    )

    # Define the citation generator agent
    citation_generator = Agent(
        role="Citation Manager",
        goal=(
            dedent(
                """\
                Generate accurate and properly formatted citations for
                news summaries, linking specific sentences to their
                corresponding source articles. The citations should
                follow a standardized format and include all necessary
                information, such as article title, author, publication
                date, and source URL, ensuring the summary is properly
                referenced."""
            )
        ),
        backstory=(
            dedent(
                """\
                Developed in response to the need for reliable and
                transparent sourcing of information, this agent is
                adept at tracking news articles used in a summary and
                generating corresponding citations. Drawing from
                practices in academic writing and journalism, the
                agent ensures that each summary is accompanied by
                clear and accurate references. With expertise in
                citation formats (APA, MLA, Chicago, etc.) and a deep
                understanding of web sources, the agent guarantees
                that readers can easily trace the information back to
                its original source, fostering trust and credibility
                in the summarized content."""
            )
        ),
        llm=llm,
    )

    # Define the content redactor agent
    redactor = Agent(
        role="Content Redactor",
        goal=(
            dedent(
                """\
                Redact any sentences in the summary that lack citations,
                ensuring that only well-sourced information remains.
                The agent should scan the summary for citation links,
                verify their presence, and remove any uncited content
                while maintaining the overall coherence and readability
                of the summary."""
            )
        ),
        backstory=(
            dedent(
                """\
                Created to uphold the highest standards of information
                integrity, this agent ensures that only verifiable and
                properly cited content is retained in the final
                summaries. By meticulously reviewing each sentence and
                cross-referencing it with the generated citations, the
                redactor agent removes any statements that lack clear
                sourcing. With its focus on transparency and accuracy,
                the agent contributes to creating a trustworthy and
                responsible final product, where readers can confidently
                rely on the presented information."""
            )
        ),
        llm=llm,
    )

    # Describe the news summarizer task
    summarize_news = Task(
        description=(
            dedent(
                """\
                Summarize the key points from the provided news articles
                below. The summary should be concise and combine relevant
                information from all the articles, highlighting the most
                important details. Ensure that the tone is neutral, and the
                summary is well-structured, presenting the most significant
                facts clearly. Focus on delivering a few paragraphs that
                provide an accurate and informative overview of the topic.

                The articles to be summarized are included at the end
                of this prompt.  Please ensure that the summary
                reflects the facts and important themes from all
                articles without introducing personal opinions.

                --- START OF ARTICLES ---
                {articles}
                --- END OF ARTICLES ---
                """
            )
        ),
        expected_output=(
            dedent(
                """\
                A summary of a few paragraphs that captures the main points
                and facts from the provided articles. The summary should be
                clear, concise, and well-organized, combining overlapping
                information and presenting a cohesive narrative. It should
                remain neutral and avoid editorializing, focusing solely on
                the factual content from the articles.
                """
            )
        ),
        agent=news_summarizer,
    )

    # Describe the citation generator task
    generate_citations = Task(
        description=(
            dedent(
                """\
                Add citations to the provided summary by linking
                relevant sentences to their corresponding news
                articles. For each factual statement in the summary,
                identify the source article and append a properly
                formatted citation. The citation should include key
                information such as article title, author, publication
                date, and source URL, and the provided access date.

                Ensure that each citation is linked to the correct
                portion of the summary and that the citations are
                placed unobtrusively, maintaining readability while
                making it clear which information comes from which
                article. The access date for the citations is provided
                below.

                The summary to be annotated, the articles to be
                referenced, and the access date are provided below.

                --- START OF SUMMARY ---
                {summary}
                --- END OF SUMMARY ---

                --- START OF ARTICLES ---
                {articles}
                --- END OF ARTICLES ---

                --- ACCESS DATE ---
                {access_date}
                """
            )
        ),
        expected_output=(
            dedent(
                """\
                A summary annotated with citations for each relevant
                factual statement. Each citation should be formatted
                with the article title, author, publication date,
                source URL, and the provided access date.  The
                citations should be linked to the correct portion of
                the summary, making it easy for the reader to trace
                statements back to their original sources while
                maintaining clarity and readability.
                """
            )
        ),
        agent=citation_generator,
    )

    # Describe the content redaction task
    redact_uncited_content = Task(
        description=(
            dedent(
                """\
                Review the provided summary and remove any sentences
                or sections that do not have corresponding
                citations. Ensure that the redacted version retains
                coherence and clarity, only eliminating content that
                lacks a citation reference.  The final output should
                be a concise, well-structured summary containing only
                the well-cited information.

                The summary with citations is provided below.

                --- START OF SUMMARY WITH CITATIONS ---
                {summary_with_citations}
                --- END OF SUMMARY WITH CITATIONS ---
                """
            )
        ),
        expected_output=(
            dedent(
                """\
                A redacted version of the summary that contains only the
                sentences or sections with citations. The redacted summary
                should be clear, concise, and maintain logical flow. Any
                uncited content should be removed, ensuring that only
                verifiable, properly sourced information remains.
                """
            )
        ),
        agent=redactor,
    )

    # Start news search
    news = search_news("Google Pixel 10 Release")

    # Expose the Gemini key under the name the Google embedder (used for
    # crew memory) looks for, then initialize the crews.
    os.environ["CHROMA_GOOGLE_GENAI_API_KEY"] = os.environ["GEMINI_API_KEY"]

    summary = Crew(
        agents=[news_summarizer],
        tasks=[summarize_news],
        verbose=True,
        memory=True,
        embedder={
            "provider": "google",
            "config": {
                "model": "text-embedding-004",
                "task_type": "retrieval_document",
                "title": "Embeddings for Embedchain",
            },
        },
    ).kickoff(inputs={"articles": news})

    logging.info(summary)

    citations = Crew(
        agents=[citation_generator],
        tasks=[generate_citations],
        verbose=True,
        memory=True,
        embedder={
            "provider": "google",
            "config": {
                "model": "text-embedding-004",
                "task_type": "retrieval_document",
                "title": "Embeddings for Embedchain",
            },
        },
    ).kickoff(
        inputs={
            "summary": summary.raw,
            "articles": news,
            "access_date": datetime.today().date().isoformat(),
        }
    )

    logging.info(citations)

    redactions = Crew(
        agents=[redactor],
        tasks=[redact_uncited_content],
        verbose=True,
        memory=True,
        embedder={
            "provider": "google",
            "config": {
                "model": "text-embedding-004",
                "task_type": "retrieval_document",
                "title": "Embeddings for Embedchain",
            },
        },
    ).kickoff(inputs={"summary_with_citations": citations.raw})

    logging.info(redactions)
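
The task descriptions in `main` contain placeholders such as `{articles}` and `{summary}`, which are filled from the `inputs` dictionary passed to `kickoff`. The sketch below shows the effect with plain `str.format` as a stand-in. It is not how `crewai` implements interpolation, and the template and article text are made up for illustration.

```python
from textwrap import dedent

# dedent() strips the common leading indentation, so the prompt can be
# indented with the surrounding code without sending that whitespace.
template = dedent(
    """\
    Summarize the articles below.

    --- START OF ARTICLES ---
    {articles}
    --- END OF ARTICLES ---
    """
)

# Fill the placeholder after dedenting, so multi-line article text cannot
# change which indentation dedent() removes.
prompt = template.format(articles="- Title: description (https://example.com)")
print(prompt)
```

Delimiters such as `--- START OF ARTICLES ---` make it easier for the model to tell the instructions apart from the inserted content.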

## Orchestrating the Workflow and Output

Run the main function to execute the workflow: fetch news, summarize, generate citations, and redact uncited content. The results are logged for review.

In [None]:
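# Run the application
if __name__ == "__main__":
    # Without this, logging.info() output is hidden at the default WARNING level.
    logging.basicConfig(level=logging.INFO, force=True)
    main()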

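
In this notebook the redaction step is carried out by an LLM agent. For comparison, the sketch below is a rule-based stand-in, not what the redactor agent does. It assumes citations appear as bracketed numbers such as `[1]` placed before the sentence's final punctuation, which is an assumption about the output format rather than something the pipeline guarantees. `redact_uncited` and the sample text are hypothetical.

```python
import re

# Assumed citation style: a bracketed number such as [1] or [12].
CITATION = re.compile(r"\[\d+\]")


def redact_uncited(text: str) -> str:
    """Keep only the sentences that contain a citation marker."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(s for s in sentences if CITATION.search(s))


sample_text = (
    "Sentence one has a marker [1]. "
    "Sentence two has none. "
    "Sentence three has a marker [2]."
)

redacted = redact_uncited(sample_text)
print(redacted)  # Sentence one has a marker [1]. Sentence three has a marker [2].
```

A check like this could serve as a sanity test on the agent's output, since it confirms that every remaining sentence carries a marker. It does not confirm that the cited source supports the sentence.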

## Conclusion

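This notebook chains three single-agent crews to fetch news articles, summarize them, attach citations, and remove any sentence left without one. The redaction step checks only that a citation is present, so the final summary should still be reviewed against the linked sources.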