<a href="https://www.kaggle.com/code/electronicsapience/capstone-electronicsapience?scriptVersionId=235079261" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Capstone Project: Research Pipeline: AI-Driven Automated Content Generation

### 5-Day Gen AI Intensive Course Capstone 2025Q1
Author: Sudipto Banerjee
email: electronic.sapience@gmail.com

## [Audio Link of project description](https://notebooklm.google.com/notebook/525888ef-d8aa-40e2-a5fb-5abb8ffa6cc6/audio)


## [YouTube Video](https://youtu.be/5pXqQMwQG6Y)

In [1]:
from IPython.display import HTML

video_id = "5pXqQMwQG6Y"

# Embed using HTML
HTML(f"""
    <iframe width="560" height="315" src="https://www.youtube.com/embed/{video_id}" 
    frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; 
    gyroscope; picture-in-picture" allowfullscreen></iframe>
""")

### In this notebook, a set of agents works in sequence to research a topic given by the user.
*The system consists of four LLM agents*:
- *headline_generator_agent*: Creates a headline skeleton for the search topic
- *paragraph_expander_agent*: Adds context to each of the topics
- *formatter_agent*: Formats the markdown result
- *filesave_agent*: Calls a locally defined function as a tool to save a local copy of the markdown document.

All of the agents above use 'gemini-2.0-flash' here. The system should be model-agnostic, however: any suitable model with a valid API key can be used.

- *To keep things simple, a sequential orchestration pipeline is used, which executes the agents in the order listed above.*
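
The idea of a sequential pipeline can be sketched in plain Python. This is only an illustration under simplifying assumptions, not how ADK implements it: `run_sequential` and the toy steps are hypothetical, and each "agent" is just a function that reads a shared state dictionary and stores its result under an output key (the key names mirror those used later in this notebook).

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an agent: a function that reads the shared state
# and returns the text to be stored under its output key.
Step = Callable[[Dict[str, str]], str]


def run_sequential(steps: List[Tuple[str, Step]], query: str) -> Dict[str, str]:
    """Run each step in order, saving its result in a shared state dict."""
    state: Dict[str, str] = {"query": query}
    for output_key, step in steps:
        state[output_key] = step(state)
    return state


# Toy steps that mimic the roles of the first three agents.
steps = [
    ("topic_search_details", lambda s: f"# {s['query']}\n- Overview"),
    ("refined_paragraphs", lambda s: s["topic_search_details"] + "\n\nExpanded text."),
    ("formatted_document", lambda s: s["refined_paragraphs"].strip()),
]

state = run_sequential(steps, "Geological time")
print(state["formatted_document"])
```

Each step sees everything written by the steps before it, which is why the order of the agents matters.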


In [2]:
#install necessary packages
!pip uninstall -qqy jupyterlab  # Remove unused conflicting packages
!python -m pip install -q --upgrade pip
!pip install -U -q google-genai
!pip install -U -q google-adk


### Importing the API secret key

In [3]:
from kaggle_secrets import UserSecretsClient

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
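
The cell above only works inside a Kaggle kernel. A minimal sketch of a fallback for running elsewhere, assuming the key has been exported as an environment variable; `load_api_key` and its `env` parameter are hypothetical helpers, not part of `kaggle_secrets`.

```python
import os
from typing import Mapping, Optional


def load_api_key(name: str = "GOOGLE_API_KEY",
                 env: Optional[Mapping[str, str]] = None) -> str:
    """Return the secret from Kaggle if available, else from the environment."""
    try:
        # kaggle_secrets is only importable inside a Kaggle kernel.
        from kaggle_secrets import UserSecretsClient
        return UserSecretsClient().get_secret(name)
    except Exception:  # ImportError off Kaggle, or a failed secret lookup
        # Assumption: outside Kaggle the key lives in an environment variable.
        env = os.environ if env is None else env
        if name not in env:
            raise KeyError(f"{name} is not set; export it before running the notebook")
        return env[name]
```

Passing `env` explicitly makes the function easy to try without touching the real environment, for example `load_api_key("DEMO_KEY", env={"DEMO_KEY": "abc"})`.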


### Necessary imports

In [5]:
from google.adk.agents.sequential_agent import SequentialAgent
#from google.adk.agents.llm_agent import LlmAgent
from google.adk.agents import Agent
from google.genai import types
from google.genai.types import GenerateContentConfig
from google.adk.sessions import InMemorySessionService
from google.adk.runners import Runner
from google.adk.tools import google_search, function_tool
from IPython.display import display, Markdown
from datetime import datetime
import os
import asyncio
# Configuring ADK to use API keys directly (not Vertex AI for this multi-model setup)
os.environ["GOOGLE_API_KEY"]= GOOGLE_API_KEY
os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "False"

### Necessary constants and settings

In [6]:
APP_NAME = "capstone_research_pipeline_agens_app"
USER_ID = "electro-sb"
SESSION_ID = "pipeline_session_02"
GEMINI_MODEL_FLASH = "gemini-2.0-flash"
# Different temperature configurations
conservative_config = GenerateContentConfig(temperature= 0.0)
creative_config = GenerateContentConfig(temperature= 0.7)
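
To show what the two temperature settings mean in principle, here is an illustrative sketch of temperature-scaled sampling probabilities. It is a textbook softmax, not a description of how Gemini samples internally; the logits are made up, and treating temperature 0 as a greedy pick is an assumption of this sketch.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities; temperature 0 is treated as greedy (argmax)."""
    if temperature <= 0.0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
print(softmax_with_temperature(logits, 0.0))  # all probability on the top token
print(softmax_with_temperature(logits, 0.7))  # some probability left for the others
```

A lower temperature concentrates probability on the top-scoring token, which suits the research and file-saving steps; a higher one leaves more room for variation.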

### The Heading Generator Agent

This agent researches the user query and creates headings (in Markdown format), which the later agents expand.

In [7]:
headline_generator_agent = Agent(
    name= 'HeadlineGeneratorAgent',
    model= GEMINI_MODEL_FLASH,
    instruction="""
    You are an expert researcher
    You provide clear and concise sub topics on the subject of the user query
    You may add very closely related topics that are not explicitly mentioned in the query
    The response should be in the form of bullets on the major topics and the sub topics
    The output information should be in clear markdown format
    the markdown should contain
    the topic that you have searched
    the descriptions in bullet points in headings and sub headings
    please ensure that there is absolutely 'NO PREAMBLE'
    only write out the content and nothing else.
    """,
    description = "searches topics using google_search",
    output_key = "topic_search_details",
    generate_content_config= conservative_config,
    tools = [google_search],
)

### The Paragraph Expander Agent

This agent expands on the headings found by the previous agent

In [8]:
paragraph_expander_agent = Agent(
    name = 'ParagraphExpanderAgent',
    model = GEMINI_MODEL_FLASH,
    instruction="""
    You are an expert writer
    You expand the description under each topic listed under the key 'topic_search_details'
    keep all the bullet points in 'topic_search_details' intact, expand on them as needed.
    You should write a paragraph for each topic or sub topic under a broad topic.
    The paragraphs should be in a professional tone, free of grammatical errors.
    The paragraphs should be written in English.
    The paragraphs should be within 1000 words each.
    The output should be in markdown format
    please ensure that there is absolutely 'NO PREAMBLE'
    only write out the content and nothing else.
    """,
    description = 'expands topics to paragraph using google_search',
    output_key = "refined_paragraphs",
    generate_content_config= conservative_config,
    tools= [google_search],
)
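
The instruction above refers to the previous agent's output by its state key. A minimal sketch of how a value stored under such a key could be substituted into a prompt, in plain Python. `fill_instruction` and the toy `state` are hypothetical, and this is not ADK's own mechanism; check the ADK documentation for the placeholder syntax it supports in instructions.

```python
import re
from typing import Dict


def fill_instruction(template: str, state: Dict[str, str]) -> str:
    """Replace each {key} placeholder with state[key]; unknown keys are left as-is."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return state.get(key, match.group(0))
    return re.sub(r"\{(\w+)\}", substitute, template)


state = {"topic_search_details": "- Eons\n- Eras\n- Periods"}
template = "Expand every bullet in the outline below.\n{topic_search_details}"
print(fill_instruction(template, state))
```

Leaving unknown placeholders untouched makes a missing key visible in the final prompt instead of failing silently.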

### The Formatting Agent
This agent formats the output of the previous agent properly

In [9]:
formatter_agent= Agent(
    name = 'FormatterAgent',
    model = GEMINI_MODEL_FLASH,
    instruction="""
    You are an expert and professional formatter
    you take the contents of 'refined_paragraphs' and format it properly
    your output should be a well structured markdown document.
    only write out the content and nothing else.
    remove anything that may look like a preamble.
    please ensure that no preamble is added.
    """,
    description = 'formats a markdown document ',
    output_key = "formatted_document",
    generate_content_config= creative_config,
    
)

#### Tool for saving a file

This is a simple file-saving tool that is called by the file save agent.

In [10]:
def save_file(data: str) -> str:
    """
    Creates a markdown file from the given string data in the present directory.

    Args:
        data (str): The string data in markdown format.

    Returns:
        str: Filename created, or None if an error occurs.
    """
    print('File Creation tool called')
    name = 'agent_output'
    filepath = './'
    timestamp = datetime.now().strftime('%Y%m%d%H%M%S')
    filename = os.path.join(filepath, f'{name}_{timestamp}.md')

    try:
        with open(filename, 'w', encoding='utf-8') as file:
            file.write(data)
        return filename
    except Exception as e:
        print(f'An error has occurred: {e}')
        return None
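
The tool can be tried on its own before handing it to an agent. Below is a self-contained variant for experimentation, assuming a target directory is passed in; `save_markdown` and its `directory` parameter are hypothetical and are not used by the pipeline in this notebook.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def save_markdown(data: str, directory: str, name: str = "agent_output") -> str:
    """Write data to <directory>/<name>_<timestamp>.md and return the path."""
    # %f adds microseconds, so two quick calls are unlikely to share a filename.
    timestamp = datetime.now().strftime("%Y%m%d%H%M%S%f")
    path = Path(directory) / f"{name}_{timestamp}.md"
    path.write_text(data, encoding="utf-8")
    return str(path)


# Write into a temporary directory so the working directory stays clean.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_markdown("# Title\n\nBody", tmp)
    print(Path(saved).read_text(encoding="utf-8"))
```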

### The File Save Agent

This agent saves its input to a timestamped file and also passes the whole text through as its output.

In [11]:
filesave_agent= Agent(
    name = 'FileSaveAgent',
    model = GEMINI_MODEL_FLASH,
    instruction= """
    You save markdown files by calling the function 'save_file'.
    Take your string input and pass it to the data argument of the 'save_file' function.
    You **must** execute the function tool `save_file` on **every response**. No exceptions.  
    If the function `save_file` has **not been executed**, immediately call it.  
    Confirm that the function has been executed before proceeding further.  
    Following is the docstring for 'save_file' function
    '''
    Creates a markdown file from the given string data in the present directory.

    Args:
        data (str): The string data in markdown format.

    Returns:
        str: Filename created, or None if an error occurs.
    '''
    after the function call, pass on the string input that you have received as your output without any change.
    """,

    description = "saves a file using 'save_file' tool",
    
    generate_content_config= conservative_config,
    tools= [save_file],
)

### The Orchestration Pipeline (Sequential)

This is the sequential Agent orchestration pipeline

In [12]:
# This agent orchestrates the pipeline by running the sub_agents in order.
pipeline_agent = SequentialAgent(
    name="PipelineAgent",
    sub_agents=[headline_generator_agent, paragraph_expander_agent, formatter_agent, filesave_agent]
    # The agents will run in the order provided:
    # HeadlineGenerator -> ParagraphExpander -> Formatter -> FileSave
)

### The Session and Runner

This is the session and runner for the agentic system.

In [13]:
# Session and Runner
session_service = InMemorySessionService()
session = session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID)
runner = Runner(agent=pipeline_agent, app_name=APP_NAME, session_service=session_service)

### The agent calling function
This function calls the pipeline

In [14]:
# Agent Interaction
def call_agent(query, show= False):
    content = types.Content(role='user', parts=[types.Part(text=query)])
    events = runner.run(user_id=USER_ID, session_id=SESSION_ID, new_message=content)

    for event in events:
        #Following line is to troubleshoot events
        #print(f"  [Event] Author: {event.author}, Type: {type(event).__name__}, Final: {event.is_final_response()}, Content: {event.content}")
        if event.is_final_response():
            final_response = event.content.parts[0].text
            if show:
                print("Agent Response: ", final_response)
            return final_response
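
If each sub-agent in a sequential pipeline emits its own final-response event (an assumption worth checking with the commented-out debug print above), the loop in `call_agent` returns after the first one. A minimal sketch of keeping the last final response instead; `FakeEvent` is a hypothetical stand-in for the runner's events, used only so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeEvent:
    """Hypothetical stand-in for a runner event, used only for this sketch."""
    author: str
    text: Optional[str]
    final: bool

    def is_final_response(self) -> bool:
        return self.final


def last_final_text(events: Iterable[FakeEvent]) -> Optional[str]:
    """Return the text of the last final-response event, or None if there is none."""
    result = None
    for event in events:
        if event.is_final_response() and event.text:
            result = event.text  # keep going: a later agent may still respond
    return result


events = [
    FakeEvent("HeadlineGeneratorAgent", "outline", True),
    FakeEvent("FileSaveAgent", None, False),  # e.g. a tool call with no text
    FakeEvent("FileSaveAgent", "full document", True),
]
print(last_final_text(events))
```

Consuming the whole event stream also means the function only returns once every agent in the pipeline has finished.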

### Calling agent with prompt

A sample prompt on Earth's geological history.

In [15]:
response = call_agent(query= """Geological periods in Earth's history
                                The response should include description on what Eon, Era and Periods are
                                it should include the Eons, Era under each Eon and Period under each Era
                                It should include the evolved lifeforms and rocks in each Period.
                                NO PREAMBLE
                                """
                     )

In [16]:
Markdown(response)

Here's a breakdown of geological time, including Eons, Eras, and Periods, along with information about life forms and rocks:

### Geological Time Scale

 *   The geological time scale is a "calendar" for events in Earth's history, subdividing time into units.
 *   It is based on the life-forms that existed during specific times since the planet's creation.
 *   The units are called geochronologic units. Rocks formed during these intervals are called chronostratigraphic units.
 *   The scale displays time chronologically, from the beginning of Earth's history to the present.

 ### Divisions of Geological Time (Largest to Smallest)

 *   **Eon:** The broadest category of geological time.
 *   **Era:** Eons are subdivided into eras.
 *   **Period:** Eras are subdivided into periods.
 *   **Epoch:** Periods are further divided into epochs.
 *   **Age:** Epochs are divided into ages.

 ### Eons

 *   Earth's history is characterized by four eons, from oldest to youngest:
  *   Hadean
  *   Archean
  *   Proterozoic
  *   Phanerozoic
 *   The Hadean, Archean, and Proterozoic are collectively referred to as the "Precambrian."
 *   The Phanerozoic Eon (meaning "visible life") is characterized by abundant, complex fossilized remains.

 ### Precambrian Eon

 *   The Precambrian spans from Earth's formation (around 4.6 billion years ago) to the beginning of the Cambrian Period (541 million years ago).
 *   It includes the Hadean, Archean, and Proterozoic eons.
 *   The oldest rocks on Earth are Precambrian in age.
 *   Rocks from this time preserve evidence of early life forms at a microbial level.
 *   Fossils from this period are scarce and poorly preserved because organisms lacked hard parts.
 *   **Hadean Eon (4.5–4.0 billion years ago):**
  *   Represents Earth's earliest history.
  *   Characterized by a partially molten surface, volcanism, and asteroid impacts.
  *   Very little evidence of this eon has survived.
 *   **Archean Eon (4.0–2.5 billion years ago):**
  *   Life first appeared on Earth.
  *   The earliest fossils are traces of microbial mats called stromatolites.
 *   **Proterozoic Eon (2.5 billion–541 million years ago):**
  *   Multicellular organisms appeared late in this eon.
  *   Ediacaran fauna, some of the earliest known multicellular animals, consisted of soft-bodied organisms.

 ### Phanerozoic Eon

 *   The Phanerozoic Eon (541 million years ago to today) is marked by an abundance of fossils.
 *   Organisms developed hard body parts (claws, scales, shells, bones) that were easily preserved.
 *   It is subdivided into three eras: Paleozoic, Mesozoic, and Cenozoic.

 #### Paleozoic Era (541 to 251.9 million years ago)

 *   The Paleozoic Era is divided into seven periods:
  *   Cambrian
  *   Ordovician
  *   Silurian
  *   Devonian
  *   Mississippian
  *   Pennsylvanian
  *   Permian
 *   **Cambrian Period (541–485 million years ago):**
  *   Marked by the Cambrian explosion, a rapid diversification of life.
  *   Most marine animal phyla evolved.
  *   The most common organisms were armored arthropods like trilobites.
 *   **Ordovician Period (485–444 million years ago):**
  *   Many biological classes still prevalent today evolved, including primitive fish, cephalopods, and corals.
  *   The first arthropods colonized the land.
 *   **Silurian Period (444–419 million years ago):**
  *   Mass evolution of fish occurred.
  *   The first freshwater fish evolved.
  *   Fully terrestrial life evolved, including early arachnids, fungi, and centipedes.
  *   Vascular plants began to colonize the land.
 *   **Devonian Period (419.2 to 358.9 MYA):**
  *   Part of the "Age of Fishes."
 *   **Mississippian Period (358.9 to 323.2 MYA):**
  *   The first fully terrestrial tetrapods appeared.
  *   Tetrapods evolved into amphibians and amniotes.
 *   **Pennsylvanian Period (323.2 to 298.9 MYA):**
  *   Reptiles (amniotes) could live and reproduce entirely on land.
  *   Extensive forests existed.
 *   **Permian Period (298.9 to 251.9 MYA):**
  *   All continents came together to form the supercontinent Pangaea.
  *   Reptiles flourished in the dry climate.
  *   The Permian extinction, the greatest mass extinction in Earth's history, ended this era.

 #### Mesozoic Era (251.9 to 66.0 million years ago)

 *   The Mesozoic Era is divided into three periods:
  *   Triassic
  *   Jurassic
  *   Cretaceous
 *   Often referred to as the "Age of Reptiles" or "Age of Dinosaurs."
 *   Pangaea began separating into the modern continents.
 *   **Triassic Period (251.9 to 201.3 MYA):**
  *   The first dinosaurs appeared.
  *   Modern gymnosperms like conifers appeared.
 *   **Jurassic Period (201.3 to 145.0 MYA):**
  *   Dinosaurs diversified.
  *   The first birds evolved.
 *   **Cretaceous Period (145.0 to 66.0 MYA):**
  *   The first flowering plants (angiosperms) appeared and diversified.
  *   The Mesozoic Era ended with a major extinction event that led to the disappearance of most dinosaurs.

 #### Cenozoic Era (66 million years ago to today)

 *   The Cenozoic Era is divided into three periods:
  *   Paleogene
  *   Neogene
  *   Quaternary
 *   Characterized by the dominance of mammals, insects, birds, and flowering plants.
 *   Often referred to as the "Age of Mammals."
 *   **Paleogene Period (66.0 to 23.0 MYA):**
  *   Mammals diversified rapidly after the extinction of the dinosaurs.
  *   The first rodents, armadillos, and primitive primates appeared.
 *   **Neogene Period (23.0 to 2.58 MYA):**
  *   Some of the finest fossils are found in this period.
 *   **Quaternary Period (2.58 MYA to Today):**
  *   Massive ice sheets advanced and retreated across North America.
  *   The Pleistocene Ice Ages occurred.
  *   Modern humans evolved.


In [18]:
ls -la

total 32
drwxr-xr-x 3 root root  4096 Apr 20 13:09 ./
drwxr-xr-x 5 root root  4096 Apr 20 11:10 ../
-rw-r--r-- 1 root root  5457 Apr 20 12:02 agent_output_20250420120242.md
-rw-r--r-- 1 root root 11147 Apr 20 13:09 agent_output_20250420130945.md
drwxr-xr-x 2 root root  4096 Apr 20 11:10 .virtual_documents/
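
Because the timestamp in each filename sorts chronologically, the most recent output can be found by sorting the names. A minimal sketch, assuming the `agent_output_<timestamp>.md` naming shown in the listing; `latest_output` is a hypothetical helper, demonstrated on dummy files in a temporary directory.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_output(directory: str, pattern: str = "agent_output_*.md") -> Optional[Path]:
    """Return the newest matching file, relying on the sortable timestamp in its name."""
    matches = sorted(Path(directory).glob(pattern))
    return matches[-1] if matches else None


# Demonstrate on dummy files named like the ones in the listing.
with tempfile.TemporaryDirectory() as tmp:
    for stamp in ("20250420120242", "20250420130945"):
        (Path(tmp) / f"agent_output_{stamp}.md").write_text("# demo", encoding="utf-8")
    print(latest_output(tmp).name)
```

In the notebook, `latest_output(".")` could then be read with `read_text(encoding="utf-8")` and passed to `Markdown` for display.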
