# 🤖 Multi-Agent Deep Research System

This notebook shows how to combine **AutoGen** and **LangChain** tools to automate deep research workflows. The system coordinates a team of specialized agents, each with a distinct role in the research and report-generation pipeline. Together, the agents collect data, analyze findings, and produce a cited report.

<a href="https://colab.research.google.com/github/miztiik/taars/blob/master/notebooks/deepresearch_w_autogen_langchain_tools.ipynb" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 🧠 Agent Roles

- **🧭 Planner**: Defines research scope, objectives, and success criteria
- **🔍 Researcher**: Gathers evidence with the `wiki_search` and `web_research` tools, choosing the source that fits each question
- **🧪 Critic**: Reviews plans and research output for quality and completeness, replying with `PLAN_APPROVED`, `RESEARCH_APPROVED`, or `NEEDS_REFINEMENT`
- **✍️ Editor**: Formats final reports with proper citations and saves using `save_report` tool

## 🔄 System Architecture

```mermaid
flowchart TD
    A[User Query] --> B[🧭 Planner]
    B --> C{🧪 Critic Review}
    C -->|PLAN_APPROVED| D[🔍 Researcher]
    C -->|NEEDS_REFINEMENT| B
    D --> E{🧪 Critic Review}
    E -->|RESEARCH_APPROVED| F[✍️ Editor]
    E -->|NEEDS_REFINEMENT| D
    F --> G[Final Report]
```
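
The review loop in the diagram can be sketched as a small state machine. This is a minimal illustration, not the notebook's orchestration code: only the signal names come from the diagram, while `run_review_loop`, `demo_critic`, and `max_rounds` are hypothetical names introduced for the example.

```python
from typing import Callable, List

# Signal names are taken from the diagram; everything else is illustrative.
PLAN_APPROVED = "PLAN_APPROVED"
RESEARCH_APPROVED = "RESEARCH_APPROVED"
NEEDS_REFINEMENT = "NEEDS_REFINEMENT"


def run_review_loop(critic: Callable[[str, int], str], max_rounds: int = 5) -> List[str]:
    """Walk the Planner -> Critic -> Researcher -> Critic -> Editor flow.

    `critic(stage, attempt)` is a hypothetical stand-in for the Critic agent
    and returns one of the signal strings above.
    """
    trace: List[str] = []
    for stage, worker, approval in (
        ("plan", "Planner", PLAN_APPROVED),
        ("research", "Researcher", RESEARCH_APPROVED),
    ):
        for attempt in range(1, max_rounds + 1):
            trace.append(worker)
            trace.append("Critic")
            if critic(stage, attempt) == approval:
                break  # approved: move on to the next stage
        else:
            # The worker never got approval within the round budget.
            raise RuntimeError(f"{stage} not approved after {max_rounds} rounds")
    trace.append("Editor")
    return trace


def demo_critic(stage: str, attempt: int) -> str:
    # Reject the first plan once, approve everything else.
    if stage == "plan" and attempt == 1:
        return NEEDS_REFINEMENT
    return PLAN_APPROVED if stage == "plan" else RESEARCH_APPROVED


print(run_review_loop(demo_critic))
```

The `max_rounds` cap is an assumption added so that a Critic that never approves cannot loop forever; the notebook itself bounds conversations with a message limit.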

## 🚀 Quick Start

### Prerequisites

Set environment variables:

```bash
# Required: Gemini API (or configure other models in cells below)
export GEMINI_API_KEY="your_api_key"
export GEMINI_MODEL_NAME="gemini-1.5-flash"
export GEMINI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
```
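
Before running the cells, it can help to check which of these variables are missing. The helper below is a hedged sketch: `missing_env_vars` and `REQUIRED_VARS` are hypothetical names, and the sample mapping stands in for a real environment.

```python
import os
from typing import List, Mapping

# The three variable names come from the prerequisites above.
REQUIRED_VARS = ("GEMINI_API_KEY", "GEMINI_MODEL_NAME", "GEMINI_BASE_URL")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or blank in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Sample mapping for illustration only, not real credentials.
missing = missing_env_vars({"GEMINI_API_KEY": "your_api_key"})
print(missing)
```

Calling `missing_env_vars()` with no argument inspects the real process environment.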

### Run Research Task

1. **Execute all cells** in sequence
2. **Modify the query** in the final cell:
   ```python
   output = asyncio.run(run_task("Your research question here"))
   ```
3. **Monitor progress** in real-time through cell outputs

## 📋 Example Queries

- **Financial Analysis**: `"Indian steel sector growth prospects in an era of US tariffs"`
- **Economic Research**: `"Government factors that improved Indian economy during Modi era"`
- **Tech Industry**: `"Are we witnessing an AI infrastructure bubble? GDP investment vs productivity gains"`
- **AI/ML Trends**: `"Current trends in tool usage during LLM training"`

## 📊 Outputs & Monitoring

| Output Type | Location                                          | Description                                            |
| ----------- | ------------------------------------------------- | ------------------------------------------------------ |
| **Reports** | `./reports/`                                      | Timestamped Markdown reports (auto-saved)              |
| **Logs**    | `./logs/YYYYMMDD_HHMM_deep_research_agent.log`    | Timestamped execution logs with token usage            |
| **State**   | `./YYYYMMDD_HHMM_[task_keywords]_team_state.json` | Task-specific conversation state for resume capability |
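
The naming patterns in the table can be reproduced with `strftime`. This is an illustrative sketch; `output_paths` is a hypothetical helper, and the keyword string is passed in rather than derived from the task text.

```python
from datetime import datetime
from pathlib import Path


def output_paths(task_keywords: str, now: datetime) -> dict:
    """Build log and state paths that follow the YYYYMMDD_HHMM patterns above."""
    stamp = now.strftime("%Y%m%d_%H%M")
    return {
        "log": Path("logs") / f"{stamp}_deep_research_agent.log",
        "state": Path(f"{stamp}_{task_keywords}_team_state.json"),
    }


paths = output_paths("green_hydrogen_viability", datetime(2025, 8, 24, 14, 30))
print(paths["log"].as_posix())    # logs/20250824_1430_deep_research_agent.log
print(paths["state"].as_posix())  # 20250824_1430_green_hydrogen_viability_team_state.json
```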

## 🔄 Resume Functionality

```python
# Auto-resume from most recent state file
await resume_from_saved_state()

# Resume from specific state file
await resume_from_saved_state("20250824_1430_green_hydrogen_viability_team_state.json")

# List all available state files
list_team_state_files()
```
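
Because state filenames begin with a `YYYYMMDD_HHMM` timestamp, sorting them by name also sorts them by time. The sketch below shows one way an auto-resume step could pick the newest file; `latest_state_file` is a hypothetical helper, and the notebook's `resume_from_saved_state` may select files differently.

```python
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Optional


def latest_state_file(directory: Path) -> Optional[Path]:
    """Return the newest *_team_state.json in `directory`, or None if there is none."""
    # Timestamp-prefixed names sort chronologically when sorted as plain strings.
    candidates = sorted(directory.glob("*_team_state.json"), key=lambda p: p.name)
    return candidates[-1] if candidates else None


with TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in (
        "20250823_0915_steel_sector_growth_team_state.json",
        "20250824_1430_green_hydrogen_viability_team_state.json",
    ):
        (folder / name).write_text("{}", encoding="utf-8")
    newest = latest_state_file(folder)
    print(newest.name)
```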

## 📝 TODO & Roadmap

- [ ] **Specialized Models**: Different LLMs for different agent roles (planning vs research vs writing)
- [ ] **Semantic Depth Search**: Advanced content extraction with semantic similarity scoring
- [ ] **Source Verification**: Cross-reference validation and fact-checking workflows
- [ ] **Domain-Specific Tools**: Specialized research tools for finance, science, law, etc.
- [ ] **Agent Control Flow Logging**: Meaningful event logging for agent-to-agent handoffs
- [ ] **Organic Flow Orchestration**: Improved prompts for natural, adaptive conversation flow
- [ ] **Streaming UI**: Real-time progress visualization and intervention capability
- [ ] **Performance Metrics**: Research quality scoring and optimization analytics


In [None]:
%%capture --no-stderr

%pip install -qU ipykernel
%pip install -qU loguru
%pip install -qU python-dotenv

%pip install -qU autogen-agentchat
%pip install -qU "autogen-ext[openai]"
%pip install -qU langchain
%pip install -qU langchain-community
%pip install -qU wikipedia
%pip install -qU selenium unstructured
%pip install -qU lxml
%pip install -qU ddgs


In [None]:
## GOOGLE COLAB LINE WRAPPING
# https://stackoverflow.com/questions/58890109/line-wrapping-in-collaboratory-google-results

from IPython.display import HTML, display


def set_css():
    display(
        HTML(
            """
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  """
        )
    )


get_ipython().events.register("pre_run_cell", set_css)


In [None]:
# %load_ext autoreload
# %autoreload 2
# %aimport -langchain_community
# Automatically reload modules before executing code

# https://ipython.org/ipython-doc/3/config/extensions/autoreload.html


In [None]:
## GENERIC IMPORTS
# Standard library
import asyncio
import json
import os
import random
import re
import sys
import time
from datetime import datetime
from pathlib import Path
from textwrap import dedent
from typing import Annotated, Any, Dict, List, Optional, Tuple

# Third-party
import nest_asyncio
import tenacity
from dotenv import load_dotenv
from IPython.display import Markdown, display
from loguru import logger


In [None]:
## CONSTANTS

## LOG CONFIG
LOG_ROTATION_SIZE = "10 MB"
LOG_RETENTION_DAYS = "7 days"

## AUTOGEN CONFIG
CONVERSATION_BUFFER_SIZE = 10
TASK_TERMINATION_MAX_MESSAGES = 100
API_CALL_DELAY_SECONDS = int(
    os.environ.get("API_CALL_DELAY_SECONDS", "59")
)  # 59 seconds = 1.02 RPM
NON_API_EVENT_DELAY = 10  # Small delay for non-API events

## TOOL CONFIG
WIKI_MAX_RESULTS = 5
WIKI_MAX_CHARS = 5000
WIKIPEDIA_MAX_DOCS = 2
DDGS_MAX_RESULTS = 3
WEB_CONTENT_MAX_LENGTH = 15000
WEB_CONTENT_MIN_LENGTH = 300
WEB_BATCH_MAX_URLS = 5
SELENIUM_WINDOW_SIZE = "1920,1080"
BROWSER_USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
TITLE_SLICE_LENGTH = 50
TOP_RESULTS_COUNT = 3
TASK_MAX_MEANINGFUL_WORDS = 3
CRITIC_MAX_WORDS = 500
URL_FETCH_DELAY = 5


In [None]:
## LOGGING CONFIG
notebook_dir = Path.cwd()
log_dir = notebook_dir / "logs"

# Generate timestamped log filename
timestamp = datetime.now()
log_timestamp = timestamp.strftime("%Y%m%d_%H%M")
log_file = log_dir / f"{log_timestamp}_deep_research_agent.log"

log_dir.mkdir(exist_ok=True)

if not getattr(sys, "_loguru_configured", False):
    logger.remove()
    logger.add(
        str(log_file),
        level="DEBUG",
        rotation=LOG_ROTATION_SIZE,
        retention=LOG_RETENTION_DAYS,
        compression="zip",
        enqueue=True,
    )
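    logger.add(sys.stderr, colorize=True, level="WARNING")
    # Limit stdout to records below WARNING; WARNING and above already go to stderr
    logger.add(
        sys.stdout,
        colorize=True,
        level="INFO",
        filter=lambda record: record["level"].no < logger.level("WARNING").no,
    )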
    sys._loguru_configured = True

logger.info(f"✅ Logging configured successfully - Log file: {log_file.name}")


In [None]:
## IMPORTS
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import SelectorGroupChat
from autogen_agentchat.messages import StopMessage, TextMessage
from autogen_agentchat.conditions import MaxMessageTermination, TextMentionTermination
from autogen_core.model_context import BufferedChatCompletionContext
from autogen_core.tools import FunctionTool
from autogen_ext.models.openai import OpenAIChatCompletionClient


from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain_community.document_loaders import WikipediaLoader
from langchain_community.document_loaders import SeleniumURLLoader
from ddgs import DDGS


In [None]:
## GEMINI MODEL CLIENT
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_core.models import ModelInfo

load_dotenv(os.path.join("..", ".env"))

# Confirm the required settings are present
# (.get() lets the assertion message surface instead of a bare KeyError)
assert os.environ.get("GEMINI_API_KEY"), "GEMINI_API_KEY is not set"
assert os.environ.get("GEMINI_MODEL_NAME"), "GEMINI_MODEL_NAME is not set"
assert os.environ.get("GEMINI_BASE_URL"), "GEMINI_BASE_URL is not set"


gemini_model_info = ModelInfo(
    vision=False,
    function_calling=True,
    json_output=True,
    family=os.environ["GEMINI_MODEL_NAME"],
    structured_output=True,
)

gemini_model_client = OpenAIChatCompletionClient(
    model=os.environ["GEMINI_MODEL_NAME"],
    api_key=os.environ["GEMINI_API_KEY"],
    base_url=os.environ["GEMINI_BASE_URL"],
    model_info=gemini_model_info,
    max_retries=3,
    parallel_tool_calls=False,
)


In [None]:
## AZURE OPENAI MODEL CLIENT
# from autogen_ext.models.openai import AzureOpenAIChatCompletionClient

## Confirm the API key is set
# assert os.environ["AZURE_OAI_DEPLOYMENT"], "AZURE_OAI_DEPLOYMENT is not set"
# assert os.environ["AZURE_OAI_MODEL_NAME"], "AZURE_OAI_MODEL_NAME is not set"
# assert os.environ["AZURE_OAI_MODEL_VERSION"], "AZURE_OAI_MODEL_VERSION is not set"
# assert os.environ["AZURE_OAI_BASE_URL"], "AZURE_OAI_BASE_URL is not set"

# az_oai_model_client = AzureOpenAIChatCompletionClient(
#     azure_deployment=os.environ["AZURE_OAI_DEPLOYMENT"],
#     model=os.environ["AZURE_OAI_MODEL_NAME"],
#     api_version=os.environ["AZURE_OAI_MODEL_VERSION"],
#     azure_endpoint=os.environ["AZURE_OAI_BASE_URL"],

# )


In [None]:
## AGENT TOOLS


wiki_api = WikipediaAPIWrapper(
    top_k_results=WIKI_MAX_RESULTS, doc_content_chars_max=WIKI_MAX_CHARS
)
wiki_tool = WikipediaQueryRun(api_wrapper=wiki_api)


def wiki_full_search(input: str) -> str:
    """Search Wikipedia for a query and return up to WIKIPEDIA_MAX_DOCS documents."""
    logger.info(f"🔍 wiki_full_search: Starting search for '{input}'")

    try:
        search_docs = WikipediaLoader(
            query=input, load_max_docs=WIKIPEDIA_MAX_DOCS
        ).load()

        if not search_docs:
            logger.warning(f"⚠️ wiki_full_search: No documents found for '{input}'")
            return f"No Wikipedia articles found for query: {input}"

        logger.info(
            f"✅ wiki_full_search: Found {len(search_docs)} documents for '{input}'"
        )

        formatted_search_docs = "\n\n---\n\n".join(
            [
                f'<Document source="{doc.metadata["source"]}" page="{doc.metadata.get("page", "")}">\n{doc.page_content}\n</Document>'
                for doc in search_docs
            ]
        )

        content_length = len(formatted_search_docs)
        logger.info(
            f"📊 wiki_full_search: Returning {content_length} characters for '{input}'"
        )

        return formatted_search_docs

    except Exception as e:
        logger.error(f"❌ wiki_full_search failed for '{input}': {str(e)}")
        return f"Error searching Wikipedia for '{input}': {str(e)}"


def wiki_search(q: str) -> dict:
    """Return structured output including text and source."""
    logger.info(f"🔍 wiki_search: Starting search for '{q}'")

    try:
        result = wiki_tool.run(q)

        if not result or result.strip() == "":
            logger.warning(f"⚠️ wiki_search: Empty result for '{q}'")
            return {"text": "No results found", "source": "Wikipedia", "query": q}

        result_length = len(result)
        logger.info(f"✅ wiki_search: Retrieved {result_length} characters for '{q}'")

        return {"text": result, "source": "Wikipedia", "query": q}

    except Exception as e:
        logger.error(f"❌ wiki_search failed for '{q}': {str(e)}")
        return {"text": f"Error: {str(e)}", "source": None, "query": q}


def web_search(q: str) -> dict:
    """
    Performs a web search across multiple search backends (Google, Brave, Bing, Yahoo)
    via the `ddgs` package to gather relevant results for research queries.

    Args:
        q (str): The search query string. Should be specific and well-formed
                for optimal results.

    Returns:
        dict: A structured response containing:
            - text (list): List of search result dictionaries, each containing:
                * title: Article/page title
                * href: URL link
                * body: Content snippet/description
            - source (str): Search engine identifier ("DuckDuckGo")
            - query (str): The original search query
            - results_found (int): Number of results returned

        On failure, `text` holds an error message string and `source` is None.
    """
    logger.info(f"🌐 web_search: Starting web search for '{q}'")
    try:
        results = DDGS().text(
            q,
            region="us-en",
            safesearch="off",
            max_results=DDGS_MAX_RESULTS,
            backend="google, brave, bing, yahoo",
        )

        if not results:
            logger.warning(f"⚠️ web_search: No results found for '{q}'")
            return {
                "text": [],
                "source": "DuckDuckGo",
                "query": q,
                "results_found": 0,
            }

        logger.info(f"✅ web_search: Found {len(results)} results for '{q}'")

        # Log a sample of the top results for debugging
        top_titles = [
            r.get("title", "No title")[:TITLE_SLICE_LENGTH]
            for r in results[:TOP_RESULTS_COUNT]
        ]
        logger.debug(
            f"📋 web_search: Top {len(top_titles)} result titles for '{q}': {top_titles}"
        )

        return {
            "text": results,
            "source": "DuckDuckGo",
            "query": q,
            "results_found": len(results),
        }

    except Exception as e:
        logger.error(f"❌ web_search failed for '{q}': {str(e)}")
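        # Include results_found so callers can treat a failed search as "no results"
        return {
            "text": f"Error: {str(e)}",
            "source": None,
            "query": q,
            "results_found": 0,
        }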


def web_fetch(
    url: str, max_content_length: int = WEB_CONTENT_MAX_LENGTH
) -> Dict[str, Any]:
    """
    Fetch web page content using Selenium for JavaScript-heavy sites.

    Args:
        url: The URL to fetch content from
        max_content_length: Maximum content length to return. Defaults to WEB_CONTENT_MAX_LENGTH

    Returns:
        Dict with keys: content, url, status, error (if any)
    """
    logger.info(
        f"🌐 web_fetch: Starting fetch for {url} (max_length: {max_content_length})"
    )

    loader = None

    try:
        # Validate URL
        if not url or not url.startswith(("http://", "https://")):
            return {
                "content": "",
                "url": url,
                "status": "error",
                "error": "Invalid URL format",
            }

        # Configure Selenium loader
        loader = SeleniumURLLoader(
            urls=[url],
            continue_on_failure=True,
            arguments=_get_selenium_arguments(),
            browser="chrome",
        )

        # Load content
        logger.info(f"📥 web_fetch: Loading content from {url}")
        documents = loader.load()

        if not documents:
            logger.warning(f"⚠️ web_fetch: No documents loaded from {url}")
            return {
                "content": "",
                "url": url,
                "status": "error",
                "error": "No content could be loaded from URL",
            }

        # Process content
        content = documents[0].page_content.strip()
        original_length = len(content)

        logger.debug(f"📊 web_fetch: Loaded {original_length} characters from {url}")

        if len(content) < WEB_CONTENT_MIN_LENGTH:
            logger.warning(
                f"⚠️ web_fetch: Content too short ({len(content)} chars) from {url}"
            )
            return {
                "content": content,
                "url": url,
                "status": "warning",
                "error": "Content appears too short, may indicate loading issues",
            }

        # Truncate if too long
        if len(content) > max_content_length:
            content = content[:max_content_length] + "\n\n[Content truncated...]"
            logger.info(
                f"✂️ web_fetch: Content truncated from {original_length} to {max_content_length} chars for {url}"
            )

        logger.info(
            f"✅ web_fetch: Successfully fetched {len(content)} characters from {url}"
        )

        return {"content": content, "url": url, "status": "success", "error": None}

    except Exception as e:
        logger.error(f"❌ web_fetch failed for {url}: {str(e)}")
        return {
            "content": "",
            "url": url,
            "status": "error",
            "error": f"Failed to fetch content: {str(e)}",
        }

    finally:
        # CRITICAL: Clean up browser resources
        if loader and hasattr(loader, "web_driver") and loader.web_driver:
            try:
                loader.web_driver.quit()
                logger.debug(f"🧹 web_fetch: Browser cleaned up for {url}")
            except Exception as cleanup_error:
                logger.warning(f"⚠️ web_fetch: Browser cleanup failed: {cleanup_error}")


def _get_selenium_arguments() -> List[str]:
    """Get optimized Selenium browser arguments for reliability and stealth.
    Returns:
        List of browser arguments
    """
    return [
        # Core stability
        "--headless",
        "--no-sandbox",
        "--disable-dev-shm-usage",
        "--disable-gpu",
        f"--window-size={SELENIUM_WINDOW_SIZE}",
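        # Performance
        "--disable-extensions",
        "--disable-plugins",
        "--disable-images",
        # NOTE: at odds with the "JavaScript-heavy sites" goal in web_fetch's docstring;
        # drop this flag if fetched pages come back empty.
        "--disable-javascript",
        # Timeout and connection settings
        "--timeout=60000",  # 60 second timeout
        # NOTE: page load strategy is normally set on Selenium's Options object
        # (options.page_load_strategy); Chrome may ignore it as a command-line switch.
        "--page-load-strategy=eager",  # Don't wait for all resources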
        "--disable-background-timer-throttling",
        "--disable-renderer-backgrounding",
        # Network optimizations
        "--aggressive-cache-discard",
        "--disable-background-networking",
        # Stealth and compatibility
        "--disable-blink-features=AutomationControlled",
        f"--user-agent={BROWSER_USER_AGENT}",
        # Suppress notification prompts, infobars, and default apps
        "--disable-notifications",
        "--disable-infobars",
        "--disable-default-apps",
        # Security bypasses (use cautiously)
        "--ignore-certificate-errors",
        "--ignore-ssl-errors",
        "--allow-running-insecure-content",
    ]


def web_fetch_multiple(
    urls: List[str], max_content_length: int = WEB_CONTENT_MAX_LENGTH
) -> Dict[str, Any]:
    """
    Fetch content from multiple URLs efficiently.

    Args:
        urls: List of URLs to fetch
        max_content_length: Maximum content length per URL

    Returns:
        Dict with results for each URL and summary statistics
    """
    logger.info(f"🌐 web_fetch_multiple: Starting batch fetch for {len(urls)} URLs")

    if not urls or len(urls) > WEB_BATCH_MAX_URLS:
        logger.warning(
            f"⚠️ web_fetch_multiple: Invalid URL list - {len(urls) if urls else 0} URLs (max {WEB_BATCH_MAX_URLS})"
        )
        return {
            "results": [],
            "status": "error",
            "error": f"Invalid URL list (empty or too many URLs, max {WEB_BATCH_MAX_URLS})",
        }

    results = []
    success_count = 0

    logger.debug(
        f"📋 web_fetch_multiple: Processing URLs: {[f'{url[:TITLE_SLICE_LENGTH]}...' if len(url) > TITLE_SLICE_LENGTH else url for url in urls]}"
    )

    for i, url in enumerate(urls, 1):
        logger.debug(f"🔄 web_fetch_multiple: Processing URL {i}/{len(urls)}: {url}")

        result = web_fetch(url, max_content_length)
        results.append(result)

        if result["status"] == "success":
            success_count += 1
            logger.debug(f"✅ web_fetch_multiple: URL {i}/{len(urls)} successful")
        else:
            logger.debug(
                f"❌ web_fetch_multiple: URL {i}/{len(urls)} failed: {result.get('error', 'Unknown error')}"
            )

        # Brief delay between requests to be respectful (skipped after the last URL)
        if i < len(urls):
            time.sleep(URL_FETCH_DELAY)

    logger.info(
        f"🏁 web_fetch_multiple: Completed batch - {success_count}/{len(urls)} successful"
    )

    return {
        "results": results,
        "total_urls": len(urls),
        "successful": success_count,
        "failed": len(urls) - success_count,
        "status": "completed",
    }


def web_research(
    query: str, max_content_length: int = WEB_CONTENT_MAX_LENGTH
) -> Dict[str, Any]:
    """
    Complete web research workflow: search + selective content fetching.

    Combines web_search and web_fetch_multiple to provide comprehensive
    research results with both search snippets and full content.
    Uses DDGS_MAX_RESULTS to determine how many URLs to fetch content from.

    Args:
        query: Search query string
        max_content_length: Maximum content length per fetched page

    Returns:
        Dict containing:
            - search_results: Original search results from web_search
            - fetched_content: Full content from top N URLs (N = DDGS_MAX_RESULTS)
            - summary: Aggregated statistics
            - errors: Any fetch failures
    """
    logger.info(
        f"🔬 web_research: Starting comprehensive research for '{query}' (will fetch top {DDGS_MAX_RESULTS} URLs)"
    )

    try:
        # Step 1: Get search results (automatically limited by DDGS_MAX_RESULTS)
        search_response = web_search(query)

        # .get() avoids a KeyError if the search response lacks "results_found"
        if not search_response.get("text") or search_response.get("results_found", 0) == 0:
            logger.warning(f"⚠️ web_research: No search results for '{query}'")
            return {
                "search_results": search_response,
                "fetched_content": [],
                "summary": {"total_searched": 0, "total_fetched": 0, "success_rate": 0},
                "errors": ["No search results found"],
            }

        # Step 2: Extract ALL URLs from search results (already limited by DDGS_MAX_RESULTS)
        search_results = search_response["text"]
        top_urls = [
            result.get("href") for result in search_results if result.get("href")
        ]

        if not top_urls:
            logger.warning(
                f"⚠️ web_research: No valid URLs found in search results for '{query}'"
            )
            return {
                "search_results": search_response,
                "fetched_content": [],
                "summary": {
                    "total_searched": len(search_results),
                    "total_fetched": 0,
                    "success_rate": 0,
                },
                "errors": ["No valid URLs in search results"],
            }

        # Step 3: Fetch full content from all URLs (already limited by DDGS_MAX_RESULTS)
        logger.info(
            f"📥 web_research: Fetching content from {len(top_urls)} URLs (max from DDGS_MAX_RESULTS={DDGS_MAX_RESULTS})"
        )
        fetch_response = web_fetch_multiple(top_urls, max_content_length)

        # Step 4: Calculate summary statistics
        successful_fetches = sum(
            1 for result in fetch_response["results"] if result["status"] == "success"
        )
        success_rate = (successful_fetches / len(top_urls)) * 100 if top_urls else 0

        summary = {
            "total_searched": len(search_results),
            "total_fetched": len(top_urls),
            "successful_fetches": successful_fetches,
            "success_rate": round(success_rate, 1),
            "query": query,
            "max_urls_limit": DDGS_MAX_RESULTS,
        }

        # Step 5: Collect any errors
        errors = []
        if fetch_response["failed"] > 0:
            failed_urls = [
                r["url"] for r in fetch_response["results"] if r["status"] != "success"
            ]
            errors.append(
                f"Failed to fetch {fetch_response['failed']} URLs: {failed_urls[:3]}"
            )

        logger.info(
            f"✅ web_research: Completed for '{query}' - "
            f"{successful_fetches}/{len(top_urls)} fetches successful ({success_rate:.1f}%)"
        )

        return {
            "search_results": search_response,
            "fetched_content": fetch_response["results"],
            "summary": summary,
            "errors": errors,
        }

    except Exception as e:
        logger.error(f"❌ web_research failed for '{query}': {str(e)}")
        return {
            "search_results": {},
            "fetched_content": [],
            "summary": {"total_searched": 0, "total_fetched": 0, "success_rate": 0},
            "errors": [f"Research pipeline failed: {str(e)}"],
        }


def save_report(
    content: str, task_description: str, reports_dir: str = "reports"
) -> Dict[str, Any]:
    """
    Save timestamped Markdown report to disk with auto-generated filename.

    Args:
        content: Report content (plain text or Markdown). Auto-adds title if missing.
        task_description: Brief task description for filename generation.
        reports_dir: Output directory (default: "reports"). Created if missing.

    Returns:
        Dict with keys status, filepath, filename, timestamp on success;
        status, error, task on failure.

    Examples:
        save_report("# Analysis\\n\\nFindings...", "market research 2024")
        # → reports/20250824_1430_15_market_research_2024.md

        save_report(draft_text, "AI impact assessment")
        # → reports/20250824_1431_22_impact_assessment.md

    Filename: YYYYMMDD_HHMM_SS_key_words.md (auto-numbered if exists)
    """
    try:
        # Ensure reports directory exists (including any missing parent directories)
        reports_path = Path(reports_dir)
        reports_path.mkdir(parents=True, exist_ok=True)

        # Generate timestamped filename with seconds for better uniqueness
        timestamp = datetime.now()
        date_time = timestamp.strftime("%Y%m%d_%H%M_%S")
        task_name = _extract_task_name(task_description)

        filename = f"{date_time}_{task_name}.md"
        filepath = reports_path / filename

        # Handle filename conflicts with counter
        counter = 1
        while filepath.exists():
            filename = f"{date_time}_{task_name}_{counter}.md"
            filepath = reports_path / filename
            counter += 1

        # Format content with title if needed
        formatted_content = _format_content(content, task_description, timestamp)

        # Save to disk using Path object
        filepath.write_text(formatted_content, encoding="utf-8")
        logger.info(f"📄 Report saved: {filepath}")

        return {
            "status": "success",
            "filepath": str(filepath),
            "filename": filename,
            "timestamp": timestamp.isoformat(),
        }

    except Exception as e:
        error_msg = f"Failed to save report: {str(e)}"
        logger.error(f"❌ {error_msg}")
        return {"status": "error", "error": error_msg, "task": task_description}


def _extract_task_name(task: str) -> str:
    """
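    Extract up to TASK_MAX_MEANINGFUL_WORDS meaningful words from the task description
    for the filename. Words of one or two characters and common stop words are skipped.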

    Args:
        task: The task description string

    Returns:
        Underscore-separated words suitable for filename
    """
    # Clean special characters and normalize
    clean_task = re.sub(r"[^\w\s]", " ", task.lower())
    words = [w for w in clean_task.split() if len(w) > 2]

    # Filter common stop words
    stop_words = {
        "the",
        "and",
        "for",
        "with",
        "from",
        "about",
        "into",
        "through",
        "during",
        "before",
        "after",
        "above",
        "below",
        "over",
        "under",
    }
    meaningful = [w for w in words if w not in stop_words][:TASK_MAX_MEANINGFUL_WORDS]

    return "_".join(meaningful) if meaningful else "report"


def _format_content(content: str, task: str, timestamp: datetime) -> str:
    """
    Format content with title and timestamp if needed.

    Args:
        content: Raw content to format
        task: Task description for title generation
        timestamp: When the report was created

    Returns:
        Formatted Markdown content
    """
    # If content already has a Markdown title, use as-is
    if content.strip().startswith("#"):
        return content

    # Add title and timestamp for plain text content
    formatted_title = f"# Report: {task}"
    timestamp_line = f"*Generated: {timestamp.strftime('%Y-%m-%d %H:%M')}*"

    return f"{formatted_title}\n\n{timestamp_line}\n\n{content}"


In [None]:
## REGISTER FUNCTIONS AS TOOLS

wiki_search_tool = FunctionTool(
    func=wiki_search,
    name="wiki_search",
    description="Search Wikipedia for information.",
)
web_search_tool = FunctionTool(
    func=web_search,
    name="web_search",
    description="Search the given query and get list of URLs.",
)
web_fetch_tool = FunctionTool(
    func=web_fetch, name="web_fetch", description="Fetch content from a web page."
)
web_fetch_multiple_tool = FunctionTool(
    func=web_fetch_multiple,
    name="web_fetch_multiple",
    description="Fetch content from multiple web pages.",
)
web_research_tool = FunctionTool(
    func=web_research,
    name="web_research",
    description="Complete web research: search + fetch full content from top results.",
)
save_report_tool = FunctionTool(
    func=save_report, name="save_report", description="Save the research report."
)


In [None]:
## UTILITY FUNCTIONS

# Global token accounting - managed by log_event_enhanced
_token_accounting = {
    "total_prompt_tokens": 0,
    "total_completion_tokens": 0,
    "total_tokens": 0,
    "llm_call_count": 0,
    "processed_event_ids": set(),
}


def reset_token_accounting():
    """Reset global token accounting for new task."""
    global _token_accounting
    _token_accounting = {
        "total_prompt_tokens": 0,
        "total_completion_tokens": 0,
        "total_tokens": 0,
        "llm_call_count": 0,
        "processed_event_ids": set(),
    }
    logger.debug("🔄 Token accounting reset for new task")


def get_token_stats():
    """Get current token statistics."""
    return _token_accounting.copy()


def determine_event_delay(event, event_count: int) -> float:
    """
    Determine appropriate delay for different event types based on API usage patterns.

    Logic: API-bound operations (model calls, tool calls, agent selections) get longer delay
    to respect rate limits. Internal processing events get minimal delay.

    Args:
        event: The event object from the team stream
        event_count: Total number of events processed (for logging context)

    Returns:
        float: Delay in seconds
    """
    # Check for model/LLM API usage (includes token-consuming calls)
    has_model_usage = hasattr(event, "models_usage") and event.models_usage

    # Check for tool API calls
    has_tool_calls = hasattr(event, "tool_calls") and event.tool_calls

    # Check for agent selection logic (often uses LLM reasoning but may not show tokens)
    event_type = getattr(event, "type", type(event).__name__)
    is_agent_selection = (
        "select" in event_type.lower() or "speaker" in event_type.lower()
    )

    # API-bound operations get longer delay for rate limiting
    if has_model_usage or has_tool_calls or is_agent_selection:
        return API_CALL_DELAY_SECONDS
    else:
        return NON_API_EVENT_DELAY


async def _save_team_state(task_text: str) -> Optional[str]:
    """
    Save the current team conversation state to a timestamped JSON file.

    Args:
        task_text: The original task description for filename generation

    Returns:
        Optional[str]: The saved filename if successful, None if failed
    """
    try:
        # Generate timestamped filename
        timestamp = datetime.now()
        date_time = timestamp.strftime("%Y%m%d_%H%M")
        task_name = _extract_task_name(task_text)
        filename = f"{date_time}_{task_name}_team_state.json"

        # Get conversation history from team
        if hasattr(team, "message_thread") and team.message_thread:
            # Convert messages to serializable format
            messages = []
            for msg in team.message_thread:
                msg_dict = {
                    "source": getattr(msg, "source", "Unknown"),
                    "content": getattr(msg, "content", str(msg)),
                    "type": type(msg).__name__,
                    "timestamp": timestamp.isoformat(),
                }
                messages.append(msg_dict)

            # Save state
            state_data = {
                "task": task_text,
                "timestamp": timestamp.isoformat(),
                "messages": messages,
                "token_stats": get_token_stats(),
                "message_count": len(messages),
            }

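            # Write to file. Sets (e.g. processed_event_ids inside the token stats) and
            # non-string message content are not JSON-native, so convert them explicitly.
            filepath = Path(filename)
            with open(filepath, "w", encoding="utf-8") as f:
                json.dump(
                    state_data,
                    f,
                    indent=2,
                    ensure_ascii=False,
                    default=lambda o: list(o) if isinstance(o, (set, frozenset)) else str(o),
                )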

            logger.info(f"💾 Team state saved: {filename}")
            return filename

        else:
            logger.warning("⚠️ No conversation history available to save")
            return None

    except Exception as e:
        logger.error(f"❌ Failed to save team state: {str(e)}")
        return None


def list_team_state_files() -> List[Dict[str, Any]]:
    """
    List all available team state files in the current directory.

    Returns:
        List of dicts with file info: filename, timestamp, task_preview, size
    """
    try:
        state_files = []
        current_dir = Path(".")

        # Find all team state JSON files
        for file_path in current_dir.glob("*_team_state.json"):
            try:
                # Get file stats
                stat = file_path.stat()

                # Try to read basic info from file
                with open(file_path, "r", encoding="utf-8") as f:
                    data = json.load(f)

                file_info = {
                    "filename": file_path.name,
                    "timestamp": data.get("timestamp", "Unknown"),
                    "task_preview": (
                        data.get("task", "")[:100] + "..."
                        if len(data.get("task", "")) > 100
                        else data.get("task", "")
                    ),
                    "message_count": data.get("message_count", 0),
                    "size_kb": round(stat.st_size / 1024, 1),
                    "modified": datetime.fromtimestamp(stat.st_mtime).strftime(
                        "%Y-%m-%d %H:%M"
                    ),
                }
                state_files.append(file_info)

            except Exception as e:
                logger.warning(
                    f"⚠️ Could not read state file {file_path.name}: {str(e)}"
                )
                continue

        # Sort by timestamp (newest first)
        state_files.sort(key=lambda x: x["timestamp"], reverse=True)

        logger.info(f"📋 Found {len(state_files)} team state files")
        return state_files

    except Exception as e:
        logger.error(f"❌ Failed to list team state files: {str(e)}")
        return []


async def resume_from_saved_state(filename: Optional[str] = None) -> bool:
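    """
    Load a saved team state file and restore its token accounting.

    Only the token statistics are restored. The saved messages are read for
    logging but are not replayed into the team.

    Args:
        filename: Specific state file to load. If None, uses the most recent.

    Returns:
        bool: True if the file was loaded successfully, False otherwise
    """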
    try:
        if filename is None:
            # Find most recent state file
            state_files = list_team_state_files()
            if not state_files:
                logger.warning("⚠️ No team state files found to resume from")
                return False
            filename = state_files[0]["filename"]

        # Load the state file
        filepath = Path(filename)
        if not filepath.exists():
            logger.error(f"❌ State file not found: {filename}")
            return False

        with open(filepath, "r", encoding="utf-8") as f:
            state_data = json.load(f)

        # Reset current token accounting
        reset_token_accounting()

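        # Restore token stats if available
        if "token_stats" in state_data:
            global _token_accounting
            _token_accounting.update(state_data["token_stats"])
            # JSON has no set type, so rebuild the ID collection as a set;
            # log_event_enhanced() calls .add() on it
            _token_accounting["processed_event_ids"] = set(
                _token_accounting.get("processed_event_ids", [])
            )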

        logger.info(f"✅ Successfully loaded state from {filename}")
        logger.info(f"📋 Task: {state_data.get('task', 'Unknown')[:100]}...")
        logger.info(f"💬 Messages: {state_data.get('message_count', 0)}")
        logger.info("🔄 Token stats restored; note that run_task() starts a fresh run and resets them")

        return True

    except Exception as e:
        logger.error(f"❌ Failed to resume from state file {filename}: {str(e)}")
        return False


async def safe_resume() -> bool:
    """
    Safely resume from the most recent state file with error handling.

    Returns:
        bool: True if resumed successfully, False otherwise
    """
    try:
        return await resume_from_saved_state()
    except Exception as e:
        logger.error(f"❌ Safe resume failed: {str(e)}")
        return False


def log_event_enhanced(event, event_count: int = 0) -> bool:
    """
    Enhanced agent-aware logging with centralized token accounting and API call tracking.

    Args:
        event: The event object from the team stream
        event_count: Total number of events processed

    Returns:
        bool: True if this consumed tokens (measurable API usage), False otherwise
    """
    global _token_accounting

    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    event_id = getattr(event, "id", f"event_{event_count}")
    event_type = getattr(event, "type", type(event).__name__)
    consumed_tokens = False  # More accurate than "is_llm_call"

    # Agent-specific emojis and formatting
    agent_config = {
        "Planner": {"emoji": "🧭", "role": "PLANNER"},
        "Researcher": {"emoji": "🔍", "role": "RESEARCHER"},
        "Critic": {"emoji": "⚖️", "role": "CRITIC"},
        "Editor": {"emoji": "✍️", "role": "EDITOR"},
    }

    # Get agent info
    source = getattr(event, "source", "Unknown")
    config = agent_config.get(source, {"emoji": "🤖", "role": "UNKNOWN"})

    # CENTRALIZED TOKEN ACCOUNTING - only count once per unique event
    current_prompt = 0
    current_completion = 0
    current_total = 0

    if (
        hasattr(event, "models_usage")
        and event.models_usage
        and event_id not in _token_accounting["processed_event_ids"]
    ):
        current_prompt = event.models_usage.prompt_tokens
        current_completion = event.models_usage.completion_tokens
        current_total = current_prompt + current_completion

        # Update global running totals ONLY for new events
        _token_accounting["total_prompt_tokens"] += current_prompt
        _token_accounting["total_completion_tokens"] += current_completion
        _token_accounting["total_tokens"] += current_total

        # Mark this event as processed
        _token_accounting["processed_event_ids"].add(event_id)

        # This consumed measurable tokens
        _token_accounting["llm_call_count"] += 1
        consumed_tokens = True

        logger.debug(
            f"💰 Token accounting: Event {event_id} - prompt:{current_prompt}, completion:{current_completion}, total:{current_total}"
        )

    # Determine event classification for logging
    has_tool_calls = hasattr(event, "tool_calls") and event.tool_calls
    is_agent_selection = (
        "select" in event_type.lower() or "speaker" in event_type.lower()
    )

    # Better event classification
    if consumed_tokens:
        call_info = f"Token-Consuming Call #{_token_accounting['llm_call_count']}"
    elif has_tool_calls:
        call_info = "Tool Execution"
    elif is_agent_selection:
        call_info = "Agent Selection Logic"
    else:
        call_info = "Internal Processing"

    line1 = f"Agent: {config['emoji']} {config['role']} | Type: {event_type} | {call_info} | Total Events: #{event_count}"
    line2 = ""  # Reserved for future functional info

    if consumed_tokens and current_total > 0:
        # Show current event tokens + running totals for token-consuming calls
        line3 = f"Tokens: Current[prompt:{current_prompt}, completion:{current_completion}, total:{current_total}] → Running Total: {_token_accounting['total_tokens']}"
    elif current_total > 0:
        # Edge case: has tokens but not marked as token-consuming calls (investigation needed)
        line3 = f"⚠️ Tokens: {current_total} (has tokens but not marked as token-consuming) → Running Total: {_token_accounting['total_tokens']}"
    else:
        # No tokens for this event (normal for non-token-consuming events) - but show running total
        line3 = f"Tokens: 0 (no new tokens) → Running Total: {_token_accounting['total_tokens']}"

    line4 = f"Timestamp: {timestamp} | Event ID: {event_id}"

    # Format the log output with enhanced visual structure
    separator_line = "━" * 80
    content_preview = str(getattr(event, "content", str(event)))

    logger.info(
        f"""
{separator_line}
Event:{event}
{separator_line}
{line1}
{line2}
{line3}
{line4}
{separator_line}
{content_preview}
{separator_line}
"""
    )

    return consumed_tokens


In [None]:
## SYSTEM PROMPTS

### References
# https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/claude-4-best-practices
# https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/prompt-engineering-for-openai%E2%80%99s-o1-and-o3-mini-reasoning-models/4374010
# https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
today_str = datetime.now().strftime("%Y-%m-%d")

PLANNER_SYSTEM_PROMPT = dedent(
    f"""Strategic Research Planner. Convert user queries into executable research specifications. Today is {today_str}.

CORE FUNCTION: Evaluate planning approaches and select the best research strategy.
- Analyze user query for multiple possible research approaches
- Identify the optimal research approach for the query; consult Critic if needed
- Structure the chosen approach into executable research questions
- Set scope and temporal boundaries for the research process
- PLANNING ONLY: No research execution or evidence assessment

PLAN TEMPLATE: Produce a methodical plan focusing on clear, practical steps.
## Identify Key Objectives
 - Clarify what questions each option aims to answer
 - Detail the data/info needed for evaluation
## Specify Expected Outcomes
 - Possible findings or results
## Provide Evaluation Criteria
 - Metrics, benchmarks, or qualitative factors to compare options
 - Criteria for success or viability

OUTPUT SIGNALS:
- "PLAN_CREATED → @Critic" (initial)
- "PLAN_REVISED → @Critic" (after revision)  
- "PLAN_APPROVED" (ONLY after CRITIC "APPROVED")
- "PLAN_ABANDONED: [reason]" (after 5 attempts)

RESPONSE PROTOCOLS:
- After CRITIC "APPROVED: [...]" → Immediately respond with "PLAN_APPROVED"
- After CRITIC "NEEDS_REFINEMENT: [...]" or "MAJOR_GAPS: [...]" → Revise plan → "PLAN_REVISED → @Critic"
- Never declare "PLAN_APPROVED" without explicit CRITIC approval
- Count revision attempts (max 5 before "PLAN_ABANDONED")

COMPLETION SIGNALS:
- MUST wait for CRITIC approval signal before proceeding
- MUST declare "PLAN_APPROVED" to enable Researcher handoff
"""
)

RESEARCHER_SYSTEM_PROMPT = dedent(
    f"""Senior Research Analyst. Execute targeted research using wiki_search and web_research tools with strict resource management. Today is {today_str}.

CORE FUNCTION:
- Execute the approved research questions and gather evidence, respecting the specified source thresholds and search heuristics

SIMPLIFIED TOOL DECISION TREE:
- HISTORICAL/DEFINITION/ESTABLISHED FACT → wiki_search only
   Example: "What is", "History of", "Definition", "Background of"
- CURRENT DATA/RECENT TRENDS/MARKET DATA/NEWS → web_research (comprehensive search + content fetching)
   Example: Contains "2024", "2025", "current", "latest", "recent", "market", "stock price", "financial"

TOOL DETAILS:
- wiki_search: Returns Wikipedia summaries for established facts and background
- web_research: Performs complete workflow (web search + fetches top 3 URLs with full content). 

SEARCH HEURISTICS:
- Include year for current topics: "AI adoption 2025"
- Geographic specificity: "EV sales Europe 2025"
- For contested topics, prioritize diverse source perspectives over source volume
- Avoid paid sources, PDF, CSV, and other non-HTML content
- Source priority: Primary > Institutional > Peer-reviewed > News

EFFICIENCY RULES:
- SUFFICIENT_EVIDENCE: Stop when confidence threshold reached
- DIMINISHING_RETURNS: Automatically terminate if <20% new info across 2 searches
- CIRCULAR_DETECTION: Automatically terminate if 3+ searches share >70% term overlap.
- QUALITY_OVER_QUANTITY: 3 high-quality sources > 10 weak sources

OUTPUT:
- Group findings by research theme
- Include source URLs and publication dates
- Flag incomplete evidence areas
- NO quality assessment (CRITIC's role)

COMPLETION SIGNALS:
- Evidence thresholds met → "RESEARCH_COMPLETE → @Critic"
- After CRITIC "APPROVED: [...]" → "RESEARCH_APPROVED"
- Data inaccessible → "RESEARCH_BLOCKED: [reason]"
- New-info threshold not met → "DIMINISHING_RETURNS"
"""
)

CRITIC_SYSTEM_PROMPT = dedent(
    f"""Domain-agnostic Quality Assurance Expert. Systematic evaluation of research deliverables before stakeholder handoff. Today is {today_str}.

CORE FUNCTION:
- PLAN_REVIEW: Assess feasibility, scope clarity, domain expertise, tool alignment, temporal boundaries, success metrics.
- RESEARCH_REVIEW: Check completeness, clarity, strategic value, temporal accuracy.
- FINAL_REPORT_REVIEW: Evaluate actionability, clarity, and stakeholder value.

QUALITY THRESHOLDS:
Apply context-appropriate rigor - preliminary plans need different depth than final strategic recommendations. Balance thoroughness with practical constraints.
- HIGH: Multiple credible sources, robust methodology
- MEDIUM: Single credible source or minor limitations
- LOW: Preliminary evidence or significant gaps
- INSUFFICIENT: Below minimum threshold for meaningful analysis

OUTPUT SIGNALS:
- "APPROVED: [key strengths that justify approval]"
- "NEEDS_REFINEMENT: [primary issues] → [suggested improvements]"
- "MAJOR_GAPS: [critical deficiencies] → [required additions]"

OPERATIONAL BOUNDARIES:
- MAX_RESPONSE: 300 words
- NO_RESEARCH: Quality assessment only
- REVIEW_ONLY: No unsolicited feedback"""
)

EDITOR_SYSTEM_PROMPT = dedent(
    f"""Senior Strategy Consultant. Transform research into executive-ready reports. Today is {today_str}.

**MISSION**: Create actionable business intelligence for C-suite decision-makers.

**WRITING APPROACH**: Write like a seasoned consultant - weave evidence strength naturally into your analysis. Strong data gets assertive language, limited data gets qualified language.

**EVIDENCE LANGUAGE**:
- Strong evidence: "Our analysis demonstrates...", "Data consistently show...", "Evidence confirms..."
- Moderate evidence: "Available data suggest...", "Analysis indicates...", "Current evidence points to..."
- Limited evidence: "Preliminary findings suggest...", "Early indicators point to...", "Initial analysis suggests..."

**STRUCTURE** (adapt based on content):
```
# [Strategic Title - Business Impact]

## Executive Summary
**Key Finding**: [Insight with natural evidence qualifier]
**Business Impact**: [Revenue/risk/opportunity] 
**Recommended Action**: [Specific next steps]

## Strategic Analysis
[Core findings - evidence strength flows naturally in narrative]

### Market Dynamics / Competitive Position / Implementation Strategy
[Use relevant sections - uncertainty woven into analysis naturally]

## Recommendations  
### Immediate (0-6 months) / Strategic (6-18 months)

## Key Uncertainties & Next Steps
[Natural discussion of data gaps and mitigation approaches]

---
**Sources**: [With publication dates]
```

**LANGUAGE**: Business-focused with natural evidence qualifiers. Every strategic claim includes source attribution and organic uncertainty assessment.

**COMPLETION TRIGGERS**: Generate report → save_report() → "REPORT_SAVED. TASK_COMPLETE. TERMINATE"
"""
)


In [None]:
## AGENT SETUP


planner = AssistantAgent(
    name="Planner",
    model_client=gemini_model_client,
    system_message=PLANNER_SYSTEM_PROMPT,
    description="Creates and adapts research plans, handles replanning when research hits obstacles",
)

researcher = AssistantAgent(
    name="Researcher",
    description="Expert research agent that strategically uses wiki_search and web_research tools to gather comprehensive and factual evidence.",
    model_client=gemini_model_client,
    tools=[wiki_search_tool, web_research_tool],
    max_tool_iterations=1,
    reflect_on_tool_use=True,
    system_message=RESEARCHER_SYSTEM_PROMPT,
)

critic = AssistantAgent(
    name="Critic",
    model_client=gemini_model_client,
    system_message=CRITIC_SYSTEM_PROMPT,
    description="Reviews and provides constructive criticism; outputs 'APPROVED: [...]' when ready.",
)


editor = AssistantAgent(
    name="Editor",
    description="Formats approved drafts with proper citations, adapts structure to content type.",
    model_client=gemini_model_client,
    tools=[save_report_tool],
    system_message=EDITOR_SYSTEM_PROMPT,
)

__SELECTOR_PROMPT = """Smart agent selector with task progression awareness.

AGENT CAPABILITIES:
{roles}

Available agents: 
{participants}

Conversation history: 
{history}

QUALITY CONTROL FLOW:
1. Query → Planner (creates plan)
2. Plan → Critic (feasibility review) 
3. Approved Plan → Researcher (execution)
4. Research → Critic (quality review)
5. Approved Research → Editor (final report)

ROUTING LOGIC:
- No plan → Planner
- "PLAN_CREATED" or "PLAN_REVISED" → Critic
- "PLAN_APPROVED" → Researcher
- "RESEARCH_COMPLETE" → Critic
- "RESEARCH_APPROVED" → Editor
- "REPORT_SAVED" → TERMINATE

QUALITY GATES: All plans and research require CRITIC approval

Return only the next agent name."""


# Team configuration with constants
max_messages = TASK_TERMINATION_MAX_MESSAGES
txt_termination = TextMentionTermination("TERMINATE")
termination_condition = (
    MaxMessageTermination(max_messages=max_messages) | txt_termination
)

# Build SelectorGroupChat

model_context = BufferedChatCompletionContext(buffer_size=CONVERSATION_BUFFER_SIZE)

team = SelectorGroupChat(
    name="Deep Research Team",
    description="A team of specialized agents working together to conduct deep research.",
    model_context=model_context,
    participants=[planner, researcher, critic, editor],
    model_client=gemini_model_client,
    selector_prompt=__SELECTOR_PROMPT,
    termination_condition=termination_condition,
    emit_team_events=True,
)


In [None]:
## SETUP TASK RUN


# @tenacity.retry(
#     wait=tenacity.wait_exponential(multiplier=1, min=60, max=120),
#     stop=tenacity.stop_after_attempt(2),
#     retry=tenacity.retry_if_exception_type(Exception),
# )
async def run_task(task_text: str) -> Optional[str]:
    """
    Execute a multi-agent research task with robust termination handling.

    All token accounting is handled by log_event_enhanced().
    Uses determine_event_delay() utility for clean delay logic.

    Args:
        task_text (str): The research task description

    Returns:
        str | None: Final report content or None if task incomplete
    """
    final_report = None
    task_completed = False

    # Simple local variables
    event_count = 0
    tool_call_count = 0

    # Reset token accounting for new task
    reset_token_accounting()

    # Streamlined termination signals - agents handle termination explicitly
    termination_signals = [
        "TERMINATE",
        "REPORT_SAVED",
        "TASK_COMPLETE",
    ]

    try:
        logger.info(f"🚀 Starting task: {task_text}\n\n")

        async for event in team.run_stream(task=task_text):

            # Increment total event counter
            event_count += 1

            # Centralized logging with token accounting - returns if this consumed tokens
            consumed_tokens = log_event_enhanced(event, event_count)

            # Track tool calls for statistics only
            if hasattr(event, "tool_calls") and event.tool_calls:
                tool_call_count += len(event.tool_calls)

            # Check for termination signals in message content
            if hasattr(event, "content") and isinstance(event.content, str):
                content_upper = event.content.upper()

                # Check for any termination signal
                for signal in termination_signals:
                    if signal.upper() in content_upper:
                        final_report = event.content.split(signal, 1)[0].strip()
                        task_completed = True

                        # Get final stats from token accounting
                        stats = get_token_stats()
                        logger.info(
                            f"\n\n✅ Task completed with '{signal}' signal after {event_count} total events ({stats['llm_call_count']} token-consuming calls)"
                        )
                        logger.info(
                            f"📊 Final Token Usage - Total: {stats['total_tokens']} (prompt: {stats['total_prompt_tokens']}, completion: {stats['total_completion_tokens']})"
                        )
                        break

                if task_completed:
                    break

            # Apply simplified delay logic using utility function
            delay = determine_event_delay(event, event_count)

            if delay == API_CALL_DELAY_SECONDS:
                stats = get_token_stats()
                logger.info(
                    f"\n\n⏳ API-bound operation detected (event #{event_count}), waiting {delay} seconds. (Token-consuming calls: {stats['llm_call_count']})\n\n"
                )
            else:
                logger.debug(
                    f"\n\n💨 Internal processing event #{event_count}, brief pause ({delay}s)\n\n"
                )

            await asyncio.sleep(delay)

        # Log final totals and statistics
        if not task_completed:
            stats = get_token_stats()
            logger.warning(
                f"⚠️ Task reached end of stream without clear termination after {event_count} events ({stats['llm_call_count']} token-consuming calls)"
            )

        # Final statistics logging
        stats = get_token_stats()
        logger.info(
            f"""
📊 TASK COMPLETION STATISTICS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Task Completed: {task_completed}
🔢 Total Events: {event_count}
🤖 Token-Consuming API Calls: {stats['llm_call_count']}
🔧 Tool Calls: {tool_call_count}
💰 Tokens Used: {stats['total_tokens']} (prompt: {stats['total_prompt_tokens']}, completion: {stats['total_completion_tokens']})
🔄 Processed Unique Events: {len(stats['processed_event_ids'])}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
"""
        )

        return final_report if task_completed else None

    except Exception as e:
        logger.error(f"\n\n❌ Exception during task execution: {e}")
        stats = get_token_stats()
        logger.info(
            f"📊 Error at Event #{event_count} (Token-consuming calls: {stats['llm_call_count']}), Token Usage: {stats['total_tokens']} (prompt: {stats['total_prompt_tokens']}, completion: {stats['total_completion_tokens']})"
        )
        raise

    finally:
        await _save_team_state(task_text)


In [None]:
## RUN TASK
nest_asyncio.apply()

if __name__ == "__main__":

    __SAMPLE_QUERY = [
        "Indian steel sector growth post modernization and growth prospects in an era of US tariffs and reduced government protection through trade barriers and cheaper import options from China",
        "How government and governance factors improved the economy and lives of Indians during the Modi and pre-Modi eras, starting from 1991",
        "Sectoral growth based on cyclics for 2025 and macro economic pressure and trade tariffs and uncertainty, which sectors are best poised for maximum investment returns in terms of % for the next year for a moderate to average risk profile investments",
        "Hyperscaler investments in data centers and cloud infrastructure for AI growth are not matching the proposed productivity gains in GDP. Are we witnessing a bubble? Does the % of global GDP being invested in AI infrastructure match the productivity gain percentages?",
        "If neural networks are the foundation of LLMs and are modeled on the human brain, are LLMs given tools during training? Humans learn through tool usage. What are the current trends in tool usage in LLM training?",
        f"Today is {today_str}. Analyze stock price performance of Nvidia in the past month, compare it with top 3 listed POWER Producers in India.",
        "Evaluating the long-term viability of green hydrogen as a baseload power source in India given current electrolyzer costs, renewable energy tariffs, grid stability constraints, and projected policy support through 2035",
        "Assessing whether India’s 2025-30 urban housing shortage can be resolved through large-scale 3D-printed construction without triggering systemic risk in NBFC and banking balance sheets exposed to real estate credit",
        "Quantifying the impact of EU CBAM (Carbon Border Adjustment Mechanism) on India’s MSME-dominated textile export clusters in Tiruppur and Surat, including cascading effects on informal employment and regional GDP",
        "Mapping supply-chain chokepoints for critical rare-earth elements (Neodymium, Dysprosium) essential for India’s EV and renewable energy targets, and evaluating geopolitical fallback strategies if China restricts exports",
        "Determining whether India’s Unified Payments Interface (UPI) can scale to serve as the backbone for a sovereign digital currency (CBDC) while preserving offline transaction capability and financial inclusion in rural hinterlands",
        "Analyzing if the projected 2025-30 growth in India’s domestic semiconductor consumption can justify the capital intensity of new fabs without sustained government subsidies and tax incentives that crowd out social-sector spending",
        "Investigating whether India’s demographic dividend can offset the fiscal drag from rising health-care costs driven by lifestyle diseases, by modeling the combined effect of PM-JAY coverage expansion and private insurance penetration",
        "Examining the systemic risk posed to Indian mutual funds and pension portfolios from concentrated exposure to Adani Group entities under evolving ESG disclosure norms and potential climate-litigation scenarios",
        "Evaluating the comparative efficiency of India’s inland waterways versus dedicated freight corridors in reducing logistics costs for bulk commodities, while accounting for seasonal monsoon disruptions and inter-state regulatory friction",
        "Determining if open-source foundation-model ecosystems (e.g., BLOOM, LLaMA derivatives) can reduce India’s reliance on proprietary LLM APIs, and measuring the incremental TCO of sovereign GPU clusters versus foreign cloud dependency",
    ]
    try:

        output = asyncio.run(run_task(random.choice(__SAMPLE_QUERY)))
        if output:
            print("\n" + "=" * 60)
            print("🎯 FINAL REPORT")
            print("=" * 60)
            print(output)
        else:
            print("❌ Task did not complete successfully.")
    except Exception as e:
        logger.error(f"❌ Fatal error: {e}")
        print(f"❌ Execution failed: {e}")

    # Uncomment to test:
    # file_info = list_team_state_files()
    # await resume_from_saved_state("20250824_1430_green_hydrogen_viability_team_state.json")
    # await safe_resume()


In [None]:
## TEAM STATE MANAGEMENT EXAMPLES

# Example: List all available team state files
print("📋 Available Team State Files:")
print("=" * 50)
file_info = list_team_state_files()

if file_info:
    for info in file_info:
        print(f"📄 {info['filename']}")
        print(f"   📅 Created: {info['timestamp']}")
        print(f"   💬 Messages: {info['message_count']}")
        print(f"   📝 Task: {info['task_preview']}")
        print(f"   💾 Size: {info['size_kb']} KB")
        print()
else:
    print("No team state files found.")

print("\n🔄 Resume Functions Available:")
print("=" * 50)
print("• await resume_from_saved_state() - Resume from most recent state file")
print("• await resume_from_saved_state('filename.json') - Resume from specific file")
print("• await safe_resume() - Resume with error handling")

# Example usage (commented out - uncomment to use):
# await resume_from_saved_state()  # Resume from most recent
# await resume_from_saved_state("20250824_1430_green_hydrogen_viability_team_state.json")  # Specific file
# await safe_resume()  # Safe resume with error handling
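
The resume helpers in this notebook restore token statistics only. AutoGen AgentChat also documents `save_state()` and `load_state()` coroutines on teams, which could be used to persist and restore the conversation itself. The sketch below is a minimal illustration under that assumption: `persist_team_state`, `restore_team_state`, `FakeTeam`, and `roundtrip_demo` are hypothetical names that are not part of this notebook, and `FakeTeam` only stands in for a real team so the example runs without AutoGen installed. Check that your installed version provides both coroutines before relying on this.

```python
import asyncio
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


async def persist_team_state(team: Any, path: Path) -> Path:
    """Write the team's own state mapping to a JSON file."""
    # Assumption: `team` exposes an async save_state() returning a JSON-friendly mapping
    state = await team.save_state()
    path.write_text(
        json.dumps(state, indent=2, ensure_ascii=False, default=str),
        encoding="utf-8",
    )
    return path


async def restore_team_state(team: Any, path: Path) -> None:
    """Read a JSON state file and hand it back to the team."""
    # Assumption: `team` exposes an async load_state(state)
    state = json.loads(path.read_text(encoding="utf-8"))
    await team.load_state(state)


class FakeTeam:
    """Hypothetical stand-in for a real team, used only to exercise the helpers."""

    def __init__(self, state: Dict[str, Any]) -> None:
        self._state = dict(state)

    async def save_state(self) -> Dict[str, Any]:
        return dict(self._state)

    async def load_state(self, state: Dict[str, Any]) -> None:
        self._state = dict(state)


async def roundtrip_demo() -> Dict[str, Any]:
    """Save one team's state to disk and load it into a second, empty team."""
    original = FakeTeam({"messages": [{"source": "Planner", "content": "PLAN_CREATED"}]})
    restored = FakeTeam({})
    with tempfile.TemporaryDirectory() as tmp:
        path = await persist_team_state(original, Path(tmp) / "team_state.json")
        await restore_team_state(restored, path)
    return await restored.save_state()


if __name__ == "__main__":
    print(asyncio.run(roundtrip_demo()))
```

In a notebook cell, the same helpers could be awaited directly, for example `await persist_team_state(team, Path("team_state.json"))`, and `restore_team_state` would be called before starting the next run.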


> **Note:** All values are approximate and reflect estimates as of **August 2025**. Word, token, sentence, and character counts can vary by writing style and tokenization method.

- Tokens calculated as `words ÷ 0.75`.
- Sentences calculated as `words ÷ 15–20`.
- Characters calculated as `words × 4–5`.

| Article Type / Measure         | Approx Words | Approx Sentences | Approx Tokens | Approx Characters | Notes                               | Suggested Summary (Tokens / Chars)     |
| ------------------------------ | ------------ | ---------------- | ------------- | ----------------- | ----------------------------------- | -------------------------------------- |
| Short text (60 tokens)         | ~45          | ~2–3             | ~60           | ~180–225          | Short paragraph / social media post | ~10 tokens / ~40–50 chars              |
| Blog article                   | 1,000–1,800  | ~50–120          | ~1,300–2,400  | ~4,000–9,000      | Typical online blog length          | ~150–360 tokens / ~400–900 chars       |
| NYT Op-Ed                      | 800–1,200    | ~40–80           | ~1,050–1,600  | ~3,200–6,000      | Opinion/editorial piece             | ~100–240 tokens / ~320–600 chars       |
| Research article (web/journal) | 3,000–7,000  | ~150–470         | ~4,000–9,300  | ~12,000–35,000    | Standard journal or web publication | ~400–1,400 tokens / ~1,200–3,500 chars |
| arXiv preprint                 | 6,000–12,000 | ~300–800         | ~8,000–16,000 | ~24,000–60,000    | Preprint scientific paper           | ~800–1,600 tokens / ~2,400–6,000 chars |
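
The rules of thumb listed above the table can be expressed as a small helper. This is an illustrative sketch only: `estimate_length` and `LengthEstimate` are hypothetical names that do not appear elsewhere in this notebook, and the ratios are the approximations from the note, not the output of a real tokenizer.

```python
from dataclasses import dataclass
from typing import Tuple

WORDS_PER_TOKEN = 0.75               # tokens = words / 0.75
WORDS_PER_SENTENCE = (15, 20)        # sentences = words / 15 to 20
CHARS_PER_WORD = (4, 5)              # characters = words * 4 to 5


@dataclass(frozen=True)
class LengthEstimate:
    words: int
    tokens: int
    sentences: Tuple[int, int]       # (low, high)
    characters: Tuple[int, int]      # (low, high)


def estimate_length(words: int) -> LengthEstimate:
    """Apply the approximate ratios to a word count."""
    if words < 0:
        raise ValueError("words must be non-negative")
    short_sentence, long_sentence = WORDS_PER_SENTENCE
    few_chars, many_chars = CHARS_PER_WORD
    return LengthEstimate(
        words=words,
        tokens=round(words / WORDS_PER_TOKEN),
        # Longer sentences give the low count, shorter sentences the high count
        sentences=(round(words / long_sentence), round(words / short_sentence)),
        characters=(words * few_chars, words * many_chars),
    )


if __name__ == "__main__":
    print(estimate_length(45))
```

For the 45-word "short text" row, this yields 60 tokens, 2 to 3 sentences, and 180 to 225 characters, matching the table.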
