## Example Notebook: Using VideoAgent

### 🛠️ Setup Instructions

Before running this notebook:

- Create a `.env` file in the **same directory** as this notebook. It must contain every environment variable that `VideoAgent` expects (e.g., API keys and configuration values).
- Make sure all required libraries are installed by running:

  ```bash
  pip install -r requirements.txt
  ```
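
Before going further, it can help to confirm that the `.env` file actually defines the variables you expect. The snippet below is a minimal sketch using only the standard library. The names in `REQUIRED_KEYS` are hypothetical placeholders, not the variables `VideoAgent` really reads, so replace them with the ones your configuration needs.

```python
from pathlib import Path

# Hypothetical variable names for illustration only; replace them with
# the variables your VideoAgent configuration actually requires.
REQUIRED_KEYS = ["EXAMPLE_LLM_ENDPOINT", "EXAMPLE_SEARCH_ENDPOINT"]


def read_env_keys(text: str) -> set[str]:
    """Return the variable names defined in .env-style text."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        # Tolerate the optional "export " prefix some .env files use.
        if name.startswith("export "):
            name = name[len("export "):].strip()
        keys.add(name)
    return keys


def missing_keys(text: str, required: list[str]) -> list[str]:
    """List the required variables that the .env text does not define."""
    defined = read_env_keys(text)
    return [key for key in required if key not in defined]


# Look for .env in the current working directory.
env_path = Path(".env")
if env_path.exists():
    print("Missing variables:", missing_keys(env_path.read_text(), REQUIRED_KEYS))
else:
    print("No .env file found in the current working directory.")
```

An empty list means every name in `REQUIRED_KEYS` is defined; it does not check that the values are valid.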

### About

**VideoAgent** operates in two key stages:

1. **Video Retrieval** – Given a user query, the agent retrieves relevant videos from a pre-ingested **Azure AI Search index**. This search ensures that only contextually relevant videos are passed on for deep analysis.

2. **Video Question Answering (QA)** – After retrieval, the agent uses the **Multi-Modal Critical Thinking (MMCT)** framework ([arxiv.org/abs/2405.18358](https://arxiv.org/abs/2405.18358)) to generate a high-quality answer. MMCT involves two agents:

   - **Planner**: Drives the reasoning process using a structured toolchain, generating an initial response.
   - **Critic (optional)**: Analyzes the planner's output and, if needed, provides feedback that prompts an improved final answer.

> **Note:** The critic agent is enabled by default. You can disable it by setting `use_critic_agent=False` during initialization.  
> **Disabling the critic agent skips the feedback loop and may reduce the accuracy of the final response.**

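The planner and critic interaction described above can be pictured as a single feedback round. The following is an illustrative sketch only, not the MMCT implementation: `answer_with_optional_critic`, `stub_planner`, and `stub_critic` are hypothetical names, and only the `use_critic_agent` flag comes from this notebook.

```python
from typing import Callable, Optional


def answer_with_optional_critic(
    query: str,
    planner: Callable[[str, Optional[str]], str],
    critic: Callable[[str, str], Optional[str]],
    use_critic_agent: bool = True,
) -> str:
    """Run the planner once, then let the critic request one revision."""
    draft = planner(query, None)  # initial response, no feedback yet
    if not use_critic_agent:
        return draft  # feedback loop skipped entirely
    feedback = critic(query, draft)  # None means the draft is accepted
    if feedback is None:
        return draft
    return planner(query, feedback)  # improved final answer


# Stub agents, purely for demonstration (assumptions, not real agents).
def stub_planner(query: str, feedback: Optional[str]) -> str:
    return f"answer to {query!r}" + (" (revised)" if feedback else "")


def stub_critic(query: str, draft: str) -> Optional[str]:
    return None if "(revised)" in draft else "add supporting evidence"


with_critic = answer_with_optional_critic("what is shown?", stub_planner, stub_critic)
without_critic = answer_with_optional_critic(
    "what is shown?", stub_planner, stub_critic, use_critic_agent=False
)
print(with_critic)     # answer to 'what is shown?' (revised)
print(without_critic)  # answer to 'what is shown?'
```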
---

### Tool Workflow

Rather than letting the agent pick tools independently, **VideoAgent uses a fixed pipeline** of tools that work together during the QA stage. The planner orchestrates these tools automatically:

- `GET_VIDEO_DESCRIPTION` – Extracts the full transcript and a high-level visual summary of the video.
- `QUERY_VIDEO_DESCRIPTION` – Given a query, identifies the **three timestamps** in the transcript that are most relevant.
- `QUERY_FRAMES_COMPUTER_VISION` _(optional)_ – Uses **Computer Vision** to return **three additional timestamps** related to the visual content of the query.
- `QUERY_VISION_LLM` – Uses a **vision LLM** to inspect video frames around the identified timestamps and generate a detailed response grounded in both visual and textual understanding.

By default, all tools are used in a coordinated pipeline. You can disable **only** the Computer Vision tool by setting `use_computer_vision_tool=False` during agent initialization.

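The ordering of the tools above can be sketched with stub functions. This is a simplified illustration under stated assumptions, not VideoAgent's internal code: the function signatures, the stub return values, and the choice to concatenate and sort the timestamps are all hypothetical. Only the tool names and the `use_computer_vision_tool` flag come from this notebook.

```python
from typing import Callable, List


def run_tool_pipeline(
    query: str,
    get_video_description: Callable[[], str],
    query_video_description: Callable[[str, str], List[float]],
    query_frames_computer_vision: Callable[[str], List[float]],
    query_vision_llm: Callable[[str, List[float]], str],
    use_computer_vision_tool: bool = True,
) -> str:
    """Call the tools in a fixed order and return the vision LLM's answer."""
    description = get_video_description()
    timestamps = list(query_video_description(description, query))
    if use_computer_vision_tool:
        # Optional step: add timestamps found from the visual content.
        timestamps += query_frames_computer_vision(query)
    # Sorting is an assumption made here for readability.
    return query_vision_llm(query, sorted(timestamps))


# Stub tools with made-up values, purely for demonstration.
stubs = dict(
    get_video_description=lambda: "transcript and visual summary",
    query_video_description=lambda description, query: [12.0, 47.5, 90.0],
    query_frames_computer_vision=lambda query: [15.0, 60.0, 120.0],
    query_vision_llm=lambda query, ts: f"inspected {len(ts)} timestamps: {ts}",
)

with_cv = run_tool_pipeline("who is speaking?", **stubs)
without_cv = run_tool_pipeline(
    "who is speaking?", use_computer_vision_tool=False, **stubs
)
print(with_cv)     # inspected 6 timestamps: [12.0, 15.0, 47.5, 60.0, 90.0, 120.0]
print(without_cv)  # inspected 3 timestamps: [12.0, 47.5, 90.0]
```

With the Computer Vision step disabled, the vision LLM only sees the transcript-derived timestamps.
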
### Importing Libraries


In [None]:
from mmct.video_pipeline import VideoAgent
import nest_asyncio
nest_asyncio.apply()

In [None]:
# Test the configuration first
try:
    from mmct.config.settings import MMCTConfig
    config = MMCTConfig()
    print("✅ Configuration loaded successfully")
    print(f"LLM Provider: {config.llm.provider}")
    print(f"LLM Endpoint: {config.llm.endpoint}")
    print(f"LLM Deployment: {config.llm.deployment_name}")
    print(f"Embedding Provider: {config.embedding.provider}")
    print(f"Embedding Endpoint: {config.embedding.endpoint}")
    print(f"Embedding Deployment: {config.embedding.deployment_name}")
except Exception as e:
    print(f"❌ Configuration failed: {e}")
    import traceback
    traceback.print_exc()

# Create VideoAgent instance
video_agent = VideoAgent(
    query="",
    index_name="",
    top_n=2,
    use_computer_vision_tool=False,
    use_critic_agent=True,
    stream=True,
)

# Run the agent
response = await video_agent()
print("VideoAgent executed successfully!")

In [None]:
# Display the response
print(response)