# Testing Recursive Agents MCP Tools

Using FastMCP's in-memory testing pattern with REAL companions and LLM calls.

## Understanding MCP Server Architecture & Streaming Implementations

### 🔄 The Complete Data Flow

When you run tests in this notebook, here's the complete journey of your data:

1. **Test Cell** → `await client.call_tool("draft", {"params": {...}})`
2. **FastMCP Client** → Wraps in JSON-RPC 2.0 protocol
3. **MCP Server** → Routes to appropriate tool (draft/critique/revise)
4. **Tool Function** → Processes with Recursive Agents framework
5. **Streaming** → Tokens flow back via `ctx.report_progress()`
6. **FastMCP Client** → Wraps response in `CallToolResult` object
7. **Test Cell** → Access via `result.data` dictionary

### 📊 Session Management

The MCP server maintains session state through `CompanionSessionManager`:

- **New Session**: When no `session_id` is provided, creates UUID v4
- **Session Persistence**: Each session maintains its own companion instance with:
  - Conversation history
  - Run logs (draft → critique → revision cycles)
  - Companion-specific context
- **TTL Management**: Sessions expire after 30 minutes of inactivity
- **Companion Types**: Each can have different sessions (generic, marketing, bug_triage, strategy)

### 🌊 Streaming Implementations: Draft vs Critique

#### **Draft Tool (Simple Streaming)**
```python
# Uses LangChain's native astream
async for chunk in chain.astream(...):
    if hasattr(chunk, 'content'):
        token = chunk.content
        await ctx.report_progress(progress=0, total=None, message=token)
```

**Characteristics:**
- ✅ Simple and direct
- ✅ Works well for straightforward streaming
- ⚠️ No backpressure handling
- ⚠️ No thread safety mechanisms
- ⚠️ Could overflow if tokens arrive too fast

#### **Critique Tool (Robust Streaming)**
```python
# Uses StreamingManager + BaseTokenCallback
stream_mgr = StreamingManager()  # Bounded queue
callback = BaseTokenCallback(stream_mgr, current_loop, phase="critique")
chain = comp.crit_chain.with_config(callbacks=[callback])

async with stream_mgr.stream_handler():
    run_task = asyncio.create_task(chain.ainvoke(...))
    async for tok in stream_mgr.stream_tokens():
        await ctx.report_progress(progress=0, total=None, message=tok)
```

**Characteristics:**
- ✅ Thread-safe (handles LangChain's worker threads)
- ✅ Backpressure handling (bounded queue)
- ✅ Graceful error recovery
- ✅ Proper cleanup with context manager
- ✅ Handles the `on_llm_end` callback properly
- 🔧 More complex but production-ready

### 🔌 JSON-RPC 2.0 & MCP Protocol

The Model Context Protocol uses JSON-RPC 2.0 for communication:

**Request Structure:**
```json
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "draft",
    "arguments": {
      "params": {
        "query": "Your question here",
        "companion_type": "generic"
      }
    }
  },
  "id": 1
}
```

**Response Structure:**
```json
{
  "jsonrpc": "2.0",
  "result": {
    "content": [{
      "type": "text",
      "text": "{\"answer\": \"...\", \"session_id\": \"...\"}"
    }]
  },
  "id": 1
}
```

### 🔍 Understanding Test Results

When you see:
```
Result type: <class 'fastmcp.client.client.CallToolResult'>
Data type: <class 'dict'>
```

This means:
1. FastMCP wrapped your response in a `CallToolResult` object
2. Your actual data is in `result.data` as a dictionary
3. No JSON parsing needed - FastMCP already parsed it for you!

### ⚠️ The Streaming Warnings

Those `WARNING:root:Failed to validate notification: object NoneType can't be used in 'await' expression` messages are:
- **NOT** errors in your code
- **NOT** affecting functionality
- Just FastMCP's internal validation having issues with high-frequency progress updates
- Will NOT appear when using Claude Code or other production MCP clients

### 🎯 Key Takeaways

1. **Both streaming approaches work** - Choose based on your reliability needs
2. **Session management is automatic** - MCP clients handle it intelligently
3. **The protocol is working correctly** - Warnings are just test client noise
4. **Your implementation is production-ready** - Will work seamlessly with Claude Code

In [1]:
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
import os
api_key_status = "Loaded" if os.getenv("OPENAI_API_KEY") else "NOT FOUND - Check your .env file and environment."
print(f"OpenAI API Key status: {api_key_status}")

OpenAI API Key status: Loaded


In [3]:
#import asyncio
from fastmcp import Client
import sys
from pathlib import Path
import re
import json

# Add parent directory to path if needed
# (Remove this if you installed recursive-agents with pip install -e .)
project_root = Path().resolve().parent
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

In [4]:
# Import the MCP server - check if file exists and has content
from recursive_agent_MCP.NEWST_server_v2 import mcp


### Understanding the Three Access Patterns

In the test below, we checked three ways to access the result:

**1. `if hasattr(result, 'data')` - ✅ Should WORK**
- FastMCP returns a `CallToolResult` object
- Your actual data is stored in `result.data` 
- This is why `result.data.get['draft']` and `result.data.get['session_id']` work

**2. `if hasattr(result, 'draft')` - ❌ Should NOT WORK**
- `result` doesn't have an `draft` attribute directly
- The draft is inside `result.data`, not on result itself
- This check would only work if FastMCP returned an object with `result.draft = "..."`

**3. `if isinstance(result, dict)` - ❌ Should NOT WORK**
- `result` is a `CallToolResult` object, not a dictionary
- You can't do `result['draft']` because result isn't a dict
- The dict is at `result.data`, not result itself

**Summary:** FastMCP consistently returns `CallToolResult` objects with your data in the `.data` attribute. Always use `result.data` to access your returned values.

In [5]:
async def test_draft_companion_types():
    """Test that different companions give different perspectives WITHOUT JSON loading."""

    query = "Our mobile app crashes when users upload photos"

    async with Client(mcp) as client:
        # Test each companion type
        for companion_type in ["generic"]: #, "marketing", "bug_triage", "strategy"]:
            print(f"\n{'='*60}")
            print(f"Testing {companion_type} companion...")

            result = await client.call_tool("draft", {
                "params": {
                    "query": query,
                    "companion_type": companion_type
                }
            })

            print("\n\n✅ Draft generated!")
            print(f"Result type: {type(result)}")
            
            # Access the data correctly
            if hasattr(result, 'data'):
                print(f"\nData type: {type(result.data)}")
                if isinstance(result.data, dict):
                    print(f"Session ID: {result.data.get('session_id', 'Not found')}")
                    print(f"Draft Preview (Should Be shown): {result.data.get('draft', 'Not found')[:150]}...")
            
            # Try direct access without JSON loading
            if hasattr(result, 'draft'):
                print("\nDirect access works!")
                print(f"Draft Preview (should NOT be shown): {result.draft[:150]}...")
            elif isinstance(result, dict):
                print("\nResult is a dict!")
                print(f"Draft preview (should NOT be shown): {result.get('draft', 'Not found')[:150]}...")

# Run the test
await test_draft_companion_types()


Testing generic companion...


✅ Draft generated!
Result type: <class 'fastmcp.client.client.CallToolResult'>

Data type: <class 'dict'>
Session ID: 35ef4bd3-9b29-4754-8357-3778a5a10742
Draft Preview (Should Be shown): The presented problem is that a mobile application experiences crashes specifically when users attempt to upload photos. 

Key issues identified in th...


In [6]:
async def test_draft_companion_types_json_load():
    """Same as above but messing around to inpsect all object types 
    and maunally loading json to truly understand dataclasses in the fastmcp ecosystem"""

    query = "Our mobile app crashes when users upload photos"

    async with Client(mcp) as client:
        # Test each companion type
        for companion_type in ["generic"]: #, "marketing", "bug_triage", "strategy"]:
            print(f"\n{'='*60}")
            print(f"Testing {companion_type} companion...")

            result = await client.call_tool("draft", {
                "params": {
                    "query": query,
                    "companion_type": companion_type
                }
            })

            print("\n\n✅ Draft generated!")
            print(f"Result type: {type(result)}")
            
            
            # Extract the response from CallToolResult
            if result.content and len(result.content) > 0:
                print(type(result))
                print(dir(result))
                print(f"\n\nlength of result.content:{len(result.content)}")
                content = result.content[0]
                print(type(content))
                print(content)
                #print(result.content)
                #print(content.text)

            if hasattr(content, 'text'):
                data = json.loads(content.text)
                print("\nDraft Response preview:")
                print("Keys and their types:")
                print(f"\nData type: {type(data)}\n")
                for key in data:
                    print(f"Key: {key}, Type: {type(key)}")
                print(f"Session ID: {result.data.get('session_id', 'Not found')}")
                
                print(f"\n{data['draft'][:150]}...")

# Run the test
await test_draft_companion_types_json_load()
#The key change is extracting the actual response data from the FastMCP CallToolResult
#object before trying to access ['draft'].


Testing generic companion...


✅ Draft generated!
Result type: <class 'fastmcp.client.client.CallToolResult'>
<class 'fastmcp.client.client.CallToolResult'>
['__annotations__', '__class__', '__dataclass_fields__', '__dataclass_params__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__firstlineno__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__match_args__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__replace__', '__repr__', '__setattr__', '__sizeof__', '__static_attributes__', '__str__', '__subclasshook__', '__weakref__', 'content', 'data', 'is_error', 'structured_content']


length of result.content:1
<class 'mcp.types.TextContent'>

Draft Response preview:
Keys and their types:

Data type: <class 'dict'>

Key: draft, Type: <class 'str'>
Key: session_id, Type: <class 'str'>
Session ID: df7f71ac-d8d6-49df-aa10-1871850cdbc9

The problem presented involve

In [7]:
async def test_critique_companion_types():
    """Test that different companions give different perspectives."""
    
    PREMADE_QUERY = "How do I implement a REST API?"

    PREMADE_DRAFT = """To implement a REST API, you should:
    1. Choose a framework (Express.js, Flask, FastAPI)
    2. Define your endpoints
    3. Implement CRUD operations
    4. Add authentication
    5. Handle errors properly
    6. Test your API"""
    
    async with Client(mcp) as client:
        # Test each companion type
        for companion_type in ["generic", "marketing"]: #, "bug_triage", "strategy"]:
            print(f"\n{'='*60}")
            print(f"Testing {companion_type} companion...")
            
            result = await client.call_tool("critique", {
                "params":{
                    "query": PREMADE_QUERY,
                    "draft": PREMADE_DRAFT,
                    "companion_type": companion_type
                }
            })
            
            print("\n\n✅ Critique generated!")
            # Access the data correctly
            if hasattr(result, 'data'):
                print(f"\nData type: {type(result.data)}")
                for key in result.data:
                    print(f"Key: {key}, Type: {type(key)}")
                print(result.data["critique"][:200])
                if isinstance(result.data, dict):
                    print(f"Session ID: {result.data.get('session_id', 'Not found')}")
                    print(f"Critique Preview: {result.data.get('critique', 'Not found')[:200]}...")
            
# Run the test
await test_critique_companion_types()


Testing generic companion...


✅ Critique generated!

Data type: <class 'dict'>
Key: critique, Type: <class 'str'>
Key: session_id, Type: <class 'str'>
1. **Lack of Explanation for Framework Selection**: The draft suggests choosing a framework but doesn’t provide guidance on how to make that choice. Including criteria for selection—such as project si
Session ID: ad1353e9-8c0c-4764-a94e-0801568dffd6
Critique Preview: 1. **Lack of Explanation for Framework Selection**: The draft suggests choosing a framework but doesn’t provide guidance on how to make that choice. Including criteria for selection—such as project si...

Testing marketing companion...


✅ Critique generated!

Data type: <class 'dict'>
Key: critique, Type: <class 'str'>
Key: session_id, Type: <class 'str'>
1. **Lack of Detail on Frameworks**: The draft mentions choosing a framework but does not provide any context or guidance on how to select one. Suggestion: Include a brief overview of each framework m
Session ID: afb4aba

In [None]:
async def test_draft_tool():
    """Test the draft tool with a real companion."""
    
    # Create in-memory client
    async with Client(mcp) as client:
        print("Testing draft tool...\n")
        
        # Progress handler to see streaming (fixed signature)
        def progress_handler(progress_token, progress, message):
            print(message, end="", flush=True)
        
        # Simple test query
        result = await client.call_tool("draft", {
            "params": {
                "query": "What are the main benefits of test-driven development?",
                "companion_type": "generic"
            }
        }, progress_handler=progress_handler)
        
        print("\n\n✅ Draft generated!")
        print(f"Result type: {type(result)}")
        
        # Access the data correctly
        if hasattr(result, 'data'):
            print(f"\nData type: {type(result.data)}")
            if isinstance(result.data, dict):
                print(f"Session ID: {result.data.get('session_id', 'Not found')}")
                print(f"Answer preview: {result.data.get('answer', 'Not found')[:200]}...")
        
        return result

# Run the test
draft_result = await test_draft_tool()

In [None]:
async def test_draft_tool():
    """Test the draft tool with a real companion."""
    
    # Create in-memory client
    async with Client(mcp) as client:
        print("Testing draft tool...\n")
        
        # Simple test query
        result = await client.call_tool("draft", {
            "params": {
                "query": "What are the main benefits of test-driven development?",
                "companion_type": "generic"
            }
        })

        
        print("✅ Draft generated!")
        print(f"Result type: {type(result)}")
        
        # Access the data correctly from CallToolResult
        if hasattr(result, 'data'):
            data = result.data
            if isinstance(data, dict):
                print(f"Session ID: {data.get('session_id', 'Not found')}")
                print("\nDraft content (first 200 chars):")
                print(f"{data.get('answer', 'Not found')[:200]}...")
                print(f"\nFull length: {len(data.get('answer', ''))} characters")
        
        return result

# Run the test
draft_result = await test_draft_tool()

In [None]:
async def test_draft_response_structure():
    """Test that draft tool returns correct MCP response structure."""
    
    async with Client(mcp) as client:
        print("Testing draft response structure...\n")
        
        result = await client.call_tool("draft", {
            "query": "Explain recursion in programming",
            "companion_type": "generic"
        })
        
        # Check structure
        assert isinstance(result, dict), "Response should be a dictionary"
        assert "answer" in result, "Response must contain 'answer' field"
        assert "session_id" in result, "Response must contain 'session_id' field"
        assert isinstance(result["answer"], str), "Answer must be a string"
        assert len(result["answer"]) > 0, "Answer cannot be empty"
        
        # Check session_id is UUID v4
        uuid_pattern = r'^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$'
        assert re.match(uuid_pattern, result["session_id"], re.IGNORECASE), \
            f"Session ID '{result['session_id']}' is not a valid UUID v4"
        
        print("✅ All structural checks passed!")
        print("  - Response is a dict with 'answer' and 'session_id'")
        print(f"  - Answer is non-empty string ({len(result['answer'])} chars)")
        print(f"  - Session ID is valid UUID v4: {result['session_id']}")
        
        return result

# Run structural validation
draft_validation_result = await test_draft_response_structure()

## Validate Draft Response Structure

Testing that the MCP server returns the correct format according to the protocol

## Test Different Companion Types

In [6]:
async def test_companion_types():
    """Test that different companions give different perspectives."""
    
    query = "Our mobile app crashes when users upload photos"
    
    async with Client(mcp) as client:
        # Test each companion type
        for companion_type in ["generic", "marketing", "bug_triage", "strategy"]:
            print(f"\n{'='*60}")
            print(f"Testing {companion_type} companion...")
            
            result = await client.call_tool("draft", {
                "query": query,
                "companion_type": companion_type
            })
            
            print("\nResponse preview:")
            print(f"{result['answer'][:150]}...")
            
# Run the test
await test_companion_types()


Testing generic companion...


ToolError: Input validation error: 'params' is a required property

## Verify Session Persistence

## Edge Cases

In [None]:
async def test_edge_cases():
    """Test edge cases like empty queries."""
    
    async with Client(mcp) as client:
        # Test with very short query
        try:
            result = await client.call_tool("draft", {
                "query": "Hi",
                "companion_type": "generic"
            })
            print("✅ Short query handled successfully")
            print(f"Response length: {len(result['answer'])} chars")
        except Exception as e:
            print(f"❌ Short query failed: {e}")
            
        # Test with special characters
        try:
            result = await client.call_tool("draft", {
                "query": "What does 2+2=4 mean? Also, explain $100 < $200",
                "companion_type": "generic"  
            })
            print("\n✅ Special characters handled successfully")
        except Exception as e:
            print(f"\n❌ Special characters failed: {e}")

# Run edge case tests
await test_edge_cases()

## Summary

The draft tool:
- ✅ Creates sessions automatically
- ✅ Works with all companion types
- ✅ Maintains session context
- ✅ Handles various input types

Next: Test the critique tool...

# Simple streaming test
PREMADE_QUERY = "How do I implement a REST API?"
PREMADE_DRAFT = """To implement a REST API, you should:
1. Choose a framework (Express.js, Flask, FastAPI)
2. Define your endpoints
3. Implement CRUD operations
4. Add authentication
5. Handle errors properly
6. Test your API"""

async def test_critique_streaming():
    """Test critique streaming."""
    print("Testing critique streaming...\n")
    
    async with Client(mcp) as client:
        # Collect all tokens
        tokens = []
        
        # Progress handler - FastMCP expects these 3 parameters
        async def progress_handler(progress_token, progress, message):
            tokens.append(message)
            print(message, end="", flush=True)
        
        result = await client.call_tool("critique", {
            "params": {
                "query": PREMADE_QUERY,
                "draft": PREMADE_DRAFT,
                "companion_type": "generic"
            }
        }, progress_handler=progress_handler)
        
        print(f"\n\n✅ Streamed {len(tokens)} tokens")
        return result

# Run it
await test_critique_streaming()

In [None]:
# Simple test with pre-made inputs
PREMADE_QUERY = "How do I implement a REST API?"

PREMADE_DRAFT = """To implement a REST API, you should:
1. Choose a framework (Express.js, Flask, FastAPI)
2. Define your endpoints
3. Implement CRUD operations
4. Add authentication
5. Handle errors properly
6. Test your API"""

async def test_critique_simple():
    """Simple critique test with pre-made inputs to see streaming."""
    print("Testing critique streaming with pre-made inputs...\n")
    
    async with Client(mcp) as client:
        # Progress handler to see streaming
        def progress_handler(progress_token, progress, message):
            print(message, end="", flush=True)
        
        result = await client.call_tool("critique", {
            "params": {
                "query": PREMADE_QUERY,
                "draft": PREMADE_DRAFT,
                "companion_type": "generic"
            }
        }, progress_handler=progress_handler)
        
        print("\n\n✅ Done!")
        return result

# Run it
await test_critique_simple()

In [None]:
async def test_critique_companion_types():
    """Test that different companions give different perspectives."""
    
    PREMADE_QUERY = "How do I implement a REST API?"

    PREMADE_DRAFT = """To implement a REST API, you should:
    1. Choose a framework (Express.js, Flask, FastAPI)
    2. Define your endpoints
    3. Implement CRUD operations
    4. Add authentication
    5. Handle errors properly
    6. Test your API"""
    
    async with Client(mcp) as client:
        # Test each companion type
        for companion_type in ["generic", "marketing", "bug_triage", "strategy"]:
            print(f"\n{'='*60}")
            print(f"Testing {companion_type} companion...")
            
            result = await client.call_tool("critique", {
                "params":{
                    "query": PREMADE_QUERY,
                    "draft": PREMADE_DRAFT,
                    "companion_type": companion_type
                }
            })
            
            # Extract the critique from the result
            if result.content and len(result.content) > 0:
                content = result.content[0]
                if hasattr(content, 'text'):
                    import json
                    data = json.loads(content.text)
                    critique_text = data.get('critique', 'No critique found')
                    print("\nCritique Response preview:")
                    print(f"{critique_text[:150]}...")
            
# Run the test
await test_critique_companion_types()

In [None]:
async def test_critique_with_timeout():
    """Test critique with a timeout to avoid hanging."""
    
    PREMADE_QUERY = "How do I implement a REST API?"
    PREMADE_DRAFT = """To implement a REST API, you should:
    1. Choose a framework (Express.js, Flask, FastAPI)
    2. Define your endpoints
    3. Implement CRUD operations
    4. Add authentication
    5. Handle errors properly
    6. Test your API"""
    
    async with Client(mcp) as client:
        print("Testing critique with timeout...")
        
        try:
            # Add a 30 second timeout
            result = await client.call_tool("critique", {
                "params":{
                    "query": PREMADE_QUERY,
                    "draft": PREMADE_DRAFT,
                    "companion_type": "generic"
                }
            }, timeout=30)  # 30 second timeout
            
            # Extract the response 
            if result.content and len(result.content) > 0:
                content = result.content[0]
                if hasattr(content, 'text'):
                    import json
                    data = json.loads(content.text)
                    print(f"\nCritique: {data['critique'][:200]}...")
                    print(f"Session ID: {data['session_id']}")
            
        except Exception as e:
            print(f"Error: {type(e).__name__}: {str(e)}")
            
# Run the test
await test_critique_with_timeout()

In [5]:
async def test_critique_debug():
    """Debug why critique is hanging."""

    PREMADE_QUERY = "How do I implement a REST API?"
    PREMADE_DRAFT = """To implement a REST API, you should:
    1. Choose a framework (Express.js, Flask, FastAPI)
    2. Define your endpoints
    3. Implement CRUD operations
    4. Add authentication
    5. Handle errors properly
    6. Test your API"""

    async with Client(mcp) as client:
        print("Starting critique test...")

        try:
            # Short timeout to see where it hangs
            result = await client.call_tool("critique", {
                "params":{
                    "query": PREMADE_QUERY,
                    "draft": PREMADE_DRAFT,
                    "companion_type": "generic"
                }
            }, timeout=10)  # 10 second timeout

            print("Got result!")
            # Extract and show the critique
            if result.content and len(result.content) > 0:
                content = result.content[0]
                if hasattr(content, 'text'):
                    import json
                    data = json.loads(content.text)
                    print(f"Critique: {data['critique'][:100]}...")

        except TimeoutError:
            print("❌ Timed out - critique is hanging in the streaming loop")
        except Exception as e:
            print(f"❌ Error: {type(e).__name__}: {str(e)}")

# Run the test
await test_critique_debug()

Starting critique test...
❌ Error: McpError: Timed out while waiting for response to ClientRequest. Waited 10.0 seconds.


## Test Critique Tool with Streaming

Testing the critique tool which uses a more sophisticated streaming implementation with StreamingManager and BaseTokenCallback.