# GitHubAgent runtime test

This notebook runs `doppiozero.agents.gh_deep_search.GitHubAgent` end to end as a runtime test.
By default, it uses your real LLM and content layer when they are available. To force deterministic offline behavior, set `USE_MOCKS = True` in the mocking cell.

Run the cells in order. When the run completes, the notebook saves a single artifact, `agent_test_output.json`.

In [1]:
import json
import types
from pathlib import Path
from pprint import pprint

from dotenv import load_dotenv

from doppiozero.agents.gh_deep_search import GitHubAgent

load_dotenv()

True

In [7]:
# Reinitialize the package-level llm_client so the agent uses the real LLM.
# Credentials come from the environment (loaded from .env by load_dotenv() above).
import doppiozero.clients.llm as llm_mod
# Recreate the module-level client using environment credentials
llm_mod.llm_client = llm_mod.LLMClient(verbose=True)
print('Reinitialized doppiozero.clients.llm.llm_client with real LLM (verbose=True)')

Reinitialized doppiozero.clients.llm.llm_client with real LLM (verbose=True)
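
The introduction mentions a mocking cell controlled by `USE_MOCKS`. A minimal sketch of what such a toggle could look like follows. Everything in it is an assumption: `FakeLLMClient`, its `generate` method, and the canned `'{}'` reply are hypothetical and are not taken from doppiozero. A `SimpleNamespace` stands in for `doppiozero.clients.llm` so the sketch runs without the package installed.

```python
import types

USE_MOCKS = True  # set to False to keep whatever client is already installed


class FakeLLMClient:
    """Hypothetical offline stand-in; `generate` is an assumed method name."""

    def __init__(self, canned="{}"):
        self.canned = canned
        self.calls = []  # record prompts so a test can inspect them

    def generate(self, prompt, **kwargs):
        self.calls.append(prompt)
        return self.canned


# Stand-in for the real module object (`import doppiozero.clients.llm as llm_mod`).
llm_mod = types.SimpleNamespace(llm_client=None)

if USE_MOCKS:
    # Swap the module-level client for the deterministic fake.
    llm_mod.llm_client = FakeLLMClient()
    print("Using FakeLLMClient (offline, deterministic)")
```

In the notebook itself, `llm_mod` would be the imported module rather than the stand-in, and the fake would need to expose whichever methods the agent actually calls on the client.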


In [5]:
clarifying_answers = '''Q: What is the main goal?
A: Find recent authentication failure discussions across repos.

Q: Are there specific repos to focus on?
A: doppiozero and financial-planning.
'''

clarifying_qa = Path('clarifying_answers.txt').resolve()
clarifying_qa.write_text(clarifying_answers, encoding='utf-8')
print('Wrote clarifying answers to', clarifying_qa)

Wrote clarifying answers to /Users/romanofoti/romanofoti/doppiozero/clarifying_answers.txt
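
Before handing the file to the agent, it can be useful to confirm that every question has an answer. The sketch below is a sanity check only: `parse_qa` is a hypothetical helper, and it assumes the `Q:` / `A:` line prefixes used in the cell above. How the agent itself parses the file is not shown in this notebook.

```python
def parse_qa(text):
    """Pair each 'Q:' line with the next 'A:' line; unpaired lines are ignored."""
    pairs = []
    question = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            pairs.append((question, line[2:].strip()))
            question = None
    return pairs


clarifying_answers = """Q: What is the main goal?
A: Find recent authentication failure discussions across repos.

Q: Are there specific repos to focus on?
A: doppiozero and financial-planning.
"""

pairs = parse_qa(clarifying_answers)
for question, answer in pairs:
    print(f"{question} -> {answer}")
```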


## Run the agent

Instantiate `GitHubAgent` with lightweight options and run it.

In [6]:
options = {
    'collection': None,
    'limit': 3,
    'max_depth': 1,
    'editor_file': None,
    'clarifying_qa': clarifying_qa,
    'search_modes': ['semantic'],
    'cache_path': None,
    'models': {'fast': 'default', 'reasoning': 'default', 'embed': 'default'},
    'parallel': False,
    'verbose': True,
}
agent = GitHubAgent('What are the recent discussions about authentication failures?', options)
result = agent.run()
print('=== AGENT RUN RESULT ===')
pprint(result)
print('=== SHARED STATE ===')
pprint(agent.shared)

2025-09-04 16:16:37,847 - doppiozero.agents.gh_deep_search - gh_deep_search.py  - run          - INFO     - === GITHUB CONVERSATIONS RESEARCH AGENT ===
2025-09-04 16:16:37,849 - doppiozero.agents.gh_deep_search - gh_deep_search.py  - run          - INFO     - Request: What are the recent discussions about authentication failures?
2025-09-04 16:16:37,850 - doppiozero.agents.gh_deep_search - gh_deep_search.py  - run          - INFO     - Collection: None
2025-09-04 16:16:37,851 - doppiozero.agents.gh_deep_search - gh_deep_search.py  - run          - INFO     - Max results per search: 3
2025-09-04 16:16:37,852 - doppiozero.agents.gh_deep_search - gh_deep_search.py  - run          - INFO     - Max deep research iterations: 1
2025-09-04 16:16:37,852 - doppiozero.agents.gh_deep_search - gh_deep_search.py  - run          - INFO     - Fast model: default
2025-09-04 16:16:37,853 - doppiozero.agents.gh_deep_search - gh_deep_search.py  - run          - INFO     - Reasoning model: default
2025-09-

=== AGENT RUN RESULT ===
{'claims_verified': 0,
 'draft': '{}',
 'num_conversations': 0,
 'unsupported_claims': ['Claim 1', 'Claim 2']}
=== SHARED STATE ===
{'cache_path': None,
 'claim_verification': {'details': [{'claim': 'Claim 1',
                                     'evidence': [],
                                     'reasoning': 'no_support_found',
                                     'status': 'unsupported'},
                                    {'claim': 'Claim 2',
                                     'evidence': [],
                                     'reasoning': 'no_support_found',
                                     'status': 'unsupported'}],
                        'insufficient_claims': [],
                        'supported_claims': [],
                        'total_claims': 2,
                        'unsupported_claims': ['Claim 1', 'Claim 2'],
                        'verification_errors': 0},
 'claim_verification_completed': True,
 'clarifications': 'Q: What is th

2025-09-04 15:46:20,239 - doppiozero.agents.gh_deep_search - gh_deep_search.py  - run          - INFO     - Max deep research iterations: 1
2025-09-04 15:46:20,239 - doppiozero.agents.gh_deep_search - gh_deep_search.py  - run          - INFO     - Fast model: default
2025-09-04 15:46:20,239 - doppiozero.agents.gh_deep_search - gh_deep_search.py  - run          - INFO     - Reasoning model: default
2025-09-04 15:46:20,239 - doppiozero.nodes.researcher - researcher.py      - prep         - INFO     - === INITIAL RESEARCH PHASE ===
2025-09-04 15:46:20,240 - doppiozero.nodes.researcher - researcher.py      - prep         - INFO     - Starting initial semantic search for: What are the recent discussions about authentication failures?
2025-09-04 15:46:20,240 - doppiozero.nodes.researcher - researcher.py      - exec         - INFO     - Executing initial semantic search and enriching results...
2025-09-04 15:46:20,240 - doppiozero.nodes.researcher - researcher.py      - exec         - INFO     - Searching (pass 1): What are the recent discussions about authentication failures? (pass 1)
2025-09-04 15:46:20,240 - doppiozero.nodes.researcher - researcher.py      - exec         - INFO     - Fetching conversation: https://example.com/convo/0
2025-09-04 15:46:20,241 - doppiozero.nodes.researcher - researcher.py      - exec         - INFO     - Fetching conversation: https://example.com/convo/1
2025-09-04 15:46:20,241 - doppiozero.nodes.researcher - researcher.py      - exec         - INFO     - Fetching conversation: https://example.com/convo/2
2025-09-04 15:46:20,241 - doppiozero.nodes.researcher - researcher.py      - exec         - INFO     - ✓ Initial research complete: 3 conversations collected
2025-09-04 15:46:20,241 - doppiozero.nodes.researcher - researcher.py      - post         - INFO     - ✓ Initial research complete: 3 conversations collected
2025-09-04 15:46:20,241 - doppiozero.nodes.clarifier - clarifier.py       - prep         - INFO     - === CLARIFYING QUESTIONS PHASE ===
2025-09-04 15:46:20,242 - doppiozero.nodes.clarifier - clarifier.py       - exec         - INFO     - Presenting clarifying questions to user...
2025-09-04 15:46:20,242 - doppiozero.nodes.clarifier - clarifier.py       - post         - INFO     - Clarifications stored.
2025-09-04 15:46:20,243 - doppiozero.nodes.planner - planner.py         - prep         - INFO     - === PLANNING PHASE ===
2025-09-04 15:46:20,243 - doppiozero.nodes.planner - planner.py         - prep         - INFO     - === PLANNING PHASE (Iteration 1/1) ===
2025-09-04 15:46:20,243 - doppiozero.nodes.planner - planner.py         - exec         - INFO     - Transforming queries into search plans...
2025-09-04 15:46:20,243 - doppiozero.nodes.planner - planner.py         - post         - INFO     - ✓ Planning complete, generated 1 search plans
2025-09-04 15:46:20,244 - doppiozero.nodes.retriever - retriever.py       - prep         - INFO     - === RETRIEVAL PHASE ===
2025-09-04 15:46:20,244 - doppiozero.nodes.retriever - retriever.py       - exec         - INFO     - Executing search operations and retrieving data...
2025-09-04 15:46:20,244 - doppiozero.nodes.retriever - retriever.py       - post         - INFO     - Added 5 new conversations to memory.
2025-09-04 15:46:20,244 - doppiozero.nodes.reporter - reporter.py        - prep         - INFO     - === FINAL REPORT PHASE ===
2025-09-04 15:46:20,244 - doppiozero.nodes.reporter - reporter.py        - prep         - INFO     - Generating final report from all gathered data...
2025-09-04 15:46:20,244 - doppiozero.nodes.reporter - reporter.py        - prep         - INFO     - Research summary: 8 conversations analyzed, 2 queries used, 1 deep research iterations
2025-09-04 15:46:20,245 - doppiozero.nodes.reporter - reporter.py        - post         - INFO     - Routing to claim verification before final output
2025-09-04 15:46:20,245 - doppiozero.nodes.claim_verifier - claim_verifier.py  - prep         - INFO     - === CLAIM VERIFICATION PHASE ===
2025-09-04 15:46:20,246 - doppiozero.nodes.claim_verifier - claim_verifier.py  - exec         - INFO     - Verifying 2 claims against evidence...
2025-09-04 15:46:20,246 - doppiozero.nodes.claim_verifier - claim_verifier.py  - exec         - INFO     - ✗ Claim unsupported: Claim 1
2025-09-04 15:46:20,246 - doppiozero.nodes.claim_verifier - claim_verifier.py  - exec         - INFO     - ✗ Claim unsupported: Claim 2
2025-09-04 15:46:20,247 - doppiozero.nodes.claim_verifier - claim_verifier.py  - post         - INFO     - ✓ Claim verification complete: 2 claims checked.
2025-09-04 15:46:20,247 - doppiozero.nodes.claim_verifier - claim_verifier.py  - post         - INFO     - Verification incomplete: 2 claims; retrying attempt 1/1.
2025-09-04 15:46:20,247 - doppiozero.nodes.planner - planner.py         - prep         - INFO     - === PLANNING PHASE ===
2025-09-04 15:46:20,247 - doppiozero.nodes.planner - planner.py         - prep         - INFO     - === PLANNING PHASE (Iteration 2/1) ===
2025-09-04 15:46:20,248 - doppiozero.nodes.planner - planner.py         - prep         - INFO     - Focusing search on gathering evidence for 2 unsupported claims
2025-09-04 15:46:20,248 - doppiozero.nodes.planner - planner.py         - prep         - INFO     - Generated claim verification search plan: {}
2025-09-04 15:46:20,248 - doppiozero.nodes.planner - planner.py         - exec         - INFO     - Transforming queries into search plans...
2025-09-04 15:46:20,248 - doppiozero.nodes.planner - planner.py         - post         - INFO     - ✓ Planning complete, generated 1 search plans
2025-09-04 15:46:20,248 - doppiozero.nodes.retriever - retriever.py       - prep         - INFO     - === RETRIEVAL PHASE ===
2025-09-04 15:46:20,248 - doppiozero.nodes.retriever - retriever.py       - exec         - INFO     - Executing search operations and retrieving data...
2025-09-04 15:46:20,249 - doppiozero.nodes.retriever - retriever.py       - post         - INFO     - Added 5 new conversations to memory.
2025-09-04 15:46:20,249 - doppiozero.nodes.reporter - reporter.py        - prep         - INFO     - === FINAL REPORT PHASE ===
2025-09-04 15:46:20,249 - doppiozero.nodes.reporter - reporter.py        - prep         - INFO     - Generating final report from all gathered data...
2025-09-04 15:46:20,249 - doppiozero.nodes.reporter - reporter.py        - prep         - INFO     - Research summary: 13 conversations analyzed, 3 queries used, 2 deep research iterations
2025-09-04 15:46:20,250 - doppiozero.nodes.reporter - reporter.py        - post         - INFO     - === FINAL REPORT ===
2025-09-04 15:46:20,250 - doppiozero.nodes.reporter - reporter.py        - post         - INFO     - {}
2025-09-04 15:46:20,250 - doppiozero.nodes.reporter - reporter.py        - post         - INFO     -
---
2025-09-04 15:46:20,250 - doppiozero.nodes.reporter - reporter.py        - post         - INFO     - **Note**: The following 2 claims could not be fully verified against the available evidence:
2025-09-04 15:46:20,250 - doppiozero.nodes.reporter - reporter.py        - post         - INFO     - 1. Claim 1
2025-09-04 15:46:20,250 - doppiozero.nodes.reporter - reporter.py        - post         - INFO     - 2. Claim 2
2025-09-04 15:46:20,251 - doppiozero.nodes.reporter - reporter.py        - post         - INFO     -
✓ Research complete! Total conversations analyzed: 13, 2 claims verified (0 supported, 2 unsupported)
2025-09-04 15:46:20,251 - doppiozero.nodes.end - end.py             - exec         - INFO     - End node: terminating workflow and returning results.


=== AGENT RUN RESULT ===
{'claims_verified': 0,
 'draft': '{}',
 'num_conversations': 13,
 'unsupported_claims': ['Claim 1', 'Claim 2']}
=== SHARED STATE ===
{'cache_path': None,
 'claim_verification': {'details': [{'claim': 'Claim 1',
                                     'evidence': [{'score': 1.0,
                                                   'snippet': 'Summary of '
                                                              'Claim 1 #0',
                                                   'source': 'https://example.com/convo/0'},
                                                  {'score': 0.9,
                                                   'snippet': 'Summary of '
                                                              'Claim 1 #1',
                                                   'source': 'https://example.com/convo/1'},
                                                  {'score': 0.8,
                                                   'snippet': 'Summary of '
   

In [6]:
# Simple runtime checks (no external test files created)
assert isinstance(result, dict), 'Agent result should be a dict'
# The agent returns a structured final report placed in shared['final_report']
assert 'num_conversations' in result, 'Result missing num_conversations key'
print('Basic assertions passed: result appears to be a final report dict')

Basic assertions passed: result appears to be a final report dict


In [7]:
# Save the result artifact for inspection
import json
out_path = Path('agent_test_output.json').resolve()
with open(out_path, 'w', encoding='utf-8') as f:
    json.dump(result, f, ensure_ascii=False, indent=2)
print('Saved agent output to', out_path)

Saved agent output to /Users/romanofoti/romanofoti/doppiozero/agent_test_output.json
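
A quick way to confirm the artifact is usable is to read it back and compare it with the in-memory result. The sketch below is self-contained: it writes to a temporary directory instead of the notebook's working directory, and `result` is a literal copy of one of the run results printed above, not a live agent result.

```python
import json
import tempfile
from pathlib import Path

# Literal copy of a result dict printed earlier in this notebook.
result = {
    "claims_verified": 0,
    "draft": "{}",
    "num_conversations": 13,
    "unsupported_claims": ["Claim 1", "Claim 2"],
}

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "agent_test_output.json"
    # Same serialization settings as the save cell above.
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=2)
    with open(out_path, encoding="utf-8") as f:
        reloaded = json.load(f)

# The round trip is lossless only when every value is a JSON-native type.
assert reloaded == result
print("Round trip OK:", sorted(reloaded))
```

The comparison holds here because the result contains only strings, integers, and lists. Values such as `Path` objects are not JSON-serializable by default, so `json.dump` would raise `TypeError` on them unless a `default=` handler is supplied.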


## Next steps / integration notes

- To run the agent against real GitHub data, remove or replace the mocking (monkeypatch) cell, set real `collection` and `cache_path` values in `options`, and supply credentials via environment variables (do not commit tokens).
- If you want me to add optional assertions on `agent.shared` (for example `claim_verification`), tell me which keys you expect and I'll add them.
- If anything in this offline run fails, paste the notebook output and I will debug the failing cell.
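
The optional `agent.shared` assertions mentioned in the list above could take the following shape. This is a sketch under stated assumptions: `check_claim_verification` is a hypothetical helper, the key names are copied from the `claim_verification` output printed earlier, and the consistency rules (for example, that `total_claims` equals the number of `details` entries) are inferred from that output rather than documented guarantees.

```python
REQUIRED_KEYS = {
    "details",
    "supported_claims",
    "unsupported_claims",
    "insufficient_claims",
    "total_claims",
    "verification_errors",
}


def check_claim_verification(cv):
    """Return a list of problems found in a claim_verification dict (empty if none)."""
    missing = REQUIRED_KEYS - cv.keys()
    if missing:
        return [f"missing keys: {sorted(missing)}"]
    problems = []
    # Assumed invariant: one details entry per claim.
    if cv["total_claims"] != len(cv["details"]):
        problems.append("total_claims does not match len(details)")
    # Assumed invariant: the summary list mirrors the per-claim statuses.
    unsupported = [d["claim"] for d in cv["details"] if d["status"] == "unsupported"]
    if unsupported != cv["unsupported_claims"]:
        problems.append("unsupported_claims does not match details statuses")
    return problems


# Sample copied from the shared state printed above.
sample = {
    "details": [
        {"claim": "Claim 1", "evidence": [], "reasoning": "no_support_found", "status": "unsupported"},
        {"claim": "Claim 2", "evidence": [], "reasoning": "no_support_found", "status": "unsupported"},
    ],
    "insufficient_claims": [],
    "supported_claims": [],
    "total_claims": 2,
    "unsupported_claims": ["Claim 1", "Claim 2"],
    "verification_errors": 0,
}

problems = check_claim_verification(sample)
print("problems:", problems)
```

In the notebook, the call would be `check_claim_verification(agent.shared['claim_verification'])`, followed by `assert not problems`.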