Replace custom DictCheckpointStorage with built-in agent-framework 1.2.1 storages #423
* update fraud_detection_durable to feb 12 agent-framework
* enhance fraud detection durable
* update readme

---------

Co-authored-by: James N. <james.nguyen@microsoft.com>
* update fraud_detection_durable to feb 12 agent-framework
* enhance fraud detection durable
* update readme
* chore: reduce anomaly probability to 1% for controlled demo pace

---------

Co-authored-by: James N. <james.nguyen@microsoft.com>
…ped-contract workflow

Scripts 6-8 for the MCP agent demo:
- workflow_typed_contracts.py: 4-agent IT security incident response pipeline with Pydantic-enforced contracts at every boundary (SecurityAlert, ThreatAssessment, ImpactAnalysis, IncidentResponse)
- mcp_server_hybrid.py: Single MCP endpoint exposing BOTH tool types:
  * Strict-schema: triage_alert, assess_threat, create_response (Pydantic)
  * Natural-language: ask_security_advisor, explain_for_customer (prose)
  * Shared session state across all tools
- mcp_client_hybrid.py: 5-step incident flow using both tool types with explicit context passing between steps
- README.md: Updated from 6 to 8 capabilities with Hybrid Mode architecture diagram and Quick Start sections 7-8
- Delete workflow_local_remote.py (old Script 3 with extra LLM call)
- Add workflow_proxy_agent.py: MCPProxyAgent (BaseAgent, no LLM) calls MCP tools directly via call_tool(), with zero local LLM overhead
- Simplify MCPProxyAgent: extract common _call() method, module-level _last_text() helper, remove verbose static methods
- Fix workflow_typed_contracts.py: add async with context managers
- Update README: merge Script 9 into Script 3 position, update architecture diagram, Quick Start, How It Works sections
- workflow_proxy_agent.py: remove 'traditional vs proxy' framing; MCPProxyAgent is the standard approach for remote agent integration
- Delete workflow_typed_contracts.py: not relevant to MCP integration
- README: 7 scripts, renumber hybrid 6-7, remove typed-contract sections
- mcp_server_langgraph.py: LangGraph ReAct agent exposed as MCP server on port 8003 with architecture tools (pattern eval, migration estimate, tech stack recommendation)
- workflow_group_chat.py: MAF GroupChatBuilder orchestration with LLM planner routing between local BusinessStrategist (MAF) and remote TechnicalArchitect (LangGraph via MCP)
- pyproject.toml: add langgraph, langchain-openai, langchain-core deps
- README.md: update to 9 scripts, add cross-framework architecture diagram, Quick Start sections, and dependency table entries
…nt, finalize group chat with inline Planner
…tterns (Agent-as-Tool + Agent Adapter), add conceptual architecture
Breaking changes migrated:
- agent-framework 1.0.0rc1: ChatAgent->Agent, AgentThread->AgentSession, run_stream->run(stream=True), model->default_options(ChatOptions), WorkflowOutputEvent/AgentRunEvent/RequestInfoEvent->unified WorkflowEvent, MagenticBuilder now uses constructor kwargs, orchestrations moved to agent_framework_orchestrations package
- fastmcp 3.0.2: removed mcp version pin (now pulled by fastmcp), no code changes needed (fully compatible)

Files updated:
- agentic_ai/agents/agent_framework/single_agent.py
- agentic_ai/agents/agent_framework/multi_agent/handoff_multi_domain_agent.py
- agentic_ai/agents/agent_framework/multi_agent/reflection_agent.py
- agentic_ai/agents/agent_framework/multi_agent/magentic_group.py
- agentic_ai/applications/pyproject.toml (agent-framework==1.0.0rc1)
- agentic_ai/workflow/fraud_detection_durable/pyproject.toml (agent-framework==1.0.0rc1)
- mcp/pyproject.toml (fastmcp==3.0.2)
- tests/test_agent_framework_rc1_regression.py (51 regression tests, all passing)
The Docker build uses requirements.txt (not pyproject.toml/uv.lock), and it still had agent-framework-core==1.0.0b260130 pinned, which conflicts with agent-framework==1.0.0rc1 requiring agent-framework-core==1.0.0rc1.
Auto-merged after successful CI/CD pipeline run 22315210980
Pushes made by GITHUB_TOKEN don't trigger other workflows (GitHub Actions limitation to prevent infinite loops). This meant the auto-merge into int-agentic never triggered promote-to-main.yml. Now uses secrets.GH_PAT with fallback to GITHUB_TOKEN so:
1. If GH_PAT is configured: merge triggers promote-to-main.yml
2. If not: merge still works, but promotion PR must be created manually
Allows manual triggering when auto-trigger is missed (e.g. after GITHUB_TOKEN-based merges that don't fire downstream workflows).
entic' of https://github.com/microsoft/OpenAIWorkshop into int-agentic st "test" t config --global core.pager cat; git fetch origin main int-agentic; git checkout int-agentic o hello t log --oneline -3 t branch --show-current t branch --show-current 2>&1 | Out-String t fetch origin main int-agentic; git checkout int-agentic; git branch --show-current t -C c:\testing\OpenAIWorkshop -c core.pager=cat fetch origin main int-agentic t -C "c:\testing\OpenAIWorkshop" config --global core.pager cat iines starting with '#' will be ignored, and an empty message aborts
# Conflicts:
# agentic_ai/workflow/fraud_detection_durable/pyproject.toml
# agentic_ai/workflow/fraud_detection_durable/uv.lock
1. MCP Cosmos DB: Replace AzureCliCredential (not available in containers) with ManagedIdentityCredential (when AZURE_CLIENT_ID is set) or DefaultAzureCredential as fallback. This fixes the 'Azure CLI not found on path' error in production Container Apps.
2. Backend: Remove reflection_workflow_agent from DEFAULT_AVAILABLE_AGENTS and descriptions. The module file doesn't exist, causing it to show as a broken option in the UI agent selector.
Auto-merged after successful CI/CD pipeline run 22321497108
1. promote-to-main.yml: Use GH_PAT instead of GITHUB_TOKEN so the workflow can create PRs (GITHUB_TOKEN lacks permission for this in repos with branch protection).
2. orchestrate.yml: Add Step 8 auto-destroy that tears down the integration-* environment after successful auto-merge to int-agentic. Only dev environments are destroyed; production is retained.
@copilot PR 422 failed integration test, I want you to include the fix for that in this PR
…_group

1) tests/test_agent_framework_1_2_1_regression.py::test_chat_options_model_id incorrectly asserted that ChatOptions(model_id=...) works. ChatOptions is a TypedDict(total=False) so unknown keys are silently accepted at construction and only blow up downstream when forwarded as kwargs to responses.create(). Renamed to test_chat_options_model and added an __annotations__ check so the test fails loudly if the field is renamed again.

2) magentic_group._validate_configuration required AZURE_OPENAI_API_KEY even though _build_chat_client supports managed identity. The other agents (single_agent, handoff_multi_domain_agent, reflection_agent) only require the deployment/endpoint/version and accept either api_key or credential. Aligned magentic_group with that contract so MI-only deployments don't raise spurious RuntimeError on every /chat call.

These follow-ups were originally pushed to PR #421 but dropped by its squash merge. PR #423 already touches both files, so re-applying here.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
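The first point can be reproduced without agent-framework. In this minimal sketch, `Options` is a hypothetical stand-in for `ChatOptions` and `unknown_keys` is an illustrative helper, not the check added to the test suite. It relies only on the fact that a `TypedDict` is a plain `dict` at runtime, so a misnamed key is accepted at construction and has to be caught by comparing against `__annotations__`.

```python
from typing import TypedDict


class Options(TypedDict, total=False):
    """Hypothetical stand-in for a total=False options TypedDict."""

    model: str
    temperature: float


def unknown_keys(options: dict, schema: type) -> set:
    """Return the keys in `options` that the TypedDict `schema` does not declare."""
    return set(options) - set(schema.__annotations__)


# Construction succeeds even though `model_id` is not a declared field:
# only a static type checker would flag it, nothing fails at runtime.
stale = Options(model_id="my-deployment")  # type: ignore[typeddict-unknown-key]
current = Options(model="my-deployment")

stale_unknown = unknown_keys(stale, Options)      # the misnamed key is reported
current_unknown = unknown_keys(current, Options)  # nothing to report
```

A test built on this idea fails as soon as a field is renamed, instead of waiting for the stray key to be forwarded as a keyword argument downstream.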
@copilot Reviewed PR #423 against the actual While reviewing I noticed two unrelated CI-blockers that PR #423 already touches but doesn't fix. I had pushed these as follow-ups to PR #421, but they were dropped by the squash-merge. Re-applied them here as
Both changes are scoped to files PR #423 already modifies, so no extra surface area.
…E) to ChatOptions

Agent-Logs-Url: https://github.com/microsoft/OpenAIWorkshop/sessions/a202414e-c0c2-481c-abb6-e986d34c845e
Co-authored-by: james-tn <25941658+james-tn@users.noreply.github.com>
Investigated PR #422's integration-test failure (run 25069733984: 5 × Fixed in e8ee036 by passing
@copilot how about these comments?
Both fixes from
Re-ran
@copilot resolve conflicts
…-checkpoint-methods

# Conflicts:
# agentic_ai/agents/agent_framework/multi_agent/handoff_multi_domain_agent.py
# agentic_ai/agents/agent_framework/multi_agent/magentic_group.py
# agentic_ai/agents/agent_framework/multi_agent/reflection_agent.py
# agentic_ai/agents/agent_framework/single_agent.py

Co-authored-by: james-tn <25941658+james-tn@users.noreply.github.com>
Resolved in Files resolved:
@copilot resolve the merge conflicts in this pull request
…oint-methods

# Conflicts:
# agentic_ai/agents/agent_framework/multi_agent/handoff_multi_domain_agent.py
# agentic_ai/agents/agent_framework/multi_agent/magentic_group.py
# agentic_ai/agents/agent_framework/multi_agent/reflection_agent.py
# agentic_ai/agents/agent_framework/single_agent.py

Co-authored-by: james-tn <25941658+james-tn@users.noreply.github.com>
Resolved against
…es 'preview'

Live deployment was returning 500 'API version not supported' from Azure OpenAI because the agents pinned api_version='2025-04-01-preview' on every OpenAIChatClient construction. agent-framework 1.2.1 already reads AZURE_OPENAI_API_VERSION from env and otherwise defaults to the Responses-API magic value 'preview' (v1 endpoint), which is the right behaviour for Azure deployments like gpt-5.2-chat.

Changes:
- Remove api_version=self.api_version from all four agent_framework agents (single_agent, handoff_multi_domain_agent, magentic_group, reflection_agent).
- Drop api_version from each agent's _validate_configuration.
- Stop injecting AZURE_OPENAI_API_VERSION into ca-be Terraform; remove the openai_api_version variable and tfvars entries.
- Also fix two latent CI issues that were already noted in PR #423 review:
  * tests/test_agent_framework_1_2_1_regression.py::test_chat_options_model_id asserted the wrong field name; renamed to test_chat_options_model and added an __annotations__ check.
  * magentic_group._validate_configuration no longer hard-requires the API key; managed identity is now accepted, matching the other three agents.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
`agent-framework` 1.2.1 ships `InMemoryCheckpointStorage`, `FileCheckpointStorage`, and (via `agent-framework-azure-cosmos`) `CosmosCheckpointStorage`, all conforming to a standardized `CheckpointStorage` protocol. The two hand-rolled `DictCheckpointStorage` / `_DictCheckpointStorage` classes in the multi-agent modules predate these and carry ~150 lines of bespoke storage code plus ~80 lines of defensive plumbing to bridge legacy/current method shapes.

Changes
- `multi_agent/checkpoint_storage.py`: single `create_checkpoint_storage(session_id)` factory that picks `memory` (default) / `file` / `cosmos` from `WORKFLOW_CHECKPOINT_BACKEND` (with `WORKFLOW_CHECKPOINT_DIR` and standard `AZURE_COSMOS_*` env vars). Caches one storage instance per (backend, session) in-process. Also exposes `prune_checkpoints` / `purge_checkpoints` retention helpers built on the public protocol, and a `_coerce_checkpoint_storage` duck-type validator preserving the existing `magentic_checkpoint_storage` override hook.
- `magentic_group.py`: drop `DictCheckpointStorage`, `_workflow_name_for_storage`, `_purge_checkpoint_storage`, `mark_pending_prompt` / `consume_pending_prompt`, and the multi-fallback `_get_latest_checkpoint_id`. The pending prompt was already mirrored in `state_store`; the storage-side copy was dead weight. Resume is now a direct `await storage.get_latest(workflow_name=...)` against the workflow name captured from `workflow.name` after build and persisted in `state_store`.
- `handoff_multi_domain_agent.py`: drop `_DictCheckpointStorage`. Resume now calls `storage.get_latest(workflow_name=f"handoff-{session_id}")`. Per-turn `prune_checkpoints(retain=5)` replaces the in-storage `_RETENTION` cap.
- … `monkeypatch`), protocol surface, save/load roundtrip, the new resume helper, and the host-injected override hook. Migrated `asyncio.get_event_loop()` / `new_event_loop()` calls in this file to `asyncio.run()` (also silences a `DeprecationWarning`).
- `STATE_MANAGEMENT.md`: new §6.0 lists the built-in storages first; the legacy hand-rolled Cosmos sample in §6.2 is kept with a "prefer the built-in" note.
- … `int-agentic` (#421: `ChatOptions(model_id=…)` → `ChatOptions(model=…)`) into the branch and applied the rename inside the refactored modules.

PR #422 integration-test fix (rolled into this PR)
PR #422's CI/CD pipeline failed on `tests/test_backend_api.py`: 5 `test_backend_chat_*` tests returned HTTP 500 in ~0.5s. PR #421 had attempted to fix this by renaming `ChatOptions(model_id=…)` → `ChatOptions(model=…)`, but the failure persisted because the wrong value was being passed. In agent-framework 1.2.1, the `model` field in `default_options` overrides the chat client's deployment name when building the Azure URL. We were passing `self.openai_model_name` (literal `"gpt-5-chat"` from bicep) while `AZURE_OPENAI_CHAT_DEPLOYMENT` holds the actual deployment name, so requests were routed to a non-existent deployment and Azure returned 5xx, bubbling up as `/chat` 500. Fixed in all 5 sites by passing `self.azure_deployment` instead:
- `agentic_ai/agents/agent_framework/single_agent.py`
- `agentic_ai/agents/agent_framework/multi_agent/handoff_multi_domain_agent.py`
- `agentic_ai/agents/agent_framework/multi_agent/magentic_group.py`
- `agentic_ai/agents/agent_framework/multi_agent/reflection_agent.py` (2 sites)

Robustness note
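The persistence gap this note concerns can be reproduced with a small stand-in. In the sketch below, `WriteThroughStore` is a hypothetical class (not the repository's `CosmosDBStateStore`) whose `__setitem__` is the only write path, with a plain dict standing in for the remote container. It illustrates why in-place mutation of a nested value never reaches the backing store.

```python
import copy


class WriteThroughStore(dict):
    """Hypothetical store that persists a key only on top-level assignment."""

    def __init__(self):
        super().__init__()
        self.persisted = {}  # stands in for the remote container

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        # Snapshot the value, the way an upsert would serialize it.
        self.persisted[key] = copy.deepcopy(value)


store = WriteThroughStore()
store["session"] = {"checkpoints": []}

# In-place mutation of the nested list: __setitem__ is never called.
store["session"]["checkpoints"].append("cp-1")
in_memory = list(store["session"]["checkpoints"])
persisted_before = list(store.persisted["session"]["checkpoints"])

# Reassigning the top-level key is what actually triggers the write.
store["session"] = store["session"]
persisted_after = list(store.persisted["session"]["checkpoints"])
```

After the `append`, the in-process view holds the checkpoint while the persisted snapshot is still empty; only the top-level reassignment brings the two back in sync.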
`CosmosDBStateStore.__setitem__` only upserts on top-level reassignment, so the previous storages' nested-dict mutations were never actually persisted: checkpoints were effectively in-memory regardless of state-store backend. Setting `WORKFLOW_CHECKPOINT_BACKEND=cosmos` now enables real cross-process resume via the SDK's partitioned, durable container, with no behaviour change in the default path.

Verification
- … `tests/test_agent_framework_1_2_1_regression.py` (after the deployment-name fix above).
- … `int-agentic`: 5 files changed, +440 / −378 lines.
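
The backend selection described under Changes can be sketched roughly as follows. This is an illustration under assumptions, not the code in `multi_agent/checkpoint_storage.py`: `MemoryStorage` and `FileStorage` are hypothetical stand-ins for the built-in storages, the `./checkpoints` default directory is invented for the example, and the `cosmos` branch is left out. Only the environment variable names, the `memory` default, and the per-(backend, session) caching come from the description.

```python
import os
from dataclasses import dataclass, field


@dataclass
class MemoryStorage:
    """Hypothetical stand-in for an in-memory checkpoint storage."""

    checkpoints: dict = field(default_factory=dict)


@dataclass
class FileStorage:
    """Hypothetical stand-in for a file-backed checkpoint storage."""

    directory: str


# One storage instance per (backend, session), cached in-process.
_CACHE = {}


def create_checkpoint_storage(session_id: str):
    """Pick a backend from WORKFLOW_CHECKPOINT_BACKEND, defaulting to memory."""
    backend = os.environ.get("WORKFLOW_CHECKPOINT_BACKEND", "memory").strip().lower()
    key = (backend, session_id)
    if key not in _CACHE:
        if backend == "memory":
            _CACHE[key] = MemoryStorage()
        elif backend == "file":
            # Assumed default directory; the real default is not stated.
            root = os.environ.get("WORKFLOW_CHECKPOINT_DIR", "./checkpoints")
            _CACHE[key] = FileStorage(os.path.join(root, session_id))
        else:
            # The cosmos backend is omitted from this sketch.
            raise ValueError(f"unsupported checkpoint backend: {backend!r}")
    return _CACHE[key]
```

Keying the cache on both backend and session means a second call for the same session returns the same object, while switching `WORKFLOW_CHECKPOINT_BACKEND` mid-process yields a fresh storage instead of a stale one.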