Python: add agent-framework-hosting-teams channel (microsoft-teams-apps SDK)#5642
Conversation
eb01b3b to
597ba83
Compare
There was a problem hiding this comment.
Automated Code Review
Reviewers: 3 | Confidence: 72%
✗ Security Reliability
The main security concern is a path traversal vulnerability in
_resolve_checkpoint_storagein_host.py:request.session.isolation_keyis used directly in aPathjoin (self._checkpoint_location / request.session.isolation_key) without sanitization. SinceFileCheckpointStorage.__init__callsmkdir(parents=True, exist_ok=True)on the resulting path, a maliciousisolation_keycontaining../sequences could create directories and write checkpoint files outside the intended storage root. TheFileCheckpointStorage._validate_file_pathmethod only guardscheckpoint_idwithin the storage path — it cannot protect when the storage path itself is attacker-influenced. The risk is elevated whenskip_auth=Trueis used or a channel derivesisolation_keyfrom unvalidated user input.
✓ Test Coverage
The test_channel.py test suite for hosting-teams is well-structured and covers the main paths (message dispatch, outbound transforms, citations, streaming, feedback, identity, lifecycle startup). However, there are notable gaps: the async stream_transform_hook code path is untested (only sync hooks are tested), the _on_shutdown lifecycle handler logic is untested, and there is no test verifying behavior when activity.text is None (the
or "fallback). The new hosting package has good test coverage overall for the host wiring, session management, delivery routing, and workflow targets. However, theapply_run_hookhelper function—which contains meaningful branching logic (sync vs async hook returns)—is exported in__all__but has no test coverage in either test file. The host tests never pass arun_hooktoAgentFrameworkHost, so that code path is untested.
✗ Design Approach
The new hosting types and tests look directionally fine, but the workspace lockfile change is solving package-resolution problems by narrowing the entire Python support matrix from the repo’s declared 3.10+ baseline to 3.12+. That is a design-level regression because it affects every consumer of the frozen workspace, not just the new hosting packages, and it also strips existing per-package version guards from unrelated dependencies like github-copilot. The lockfile regeneration appears to have narrowed the supported install surface without updating the corresponding package metadata. In particular, it drops the only Python 3.10/3.11 artifacts for
hyperlight-sandbox-backend-wasmeven thoughagent-framework-hyperlightstill advertises>=3.10and depends on that wheel on supported Linux/Windows targets, which makes the repo’s frozen install path inconsistent with the declared support matrix. The lockfile changes appear to silently narrow Python compatibility from the repo’s declared>=3.10support to a 3.11+/3.12+ solution for key scientific dependencies. That is a design-level problem because it changes the supported runtime matrix through generated lock state only, without updating package metadata or docs. The lockfile regeneration appears to have been done against a narrower interpreter matrix than the repo actually supports. This lockfile regeneration appears to have been done from a 3.12+ environment and committed as if it were a workspace-wide update. That is a design-level mismatch: the checked-in lockfile no longer represents the full supported interpreter matrix.
Flagged Issues
- Path traversal in
_host.py_resolve_checkpoint_storage:request.session.isolation_keyis used unsanitized inself._checkpoint_location / request.session.isolation_key. TheFileCheckpointStorageconstructor callsmkdir(parents=True, exist_ok=True)on whatever path it receives, so anisolation_keycontaining../can escape the intended storage directory. Sanitize or validate that the resolved path stays withinself._checkpoint_locationbefore constructing the storage.
Automated review by eavanvalkenburg's agents
New ``agent-framework-hosting`` package implementing ADR 0026 / SPEC-002:
the channel-neutral host that lets a single ``Agent`` (or ``Workflow``)
fan out across multiple wire protocols ("channels") behind one Starlette
ASGI app.
Surface (re-exported from ``agent_framework_hosting``):
- ``AgentFrameworkHost`` — wraps a hostable target, mounts channels onto
an ASGI app, owns per-isolation-key ``AgentSession`` reuse, threads
request context (``response_id`` / ``previous_response_id``) into
context providers via an ``ExitStack`` of ``bind_request_context``
calls, and exposes an opt-in Hypercorn ``serve()`` helper (extra
``[serve]``).
- ``Channel`` protocol + ``ChannelContribution`` — the surface a channel
package implements (routes, lifespans, identity hooks, …).
- ``ChannelRequest`` / ``ChannelSession`` / ``ChannelIdentity`` /
``ChannelPush`` / ``ChannelCommand[Context]`` / ``ChannelRunHook`` /
``ChannelStreamTransformHook`` / ``DeliveryReport`` /
``HostedRunResult`` / ``ResponseTarget`` / ``ResponseTargetKind`` /
``apply_run_hook`` — channel-side dataclasses + helpers.
- ``IsolationKeys`` + ``ISOLATION_HEADER_USER`` / ``..._CHAT`` +
``get/set/reset_current_isolation_keys`` — the host's ASGI middleware
reads the ``x-agent-{user,chat}-isolation-key`` headers off each
inbound request and exposes them to the agent stack via a
``ContextVar`` so storage-side providers (e.g.
``FoundryHostedAgentHistoryProvider``) can apply per-tenant
partitioning without channels having to forward anything.
Includes 45 unit tests covering the host, channel contributions,
isolation contextvar, and shared types. Registers the package in
``python/pyproject.toml`` ``[tool.uv.sources]`` and adds the matching
pyright ``executionEnvironments`` entry for tests.
Hypercorn is an optional dependency (``[serve]`` extra); the soft import
in ``serve()`` is annotated for pyright since it isn't on the default
install.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eams-apps SDK
New ``agent-framework-hosting-teams`` package that exposes the
``TeamsChannel`` for ``agent-framework-hosting``. Layered on top of the
official microsoft/teams.py SDK (PyPI: ``microsoft-teams-apps``)
instead of speaking the Bot Framework Activity Protocol over raw
HTTP. Teams traffic still flows through Azure Bot Service -- this
change is to the *programming model*, not the transport.
Compared to the channel-neutral
``agent-framework-hosting-activity-protocol`` (PR-5a), this channel
gives users typed activity models, the SDK's streaming primitive,
adaptive cards, and citation entities, all wired as first-class
features through the host.
Highlights
----------
- Host integration:
- Custom ``HttpServerAdapter`` (``_StarletteCaptureAdapter``) that
captures the SDK's route registration into Starlette ``Route``
instances, so the host mounts the messaging webhook without
handing the SDK its own server.
- The synchronous half of ``App.initialize`` is invoked inside
``contribute()`` so the host has the routes ready before
serving; the async half (plugin ``on_init``) runs from the
channel's startup hook, idempotently.
- Defaults:
- Mount path: ``/teams/messages`` (matches Bot Framework
convention).
- Channel name: ``"teams"`` and isolation key prefix ``teams:``
via ``teams_isolation_key()``, distinct from the
``activity:`` prefix used by hosting-activity-protocol so the
two channels never collide on the same host.
- Inbound:
- ``on_message`` builds a ``ChannelRequest`` with the inbound
text, the ``ConversationAccount`` id as session isolation key,
and the inbound ``MessageActivity`` exposed as
``protocol_request`` to user ``ChannelRunHook`` callables.
- Optional ``on_message_submit_feedback`` registration when the
user supplies a ``feedback_handler`` -- the channel translates
the typed invoke into a ``TeamsFeedbackContext`` and supports
sync or async callables.
- Outbound:
- Default reply: ``ctx.send(result.text)``.
- Optional ``outbound_transform`` returns a
``TeamsOutboundPayload`` with one of:
- plain text,
- ``AdaptiveCard`` (sent verbatim via the SDK),
- text + ``citations`` (rendered as a
``CitationEntity`` containing one ``Claim`` per
``TeamsCitation``, positions are 1-based).
- Streaming:
- ``streaming=True`` calls ``run_stream`` on the host, emits each
update's text deltas through the SDK's ``HttpStream``, then
closes the stream and runs ``deliver_response`` with the
accumulated text so cross-channel delivery still works.
- Optional ``stream_transform_hook`` lets callers drop or rewrite
individual ``AgentResponseUpdate`` instances per-channel.
Auth
----
``client_id`` / ``client_secret`` / ``tenant_id`` (or a custom
``token`` callable) are passed through to the SDK as-is. ``skip_auth``
disables JWT validation on inbound activities for local Bot Framework
Emulator development.
Test coverage
-------------
30 unit tests covering: isolation key + helpers, citation entity
shape, route adapter capture and Starlette translation, channel
construction, end-to-end inbound dispatch (including run-hook
invocation, transform variants, citation wiring), streaming with
transform-hook drop, sync + async feedback handlers, identity
extraction, and lifecycle hook idempotency. Coverage 92%; pyright +
mypy clean.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
597ba83 to
c2d74be
Compare
| protocol_request=activity, | ||
| ) | ||
|
|
||
| if self._streaming: |
There was a problem hiding this comment.
Should the streaming branch also go through _send_outbound? Right now outbound_transform, citations, and Adaptive Cards are silently ignored whenever streaming=True - the user gets only concatenated text via HttpStream.emit, no card, no citations entity. Constructing TeamsChannel(streaming=True, outbound_transform=...) typechecks and runs but the transform never fires. Either reject the combo at __init__ or run the transform on the final HostedRunResult after stream.close()?
| raise RuntimeError("TeamsChannel was not contributed to a host.") | ||
| stream: ResponseStream[AgentResponseUpdate, AgentResponse] = self._ctx.run_stream(request) | ||
| accumulated: list[str] = [] | ||
| async for update in stream: |
There was a problem hiding this comment.
What happens if the host stream raises mid-iteration (transient model error, tool failure)? No try/finally, so stream.close() is skipped and the Teams message is left in the live "typing" state forever from the user's POV, and deliver_response never runs so cross-channel sinks miss the turn entirely. Is it worth wrapping in try/finally that closes the stream and delivers the partial result (or an error result) so the failure is visible?
| """ | ||
| self._ctx = context | ||
| # Synchronously register the route on our capturing adapter. | ||
| self._app.server.initialize( |
There was a problem hiding this comment.
The docstring says App.initialize() is idempotent because the synchronous server-init step gets short-circuited on the second call, but that short-circuit is gated on App._initialized, which we never set after the partial init in contribute(). So at startup _initialized is still false, app.initialize() runs in full, and HttpServer.initialize() plus register_route execute a second time.
Should we set self._app._initialized = True after the partial init, or invoke only the missing async plugin on_init step in _on_startup?
| self._app.server.on_request = self._app._process_activity_event # type: ignore[attr-defined] # pyright: ignore[reportPrivateUsage] | ||
|
|
||
| routes = [ | ||
| _route_for(handler, mount_path, method) for (method, mount_path), handler in self._adapter.handlers.items() |
There was a problem hiding this comment.
If _adapter.handlers is empty here (user passed an App whose server._adapter was already populated before the swap at line 364, or a future SDK version moves register_route into the async App.initialize path), we return zero routes and the host happily mounts nothing.
Bot Service POSTs to /teams/messages then 404 with no startup signal. Could we assert non-empty here, or at least logger.error so the failure mode is louder than it is currently?
| self._ctx: ChannelContext | None = None | ||
|
|
||
| self._adapter = _StarletteCaptureAdapter() | ||
| if app is not None: |
There was a problem hiding this comment.
Should we reject an already-initialized app here? If a caller built an App and ran await app.initialize() themselves before passing it in, routes are already registered on the previous adapter, our capturing adapter sees nothing, and the channel silently mounts zero routes (see other comment around _route_for(...)). I think an if self._app._initialized: raise ValueError(...) makes the misuse loud at construction time.
| if raw_body: | ||
| try: | ||
| parsed = await request.json() | ||
| except Exception: |
There was a problem hiding this comment.
except Exception here swallows ClientDisconnect, encoding errors, Starlette body-reader bugs, anything - and we return 400 with no log. Bot Service then retries and the operator has no signal beyond "Teams ignores some messages intermittently." Looks to be a common theme across the several PRs, I've looked at: is it worth narrowing to json.JSONDecodeError and logger.exception for the rest with content-type and body length?
| """Resolve outbound content and POST to Teams via the SDK.""" | ||
| payload = await self._resolve_outbound(request, activity_context, result) | ||
|
|
||
| if payload.card is not None: |
There was a problem hiding this comment.
The text=None, card=None branch returns silently with no log. Easy mistake from a transform ("no card applies, fall through to text" then forgetting to set text), and the symptom is a hung Teams conversation with nothing in logs to explain why. How about at a minimum we have a logger.warning with conversation/request id would give operators something to grep?
| reply_to_id=activity.reply_to_id, | ||
| identity=identity, | ||
| ) | ||
| outcome = self._feedback_handler(ctx) |
There was a problem hiding this comment.
User feedback handlers raising will propagate through the SDK's invoke dispatcher and 500 the invoke. Teams treats 5xx on feedback as a permanent failure for that activity, the user sees an error toast and the rating is lost. Feedback is a side channel, so wrapping in try/except and logger.exception seems safer than failing the invoke.
| return | ||
| activity = activity_context.activity | ||
| identity = self._identity_from_activity(activity) | ||
| action_value = getattr(activity.value, "action_value", None) |
There was a problem hiding this comment.
If the SDK ever renames action_value/reaction, every feedback event silently arrives with rating="" and the handler runs against empty data - thumbs-up/down telemetry breaks across the deployment with zero signal. Is it worth a logger.warning when neither field resolves, or a validation that rating is one of the expected values?
Motivation and Context
Implements the higher-level Teams channel described in SPEC-002 §7 + §"Teams-native fast follow" (merged via #5549). Built on the
microsoft-teams-appsSDK so users get Teams-native affordances out of the box on top of the same Bot Service transport as PR-5a.TeamsChannel(this PR)ActivityProtocolChannel(PR-5a)Description
Adds
agent-framework-hosting-teams(python/packages/hosting-teams/):TeamsChannel— built onmicrosoft-teams-apps. Default mount/teams/messages. Surfaces:ctx.stream.transform_outboundhook to emit Adaptive Cards.on_message_submit_feedbackcallback for Teams feedback events.HttpServerAdapter— captures the SDK's route registration synchronously insideChannel.contribute()so the host can mount a stable Starlette route (instead of letting the SDK spin up its own server).ActivityContext+HttpStream(no live Bot Service required).Note: this still uses Azure Bot Service as the transport (Teams does not currently expose a direct webhook outside Bot Service). The win over PR-5a is the developer-facing surface, not the hop count.
Stack
PR-5b of 9. Depends on #PR-2 (
feat/hosting-core). Independent of PR-5a — sample apps in PR-8 useActivityProtocolChannelfrom PR-5a; a follow-up sample for the Teams-SDK channel will be added separately.Contribution Checklist