Skip to content

feat(channels): Microsoft Teams adapter via Graph polling (Phases A-D, closes #75)#76

Open
initializ-mk wants to merge 16 commits into
mainfrom
feature/msteams-graph-polling
Open

feat(channels): Microsoft Teams adapter via Graph polling (Phases A-D, closes #75)#76
initializ-mk wants to merge 16 commits into
mainfrom
feature/msteams-graph-polling

Conversation

@initializ-mk
Copy link
Copy Markdown
Contributor

Summary

First-party Microsoft Teams channel adapter implemented as a Microsoft Graph polling adapter (analogous to Telegram polling), not a Bot Framework webhook. Outbound-only — no inbound webhooks, no public endpoint. Preserves Forge's first design principle while finally putting Teams support on the table now that Graph delta polling and un-metered Teams APIs (since 2025-08-25) make the model viable.

Implements Phases A-D of the design recorded in #75:

  • Phase A — adapter skeleton + Plugin interface
  • Phase B — typed Graph client (/me, delta paging, POST messages, hosted-content attachments) + OAuth2 auth (delegated refresh-token + client_credentials)
  • Phase C — Markdown → Teams HTML converter + plain-text extractor + 24 KB chunker + mention extractor
  • Phase D — admission gate (self-loop guard, bot allowlist, mention/DM mode) + sliding-window dedup + atomic cursor persistence

Closes #75 for the runtime adapter; CLI wizard, user docs, and live tenant validation are deferred to a follow-up PR (see "Deferred" below).

Diff stats

 13 files changed, 2363 insertions(+), 9 deletions(-)
Path Purpose
forge-plugins/channels/msteams/msteams.go Plugin shell + poll loop + dispatch + SendResponse
forge-plugins/channels/msteams/auth.go Delegated + client_credentials OAuth2; refresh-token rotation hook
forge-plugins/channels/msteams/graph.go Typed Graph client; classified error sentinels (401/403/410/429)
forge-plugins/channels/msteams/admission.go 4-stage gate; self-loop beats allowlist
forge-plugins/channels/msteams/cursor.go Atomic delta-link persistence; 0700 dir perm
forge-plugins/channels/msteams/dedup.go Sliding 1000-ID ring
forge-plugins/channels/msteams/msteams_test.go 29 unit tests
forge-plugins/channels/markdown/teams.go MarkdownToTeamsHTML, TeamsHTMLToPlain, SplitMessageTeams, ExtractMention
forge-plugins/channels/markdown/teams_test.go 18 unit tests
forge-core/security/capabilities.go msteams capability bundle: graph.microsoft.com, login.microsoftonline.com
forge-cli/cmd/channel.go msteams added to ValidArgs, createPlugin, defaultRegistry, egress wiring, setup instructions
forge-cli/templates/init/msteams-config.yaml.tmpl Config template with _env-suffix discipline
forge-cli/templates/init/env-msteams.tmpl .env placeholders for MSTEAMS_*

Zero edits required to channels_stage.go, k8s_stage.go, requirements_stage.go, template_data.go, K8s manifest templates, or cmd/package.go — the existing _env-suffix scanner from PR #54 picks up msteams-config.yaml automatically. The architecture from Phase 7 (Slack/Telegram pattern) generalised cleanly.

Architecture invariants preserved

  • Outbound-only. No port bound. All HTTPS to graph.microsoft.com + login.microsoftonline.com.
  • No Bot Framework / Azure Bot Service. Direct net/http calls only.
  • _env-suffix convention holds — every secret-bearing setting in msteams-config.yaml.tmpl ends in _env. K8s manifest generation needs zero changes.
  • No model=A/B query param. Metering era ended 2025-08-25; the param is now ignored.
  • Self-loop guard fires FIRST in the admission gate — even if the agent's own userID is misconfigured into allow_bot_ids, self-authored messages are dropped (matches Slack/PR feat(slack): admit bot @mentions via allow_bot_ids, drop self always (closes #55) #56).
  • No hardcoded tenant ID — read via tenant_id_env. Multi-tenant seam stays clean even though multi-tenant adapter isn't in scope.
  • Sovereign clouds via graph_base_url_env — US Gov / China / GCC operators override MSTEAMS_GRAPH_BASE_URL and add their domains to egress.allowed_domains. Default bundle stays minimal for the 99% on commercial cloud.

Error handling

Graph status Behaviour
401 Force auth.Refresh(); if it still 401s → 60s backoff
403 Log with docs/channels/msteams.md remediation hint; 60s backoff
429 Honour Retry-After header; clamped to [10s, 300s]
410 Gone (delta cursor expired) Critical — discard cursor, re-init from $filter=lastModifiedDateTime gt <now>. WARN log. Never crashes.
5xx / network Exponential backoff 5s → 10s → 20s → 40s → 60s capped; reset on first success

Tests (47 total, no network)

  • channels/markdown — 18 tests: every conversion-table row from [Feature]: Microsoft Teams channel adapter via Graph polling (outbound-only) #75 §6, plus HTML-escape safety, mention preservation, entity decoding, <br> handling, under-limit and over-limit splits, mention extraction with and without userID.
  • channels/msteams — 29 tests:
    • dedup: seen/mark, ring eviction (oldest first), duplicate-mark non-eviction
    • cursor: atomic save+load round-trip, 0700 dir perm enforced, corrupt-file recovery
    • admission: self-loop beats allowlist (hard rule), DM-only admit/drop, mention-only admit/drop, mention_or_dm, bot allowlist admit/drop
    • stripBotMention: 6 cases including case-insensitive prefix and non-start non-strip
    • parseAllowBotIDs: comma / space / newline splits
    • graph client: /me, delta paging with @odata.nextLink, all four error sentinels, Retry-After parsing with floor/ceiling clamps, POST payload shape
    • auth manager: delegated refresh-token flow with rotation hook + token cache (proves second Token() call doesn't re-hit endpoint), client_credentials flow, token-endpoint error surfaces correctly
    • Plugin Init validation: 4 cases covering each required field + poll-interval clamping
go test ./forge-core/... ./forge-cli/... ./forge-plugins/... — all green
gofmt -w + golangci-lint — 0 issues

Deferred to a follow-up PR

These are independent from the runtime adapter and won't block this PR from being reviewed and merged:

  • forge channel add msteams interactive bubbletea wizard with the device-code flow capture and live /me validation. Today, forge channel add msteams still works non-interactively via the existing add/serve plumbing — it writes the template and the .env placeholders; the operator runs the device-code curl themselves and fills in the refresh token. The wizard is purely UX sugar.
  • docs/channels/msteams.md user-facing setup guide (Entra app registration walkthrough, permission list, troubleshooting matrix, sovereign-cloud override notes).
  • Live end-to-end test against a real tenant (Phase E of [Feature]: Microsoft Teams channel adapter via Graph polling (outbound-only) #75). All other Phase E test cases are covered by the unit tests above; this one needs a tenant + manual interaction.
  • FORGE_PROJECT_DESIGN.md Channel Connectors table row — separate doc-sync follow-up after merge, same pattern as PR docs: add linear and code-plan to embedded-skills index #72 for linear + code-plan.

Open questions deferred (from #75 §14)

The four open questions from the issue are best resolved in the wizard PR, since they all affect UX:

  1. Delegated vs client_credentials default in the wizard — currently delegated per the template default.
  2. Mention-strip behaviour — implemented as strip (the recommended option in [Feature]: Microsoft Teams channel adapter via Graph polling (outbound-only) #75).
  3. Schedule delivery target formatev.WorkspaceID carries the Graph chat.id as-is; scheduler prompt already handles arbitrary strings.
  4. msteams-channels (in-team channel messages) — explicitly out of scope per [Feature]: Microsoft Teams channel adapter via Graph polling (outbound-only) #75 §15 (blocked on the Microsoft @odata.nextLink bug).

Reference patterns followed

Implements Phases A-D of the FORGE_MSTEAMS_CHANNEL_GRAPH_POLLING design:
core adapter + Graph client + auth + markdown converter + unit tests.
Outbound-only — no inbound webhooks, no public endpoint, no Bot Framework.

CLI wizard (Phase E), user-facing setup docs, and live end-to-end testing
against a real tenant are deferred to a follow-up PR — see #75 OPEN
QUESTIONS for the wizard UX decisions still to settle (delegated vs
client_credentials default, mention-strip behavior). The plumbing
already supports `forge channel add msteams` non-interactively via the
existing add/serve flow.

Files

forge-plugins/channels/msteams/
- msteams.go              Plugin shell (Init/Start/Stop/SendResponse/
                          NormalizeEvent), poll loop with 3/5/60s clamp,
                          backoff state machine, dispatch + admission gate
                          + dedup, large-response handling that mirrors
                          slack/telegram (summary + hosted-content
                          attachment, fallback to chunked).
- auth.go                 OAuth2 token manager: delegated refresh-token
                          flow + client_credentials. Caches tokens with a
                          60s expiry window. Refresh-token rotation hook
                          for the runner to persist back to .env / K8s
                          Secret.
- graph.go                Typed Graph client: /me, /users/{id}, delta
                          paging via getAllMessages/delta, POST
                          chats/{id}/messages, hosted-content
                          attachments (4 MiB cap enforced). Classifies
                          responses into errUnauthorized / errForbidden /
                          errCursorExpired (410) / *rateLimitedError
                          (Retry-After clamped 10-300s) / generic 5xx.
                          No `model=A/B` query param (metering era ended
                          2025-08-25).
- admission.go            4-stage gate: self-loop FIRST (beats allowlist
                          even on misconfiguration), then bot-allowlist
                          check, then mode filter (mention / dm /
                          mention_or_dm). stripBotMention drops the
                          leading "@DisplayName" so the LLM doesn't see
                          the bot's own name as the first word.
- cursor.go               .forge/channels/msteams-cursor.json with
                          atomic write (tmp + rename), 0700 dir perm.
                          Corrupt file loads as empty.
- dedup.go                Sliding 1000-ID ring; oldest evicted first;
                          concurrent-safe.

forge-plugins/channels/markdown/
- teams.go                MarkdownToTeamsHTML (headings, bold, italic,
                          inline code, fenced code, links, ul, ol,
                          blockquote, strikethrough; HTML-safe
                          escaping). TeamsHTMLToPlain (strips Teams
                          chatMessage HTML to text, preserves @mention
                          display names, decodes entities).
                          SplitMessageTeams (24KB threshold — leaves
                          headroom for HTML-tag overhead under the
                          ~28KB Teams body limit). ExtractMention
                          (mentions[] + <at id="N"> belt-and-braces).
- teams_test.go           18 tests across all converter surfaces.

forge-core/security/capabilities.go
- msteams bundle: graph.microsoft.com + login.microsoftonline.com.
  Sovereign clouds (graph.microsoft.us, microsoftgraph.chinacloudapi.cn)
  stay out of the default bundle — operators add via
  egress.allowed_domains.

forge-cli/cmd/channel.go
- ValidArgs include msteams. createPlugin + defaultRegistry register
  msteams.New(). addChannelEgressToForgeYAML adds msteams to
  egress.capabilities (same pattern as slack). printSetupInstructions
  prints the Entra app registration steps.

forge-cli/templates/init/
- msteams-config.yaml.tmpl  Config emitted by `forge channel add msteams`.
                            All settings ending in _env are auto-discovered
                            by the existing ChannelsStage from PR #54 for
                            K8s manifest emission — zero edits to k8s_stage.go
                            or channels_stage.go required.
- env-msteams.tmpl          .env placeholders for the five MSTEAMS_*
                            variables.

Tests (all under unit-test budget — no network, no real Graph calls)

47 tests added across two packages:
- channels/markdown:   18 (MarkdownToTeamsHTML / TeamsHTMLToPlain /
                          SplitMessageTeams / ExtractMention)
- channels/msteams:    29 covering:
                         * dedup: seen-and-mark, ring eviction (oldest
                           first), duplicate-mark non-eviction
                         * cursor: atomic save + load round-trip, 0700
                           dir perm, corrupt-file recovery
                         * admission: self-loop beats allowlist
                           (HARD rule), DM-only, mention-only,
                           mention_or_dm, bot allowlist admit/drop
                         * stripBotMention: 6 cases including case-
                           insensitive prefix and non-start non-strip
                         * parseAllowBotIDs: comma/space/newline splits
                         * graph: /me happy path, delta paging
                           (@odata.nextLink shape), all four error
                           branches (401/403/410/429), Retry-After
                           parsing including floor/ceiling clamps,
                           POST chats/{id}/messages payload shape
                         * auth: delegated refresh-token rotation
                           hook fires + cache prevents second HTTP call,
                           client_credentials happy path,
                           refresh-token-expired error surfacing
                         * Plugin Init validation: tenant required,
                           delegated needs refresh_token, client_creds
                           needs user_id, poll_interval_seconds clamped
                           to 3/5/60

Zero edits to channels_stage.go, k8s_stage.go, requirements_stage.go,
template_data.go, K8s manifest templates, or cmd/package.go — the
existing _env-suffix scanner from PR #54 picks up msteams-config.yaml
automatically.

go test ./...      forge-core ✅ forge-cli ✅ forge-plugins ✅
golangci-lint      0 issues across all three modules
gofmt              clean

Deferred to a follow-up PR (Phase E + F from #75):
- forge-cli/cmd/channel_msteams.go interactive bubbletea wizard with
  device-code flow capture and live /me validation
- docs/channels/msteams.md user-facing setup guide (Entra app
  registration, permissions, troubleshooting, sovereign-cloud override)
- End-to-end manual test against a real tenant
- FORGE_PROJECT_DESIGN.md Channel Connectors table row
The init wizard's channel-selector step (forge-cli/internal/tui/steps/
channel_step.go) was hard-coded to None/Telegram/Slack. Adding the
msteams adapter to the runtime + CLI without surfacing it in the
interactive picker meant TUI users couldn't actually pick it.

Changes
- Add the "MS Teams" item to the SingleSelect list with the 👥 icon
  and the "Graph polling, no public URL needed" subtitle (same
  outbound-only framing as the existing options).
- Add 3 new phases for the 4-step token capture:
    channelMsteamsClientIDPhase
    channelMsteamsClientSecretPhase
    channelMsteamsRefreshTokenPhase
  (The 1st step — tenant_id — reuses the existing channelTokenPhase
  via the per-channel switch already used for Slack's two-step flow.)
- New `updateMsteamsPhase` helper drives the chain; each phase stores
  the just-collected value under the MSTEAMS_* env name and advances
  to the next prompt, or marks the step complete on the last step.
- New `newMsteamsInput` helper centralises the theme-binding boilerplate
  so the 4 phases stay terse.
- View() adds instructions for each of the 4 steps. The refresh-token
  step embeds the device-code curl invocation so the operator has the
  command at hand without leaving the TUI.
- Summary() returns "MS Teams" for the post-wizard recap row.

No new TUI components needed — reuses the existing SingleSelect +
SecretInput primitives. No changes to wizard plumbing
(WizardContext.ChannelTokens, init.go env-var copy, review_step,
egress_step) — the existing per-token-name map already supports
arbitrary channels.

The full device-code polling automation (where the wizard would call
/devicecode + /token itself instead of having the operator paste the
refresh token) was explicitly deferred in PR #76's body and remains
deferred; this commit just removes the "msteams is invisible in the
TUI" papercut.

go test ./forge-cli/... — all green
gofmt -w + golangci-lint — 0 issues
forge --version against /tmp/forge-issue-75-build/forge — boots
forge init shows MS Teams 👥 in the channel selector
The OAuth 2.0 device-code flow has two halves:
  1. user-consent: visit verification_uri, enter user_code, sign in
  2. client-polling: POST /token with grant_type=device_code repeatedly
     until the user completes consent

The previous TUI Step 4/4 instructions documented only half (1), so an
operator who finished the sign-in page never saw a refresh token —
nothing on their side was polling /token. The "successfully given
permission" Microsoft page is the consent-side result; the refresh
token only materialises when the client side polls and exchanges the
device_code.

New subcommand: forge channel msteams-login
- Reads tenant_id and client_id from --flags, $MSTEAMS_TENANT_ID /
  $MSTEAMS_CLIENT_ID in the shell env, or .env in the current dir
  (first non-empty wins).
- Calls POST {login_base}/{tenant}/oauth2/v2.0/devicecode with
  scope=https://graph.microsoft.com/.default offline_access — same
  scope set the runtime authManager uses, so the refresh token captured
  here is interchangeable with one captured externally.
- Prints verification_uri + user_code to stderr (so stdout stays clean
  for piping). Polls /token at the server-advertised interval, honours
  slow_down (interval += 5s), handles the four terminal error_codes
  (authorization_pending continue / expired_token / access_denied /
  generic error).
- On success, prints refresh_token to stdout (newline only — no labels,
  pipe-friendly). With --write-env, appends or replaces the
  MSTEAMS_REFRESH_TOKEN=<token> line in .env directly.
- Sovereign clouds: --login-base override (default
  https://login.microsoftonline.com; pass login.microsoftonline.us for
  US Gov, login.chinacloudapi.cn for China).
- Timeout: --timeout-seconds (default 900 = 15 minutes, matching
  Microsoft's device-code lifetime).

TUI Step 4/4 (channel_step.go) updated:
- Replaces the cryptic curl snippets with a single instruction to run
  `forge channel msteams-login` in a second terminal.
- Mentions --write-env so the operator can skip the paste-back entirely
  and just leave the TUI field blank.

Tested: forge channel msteams-login --help prints the expected
usage. Real Entra calls happen at runtime; the polling state machine
mirrors the runtime auth.go's refresh-token rotation handler.

go test ./forge-cli/... — all green
golangci-lint — 0 issues
…-open browser)

User feedback: running `forge channel msteams-login` from a second
terminal during the TUI wizard didn't work because the agent project
directory (and .env) don't exist until the wizard completes. Followed
by: "can tui trigger a browser open once I have provided the tenantid,
clientid and secret via tui?" — yes, and that's the right UX.

This change makes the MS Teams branch of `forge init` self-contained:

After collecting tenant_id (step 1), client_id (step 2), client_secret
(step 3), the wizard automatically:

  4a. POSTs /devicecode using the tenant + client just provided
  4b. Opens the verification_uri in the operator's default browser
      (cross-platform: open / xdg-open / start; failures are
       silently degraded — the URL is also shown on screen)
  4c. Renders the one-time user_code prominently in the TUI
  4d. Polls /token in the background until the operator approves
      consent in the browser
  4e. Captures the refresh_token, stores it under
      MSTEAMS_REFRESH_TOKEN in the wizard's token map, and advances

No second terminal. No manual paste. No curl wrangling. The wizard
finishes with a complete .env.

Architecture
- New shared package `forge-cli/internal/devicecode/` containing the
  RFC 8628 device-authorization-grant primitives:
    DeviceCodeResponse, TokenResponse,
    RequestDeviceCode, PollDeviceToken,
    OpenURL (cross-platform browser launcher mirroring the private
       openBrowser in forge-core/llm/oauth/flow.go)
  Used by BOTH the `forge channel msteams-login` CLI subcommand AND
  the TUI. Same scope (graph.microsoft.com/.default offline_access),
  same poll cadence, same error class handling — refresh tokens
  produced by either path are interchangeable.
- `cmd/channel_msteams_login.go` refactored to import from there;
  the duplicate types and helpers in this file are removed (~145
  lines down to ~210, mostly env-var helpers). Behaviour unchanged.
  As a small UX upgrade, this subcommand now also opens the
  verification_uri in the browser automatically.

TUI implementation
- New state `channelMsteamsDeviceLoginPhase` replaces the old
  `channelMsteamsRefreshTokenPhase` (which had been a paste-back
  SecretInput).
- New sub-state machine inside that phase: msteamsLoginRequesting →
  msteamsLoginWaiting → success | msteamsLoginErr.
- Two tea.Cmd dispatchers: requestDeviceCodeCmd (30s timeout) and
  pollTokenCmd (15-minute timeout, matches Microsoft's device-code
  expiry).
- View() renders:
  * "⣾ Requesting a one-time code from Microsoft..." during step 4a
  * The verification URL + user_code prominently, plus a
    "⣾ Waiting for you to complete sign-in..." spinner during 4d
  * On failure: error message + "Press R to retry, or S to skip"
    (skip allows the operator to finish the wizard and capture the
    refresh token later via `forge channel msteams-login --write-env`
    from the new project root)

Build / lint / test all green. No changes to runtime adapter code
(forge-plugins/channels/msteams/) — this is purely an installer
upgrade.

Existing manual flow (`forge channel msteams-login` from a project
root with .env present) still works for refresh-token rotation and
non-TUI deployments.
Error reproduced: AADSTS7000218 — "The request body must contain the
following parameter: 'client_assertion' or 'client_secret'."

The OAuth 2.0 device-authorization grant (RFC 8628) is designed for
public clients (no secret), but Microsoft Entra enforces stricter
credential checks when the registered app is a CONFIDENTIAL client
(a "web" application registration — anything with a client_secret).
For those apps, /token requires the client_secret to be present even
in the device_code grant.

The runtime adapter (forge-plugins/channels/msteams/auth.go) already
conditionally included client_secret in its refresh_token grant, so
this only manifested in the new device-code path.

Changes
- internal/devicecode/devicecode.go:
  PollDeviceToken signature gains a clientSecret string parameter.
  When non-empty, it's included in the form body. Public-client apps
  tolerate the extra field, so always passing it (when available) is
  safe — there's no need to ask the operator which kind their app is.

- cmd/channel_msteams_login.go:
  - New --client-secret flag.
  - Reads MSTEAMS_CLIENT_SECRET from the shell env or .env (same
    precedence pattern as tenant/client) when the flag is unset.
  - Threads the value through to PollDeviceToken.

- internal/tui/steps/channel_step.go:
  pollTokenCmd now passes s.tokens["MSTEAMS_CLIENT_SECRET"] which
  was captured in TUI step 3. The operator doesn't have to do
  anything different — the wizard had the secret in hand all along,
  we just weren't using it.

Side note: /devicecode itself does NOT require client_secret (it's
public for both client types), so RequestDeviceCode is unchanged.
Only the /token POST is affected.

Tested
- gofmt -w + golangci-lint — 0 issues
- go test ./forge-cli/... — all green
- Binary rebuilt at /tmp/forge-issue-75-build/forge
User saw the new ">> client_secret WAS sent (len=40)" diagnostic, which
confirmed the runtime IS sending client_secret correctly — Entra is
rejecting it for a different reason.

The actual most-common cause for AADSTS7000218 when client_secret IS
in the request body: the Entra app has 'Allow public client flows' OFF
under Authentication → Advanced settings. The device-code grant is
classified as a "public client flow" regardless of how the client
authenticates, and Entra rejects it outright on confidential-client
apps until that toggle is flipped on.

Update the diagnostic to point users at the toggle first, with the
expired-secret check as the less-common fallback. Format the diag as
multiple ">>"-prefixed lines so terminal wrapping can't hide it.

No code-path changes — only the error-message text differs. The actual
form-body handling (which IS correct) is unchanged.
User report: even after providing tenant_id / client_id / client_secret
in the TUI wizard (steps 1-3 of the msteams channel) and skipping
step 4 (device-code login), the generated agent project had no
MSTEAMS_* entries in .env or .forge/secrets.enc. First `forge run`
attempt fails with:

  Error: initialising msteams: msteams: tenant_id is required
    (set MSTEAMS_TENANT_ID)

Root cause: cmd/init.go:buildEnvVars has a per-channel switch that
converts opts.EnvVars (map) into the []envVarEntry written to disk.
That switch only knew about telegram and slack. msteams matched
nothing, so the four MSTEAMS_* keys captured by the TUI's
ChannelStep.Apply → ctx.ChannelTokens → opts.EnvVars chain were
silently dropped on the floor at the persistence step.

Fix: add a msteams case to the switch. Emits:
- MSTEAMS_TENANT_ID  → .env  (GUID, not a secret per isSecretKey)
- MSTEAMS_CLIENT_ID  → .env  (GUID)
- MSTEAMS_CLIENT_SECRET → .forge/secrets.enc (suffix _SECRET → secret)
- MSTEAMS_REFRESH_TOKEN → .forge/secrets.enc (suffix _TOKEN → secret)
  or placeholder in .env when empty (e.g., the user skipped step 4
  due to the "Allow public client flows" Entra gotcha and intends to
  capture the token later via `forge channel msteams-login --write-env`)
- MSTEAMS_USER_ID → .env if set (only used for client_credentials flow;
  delegated flow auto-resolves via /me at runtime, so we omit when empty)

The existing splitEnvVars + isSecretKey machinery already routes each
key correctly based on its suffix — this commit just makes sure the
keys actually reach that machinery.

go test ./forge-cli/... — all green
golangci-lint — 0 issues
User report: after toggling "Allow public client flows" = Yes in Entra
to fix AADSTS7000218 ("secret required") on the device-code call, the
RUNTIME refresh-token call started failing with the INVERSE error:

  AADSTS700025: Client is public so neither 'client_assertion' nor
  'client_secret' should be presented.

Microsoft enforces strict mutual exclusivity:
  - app confidential (toggle OFF) → client_secret REQUIRED
  - app public (toggle ON)        → client_secret FORBIDDEN

The agent can't tell which type the app is from inside the runtime,
and forcing the operator to declare it in msteams-config.yaml just
shifts the confusion. Instead, both code paths now auto-detect.

forge-plugins/channels/msteams/auth.go
  - Split refreshLocked into refreshLocked + requestToken + postTokenForm
    so the retry logic is testable without duplicating form construction.
  - First attempt sends client_secret when configured. On AADSTS700025
    (matched by string contains, no JSON re-parse), drop the secret
    and retry once on the same call.
  - Cache the inference in authManager.publicClientInferred — every
    refresh after the first detection skips the secret directly,
    avoiding a round-trip per hourly refresh.
  - Inference is per-process: rebooting the agent re-detects on first
    refresh. That's intentional; operators occasionally flip the toggle
    and we want the next start to discover the new truth.
  - client_credentials flow is unchanged — it's inherently confidential
    so the inference doesn't apply.

forge-cli/internal/devicecode/devicecode.go
  - Same pattern inside PollDeviceToken's polling loop: sendSecret flag
    starts true, flips false on AADSTS700025, the same device_code
    (which is valid for ~15 minutes) is re-used on the immediate retry.
  - This makes the TUI step 4 (inline OAuth) work for both app types
    out of the box.

Test surface
  - go test ./forge-plugins/channels/msteams/ — all green (existing
    TestAuthManager_* and TestGraphClient_* still pass against the
    refactored requestToken/postTokenForm split)
  - go test ./forge-cli/... — all green
  - gofmt + golangci-lint — 0 issues

Together with the existing AADSTS7000218 diagnostic message ("Allow
public client flows is OFF → toggle it on"), both Entra
client-type-mismatch errors now resolve themselves automatically
without operator intervention beyond the initial registration choice.
User report: agent boots and authenticates as MK successfully, then the
poll loop fires repeatedly with HTTP 412:

  PreconditionFailed: Requested API is not supported in delegated context

Root cause is a Graph API permission mismatch baked into PR #76's
design. /users/{id}/chats/getAllMessages/delta is APP-ONLY:
  Permission type: Delegated — NOT supported
  Permission type: Application — ChatMessage.Read.All, Chat.Read.All

The TUI wizard walks operators through the *delegated* OAuth flow
(device-code, personal API permissions, no admin consent), so the
token we present is delegated and Entra rejects with 412.

For delegated, Graph's per-chat /chats/{id}/messages/delta endpoint
*does* support delegated context (Chat.Read / ChatMessage.Read), but
there's no single getAll-style endpoint that does. The adapter must
enumerate /me/chats and maintain one delta cursor per chat.

graph.go:
  + ListChats(ctx, limit)  → GET /me/chats?$select=id,topic,chatType&$top=N
  + InitialChatDeltaURL(chatID, since) → /chats/{id}/messages/delta?$filter=...
  Both new endpoints work with the delegated Chat.Read scope already
  requested by the wizard.
  Existing InitialDeltaURL kept for the client_credentials path.

cursor.go:
  + Per-chat cursor map persisted alongside the existing single
    global deltaLink (both fields coexist in cursorFile JSON).
  + load/save keep the app-only behaviour for backwards compat.
  + loadChats/saveChats added for the delegated path.
  + writeLocked centralises the atomic tmp+rename so both APIs
    serialise to the same JSON shape.

msteams.go:
  + pollLoop dispatches by auth flow:
      FlowDelegated         → pollLoopDelegated (per-chat)
      FlowClientCredentials → pollLoopAppOnly  (existing single cursor)
  + pollLoopDelegated:
      - Lists chats once at startup, then every 60s in a separate
        refresh timer so new chats appear without restart.
      - Walks every known chat per poll tick (default 5s).
      - On per-chat 410, reinits just that chat's cursor (the others
        keep their position).
      - On 401, force-refreshes the token (same recovery as app-only).
      - On 429, honours Retry-After.
      - Exponential backoff on transient errors, reset on success.
      - Tracks chatType from /me/chats so the admission gate always
        sees the correct oneOnOne/group/meeting classification even
        when the chatMessage payload omits it.
  + dispatch gains a chatTypeHint parameter (passed by the delegated
    loop, empty by the app-only loop) so admission can fall back to
    the per-chat map when msg.ChatType is absent.

go test ./forge-plugins/... — all green
golangci-lint — 0 issues

This also closes the design gap from #75 §1.3: "An Entra ID app
authenticates as a single Teams user … polls
/users/{id}/chats/getAllMessages/delta on a 5-second cadence." The
operational story stays the same — outbound-only, no public endpoint,
~5s latency floor — but the Graph endpoint the adapter calls now
matches the permission model the wizard sets up.
User report: per-chat polling started but every chat got HTTP 400:

  BadRequest: Unsupported request: Change tracking is not supported
  against 'microsoft.graph.chatMessage'.

Root cause: /chats/{id}/messages/delta does NOT accept $filter. The
$filter=lastModifiedDateTime gt <ts> clause I was using to skip
historical backlog gets rejected outright by Microsoft Graph.

The Graph delta pattern for chatMessage:
  - First call (no $filter)         → returns ALL current messages in
                                       the chat, paginated via
                                       @odata.nextLink, ending in
                                       @odata.deltaLink
  - Subsequent calls (with deltaLink) → returns only changes since
                                       deltaLink was issued

Dispatching the initial-sync messages would flood the channel handler
with arbitrary chat history. The adapter must drain that initial sync
silently, persist the deltaLink, and only treat messages from polls
made AFTER the first deltaLink as new chat activity.

Changes
graph.go
  - InitialChatDeltaURL no longer takes `since time.Time` and no longer
    sets $filter. Just hits /chats/{id}/messages/delta with no query
    params. Comment explicitly calls out the change-tracking restriction.

cursor.go
  - Per-chat entry promoted from `map[string]string` (bare URL) to
    `map[string]chatCursorState{URL, DeltaSeen}`. DeltaSeen flips true
    the first time Graph returns @odata.deltaLink for the chat.
  - loadLocked now falls back to parse the legacy string-valued schema
    so cursor files produced by previous commits (b657cc8) are still
    accepted — they're treated as initial-sync state (DeltaSeen=false)
    so the next poll redoes initial sync rather than dispatching the
    next page as new traffic.

msteams.go (pollLoopDelegated)
  - Walks cursors as chatCursorState rather than bare strings.
  - When DeltaSeen is false: drain the response without dispatching,
    but DO mark each message in the dedup ring so a later cursor reset
    can't accidentally re-emit them.
  - When DeltaSeen is true: dispatch normally.
  - On deltaLink arrival, flip DeltaSeen=true and log
      "chat X initial sync complete (dropped N historical messages);
       now tracking changes"
    so operators can see the transition.
  - On 410 (cursor expired), reset to InitialChatDeltaURL AND clear
    DeltaSeen — the next poll re-does initial sync.

Tests + lint clean.
User report: cursor reset, fresh build, still hit:

  BadRequest: Unsupported request: Change tracking is not supported
  against 'microsoft.graph.chatMessage'.

Root cause is structural: Microsoft Graph v1.0 exposes NO delta
primitive for chatMessage in delegated context. Both endpoints I tried
fail by design:

  /chats/{id}/messages/delta             → HTTP 400 "Change tracking
                                            is not supported"
  /users/{id}/chats/getAllMessages/delta → HTTP 412 "API not supported
                                            in delegated context"

The only chatMessage delta that works in delegated v1.0 is the BETA
endpoint — but we're (correctly) targeting v1.0.

Solution: drop the delta primitive entirely from the delegated path
and use plain GET /chats/{id}/messages?$top=50&$orderby=lastModifiedDateTime%20desc
with client-side watermark tracking via lastModifiedDateTime.

graph.go
  - Replace InitialChatDeltaURL with ListChatMessages, a non-delta
    list call that returns newest-first up to 50 entries. Comment
    documents both failed delta attempts so future readers don't
    re-litigate this.
  - InitialDeltaURL (the app-only getAllMessages/delta call) kept
    for the client_credentials path where it does work.

cursor.go
  - chatCursorState shrinks from {URL, DeltaSeen} to {LastSeenTime}.
    Per-chat state is now just a single RFC3339 timestamp string.
  - Removed legacy schema fallback in loadLocked: when the on-disk
    file doesn't match the current schema, treat as empty so the
    next poll re-anchors cleanly. State files are internal scratch,
    not public API.

msteams.go (pollLoopDelegated)
  - First contact with a chat: anchor LastSeenTime at "now" and
    don't poll yet (no need to GET when we'd throw the results away
    anyway). The next tick starts dispatching real new messages.
  - Steady state: ListChatMessages(chat, 50), keep only entries with
    lastModifiedDateTime > LastSeenTime, dispatch in chronological
    order (Graph returns desc; we reverse so handler sees oldest
    first for session continuity), then advance LastSeenTime to the
    newest message's timestamp.
  - compareTimestamps helper: lexicographic comparison works for all
    ISO-8601 strings Graph returns because they're fixed-width.
  - All existing error handling (401 refresh, 429 backoff, generic
    backoff) carries over unchanged.

The visible operational change for operators:
  - Old log: "chat X initial sync complete (dropped N historical
              messages); now tracking changes"
  - New behaviour: no historical-message log line because the first
    contact with a chat now skips the GET entirely. The cursor file
    just records a timestamp on the first tick.

Tests + lint clean.
User report: agent is operational (self-loop guard fired correctly on
the user's own message) but one meeting chat (19:meeting_...@thread.v2)
keeps returning 403 forbidden. With the previous code that error path
broke out of the per-chat iteration AND tripped global exponential
backoff, so a single chat we lack permission for was effectively
slowing down polling on the 49 healthy chats.

Microsoft applies stricter permission gates on meeting chats: the
delegated Chat.Read scope is sometimes insufficient and per-meeting
consent (granted by the meeting organiser) is required. That's a
permanent state for any given (token, chat) pair — retrying every
5 seconds is pure log spam.

Changes (pollLoopDelegated only)
  - New forbiddenChats set tracks chats we've already learned we
    cannot read. Skipped at the top of the per-chat iteration so
    they cost zero HTTP calls.
  - On 403:
      * log once at NORMAL level (no "WARN", no backoff) with a hint
        about meeting-chat consent
      * add the chat to forbiddenChats
      * `continue` instead of `break` — the rest of the chats poll
        normally on this tick
  - Generic transient errors (5xx, network): also `continue` now
    instead of `break`. We still set anyErr=true so a global backoff
    kicks in if EVERY chat errored, but a single chat hiccup no
    longer blocks the rest.
  - On chat-list refresh (every 60s), clear forbiddenChats so a
    newly-granted permission gets picked up.
  - 401 and 429 still break out of the chat loop — those are global
    by nature (token expired = no chat works; rate limit applies
    across all calls in the tenant).

No tests added — the polling loop is hard to unit-test without
extensive HTTP mocking. Behavioural change covered by manual run
against the user's tenant.

Other modules unaffected. Lint + go test clean.
User report: "@forge Agent how can you help?" tagged in a group chat
was dropped by the admission gate as "msteams: dropping message
authored by self". The user expected the agent to reply.

Root cause is a fundamental property of delegated-mode Teams bots
that PR #76's admission logic got wrong:

  In delegated mode, the agent acts AS the user (the delegated OAuth2
  token grants the user's identity). When the agent posts a reply via
  PostChatMessage, Graph stamps that reply with from.user.id =
  ownUserID. When THE USER types a message from any Teams client,
  Graph stamps it with the same from.user.id = ownUserID. From inside
  admission, looking at from.user.id alone, the two are
  indistinguishable.

The previous self-loop guard ("drop if msg.From.User.ID == ownUserID")
was correct in spirit but caught both cases — including the case the
user actually cares about (typing a question to the agent from their
own Teams client). The result: any group chat or DM where the user
participates is silently dropped, agent never responds.

Correct loop prevention for delegated mode is per-message-ID: track
the IDs of messages we sent ourselves, and let the dedup ring catch
them before they reach admission.

Changes
graph.go
  - PostChatMessage now returns (string, error) — the created
    message's Graph ID.
  - PostChatMessageWithAttachment similarly. Comment explains why
    the return value matters in delegated mode.

msteams.go
  - SendResponse and sendChunked capture the ID from each successful
    POST and call markSent(id), which is a thin wrapper around
    dedup.mark for outbound messages.
  - markSent handles the empty-id corner gracefully so call sites
    don't need to check.

admission.go
  - Removed the from.user.id == ownUserID gate. ownUserID stays on
    the function signature (diagnostic value, future use) but is
    explicitly _ = ownUserID with a long comment explaining the
    delegated-identity trap and why dedup is the canonical loop
    breaker.

msteams_test.go
  - TestAdmit_SelfLoopBeatsAllowlist replaced with
    TestAdmit_SelfAuthoredUserMessageAdmitted, which encodes the
    new behaviour: a DM authored by ownUserID is admitted (loops
    are caught upstream by dedup, not here).
  - TestGraphClient_PostChatMessage updated to assert the returned
    message ID flows through correctly and matches what the stub
    server returned in its response body.

Behaviourally this is also right for the app-only (client_credentials)
path: agent posts as the application, so from.application is set on
its own messages instead of from.user, and the user.id check never
fired for app-posts anyway. Removing it is strictly more permissive
in delegated mode, no regression in app-only mode.

Lint + go test clean.
…ntion

User report: DM to agent works (admission passes via chat_type=oneOnOne)
but "@forge Agent how can you help?" in a group chat is dropped with
"non-mention non-dm message" — no clue WHY admission didn't see it as
a mention.

Add diagnostic logging on the non-mention drop path so the next round
of testing shows exactly what's in the mentions[] array. Two outcomes:

  Empty mentions[]:
    "no formal mentions in payload — in delegated mode you must @-mention
     your own Teams display name (autocomplete required); the agent
     shares your identity (<ownUserID>)"
    → user typed "@text" as plain text instead of using Teams'
      autocomplete to create a formal mention; OR they tried to mention
      a display name that doesn't resolve to a real Teams identity.
      In delegated mode the agent has no separate Teams display name —
      it shares the operator's user identity, so the operator must
      @-mention their OWN display name to invoke the agent.

  Non-empty mentions[] but no match:
    Logs each mention's text/user.id/displayName so the operator can
    see whether the mention is targeting a different user, an app id,
    or the right user under a different identifier.

Diagnostic only — no behavioural change to admission or dedup. The
delegated-identity quirk is structural and worth documenting in a
follow-up commit on the user guide, but the log now points operators
at the cause without needing source-level debugging.
User report: setup was complete, DM worked, but @-mentions in group
chats didn't trigger the agent. Diagnosis: during the device-code flow
the user signed in as themselves (MK) — so the agent's ownUserID is
MK's UUID, not the "Forge Agent" Microsoft 365 account they created
for the bot. @-mentioning Forge Agent in a chat doesn't match MK's
UUID and admission drops with non-mention.

The architectural rule for delegated-mode Teams agents: sign in as
the account you want the agent to ACT AS, which is also the account
other users will MENTION to invoke it. For multi-user deployments
this should be a dedicated service account (e.g.
forge-agent@yourtenant.onmicrosoft.com), not the operator's personal
account.

Add a prominent ImportantTxt-styled banner to the TUI step 4 (the
"waiting for sign-in" view) so operators see this before completing
device-code consent. Three lines, error-styled, embedded right above
the spinner:

  IMPORTANT: sign in as the dedicated Microsoft 365 account
  you want the agent to ACT AS (e.g. forge-agent@yourtenant).
  Other Teams users will @-mention that account to invoke the agent.

No code-path change; pure documentation surface. The existing failure
modes (signing in with the wrong account) are still recoverable via
`forge channel msteams-login --write-env` from the project root.
User report: after invoking the agent with "@forge Agent can you
summarize this week's updates?" in a group chat, the LLM correctly
admitted the mention but had no idea what "this week's updates"
referred to. The agent responded with a template, asking the user to
paste sources.

Architectural mismatch: the polling design anchors LastSeenTime to
"now" on first contact and only dispatches truly new messages. That's
correct for the "respond when invoked" loop semantics — but it means
the LLM sees only the literal mention text and zero conversational
context. Teams chats are thread-shaped; the chat history IS what the
agent should be reasoning about.

Fix: when a message passes admission, fetch the most recent N messages
in the same chat via ListChatMessages and prepend them as a context
block before the user's actual prompt. The historical messages are
NOT dispatched as separate events (that would flood the handler) —
they live only inside the current event's Message text.

Format:

  [Recent chat history for context — most recent message at the bottom:]
  05-20 14:32  Alice: working on the billing migration
  05-21 09:15  Bob: tests are green
  ...
  [End of history. The user's current message follows:]

  @forge Agent can you summarize this week's updates?

The current message ID is filtered out of the history (it's already
the prompt). HTML bodies are converted to plain text via the existing
TeamsHTMLToPlain helper. Bot-authored messages are tagged with
"(bot)" in the author field so the LLM doesn't conflate them.

Configuration (msteams-config.yaml):
  include_recent_history: true   # default; turn off for chats where
                                 # surrounding history would be noise
                                 # (e.g., alerts-only channels)
  recent_history_count: 20       # 1-50, hard-capped at 50 (Graph's
                                 # per-request limit on /chats/{id}/messages)

The formatted block is additionally soft-capped at ~5000 chars in
prependChatHistory to protect the LLM context budget — chatty threads
won't blow the context window even if recent_history_count is at the
max.

Cost: one extra HTTP call per admitted message. Mentions/DMs are
infrequent enough that this is a non-issue; large chats with
high-frequency admission would benefit from a future cache.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Microsoft Teams channel adapter via Graph polling (outbound-only)

1 participant