feat(deploy): Streamable-HTTP MCP entrypoint and Azure deployment#103

Merged
miguelgfierro merged 31 commits into main from deploy/azure-mcp on May 5, 2026

Conversation

@miguelgfierro (Contributor) commented May 4, 2026

Summary

  • New firefly-mcp-http CLI: minimal FastAPI app exposing FastMCP at /mcp over Streamable HTTP plus /healthz. Reuses create_mcp_app(); bypasses mount_http (which doesn't wire FastMCP's lifespan into the parent FastAPI — tools/list returns 500 otherwise).
  • Four MCP tools via corpus_rag.py: ingest_corpus_filesystem, ingest_corpus_sharepoint, corpus_retrieve, corpus_query — backed by library-grade CorpusAgent (MarkdownChunker + SqliteVec + hybrid retrieval + reranker + answerer).
  • Multi-stage Dockerfile (uv sync --extra rest --extra mcp, non-root, CMD ["firefly-mcp-http"]).
  • .github/workflows/deploy-mcp.yml — OIDC login → ACR build/push → az containerapp update.
  • docs/deploy/corpus-persistence.md — operator guide for durable CORPUS_ROOT on Azure Container Apps.

Auth model

No framework-level API keys. Auth belongs at the experience layer (Container Apps EasyAuth, future dedicated auth service) per the zero-trust direction in #98 — the framework itself stays unaware of bearer tokens.

Running locally and connecting to Claude Code

1. Copy and fill the template:

cp .env.template .env
# Fill in ANTHROPIC_API_KEY, EMBEDDING_BINDING_HOST, EMBEDDING_BINDING_API_KEY, etc.
# Set CORPUS_ROOT to an absolute path on your machine (not $PWD — dotenv doesn't expand shell vars)

2. Start the server:

uv run dotenv -f .env run -- firefly-mcp-http
# Server starts on PORT from .env (default 8080). Verify with:
curl http://localhost:8080/healthz   # → {"status":"ok"}

3. Register with Claude Code (once):

claude mcp add --transport http --scope user firefly-rag http://localhost:8080/mcp/

4. Test in Claude Code:

Ingest a local folder first (replace my-corpus and path as needed):

Use ingest_corpus_filesystem — corpus_id "my-corpus", root_path "/absolute/path/to/docs"

Then query it:

Use corpus_retrieve — corpus_id "my-corpus", question "what is this about?", top_k 5
Use corpus_query — corpus_id "my-corpus", question "summarise the key topics"

corpus_retrieve returns raw ranked chunks (no LLM answer) — useful for judging retrieval quality independently. corpus_query runs the full pipeline and returns a grounded answer with citations.

Azure resources (already provisioned in rg-firefly, spaincentral)

| Resource | Name | Notes |
| --- | --- | --- |
| Log Analytics | firefly-logs | PerGB2018, 30-day retention. |
| ACR | fireflysignature | Basic, admin disabled. |
| Storage + blob | fireflysignature / firefly-artifacts | StorageV2, LRS, public access disabled. |
| Key Vault | kv-firefly-signature | RBAC-mode, purge-protection on. |
| User-assigned MI | firefly-mcp-mi | Roles: AcrPull on ACR, Storage Blob Data Contributor on Storage, Key Vault Secrets User on KV. |
| Federated credential | on firefly-mcp-mi | Issuer token.actions.githubusercontent.com, subject repo:fireflyframework/fireflyframework-agentic:ref:refs/heads/main, audience api://AzureADTokenExchange. |
| Container Apps env | firefly-env | Wired to firefly-logs. |
| Container App | firefly-mcp | Ingress external port 8000; min 0 / max 3 replicas; 0.5 CPU / 1 Gi; ACR auth via the user-assigned MI. URL: https://firefly-mcp.mangosmoke-5d24814d.spaincentral.azurecontainerapps.io. |

GitHub repo secrets AZURE_CLIENT_ID / AZURE_TENANT_ID are already configured for the workflow.

Verification

  • Local: docker buildx build -t firefly-mcp:dev . and docker run -p 8000:8000 → /healthz returns 200, MCP initialize returns {"name":"firefly","version":"3.2.4"}, tools/list returns the four corpus RAG tools.
  • Azure: same /healthz and MCP handshake succeed over HTTPS at the URL above.
  • Unit: pytest tests/unit/cli/test_mcp_http.py — 2 passing.

Refs: #98

Test plan

  • Copy .env.template → .env, fill credentials, start with uv run dotenv -f .env run -- firefly-mcp-http.
  • Register with Claude Code: claude mcp add --transport http --scope user firefly-rag http://localhost:8080/mcp/.
  • Ingest a local folder via ingest_corpus_filesystem and verify counts returned.
  • Run corpus_query and confirm grounded answer with citations.
  • Re-run unit tests: uv run pytest tests/unit/cli/.
  • Build image locally with docker buildx build and probe /healthz + MCP initialize.
  • Decide and wire EasyAuth (Entra ID) on firefly-mcp before exposing the URL beyond internal use.

…ffolding

- Add firefly-mcp-http CLI entrypoint serving FastMCP over Streamable HTTP with /healthz
- Multi-stage Dockerfile (uv sync --extra rest --extra mcp, non-root)
- Idempotent Azure provisioning script for rg-firefly (Log Analytics, ACR,
  Storage + blob, Key Vault, user-assigned MI with federated GH credential,
  Container Apps env, Container App)
- GitHub Actions workflow building and deploying via OIDC
- Auth left at the ingress layer (Entra/EasyAuth on Container Apps), no
  framework-level auth keys, aligning with the zero-trust direction in #98

Refs: #98
Comment thread .github/workflows/deploy-mcp.yml Fixed
Comment thread .github/workflows/deploy-mcp.yml Fixed
Comment thread .github/workflows/deploy-mcp.yml Fixed
Resources are live in rg-firefly; the script already drifted from
reality (LAW customerId trim, MI propagation retries). If we ever
rebuild from scratch, do it with Bicep/Terraform instead.
CodeQL flagged azure/login@v2 and docker/setup-buildx-action@v3 as
mutable refs. Pin to the commit SHAs of v2.3.0 and v3.9.0 respectively.
Adds two MCP-exposed tools:
- ingest_sharepoint(drive_id, corpus_id, root_folder?): pulls all changed
  files from a SharePoint drive (delta-based) and ingests them into a
  corpus via the existing rag.ingest pipeline. Auth via the Container
  App's managed identity → Microsoft Graph token.
- query_corpus(corpus_id, question, top_k): hybrid retrieval (BM25 + dense)
  with citations.

Caveats:
- SharePointSource import is guarded; the tool raises NotImplementedError
  until feat/content-sources-sharepoint merges to main.
- VectorStore is in-process InMemoryVectorStore — replaced when the
  blob-backed store lands.
- SqliteCorpus persisted under /tmp/firefly/corpora/<corpus_id>.db
  (ephemeral on Container Apps replicas).

Dockerfile syncs the rag, openai-embeddings, azure, markitdown and
sqlite-vec extras so the imports resolve at runtime.
Comment thread src/fireflyframework_agentic/cli/mcp_http.py Fixed
miguelgfierro and others added 21 commits May 4, 2026 13:45
Design that replaces PR #103's tools/builtins/sharepoint_rag.py with a
thin composition over the existing CorpusAgent + ContentSource Protocol.
Promotes CorpusAgent into the library, adds LocalFolderSource, and
collapses the parallel RAG stack PR #103 introduced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Twelve TDD tasks covering: AnswerAgent + CorpusAgent moves into the
library, LocalFolderSource, IngestSummary + ingest_source, retrieve()
vs query() split, watch_source polling, CorpusNotFoundError, four MCP
corpus_rag tools, span-prefix rename, operator deployment guide, and
the PR #103 rebase checklist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a filesystem-based ContentSource implementation that yields files from a
local folder recursively. Mirrors the cursor-based contract of remote sources
(SharePoint, S3) so a single ingest pipeline can serve both local and remote
corpora. Supports hidden-file filtering via FolderWatcher.is_hidden(). V1 is
delta-less; future enhancements may add mtime-based incremental listing.
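
As a rough illustration of the recursive, delta-less listing described above, here is a minimal sketch. The names iter_visible_files and is_hidden are hypothetical stand-ins, not the actual LocalFolderSource or FolderWatcher.is_hidden code, and the hidden-file rule (any path component starting with a dot) is an assumption.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterator


def is_hidden(path: Path, root: Path) -> bool:
    """Assumed rule: hidden if any path component below root starts with a dot."""
    return any(part.startswith(".") for part in path.relative_to(root).parts)


def iter_visible_files(root: Path) -> Iterator[Path]:
    """Yield every non-hidden regular file under root, in sorted order.

    Delta-less: every call walks the whole tree again, so callers rely on
    downstream deduplication rather than on a cursor to skip unchanged files.
    """
    for path in sorted(root.rglob("*")):
        if path.is_file() and not is_hidden(path, root):
            yield path
```

An mtime-based incremental listing, as the commit suggests for later, would only need to add a timestamp comparison inside the loop.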

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the canonical ingest API to CorpusAgent that drives the unified
ContentSource loop: list_changed(cursor) → fetch → ingest_one →
commit_delta. Introduces IngestSummary—a typed result wrapper with
.results (list of IngestionResult), .cursor, and aggregate count
properties (ingested, skipped, failed).

This task is purely additive and does not modify the existing
ingest_folder method.
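
To make the loop and the summary shape concrete, a minimal sketch follows. Only the method names list_changed, fetch and commit_delta and the IngestSummary attributes come from the commit message; the IngestionResult fields, the status strings, the ingest_one callable signature and the use of OSError for fetch failures are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional, Protocol


@dataclass
class IngestionResult:
    # Hypothetical fields; the real IngestionResult may carry more.
    path: str
    status: str  # assumed values: "ingested", "skipped", "failed"
    n_chunks: int = 0


@dataclass
class IngestSummary:
    results: list[IngestionResult] = field(default_factory=list)
    cursor: Optional[str] = None

    def _count(self, status: str) -> int:
        return sum(1 for r in self.results if r.status == status)

    @property
    def ingested(self) -> int:
        return self._count("ingested")

    @property
    def skipped(self) -> int:
        return self._count("skipped")

    @property
    def failed(self) -> int:
        return self._count("failed")


class ContentSource(Protocol):
    def list_changed(self, cursor: Optional[str]) -> Iterator[str]: ...
    def fetch(self, path: str) -> bytes: ...
    def commit_delta(self) -> Optional[str]: ...


def ingest_source(
    source: ContentSource,
    ingest_one: Callable[[str, bytes], IngestionResult],
    cursor: Optional[str] = None,
) -> IngestSummary:
    """Drive list_changed -> fetch -> ingest_one, then commit the cursor."""
    summary = IngestSummary()
    for path in source.list_changed(cursor):
        try:
            data = source.fetch(path)
        except OSError:
            # A failed fetch is recorded and the loop keeps draining the iterator.
            summary.results.append(IngestionResult(path, "failed", n_chunks=0))
            continue
        summary.results.append(ingest_one(path, data))
    # The iterator drained, so the cursor is committed even if some fetches failed.
    summary.cursor = source.commit_delta()
    return summary
```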

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Rename test to describe asserted behaviour (cursor IS committed after
  per-file fetch failures, since the iterator drained).
- Use public current_cursor() in test assertions instead of reaching
  into the stub's private attribute.
- Add TODO comment flagging that per-file fetch failures are not
  recorded in the IngestLedger today, so they're invisible to
  operational replay; tracked for Task 5 / follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…erSource

Change ingest_folder to return IngestSummary instead of list[IngestionResult],
delegating to the unified ingest_source pipeline via LocalFolderSource. This
is a breaking change for callers; update all call sites to use the new
IngestSummary shape (.results, .ingested, .skipped, .failed properties).

Per-file filtering via FolderWatcher.is_hidden stays the same; the cursor
contract and delta-less logic are now handled by LocalFolderSource.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Split the query pipeline into two public methods:
- retrieve(question, top_k, rerank=True) runs expand→retrieve→rerank,
  returns list[ChunkHit] without LLM answer. Useful for MCP tools and
  callers that want to compose their own answer over the hits.
- query(question, top_k) wraps retrieve() + answerer, returns Answer.
  Maintains the full pipeline behavior and telemetry.

The refactor allows raw retrieval (with optional reranking) to be called
independently, making the retrieval surface reusable for downstream callers.

Tested with new unit tests asserting independence of the two methods:
- retrieve() does not invoke the answer agent
- query() invokes the answer agent
- retrieve() respects the rerank flag

All existing tests pass unchanged, validating backward compatibility.
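
The split can be pictured with a small stand-in class. MiniCorpusAgent and the four injected callables are hypothetical; they only mirror the retrieve() and query() signatures named in this commit and are not the library's CorpusAgent.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class ChunkHit:
    text: str
    score: float


@dataclass
class Answer:
    text: str
    citations: list[ChunkHit]


class MiniCorpusAgent:
    """Hypothetical stand-in; collaborators are injected as plain callables."""

    def __init__(
        self,
        expand: Callable[[str], list[str]],
        search: Callable[[str], list[ChunkHit]],
        rerank: Callable[[str, list[ChunkHit]], list[ChunkHit]],
        answer: Callable[[str, list[ChunkHit]], Answer],
    ) -> None:
        self._expand = expand
        self._search = search
        self._rerank = rerank
        self._answer = answer

    def retrieve(self, question: str, top_k: int = 5, rerank: bool = True) -> list[ChunkHit]:
        """expand -> retrieve -> (optional) rerank; never calls the answerer."""
        hits: list[ChunkHit] = []
        for variant in self._expand(question):
            hits.extend(self._search(variant))
        if rerank:
            hits = self._rerank(question, hits)
        return hits[:top_k]

    def query(self, question: str, top_k: int = 5) -> Answer:
        """Full pipeline: retrieve() followed by the answerer."""
        return self._answer(question, self.retrieve(question, top_k))
```

Keeping query() as a thin wrapper means any change to retrieval behaviour is exercised by both entry points.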

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parity with ingest_source's failed-fetch IngestionResult literal; both
methods now spell n_chunks out so future readers don't wonder whether
the omission was meaningful.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four MCP tools backed by the library-grade CorpusAgent:

- ingest_corpus_filesystem(corpus_id, root_path) — uses LocalFolderSource
- ingest_corpus_sharepoint(corpus_id, drive_id, root_folder?) — uses
  SharePointSource with the runtime's managed-identity Graph token
- corpus_retrieve(corpus_id, question, top_k) — hybrid + rerank, no LLM
  answer; raises CorpusNotFoundError on unknown corpus_id
- corpus_query(corpus_id, question, top_k) — full pipeline with citations

Each call constructs a fresh CorpusAgent rooted at CORPUS_ROOT/<corpus_id>.
No process-global registry; on-disk SqliteCorpus + SqliteVec carry state
across requests. CORPUS_ROOT defaults to /tmp/firefly/corpora; operators
should override for any non-toy deployment (Container Apps /tmp is
ephemeral — see docs/deploy/corpus-persistence.md once Task 11 lands).
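
A minimal sketch of the per-call path resolution, assuming the corpus lives in a directory named after corpus_id under CORPUS_ROOT. The helper corpus_dir and this local CorpusNotFoundError class are hypothetical stand-ins for the framework's own code.

```python
from __future__ import annotations

import os
from pathlib import Path

DEFAULT_CORPUS_ROOT = "/tmp/firefly/corpora"  # default stated in this commit


class CorpusNotFoundError(LookupError):
    """Hypothetical stand-in for the framework's exception."""


def corpus_dir(corpus_id: str, must_exist: bool = True) -> Path:
    """Resolve CORPUS_ROOT/<corpus_id> on every call; no process-global registry."""
    root = Path(os.environ.get("CORPUS_ROOT", DEFAULT_CORPUS_ROOT))
    path = root / corpus_id
    if must_exist and not path.is_dir():
        # Query-side tools fail fast instead of creating an empty corpus.
        raise CorpusNotFoundError(f"unknown corpus_id: {corpus_id!r}")
    return path
```

Because state is read from disk each time, a cold restart sees the same corpora as long as CORPUS_ROOT points at durable storage.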

Replaces tools/builtins/sharepoint_rag.py and updates cli/mcp_http.py's
side-effect import accordingly. The deleted module reimplemented the
RAG stack with worse defaults (TextChunker vs. MarkdownChunker,
InMemoryVectorStore vs. SqliteVecVectorStore, no expander/reranker/
answerer, per-process _CORPORA registry that returned "corpus not
found" on cold restart). corpus_rag is the canonical replacement.

Note: tests catch ToolError (with CorpusNotFoundError as __cause__) because
BaseTool.execute wraps domain exceptions in ToolError — consistent with the
framework's exception-handling contract.
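
The wrapping behaviour the tests rely on is ordinary Python exception chaining. The sketch below uses hypothetical ToolError and CorpusNotFoundError classes and a free function named execute; it is not the framework's BaseTool.execute, only an illustration of why the domain error shows up as __cause__.

```python
from __future__ import annotations

from typing import Any, Callable


class CorpusNotFoundError(LookupError):
    """Hypothetical stand-in for the domain exception."""


class ToolError(Exception):
    """Hypothetical stand-in for the framework's wrapper exception."""


def execute(tool_fn: Callable[..., Any], **kwargs: Any) -> Any:
    """Run a tool and wrap any failure in ToolError, keeping the original as __cause__."""
    try:
        return tool_fn(**kwargs)
    except Exception as exc:
        # "raise ... from exc" sets ToolError.__cause__ to the domain exception.
        raise ToolError(f"{tool_fn.__name__} failed: {exc}") from exc
```

A test can therefore catch ToolError and assert on the type of its __cause__ rather than expecting CorpusNotFoundError to propagate directly.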

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Javier Alvarez-Valle and others added 2 commits May 5, 2026 14:49
Standardize telemetry span names across the RAG pipeline to use the
firefly.rag.* prefix consistently (matching metric/instrument names in
_telemetry.py). Also add a timed_span around ingest_source with terminal
status attributes (success, skipped, failed counts).

Changes:
- agent.py: corpus_search.retrieve -> firefly.rag.retrieve
- agent.py: corpus_search.query -> firefly.rag.query
- agent.py: wrap ingest_source body in firefly.rag.ingest_source span
- answerer.py: corpus_search.answer -> firefly.rag.answer
- spec: update KQL and architectural references to use firefly.rag.* names
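
For readers unfamiliar with the pattern, a timed span with terminal attributes can be sketched with a context manager. Everything here is an assumption for illustration: this timed_span, the RECORDED list standing in for an exporter, and ingest_with_span are not the helpers in _telemetry.py; only the span name and the success / skipped / failed attribute names come from the commit message.

```python
from __future__ import annotations

import time
from contextlib import contextmanager
from typing import Any, Iterator

# Finished spans are appended here; a real setup would export them instead.
RECORDED: list[dict[str, Any]] = []


@contextmanager
def timed_span(name: str) -> Iterator[dict[str, Any]]:
    """Yield a mutable attribute dict and record name, duration and attributes on exit."""
    attributes: dict[str, Any] = {}
    start = time.perf_counter()
    try:
        yield attributes
    finally:
        RECORDED.append(
            {
                "name": name,
                "duration_s": time.perf_counter() - start,
                "attributes": attributes,
            }
        )


def ingest_with_span(statuses: list[str]) -> None:
    """Set terminal status counts on the span once the work is done."""
    with timed_span("firefly.rag.ingest_source") as attrs:
        attrs["success"] = statuses.count("ingested")
        attrs["skipped"] = statuses.count("skipped")
        attrs["failed"] = statuses.count("failed")
```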

All 180 tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Apps

Documents the Azure Files volume approach for persisting corpora across
Container Apps replicas / restarts, the multi-replica single-writer
caveat for SqliteCorpus, the env-var surface consumed by the corpus_rag
MCP tools, and the managed-identity scopes needed for SharePoint
ingestion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/fireflyframework_agentic/tools/builtins/corpus_rag.py Fixed
from fastapi import FastAPI

from fireflyframework_agentic.exposure.mcp.server import create_mcp_app
from fireflyframework_agentic.tools.builtins import corpus_rag # noqa: F401 — registers tools
The module declared `log = logging.getLogger(__name__)` but never logged.
Code-quality bot flagged it on PR #103. The MCP layer wraps tool
exceptions in ToolError so they're already surfaced to the caller; if
we want internal step-level logging we'll add it deliberately, not as
dead code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Javier Alvarez-Valle and others added 2 commits May 5, 2026 15:35
python-dotenv passes $PWD as a literal string; os.path.expandvars()
resolves it to the actual working directory at runtime.
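
A minimal sketch of that expansion, assuming the raw CORPUS_ROOT value reaches the process unexpanded. resolve_corpus_root is a hypothetical helper name and may not match what this commit actually added.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Optional


def resolve_corpus_root(raw: Optional[str], default: str = "/tmp/firefly/corpora") -> Path:
    """Expand $VAR / ${VAR} references in the configured root at runtime.

    os.path.expandvars leaves unknown variables untouched, so a typo in the
    variable name stays visible in the resulting path instead of vanishing.
    """
    value = raw if raw else default
    return Path(os.path.expandvars(value))
```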

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…OT default

Groups vars by concern, removes trailing whitespace, replaces $PWD with
/tmp/firefly/corpora (python-dotenv does not expand shell variables), and
adds a note pointing operators to the Azure Files mount for persistence.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@miguelgfierro (Contributor, Author) commented May 5, 2026

End-to-end auth flow: SharePoint → Claude Code

There are two separate phases: indexing (done once or on a schedule) and querying (every user session).


Phase 1 — Indexing: SharePoint → SQLite corpus on Azure

Runs inside the firefly-mcp Container App. The admin calls ingest_corpus_sharepoint(corpus_id, drive_id) via MCP.

  1. ManagedIdentityCredential() calls the Azure IMDS endpoint (internal to the Container App infrastructure) and receives a short-lived Microsoft Graph token for the firefly-mcp-mi managed identity — no secrets, no config needed.
  2. SharePointSource calls GET /drives/{drive_id}/root/delta on Graph to list changed files.
  3. Each file is downloaded, converted to Markdown (MarkitdownLoader), split into ~600-token chunks (MarkdownChunker), embedded (AzureEmbedder → Azure OpenAI), and written to corpus.sqlite on the Azure Files share mounted at CORPUS_ROOT.
  4. A delta cursor is saved so the next run only fetches changes.
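
Steps 2 and 4 can be sketched as a paging loop over the Graph delta endpoint. The function name list_changed_items and the injected get_json callable (assumed to perform an authenticated GET and return parsed JSON) are hypothetical; this is not SharePointSource's code. The @odata.nextLink and @odata.deltaLink keys are the ones Microsoft Graph uses in delta responses.

```python
from __future__ import annotations

from typing import Any, Callable, Optional

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def list_changed_items(
    drive_id: str,
    get_json: Callable[[str], dict[str, Any]],
    delta_link: Optional[str] = None,
) -> tuple[list[dict[str, Any]], str]:
    """Return (changed items, new delta link) for one sync round.

    Starts from the saved delta link when there is one, otherwise from the
    drive's root delta endpoint, and follows nextLink pages until Graph
    hands back a deltaLink to store as the next cursor.
    """
    url = delta_link or f"{GRAPH_BASE}/drives/{drive_id}/root/delta"
    items: list[dict[str, Any]] = []
    while True:
        page = get_json(url)
        items.extend(page.get("value", []))
        next_link = page.get("@odata.nextLink")
        if next_link is None:
            return items, page["@odata.deltaLink"]
        url = next_link
```

Passing the returned delta link back in on the next run is what limits that run to files changed since the previous one.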

Prerequisite: firefly-mcp-mi must be granted Sites.Selected (preferred) or Sites.Read.All in Entra ID for the target SharePoint site.


Phase 2 — Querying: user in Claude Code → answer

The user's ~/.claude/settings.json points at the MCP server URL. When they ask a question, Claude calls corpus_query(corpus_id, question) over Streamable HTTP:

  1. QueryExpander — Claude Haiku generates 3–5 query variants.
  2. HybridRetriever — BM25 (FTS5) + dense (vec0) search over corpus.sqlite.
  3. HaikuReranker — Claude Haiku ranks the top-20 candidates.
  4. AnswerAgent — Claude Sonnet produces a grounded answer with citations.
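
This comment does not say how HybridRetriever merges the BM25 and dense result lists. One common option is reciprocal rank fusion (RRF), sketched below purely as an illustration of how two ranked lists can become one candidate set for the reranker. The function name, the chunk-id strings and k = 60 (the conventional RRF constant) are assumptions, not values from this PR.

```python
from __future__ import annotations


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_n: int = 20
) -> list[str]:
    """Fuse best-first rankings of chunk ids into a single best-first list.

    Each appearance contributes 1 / (k + rank), with rank starting at 1, so
    chunks that rank well in several lists rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; ties fall back to chunk id for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_n]
```

With top_n = 20 the fused list matches the size of the candidate set the reranker step above works on.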

The SharePoint data never leaves Azure. The user receives only the answer text and source references.


Who authenticates what

| Actor | Target | Mechanism |
| --- | --- | --- |
| GitHub Actions (CI/CD) | Azure ACR + Container Apps | OIDC federated credential on firefly-mcp-mi |
| firefly-mcp container | SharePoint / Microsoft Graph | Managed Identity via IMDS (zero config) |
| firefly-mcp container | Azure OpenAI (embedder) | EMBEDDING_BINDING_API_KEY env var |
| Claude Code user | firefly-mcp HTTP endpoint | Not yet implemented; EasyAuth/Entra planned (see test plan) |

Proposed: Claude Code user auth via Container Apps EasyAuth (respects #98)

The architecture from #98 places auth at the experience layer (ingress), keeping the framework process completely unaware of credential material. The simplest implementation is Container Apps EasyAuth — it sits at the Container App ingress, the Python process is untouched.

Claude Code
  │  POST /mcp
  │  Authorization: Bearer <entra_token>
  │
  ▼
Container Apps EasyAuth  (ingress — never hits Python)
  → validates token against Entra ID tenant
  → invalid/missing → 401 immediately
  → valid → forwards request + adds X-MS-CLIENT-PRINCIPAL headers
  │
  ▼
firefly-mcp Python process
  (sees a normal MCP request, never touches auth)

Implementation — no framework code changes needed

1. Create an Entra App Registration

az ad app create --display-name "firefly-mcp" --sign-in-audience AzureADMyOrg
APP_ID=$(az ad app list --display-name "firefly-mcp" --query '[0].appId' -o tsv)
az ad sp create --id $APP_ID

2. Enable EasyAuth on the Container App

az containerapp auth microsoft update \
  -g rg-firefly -n firefly-mcp \
  --client-id $APP_ID \
  --tenant-id <tenant-id>

az containerapp auth update \
  -g rg-firefly -n firefly-mcp \
  --unauthenticated-client-action Return401

Return401 is critical — API clients (Claude Code) cannot follow a login redirect.

3. Claude Code users configure their token

# Acquire token (valid ~1 hour)
az account get-access-token --resource api://$APP_ID --query accessToken -o tsv

// ~/.claude/settings.json
{
  "mcpServers": {
    "firefly": {
      "url": "https://firefly-mcp.mangosmoke-5d24814d.spaincentral.azurecontainerapps.io/mcp",
      "headers": { "Authorization": "Bearer <token>" }
    }
  }
}

A small wrapper script can automate token refresh before each session:

TOKEN=$(az account get-access-token --resource api://$APP_ID --query accessToken -o tsv)
jq --arg t "$TOKEN" '.mcpServers.firefly.headers.Authorization = "Bearer \($t)"' \
  ~/.claude/settings.json > /tmp/s.tmp && mv /tmp/s.tmp ~/.claude/settings.json
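
The same update can be done in Python if jq is not available. This is a sketch under the assumption that the settings file has the mcpServers layout shown above; set_bearer_token is a hypothetical helper, and the token itself would still come from az account get-access-token.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path


def set_bearer_token(settings_path: Path, token: str, server: str = "firefly") -> None:
    """Rewrite the Authorization header for one MCP server entry."""
    settings = json.loads(settings_path.read_text(encoding="utf-8"))
    headers = settings["mcpServers"][server].setdefault("headers", {})
    headers["Authorization"] = f"Bearer {token}"

    # Write next to the target file so os.replace stays on one filesystem
    # and the swap is atomic; other keys in the file are preserved.
    fd, tmp_name = tempfile.mkstemp(dir=str(settings_path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(settings, tmp, indent=2)
    os.replace(tmp_name, settings_path)
```

Writing the temporary file beside the settings file, rather than under a shared /tmp name, also avoids two concurrent sessions clobbering each other's temp file.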

Why this respects #98

  • The firefly-mcp Python process is zero-auth-aware — no msal, no token parsing, no validation code.
  • EasyAuth is the "experience layer" ("capa de experiencia") from Autenticación del framework #98: it sits at the WAN boundary; downstream layers receive requests only after validation.
  • When the architecture graduates to a dedicated auth service, EasyAuth is replaced by that service — the container needs no changes.
  • The token_provider callable pattern already in SharePointSource follows the same principle for the inbound (Graph) direction.

@miguelgfierro miguelgfierro merged commit cb37412 into main May 5, 2026
9 checks passed
@miguelgfierro miguelgfierro deleted the deploy/azure-mcp branch May 5, 2026 16:49
miguelgfierro added a commit that referenced this pull request May 6, 2026
…k directly

Delete examples/corpus_search/agent.py, retrieval/answerer.py, and
retrieval/sql.py — these were forwarding shims created during the
CorpusAgent/AnswerAgent migration in PR #103. All consumers (example
__init__, retrieval/__init__, cli.py, and test files) now import from
fireflyframework_agentic.rag.* directly.
ancongui pushed a commit that referenced this pull request May 31, 2026
The module declared `log = logging.getLogger(__name__)` but never logged.
Code-quality bot flagged it on PR #103. The MCP layer wraps tool
exceptions in ToolError so they're already surfaced to the caller; if
we want internal step-level logging we'll add it deliberately, not as
dead code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ancongui pushed a commit that referenced this pull request May 31, 2026
feat(deploy): Streamable-HTTP MCP entrypoint and Azure deployment
ancongui pushed a commit that referenced this pull request May 31, 2026
…k directly

Delete examples/corpus_search/agent.py, retrieval/answerer.py, and
retrieval/sql.py — these were forwarding shims created during the
CorpusAgent/AnswerAgent migration in PR #103. All consumers (example
__init__, retrieval/__init__, cli.py, and test files) now import from
fireflyframework_agentic.rag.* directly.

3 participants