Skip to content

feat(mcp): add alert, saved search, and webhook MCP tools#2274

Merged
kodiakhq[bot] merged 9 commits into
mainfrom
brandon/mcp-alert-tools
May 14, 2026
Merged

feat(mcp): add alert, saved search, and webhook MCP tools#2274
kodiakhq[bot] merged 9 commits into
mainfrom
brandon/mcp-alert-tools

Conversation

@brandon-pereira
Copy link
Copy Markdown
Member

@brandon-pereira brandon-pereira commented May 13, 2026

Summary

  • Adds 5 new MCP tools: hyperdx_get_alert, hyperdx_save_alert, hyperdx_get_webhook, hyperdx_get_saved_search, and hyperdx_save_saved_search — enabling AI agents to manage alerts, saved searches, and webhook destinations via the MCP server.
  • Makes McpContext.userId required and rejects MCP requests without a user ID (403), tightening the auth surface.
  • Enhances getLoggedInAgent test fixture to accept custom credentials for multi-tenancy isolation tests.
  • Improves logging, output, and memory usage of integration tests as we were pushing on the edges and int became flakey

Details

New Alert Tools (tools/alerts/)

  • hyperdx_get_alert: List all alerts as a slim summary (id, name, state, source, interval) with optional state filtering, or get full detail with recent evaluation history for a specific alert.
  • hyperdx_save_alert: Create or update alerts with full support for threshold types (including range-based between/not_between), webhook channels, schedule configuration, and both saved-search and dashboard-tile sources. Includes runtime cross-field validation to work around MCP SDK limitations with ZodEffects.
  • hyperdx_get_webhook: List webhook destinations (id, name, service) without exposing sensitive fields (url, headers, body).

New Saved Search Tools (tools/savedSearches/)

  • hyperdx_get_saved_search: List saved searches (id, name, tags) or get full detail including query, source, filters.
  • hyperdx_save_saved_search: Create or update saved searches with support for Lucene/SQL where clauses, structured filters (including sql_ast), column selection, ordering, and tags.

Auth Tightening

  • McpContext.userId is now required (string instead of string | undefined).
  • The HTTP handler rejects requests without a userId with 403.
  • Tracing unconditionally sets the mcp.user.id attribute.

Test Coverage

  • Comprehensive integration tests for all new tools covering: create, update, list, detail, error handling (invalid IDs, non-existent records), and team scoping.
  • Updated existing dashboard test to include userId in context.

Documentation

  • Updated MCP.md tool table with the 5 new tools.

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 13, 2026

🦋 Changeset detected

Latest commit: 71cea47

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@hyperdx/api Minor
@hyperdx/app Minor
@hyperdx/otel-collector Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link
Copy Markdown

vercel Bot commented May 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment May 14, 2026 4:22pm

Request Review

@github-actions github-actions Bot added the review/tier-3 Standard — full human review required label May 13, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 13, 2026

🟡 Tier 3 — Standard

Introduces new logic, modifies core functionality, or touches areas with non-trivial risk.

Why this tier:

  • Diff size: 908 production lines changed (Tier 2 max: < 250)
  • Cross-layer change: touches backend (packages/api) + shared utils (packages/common-utils)

Review process: Full human review — logic, architecture, edge cases.
SLA: First-pass feedback within 1 business day.

Stats
  • Production files changed: 18
  • Production lines changed: 908 (+ 1000 in test files, excluded from tier calculation)
  • Branch: brandon/mcp-alert-tools
  • Author: brandon-pereira

To override this classification, remove the review/tier-3 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 13, 2026

E2E Test Results

All tests passed • 176 passed • 3 skipped • 1261s

Status Count
✅ Passed 176
❌ Failed 0
⚠️ Flaky 5
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 13, 2026

PR Review

Well-structured feature addition with comprehensive integration tests and proper team-scoping throughout. Multi-tenancy is correctly enforced: validateAlertInput verifies that webhook/saved-search references belong to the calling team (packages/api/src/controllers/alerts.ts:90-111), and all MCP queries are filtered by teamId. The required-userId tightening in app.ts and the McpContext type change are a sensible auth hardening.

Non-critical items worth a look:

  • ⚠️ Mixed PR scope → Three unrelated changes are bundled with the feature: Makefile (--output-style=stream), package.json (drop --coverage from ci:int), and queryParser.ts (console.errorconsole.warn). CLAUDE.md asks for single-logical-change PRs — easier to revert if one of these regresses CI later. Acceptable to land together, but call them out in the PR body next time.
  • ⚠️ Inconsistent teamId castinggetWebhook.ts:30 and getSavedSearch.ts:42 pass team: teamId as a string, while getAlert.ts:51 uses new mongoose.Types.ObjectId(teamId). Mongoose auto-casts so both work, but pick one pattern for clarity.
  • ⚠️ Forced type cast in saveAlert.ts:140alertInput as Parameters<typeof createAlert>[1] hides any future divergence between AlertInput and createAlert's parameter type. Consider narrowing the types instead so the compiler keeps you honest.
  • ⚠️ BaseError message extraction (saveAlert.ts:108-112) → Relies on e.name holding the user-facing message for BaseError subclasses. This works today (e.g. Api400Error) but is non-idiomatic — a future BaseError subclass that sets name to a generic class name would silently surface that to the MCP client. A short comment in BaseError explaining the convention, or a dedicated .userMessage field, would make this more durable.

No blocking issues; the feature itself is clean and well-tested.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 13, 2026

Deep Review

🟡 P2 -- recommended

  • packages/api/src/mcp/tools/alerts/schemas.ts:117 -- the MCP name field accepts min(1).max(512) but does not .trim(), so a payload like name: " " is stored as a whitespace-only alert name while the REST route (SavedSearchSchema / alertSchema callers in routers/api/savedSearch.ts:39 and the alerts router) rejects it with z.string().trim().min(1); this is a behaviour skew between MCP and HTTP for the same resource.

    • Fix: Add .trim() to the MCP name schemas in mcp/tools/alerts/schemas.ts and mcp/tools/savedSearches/schemas.ts so MCP and HTTP write identical names.
  • packages/api/src/mcp/tools/savedSearches/schemas.ts:36 -- MCP name has the same drift versus routers/api/savedSearch.ts:39 (z.string().trim().min(1)); whitespace-only saved-search names create rows that fail later UI assumptions.

    • Fix: Add .trim() to name so saved-search creation via MCP matches the HTTP contract.
  • packages/api/src/mcp/tools/alerts/saveAlert.ts:108 -- alertInput as Parameters<typeof createAlert>[1] papers over a real type mismatch: thresholdType in AlertInput is AlertThresholdType (nativeEnum) while the MCP shape carries a plain string union; the cast happens to work because the literal values match today, so any future enum rename will silently break only at runtime.

    • Fix: Drop the cast and build the controller payload from AlertThresholdType[...] (mirror the existing AlertSource mapping a few lines above) so the type system catches drift.
  • packages/api/src/fixtures.ts:217 -- the new credentials parameter to getLoggedInAgent has no caller in this PR (the team-scoping tests use fake ObjectIds), so the new fixture branch is unexercised dead code that future readers will assume is load-bearing.

    • Fix: Either remove the optional credentials parameter or land a test that uses it (e.g., a real second-user multi-tenancy assertion) so the branch is covered.
  • packages/api/src/mcp/tools/alerts/schemas.ts:151 -- validateSaveAlertInput's scheduleOffsetMinutes >= interval branch has no test coverage (the test file exercises between/thresholdMax only); this is the runtime substitute for the deleted Zod superRefine, so regressions here go unnoticed.

    • Fix: Add a test case in alerts.test.ts for scheduleOffsetMinutes: 5, interval: '1m' asserting the error string mentions the interval, and (optionally) a not_between variant of the threshold test.
  • packages/common-utils/src/queryParser.ts:1027 -- downgrading console.error to console.warn for enable_full_text_index fetch failures and KV-items lookup builds is unrelated to the MCP feature in this PR, and lowers the operational signal for what is a real DB or schema problem (the catch returns false / empty map, so callers silently lose full-text-index behaviour). Bundling unrelated log-level changes into a feature PR makes blame-tracking the regressed signal harder.

    • Fix: Split this change into its own PR (or revert it here) and, if the goal is only "less noise in CI integration tests", scope the suppression to the test harness instead of production code.
🔵 P3 nitpicks (3)
  • packages/api/src/mcp/tools/alerts/getAlert.ts:54 -- wraps teamId in new mongoose.Types.ObjectId(...) while sibling getWebhook.ts:29 passes the raw string; both work because of Mongoose autocasting but the inconsistency invites copy-paste mistakes.

    • Fix: Pick one style (prefer the explicit ObjectId) and apply it across the alerts/ and savedSearches/ MCP tools.
  • packages/api/src/mcp/tools/savedSearches/saveSavedSearch.ts:64 -- the update flow does a getSavedSearch existence check and then updateSavedSearch, which is itself a team-scoped findOneAndUpdate; the pre-check is a redundant round-trip and a thin race window where updated becomes null collapses into the unhelpful "Failed to update saved search" message.

    • Fix: Drop the pre-check and treat a null from updateSavedSearch as "not found".
  • packages/api/src/mcp/tools/savedSearches/schemas.ts:48 -- tags: z.array(z.string()) does not constrain element length, so empty-string or whitespace tags can be persisted; minor inconsistency with the rest of the saved-search UX where tags are user-meaningful labels.

    • Fix: Add .min(1).max(...) (and optionally .trim()) to the tag element schema, or accept the divergence intentionally.

Reviewers (5): correctness, testing, maintainability, project-standards, api-contract.

Testing gaps:

  • validateSaveAlertInput branches for scheduleOffsetMinutes >= interval and not_between thresholdType are uncovered.
  • New getLoggedInAgent(server, credentials) signature has no caller exercising the custom-credentials path.

Add five new MCP tools for managing alerts, saved searches, and webhooks:
- hyperdx_get_alert: list/detail alerts with state filtering and history
- hyperdx_save_alert: create/update alerts with threshold, channel, and schedule config
- hyperdx_get_webhook: list webhook destinations for alert channels
- hyperdx_get_saved_search: list/detail saved searches
- hyperdx_save_saved_search: create/update saved searches with filters and tags

Also makes McpContext.userId required (rejects requests without it),
enhances getLoggedInAgent fixture for multi-tenancy test isolation,
and adds comprehensive integration tests for all new tools.
…plicit BaseError instanceof check

- saveSavedSearch: remove ...existing.toJSON() spread that passed a
  populated createdBy object where Mongoose expects an ObjectId.
  savedSearchData already contains every user-editable field.
- saveAlert: replace stringly-typed e.name heuristic with explicit
  e instanceof BaseError check for error message extraction.
…e, slim list query, simplify channel schema, remove non-null assertions

- Add translateSavedSearchDocumentToExternalSavedSearch translator so
  saved search responses use 'id' (not '_id'), string IDs, and no __v,
  consistent with alert/dashboard tools.
- Replace getSavedSearches controller call with a slim
  SavedSearch.find(team, 'name tags').lean() for the list endpoint,
  avoiding unnecessary Alert.find and populate calls.
- Make webhookId required in the Zod schema (was optional with runtime
  check), removing the dead-branch channel validation and the non-null
  assertion in toAlertChannel.
- Replace input.id! non-null assertions with early-return narrowing
  via a local alertId variable.
- Update saved search tests to expect the new external response shape.
Comment thread packages/api/package.json
"lint:fix": "npx eslint . --ext .ts --fix",
"ci:lint": "yarn lint && yarn tsc --noEmit && yarn lint:openapi",
"ci:int": "DOTENV_CONFIG_PATH=.env.test DOTENV_CONFIG_OVERRIDE=true jest --runInBand --ci --forceExit --coverage",
"ci:int": "DOTENV_CONFIG_PATH=.env.test DOTENV_CONFIG_OVERRIDE=true jest --runInBand --ci --forceExit",
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backporting coverage removal to OSS as this PR seems to have crossed the memory threshold and now the int tests are failing

… queryParser errors to warnings

- Replace getLoggedInAgent with second credentials in alert team-scoping
  tests with a fabricated McpContext using a non-existent teamId. The
  single-tenant registration route returns 409 when a team already
  exists, making it impossible to register a second user mid-test.
- Use client?.close() in all MCP test afterEach hooks to prevent
  cascade failures when beforeEach fails.
- Downgrade console.error to console.warn in queryParser for non-fatal
  buildKvItemsLookup and enable_full_text_index fallbacks.
Comment thread Makefile
@mkdir -p $(HDX_CI_LOGS_DIR)
docker compose -p $(HDX_CI_PROJECT) -f ./docker-compose.ci.yml up -d --quiet-pull
bash -c 'set -o pipefail; npx nx run-many -t ci:int --parallel=false 2>&1 | tee $(HDX_CI_LOGS_DIR)/ci-int.log'; ret=$$?; \
bash -c 'set -o pipefail; npx nx run-many -t ci:int --parallel=false --output-style=stream 2>&1 | tee $(HDX_CI_LOGS_DIR)/ci-int.log'; ret=$$?; \
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backporting change to improve logging of int tests - failures previously may not be shown

// Pre-fetch KV items lookup (map column -> KV items column with text(tokenizer=array) index)
this.kvItemsLookupPromise = this.buildKvItemsLookup().catch(error => {
console.error('Error building KV items lookup:', error);
console.warn('Error building KV items lookup:', error);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was very chirpy in int tests - moved to warn since it does not break (warns are suppressed in int tests)

updatedAt?: string;
};

export function translateSavedSearchDocumentToExternalSavedSearch(doc: {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we’re using these translateXXX methods in the external API, but it feels error-prone and not very straightforward to use. What do you think about moving this logic into the model file (as a method)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense - let me ship this in a fast-follow :)

Copy link
Copy Markdown
Member

@wrn14897 wrn14897 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kodiakhq kodiakhq Bot merged commit 46fe675 into main May 14, 2026
28 checks passed
@kodiakhq kodiakhq Bot deleted the brandon/mcp-alert-tools branch May 14, 2026 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automerge review/tier-3 Standard — full human review required

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants