Skip to content

Align AI analysis (Stage 3) with static analysis (Stage 2) in Query Insights#616

Merged
tnaum-ms merged 40 commits into
nextfrom
dev/tnaum/query-insights-ai-improvement
Apr 28, 2026
Merged

Align AI analysis (Stage 3) with static analysis (Stage 2) in Query Insights#616
tnaum-ms merged 40 commits into
nextfrom
dev/tnaum/query-insights-ai-improvement

Conversation

@tnaum-ms
Copy link
Copy Markdown
Collaborator

@tnaum-ms tnaum-ms commented Apr 27, 2026

Summary

The AI-powered query analysis (Stage 3) had no knowledge of what the user was already shown in the static analysis (Stage 2). This caused contradictions — the static analysis would show "Good" performance with badges, then the AI would give a different assessment without explaining why. Worse, for low-cardinality boolean queries returning >50% of the collection, the AI would recommend creating high-priority indexes that would not help and would later trigger a "Low-cardinality index" warning when re-run.

This PR bridges the gap between Stage 2 and Stage 3, making the AI aware of the static analysis results and teaching it when NOT to recommend indexes.

Changes

Core: Static Analysis → AI Context Pipeline

  • buildStaticAnalysisSummary() — New function that produces a compact text summary of Stage 2 results (performance rating with scale, summary indicators, diagnostic badges, concerns) for inclusion in the AI prompt
  • Stage 2 response cachingClusterSession.setStage2Response() / getStage2Response() stores the transformed Stage 2 response so Stage 3 can access it
  • Context data injection — Stage 3 router builds the summary and passes it as additional context in the AI prompt via fillPromptTemplate()

Prompt Engineering

  • Instruction positioning matters — CRITICAL rules for low-cardinality and high-return-ratio moved to positions 3-4 (from 21-22) after finding that GPT-4o ignores late-positioned instructions when they conflict with strong training priors (COLLSCAN → create index)
  • Static analysis alignment rule (position 8) — AI must either affirm or explain divergence from static analysis findings
  • Role rename — From "Index Advisor assistant" to "Query Performance Analyst" to reduce bias toward always recommending indexes
  • Metadata echo removed — The JSON schema no longer requires echoing back collectionStats/indexStats/executionStats, saving ~40% of response tokens (the metadata field was never parsed or used)
  • "It is OK to recommend nothing" — Explicit instruction replacing the weaker "Be brave to say no"
  • Compound index escape hatch tightened — Low-cardinality leading keys in compound indexes also get low priority
  • Educational content aligned — Performance Metrics section now uses the same vocabulary as static analysis (Selectivity, Fetch Overhead, In-Memory Sort, Efficiency Ratio)

Infrastructure

  • Prompt resource files — Template bodies extracted to resources/prompts/index-advisor-{find,aggregate,count}.prompt.md with runtime loading via buildIndexAdvisorPrompt(). Inline constants kept as fallbacks
  • .prettierignore — Prompt files excluded from prettier (it was mangling triple backticks and indentation)
  • Prompt source trackinggetLastPromptSource() / setLastPromptSource() tracks whether resource files or inline fallbacks are used

Tests

  • 66 prompt template tests (structural integrity, snapshots, escape characters, load/fallback behavior)
  • 16 static analysis summary tests (all score levels, badge types, edge cases)
  • All 1910 existing tests pass

Verified Behavior

Tested with { "additionalInfo.isFamilyFriendly": true } on a 65K-document collection returning 35K docs (55% selectivity):

  • Before: AI recommended HIGH PRIORITY index creation on the boolean field
  • After: AI correctly returns LOW PRIORITY with risk note about low cardinality and high return ratio

tnaum-ms and others added 16 commits April 27, 2026 14:00
… badge color handling

Co-authored-by: Copilot <copilot@github.com>
Add comprehensive test coverage for prompt templates as a safety net
before refactoring them into resource files:

- Test createPriorityDeclaration() and createSecurityInstructions()
- Test CRITICAL_JSON_REMINDER content
- Test all 3 index advisor prompt templates (find/aggregate/count):
  - Required sections (PRIORITY DECLARATION, SECURITY INSTRUCTIONS, etc.)
  - JSON schema structural elements
  - Escape character and backtick fence integrity
  - Analysis and educational content template structure
  - Markdown compatibility rules
  - Azure vCore and low-selectivity guidance
- Test getQueryTypeConfig() for all query types
- Test structural integrity across all templates (no unresolved
  template literals, numbered instructions, verification requirements)
Extract the body of each index advisor prompt template (find, aggregate,
count) into standalone resource files under resources/prompts/:
- index-advisor-find.prompt.md
- index-advisor-aggregate.prompt.md
- index-advisor-count.prompt.md

The dynamic header (PRIORITY DECLARATION + SECURITY INSTRUCTIONS) remains
generated in TypeScript code. At runtime, the PromptTemplateService
composes the full prompt by loading the body from the resource file and
prepending the dynamic header.

The inline template constants are kept as fallbacks for when the
extension context is not available (e.g., early startup, tests).

Adds:
- loadPromptBody() function for reading resource files with caching
- buildIndexAdvisorPrompt() function for composing header + body
- Snapshot tests to detect accidental content changes
- Tests for the new loading/fallback behavior
Add buildStaticAnalysisSummary() function that produces a compact text
summary of the Stage 2 static analysis results for inclusion in the
AI prompt context. The summary includes:

- Collection context (total docs, returned, examined, keys, exec time)
- Performance rating (excellent/good/fair/poor)
- Summary indicators (selectivity, index used, fetch overhead, sort)
- All diagnostic badges with type markers ([+]/[-]/[i]) and details
- Concerns when present

The summary is designed to be concise (~300-500 chars for typical
queries) while giving the LLM enough context to understand what the
user has already been told by the static analysis.

Includes 16 tests covering all scenarios: good/poor queries, covered
queries, collection scans, low-cardinality badges, missing data, etc.
Wire up the Stage 2 static analysis results as additional context in the
AI prompt for Stage 3. The LLM now receives a summary of what the user
has already been shown (performance rating, selectivity, fetch overhead,
diagnostic badges, concerns).

Changes:
- Add staticAnalysisSummary optional field to QueryOptimizationContext
- Update fillPromptTemplate() to append static analysis summary to
  context data when available
- Update QueryInsightsAIService.getOptimizationRecommendations() to
  accept and pass through the summary
- Add Stage 2 response caching on ClusterSession (setStage2Response/
  getStage2Response) so Stage 3 can access the transformed results
- Update Stage 3 router handler to build static analysis summary
  from cached Stage 2 response and pass it to the AI service
- Update DATA PLACEHOLDERS in all 3 prompt resource files and inline
  template constants to document the new Static Analysis Results section
Add specific instructions to all 3 index advisor prompt templates
(find, aggregate, count) that teach the LLM about:

1. Low-cardinality index awareness: When the static analysis has shown
   a 'Low-cardinality index' badge, do NOT recommend creating indexes
   on those fields (boolean, enum-like). If still recommending, mark
   as low priority with a confidence note.

2. High return ratio: When the query returns >20% of the collection,
   creating an index is unlikely to help. Suggest hiding unused
   indexes instead.

These instructions appear in both the resource files and the inline
template constants (fallback).
Add 'Static analysis alignment' instruction to all 3 index advisor
prompt templates that governs how the LLM handles agreement and
disagreement with the static analysis shown to the user:

- If the AI agrees with the static analysis, briefly affirm it
- If the AI disagrees, it MUST explain why in the Performance Summary
  section using the format: 'The initial analysis showed [X], but
  after deeper inspection [Y] because [Z]'
- Must not silently contradict the static analysis
- Must explain when it thinks the heuristic was too strict or missed
  something

The instruction acknowledges that static analysis is heuristic-based
while the AI has access to the full execution plan, so divergence is
expected and acceptable when explained.
- Run npm run l10n to update localization bundle
- Run npm run prettier-fix to format code
- Fix lint error: replace inline import() type annotation with proper
  import in collectionViewRouter.ts
- Run npx jest --no-coverage: all 1910 tests pass
- Run npm run build: TypeScript compilation successful
…uctions

The low-cardinality and high-return-ratio rules were in the 'Thinking
tips' section of the prompt, which the LLM treats as optional guidance.
This caused the AI to recommend creating high-priority indexes on
boolean fields (e.g., isFamilyFriendly: true) even when the query
returns 55% of the collection.

Changes:
- Promote both rules from tips to mandatory numbered instructions in
  all 3 prompt templates (find, aggregate, count)
- Make the rules work from the query data itself (boolean filter values,
  return ratio) rather than depending on the static analysis badge,
  which only fires on index scans
- Remove the now-redundant tip-section duplicates
- The rules now mandate: no high-priority index on boolean/low-
  cardinality fields, and no high-priority single-field index when
  >20% of collection is returned
… source

Stage 2 telemetry (no PII/OII — all categorical or numeric):
- performanceScore: excellent/good/fair/poor distribution
- executionStrategy, indexUsed, hadCollectionScan, hadInMemorySort,
  isCoveringQuery, fetchOverheadKind, isSharded (categorical flags)
- executionTimeMs, documentsReturned, totalDocsExamined,
  totalKeysExamined, examinedToReturnedRatio (aggregatable metrics)
- selectivityPercent, totalCollectionDocs (when available)
- diagnosticBadgeIds (comma-separated IDs for segmentation)
- positiveBadgeCount, neutralBadgeCount, negativeBadgeCount

Stage 3 telemetry:
- hasStaticAnalysisSummary: whether the summary was available
- staticAnalysisSummaryLength: prompt size impact
- staticAnalysisSummaryError: tracks build failures
- hasCachedExecutionPlan: whether Stage 2 plan was reused

Prompt source tracking:
- promptSource: 'resource-file' | 'inline-fallback' | 'custom-file'
  Tracks whether resource files are being used successfully,
  enabling future removal of inline fallback constants.
- hasStaticAnalysisSummary in optimizeQuery context
…sitions 3-4

The LLM was ignoring low-cardinality and high-return-ratio rules when
they were at positions 21-22 out of 22 instructions. LLMs weight early
instructions much more heavily, and the strong COLLSCAN→create-index
prior from training data was overriding the late rules.

Changes:
- Move both rules to positions 3-4 (right after 'do not hallucinate')
  in all 6 prompt sources (3 resource files + 3 inline fallbacks)
- Add CRITICAL prefix to both rules for emphasis
- Add concrete example: { isFamilyFriendly: true } on 65K docs
  returning 35K → do NOT suggest indexing this boolean field
- Remove old duplicate rules from bottom of each template
- Renumber all subsequent instructions (old 3→5, old 4→6, etc.)

Verified with live testing: the AI now correctly returns LOW PRIORITY
with a low-cardinality risk note instead of HIGH PRIORITY.
Update the Performance Summary and Performance Metrics sections in
all prompt templates to be consistent with the static analysis:

Performance Summary:
- Replace generic '(excellent/good/poor)' with the actual scale used
  by our static analysis: 'Excellent, Good, Fair, or Poor'

Performance Metrics (educational content):
- Find: Replace generic 'Documents Examined vs Returned' / 'Keys
  Examined' / 'Inefficiencies Detected' with Selectivity, Fetch
  Overhead, In-Memory Sort, and Efficiency Ratio — matching the
  indicators the user already sees in the Stage 2 static analysis
- Aggregate: Add Selectivity alongside Pipeline Efficiency
- Count: Replace 'Documents Examined' with Selectivity and
  Efficiency Ratio

This ensures the AI educational content uses the same vocabulary
and concepts as the static analysis, reducing cognitive dissonance.
…ion 8

The instruction telling the LLM it can disagree with the static
analysis (but must explain why) was at the very end of the instruction
list. Moving it to position 8 (right after the analysis/educational
content structure rules 6-7) ensures the LLM processes it early
when deciding how to frame its Performance Summary.

Applied to all 6 prompt sources (3 resource files + 3 inline
fallbacks). Remaining instructions renumbered 9-22.
Prettier was mangling the prompt template files:
- Collapsing triple backtick references (```json → `json)
- Adding blank lines after markdown headings
- Changing indentation on continuation lines

These changes break the LLM prompt structure (the triple backtick
instruction is critical for JSON output formatting).

Added resources/prompts/ to .prettierignore to prevent this.
Apply 6 prompt improvements across all resource files and inline
fallback templates:

1. Rename role from 'Index Advisor assistant' to 'Query Performance
   Analyst' — reduces bias toward always recommending indexes
2. Tighten compound index escape hatch — low-cardinality leading
   keys in compound indexes also get low priority
3. Remove metadata echo requirement — the metadata field was never
   parsed or used, saving ~40% of response tokens
4. Remove '<=6 sentences' constraint on analysis — conflicted with
   the 3-section structure template
5. Remove duplicate tips already covered by CRITICAL instructions
   (Low-selectivity fields, Prioritize restrictive range)
6. Fix trailing literal newline in Limited confidence rule
7. Replace 'Be brave to say no' with explicit 'It is OK to recommend
   nothing' — clearer signal for weaker models
8. Remove specific isFamilyFriendly example from CRITICAL rule —
   too specific, could introduce skew
9. Use DocumentDB API / MongoDB API terminology in role naming
Copilot AI review requested due to automatic review settings April 27, 2026 18:37
@tnaum-ms tnaum-ms requested a review from a team as a code owner April 27, 2026 18:37
Add structured documentation for the AI static analysis alignment
work under docs/ai-and-plans/616-ai-static-analysis-alignment/.

Includes architecture diagram, key implementation decisions,
lessons learned about LLM instruction positioning, and file map.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces contradictions between Query Insights Stage 2 (static analysis UI) and Stage 3 (AI recommendations) by caching Stage 2 results and injecting a compact Stage 2 summary into the AI prompt context, plus refactoring index-advisor prompts into resource files with added telemetry and tests.

Changes:

  • Add a Stage 2 → Stage 3 context pipeline via ClusterSession caching and buildStaticAnalysisSummary() prompt injection.
  • Refactor index-advisor prompt templates to load prompt bodies from resources/prompts/* with fallback behavior and prompt-source tracking.
  • Expand telemetry and add tests/snapshots for prompt templates and the static analysis summary formatter.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
src/webviews/documentdb/collectionView/components/queryInsightsTab/components/summaryCard/custom/PerformanceRatingCell.tsx Updates diagnostics rendering (icons/colors) in performance rating cell.
src/webviews/documentdb/collectionView/collectionViewRouter.ts Caches Stage 2 response, builds static analysis summary for Stage 3 prompt, adds Stage 2/3 telemetry.
src/services/promptTemplateService.ts Builds built-in index-advisor prompts via resource-file composition and tracks prompt source for custom templates.
src/services/ai/QueryInsightsAIService.ts Adds staticAnalysisSummary parameter plumbing into optimization calls.
src/documentdb/queryInsights/staticAnalysisSummary.ts New helper to convert Stage 2 UI results into compact prompt-ready text summary.
src/documentdb/queryInsights/staticAnalysisSummary.test.ts Unit tests for static analysis summary formatting across scenarios.
src/documentdb/ClusterSession.ts Adds Stage 2 response cache storage/retrieval for Stage 3 usage.
src/commands/llmEnhancedCommands/promptTemplates.ts Adds prompt-body resource loading/composition + prompt-source tracking; updates prompt instructions/schema text.
src/commands/llmEnhancedCommands/promptTemplates.test.ts Adds structural + snapshot tests for prompt templates and load/fallback behavior.
src/commands/llmEnhancedCommands/snapshots/promptTemplates.test.ts.snap Snapshots for prompt template content integrity.
src/commands/llmEnhancedCommands/indexAdvisorCommands.ts Injects static analysis summary into context data and records prompt-source/static-summary telemetry.
resources/prompts/index-advisor-find.prompt.md New resource prompt body for find index-advisor template.
resources/prompts/index-advisor-aggregate.prompt.md New resource prompt body for aggregate index-advisor template.
resources/prompts/index-advisor-count.prompt.md New resource prompt body for count index-advisor template.
l10n/bundle.l10n.json Adds localized strings for Stage 3 static analysis summary build logging.
.prettierignore Excludes prompt resource files from Prettier formatting.
Comments suppressed due to low confidence (1)

src/commands/llmEnhancedCommands/promptTemplates.ts:174

  • INDEX_ADVISOR_ROLE is still set to "MongoDB Index Advisor assistant" and is used in createPriorityDeclaration(...), which becomes the highest-priority instruction in the prompt. This contradicts the new "Query Performance Analyst" positioning in the TASK section and also violates the repo terminology guidance to avoid "MongoDB" alone (see .github/copilot-instructions.md:197-211). Consider updating the role string to the new analyst wording (DocumentDB / MongoDB API).
const INDEX_ADVISOR_ROLE = 'MongoDB Index Advisor assistant';
const QUERY_GENERATOR_ROLE = 'MongoDB Query Generator assistant';

const INDEX_ADVISOR_TASK_FIND =
    'analyze MongoDB queries and provide index optimization suggestions based on the data provided';
const INDEX_ADVISOR_TASK_AGGREGATE =
    'analyze MongoDB aggregation pipelines and provide index optimization suggestions based on the data provided';
const INDEX_ADVISOR_TASK_COUNT =
    'analyze MongoDB count queries and provide index optimization suggestions based on the data provided';

Comment thread src/commands/llmEnhancedCommands/promptTemplates.ts Outdated
Comment thread resources/prompts/index-advisor-find.prompt.md Outdated
Comment thread src/services/promptTemplateService.ts Outdated
Comment thread src/services/promptTemplateService.ts Outdated
Comment thread src/services/promptTemplateService.ts Outdated
Comment thread resources/prompts/index-advisor-count.prompt.md Outdated
Comment thread src/webviews/documentdb/collectionView/collectionViewRouter.ts
Comment thread src/documentdb/ClusterSession.ts
Comment thread src/commands/llmEnhancedCommands/promptTemplates.ts
…PI Query Performance Analyst and Index Advisor'

The PRIORITY DECLARATION — the first thing the LLM reads — still used the
old 'MongoDB Index Advisor assistant' role, contradicting the body text
that says 'Query Performance Analyst'. By the PR's own argument about
positional sensitivity, this preserved the bias the rename was meant
to remove.

Updated the role string in all 3 promptConfigs entries
(promptTemplateService.ts) and the INDEX_ADVISOR_ROLE constant
(promptTemplates.ts) to use the same neutral framing.

Fixes: HIGH-1 from review
The inline fallback templates had broken numbering:
- FIND: 1,2,3,4,7,8,7,8,10,10,11-21 (5,6,9 missing; 7,8,10 duplicated)
- AGGREGATE/COUNT: 10,10 (9 missing; 10 duplicated)

Renumbered all three templates to sequential 1-21, matching the
resource files. This ensures correct LLM instruction ordering
when the resource file cannot be loaded.

Fixes: HIGH-2 from review
…nown

The Stage 2 response cache in ClusterSession stored `unknown` and
consumers cast it back with `as QueryInsightsStage2Response` — a
classic erase-then-cast anti-pattern. The producer already returns
`QueryInsightsStage2Response` from `transformStage2Response()`, so
the type information existed at both boundaries but was discarded.

Now the cache field, setter, and getter all use the concrete type.
Removed the unsafe cast at the consumer in collectionViewRouter.ts.

Fixes: MEDIUM-1 from review
Added `staticAnalysisSummaryErrorKind` telemetry property (captures
the error constructor name: TypeError, RangeError, etc.) to allow
production diagnosis without PII. Promoted logging from trace to
error level with the actual error message, since silent degradation
here is hard to diagnose.

Fixes: MEDIUM-3 + LOW-2 from review
@tnaum-ms
Copy link
Copy Markdown
Collaborator Author

MEDIUM-3 + LOW-2 fix: Added staticAnalysisSummaryErrorKind telemetry property to the catch block in collectionViewRouter.ts (captures the error constructor name for dashboard diagnosis). Promoted logging from trace to error level and included the error message.

See e3a5b0a

Per repo terminology rules, 'MongoDB' alone should not be used as a
product name. Updated the index advisor prompt constants, message
descriptions, task descriptions, DATA PLACEHOLDERS in inline fallback
templates, and the three resource files (find, aggregate, count) to
use 'MongoDB API' consistently.

Scope limited to LLM-facing prompt strings only (not general codebase).

Fixes: LOW-1 from review
Changed loadPromptBody return type from `string | undefined` to a
discriminated union that carries the failure reason. Now
`lastPromptSource` is set to `'inline-fallback-no-context'` (expected
during tests/early init) vs `'inline-fallback-read-error'` (packaging
bug), allowing production telemetry to distinguish the two cases.

Updated the test to match the new return type.

Fixes: LOW-3 from review
@tnaum-ms
Copy link
Copy Markdown
Collaborator Author

LOW-3 fix: loadPromptBody now returns a discriminated union with the failure reason. lastPromptSource telemetry distinguishes 'inline-fallback-no-context' (expected: no extension context) from 'inline-fallback-read-error' (packaging bug: file missing/unreadable).

See cd6bcdf

For find queries routed through QueryInsightsAIService, only
queryObject is populated — query is undefined. This caused the
user message to contain 'N/A' even though the prompt tells the
LLM the first user message contains the query. Now falls back to
JSON.stringify(queryObject) when query is not available.

Fixes: LOW-5 from review
@tnaum-ms
Copy link
Copy Markdown
Collaborator Author

LOW-6 fix: Regenerated 3 snapshot files after HIGH-1 (role rename), HIGH-2 (numbering fix), and LOW-1 (MongoDB → MongoDB API). All 66 prompt template tests pass.

See eba1dba

tnaum-ms and others added 11 commits April 27, 2026 19:51
- Updated l10n bundle to reflect the error message change from
  MEDIUM-3/LOW-2 (added {error} parameter)
- Fixed import/consistent-type-specifier-style lint error in
  ClusterSession.ts (use inline type specifier)
Changed from 'MongoDB API Query Performance Analyst and Index Advisor'
to 'MongoDB API Index Advisor assistant' per operator decision.
Regenerated 3 snapshots.

Amends: HIGH-1 from review
Co-authored-by: Copilot <copilot@github.com>
- Add MANDATORY bitmap single-field index hide rule (rule 19) to all prompts
- Improve justification requirements (rule 12): 2-3 sentences per recommendation
- Add runtime filter removal signal thinking tip for compound index suggestions
- Update fallback templates in promptTemplates.ts to match .md prompts
- Update snapshots
Split the 'Unhide recommendations' clause from rule 3 into its own
dedicated rule 4 (CRITICAL — Unhiding low-cardinality / boolean indexes)
with explicit requirements: low priority, justification with percentage,
and risks note. Renumbered all subsequent rules (now 1-24).

Applied consistently to find, aggregate, count prompt files and
inline fallback templates in promptTemplates.ts.
Change PREFERRED_MODEL from gpt-4o to gpt-5-mini for better
instruction following on complex 24-rule prompts. Add gpt-4o as
fallback (replacing gpt-4o-mini).

Add modelOptions support to CopilotService to pass reasoning_effort
('medium') via the VS Code LanguageModelChatRequestOptions API.
Wire through both index advisor and query generation callers.
Revert PREFERRED_MODEL back to gpt-4o (from gpt-5-mini) and
FALLBACK_MODELS back to gpt-4o-mini — needs more prompt review.

Fix inaccurate copilotDurationMs: move timing into CopilotService.sendToModel
to measure only the actual LLM request + streaming, excluding
selectChatModels overhead and telemetry wrapper. Return durationMs
in CopilotResponse so callers use the accurate measurement.
Co-authored-by: Copilot <copilot@github.com>
@tnaum-ms
Copy link
Copy Markdown
Collaborator Author

R-001 resolved — Priority-declaration role conflicts with resource-file body

The prompt has been tested and changes to LLM-facing strings require regression testing. Created a tracked issue instead of a code change:

#619Query Insights: Align priority-declaration role with resource-file prompt framing (milestone: 0.9.0)

The resource files already use the correct "Query Performance Analyst" framing; only the wrapping createPriorityDeclaration() header and inline fallbacks still say "Index Advisor assistant". Severity was reassessed from High to Medium because the detailed CRITICAL rules in the body dominate LLM behavior in practice.

@tnaum-ms
Copy link
Copy Markdown
Collaborator Author

R-002 resolved — Planner-tree traversal helpers do not recurse through shard branches

Created a tracked issue to address alongside the explain schema work:

#620Query Insights: Planner-tree traversal helpers miss shard branches (milestone: 0.9.0)
#522Query Insights: Improve with Explain Schema — moved from 0.8.1 to 0.9.0 so both can be tackled together.

Severity was reassessed from Medium to Low: only affects the narrow edge case of sharded clusters with isBitmap under a shard branch, and only Stage 2 badges are affected (Stage 3 AI receives the raw explain JSON regardless).

Cross-collection template jumped 4→7→10 (missing 5,6,9).
Single-collection template had the same gap pattern (4→7→8→10→11...).
Renumbered both instruction lists to be sequential.

Ref: PR #616 review (R-003, C-012)
@tnaum-ms
Copy link
Copy Markdown
Collaborator Author

R-003 resolved — Query-generation prompt instruction numbering is non-sequential

Fixed in commit ba9f71b8. Both cross-collection and single-collection query-generation templates had non-sequential instruction numbering (4→7→10, missing 5/6/9). Renumbered both lists sequentially.

Related review comment: C-012.

Cache now stores { template, source } together instead of just the template
string. On cache hits, the source is restored before returning so that
telemetry reports the correct prompt source for the current request.

Also extracted a PromptSource type for all valid source values, widening
setLastPromptSource() from 3 values to the full union.

Ref: PR #616 review (R-004, C-006)
@tnaum-ms
Copy link
Copy Markdown
Collaborator Author

R-004 resolvedpromptSource telemetry can be stale on cache hits

Fixed in commit d7258a2f. The template cache now stores { template, source } together. On cache hits, the prompt source is restored via setLastPromptSource() before returning, so telemetry always reports the correct source for the current request.

Also extracted a PromptSource type union for all valid source values, widening setLastPromptSource() from 3 values to the full set of 6.

Related review comment: C-006.

@github-actions
Copy link
Copy Markdown
Contributor

✅ Code Quality Checks

Check Status How to fix
Localization (l10n) ✅ Passed
ESLint ✅ Passed
Prettier formatting ✅ Passed

This comment is updated automatically on each push.

@github-actions
Copy link
Copy Markdown
Contributor

📦 Build Size Report

Metric Base (next) PR Delta
VSIX (vscode-documentdb-0.8.0-beta.vsix) 6.76 MB 6.79 MB ⬆️ +27 KB (+0.4%)
Webview bundle (views.js) 5.79 MB 5.79 MB ⬆️ +0 KB (+0.0%)

Download artifact · updated automatically on each push.

@tnaum-ms tnaum-ms merged commit f365ea1 into next Apr 28, 2026
8 checks passed
@tnaum-ms tnaum-ms deleted the dev/tnaum/query-insights-ai-improvement branch April 28, 2026 18:40
@tnaum-ms tnaum-ms added this to the 0.8.0 milestone May 11, 2026
@tnaum-ms tnaum-ms mentioned this pull request May 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

3 participants