feat: Add support for topics to Search Contexts #1028

Merged
brendan-kellam merged 4 commits into sourcebot-dev:main from fatmcgav:feat-search-contexts-support-topics
Mar 23, 2026

fatmcgav commented Mar 23, 2026

This commit updates Sourcebot to support using `topics` as part of the Search Context definition.

As part of this:

  • Updated `repoMetadataSchema` to store `topics` for the `github` and `gitlab` host types
  • Populated the topic list when compiling GitHub and GitLab repos
  • Updated schemas to support `includeTopics`/`excludeTopics`
  • Expanded test coverage
  • Updated docs

Fixes: #1027

N.B. Code largely generated using Claude.

Summary by CodeRabbit

  • New Features
    • Search contexts now support topic-based filtering via includeTopics and excludeTopics, using glob patterns with case-insensitive matching.
  • Documentation
    • Added "Filtering by topic" docs and updated schema descriptions and examples for the new fields.
  • Tests
    • Added comprehensive tests covering include/exclude topic matching, glob patterns, case handling, and combined behaviours.
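
To make the new fields concrete, here is a rough sketch of what a search context using them might look like, written as a TypeScript object rather than the JSON config. Only `includeTopics` and `excludeTopics` come from this PR; the `SearchContextSketch` type, the `description` field, the context name, and the topic values are illustrative assumptions.

```typescript
// Hypothetical, trimmed-down shape of a search context entry.
// Only includeTopics/excludeTopics are taken from this PR; the rest is assumed.
interface SearchContextSketch {
    description?: string;
    includeTopics?: string[]; // glob patterns, matched case-insensitively
    excludeTopics?: string[]; // glob patterns, matched case-insensitively
}

const contexts: Record<string, SearchContextSketch> = {
    backend: {
        description: "Backend services, minus anything archived",
        includeTopics: ["backend", "service-*"],
        excludeTopics: ["Archived"],
    },
};
```

Under the behaviour described above, a repo tagged `Service-Billing` would presumably be picked up by `service-*`, and a repo tagged `archived` would be dropped by `Archived`.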

coderabbitai bot commented Mar 23, 2026

Walkthrough

Adds topic-based filtering to Search Contexts via new includeTopics and excludeTopics fields, schema and type updates, repository metadata population for GitHub/GitLab topics, sync logic to apply glob-based, case-insensitive topic matching, and comprehensive tests covering include/exclude/topic interactions.
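
As a minimal sketch of what "glob-based, case-insensitive topic matching" means in practice, the snippet below lowercases both sides before matching. The PR uses micromatch for this; `globToRegExp` here is a simplified, dependency-free stand-in that only understands `*` and `?`, and both function names are hypothetical.

```typescript
// Simplified stand-in for micromatch (an assumption, not the PR's code):
// supports only `*` (any run of characters) and `?` (a single character).
function globToRegExp(pattern: string): RegExp {
    const escaped = pattern
        .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
        .replace(/\*/g, ".*")
        .replace(/\?/g, ".");
    return new RegExp(`^${escaped}$`);
}

// True when the topic matches at least one pattern, ignoring case.
function topicMatches(topic: string, patterns: string[]): boolean {
    const normalized = topic.toLowerCase();
    return patterns.some(p => globToRegExp(p.toLowerCase()).test(normalized));
}
```

For example, `topicMatches("Frontend-Web", ["frontend-*"])` is true, while `topicMatches("backend", ["front*"])` is false.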

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation: docs/docs/features/search/search-contexts.mdx | Added "Filtering by topic" docs describing includeTopics/excludeTopics, additive semantics, glob support, case-insensitive matching, and re-sync requirement. |
| Schema Snippets (docs): docs/snippets/schemas/v3/index.schema.mdx, docs/snippets/schemas/v3/searchContext.schema.mdx | Added optional includeTopics and excludeTopics array properties and examples to SearchContext snippets. |
| Schema Definitions (TS/JSON): packages/schemas/src/v3/index.schema.ts, packages/schemas/src/v3/searchContext.schema.ts, schemas/v3/searchContext.json | Introduced includeTopics/excludeTopics to SearchContext schemas (base and per-tenant variants) with descriptions noting glob support. |
| Type Definitions: packages/schemas/src/v3/index.type.ts, packages/schemas/src/v3/searchContext.type.ts, packages/shared/src/types.ts | Extended SearchContext types with includeTopics?: string[] and excludeTopics?: string[]; added optional `codeHostMetadata.github |
| Repo Metadata Population: packages/backend/src/repoCompileUtils.ts | Populate metadata.codeHostMetadata.github.topics and metadata.codeHostMetadata.gitlab.topics from repo source data. |
| Sync Implementation: packages/backend/src/syncSearchContexts.ts | Extend sync logic to include/exclude repos by topics: parse repo metadata, perform case-insensitive glob matching (micromatch), deduplicate adds, and filter excludes; upstream upsert/connect/disconnect flow unchanged. |
| Tests: packages/backend/src/syncSearchContexts.test.ts, packages/backend/src/gitlab.test.ts | Added Vitest coverage for include/exclude topics: exact, glob, case-sensitivity behavior, repos-without-topics cases, combined include/exclude interactions, deduplication with existing include filters, and GitHub/GitLab scenarios. |
| Changelog: CHANGELOG.md | Documented the new topic-based filtering feature under Unreleased. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Scheduler as Sync Scheduler
    participant DB as Database
    participant Sync as syncSearchContexts
    participant Matcher as Topic Matcher (micromatch)
    participant Upsert as DB Upsert
    Scheduler->>DB: fetch SearchContexts + repos (id,name,metadata)
    DB->>Sync: return contexts and repo metadata
    Sync->>Matcher: extract topics from metadata, apply includeTopics globs
    Matcher-->>Sync: matching repo IDs
    Sync->>Matcher: apply excludeTopics globs on matched set
    Matcher-->>Sync: filtered repo IDs
    Sync->>Upsert: upsert SearchContext with repos.connect/disconnect
    Upsert-->>DB: persist changes
```
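
The include-then-exclude ordering in the diagram can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `RepoSketch`, `resolveContextRepos`, and the injected `matches` predicate are hypothetical names, and topics are assumed to be already extracted from repo metadata.

```typescript
// Hypothetical repo shape with topics already pulled out of the metadata.
interface RepoSketch {
    id: number;
    name: string;
    topics: string[];
}

type TopicMatcher = (topic: string, patterns: string[]) => boolean;

// Start from the repos selected by other filters, add topic matches
// (deduplicated by id), then drop anything hit by an exclude pattern.
function resolveContextRepos(
    baseRepos: RepoSketch[],
    allRepos: RepoSketch[],
    includeTopics: string[] | undefined,
    excludeTopics: string[] | undefined,
    matches: TopicMatcher,
): RepoSketch[] {
    const byId = new Map<number, RepoSketch>();
    for (const repo of baseRepos) {
        byId.set(repo.id, repo);
    }
    const includePatterns = includeTopics ?? [];
    for (const repo of allRepos) {
        if (repo.topics.some(t => matches(t, includePatterns))) {
            byId.set(repo.id, repo); // additive: never removes base repos
        }
    }
    const excludePatterns = excludeTopics ?? [];
    return Array.from(byId.values()).filter(
        repo => !repo.topics.some(t => matches(t, excludePatterns)),
    );
}

// Exact, case-insensitive matcher used only for this demonstration.
const exactMatch: TopicMatcher = (topic, patterns) =>
    patterns.some(p => p.toLowerCase() === topic.toLowerCase());
```

Passing the matcher in as a parameter keeps the flow independent of the glob library, so a micromatch-backed predicate could be swapped in without touching the ordering logic.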

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • brendan-kellam

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The pull request title accurately describes the main change: adding topic filtering support to Search Contexts, which is the primary objective across all modified files. |
| Linked Issues check | ✅ Passed | The pull request fully implements the requirements from issue #1027: added includeTopics/excludeTopics fields to search contexts, populated repository topic metadata, and provided appropriate schema and documentation updates. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing topic-based filtering for Search Contexts; no out-of-scope changes detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (3)
packages/backend/src/ee/syncSearchContexts.test.ts (1)

93-95: Consider extracting a small getConnectedIds helper to reduce repetition.

The upsertCall.create.repos.connect.map(...) pattern is repeated many times, which adds noise and makes future assertion changes tedious.

🧹 Optional cleanup
```diff
+const getConnectedIds = (db: PrismaClient): number[] => {
+    const upsertCall = vi.mocked(db.searchContext.upsert).mock.calls[0][0];
+    return upsertCall.create.repos.connect.map((r: { id: number }) => r.id);
+};
...
-const upsertCall = vi.mocked(db.searchContext.upsert).mock.calls[0][0];
-const connectedIds = upsertCall.create.repos.connect.map((r: { id: number }) => r.id);
+const connectedIds = getConnectedIds(db);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/backend/src/ee/syncSearchContexts.test.ts` around lines 93 - 95,
Extract a small helper function (e.g., getConnectedIds) in the test to read
connected repo ids from the mocked upsert call instead of repeating
upsertCall.create.repos.connect.map(...); implement
getConnectedIds(upsertCallOrCreate) to accept either the full upsertCall (from
vi.mocked(db.searchContext.upsert).mock.calls[0][0]) or the create object and
return the mapped id array, then replace occurrences where you currently do
upsertCall.create.repos.connect.map((r: { id: number }) => r.id) with calls to
getConnectedIds(upsertCall) to reduce repetition and simplify assertions in
tests referencing db.searchContext.upsert and upsertCall.
packages/backend/src/ee/syncSearchContexts.ts (2)

69-71: Normalize repo IDs once before upsert.

Dedup currently happens only in the includeTopics path. Consolidating dedup right before upsert also covers connection-based duplicates and lets Line 173 use a Set instead of repeated array rebuilds.

⚙️ Suggested refactor
```diff
+// Canonicalize once before read/write operations
+const uniqueReposById = new Map<number, { id: number; name: string; metadata: unknown }>();
+for (const repo of newReposInContext) {
+    uniqueReposById.set(repo.id, repo);
+}
+newReposInContext = [...uniqueReposById.values()];
+const newRepoIds = new Set(newReposInContext.map(repo => repo.id));
...
 await db.searchContext.upsert({
   ...
   update: {
     repos: {
       connect: newReposInContext.map(repo => ({ id: repo.id })),
       disconnect: currentReposInContext
-        .filter(repo => !newReposInContext.map(r => r.id).includes(repo.id))
+        .filter(repo => !newRepoIds.has(repo.id))
         .map(repo => ({ id: repo.id })),
     },
```

Also applies to: 87-93, 169-173
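
The dedupe-then-diff idea in this suggestion can be tried in isolation with the sketch below. `RepoRef` and `planRepoSync` are hypothetical names for illustration; the shape of the returned object only loosely mirrors the `connect`/`disconnect` lists in the diff above.

```typescript
// Hypothetical minimal repo reference; real rows carry more fields.
interface RepoRef {
    id: number;
}

// Deduplicate the desired repos by id once, then derive both lists from it.
function planRepoSync(current: RepoRef[], next: RepoRef[]) {
    const uniqueNext = new Map<number, RepoRef>();
    for (const repo of next) {
        uniqueNext.set(repo.id, repo); // later duplicates overwrite earlier ones
    }
    const nextIds = new Set(uniqueNext.keys());
    return {
        connect: Array.from(nextIds, id => ({ id })),
        // O(1) membership test instead of rebuilding an id array per element
        disconnect: current
            .filter(repo => !nextIds.has(repo.id))
            .map(repo => ({ id: repo.id })),
    };
}
```

Keying on `repo.id` matters here: a `Set` built directly from the repo objects compares references, so two distinct objects with the same id would both survive.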

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/backend/src/ee/syncSearchContexts.ts` around lines 69 - 71, The loop
that builds newReposInContext by concatenating connection.repos is only
deduplicated on the includeTopics path; instead, deduplicate once immediately
before the upsert so that duplicates from both connections and topics are
covered: collect all repos into newReposInContext (from
connection.repos.map(repo => repo.repo) and any includeTopics additions), then
dedupe by repo ID (e.g., a Map keyed on repo.id; a plain
new Set(newReposInContext) compares object references and would not remove
duplicate repos) and use the deduplicated list for the upsert call. Also build a
Set of the new repo IDs so that Line 173 can use Set lookups instead of
rebuilding the ID array for every element.

74-86: Extract topic parsing/matching into a shared helper to avoid drift.

The same parse + topic extraction + lowercase matching logic appears twice (include and exclude paths). A helper would keep behavior consistent and easier to evolve.

♻️ Suggested refactor
```diff
+const getNormalizedRepoTopics = (metadata: unknown): string[] | null => {
+    const parsed = repoMetadataSchema.safeParse(metadata);
+    if (!parsed.success) {
+        return null;
+    }
+    return [
+        ...(parsed.data.codeHostMetadata?.gitlab?.topics ?? []),
+        ...(parsed.data.codeHostMetadata?.github?.topics ?? []),
+    ].map(topic => topic.toLowerCase());
+};
...
 if (newContextConfig.includeTopics) {
     const topicPatterns = newContextConfig.includeTopics.map(t => t.toLowerCase());
     const matching = allRepos.filter(repo => {
-        const parsed = repoMetadataSchema.safeParse(repo.metadata);
-        if (!parsed.success) {
+        const repoTopics = getNormalizedRepoTopics(repo.metadata);
+        if (!repoTopics) {
             return false;
         }
-        const repoTopics = [
-            ...(parsed.data.codeHostMetadata?.gitlab?.topics ?? []),
-            ...(parsed.data.codeHostMetadata?.github?.topics ?? []),
-        ];
-        return repoTopics.some(t => micromatch.isMatch(t.toLowerCase(), topicPatterns));
+        return repoTopics.some(t => micromatch.isMatch(t, topicPatterns));
     });
...
 if (newContextConfig.excludeTopics) {
     const topicPatterns = newContextConfig.excludeTopics.map(t => t.toLowerCase());
     newReposInContext = newReposInContext.filter(repo => {
-        const parsed = repoMetadataSchema.safeParse(repo.metadata);
-        if (!parsed.success) {
+        const repoTopics = getNormalizedRepoTopics(repo.metadata);
+        if (!repoTopics) {
             return true;
         }
-        const repoTopics = [
-            ...(parsed.data.codeHostMetadata?.gitlab?.topics ?? []),
-            ...(parsed.data.codeHostMetadata?.github?.topics ?? []),
-        ];
-        return !repoTopics.some(t => micromatch.isMatch(t.toLowerCase(), topicPatterns));
+        return !repoTopics.some(t => micromatch.isMatch(t, topicPatterns));
     });
 }
```

Also applies to: 133-145
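
For readers who want to try the shared-helper idea without the project's schema, here is a dependency-free sketch. It replaces `repoMetadataSchema.safeParse` with hand-written type checks, which is an assumption made purely so the example runs on its own; the validation rules of the real schema may differ.

```typescript
function isStringArray(value: unknown): value is string[] {
    return Array.isArray(value) && value.every(v => typeof v === "string");
}

// Returns lowercased topics from both hosts, or null when the metadata
// does not have the expected shape (standing in for a failed safeParse).
function getNormalizedRepoTopics(metadata: unknown): string[] | null {
    if (typeof metadata !== "object" || metadata === null) {
        return null;
    }
    const hosts = (metadata as { codeHostMetadata?: unknown }).codeHostMetadata;
    if (hosts === undefined) {
        return []; // valid metadata, just no code host details
    }
    if (typeof hosts !== "object" || hosts === null) {
        return null;
    }
    const topics: string[] = [];
    for (const host of ["gitlab", "github"] as const) {
        const entry = (hosts as Record<string, unknown>)[host];
        if (entry === undefined) {
            continue;
        }
        if (typeof entry !== "object" || entry === null) {
            return null;
        }
        const hostTopics = (entry as { topics?: unknown }).topics;
        if (hostTopics === undefined) {
            continue;
        }
        if (!isStringArray(hostTopics)) {
            return null;
        }
        topics.push(...hostTopics.map(t => t.toLowerCase()));
    }
    return topics;
}
```

As in the suggested refactor, callers decide what `null` means: the include path treats it as "no match" and the exclude path as "keep the repo".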

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/backend/src/ee/syncSearchContexts.ts` around lines 74 - 86, Extract
the repeated parse+topic-extraction+lowercase+matching logic into a shared
helper used by both include and exclude paths: create a function (e.g.,
extractRepoTopics or getNormalizedRepoTopics) that takes a repo object, uses
repoMetadataSchema.safeParse to validate, returns a normalized array of
lowercase topic strings (combining parsed.data.codeHostMetadata.gitlab.topics
and github.topics or empty array on parse failure), and then replace the inline
logic in the includeTopics block (which currently builds topicPatterns and calls
micromatch.isMatch) and the excludeTopics block to use this helper for matching
with micromatch.isMatch; ensure the helper is imported/defined near
syncSearchContexts so both places call the same code to avoid drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/snippets/schemas/v3/index.schema.mdx`:
- Around line 154-179: The schema descriptions for includeTopics and
excludeTopics are GitLab-specific; update their "description" values to be
host-agnostic (referencing repositories or repository topics rather than
"GitLab") so they accurately reflect repository-level topic filtering across
hosts; modify the "includeTopics" and "excludeTopics" entries in the schema to
use neutral wording like "List of repository topics to include/exclude from the
search context. Repositories matching..." and make the same wording change for
the other occurrences of these fields (the similar block around lines 342-367)
to keep descriptions consistent.

In `@packages/backend/src/gitlab.test.ts`:
- Around line 144-156: The test currently asserts exclusion due to a casing
mismatch; update the test in gitlab.test.ts to expect the project is NOT
excluded (i.e., matching should be case-insensitive) and then modify the
implementation in packages/backend/src/gitlab.ts so shouldExcludeProject
normalizes project topics the same way it normalizes config topics before
matching: when evaluating include.topics and exclude.topics in
shouldExcludeProject, lowercase (or otherwise normalize) the entries from
project.topics as well as the config topics so comparisons are case-insensitive;
locate the matching logic inside shouldExcludeProject in gitlab.ts and apply the
same normalization helper used for config topics (or add one) to project.topics
before performing includes/excludes.

In `@packages/schemas/src/v3/index.schema.ts`:
- Around line 341-366: Update the descriptions for the includeTopics and
excludeTopics schema entries to use host-agnostic wording (e.g., "repository
topics") instead of "GitLab topics" so they match the base SearchContext
phrasing; locate the includeTopics and excludeTopics properties in
index.schema.ts and replace "List of GitLab topics..." with something like "List
of repository topics..." for both description fields to avoid implying
GitLab-only support.

In `@schemas/v3/searchContext.json`:
- Around line 47-72: Update the description strings for the includeTopics and
excludeTopics schema properties so they no longer say "GitLab topics" but
instead refer generically to "repository topics" (or state "GitHub and GitLab
repository topics") and keep the rest of the text intact; locate the
includeTopics and excludeTopics properties in the searchContext.json schema and
replace their description values accordingly so they match the generated type
files' wording.
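
The gitlab.ts comment above asks for project topics and config topics to be normalized the same way before comparison. A minimal sketch of that idea follows; the real `shouldExcludeProject` signature is not shown in this review, so `hasMatchingTopic` is a hypothetical helper, and exact (non-glob) matching is assumed for brevity.

```typescript
// Hypothetical helper: lowercases both sides so "Backend" matches "backend".
function hasMatchingTopic(projectTopics: string[], configTopics: string[]): boolean {
    const wanted = new Set(configTopics.map(t => t.toLowerCase()));
    return projectTopics.some(t => wanted.has(t.toLowerCase()));
}
```

With this shape, a project tagged `Backend` would satisfy an `include.topics` entry of `backend`, which is the behaviour the review asks the test to expect.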

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b3cf0aef-13cb-47f7-a01c-21e0e13d56dd

📥 Commits

Reviewing files that changed from the base of the PR and between 220a790 and b16a95a.

📒 Files selected for processing (13)
  • docs/docs/features/search/search-contexts.mdx
  • docs/snippets/schemas/v3/index.schema.mdx
  • docs/snippets/schemas/v3/searchContext.schema.mdx
  • packages/backend/src/ee/syncSearchContexts.test.ts
  • packages/backend/src/ee/syncSearchContexts.ts
  • packages/backend/src/gitlab.test.ts
  • packages/backend/src/repoCompileUtils.ts
  • packages/schemas/src/v3/index.schema.ts
  • packages/schemas/src/v3/index.type.ts
  • packages/schemas/src/v3/searchContext.schema.ts
  • packages/schemas/src/v3/searchContext.type.ts
  • packages/shared/src/types.ts
  • schemas/v3/searchContext.json

brendan-kellam left a comment

Thanks for the high quality PR, this looks great!

brendan-kellam merged commit 38a54bc into sourcebot-dev:main on Mar 23, 2026
6 checks passed
Development

Successfully merging this pull request may close these issues.

[FR] Support topics on Search Contexts