feat: Add support for `topics` to Search Contexts by fatmcgav · Pull Request #1028 · sourcebot-dev/sourcebot

fatmcgav · 2026-03-23T10:40:07Z

This commit updates Sourcebot to include support for using `topics` as part of the Search Context definition.

As part of this:

- Updated `repoMetadataSchema` to store `topics` for `github` and `gitlab` host types
- Populated the topic list when compiling GitHub and GitLab repos
- Updated schemas to support `includeTopics`/`excludeTopics`
- Expanded test coverage
- Updated docs

N.B. Code largely generated using Claude.

Summary by CodeRabbit

New Features
- Search contexts now support topic-based filtering via includeTopics and excludeTopics, using glob patterns with case-insensitive matching.
Documentation
- Added "Filtering by topic" docs and updated schema descriptions and examples for the new fields.
Tests
- Added comprehensive tests covering include/exclude topic matching, glob patterns, case handling, and combined behaviours.
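
The matching behaviour summarised above (glob patterns, case-insensitive) can be illustrated with a small sketch. This is not the PR's code: the walkthrough says the implementation uses micromatch, whereas the helper below is a simplified stand-in that supports only the `*` wildcard so the example stays dependency-free. The names `globToRegExp` and `matchesAnyTopic` are hypothetical.

```typescript
// Simplified stand-in for a glob library: only the `*` wildcard is supported.
function globToRegExp(pattern: string): RegExp {
    const escaped = pattern
        .split("*")
        // Escape regex metacharacters in the literal parts of the pattern.
        .map(part => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
        .join(".*");
    return new RegExp(`^${escaped}$`);
}

// True when any repo topic matches any pattern, ignoring case on both sides.
function matchesAnyTopic(topics: string[], patterns: string[]): boolean {
    const matchers = patterns.map(p => globToRegExp(p.toLowerCase()));
    return topics.some(topic => matchers.some(re => re.test(topic.toLowerCase())));
}

console.log(matchesAnyTopic(["Backend", "team-core"], ["backend"])); // true
console.log(matchesAnyTopic(["Team-Core"], ["team-*"]));             // true
console.log(matchesAnyTopic(["frontend"], ["BACK*"]));               // false
```

Lowercasing both the patterns and the topics before matching is what makes the comparison case-insensitive; a real glob library would also handle syntax such as `?`, braces, and negation.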


coderabbitai · 2026-03-23T10:40:40Z

Walkthrough

Adds topic-based filtering to Search Contexts via new includeTopics and excludeTopics fields, schema and type updates, repository metadata population for GitHub/GitLab topics, sync logic to apply glob-based, case-insensitive topic matching, and comprehensive tests covering include/exclude/topic interactions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation `docs/docs/features/search/search-contexts.mdx` | Added "Filtering by topic" docs describing `includeTopics`/`excludeTopics`, additive semantics, glob support, case-insensitive matching, and re-sync requirement. |
| Schema Snippets (docs) `docs/snippets/schemas/v3/index.schema.mdx`, `docs/snippets/schemas/v3/searchContext.schema.mdx` | Added optional `includeTopics` and `excludeTopics` array properties and examples to SearchContext snippets. |
| Schema Definitions (TS/JSON) `packages/schemas/src/v3/index.schema.ts`, `packages/schemas/src/v3/searchContext.schema.ts`, `schemas/v3/searchContext.json` | Introduced `includeTopics`/`excludeTopics` to SearchContext schemas (base and per-tenant variants) with descriptions noting glob support. |
| Type Definitions `packages/schemas/src/v3/index.type.ts`, `packages/schemas/src/v3/searchContext.type.ts`, `packages/shared/src/types.ts` | Extended `SearchContext` types with `includeTopics?: string[]` and `excludeTopics?: string[]`; added optional `codeHostMetadata.github |
| Repo Metadata Population `packages/backend/src/repoCompileUtils.ts` | Populate `metadata.codeHostMetadata.github.topics` and `metadata.codeHostMetadata.gitlab.topics` from repo source data. |
| Sync Implementation `packages/backend/src/syncSearchContexts.ts` | Extend sync logic to include/exclude repos by topics: parse repo metadata, perform case-insensitive glob matching (micromatch), deduplicate adds, and filter excludes; upstream upsert/connect/disconnect flow unchanged. |
| Tests `packages/backend/src/syncSearchContexts.test.ts`, `packages/backend/src/gitlab.test.ts` | Added Vitest coverage for include/exclude topics: exact, glob, case-sensitivity behavior, repos-without-topics cases, combined include/exclude interactions, deduplication with existing include filters, and GitHub/GitLab scenarios. |
| Changelog `CHANGELOG.md` | Documented the new topic-based filtering feature under Unreleased. |
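
To make the type and metadata changes in the table above more concrete, here is a trimmed-down sketch. The interface names (`SearchContextSketch`, `RepoMetadataSketch`), the helper `topicsOf`, and the sample values are hypothetical; only the field names `includeTopics`, `excludeTopics`, and `codeHostMetadata.github.topics` / `codeHostMetadata.gitlab.topics` come from the PR, and the real generated types carry more fields.

```typescript
// Hypothetical, trimmed-down shape of a search context definition.
interface SearchContextSketch {
    includeTopics?: string[]; // glob patterns; repos with a matching topic are added
    excludeTopics?: string[]; // glob patterns; repos with a matching topic are removed
}

// Hypothetical, trimmed-down shape of the per-repo metadata.
interface RepoMetadataSketch {
    codeHostMetadata?: {
        github?: { topics?: string[] };
        gitlab?: { topics?: string[] };
    };
}

// Collect topics regardless of which code host the repo came from.
function topicsOf(metadata: RepoMetadataSketch): string[] {
    return [
        ...(metadata.codeHostMetadata?.gitlab?.topics ?? []),
        ...(metadata.codeHostMetadata?.github?.topics ?? []),
    ];
}

// Sample values, invented for illustration.
const backendContext: SearchContextSketch = {
    includeTopics: ["backend", "team-*"],
    excludeTopics: ["deprecated"],
};
const sampleMetadata: RepoMetadataSketch = {
    codeHostMetadata: { github: { topics: ["backend", "team-core"] } },
};

console.log(backendContext, topicsOf(sampleMetadata));
```

Both topic fields are optional, so a context that sets neither behaves as it did before the change.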

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Scheduler as Sync Scheduler
participant DB as Database
participant Sync as syncSearchContexts
participant Matcher as Topic Matcher (micromatch)
participant Upsert as DB Upsert
Scheduler->>DB: fetch SearchContexts + repos (id,name,metadata)
DB->>Sync: return contexts and repo metadata
Sync->>Matcher: extract topics from metadata, apply includeTopics globs
Matcher-->>Sync: matching repo IDs
Sync->>Matcher: apply excludeTopics globs on matched set
Matcher-->>Sync: filtered repo IDs
Sync->>Upsert: upsert SearchContext with repos.connect/disconnect
Upsert-->>DB: persist changes
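
The flow in the diagram can be sketched in plain TypeScript, under several assumptions: repos are reduced to a hypothetical `RepoSketch` shape with topics already extracted, topic comparison is exact and case-insensitive (the PR describes glob matching, left out here for brevity), and `resolveContextRepos` and `diffRepos` are illustrative names rather than functions from the codebase.

```typescript
interface RepoSketch {
    id: number;
    name: string;
    topics: string[];
}

// Exact, case-insensitive comparison; a stand-in for the glob matching in the PR.
function hasAnyTopic(repo: RepoSketch, wanted: string[]): boolean {
    const lowered = wanted.map(t => t.toLowerCase());
    return repo.topics.some(topic => lowered.some(w => w === topic.toLowerCase()));
}

// Start from the repos already selected, add includeTopics matches
// (deduplicated by id), then drop anything matching excludeTopics.
function resolveContextRepos(
    base: RepoSketch[],
    allRepos: RepoSketch[],
    includeTopics?: string[],
    excludeTopics?: string[],
): RepoSketch[] {
    const byId = new Map<number, RepoSketch>();
    for (const repo of base) {
        byId.set(repo.id, repo);
    }
    if (includeTopics) {
        for (const repo of allRepos) {
            if (hasAnyTopic(repo, includeTopics)) {
                byId.set(repo.id, repo);
            }
        }
    }
    let result = Array.from(byId.values());
    if (excludeTopics) {
        result = result.filter(repo => !hasAnyTopic(repo, excludeTopics));
    }
    return result;
}

// Build connect/disconnect lists in the shape the upsert step expects.
function diffRepos(currentIds: number[], nextIds: number[]) {
    const next = new Set(nextIds);
    return {
        connect: nextIds.map(id => ({ id })),
        disconnect: currentIds.filter(id => !next.has(id)).map(id => ({ id })),
    };
}
```

Applying excludes after includes means an excluded topic wins when a repo matches both lists, which is consistent with the ordering shown in the diagram.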

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

brendan-kellam

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The pull request title accurately describes the main change: adding topic filtering support to Search Contexts, which is the primary objective across all modified files. |
| Linked Issues check | ✅ Passed | The pull request fully implements the requirements from issue `#1027`: added `includeTopics`/`excludeTopics` fields to search contexts, populated repository topic metadata, and provided appropriate schema and documentation updates. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing topic-based filtering for Search Contexts; no out-of-scope changes detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (3)

packages/backend/src/ee/syncSearchContexts.test.ts (1)

93-95: Consider extracting a small getConnectedIds helper to reduce repetition.

The upsertCall.create.repos.connect.map(...) pattern is repeated many times, which adds noise and makes future assertion changes tedious.

🧹 Optional cleanup

+const getConnectedIds = (db: PrismaClient): number[] => {
+    const upsertCall = vi.mocked(db.searchContext.upsert).mock.calls[0][0];
+    return upsertCall.create.repos.connect.map((r: { id: number }) => r.id);
+};
...
-const upsertCall = vi.mocked(db.searchContext.upsert).mock.calls[0][0];
-const connectedIds = upsertCall.create.repos.connect.map((r: { id: number }) => r.id);
+const connectedIds = getConnectedIds(db);

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/backend/src/ee/syncSearchContexts.test.ts` around lines 93 - 95,
Extract a small helper function (e.g., getConnectedIds) in the test to read
connected repo ids from the mocked upsert call instead of repeating
upsertCall.create.repos.connect.map(...); implement
getConnectedIds(upsertCallOrCreate) to accept either the full upsertCall (from
vi.mocked(db.searchContext.upsert).mock.calls[0][0]) or the create object and
return the mapped id array, then replace occurrences where you currently do
upsertCall.create.repos.connect.map((r: { id: number }) => r.id) with calls to
getConnectedIds(upsertCall) to reduce repetition and simplify assertions in
tests referencing db.searchContext.upsert and upsertCall.

packages/backend/src/ee/syncSearchContexts.ts (2)

69-71: Normalize repo IDs once before upsert.

Dedup currently happens only in the includeTopics path. Consolidating dedup right before upsert also covers connection-based duplicates and lets Line 173 use a Set instead of repeated array rebuilds.

⚙️ Suggested refactor

+// Canonicalize once before read/write operations
+const uniqueReposById = new Map<number, { id: number; name: string; metadata: unknown }>();
+for (const repo of newReposInContext) {
+    uniqueReposById.set(repo.id, repo);
+}
+newReposInContext = [...uniqueReposById.values()];
+const newRepoIds = new Set(newReposInContext.map(repo => repo.id));
...
 await db.searchContext.upsert({
   ...
   update: {
     repos: {
       connect: newReposInContext.map(repo => ({ id: repo.id })),
       disconnect: currentReposInContext
-        .filter(repo => !newReposInContext.map(r => r.id).includes(repo.id))
+        .filter(repo => !newRepoIds.has(repo.id))
         .map(repo => ({ id: repo.id })),
     },

Also applies to: 87-93, 169-173

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/backend/src/ee/syncSearchContexts.ts` around lines 69 - 71, The loop
that builds newReposInContext by concatenating connection.repos currently defers
deduplication until includeTopics path; instead, normalize and deduplicate repo
IDs once immediately before the upsert to cover duplicates from connections and
topics: collect all repo IDs into newReposInContext (from
connection.repos.map(repo => repo.repo) and any includeTopics additions), then
replace the repeated array concatenations with a single Set-based dedupe (e.g.,
const uniqueRepos = Array.from(new Set(newReposInContext))) and use uniqueRepos
for the upsert call and anywhere Line 173 reconstructs arrays so you can remove
repeated rebuilds and switch to Set lookups.
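
One point worth verifying against the actual code before following the prompt above: if `newReposInContext` holds repo objects rather than numeric ids, `Array.from(new Set(...))` deduplicates by object identity, so two distinct objects with the same `id` both survive. The sketch below uses hypothetical sample data and a hypothetical `dedupeById` helper to contrast that with keying by id, as the suggested refactor's `Map` does.

```typescript
interface RepoRef {
    id: number;
    name: string;
}

// The same repo reached via two paths, represented by two distinct objects.
const fromConnection: RepoRef = { id: 1, name: "api" };
const fromTopicMatch: RepoRef = { id: 1, name: "api" };
const combined: RepoRef[] = [fromConnection, fromTopicMatch, { id: 2, name: "web" }];

// A Set compares objects by reference, so both id-1 entries are kept.
const dedupedByReference = Array.from(new Set(combined));

// Keying a Map by id collapses entries that share an id.
function dedupeById(repos: RepoRef[]): RepoRef[] {
    const byId = new Map<number, RepoRef>();
    for (const repo of repos) {
        byId.set(repo.id, repo);
    }
    return Array.from(byId.values());
}

console.log(dedupedByReference.length);   // 3
console.log(dedupeById(combined).length); // 2
```

A `Set` remains a good fit for the lookup side, for example a `Set<number>` of ids used when computing the disconnect list.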

74-86: Extract topic parsing/matching into a shared helper to avoid drift.

The same parse + topic extraction + lowercase matching logic appears twice (include and exclude paths). A helper would keep behavior consistent and easier to evolve.

♻️ Suggested refactor

+const getNormalizedRepoTopics = (metadata: unknown): string[] | null => {
+    const parsed = repoMetadataSchema.safeParse(metadata);
+    if (!parsed.success) {
+        return null;
+    }
+    return [
+        ...(parsed.data.codeHostMetadata?.gitlab?.topics ?? []),
+        ...(parsed.data.codeHostMetadata?.github?.topics ?? []),
+    ].map(topic => topic.toLowerCase());
+};
...
 if (newContextConfig.includeTopics) {
     const topicPatterns = newContextConfig.includeTopics.map(t => t.toLowerCase());
     const matching = allRepos.filter(repo => {
-        const parsed = repoMetadataSchema.safeParse(repo.metadata);
-        if (!parsed.success) {
+        const repoTopics = getNormalizedRepoTopics(repo.metadata);
+        if (!repoTopics) {
             return false;
         }
-        const repoTopics = [
-            ...(parsed.data.codeHostMetadata?.gitlab?.topics ?? []),
-            ...(parsed.data.codeHostMetadata?.github?.topics ?? []),
-        ];
-        return repoTopics.some(t => micromatch.isMatch(t.toLowerCase(), topicPatterns));
+        return repoTopics.some(t => micromatch.isMatch(t, topicPatterns));
     });
...
 if (newContextConfig.excludeTopics) {
     const topicPatterns = newContextConfig.excludeTopics.map(t => t.toLowerCase());
     newReposInContext = newReposInContext.filter(repo => {
-        const parsed = repoMetadataSchema.safeParse(repo.metadata);
-        if (!parsed.success) {
+        const repoTopics = getNormalizedRepoTopics(repo.metadata);
+        if (!repoTopics) {
             return true;
         }
-        const repoTopics = [
-            ...(parsed.data.codeHostMetadata?.gitlab?.topics ?? []),
-            ...(parsed.data.codeHostMetadata?.github?.topics ?? []),
-        ];
-        return !repoTopics.some(t => micromatch.isMatch(t.toLowerCase(), topicPatterns));
+        return !repoTopics.some(t => micromatch.isMatch(t, topicPatterns));
     });
 }

Also applies to: 133-145

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@packages/backend/src/ee/syncSearchContexts.ts` around lines 74 - 86, Extract
the repeated parse+topic-extraction+lowercase+matching logic into a shared
helper used by both include and exclude paths: create a function (e.g.,
extractRepoTopics or getNormalizedRepoTopics) that takes a repo object, uses
repoMetadataSchema.safeParse to validate, returns a normalized array of
lowercase topic strings (combining parsed.data.codeHostMetadata.gitlab.topics
and github.topics or empty array on parse failure), and then replace the inline
logic in the includeTopics block (which currently builds topicPatterns and calls
micromatch.isMatch) and the excludeTopics block to use this helper for matching
with micromatch.isMatch; ensure the helper is imported/defined near
syncSearchContexts so both places call the same code to avoid drift.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/snippets/schemas/v3/index.schema.mdx`:
- Around line 154-179: The schema descriptions for includeTopics and
excludeTopics are GitLab-specific; update their "description" values to be
host-agnostic (referencing repositories or repository topics rather than
"GitLab") so they accurately reflect repository-level topic filtering across
hosts; modify the "includeTopics" and "excludeTopics" entries in the schema to
use neutral wording like "List of repository topics to include/exclude from the
search context. Repositories matching..." and make the same wording change for
the other occurrences of these fields (the similar block around lines 342-367)
to keep descriptions consistent.

In `@packages/backend/src/gitlab.test.ts`:
- Around line 144-156: The test currently asserts exclusion due to a casing
mismatch; update the test in gitlab.test.ts to expect the project is NOT
excluded (i.e., matching should be case-insensitive) and then modify the
implementation in packages/backend/src/gitlab.ts so shouldExcludeProject
normalizes project topics the same way it normalizes config topics before
matching: when evaluating include.topics and exclude.topics in
shouldExcludeProject, lowercase (or otherwise normalize) the entries from
project.topics as well as the config topics so comparisons are case-insensitive;
locate the matching logic inside shouldExcludeProject in gitlab.ts and apply the
same normalization helper used for config topics (or add one) to project.topics
before performing includes/excludes.

In `@packages/schemas/src/v3/index.schema.ts`:
- Around line 341-366: Update the descriptions for the includeTopics and
excludeTopics schema entries to use host-agnostic wording (e.g., "repository
topics") instead of "GitLab topics" so they match the base SearchContext
phrasing; locate the includeTopics and excludeTopics properties in
index.schema.ts and replace "List of GitLab topics..." with something like "List
of repository topics..." for both description fields to avoid implying
GitLab-only support.

In `@schemas/v3/searchContext.json`:
- Around line 47-72: Update the description strings for the includeTopics and
excludeTopics schema properties so they no longer say "GitLab topics" but
instead refer generically to "repository topics" (or state "GitHub and GitLab
repository topics") and keep the rest of the text intact; locate the
includeTopics and excludeTopics properties in the searchContext.json schema and
replace their description values accordingly so they match the generated type
files' wording.
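
The casing issue raised for `packages/backend/src/gitlab.test.ts` above can be illustrated with a minimal sketch. It assumes exact string comparison and uses hypothetical helper names (`matchesConfigOnly`, `matchesBothSides`); the real `shouldExcludeProject` may be structured differently.

```typescript
const lowerAll = (values: string[]): string[] => values.map(v => v.toLowerCase());

// Only the config side is normalised: a mixed-case project topic is missed.
function matchesConfigOnly(projectTopics: string[], configTopics: string[]): boolean {
    const wanted = lowerAll(configTopics);
    return projectTopics.some(topic => wanted.indexOf(topic) !== -1);
}

// Both sides are normalised: the comparison becomes case-insensitive.
function matchesBothSides(projectTopics: string[], configTopics: string[]): boolean {
    const wanted = lowerAll(configTopics);
    return lowerAll(projectTopics).some(topic => wanted.indexOf(topic) !== -1);
}

console.log(matchesConfigOnly(["Backend"], ["BACKEND"])); // false
console.log(matchesBothSides(["Backend"], ["BACKEND"]));  // true
```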


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b3cf0aef-13cb-47f7-a01c-21e0e13d56dd

📥 Commits

Reviewing files that changed from the base of the PR and between 220a790 and b16a95a.

📒 Files selected for processing (13)

docs/docs/features/search/search-contexts.mdx
docs/snippets/schemas/v3/index.schema.mdx
docs/snippets/schemas/v3/searchContext.schema.mdx
packages/backend/src/ee/syncSearchContexts.test.ts
packages/backend/src/ee/syncSearchContexts.ts
packages/backend/src/gitlab.test.ts
packages/backend/src/repoCompileUtils.ts
packages/schemas/src/v3/index.schema.ts
packages/schemas/src/v3/index.type.ts
packages/schemas/src/v3/searchContext.schema.ts
packages/schemas/src/v3/searchContext.type.ts
packages/shared/src/types.ts
schemas/v3/searchContext.json


brendan-kellam

Thanks for the high quality PR, this looks great!

Replace some references to GitLab with repository

80972c5

coderabbitai bot reviewed Mar 23, 2026


Gavin Williams added 2 commits March 23, 2026 10:47

One more replace

9ba10cd

Add CHANGELOG.md entry

5df6a1b

brendan-kellam approved these changes Mar 23, 2026


brendan-kellam merged commit 38a54bc into sourcebot-dev:main Mar 23, 2026
6 checks passed

github-actions bot mentioned this pull request Mar 23, 2026

Sourcebot Roadmap 🚀 #459

Open

brendan-kellam merged 4 commits into sourcebot-dev:main from fatmcgav:feat-search-contexts-support-topics
