Commit 36518e4

feat: add temporal filtering to search and repository APIs
Add temporal filtering capabilities for searches by git branch/revision and repository index dates (since/until). Integrates with the refactored QueryIR-based search architecture.

- Add gitRevision, since, until parameters to SearchOptions
- Implement temporal repo filtering by indexedAt field
- Add branch filtering via QueryIR wrapper
- Add search_commits MCP tool for commit-based searches
- Update list_repos with activeAfter/activeBefore filtering
- Add 88 new tests (all passing)

Signed-off-by: Wayne Sun <gsun@redhat.com>
1 parent f3a8fa3 commit 36518e4

17 files changed: +1775 -28 lines changed


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 <!-- Bump @sourcebot/mcp since there are breaking changes to the api in this release -->
 
 ### Added
+- Added temporal filtering to search and repository APIs with support for git branch/revision filtering and repository index date filtering (since/until parameters). Supports both ISO 8601 and relative date formats (e.g., "30 days ago", "last week").
 - Added support for streaming code search results. [#623](https://github.com/sourcebot-dev/sourcebot/pull/623)
 - Added buttons to toggle case sensitivity and regex patterns. [#623](https://github.com/sourcebot-dev/sourcebot/pull/623)
 - Added counts to members, requets, and invites tabs in the members settings. [#621](https://github.com/sourcebot-dev/sourcebot/pull/621)

packages/mcp/CHANGELOG.md

Lines changed: 16 additions & 0 deletions
@@ -7,6 +7,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+- Added comprehensive relative date support for all temporal parameters (e.g., "30 days ago", "last week", "yesterday")
+- Added `search_commits` tool to search commits by actual commit time with full temporal filtering. Accepts both numeric database IDs (e.g., 123) and string repository names (e.g., "github.com/owner/repo") for the `repoId` parameter, allowing direct use of repository names from `list_repos` output
+- Added `since`/`until` parameters to `search_code` (filters by index time - when Sourcebot indexed the repo)
+- Added `gitRevision` parameter to `search_code`
+- Added `activeAfter`/`activeBefore` parameters to `list_repos` (filters by index time - when Sourcebot indexed the repo)
+- Added date range validation to prevent invalid date ranges (since > until)
+- Added 30-second timeout for git operations to handle large repositories
+- Added enhanced error messages for git operations (timeout, repository not found, invalid git repository, ambiguous arguments)
+- Added clarification that repositories must be cloned on Sourcebot server disk for `search_commits` to work
+- Added comprehensive temporal parameter documentation to README with clear distinction between index time and commit time filtering
+- Added comprehensive unit tests for date parsing utilities (90+ test cases)
+- Added unit tests for git commit search functionality with mocking
+- Added integration tests for temporal parameter validation
+- Added unit tests for repository identifier resolution (both string and number types)
+
 ### Changed
 - Updated API client to match the latest Sourcebot release. [#555](https://github.com/sourcebot-dev/sourcebot/pull/555)
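The changelog entry on date range validation (rejecting `since > until`) can be illustrated with a minimal sketch. The function name `validateDateRange` and its error strings are hypothetical and do not come from this commit; the sketch also assumes both inputs have already been resolved to ISO 8601 strings, so relative formats such as "30 days ago" would need to be converted first.

```typescript
// Hypothetical helper, not taken from the commit: returns an error message
// for an invalid range, or null when the range is acceptable.
function validateDateRange(since?: string, until?: string): string | null {
    // An open-ended range (only one bound, or none) cannot be inverted.
    if (since === undefined || until === undefined) {
        return null;
    }

    const sinceMs = Date.parse(since);
    const untilMs = Date.parse(until);

    // Date.parse returns NaN for strings it cannot interpret.
    if (Number.isNaN(sinceMs) || Number.isNaN(untilMs)) {
        return 'since and until must be parseable dates';
    }

    if (sinceMs > untilMs) {
        return `Invalid date range: since (${since}) is after until (${until})`;
    }

    return null;
}
```

Returning a message instead of throwing keeps the check easy to map onto a service-error response, although the commit may well handle this differently.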

packages/mcp/README.md

Lines changed: 47 additions & 5 deletions
@@ -166,6 +166,8 @@ For a more detailed guide, checkout [the docs](https://docs.sourcebot.dev/docs/f
 
 Fetches code that matches the provided regex pattern in `query`.
 
+**Temporal Filtering**: Use `since` and `until` to filter by repository index time (when Sourcebot last indexed the repo). This is different from commit time. See `search_commits` for commit-time filtering.
+
 <details>
 <summary>Parameters</summary>
 
@@ -176,6 +178,9 @@ Fetches code that matches the provided regex pattern in `query`.
 | `filterByLanguages` | no | Restrict search to specific languages (GitHub linguist format, e.g., Python, JavaScript). |
 | `caseSensitive` | no | Case sensitive search (default: false). |
 | `includeCodeSnippets` | no | Include code snippets in results (default: false). |
+| `gitRevision` | no | Git revision to search (e.g., 'main', 'develop', 'v1.0.0'). Defaults to HEAD. |
+| `since` | no | Only search repos indexed after this date. Supports ISO 8601 or relative (e.g., "30 days ago"). |
+| `until` | no | Only search repos indexed before this date. Supports ISO 8601 or relative (e.g., "yesterday"). |
 | `maxTokens` | no | Max tokens to return (default: env.DEFAULT_MINIMUM_TOKENS). |
 </details>
 
@@ -184,14 +189,18 @@ Fetches code that matches the provided regex pattern in `query`.
 
 Lists repositories indexed by Sourcebot with optional filtering and pagination.
 
+**Temporal Filtering**: Use `activeAfter` and `activeBefore` to filter by repository index time (when Sourcebot last indexed the repo). This is the same filtering behavior as `search_code`'s `since`/`until` parameters.
+
 <details>
 <summary>Parameters</summary>
 
-| Name | Required | Description |
-|:-------------|:---------|:--------------------------------------------------------------------|
-| `query` | no | Filter repositories by name (case-insensitive). |
-| `pageNumber` | no | Page number (1-indexed, default: 1). |
-| `limit` | no | Number of repositories per page (default: 50). |
+| Name | Required | Description |
+|:----------------|:---------|:-----------------------------------------------------------------------------------------------|
+| `query` | no | Filter repositories by name (case-insensitive). |
+| `pageNumber` | no | Page number (1-indexed, default: 1). |
+| `limit` | no | Number of repositories per page (default: 50). |
+| `activeAfter` | no | Only return repos indexed after this date. Supports ISO 8601 or relative (e.g., "30 days ago"). |
+| `activeBefore` | no | Only return repos indexed before this date. Supports ISO 8601 or relative (e.g., "yesterday"). |
 
 </details>
 
@@ -208,6 +217,39 @@ Fetches the source code for a given file.
 | `repoId` | yes | The Sourcebot repository ID. |
 </details>
 
+### search_commits
+
+Searches for commits in a specific repository based on actual commit time (NOT index time).
+
+**Requirements**: Repository must be cloned on the Sourcebot server disk. Sourcebot automatically clones repositories during indexing, but the cloning process may not be finished when this query is executed. Use `list_repos` first to get the repository ID.
+
+**Date Formats**: Supports ISO 8601 dates (e.g., "2024-01-01") and relative formats (e.g., "30 days ago", "last week", "yesterday").
+
+<details>
+<summary>Parameters</summary>
+
+| Name | Required | Description |
+|:-----------|:---------|:-----------------------------------------------------------------------------------------------|
+| `repoId` | yes | Repository identifier: either numeric database ID (e.g., 123) or full repository name (e.g., "github.com/owner/repo") as returned by `list_repos`. |
+| `query` | no | Search query to filter commits by message (case-insensitive). |
+| `since` | no | Show commits after this date (by commit time). Supports ISO 8601 or relative formats. |
+| `until` | no | Show commits before this date (by commit time). Supports ISO 8601 or relative formats. |
+| `author` | no | Filter by author name or email (supports partial matches). |
+| `maxCount` | no | Maximum number of commits to return (default: 50). |
+
+</details>
+
+## Date Format Examples
+
+All temporal parameters support:
+- **ISO 8601**: `"2024-01-01"`, `"2024-12-31T23:59:59Z"`
+- **Relative dates**: `"30 days ago"`, `"1 week ago"`, `"last month"`, `"yesterday"`
+
+**Important**: Different tools filter by different time dimensions:
+- `search_code` `since`/`until`: Filters by **index time** (when Sourcebot indexed the repo)
+- `list_repos` `activeAfter`/`activeBefore`: Filters by **index time** (when Sourcebot indexed the repo)
+- `search_commits` `since`/`until`: Filters by **commit time** (actual git commit dates)
+
 
 ## Supported Code Hosts
 Sourcebot supports the following code hosts:
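The README changes above describe two accepted date styles, ISO 8601 and relative phrases. The date parsing utilities themselves are not part of this excerpt, so the following is only a rough sketch of how such inputs might be resolved to a `Date`. The name `parseTemporalDate`, the supported units, and the `now` parameter are all assumptions; month-based phrases such as "last month" are deliberately left out because months have variable length.

```typescript
// Hypothetical parser: the commit's actual date utilities are not shown in this diff.
const UNIT_MS = {
    hour: 60 * 60 * 1000,
    day: 24 * 60 * 60 * 1000,
    week: 7 * 24 * 60 * 60 * 1000,
} as const;

type Unit = keyof typeof UNIT_MS;

// Resolves a relative phrase or an ISO 8601 string against `now`.
// Returns null when the input is not understood.
function parseTemporalDate(input: string, now: Date = new Date()): Date | null {
    const text = input.trim().toLowerCase();

    if (text === 'yesterday') {
        return new Date(now.getTime() - UNIT_MS.day);
    }

    // Phrases such as "30 days ago" or "1 week ago".
    const ago = /^(\d+)\s+(hour|day|week)s?\s+ago$/.exec(text);
    if (ago) {
        return new Date(now.getTime() - Number(ago[1]) * UNIT_MS[ago[2] as Unit]);
    }

    // Phrases such as "last week", treated here as exactly one unit back.
    const last = /^last\s+(hour|day|week)$/.exec(text);
    if (last) {
        return new Date(now.getTime() - UNIT_MS[last[1] as Unit]);
    }

    // Fall back to the built-in parser, which accepts ISO 8601 strings.
    const ms = Date.parse(input);
    return Number.isNaN(ms) ? null : new Date(ms);
}
```

Passing `now` explicitly keeps the function deterministic in tests. Whichever tool receives the parsed value decides what it is compared against: `indexedAt` for `search_code` and `list_repos`, commit dates for `search_commits`.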

packages/mcp/src/client.ts

Lines changed: 30 additions & 4 deletions
@@ -1,6 +1,6 @@
 import { env } from './env.js';
-import { listRepositoriesResponseSchema, searchResponseSchema, fileSourceResponseSchema } from './schemas.js';
-import { FileSourceRequest, FileSourceResponse, ListRepositoriesResponse, SearchRequest, SearchResponse, ServiceError } from './types.js';
+import { listRepositoriesResponseSchema, searchResponseSchema, fileSourceResponseSchema, searchCommitsResponseSchema } from './schemas.js';
+import { FileSourceRequest, FileSourceResponse, ListRepositoriesResponse, SearchRequest, SearchResponse, ServiceError, SearchCommitsRequest, SearchCommitsResponse } from './types.js';
 import { isServiceError } from './utils.js';
 
 export const search = async (request: SearchRequest): Promise<SearchResponse | ServiceError> => {
@@ -21,8 +21,16 @@ export const search = async (request: SearchRequest): Promise<SearchResponse | S
     return searchResponseSchema.parse(result);
 }
 
-export const listRepos = async (): Promise<ListRepositoriesResponse | ServiceError> => {
-    const result = await fetch(`${env.SOURCEBOT_HOST}/api/repos`, {
+export const listRepos = async (params?: { activeAfter?: string, activeBefore?: string }): Promise<ListRepositoriesResponse | ServiceError> => {
+    const url = new URL(`${env.SOURCEBOT_HOST}/api/repos`);
+    if (params?.activeAfter) {
+        url.searchParams.append('activeAfter', params.activeAfter);
+    }
+    if (params?.activeBefore) {
+        url.searchParams.append('activeBefore', params.activeBefore);
+    }
+
+    const result = await fetch(url.toString(), {
         method: 'GET',
         headers: {
             'Content-Type': 'application/json',
@@ -55,3 +63,21 @@ export const getFileSource = async (request: FileSourceRequest): Promise<FileSou
 
     return fileSourceResponseSchema.parse(result);
 }
+
+export const searchCommits = async (request: SearchCommitsRequest): Promise<SearchCommitsResponse | ServiceError> => {
+    const result = await fetch(`${env.SOURCEBOT_HOST}/api/commits`, {
+        method: 'POST',
+        headers: {
+            'Content-Type': 'application/json',
+            'X-Org-Domain': '~',
+            ...(env.SOURCEBOT_API_KEY ? { 'X-Sourcebot-Api-Key': env.SOURCEBOT_API_KEY } : {})
+        },
+        body: JSON.stringify(request)
+    }).then(response => response.json());
+
+    if (isServiceError(result)) {
+        return result;
+    }
+
+    return searchCommitsResponseSchema.parse(result);
+}
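The `listRepos` change above forwards `activeAfter` and `activeBefore` as query-string parameters. The sketch below isolates that URL construction so the encoding is easy to see. `buildReposUrl` is a hypothetical helper written for illustration, and the host value in the usage note is a placeholder rather than anything defined by the commit.

```typescript
// Hypothetical helper mirroring the query-string handling in listRepos.
function buildReposUrl(
    host: string,
    params?: { activeAfter?: string; activeBefore?: string },
): string {
    const url = new URL(`${host}/api/repos`);

    // Only append parameters that were actually provided, as the diff does.
    if (params?.activeAfter) {
        url.searchParams.append('activeAfter', params.activeAfter);
    }
    if (params?.activeBefore) {
        url.searchParams.append('activeBefore', params.activeBefore);
    }

    return url.toString();
}
```

With a placeholder host, `buildReposUrl('http://localhost:3000', { activeAfter: '30 days ago' })` yields `http://localhost:3000/api/repos?activeAfter=30+days+ago`. `URLSearchParams` encodes spaces as `+`, so relative phrases pass through without manual escaping.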

packages/mcp/src/index.ts

Lines changed: 102 additions & 5 deletions
@@ -5,7 +5,7 @@ import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
 import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
 import escapeStringRegexp from 'escape-string-regexp';
 import { z } from 'zod';
-import { listRepos, search, getFileSource } from './client.js';
+import { listRepos, search, getFileSource, searchCommits } from './client.js';
 import { env, numberSchema } from './env.js';
 import { listReposRequestSchema } from './schemas.js';
 import { TextContent } from './types.js';
@@ -49,6 +49,18 @@ server.tool(
             .boolean()
             .describe(`Whether to include the code snippets in the response (default: false). If false, only the file's URL, repository, and language will be returned. Set to false to get a more concise response.`)
             .optional(),
+        gitRevision: z
+            .string()
+            .describe(`The git revision to search in (e.g., 'main', 'HEAD', 'v1.0.0', 'a1b2c3d'). If not provided, defaults to the default branch (usually 'main' or 'master').`)
+            .optional(),
+        since: z
+            .string()
+            .describe(`Filter repositories by when they were last indexed by Sourcebot (NOT by commit time). Only searches in repos indexed after this date. Supports ISO 8601 (e.g., '2024-01-01') or relative formats (e.g., '30 days ago', 'last week', 'yesterday').`)
+            .optional(),
+        until: z
+            .string()
+            .describe(`Filter repositories by when they were last indexed by Sourcebot (NOT by commit time). Only searches in repos indexed before this date. Supports ISO 8601 (e.g., '2024-12-31') or relative formats (e.g., 'yesterday').`)
+            .optional(),
         maxTokens: numberSchema
             .describe(`The maximum number of tokens to return (default: ${env.DEFAULT_MINIMUM_TOKENS}). Higher values provide more context but consume more tokens. Values less than ${env.DEFAULT_MINIMUM_TOKENS} will be ignored.`)
             .transform((val) => (val < env.DEFAULT_MINIMUM_TOKENS ? env.DEFAULT_MINIMUM_TOKENS : val))
@@ -61,6 +73,9 @@ server.tool(
         maxTokens = env.DEFAULT_MINIMUM_TOKENS,
         includeCodeSnippets = false,
         caseSensitive = false,
+        gitRevision,
+        since,
+        until,
     }) => {
         if (repoIds.length > 0) {
             query += ` ( repo:${repoIds.map(id => escapeStringRegexp(id)).join(' or repo:')} )`;
@@ -76,6 +91,9 @@ server.tool(
             contextLines: env.DEFAULT_CONTEXT_LINES,
             isRegexEnabled: true,
             isCaseSensitivityEnabled: caseSensitive,
+            gitRevision,
+            since,
+            until,
         });
 
         if (isServiceError(response)) {
@@ -160,16 +178,95 @@ server.tool(
     }
 );
 
+server.tool(
+    "search_commits",
+    `Searches for commits in a specific repository based on actual commit time (NOT index time).
+
+    **Requirements**: The repository must be cloned on the Sourcebot server disk. Sourcebot automatically clones repositories during indexing, but the cloning process may not be finished when this query is executed. If the repository is not found on the server disk, an error will be returned asking you to try again later.
+
+    **Date Formats**: Supports ISO 8601 (e.g., "2024-01-01") or relative formats (e.g., "30 days ago", "last week", "yesterday").
+
+    **YOU MUST** call 'list_repos' first to obtain the exact repository ID.
+
+    If you receive an error that indicates that you're not authenticated, please inform the user to set the SOURCEBOT_API_KEY environment variable.`,
+    {
+        repoId: z.union([z.number(), z.string()]).describe(`Repository identifier. Can be either:
+        - Numeric database ID (e.g., 123)
+        - Full repository name (e.g., "github.com/owner/repo") as returned by 'list_repos'
+
+        **YOU MUST** call 'list_repos' first to obtain the repository identifier.`),
+        query: z.string().describe(`Search query to filter commits by message content (case-insensitive).`).optional(),
+        since: z.string().describe(`Show commits more recent than this date. Filters by actual commit time. Supports ISO 8601 (e.g., '2024-01-01') or relative formats (e.g., '30 days ago', 'last week').`).optional(),
+        until: z.string().describe(`Show commits older than this date. Filters by actual commit time. Supports ISO 8601 (e.g., '2024-12-31') or relative formats (e.g., 'yesterday').`).optional(),
+        author: z.string().describe(`Filter commits by author name or email (supports partial matches and patterns).`).optional(),
+        maxCount: z.number().describe(`Maximum number of commits to return (default: 50).`).optional(),
+    },
+    async ({ repoId, query, since, until, author, maxCount }) => {
+        const result = await searchCommits({
+            repoId,
+            query,
+            since,
+            until,
+            author,
+            maxCount,
+        });
+
+        if (isServiceError(result)) {
+            return {
+                content: [{ type: "text", text: `Error: ${result.message}` }],
+                isError: true,
+            };
+        }
+
+        return {
+            content: [{ type: "text", text: JSON.stringify(result, null, 2) }],
+        };
+    }
+);
+
 server.tool(
     "list_repos",
-    "Lists repositories in the organization with optional filtering and pagination. If you receive an error that indicates that you're not authenticated, please inform the user to set the SOURCEBOT_API_KEY environment variable.",
-    listReposRequestSchema.shape,
-    async ({ query, pageNumber = 1, limit = 50 }: {
+    `Lists repositories in the organization with optional filtering and pagination.
+
+    **Temporal Filtering**: When using 'activeAfter' or 'activeBefore', only repositories indexed within the specified timeframe are returned. This filters by when Sourcebot last indexed the repository (indexedAt), NOT by git commit dates. For commit-time filtering, use 'search_commits'.
+
+    **Date Formats**: Supports ISO 8601 (e.g., "2024-01-01") and relative dates (e.g., "30 days ago", "last week", "yesterday").
+
+    If you receive an error that indicates that you're not authenticated, please inform the user to set the SOURCEBOT_API_KEY environment variable.`,
+    {
+        query: z
+            .string()
+            .describe("Filter repositories by name (case-insensitive).")
+            .optional(),
+        pageNumber: z
+            .number()
+            .int()
+            .positive()
+            .describe("Page number (1-indexed, default: 1)")
+            .default(1),
+        limit: z
+            .number()
+            .int()
+            .positive()
+            .describe("Number of repositories per page (default: 50)")
+            .default(50),
+        activeAfter: z
+            .string()
+            .describe("Only return repositories indexed after this date (filters by indexedAt). Supports ISO 8601 (e.g., '2024-01-01') or relative formats (e.g., '30 days ago', 'last week').")
+            .optional(),
+        activeBefore: z
+            .string()
+            .describe("Only return repositories indexed before this date (filters by indexedAt). Supports ISO 8601 (e.g., '2024-12-31') or relative formats (e.g., 'yesterday').")
+            .optional(),
+    },
+    async ({ query, pageNumber = 1, limit = 50, activeAfter, activeBefore }: {
         query?: string;
         pageNumber?: number;
         limit?: number;
+        activeAfter?: string;
+        activeBefore?: string;
     }) => {
-        const response = await listRepos();
+        const response = await listRepos({ activeAfter, activeBefore });
         if (isServiceError(response)) {
             return {
                 content: [{
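The `search_commits` handler above follows a simple pattern: a service error becomes a text result flagged with `isError`, and anything else is returned as pretty-printed JSON. The sketch below restates that pattern without the MCP SDK so it can be run in isolation. The `ServiceError` fields and the `isServiceError` guard shown here are assumptions standing in for the definitions in `./types.js` and `./utils.js`, which are not part of this diff, and `toToolResult` is a hypothetical name.

```typescript
// Assumed shape: the real ServiceError type lives in './types.js' and is not shown in this diff.
interface ServiceError {
    statusCode: number;
    errorCode: string;
    message: string;
}

interface ToolResult {
    content: { type: 'text'; text: string }[];
    isError?: boolean;
}

// Hypothetical stand-in for the isServiceError guard imported from './utils.js'.
function isServiceError(value: unknown): value is ServiceError {
    return (
        typeof value === 'object' &&
        value !== null &&
        'statusCode' in value &&
        'errorCode' in value &&
        'message' in value
    );
}

// Maps an API response onto the result shape the handler returns.
function toToolResult(result: unknown): ToolResult {
    if (isServiceError(result)) {
        return {
            content: [{ type: 'text', text: `Error: ${result.message}` }],
            isError: true,
        };
    }
    return {
        content: [{ type: 'text', text: JSON.stringify(result, null, 2) }],
    };
}
```

Setting `isError` rather than throwing lets the calling model read the message, for example a note that the repository has not finished cloning, and decide whether to retry.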
