1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
<!-- Bump @sourcebot/mcp since there are breaking changes to the api in this release -->

### Added
- Added temporal filtering to search and repository APIs with support for git branch/revision filtering and repository index date filtering (since/until parameters). Supports both ISO 8601 and relative date formats (e.g., "30 days ago", "last week").
- Added support for streaming code search results. [#623](https://github.com/sourcebot-dev/sourcebot/pull/623)
- Added buttons to toggle case sensitivity and regex patterns. [#623](https://github.com/sourcebot-dev/sourcebot/pull/623)
- Added counts to members, requests, and invites tabs in the members settings. [#621](https://github.com/sourcebot-dev/sourcebot/pull/621)
16 changes: 16 additions & 0 deletions packages/mcp/CHANGELOG.md
@@ -7,6 +7,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added
- Added comprehensive relative date support for all temporal parameters (e.g., "30 days ago", "last week", "yesterday")
- Added `search_commits` tool to search commits by actual commit time with full temporal filtering. Accepts both numeric database IDs (e.g., 123) and string repository names (e.g., "github.com/owner/repo") for the `repoId` parameter, allowing direct use of repository names from `list_repos` output
- Added `since`/`until` parameters to `search_code` (filters by index time - when Sourcebot indexed the repo)
- Added `gitRevision` parameter to `search_code`
- Added `activeAfter`/`activeBefore` parameters to `list_repos` (filters by index time - when Sourcebot indexed the repo)
- Added date range validation to prevent invalid date ranges (since > until)
- Added 30-second timeout for git operations to handle large repositories
- Added enhanced error messages for git operations (timeout, repository not found, invalid git repository, ambiguous arguments)
- Added clarification that repositories must be cloned on Sourcebot server disk for `search_commits` to work
- Added comprehensive temporal parameter documentation to README with clear distinction between index time and commit time filtering
- Added comprehensive unit tests for date parsing utilities (90+ test cases)
- Added unit tests for git commit search functionality with mocking
- Added integration tests for temporal parameter validation
- Added unit tests for repository identifier resolution (both string and number types)

### Changed
- Updated API client to match the latest Sourcebot release. [#555](https://github.com/sourcebot-dev/sourcebot/pull/555)

52 changes: 47 additions & 5 deletions packages/mcp/README.md
@@ -166,6 +166,8 @@ For a more detailed guide, checkout [the docs](https://docs.sourcebot.dev/docs/f

Fetches code that matches the provided regex pattern in `query`.

**Temporal Filtering**: Use `since` and `until` to filter by repository index time (when Sourcebot last indexed the repo). This is different from commit time. See `search_commits` for commit-time filtering.

<details>
<summary>Parameters</summary>

@@ -176,6 +178,9 @@ Fetches code that matches the provided regex pattern in `query`.
| `filterByLanguages` | no | Restrict search to specific languages (GitHub linguist format, e.g., Python, JavaScript). |
| `caseSensitive` | no | Case sensitive search (default: false). |
| `includeCodeSnippets` | no | Include code snippets in results (default: false). |
| `gitRevision` | no | Git revision to search (e.g., 'main', 'develop', 'v1.0.0'). Defaults to HEAD. |
| `since` | no | Only search repos indexed after this date. Supports ISO 8601 or relative (e.g., "30 days ago"). |
| `until` | no | Only search repos indexed before this date. Supports ISO 8601 or relative (e.g., "yesterday"). |
| `maxTokens` | no | Max tokens to return (default: env.DEFAULT_MINIMUM_TOKENS). |
</details>

@@ -184,14 +189,18 @@ Fetches code that matches the provided regex pattern in `query`.

Lists repositories indexed by Sourcebot with optional filtering and pagination.

**Temporal Filtering**: Use `activeAfter` and `activeBefore` to filter by repository index time (when Sourcebot last indexed the repo). This is the same filtering behavior as `search_code`'s `since`/`until` parameters.

<details>
<summary>Parameters</summary>

| Name | Required | Description |
|:-------------|:---------|:--------------------------------------------------------------------|
| `query` | no | Filter repositories by name (case-insensitive). |
| `pageNumber` | no | Page number (1-indexed, default: 1). |
| `limit` | no | Number of repositories per page (default: 50). |
| Name | Required | Description |
|:----------------|:---------|:-----------------------------------------------------------------------------------------------|
| `query` | no | Filter repositories by name (case-insensitive). |
| `pageNumber` | no | Page number (1-indexed, default: 1). |
| `limit` | no | Number of repositories per page (default: 50). |
| `activeAfter` | no | Only return repos indexed after this date. Supports ISO 8601 or relative (e.g., "30 days ago"). |
| `activeBefore` | no | Only return repos indexed before this date. Supports ISO 8601 or relative (e.g., "yesterday"). |

</details>
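
The index-time window described above can be sketched as a simple filter. This is only an illustration: the `IndexedRepo` shape and the `filterByIndexTime` name are hypothetical, and the strict (exclusive) bounds and the handling of never-indexed repos are assumptions, not behavior confirmed by this change.

```typescript
// Hypothetical repo shape; only `indexedAt` matters for this sketch.
interface IndexedRepo {
    repoName: string;
    indexedAt?: Date;
}

// Keeps repos whose indexedAt falls strictly inside the (activeAfter, activeBefore) window.
// Assumption: repos without an indexedAt value are excluded once any bound is set.
const filterByIndexTime = (
    repos: IndexedRepo[],
    activeAfter?: Date,
    activeBefore?: Date,
): IndexedRepo[] => {
    if (!activeAfter && !activeBefore) {
        return repos;
    }
    return repos.filter(({ indexedAt }) => {
        if (!indexedAt) return false;
        if (activeAfter && indexedAt.getTime() <= activeAfter.getTime()) return false;
        if (activeBefore && indexedAt.getTime() >= activeBefore.getTime()) return false;
        return true;
    });
};
```

Note that this filters on when Sourcebot indexed the repository, so a repo with old commits can still match a recent window if it was re-indexed recently.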

@@ -208,6 +217,39 @@ Fetches the source code for a given file.
| `repoId` | yes | The Sourcebot repository ID. |
</details>

### search_commits

Searches for commits in a specific repository based on actual commit time (NOT index time).

**Requirements**: Repository must be cloned on the Sourcebot server disk. Sourcebot automatically clones repositories during indexing, but the cloning process may not be finished when this query is executed. Use `list_repos` first to get the repository ID.

**Date Formats**: Supports ISO 8601 dates (e.g., "2024-01-01") and relative formats (e.g., "30 days ago", "last week", "yesterday").

<details>
<summary>Parameters</summary>

| Name | Required | Description |
|:-----------|:---------|:-----------------------------------------------------------------------------------------------|
| `repoId` | yes | Repository identifier: either numeric database ID (e.g., 123) or full repository name (e.g., "github.com/owner/repo") as returned by `list_repos`. |
| `query` | no | Search query to filter commits by message (case-insensitive). |
| `since` | no | Show commits after this date (by commit time). Supports ISO 8601 or relative formats. |
| `until` | no | Show commits before this date (by commit time). Supports ISO 8601 or relative formats. |
| `author` | no | Filter by author name or email (supports partial matches). |
| `maxCount` | no | Maximum number of commits to return (default: 50). |

</details>
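
As a rough illustration of how these parameters could map onto a `git log` invocation, here is a minimal sketch. It assumes the server shells out to the git CLI, which this diff does not show; `CommitSearchParams` and `buildGitLogArgs` are hypothetical names. The flags themselves (`--since`, `--until`, `--author`, `--grep`, `--regexp-ignore-case`, `--max-count`, `--format`) are standard git options.

```typescript
// Hypothetical shape mirroring the documented search_commits parameters.
interface CommitSearchParams {
    query?: string;
    since?: string;
    until?: string;
    author?: string;
    maxCount?: number;
}

// Builds an argument list for `git log`. Nothing is executed here.
const buildGitLogArgs = ({
    query,
    since,
    until,
    author,
    maxCount = 50,
}: CommitSearchParams): string[] => {
    // One line per commit: hash, author name, committer date (strict ISO 8601), subject.
    const args = ["log", `--max-count=${maxCount}`, "--format=%H%x09%an%x09%cI%x09%s"];
    // git compares --since/--until against the committer date.
    if (since) args.push(`--since=${since}`);
    if (until) args.push(`--until=${until}`);
    if (author) args.push(`--author=${author}`);
    // --regexp-ignore-case makes the --grep (and --author) patterns case-insensitive.
    if (query) args.push(`--grep=${query}`, "--regexp-ignore-case");
    return args;
};
```

Passing the values as separate array elements (for example to Node's `child_process.execFile`, which also accepts a `timeout` option) avoids shell interpolation of user-supplied strings.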

## Date Format Examples

All temporal parameters support:
- **ISO 8601**: `"2024-01-01"`, `"2024-12-31T23:59:59Z"`
- **Relative dates**: `"30 days ago"`, `"1 week ago"`, `"last month"`, `"yesterday"`

**Important**: Different tools filter by different time dimensions:
- `search_code` `since`/`until`: Filters by **index time** (when Sourcebot indexed the repo)
- `list_repos` `activeAfter`/`activeBefore`: Filters by **index time** (when Sourcebot indexed the repo)
- `search_commits` `since`/`until`: Filters by **commit time** (actual git commit dates)


## Supported Code Hosts
Sourcebot supports the following code hosts:
34 changes: 30 additions & 4 deletions packages/mcp/src/client.ts
@@ -1,6 +1,6 @@
import { env } from './env.js';
import { listRepositoriesResponseSchema, searchResponseSchema, fileSourceResponseSchema } from './schemas.js';
import { FileSourceRequest, FileSourceResponse, ListRepositoriesResponse, SearchRequest, SearchResponse, ServiceError } from './types.js';
import { listRepositoriesResponseSchema, searchResponseSchema, fileSourceResponseSchema, searchCommitsResponseSchema } from './schemas.js';
import { FileSourceRequest, FileSourceResponse, ListRepositoriesResponse, SearchRequest, SearchResponse, ServiceError, SearchCommitsRequest, SearchCommitsResponse } from './types.js';
import { isServiceError } from './utils.js';

export const search = async (request: SearchRequest): Promise<SearchResponse | ServiceError> => {
@@ -21,8 +21,16 @@ export const search = async (request: SearchRequest): Promise<SearchResponse | S
return searchResponseSchema.parse(result);
}

export const listRepos = async (): Promise<ListRepositoriesResponse | ServiceError> => {
const result = await fetch(`${env.SOURCEBOT_HOST}/api/repos`, {
export const listRepos = async (params?: { activeAfter?: string, activeBefore?: string }): Promise<ListRepositoriesResponse | ServiceError> => {
const url = new URL(`${env.SOURCEBOT_HOST}/api/repos`);
if (params?.activeAfter) {
url.searchParams.append('activeAfter', params.activeAfter);
}
if (params?.activeBefore) {
url.searchParams.append('activeBefore', params.activeBefore);
}

const result = await fetch(url.toString(), {
method: 'GET',
headers: {
'Content-Type': 'application/json',
@@ -55,3 +63,21 @@ export const getFileSource = async (request: FileSourceRequest): Promise<FileSou

return fileSourceResponseSchema.parse(result);
}

export const searchCommits = async (request: SearchCommitsRequest): Promise<SearchCommitsResponse | ServiceError> => {
const result = await fetch(`${env.SOURCEBOT_HOST}/api/commits`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-Org-Domain': '~',
...(env.SOURCEBOT_API_KEY ? { 'X-Sourcebot-Api-Key': env.SOURCEBOT_API_KEY } : {})
},
body: JSON.stringify(request)
}).then(response => response.json());

if (isServiceError(result)) {
return result;
}

return searchCommitsResponseSchema.parse(result);
}
116 changes: 110 additions & 6 deletions packages/mcp/src/index.ts
@@ -5,7 +5,7 @@ import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import escapeStringRegexp from 'escape-string-regexp';
import { z } from 'zod';
import { listRepos, search, getFileSource } from './client.js';
import { listRepos, search, getFileSource, searchCommits } from './client.js';
@coderabbitai coderabbitai bot Nov 20, 2025

⚠️ Potential issue | 🟠 Major

search_commits.repoId is typed as a number, which likely conflicts with string‑typed repo IDs elsewhere.

Other tools appear to treat repository identifiers as strings (e.g., search_code.filterByRepoIds: string[], get_file_source.repoId: string, and list_repos printing id: ${repo.repoName}), but search_commits currently defines repoId as z.number(). That mismatch may make it awkward or error‑prone for clients to take the ID from list_repos and pass it directly into search_commits, and may not match what searchCommits expects.

I’d recommend:

-        repoId: z.number().describe(`The ID of the repository to search in. Obtain this by calling 'list_repos' first.`),
+        repoId: z.string().describe(`The Sourcebot repository ID to search in. Obtain this by calling 'list_repos' first.`),

and updating the searchCommits argument type to accept the same string ID shape used by the other tools.

As a small polish, if the real default for maxCount is 50, you could also encode that in the schema (e.g., .int().positive().default(50)) so validation and docs stay in sync.

Also applies to: 191-231

🤖 Prompt for AI Agents
In packages/mcp/src/index.ts around lines 8 and also 191-231, change the
search_commits schema and related types so repoId is a string (not z.number())
to match other APIs (e.g., search_code.filterByRepoIds and get_file_source),
update the searchCommits function/signature to accept the string ID shape used
elsewhere, and ensure callers pass the repo id from list_repos directly;
additionally, set the maxCount schema to .int().positive().default(50) (or the
real default) so validation/docs match the actual default behavior.

Contributor Author

Resolved in b30bbd3: `search_commits` now accepts both the repo ID and the repo name.

@brendan-kellam `list_repos` currently returns the repo name in its `id` field. I'd rather not change that field to a numeric ID and add a separate repo name field, since that could introduce inconsistencies.

What do you think?

import { env, numberSchema } from './env.js';
import { listReposRequestSchema } from './schemas.js';
import { TextContent } from './types.js';
@@ -49,6 +49,18 @@ server.tool(
.boolean()
.describe(`Whether to include the code snippets in the response (default: false). If false, only the file's URL, repository, and language will be returned. Set to false to get a more concise response.`)
.optional(),
gitRevision: z
.string()
.describe(`The git revision to search in (e.g., 'main', 'HEAD', 'v1.0.0', 'a1b2c3d'). If not provided, defaults to the default branch (usually 'main' or 'master').`)
.optional(),
since: z
.string()
.describe(`Filter repositories by when they were last indexed by Sourcebot (NOT by commit time). Only searches in repos indexed after this date. Supports ISO 8601 (e.g., '2024-01-01') or relative formats (e.g., '30 days ago', 'last week', 'yesterday').`)
.optional(),
until: z
.string()
.describe(`Filter repositories by when they were last indexed by Sourcebot (NOT by commit time). Only searches in repos indexed before this date. Supports ISO 8601 (e.g., '2024-12-31') or relative formats (e.g., 'yesterday').`)
.optional(),
maxTokens: numberSchema
.describe(`The maximum number of tokens to return (default: ${env.DEFAULT_MINIMUM_TOKENS}). Higher values provide more context but consume more tokens. Values less than ${env.DEFAULT_MINIMUM_TOKENS} will be ignored.`)
.transform((val) => (val < env.DEFAULT_MINIMUM_TOKENS ? env.DEFAULT_MINIMUM_TOKENS : val))
@@ -61,6 +73,9 @@ server.tool(
maxTokens = env.DEFAULT_MINIMUM_TOKENS,
includeCodeSnippets = false,
caseSensitive = false,
gitRevision,
since,
until,
}) => {
if (repoIds.length > 0) {
query += ` ( repo:${repoIds.map(id => escapeStringRegexp(id)).join(' or repo:')} )`;
@@ -76,6 +91,9 @@ server.tool(
contextLines: env.DEFAULT_CONTEXT_LINES,
isRegexEnabled: true,
isCaseSensitivityEnabled: caseSensitive,
gitRevision,
since,
until,
});

if (isServiceError(response)) {
@@ -160,16 +178,95 @@ server.tool(
}
);

server.tool(
"search_commits",
`Searches for commits in a specific repository based on actual commit time (NOT index time).

**Requirements**: The repository must be cloned on the Sourcebot server disk. Sourcebot automatically clones repositories during indexing, but the cloning process may not be finished when this query is executed. If the repository is not found on the server disk, an error will be returned asking you to try again later.

**Date Formats**: Supports ISO 8601 (e.g., "2024-01-01") or relative formats (e.g., "30 days ago", "last week", "yesterday").

**YOU MUST** call 'list_repos' first to obtain the exact repository ID.

If you receive an error that indicates that you're not authenticated, please inform the user to set the SOURCEBOT_API_KEY environment variable.`,
{
repoId: z.union([z.number(), z.string()]).describe(`Repository identifier. Can be either:
- Numeric database ID (e.g., 123)
- Full repository name (e.g., "github.com/owner/repo") as returned by 'list_repos'

**YOU MUST** call 'list_repos' first to obtain the repository identifier.`),
query: z.string().describe(`Search query to filter commits by message content (case-insensitive).`).optional(),
since: z.string().describe(`Show commits more recent than this date. Filters by actual commit time. Supports ISO 8601 (e.g., '2024-01-01') or relative formats (e.g., '30 days ago', 'last week').`).optional(),
until: z.string().describe(`Show commits older than this date. Filters by actual commit time. Supports ISO 8601 (e.g., '2024-12-31') or relative formats (e.g., 'yesterday').`).optional(),
author: z.string().describe(`Filter commits by author name or email (supports partial matches and patterns).`).optional(),
maxCount: z.number().int().positive().default(50).describe(`Maximum number of commits to return (default: 50).`),
},
async ({ repoId, query, since, until, author, maxCount }) => {
const result = await searchCommits({
repoId,
query,
since,
until,
author,
maxCount,
});

if (isServiceError(result)) {
return {
content: [{ type: "text", text: `Error: ${result.message}` }],
isError: true,
};
}

return {
content: [{ type: "text", text: JSON.stringify(result, null, 2) }],
};
}
);

server.tool(
"list_repos",
"Lists repositories in the organization with optional filtering and pagination. If you receive an error that indicates that you're not authenticated, please inform the user to set the SOURCEBOT_API_KEY environment variable.",
listReposRequestSchema.shape,
async ({ query, pageNumber = 1, limit = 50 }: {
`Lists repositories in the organization with optional filtering and pagination.

**Temporal Filtering**: When using 'activeAfter' or 'activeBefore', only repositories indexed within the specified timeframe are returned. This filters by when Sourcebot last indexed the repository (indexedAt), NOT by git commit dates. For commit-time filtering, use 'search_commits'. When temporal filters are applied, the output includes a 'lastIndexed' field showing when each repository was last indexed.

**Date Formats**: Supports ISO 8601 (e.g., "2024-01-01") and relative dates (e.g., "30 days ago", "last week", "yesterday").

If you receive an error that indicates that you're not authenticated, please inform the user to set the SOURCEBOT_API_KEY environment variable.`,
{
query: z
.string()
.describe("Filter repositories by name (case-insensitive).")
.optional(),
pageNumber: z
.number()
.int()
.positive()
.describe("Page number (1-indexed, default: 1)")
.default(1),
limit: z
.number()
.int()
.positive()
.describe("Number of repositories per page (default: 50)")
.default(50),
activeAfter: z
.string()
.describe("Only return repositories indexed after this date (filters by indexedAt). Supports ISO 8601 (e.g., '2024-01-01') or relative formats (e.g., '30 days ago', 'last week').")
.optional(),
activeBefore: z
.string()
.describe("Only return repositories indexed before this date (filters by indexedAt). Supports ISO 8601 (e.g., '2024-12-31') or relative formats (e.g., 'yesterday').")
.optional(),
},
async ({ query, pageNumber = 1, limit = 50, activeAfter, activeBefore }: {
query?: string;
pageNumber?: number;
limit?: number;
activeAfter?: string;
activeBefore?: string;
}) => {
const response = await listRepos();
const response = await listRepos({ activeAfter, activeBefore });
if (isServiceError(response)) {
return {
content: [{
@@ -199,9 +296,16 @@ server.tool(

// Format output
const content: TextContent[] = paginated.map(repo => {
let output = `id: ${repo.repoName}\nurl: ${repo.webUrl}`;

// Include indexedAt when temporal filtering is used
if ((activeAfter || activeBefore) && repo.indexedAt) {
output += `\nlastIndexed: ${repo.indexedAt.toISOString()}`;
}

return {
type: "text",
text: `id: ${repo.repoName}\nurl: ${repo.webUrl}`,
text: output,
}
});
