Commit f3a8fa3

feat(web): Streamed code search (#623)
* generate protobuf types
* stream poc over SSE
* wip: make stream search api follow existing schema. Modify UI to support streaming
* fix scrolling issue
* Dockerfile
* wip on lezer parser grammar for query language
* add lezer tree -> grpc transformer
* remove spammy log message
* fix syntax highlighting by adding a module resolution for @lezer/common
* further wip on query language
* Add case sensitivity and regexp toggles
* Improved type safety / cleanup for query lang
* support search contexts
* update Dockerfile with query langauge package
* fix filter
* Add skeletons to filter panel when search is streaming
* add client side caching
* improved cancelation handling
* add isSearchExausted flag for flagging when a search captured all results
* Add back posthog search_finished event
* remove zoekt tenant enforcement
* migrate blocking search over to grpc. Centralize everything in searchApi
* branch handling
* plumb file weburl
* add repo_sets filter for repositories a user has access to
* refactor a bunch of stuff + add support for passing in Query IR to search api
* refactor
* dev README
* wip on better error handling
* error handling for stream path
* update mcp
* changelog wip
* type fix
* style
* Support rev:* wildcard
* changelog
* changelog nit
* feedback
* fix build
* update docs and remove uneeded test file
1 parent 09507d3 commit f3a8fa3

130 files changed: +6600 -1199 lines changed

.env.development

Lines changed: 0 additions & 2 deletions
@@ -6,8 +6,6 @@ DATABASE_URL="postgresql://postgres:postgres@localhost:5432/postgres"
 ZOEKT_WEBSERVER_URL="http://localhost:6070"
 # The command to use for generating ctags.
 CTAGS_COMMAND=ctags
-# logging, strict
-SRC_TENANT_ENFORCEMENT_MODE=strict
 
 # Auth.JS
 # You can generate a new secret with:

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -7,9 +7,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+<!-- Bump @sourcebot/mcp since there are breaking changes to the api in this release -->
+
 ### Added
+- Added support for streaming code search results. [#623](https://github.com/sourcebot-dev/sourcebot/pull/623)
+- Added buttons to toggle case sensitivity and regex patterns. [#623](https://github.com/sourcebot-dev/sourcebot/pull/623)
 - Added counts to members, requets, and invites tabs in the members settings. [#621](https://github.com/sourcebot-dev/sourcebot/pull/621)
 
+### Changed
+- Changed the default search behaviour to match patterns as substrings and **not** regular expressions. Regular expressions can be used by toggling the regex button in search bar. [#623](https://github.com/sourcebot-dev/sourcebot/pull/623)
+- Renamed `public` query prefix to `visibility`. Allowed values for `visibility` are `public`, `private`, and `any`. [#623](https://github.com/sourcebot-dev/sourcebot/pull/623)
+- Changed `archived` query prefix to accept values `yes`, `no`, and `only`. [#623](https://github.com/sourcebot-dev/sourcebot/pull/623)
+
+### Removed
+- Removed `case` query prefix. [#623](https://github.com/sourcebot-dev/sourcebot/pull/623)
+- Removed `branch` and `b` query prefixes. Please use `rev:` instead. [#623](https://github.com/sourcebot-dev/sourcebot/pull/623)
+- Removed `regex` query prefix. [#623](https://github.com/sourcebot-dev/sourcebot/pull/623)
+
 ### Fixed
 - Fixed spurious infinite loads with explore panel, file tree, and file search command. [#617](https://github.com/sourcebot-dev/sourcebot/pull/617)
 - Wipe search context on init if entitlement no longer exists [#618](https://github.com/sourcebot-dev/sourcebot/pull/618)
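
The prefix changes listed in this changelog affect saved or scripted queries. As a rough illustration only, the sketch below rewrites the removed `branch:` and `b:` prefixes to `rev:` and reports tokens that use the removed `case:` and `regex:` prefixes or the renamed `public:` prefix. `migrateQuery` and `MigrationResult` are hypothetical names that are not part of Sourcebot, and the whitespace tokenizer is a simplification that ignores quoted phrases and parentheses.

```typescript
// Hypothetical helper (not part of Sourcebot): migrates a pre-#623 query string.
interface MigrationResult {
    query: string;      // The query with `branch:` / `b:` rewritten to `rev:`.
    warnings: string[]; // Tokens that need manual review.
}

function migrateQuery(oldQuery: string): MigrationResult {
    const warnings: string[] = [];

    // Simplification: split on whitespace, ignoring quoted phrases and parentheses.
    const tokens = oldQuery.split(/\s+/).filter((token) => token.length > 0);

    const migrated = tokens.map((token) => {
        // `branch:` and `b:` were removed in favour of `rev:`; keep a leading `-` if present.
        if (/^-?(branch|b):/.test(token)) {
            return token.replace(/^(-?)(branch|b):/, "$1rev:");
        }

        // `case:` and `regex:` were removed and `public:` was renamed to `visibility:`.
        // The changelog does not state a value mapping, so these are only reported.
        const flagged = token.match(/^-?(case|regex|public):/);
        if (flagged) {
            warnings.push(`prefix "${flagged[1]}:" was removed or renamed; review "${token}" manually`);
        }
        return token;
    });

    return { query: migrated.join(" "), warnings };
}

const result = migrateQuery("foo branch:main -b:dev case:yes");
console.log(result.query);
console.log(result.warnings);
```

Reporting rather than rewriting the `case:`, `regex:`, and `public:` tokens is deliberate: the changelog names the new `visibility` values but not how old values map onto them.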

Dockerfile

Lines changed: 8 additions & 1 deletion
@@ -43,10 +43,12 @@ COPY .yarn ./.yarn
 COPY ./packages/db ./packages/db
 COPY ./packages/schemas ./packages/schemas
 COPY ./packages/shared ./packages/shared
+COPY ./packages/queryLanguage ./packages/queryLanguage
 
 RUN yarn workspace @sourcebot/db install
 RUN yarn workspace @sourcebot/schemas install
 RUN yarn workspace @sourcebot/shared install
+RUN yarn workspace @sourcebot/query-language install
 # ------------------------------------
 
 # ------ Build Web ------
@@ -92,6 +94,7 @@ COPY --from=shared-libs-builder /app/node_modules ./node_modules
 COPY --from=shared-libs-builder /app/packages/db ./packages/db
 COPY --from=shared-libs-builder /app/packages/schemas ./packages/schemas
 COPY --from=shared-libs-builder /app/packages/shared ./packages/shared
+COPY --from=shared-libs-builder /app/packages/queryLanguage ./packages/queryLanguage
 
 # Fixes arm64 timeouts
 RUN yarn workspace @sourcebot/web install
@@ -130,6 +133,7 @@ COPY --from=shared-libs-builder /app/node_modules ./node_modules
 COPY --from=shared-libs-builder /app/packages/db ./packages/db
 COPY --from=shared-libs-builder /app/packages/schemas ./packages/schemas
 COPY --from=shared-libs-builder /app/packages/shared ./packages/shared
+COPY --from=shared-libs-builder /app/packages/queryLanguage ./packages/queryLanguage
 RUN yarn workspace @sourcebot/backend install
 RUN yarn workspace @sourcebot/backend build
 
@@ -173,7 +177,6 @@ ENV DATA_DIR=/data
 ENV DATA_CACHE_DIR=$DATA_DIR/.sourcebot
 ENV DATABASE_DATA_DIR=$DATA_CACHE_DIR/db
 ENV REDIS_DATA_DIR=$DATA_CACHE_DIR/redis
-ENV SRC_TENANT_ENFORCEMENT_MODE=strict
 ENV SOURCEBOT_PUBLIC_KEY_PATH=/app/public.pem
 
 # Valid values are: debug, info, warn, error
@@ -217,6 +220,9 @@ COPY --from=zoekt-builder \
 /cmd/zoekt-index \
 /usr/local/bin/
 
+# Copy zoekt proto files (needed for gRPC client at runtime)
+COPY vendor/zoekt/grpc/protos /app/vendor/zoekt/grpc/protos
+
 # Copy all of the things
 COPY --from=web-builder /app/packages/web/public ./packages/web/public
 COPY --from=web-builder /app/packages/web/.next/standalone ./
@@ -229,6 +235,7 @@ COPY --from=shared-libs-builder /app/node_modules ./node_modules
 COPY --from=shared-libs-builder /app/packages/db ./packages/db
 COPY --from=shared-libs-builder /app/packages/schemas ./packages/schemas
 COPY --from=shared-libs-builder /app/packages/shared ./packages/shared
+COPY --from=shared-libs-builder /app/packages/queryLanguage ./packages/queryLanguage
 
 # Fixes git "dubious ownership" issues when the volume is mounted with different permissions to the container.
 RUN git config --global safe.directory "*"

docs/docs/features/search/syntax-reference.mdx

Lines changed: 31 additions & 12 deletions
@@ -4,32 +4,51 @@ title: Writing search queries
 
 Sourcebot uses a powerful regex-based query language that enabled precise code search within large codebases.
 
-
 ## Syntax reference guide
 
-Queries consist of space-separated regular expressions. Wrapping expressions in `""` combines them. By default, a file must have at least one match for each expression to be included.
+Queries consist of space-separated search patterns that are matched against file contents. A file must have at least one match for each expression to be included. Queries can optionally contain search filters to further refine the search results.
+
+## Keyword search (default)
+
+Keyword search matches search patterns exactly in file contents. Wrapping search patterns in `""` combines them as a single expression.
 
 | Example | Explanation |
 | :--- | :--- |
-| `foo` | Match files with regex `/foo/` |
-| `foo bar` | Match files with regex `/foo/` **and** `/bar/` |
-| `"foo bar"` | Match files with regex `/foo bar/` |
+| `foo` | Match files containing the keyword `foo` |
+| `foo bar` | Match files containing both `foo` **and** `bar` |
+| `"foo bar"` | Match files containing the phrase `foo bar` |
+| `"foo \"bar\""` | Match files containing `foo "bar"` exactly (escaped quotes) |
+
+## Regex search
 
-Multiple expressions can be or'd together with `or`, negated with `-`, or grouped with `()`.
+Toggle the regex button (`.*`) in the search bar to interpret search patterns as regular expressions.
 
 | Example | Explanation |
 | :--- | :--- |
-| `foo or bar` | Match files with regex `/foo/` **or** `/bar/` |
-| `foo -bar` | Match files with regex `/foo/` but **not** `/bar/` |
-| `foo (bar or baz)` | Match files with regex `/foo/` **and** either `/bar/` **or** `/baz/` |
+| `foo` | Match files with regex `/foo/` |
+| `foo.*bar` | Match files with regex `/foo.*bar/` (foo followed by any characters, then bar) |
+| `^function\s+\w+` | Match files with regex `/^function\s+\w+/` (function at start of line, followed by whitespace and word characters) |
+| `"foo bar"` | Match files with regex `/foo bar/`. Quotes are not matched. |
 
-Expressions can be prefixed with certain keywords to modify search behavior. Some keywords can be negated using the `-` prefix.
+## Search filters
+
+Search queries (keyword or regex) can include multiple search filters to further refine the search results. Some filters can be negated using the `-` prefix.
 
 | Prefix | Description | Example |
 | :--- | :--- | :--- |
 | `file:` | Filter results from filepaths that match the regex. By default all files are searched. | `file:README` - Filter results to filepaths that match regex `/README/`<br/>`file:"my file"` - Filter results to filepaths that match regex `/my file/`<br/>`-file:test\.ts$` - Ignore results from filepaths match regex `/test\.ts$/` |
-| `repo:` | Filter results from repos that match the regex. By default all repos are searched. | `repo:linux` - Filter results to repos that match regex `/linux/`<br/>`-repo:^web/.*` - Ignore results from repos that match regex `/^web\/.*` |
+| `repo:` | Filter results from repos that match the regex. By default all repos are searched. | `repo:linux` - Filter results to repos that match regex `/linux/`<br/>`-repo:^web/.*` - Ignore results from repos that match regex `/^web\/.*/` |
 | `rev:` | Filter results from a specific branch or tag. By default **only** the default branch is searched. | `rev:beta` - Filter results to branches that match regex `/beta/` |
 | `lang:` | Filter results by language (as defined by [linguist](https://github.com/github-linguist/linguist/blob/main/lib/linguist/languages.yml)). By default all languages are searched. | `lang:TypeScript` - Filter results to TypeScript files<br/>`-lang:YAML` - Ignore results from YAML files |
 | `sym:` | Match symbol definitions created by [universal ctags](https://ctags.io/) at index time. | `sym:\bmain\b` - Filter results to symbols that match regex `/\bmain\b/` |
-| `context:` | Filter results to a predefined [search context](/docs/features/search/search-contexts). | `context:web` - Filter results to the web context<br/>`-context:pipelines` - Ignore results from the pipelines context |
+| `context:` | Filter results to a predefined [search context](/docs/features/search/search-contexts). | `context:web` - Filter results to the web context<br/>`-context:pipelines` - Ignore results from the pipelines context |
+
+## Boolean operators & grouping
+
+By default, space-separated expressions are and'd together. Using the `or` keyword as well as parentheses `()` can be used to create more complex boolean logic. Parentheses can be negated using the `-` prefix.
+
+| Example | Explanation |
+| :--- | :--- |
+| `foo or bar` | Match files containing `foo` **or** `bar` |
+| `foo (bar or baz)` | Match files containing `foo` **and** either `bar` **or** `baz`. |
+| `-(foo) bar` | Match files containing `bar` **and not** `foo`. |
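
The difference between the keyword and regex modes documented above can be illustrated with a small sketch. It is not how Sourcebot evaluates queries (matching happens in zoekt, whose regex dialect differs from JavaScript's); `matchesPattern` and `MatchOptions` are hypothetical names, and only the two flag names are borrowed from the `searchOptionsSchema` added in this commit.

```typescript
// Illustrative only: mimics keyword vs. regex semantics for a single pattern.
interface MatchOptions {
    isRegexEnabled: boolean;           // Interpret the pattern as a regular expression.
    isCaseSensitivityEnabled: boolean; // Match case exactly.
}

function matchesPattern(content: string, pattern: string, options: MatchOptions): boolean {
    if (options.isRegexEnabled) {
        // "m" lets ^ and $ match at line boundaries; "i" ignores case.
        const flags = options.isCaseSensitivityEnabled ? "m" : "im";
        return new RegExp(pattern, flags).test(content);
    }

    // Keyword mode: the pattern is a literal substring, so `.*` has no special meaning.
    if (options.isCaseSensitivityEnabled) {
        return content.includes(pattern);
    }
    return content.toLowerCase().includes(pattern.toLowerCase());
}

const keyword: MatchOptions = { isRegexEnabled: false, isCaseSensitivityEnabled: true };
const regex: MatchOptions = { isRegexEnabled: true, isCaseSensitivityEnabled: true };

console.log(matchesPattern("fooXbar", "foo.*bar", keyword)); // literal `foo.*bar` is absent
console.log(matchesPattern("fooXbar", "foo.*bar", regex));   // regex /foo.*bar/ matches
```

With the same input, the keyword call looks for the literal text `foo.*bar`, while the regex call treats `.*` as "any characters".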

package.json

Lines changed: 3 additions & 2 deletions
@@ -18,7 +18,7 @@
 "dev:prisma:studio": "yarn with-env yarn workspace @sourcebot/db prisma:studio",
 "dev:prisma:migrate:reset": "yarn with-env yarn workspace @sourcebot/db prisma:migrate:reset",
 "dev:prisma:db:push": "yarn with-env yarn workspace @sourcebot/db prisma:db:push",
-"build:deps": "yarn workspaces foreach --recursive --topological --from '{@sourcebot/schemas,@sourcebot/db,@sourcebot/shared}' run build"
+"build:deps": "yarn workspaces foreach --recursive --topological --from '{@sourcebot/schemas,@sourcebot/db,@sourcebot/shared,@sourcebot/query-language}' run build"
 },
 "devDependencies": {
 "concurrently": "^9.2.1",
@@ -27,6 +27,7 @@
 },
 "packageManager": "yarn@4.7.0",
 "resolutions": {
-"prettier": "3.5.3"
+"prettier": "3.5.3",
+"@lezer/common": "1.3.0"
 }
 }

packages/backend/src/index.ts

Lines changed: 0 additions & 1 deletion
@@ -94,7 +94,6 @@ const listenToShutdownSignals = () => {
 const cleanup = async (signal: string) => {
 try {
 if (receivedSignal) {
-logger.debug(`Recieved repeat signal ${signal}, ignoring.`);
 return;
 }
 receivedSignal = true;

packages/db/src/index.ts

Lines changed: 2 additions & 0 deletions
@@ -1 +1,3 @@
+import type { User, Account } from ".prisma/client";
+export type UserWithAccounts = User & { accounts: Account[] };
 export * from ".prisma/client";

packages/mcp/CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Changed
+- Updated API client to match the latest Sourcebot release. [#555](https://github.com/sourcebot-dev/sourcebot/pull/555)
+
 ## [1.0.9] - 2025-11-17
 
 ### Added

packages/mcp/src/index.ts

Lines changed: 2 additions & 6 deletions
@@ -70,16 +70,12 @@ server.tool(
 query += ` ( lang:${languages.join(' or lang:')} )`;
 }
 
-if (caseSensitive) {
-query += ` case:yes`;
-} else {
-query += ` case:no`;
-}
-
 const response = await search({
 query,
 matches: env.DEFAULT_MATCHES,
 contextLines: env.DEFAULT_CONTEXT_LINES,
+isRegexEnabled: true,
+isCaseSensitivityEnabled: caseSensitive,
 });
 
 if (isServiceError(response)) {
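
The hunk above moves case sensitivity out of the query string (the removed `case:yes` / `case:no` suffix) and into request options. A standalone sketch of that shape follows. `buildSearchRequest`, the local `SearchRequest` interface, and the two default constants are assumptions made for illustration; the real tool reads its defaults from `env` and validates requests with the zod schemas in `packages/mcp/src/schemas.ts`.

```typescript
// Local stand-in for the request shape; field names follow searchRequestSchema in this commit.
interface SearchRequest {
    query: string;
    matches: number;
    contextLines?: number;
    whole?: boolean;
    isRegexEnabled?: boolean;
    isCaseSensitivityEnabled?: boolean;
}

// Placeholder defaults (assumed values; the real ones come from env.DEFAULT_*).
const DEFAULT_MATCHES = 100;
const DEFAULT_CONTEXT_LINES = 5;

// Hypothetical helper mirroring the tool handler's request construction.
function buildSearchRequest(baseQuery: string, languages: string[], caseSensitive: boolean): SearchRequest {
    let query = baseQuery;

    // Language filters are still expressed in the query string, or'd together.
    if (languages.length > 0) {
        query += ` ( lang:${languages.join(" or lang:")} )`;
    }

    // Case sensitivity and regex mode are now request flags, not query prefixes.
    return {
        query,
        matches: DEFAULT_MATCHES,
        contextLines: DEFAULT_CONTEXT_LINES,
        isRegexEnabled: true,
        isCaseSensitivityEnabled: caseSensitive,
    };
}

console.log(buildSearchRequest("foo", ["TypeScript", "Go"], false));
```

Setting `isRegexEnabled: true` unconditionally follows the diff; with the default search mode now being substring matching, that flag appears to keep the tool's patterns interpreted as regular expressions.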

packages/mcp/src/schemas.ts

Lines changed: 11 additions & 10 deletions
@@ -21,15 +21,17 @@ export const symbolSchema = z.object({
 kind: z.string(),
 });
 
+export const searchOptionsSchema = z.object({
+matches: z.number(), // The number of matches to return.
+contextLines: z.number().optional(), // The number of context lines to return.
+whole: z.boolean().optional(), // Whether to return the whole file as part of the response.
+isRegexEnabled: z.boolean().optional(), // Whether to enable regular expression search.
+isCaseSensitivityEnabled: z.boolean().optional(), // Whether to enable case sensitivity.
+});
+
 export const searchRequestSchema = z.object({
-// The zoekt query to execute.
-query: z.string(),
-// The number of matches to return.
-matches: z.number(),
-// The number of context lines to return.
-contextLines: z.number().optional(),
-// Whether to return the whole file as part of the response.
-whole: z.boolean().optional(),
+query: z.string(), // The zoekt query to execute.
+...searchOptionsSchema.shape,
 });
 
 export const repositoryInfoSchema = z.object({
@@ -109,7 +111,7 @@ export const searchStatsSchema = z.object({
 regexpsConsidered: z.number(),
 
 // FlushReason explains why results were flushed.
-flushReason: z.number(),
+flushReason: z.string(),
 });
 
 export const searchResponseSchema = z.object({
@@ -139,7 +141,6 @@ export const searchResponseSchema = z.object({
 content: z.string().optional(),
 })),
 repositoryInfo: z.array(repositoryInfoSchema),
-isBranchFilteringEnabled: z.boolean(),
 isSearchExhaustive: z.boolean(),
 });
