13 changes: 10 additions & 3 deletions README.md
@@ -78,10 +78,17 @@ for more details on this configuration.
go install github.com/sourcegraph/zoekt/cmd/zoekt-webserver
$GOPATH/bin/zoekt-webserver -index ~/.zoekt/

This will start a web server with a simple search UI at http://localhost:6070. See the [uuery syntax docs](doc/query_syntax.md)
for more details on the query language.
This will start a web server with a simple search UI at http://localhost:6070.
See the [query syntax docs](doc/query_syntax.md) for more details on the query
language.

If you start the web server with `-rpc`, it exposes a [simple JSON search API](doc/json-api.md) at `http://localhost:6070/search/api/search.
If you start the web server with `-rpc`, it exposes a [simple JSON search
API](doc/json-api.md) at `http://localhost:6070/api/search`.

The JSON API supports advanced features including:
- Streaming search results (using the `FlushWallTime` option)
- Alternative BM25 scoring (using the `UseBM25Scoring` option)
- Context lines around matches (using the `NumContextLines` option)
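A minimal sketch of how a client might pass these options to the `/api/search` endpoint. The body shape is an assumption: a top-level `Q` query string plus an `Opts` object keyed by the option names listed above. The helper name `build_search_request` is hypothetical, and `FlushWallTime` is left out because its encoding is not described here.

```python
import json
import urllib.request

# Endpoint taken from the README text above; adjust host and port as needed.
SEARCH_URL = "http://localhost:6070/api/search"


def build_search_request(query, num_context_lines=0, use_bm25=False, url=SEARCH_URL):
    """Build (but do not send) a POST request for the JSON search API.

    Assumption: the request body carries the query under "Q" and the
    search options under "Opts", using the option names from the README.
    """
    body = {
        "Q": query,
        "Opts": {
            "NumContextLines": num_context_lines,
            "UseBM25Scoring": use_bm25,
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_search_request("needle", num_context_lines=2, use_bm25=True)
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To run the search against a live server started with -rpc:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Building the request separately from sending it keeps the payload easy to inspect before pointing it at a real server.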

Finally, the web server exposes a gRPC API that supports [structured query objects](query/query.go) and advanced search options.

9 changes: 8 additions & 1 deletion doc/design.md
@@ -142,7 +142,8 @@ index shard are the following
* branch masks
* metadata (repository name, index format version, etc.)

In practice, the shard size is about 3x the corpus (size).
In practice, the shard size is about 3.5x the corpus size, composed of
original content, posting lists, and other metadata.

The format uses uint32 for all offsets, so the total size of a shard
should be below 4G. Given the size of the posting data, this caps
@@ -179,6 +180,12 @@ For the latter, it is necessary to find symbol definitions and other
sections within files on indexing. Several (imperfect) programs to do
this already exist, eg. `ctags`.

Zoekt also supports an alternative BM25-based scoring algorithm that can be
enabled with `UseBM25Scoring`. When enabled, each match in a file is treated
as a term, and an approximation to BM25 is computed. This is useful for
multi-term queries, better handling of term frequency, and appropriate
document length normalization.
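To make the term-frequency and length-normalization behaviour concrete, here is a sketch of the textbook BM25 formula. It is not Zoekt's implementation, which the text describes only as an approximation. The function names and the conventional defaults `k1=1.2` and `b=0.75` are assumptions for illustration.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency component of textbook BM25 for one term in one document.

    tf          -- number of matches of the term in the document
    doc_len     -- length of this document
    avg_doc_len -- average document length in the corpus
    """
    # Documents longer than average are penalized, shorter ones are boosted.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # The score grows with tf but saturates below k1 + 1.
    return (tf * (k1 + 1.0)) / (tf + k1 * length_norm)


def bm25_idf(num_docs, docs_with_term):
    """Inverse document frequency: rarer terms get a higher weight."""
    return math.log(1.0 + (num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))


def bm25_score(term_freqs, doc_freqs, doc_len, avg_doc_len, num_docs):
    """Sum the per-term scores for a multi-term query.

    term_freqs -- {term: matches in this document}
    doc_freqs  -- {term: number of documents containing the term}
    """
    return sum(
        bm25_idf(num_docs, doc_freqs[term])
        * bm25_term_score(tf, doc_len, avg_doc_len)
        for term, tf in term_freqs.items()
    )


if __name__ == "__main__":
    for tf in (1, 2, 10, 100):
        print(tf, round(bm25_term_score(tf, doc_len=100, avg_doc_len=100), 3))
```

The saturation in `bm25_term_score` is what keeps a file with hundreds of matches for one term from drowning out a file that matches every term of the query.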


Query language
--------------
4 changes: 3 additions & 1 deletion doc/faq.md
@@ -111,7 +111,9 @@ rudimentary support for filtering, and there is no symbol ranking.

The search server should have local SSD to store the index file (which
is 3.5x the corpus size), and have at least 20% more RAM than the
corpus size.
corpus size. For optimal performance with large codebases, consider
using machines with ample CPU cores, as search operations can be
parallelized across shards.
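As a rough worked example of the sizing guidance above, the following sketch applies the two ratios from this answer (index about 3.5x the corpus, RAM at least 20% above the corpus size). The function name and return keys are hypothetical, and the figures are rules of thumb rather than guarantees.

```python
def estimate_resources(corpus_bytes, index_factor=3.5, ram_headroom=0.20):
    """Estimate disk and RAM needs from the corpus size.

    index_factor -- index size as a multiple of the corpus (3.5x per the FAQ)
    ram_headroom -- extra RAM beyond the corpus size (20% per the FAQ)
    """
    return {
        "index_bytes": corpus_bytes * index_factor,
        "min_ram_bytes": corpus_bytes * (1.0 + ram_headroom),
    }


if __name__ == "__main__":
    gib = 2**30
    estimate = estimate_resources(10 * gib)
    print("index GiB:", estimate["index_bytes"] / gib)
    print("min RAM GiB:", estimate["min_ram_bytes"] / gib)
```

For a 10 GiB corpus this works out to roughly 35 GiB of SSD for the index and about 12 GiB of RAM.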

## Can I index multiple branches?

42 changes: 41 additions & 1 deletion doc/query_syntax.md
@@ -10,7 +10,7 @@ For a brief overview of Zoekt's query syntax, see [these great docs from neogrok

A query is made up of expressions. An **expression** can be:
- A negation (e.g., `-`),
- A field (e.g., `repo:`).
- A field (e.g., `repo:`),
- A grouping (e.g., parentheses `()`),

Logical `OR` operations combine multiple expressions. The **`AND` operator is implicit**, meaning multiple expressions written together will be automatically treated as `AND`.
@@ -86,6 +86,33 @@ Use `or` to combine multiple expressions.

---

## Special Query Types

### Filtering by Repository Type

Zoekt supports filtering repositories by various attributes:

```plaintext
public:yes archived:no fork:no
```

This finds repositories that are public, not archived, and not forks.

### Result Type Control

The `type:` operator controls what kind of results are returned:

```plaintext
type:repo content:config
```

This returns repository names instead of file matches. Valid values include:
- `filematch` - Returns file content matches (default)
- `filename` - Returns only matching filenames
- `repo` - Returns only repository names
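Because expressions written side by side are ANDed implicitly, a client can assemble queries like the ones above with plain string joins. The helpers below are hypothetical, not part of Zoekt. They only illustrate how the pieces compose, and they do no escaping or validation.

```python
def all_of(*expressions):
    """Join expressions with spaces; adjacent expressions are ANDed implicitly."""
    return " ".join(e.strip() for e in expressions if e.strip())


def any_of(*expressions):
    """Combine expressions with `or`, grouped in parentheses."""
    return "(" + " or ".join(e.strip() for e in expressions if e.strip()) + ")"


if __name__ == "__main__":
    # Repository names (not file matches) for repos whose content mentions "config".
    print(all_of("type:repo", "content:config"))
    # Public repositories that are neither archived nor forks.
    print(all_of("public:yes", "archived:no", "fork:no"))
    # Grouping an `or` inside a larger implicit AND.
    print(all_of("public:yes", any_of("archived:no", "fork:no")))
```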

---

## Special Query Values

- **Boolean Values**:
@@ -111,6 +138,19 @@ Use `or` to combine multiple expressions.

---

## Case Sensitivity

Zoekt supports three case sensitivity modes:

- `case:yes` - Exact case matching
- `case:no` - Case-insensitive matching
- `case:auto` - Automatically detect based on pattern (default)

In auto mode, if the pattern contains uppercase letters, the search will be
case-sensitive; otherwise, it will be case-insensitive.
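The auto rule can be sketched in a few lines. This is an illustration of the behaviour described above, not Zoekt's code: the function name is hypothetical, and the sketch treats every uppercase character the same way, so it does not account for special cases a real implementation might handle (for example, uppercase letters inside regex escapes).

```python
def is_case_sensitive(pattern, case="auto"):
    """Decide whether a search for `pattern` should be case-sensitive.

    case -- "yes", "no", or "auto" (the default mode described above)
    """
    if case == "yes":
        return True
    if case == "no":
        return False
    if case == "auto":
        # Any uppercase letter in the pattern switches to exact-case matching.
        return any(ch.isupper() for ch in pattern)
    raise ValueError(f"unknown case mode: {case!r}")


if __name__ == "__main__":
    for pattern in ("readme", "README", "parseQuery"):
        print(pattern, is_case_sensitive(pattern))
```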

---

## Advanced Examples

1. **Search for content in Python files in public repositories**: