@natehessler
Contributor

Updates `search.mdx` to specify that the Zoekt indexer skips files that exceed 20,000 unique trigrams or are not valid UTF-8. Adds instructions for including otherwise-skipped files by configuring the `search.largeFiles` setting and reindexing the repository; files must still be valid UTF-8 to be indexed.
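For reviewers who want a feel for the skip rule, here is a minimal Python sketch of the check as the description states it. It is not Zoekt's implementation: the function names are hypothetical, and it assumes a trigram is any window of three consecutive characters, which may differ from how Zoekt actually counts them.

```python
from typing import Optional

# Limit quoted in the PR description; everything else here is illustrative.
MAX_UNIQUE_TRIGRAMS = 20_000


def count_unique_trigrams(text: str) -> int:
    """Count the distinct 3-character windows in text (assumed trigram definition)."""
    return len({text[i:i + 3] for i in range(len(text) - 2)})


def skip_reason(content: bytes) -> Optional[str]:
    """Return why a file would be skipped, or None if it would be indexed."""
    try:
        text = content.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    if count_unique_trigrams(text) > MAX_UNIQUE_TRIGRAMS:
        return "too many unique trigrams"
    return None
```

Under these assumptions, a small source file returns `None`, while a binary blob or a very large, highly varied text file returns a reason string.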


Thread: https://ampcode.com/threads/T-0390a39a-9c04-441e-8982-7e2ef7b9bf76

@natehessler added the amp label on Nov 19, 2025, with Amp for GitHub
vercel bot commented Nov 19, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| sourcegraph-docs | Ready | Preview | Comment | Nov 19, 2025 9:34pm |

To view which files are skipped during indexing, visit the repository settings page and click on **Indexing**.
To force the indexer to include specific files (like `yarn.lock` or other large text files) that are otherwise skipped, add their file path or a glob pattern to the [`search.largeFiles`](/admin/config/site_config#search-largeFiles) setting in your site configuration and reindex the repository. Note that files must still be valid UTF-8 to be indexed, even if added to `search.largeFiles`.
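The order of checks described in the two lines above can be sketched in Python as follows. This is an illustration under stated assumptions, not the indexer's code: `should_index` and its parameters are hypothetical, and the standard library's `fnmatchcase` stands in for whatever glob matching Sourcegraph actually applies to `search.largeFiles` patterns, so the matching semantics may differ.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def should_index(
    path: str,
    is_valid_utf8: bool,
    exceeds_limits: bool,
    large_files: Iterable[str],
) -> bool:
    """Decide whether a file is indexed (hypothetical helper).

    large_files stands in for the patterns listed under search.largeFiles.
    """
    # The UTF-8 requirement applies even to files listed in search.largeFiles.
    if not is_valid_utf8:
        return False
    # A matching path or glob pattern bypasses the limits.
    if any(fnmatchcase(path, pattern) for pattern in large_files):
        return True
    # Otherwise the normal limits decide.
    return not exceeds_limits
```

For example, `should_index("yarn.lock", True, True, ["yarn.lock"])` returns `True`, while the same call with `is_valid_utf8=False` returns `False` regardless of the patterns.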

This is a dead markdown path: https://github.com/sourcegraph/docs/blob/amp/zoekt-indexer-trigram-and-file-size-limits/admin/config/site_config#search-largeFiles

Maybe we want to point here or docs/admin/config/site_config.mdx

Contributor Author

Fixed

Updated link in search documentation for large files setting.
Updated the documentation to include a link for the search.largeFiles setting.

@s3nu left a comment

LGTM

@natehessler merged commit c7efb73 into main on Nov 19, 2025
5 checks passed
@natehessler deleted the amp/zoekt-indexer-trigram-and-file-size-limits branch on November 19, 2025 at 21:34