feat: allow configuring file exclusions #1358

Closed
wants to merge 4 commits into from


abeatrix
Contributor

@abeatrix abeatrix commented Oct 10, 2023

Closes #1049

This change allows configuring file exclusions via the new 'cody.excludeFiles' configuration option.

Files can now be excluded from the fetched context before it is sent to the LLM. This prevents sharing sensitive information that lives in a known location, e.g. a directory or a specific file name.

Examples

configuration: cody.excludeFiles = [".json", ".env", "path/to/my/file.js", "/dirName/"]

files that will be excluded:

  • anything with .json in the file name, e.g. package.json, cody.json etc
  • anything with .env in the file name, e.g. .env, vscode/.env, etc
  • any path that contains path/to/my/file.js, e.g. path/to/my/file.js or new/path/to/my/file.js
  • all files under the /dirName/ directory, e.g. path/to/dirName/file.js, path/dirName/file.js etc
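The examples above all behave like a plain substring check against the file path. A minimal sketch of that behaviour, assuming substring matching is what the option does: the helper name `isExcluded`, its signature, and the separator normalization are hypothetical and not taken from this PR.

```typescript
// Hypothetical helper; the name and signature are assumptions, not this PR's actual code.
function isExcluded(filePath: string, excludePatterns: string[]): boolean {
    // Assumption: normalize Windows separators so "/dirName/" also matches "path\dirName\file.js".
    const normalized = filePath.replace(/\\/g, '/')
    // A file is excluded when any non-empty pattern occurs anywhere in its path.
    return excludePatterns.some(pattern => pattern.length > 0 && normalized.includes(pattern))
}

// Values taken from the example configuration above.
const excludeFiles = ['.json', '.env', 'path/to/my/file.js', '/dirName/']

const candidates = [
    'package.json',
    'vscode/.env',
    'new/path/to/my/file.js',
    'path/to/dirName/file.js',
    'src/index.ts',
]

// Only files that match none of the patterns would be kept as context.
const kept = candidates.filter(file => !isExcluded(file, excludeFiles))
console.log(kept)
```

With substring matching, a pattern such as `.json` would also exclude a file like `data.json.bak`, and `/dirName/` would not match a `dirName` directory at the very start of a relative path.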

I wanted to do the filtering in codebase-context/index.ts, but then we will need to do the filtering again when we combine them with context from the editor, so for now I think it makes more sense to do it at the recipe level. Open to other ideas and suggestions though!

Test plan

WIP

@abeatrix
Contributor Author

@tjdevries would this approach work for neovim/agent?

@philipp-spiess
Contributor

philipp-spiess commented Oct 11, 2023

@abeatrix Some random ideas:

  • Instead of a config option, have you considered adding support for a .codyignore file? This would allow for more fine-grained control, as every sub-folder could have its own .codyignore, and users are already familiar with how to set this up from git
  • This absolutely needs to work for Autocomplete too. Maybe we can wrap the VS Code document opening APIs so we can be 100% sure this is always being used whenever we open documents for context?
    • This should also work when reading files via embeddings search, I think. I would expect this feature to never leak any of the files defined in .codyignore to the LLM under any circumstances.
  • My personal hunch is that we should use glob rules like gitignore for maximum flexibility. In your example that would mean having to use *.json instead of .json.
  • We should probably always ignore .env files 😬
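To illustrate the glob suggestion, here is a rough sketch of how patterns like `*.json` could be matched. It is a simplified, hypothetical matcher that supports only `*`, `**` and `?`. It does not implement full gitignore semantics (no negation, no character classes), and the function names are assumptions rather than anything from this PR.

```typescript
// Hypothetical, simplified glob matcher: supports `*`, `**` and `?` only.
function globToRegExp(glob: string): RegExp {
    let source = ''
    for (let i = 0; i < glob.length; i++) {
        const char = glob.charAt(i)
        if (char === '*') {
            if (glob.charAt(i + 1) === '*') {
                source += '.*' // `**` may cross directory boundaries
                i++
            } else {
                source += '[^/]*' // `*` stays within one path segment
            }
        } else if (char === '?') {
            source += '[^/]' // `?` matches one character that is not a separator
        } else {
            // Escape characters that are special in regular expressions.
            source += char.replace(/[.+^${}()|[\]\\]/g, '\\$&')
        }
    }
    // Like gitignore, a pattern without a slash matches the file name in any directory.
    const prefix = glob.includes('/') ? '^' : '^(?:.*/)?'
    return new RegExp(prefix + source + '$')
}

function isIgnored(filePath: string, globs: string[]): boolean {
    return globs.some(glob => globToRegExp(glob).test(filePath))
}

const ignoreGlobs = ['*.json', '.env', '**/dirName/**']
console.log(isIgnored('src/config/app.json', ignoreGlobs))
console.log(isIgnored('data.json.bak', ignoreGlobs))
```

Compared with bare substrings like `.json`, a glob such as `*.json` only matches files whose name actually ends in `.json`, so something like `data.json.bak` is no longer caught by accident.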

@abeatrix
Contributor Author

abeatrix commented Oct 11, 2023

Instead of a config option, have you considered adding support for a .codyignore file? This would allow for more fine-grained control, as every sub-folder could have its own .codyignore, and users are already familiar with how to set this up from git

I did, but because we will need to support other clients, including web, I was thinking a config option might be the best choice. Let me move this conversation to Slack and check with them to see if that's viable.

My personal hunch is that we should use glob rules like gitignore for maximum flexibility. In your example that would mean having to use *.json instead of .json

Cool, I can look into that. Thanks Philipp!

Maybe we can wrap the VS Code document opening APIs so we can be 100% sure this is always being used whenever we open documents for context?

Since agent is not using the vscode-editor.ts atm, this would not work for other clients I presume? I'll check with them on Slack.

@philipp-spiess
Contributor

Since agent is not using the vscode-editor.ts atm, this would not work for other clients I presume? I'll check with them on Slack.

Agent is using the VS Code APIs to read files, so if we wrap the VS Code API (e.g. openSafeDocument) it should work alright 🤔
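A minimal sketch of the wrapping idea, written against a generic opener function rather than the real VS Code API so that it stays self-contained. `Opener`, `makeSafeOpener` and the fake opener are all hypothetical names. Only `openSafeDocument` is taken from the comment above, and its real shape is not specified in this thread.

```typescript
// All names are hypothetical. `Opener` stands in for whatever API a client uses to read a document.
type Opener<T> = (path: string) => Promise<T>

// Wrap an opener so that ignored paths are rejected before the underlying API is ever called.
function makeSafeOpener<T>(open: Opener<T>, isIgnored: (path: string) => boolean): Opener<T | undefined> {
    return async path => {
        if (isIgnored(path)) {
            return undefined // the file is never read, so its contents cannot become context
        }
        return open(path)
    }
}

// Fake opener that records which paths were actually read.
const opened: string[] = []
const fakeOpen: Opener<string> = async path => {
    opened.push(path)
    return `contents of ${path}`
}

// Assumption for the example: only files ending in `.env` are ignored.
const openSafeDocument = makeSafeOpener(fakeOpen, path => path.endsWith('.env'))

void openSafeDocument('vscode/.env')
void openSafeDocument('src/index.ts')
console.log(opened)
```

Routing every context read through one wrapper like this means the ignore check lives in a single place, which is the property the comment above is after for chat, autocomplete and agent clients alike.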

@abeatrix abeatrix closed this Oct 12, 2023
@abeatrix
Contributor Author

moved to #1382

Successfully merging this pull request may close these issues.

Introduce exclusion patterns for files that shouldn't be sent over the network
2 participants