Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autocomplete: add naive suggestions ranking based on syntactic validity #837

Merged
merged 10 commits into from
Aug 29, 2023

Conversation

valerybugakov
Copy link
Member

@valerybugakov valerybugakov commented Aug 28, 2023

Context

This PR adds naive suggestions ranking based on syntactic validity powered by tree-sitter. In the post-processing logic, we insert the suggested code snippet into the document, parse it with tree-sitter, and query for ERROR nodes. Then in the ranking logic, suggestions without syntax errors are pulled forward.

Test plan

  • Added unit test for the updated post-processing logic.
  • Manually tested locally.

* Return a filtered version of the given array, de-duplicating items based on the given key function.
* The order of the filtered array is not guaranteed to be related to the input ordering.
*/
export const dedupeBy = <T>(items: T[], key: keyof T | ((item: T) => string)): T[] => [
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to shared utilities for reusability

}
}

export const ROOT_PATH = resolveWithSymlink(__dirname, '../../../../')
Copy link
Member Author

@valerybugakov valerybugakov Aug 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used to avoid '../../../../' in path constants.

Comment on lines +35 to +38
// Add parse errors info to completions
// Does nothing if `cody.autocomplete.experimental.syntacticPostProcessing` is not enabled.
// TODO: add explicit configuration check here when it's possible to avoid prop-drilling for config values.
const withParseInfo = addParseInfoToCompletions(uniqueResults, { document, position, docContext })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity: Is there a reason to do this outside of the rankCompletions function and patching the Completion type instead of passing it as arguments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plan is to use tree-sitter in processItem to rely on the parse tree for multi-line truncation, so I added it to the neutral place for now to better understand how to use it. Will need to move it around in follow-up PRs, and we would probably need to have an augmented completions type because parsing will be one of the first operations in the processInlineCompletions function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I think what I don't "like" (very subjective btw feel free to ignore) is that we extend the object type and then leak the results to the callers, too (which might start depending on it and then the lines become blurry).

vscode/package.json Outdated Show resolved Hide resolved
- `dist`: build outputs from both webpack and vite
- `resources`: everything in this directory will be move to the ./dist directory automatically during build time for easy packaging
- `dist`: build outputs from both esbuild and vite
- `resources`: everything in this directory will be moved to the ./dist directory automatically during build time for easy packaging
Copy link
Member Author

@valerybugakov valerybugakov Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if this is still true? @philipp-spiess @umpox @abeatrix
If so, do you know where the logic for that is defined?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can't find anything for that anyways only that it is used as that static dir in the vite config which might copy it into dist?

It doesn't have to be in dist to be in the resulting package though AFAIK

https://sourcegraph.com/github.com/sourcegraph/cody/-/blob/web/vite.config.ts?L10

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking specifically about the VS Code package.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think currently everything will be in the VS Code package even the source code 🙈

Screenshot 2023-08-29 at 12 35 47

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fun investigation in DMs

@@ -22,13 +22,13 @@
"build:dev:desktop": "concurrently \"pnpm run -s _build:esbuild:desktop\" \"pnpm run -s _build:webviews --mode development\"",
"build:dev:web": "concurrently \"pnpm run -s _build:esbuild:web\" \"pnpm run -s _build:webviews --mode development\"",
"watch:build:dev:web": "concurrently \"pnpm run -s _build:esbuild:web --watch\" \"pnpm run -s _build:webviews --mode development --watch\"",
"_build:esbuild:desktop": "esbuild ./src/extension.node.ts --bundle --outfile=dist/extension.node.js --external:vscode --format=cjs --platform=node --sourcemap",
"_build:esbuild:desktop": "pnpm download-wasm && esbuild ./src/extension.node.ts --bundle --outfile=dist/extension.node.js --external:vscode --format=cjs --platform=node --sourcemap",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to manually copy tree-sitter.wasm to the dist folder. It's fast because modules are already downloaded in post-install.

@valerybugakov valerybugakov marked this pull request as ready for review August 29, 2023 02:54
@@ -14,7 +14,6 @@
"test/e2e",
"webviews",
"vite.config.ts",
"webpack.config.js",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not have webpack in this repo.

Copy link
Contributor

@philipp-spiess philipp-spiess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesomesauce

"_build:esbuild:web": "esbuild ./src/extension.web.ts --platform=browser --bundle --outfile=dist/extension.web.js --alias:path=path-browserify --external:vscode --define:process='{\"env\":{}}' --define:window=self --format=cjs --sourcemap",
"_build:webviews": "vite -c webviews/vite.config.ts build",
"lint": "pnpm run lint:js",
"lint:js": "eslint --cache '**/*.[tj]s?(x)'",
"release": "ts-node ./scripts/release.ts",
"download-wasm": "ts-node ./scripts/download-wasm-modules.ts",
"download-wasm": "ts-node-transpile-only ./scripts/download-wasm-modules.ts",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this change do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disables type-checking, which makes the execution faster. We can afford that because we check Typescript types on CI anyways.

Comment on lines +35 to +38
// Add parse errors info to completions
// Does nothing if `cody.autocomplete.experimental.syntacticPostProcessing` is not enabled.
// TODO: add explicit configuration check here when it's possible to avoid prop-drilling for config values.
const withParseInfo = addParseInfoToCompletions(uniqueResults, { document, position, docContext })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I think what I don't "like" (very subjective btw feel free to ignore) is that we extend the object type and then leak the results to the callers, too (which might start depending on it and then the lines become blurry).

* TODO: Decouple language detect to make it editor agnostic
*/
export enum SupportedLanguage {
JavaScript = 'javascript',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of introducing new symbols for the languages when we already have one (e.g. now we need to remember the exact casing for all of the languages below which are also cased very inconsistent ➡️ Php vs TSX). Is there anything here that a literal string enum can't do?

Copy link
Member Author

@valerybugakov valerybugakov Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to remember the exact casing

Typescript covers us in cases where we need to reference enum keys.

Is there anything here that a literal string enum can't do?

It doesn't make a practical difference for our use case so I'm happy with both approaches. We can migrate to that in a follow-up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typescript covers us in cases where we need to reference enum keys.

It can do that for literal enums too btw 😬

return
}

const parser = await createParser({ language: parseLanguage })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this never yields before the first completion request is done, the parser just won't be adde and we will do the completion without treesitter support, is that correct?

Asking because I see this function being called and not awaited in the code below. Might be good to add a quick comment if we do that on purpose

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added additional comment here. This behavior will be updated in a follow-up with the incremental parsing.

['array) {\nreturn array.sort()\n}', 'array) new\n']
)

expect(completions.map(c => c.hasParseErrors)).toEqual([false, true])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: What do you think that instead of testing the implementation detail here, we instead yield two completions where only one is valid and assert on the behavior (that it is now ordered correctly) instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in the call, I will shuffle the implementation details in multiple follow-up PRs. I agree that testing the desirable behavior of the whole post-processing pipeline is better, and I plan to introduce that once all the tree-sitter pieces are stable.

@valerybugakov valerybugakov merged commit 436d799 into main Aug 29, 2023
9 checks passed
@valerybugakov valerybugakov deleted the vb/tree-sitter-1 branch August 29, 2023 11:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tree-sitter: Integrate tree-sitter into completions post-processing pipeline
2 participants