Autocomplete: add naive suggestions ranking based on syntactic validity #837

valerybugakov · 2023-08-28T10:11:43Z

Context

This PR adds naive suggestions ranking based on syntactic validity powered by tree-sitter. In the post-processing logic, we insert the suggested code snippet into the document, parse it with tree-sitter, and query for ERROR nodes. Then in the ranking logic, suggestions without syntax errors are pulled forward.

All the changes are behind the feature flag cody.autocomplete.experimental.syntacticPostProcessing.
This PR does not leverage tree-sitter's incremental parsing capabilities. They will be integrated in a follow-up PR, where we will need to subscribe to all document changes and update the parse tree on every change.
Part of Tree-sitter: Rank autocomplete suggestions based on syntactic validation #809
Closes Tree-sitter: Integrate tree-sitter into completions post-processing pipeline #849
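
For illustration, the ranking step described in the context above could look roughly like the following. This is a minimal sketch, not the code from this PR: only the `hasParseErrors` field appears in the PR's test, while `ParsedCompletion`, `insertText`, and `rankBySyntacticValidity` are hypothetical names.

```typescript
// Hypothetical shape: only `hasParseErrors` is taken from the PR's test.
interface ParsedCompletion {
    insertText: string
    hasParseErrors: boolean
}

// Stable partition: completions without parse errors come first,
// and the original relative order is preserved within each group.
function rankBySyntacticValidity<T extends ParsedCompletion>(completions: readonly T[]): T[] {
    const valid = completions.filter(c => !c.hasParseErrors)
    const invalid = completions.filter(c => c.hasParseErrors)
    return [...valid, ...invalid]
}

// The two snippets mirror the ones used in the PR's unit test.
const ranked = rankBySyntacticValidity([
    { insertText: 'array) new\n', hasParseErrors: true },
    { insertText: 'array) {\nreturn array.sort()\n}', hasParseErrors: false },
])
```

A partition is used instead of a comparator-based sort so that any ordering applied earlier in the pipeline survives within each group.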

Test plan

Added unit test for the updated post-processing logic.
Manually tested locally.

valerybugakov · 2023-08-28T10:20:06Z

lib/shared/src/common/index.ts

+ * Return a filtered version of the given array, de-duplicating items based on the given key function.
+ * The order of the filtered array is not guaranteed to be related to the input ordering.
+ */
+export const dedupeBy = <T>(items: T[], key: keyof T | ((item: T) => string)): T[] => [


Moved to shared utilities for reusability
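
The diff above is cut off after `=> [`, so the actual body is not visible here. As a rough sketch of what a helper with this signature might do, assuming it keeps the first item seen for each key (the name `dedupeBySketch` and that choice are assumptions, not the PR's implementation):

```typescript
// A sketch only: the body below is an assumption, not the code from the PR.
function dedupeBySketch<T>(items: T[], key: keyof T | ((item: T) => string)): T[] {
    const seen = new Map<unknown, T>()
    for (const item of items) {
        // The key is either a property name or a function computing a string key.
        const k = typeof key === 'function' ? key(item) : item[key]
        // Keep the first item seen for each key; later duplicates are dropped.
        if (!seen.has(k)) {
            seen.set(k, item)
        }
    }
    return Array.from(seen.values())
}

const dedupedByProperty = dedupeBySketch(
    [
        { id: 'a', n: 1 },
        { id: 'b', n: 2 },
        { id: 'a', n: 3 },
    ],
    'id'
)
const dedupedByFunction = dedupeBySketch(['x', 'X', 'y'], s => s.toLowerCase())
```

This sketch happens to preserve first-seen order, but the doc comment in the diff explicitly makes no ordering guarantee, so callers should not rely on it.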

valerybugakov · 2023-08-28T10:20:31Z

lib/shared/src/common/paths.ts

+    }
+}
+
+export const ROOT_PATH = resolveWithSymlink(__dirname, '../../../../')


Used to avoid '../../../../' in path constants.

philipp-spiess · 2023-08-28T11:50:51Z

vscode/src/completions/processInlineCompletions.ts

+    // Add parse errors info to completions
+    // Does nothing if `cody.autocomplete.experimental.syntacticPostProcessing` is not enabled.
+    // TODO: add explicit configuration check here when it's possible to avoid prop-drilling for config values.
+    const withParseInfo = addParseInfoToCompletions(uniqueResults, { document, position, docContext })


Out of curiosity: Is there a reason to do this outside of the rankCompletions function and patching the Completion type instead of passing it as arguments?

The plan is to use tree-sitter in processItem to rely on the parse tree for multi-line truncation, so I added it to a neutral place for now to better understand how to use it. I will need to move it around in follow-up PRs, and we will probably need an augmented completions type because parsing will be one of the first operations in the processInlineCompletions function.

Makes sense. I think what I don't "like" (very subjective btw feel free to ignore) is that we extend the object type and then leak the results to the callers, too (which might start depending on it and then the lines become blurry).

valerybugakov · 2023-08-29T02:51:57Z

vscode/CONTRIBUTING.md

- `dist`: build outputs from both webpack and vite
- `resources`: everything in this directory will be move to the ./dist directory automatically during build time for easy packaging
+- `dist`: build outputs from both esbuild and vite
+- `resources`: everything in this directory will be moved to the ./dist directory automatically during build time for easy packaging


Do you know if this is still true? @philipp-spiess @umpox @abeatrix
If so, do you know where the logic for that is defined?

Yeah, I can't find anything for that either, only that it is used as the static dir in the vite config, which might copy it into dist?

It doesn't have to be in dist to be in the resulting package though AFAIK

https://sourcegraph.com/github.com/sourcegraph/cody/-/blob/web/vite.config.ts?L10

Yeah, I was thinking specifically about the VS Code package.

I think currently everything will be in the VS Code package even the source code 🙈

fun investigation in DMs

valerybugakov · 2023-08-29T02:53:25Z

vscode/package.json

@@ -22,13 +22,13 @@
    "build:dev:desktop": "concurrently \"pnpm run -s _build:esbuild:desktop\" \"pnpm run -s _build:webviews --mode development\"",
    "build:dev:web": "concurrently \"pnpm run -s _build:esbuild:web\" \"pnpm run -s _build:webviews --mode development\"",
    "watch:build:dev:web": "concurrently \"pnpm run -s _build:esbuild:web --watch\" \"pnpm run -s _build:webviews --mode development --watch\"",
-    "_build:esbuild:desktop": "esbuild ./src/extension.node.ts --bundle --outfile=dist/extension.node.js --external:vscode --format=cjs --platform=node --sourcemap",
+    "_build:esbuild:desktop": "pnpm download-wasm && esbuild ./src/extension.node.ts --bundle --outfile=dist/extension.node.js --external:vscode --format=cjs --platform=node --sourcemap",


We need to manually copy tree-sitter.wasm to the dist folder. It's fast because modules are already downloaded in post-install.

valerybugakov · 2023-08-29T02:54:29Z

vscode/tsconfig.json

@@ -14,7 +14,6 @@
    "test/e2e",
    "webviews",
    "vite.config.ts",
-    "webpack.config.js",


We do not have webpack in this repo.

philipp-spiess

Awesomesauce

philipp-spiess · 2023-08-29T08:25:26Z

vscode/package.json

    "_build:esbuild:web": "esbuild ./src/extension.web.ts --platform=browser --bundle --outfile=dist/extension.web.js --alias:path=path-browserify --external:vscode --define:process='{\"env\":{}}' --define:window=self --format=cjs --sourcemap",
    "_build:webviews": "vite -c webviews/vite.config.ts build",
    "lint": "pnpm run lint:js",
    "lint:js": "eslint --cache '**/*.[tj]s?(x)'",
    "release": "ts-node ./scripts/release.ts",
-    "download-wasm": "ts-node ./scripts/download-wasm-modules.ts",
+    "download-wasm": "ts-node-transpile-only ./scripts/download-wasm-modules.ts",


What does this change do?

It disables type-checking, which makes execution faster. We can afford that because we check TypeScript types on CI anyway.

philipp-spiess · 2023-08-29T08:28:56Z

vscode/src/completions/tree-sitter/grammars.ts

+ * TODO: Decouple language detect to make it editor agnostic
+ */
+export enum SupportedLanguage {
+    JavaScript = 'javascript',


Not a fan of introducing new symbols for the languages when we already have one (e.g. now we need to remember the exact casing for all of the languages below, which are also cased very inconsistently ➡️ Php vs TSX). Is there anything here that a literal string enum can't do?

we need to remember the exact casing

TypeScript covers us in cases where we need to reference enum keys.

Is there anything here that a literal string enum can't do?

It doesn't make a practical difference for our use case so I'm happy with both approaches. We can migrate to that in a follow-up.

TypeScript covers us in cases where we need to reference enum keys.

It can do that for literal enums too btw 😬
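
To make the two options in this thread concrete, here is a small side-by-side sketch. It assumes "literal string enum" refers to a union of string literals; the names `SupportedLanguageEnum`, `SupportedLanguageLiteral`, and `isSupportedLiteral` are hypothetical, and the member list is illustrative (only `JavaScript = 'javascript'` is visible in the diff).

```typescript
// Option 1: a string enum, as in this PR.
enum SupportedLanguageEnum {
    JavaScript = 'javascript',
}

// Option 2: a union of string literals. The member list is illustrative.
type SupportedLanguageLiteral = 'javascript' | 'typescript'

// Type guard that narrows an arbitrary language ID to the union.
function isSupportedLiteral(languageId: string): languageId is SupportedLanguageLiteral {
    return languageId === 'javascript' || languageId === 'typescript'
}

// With the enum, callers reference the key and the compiler checks its casing.
const fromEnum: string = SupportedLanguageEnum.JavaScript

// With the union, callers write the value directly; a misspelling such as
// 'Javascript' would be rejected by the compiler as well.
const fromLiteral: SupportedLanguageLiteral = 'javascript'
```

Both forms are checked at compile time. The difference is mostly ergonomic: the union reuses the language ID strings as they are, while the enum adds a second set of names to keep consistent.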

philipp-spiess · 2023-08-29T08:30:42Z

vscode/src/completions/tree-sitter/parse-tree-cache.ts

+        return
+    }
+
+    const parser = await createParser({ language: parseLanguage })


If this never yields before the first completion request is done, the parser just won't be added and we will do the completion without tree-sitter support, is that correct?

Asking because I see this function being called and not awaited in the code below. Might be good to add a quick comment if we do that on purpose

Added additional comment here. This behavior will be updated in a follow-up with the incremental parsing.
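
The behavior discussed in this thread can be sketched as follows. This is a simplified model under stated assumptions, not the PR's code: `SketchParser`, `createSketchParser`, `initParser`, and `getReadyParser` are hypothetical stand-ins for the parser type, `createParser`, and the cache helpers.

```typescript
// Hypothetical stand-in for the real tree-sitter parser type.
interface SketchParser {
    languageId: string
}

const parserCache = new Map<string, SketchParser>()

// Simulates asynchronous initialization (for example, loading a WASM grammar).
async function createSketchParser(languageId: string): Promise<SketchParser> {
    await Promise.resolve()
    return { languageId }
}

// Populates the cache once the parser is ready; callers may choose not to await this.
async function initParser(languageId: string): Promise<void> {
    if (parserCache.has(languageId)) {
        return
    }
    const parser = await createSketchParser(languageId)
    parserCache.set(languageId, parser)
}

// Returns undefined until initialization has finished, so callers can skip syntactic checks.
function getReadyParser(languageId: string): SketchParser | undefined {
    return parserCache.get(languageId)
}

// Fire and forget: a request made right after this call runs without a parser.
void initParser('typescript')
const parserOnFirstRequest = getReadyParser('typescript')
```

Not awaiting keeps the first completion request from blocking on parser setup, at the cost of that request being processed without syntactic information.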

philipp-spiess · 2023-08-29T08:32:20Z

vscode/src/completions/processInlineCompletions.test.ts

+            ['array) {\nreturn array.sort()\n}', 'array) new\n']
+        )
+
+        expect(completions.map(c => c.hasParseErrors)).toEqual([false, true])


Nit: What do you think about yielding two completions where only one is valid and asserting on the behavior (that they are now ordered correctly), instead of testing the implementation detail here?

As discussed in the call, I will shuffle the implementation details in multiple follow-up PRs. I agree that testing the desirable behavior of the whole post-processing pipeline is better, and I plan to introduce that once all the tree-sitter pieces are stable.

valerybugakov added the autocomplete label Aug 28, 2023

valerybugakov self-assigned this Aug 28, 2023

valerybugakov added 3 commits August 28, 2023 18:49

Autocomplete: add naive suggestions ranking based on syntactic validity powered by tree-sitter

c3ab588

Autocomplete: update the PR link

c01e14f

Autocomplete: revert some changes

a63d985

valerybugakov force-pushed the vb/tree-sitter-1 branch from a526bb8 to a63d985 Compare August 28, 2023 10:50

philipp-spiess reviewed Aug 28, 2023

philipp-spiess reviewed Aug 28, 2023

valerybugakov added 5 commits August 28, 2023 20:02

Autocomplete: remove redundant file

d7fd3c6

Autocomplete: revert redundant changes

d45ac77

Merge branch 'main' into vb/tree-sitter-1

523991f

Autocomplete: cp tree-sitter.wasm to dist on build

cf6e72c

Autocomplete: rename util to avoid merge conflicts

6e9221a

valerybugakov requested a review from philipp-spiess August 29, 2023 02:54

valerybugakov marked this pull request as ready for review August 29, 2023 02:54

valerybugakov requested a review from a team August 29, 2023 02:54

valerybugakov mentioned this pull request Aug 29, 2023

[Autocompletion]: Add initial tree-sitter post-processing #638

Closed

philipp-spiess approved these changes Aug 29, 2023

valerybugakov added 2 commits August 29, 2023 18:47

Autocomplete: add comment

8f610a3

Merge branch 'main' into vb/tree-sitter-1

08183f6

valerybugakov merged commit 436d799 into main Aug 29, 2023
9 checks passed

valerybugakov deleted the vb/tree-sitter-1 branch August 29, 2023 11:27

valerybugakov mentioned this pull request Aug 29, 2023

Tree-sitter: enable incremental parsing for tree-sitter #850

Closed
