
Add PoC of gritql integration #12

Merged 15 commits into main on Apr 16, 2024
Conversation

transitive-bullshit
Collaborator

This PR adds an optional dependency on the Rust-based grit binary via the @getgrit/launcher npm package.

Goals of adding gritql:

  • enable file-scoped rules to optionally specify gritql patterns to extract more focused source code context for the LLM-based linter to work with
  • increase the accuracy of the linter by focusing the source context
  • increase the speed of the linter by passing less context
  • reduce the cost of the linter by passing less context
  • gritql wraps tree-sitter and a bunch of tree-sitter wasm grammars, which are generally a pain to deal with directly (see my exploration here for more details and alternatives). One potential upside of using gritql is that it will make cross-language and multi-file-context support a lot easier than using tree-sitter directly
  • gritql also natively supports specifying before & after code transforms in its pattern syntax via the => operator. When we eventually tackle autofix, this could be a useful, more deterministic tool to augment LLM-based code editing for many built-in rules
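
To make the last point concrete, here's a hypothetical GritQL pattern (illustrative only, not from this PR) showing both the filtering side and the `=>` rewrite operator in a single expression:

```gritql
language js

// `$msg` is a metavariable bound to whatever argument is matched.
// The left side filters matching code; the right side declares the rewrite.
`console.log($msg)` => `logger.info($msg)`
```

A rule that only needs filtering would use just the left-hand pattern; the `=>` rewrite would only come into play for autofix.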

Most of these goals still need to be quantified and tested in real-world repos as well as our evals before we consider merging. I'm not sure if we'll include this in the MVP given our goal of launching next week, but given how large an impact this could potentially have, I was excited to explore this direction to understand its viability and quantify the potential impact.

See also getgrit/gritql#214. If we end up adopting this approach, we'll likely want to use a more generally compatible solution as opposed to running grit as a subprocess, but this would be good enough for the purposes of this PoC for now.
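
As a rough sketch of the subprocess approach (the `apply` subcommand and argument shape here are assumptions for illustration, not lifted from this PR's code), the argv for a per-rule grit invocation might be built like this:

```typescript
import { execFile } from 'node:child_process'

// Build the argv for one per-rule grit invocation. The `apply` subcommand
// and positional-args shape are assumptions for illustration.
function buildGritArgs(pattern: string, filePaths: string[]): string[] {
  return ['apply', pattern, ...filePaths]
}

// Hypothetical usage: spawn the grit binary (from @getgrit/launcher) once per
// rule that declares a gritql pattern, e.g.:
// execFile('grit', buildGritArgs('`console.log($msg)`', ['src/']), callback)
console.log(buildGritArgs('`console.log($msg)`', ['src/']))
```

Spawning once per rule is the simplest thing that could work for a PoC, at the cost of re-scanning the codebase per rule.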

@transitive-bullshit transitive-bullshit added the enhancement New feature or request label Apr 14, 2024

@transitive-bullshit
Collaborator Author

transitive-bullshit commented Apr 15, 2024

OpenAI before & after tests

All results without grit were run with the --no-grit flag. I tried running the same tests with Anthropic Claude 3 Haiku & Opus, but ran into a lot of random Cloudflare 524 errors which corrupted the results.

OpenAI gptlint results with grit

Linter tasks { numTasks: 328, numTasksFilteredGrit: 147, numTasksDisabled: 1 }
Linter stats; total cost $2.52 {
  model: 'gpt-4-turbo-preview',
  weakModel: 'gpt-3.5-turbo',
  numModelCalls: 397,
  numModelCallsCached: 0,
  numPromptTokens: 628157,
  numCompletionTokens: 60081,
  numTotalTokens: 688238,
  lintDuration: '1m 5.2s'
}

lint errors: [
  {
    "model": "gpt-4-turbo-preview",
    "level": "error",
    "filePath": "bin/grit.ts",
    "language": "typescript",
    "message": "Use semantic variable names",
    "ruleName": "semantic-variable-names",
    "codeSnippet": "const partialFilesC = [...partialFilesMap.values()].filter(",
    "confidence": "high",
    "reasoning": "The variable name 'partialFilesC' is not descriptive and does not clearly convey the purpose or content of the variable. The use of 'C' is unclear and does not follow best practices for semantic variable names."
  }
]

OpenAI gptlint results without grit

Linter tasks { numTasks: 475, numTasksDisabled: 1 }
Linter stats; total cost $3.16 {
  model: 'gpt-4-turbo-preview',
  weakModel: 'gpt-3.5-turbo',
  numModelCalls: 521,
  numModelCallsCached: 0,
  numPromptTokens: 1197978,
  numCompletionTokens: 61159,
  numTotalTokens: 1259137,
  lintDuration: '54.6s'
}

lint errors: [
  {
    "model": "gpt-4-turbo-preview",
    "level": "error",
    "filePath": "bin/generate-evals.ts",
    "language": "typescript",
    "message": "Prefer types that always represent valid states",
    "ruleName": "prefer-types-always-valid-states",
    "codeSnippet": "state.isLoading = true\nstate.isLoading = false\nstate.error = '' + e",
    "confidence": "high",
    "reasoning": "The provided code snippet directly manipulates state properties in a way that could lead to invalid states, such as setting isLoading to true then to false, and setting an error without a clear transition state. This approach does not ensure that the state always represents valid and consistent states, violating the rule's intent to prefer types that always represent valid states."
  }
]

OpenAI eval results with grit

Linter stats; total cost $3.48 {
  model: 'gpt-4-turbo-preview',
  weakModel: 'gpt-3.5-turbo',
  numModelCalls: 428,
  numModelCallsCached: 0,
  numPromptTokens: 599655,
  numCompletionTokens: 57969,
  numTotalTokens: 657624
}

Eval results {
  numFiles: 292,
  numRules: 13,
  numUnexpectedErrors: 0,
  numFalseNegatives: 1,
  numFalsePositives: 0,
  numTrueNegatives: 145,
  numTruePositives: 146,
  precision: 1,
  recall: 0.9931972789115646,
  f1Score: 0.9965870307167235
}

OpenAI eval results without grit

Linter stats; total cost $3.59 {
  model: 'gpt-4-turbo-preview',
  weakModel: 'gpt-3.5-turbo',
  numModelCalls: 451,
  numModelCallsCached: 0,
  numPromptTokens: 635057,
  numCompletionTokens: 60404,
  numTotalTokens: 695461
}

Eval results {
  numFiles: 292,
  numRules: 13,
  numUnexpectedErrors: 0,
  numFalseNegatives: 1,
  numFalsePositives: 0,
  numTrueNegatives: 145,
  numTruePositives: 146,
  precision: 1,
  recall: 0.9931972789115646,
  f1Score: 0.9965870307167235
}
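
The precision/recall/F1 figures in both eval runs follow directly from the confusion counts above (a quick sanity check, not project code):

```typescript
// Recompute the eval metrics from the reported confusion-matrix counts.
const tp = 146 // numTruePositives
const fp = 0   // numFalsePositives
const fn = 1   // numFalseNegatives

const precision = tp / (tp + fp)                           // 1
const recall = tp / (tp + fn)                              // 146/147 ≈ 0.9932
const f1 = (2 * precision * recall) / (precision + recall) // 292/293 ≈ 0.9966

console.log({ precision, recall, f1 })
```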

Analysis

The eval results are roughly the same, which makes sense because all of the generated evals are purposefully minimal, and gritql is currently only being used to filter source files.

The more interesting comparison is from running the linter on the gptlint codebase itself. Here, using grit resulted in a 20% cost reduction ($3.16 → $2.52) but was ~15% slower (1m 5.2s vs 54.6s). The grit version correctly found one true positive, whereas the --no-grit version flagged one false positive and missed that true positive (one false negative). So the grit version is clearly more accurate for the current rules, though it's difficult to quantify by how much without more extensive testing.

Overall, these results aren't as significant as I was expecting, but filtering out source files with no matches does provide a significant cost reduction, and I expect this difference to be proportional to the size of the codebase.
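
Concretely, the task counts from the two linter runs above show the filtering effect: the gritql patterns excluded 147 of the 475 file×rule tasks (~31%) before any LLM calls were made:

```typescript
// Task counts from the two linter runs on the gptlint codebase.
const tasksWithoutGrit = 475    // numTasks in the --no-grit run
const tasksFilteredByGrit = 147 // numTasksFilteredGrit in the grit run

const tasksWithGrit = tasksWithoutGrit - tasksFilteredByGrit    // 328
const filteredFraction = tasksFilteredByGrit / tasksWithoutGrit // ≈ 0.31
```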

Also, many of the built-in rules are still sending too much context to the LLM, and several rules don't use gritql filters at all yet, so there is definitely still work to be done which I believe will increase these gains.

I was a bit surprised that the grit version is ~15% slower than the non-grit version, since the grit version made only 397 LLM calls versus 521 for --no-grit, and I would've expected the inference time for those 124 extra LLM calls to dominate the runtime compared to our local grit invocations. The discrepancy is likely due to invoking grit multiple times (once per rule with a gritql pattern) and re-scanning the codebase on each invocation. This can definitely be optimized, but I'd rather first confirm that this is the right direction before spending too much time optimizing the gritql integration.

@transitive-bullshit
Collaborator Author

Also interesting to note that the grit version used roughly half as many prompt tokens as the --no-grit version (628,157 vs 1,197,978), while both used approximately the same number of completion tokens (60,081 vs 61,159), which are more expensive per token than prompt tokens.
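
Those ratios check out against the stats from the two runs:

```typescript
// Stats copied from the two linter runs on the gptlint codebase.
const withGrit = { cost: 2.52, promptTokens: 628_157, modelCalls: 397 }
const withoutGrit = { cost: 3.16, promptTokens: 1_197_978, modelCalls: 521 }

const costReduction = 1 - withGrit.cost / withoutGrit.cost                // ≈ 0.20 (20%)
const promptTokenRatio = withoutGrit.promptTokens / withGrit.promptTokens // ≈ 1.91x
const extraModelCalls = withoutGrit.modelCalls - withGrit.modelCalls      // 124
```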

@transitive-bullshit transitive-bullshit merged commit 583073a into main Apr 16, 2024
5 checks passed
@transitive-bullshit transitive-bullshit deleted the feature/gritql branch April 16, 2024 00:58
@transitive-bullshit transitive-bullshit changed the title WIP Add PoC of gritql integration Add PoC of gritql integration Apr 24, 2024