
Rewrite liberal-accept-strict-produce rule. Regenerate examples #14

Open · wants to merge 1 commit into main from rewrite-liberal-accept-strict-produce-rule

Conversation

mergebandit (Collaborator):

This PR rewrites the liberal-accept-strict-produce rule.

Provided we don't actually change the intent of the rule, should I be generating new evals?

I chose to; I was kind of curious to see what would happen, both when I generated them and when I ran the evals on them.

There were two false negatives (screenshots below) in the newly generated evals:

[screenshots omitted]


vercel bot commented Apr 23, 2024

@mergebandit is attempting to deploy a commit to the Saasify Team on Vercel.

To accomplish this, @mergebandit needs to request access to the Team.

Afterwards, an owner of the Team is required to accept their membership request.

If you're already a member of the respective Vercel Team, make sure that your Personal Vercel Account is connected to your GitHub account.

transitive-bullshit (Collaborator):

> Provided we don't actually change the intent of the rule, should I be generating new evals?

Depends; I think it's fully reasonable to go either way. In this case, I think it's enough of a change that we should add some new evals.

@mergebandit force-pushed the rewrite-liberal-accept-strict-produce-rule branch from d72a603 to a92773a on April 23, 2024 at 02:57.
mergebandit (Collaborator, Author):

Updated with new description @transitive-bullshit

Interestingly, two of the previously generated evals were flagged. Here's one example:

[screenshots omitted]

I asked Claude whether it thought that this was a violation, and it said yes. I then provided the reasoning to it, and it gave me an answer that was basically "yeah, you raise a good point".

Curious what your take is.

transitive-bullshit (Collaborator) left a review:

This is definitely getting better :) and the process of going through this (getting a feel for evals, false positives, false negatives, etc.) is invaluable for getting up to speed w/ gptlint 💯

Left a bunch of comments.

Overall, I'm worried that the incorrect examples generated by the current version aren't really related to the rule's intent. That's a strong sign that the rule isn't well-specified enough yet.

Here's my main advice: give it another pass focusing solely on functions and their TypeScript parameter types and return types. There are definitely some solid examples in here; we just need to iterate on making it more specific. Another good bad example to add would be a `parseSchema` function which returns `any` instead of a strict type; that'd be a very concrete code smell if you saw it in the wild. These are the types of patterns we're ultimately trying to focus on, so imho it's worth taking longer to really deeply understand the rule's intent and the representative examples which show that intent. And unfortunately that'll be pretty time-consuming for now until we learn more. btw my gut says that this rule, if properly prompted, should be a keeper, but we'll go by the evals at the end of the day. 😄
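To make that suggestion concrete, here's a minimal sketch of the `parseSchema` code smell described above (the type names and validation logic are hypothetical, not taken from this PR):

```ts
// Incorrect: returning `any` silently disables type checking for every caller.
function parseSchema(input: string): any {
  return JSON.parse(input)
}

// Better: liberal about the raw input, strict about the produced type.
interface Schema {
  name: string
  fields: Record<string, string>
}

function parseSchemaStrict(input: string): Schema {
  const raw: unknown = JSON.parse(input)
  if (
    typeof raw === 'object' &&
    raw !== null &&
    typeof (raw as { name?: unknown }).name === 'string'
  ) {
    return raw as Schema
  }
  throw new Error('Input does not match the expected schema')
}
```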

```ts
// ~~~ Property 'lat' does not exist on type ...
// ~~~ Property 'lng' does not exist on type ...
zoom; // Type is number | undefined
window.location.search = `?v=@${lat},${lng}z${zoom}`

// Only accepts a Date object, doesn't handle string or number input
```
transitive-bullshit (Collaborator):

It's important for every example to be in its own, isolated code block.

This makes it clear to the model and humans where one example ends and the next begins.


In either case, with these new type declarations the `focusOnFeature` function passes the type checker:

```ts
// Accepts Date, string, or number input, attempts to parse the input, returns 'Invalid Date' for invalid input
function formatDate(date: Date | string | number): string {
```
transitive-bullshit (Collaborator):

I like the `formatDate` example.
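For reference, a plausible completion of that example (my sketch; the actual body in the diff may differ):

```ts
// Accepts Date, string, or number input, attempts to parse the input,
// returns 'Invalid Date' for invalid input
function formatDate(date: Date | string | number): string {
  const parsed = date instanceof Date ? date : new Date(date)
  return Number.isNaN(parsed.getTime()) ? 'Invalid Date' : parsed.toISOString()
}
```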

```ts
bearing?: number
pitch?: number
```

```ts
// Accepts numbers or strings, filters out valid numbers, handles empty array gracefully
function calculateAverage(numbers: (number | string)[]): number {
```
transitive-bullshit (Collaborator):

If you saw this code, would you not raise an eyebrow? This function is really weird in accepting numbers and strings. Examples should ideally be as golden as possible. I'd recommend removing this one.
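For contrast, the conventional signature a reader would expect here, assuming callers convert strings themselves (my sketch, not from the PR):

```ts
// Accept only numbers; handle the empty array explicitly.
function calculateAverage(numbers: number[]): number {
  if (numbers.length === 0) return 0
  return numbers.reduce((sum, n) => sum + n, 0) / numbers.length
}
```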

```ts
window.location.search = `?v=@${lat},${lng}z${zoom}`
```

```ts
// Accepts a user object with an optional name property or null, returns 'Anonymous' as a default value
function getUserName(user: { name?: string } | null): string {
  if (user && user.name) {
```
transitive-bullshit (Collaborator):

This example is fine. In our own defensive-programming rule (WIP), we recommend using nullish coalescing, so we should try to be consistent:

```ts
return user?.name ?? 'Anonymous'
```
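Putting that suggestion together with the signature above, the whole function would read (sketch):

```ts
// Nullish coalescing keeps the default-value intent explicit.
function getUserName(user: { name?: string } | null): string {
  return user?.name ?? 'Anonymous'
}
```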

@@ -0,0 +1,11 @@

```ts
// Accepts any input and tries to convert it to JSON, returns a default error message for unserializable input
function safeStringify(input: any): string {
```
transitive-bullshit (Collaborator):

I know this is auto-generated, but I try to look at every example manually. In this case, `input: unknown` is preferred over `input: any`.
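A version of that example using `unknown`, per the comment (my sketch; the fallback message is made up):

```ts
// `unknown` forces explicit narrowing before use, instead of
// disabling type checking the way `any` does.
function safeStringify(input: unknown): string {
  try {
    // JSON.stringify returns undefined at runtime for values like
    // functions or symbols, so fall back to the default message.
    return JSON.stringify(input) ?? 'Unserializable input'
  } catch {
    // Circular references and BigInt values throw.
    return 'Unserializable input'
  }
}
```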

@@ -0,0 +1,7 @@

```ts
// Accepts a range of numeric inputs, returns a clamped value within a specified range
function clampValue(value: number, min: number, max: number): number {
```
transitive-bullshit (Collaborator):

This example should prolly be thrown out. I try to remove examples which don't really add anything; in this case, this one doesn't really fit w/ the rule's intent.

@@ -0,0 +1,10 @@

```ts
// Only processes strings of a certain length, doesn't handle other valid strings
function isShortWord(word: string): boolean {
```
transitive-bullshit (Collaborator):

Recommend removing this example; not sure what it's adding.

@@ -0,0 +1,10 @@

```ts
// Expects an exact boolean value, doesn't handle truthy or falsy values
function toggleFeature(isEnabled: boolean): void {
```
transitive-bullshit (Collaborator):

Not sure what this example is testing.

@@ -13,130 +13,75 @@ resources:

```
function_declaration
```

```diff
-# Be liberal in what you accept and strict in what you produce
+# Postel's Law: Be Liberal in What You Accept, Strict in What You Produce
```
transitive-bullshit (Collaborator):

I'd recommend removing Postel's Law from the title. That's a detail; I've never heard of Postel's law, nor do I care about its name imho.

Recommend keeping the title the same as before, "Be liberal in what you accept and strict in what you produce", because titles should be sentences, not uppercased. This makes more sense as a convention when you view the rule file on GitHub and when you run the linter with -D to view the model output. Also, the "and" is important because grammar seems to correlate w/ LLM dark magics.


> As a general best practice, input types should be broader than output types. Optional properties and union types are more common in parameter types than return types.

> 1. **Be liberal in what you accept**: When receiving input or interacting with other systems, be tolerant and accept a wide range of possible inputs. Handle minor variations, deviations, or even slightly malformed input gracefully, without rejecting it outright.
transitive-bullshit (Collaborator):

"Interacting with other systems" is super vague. As a human reading this, I have no idea how to apply this rule.

I get the intent, but I've learned from a lot of iterations that being a lot more specific helps a ton. In this case, we're only targeting functions, so focus on that use case versus the more general principle, which could also be applied at the REST/API/service/etc. level.

The same language is used further down too.

btw, one "prompt engineering smell" that I'm seeing here (and am guilty of too sometimes) is that more text !== better prompt. Just like w/ good writing, very succinct, clear, precise instructions are much more effective than longer, general, hand-wavy instructions.

This is partially why, at least for the time being, we need all prompts to be written by hand and iterated on w/ evals to try and get to the core essence of things.

transitive-bullshit (Collaborator) commented Apr 23, 2024:

> I asked Claude whether it thought that this was a violation, and it said yes. I then provided the reasoning to it, and it gave me an answer that was basically "yeah, you raise a good point".

My main feedback here is that this isn't the type of task you can or should be asking an LLM about, at least not until you've developed your own strong opinions on the rule's intent. We're already using an LLM to generate these; the real value of the evals is that a human expert has evaluated whether they're good positive/negative examples. And using LLMs to answer questions about LLM-generated output quickly degrades in utility.

For this code snippet in particular, it's not the best code, but imho it's also not an egregious violation. In the future, maybe this would be a `confidence: "low"` warning or something. What we really want is to find some more egregious example snippets that would make a senior engineer's spidey sense tingle, not examples which kinda sorta don't conform. I know that's vague, but the goal of the eval set is to get better over time w/ human-expert-curated examples, and we're just using LLM generation as a stepping stone toward that end. So if you're not sure about an example, skip it. If an incorrect example doesn't make your spidey sense tingle, then it's probably not a worthwhile incorrect example.

btw a lot of the other rules need more passes w/ this same level of scrutiny. I've been applying this where I can, but have been spread thin across rules / testing / core linter, so just documenting the verbose version of how I'd like us to approach evals since this is imho so crucial to the long-term success of this project.
