
Rewrite liberal-accept-strict-produce rule. Regenerate examples #14

Open · wants to merge 1 commit into main from rewrite-liberal-accept-strict-produce-rule

Conversation

mergebandit (Collaborator):

This PR rewrites the liberal-accept-strict-produce rule.

Provided we don't actually change the intent of the rule, should I be generating new evals?

I chose to; I was kind of curious to see what would happen, both when I generated them and when I ran the evals on them.

There were two false negatives (screenshots below) in the newly generated evals:

[screenshots omitted]


vercel bot commented Apr 23, 2024

@mergebandit is attempting to deploy a commit to the Saasify Team on Vercel.

To accomplish this, @mergebandit needs to request access to the Team.

Afterwards, an owner of the Team is required to accept their membership request.

If you're already a member of the respective Vercel Team, make sure that your Personal Vercel Account is connected to your GitHub account.

transitive-bullshit (Collaborator):

> Provided we don't actually change the intent of the rule, should I be generating new evals?

Depends; I think it's fully reasonable to go either way. In this case, I think it's enough of a change that we should add some new evals.

@mergebandit force-pushed the rewrite-liberal-accept-strict-produce-rule branch from d72a603 to a92773a on April 23, 2024 at 02:57.
mergebandit (Collaborator, Author):

Updated with new description @transitive-bullshit

Interestingly, two of the previously generated evals were flagged. Here's one example:

[screenshots omitted]

I asked Claude whether it thought that this was a violation, and it said yes. I then provided the reasoning to it, and it gave me an answer that was basically "yeah, you raise a good point".

Curious what your take is.

transitive-bullshit (Collaborator) left a review:

This is definitely getting better :) and the process of going through this (getting a feel for evals, false positives, false negatives, etc.) is invaluable for getting up to speed w/ gptlint 💯

Left a bunch of comments.

Overall, I'm worried that the incorrect examples generated by the current version aren't really related to the rule's intent. That's a strong sign that the rule isn't well-specified enough yet.

Here's my main advice: give it another pass focusing solely on functions and their TypeScript parameter types and return types. There are definitely some solid examples in here; we just need to iterate on making it more specific. Another good bad example to add would be a `parseSchema` function which returns `any` instead of a strict type; that'd be a very concrete code smell if you saw it in the wild. These are the types of patterns we're ultimately trying to focus on, so imho it's worth taking longer to really deeply understand the rule's intent and the representative examples which show that intent. And unfortunately that'll be pretty time-consuming for now until we learn more. btw my gut says that this rule, if properly prompted, should be a keeper, but we'll go by the evals at the end of the day. 😄
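To make that suggestion concrete, here's a minimal sketch of the `parseSchema` code smell described above (the type names and validation logic are hypothetical, not taken from this PR):

```ts
// Incorrect: returning `any` silently disables type checking for every caller.
function parseSchema(input: string): any {
  return JSON.parse(input)
}

// Better: liberal about the raw input, strict about the produced type.
interface Schema {
  name: string
  fields: Record<string, string>
}

function parseSchemaStrict(input: string): Schema {
  const raw: unknown = JSON.parse(input)
  if (
    typeof raw === 'object' &&
    raw !== null &&
    typeof (raw as { name?: unknown }).name === 'string'
  ) {
    return raw as Schema
  }
  throw new Error('Input does not match the expected schema')
}
```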

```ts
// ~~~ Property 'lat' does not exist on type ...
// ~~~ Property 'lng' does not exist on type ...
zoom; // Type is number | undefined
window.location.search = `?v=@${lat},${lng}z${zoom}`

// Only accepts a Date object, doesn't handle string or number input
```
transitive-bullshit (Collaborator):

It's important for every example to be in its own, isolated code block.

This makes it clear to the model and humans where one example ends and the next begins.


In either case, with these new type declarations the `focusOnFeature` function passes the type checker:

```ts
// Accepts Date, string, or number input, attempts to parse the input, returns 'Invalid Date' for invalid input
function formatDate(date: Date | string | number): string {
```
transitive-bullshit (Collaborator):

I like the `formatDate` example.
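For reference, a plausible completion of that example (my sketch; the actual body in the diff may differ):

```ts
// Accepts Date, string, or number input, attempts to parse the input,
// returns 'Invalid Date' for invalid input
function formatDate(date: Date | string | number): string {
  const parsed = date instanceof Date ? date : new Date(date)
  return Number.isNaN(parsed.getTime()) ? 'Invalid Date' : parsed.toISOString()
}
```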

```ts
bearing?: number
pitch?: number
```

```ts
// Accepts numbers or strings, filters out valid numbers, handles empty array gracefully
function calculateAverage(numbers: (number | string)[]): number {
```
transitive-bullshit (Collaborator):

If you saw this code, would you not raise an eyebrow? This function is really weird in accepting numbers and strings. Examples should ideally be as golden as possible. I'd recommend removing this one.
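For contrast, the conventional signature a reader would expect here, assuming callers convert strings themselves (my sketch, not from the PR):

```ts
// Accept only numbers; handle the empty array explicitly.
function calculateAverage(numbers: number[]): number {
  if (numbers.length === 0) return 0
  return numbers.reduce((sum, n) => sum + n, 0) / numbers.length
}
```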

```ts
window.location.search = `?v=@${lat},${lng}z${zoom}`
```

```ts
// Accepts a user object with an optional name property or null, returns 'Anonymous' as a default value
function getUserName(user: { name?: string } | null): string {
  if (user && user.name) {
```
transitive-bullshit (Collaborator):

This example is fine. In our own defensive-programming rule (WIP), we recommend using nullish coalescing, so we should try to be consistent:

```ts
return user?.name ?? 'Anonymous'
```
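Putting that suggestion together with the signature above, the whole function would read (sketch):

```ts
// Nullish coalescing keeps the default-value intent explicit.
function getUserName(user: { name?: string } | null): string {
  return user?.name ?? 'Anonymous'
}
```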

@@ -0,0 +1,11 @@

```ts
// Accepts any input and tries to convert it to JSON, returns a default error message for unserializable input
function safeStringify(input: any): string {
```
transitive-bullshit (Collaborator):

I know this is auto-generated, but I try to look at every example manually. In this case, `input: unknown` is preferred over `input: any`.
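A version of that example using `unknown`, per the comment (my sketch; the fallback message is made up):

```ts
// `unknown` forces explicit narrowing before use, instead of
// disabling type checking the way `any` does.
function safeStringify(input: unknown): string {
  try {
    // JSON.stringify returns undefined at runtime for values like
    // functions or symbols, so fall back to the default message.
    return JSON.stringify(input) ?? 'Unserializable input'
  } catch {
    // Circular references and BigInt values throw.
    return 'Unserializable input'
  }
}
```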

@@ -0,0 +1,7 @@

```ts
// Accepts a range of numeric inputs, returns a clamped value within a specified range
function clampValue(value: number, min: number, max: number): number {
```
transitive-bullshit (Collaborator):

This example should prolly be thrown out. I try to remove examples which don't really add anything; in this case, this one doesn't really fit w/ the rule's intent.

@@ -0,0 +1,10 @@

```ts
// Only processes strings of a certain length, doesn't handle other valid strings
function isShortWord(word: string): boolean {
```
transitive-bullshit (Collaborator):

Recommend removing this example; not sure what it's adding.

@@ -0,0 +1,10 @@

```ts
// Expects an exact boolean value, doesn't handle truthy or falsy values
function toggleFeature(isEnabled: boolean): void {
```
transitive-bullshit (Collaborator):

Not sure what this example is testing.

@@ -13,130 +13,75 @@ resources:

```
function_declaration
```

```diff
-# Be liberal in what you accept and strict in what you produce
+# Postel's Law: Be Liberal in What You Accept, Strict in What You Produce
```
transitive-bullshit (Collaborator):

I'd recommend removing Postel's Law from the title. That's a detail; I've never heard of Postel's law, nor do I care about its name imho.

Recommend keeping the title the same as before, "Be liberal in what you accept and strict in what you produce", because titles should be sentences, not uppercased. This makes more sense as a convention when you view the rule file on GitHub and when you run the linter with -D to view the model output. Also, the "and" is important because grammar seems to correlate w/ LLM dark magics.


> As a general best practice, input types should be broader than output types. Optional properties and union types are more common in parameter types than return types.

> 1. **Be liberal in what you accept**: When receiving input or interacting with other systems, be tolerant and accept a wide range of possible inputs. Handle minor variations, deviations, or even slightly malformed input gracefully, without rejecting it outright.
transitive-bullshit (Collaborator):

"Interacting with other systems" is super vague. As a human reading this, I have no idea how to apply this rule.

I get the intent, but I've learned from a lot of iterations that being a lot more specific helps a ton. In this case, we're only targeting functions, so focus on that use case versus the more general principle, which could also be applied at the REST/API/service/etc. level.

The same language is used further down too.

btw, one "prompt engineering smell" that I'm seeing here (and am guilty of too sometimes) is that more text !== better prompt. Just like w/ good writing, very succinct, clear, precise instructions are much more effective than longer, general, hand-wavy instructions.

This is partially why, at least for the time being, we need all prompts to be written by hand and iterated on w/ evals to try and get to the core essence of things.

transitive-bullshit (Collaborator) commented Apr 23, 2024:

> I asked Claude whether it thought that this was a violation, and it said yes. I then provided the reasoning to it, and it gave me an answer that was basically "yeah, you raise a good point".

My main feedback here is that this isn't the type of task you can or should be asking an LLM about, at least not until you've developed your own strong opinions on the rule's intent. We're already using an LLM to generate these; the real value of the evals is that a human expert has evaluated whether they're good positive/negative examples. And using LLMs to answer questions about LLM-generated output quickly degrades in utility.

For this code snippet in particular, it's not the best code, but imho it's also not an egregious violation. In the future, maybe this would be a `confidence: "low"` warning or something. What we really want is to find some more egregious example snippets that would make a senior engineer's spidey sense tingle, not examples which kinda sorta don't conform. I know that's vague, but the goal of the eval set is to get better over time w/ human-expert-curated examples, and we're just using LLM generation as a stepping stone toward that end. So if you're not sure about an example, skip it. If an incorrect example doesn't make your spidey sense tingle, then it's probably not a worthwhile incorrect example.

btw a lot of the other rules need more passes w/ this same level of scrutiny. I've been applying this where I can, but have been spread thin across rules / testing / core linter, so just documenting the verbose version of how I'd like us to approach evals since this is imho so crucial to the long-term success of this project.
