
Cody: Add support for server-side token limits to Chat #54488

Merged 8 commits into main from ps/cody-limits on Jul 5, 2023

Conversation

@philipp-spiess (Contributor) commented on Jun 30, 2023:

This is a follow-up to @mrnugget's PR that added a new site config API to expose server-side token limits.

For now, we only implement this for the chat providers, and we use hardcoded response limits.
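
For context, a rough sketch of the resolution order discussed in this thread: a local config overwrite takes precedence, then the server-provided chatModelMaxTokens, then a default, with the hardcoded response limit subtracted. ANSWER_TOKENS and chatModelMaxTokens appear in the PR; the function name, the fallback constant, and the numeric values below are assumptions, not the extension's actual code:

// Illustrative sketch only; values and names other than ANSWER_TOKENS/chatModelMaxTokens are assumptions.
const ANSWER_TOKENS = 1000 // assumed hardcoded response (answer) limit
const DEFAULT_CHAT_MODEL_MAX_TOKENS = 7000 // assumed fallback context window

function maxPromptTokens(
    localMaxTokens: number | undefined,
    siteChatModelMaxTokens: number | undefined
): number {
    // Local configuration takes precedence, to stay backward compatible for now.
    const contextWindow = localMaxTokens ?? siteChatModelMaxTokens ?? DEFAULT_CHAT_MODEL_MAX_TOKENS
    // Reserve room for the model's answer within the overall context window.
    return contextWindow - ANSWER_TOKENS
}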

Test plan

Defaults

[screenshot: 2023-06-30 14:23]

With local config overwrite (to stay backward compatible for now)

[screenshot: 2023-06-30 14:24]

With site config setting

[screenshots: 2023-06-30 14:29 and 14:26]

@philipp-spiess self-assigned this on Jun 30, 2023
@cla-bot added the cla-signed label on Jun 30, 2023
@sourcegraph-bot (Contributor) commented on Jun 30, 2023:

📖 Storybook live preview

Comment on lines 219 to 222
    return this.fetchSourcegraphAPI<APIResponse<any>>(CURRENT_SITE_CODY_LLM_CONFIGURATION, {}).then(response =>
        extractDataOrError(response, data => data.site?.codyLLMConfiguration)
    )
}
A Contributor commented:

What if the server is still on 5.0? I think we need to be backwards compatible and make this more generic, so that it also checks for other fields:

public async getSiteHasIsCodyEnabledField(): Promise<boolean | Error> {
    return this.fetchSourcegraphAPI<APIResponse<SiteGraphqlFieldsResponse>>(
        CURRENT_SITE_GRAPHQL_FIELDS_QUERY,
        {}
    ).then(response =>
        extractDataOrError(response, data => !!data.__type?.fields?.find(field => field.name === 'isCodyEnabled'))
    )
}

@philipp-spiess (Author) commented:

I think this should work, but I haven't tested it. Do you know if we have a hosted instance I can use, or do I have to reset my dev box to test?

We gracefully "ignore" errors and make sure that the value returned from GraphQL here can be undefined, too.

A Contributor commented:

You can easily run the Docker images:

docker run --publish 7080:7080 --publish 127.0.0.1:3370:3370 --rm --volume ~/.sourcegraph-docker/config:/etc/sourcegraph --volume ~/.sourcegraph-docker/data:/var/opt/sourcegraph sourcegraph/server:5.0

This runs 5.0, for example.

Or you can comment out this line in your local dev instance:

codyLLMConfiguration: CodyLLMConfiguration

That will disable the resolver.

As for "it should work": it might work client-side, but I do think it'll produce errors on the backend. And while we don't have extensive error reporting on response codes today, I don't think we should rely on server errors not bubbling up in the client if we're already fetching the schema anyway.

A Member commented:

Side comment: it should be a 4xx error, so I don't feel terribly strongly about this.

@valerybugakov (Member) commented on Jul 5, 2023:

With the resolver disabled on the local instance, the API returns a 200 response with the errors object. Let's address this in a follow-up if needed, because it's unclear whether the current behavior negatively affects the backend.

{
  "errors": [
    {
      "message": "Cannot query field \"codyLLMConfiguration\" on type \"Site\". Did you mean \"configuration\"?",
      "locations": [
        {
          "line": 4,
          "column": 9
        }
      ]
    }
  ]
}
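
For illustration, one way the client could treat this error shape as "field not available" rather than a hard failure is sketched below. The helper name codyLLMConfigurationOrUndefined and the GraphQLErrorResponse type are hypothetical and not part of this PR; only the error message string comes from the response above:

interface GraphQLErrorResponse {
    data?: { site?: { codyLLMConfiguration?: unknown } }
    errors?: { message: string }[]
}

// Hypothetical helper: treat the "Cannot query field" error from older
// instances as "configuration not exposed" and return undefined instead
// of surfacing an error.
function codyLLMConfigurationOrUndefined(response: GraphQLErrorResponse): unknown {
    const fieldMissing = response.errors?.some(error =>
        error.message.includes('Cannot query field "codyLLMConfiguration"')
    )
    if (fieldMissing) {
        return undefined
    }
    return response.data?.site?.codyLLMConfiguration
}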

const authStatus = this.authProvider.getAuthStatus()

if (authStatus.configOverwrites?.chatModelMaxTokens) {
    return authStatus.configOverwrites.chatModelMaxTokens - ANSWER_TOKENS
A Member commented:

Do we intentionally ignore const solutionLimit = codyConfig.get<number>('provider.limit.solution') || ANSWER_TOKENS here?

A Member commented:

We should add a small safety buffer of 100 tokens here, because we only do estimations and they might be slightly off. The backend should still return the true values.

A Member commented:

Also, local config should probably take precedence?

A Member commented:

"local config should probably take precedence?"

I would expect that.

@valerybugakov (Member) commented on Jul 5, 2023:

"we should add a small safety buffer of 100 tokens here, because we only do estimations and they might be slightly off."

@eseliger could you elaborate on that?

Update: I added a buffer, but I need clarification on whether this is what we need 🙂

@valerybugakov (Member) left a comment:

Hey @mrnugget and @eseliger, I prepped the PR for merge. Let me know if it looks good to you.

    return tokenLimit - localSolutionLimit
}

// TODO: add comment on why SAFETY_ANSWER_TOKENS is required here.
A Member commented:

This safety threshold is actually for the prompt, not the answer.

The problem with a token limit for the prompt is that we can only estimate tokens (and do so in a very cheap way), so we may undercount them. If we exceed the maximum tokens, things will start to break, so we should have some safety cushion for when our estimate is wrong.

I.e.: take a long text of 10,000 characters that we estimate to be 2,500 tokens. That would easily fit into a limit of 3,000 tokens. But if it's actually 3,500 tokens, because it splits weirdly and our estimation is off, the request will fail. That's where we want to add this safety cushion :)
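
A minimal sketch of that idea, assuming the ~4-characters-per-token ratio implied by the example above and the 100-token buffer mentioned earlier; estimateTokens, SAFETY_PROMPT_TOKENS, and fitsInPromptLimit are illustrative names, not the extension's actual code:

// Very cheap token estimation: roughly 4 characters per token (assumed ratio).
function estimateTokens(text: string): number {
    return Math.ceil(text.length / 4)
}

// Hypothetical safety cushion to absorb under-counting in the cheap estimate.
const SAFETY_PROMPT_TOKENS = 100

function fitsInPromptLimit(text: string, maxPromptTokens: number): boolean {
    // Reserve the cushion so a slightly-off estimate does not exceed the model's real limit.
    return estimateTokens(text) + SAFETY_PROMPT_TOKENS <= maxPromptTokens
}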

@valerybugakov (Member) commented on Jul 5, 2023:

Thanks, Erik; I added your explanation as a comment in the source!

@eseliger (Member) left a comment:

Other than my inline comment about what the cushion is for, LGTM. Thank you, Philipp and Valery!

@valerybugakov enabled auto-merge (squash) on July 5, 2023 at 10:05
@valerybugakov self-assigned this on Jul 5, 2023
@valerybugakov merged commit 29b8d4e into main on Jul 5, 2023
13 of 16 checks passed
@valerybugakov deleted the ps/cody-limits branch on July 5, 2023 at 10:16
@philipp-spiess (Author) commented:

Thanks for fixing this up for me @valerybugakov :)
