Cody: Add support for server-side token limits to Chat #54488
Conversation
```typescript
return this.fetchSourcegraphAPI<APIResponse<any>>(CURRENT_SITE_CODY_LLM_CONFIGURATION, {}).then(response =>
    extractDataOrError(response, data => data.site?.codyLLMConfiguration)
)
}
```
What if the server is still on 5.0? I think we need to be backwards compatible and make this more generic here, to also check for other fields:
sourcegraph/client/cody-shared/src/sourcegraph-api/graphql/client.ts
Lines 171 to 178 in 3356721
```typescript
public async getSiteHasIsCodyEnabledField(): Promise<boolean | Error> {
    return this.fetchSourcegraphAPI<APIResponse<SiteGraphqlFieldsResponse>>(
        CURRENT_SITE_GRAPHQL_FIELDS_QUERY,
        {}
    ).then(response =>
        extractDataOrError(response, data => !!data.__type?.fields?.find(field => field.name === 'isCodyEnabled'))
    )
}
```
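A generic version of this check could take the field name as a parameter, so the same introspection query covers both `isCodyEnabled` and `codyLLMConfiguration`. The helper name `siteHasField` and the response shapes below are illustrative assumptions following the pattern above, not the actual client code:

```typescript
// Sketch: decide whether the connected instance exposes a given Site field,
// so older (5.0) servers can be detected before querying codyLLMConfiguration.
// Types and names are assumptions for illustration.

interface GraphQlField {
    name: string
}

interface SiteGraphqlFieldsResponse {
    __type?: { fields?: GraphQlField[] }
}

function siteHasField(response: SiteGraphqlFieldsResponse, fieldName: string): boolean {
    return !!response.__type?.fields?.some(field => field.name === fieldName)
}

// Example: a hypothetical 5.0 server without the new resolver.
const oldServer: SiteGraphqlFieldsResponse = {
    __type: { fields: [{ name: 'isCodyEnabled' }, { name: 'configuration' }] },
}
console.log(siteHasField(oldServer, 'codyLLMConfiguration')) // false
console.log(siteHasField(oldServer, 'isCodyEnabled')) // true
```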
I think this should work but I haven't tested it - Do you know if we have an instance hosted that I can use or do I have to reset my dev box to test?
We gracefully "ignore" errors and make sure that the value returned from GraphQL here can be undefined too.
You can easily run the Docker images:

```shell
docker run --publish 7080:7080 --publish 127.0.0.1:3370:3370 --rm --volume ~/.sourcegraph-docker/config:/etc/sourcegraph --volume ~/.sourcegraph-docker/data:/var/opt/sourcegraph sourcegraph/server:5.0
```

This runs 5.0, for example.
Or you can comment out this line in your local dev instance:

```graphql
codyLLMConfiguration: CodyLLMConfiguration
```

That will disable the resolver.
As for "it should work": it might work client-side, but I do think it'll produce errors on the backend. While we don't have extensive error reporting on response codes today, I don't think we should rely on server errors not bubbling up in the client if we're already fetching the schema anyway.
Side-comment: it should be a 4xx error, so I don't feel terribly strong about this
With the resolver disabled on the local instance, the API returns a 200 response with an errors object. Let's address this in a follow-up if needed, because it's unclear whether the current behavior negatively affects the backend.
```json
{
    "errors": [
        {
            "message": "Cannot query field \"codyLLMConfiguration\" on type \"Site\". Did you mean \"configuration\"?",
            "locations": [
                {
                    "line": 4,
                    "column": 9
                }
            ]
        }
    ]
}
```
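A minimal sketch of the graceful handling described above, assuming the client treats an errors-only payload as "no data" and returns `undefined` instead of surfacing the GraphQL error. The `APIResponse` shape mirrors the client snippet earlier in the thread; the helper name is an assumption, not the exact implementation:

```typescript
// Sketch: GraphQL servers commonly return HTTP 200 with an `errors` array,
// as in the response above. A tolerant extractor treats missing data as
// `undefined` rather than throwing.

interface APIResponse<T> {
    data?: T
    errors?: { message: string }[]
}

function extractDataOrUndefined<T, R>(
    response: APIResponse<T>,
    extract: (data: T) => R | undefined
): R | undefined {
    if (!response.data) {
        // Errors-only payload (e.g. unknown field on an older server): ignore gracefully.
        return undefined
    }
    return extract(response.data)
}

const errorsOnly: APIResponse<{ site?: { codyLLMConfiguration?: unknown } }> = {
    errors: [{ message: 'Cannot query field "codyLLMConfiguration" on type "Site".' }],
}
console.log(extractDataOrUndefined(errorsOnly, data => data.site?.codyLLMConfiguration)) // undefined
```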
```typescript
const authStatus = this.authProvider.getAuthStatus()

if (authStatus.configOverwrites?.chatModelMaxTokens) {
    return authStatus.configOverwrites.chatModelMaxTokens - ANSWER_TOKENS
```
Do we intentionally ignore `const solutionLimit = codyConfig.get<number>('provider.limit.solution') || ANSWER_TOKENS` here?
we should add a small safety buffer of 100 tokens here, because we only do estimations and they might be slightly off.
The backend should still return the true values.
also, local config should probably take precedence?
> local config should probably take precedence?

I would expect that.
> we should add a small safety buffer of 100 tokens here, because we only do estimations and they might be slightly off.
@eseliger could you elaborate on that?
UPD: I added a buffer but I need clarification on whether this is what we need 🙂
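To make the precedence and buffer discussion concrete, here is a minimal sketch assuming the local config override wins over the server-provided limit, and the 100-token safety cushion applies only to the server-derived value (since client-side counts are estimates). The constants and the `maxPromptTokens` name are illustrative assumptions, not the PR's actual code:

```typescript
// Assumed illustrative constants; the real values live in the extension config.
const ANSWER_TOKENS = 1000
const SAFETY_TOKENS = 100 // cushion for token-estimation error
const DEFAULT_CHAT_TOKENS = 7000

function maxPromptTokens(localTokenLimit?: number, serverChatModelMaxTokens?: number): number {
    if (localTokenLimit !== undefined) {
        // Local config takes precedence, to stay backward compatible.
        return localTokenLimit - ANSWER_TOKENS
    }
    if (serverChatModelMaxTokens !== undefined) {
        // Server-side limit, minus the answer budget and a safety cushion.
        return serverChatModelMaxTokens - ANSWER_TOKENS - SAFETY_TOKENS
    }
    return DEFAULT_CHAT_TOKENS - ANSWER_TOKENS
}

console.log(maxPromptTokens(undefined, 9000)) // 7900
console.log(maxPromptTokens(8000, 9000)) // 7000 (local override wins)
```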
```typescript
    return tokenLimit - localSolutionLimit
}

// TODO: add comment on why SAFETY_ANSWER_TOKENS is required here.
```
this safety threshold is actually for the prompt, not the answer.
The problem with a token limit for the prompt is that we can only estimate tokens (and do so in a very cheap way), so it can be that we undercount tokens. If we exceed the maximum tokens, things will start to break, so we should have some safety cushion for when we're wrong in estimating.
E.g.: a long text of 10,000 characters that we estimate at 2,500 tokens would easily fit into a 3,000-token limit. But if it actually tokenizes to 3,500 tokens, because it splits weirdly and our estimate is off, the request will fail. That's where we want to add this safety cushion :)
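Erik's example can be sketched as follows, assuming a cheap ~4-characters-per-token heuristic (the ratio and function names are assumptions for illustration; real tokenizers vary):

```typescript
// Assumed heuristic: roughly 4 characters per token.
const CHARS_PER_TOKEN = 4

function estimateTokens(text: string): number {
    return Math.ceil(text.length / CHARS_PER_TOKEN)
}

// Reserve a safety cushion on top of the estimate, since undercounting
// would push the real prompt over the model's hard limit.
function fitsWithinLimit(text: string, tokenLimit: number, safetyTokens = 100): boolean {
    return estimateTokens(text) + safetyTokens <= tokenLimit
}

const longText = 'x'.repeat(10000)
console.log(estimateTokens(longText)) // 2500
// The estimate fits a 3000-token limit, but the true count could be ~3500
// if the text tokenizes badly; the cushion narrows (not closes) that gap.
console.log(fitsWithinLimit(longText, 3000)) // true
```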
Thanks, Erik; I added your comment to the sources!
Other than my inline comment for what the cushion is for, LGTM - Thank you Philipp and Valery!
Thanks for fixing this up for me @valerybugakov :)
This is a follow-up to @mrnugget's PR that added a new site config API to expose server-side token limits.
For now we only implement this for the chat providers and we also use hardcoded response limits.
Test plan
- Defaults
- With local config overwrite (to stay backward compatible for now)
- With site config setting