
Use set, inferred max token limits wherever chat models are used #713

Merged
merged 1 commit into master from enforce-max-token-limits-wherever-chat-model-used on Apr 23, 2024

Conversation

debanjum
Collaborator

@debanjum commented on Apr 20, 2024

  • User-configured max token limits weren't being passed to
    `send_message_to_model_wrapper`
  • One of the offline model loading code paths was unreachable. Remove it
    to simplify the code
  • When the max prompt size isn't set, infer max tokens from the free VRAM
    on the machine
  • Use the minimum of the app-configured max tokens, the VRAM-based max
    tokens, and the model context window (see the sketch after this list)
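
For illustration, here is a minimal Python sketch of the limit-selection logic described above. The helper names (`infer_max_tokens_from_vram`, `resolve_max_tokens`) and the bytes-per-token heuristic are assumptions made for this example, not the actual implementation in this PR.

```python
from typing import Optional


def infer_max_tokens_from_vram(free_vram_bytes: int, bytes_per_token: int = 64 * 1024) -> int:
    """Rough heuristic: assume each token of context costs a fixed VRAM budget.

    The 64 KiB-per-token figure is an illustrative assumption, not a measured value.
    """
    return max(free_vram_bytes // bytes_per_token, 512)


def resolve_max_tokens(
    configured_max_tokens: Optional[int],
    model_context_window: int,
    free_vram_bytes: Optional[int],
) -> int:
    """Pick the effective max token limit for the chat model."""
    candidates = [model_context_window]
    if configured_max_tokens is not None:
        candidates.append(configured_max_tokens)
    if free_vram_bytes is not None:
        candidates.append(infer_max_tokens_from_vram(free_vram_bytes))
    # Take the smallest limit so prompts never exceed the app config,
    # the available VRAM, or the model's context window.
    return min(candidates)


# Example: no app-configured limit, 8192-token context window, 6 GiB free VRAM
# -> min(8192, 98304) == 8192
effective_limit = resolve_max_tokens(None, 8192, 6 * 1024**3)
```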

@debanjum force-pushed the enforce-max-token-limits-wherever-chat-model-used branch from 8efb4bc to 175169c on April 20, 2024 05:53
@debanjum merged commit 419b044 into master on Apr 23, 2024
7 checks passed
@debanjum deleted the enforce-max-token-limits-wherever-chat-model-used branch on April 23, 2024 11:12