Support configurable inference params per prompt #16
Comments
For reference on how the request is ultimately formed, see https://github.com/transitive-bullshit/chatgpt-api/blob/main/src/chatgpt-api.ts#L184-L195
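For illustration, a minimal sketch of passing per-call params along that path, assuming the chatgpt package accepts a completionParams object both in the ChatGPTAPI constructor and per sendMessage call (which is what the linked lines suggest); the prompt text and values are only examples:

```ts
import { ChatGPTAPI } from 'chatgpt'

// Global defaults applied to every request (assumption: this mirrors the
// linked request-building code, which spreads completionParams into the body).
const api = new ChatGPTAPI({
  apiKey: process.env.OPENAI_API_KEY ?? '',
  completionParams: { temperature: 0.7 }
})

// Hypothetical per-call override: a more deterministic setup for code gen.
const res = await api.sendMessage('Refactor this function: ...', {
  completionParams: { temperature: 0.2, top_p: 0.8 }
})
console.log(res.text)
```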
Yes it works: the parameters are correctly sent 🚀
How about a Llama.cpp compatible API? For example, tail-free sampling is not supported in the ChatGPT API. I have an example of such an API here, or see the demo server in Llama.cpp for more params.
Maybe I'm misunderstanding, but you should be able to just put [...]
I use two different APIs on my server: the Llama.cpp one and the OpenAI one that I made recently to use Wingman. They run on different endpoints (...). [Edit]: maybe I can help with the code, as I already have this API implemented in the frontend, if you wish to go this way.
Oh I see, yeah if you can think of a good way to handle this you are welcome to open up a PR - you may have a better idea of how to implement this than I do.
I'll wait until you have replaced the chatgpt-api with your own code: then it should be easy to adapt the params and the endpoint name, maybe with a setting to select the Llama.cpp API.
Closing this, as with the new providers + completionParams we now have this feature.
It would be nice to support per-prompt params. For now only the temperature param is supported, and it cannot be adjusted for different prompts because the setting is global.
Hypothetical use case: I may want a more creative setup for prompts like the Explain one, and a more deterministic one for code gen. For example, I could set a higher temperature and some tail-free sampling for Explain, and a lower temperature and lower top_p for Analyze.
Some params from the Llama.cpp server API that I would like to have support for:
Ref: the Llama.cpp completion endpoint doc
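For illustration, a hedged sketch of what a per-prompt request against the Llama.cpp server's /completion endpoint could look like; the field names follow that doc, while the endpoint URL, prompt, and values are made up:

```ts
// Assumed local server address; adjust to wherever the Llama.cpp server runs.
const res = await fetch('http://localhost:8080/completion', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'Explain what this snippet does:\n// ...',
    n_predict: 256,   // max tokens to generate
    temperature: 0.9, // more creative, e.g. for an Explain prompt
    tfs_z: 0.95,      // tail-free sampling
    top_p: 0.95
  })
})
const data = await res.json()
console.log(data.content)
```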
It would also be nice to be able to have the `model` param per prompt for server tools that support multiple models at runtime. My use case: I have very small 3B and 7B models and I want to use one or the other depending on the prompt: I have very specific tasks tailored for a particular model with predefined inference params (example of the concept).
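To make the idea concrete, a sketch of what per-prompt presets combining a model and inference params might look like; every preset name, model name, and value below is hypothetical and not taken from any existing tool:

```ts
// Hypothetical per-prompt presets: each prompt action gets its own model
// and inference params instead of one global temperature setting.
interface PromptPreset {
  model: string                    // which local model to route the prompt to
  params: Record<string, number>   // inference params forwarded to the backend
}

const presets: Record<string, PromptPreset> = {
  explain: { model: 'llama-2-7b-chat', params: { temperature: 0.9, tfs_z: 0.95 } },
  analyze: { model: 'phi-2', params: { temperature: 0.2, top_p: 0.7 } }
}

// Each prompt action would merge its preset into the request body it sends.
function buildRequestBody(promptName: string, prompt: string) {
  const preset = presets[promptName]
  return { model: preset.model, prompt, ...preset.params }
}

console.log(buildRequestBody('analyze', 'Analyze this code: ...'))
```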