Support configurable inference params per prompt #16

Closed
synw opened this issue Aug 19, 2023 · 8 comments
synw commented Aug 19, 2023

It would be nice to support per-prompt params. For now only the temperature param is supported, and it cannot be adjusted for different prompts because the setting is global.

Hypothetical use case: I may want a more creative setup for prompts like the Explain one, and a more deterministic one for code generation. For example, I could set a higher temperature and some tail free sampling for Explain, and a lower temperature and lower top_p for the Analyze one.

Some params from the Llama.cpp server API that I would like to have supported:

interface InferParams {
  n_predict: number;         // max number of tokens to generate
  top_k: number;             // top-k sampling
  top_p: number;             // nucleus sampling
  temperature: number;
  frequency_penalty: number;
  presence_penalty: number;
  repeat_penalty: number;
  tfs_z: number;             // tail free sampling
  stop: Array<string>;       // stop sequences
}

Ref: the Llama.cpp completion endpoint doc
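
For example, a request to the llama.cpp server /completion endpoint with these params would look roughly like this (a sketch only; host, port and values are placeholders):

// Sketch: per-prompt params posted to the llama.cpp server /completion endpoint.
// Host, port and values are placeholders.
const params = {
  prompt: "Explain the following TypeScript code: ...",
  n_predict: 512,
  temperature: 0.8,
  top_k: 40,
  top_p: 0.95,
  repeat_penalty: 1.1,
  tfs_z: 0.97, // tail free sampling
  stop: ["</s>"],
};

const res = await fetch("http://localhost:8080/completion", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(params),
});
const data = await res.json();
console.log(data.content); // generated text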

It would also be nice to have per-prompt model params for server tools that support multiple models at runtime. My use case: I have very small 3B and 7B models and I want to use one or the other depending on the prompt: I have very specific tasks tailored for a particular model, with predefined inference params (example of the concept).
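
A prompt template carrying its own model and params could look something like this (hypothetical field names, just to illustrate the idea):

  {
    label: "Summarize",
    description: "Summarizes the selection with a small, fast model.",
    userMessageTemplate: "Summarize the following text:\n{{text_selection}}",
    // hypothetical fields, only to illustrate the request:
    model: "my-3b-summarize-model",
    inferParams: {
      temperature: 0.2,
      top_p: 0.35,
      n_predict: 256,
    },
  },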

nvms (Owner) commented Aug 19, 2023

I pushed a version (1.3.17) just now that should support this, if you want to give it a try. Prompt templates now recognize a completionParams object, which will be passed to the endpoint, e.g.:

  {
    label: "Explain",
    description: "Explains the selected code.",
    userMessageTemplate:
      "Explain the following {{language}} code:\n```{{filetype}}\n{{text_selection}}\n```\n\nExplain as if you were explaining to another developer.\n\n{{input:What specifically do you need explained? Leave blank for general explaination.}}",
    completionParams: {
      temperature: 0.1,
      frequency_penalty: 1.1,
    },
  },

  {
    label: "Fix known bug",
    description: "Prompts for a description of the bug, and attempts to resolve it.",
    userMessageTemplate:
      "I have the following {{language}} code:\n```{{filetype}}\n{{text_selection}}\n```\n\nYour task is to find and fix a bug. Apart from fixing the bug, do not change the code.\n\nDescription of bug: {{input:Briefly describe the bug.}}\n\nIMPORTANT: Only return the code inside a code fence and nothing else. Do not explain your fix or changes in any way.",
    callbackType: "replace",
    completionParams: {
      temperature: 0.9,
    },
  },

Keeping in mind of course that providing unknown params to the official ChatGPT API will result in a 400:

[Screenshot: the OpenAI API responding with a 400 error for an unrecognized parameter]

nvms (Owner) commented Aug 19, 2023

For reference on how the request is ultimately formed (in transitive-bullshit/chatgpt-api):

https://github.com/transitive-bullshit/chatgpt-api/blob/main/src/chatgpt-api.ts#L184-L195

synw (Author) commented Aug 19, 2023

Yes it works: the parameters are correctly sent 🚀

Keeping in mind of course that providing unknown params to the official ChatGPT API will result in a 400

How about a Llama.cpp-compatible API? For example, tail free sampling is not supported in the ChatGPT API. I have an example of such an API here, or see the demo server in Llama.cpp for more params.

nvms (Owner) commented Aug 19, 2023

Maybe I'm misunderstanding, but you should be able to just put tfs_z in your completionParams and it will be sent. In fact, anything you put in there will be spread into the body of the JSON payload.
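
Roughly, the idea is the following (a simplified sketch, not the extension's actual code):

// Simplified sketch of the idea: whatever a prompt template puts in
// completionParams is spread into the request body sent to the API.
function buildRequestBody(
  messages: Array<{ role: string; content: string }>,
  completionParams: Record<string, unknown>,
) {
  return {
    model: "gpt-3.5-turbo",
    messages,
    ...completionParams, // e.g. temperature, tfs_z, repeat_penalty, ...
  };
}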

synw (Author) commented Aug 19, 2023

I use two different APIs on my server: the Llama.cpp one and the OpenAI one that I made recently to use Wingman. They run on different endpoints (/v1/chat/completions for OpenAI and /completion for Llama.cpp). I would like to stick to these official API specs to avoid confusion for users. If I start adding things to the OpenAI API it would introduce confusion, I think, so it would be better to have a way to support the Llama.cpp API.

[Edit]: maybe I can help with the code, as I already have this API implemented on the frontend, if you wish to go this way.
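
For example, a provider setting could switch the endpoint and payload shape along these lines (a hypothetical sketch; names and URLs are illustrative):

// Hypothetical sketch: a provider setting picks the endpoint and payload shape,
// so the OpenAI spec stays untouched and llama.cpp gets its own params.
type Provider = "openai" | "llamacpp";

function buildRequest(provider: Provider, prompt: string, params: Record<string, unknown>) {
  if (provider === "llamacpp") {
    return {
      url: "http://localhost:8080/completion",
      body: { prompt, ...params }, // n_predict, tfs_z, repeat_penalty, ...
    };
  }
  return {
    url: "https://api.openai.com/v1/chat/completions",
    body: {
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: prompt }],
      ...params,
    },
  };
}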

nvms (Owner) commented Aug 19, 2023

Oh I see. Yeah, if you can think of a good way to handle this you are welcome to open up a PR - you may have a better idea of how to implement this than I do.

synw (Author) commented Aug 19, 2023

I'll wait until you have replaced chatgpt-api with your own code: then it should be easy to adapt the params and the endpoint name, maybe with a setting to select the Llama.cpp API.

synw (Author) commented Sep 1, 2023

Closing this: with the new providers + completionParams we have the feature.

synw closed this as completed Sep 1, 2023