New Koboldcpp provider #21
This is great! I also have koboldcpp running locally occasionally, so I would be happy to test this out.
Yes, absolutely! I have plans to support selecting a provider (and maybe other things like template format) from the UI/command panel. The more providers and template formats that wingman supports, the more unreasonable it becomes to force the user to define these as a per-command configuration. These things should probably be selected from the UI. This will be a pretty fundamental change to the extension itself. Not a big deal, but a lot of things will change related to how commands are defined, how the UI is built, etc.
Updates have been slow lately (I have a day job), but I'm still planning on maintaining it. I will likely make the changes I mentioned above soon. I have the UI somewhat prototyped already, but nothing is final. As you make changes in your branch that you believe are ready, I'd be happy to review any PR you want to open. This way, your changes will be included in the refactor I mentioned above.
Nice: a way to select the provider was missing (I just set a provider as the default in my branch to test my new ones). The question of prompt templates is also crucial if you want a chance at getting decent results from one model or another, so it's good that we can have support for this. In my branch I added a default template and a default context size setting because I really need these features, but I am going to remove them and prepare a PR that can merge into the actual code. As we are discussing refactoring and improvements, here is a list of the most important features I would like to have in order to use a local model:
Here is the PR #22: please review the code. It will be usable once we have the setting you are working on to select the default provider.
Looking at the PR now.
I agree this is very useful, but: how will we infer which tokenizer we need to use at runtime? AFAIK
I agree. This is in progress, but not yet functional. I wanted to get your feedback on the approach I'm trying:

```ts
"ChatML": {
  system: "<|im_start|>system\n{system_message}<|im_end|>",
  user: "<|im_start|>user\n{user_message}<|im_end|>\n<|im_start|>assistant\n",
  first: "{system}\n{user}",
  stops: ["<|im_end|>"],
},
"Llama 2": {
  system: "<<SYS>>\n{system_message}\n<</SYS>>",
  user: "<s>[INST] {user_message} [/INST]",
  first: "<s>[INST] {system}\n\n{user_message} [/INST]",
  stops: ["</s>"],
},
```

The above probably makes sense to you, but I will explain my thinking anyway. Tell me if I'm wrong or if this is a bad approach.
I think the usefulness of this type of format configuration becomes apparent when examining the Llama 2 format in particular.
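For illustration, here is how the same first turn would render under each of the templates above (the example system and user messages are made up):

```ts
// Hypothetical example values, just to show the difference between the two formats.
const system_message = "You are a coding assistant.";
const user_message = "Refactor this function.";

// ChatML: `first` is "{system}\n{user}", i.e. a plain concatenation of the
// filled-in `system` and `user` patterns.
const chatmlFirst =
  `<|im_start|>system\n${system_message}<|im_end|>\n` +
  `<|im_start|>user\n${user_message}<|im_end|>\n<|im_start|>assistant\n`;

// Llama 2: the system prompt is nested inside the first [INST] block, so the
// first turn cannot be produced by concatenating `system` and `user`; it needs
// its own `first` pattern.
const llama2First = `<s>[INST] <<SYS>>\n${system_message}\n<</SYS>>\n\n${user_message} [/INST]`;
```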
Responding to my own comment here: I don't actually think this approach works. The tokenization approach really depends more on the model being used than on the provider, and since a provider can potentially support many different models (e.g. the OpenAI provider speaking to Goinfer running some Llama or Llama 2 model), it doesn't make sense to tie the tokenizer to the provider.
Yes, it depends on the model family. We could have a mixed approach: define a default tokenizer for each provider, and have a tokenizer fallback setting plus a command param to let the user choose when needed. For the local model providers the Llama tokenizer would be fine in most cases. For OpenAI, the llama-tokenizer-js lib recommends the gpt-tokenizer lib, or the official Tiktoken one could be used. I don't know about Anthropic. About the templating: it looks good, but I don't really get the `first` abstraction; I need to read the code to get a better idea about this. The supported template variables should include the conversation history. I like the Orca mini format abstractions; it's pretty simple and clear:
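As a rough sketch, an Orca Mini entry could look like this in the same shape as the ChatML / Llama 2 entries above; the exact strings and stop tokens are assumptions based on the commonly published Orca Mini prompt format, not wingman's actual configuration:

```ts
// Assumed Orca Mini prompt format, expressed in the same template shape as above.
"Orca Mini": {
  system: "### System:\n{system_message}",
  user: "### User:\n{user_message}\n\n### Response:\n",
  first: "{system}\n\n{user}",
  stops: ["### User:"],
},
```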
Associating a stop sequence with a template might be a good idea, but in the API it is an inference parameter: please have a look at this data structure for reference: https://github.com/synw/infergui/blob/main/src/interfaces.ts#L29. It is the Llama.cpp API implemented in Goinfer, but Koboldcpp is pretty similar (a sketch of one way to reconcile the two follows below).

By the way, I have a question: would it be possible to edit the commands in a more convenient format than JSON, like human-readable YAML files for example? For complex templates, like few-shot ones, it is much more convenient. This is what I did in Goinfer with my concept of tasks. Example "command" in YAML: https://github.com/synw/goinfer/blob/main/examples/tasks/code/json/fix.yml (one-shot prompt)
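Here is a rough sketch of how template stops and inference parameters could be reconciled: the stop strings defined on a template are simply merged into the request parameters at inference time. Field names below are illustrative, not the actual Goinfer/Koboldcpp parameter names; see the linked interfaces.ts for those.

```ts
// Illustrative only: minimal inference-parameter shape with a stop list.
interface InferenceParams {
  temperature?: number;
  stop?: string[];
}

// Merge the template's stop sequences into the request parameters,
// deduplicating anything the user already set.
function withTemplateStops(params: InferenceParams, templateStops: string[]): InferenceParams {
  return { ...params, stop: [...new Set([...(params.stop ?? []), ...templateStops])] };
}
```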
Yeah, this might be the only approach that makes sense for wingman. I'll work on this in the refactor (I should publish the branch maybe today when it's in a good spot).
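A minimal sketch of that mixed approach (per-provider defaults, with a user setting and a per-command override), using made-up names rather than the extension's actual code:

```ts
// Hypothetical tokenizer identifiers; the real implementation could map these
// to llama-tokenizer-js, gpt-tokenizer, tiktoken, etc.
type TokenizerId = "llama" | "gpt";

// Default tokenizer per provider; local llama.cpp-family backends share the Llama tokenizer.
const providerDefaultTokenizer: Record<string, TokenizerId> = {
  goinfer: "llama",
  koboldcpp: "llama",
  openai: "gpt",
};

function resolveTokenizer(
  provider: string,
  commandOverride?: TokenizerId, // optional per-command param
  userFallback?: TokenizerId,    // optional global setting
): TokenizerId {
  return commandOverride ?? providerDefaultTokenizer[provider] ?? userFallback ?? "llama";
}
```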
Maybe it's not needed, but my understanding is that the first message in a conversation is usually formatted differently from all follow-up messages, like in the current Anthropic provider (simplified for example purposes):

```ts
if (!isFollowup) {
  prompt = `${system}${user} ${user_message}${assistant}`;
} else {
  prompt = `${history}${user} ${user_message}${assistant}`;
}
```

This becomes something like:

```ts
if (!isFollowup) {
  // formats with `format.first` as guide, e.g. `<s>[INST] {system}\n\n{user_message} [/INST]`
  prompt = formatFirst(command, userMessage, llama2Template);
} else {
  // formats with `format.user` as guide, e.g. `<s>[INST] {user_message} [/INST]`
  prompt = `${history}${format(command, userMessage, llama2Template)}`;
}
```

Since the first message is not always just `{system}` followed by `{user_message}`, it seemed worth a dedicated `first` pattern. Is this not necessary?
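For concreteness, a hypothetical sketch of what those two helpers might look like (signatures simplified; this is not the extension's actual code):

```ts
interface PromptTemplate {
  system: string;
  user: string;
  first: string;
  stops: string[];
}

// Naive {placeholder} substitution.
function fill(pattern: string, vars: Record<string, string>): string {
  return pattern.replace(/\{(\w+)\}/g, (_match, key) => vars[key] ?? "");
}

// First turn: fill the `system` and `user` patterns, then combine them
// according to the template's `first` pattern.
function formatFirst(systemMessage: string, userMessage: string, t: PromptTemplate): string {
  const system = fill(t.system, { system_message: systemMessage });
  const user = fill(t.user, { user_message: userMessage });
  return fill(t.first, { system, user, user_message: userMessage });
}

// Follow-up turns: only the `user` pattern is needed; the caller prepends the history.
function format(userMessage: string, t: PromptTemplate): string {
  return fill(t.user, { user_message: userMessage });
}
```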
Ah, I overlooked this. Thank you for pointing this out to me. It definitely does feel right to define it on the template. Will have to think about this one some more.
Open to suggestions that are vscode-friendly. AFAIK the recommended method is to use
Mostly lurking at the moment (also due to day job stuff), but I hope to be able to contribute some more soon as well. Re: prompt config, having it in the JSON settings is a huge benefit because the config is automatically synced and always available. There are some other VS Code extensions out there that let you customize prompts (FlexiGPT and Continue, for example) as JS or Python objects in separate config files, but then it's up to the user to manage that JS/Python file and make sure that the path to it is correct and all that. I'm just one user here, but I use VS Code on remote machines 90% of the time via the k8s or SSH remote extensions... wingman just works in that scenario. The other extensions require me to copy config/prompt files around. Personally, I'd prefer for common prompting formats + tokenizer selection to be built in (keyed by model name, for example), with maybe an option to add a custom one in the settings.

FWIW, I'm mainly a Python programmer, so when I was initially setting up a bunch of prompts I made a tiny script that let me define the prompt settings as a Python object and then JSON-encode/escape it to copy/paste into the settings.
Completely agree. A new user with a fresh install should be able to set their API key and just start using it, entirely ignoring the configuration panel. Likely more than 90% of users will fall into that bucket as you mentioned. For the remaining few, it should be configurable enough such that they can use whatever provider/llm/format/etc. they want to use -- and it should be easy to do so. So far version 2 captures this idea, but there is still a bit more work to be done.
Yeah, since this is such a core feature of the extension it might make sense to have some small UI like this. I have no good ideas at the moment though.
I made an experimental Koboldcpp provider: https://github.com/synw/wingman/blob/main/src/providers/koboldcpp.ts. I did it to be able to run inference for Wingman queries on my phone, which has 8 GB of RAM, using small models (Koboldcpp is the only thing that runs on my phone).
It would be nice to be able to switch providers depending on the prompt command. I have different local servers running different backends: Goinfer on Linux, Koboldcpp on Android. I would like to be able to submit a query to one or the other depending on the prompt. It would be nice if the command could specify the provider.
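If it helps, one possible shape for this, sketched with made-up field names rather than wingman's actual command schema:

```ts
// Hypothetical command definition: an optional `provider` field pins the
// backend for this command; when omitted, the default provider is used.
interface CommandDefinition {
  label: string;
  promptTemplate: string;
  provider?: "openai" | "anthropic" | "goinfer" | "koboldcpp";
}

// Example: send this command to the Koboldcpp server running on the phone.
const fixJson: CommandDefinition = {
  label: "Fix JSON",
  promptTemplate: "Fix the syntax errors in this JSON:\n{selection}",
  provider: "koboldcpp",
};
```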
By the way, what's the plan for this extension: do you want to develop it further and maintain it, or not really? I am wondering because I am suggesting many changes and improvements, but they might not fit with your plans.