
I found a way how to use these models directly with Text Generation WebUI #24

Closed
GMartin-dev opened this issue May 8, 2023 · 8 comments
Labels
documentation Improvements or additions to documentation good first issue Good for newcomers

Comments

@GMartin-dev

From the README
"If you try an unsupported model, you'll see "gibberish output".
This happens for instance with https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g
If you know how to use these models directly with Text Generation WebUI please share your expertise :)"

I managed to get this working locally on Linux with:
https://huggingface.co/4bit/vicuna-13B-1.1-GPTQ-4bit-128g
https://huggingface.co/TheBloke/wizard-vicuna-13B-GPTQ
https://huggingface.co/4bit/gpt4-x-alpaca-13b-native-4bit-128g-cuda
https://huggingface.co/4bit/stable-vicuna-13B-GPTQ

If that helps, my setup:

load with:
python server.py --model vicuna-13B-1.1-GPTQ-4bit-128g --wbits 4 --groupsize 128 --model_type Llama --api

Currently running models on an NVIDIA A2000 and consuming them from LangChain through the API endpoint... but simple stuff, no agents. Alpaca and Vicuna 1.1 are the best ones for me so far.
I was about to try using embeddings and found your repo... great work!
Trying to understand how you managed to get embeddings working xD.
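For reference, consuming the webui from Python when the server is started with `--api` might look something like this. The endpoint path and payload fields below are assumptions based on the legacy text-generation-webui API and may differ between versions:

```python
import json
import urllib.request

# Assumed default endpoint of the legacy text-generation-webui API;
# adjust host/port to match your --api setup.
API_URL = "http://localhost:5000/api/v1/generate"

def build_payload(prompt, max_new_tokens=200, temperature=0.7):
    # Payload shape assumed from the legacy API; check your webui version.
    return {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
    }

def generate(prompt, **kwargs):
    data = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The legacy API returned {"results": [{"text": ...}]}
    return body["results"][0]["text"]
```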

@paolorechia
Owner

Nice, feel free to update the README in a PR if you want. I can also add/commit your instructions directly when I get time, whichever works better.

@mikolodz

mikolodz commented May 9, 2023

@paolorechia Your repo is exactly what I was looking for! Thank you for your effort!
@GDrupal Thanks for your insight. I will try to get it working this evening. Do you really find that Vicuna 1.1 works better with LangChain than stable-vicuna or wizard-vicuna? It could be due to the instruction scheme, I guess. I found that after fine-tuning wizard-vicuna on e.g. the sensmaking_train set, I get completely different results depending on the chat mode used. I mean, the opposite results for the same question, even with temperature = 0.01.

@GMartin-dev
Author

The newer models, specifically designed to be more conversational, are inconsistent for my particular use case. I'm working on creating a user-guided app that generates technical content using third-party APIs. My goal: obtain structured JSON output from the models.
Alpaca is not very creative in terms of content but provides consistent JSON (sweet spot is temp 1.9, to get decent content and still get the JSON).
Vicuna 1.1 produces better content but the JSON output is somewhat random. Other models either ignore the JSON requirement entirely or produce a strange mix of comments and JSON without any way to enforce compliance.
I've experimented with a million prompt variants and various temperature settings, but the results remain unsatisfactory. Additionally, I've noticed that the output differs significantly between the user interface and the text-generation-webui API endpoint, even with similar parameters.
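One workaround for models that mix commentary with JSON is to extract the first balanced `{...}` block from the raw output and validate it before use. A minimal sketch (the helper name is mine, not something from this thread):

```python
import json

def extract_json(text):
    """Return the first parseable top-level JSON object embedded in text, or None."""
    start = text.find("{")
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break  # malformed candidate; try the next "{"
        start = text.find("{", start + 1)
    return None
```

This won't handle braces inside string values, but it covers the common "Sure! Here's your JSON: {...} hope that helps" pattern.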

@mikolodz

mikolodz commented May 9, 2023

I may be completely wrong, but if I were you I would consider generating some JSON outputs with the OpenAI API based on your requests and fine-tuning Vicuna or Wizard 7B on them.

Otherwise, maybe you don't even need detailed outputs; it might be enough to make the model generate proper JSON whenever it's asked to. Hopefully some existing JSON data would suffice to achieve that.

@GMartin-dev
Author

Yep, for LangChain we'll probably need a "GuanacoLC" model with more training on data types, "tools", etc. A good LangChain "soldier" rather than a chatty assistant.

But in the context of a proof of concept, the 13B quantized Vicuna 1.1 or Alpaca are good enough for me.
In production this will use GPT for sure, but I hope to offer open-source models as an option.
Those two models understand the JSON format but sometimes fail; then again, it's documented that GPT also fails with JSON from time to time.

@paolorechia
Owner

@GDrupal interesting undertaking!
I have a couple questions/suggestions

  1. How much did you tweak the prompt? Usually the “chatty” models work better with a long prompt.
  2. Have you tried sampling multiple outputs and applying a JSON parser to pick only an output that is parsable? I have a similar problem with Python syntax errors, and I've been thinking of trying this approach to see if it helps reduce the error rate.
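The sampling idea in point 2 could be sketched like this, where `generate` is a stand-in for whatever callable produces one completion (it is not an API from this repo):

```python
import json

def first_parsable(generate, prompt, attempts=5):
    """Sample up to `attempts` completions; return the first that parses as JSON."""
    for _ in range(attempts):
        candidate = generate(prompt)
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue  # unparsable sample; draw another
    return None
```

With nonzero temperature each call yields a different sample, so a few retries can substantially cut the rate of unusable outputs at the cost of extra generations.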

@paolorechia paolorechia added documentation Improvements or additions to documentation good first issue Good for newcomers labels May 10, 2023
@paolorechia
Owner

Sorry, @GDrupal, just re-read your original comment on my desktop and noticed you did mention prompting the models. Vicuna 1.1 is also pretty garbage when it comes to generating Python code; its output is full of syntax errors.

I get much better results from WizardLM 7B unquantized; of the models I've tried, it's so far the best one to use as a LangChain agent with access to a Python REPL (I also tried Vicuna 1.1 in both 7B and 13B, and stable-vicuna).

On the topic of training a "soldier", I'm planning on fine-tuning a LoRA to perform these actions. Here's my plan:

  1. Use WizardLM to generate tasks the user may ask for, starting from a base list of 17 items. I'm currently doing this with a temperature of 2.0, which seems to generate diverse content just fine.
  2. Run WizardLM on my server with the Prompt Logger I recently implemented and execute all tasks from this initial dataset. This logs a prompt/action pair for each task WizardLM handles.
  3. Extract these pairs and fine-tune.

No idea whether it will work - I'm excited to try it out and see what happens :)
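Step 3 of that plan, turning the logged prompt/action pairs into a fine-tuning dataset, might be sketched as follows. The record fields are assumptions (the Prompt Logger's actual output format isn't shown in this thread), but JSONL with instruction/output keys is a common shape for instruction-tuning data:

```python
import json

def pairs_to_jsonl(pairs, path):
    """Write (prompt, action) pairs as instruction-tuning records, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, action in pairs:
            # Field names are a guess at a typical instruction-tuning schema.
            record = {"instruction": prompt, "output": action}
            f.write(json.dumps(record) + "\n")
```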

@paolorechia
Owner

Updated the documentation with a link to this issue. Thanks again!
