
I found a way how to use these models directly with Text Generation WebUI #24

Closed
GMartin-dev opened this issue May 8, 2023 · 8 comments
Labels
documentation Improvements or additions to documentation good first issue Good for newcomers

Comments

@GMartin-dev

From the README
"If you try an unsupported model, you'll see "gibberish output".
This happens for instance with https://huggingface.co/TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g
If you know how to use these models directly with Text Generation WebUI please share your expertise :)"

I managed to get this working locally on Linux with:
https://huggingface.co/4bit/vicuna-13B-1.1-GPTQ-4bit-128g
https://huggingface.co/TheBloke/wizard-vicuna-13B-GPTQ
https://huggingface.co/4bit/gpt4-x-alpaca-13b-native-4bit-128g-cuda
https://huggingface.co/4bit/stable-vicuna-13B-GPTQ

If that helps, my setup:

load with:
python server.py --model vicuna-13B-1.1-GPTQ-4bit-128g --wbits 4 --groupsize 128 --model_type Llama --api

Currently running models on an NVIDIA A2000 and consuming them from LangChain through the API endpoint... but simple stuff, no agents. Alpaca and Vicuna 1.1 are the best ones for me so far.
I was about to try using embeddings and found your repo... great work!
Trying to understand how you managed to get embeddings working xD.
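For reference, consuming the webui from Python when the server is started with `--api` might look something like this. The endpoint path and payload fields below are assumptions based on the legacy text-generation-webui API and may differ between versions:

```python
import json
import urllib.request

# Assumed default endpoint of the legacy text-generation-webui API;
# adjust host/port to match your --api setup.
API_URL = "http://localhost:5000/api/v1/generate"

def build_payload(prompt, max_new_tokens=200, temperature=0.7):
    # Payload shape assumed from the legacy API; check your webui version.
    return {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
    }

def generate(prompt, **kwargs):
    data = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The legacy API returned {"results": [{"text": ...}]}
    return body["results"][0]["text"]
```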

@paolorechia
Owner

Nice, feel free to update the README in a PR if you want. I can also add/commit your instructions directly when I get time, whichever works better.

@mikolodz

mikolodz commented May 9, 2023

@paolorechia Your repo is exactly what I was looking for! Thank you for your effort!
@GDrupal Thanks for your insight. I will try to get it working this evening. Do you really find that Vicuna 1.1 works better with LangChain than stable-vicuna or wizard-vicuna? It could be due to the instruction scheme, I guess. I found that after fine-tuning wizard-vicuna on e.g. the sensmaking_train set, I get completely different results depending on the chat mode used. I mean, the opposite results for the same question, even with temperature = 0.01.

@GMartin-dev
Author

The newer models, specifically designed to be more conversational, are inconsistent for my particular use case. I'm working on creating a user-guided app that generates technical content using third-party APIs. My goal: obtain structured JSON output from the models.
Alpaca is not very creative in terms of content but provides consistent JSON (sweet spot is temp 1.9, to get decent content and still get the JSON).
Vicuna 1.1 produces better content but the JSON output is somewhat random. Other models either ignore the JSON requirement entirely or produce a strange mix of comments and JSON without any way to enforce compliance.
I've experimented with a million prompt variants and various temperature settings, but the results remain unsatisfactory. Additionally, I've noticed that the output differs significantly between the user interface and the text-generation-webui API endpoint, even with similar parameters.
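One workaround for models that mix commentary with JSON is to extract the first balanced `{...}` block from the raw output and validate it before use. A minimal sketch (the helper name is mine, not something from this thread):

```python
import json

def extract_json(text):
    """Return the first parseable top-level JSON object embedded in text, or None."""
    start = text.find("{")
    while start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break  # malformed candidate; try the next "{"
        start = text.find("{", start + 1)
    return None
```

This won't handle braces inside string values, but it covers the common "Sure! Here's your JSON: {...} hope that helps" pattern.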

@mikolodz

mikolodz commented May 9, 2023

I may be completely wrong, but if I were you I would consider generating some JSON outputs with the OpenAI API based on your requests and fine-tuning Vicuna or Wizard 7B on them.

Otherwise, maybe you don't even need detailed outputs; it might be enough to make the model generate proper JSON whenever it's asked to. Hopefully some existing JSON data would suffice to achieve that.

@GMartin-dev
Author

Yep, for LangChain we'll probably need a "GuanacoLC" model with more training on data types, "tools", etc. A good LangChain "soldier" rather than a chatty assistant.

But in the context of a proof of concept, the 13B quantized Vicuna 1.1 or Alpaca are good enough for me.
In production this will use GPT for sure, but I hope to offer open-source models as an option.
Those two models understand the JSON format but sometimes fail; then again, it's documented that GPT also fails with JSON from time to time.

@paolorechia
Owner

@GDrupal interesting undertaking!
I have a couple questions/suggestions

  1. How much did you tweak the prompt? Usually the “chatty” models work better with a long prompt.
  2. Have you tried sampling multiple outputs and applying a JSON parser to pick only an output that is parsable? I have a similar problem with Python syntax errors, and I've been thinking of trying this approach to see if it helps reduce the error rate.
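The sampling idea in point 2 could be sketched like this, where `generate` is a stand-in for whatever callable produces one completion (it is not an API from this repo):

```python
import json

def first_parsable(generate, prompt, attempts=5):
    """Sample up to `attempts` completions; return the first that parses as JSON."""
    for _ in range(attempts):
        candidate = generate(prompt)
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue  # unparsable sample; draw another
    return None
```

With nonzero temperature each call yields a different sample, so a few retries can substantially cut the rate of unusable outputs at the cost of extra generations.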

@paolorechia paolorechia added documentation Improvements or additions to documentation good first issue Good for newcomers labels May 10, 2023
@paolorechia
Owner

Sorry, @GDrupal, just re-read your original comment on my desktop and noticed you did mention prompting the models. Vicuna 1.1 is also pretty garbage when it comes to generating Python code; its output is full of syntax errors.

I get much better results from WizardLM 7B unquantized; of the models I've tried, it's so far the best one to use as a LangChain agent with access to a Python REPL (I also tried Vicuna 1.1 in both 7B and 13B, and stable-vicuna).

On the topic of training a "soldier", I'm planning on fine-tuning a LoRA to perform these actions. Here's my plan:

  1. Use WizardLM to generate tasks the user may ask for, starting from a base list of 17 items. I'm currently doing this with a temperature of 2.0, which seems to generate diverse content just fine.
  2. Run WizardLM on my server with the Prompt Logger I recently implemented and execute all tasks from this initial dataset. This logs a prompt/action pair for each task WizardLM handles.
  3. Extract these pairs and fine-tune.

No idea whether it will work - I'm excited to try it out and see what happens :)
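Step 3 of that plan, turning the logged prompt/action pairs into a fine-tuning dataset, might be sketched as follows. The record fields are assumptions (the Prompt Logger's actual output format isn't shown in this thread), but JSONL with instruction/output keys is a common shape for instruction-tuning data:

```python
import json

def pairs_to_jsonl(pairs, path):
    """Write (prompt, action) pairs as instruction-tuning records, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, action in pairs:
            # Field names are a guess at a typical instruction-tuning schema.
            record = {"instruction": prompt, "output": action}
            f.write(json.dumps(record) + "\n")
```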

@paolorechia
Owner

Updated the documentation with a link to this issue. Thanks again!
