
Improve Alpaca integration to match its trained prompt syntax #302

Closed
nitram147 opened this issue Mar 19, 2023 · 12 comments
Labels: enhancement, help wanted, high priority

Comments

nitram147 commented Mar 19, 2023

The Alpaca LoRA model was trained on the same dataset as the original Stanford Alpaca.

However, this dataset contains two types of instructions, namely:

  • instructions with input
  • instructions without input

For more details about the instruction format, see here.

For instructions such as text summarization, the instruction alone only "explains" the task, while the text to be summarized is inserted into the "input" part of the prompt.

The current integration of Alpaca in llama.cpp mimics the current integration in alpaca.cpp, which completely omits the "instructions with input" type. This may have a significant impact on model performance when tasks that were trained to use the "instruction with input" prompt syntax are run with just the ordinary "instruction without input" syntax instead.

I suggest building a small tutorial with example usage so that users know which types of instructions should use the input mode and which should not.

Then I suggest integrating this "input" mode into the current implementation. The easiest way would be to let the user type a prompt like:

Summarize following text.***input***Text to be summarized

which will be transformed into:

### Instruction:
Summarize following text.

### Input:
Text to be summarized

### Response:

When the user doesn't specify the ***input*** tag, the instruction will be transformed into the "standard" (currently implemented) format:

### Instruction:
Instruction text from user

### Response:
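
To make the idea concrete, here is a minimal sketch of the transformation (not actual llama.cpp code; the ***input*** delimiter handling and the helper name are just placeholders for this proposal):

```
#include <string>

// Sketch only: split the user's text on the ***input*** tag and build the
// Alpaca prompt in either the "with input" or the "without input" form.
static std::string build_alpaca_prompt(const std::string & user_text) {
    const std::string tag = "***input***";
    const auto pos = user_text.find(tag);
    if (pos == std::string::npos) {
        // no input part -> standard instruction-only format
        return "### Instruction:\n" + user_text + "\n\n### Response:\n";
    }
    const std::string instruction = user_text.substr(0, pos);
    const std::string input       = user_text.substr(pos + tag.size());
    return "### Instruction:\n" + instruction +
           "\n\n### Input:\n"   + input +
           "\n\n### Response:\n";
}
```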
@ggerganov (Owner)

Aha, it seemed there was something wrong there. Thanks for this clarification!
We should add the input mode as suggested.

One more thing: we will very soon change the format of the ggml model files (#252).
Once we do that, the existing ggml Alpaca models will no longer be compatible.
Do you know of any instructions for generating the ggml model files, so we can create new ones?

ggerganov added the enhancement, help wanted and high priority labels Mar 19, 2023
@Green-Sky (Collaborator)

Do you know some instructions of generating the ggml model files, so we can create new ones?

https://github.com/tloen/alpaca-lora#checkpoint-export-export__checkpointpy

The project also auto-downloads models from Hugging Face.

lofcz commented Mar 19, 2023

@ggerganov Generating ggml models is explained here: antimatter15#13

The 13B model took me about 130 GB of disk space and roughly an hour to run the convert and quantize scripts. I can write a full tutorial on that tomorrow, my time.

jxy (Contributor) commented Mar 20, 2023

Alpaca RLHF is for instruction only. Making it interactive doesn't really make sense. It would be simpler to just have two different command-line arguments tailored for instructions with and without the input field, respectively.
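
As a rough sketch of that idea (illustrative only; apart from --instruct, which already exists, the flag name and struct below are hypothetical, not existing llama.cpp options):

```
#include <cstring>

// Sketch only: two separate modes selected on the command line, one for
// "instruction only" prompts and one for "instruction with input" prompts.
struct alpaca_mode {
    bool instruct            = false; // ### Instruction: / ### Response:
    bool instruct_with_input = false; // ### Instruction: / ### Input: / ### Response:
};

static alpaca_mode parse_alpaca_mode(int argc, char ** argv) {
    alpaca_mode mode;
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--instruct") == 0) {
            mode.instruct = true;
        } else if (std::strcmp(argv[i], "--instruct-with-input") == 0) {
            mode.instruct_with_input = true; // hypothetical flag
        }
    }
    return mode;
}
```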

jxy (Contributor) commented Mar 21, 2023

I tried https://github.com/tloen/alpaca-lora on a 13B model from Hugging Face. This is the diff for alpaca-lora:

diff --git a/export_state_dict_checkpoint.py b/export_state_dict_checkpoint.py
index 78e9d1f..3b88cb9 100644
--- a/export_state_dict_checkpoint.py
+++ b/export_state_dict_checkpoint.py
@@ -11,10 +11,10 @@ assert (
 ), "LLaMA is now in HuggingFace's main branch.\nPlease reinstall it: pip uninstall transformers && pip install git+https://github.com/huggingface/transformers.git"
 from transformers import LlamaTokenizer, LlamaForCausalLM
 
-tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
+tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-13b-hf")
 
 base_model = LlamaForCausalLM.from_pretrained(
-    "decapoda-research/llama-7b-hf",
+    "decapoda-research/llama-13b-hf",
     load_in_8bit=False,
     torch_dtype=torch.float16,
     device_map={"": "cpu"},
@@ -22,7 +22,7 @@ base_model = LlamaForCausalLM.from_pretrained(
 
 lora_model = PeftModel.from_pretrained(
     base_model,
-    "tloen/alpaca-lora-7b",
+    "mattreid/alpaca-lora-13b",
     device_map={"": "cpu"},
     torch_dtype=torch.float16,
 )
@@ -37,10 +37,10 @@ lora_model.train(False)
 lora_model_sd = lora_model.state_dict()
 
 params = {
-    "dim": 4096,
+    "dim": 5120,
     "multiple_of": 256,
-    "n_heads": 32,
-    "n_layers": 32,
+    "n_heads": 40,
+    "n_layers": 40,
     "norm_eps": 1e-06,
     "vocab_size": -1,
 }

With the above patch, running python3 export_state_dict_checkpoint.py gave me a 36G ckpt/consolidated.00.pth. From this I generated a 24G ggml-model-f16.bin and an 8G ggml-model-q4_0.bin. I had to modify the code here to load a single file for the 13B model instead of the default 2.

diff --git a/main.cpp b/main.cpp
index 3321818..e26a26d 100644
--- a/main.cpp
+++ b/main.cpp
@@ -90,7 +90,7 @@ struct llama_model {
 };
 
 // load the model's weights from a file
-bool llama_model_load(const std::string & fname, llama_model & model, gpt_vocab & vocab, int n_ctx, ggml_type memory_type = GGML_TYPE_F32) {
+bool llama_model_load(const std::string & fname, llama_model & model, gpt_vocab & vocab, int n_ctx, int n_parts = 0, ggml_type memory_type = GGML_TYPE_F32) {
     fprintf(stderr, "%s: loading model from '%s' - please wait ...\n", __func__, fname.c_str());
 
     std::vector<char> f_buf(1024*1024);
@@ -127,7 +127,6 @@ bool llama_model_load(const std::string & fname, llama_model & model, gpt_vocab
     }
 
     int n_ff = 0;
-    int n_parts = 0;
 
     // load hparams
     {
@@ -145,7 +144,8 @@ bool llama_model_load(const std::string & fname, llama_model & model, gpt_vocab
         hparams.n_ctx = n_ctx;
 
         n_ff = ((2*(4*hparams.n_embd)/3 + hparams.n_mult - 1)/hparams.n_mult)*hparams.n_mult;
-        n_parts = LLAMA_N_PARTS.at(hparams.n_embd);
+        if (n_parts < 1)
+            n_parts = LLAMA_N_PARTS.at(hparams.n_embd);
 
         fprintf(stderr, "%s: n_vocab = %d\n", __func__, hparams.n_vocab);
         fprintf(stderr, "%s: n_ctx   = %d\n", __func__, hparams.n_ctx);
@@ -839,7 +839,7 @@ int main(int argc, char ** argv) {
     {
         const ggml_type memory_type = params.memory_f16 ? GGML_TYPE_F16 : GGML_TYPE_F32;
         const int64_t t_start_us = ggml_time_us();
-        if (!llama_model_load(params.model, model, vocab, params.n_ctx, memory_type)) {
+        if (!llama_model_load(params.model, model, vocab, params.n_ctx, params.n_parts, memory_type)) {
             fprintf(stderr, "%s: failed to load model from '%s'\n", __func__, params.model.c_str());
             return 1;
         }
diff --git a/utils.cpp b/utils.cpp
index 188f114..163441d 100644
--- a/utils.cpp
+++ b/utils.cpp
@@ -64,6 +64,8 @@ bool gpt_params_parse(int argc, char ** argv, gpt_params & params) {
             params.n_batch = std::stoi(argv[++i]);
         } else if (arg == "-m" || arg == "--model") {
             params.model = argv[++i];
+        } else if (arg == "--n_parts") {
+            params.n_parts = std::stoi(argv[++i]);
         } else if (arg == "-i" || arg == "--interactive") {
             params.interactive = true;
         } else if (arg == "-ins" || arg == "--instruct") {
@@ -119,6 +121,7 @@ void gpt_print_usage(int /*argc*/, char ** argv, const gpt_params & params) {
     fprintf(stderr, "  -b N, --batch_size N  batch size for prompt processing (default: %d)\n", params.n_batch);
     fprintf(stderr, "  -m FNAME, --model FNAME\n");
     fprintf(stderr, "                        model path (default: %s)\n", params.model.c_str());
+    fprintf(stderr, "  --n_parts N           number of model files, 0 automatic based on model size (default: %d)\n", params.n_parts);
     fprintf(stderr, "\n");
 }
 
diff --git a/utils.h b/utils.h
index 65fe02b..0939117 100644
--- a/utils.h
+++ b/utils.h
@@ -30,6 +30,7 @@ struct gpt_params {
 
     std::string model      = "models/lamma-7B/ggml-model.bin"; // model path
     std::string prompt     = "";
+    int32_t n_parts = 0; // default based on the model size
 
     bool random_prompt = false;
 

I'm not sure if people would prefer sharded model weights or not. If needed, I can make a pull request for the patch above.

@gjmulder (Collaborator)

@jxy

I'm not sure if people would prefer sharded model weights or not. If needed, I can make a pull request for the patch above.

Sharded would be preferred. And also SHA-256 sums for the Alpaca 7B, 13B and 30B models converted with the latest file format.

I'm trying to compile a complete list of SHA-256 checksums for all the *.bin models in the latest file format.

gjmulder pinned this issue Mar 27, 2023

DanielWicz commented Mar 27, 2023

Does the Alpaca model require quantization, or is it already quantized?

edmundronald commented Mar 28, 2023

Hi,

I don't think changing the model format is very smart, as people will end up with a bunch of HUGE incompatible files on their laptop disks. There will be a real issue figuring out which format works with which forked version downstream. It might be nice to have a base format, e.g. the current one, and with each version of llama.cpp or alpaca.cpp a script which converts to the needed format and writes the right magic numbers. I would expect a 100B and then a 200B or so model to arrive within a few months, which will make the space problem worse.

I am using alpaca.cpp and not llama, but I believe my comment is relevant to both.

anzz1 (Contributor) commented Mar 29, 2023

Reading the data release closely,

During inference (eg for the web demo), we use the user instruction with an empty input field (second option).

How it's currently implemented is how it's "supposed" to be used: the model was trained with the instruction-input-response format, but inference is done with the instruction-response format.

That being said, it is not a rule that it has to be used like that. The instruction-input-response option should be researched to see whether the output improves when using the two-part approach. These models are ongoing research, there is no right and wrong, and testing different things is fun and provides valuable feedback.

As a basic example, does this

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
List 3 ingredients for the following recipe.

### Input:
Spaghetti Bolognese

### Response:

produce better output than this

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
List 3 ingredients for the following recipe: Spaghetti Bolognese

### Response:

You can already try these out as one-shots as-is, by not using the --instruct or --interactive args and using -f prompt.txt to enter them as a regular prompt. After all, there is really nothing special about the --instruct mode except hiding the ### Instruction: and ### Response: parts and injecting them accordingly.

With the interactive mode, it could almost be done already with a command line like this:
main -m ./models/llama-13B-ggml/ggml-model-q4_0.bin -i -c 2024 -n -1 -f ./prompts/instruction-input.txt -r "### Instruction:"
However, it's not exactly the same, since you cannot input newlines in command-line arguments, and newlines should be part of the instruction reverse prompt. An option so that both the reverse prompt and the input prefix could be read from a file, as the prompt already can, would definitely be an enhancement.
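
For reference, a prompt file used that way could plausibly end like this (my guess at what ./prompts/instruction-input.txt contains, since the file itself isn't shown here):

```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
```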

If this is something that works well, it's not a problem at all to implement. The only hard part I can think of is communicating and describing the command-line options to the user so that it's understandable how it works. 😄

@MarkSchmidty

Vicuna appears to be trained to use:

### Assistant:
Text

### Human:
Text

Using "### Human:" as a reverse prompt partially works. But instruct mode support could be cleaner.


ai2p commented Apr 11, 2023

"### Human:"

How can I remove the "### Human:" part from Vicuna's responses?

ggerganov unpinned this issue Apr 12, 2023
Thireus added a commit to Thireus/text-generation-webui that referenced this issue May 4, 2023
Alpaca format compatibility with openai's 'system', 'assistant' and 'user' roles lowercase.

Alpaca prompt syntax is as follows:
```
Task description...

### Instruction:
Summarize following text.

### Input:
Text to be summarized

### Response:
```

https://github.com/tatsu-lab/stanford_alpaca/blob/65512697dc67779a6e53c267488aba0ec4d7c02a/train.py#L31

ggerganov/llama.cpp#302

Notice how the "roles" (Instruction, Input, Response) start with a capital letter.

Same goes when using the following format:
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Write a poem about the transformers Python library. 

### Response:
```

Or, for OpenAI compatibility:
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### User:
Write a poem about the transformers Python library. 

### Assistant:
```

phamkhactu commented Jul 18, 2023

(quoting the original issue description above)

Hi all,

I have a question: if I want to combine the two instruction types, with input and without input, how can I do that?
Why do I want that? Sometimes you don't need to give an input to convey the context, but sometimes you do.

I have followed the instruction-with-input format and, when there is no input, I set it to "". But I get an error ("Can not cast string"), because the string can't be parsed: some prompts have input text, but some prompts don't.

I would be very happy if you could suggest a better method. Or can the two formats above only be used separately, not combined?
Thank you.
