
Improve Alpaca integration to match its trained prompt syntax #302

Closed
nitram147 opened this issue Mar 19, 2023 · 12 comments
Labels: enhancement, help wanted, high priority

Comments

nitram147 commented Mar 19, 2023

The Alpaca LoRA model was trained on the same dataset as the original Stanford Alpaca.

However, this dataset contains two types of instructions, namely:

  • instructions with input
  • instructions without input

For more details about the instruction format, see here.

For instructions such as text summarization, the instruction alone only "explains" the task, while the text to be summarized is inserted into the "input" part of the prompt.

The current integration of Alpaca in llama.cpp mimics the current integration in alpaca.cpp, which completely omits the "instructions with input" type. This may have a significant impact on model performance when tasks that were trained to use the "instruction with input" prompt syntax are run with just the ordinary "instruction without input" syntax instead.

I suggest building a small tutorial with example usage so that users know which types of instructions should use the input mode and which should not.

Then I suggest integrating this "input" mode into the current implementation. The easiest way would be to let the user type a prompt like:

Summarize following text.***input***Text to be summarized

which will be transformed into:

### Instruction:
Summarize following text.

### Input:
Text to be summarized

### Response:

When the user doesn't specify the ***input*** tag, the instruction will be transformed into the "standard" (currently implemented) format:

### Instruction:
Instruction text from user

### Response:
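
To make the idea concrete, here is a minimal sketch of the transformation (not actual llama.cpp code; the ***input*** delimiter handling and the helper name are just placeholders for this proposal):

```
#include <string>

// Sketch only: split the user's text on the ***input*** tag and build the
// Alpaca prompt in either the "with input" or the "without input" form.
static std::string build_alpaca_prompt(const std::string & user_text) {
    const std::string tag = "***input***";
    const auto pos = user_text.find(tag);
    if (pos == std::string::npos) {
        // no input part -> standard instruction-only format
        return "### Instruction:\n" + user_text + "\n\n### Response:\n";
    }
    const std::string instruction = user_text.substr(0, pos);
    const std::string input       = user_text.substr(pos + tag.size());
    return "### Instruction:\n" + instruction +
           "\n\n### Input:\n"   + input +
           "\n\n### Response:\n";
}
```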
@ggerganov (Owner)

Aha, it seemed there was something wrong there. Thanks for this clarification!
We should add the input mode as suggested.

One more thing: we will very soon change the format of the ggml model files (#252).
Once we do that, the existing ggml Alpaca models will no longer be compatible.
Do you know of any instructions for generating the ggml model files, so we can create new ones?

ggerganov added the enhancement, help wanted and high priority labels Mar 19, 2023
@Green-Sky (Collaborator)

Do you know some instructions of generating the ggml model files, so we can create new ones?

https://github.com/tloen/alpaca-lora#checkpoint-export-export__checkpointpy

The project also auto-downloads models from Hugging Face.

lofcz commented Mar 19, 2023

@ggerganov Generating ggml models is explained here: antimatter15#13

The 13B model took me about 130 GB of disk space and roughly an hour to run the convert and quantize scripts. I can write a full tutorial on that tomorrow, my time.

jxy (Contributor) commented Mar 20, 2023

Alpaca RLHF is for instruction only. Making it interactive doesn't really make sense. It would be simpler to just have two different command-line arguments tailored for instructions with and without the input field, respectively.
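
As a rough sketch of that idea (illustrative only; apart from --instruct, which already exists, the flag name and struct below are hypothetical, not existing llama.cpp options):

```
#include <cstring>

// Sketch only: two separate modes selected on the command line, one for
// "instruction only" prompts and one for "instruction with input" prompts.
struct alpaca_mode {
    bool instruct            = false; // ### Instruction: / ### Response:
    bool instruct_with_input = false; // ### Instruction: / ### Input: / ### Response:
};

static alpaca_mode parse_alpaca_mode(int argc, char ** argv) {
    alpaca_mode mode;
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--instruct") == 0) {
            mode.instruct = true;
        } else if (std::strcmp(argv[i], "--instruct-with-input") == 0) {
            mode.instruct_with_input = true; // hypothetical flag
        }
    }
    return mode;
}
```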

jxy (Contributor) commented Mar 21, 2023

I tried https://github.com/tloen/alpaca-lora on a 13B model from Hugging Face. This is the diff for alpaca-lora:

diff --git a/export_state_dict_checkpoint.py b/export_state_dict_checkpoint.py
index 78e9d1f..3b88cb9 100644
--- a/export_state_dict_checkpoint.py
+++ b/export_state_dict_checkpoint.py
@@ -11,10 +11,10 @@ assert (
 ), "LLaMA is now in HuggingFace's main branch.\nPlease reinstall it: pip uninstall transformers && pip install git+https://github.com/huggingface/transformers.git"
 from transformers import LlamaTokenizer, LlamaForCausalLM
 
-tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
+tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-13b-hf")
 
 base_model = LlamaForCausalLM.from_pretrained(
-    "decapoda-research/llama-7b-hf",
+    "decapoda-research/llama-13b-hf",
     load_in_8bit=False,
     torch_dtype=torch.float16,
     device_map={"": "cpu"},
@@ -22,7 +22,7 @@ base_model = LlamaForCausalLM.from_pretrained(
 
 lora_model = PeftModel.from_pretrained(
     base_model,
-    "tloen/alpaca-lora-7b",
+    "mattreid/alpaca-lora-13b",
     device_map={"": "cpu"},
     torch_dtype=torch.float16,
 )
@@ -37,10 +37,10 @@ lora_model.train(False)
 lora_model_sd = lora_model.state_dict()
 
 params = {
-    "dim": 4096,
+    "dim": 5120,
     "multiple_of": 256,
-    "n_heads": 32,
-    "n_layers": 32,
+    "n_heads": 40,
+    "n_layers": 40,
     "norm_eps": 1e-06,
     "vocab_size": -1,
 }

With the above patch, running python3 export_state_dict_checkpoint.py gave me a 36G ckpt/consolidated.00.pth. From this I generated a 24G ggml-model-f16.bin and an 8G ggml-model-q4_0.bin. I had to modify the code here to load a single file for the 13B model instead of the default 2.

diff --git a/main.cpp b/main.cpp
index 3321818..e26a26d 100644
--- a/main.cpp
+++ b/main.cpp
@@ -90,7 +90,7 @@ struct llama_model {
 };
 
 // load the model's weights from a file
-bool llama_model_load(const std::string & fname, llama_model & model, gpt_vocab & vocab, int n_ctx, ggml_type memory_type = GGML_TYPE_F32) {
+bool llama_model_load(const std::string & fname, llama_model & model, gpt_vocab & vocab, int n_ctx, int n_parts = 0, ggml_type memory_type = GGML_TYPE_F32) {
     fprintf(stderr, "%s: loading model from '%s' - please wait ...\n", __func__, fname.c_str());
 
     std::vector<char> f_buf(1024*1024);
@@ -127,7 +127,6 @@ bool llama_model_load(const std::string & fname, llama_model & model, gpt_vocab
     }
 
     int n_ff = 0;
-    int n_parts = 0;
 
     // load hparams
     {
@@ -145,7 +144,8 @@ bool llama_model_load(const std::string & fname, llama_model & model, gpt_vocab
         hparams.n_ctx = n_ctx;
 
         n_ff = ((2*(4*hparams.n_embd)/3 + hparams.n_mult - 1)/hparams.n_mult)*hparams.n_mult;
-        n_parts = LLAMA_N_PARTS.at(hparams.n_embd);
+        if (n_parts < 1)
+            n_parts = LLAMA_N_PARTS.at(hparams.n_embd);
 
         fprintf(stderr, "%s: n_vocab = %d\n", __func__, hparams.n_vocab);
         fprintf(stderr, "%s: n_ctx   = %d\n", __func__, hparams.n_ctx);
@@ -839,7 +839,7 @@ int main(int argc, char ** argv) {
     {
         const ggml_type memory_type = params.memory_f16 ? GGML_TYPE_F16 : GGML_TYPE_F32;
         const int64_t t_start_us = ggml_time_us();
-        if (!llama_model_load(params.model, model, vocab, params.n_ctx, memory_type)) {
+        if (!llama_model_load(params.model, model, vocab, params.n_ctx, params.n_parts, memory_type)) {
             fprintf(stderr, "%s: failed to load model from '%s'\n", __func__, params.model.c_str());
             return 1;
         }
diff --git a/utils.cpp b/utils.cpp
index 188f114..163441d 100644
--- a/utils.cpp
+++ b/utils.cpp
@@ -64,6 +64,8 @@ bool gpt_params_parse(int argc, char ** argv, gpt_params & params) {
             params.n_batch = std::stoi(argv[++i]);
         } else if (arg == "-m" || arg == "--model") {
             params.model = argv[++i];
+        } else if (arg == "--n_parts") {
+            params.n_parts = std::stoi(argv[++i]);
         } else if (arg == "-i" || arg == "--interactive") {
             params.interactive = true;
         } else if (arg == "-ins" || arg == "--instruct") {
@@ -119,6 +121,7 @@ void gpt_print_usage(int /*argc*/, char ** argv, const gpt_params & params) {
     fprintf(stderr, "  -b N, --batch_size N  batch size for prompt processing (default: %d)\n", params.n_batch);
     fprintf(stderr, "  -m FNAME, --model FNAME\n");
     fprintf(stderr, "                        model path (default: %s)\n", params.model.c_str());
+    fprintf(stderr, "  --n_parts N           number of model files, 0 automatic based on model size (default: %d)\n", params.n_parts);
     fprintf(stderr, "\n");
 }
 
diff --git a/utils.h b/utils.h
index 65fe02b..0939117 100644
--- a/utils.h
+++ b/utils.h
@@ -30,6 +30,7 @@ struct gpt_params {
 
     std::string model      = "models/lamma-7B/ggml-model.bin"; // model path
     std::string prompt     = "";
+    int32_t n_parts = 0; // default based on the model size
 
     bool random_prompt = false;
 

I'm not sure if people would prefer sharded model weights or not. If needed, I can make a pull request for the patch above.

@gjmulder (Collaborator)

@jxy

I'm not sure if people would prefer sharded model weights or not. If needed, I can make a pull request for the patch above.

Sharded would be preferred. And also SHA-256 sums for the Alpaca 7B, 13B and 30B models converted with the latest file format.

I'm trying to compile a complete list of SHA-256 checksums for all the *.bin models in the latest file format.

gjmulder pinned this issue Mar 27, 2023

DanielWicz commented Mar 27, 2023

Does the Alpaca model require quantization, or is it already quantized?

edmundronald commented Mar 28, 2023

Hi,

I don't think changing the model format is very smart, as people will end up with a bunch of HUGE incompatible files on their laptop disks. There will be a real issue figuring out which format works with which forked version downstream. It might be nice to have a base format, e.g. the current one, and with each version of llama.cpp or alpaca.cpp a script which converts to the needed format and writes the right magic numbers. I would expect a 100B and then a 200B or so model to arrive within a few months, which will make the space problem worse.

I am using alpaca.cpp and not llama, but I believe my comment is relevant to both.

anzz1 (Contributor) commented Mar 29, 2023

Reading the data release closely,

During inference (eg for the web demo), we use the user instruction with an empty input field (second option).

How it's currently implemented is how it's "supposed" to be used: the model was trained with the instruction-input-response format, but inference is done with the instruction-response format.

That being said, it is not a rule that it has to be used like that. The instruction-input-response option should be researched to see whether the output improves when using the two-part approach. These models are ongoing research, there is no right and wrong, and testing different things is fun and provides valuable feedback.

As a basic example, does this

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
List 3 ingredients for the following recipe.

### Input:
Spaghetti Bolognese

### Response:

produce better output than this

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
List 3 ingredients for the following recipe: Spaghetti Bolognese

### Response:

You can already try these out as one-shots as-is, by not using the --instruct or --interactive args and using -f prompt.txt to enter them as a regular prompt. After all, there is really nothing special about the --instruct mode except hiding the ### Instruction: and ### Response: parts and injecting them accordingly.

With the interactive mode, it could almost be done already with a command line like this:
main -m ./models/llama-13B-ggml/ggml-model-q4_0.bin -i -c 2024 -n -1 -f ./prompts/instruction-input.txt -r "### Instruction:"
However, it's not exactly the same, since you cannot input newlines in command-line arguments, and newlines should be part of the instruction reverse prompt. An option so that both the reverse prompt and the input prefix could be read from a file, as the prompt already can, would definitely be an enhancement.
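
For reference, a prompt file used that way could plausibly end like this (my guess at what ./prompts/instruction-input.txt contains, since the file itself isn't shown here):

```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
```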

If this is something that works well, it's not a problem at all to implement. The only hard part I can think of is communicating and describing the command-line options to the user so that it's understandable how it works. 😄

@MarkSchmidty

Vicuna appears to be trained to use:

### Assistant:
Text

### Human:
Text

Using "### Human:" as a reverse prompt partially works. But instruct mode support could be cleaner.


ai2p commented Apr 11, 2023

"### Human:"

How can I remove the "### Human:" part from Vicuna's responses?

ggerganov unpinned this issue Apr 12, 2023
Thireus added a commit to Thireus/text-generation-webui that referenced this issue May 4, 2023
Alpaca format compatibility with openai's 'system', 'assistant' and 'user' roles lowercase.

Alpaca prompt syntax is as follows:
```
Task description...

### Instruction:
Summarize following text.

### Input:
Text to be summarized

### Response:
```

https://github.com/tatsu-lab/stanford_alpaca/blob/65512697dc67779a6e53c267488aba0ec4d7c02a/train.py#L31

ggerganov/llama.cpp#302

Notice how the "roles" (Instruction, Input, Response) start with a capital letter.

Same goes when using the following format:
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Write a poem about the transformers Python library. 

### Response:
```

Or, for OpenAI compatibility:
```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### User:
Write a poem about the transformers Python library. 

### Assistant:
```

phamkhactu commented Jul 18, 2023

(quoting the original issue description above)

Hi all,

I have a question: if I want to combine the two instruction types, with input and without input, how can I do that?
Why do I want that? Sometimes you don't need to give an input to convey the context, but sometimes you do.

I have followed the instruction-with-input format and, when there is no input, I set it to "". But I get an error ("Can not cast string"), because the string can't be parsed: some prompts have input text, but some prompts don't.

I would be very happy if you could suggest a better method. Or can the two formats above only be used separately, not combined?
Thank you.
