From 5aa4e0a87b318b25c185b1b2608330d64e51cb8d Mon Sep 17 00:00:00 2001
From: Pedro Cuenca
Date: Thu, 20 Jul 2023 15:01:53 +0200
Subject: [PATCH 01/11] [Llama2] Section on system prompt

---
 llama2.md | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/llama2.md b/llama2.md
index 082192d87a..548da3ebb7 100644
--- a/llama2.md
+++ b/llama2.md
@@ -33,6 +33,7 @@ We’ve collaborated with Meta to ensure smooth integration into the Hugging Fac
 - [With Transformers](#using-transformers)
 - [With Inference Endpoints](#using-text-generation-inference-and-inference-endpoints)
 - [Fine-tuning with PEFT](#fine-tuning-with-peft)
+- [How to Prompt Llama 2](#how-to-prompt-llama-2)
 - [Additional Resources](#additional-resources)
 - [Conclusion](#conclusion)
 
@@ -177,12 +178,43 @@ python trl/examples/scripts/sft_trainer.py \
     --gradient_accumulation_steps 2
 ```
 
+## How to Prompt Llama 2
+
+One of the unsung advantages of open-access models is that you have full control over the _system prompt_ in chat applications. This is essential to specify the behaviour of your chat assistant –and even imbue it with some personality–, but it's unreachable in models served behind APIs.
+
+If you take a close look at the source code of our Llama 2 70B demo, you can see [how the system prompt is formatted](https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI/blob/dc2b3191cca384687bbed001fcb6baedaf8d732b/app.py#L38-L42) with delimiters around the [instructions given to the system](https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI/blob/main/app.py#L12) and the text entered by the user.
+
+To spell it out in full clarity, this is what is actually sent to the language model when the user enters some text (`There's a llama in my garden 😱 What should I do?`) to initiate a chat:
+
+```
+[INST] <<SYS>>
+You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
+
+If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
+<</SYS>>
+There's a llama in my garden 😱 What should I do?[/INST]
+```
+
+As you can see, the instructions between the special `<<SYS>>` tokens provide context for the model so it knows how we expect it to respond. This works because these tokens were used during training with a wide variety of combinations for different tasks.
+
+As the conversation progresses, _all_ the conversation between the human and the "bot" are appended to the previous prompt, enclosed between `[INST]` delimiters. The model is stateless and does not "remember" previous fragments of the conversation, we must always supply it with all the context so the conversation can continue.
+
+This is the reason why **context length** is a very important parameter to maximize, as it allows for longer conversations and larger amounts of information to be used.
+
+### Ignore previous instructions
+
+In API-based models, people resort to tricks in an attempt to override the system prompt and change the default model behaviour. As imaginative as these solutions are, this is not necessary in open-access models: anyone can use a different prompt, as long as it follows the format described above. We believe that this will be an important tool for researchers to study the impact of prompts on both desired and unwanted characteristics. For example, when people [are surprised with absurdly cautious generations](https://twitter.com/lauraruis/status/1681612002718887936), you can explore whether maybe [a different prompt would work](https://twitter.com/overlordayn/status/1681631554672513025).
+
+In our [`13B`](https://huggingface.co/spaces/huggingface-projects/llama-2-13b-chat) and [`7B`](https://huggingface.co/spaces/huggingface-projects/llama-2-7b-chat) demos you can easily explore this feature by disclosing the "Advanced Options" UI and simply writing your desired instructions. You can also duplicate those demos and use them privately for fun or research!
+
 ## Additional Resources
 
 - [Paper Page](https://huggingface.co/papers/2307.09288)
 - [Models on the Hub](https://huggingface.co/meta-llama)
 - [Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 - [Meta Examples and recipes for Llama model](https://github.com/facebookresearch/llama-recipes/tree/main)
+- [`Chat demo (13B)`](https://huggingface.co/spaces/huggingface-projects/llama-2-13b-chat)
+- [`Chat demo (7B)`](https://huggingface.co/spaces/huggingface-projects/llama-2-7b-chat)
 
 ## Conclusion

From f5d647ca1d42d8da0b65783b8b1bc3954dc4533b Mon Sep 17 00:00:00 2001
From: Pedro Cuenca
Date: Thu, 20 Jul 2023 15:06:25 +0200
Subject: [PATCH 02/11] h/t clefourrier

---
 llama2.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llama2.md b/llama2.md
index 548da3ebb7..2ca24b5c42 100644
--- a/llama2.md
+++ b/llama2.md
@@ -203,7 +203,7 @@ This is the reason why **context length** is a very important parameter to maxim
 
 ### Ignore previous instructions
 
-In API-based models, people resort to tricks in an attempt to override the system prompt and change the default model behaviour. As imaginative as these solutions are, this is not necessary in open-access models: anyone can use a different prompt, as long as it follows the format described above. We believe that this will be an important tool for researchers to study the impact of prompts on both desired and unwanted characteristics. For example, when people [are surprised with absurdly cautious generations](https://twitter.com/lauraruis/status/1681612002718887936), you can explore whether maybe [a different prompt would work](https://twitter.com/overlordayn/status/1681631554672513025).
+In API-based models, people resort to tricks in an attempt to override the system prompt and change the default model behaviour. As imaginative as these solutions are, this is not necessary in open-access models: anyone can use a different prompt, as long as it follows the format described above. We believe that this will be an important tool for researchers to study the impact of prompts on both desired and unwanted characteristics. For example, when people [are surprised with absurdly cautious generations](https://twitter.com/lauraruis/status/1681612002718887936), you can explore whether maybe [a different prompt would work](https://twitter.com/overlordayn/status/1681631554672513025). (🎩 h/t [Clémentine Fourrier](https://huggingface.co/clefourrier) for the links to this example).
 
 In our [`13B`](https://huggingface.co/spaces/huggingface-projects/llama-2-13b-chat) and [`7B`](https://huggingface.co/spaces/huggingface-projects/llama-2-7b-chat) demos you can easily explore this feature by disclosing the "Advanced Options" UI and simply writing your desired instructions. You can also duplicate those demos and use them privately for fun or research!
From 1095a84523b0276d6e7607f9aed921917418aa61 Mon Sep 17 00:00:00 2001
From: Pedro Cuenca
Date: Thu, 20 Jul 2023 19:41:54 +0200
Subject: [PATCH 03/11] Update llama2.md

Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
---
 llama2.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llama2.md b/llama2.md
index 2ca24b5c42..f9afb8fad0 100644
--- a/llama2.md
+++ b/llama2.md
@@ -180,7 +180,7 @@ python trl/examples/scripts/sft_trainer.py \
 
 ## How to Prompt Llama 2
 
-One of the unsung advantages of open-access models is that you have full control over the _system prompt_ in chat applications. This is essential to specify the behaviour of your chat assistant –and even imbue it with some personality–, but it's unreachable in models served behind APIs.
+One of the unsung advantages of open-access models is that you have full control over the `system` prompt in chat applications. This is essential to specify the behavior of your chat assistant –and even imbue it with some personality–, but it's unreachable in models served behind APIs.
 
 If you take a close look at the source code of our Llama 2 70B demo, you can see [how the system prompt is formatted](https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI/blob/dc2b3191cca384687bbed001fcb6baedaf8d732b/app.py#L38-L42) with delimiters around the [instructions given to the system](https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI/blob/main/app.py#L12) and the text entered by the user.

From 08cb2843ab902c259a286a7934605646376d835e Mon Sep 17 00:00:00 2001
From: Pedro Cuenca
Date: Thu, 20 Jul 2023 19:54:57 +0200
Subject: [PATCH 04/11] Add prompt templates as suggested by Philipp.

Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
---
 llama2.md | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/llama2.md b/llama2.md
index f9afb8fad0..836bf18e82 100644
--- a/llama2.md
+++ b/llama2.md
@@ -184,10 +184,19 @@ One of the unsung advantages of open-access models is that you have full control
 If you take a close look at the source code of our Llama 2 70B demo, you can see [how the system prompt is formatted](https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI/blob/dc2b3191cca384687bbed001fcb6baedaf8d732b/app.py#L38-L42) with delimiters around the [instructions given to the system](https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI/blob/main/app.py#L12) and the text entered by the user.
 
-To spell it out in full clarity, this is what is actually sent to the language model when the user enters some text (`There's a llama in my garden 😱 What should I do?`) to initiate a chat:
+The prompt template for the first turn looks like this:
 
 ```
 [INST] <<SYS>>
+{{ system_prompt }}
+<</SYS>>
+{{ user_message }} [/INST]
+```
+
+To spell it out in full clarity, this is what is actually sent to the language model when the user enters some text (`There's a llama in my garden 😱 What should I do?`) to initiate a chat:
+
+```b
+[INST] <<SYS>>
 You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
 
 If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
@@ -197,9 +206,17 @@ There's a llama in my garden 😱 What should I do?[/INST]
 
 As you can see, the instructions between the special `<<SYS>>` tokens provide context for the model so it knows how we expect it to respond. This works because these tokens were used during training with a wide variety of combinations for different tasks.
 
-As the conversation progresses, _all_ the conversation between the human and the "bot" are appended to the previous prompt, enclosed between `[INST]` delimiters. The model is stateless and does not "remember" previous fragments of the conversation, we must always supply it with all the context so the conversation can continue.
+As the conversation progresses, _all_ the conversation between the human and the "bot" are appended to the previous prompt, enclosed between `[INST]` delimiters. The template used during multi-turn conversations follows this structure:
+
+```b
+[INST] <<SYS>>
+{{ system_prompt }}
+<</SYS>>
+
+{{ user_msg_1 }} [/INST] {{ assistant_response_1 }}[INST] {{ user_msg_2 }} [/INST]
+```
 
-This is the reason why **context length** is a very important parameter to maximize, as it allows for longer conversations and larger amounts of information to be used.
+The model is stateless and does not "remember" previous fragments of the conversation, we must always supply it with all the context so the conversation can continue. This is the reason why **context length** is a very important parameter to maximize, as it allows for longer conversations and larger amounts of information to be used.
 
 ### Ignore previous instructions

From 0b28b2575dde9cc04520c1836db8480e9d91499e Mon Sep 17 00:00:00 2001
From: Pedro Cuenca
Date: Thu, 20 Jul 2023 20:11:45 +0200
Subject: [PATCH 05/11] Align prompt templates with spec

---
 llama2.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/llama2.md b/llama2.md
index 836bf18e82..51314e7e3a 100644
--- a/llama2.md
+++ b/llama2.md
@@ -187,7 +187,7 @@ If you take a close look at the source code of our Llama 2 70B demo, you can see
 The prompt template for the first turn looks like this:
 
 ```
-[INST] <<SYS>>
+<s>[INST] <<SYS>>
 {{ system_prompt }}
 <</SYS>>
 {{ user_message }} [/INST]
@@ -196,12 +196,12 @@ The prompt template for the first turn looks like this:
 To spell it out in full clarity, this is what is actually sent to the language model when the user enters some text (`There's a llama in my garden 😱 What should I do?`) to initiate a chat:
 
 ```b
-[INST] <<SYS>>
+<s>[INST] <<SYS>>
 You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
 
 If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
 <</SYS>>
-There's a llama in my garden 😱 What should I do?[/INST]
+There's a llama in my garden 😱 What should I do? [/INST]
 ```
 
 As you can see, the instructions between the special `<<SYS>>` tokens provide context for the model so it knows how we expect it to respond. This works because these tokens were used during training with a wide variety of combinations for different tasks.
@@ -209,11 +209,12 @@
 As the conversation progresses, _all_ the conversation between the human and the "bot" are appended to the previous prompt, enclosed between `[INST]` delimiters. The template used during multi-turn conversations follows this structure:
 
 ```b
-[INST] <<SYS>>
+<s>[INST] <<SYS>>
 {{ system_prompt }}
 <</SYS>>
-
-{{ user_msg_1 }} [/INST] {{ assistant_response_1 }}[INST] {{ user_msg_2 }} [/INST]
+{{ user_msg_1 }} [/INST] {{ model_answer_1 }}
+[INST] {{ user_msg_2 }} [/INST] {{ model_answer_2 }}
+[INST] {{ user_msg_3 }} [/INST]
 ```
 
 The model is stateless and does not "remember" previous fragments of the conversation, we must always supply it with all the context so the conversation can continue. This is the reason why **context length** is a very important parameter to maximize, as it allows for longer conversations and larger amounts of information to be used.

From 71153530ded9906bcc32de94f3aee1659c24bbf8 Mon Sep 17 00:00:00 2001
From: Pedro Cuenca
Date: Thu, 20 Jul 2023 20:23:01 +0200
Subject: [PATCH 06/11] More clarification that the format is important.

---
 llama2.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/llama2.md b/llama2.md
index 51314e7e3a..affff61243 100644
--- a/llama2.md
+++ b/llama2.md
@@ -193,6 +193,8 @@ The prompt template for the first turn looks like this:
 {{ user_message }} [/INST]
 ```
 
+This template follows the model's training procedure. We can use any `system_prompt` we want, but it's important that the format matches the one used during training.
+
 To spell it out in full clarity, this is what is actually sent to the language model when the user enters some text (`There's a llama in my garden 😱 What should I do?`) to initiate a chat:
 
@@ -204,7 +206,7 @@ If a question does not make any sense, or is not factually coherent, explain why
 There's a llama in my garden 😱 What should I do? [/INST]
 ```
 
-As you can see, the instructions between the special `<<SYS>>` tokens provide context for the model so it knows how we expect it to respond. This works because these tokens were used during training with a wide variety of combinations for different tasks.
+As you can see, the instructions between the special `<<SYS>>` tokens provide context for the model so it knows how we expect it to respond. This works because exactly the same format was used during training with a wide variety of system prompts intended for different tasks.

From 48238a37bc92eb68e6483c40567448ce6a4759f7 Mon Sep 17 00:00:00 2001
From: Pedro Cuenca
Date: Fri, 21 Jul 2023 01:14:05 +0200
Subject: [PATCH 07/11] Link to 70B demo.
---
 llama2.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/llama2.md b/llama2.md
index affff61243..dc3d344292 100644
--- a/llama2.md
+++ b/llama2.md
@@ -233,8 +233,9 @@ In our [`13B`](https://huggingface.co/spaces/huggingface-projects/llama-2-13b-ch
 - [Models on the Hub](https://huggingface.co/meta-llama)
 - [Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 - [Meta Examples and recipes for Llama model](https://github.com/facebookresearch/llama-recipes/tree/main)
-- [`Chat demo (13B)`](https://huggingface.co/spaces/huggingface-projects/llama-2-13b-chat)
-- [`Chat demo (7B)`](https://huggingface.co/spaces/huggingface-projects/llama-2-7b-chat)
+- [Chat demo (7B)](https://huggingface.co/spaces/huggingface-projects/llama-2-7b-chat)
+- [Chat demo (13B)](https://huggingface.co/spaces/huggingface-projects/llama-2-13b-chat)
+- [Chat demo (70B) on TGI](https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI)
 
 ## Conclusion

From 5fe7b7f3ae495e757d87cb4d0b7a4c9c44ea3044 Mon Sep 17 00:00:00 2001
From: Pedro Cuenca
Date: Fri, 21 Jul 2023 11:50:57 +0200
Subject: [PATCH 08/11] Fix templates according to spec shared by ArthurZ

---
 llama2.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/llama2.md b/llama2.md
index dc3d344292..a01984ba19 100644
--- a/llama2.md
+++ b/llama2.md
@@ -190,6 +190,7 @@ The prompt template for the first turn looks like this:
 <s>[INST] <<SYS>>
 {{ system_prompt }}
 <</SYS>>
+
 {{ user_message }} [/INST]
 ```
 
@@ -203,6 +204,7 @@ You are a helpful, respectful and honest assistant. Always answer as helpfully a
 If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
 <</SYS>>
+
 There's a llama in my garden 😱 What should I do? [/INST]
 ```
 
@@ -214,8 +216,9 @@ As the conversation progresses, _all_ the conversation between the human and the
 <s>[INST] <<SYS>>
 {{ system_prompt }}
 <</SYS>>
-{{ user_msg_1 }} [/INST] {{ model_answer_1 }}
-[INST] {{ user_msg_2 }} [/INST] {{ model_answer_2 }}
+
+{{ user_msg_1 }} [/INST] {{ model_answer_1 }} \
+[INST] {{ user_msg_2 }} [/INST] {{ model_answer_2 }} \
 [INST] {{ user_msg_3 }} [/INST]
 ```

From cf5e8927a48c37e4b26af2d5adec69e0ff6f75e1 Mon Sep 17 00:00:00 2001
From: Pedro Cuenca
Date: Fri, 21 Jul 2023 11:56:35 +0200
Subject: [PATCH 09/11] Mention ArthurZ

---
 llama2.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llama2.md b/llama2.md
index a01984ba19..7f727d23d6 100644
--- a/llama2.md
+++ b/llama2.md
@@ -194,7 +194,7 @@ The prompt template for the first turn looks like this:
 {{ user_message }} [/INST]
 ```
 
-This template follows the model's training procedure. We can use any `system_prompt` we want, but it's important that the format matches the one used during training.
+This template (🎩 h/t [Arthur Zucker](https://huggingface.co/ArthurZ)) follows the model's training procedure. We can use any `system_prompt` we want, but it's crucial that the format matches the one used during training.
 
 To spell it out in full clarity, this is what is actually sent to the language model when the user enters some text (`There's a llama in my garden 😱 What should I do?`) to initiate a chat:

From c294f6cc11b604776453751696a17950353f3cfd Mon Sep 17 00:00:00 2001
From: Pedro Cuenca
Date: Fri, 21 Jul 2023 12:22:09 +0200
Subject: [PATCH 10/11] Suggestion from Omar

Co-authored-by: Omar Sanseviero
---
 llama2.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llama2.md b/llama2.md
index 7f727d23d6..fa772365ed 100644
--- a/llama2.md
+++ b/llama2.md
@@ -228,7 +228,7 @@ The model is stateless and does not "remember" previous fragments of the convers
 
 In API-based models, people resort to tricks in an attempt to override the system prompt and change the default model behaviour. As imaginative as these solutions are, this is not necessary in open-access models: anyone can use a different prompt, as long as it follows the format described above. We believe that this will be an important tool for researchers to study the impact of prompts on both desired and unwanted characteristics. For example, when people [are surprised with absurdly cautious generations](https://twitter.com/lauraruis/status/1681612002718887936), you can explore whether maybe [a different prompt would work](https://twitter.com/overlordayn/status/1681631554672513025). (🎩 h/t [Clémentine Fourrier](https://huggingface.co/clefourrier) for the links to this example).
 
-In our [`13B`](https://huggingface.co/spaces/huggingface-projects/llama-2-13b-chat) and [`7B`](https://huggingface.co/spaces/huggingface-projects/llama-2-7b-chat) demos you can easily explore this feature by disclosing the "Advanced Options" UI and simply writing your desired instructions. You can also duplicate those demos and use them privately for fun or research!
+In our [`13B`](https://huggingface.co/spaces/huggingface-projects/llama-2-13b-chat) and [`7B`](https://huggingface.co/spaces/huggingface-projects/llama-2-7b-chat) demos, you can easily explore this feature by disclosing the "Advanced Options" UI and simply writing your desired instructions. You can also duplicate those demos and use them privately for fun or research!
 
 ## Additional Resources

From 4266ea83949dd685eb434bb315728667f4163f59 Mon Sep 17 00:00:00 2001
From: Pedro Cuenca
Date: Fri, 21 Jul 2023 14:30:45 +0200
Subject: [PATCH 11/11] Apply suggestions.

Co-authored-by: Omar Sanseviero
---
 llama2.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/llama2.md b/llama2.md
index fa772365ed..93511b615e 100644
--- a/llama2.md
+++ b/llama2.md
@@ -182,7 +182,7 @@ python trl/examples/scripts/sft_trainer.py \
 
 One of the unsung advantages of open-access models is that you have full control over the `system` prompt in chat applications. This is essential to specify the behavior of your chat assistant –and even imbue it with some personality–, but it's unreachable in models served behind APIs.
 
-If you take a close look at the source code of our Llama 2 70B demo, you can see [how the system prompt is formatted](https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI/blob/dc2b3191cca384687bbed001fcb6baedaf8d732b/app.py#L38-L42) with delimiters around the [instructions given to the system](https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI/blob/main/app.py#L12) and the text entered by the user.
+We're adding this section just a few days after the initial release of Llama 2, as we've had many questions from the community about how to prompt the models and how to change the system prompt. We hope this helps!
 
 The prompt template for the first turn looks like this:
 
 ```
 <s>[INST] <<SYS>>
 {{ system_prompt }}
 <</SYS>>
 
 {{ user_message }} [/INST]
 ```
 
-This template (🎩 h/t [Arthur Zucker](https://huggingface.co/ArthurZ)) follows the model's training procedure. We can use any `system_prompt` we want, but it's crucial that the format matches the one used during training.
+This template follows the model's training procedure, as described in [the Llama 2 paper](https://huggingface.co/papers/2307.09288). We can use any `system_prompt` we want, but it's crucial that the format matches the one used during training.
 
-To spell it out in full clarity, this is what is actually sent to the language model when the user enters some text (`There's a llama in my garden 😱 What should I do?`) to initiate a chat:
+To spell it out in full clarity, this is what is actually sent to the language model when the user enters some text (`There's a llama in my garden 😱 What should I do?`) in [our 13B chat demo](https://huggingface.co/spaces/huggingface-projects/llama-2-13b-chat) to initiate a chat:
 
 ```b
 <s>[INST] <<SYS>>
 You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
 
 If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
 <</SYS>>
 
 There's a llama in my garden 😱 What should I do? [/INST]
 ```
 
 As you can see, the instructions between the special `<<SYS>>` tokens provide context for the model so it knows how we expect it to respond. This works because exactly the same format was used during training with a wide variety of system prompts intended for different tasks.
 
-As the conversation progresses, _all_ the conversation between the human and the "bot" are appended to the previous prompt, enclosed between `[INST]` delimiters. The template used during multi-turn conversations follows this structure:
+As the conversation progresses, _all_ the interactions between the human and the "bot" are appended to the previous prompt, enclosed between `[INST]` delimiters. The template used during multi-turn conversations follows this structure (🎩 h/t [Arthur Zucker](https://huggingface.co/ArthurZ) for some final clarifications):
 
 ```b
 <s>[INST] <<SYS>>
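To make the template described in these patches concrete, here is a minimal Python sketch of the prompt-assembly logic. It is an illustration rather than code from the blog post or the demo Spaces: the helper name and signature are hypothetical, and since the series ends mid-template, the `</s><s>` separators between turns follow Meta's reference implementation of the Llama 2 chat format rather than a line visible above.

```python
# Minimal sketch of the Llama 2 chat format covered by the patches above.
# The helper name and signature are hypothetical, not taken from the demos.

DEFAULT_SYSTEM_PROMPT = (
    "You are a helpful, respectful and honest assistant. "
    "Always answer as helpfully as possible, while being safe."
)


def build_llama2_prompt(system_prompt, user_msgs, model_answers):
    """Serialize a conversation into a single Llama 2 chat prompt.

    Expects one more user message than model answers: the last user
    message is the one awaiting a reply.
    """
    if len(user_msgs) != len(model_answers) + 1:
        raise ValueError("expected exactly one pending user message")

    # First turn: the system prompt sits between the <<SYS>> markers.
    prompt = f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_msgs[0]} [/INST]"

    # Completed exchanges are appended verbatim. The model is stateless, so
    # the whole history is resent on every call; this is why context length
    # is such an important parameter to maximize.
    for answer, user_msg in zip(model_answers, user_msgs[1:]):
        prompt += f" {answer} </s><s>[INST] {user_msg} [/INST]"
    return prompt
```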
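Under the same assumptions, a second turn of the section's example conversation would be serialized like this (the follow-up message and the model's first reply are invented for illustration):

```python
prompt = build_llama2_prompt(
    system_prompt=DEFAULT_SYSTEM_PROMPT,
    user_msgs=[
        "There's a llama in my garden 😱 What should I do?",
        "Will it eat my flowers?",  # hypothetical follow-up message
    ],
    model_answers=[
        "Stay calm and keep your distance; llamas are usually docile.",  # hypothetical reply
    ],
)

# The resulting string can be sent as-is to a Llama 2 chat model,
# e.g. through a text-generation endpoint or a local pipeline.
print(prompt)
```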