Fix llama3 urls + chat completion termination + nightlies in readme #443

Merged (6 commits) on Apr 19, 2024
Changes from 5 commits
40 changes: 27 additions & 13 deletions README.md
@@ -7,10 +7,10 @@ The 'llama-recipes' repository is a companion to the [Meta Llama 2](https://gith
> | Token | Description |
> |---|---|
> `<\|begin_of_text\|>` | This is equivalent to the BOS token. |
> `<\|eot_id\|>` | This signifies the end of the message in a turn. |
> `<\|eot_id\|>` | This signifies the end of the message in a turn. The generate function needs to be set up as shown below or in [this example](./recipes/inference/local_inference/chat_completion/chat_completion.py) to terminate the generation after the turn.|
> `<\|start_header_id\|>{role}<\|end_header_id\|>` | These tokens enclose the role for a particular message. The possible roles can be: system, user, assistant. |
> `<\|end_of_text\|>` | This is equivalent to the EOS token. On generating this token, Llama 3 will cease to generate more tokens |
>
> `<\|end_of_text\|>` | This is equivalent to the EOS token. It's usually not used during multi-turn conversations. Instead, each message is terminated with `<\|eot_id\|>`. |
>
> A multi-turn conversation with Llama 3 follows this prompt template:
> ```
> <|begin_of_text|><|start_header_id|>system<|end_header_id|>
@@ -23,10 +23,24 @@ The 'llama-recipes' repository is a companion to the [Meta Llama 2](https://gith
>
> {{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
> ```
> More details on the new tokenizer and prompt template: <PLACEHOLDER_URL>
>
> To signal the end of the current message the model emits the `<\|eot_id\|>` token. To terminate the generation we need to call the model's generate function as follows:
Contributor:
message the model => message, the model

Contributor:
generation we => generation, we

Contributor:
@mreso is it that the two EOS terminators should be used to stop generation early? Wondering if that matches our description here or if we need a bit of lingo?

Contributor (@subramen), Apr 19, 2024:
Is the `eos_token_id` arg in `model.generate` specifying the stop sequence for generation?

I think some lingo around the difference between `eot_id` and `end_of_text` usage would be helpful.

Contributor (author):
Thanks for the comments! Yes, the `eos_token_id` is the one that's checked in the stopping criteria, and usually that's set to `<|end_of_text|>`. But for dialog-style prompts the model is trained to use `<|eot_id|>` (probably to distinguish it from the more final end of sequence). That's why we need to swap the `eos_token_id` for the latter id. Otherwise generate rambles on, as in this example:

Model output:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Always answer with emojis<|eot_id|><|start_header_id|>user<|end_header_id|>

How to go from Beijing to NY?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

🛫️🚀🛬<|eot_id|><|start_header_id|>assistant<|end_header_id|>

🏨🛬🇨🇳                                                                                                                                                                                                                                 🕰️ 12+hours o
💺Business Class
[...]

The model learned that after an `<|eot_id|>` comes another header, so it adds `<|start_header_id|>assistant<|end_header_id|>` and then another response follows. (The header is usually appended by the chat template, not the model.)
If we exchange the `eos_token_id` in generate, it stops after the model emits the first `<|eot_id|>`:

Model output:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Always answer with emojis<|eot_id|><|start_header_id|>user<|end_header_id|>

How to go from Beijing to NY?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

✈️ 🗼️🛬<|eot_id|>

Will rework the text accordingly before merging.
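
For reference, a minimal end-to-end sketch of the behaviour described above. It is not taken from this PR; it assumes the Hugging Face `transformers` API, the `meta-llama/Meta-Llama-3-8B-Instruct` checkpoint, and a `transformers` version whose `generate` accepts a list for `eos_token_id`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative checkpoint choice
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

dialog = [
    {"role": "system", "content": "Always answer with emojis"},
    {"role": "user", "content": "How to go from Beijing to NY?"},
]
# The chat template adds the <|start_header_id|>/<|eot_id|> structure shown above.
input_ids = tokenizer.apply_chat_template(
    dialog, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stop on either the regular EOS token or <|eot_id|>; without the second entry
# generation keeps emitting new assistant turns, as in the first output above.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128, eos_token_id=terminators)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

This mirrors the `terminators` list the PR adds to the README and to `chat_completion.py` below.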

Contributor:
So this should have been addressed in this PR, which has been merged: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/discussions/4/files

> ```
> terminators = [
> tokenizer.eos_token_id,
> tokenizer.convert_tokens_to_ids("<|eot_id|>")
> ]
> ...
> outputs = model.generate(
> ...
> eos_token_id=terminators,
> )
> ```
>
> More details on the new tokenizer and prompt template: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3#special-tokens-used-with-meta-llama-3
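
As a quick way to inspect this format, the prompt above can be reproduced via the tokenizer's chat template. A minimal sketch, assuming the Hugging Face `meta-llama/Meta-Llama-3-8B-Instruct` tokenizer and its bundled chat template; the printed string should closely match the template shown earlier.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
dialog = [
    {"role": "system", "content": "{{ system_prompt }}"},
    {"role": "user", "content": "{{ user_message_1 }}"},
    {"role": "assistant", "content": "{{ model_answer_1 }}"},
    {"role": "user", "content": "{{ user_message_2 }}"},
]
# tokenize=False returns the raw string so the special tokens stay visible;
# add_generation_prompt=True appends the trailing assistant header.
print(tokenizer.apply_chat_template(dialog, tokenize=False, add_generation_prompt=True))
```

Printing the untokenized string makes it easy to verify that each turn ends with `<|eot_id|>` rather than `<|end_of_text|>`.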
> [!NOTE]
> The llama-recipes repository was recently refactored to promote a better developer experience of using the examples. Some files have been moved to new locations. The `src/` folder has NOT been modified, so the functionality of this repo and package is not impacted.
>
>
> Make sure you update your local clone by running `git pull origin main`

## Table of Contents
@@ -55,29 +69,29 @@ These instructions will get you a copy of the project up and running on your loc
### Prerequisites

#### PyTorch Nightlies
Some features (especially fine-tuning with FSDP + PEFT) currently require PyTorch nightlies to be installed. Please make sure to install the nightlies if you're using these features following [this guide](https://pytorch.org/get-started/locally/).
If you want to use PyTorch nightlies instead of the stable release, go to [this guide](https://pytorch.org/get-started/locally/) to retrieve the right `--extra-index-url URL` parameter for the `pip install` commands on your platform.

### Installing
Llama-recipes provides a pip distribution for easy install and usage in other projects. Alternatively, it can be installed from source.

#### Install with pip
```
pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 llama-recipes
pip install llama-recipes
```

#### Install with optional dependencies
Llama-recipes offers the installation of optional packages. There are three optional dependency groups.
To run the unit tests we can install the required dependencies with:
```
pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 llama-recipes[tests]
pip install llama-recipes[tests]
```
For the vLLM example we need additional requirements that can be installed with:
```
pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 llama-recipes[vllm]
pip install llama-recipes[vllm]
```
To use the sensitive topics safety checker install with:
```
pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 llama-recipes[auditnlg]
pip install llama-recipes[auditnlg]
```
Optional dependencies can also be combined with [option1,option2].

@@ -87,14 +101,14 @@ To install from source e.g. for development use these commands. We're using hatc
git clone git@github.com:meta-llama/llama-recipes.git
cd llama-recipes
pip install -U pip setuptools
pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 -e .
pip install -e .
```
For development and contributing to llama-recipes please install all optional dependencies:
```
git clone git@github.com:meta-llama/llama-recipes.git
cd llama-recipes
pip install -U pip setuptools
pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 -e .[tests,auditnlg,vllm]
pip install -e .[tests,auditnlg,vllm]
```


@@ -120,7 +134,7 @@ python src/transformers/models/llama/convert_llama_weights_to_hf.py \


## Repository Organization
Most of the code dealing with Llama usage is organized across 2 main folders: `recipes/` and `src/`.
Most of the code dealing with Llama usage is organized across 2 main folders: `recipes/` and `src/`.

### `recipes/`

recipes/inference/local_inference/chat_completion/chat_completion.py
@@ -75,6 +75,11 @@ def main(

chats = tokenizer.apply_chat_template(dialogs)

terminators = [
tokenizer.eos_token_id,
tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

with torch.no_grad():
for idx, chat in enumerate(chats):
safety_checker = get_safety_checker(enable_azure_content_safety,
@@ -113,6 +118,7 @@ def main(
top_k=top_k,
repetition_penalty=repetition_penalty,
length_penalty=length_penalty,
eos_token_id=terminators,
**kwargs
)

1 change: 1 addition & 0 deletions scripts/spellcheck_conf/wordlist.txt
@@ -1294,3 +1294,4 @@ EOS
eot
multiturn
tiktoken
eos
2 changes: 1 addition & 1 deletion tests/conftest.py
@@ -6,7 +6,7 @@
from transformers import AutoTokenizer

ACCESS_ERROR_MSG = "Could not access tokenizer at 'meta-llama/Llama-2-7b-hf'. Did you log into huggingface hub and provided the correct token?"
LLAMA_VERSIONS = ["meta-llama/Llama-2-7b-hf", "meta-llama/Llama-3-8b-hf"]
LLAMA_VERSIONS = ["meta-llama/Llama-2-7b-hf", "meta-llama/Meta-Llama-3-8B"]

@pytest.fixture(params=LLAMA_VERSIONS)
def llama_version(request):
2 changes: 1 addition & 1 deletion tests/datasets/test_custom_dataset.py
@@ -11,7 +11,7 @@
"example_1": "[INST] Who made Berlin [/INST] dunno",
"example_2": "[INST] Quiero preparar una pizza de pepperoni, puedes darme los pasos para hacerla? [/INST] Claro!",
},
"meta-llama/Llama-3-8b-hf":{
"meta-llama/Meta-Llama-3-8B":{
"example_1": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nWho made Berlin<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\ndunno<|eot_id|><|end_of_text|>",
"example_2": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHow to start learning guitar and become a master at it?",
},
2 changes: 1 addition & 1 deletion tests/datasets/test_grammar_datasets.py
@@ -10,7 +10,7 @@
"label": 1152,
"pos": 31,
},
"meta-llama/Llama-3-8b-hf":{
"meta-llama/Meta-Llama-3-8B":{
"label": 40,
"pos": 26,
},
2 changes: 1 addition & 1 deletion tests/datasets/test_samsum_datasets.py
@@ -10,7 +10,7 @@
"label": 8432,
"pos": 242,
},
"meta-llama/Llama-3-8b-hf":{
"meta-llama/Meta-Llama-3-8B":{
"label": 2250,
"pos": 211,
},
2 changes: 1 addition & 1 deletion tests/test_batching.py
@@ -9,7 +9,7 @@
"train": 96,
"eval": 42,
},
"meta-llama/Llama-3-8b-hf": {
"meta-llama/Meta-Llama-3-8B": {
"train": 79,
"eval": 34,
}