Add support for Llama 3 (and Llama-2-70b-hf) #549

Merged
merged 7 commits into TransformerLensOrg:main from joelburget:llama3 on Apr 24, 2024

Conversation

@joelburget
Contributor

Description

This adds all four current Llama 3 models and enables Llama-2-70b-hf. No new dependencies are required for this change (but you must have been granted access on Hugging Face to download the weights).
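
For example (a minimal sketch, assuming your Hugging Face account has been granted access and you are logged in locally), any of the new names can be passed straight to from_pretrained:

import transformer_lens

# Any of the newly supported names works the same way, e.g.
# "meta-llama/Meta-Llama-3-8B", "meta-llama/Meta-Llama-3-8B-Instruct",
# "meta-llama/Meta-Llama-3-70B", "meta-llama/Meta-Llama-3-70B-Instruct",
# or "meta-llama/Llama-2-70b-hf" (the 70B models need a lot of memory).
model = transformer_lens.HookedTransformer.from_pretrained("meta-llama/Meta-Llama-3-8B")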

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

I didn't add tests, but I did write a sanity check:

test.py:

import transformer_lens

# Load one of the newly added Llama 3 models into a HookedTransformer.
model = transformer_lens.HookedTransformer.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct"
)

prompts = [
    "Hey how are you doing today?",
    "Two households, both alike in dignity\n(In fair Verona, where we lay our scene),",
    "The Times 03/Jan/2009",
]

for prompt in prompts:
    print(prompt)
    print(model.generate(prompt))
    tokens = model.to_tokens(prompt)
    logits, cache = model.run_with_cache(tokens, remove_batch_dim=True)
    print(type(logits))
    print(type(cache))

output:

(transformer-lens-py3.11) root@db2b43c33b4c:/workspace/TransformerLens# python3 test.py
/workspace/TransformerLens/.venv/lib/python3.11/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 161.14it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [03:18<00:00, 49.55s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING:root:You are not using LayerNorm, so the writing weights can't be centered! Skipping
Loaded pretrained model meta-llama/Meta-Llama-3-8B-Instruct into HookedTransformer
Hey how are you doing today?
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:14<00:00,  1.46s/it]
Hey how are you doing today? I am having a pretty good day so far.
<class 'torch.Tensor'>
<class 'transformer_lens.ActivationCache.ActivationCache'>
Two households, both alike in dignity
(In fair Verona, where we lay our scene),
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:17<00:00,  1.74s/it]
Two households, both alike in dignity
(In fair Verona, where we lay our scene), - a grand and beautiful phrase. These words are
<class 'torch.Tensor'>
<class 'transformer_lens.ActivationCache.ActivationCache'>
The Times 03/Jan/2009
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:15<00:00,  1.53s/it]
The Times 03/Jan/2009
Breast cancer trial hears of ‘miracle
<class 'torch.Tensor'>
<class 'transformer_lens.ActivationCache.ActivationCache'>

@joelburget marked this pull request as draft April 20, 2024 18:57
@joelburget
Contributor Author

Looks like the docs fail to build because "Repo model meta-llama/Llama-2-7b-hf is gated. You must be authenticated to access it." Might be the same as #548.

@joelburget marked this pull request as ready for review April 20, 2024 19:32
@neelnanda-io
Collaborator

neelnanda-io commented Apr 20, 2024 via email

@joelburget
Contributor Author

joelburget commented Apr 21, 2024

Sure thing! Hopefully what I have now looks okay. Note that the docs are failing to build, but I think it's because of #548:

Cannot access gated repo for url https://huggingface.co/mistralai/Mistral-7B-v0.1/resolve/main/config.json.
Repo model mistralai/Mistral-7B-v0.1 is gated. You must be authenticated to access it.
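
(Aside for anyone hitting the same gated-repo error locally: authenticating with a Hugging Face token that has access to the repo clears it. A minimal sketch using huggingface_hub; the token value is a placeholder:)

from huggingface_hub import login

# Use a token from https://huggingface.co/settings/tokens for an account that
# has been granted access to the gated repo; "hf_..." is a placeholder.
login(token="hf_...")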

@joelburget
Contributor Author

joelburget commented Apr 21, 2024

A note in case someone needs to do this in the future:

Since the Llama 3 configs have exactly the same structure as the other Llama models (all LlamaForCausalLM), copy the shared config construction into a helper:

def mk_cfg(hf_config):
    return {
        "d_model": hf_config.hidden_size,
        "d_head": hf_config.hidden_size // hf_config.num_attention_heads,
        "n_heads": hf_config.num_attention_heads,
        "d_mlp": hf_config.intermediate_size,
        "n_layers": hf_config.num_hidden_layers,
        "n_ctx": hf_config.max_position_embeddings,
        "eps": hf_config.rms_norm_eps,
        "d_vocab": hf_config.vocab_size,
        "act_fn": hf_config.hidden_act,
        # The current implementation uses Grouped-Query Attention whenever
        # n_key_value_heads is not None, but hf_config.num_key_value_heads is
        # sometimes specified as the same as hf_config.num_attention_heads, in
        # which case GQA should not be used.
        "n_key_value_heads": (
            hf_config.num_key_value_heads
            if hf_config.num_key_value_heads != hf_config.num_attention_heads
            else None
        ),
        "normalization_type": "RMS",
        "positional_embedding_type": "rotary",
        "rotary_adjacent_pairs": False,
        "rotary_dim": hf_config.hidden_size // hf_config.num_attention_heads,
        "final_rms": True,
        "gated_mlp": True,
    }

Then, on a machine signed in to an HF account with access:

>>> from transformers import AutoConfig
>>> mk_cfg(AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B"))
{'d_model': 4096, 'd_head': 128, 'n_heads': 32, 'd_mlp': 14336, 'n_layers': 32, 'n_ctx': 8192, 'eps': 1e-05, 'd_vocab': 128256, 'act_fn': 'silu', 'n_key_value_heads': 8, 'normalization_type': 'RMS', 'positional_embedding_type': 'rotary', 'rotary_adjacent_pairs': False, 'rotary_dim': 128, 'final_rms': True, 'gated_mlp': True}
>>> mk_cfg(AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-70B"))
{'d_model': 8192, 'd_head': 128, 'n_heads': 64, 'd_mlp': 28672, 'n_layers': 80, 'n_ctx': 8192, 'eps': 1e-05, 'd_vocab': 128256, 'act_fn': 'silu', 'n_key_value_heads': 8, 'normalization_type': 'RMS', 'positional_embedding_type': 'rotary', 'rotary_adjacent_pairs': False, 'rotary_dim': 128, 'final_rms': True, 'gated_mlp': True}
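
As a quick check of the GQA condition above (a small sketch, again assuming access to the gated repos): Llama-2-7b-hf has equal query and key/value head counts, so its n_key_value_heads ends up None, while the Llama 3 models genuinely use Grouped-Query Attention:

from transformers import AutoConfig

# Models with fewer key/value heads than attention heads use Grouped-Query Attention.
for name in ["meta-llama/Llama-2-7b-hf", "meta-llama/Meta-Llama-3-8B"]:
    cfg = AutoConfig.from_pretrained(name)
    uses_gqa = cfg.num_key_value_heads != cfg.num_attention_heads
    print(name, cfg.num_attention_heads, cfg.num_key_value_heads, uses_gqa)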

@bryce13950
Collaborator

Yeah, that config is definitely a bit unruly. Revising it to eliminate duplicated code, or finding other ways to make it more manageable, is a worthwhile undertaking. The docs issue is resolved. As long as everything is still passing with the recent changes, I should be able to get this merged shortly.

@bryce13950 merged commit 2092dc9 into TransformerLensOrg:main Apr 24, 2024
8 checks passed
@joelburget deleted the llama3 branch April 25, 2024 00:03