# Meta-Llama-3-8B-Instruct

Ref. https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

In [None]:
# !pip install accelerate --user

In [1]:
# import os
# os.environ['HF_TOKEN']="Hugging Face Token"

In [None]:
from huggingface_hub import login
login()

The Llama3 models were trained using bfloat16, but the original inference uses float16. The checkpoints uploaded on the Hub use `torch_dtype = 'float16'`, which the AutoModel API uses to cast the checkpoints from `torch.float32` to `torch.float16`.
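To make the float16/bfloat16 trade-off concrete, the sketch below derives each format's largest finite value and machine epsilon from its exponent and mantissa bit widths (float16: 5 exponent and 10 mantissa bits; bfloat16: 8 and 7; float32: 8 and 23). It assumes IEEE-754-style layouts, and `format_limits` is a hypothetical helper; with PyTorch installed, `torch.finfo(torch.float16)` and `torch.finfo(torch.bfloat16)` report the same `max` and `eps` values.

```python
def format_limits(exp_bits: int, mant_bits: int) -> dict:
    """Largest finite value and machine epsilon of a binary float format.

    Assumes an IEEE-754-style layout: exponent bias 2**(exp_bits - 1) - 1,
    with the all-ones exponent reserved for inf/NaN.
    """
    bias = 2 ** (exp_bits - 1) - 1
    max_exponent = bias  # largest unbiased exponent of a finite number
    max_value = (2 - 2 ** -mant_bits) * 2.0 ** max_exponent
    eps = 2.0 ** -mant_bits  # gap between 1.0 and the next representable value
    return {"max": max_value, "eps": eps}


# (exponent bits, mantissa bits) for each format
FORMATS = {
    "float32": (8, 23),
    "bfloat16": (8, 7),
    "float16": (5, 10),
}

for name, (e, m) in FORMATS.items():
    lim = format_limits(e, m)
    print(f"{name:9s} max={lim['max']:.4g} eps={lim['eps']:.3g}")
```

bfloat16 keeps roughly the same range as float32 but with a much coarser epsilon, while float16 is more precise yet overflows above 65504, so values that are representable in bfloat16 can become `inf` in float16.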

In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # or torch.bfloat16, the training dtype
    device_map="auto",
)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.





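As a rough sizing check for the loading cell: weight memory is approximately the parameter count times the bytes per element. The sketch assumes a nominal 8 × 10⁹ parameters (taken from the model name, not an exact count) and ignores activations and the KV cache, which need memory on top of this. The helper `weight_memory_gib` is hypothetical. With `device_map="auto"`, Accelerate places the layers across the available GPUs and, if they do not fit, CPU memory.

```python
# Bytes used to store one element in each dtype
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 2**30


NUM_PARAMS = 8e9  # assumption: nominal size from the model name

for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_memory_gib(NUM_PARAMS, dtype):.1f} GiB")
```

In float16 the weights alone come to roughly 15 GiB, half of what float32 would need.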
In [2]:
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

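# Render the messages with the model's chat template; add_generation_prompt=True
# appends the assistant header so the model replies as the assistant
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop at whichever comes first: the tokenizer's EOS token or <|eot_id|> (end of turn)
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,        # upper bound on generated tokens
    eos_token_id=terminators,
    do_sample=True,            # sample instead of greedy decoding
    temperature=0.6,           # < 1.0 sharpens the next-token distribution
    top_p=0.9,                 # nucleus sampling: smallest token set covering 90% of the probability
)

# generate() returns prompt + continuation; keep only the newly generated tokens
response = outputs[0][input_ids.shape[-1]:]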

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
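The warning above appears because only `input_ids` were passed to `generate`. For a single unpadded prompt every position is a real token, so the mask would simply be all ones; it starts to matter once several prompts of different lengths are batched and padded. The pure-Python sketch below shows what the mask looks like with left padding, the side usually recommended for decoder-only generation so that new tokens follow directly after each prompt. `left_pad` is a hypothetical helper and the token IDs are made up; in practice the tokenizer builds these tensors, and passing `attention_mask=...` together with `pad_token_id=tokenizer.eos_token_id` to `model.generate` avoids this warning.

```python
from typing import List, Tuple


def left_pad(batch: List[List[int]], pad_id: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token ID sequences to equal length and build the attention mask.

    Mask convention (as in Hugging Face models): 1 = real token, 0 = padding.
    """
    width = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = width - len(seq)
        input_ids.append([pad_id] * n_pad + seq)
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


# Made-up token IDs for two prompts of different lengths.
ids, mask = left_pad([[11, 12, 13], [21, 22]], pad_id=0)
print(ids)   # [[11, 12, 13], [0, 21, 22]]
print(mask)  # [[1, 1, 1], [0, 1, 1]]
```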


In [3]:
print(tokenizer.decode(response, skip_special_tokens=True))

Arrrr, shiver me timbers! Me name be Captain Chat, the scurviest pirate chatbot to ever sail the Seven Seas o' Text! Me be here to chat ye up and swab the decks with me witty banter and clever responses. So hoist the colors, me hearty, and let's set sail fer a swashbucklin' good time!
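The `generate` call above combines three ideas: temperature scaling (`temperature=0.6` sharpens the distribution), nucleus sampling (`top_p=0.9` samples only from the smallest set of tokens whose probabilities sum to at least 0.9), and stopping as soon as any ID in `terminators` is produced. Its output also still contains the prompt, which is why the cell slices off the first `input_ids.shape[-1]` tokens. The sketch below reproduces that flow over a hard-coded toy distribution; `toy_generate`, `toy_logits`, the 5-token vocabulary and its IDs are illustrative assumptions, and the actual `transformers` implementation differs in detail (for example, it works on batched tensors).

```python
import math
import random
from typing import Callable, List, Sequence


def top_p_filter(probs: Sequence[float], top_p: float) -> List[int]:
    """Indices of the smallest set of tokens whose probability mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept


def sample_next(logits: Sequence[float], temperature: float, top_p: float,
                rng: random.Random) -> int:
    # Temperature scaling, then a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # random.choices normalises the weights, so this samples within the nucleus only.
    kept = top_p_filter(probs, top_p)
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def toy_generate(prompt: List[int], next_logits: Callable[[List[int]], List[float]],
                 terminators: Sequence[int], max_new_tokens: int,
                 temperature: float, top_p: float, seed: int = 0) -> List[int]:
    """Return prompt + generated tokens, stopping right after any terminator ID."""
    rng = random.Random(seed)
    seq = list(prompt)
    for _ in range(max_new_tokens):
        tok = sample_next(next_logits(seq), temperature, top_p, rng)
        seq.append(tok)
        if tok in terminators:
            break
    return seq


# Hypothetical 5-token vocabulary: ID 4 plays the role of "<|eot_id|>".
def toy_logits(seq: List[int]) -> List[float]:
    # Strongly favour the end-of-turn token once the sequence reaches five tokens.
    return [0.0, 1.0, 2.0, 0.5, 5.0 if len(seq) >= 5 else -5.0]


prompt = [1, 2]
output = toy_generate(prompt, toy_logits, terminators=[3, 4], max_new_tokens=10,
                      temperature=0.6, top_p=0.9)
response = output[len(prompt):]  # same idea as outputs[0][input_ids.shape[-1]:]
print(response)
```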


### Your Turn! Try Korean-language chatbot conversations, Python code generation, English-to-Korean translation, summarization, and more!
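For these exercises only the `messages` list needs to change; the rest of the generation cell can be reused. The sketch below builds message lists for a few of the suggested tasks and renders them roughly the way the Llama 3 Instruct chat template does: a header around each role, `<|eot_id|>` after each turn, and an open assistant header when `add_generation_prompt=True`. The helper names and system prompts are hypothetical, and the rendering is a simplified re-implementation for illustration; `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` shows the exact string the tokenizer produces.

```python
from typing import Dict, List

Message = Dict[str, str]

# Hypothetical system prompts for some of the suggested exercises.
TASK_SYSTEM_PROMPTS = {
    "chat_ko": "You are a helpful assistant. Always answer in Korean.",
    "codegen": "You are an expert Python programmer. Reply with code only.",
    "translate_en_ko": "Translate the user's English text into natural Korean.",
    "summarize": "Summarize the user's text in three sentences.",
}


def make_messages(task: str, user_text: str) -> List[Message]:
    """Build a chat-format message list for one of the tasks above."""
    return [
        {"role": "system", "content": TASK_SYSTEM_PROMPTS[task]},
        {"role": "user", "content": user_text},
    ]


def render_llama3_prompt(messages: List[Message], add_generation_prompt: bool = True) -> str:
    """Simplified rendering of the Llama 3 Instruct prompt format (illustration only)."""
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n")
        parts.append(msg["content"].strip() + "<|eot_id|>")
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


msgs = make_messages("translate_en_ko", "The weather is nice today.")
print(render_llama3_prompt(msgs))
```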

### Model Save & Load

In [None]:
model.save_pretrained("./Meta-Llama3-8B-Instruct")

In [None]:
tokenizer.save_pretrained("./Meta-Llama3-8B-Instruct")
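`model.save_pretrained` writes the model configuration (`config.json`) plus the weight files, and `tokenizer.save_pretrained` writes files such as `tokenizer_config.json`. The cells above save to `./Meta-Llama3-8B-Instruct`, while the loading cell below reads from a different absolute path, so it is worth confirming that `model_path` really contains the saved files. The hypothetical `looks_like_saved_model` below is a quick pre-flight check; the weight-file patterns it accepts (`*.safetensors` or `*.bin`) are assumptions based on the formats `transformers` commonly writes.

```python
import tempfile
from pathlib import Path

REQUIRED_FILES = ("config.json", "tokenizer_config.json")
WEIGHT_PATTERNS = ("*.safetensors", "*.bin")  # assumption: common weight formats


def looks_like_saved_model(path: str) -> bool:
    """Heuristic check that `path` holds a saved model and tokenizer."""
    root = Path(path)
    if not root.is_dir():
        return False
    if not all((root / name).is_file() for name in REQUIRED_FILES):
        return False
    # At least one weight file must be present.
    return any(any(root.glob(pattern)) for pattern in WEIGHT_PATTERNS)


# Self-contained demo with a temporary directory and placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    print(looks_like_saved_model(tmp))  # False: nothing saved yet
    for name in REQUIRED_FILES + ("model-00001-of-00004.safetensors",):
        (Path(tmp) / name).write_text("{}")
    print(looks_like_saved_model(tmp))  # True
```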

In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = "/group-volume/sr_edu/AI-Application-Specialist/LLM/Meta-Llama3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_path)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # or torch.bfloat16, the training dtype
    device_map="auto",
)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

