# **Simple Generation Example**

### **Login To CLI**
- Make sure you have accepted the Llama 2 license agreement on the Hugging Face Hub before logging in

In [1]:
!huggingface-cli login --token TOKEN

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /home/rshaw/.cache/huggingface/token
Login successful
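The same login can be done from Python, which keeps the token out of the notebook cell. The following is a minimal sketch, assuming the token is stored in an environment variable called `HF_TOKEN`; the helper names `read_token` and `login_from_env` are hypothetical and not part of the original notebook. `huggingface_hub.login` is imported lazily so the token check can run on its own.

```python
import os


def read_token(env=os.environ, var="HF_TOKEN"):
    """Return the access token stored in `var`, or raise a clear error."""
    token = env.get(var, "").strip()
    if not token:
        raise RuntimeError(f"Set {var} to a Hugging Face access token first")
    return token


def login_from_env():
    """Log in to the Hugging Face Hub with the token from the environment."""
    # Imported here so read_token() works even without huggingface_hub installed.
    from huggingface_hub import login

    login(token=read_token())
```

Calling `login_from_env()` in a cell would then take the place of the `huggingface-cli login` command.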


### **Load Model**

- Load both models from the Hugging Face Hub

In [1]:
from transformers import LlamaForCausalLM, AutoTokenizer
import torch

MODEL_ID_0 = "meta-llama/Llama-2-7b-hf"
MODEL_ID_1 = "revellabs/llama2-reinvent-base-v1"

In [2]:
# bfloat16 needs an Ampere or newer GPU; use torch.float16 on older hardware
model_0 = LlamaForCausalLM.from_pretrained(MODEL_ID_0, torch_dtype=torch.bfloat16)
model_1 = LlamaForCausalLM.from_pretrained(MODEL_ID_1, torch_dtype=torch.bfloat16)

# Both tokenizers are loaded from MODEL_ID_0, so model_1 reuses the base model's tokenizer
tokenizer_0 = AutoTokenizer.from_pretrained(MODEL_ID_0)
tokenizer_1 = AutoTokenizer.from_pretrained(MODEL_ID_0)

if torch.cuda.is_available():
    model_0.to("cuda:0")
    model_1.to("cuda:1")

print(model_0.device)
print(model_1.device)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

cuda:0
cuda:1
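The loading cell hard-codes `torch.bfloat16` and two GPUs (`cuda:0` and `cuda:1`). As a rough sketch of how those choices could be made at runtime instead, the helpers below pick a dtype name and a device per model. The function names are hypothetical, the `float32` fallback on CPU is an assumption, and placing both 7B models on a single GPU may not fit in memory.

```python
def pick_dtype_name(cuda_available, bf16_supported):
    """Name of the torch dtype to load with, given the hardware capabilities."""
    if not cuda_available:
        return "float32"  # assumption: full precision when running on CPU
    return "bfloat16" if bf16_supported else "float16"


def assign_devices(num_models, gpu_count):
    """Spread models across the available GPUs round-robin, or fall back to CPU."""
    if gpu_count == 0:
        return ["cpu"] * num_models
    return [f"cuda:{i % gpu_count}" for i in range(num_models)]


def detect(num_models=2):
    """Query torch for the local hardware and return (dtype, device list)."""
    import torch  # imported lazily so the helpers above need no GPU stack

    available = torch.cuda.is_available()
    bf16 = available and torch.cuda.is_bf16_supported()
    dtype = getattr(torch, pick_dtype_name(available, bf16))
    gpu_count = torch.cuda.device_count() if available else 0
    return dtype, assign_devices(num_models, gpu_count)
```

With this in place, `dtype, devices = detect()` could supply `torch_dtype=dtype` to `from_pretrained` and the targets for `model_0.to(devices[0])` and `model_1.to(devices[1])`.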


### **Generate Text**

- See the [LLM generation tutorial](https://huggingface.co/docs/transformers/llm_tutorial) for an introduction
- See the [generation strategies guide](https://huggingface.co/docs/transformers/generation_strategies) for decoding options
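The generation strategies guide linked above covers decoding options such as sampling and beam search. As an illustrative sketch, the helper below builds keyword arguments for `model.generate` for three common strategies. The helper name `generation_kwargs` and the specific values (temperature 0.7, top-p 0.9, 4 beams) are assumptions chosen for illustration, not settings used in this notebook.

```python
def generation_kwargs(strategy="greedy", max_new_tokens=100):
    """Build keyword arguments for model.generate for a named decoding strategy."""
    kwargs = {"max_new_tokens": max_new_tokens}
    if strategy == "greedy":
        # Always pick the most likely next token.
        kwargs["do_sample"] = False
    elif strategy == "sample":
        # Draw tokens at random from a temperature-scaled, top-p truncated distribution.
        kwargs.update(do_sample=True, temperature=0.7, top_p=0.9)
    elif strategy == "beam":
        # Keep several candidate sequences and return the best-scoring one.
        kwargs.update(do_sample=False, num_beams=4, early_stopping=True)
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return kwargs
```

The result would be passed alongside the tokenized inputs, for example `model.generate(**inputs, **generation_kwargs("sample"))`.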

In [4]:
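def generate(prompt, model, tokenizer):
    """Generate up to 100 new tokens for `prompt` and print the decoded text."""
    # The Llama 2 tokenizer ships without a pad token, so reuse EOS for padding.
    tokenizer.pad_token_id = tokenizer.eos_token_id
    # Tokenize the prompt and move the input tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    model.eval()  # evaluation mode: disables dropout
    with torch.no_grad():  # gradients are not needed for generation
        generated_ids = model.generate(
            **inputs,
            max_new_tokens=100,
        )

    # The decoded text contains the prompt followed by the continuation.
    print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])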

What is AWS re:invent?
AWS re:Invent is the largest global customer and partner conference focused on the cloud.
AWS re:Invent is an annual event hosted by Amazon Web Services (AWS). It is the largest global customer and partner conference focused on the cloud. The conference is held in Las Vegas and typically runs from mid-November to early December.
AWS re:Invent 2022 will be held from November 28 to December 2, 


In [5]:
generate("What is AWS re:invent?", model_0, tokenizer_0)

What is AWS re:invent?
What is AWS re:invent? AWS re:invent is a conference hosted by Amazon Web Services (AWS) that brings together technology leaders, developers, and IT professionals from around the world to learn about the latest innovations in cloud computing and to connect with peers and experts in the field.
What is AWS re:invent? AWS re:invent is a conference hosted by Amazon Web Services (AWS) that brings together technology leaders, developers, and


In [6]:
generate("What is AWS re:invent?", model_1, tokenizer_1)

What is AWS re:invent? AWS re:invent is a learning conference hosted by Amazon Web Services (AWS) for the global cloud computing community. It is an opportunity for the cloud community to learn, connect, and be inspired by the latest innovations in cloud technologies, services, and best practices. AWS re:invent is a free event, and you can register here. What is AWS re:invent 2022? AWS re:invent 2022 is a hybrid
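Each output above begins with the prompt itself, because `generate` on a decoder-only model returns the prompt tokens followed by the continuation. One possible way to print only the new text is to drop the prompt tokens before decoding. The sketch below shows the slicing on plain Python lists; the helper name `strip_prompt` is hypothetical.

```python
def strip_prompt(generated_ids, prompt_length):
    """Drop the first `prompt_length` token ids from every generated sequence."""
    return [list(seq[prompt_length:]) for seq in generated_ids]


# Toy example: a 3-token prompt followed by 2 newly generated tokens.
new_tokens = strip_prompt([[101, 102, 103, 7, 8]], prompt_length=3)
```

On the tensors used in this notebook, the equivalent slice would be `generated_ids[:, inputs["input_ids"].shape[1]:]`, applied before the call to `tokenizer.batch_decode`.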
