# **Simple Generation Example**

### **Login To CLI**
- Make sure you have accepted the Llama 2 license agreement for the model on the Hugging Face Hub before logging in; the model repository is gated

In [1]:
!huggingface-cli login --token TOKEN

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /home/rshaw/.cache/huggingface/token
Login successful


### **Load Model**

- Load the model and tokenizer from the Hugging Face Hub

In [1]:
from transformers import LlamaForCausalLM, AutoTokenizer
import torch

MODEL_ID = "meta-llama/Llama-2-7b-hf"

In [2]:
model = LlamaForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)  # use torch.float16 on GPUs older than Ampere
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

if torch.cuda.is_available():
    model.to("cuda:0")
print(model.device)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

cuda:0
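
The dtype choice noted in the loading cell can be made explicit. The following is a minimal sketch, not part of the original notebook: `pick_dtype_name` is a hypothetical helper that maps a CUDA compute capability to a dtype name, assuming bfloat16 is natively supported from compute capability 8.0 (Ampere) onward and that float32 is a reasonable fallback on CPU.

```python
def pick_dtype_name(cuda_available, capability=(0, 0)):
    """Return the name of a torch dtype suited to the device (hypothetical helper).

    capability is a (major, minor) tuple such as the one returned by
    torch.cuda.get_device_capability().
    """
    if not cuda_available:
        # Assumption: fall back to full precision on CPU
        return "float32"
    major, _minor = capability
    # Assumption: Ampere and newer (major >= 8) support bfloat16 natively
    return "bfloat16" if major >= 8 else "float16"


# Usage inside the notebook (requires torch and transformers):
#   has_cuda = torch.cuda.is_available()
#   cap = torch.cuda.get_device_capability() if has_cuda else (0, 0)
#   dtype = getattr(torch, pick_dtype_name(has_cuda, cap))
#   model = LlamaForCausalLM.from_pretrained(MODEL_ID, torch_dtype=dtype)
```

PyTorch also provides `torch.cuda.is_bf16_supported()`, which may be a simpler check when only the bfloat16 question matters.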


### **Generate Text**

- See the [LLM generation tutorial](https://huggingface.co/docs/transformers/llm_tutorial) for more details
- See the [generation strategies guide](https://huggingface.co/docs/transformers/generation_strategies) for decoding options

In [5]:
prompt = "Hello my name is"

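# The Llama 2 tokenizer has no pad token by default, so reuse the EOS token for padding
tokenizer.pad_token_id = tokenizer.eos_token_id
# Move the inputs to the device the model is on (GPU if available, otherwise CPU)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

model.eval()
with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=100,
    )

print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])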

Hello my name is Nicole and I am a 26 year old female. I am looking for a roommate that is clean and respectful. I am a quiet person and I like to keep to myself. I do not smoke and I am not a party animal. I do like to go out and have fun but I am not a big drinker. I am a very friendly person and I am easy to get along with. I am not picky with the type of person I would like to
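
The call above relies on the model's default generation settings. As a hedged sketch, the dictionary below collects a few decoding options that `generate` accepts (`do_sample`, `temperature`, `top_p`). The name `sampling_kwargs` and the specific values are illustrative assumptions, not settings from the original notebook.

```python
# Illustrative decoding settings; the values are assumptions, tune them for your use case
sampling_kwargs = {
    "max_new_tokens": 100,  # same generation budget as the cell above
    "do_sample": True,      # sample from the distribution instead of taking the argmax
    "temperature": 0.7,     # values below 1 sharpen the distribution
    "top_p": 0.9,           # nucleus sampling: keep the smallest token set with cumulative mass 0.9
}

# Usage inside the notebook (requires the model and inputs defined earlier):
#   generated_ids = model.generate(**inputs, **sampling_kwargs)
```

With sampling enabled the output varies between runs, so the text will generally differ from the sample shown above.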


In [5]:
from transformers import TextStreamer

inputs = tokenizer(["An increasing sequence: one,"], return_tensors="pt").to(model.device)
streamer = TextStreamer(tokenizer)

# generate() still returns the token IDs; the streamer additionally prints the decoded text to stdout as it is produced.
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=100)

<s> An increasing sequence: one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-one, twenty-two, twenty-three, twenty-four, twenty-five, twenty-six, twenty-seven, twenty-eight, twenty-nine, thirty, thirty-one, thirty-two, thirty-three, thirty
