# Inference with base Phi-2

In [13]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

modelpath = "microsoft/phi-2"

# Load the base model in bfloat16 and let device_map place it on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    modelpath,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained(modelpath)

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.54it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [17]:
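prompt = "A bird is built without me. I have to be broken to let water out. What am I?"

# Tokenize the raw prompt and move the tensors to the device the model was loaded on.
input_tokens = tokenizer(prompt, return_tensors="pt").to(model.device)
output_tokens = model.generate(**input_tokens)
# The decoded text contains the prompt followed by the generated continuation.
output = tokenizer.decode(output_tokens[0])

print(output)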

A bird is built without me. I have to be broken to let water out. What am I?
Answer: A teacup. If you need more information, please ask a short question at the end of this answer. For example, "What material are teacups made from?"<|im_end|>
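
The decoded output above repeats the prompt and ends with an `<|im_end|>` marker. A minimal sketch of isolating just the generated answer follows; `strip_prompt_echo` is a hypothetical helper written for illustration, not part of `transformers`, and the decoded string is abbreviated from the cell output.

```python
def strip_prompt_echo(decoded: str, prompt: str, end_token: str = "<|im_end|>") -> str:
    """Return only the generated continuation from a decoded generate() output.

    Hypothetical helper for illustration; not a transformers API.
    """
    text = decoded
    # The decoded sequence is prompt + continuation, so drop the echoed prompt if present.
    if text.startswith(prompt):
        text = text[len(prompt):]
    # Cut at the first end-of-turn marker, if there is one.
    end = text.find(end_token)
    if end != -1:
        text = text[:end]
    return text.strip()


prompt = "A bird is built without me. I have to be broken to let water out. What am I?"
# Abbreviated copy of the decoded output shown above (assumption: shortened for brevity).
decoded = prompt + "\nAnswer: A teacup.<|im_end|>"

print(strip_prompt_echo(decoded, prompt))
```

Working on the decoded string keeps the sketch independent of the model; slicing the generated token IDs before decoding would be an alternative.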


# Inference with fine-tuned Phi-2 (ChatML prompt format)

In [15]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

modelpath = "nikad/phi-2_riddles-evolved"

# Load the fine-tuned checkpoint; trust_remote_code allows custom model code from the repo to run.
model = AutoModelForCausalLM.from_pretrained(
    modelpath,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    # attn_implementation="flash_attention_2",
)
tokenizer = AutoTokenizer.from_pretrained(modelpath)

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.07it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [16]:
question = "A bird is built without me. I have to be broken to let water out. What am I"
messages = [
    {"role": "user", "content": question},
]

# Render the conversation with the tokenizer's chat template and append the
# assistant header so that the model starts generating its reply.
input_tokens = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
output_tokens = model.generate(input_tokens, max_new_tokens=200)
output = tokenizer.decode(output_tokens[0])

print(output)


<|im_start|>user
A bird is built without me. I have to be broken to let water out. What am I<|im_end|>
<|im_start|>assistant
The answer to the riddle is a "glass." A glass is made by melting sand and other materials together, then cooling it down so that it becomes solid again. The process of making a glass involves breaking apart the raw materials in order to shape them into something new. Glass can also be described as being transparent or translucent because light passes through it easily. If you need further clarification on any aspect of this answer, please ask!<|im_end|>
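
The printed output shows the ChatML layout that the chat template produces: each turn is wrapped in `<|im_start|>role` and `<|im_end|>`, and the prompt ends with an open assistant header. The sketch below rebuilds that layout in plain Python and pulls out the assistant reply. It is reconstructed from the output above, not taken from the tokenizer's actual template, and `render_chatml` and `last_assistant_reply` are hypothetical helper names.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Format chat messages as ChatML text (layout inferred from the printed output)."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its answer.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


def last_assistant_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in a decoded ChatML string."""
    marker = f"{IM_START}assistant\n"
    start = decoded.rfind(marker)
    if start == -1:
        return ""
    reply = decoded[start + len(marker):]
    end = reply.find(IM_END)
    return (reply if end == -1 else reply[:end]).strip()


question = "A bird is built without me. I have to be broken to let water out. What am I"
prompt_text = render_chatml([{"role": "user", "content": question}])
print(prompt_text)

# Abbreviated reply used only to exercise the parser (assumption: shortened for brevity).
decoded = prompt_text + 'The answer to the riddle is a "glass."' + IM_END
print(last_assistant_reply(decoded))
```

In practice `tokenizer.apply_chat_template` should remain the source of truth for the prompt; a hand-written renderer like this is mainly useful for inspecting or post-processing the text.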
