<a href="https://colab.research.google.com/github/mllopart/google-colab-notebooks/blob/main/phi_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Run this notebook on a GPU runtime (T4 or better).

In [None]:
!pip install sentencepiece transformers accelerate einops

In [3]:
import torch

In [None]:
torch.cuda.is_available()

In [5]:
from huggingface_hub import snapshot_download

In [None]:
repo_id = 'microsoft/Phi-3-mini-4k-instruct'

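# Download the model files into ./phi-3; snapshot_download returns the local folder path.
model_path = snapshot_download(repo_id=repo_id, repo_type="model", local_dir="./phi-3", local_dir_use_symlinks=False)

# Use the relative folder path in the cells below.
model_path = "./phi-3"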

In [7]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Loading in float32 (the default) needs about 12 GB of VRAM.
# Run the next line to load in float16 instead, which roughly halves the memory needed for the weights.
torch.set_default_dtype(torch.float16)

model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True)
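
The float32 versus float16 choice in the cell above comes down to bytes per parameter. A minimal sketch of that arithmetic follows. `estimate_weight_memory_gb` is a hypothetical helper, the one-billion-parameter count is purely illustrative (it is not the size of Phi-3), and the estimate covers the weights only, ignoring activations and the KV cache.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def estimate_weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return the approximate size of the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Illustrative parameter count (an assumption, not taken from the notebook).
illustrative_params = 1_000_000_000
for dtype in ("float32", "float16"):
    print(dtype, estimate_weight_memory_gb(illustrative_params, dtype), "GB")
```

Real usage during generation is higher than this estimate, so treat it as a lower bound when picking a GPU.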

In [None]:
model.device

In [10]:
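import time
from typing import Optional

def generate(prompt: str, generation_params: Optional[dict] = None) -> str:
    # Avoid a mutable default argument; fall back to a total length of 200 tokens.
    if generation_params is None:
        generation_params = {"max_length": 200}

    s = time.time()

    # Move the inputs to the device the model was loaded on.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, **generation_params)
    completion = tokenizer.batch_decode(outputs)[0]

    elapsed = time.time() - s

    # The output sequence includes the prompt tokens, so subtract them to count new tokens.
    num_input_tokens = inputs['input_ids'].shape[1]
    num_total_tokens = outputs.shape[1]
    num_output_tokens = num_total_tokens - num_input_tokens
    speed = num_output_tokens / elapsed

    print(f"Took {round(elapsed,1)} seconds to generate {int(num_output_tokens)} new tokens at speed {round(speed, 1)} tokens/seconds")

    return completion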

In [11]:
prompt = \
'''
<|system|>
You are a helpful AI assistant.<|end|>
<|user|>
How to explain Internet for a medieval knight?<|end|>
<|assistant|>
'''
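
The prompt above follows a fixed pattern: each turn is wrapped in a role tag and closed with `<|end|>`, and the string ends with an open `<|assistant|>` tag. A minimal sketch of building that string from a list of messages follows. `build_phi3_prompt` is a hypothetical helper written for illustration, and it omits the leading newline of the hand-written prompt.

```python
def build_phi3_prompt(messages: list[dict]) -> str:
    """Join role/content pairs into the tagged layout used in this notebook."""
    parts = []
    for message in messages:
        # Each turn: role tag, newline, content, end marker.
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    # Leave the assistant turn open so the model writes the reply.
    parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "How to explain Internet for a medieval knight?"},
]
prompt_from_messages = build_phi3_prompt(messages)
print(prompt_from_messages)
```

The tokenizer may also ship its own chat template, reachable through `tokenizer.apply_chat_template`; whether its output matches this layout exactly is not checked here.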

In [12]:
result = generate(prompt, generation_params={"max_length":200})
result



Took 9.5 seconds to generate 174 new tokens at speed 18.4 tokens/seconds


'<s> \n<|system|> You are a helpful AI assistant.<|end|><|user|> How to explain Internet for a medieval knight?<|end|><|assistant|> Greetings, noble knight! I understand that the concept of the Internet may seem foreign and complex, but let me attempt to explain it to you in a way that relates to your time and experiences.\n\nImagine the Internet as a vast, magical library that spans the entire kingdom, connecting every castle, town, and village. This library is filled with countless scrolls, each containing knowledge, stories, and information about the world. However, instead of being physically present in one location, these scrolls are stored in a mystical realm called "cyberspace."\n\nTo access this library, you would need a special tool called a "computing device," which is akin to a magical crystal ball. This crystal ball allows you to connect with other crystal balls across the kingdom, enabling'
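
The run above produced 174 new tokens rather than 200 because `max_length` counts the prompt tokens as well as the generated ones. A small sketch of that arithmetic follows. `new_token_budget` is a hypothetical helper, and the prompt length of 26 tokens is inferred from the log (200 - 174), not measured separately.

```python
def new_token_budget(max_length: int, num_prompt_tokens: int) -> int:
    """Return how many tokens can be generated before max_length is reached."""
    return max(max_length - num_prompt_tokens, 0)


# 26 prompt tokens is inferred from the recorded run, not measured here.
print(new_token_budget(200, 26))  # 174
```

To cap only the generated part regardless of prompt length, `model.generate` also accepts `max_new_tokens`.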

In [13]:
result = generate(prompt, generation_params={"max_length":200})
result

Took 7.3 seconds to generate 174 new tokens at speed 23.9 tokens/seconds


'<s> \n<|system|> You are a helpful AI assistant.<|end|><|user|> How to explain Internet for a medieval knight?<|end|><|assistant|> Greetings, noble knight! I understand that the concept of the Internet may seem foreign and complex, but let me attempt to explain it to you in a way that relates to your time and experiences.\n\nImagine the Internet as a vast, magical library that spans the entire kingdom, connecting every castle, town, and village. This library is filled with countless scrolls, each containing knowledge, stories, and information about the world. However, instead of being physically present in one location, these scrolls are stored in a mystical realm called "cyberspace."\n\nTo access this library, you would need a special tool called a "computing device," which is akin to a magical crystal ball. This crystal ball allows you to connect with other crystal balls across the kingdom, enabling'
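
The returned string still contains the prompt and the special tokens. A minimal sketch of pulling out only the assistant's reply with plain string operations follows. `extract_reply` is a hypothetical helper, the sample string is a shortened stand-in for a real completion, and the code assumes the markers appear in the decoded text exactly as in the output above.

```python
ASSISTANT_TAG = "<|assistant|>"
END_TAG = "<|end|>"


def extract_reply(completion: str) -> str:
    """Return the text after the last assistant tag, without trailing markers."""
    # Keep what follows the last assistant tag (the whole string if the tag is absent).
    _, _, reply = completion.rpartition(ASSISTANT_TAG)
    # Drop the end marker and anything after it, if present.
    reply = reply.split(END_TAG, 1)[0]
    return reply.strip()


# Shortened, made-up sample in the same layout as the recorded output.
sample = (
    "<s> \n<|system|> You are a helpful AI assistant.<|end|>"
    "<|user|> Hi<|end|><|assistant|> Greetings, noble knight!<|end|>"
)
print(extract_reply(sample))
```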

In [14]:
result = generate(prompt, generation_params={"max_length":2000})
result

Took 20.8 seconds to generate 458 new tokens at speed 22.1 tokens/seconds


'<s> \n<|system|> You are a helpful AI assistant.<|end|><|user|> How to explain Internet for a medieval knight?<|end|><|assistant|> Greetings, noble knight! I understand that the concept of the Internet may seem foreign and complex, but let me attempt to explain it to you in a way that relates to your time and experiences.\n\nImagine the Internet as a vast, magical library that spans the entire kingdom, connecting every castle, town, and village. This library is filled with countless scrolls, each containing knowledge, stories, and information about the world. However, instead of being physically present in one location, these scrolls are stored in a mystical realm called "cyberspace."\n\nTo access this library, you would need a special tool called a "computing device," which is akin to a magical crystal ball. This crystal ball allows you to connect with other crystal balls across the kingdom, enabling you to share and exchange information with fellow knights, scholars, and even common