<h1>LLM - first touch</h1>

---

💡 **NOTE**: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to
**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.

---

# Phi-3

The first step is to load our model onto the GPU for faster inference. Note that we load the model and tokenizer separately (although that isn't always necessary).

Phi-3 is a family of small language and multimodal models from Microsoft. The language models are available in short- and long-context variants:
https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3


In [5]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

Note that we assume you have an NVIDIA GPU (`device_map="cuda"`), but you can choose a different device instead, for example `device_map="cpu"`.
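If you are unsure which device is available, one option is to detect it at runtime. The following is a minimal sketch, not part of the original notebook: the helper name `pick_device` is hypothetical, and it falls back to `"cpu"` when PyTorch is missing or no accelerator is found.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what this machine offers."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to probe, so assume CPU.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon GPU
    return "cpu"

device = pick_device()
print(device)
```

The resulting string could then be passed as `device_map=device` when loading the model. Expect CPU inference to be much slower than on a GPU.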


Although we can now use the model and tokenizer directly, it's much easier to wrap them in a `pipeline` object:

In [2]:
from transformers import pipeline

# Create a pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False
)


* `return_full_text`: When set to `False`, the pipeline returns only the model's output, without the prompt.
* `max_new_tokens`: The maximum number of tokens the model will generate. Setting a limit prevents long, unwieldy output, since some models may keep generating until they fill their context window.
* `do_sample`: Whether the model uses a sampling strategy to choose the next token. When set to `False`, the model always selects the most probable next token (greedy decoding).
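To make the effect of `do_sample` concrete, here is a toy sketch in plain Python. It does not call the model: the probability table and the helper names `pick_greedy` and `pick_sampled` are made up for illustration, and a real model produces probabilities over its entire vocabulary.

```python
import random

# Hypothetical next-token probabilities for a single generation step.
next_token_probs = {"drumsticks": 0.6, "eggs": 0.3, "feathers": 0.1}

def pick_greedy(probs):
    """Mimics do_sample=False: always take the most probable token."""
    return max(probs, key=probs.get)

def pick_sampled(probs, rng):
    """Mimics do_sample=True: draw a token at random, weighted by probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # seeded so repeated runs draw the same tokens
greedy_picks = [pick_greedy(next_token_probs) for _ in range(5)]
sampled_picks = [pick_sampled(next_token_probs, rng) for _ in range(5)]

print(greedy_picks)   # the same token on every draw
print(sampled_picks)  # can differ from draw to draw
```

Greedy selection is why the pipeline above gives the same answer to the same prompt on every run, which is convenient for a reproducible notebook.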

Finally, we write our prompt as a user message and pass it to the pipeline:

In [4]:
# The prompt (user input / query)
messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])

 Why did the chicken join the band? Because it had the drumsticks!
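The pipeline accepts a list of role/content messages because it applies the tokenizer's chat template before generating. The sketch below is a hand-written approximation of that step, not the library's implementation. The function name `format_phi3_prompt` is hypothetical, and the special tokens are an assumption based on the chat format described for Phi-3, so treat the exact strings as illustrative.

```python
def format_phi3_prompt(messages):
    """Illustrative only: flatten chat messages into a single prompt string."""
    parts = []
    for message in messages:
        # Each turn is tagged with its role and closed with an end marker.
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    # A trailing assistant tag asks the model to write the next turn.
    parts.append("<|assistant|>\n")
    return "".join(parts)

messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]
prompt = format_phi3_prompt(messages)
print(prompt)
```

In practice, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` returns the prompt string the model actually receives, which is the safer way to inspect it.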
