## Language Model


Jay Urbain, PhD

Hugging Face is the organization behind the well-known Transformers package, which for years has driven the development of language models in general.

The main generative model we use throughout the book is Phi-3-mini, a relatively small (3.8 billion parameters) but capable model. Because of its small size, the model can run on devices with less than 8 GB of VRAM.

In [4]:
#!pip install accelerate
#!pip install -U transformers

When you use an LLM, two models are loaded: the generative model and its underlying tokenizer.

The tokenizer splits the input text into tokens before feeding it to the generative model. 
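
The splitting step can be illustrated with a toy greedy longest-match tokenizer. This is only a sketch: the vocabulary TOY_VOCAB and the helpers toy_tokenize and toy_encode are hypothetical, and the real Phi-3 tokenizer uses a much larger learned vocabulary and its own algorithm.

```python
# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of entries.
TOY_VOCAB = {"<unk>": 0, "chick": 1, "en": 2, "s": 3, "joke": 4, "funny": 5, " ": 6}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text into vocabulary pieces using greedy longest-match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a vocabulary entry matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No piece matched: emit the unknown token and skip one character.
            tokens.append("<unk>")
            i += 1
    return tokens


def toy_encode(text, vocab=TOY_VOCAB):
    """Map each token to its integer ID, which is what a model actually consumes."""
    return [vocab[token] for token in toy_tokenize(text, vocab)]


tokens = toy_tokenize("funny chickens")
ids = toy_encode("funny chickens")
print(tokens)  # ['funny', ' ', 'chick', 'en', 's']
print(ids)     # [5, 6, 1, 2, 3]
```

The generative model never sees raw text, only integer IDs like these.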

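Use transformers to load both the tokenizer and the model. This works best with a GPU; the cell below loads the model on the CPU, and the commented-out device_map="cuda" line selects a GPU instead.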

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    # device_map="cuda",
    device_map="cpu",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")


transformers.pipeline simplifies the process of generating text. It encapsulates the model, tokenizer, and text generation process into a single function.
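
A minimal sketch of how the pipeline might be built from the model and tokenizer loaded above, using the three parameters described next. The helper build_generator and the dictionary GENERATION_KWARGS are illustrative names, not part of transformers, and max_new_tokens=500 is an assumed value.

```python
# Generation settings discussed in this section.
GENERATION_KWARGS = {
    "return_full_text": False,  # return only the model's output, not the prompt
    "max_new_tokens": 500,      # cap on generated tokens (assumed value)
    "do_sample": False,         # always pick the most probable next token
}


def build_generator(model, tokenizer, **overrides):
    """Wrap an already-loaded model and tokenizer in a text-generation pipeline."""
    # Imported lazily so that defining this helper does not load transformers.
    from transformers import pipeline

    settings = {**GENERATION_KWARGS, **overrides}
    return pipeline("text-generation", model=model, tokenizer=tokenizer, **settings)


# With the model and tokenizer loaded earlier in the notebook:
# generator = build_generator(model, tokenizer)
```

Uncommenting the last line gives the generator object that the final cell of this section calls.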

Relevant parameters:

return_full_text - When set to False, the prompt is not returned, only the output of the model.

max_new_tokens - The maximum number of tokens the model will generate. Setting a limit prevents long and unwieldy output, since some models might otherwise keep generating until they reach the limit of their context window.

do_sample - Whether the model uses a sampling strategy to choose the next token. When set to False, the model always selects the most probable next token.
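
The effect of do_sample can be sketched with a toy next-token chooser. This is an illustration under simplified assumptions, not the transformers implementation: the helper names and the logits are made up, and real decoding works over the model's full vocabulary.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_next_token(logits, do_sample=False, rng=None):
    """Return the index of the next token, greedily or by sampling."""
    probs = softmax(logits)
    if not do_sample:
        # Greedy decoding: the same input always yields the same token.
        return max(range(len(probs)), key=probs.__getitem__)
    # Sampling: draw a token index in proportion to its probability.
    rng = rng or random.Random()
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


toy_logits = [1.0, 3.5, 0.2, 2.9]  # made-up scores for a four-token vocabulary
greedy_choice = pick_next_token(toy_logits)
sampled_choices = [
    pick_next_token(toy_logits, do_sample=True, rng=random.Random(seed))
    for seed in range(5)
]
print(greedy_choice)  # 1, the index of the highest score
print(sampled_choices)
```

Greedy selection is deterministic, which makes outputs reproducible; sampling trades that for variety.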

To generate our first text, instruct the model to tell a joke about chickens. 

Format the prompt as a list of dictionaries, where each dictionary represents one turn by a participant in the conversation.

Our role is that of “user” and we use the “content” key to define our prompt:

In [None]:

# The prompt (user input / query)
messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])
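
Before the messages reach the model, they are flattened into a single prompt string with special markers around each turn. The sketch below shows the idea with a hypothetical render_chat helper; the marker strings are assumptions chosen for illustration, and in practice the exact template comes with the model's tokenizer (transformers exposes it through tokenizer.apply_chat_template).

```python
def render_chat(messages, add_generation_prompt=True):
    """Flatten role/content dictionaries into one prompt string (toy format)."""
    parts = []
    for message in messages:
        # Wrap each turn in a role marker and an end-of-turn marker (hypothetical).
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    if add_generation_prompt:
        # Signal that the assistant's reply should come next.
        parts.append("<|assistant|>\n")
    return "".join(parts)


example_messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]
prompt_text = render_chat(example_messages)
print(prompt_text)
```

A multi-turn conversation follows the same pattern: append further dictionaries with alternating roles to the list.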