In [None]:
# Transformers installation
! pip install transformers datasets evaluate accelerate
# To install from source instead of the latest release, comment out the command above and uncomment the following one.
# ! pip install git+https://github.com/huggingface/transformers.git

# Quickstart

Transformers is designed to be fast and easy to use so that everyone can start learning or building with transformer models.

The number of user-facing abstractions is limited to only three classes for instantiating a model, and two APIs for inference or training. This quickstart introduces you to Transformers' key features and shows you how to:

- load a pretrained model
- run inference with [Pipeline](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.Pipeline)
- fine-tune a model with [Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer)

## Set up

To start, we recommend creating a Hugging Face [account](https://hf.co/join). An account lets you host and access version controlled models, datasets, and [Spaces](https://hf.co/spaces) on the Hugging Face [Hub](https://hf.co/docs/hub/index), a collaborative platform for discovery and building.

Create a [User Access Token](https://hf.co/docs/hub/security-tokens#user-access-tokens) and log in to your account.

<hfoptions id="authenticate">
<hfoption id="notebook">

Paste your User Access Token into [notebook_login](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/authentication#huggingface_hub.notebook_login) when prompted to log in.

In [None]:
from huggingface_hub import notebook_login

notebook_login()

</hfoption>
<hfoption id="CLI">

Make sure the [huggingface_hub[cli]](https://huggingface.co/docs/huggingface_hub/guides/cli#getting-started) package is installed and run the command below. Paste your User Access Token when prompted to log in.

```bash
hf auth login
```

</hfoption>
</hfoptions>

Install PyTorch.

```bash
!pip install torch
```

Then install an up-to-date version of Transformers and some additional libraries from the Hugging Face ecosystem for accessing datasets and vision models, evaluating training, optimizing training for large models, and parameter-efficient fine-tuning.

```bash
!pip install -U transformers datasets evaluate accelerate timm peft
```

## Pretrained models

Each pretrained model inherits from three base classes.

| **Class** | **Description** |
|---|---|
| [PreTrainedConfig](https://huggingface.co/docs/transformers/main/en/main_classes/configuration#transformers.PreTrainedConfig) | A file that specifies a model's attributes such as the number of attention heads or vocabulary size. |
| [PreTrainedModel](https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel) | A model (or architecture) defined by the model attributes from the configuration file. A pretrained model only returns the raw hidden states. For a specific task, use the appropriate model head to convert the raw hidden states into a meaningful result (for example, [LlamaModel](https://huggingface.co/docs/transformers/main/en/model_doc/llama2#transformers.LlamaModel) versus [LlamaForCausalLM](https://huggingface.co/docs/transformers/main/en/model_doc/llama2#transformers.LlamaForCausalLM)). |
| Preprocessor | A class for converting raw inputs (text, images, audio, multimodal) into numerical inputs to the model. For example, [PreTrainedTokenizer](https://huggingface.co/docs/transformers/main/en/main_classes/tokenizer#transformers.PreTrainedTokenizer) converts text into tensors and [ImageProcessingMixin](https://huggingface.co/docs/transformers/main/en/main_classes/image_processor#transformers.ImageProcessingMixin) converts pixels into tensors. |

We recommend using the [AutoClass](https://huggingface.co/docs/transformers/main/en/./model_doc/auto) API to load models and preprocessors because it automatically infers the appropriate architecture for each task and machine learning framework based on the name or path to the pretrained weights and configuration file.

Use [from_pretrained()](https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) to load the weights and configuration file from the Hub into the model and preprocessor class.

When you load a model, configure the following parameters to ensure the model is optimally loaded.

- `device_map="auto"` automatically allocates the model weights to your fastest device first.
- `dtype="auto"` directly initializes the model weights in the data type they're stored in, which can help avoid loading the weights twice (PyTorch loads weights in `torch.float32` by default).
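
To see why the data type matters, here is a rough back-of-the-envelope sketch. It assumes about 774 million parameters for GPT-2 large (a commonly cited size, not stated above) and counts only the weights, ignoring activations and framework overhead. The helper `weight_memory_gb` is hypothetical and not part of Transformers.

```python
# Bytes needed to store one parameter in each data type.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Approximate size of the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# Assumption: roughly 774M parameters for GPT-2 large.
NUM_PARAMS = 774_000_000

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(NUM_PARAMS, dtype):.2f} GB")
```

Under these assumptions, a checkpoint stored in a 16-bit type needs about half the memory of a `torch.float32` copy, which is why initializing directly in the stored type can be worthwhile.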

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2-large", dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2-large")

Tokenize the text with the tokenizer and return PyTorch tensors. Move the inputs to the same device as the model so inference runs on an accelerator if one is available.

In [None]:
model_inputs = tokenizer(["The secret to baking a good cake is "], return_tensors="pt").to(model.device)

In [None]:
model_inputs

The model is now ready for inference or training.

For inference, pass the tokenized inputs to [generate()](https://huggingface.co/docs/transformers/main/en/main_classes/text_generation#transformers.GenerationMixin.generate) to generate text. Decode the token ids back into text with [batch_decode()](https://huggingface.co/docs/transformers/main/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.batch_decode).

In [None]:
generated_ids = model.generate(**model_inputs, max_length=50)
tokenizer.batch_decode(generated_ids)[0]

## Pipeline

The [Pipeline](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.Pipeline) class is the most convenient way to run inference with a pretrained model. It supports many tasks such as text generation, image segmentation, automatic speech recognition, document question answering, and more.

> [!TIP]
> Refer to the [Pipeline](https://huggingface.co/docs/transformers/main/en/./main_classes/pipelines) API reference for a complete list of available tasks.

Create a [Pipeline](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.Pipeline) object and select a task. By default, [Pipeline](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.Pipeline) downloads and caches a default pretrained model for a given task. Pass the model name to the `model` parameter to choose a specific model.

<hfoptions id="pipeline-tasks">
<hfoption id="text generation">

Use `Accelerator` to automatically detect an available accelerator for inference.

In [None]:
from transformers import pipeline
from accelerate import Accelerator

device = Accelerator().device


### Text Generation

In [None]:
generator = pipeline("text-generation", model="openai-community/gpt2-large", device=device)

Prompt [Pipeline](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.Pipeline) with some initial text to generate more text.

In [None]:
generator("The secret to baking a good cake is ", max_new_tokens=50)

### Question Answering

In [None]:
qa = pipeline("question-answering", model="distilbert/distilbert-base-cased-distilled-squad", device=device)

In [None]:
qa(question="Who wrote Frankenstein?",
    context="William Shakespeare wrote Macbeth, Mary Shelley Frankenstein, \
              and Ray Bradbury Fahrenheit")

### Summarization

In [None]:
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6", device=device)

In [None]:
summarizer("""
The FitnessGram™ Pacer Test is a multistage aerobic capacity test that
progressively gets more difficult as it continues. The 20 meter pacer
test will begin in 30 seconds. Line up at the start. The running speed
starts slowly, but gets faster each minute after you hear this signal.
[beep] A single lap should be completed each time you hear this sound.
[ding] Remember to run in a straight line, and run as long as possible.
The second time you fail to complete a lap before the sound, your test
is over. The test will begin on the word start. On your mark, get ready,
start.
""")

### Machine Translation

In [None]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr", device=device)

In [None]:
translator("Yesterday, I ate strawberry ice cream and watched the new Chainsaw Man movie")

### LLM Chats

In [None]:
chatbot = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct", device=device)

In [None]:
chat = [
    {"role": "system", "content": "You are an evil devil that loves mischief."},
    {"role": "user", "content": "Hey I found a wallet on the floor, what should I do?"}
]

In [None]:
response = chatbot(chat, max_new_tokens=64)
print(response[0]["generated_text"][-1]["content"])

## Trainer

[Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer) is a complete training and evaluation loop for PyTorch models. It abstracts away a lot of the boilerplate usually involved in manually writing a training loop, so you can start training faster and focus on training design choices. You only need a model, dataset, a preprocessor, and a data collator to build batches of data from the dataset.

Use the [TrainingArguments](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments) class to customize the training process. It provides many options for training, evaluation, and more. Experiment with training hyperparameters and features like batch size, learning rate, mixed precision, torch.compile, and more to meet your training needs. You could also use the default training parameters to quickly produce a baseline.

Load a model, tokenizer, and dataset for training.

In [None]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from datasets import load_dataset, DatasetDict

model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
dataset = load_dataset("ucirvine/sms_spam")

### Dataset Finagling

Datasets come in all shapes and sizes, so we have to do some preprocessing before passing the data to our model.

In [None]:
dataset

In [None]:
dataset = dataset['train'].train_test_split(test_size=0.1)

In [None]:
dataset

In [None]:
train = dataset["train"]
test = dataset["test"]

In [None]:
train = train.train_test_split(test_size=0.2)

In [None]:
train

In [None]:
# Compile Train/Val/Test
dataset = DatasetDict({"train": train["train"], "val": train["test"], "test": test})

In [None]:
dataset
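
The two chained splits above produce roughly a 72/18/10 train/validation/test partition. The sketch below only illustrates that arithmetic. The helper `split_sizes` and the row count of 1000 are hypothetical, and the exact rounding used by `train_test_split` may differ by a row.

```python
def split_sizes(n_rows: int, test_frac: float, val_frac: float) -> dict:
    """Sizes after holding out a test set, then carving a validation set from the rest."""
    n_test = round(n_rows * test_frac)
    n_remaining = n_rows - n_test
    n_val = round(n_remaining * val_frac)
    return {"train": n_remaining - n_val, "val": n_val, "test": n_test}

# Hypothetical dataset of 1000 rows, split with the same fractions as above.
sizes = split_sizes(1000, test_frac=0.1, val_frac=0.2)
print(sizes)  # {'train': 720, 'val': 180, 'test': 100}
```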

Create a function to tokenize the text, and apply it to the whole dataset with the [map](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset.map) method. The data collator defined later turns each batch of tokenized examples into padded PyTorch tensors.

### Tokenization

In [None]:
def tokenize_dataset(dataset):
    return tokenizer(dataset["sms"])
dataset = dataset.map(tokenize_dataset, batched=True)

In [None]:
dataset

Load a data collator to create batches of data and pass the tokenizer to it.

### Data Collation

In [None]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

In [None]:
data_collator
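
Conceptually, a padding collator pads every example to the length of the longest one in the batch and records which positions are real tokens. The pure-Python sketch below illustrates the idea only. `pad_batch` is a hypothetical helper, the token ids are made up, and a pad token id of 0 is an assumption. The real collator also returns tensors rather than lists.

```python
def pad_batch(batch: list[list[int]], pad_token_id: int = 0) -> dict:
    """Pad every sequence to the length of the longest one in this batch."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Made-up token ids of different lengths.
batch = pad_batch([[101, 2489, 102], [101, 2489, 4443, 1999, 102]])
print(batch["input_ids"])       # [[101, 2489, 102, 0, 0], [101, 2489, 4443, 1999, 102]]
print(batch["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

Padding per batch rather than to a global maximum keeps batches of short messages small.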

Next, set up [TrainingArguments](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments) with the training features and hyperparameters.

### Training Arguments

In [None]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    report_to="none",
    output_dir="spam-detect",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
    weight_decay=0.01,
    logging_strategy="epoch",
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)
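
To get a feel for how long a run with these settings is, the sketch below estimates the number of optimizer steps. It assumes a single device, no gradient accumulation, and a hypothetical training set of 720 examples. `training_steps` is an illustrative helper, not a Transformers function.

```python
import math

def training_steps(n_train: int, batch_size: int, epochs: int) -> tuple[int, int]:
    """Return (steps per epoch, total steps) for one device without gradient accumulation."""
    steps_per_epoch = math.ceil(n_train / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

# Hypothetical training-set size; batch size and epochs match the arguments above.
per_epoch, total = training_steps(n_train=720, batch_size=8, epochs=2)
print(per_epoch, total)  # 90 180
```

With `logging_strategy`, `eval_strategy`, and `save_strategy` all set to `"epoch"`, logging, evaluation, and checkpointing would each happen once per epoch, so twice in this run.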

Finally, pass all these separate components to [Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer) and call [train()](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer.train) to start.

### Metrics

In [None]:
import evaluate
import numpy as np

In [None]:
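# Load both metrics once. evaluate.load("accuracy", "f1") would treat "f1" as a config name.
metric = evaluate.combine(["accuracy", "f1"])

def compute_metrics(eval_preds):
    logits, labels = eval_preds
    # Pick the class with the highest logit for each example
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)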
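
To make the two metrics concrete, here is a dependency-free sketch of what accuracy and binary F1 measure, starting from raw logits. The logits and labels are made up, label 1 is assumed to be the "spam" class, and `accuracy_and_f1` is a hypothetical helper rather than the `evaluate` implementation.

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=row.__getitem__)

def accuracy_and_f1(logits, labels, positive=1):
    predictions = [argmax(row) for row in logits]
    pairs = list(zip(predictions, labels))
    correct = sum(p == y for p, y in pairs)
    tp = sum(p == positive and y == positive for p, y in pairs)  # true positives
    fp = sum(p == positive and y != positive for p, y in pairs)  # false positives
    fn = sum(p != positive and y == positive for p, y in pairs)  # false negatives
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"accuracy": correct / len(labels), "f1": f1}

# Made-up logits for four messages; column 1 is the assumed "spam" class.
logits = [[2.0, -1.0], [0.1, 0.9], [1.5, 0.2], [-0.3, 0.8]]
labels = [0, 1, 1, 1]
print(accuracy_and_f1(logits, labels))  # {'accuracy': 0.75, 'f1': 0.8}
```

F1 is worth tracking next to accuracy when one class is much rarer than the other, because a model that always predicts the majority class can still score high accuracy.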

### Trainer

In [None]:
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["val"],
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

trainer.train()

### Test Set Evaluation

In [None]:
predictions = trainer.predict(dataset["test"])

In [None]:
predictions.metrics

Congratulations, you just trained your first model with Transformers!

## Head Tuning

In [None]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from datasets import load_dataset, DatasetDict
from transformers import TrainingArguments
from transformers import DataCollatorWithPadding
from transformers import Trainer
import evaluate
import numpy as np


model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
dataset = load_dataset("ucirvine/sms_spam")

In [None]:
dataset = dataset['train'].train_test_split(test_size=0.1)

In [None]:
train = dataset["train"]
test = dataset["test"]

In [None]:
train = train.train_test_split(test_size=0.2)

In [None]:
# Compile Train/Val/Test
dataset = DatasetDict({"train": train["train"], "val": train["test"], "test": test})

In [None]:
def tokenize_dataset(dataset):
    return tokenizer(dataset["sms"])
dataset = dataset.map(tokenize_dataset, batched=True)

In [None]:
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

We can inspect the components of a model with `model.parameters()`, which shows how the model is constructed. We can also "freeze" layers by setting `requires_grad = False` so that no gradients are computed for them during backpropagation. Training only the classification head, or a head you create yourself, is lightweight and efficient.

In [None]:
# Freeze the parameters of the base model
for param in model.base_model.parameters():
    param.requires_grad = False

# The 'classifier' (the head) is already trainable by default, but just to make sure
for param in model.classifier.parameters():
    param.requires_grad = True
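
As a rough illustration of how little is left to train after freezing the base model, the sketch below counts the parameters of a classification head. It assumes a hidden size of 768, two labels, and a head made of two linear layers. These figures are assumptions for illustration, so check them against your own model.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

# Assumptions: hidden size 768, 2 labels, head = one hidden-to-hidden layer + one output layer.
HIDDEN, NUM_LABELS = 768, 2
head_params = linear_params(HIDDEN, HIDDEN) + linear_params(HIDDEN, NUM_LABELS)
print(head_params)  # 592130
```

On the actual model, `sum(p.numel() for p in model.parameters() if p.requires_grad)` gives the real trainable count.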

In [None]:
training_args = TrainingArguments(
    report_to="none",
    output_dir="spam-detect",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
    weight_decay=0.01,
    logging_strategy="epoch",
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

In [None]:
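# Load both metrics once. evaluate.load("accuracy", "f1") would treat "f1" as a config name.
metric = evaluate.combine(["accuracy", "f1"])

def compute_metrics(eval_preds):
    logits, labels = eval_preds
    # Pick the class with the highest logit for each example
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)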

In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["val"],
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

In [None]:
trainer.train()

In [None]:
predictions = trainer.predict(dataset["test"])

In [None]:
predictions.metrics

## PEFT (LoRA)

In [None]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from datasets import load_dataset, DatasetDict
from transformers import TrainingArguments
from transformers import DataCollatorWithPadding
from transformers import Trainer
import evaluate
import numpy as np
from peft import LoraConfig, get_peft_model

base_model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
dataset = load_dataset("ucirvine/sms_spam")

`LoraConfig` takes the following arguments:

- `r` is the rank of the low-rank approximation.
- `lora_alpha` is the scaling factor for the low-rank approximation.
- `lora_dropout` is the dropout probability applied in the LoRA layers.
- `target_modules` selects the layers the low-rank approximation is applied to. It can cover the attention layers, the feed-forward network (FFN), or both.

Combining the LoRA adapter with the base model amounts to adding two small matrices per target layer, so it is lightweight and efficient.
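
The idea can be sketched in a few lines of plain Python. For a frozen weight matrix `W`, LoRA learns two small matrices `B` and `A` and uses `W + (alpha / r) * B @ A` as the effective weight. The helpers below are hypothetical illustrations with toy numbers, not the PEFT implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_update(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, the effective weight with the adapter applied."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

def lora_param_count(d_out, d_in, r):
    """Trainable parameters added by one adapter: B is d_out x r, A is r x d_in."""
    return r * (d_out + d_in)

# Tiny 2x2 example with rank 1.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # 2 x 1
A = [[0.5, 0.5]]     # 1 x 2
print(lora_update(W, A, B, alpha=2, r=1))  # [[2.0, 1.0], [2.0, 3.0]]

# For an assumed 768 x 768 layer with rank 8:
print(lora_param_count(768, 768, r=8))  # 12288, versus 768 * 768 = 589824 frozen weights
```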

In [None]:
config = LoraConfig(
    task_type="SEQ_CLS",  # mark the model as a sequence classifier so the classifier head stays trainable
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    bias="none",
)
peft_model = get_peft_model(base_model, config)
peft_model.print_trainable_parameters()  # Verify that only a small percentage is trainable

In [None]:
dataset = dataset['train'].train_test_split(test_size=0.1)
train = dataset["train"]
test = dataset["test"]
train = train.train_test_split(test_size=0.2)
# Compile Train/Val/Test
dataset = DatasetDict({"train": train["train"], "val": train["test"], "test": test})

In [None]:
def tokenize_dataset(dataset):
    return tokenizer(dataset["sms"])
dataset = dataset.map(tokenize_dataset, batched=True)

In [None]:
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

In [None]:
training_args = TrainingArguments(
    report_to="none",
    output_dir="spam-detect",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
    weight_decay=0.01,
    logging_strategy="epoch",
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

In [None]:
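# Load both metrics once. evaluate.load("accuracy", "f1") would treat "f1" as a config name.
metric = evaluate.combine(["accuracy", "f1"])

def compute_metrics(eval_preds):
    logits, labels = eval_preds
    # Pick the class with the highest logit for each example
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)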

In [None]:
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["val"],
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

In [None]:
trainer.train()

In [None]:
predictions = trainer.predict(dataset["test"])

In [None]:
predictions.metrics

## Mini Aside: Prompting Decoders

Decoders can be fine-tuned, but one major advantage is that they can be adapted to a task without any training. This is called prompting: the model picks up context from earlier messages in the conversation and uses it to produce an answer.

In [None]:
from transformers import pipeline
from accelerate import Accelerator

device = Accelerator().device

In [None]:
dataset = load_dataset("ucirvine/sms_spam")

In [None]:
generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-1.7B-Instruct", device=device)

In [None]:
prompt = """
This is a text classification task.
sms: {sms}
Is this text spam or ham:
"""

In [None]:
example = """
Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's\n
"""

This is called few-shot prompting: we provide a few "training examples" in the prompt and let the model predict everything after them.

The approach is prone to bias, but more capable models can pick up complex relationships quickly from just a few examples.

In [None]:
chat = [
    {"role": "system", "content": "You are a Spam detection bot"},
    {"role": "user", "content": f"{prompt.format(sms=example)}"},
    {"role": "assistant", "content": "spam"}
]

In [None]:
chat

This is also parallelizable. Try to find out how we can parallelize this function!

In [None]:
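def get_score(dataset, chat):
    # `dataset` is a single row (a dict with an "sms" field), not the whole dataset
    chat = chat.copy()
    chat.append({"role": "user", "content": f"{prompt.format(sms=dataset['sms'])}"})
    # Use the generator's own tokenizer; the DistilBERT tokenizer loaded earlier has no EOS token
    outputs = generator(chat, max_new_tokens=2, pad_token_id=generator.tokenizer.eos_token_id)
    # The last message in the returned chat is the model's reply
    dataset["guess"] = outputs[0]["generated_text"][-1]["content"]
    return dataset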

In [None]:
get_score(dataset["train"][3], chat)
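
One possible route to parallelism is to build one independent chat per message and hand them over as a batch. The sketch below covers only the chat-building step in plain Python. `build_chats` and `PROMPT` are hypothetical names, the sample messages are made up, and the commented-out pipeline call is an assumption about batched use rather than a tested recipe.

```python
PROMPT = "This is a text classification task.\nsms: {sms}\nIs this text spam or ham:\n"

def build_chats(few_shot: list[dict], messages: list[str]) -> list[list[dict]]:
    """Create one independent chat per SMS, each starting with the shared few-shot turns."""
    return [
        few_shot + [{"role": "user", "content": PROMPT.format(sms=sms)}]
        for sms in messages
    ]

few_shot = [
    {"role": "system", "content": "You are a Spam detection bot"},
    {"role": "user", "content": PROMPT.format(sms="Free entry in 2 a wkly comp to win FA Cup final tkts")},
    {"role": "assistant", "content": "spam"},
]

# Made-up messages to classify.
chats = build_chats(few_shot, ["See you at lunch tomorrow?", "WINNER!! Claim your prize now"])
print(len(chats), len(chats[0]))  # 2 4

# Assumption: the text-generation pipeline accepts a list of chats, for example
# outputs = generator(chats, max_new_tokens=2, batch_size=8)
```

Because each chat is a fresh list, the shared few-shot turns are never mutated, so the chats can be processed in any order or in batches.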

## Next steps

Now that you have a better understanding of Transformers and what it offers, it's time to keep exploring and learning what interests you the most.

- **Base classes**: Learn more about the configuration, model and processor classes. This will help you understand how to create and customize models, preprocess different types of inputs (audio, images, multimodal), and how to share your model.
- **Inference**: Explore the [Pipeline](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.Pipeline) further, inference and chatting with LLMs, agents, and how to optimize inference with your machine learning framework and hardware.
- **Training**: Study the [Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer) in more detail, as well as distributed training and optimizing training on specific hardware.
- **Quantization**: Reduce memory and storage requirements with quantization and speed up inference by representing weights with fewer bits.
- **Resources**: Looking for end-to-end recipes for how to train and run inference with a model for a specific task? Check out the task recipes!