# Working with HuggingFace's Transformers Library

In this tutorial, we'll explore HuggingFace's Transformers library, which provides pre-trained NLP models such as BERT and GPT-2 together with the tools to tokenize text, run inference, and fine-tune the models on your own data.

## Prerequisites
- Basic understanding of PyTorch.
- Familiarity with deep learning concepts.

## Setting up the Environment
First, let's set up our Colab environment:

In [9]:
!pip install -q torch torchvision transformers

## 1. Loading Pre-trained Models
HuggingFace's Transformers library provides many pre-trained models, each identified by a checkpoint name such as bert-base-uncased. Let's start by loading BERT together with its matching tokenizer.

In [2]:
from transformers import BertTokenizer, BertModel

# Load pre-trained BERT tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

## 2. Tokenization
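Before feeding text to BERT, we need to tokenize it. The tokenizer lowercases the text (this is an uncased checkpoint), splits it into tokens from BERT's vocabulary, and maps each token to its integer ID. It also adds the special [CLS] and [SEP] tokens (IDs 101 and 102 in the output below) and returns an attention mask marking which positions hold real tokens.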

In [3]:
text = "Hello, HuggingFace!"
encoded_input = tokenizer(text, return_tensors='pt')
print(encoded_input)

{'input_ids': tensor([[  101,  7592,  1010, 17662, 12172,   999,   102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]])}
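
The output above contains two IDs between the comma and the exclamation mark, which means the single word "huggingface" was split into two subword pieces. BERT's tokenizer is documented as using a greedy longest-match-first strategy for this. The following is a toy sketch of that idea in plain Python, not the library's implementation; the function name `wordpiece_split` and the tiny vocabulary are hypothetical and exist only for illustration.

```python
def wordpiece_split(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one lowercase word into subword pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits, so the whole word is treated as unknown
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary; the real bert-base-uncased vocabulary has about 30,000 entries.
toy_vocab = {"hello", "hugging", "##face", "face", ",", "!"}

print(wordpiece_split("hello", toy_vocab))        # ['hello']
print(wordpiece_split("huggingface", toy_vocab))  # ['hugging', '##face']
print(wordpiece_split("xyz", toy_vocab))          # ['[UNK]']
```

Splitting rare words into known pieces is what lets a fixed vocabulary cover text it has never seen as whole words.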


## 3. Model Inference
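Now let's run the text through BERT to get a contextual embedding for each token. The call is wrapped in torch.no_grad() because inference does not need gradients.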

In [5]:
import torch
with torch.no_grad():
    output = model(**encoded_input)

# Extract the sequence output (representations for each token in the input)
sequence_output = output.last_hidden_state
print(sequence_output)


tensor([[[-0.2291,  0.0135, -0.1188,  ..., -0.2613, -0.0115,  0.6596],
         [-0.0107,  0.0749,  0.6104,  ..., -0.1752,  0.3824, -0.0338],
         [-0.7685,  0.8493,  0.3917,  ..., -1.2588, -0.3544,  0.0577],
         ...,
         [ 0.2830, -0.0994,  1.1351,  ..., -0.4314,  0.6670, -0.0023],
         [-0.5194, -0.1928, -0.3060,  ...,  0.6108, -0.3512,  0.1036],
         [ 0.7359,  0.3968, -0.0619,  ...,  0.4613, -0.4899, -0.3350]]])
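
The tensor printed above has shape (1, 7, 768): one sentence, seven tokens, and a 768-dimensional vector per token. If a single vector per sentence is needed, one common option (our choice here, not something the tutorial prescribes) is to average the token vectors while ignoring padding. The sketch below uses small random stand-in tensors so it runs without a model; with the tutorial's variables, `last_hidden_state` would be `output.last_hidden_state` and `attention_mask` would be `encoded_input["attention_mask"]`.

```python
import torch

# Stand-in tensors (assumption): random values and a small hidden size instead of real BERT output.
torch.manual_seed(0)
last_hidden_state = torch.randn(2, 4, 8)       # (batch, tokens, hidden)
attention_mask = torch.tensor([[1, 1, 1, 1],
                               [1, 1, 0, 0]])  # the second sequence ends with two padding tokens

# Expand the mask so it broadcasts over the hidden dimension.
mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, tokens, 1)

summed = (last_hidden_state * mask).sum(dim=1)  # sum over real tokens only
counts = mask.sum(dim=1).clamp(min=1)           # number of real tokens per sequence
sentence_embeddings = summed / counts           # (batch, hidden)

print(sentence_embeddings.shape)
```

Masking matters as soon as sequences of different lengths are batched together; without it, padding positions would be averaged in as if they were real tokens.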


## 4. Using Transformer Models for Tasks
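HuggingFace provides task-specific model classes, for example for sequence classification and token classification. Let's use BERT for sequence classification. Note that bert-base-uncased does not include a trained classification head, so the head is randomly initialized (as the warning in the output below says) and the logits are not meaningful until the model has been fine-tuned.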

In [6]:
from transformers import BertForSequenceClassification

# Load BERT with a sequence classification head
# (num_labels=2 is the default; it is spelled out here for clarity)
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Example input text
texts = ["Hello, HuggingFace!", "Transformers are awesome!"]

# Tokenize and get predictions
encoded_inputs = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
outputs = model(**encoded_inputs)

# Get the logits from the model's output
logits = outputs.logits
print(logits)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


tensor([[0.5261, 0.4207],
        [0.4768, 0.3721]], grad_fn=<AddmmBackward0>)
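
Logits are unnormalized scores, one per class. To read them as class probabilities and pick a predicted class, a softmax followed by an argmax is the usual approach. The sketch below copies the logits printed above (rounded to four decimals) into a tensor so it runs on its own.

```python
import torch

# Logits copied from the printed output above.
logits = torch.tensor([[0.5261, 0.4207],
                       [0.4768, 0.3721]])

probs = torch.softmax(logits, dim=-1)  # each row now sums to 1
predicted = probs.argmax(dim=-1)       # index of the most probable class for each text

print(probs)
print(predicted)
```

Both rows come out close to 0.5 for each class, which is what we would expect from a classification head that has not been trained yet.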


## 5. Fine-tuning on Custom Data
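For this example, let's assume we have a binary classification task. We'll create dummy data and fine-tune BERT on it. Two examples are far too few to learn anything useful; the goal is only to show the mechanics of a training loop.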

In [7]:
# Dummy data
texts = ["I love Transformers.", "I don't like this library."]
labels = torch.tensor([1, 0])  # 1: Positive, 0: Negative

# Tokenize the data
encoded_inputs = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')
encoded_inputs["labels"] = labels

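# Fine-tuning
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()  # switch to training mode (enables dropout)

for epoch in range(3):
    optimizer.zero_grad()              # clear gradients from the previous step
    outputs = model(**encoded_inputs)  # because labels are included, the output carries a loss
    loss = outputs.loss
    loss.backward()                    # backpropagate
    optimizer.step()                   # update the weights
    print(f"Epoch {epoch + 1}, Loss: {loss.item()}")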


Epoch 1, Loss: 0.6741079092025757
Epoch 2, Loss: 0.6127657294273376
Epoch 3, Loss: 0.6706300973892212
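
To make the loss values above easier to interpret: for integer class labels like these, our understanding is that the loss returned by the model is a standard cross-entropy over the logits, as is the library's default for single-label classification. The sketch below computes that quantity by hand and with PyTorch's built-in function. The logits are illustrative numbers, not values from the run above.

```python
import math

import torch
import torch.nn.functional as F

# Illustrative logits and labels (assumption: not taken from a real training run).
logits = torch.tensor([[0.2, -0.1],
                       [0.0, 0.3]])
labels = torch.tensor([1, 0])

# Cross-entropy: average over the batch of -log(softmax probability of the true class).
log_probs = F.log_softmax(logits, dim=-1)
manual_loss = -log_probs[torch.arange(len(labels)), labels].mean()
builtin_loss = F.cross_entropy(logits, labels)

print(manual_loss.item(), builtin_loss.item())

# A model that assigns probability 0.5 to each of two classes has a loss of ln(2).
chance_loss = F.cross_entropy(torch.zeros(2, 2), labels)
print(chance_loss.item(), math.log(2))
```

The losses printed by the training loop (about 0.67, 0.61, 0.67) sit close to ln(2) ≈ 0.693, so the model is still near chance level. That is unsurprising after three updates on two examples, and the fact that the loss does not fall steadily is plausibly due to dropout being active in training mode.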


## 6. Saving & Loading Fine-tuned Models
After fine-tuning, you might want to save your model for later use.

In [8]:
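# Save the fine-tuned model and its tokenizer to the same directory
model.save_pretrained("./my_bert_model")
tokenizer.save_pretrained("./my_bert_model")

# Load both back from that directory
loaded_model = BertForSequenceClassification.from_pretrained("./my_bert_model")
loaded_tokenizer = BertTokenizer.from_pretrained("./my_bert_model")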
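
A quick way to convince yourself that a save/load round trip preserves the weights is to compare the logits before and after. The sketch below does this with a tiny, randomly initialized BERT so that nothing has to be downloaded. The configuration values, the input IDs, and the name `tiny_model` are illustrative assumptions and are not part of the tutorial's setup.

```python
import tempfile

import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny illustrative configuration (assumption) so the example runs quickly and offline.
config = BertConfig(
    vocab_size=100,
    hidden_size=16,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=32,
    num_labels=2,
)
tiny_model = BertForSequenceClassification(config)
tiny_model.eval()  # disable dropout so outputs are deterministic

input_ids = torch.tensor([[1, 5, 7, 2]])  # arbitrary token IDs below vocab_size

with tempfile.TemporaryDirectory() as tmp_dir:
    tiny_model.save_pretrained(tmp_dir)
    reloaded = BertForSequenceClassification.from_pretrained(tmp_dir)
reloaded.eval()

with torch.no_grad():
    before = tiny_model(input_ids=input_ids).logits
    after = reloaded(input_ids=input_ids).logits

same = torch.allclose(before, after, atol=1e-6)
print(same)
```

The same comparison can be applied to the fine-tuned model from the previous section, as long as both models are put in evaluation mode first.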
