# Fine-Tuning Transformers with Hugging Face
In this notebook, we go deeper into the Hugging Face ecosystem by walking through one of its most common tasks: fine-tuning a pre-trained model on a dataset of your choice.

## Introduction
Fine-tuning is the process of taking a pre-trained model (a model trained on a large dataset) and refining it on a smaller, specific dataset. This enables us to leverage the power of large-scale models like BERT or GPT-2 for our specific tasks without training from scratch.



## Setting Up
Ensure you have the required libraries installed:

In [None]:
!pip install transformers
!pip install datasets
!pip install torch
!pip install accelerate -U

## Loading a Dataset
For this demonstration, we'll use the IMDB movie review dataset, which labels each review as negative (0) or positive (1). The same process applies to any text classification dataset.

In [None]:
from datasets import load_dataset

imdb = load_dataset("imdb")
print(imdb['train'][0:5])
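
Slicing a `datasets.Dataset`, as in the `print` call above, returns a column-oriented dictionary (one list per column) rather than a list of records. The sketch below uses a hand-written stand-in for that dictionary to show how it can be turned into per-example rows. The two review texts and the helper name `columns_to_rows` are invented for illustration; they are not real IMDB rows or part of the `datasets` API.

```python
# Hypothetical stand-in for the dict returned by imdb['train'][0:2];
# the review texts are made up for illustration.
batch = {
    "text": ["A dull, lifeless film.", "One of the best movies I have seen."],
    "label": [0, 1],
}

LABEL_NAMES = {0: "negative", 1: "positive"}  # IMDB label convention

def columns_to_rows(columns):
    """Convert a dict of equal-length lists into a list of per-example dicts."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*(columns[k] for k in keys))]

rows = columns_to_rows(batch)
for row in rows:
    print(LABEL_NAMES[row["label"]], "|", row["text"])
```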

## Preprocessing the Data
Before fine-tuning, we need to tokenize the raw review text into the fixed-length input IDs and attention masks that the model expects:

In [None]:
from transformers import BertTokenizer

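# Load the tokenizer that matches the checkpoint we will fine-tune
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def encode(examples):
    # Truncate or pad every review to exactly 256 tokens
    return tokenizer(examples['text'], truncation=True, padding='max_length', max_length=256)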

encoded_imdb = imdb.map(encode, batched=True)
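
To make the effect of `truncation=True` and `padding='max_length'` concrete, here is a minimal pure-Python sketch of what happens to a single sequence of token IDs. It is a toy, not the tokenizer's implementation: the function name `pad_or_truncate` is hypothetical, the token IDs are arbitrary, and the real tokenizer also takes care of special tokens, which this sketch ignores. A pad ID of 0 is assumed here.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Toy version of truncation=True with padding='max_length' for one sequence."""
    ids = token_ids[:max_length]                   # truncation: drop tokens past the limit
    attention_mask = [1] * len(ids)                # 1 marks real tokens
    n_pad = max_length - len(ids)
    ids = ids + [pad_id] * n_pad                   # padding: fill up to max_length
    attention_mask = attention_mask + [0] * n_pad  # 0 marks padding positions
    return {"input_ids": ids, "attention_mask": attention_mask}

# A short sequence gets padded; a long one gets cut (IDs are arbitrary)
short = pad_or_truncate([101, 2023, 3185, 102], max_length=6)
long = pad_or_truncate(list(range(1, 11)), max_length=6)
print(short)
print(long)
```

Every example ends up with the same length, which is what lets the `Trainer` stack them into batches without a custom collator. The attention mask tells the model which positions to ignore.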

## Loading a Pre-trained Model
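We'll use BertForSequenceClassification, which places a classification head on top of the pre-trained BERT encoder. The encoder weights come from the checkpoint, but the head is newly initialized, so expect a warning that some weights were not loaded from the checkpoint; those weights are learned during fine-tuning: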

In [None]:
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

## Fine-Tuning the Model
Now, we're all set to fine-tune our model on the IMDB dataset:

In [None]:
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
    evaluation_strategy="epoch",  # renamed to eval_strategy in newer transformers releases
    logging_dir="./logs",
    logging_steps=500,
    do_train=True,
    do_eval=True,
    output_dir="./results",
    overwrite_output_dir=True,
    save_steps=10_000,
    save_total_limit=2,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_imdb["train"],
    eval_dataset=encoded_imdb["test"],
)

trainer.train()
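
It helps to work out how many optimizer steps the settings above produce, since `logging_steps` and `save_steps` are both measured in steps. The arithmetic below is a rough sketch that assumes a single device and no gradient accumulation (the default); 25,000 is the size of the IMDB training split.

```python
import math

# Values taken from the TrainingArguments above
num_examples = 25_000   # size of the IMDB train split
batch_size = 8          # per_device_train_batch_size
num_epochs = 2          # num_train_epochs
num_devices = 1         # assumption: a single GPU (or CPU)
logging_steps = 500
save_steps = 10_000

steps_per_epoch = math.ceil(num_examples / (batch_size * num_devices))
total_steps = steps_per_epoch * num_epochs

print(f"steps per epoch: {steps_per_epoch}")                        # 3125
print(f"total optimizer steps: {total_steps}")                      # 6250
print(f"loss logged {total_steps // logging_steps} times")          # 12
print(f"step-triggered checkpoints: {total_steps // save_steps}")   # 0
```

Under these assumptions, `save_steps=10_000` is larger than the total number of steps, so no intermediate checkpoint is triggered during training. Whether a final checkpoint is written automatically can depend on the installed `transformers` version; calling `trainer.save_model()` after training, or lowering `save_steps`, avoids relying on that.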

## Evaluating the Model
Let's assess the performance of our fine-tuned model:

In [None]:
results = trainer.evaluate()
print(results)
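
Because the `Trainer` above was created without a `compute_metrics` function, the evaluation results contain the loss and runtime statistics but no accuracy. One way to add it, sketched here under the assumption that NumPy is available, is a small function that receives the `(logits, labels)` pair the `Trainer` passes in. The toy logits are made up to show the function in isolation.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Return accuracy from the (logits, labels) pair supplied during evaluation."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)          # pick the highest-scoring class
    accuracy = float((predictions == labels).mean())  # fraction of correct predictions
    return {"accuracy": accuracy}

# Toy check with made-up logits for four examples
toy_logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]])
toy_labels = np.array([0, 1, 0, 0])
toy_result = compute_metrics((toy_logits, toy_labels))
print(toy_result)
```

To use it, pass `compute_metrics=compute_metrics` when constructing the `Trainer`; the value then appears in the evaluation results under the key `eval_accuracy`.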


## Inference with the Fine-Tuned Model
Now that we have a fine-tuned model, we can use it to predict the sentiment of new sentences. Let's see how to do this.

### Tokenizing New Data
First, let's create some new example sentences and tokenize them:

In [None]:
sentences = [
    "I absolutely loved that movie. It was fantastic!",
    "The film was too long and quite boring.",
    "The direction and acting were mediocre at best."
]

encoded_sentences = tokenizer(sentences, truncation=True, padding='max_length', max_length=256, return_tensors='pt')


### Making Predictions
Using our model, we can now make predictions on the tokenized sentences:

In [None]:
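import torch

model.eval()  # disable dropout for inference
# Move inputs to the model's device (the Trainer may have placed the model on a GPU)
inputs = {name: tensor.to(model.device) for name, tensor in encoded_sentences.items()}

with torch.no_grad():
    logits = model(**inputs).logits

predictions = torch.argmax(logits, dim=1).tolist()
sentiments = ["Positive" if pred == 1 else "Negative" for pred in predictions]

for sentence, sentiment in zip(sentences, sentiments):
    print(f"'{sentence}' has a {sentiment} sentiment.")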
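
The `argmax` above gives only the winning class. If a confidence score is also useful, the logits can be converted to probabilities with a softmax. The sketch below does this in plain Python for a single pair of made-up logits, so the numbers are illustrative and not output from the fine-tuned model.

```python
import math

def softmax(logits):
    """Convert a list of raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for one sentence: [negative score, positive score]
example_logits = [-1.2, 2.3]
probs = softmax(example_logits)
label = "Positive" if probs[1] > probs[0] else "Negative"
print(f"{label} (confidence {max(probs):.2f})")
```

With PyTorch tensors, `torch.softmax(logits, dim=1)` performs the same conversion for the whole batch at once.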

## Conclusion
You've just fine-tuned a transformer model for sentiment classification on the IMDB dataset! The same workflow applies to your own datasets and is at the heart of many NLP applications, allowing developers to adapt state-of-the-art models to specific tasks. Dive deeper, experiment with different models and datasets, and unlock the full potential of transformers in your applications.

Natural next steps include hyperparameter tuning, trying different architectures, and other advanced topics.