In [14]:
!nvidia-smi

Fri Apr 11 07:00:17 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   51C    P0             29W /   70W |     870MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [15]:
!pip install transformers --quiet

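## Hugging Face pipelines
The `pipeline` function is a high-level API in Hugging Face's `transformers` library that runs a pretrained model for a given task in a single call. It handles preprocessing (tokenization or feature extraction), the model forward pass, and postprocessing of the raw outputs into readable results, so you do not have to wire these steps together yourself. The table below lists common tasks and the task name to pass to `pipeline()`.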

| Task                          | Description                                                 | Example Input                          | Pipeline Name                      |
|-------------------------------|-------------------------------------------------------------|----------------------------------------|------------------------------------|
| Audio Classification          | Classify audio clips into categories                        | Audio of dog barking                   | `audio-classification`             |
| Automatic Speech Recognition  | Transcribe spoken words from audio to text                  | Audio of someone saying a sentence     | `automatic-speech-recognition`     |
| Depth Estimation              | Estimate depth in images                                    | Image with multiple objects            | `depth-estimation`                 |
| Document Question Answering   | Answer questions from documents (PDF/images)                | Scanned receipt or invoice             | `document-question-answering`      |
| Feature Extraction            | Extract embedding vectors (hidden states) from text         | "I love NLP"                           | `feature-extraction`               |
| Fill Mask                     | Predict missing token in a sentence                         | "Hugging Face is [MASK]"               | `fill-mask`                        |
| Image Classification          | Classify image into categories                              | Cat image                              | `image-classification`             |
| Image Feature Extraction      | Extract vector embeddings from image                        | Image of an apple                      | `image-feature-extraction`         |
| Image Segmentation            | Segment different parts of an image                         | Image with multiple objects            | `image-segmentation`               |
| Image-to-Text (Captioning)    | Generate captions for an image                              | Image of a sunset                      | `image-to-text`                    |
| Image-to-Image                | Transform input image to output image                       | Image colorization or enhancement      | `image-to-image`                   |
| Image-Text-to-Text            | Generate text from an image and a text prompt               | Image + "What does this chart show?"   | `image-text-to-text`               |
| Mask Generation               | Generate mask over regions in image                         | Satellite images                       | `mask-generation`                  |
| Named Entity Recognition (NER)| Extract entities like names, places                         | "Elon works at SpaceX"                 | `ner`                              |
| Object Detection              | Identify and localize objects in an image                   | Street image with cars and people      | `object-detection`                 |
| Question Answering            | Answer a question from a context                            | Context + "What is AI?"                | `question-answering`               |
| Sentiment Analysis            | Classify the sentiment (positive/negative) of text          | "This is awesome!"                     | `sentiment-analysis`               |
| Summarization                 | Summarize long documents                                    | Long article                           | `summarization`                    |
| Table Question Answering      | Answer from data tables                                     | Table + "What is the total revenue?"   | `table-question-answering`         |
| Text Classification           | Classify text into categories                               | "This is spam"                         | `text-classification`              |
| Text Generation               | Generate text like GPT models                               | "Once upon a time..."                  | `text-generation`                  |
| Text-to-Audio                 | Generate sound or speech from text                          | "Hello, world!"                        | `text-to-audio`                    |
| Text-to-Speech (TTS)          | Convert text to spoken voice                                | "Welcome to my channel"                | `text-to-speech`                   |
| Text2Text Generation          | General text input/output like T5                           | "Translate English to German: Hello"   | `text2text-generation`             |
| Token Classification          | Classify each token (e.g., NER, POS tagging)                | "Apple is a company"                   | `token-classification`             |
| Translation                   | Translate text to another language                          | "Hello" (English to French)            | `translation`                      |
| Translation_XX_to_YY          | Specific language pair translation                          | "Hello" (en_to_fr)                     | `translation_en_to_fr` (example)   |
| Video Classification          | Classify actions/events in video                            | Clip of someone dancing                | `video-classification`             |
| Visual Question Answering     | Answer questions about an image                             | Image + "What is in the picture?"      | `visual-question-answering` or `vqa`|
| Zero-Shot Audio Classification| Classify audio without retraining                           | Barking sound + ["dog", "cat", "car"]  | `zero-shot-audio-classification`   |
| Zero-Shot Classification      | Classify text using provided labels                         | "This is urgent" + ["spam", "work"]    | `zero-shot-classification`         |
| Zero-Shot Image Classification| Classify images using label prompts                         | Image of dog + ["animal", "vehicle"]   | `zero-shot-image-classification`   |
| Zero-Shot Object Detection    | Detect objects with custom classes                          | Image + ["helmet", "bag", "book"]      | `zero-shot-object-detection`       |


In [3]:
from transformers import pipeline
from transformers import logging
logging.set_verbosity_error()

In [4]:
classification = pipeline(task="text-classification", model="distilbert/distilbert-base-uncased-finetuned-sst-2-english")
classification("I am here today")

[{'label': 'POSITIVE', 'score': 0.9994605183601379}]
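Under the hood, the text-classification pipeline runs the model to get one raw score (logit) per class, turns the logits into probabilities with a softmax, and reports the most probable class using the model config's `id2label` mapping. The plain-Python sketch below mimics only that last post-processing step; the logits are made up, and the `{0: "NEGATIVE", 1: "POSITIVE"}` mapping is an assumption meant to mirror the SST-2 checkpoint's labels.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(logits, id2label):
    # Pick the most probable class and return it in the pipeline's output shape.
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return [{"label": id2label[best], "score": probs[best]}]

# Hypothetical logits for one sentence; the label mapping is assumed (see above).
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
print(postprocess([-1.0, 1.0], id2label))
```

When a checkpoint's config defines no custom label names, `id2label` defaults to `{0: "LABEL_0", 1: "LABEL_1"}`, which is why a pipeline built from plain `bert-base-uncased` reports labels such as `LABEL_0`.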

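## Tokenizer
A tokenizer in Hugging Face (and in NLP generally) splits raw text into tokens (words or subword pieces) and maps each token to an integer ID from the model's vocabulary. These IDs are what the model actually consumes.

- Each model family expects input tokenized in its own way (e.g., BERT uses WordPiece, GPT-2 uses byte-level BPE).

- Always load the tokenizer that matches the pretrained model (the same checkpoint name in `AutoTokenizer.from_pretrained`); otherwise the IDs will not line up with the model's embeddings.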
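BERT's tokenizer uses WordPiece: after lowercasing and splitting off punctuation, each word is broken into the longest pieces found in the vocabulary, and non-initial pieces are marked with `##`. This is why a rare word such as "febin" turns into two IDs in the output below. The sketch that follows is a toy re-implementation of greedy longest-match-first WordPiece with a tiny hypothetical vocabulary and made-up IDs; the `feb` + `##in` split is illustrative, and the real `bert-base-uncased` vocabulary has roughly 30,000 entries plus more pre-tokenization rules.

```python
import re

# Hypothetical toy vocabulary with made-up IDs (not the real BERT vocab).
VOCAB = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3, ",": 4,
         "this": 5, "is": 6, "feb": 7, "##in": 8, "##i": 9}

def basic_tokenize(text):
    # Lowercase (the "uncased" part) and split into words and punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def wordpiece(word, vocab):
    # Greedy longest-match-first: repeatedly take the longest prefix of the
    # remaining characters that is in the vocabulary; non-initial pieces get "##".
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            candidate = word[start:end] if start == 0 else "##" + word[start:end]
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # nothing matched: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces

def encode(text, vocab=VOCAB):
    # Wrap the word pieces in [CLS] ... [SEP] and look up their IDs.
    tokens = ["[CLS]"]
    for word in basic_tokenize(text):
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    return tokens, [vocab[t] for t in tokens]

tokens, ids = encode("Hello, this is febin")
print(tokens)  # ['[CLS]', 'hello', ',', 'this', 'is', 'feb', '##in', '[SEP]']
print(ids)     # [1, 3, 4, 5, 6, 7, 8, 2]
```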

In [5]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

In [6]:
model_name = 'bert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)

encoded_inputs = tokenizer("Hello, this is febin")
print(encoded_inputs['input_ids'])

[101, 7592, 1010, 2023, 2003, 13114, 2378, 102]


In [7]:
# decode the encoded input ids.
input_ids = encoded_inputs['input_ids']
tokenizer.decode(input_ids)

'[CLS] hello, this is febin [SEP]'
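Decoding reverses the process: IDs are mapped back to tokens, `##` continuation pieces are glued onto the preceding piece, and the space before punctuation is removed, which is how the two sub-word IDs for "febin" come back as one word. Passing `skip_special_tokens=True` to `tokenizer.decode` drops `[CLS]` and `[SEP]`. The toy sketch below imitates this with a hypothetical vocabulary and made-up IDs; it is not the library's implementation.

```python
import re

# Hypothetical toy vocabulary with made-up IDs (not the real BERT vocab).
VOCAB = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "[PAD]": 10, "hello": 3, ",": 4,
         "this": 5, "is": 6, "feb": 7, "##in": 8, "##i": 9}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}

def decode(ids, skip_special_tokens=False):
    tokens = [ID_TO_TOKEN[i] for i in ids]
    if skip_special_tokens:
        tokens = [t for t in tokens if t not in SPECIAL_TOKENS]
    # Join with spaces, then glue "##" continuation pieces onto the previous piece.
    text = " ".join(tokens).replace(" ##", "")
    # Remove the space the join put before punctuation ("hello ," -> "hello,").
    return re.sub(r" ([.,!?])", r"\1", text)

ids = [1, 3, 4, 5, 6, 7, 8, 2]
print(decode(ids))                            # [CLS] hello, this is febin [SEP]
print(decode(ids, skip_special_tokens=True))  # hello, this is febin
```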

## Creating a pipeline from an explicit model and tokenizer

In [8]:
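# bert-base-uncased is a pretrained encoder without a fine-tuned classification head:
# AutoModelForSequenceClassification adds a randomly initialised 2-label head,
# so the prediction below is essentially random until the model is fine-tuned.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

text_classification_pipeline = pipeline(task="text-classification", model=model, tokenizer=tokenizer)
text_classification_pipeline("I am here today")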


[{'label': 'LABEL_0', 'score': 0.6600065231323242}]

## Fine-tuning

### Install the necessary libraries

In [9]:
!pip install datasets --quiet

In [10]:
from datasets import load_dataset
dataset = load_dataset("imdb")

In [11]:
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})

In [12]:
dataset['train'][0]

{'text': 'I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered "controversial" I really had to see this for myself.<br /><br />The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life. In particular she wants to focus her attentions to making some sort of documentary on what the average Swede thought about certain political issues such as the Vietnam War and race issues in the United States. In between asking politicians and ordinary denizens of Stockholm about their opinions on politics, she has sex with her drama teacher, classmates, and married men.<br /><br />What kills me about I AM CURIOUS-YELLOW is that 40 years ago, this was considered pornographic. Really, the sex and nudity scenes are few and far be
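The review text still contains HTML line breaks (`<br /><br />`), which BERT's tokenizer would split into punctuation and word pieces. An optional cleanup, not part of this notebook's pipeline, is to replace them with spaces before tokenizing. The sketch below is written as a batched function in the shape `dataset.map(..., batched=True)` passes in (a dict of column lists); the name `clean_reviews` and the regex are hypothetical choices.

```python
import re

# Matches <br>, <br/> and <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)

def clean_reviews(examples):
    # With batched=True, examples["text"] is a list of review strings.
    # Replace line-break tags with spaces, then collapse repeated whitespace.
    examples["text"] = [" ".join(BR_TAG.sub(" ", text).split()) for text in examples["text"]]
    return examples

batch = {"text": ["Great film.<br /><br />Loved it!"], "label": [1]}
print(clean_reviews(batch)["text"])  # ['Great film. Loved it!']
# With the datasets library: dataset = dataset.map(clean_reviews, batched=True)
```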

In [17]:
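from transformers import AutoTokenizer

# Use the same checkpoint as the model that will be fine-tuned
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize_dataset(examples):
  # With batched=True, examples['text'] is a list of reviews;
  # each one is truncated/padded to the model's max length (512 tokens for BERT)
  return tokenizer(examples['text'], padding="max_length", truncation=True)

processed_dataset = dataset.map(tokenize_dataset, batched=True)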
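With `padding="max_length"` and `truncation=True`, every example ends up exactly as long as the model's maximum input (512 tokens for `bert-base-uncased`): long reviews are cut, short ones are filled with `[PAD]`, and the `attention_mask` marks which positions hold real tokens. The plain-Python sketch below reproduces that shape for one already-tokenized sequence, using BERT's special-token IDs (`[CLS]`=101, `[SEP]`=102, `[PAD]`=0) and the content IDs printed earlier for "Hello, this is febin". It ignores `token_type_ids`, and `pad_and_truncate` is a hypothetical helper, not the tokenizer's code.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # bert-base-uncased special-token IDs

def pad_and_truncate(content_ids, max_length):
    # Reserve two slots for [CLS] and [SEP], truncate the content to fit.
    content_ids = content_ids[: max_length - 2]
    input_ids = [CLS_ID] + content_ids + [SEP_ID]
    attention_mask = [1] * len(input_ids)  # 1 = real token
    n_pad = max_length - len(input_ids)
    input_ids += [PAD_ID] * n_pad
    attention_mask += [0] * n_pad          # 0 = padding, ignored by attention
    return {"input_ids": input_ids, "attention_mask": attention_mask}

example = pad_and_truncate([7592, 1010, 2023, 2003, 13114, 2378], max_length=10)
print(example["input_ids"])       # [101, 7592, 1010, 2023, 2003, 13114, 2378, 102, 0, 0]
print(example["attention_mask"])  # [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
```

Padding every review to 512 tokens spends compute on padding for short reviews; a common alternative is to tokenize without padding and let `DataCollatorWithPadding` pad each batch to its longest member.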


### TrainingArguments: configuring the training run

`transformers.TrainingArguments` is a configuration class that holds the training hyperparameters and settings: where checkpoints and logs are written (`output_dir`), how long to train (`num_train_epochs`), batch sizes, how often to log, evaluate, and save, and which metric decides the best checkpoint.

In [18]:
from transformers import TrainingArguments

In [28]:
training_args = TrainingArguments(
    output_dir="./results",           # where to save model checkpoints
    num_train_epochs=1,               # total number of training epochs
    per_device_train_batch_size=16,   # batch size per GPU/CPU during training
    per_device_eval_batch_size=16,    # batch size per GPU/CPU during evaluation
    eval_strategy="epoch",            # evaluate at the end of each epoch
    save_strategy="epoch",            # save at the end of each epoch (must match eval_strategy)
    logging_dir="./logs",             # directory for log files
    logging_steps=10,                 # log the training loss every 10 steps
    load_best_model_at_end=True,      # reload the best checkpoint when training finishes
    metric_for_best_model="accuracy", # requires compute_metrics to return an "accuracy" key
    greater_is_better=True,           # higher accuracy = better
    save_total_limit=2,               # keep at most 2 checkpoints (the best one is always kept)
    report_to="none",                 # disable W&B/TensorBoard/etc. integrations
)
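`num_train_epochs` and `per_device_train_batch_size` together determine how many optimizer steps training takes. Assuming a single GPU (as on the T4 above), no gradient accumulation, and the default of keeping the final partial batch, one epoch over the 25,000 IMDB training reviews is ceil(25000 / 16) = 1563 steps. That matches the `epoch` values in the training log below, where each 10-step logging interval adds 10 / 1563 ≈ 0.0064 epochs. The helper `steps_per_epoch` is hypothetical.

```python
import math

def steps_per_epoch(num_examples, per_device_batch_size, num_devices=1):
    # Each step consumes one batch per device; a final partial batch still counts as a step.
    return math.ceil(num_examples / (per_device_batch_size * num_devices))

steps = steps_per_epoch(25_000, 16)
print(steps)       # 1563
print(10 / steps)  # epoch progress reported after logging_steps=10
```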

## Training the model with Trainer

In [29]:
from transformers import Trainer

In [30]:
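import numpy as np
def compute_metrics(eval_pred):
    # metric_for_best_model="accuracy" needs an "accuracy" key in the eval metrics
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=processed_dataset["train"],
    eval_dataset=processed_dataset["test"],
    compute_metrics=compute_metrics,
)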

In [31]:
trainer.train()

{'loss': 0.6785, 'grad_norm': 5.774885177612305, 'learning_rate': 4.9893367455747496e-05, 'epoch': 0.006397952655150352}
{'loss': 0.6585, 'grad_norm': 10.156241416931152, 'learning_rate': 4.978673491149499e-05, 'epoch': 0.012795905310300703}
{'loss': 0.6303, 'grad_norm': 7.669811248779297, 'learning_rate': 4.968010236724249e-05, 'epoch': 0.019193857965451054}
{'loss': 0.5167, 'grad_norm': 2.6079823970794678, 'learning_rate': 4.9573469822989975e-05, 'epoch': 0.025591810620601407}
{'loss': 0.4252, 'grad_norm': 9.308971405029297, 'learning_rate': 4.946683727873747e-05, 'epoch': 0.03198976327575176}
{'loss': 0.3072, 'grad_norm': 6.216147422790527, 'learning_rate': 4.936020473448497e-05, 'epoch': 0.03838771593090211}
{'loss': 0.3695, 'grad_norm': 4.831631183624268, 'learning_rate': 4.925357219023246e-05, 'epoch': 0.044785668586052464}
{'loss': 0.3277, 'grad_norm': 1.929017424583435, 'learning_rate': 4.9146939645979955e-05, 'epoch': 0.05118362124120281}
{'loss': 0.309, 'grad_norm': 13.090867

KeyboardInterrupt: 