In [None]:
import torch
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    AutoModelForCausalLM,
    AutoModelForDocumentQuestionAnswering,
    AutoModelForMaskedLM,
    AutoModelForMaskGeneration,
    AutoModelForObjectDetection,
    AutoModelForSeq2SeqLM,
    AutoModelForMultipleChoice,
    AutoModelForNextSentencePrediction,
    AutoModelForPreTraining,
    AutoModelForTableQuestionAnswering,
    AutoModelForTextEncoding,
    AutoModelForQuestionAnswering,
    AutoModelForSemanticSegmentation,
    AutoModelForTokenClassification,
    TrainingArguments,
    Trainer
)

In [15]:
dataset = load_dataset("imdb")

In [16]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

In [17]:
def tokenizer_fn(batch):
    return tokenizer(
        batch["text"],
        truncation=True,
        padding="max_length",
        max_length=256
    )

In [18]:
tokenized_ds = dataset.map(tokenizer_fn, batched=True)
tokenized_ds = tokenized_ds.remove_columns(["text"])             # raw text is no longer needed
tokenized_ds = tokenized_ds.rename_column("label", "labels")     # Trainer expects a "labels" column
tokenized_ds.set_format("torch")

Map: 100%|██████████| 25000/25000 [00:01<00:00, 13002.73 examples/s]


In [19]:
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2
)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [20]:
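import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # Trainer passes NumPy arrays: logits (n_examples, num_labels) and labels (n_examples,)
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # highest-scoring class per example
    return accuracy.compute(predictions=preds, references=labels)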


In [21]:
training_args = TrainingArguments(
    output_dir="./imdb_bert",
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    logging_steps=100,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy"
)


In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    processing_class=tokenizer,  # replaces the deprecated `tokenizer=` argument
    compute_metrics=compute_metrics
)




In [23]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.2229 | 0.202762 | 0.92044 |
| 2 | 0.1365 | 0.259592 | 0.92416 |


TrainOutput(global_step=3126, training_loss=0.2077153043264925, metrics={'train_runtime': 285.3903, 'train_samples_per_second': 175.199, 'train_steps_per_second': 10.953, 'total_flos': 6577776384000000.0, 'train_loss': 0.2077153043264925, 'epoch': 2.0})

In [24]:
trainer.evaluate()

{'eval_loss': 0.259591668844223,
 'eval_accuracy': 0.92416,
 'eval_runtime': 39.0234,
 'eval_samples_per_second': 640.642,
 'eval_steps_per_second': 40.053,
 'epoch': 2.0}

### AutoTokenizer
Loads the correct tokenizer for a pretrained model and converts raw text → tokens → input IDs.

- Problems it supports

    Required for all NLP tasks

- Handles

    Tokenization

    Padding

    Truncation

    Attention masks

- Example use cases

    Sentiment analysis

    QA

    NER

    Text generation
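To make truncation, padding, and attention masks concrete, here is a toy sketch. It assumes a hypothetical whitespace tokenizer and a tiny made-up vocabulary (`toy_encode` is illustrative, not a library function); the real `AutoTokenizer` for `bert-base-uncased` splits words into WordPiece subwords and also returns `token_type_ids`, but the padding/truncation/mask logic follows the same idea as `truncation=True, padding="max_length"` in the training cell.

```python
def toy_encode(texts, vocab, max_length):
    """Hypothetical whitespace tokenizer mimicking truncation=True, padding="max_length"."""
    cls_id, sep_id = vocab["[CLS]"], vocab["[SEP]"]
    pad_id, unk_id = vocab["[PAD]"], vocab["[UNK]"]
    batch = {"input_ids": [], "attention_mask": []}
    for text in texts:
        tokens = text.lower().split()                      # raw text -> tokens
        ids = [vocab.get(t, unk_id) for t in tokens]       # tokens -> input IDs ([UNK] if unknown)
        ids = [cls_id] + ids[: max_length - 2] + [sep_id]  # truncate, leaving room for [CLS]/[SEP]
        n_pad = max_length - len(ids)
        batch["input_ids"].append(ids + [pad_id] * n_pad)             # pad to a fixed length
        batch["attention_mask"].append([1] * len(ids) + [0] * n_pad)  # 1 = real token, 0 = padding
    return batch


vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "the": 4, "movie": 5, "was": 6, "great": 7}
enc = toy_encode(["The movie was great",
                  "The movie was long and slow and boring"], vocab, max_length=8)
print(enc["input_ids"])       # [[2, 4, 5, 6, 7, 3, 0, 0], [2, 4, 5, 6, 1, 1, 1, 3]]
print(enc["attention_mask"])  # [[1, 1, 1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1]]
```

The first review is padded, the second is truncated; the attention mask tells the model to ignore the padding positions.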

### AutoModelForSequenceClassification
- Model type

    Encoder-based models (BERT, RoBERTa, DistilBERT)

    Adds a classification head
- Problems:

    Sentiment analysis (IMDB, Amazon)

    Topic classification (AG News, BBC)

    Spam detection

    Intent classification

    Toxic comment detection
- Output

    One label per sequence
- Variants

    bert-base-uncased

    distilbert-base-uncased (smaller and faster)

    roberta-base (often more accurate)
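A minimal sketch of what the classification head returns, assuming a tiny randomly initialised BERT built from `BertConfig` so it runs offline. The config sizes are arbitrary illustrative values and the predictions are meaningless; only the shapes matter.

```python
import torch
from transformers import BertConfig, AutoModelForSequenceClassification

# Tiny, randomly initialised BERT (illustrative sizes): no checkpoint download needed.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
model = AutoModelForSequenceClassification.from_config(config).eval()

torch.manual_seed(0)
input_ids = torch.randint(0, config.vocab_size, (3, 12))  # 3 sequences, 12 tokens each
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits

print(logits.shape)            # torch.Size([3, 2]): one row of class logits per sequence
preds = logits.argmax(dim=-1)  # one predicted label per sequence
```

With a real checkpoint you would call `from_pretrained("bert-base-uncased", num_labels=2)` as in the training cell; `from_config` only builds the architecture.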

### AutoModelForMultipleChoice
- Model type

    Encoder models + multiple-choice head

- Problems solved

    Exam-style questions

    Reading comprehension with options

    SWAG, RACE datasets

- Output

    One score (logit) per choice; a softmax over the choices gives probabilities
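The input layout is the non-obvious part of multiple choice: each question is encoded once per candidate answer, giving a `(batch, num_choices, seq_len)` tensor. A sketch assuming a tiny random-weight BERT (illustrative sizes, random token ids standing in for encoded question/answer pairs):

```python
import torch
from transformers import BertConfig, AutoModelForMultipleChoice

config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = AutoModelForMultipleChoice.from_config(config).eval()

batch_size, num_choices, seq_len = 2, 4, 10
torch.manual_seed(0)
# One encoded "question + candidate answer" sequence per choice.
input_ids = torch.randint(0, config.vocab_size, (batch_size, num_choices, seq_len))

with torch.no_grad():
    logits = model(input_ids=input_ids).logits  # (batch, num_choices)

probs = logits.softmax(dim=-1)       # probabilities over each example's choices
best_choice = probs.argmax(dim=-1)   # index of the selected answer per example
```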

2. Token Classification (NER, POS)
### AutoModelForTokenClassification

- Problems

    Named Entity Recognition (NER)

    Part-of-Speech tagging

    Slot filling

- Datasets

    CoNLL-2003

    OntoNotes
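Token classification produces one prediction per token rather than per sequence. A sketch assuming a tiny random-weight BERT and a hypothetical BIO tag set (CoNLL-2003 uses a similar scheme with more entity types):

```python
import torch
from transformers import BertConfig, AutoModelForTokenClassification

# Hypothetical BIO label set for illustration.
labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64,
                    id2label=dict(enumerate(labels)),
                    label2id={label: i for i, label in enumerate(labels)})
model = AutoModelForTokenClassification.from_config(config).eval()

torch.manual_seed(0)
input_ids = torch.randint(0, config.vocab_size, (2, 8))
with torch.no_grad():
    logits = model(input_ids=input_ids).logits  # (batch, seq_len, num_labels)

pred_ids = logits.argmax(dim=-1)  # one label id per token
pred_tags = [[config.id2label[i] for i in row] for row in pred_ids.tolist()]
```

With a real tokenizer, words split into several subwords need their labels aligned (often only the first subword is scored).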


3. Question Answering
### AutoModelForQuestionAnswering
- Problems

    Extract answer spans from text

    Reading comprehension

- Datasets

    SQuAD

    Natural Questions
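An extractive QA head outputs two scores per token, one for "answer starts here" and one for "answer ends here". A sketch assuming a tiny random-weight BERT; the random ids stand in for an encoded `[CLS] question [SEP] context [SEP]` sequence, and the gold positions are arbitrary.

```python
import torch
from transformers import BertConfig, AutoModelForQuestionAnswering

config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = AutoModelForQuestionAnswering.from_config(config).eval()

torch.manual_seed(0)
input_ids = torch.randint(0, config.vocab_size, (1, 16))

with torch.no_grad():
    out = model(input_ids=input_ids)
    # During training, gold start/end token positions are passed and a loss is returned.
    train_out = model(input_ids=input_ids,
                      start_positions=torch.tensor([5]),
                      end_positions=torch.tensor([7]))

print(out.start_logits.shape, out.end_logits.shape)  # one "start" and one "end" score per token
```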

4. Text Pair Classification
### AutoModelForSequenceClassification
- Problems

    Sentence similarity

    Paraphrase detection

    Natural Language Inference (NLI)

- Datasets

    MRPC

    SNLI

    MNLI
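Text-pair tasks reuse the same sequence-classification head; the difference is the input layout `[CLS] A [SEP] B [SEP]`, with `token_type_ids` marking which sentence each token belongs to. With a real tokenizer, `tokenizer(text_a, text_b)` builds this automatically. The sketch below builds it by hand, assuming hypothetical token ids and a tiny random-weight BERT with 3 labels (as in NLI).

```python
import torch
from transformers import BertConfig, AutoModelForSequenceClassification

CLS, SEP = 1, 2                  # hypothetical special-token ids
premise = [10, 11, 12, 13]       # hypothetical ids of sentence A
hypothesis = [20, 21, 22]        # hypothetical ids of sentence B

# Pair format: [CLS] A [SEP] B [SEP]; segment 0 for A (with CLS/SEP), segment 1 for B (with final SEP).
input_ids = [CLS] + premise + [SEP] + hypothesis + [SEP]
token_type_ids = [0] * (len(premise) + 2) + [1] * (len(hypothesis) + 1)

config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64,
                    num_labels=3)  # e.g. entailment / neutral / contradiction
model = AutoModelForSequenceClassification.from_config(config).eval()

with torch.no_grad():
    logits = model(input_ids=torch.tensor([input_ids]),
                   token_type_ids=torch.tensor([token_type_ids])).logits  # (1, 3)
```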

5. Zero-Shot Classification
### pipeline("zero-shot-classification")
- Problems

    Topic classification without training

    Intent detection on new labels

- Uses:

    facebook/bart-large-mnli
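Zero-shot classification reuses an NLI model: each candidate label becomes a hypothesis such as "This example is {label}." (the pipeline's default `hypothesis_template`), the NLI model scores entailment for each (text, hypothesis) pair, and, in single-label mode, the entailment logits are softmax-normalised across labels. The sketch below shows only that bookkeeping; the logits are hypothetical stand-ins for the model's outputs, and this is not the pipeline's actual code.

```python
import math

def build_nli_pairs(sequence, candidate_labels, hypothesis_template="This example is {}."):
    """One (premise, hypothesis) pair per candidate label."""
    return [(sequence, hypothesis_template.format(label)) for label in candidate_labels]

def label_scores(entailment_logits):
    """Softmax over the entailment logits of all candidate labels (single-label mode)."""
    m = max(entailment_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["electronics", "sports", "politics"]
pairs = build_nli_pairs("The battery died after two hours.", labels)
# Hypothetical entailment logits standing in for the NLI model's outputs on `pairs`.
scores = label_scores([2.0, -1.0, -0.5])
```

In practice the whole loop is a single call: `pipeline("zero-shot-classification", model="facebook/bart-large-mnli")(text, candidate_labels=labels)`.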

6. Masked Language Modeling (Pretraining / Adaptation)
### AutoModelForMaskedLM

- Problems

    Domain adaptation

    Learning in-domain word usage (the tokenizer's vocabulary itself stays fixed)
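Domain adaptation with MLM means continuing pretraining on in-domain text with randomly masked inputs. The sketch below shows BERT-style masking: about 15% of non-special positions are selected, 80% of those become `[MASK]`, 10% a random token, 10% stay unchanged, and unselected positions get label `-100` so the loss ignores them. This is similar in spirit to `DataCollatorForLanguageModeling`; the function and the special-token ids here are hypothetical.

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, special_ids=(),
                mlm_probability=0.15, generator=None):
    """BERT-style masking: returns (corrupted inputs, labels with -100 where no loss is taken)."""
    labels = input_ids.clone()
    probs = torch.full(labels.shape, mlm_probability)
    for sid in special_ids:
        probs[labels == sid] = 0.0          # never select special tokens
    selected = torch.bernoulli(probs, generator=generator).bool()
    labels[~selected] = -100                # loss only on selected positions

    inputs = input_ids.clone()
    # 80% of selected positions -> [MASK]
    to_mask = torch.bernoulli(torch.full(labels.shape, 0.8), generator=generator).bool() & selected
    inputs[to_mask] = mask_token_id
    # half of the remaining 20% -> random token; the other half stays unchanged
    to_random = (torch.bernoulli(torch.full(labels.shape, 0.5), generator=generator).bool()
                 & selected & ~to_mask)
    random_ids = torch.randint(vocab_size, labels.shape, generator=generator)
    inputs[to_random] = random_ids[to_random]
    return inputs, labels


g = torch.Generator().manual_seed(0)
CLS_ID, SEP_ID, MASK_ID = 2, 3, 4  # hypothetical special-token ids
ids = torch.randint(5, 100, (4, 32), generator=g)
ids[:, 0] = CLS_ID
ids[:, -1] = SEP_ID
inputs, labels = mask_tokens(ids, MASK_ID, vocab_size=100,
                             special_ids=(CLS_ID, SEP_ID), generator=g)
```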

7. Text Generation (not BERT, but still loaded through Auto classes)
### AutoModelForCausalLM

- Models

    GPT-2

    LLaMA

    Falcon

### AutoModelForTableQuestionAnswering
🔹 Model type

Table-aware transformers (TAPAS)

🔹 Problems solved

QA over structured tables, such as:

    Spreadsheets

    CSV-style data

### AutoModelForMaskedLM

- Model type

Encoder models (BERT)

- Problems solved

Masked word prediction

Pretraining

Domain adaptation

- Example

“The movie was [MASK].”
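For the example above, the MLM head outputs a score for every vocabulary entry at every position, and the prediction for the blank is read off at the `[MASK]` position. A sketch assuming a tiny random-weight BERT and hypothetical token ids, so the "predicted" tokens are meaningless; with a real checkpoint, `pipeline("fill-mask", model="bert-base-uncased")("The movie was [MASK].")` does the same end to end.

```python
import torch
from transformers import BertConfig, AutoModelForMaskedLM

MASK_ID = 4  # hypothetical [MASK] id in a toy vocabulary
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = AutoModelForMaskedLM.from_config(config).eval()

# Stand-in ids for "[CLS] the movie was [MASK] . [SEP]" (all ids are hypothetical).
input_ids = torch.tensor([[2, 10, 11, 12, MASK_ID, 13, 3]])
with torch.no_grad():
    logits = model(input_ids=input_ids).logits  # (batch, seq_len, vocab_size)

batch_idx, pos_idx = (input_ids == MASK_ID).nonzero(as_tuple=True)
top5_ids = logits[batch_idx, pos_idx].topk(5, dim=-1).indices  # 5 best candidates for the blank
```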

### AutoModelForCausalLM
- Model type

    Decoder-only models (GPT, LLaMA)

- Problems solved

    Text generation

    Chatbots

    Code generation

    Story writing

- Generation style

    Left-to-right
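"Left-to-right" means each step predicts one token from everything generated so far and appends it. A minimal greedy-decoding sketch, assuming a tiny random-weight GPT-2 (illustrative sizes, hypothetical prompt ids); in practice `model.generate(..., max_new_tokens=5, do_sample=False)` does this more efficiently by caching past key/values instead of re-running the full sequence each step.

```python
import torch
from transformers import GPT2Config, AutoModelForCausalLM

config = GPT2Config(vocab_size=100, n_positions=64, n_embd=32, n_layer=2, n_head=2)
model = AutoModelForCausalLM.from_config(config).eval()

prompt = torch.tensor([[5, 17, 42]])  # hypothetical prompt token ids
ids = prompt
with torch.no_grad():
    for _ in range(5):
        logits = model(input_ids=ids).logits                       # (batch, seq_len, vocab_size)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)    # greedy: most likely next token
        ids = torch.cat([ids, next_id], dim=-1)                    # append and feed back in
```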

### AutoModelForSeq2SeqLM
- Model type

    Encoder–Decoder (T5, BART)

- Problems solved

    Translation

    Summarization

    Paraphrasing

    Question generation
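In an encoder–decoder model the source and target are separate sequences of different lengths: the encoder reads the source, and the decoder is trained to produce the target. A sketch assuming a tiny random-weight T5 (illustrative sizes, hypothetical ids); passing `labels` makes the model shift them right internally to build the decoder inputs and return a loss.

```python
import torch
from transformers import T5Config, AutoModelForSeq2SeqLM

config = T5Config(vocab_size=100, d_model=32, d_kv=8, d_ff=64,
                  num_layers=2, num_heads=2,
                  pad_token_id=0, decoder_start_token_id=0)
model = AutoModelForSeq2SeqLM.from_config(config).eval()

torch.manual_seed(0)
source = torch.randint(1, 100, (2, 12))  # e.g. articles to summarise (hypothetical ids)
target = torch.randint(1, 100, (2, 5))   # e.g. their summaries (hypothetical ids)

with torch.no_grad():
    out = model(input_ids=source, labels=target)

print(out.logits.shape)  # one vocabulary distribution per *target* position
```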

# Task-Specific Model Selection

1️⃣ Sentiment Analysis / Text Classification
📌 Datasets

IMDB

Amazon Reviews

Yelp Reviews

AG News

BBC News

TweetEval

✅ AutoModel
AutoModelForSequenceClassification

🎤 Interview-ready explanation

“This is used when the entire input text maps to a single label, such as sentiment, topic, or intent. The model outputs logits for each class.”

2️⃣ Named Entity Recognition (NER) / POS
📌 Datasets

CoNLL-2003

OntoNotes

WikiANN

✅ AutoModel
AutoModelForTokenClassification

🎤 Interview-ready explanation

“This model performs token-level classification, meaning it predicts a label for each token, which is ideal for NER, POS tagging, and slot filling.”

3️⃣ Question Answering (Extractive)
📌 Datasets

SQuAD

Natural Questions

TriviaQA

✅ AutoModel
AutoModelForQuestionAnswering

🎤 Interview-ready explanation

“This model predicts start and end token positions in the context, allowing it to extract an answer span for a given question.”
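Turning the start and end scores into a span needs a small search: a common heuristic is to pick the pair (start, end) with the highest `start_logit + end_logit`, subject to end ≥ start and a maximum answer length. A sketch of that decoding step with hypothetical logits; real pipelines additionally exclude question tokens and map token indices back to character offsets.

```python
import torch

def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) token indices maximising start + end score with start <= end."""
    seq_len = start_logits.size(0)
    # scores[i, j] = start_logits[i] + end_logits[j]: score of the span from token i to token j
    scores = start_logits[:, None] + end_logits[None, :]
    i = torch.arange(seq_len)[:, None]
    j = torch.arange(seq_len)[None, :]
    valid = (j >= i) & (j - i < max_answer_len)  # end not before start, length capped
    scores = scores.masked_fill(~valid, float("-inf"))
    return divmod(int(scores.argmax()), seq_len)  # flat index -> (row, col) = (start, end)


start_logits = torch.tensor([0.1, 3.0, 0.2, 0.0, 0.5])  # hypothetical scores
end_logits = torch.tensor([2.5, 0.0, 0.3, 2.0, 0.1])
print(best_span(start_logits, end_logits))                    # (1, 3)
print(best_span(start_logits, end_logits, max_answer_len=2))  # (1, 2)
```

Taking the two argmaxes independently would give start 1 and end 0, an invalid span; the joint search avoids that.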

4️⃣ Document Question Answering (Invoices, PDFs)
📌 Datasets

DocVQA

RVL-CDIP (document image classification; useful for document models, but not a QA dataset)

FUNSD (form understanding: entity labeling and linking)

✅ AutoModel
AutoModelForDocumentQuestionAnswering

🎤 Interview-ready explanation

“This model combines text and layout information to answer questions over structured documents like invoices and forms.”
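"Layout information" concretely means one bounding box per token, normalised to a 0–1000 page grid. A sketch assuming a tiny random-weight LayoutLM (v1, which takes text plus boxes; v2/v3 also take the page image) and hypothetical ids and boxes; by convention `[CLS]` gets an all-zero box and `[SEP]` an all-1000 box.

```python
import torch
from transformers import LayoutLMConfig, AutoModelForDocumentQuestionAnswering

config = LayoutLMConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                        num_attention_heads=2, intermediate_size=64)
model = AutoModelForDocumentQuestionAnswering.from_config(config).eval()

torch.manual_seed(0)
input_ids = torch.randint(0, 100, (1, 6))  # hypothetical question + OCR-word ids
# One (x0, y0, x1, y1) box per token on a 0-1000 grid.
bbox = torch.tensor([[[0, 0, 0, 0],               # [CLS]
                      [100, 50, 180, 70],
                      [190, 50, 260, 70],
                      [100, 80, 220, 100],
                      [600, 900, 700, 930],
                      [1000, 1000, 1000, 1000]]])  # [SEP]

with torch.no_grad():
    out = model(input_ids=input_ids, bbox=bbox)  # extractive: start/end score per token
```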

5️⃣ Table Question Answering
📌 Datasets

WikiTableQuestions

SQA

✅ AutoModel
AutoModelForTableQuestionAnswering

🎤 Interview-ready explanation

“This model understands tabular data and answers questions by reasoning over rows and columns instead of free text.”

6️⃣ Text Generation / Chatbots
📌 Datasets

OpenWebText

WikiText

Custom conversational data

✅ AutoModel
AutoModelForCausalLM

🎤 Interview-ready explanation

“This model generates text autoregressively, predicting the next token based on previous tokens, which is ideal for chatbots and text generation.”

7️⃣ Translation / Summarization / Paraphrasing
📌 Datasets

WMT (translation)

CNN/DailyMail (summarization)

XSum

✅ AutoModel
AutoModelForSeq2SeqLM

🎤 Interview-ready explanation

“This encoder–decoder model transforms one sequence into another, making it suitable for translation, summarization, and text rewriting.”

8️⃣ Masked Language Modeling (Pretraining)
📌 Datasets

Wikipedia

BookCorpus

Domain-specific corpora

✅ AutoModel
AutoModelForMaskedLM

🎤 Interview-ready explanation

“This model predicts masked tokens in a sentence and is primarily used for pretraining or domain adaptation.”

9️⃣ Sentence Embeddings / Semantic Search
📌 Datasets

STS-B

MS MARCO

Custom similarity data

✅ AutoModel
AutoModelForTextEncoding

🎤 Interview-ready explanation

“This model converts text into dense vector embeddings used for semantic similarity, clustering, and retrieval tasks.”

Note: the encoder returns one hidden state per token; a pooling step (for example, mean pooling) turns these into a single vector per text.

🔟 Multiple Choice QA
📌 Datasets

SWAG

RACE

CommonsenseQA

✅ AutoModel
AutoModelForMultipleChoice

🎤 Interview-ready explanation

“This model scores multiple candidate answers and selects the most likely one, commonly used in exam-style QA tasks.”

1️⃣1️⃣ Vision Tasks (Non-NLP, but good to know)
📌 Object Detection

COCO → AutoModelForObjectDetection

📌 Semantic Segmentation

Cityscapes → AutoModelForSemanticSegmentation

🎤 Interview-ready explanation

“These models extend transformers to vision tasks such as object detection and pixel-level segmentation.”