# Using a Task-Specific Model
##### cardiffnlp/twitter-roberta-base-sentiment-latest


Task-specific models are language models that have been fine-tuned or trained for one narrowly defined task or domain. Unlike general-purpose LLMs, which are designed to handle a wide range of language tasks, they trade breadth for higher accuracy and efficiency on that single task.

In [1]:
from datasets import load_dataset



In [None]:
# Load the dataset
data = load_dataset("rotten_tomatoes")
data

# We will use the train split to train a model and the test split to validate
# the results.

# The additional validation split can be used to further check generalization
# if the train and test splits were used for hyperparameter tuning.


In [3]:
data["train"][0, -1]

{'text': ['the rock is destined to be the 21st century\'s new " conan " and that he\'s going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .',
  'things really get weird , though not particularly scary : the movie is all portent and no content .'],
 'label': [1, 0]}
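
Before evaluating a classifier it can help to confirm how the labels are distributed. The sketch below is only a minimal illustration: `label_distribution` is a hypothetical helper, not part of the `datasets` library, and it is applied to the two labels from the sample above rather than to a full split.

```python
from collections import Counter


def label_distribution(labels):
    """Return the fraction of examples per label (hypothetical helper)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}


# Labels of the two sample rows shown above (1 = positive, 0 = negative)
sample_labels = [1, 0]
distribution = label_distribution(sample_labels)
print(distribution)
```

Assuming the dataset loaded as shown earlier, the same helper could be called as `label_distribution(data["train"]["label"])`.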

In [4]:
# We use the Twitter-RoBERTa-base sentiment model,
# which was fine-tuned on tweets for sentiment analysis.


In [5]:
import torch

# Check if CUDA is available
print("CUDA available:", torch.cuda.is_available())

# If CUDA is available, print additional info
if torch.cuda.is_available():
    print("CUDA device name:", torch.cuda.get_device_name(0))
    print("CUDA device count:", torch.cuda.device_count())

CUDA available: True
CUDA device name: NVIDIA GeForce RTX 3070 Laptop GPU
CUDA device count: 1


In [6]:
from transformers import pipeline

# Path to the HF model
model_path = "cardiffnlp/twitter-roberta-base-sentiment-latest"

# Load model into pipeline
pipe = pipeline(
    model=model_path,
    tokenizer=model_path,
    return_all_scores=True,  # Return a score for every label, not only the top one
    device="cuda:0"  # Run on the first GPU (see the CUDA check above)
)

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


In [9]:
import numpy as np  # Numerical computing library for array operations
from tqdm import tqdm  # Progress bar library for visualizing iteration progress
from transformers.pipelines.pt_utils import KeyDataset  # Helper class for dataset handling in Hugging Face pipelines

y_pred = []  # Initialize empty list to store prediction results
for output in tqdm(pipe(KeyDataset(data["test"], "text")), total=len(data["test"])):  # Process test data with progress bar
    negative_score = output[0]["score"]  # Negative sentiment score (first output position)
    positive_score = output[2]["score"]  # Positive sentiment score (third position; the middle, neutral score is skipped)
    assignment = np.argmax([negative_score, positive_score])  # Pick the higher of the two scores (0=negative, 1=positive)
    y_pred.append(assignment)

100%|██████████| 1066/1066 [00:07<00:00, 138.28it/s]
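
The loop above relies on the position of each score in the pipeline output. A slightly more defensive variant looks the scores up by label name instead. The sketch below assumes the model reports the labels `negative`, `neutral`, and `positive` (worth checking against the model's configuration). It uses a hand-written `mock_output` with made-up scores in place of a real pipeline result, and `to_binary` is a hypothetical helper.

```python
def to_binary(output):
    """Map three-class sentiment scores to 0 (negative) or 1 (positive).

    The neutral score is ignored, mirroring the loop above.
    Hypothetical helper; assumes labels named "negative" and "positive".
    """
    scores = {item["label"].lower(): item["score"] for item in output}
    # On a tie this returns 0, matching np.argmax, which picks the first maximum
    return int(scores["positive"] > scores["negative"])


# Made-up scores for illustration only, not real model output
mock_output = [
    {"label": "negative", "score": 0.10},
    {"label": "neutral", "score": 0.60},
    {"label": "positive", "score": 0.30},
]
prediction = to_binary(mock_output)
print(prediction)
```

Here the neutral score is the largest, yet the review is still assigned to the positive class because only the negative and positive scores are compared. That is a deliberate simplification for a binary dataset and may cost some accuracy on reviews the model considers neutral.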


In [11]:
from sklearn.metrics import classification_report  # Import performance metrics module

def evaluate_performance(y_true, y_pred):  # Define evaluation function taking true labels and predictions
    performance = classification_report(  # Generate classification report
        y_true, y_pred,  # Compare ground truth vs predicted labels
        target_names=["Negative Review", "Positive Review"]  # Label classes for readable output
    )
    print(performance)  # Print precision, recall, f1-score metrics


In [12]:
evaluate_performance(data["test"]["label"], y_pred)  # Compare the test-set labels with the model's predictions

                 precision    recall  f1-score   support

Negative Review       0.76      0.88      0.81       533
Positive Review       0.86      0.72      0.78       533

       accuracy                           0.80      1066
      macro avg       0.81      0.80      0.80      1066
   weighted avg       0.81      0.80      0.80      1066
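
To make the columns of the report concrete, the following sketch recomputes precision, recall, and F1 for one class by hand. The labels are a small made-up example, not the Rotten Tomatoes predictions, and `binary_metrics` is a hypothetical helper rather than a scikit-learn function.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels for illustration only
y_true_demo = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred_demo = [1, 1, 1, 0, 0, 0, 0, 1]

precision, recall, f1 = binary_metrics(y_true_demo, y_pred_demo)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision answers "of the reviews predicted as this class, how many were correct?", while recall answers "of the reviews that truly belong to this class, how many were found?". F1 is the harmonic mean of the two.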

