# Assignment

## Instructions

Use the code below as a starting point to load the Rotten Tomatoes dataset, then complete the following steps:

**Model Application:**

- Load a pre-trained sentiment analysis model from Hugging Face Transformers.
- Apply the model to a subset of the chosen dataset (e.g., the first 1000 samples from the training set).
- Evaluate the model's performance. You can start with qualitative analysis (inspecting predictions) and then explore quantitative metrics.

In [None]:
from datasets import load_dataset
from transformers import pipeline
from sklearn.metrics import classification_report, accuracy_score

In [None]:
# Load the Rotten Tomatoes dataset
dataset = load_dataset("rotten_tomatoes")

# Print the dataset information
print(dataset)


DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 8530
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
})


In [None]:
# Example: Accessing the training split
train_dataset = dataset["train"]

# Print the first example in the training set
print(train_dataset[0])

{'text': 'the rock is destined to be the 21st century\'s new " conan " and that he\'s going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .', 'label': 1}


In [None]:
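# Apply the model to a subset of the dataset (first 1000 samples of the training set).
# Note: the classification report further down lists 1000 Positive and 0 Negative
# examples for this slice, so it contains a single class.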
subset_size = 1000
subset_train_dataset = train_dataset.select(range(subset_size))

# Load a pre-trained DistilBERT model fine-tuned on the SST-2 dataset.
# It performs binary sentiment classification (POSITIVE/NEGATIVE).
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
classifier = pipeline("sentiment-analysis", model=model_name)

texts = list(subset_train_dataset["text"])
predictions = classifier(texts)

# Evaluation Preparation
# Dataset labels: 0 = Negative, 1 = Positive
# Model labels: 'NEGATIVE', 'POSITIVE'
label_map = {"NEGATIVE": 0, "POSITIVE": 1}
predicted_labels = [label_map[pred['label']] for pred in predictions]
# Convert to a plain list, as done for `texts` above
true_labels = list(subset_train_dataset["label"])

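
Before interpreting any metrics, it can help to check how the labels are distributed in the selected slice. The classification report later in this notebook lists a support of 0 for Negative, which suggests that the first 1000 training rows are all Positive. The sketch below uses toy labels and a hypothetical helper, `label_distribution`, to illustrate the check; in the notebook it could be called on `subset_train_dataset["label"]`.

```python
from collections import Counter


def label_distribution(labels):
    """Return {label: (count, fraction)} for a sequence of integer labels."""
    labels = list(labels)
    counts = Counter(labels)
    total = len(labels)
    return {label: (n, n / total) for label, n in sorted(counts.items())}


# Toy stand-in for a slice taken from a label-sorted dataset:
# every row in the slice is Positive (1).
sorted_slice = [1] * 10

# Toy stand-in for a slice taken after shuffling (values are made up).
shuffled_slice = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]

print(label_distribution(sorted_slice))
print(label_distribution(shuffled_slice))
```

If the slice turns out to be one-sided, shuffling before selecting, for example `train_dataset.shuffle(seed=42).select(range(subset_size))`, is one way to obtain a mixed sample. The seed value here is an arbitrary choice, and the metrics reported below were not produced with it.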

In [None]:
print("--- Qualitative Analysis ---")
for i in range(5):
    print(f"Review: {texts[i][:90]}...")
    print(f"Actual: {'Positive' if true_labels[i] == 1 else 'Negative'}")
    print(f"Predicted: {predictions[i]['label']} (Confidence: {predictions[i]['score']:.4f})")
    print()

--- Qualitative Analysis ---
Review: the rock is destined to be the 21st century's new " conan " and that he's going to make a ...
Actual: Positive
Predicted: POSITIVE (Confidence: 0.9998)

Review: the gorgeously elaborate continuation of " the lord of the rings " trilogy is so huge that...
Actual: Positive
Predicted: POSITIVE (Confidence: 0.9998)

Review: effective but too-tepid biopic...
Actual: Positive
Predicted: NEGATIVE (Confidence: 0.9960)

Review: if you sometimes like to go to the movies to have fun , wasabi is a good place to start ....
Actual: Positive
Predicted: POSITIVE (Confidence: 0.9998)

Review: emerges as something rare , an issue movie that's so honest and keenly observed that it do...
Actual: Positive
Predicted: POSITIVE (Confidence: 0.9998)



In [16]:
print("--- False Positive and Negative Analysis ---")
# Scan the first 50 samples and print every misclassified review.
# Labels are binary, so a mismatch is either a false positive or a false negative.
for i in range(50):
    if true_labels[i] != predicted_labels[i]:
        print(f"Review: {texts[i][:90]}...")
        print(f"Actual: {'Positive' if true_labels[i] == 1 else 'Negative'}")
        print(f"Predicted: {predictions[i]['label']} (Confidence: {predictions[i]['score']:.4f})")
        print()

--- False Positive and Negative Analysis ---
Review: effective but too-tepid biopic...
Actual: Positive
Predicted: NEGATIVE (Confidence: 0.9960)

Review: perhaps no picture ever made has more literally showed that the road to hell is paved with...
Actual: Positive
Predicted: NEGATIVE (Confidence: 0.9846)

Review: at about 95 minutes , treasure planet maintains a brisk pace as it races through the famil...
Actual: Positive
Predicted: NEGATIVE (Confidence: 0.8286)

Review: if there's a way to effectively teach kids about the dangers of drugs , i think it's in pr...
Actual: Positive
Predicted: NEGATIVE (Confidence: 0.9984)

Review: though everything might be literate and smart , it never took off and always seemed static...
Actual: Positive
Predicted: NEGATIVE (Confidence: 0.9821)

Review: like most bond outings in recent years , some of the stunts are so outlandish that they bo...
Actual: Positive
Predicted: NEGATIVE (Confidence: 0.9939)

Review: 'compleja e intelectualmente retadora , e
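
The loop above prints errors in dataset order and does not distinguish the two error types. One possible refinement, sketched here with made-up toy data, is to split the mistakes into false positives and false negatives and list the most confident ones first. The helper name `collect_errors` and the toy reviews are hypothetical; the prediction format (a list of dicts with `label` and `score` keys) mirrors what the pipeline returns in this notebook.

```python
def collect_errors(texts, true_labels, predictions, label_map):
    """Split misclassified examples into false positives and false negatives.

    `predictions` is a list of dicts with 'label' and 'score' keys,
    and `label_map` converts the model's label strings to 0/1.
    """
    false_positives, false_negatives = [], []
    for text, truth, pred in zip(texts, true_labels, predictions):
        predicted = label_map[pred["label"]]
        if predicted == truth:
            continue  # correct prediction, nothing to record
        record = {"text": text, "score": pred["score"]}
        if predicted == 1:
            false_positives.append(record)  # predicted Positive, actually Negative
        else:
            false_negatives.append(record)  # predicted Negative, actually Positive
    # Most confident mistakes first
    false_positives.sort(key=lambda r: r["score"], reverse=True)
    false_negatives.sort(key=lambda r: r["score"], reverse=True)
    return false_positives, false_negatives


# Toy data, made up for illustration only
toy_texts = ["great fun", "dull and slow", "effective but tepid", "not bad at all"]
toy_true = [1, 0, 1, 1]
toy_preds = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "POSITIVE", "score": 0.80},
    {"label": "NEGATIVE", "score": 0.97},
    {"label": "NEGATIVE", "score": 0.60},
]
toy_label_map = {"NEGATIVE": 0, "POSITIVE": 1}

fps, fns = collect_errors(toy_texts, toy_true, toy_preds, toy_label_map)
print("False positives:", fps)
print("False negatives:", fns)
```

In the notebook, the same call would take `texts`, `true_labels`, `predictions`, and `label_map` as defined in the earlier cell.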

In [None]:
print("--- Quantitative Metrics ---")
print(f"Overall Accuracy: {accuracy_score(true_labels, predicted_labels):.2%}")
print("\nDetailed Classification Report:")
print(classification_report(true_labels, predicted_labels, target_names=["Negative", "Positive"]))


--- Quantitative Metrics ---
Overall Accuracy: 88.80%

Detailed Classification Report:
              precision    recall  f1-score   support

    Negative       0.00      0.00      0.00         0
    Positive       1.00      0.89      0.94      1000

    accuracy                           0.89      1000
   macro avg       0.50      0.44      0.47      1000
weighted avg       1.00      0.89      0.94      1000



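
The Negative row in the report is all zeros because none of the 1000 true labels is Negative, so the report only measures how often the model recognizes positive reviews. The sketch below recomputes the Positive-class figures by hand. The counts are inferred from the report (1000 Positive examples at 88.80% accuracy, hence 888 correct), and the helper name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall and F1 for the positive class, plus overall accuracy."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy


# Counts inferred from the report: all 1000 true labels are Positive
# and 88.80% of them were predicted correctly.
tp, fn = 888, 112
fp, tn = 0, 0  # there are no Negative examples, so neither case can occur

precision, recall, f1, accuracy = binary_metrics(tp, fp, fn, tn)
print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.2%}")
```

With a single class in the true labels, accuracy and Positive recall are the same number. The macro averages in the report (0.50, 0.44, 0.47) are the means of each column over both rows, so the empty Negative row pulls them down.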
