### Zero-Shot Classification Example

- **Classification Setup**: The `facebook/bart-large-mnli` model is loaded through the Hugging Face `pipeline` API for zero-shot classification. It scores the input sequence against a list of candidate labels without any task-specific fine-tuning.

- **Classification Output**: For the review "satisfied little vacuum", the pipeline returns the candidate labels ranked by score, and the top-ranked label is printed (here, `Automotive Accessories`).

In [3]:
from transformers import pipeline

# device=-1 runs on the CPU; device=0 runs on the first CUDA GPU
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli", device=0)

sequence_to_classify = "satisfied little vacuum"
# Candidate product categories
candidate_labels = [
    'Crossbody Bags', 'Car Phone Holders', 'Dashcams', 'Portable Car Vacuums',
    'Automotive Accessories', 'Electric Toothbrushes', 'Dog Collars', 'Wallets',
    'Phone Cases', 'Bluetooth Earbuds', 'Smartwatches', 'Sofa Covers',
    'Broom Holders', 'Kitchen Accessories', 'LED Lamps', 'Fairy Light',
    'Security Cameras', 'Pendrive', 'Fitness Equipment', 'Beauty and Health Products',
]

result = classifier(sequence_to_classify, candidate_labels)

# result['labels'] is sorted by score, highest first
most_probable_label = result['labels'][0]
print(most_probable_label)


Device set to use cuda:0


Automotive Accessories
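
The pipeline returns more than the top label. As a minimal sketch, assuming the result for a single sequence is a dict whose `labels` and `scores` lists are aligned and ordered from most to least probable, the hypothetical helper `top_k` below lists the best few candidates. The scores in `sample_result` are made up for illustration and are not model output.

```python
def top_k(result, k=3):
    """Return the k highest-scoring (label, score) pairs from a zero-shot result."""
    pairs = zip(result["labels"], result["scores"])
    # Sort defensively rather than relying on the order of the input lists.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical result dict: the scores are illustrative, not real model output.
sample_result = {
    "sequence": "satisfied little vacuum",
    "labels": ["Automotive Accessories", "Portable Car Vacuums", "Kitchen Accessories"],
    "scores": [0.41, 0.38, 0.05],
}

for label, score in top_k(sample_result, k=2):
    print(f"{label}: {score:.2f}")
```

Looking at the runner-up can help when candidate labels overlap, as `Portable Car Vacuums` and `Automotive Accessories` do for this review.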


### Zero-Shot Classification for Product Reviews

- **Dataset Loading and Classification**: The review dataset is read from a CSV file with pandas, and each entry in the `reviewContent` column is classified by a zero-shot classification pipeline using the `facebook/bart-large-mnli` model and the same predefined product categories. Missing or blank reviews are labeled `Unknown`.

- **Progress Bar and Saving**: The `tqdm` pandas integration (`progress_apply`) displays a progress bar while the reviews are classified, and the labeled DataFrame is saved to a new CSV file for further analysis.

In [2]:
import pandas as pd
from transformers import pipeline
from tqdm import tqdm
import torch

# Enable tqdm for pandas apply
tqdm.pandas()

# Load your dataset
df = pd.read_csv('/kaggle/input/labeling-for-product-category-of-sentiment/Product Review Of AliExpress SENTIMENT Processed.csv')

candidate_labels = [
    'Crossbody Bags', 'Car Phone Holders', 'Dashcams', 'Portable Car Vacuums',
    'Automotive Accessories', 'Electric Toothbrushes', 'Dog Collars', 'Wallets',
    'Phone Cases', 'Bluetooth Earbuds', 'Smartwatches', 'Sofa Covers',
    'Broom Holders', 'Kitchen Accessories', 'LED Lamps', 'Fairy Light',
    'Security Cameras', 'Pendrive', 'Fitness Equipment', 'Beauty and Health Products',
]

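# Initialize the classifier on the first GPU if one is available, otherwise on the CPU
device = 0 if torch.cuda.is_available() else -1
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli", device=device)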

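# Classify a single review and return its most probable category
def classify_review(review):
    # Missing (NaN), non-string, or whitespace-only reviews cannot be classified
    if not isinstance(review, str) or not review.strip():
        return "Unknown"
    result = classifier(review, candidate_labels)
    return result['labels'][0]  # Labels are sorted by score, highest first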

# Apply classification to each review (with progress bar)
df['categoryLabel'] = df['reviewContent'].progress_apply(classify_review)

# Save the annotated DataFrame to a new CSV
output_path = 'Final Product Review Of AliExpress Product Category Processed.csv'
df.to_csv(output_path, index=False)

print(f"Annotation complete. Saved to {output_path}")


Device set to use cuda:0
100%|██████████| 12916/12916 [1:28:15<00:00,  2.44it/s]


Annotation complete. Saved to Final Product Review Of AliExpress Product Category Processed.csv
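
At roughly 2.44 reviews per second, the full pass takes about an hour and a half. One possible way to cut repeated work, sketched below, is to classify each distinct review text only once and reuse the label for exact duplicates. This is not part of the original notebook: `classify_with_cache` and `stub_classify` are hypothetical names, the stub stands in for the real model call, and any saving depends on how many duplicate reviews the dataset actually contains.

```python
def classify_with_cache(reviews, classify):
    """Label each review, calling `classify` once per distinct non-empty text."""
    cache = {}
    labels = []
    for review in reviews:
        # Mirror the notebook's fallback for missing or blank reviews.
        if not isinstance(review, str) or not review.strip():
            labels.append("Unknown")
            continue
        if review not in cache:
            cache[review] = classify(review)
        labels.append(cache[review])
    return labels


# Stub standing in for the real pipeline call, so the sketch runs without a model.
calls = []

def stub_classify(text):
    calls.append(text)
    return "Portable Car Vacuums" if "vacuum" in text else "Wallets"


reviews = ["good vacuum", "nice wallet", "good vacuum", "", None]
labels = classify_with_cache(reviews, stub_classify)
print(labels)
print(f"classifier calls: {len(calls)}")
```

With the notebook's objects, the `classify` argument could be `lambda text: classifier(text, candidate_labels)['labels'][0]`.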


## Mapping Product Categories to Base Classes

- Mapped each specific product type to one of five broader base categories (`electronics`, `automotive`, `health`, `fashion`, `home`); for example, `Pendrive` maps to `electronics` and `Smartwatches` to `fashion`.
- Used a dictionary and `.map()` to overwrite the `categoryLabel` column in the DataFrame with the base category.
- Saved the updated DataFrame as `Final Mapped Product Review Of AliExpress Product Category Processed.csv`.


In [3]:

# Mapping of products to their respective base categories
category_mapping = {
    'Pendrive': 'electronics',
    'Bluetooth Earbuds': 'fashion',
    'Smartwatches': 'fashion',
    'Security Cameras': 'electronics',
    'Dashcams': 'electronics',
    
    'Portable Car Vacuums': 'automotive',
    'Automotive Accessories': 'automotive',
    'Car Phone Holders': 'automotive',
    
    'Electric Toothbrushes': 'health',
    'Fitness Equipment': 'health',
    'Beauty and Health Products': 'health',
    
    'Dog Collars': 'fashion',
    'Wallets': 'fashion',
    'Crossbody Bags': 'fashion',
    'Phone Cases': 'fashion',
    
    'Sofa Covers': 'home',
    'Broom Holders': 'home',
    'Kitchen Accessories': 'home',
    'LED Lamps': 'home',
    'Fairy Light': 'home'
}

# Replace each product label in 'categoryLabel' with its base category
# (labels that are missing from the mapping become NaN)
df['categoryLabel'] = df['categoryLabel'].map(category_mapping)
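
`Series.map()` with a dictionary turns any value that is missing from the dictionary into `NaN`, so a label such as the `Unknown` fallback from `classify_review` would silently lose its category. A minimal sketch of a coverage check follows, using plain Python sets; `unmapped_labels` is a hypothetical helper, and `mapping_subset` is a shortened, illustrative copy of the mapping rather than the full dictionary.

```python
def unmapped_labels(labels, mapping):
    """Return the labels that have no entry in `mapping`, sorted for stable output."""
    return sorted(set(labels) - set(mapping))


# Shortened copy of the notebook's mapping, for illustration only.
mapping_subset = {
    "Pendrive": "electronics",
    "Dashcams": "electronics",
    "Wallets": "fashion",
}

# Hypothetical labels as they might appear in the column before mapping.
observed = ["Pendrive", "Wallets", "Unknown", "Dashcams", "Pendrive"]
print(unmapped_labels(observed, mapping_subset))
```

Run before the `.map()` call, a check like `unmapped_labels(df['categoryLabel'].dropna().unique(), category_mapping)` would list any labels that are about to become `NaN`.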

In [5]:
df.to_csv('Final Mapped Product Review Of AliExpress Product Category Processed.csv', index=False)


In [4]:
import pandas as pd

# Count the number of reviews in each base category (NaN values are excluded)
category_counts = df['categoryLabel'].value_counts()

# Print the results
print(category_counts)


categoryLabel
electronics    5606
automotive     5062
fashion        1391
home            642
health          215
Name: count, dtype: int64
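
The five counts sum to 12,916, the same number of reviews shown in the progress bar, so every review received a base category. The distribution is heavily skewed: `electronics` and `automotive` together account for roughly 83% of the reviews. The short sketch below converts the counts to percentage shares; the `counts` dictionary is copied by hand from the output above.

```python
# Counts copied from the value_counts() output above.
counts = {
    "electronics": 5606,
    "automotive": 5062,
    "fashion": 1391,
    "home": 642,
    "health": 215,
}

total = sum(counts.values())

# Percentage share of each base category.
shares = {name: 100 * n / total for name, n in counts.items()}

print(f"total reviews: {total}")
for name, share in shares.items():
    print(f"{name:<12}{share:5.1f}%")
```

In pandas, `df['categoryLabel'].value_counts(normalize=True)` gives the same proportions as fractions.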
