# Alternative sentiment classification using a zero-shot model

Many sentiment classification models are available on [Hugging Face](https://huggingface.co/models?pipeline_tag=text-classification&sort=trending&search=sentiment). For us, however, sentiment classification is just a proxy for the *real* classification work you might want to perform. Therefore, we use an alternative approach that is not limited to sentiment: [zero-shot classification](https://en.wikipedia.org/wiki/Zero-shot_learning). Models for this task are typically built on *natural language inference* (NLI), which is why the two terms are sometimes used interchangeably, and they can be applied to arbitrary classification tasks without task-specific training.

Many models for this task exist on [Hugging Face](https://huggingface.co/models?pipeline_tag=zero-shot-classification&sort=trending). We chose one based on [ModernBERT](https://huggingface.co/MoritzLaurer/ModernBERT-large-zeroshot-v2.0) so that we can compare it to our [previous approach](11-bert-finetune-classification.ipynb).

In [None]:
import pandas as pd

df = pd.read_json("10000_All_Beauty.json.xz")

In [None]:
from transformers import pipeline
zeroshot_classifier = pipeline("zero-shot-classification", 
                               model="MoritzLaurer/ModernBERT-large-zeroshot-v2.0", 
                               device_map="auto")

In [None]:
hypothesis_template = "This text has a {} sentiment"
classes = ["positive", "negative"]
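
To make the role of the template concrete: as far as we understand the `transformers` zero-shot pipeline, each candidate label is substituted into the template to form one hypothesis, and the model then scores how strongly the text (the premise) entails each hypothesis. A minimal sketch of that expansion with plain string formatting follows; the review text in `premise` is made up for illustration and is not taken from the dataset.

```python
hypothesis_template = "This text has a {} sentiment"
classes = ["positive", "negative"]

# Hypothetical review text, used only for illustration.
premise = "The shampoo smells great and my hair feels soft."

# One (premise, hypothesis) pair per candidate label; an NLI model
# would score each pair for entailment.
pairs = [(premise, hypothesis_template.format(label)) for label in classes]

for _, hypothesis in pairs:
    print(hypothesis)
```

Changing the task therefore only requires a different template and label list, not a different model.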

In [None]:
zeroshot_classifier(df.iloc[0]["text"], classes, hypothesis_template=hypothesis_template, multi_label=False)

In [None]:
from tqdm.auto import tqdm

# Baseline: classify the reviews one at a time (slow, no batching).
res = []
for text in tqdm(df["text"], total=len(df)):
    o = zeroshot_classifier(text, classes, hypothesis_template=hypothesis_template, multi_label=False)
    res.append(o)

This can be sped up several-fold by passing the whole list of texts at once, which lets the pipeline process them in batches:

In [None]:
%%time
zeroshot_preds = zeroshot_classifier(df["text"].tolist(), 
                                     batch_size=32, candidate_labels=classes,
                                     hypothesis_template=hypothesis_template, 
                                     multi_label=False)

In [None]:
zeroshot_preds[0:5]

In [None]:
# Labels come back sorted by score, so map each score to its label name.
rdf = pd.DataFrame([dict(zip(r["labels"], r["scores"])) for r in zeroshot_preds])
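
The reshaping step has to go through the label names because the pipeline returns `labels` and `scores` sorted by descending score, so the position of `"positive"` changes from one prediction to the next. A small sketch with hand-written predictions (the texts and scores are hypothetical, not real model output):

```python
# Hypothetical pipeline outputs: labels are sorted by descending score,
# so their order differs between predictions.
example_preds = [
    {"sequence": "Love it!", "labels": ["positive", "negative"], "scores": [0.97, 0.03]},
    {"sequence": "Broke after a day.", "labels": ["negative", "positive"], "scores": [0.92, 0.08]},
]

# Pair each label with its score so that lookups go by name, not position.
rows = [dict(zip(p["labels"], p["scores"])) for p in example_preds]

for row in rows:
    print(row["positive"], row["negative"])
```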

In [None]:
tdf = pd.concat([df, rdf], axis=1)

In [None]:
tdf.sample(10, random_state=42)

In [None]:
# Rate a review 5 only if "positive" scores more than five times as high as "negative", else 1.
tdf["zeroshot_rating"] = (tdf["positive"] > 5 * tdf["negative"]).map({True: 5, False: 1})
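
With `multi_label=False` the pipeline normalizes the scores across the candidate labels, so the two scores sum to 1. Under that assumption, the rule `positive > 5 * negative` is simply a threshold on the positive score at 5/6 (about 0.83). The sketch below illustrates this with a hypothetical helper, `rating_from_scores`, which is not part of the notebook:

```python
def rating_from_scores(positive: float, negative: float, factor: float = 5.0) -> int:
    """Return 5 if `positive` exceeds `factor` times `negative`, else 1."""
    return 5 if positive > factor * negative else 1

# With two labels whose scores sum to 1, the rule is a threshold on `positive`:
# positive > factor * (1 - positive)  <=>  positive > factor / (1 + factor)
threshold = 5.0 / (1.0 + 5.0)

for positive in (0.95, 0.80, 0.60):
    negative = 1.0 - positive
    print(positive, rating_from_scores(positive, negative), positive > threshold)
```

In other words, a review is only rated 5 when the model is clearly confident in "positive"; every borderline case falls back to 1.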

In [None]:
pd.set_option('display.max_colwidth', None)
wrong = tdf[tdf["rating"] != tdf["zeroshot_rating"]]
wrong

In [None]:
# Accuracy: share of reviews where the zero-shot rating matches the original rating.
1 - len(wrong) / len(df)
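
A single accuracy number hides the direction of the errors. A minimal sketch of how true and predicted ratings could be tallied into a confusion table with the standard library follows; the two lists are hypothetical, and the sketch assumes that only the ratings 1 and 5 occur.

```python
from collections import Counter

# Hypothetical true and predicted ratings (assumes only 1 and 5 occur).
true_ratings = [5, 5, 1, 1, 5, 1]
pred_ratings = [5, 1, 1, 5, 5, 1]

# Count each (true, predicted) combination.
confusion = Counter(zip(true_ratings, pred_ratings))

# Accuracy is the share of pairs where true and predicted ratings agree.
correct = sum(n for (t, p), n in confusion.items() if t == p)
accuracy = correct / len(true_ratings)

print(confusion)
print(accuracy)
```

On the notebook's data, the same counts could be obtained with `pd.crosstab(tdf["rating"], tdf["zeroshot_rating"])`.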