# ✳ Few-shot sentiment analysis on Greek proverbs

**Baseline**
*   Random few-shot examples (10 per sentiment class) drawn from the dataset below

**Models**
*   Krikri


| Dataset | Num of Instances | Place Info | Sentiment Info | Notes                            |
| ----------------------------- | -------: | ---------- | -------------- | -------------------------------- |
| Proverbs with **sentiment**   |      300 | ❌          | ✅              | Annotated with emotion/sentiment |


## ⚓ Prerequisites

In [1]:
%%capture
!pip install --upgrade gspread
!pip install transformers bitsandbytes accelerate torch --quiet

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

import zipfile

import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

from tqdm.notebook import tqdm
import time

from sklearn.metrics import classification_report

In [2]:
# load and read data
!gdown 1Ky59XvMfhHffoWKGc3J62y_uwh3mn_I2
!gdown  1SsMFiBWmfCNfbkGYVOMcp1hV9dUopxLU
!gdown 1cOqJ8SKJffJXYOmp-R4d2a-vjzXesvQH

Downloading...
From: https://drive.google.com/uc?id=1Ky59XvMfhHffoWKGc3J62y_uwh3mn_I2
To: /content/greek_proverbs.csv
100% 386k/386k [00:00<00:00, 6.38MB/s]
Downloading...
From: https://drive.google.com/uc?id=1SsMFiBWmfCNfbkGYVOMcp1hV9dUopxLU
To: /content/proverbs_majority_sent_gold.csv
100% 32.9k/32.9k [00:00<00:00, 55.6MB/s]
Downloading...
From: https://drive.google.com/uc?id=1cOqJ8SKJffJXYOmp-R4d2a-vjzXesvQH
To: /content/predicted_region_loc_basic_gr.csv
100% 3.81M/3.81M [00:00<00:00, 24.5MB/s]


In [3]:
norm_prov = pd.read_csv('greek_proverbs.csv')
loc_prov = pd.read_csv('predicted_region_loc_basic_gr.csv')
norm_prov_ann = pd.read_csv('proverbs_majority_sent_gold.csv')

## 📬 Extract random few-shot examples for sentiment

In [4]:
# check the label distribution for sentiment; the gold labels are the result of majority voting
norm_prov_ann['Gold_Sentiment'].value_counts()

# now let's take 10 shots from each class
ten_shot = []

if 'Gold_Sentiment' in norm_prov_ann.columns:
    for label in norm_prov_ann['Gold_Sentiment'].unique():
        # no seed is set, so a different sample is drawn on every run
        ten_shot.append(norm_prov_ann[norm_prov_ann['Gold_Sentiment'] == label].sample(10))

In [5]:
# remove the few-shot examples from the dataset; the remaining rows form the test set
# collect the IDs of all few-shot examples
few_shot_ids = pd.concat(ten_shot)['Proverb_id'].unique()
test_prov = norm_prov_ann[~norm_prov_ann['Proverb_id'].isin(few_shot_ids)].copy()
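
Because the sampling above is unseeded, the few-shot examples and the test set change between runs. The following is a minimal sketch of a reproducible variant, assuming a fixed `random_state` is acceptable. The helper name `sample_shots`, the seed value, and the toy frame are illustrative and not part of the notebook.

```python
import pandas as pd

def sample_shots(df, label_col="Gold_Sentiment", n=10, seed=42):
    """Draw n rows per label with a fixed seed (hypothetical helper)."""
    shots = []
    for label in sorted(df[label_col].unique()):
        shots.append(df[df[label_col] == label].sample(n, random_state=seed))
    return shots

# Toy stand-in for norm_prov_ann: 5 rows per class
toy = pd.DataFrame({
    "Proverb_id": range(15),
    "Proverb": [f"proverb {i}" for i in range(15)],
    "Gold_Sentiment": [-1, 0, 1] * 5,
})

shots = sample_shots(toy, n=2, seed=42)
shot_ids = pd.concat(shots)["Proverb_id"].unique()

# Everything that was not sampled as a shot becomes the held-out set
held_out = toy[~toy["Proverb_id"].isin(shot_ids)].copy()
print(len(shot_ids), len(held_out))
```

With the real data, `sample_shots(norm_prov_ann)` would take the place of the loop that fills `ten_shot`.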

## 🧐 Load model and do inference

In [6]:
model_id = "ilsp/Llama-Krikri-8B-Instruct"
bnb_conf = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_conf,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# # Greek Chat Example
# prompt = "Παρακαλώ γράψε μια σύντομη περίληψη στα Ελληνικά για το μυθιστόρημα 'Το Τρίτο Στεφάνι'."
# inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# out = model.generate(**inputs, max_new_tokens=200)
# print(tokenizer.decode(out[0], skip_special_tokens=True))





In [7]:
# map the integer gold labels of the shots to sentiment names
label_map = {
    -1: "Negative",
    0: "Neutral",
    1: "Positive"
}

def build_prompt(ten_shot, test_proverb):
    prompt = "Classify the sentiment of each proverb, also taking into account the example, as Negative, Neutral, Positive.\n\n"
    for class_df in ten_shot:
        for _, row in class_df.iterrows():
            proverb = row["Proverb"].strip().replace('\n', ' ')
            label = label_map[row["Gold_Sentiment"]]
            prompt += f"Proverb: \"{proverb}\"\nSentiment: {label}\n\n"
    prompt += f"Proverb: \"{test_proverb.strip()}\"\nSentiment:"
    return prompt
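
`build_prompt` emits the shots grouped by class, in whatever order `unique()` returned the labels. Whether that ordering influences the prediction is untested here; the sketch below only shows one way to mix the classes with a fixed seed so the effect could be checked. The function `build_shuffled_prompt`, the simplified instruction wording, and the placeholder shots are assumptions made for illustration.

```python
import random

LABELS = {-1: "Negative", 0: "Neutral", 1: "Positive"}

def clean(text):
    """Collapse newlines and repeated whitespace into single spaces."""
    return " ".join(text.split())

def build_shuffled_prompt(shots, test_proverb, seed=0):
    """shots: list of (proverb, int_label) pairs, shuffled with a fixed seed."""
    mixed = list(shots)
    random.Random(seed).shuffle(mixed)
    parts = ["Classify the sentiment of each proverb as Negative, Neutral, or Positive.\n\n"]
    for proverb, label in mixed:
        parts.append(f'Proverb: "{clean(proverb)}"\nSentiment: {LABELS[label]}\n\n')
    parts.append(f'Proverb: "{clean(test_proverb)}"\nSentiment:')
    return "".join(parts)

# Placeholder shots; real ones would come from the annotated data
shots = [
    ("placeholder negative proverb", -1),
    ("placeholder neutral\nproverb", 0),
    ("placeholder positive proverb", 1),
]
prompt = build_shuffled_prompt(shots, "Όποιος βιάζεται σκοντάφτει.")
print(prompt)
```

To feed it from the notebook's data, the pairs could be built as `[(r["Proverb"], r["Gold_Sentiment"]) for df in ten_shot for _, r in df.iterrows()]`.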


In [8]:
# Individual example test proverb
test_proverb = "Όποιος βιάζεται σκοντάφτει."

# Build few-shot prompt
prompt = build_prompt(ten_shot, test_proverb)
# print(prompt)

# Tokenize and run model
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)

# Extract the model's answer
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
result = decoded.split("Sentiment:")[-1].strip().split('\n')[0]
print(f"Predicted sentiment: {result}")


Predicted sentiment: Positive


In [9]:
prompt

'Classify the sentiment of each proverb, also taking into account the example, as Negative, Neutral, Positive.\n\nProverb: "Ο τελευταίος, για καλομοίρης για κακομοίρης."\nSentiment: Negative\n\nProverb: "Η μαϊμού είδε τον κώλο της και τρόμαξε.για αυτούς που συνειδητοποιούν ένα ελάττωμά τους  και μένουν έκπληκτοι"\nSentiment: Negative\n\nProverb: "Εδώ σε θέλω κάβουρα, να περπατάς στα κάρβουνα."\nSentiment: Negative\n\nProverb: "Απ’ τα πολλά τα γέλια, τον καταλαβαίνεις τον τρελό."\nSentiment: Negative\n\nProverb: "Πάρε την κάργια οδηγό, να φας σκατό με το κιλό."\nSentiment: Negative\n\nProverb: "Όπου δεις κακή γυναίκα δυο βολές τηνε χαιρέτα."\nSentiment: Negative\n\nProverb: "Σ’ έναν δίνουν και δεν παίρνει, άλλον δέρνουν και δε φεύγει.δηλώνει αδυναμία να πεισθούν κάποιοι με διαφόρους τρόπους"\nSentiment: Negative\n\nProverb: "Τάξε του να χαίρεται και άστονε να χέζεται."\nSentiment: Negative\n\nProverb: "Τα μεταξωτά βρακιά θέλουν και επιδέξιους κώλους."\nSentiment: Negative\n\nProverb: "Ό

In [10]:
def predict_sentiment(example_text, ten_shot):
    prompt = build_prompt(ten_shot, example_text)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        # greedy decoding; temperature is dropped because it is ignored when do_sample=False
        outputs = model.generate(
            **inputs,
            max_new_tokens=10,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id
        )

    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    result = decoded.split("Sentiment:")[-1].strip().split('\n')[0]

    return result


In [11]:
# test on the rest of the data
tqdm.pandas()
test_prov['predicted_sentiment'] = test_prov['Proverb'].progress_apply(lambda x: predict_sentiment(x, ten_shot))

  0%|          | 0/270 [00:00<?, ?it/s]

In [16]:
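# collapse generations that start with "negative" (any casing, with or without
# trailing text) into the canonical label "Negative"
test_prov['predicted_sentiment'] = test_prov['predicted_sentiment'].replace(
    to_replace=r"(?i)^negative.*",
    value="Negative",
    regex=True
)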
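
The correction above only covers outputs that begin with "negative". A minimal sketch of a more general clean-up, assuming the label always appears at the start of the generation; the helper `normalize_label` and the fallback value `"Unknown"` are hypothetical and not part of the notebook.

```python
import re

def normalize_label(raw):
    """Map a raw generation to a canonical label, or 'Unknown' if none leads the string."""
    match = re.match(r"\s*(negative|neutral|positive)", raw, flags=re.IGNORECASE)
    return match.group(1).capitalize() if match else "Unknown"

for raw in ["negative.", " Positive", "NEUTRAL (maybe)", "Θετικό"]:
    print(repr(raw), "->", normalize_label(raw))
```

It could be applied with `test_prov['predicted_sentiment'].map(normalize_label)`, after which any `"Unknown"` rows would show up in `value_counts()` for manual inspection.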

In [17]:
# check if there are values that need some post processing
test_prov['predicted_sentiment'].value_counts()

| predicted_sentiment | count |
| ------------------- | ----: |
| Positive            |   238 |
| Negative            |    28 |
| Neutral             |     4 |


In [18]:
# the gold labels are integers; map them to strings so they match the predictions
label_map = {
    -1: "Negative",
    0: "Neutral",
    1: "Positive"
}

test_prov['gold_label'] = test_prov['Gold_Sentiment'].map(label_map)

In [19]:
print(classification_report(
    test_prov['gold_label'],
    test_prov['predicted_sentiment']
))


              precision    recall  f1-score   support

    Negative       0.96      0.18      0.30       151
     Neutral       0.50      0.02      0.04       100
    Positive       0.08      1.00      0.15        19

    accuracy                           0.18       270
   macro avg       0.51      0.40      0.16       270
weighted avg       0.73      0.18      0.19       270
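
For context, the support column of the report is enough to work out what a constant predictor would score on the same test set. This is a small arithmetic sketch that uses only the support counts printed above; it is a reference point, not a result produced by the notebook.

```python
# Gold class counts taken from the support column of the report
support = {"Negative": 151, "Neutral": 100, "Positive": 19}
total = sum(support.values())

# A constant predictor that always outputs the most frequent class
majority_label = max(support, key=support.get)
majority_accuracy = support[majority_label] / total

# Only the majority class gets a nonzero F1; the other two classes score 0
precision = support[majority_label] / total
recall = 1.0
majority_f1 = 2 * precision * recall / (precision + recall)
macro_f1 = majority_f1 / len(support)

print(f"majority class: {majority_label}")
print(f"accuracy: {majority_accuracy:.2f}, macro F1: {macro_f1:.2f}")
```

Comparing these figures with the accuracy and macro-average rows of the report shows how the few-shot run relates to the trivial baseline.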

