# More sentiment models and their accuracy per language

## Intro
This document is used to explore diffrent sentiment models which were trained specifically for sentiment per language and evaluate their performance in order to find the most accurate model compared to the multilingual model. 
As for the multilingual model, here we also use the data clean 1 with the sentence extracted. And for english we compare the accuracy of the datasets with the extracted sentences from data clean 1 and data condensed to determine whether less data is better for the model. This was only done for the english data condensed, because of time limitations that didn't allow further data labeling.

The examined models are:
* German: oliverguhr/german-sentiment-bert
* English: bert-base-uncased and VADER
* Spanish: beto-sentiment-analysis and bert-base-spanish-wwm-uncased

For the english and spanish models we use the pipeline function from the transformers library, as it includes the tokenization and the sentiment model within one step, which makes it easy to use.

The performances of the models are evaluated using the function evaluate_performance that returns the accuracy, the unique predicted labels, the confusion matrix and the classification report.

## Import packages

In [1]:
import torch
import numpy as np
import pandas as pd
from gensim.parsing.preprocessing import strip_punctuation, strip_multiple_whitespaces

# Models:
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification # for german
from pattern.en import sentiment # for english
import nltk # for english
from nltk.sentiment import SentimentIntensityAnalyzer # for english
nltk.download('vader_lexicon') # for english
from pysentimiento import create_analyzer # for spanish

# Accuracy
from utils import evaluate_performance, transform_scores

  from .autonotebook import tqdm as notebook_tqdm
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\joana\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


### Load labeled data: d1_sen

In [2]:
# Load labeled CSV files into a DataFrame
df_de = pd.read_csv('../data_files/data_clean/labeled-data/labeled-de_clean_1-1.csv', sep=';')
df_en = pd.read_csv('../data_files/data_clean/labeled-data/labeled-en_clean_1-1_not101010.csv')
df_es = pd.read_csv('../data_files/data_clean/labeled-data/labeled-es_clean_1-1.csv', sep=';')

In [120]:
# Strip punctuation
df_de['data'] = df_de['data'].apply(strip_punctuation)
df_en['data'] = df_en['data'].apply(strip_punctuation)
df_es['data'] = df_es['data'].apply(strip_punctuation)

# Strip white spaces
df_de['data'] = df_de['data'].apply(strip_multiple_whitespaces)
df_en['data'] = df_en['data'].apply(strip_multiple_whitespaces)
df_es['data'] = df_es['data'].apply(strip_multiple_whitespaces)

### Load labeled data: data condensed for english

In [149]:
# Load labeled CSV file into a DataFrame
df_en_con = pd.read_csv('../data_files/data_clean/labeled-data/labeled-en_clean_con_sen.csv')

In [150]:
# The data condensed is already stripped of punctuation.

# Strip white spaces
df_en_con['data'] = df_en_con['data'].apply(strip_multiple_whitespaces)

## German

### Model: oliverguhr/german-sentiment-bert
The model is a binary classifier on sentence level which is why the sentiment scores here are transformed in two-dimensional labels instead of three-dimensional labels.

In [123]:
# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("oliverguhr/german-sentiment-bert")
model = AutoModelForSequenceClassification.from_pretrained("oliverguhr/german-sentiment-bert")

# Create an empty list to store the sentiment scores
sentiment_scores = []

# Iterate over the 'data' column in the DataFrame
for text in df_de['data']:
    # Tokenize the input text
    tokens = tokenizer.encode_plus(text, padding="max_length", truncation=True, max_length=128, return_tensors="pt")

    # Perform the sentiment analysis
    with torch.no_grad():
        logits = model(**tokens)[0]

    # Convert logits to predicted label (positive/negative)
    predicted_label = torch.argmax(logits, dim=1).item()
    sentiment = "positive" if predicted_label == 1 else "negative"

    # Append the sentiment score to the list
    sentiment_scores.append(sentiment)

# Add the sentiment scores as a new column in the DataFrame
df_de['sentiment'] = sentiment_scores

df_de.head()

Unnamed: 0,data,player,language,publishedAt,Label,sentiment
0,trainer alonso vor den mitgereisten fans in mo...,palacios,de,2023-02-24T09:33:31Z,,negative
1,zudem ist die konkurrenzsituation auf der dopp...,palacios,de,2023-03-03T21:35:13Z,,negative
2,wie auch palacios sah der defensive mittelfeld...,palacios,de,2023-03-07T11:34:39Z,,negative
3,er ist eine option erklart alonso der im mitt...,palacios,de,2023-03-08T14:25:18Z,,negative
4,allerdings waren in andrich und dem argentini...,palacios,de,2023-03-09T19:53:46Z,,negative


#### Evaluate model performance for german bert model

In [124]:
# Drop rows where 'Label' is NaN or empty
df_de.dropna(subset=['Label'], inplace=True)

In [125]:
print('Performance evaluation for oliverguhr/german-sentiment-bert')

# Evaluate the performance of the model
accuracy_de, unique_predicted_de, confusion_matrix_de, classification_report_de = evaluate_performance(df_de, 'sentiment_bert', 'Label')

# Print the evaluation results
print('Confusion matrix: ')
print(confusion_matrix_de)
print('Classification report: ')
print(classification_report_de)

Performance evaluation for oliverguhr/german-sentiment-bert
Confusion matrix: 
         negativ  positiv
negativ       10        0
positiv       10        0
Classification report: 
              precision    recall  f1-score   support

     negativ       0.50      1.00      0.67        10
     positiv       0.00      0.00      0.00        10

    accuracy                           0.50        20
   macro avg       0.25      0.50      0.33        20
weighted avg       0.25      0.50      0.33        20



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


## English

### Model 1: sentiment-analysis from bert-base-uncased

In [126]:
# Initiate model
sentiment_classifier_en = pipeline('sentiment-analysis', model='bert-base-uncased')

# Apply sentiment analysis on the 'data' column  and store the sentiment in a new column "sentiment_bert"
df_en['sentiment_bert'] = df_en['data'].apply(lambda x: sentiment_classifier_en(x)[0]['score'])

# Print the updated dataframe
df_en.head()

Unnamed: 0,data,player,language,publishedAt,Label,sentiment_bert
0,ten if you included the toe poked volley to te...,palacios,en,2023-02-16T23:56:00Z,,0.51435
1,bayerleverkusen took the lead again in the st ...,palacios,en,2023-02-23T20:50:50Z,,0.624176
2,wissam ben yedder levelled straight away from ...,palacios,en,2023-02-23T20:53:59Z,positiv,0.598611
3,midfielders leandro paredes juventus angel di ...,palacios,en,2023-03-03T16:40:46Z,neutral,0.694157
4,midfielders rodrigo de paul atletico madrid le...,palacios,en,2023-03-03T18:17:37Z,neutral,0.694063


### Model 2: Sentiment Intensity Analyzer from VADER

In [127]:
# Create an instance of the VADER sentiment analyzer
sid = SentimentIntensityAnalyzer()

# Function to get sentiment polarity
def get_sentiment(text):
    sentiment_scores = sid.polarity_scores(text)
    return sentiment_scores['compound']


# Apply sentiment analysis to the "data" column and store the sentiment in a new column "sentiment_nltk"
df_en['sentiment_nltk'] = df_en['data'].apply(get_sentiment)

# Print the updated dataframe
df_en.head()

Unnamed: 0,data,player,language,publishedAt,Label,sentiment_bert,sentiment_nltk
0,ten if you included the toe poked volley to te...,palacios,en,2023-02-16T23:56:00Z,,0.51435,0.0
1,bayerleverkusen took the lead again in the st ...,palacios,en,2023-02-23T20:50:50Z,,0.624176,-0.0516
2,wissam ben yedder levelled straight away from ...,palacios,en,2023-02-23T20:53:59Z,positiv,0.598611,0.2263
3,midfielders leandro paredes juventus angel di ...,palacios,en,2023-03-03T16:40:46Z,neutral,0.694157,0.4215
4,midfielders rodrigo de paul atletico madrid le...,palacios,en,2023-03-03T18:17:37Z,neutral,0.694063,0.4215


#### Evaluate model performance for all english models

In [128]:
# Drop rows where 'Label' is NaN or empty
df_en.dropna(subset=['Label'], inplace=True)

In [129]:
print('Performance evaluation for bert-base-uncased')

# Transform score into three-dimensional label for Performance evaluation
sentiment_3_labels = transform_scores(df_en, 'sentiment_3_label_bert')
df_en['sentiment_3_label_bert'] = sentiment_3_labels

# Evaluate the performance of the model
accuracy_en_bert, unique_predicted_en_bert, confusion_matrix_en_bert, classification_report_en_bert = evaluate_performance(df_en, 'sentiment_3_label_bert', 'Label')

# Print the evaluation results
print('Confusion matrix: ')
print(confusion_matrix_en_bert)
print('Classification report: ')
print(classification_report_en_bert)


Performance evaluation for bert-base-uncased
Confusion matrix: 
         negativ  neutral  positiv
negativ        0        1        3
neutral        0        5        8
positiv        0        5        8
Classification report: 
              precision    recall  f1-score   support

     negativ       0.00      0.00      0.00         4
     neutral       0.45      0.38      0.42        13
     positiv       0.42      0.62      0.50        13

    accuracy                           0.43        30
   macro avg       0.29      0.33      0.31        30
weighted avg       0.38      0.43      0.40        30



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


In [130]:
print('Performance evaluation for nltk')

# Transform score into three-dimensional label for Performance evaluation
sentiment_3_labels = transform_scores(df_en, 'sentiment_bert')
df_en['sentiment_3_label_bert'] = sentiment_3_labels

# Evaluate the performance of the model
accuracy_en_nltk, unique_predicted_en_nltk, confusion_matrix_en_nltk, classification_report_en_nltk = evaluate_performance(df_en, 'sentiment_3_label_bert', 'Label')

# Print the evaluation results
print('Confusion matrix: ')
print(confusion_matrix_en_nltk)
print('Classification report: ')
print(classification_report_en_nltk)

Performance evaluation for nltk
Confusion matrix: 
         negativ  neutral  positiv
negativ        0        1        3
neutral        0        5        8
positiv        0        5        8
Classification report: 
              precision    recall  f1-score   support

     negativ       0.00      0.00      0.00         4
     neutral       0.45      0.38      0.42        13
     positiv       0.42      0.62      0.50        13

    accuracy                           0.43        30
   macro avg       0.29      0.33      0.31        30
weighted avg       0.38      0.43      0.40        30



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


## Spanish

### Model 1: sentiment-analysis from spanish bert: beto-sentiment-analysis

In [131]:
sentiment_classifier_es_beto = pipeline('sentiment-analysis', model='finiteautomata/beto-sentiment-analysis')

In [132]:
# Apply sentiment analysis on the 'data' column  and store the sentiment in a new column "sentiment_beto"
df_es['sentiment_beto'] = df_es['data'].apply(lambda x: sentiment_classifier_es_beto(x)[0]['score'])

# Print the updated dataframe
df_es.head()

Unnamed: 0,data,player,language,publishedAt,Label,sentiment_beto
0,adeyemi firmo el que es su primer gol en lo qu...,palacios,es,2023-01-29T18:25:03Z,,0.430724
1,el club aleman que siempre se ha caracterizado...,palacios,es,2023-01-31T20:41:38Z,,0.98989
2,alberto fernandez el presidente de la afa clau...,palacios,es,2023-02-09T18:32:38Z,,0.979528
3,alberto fernandez tambien participaron los otr...,palacios,es,2023-02-12T21:13:55Z,,0.977206
4,fue el momento en que desde las tribunas se de...,palacios,es,2023-02-13T01:05:15Z,,0.99516


### Model 2: sentiment-analysis from another spanish bert: bert-base-spanish-wwm-uncased

In [133]:
sentiment_classifier_es_bert = pipeline('sentiment-analysis', model='dccuchile/bert-base-spanish-wwm-uncased')

In [134]:
# Apply sentiment analysis on the 'data' column  and store the sentiment in a new column "sentiment_bert"
df_es['sentiment_bert'] = df_es['data'].apply(lambda x: sentiment_classifier_es_bert(x)[0]['score'])

# Print the updated dataframe
df_es.head()

Unnamed: 0,data,player,language,publishedAt,Label,sentiment_beto,sentiment_bert
0,adeyemi firmo el que es su primer gol en lo qu...,palacios,es,2023-01-29T18:25:03Z,,0.430724,0.553467
1,el club aleman que siempre se ha caracterizado...,palacios,es,2023-01-31T20:41:38Z,,0.98989,0.504837
2,alberto fernandez el presidente de la afa clau...,palacios,es,2023-02-09T18:32:38Z,,0.979528,0.522085
3,alberto fernandez tambien participaron los otr...,palacios,es,2023-02-12T21:13:55Z,,0.977206,0.534249
4,fue el momento en que desde las tribunas se de...,palacios,es,2023-02-13T01:05:15Z,,0.99516,0.562792


#### Evaluate model performance for all spanish models 

In [135]:
# Drop rows where 'Label' is NaN or empty
df_es.dropna(subset=['Label'], inplace=True)

In [136]:
print('Performance evaluation for beto-sentiment-analysis')

# Transform score into three-dimensional label for Performance evaluation
sentiment_3_labels = transform_scores(df_en, 'sentiment_beto')
df_es['sentiment_3_label_beto'] = sentiment_3_labels

# Evaluate the performance of the model
accuracy_es_beto, unique_predicted_es_beto, confusion_matrix_es_beto, classification_report_es_beto = evaluate_performance(df_es, 'sentiment_3_label_beto', 'Label')

# Print the evaluation results
print('Confusion matrix: ')
print(confusion_matrix_es_beto)
print('Classification report: ')
print(classification_report_es_beto)

Performance evaluation for beto-sentiment-analysis
Confusion matrix: 
         negativ  neutral  positiv
negativ        0        2        8
neutral        0        4        6
positiv        0        5        5
Classification report: 
              precision    recall  f1-score   support

     negativ       0.00      0.00      0.00        10
     neutral       0.36      0.40      0.38        10
     positiv       0.26      0.50      0.34        10

    accuracy                           0.30        30
   macro avg       0.21      0.30      0.24        30
weighted avg       0.21      0.30      0.24        30



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


In [137]:
print('Performance evaluation for bert-base-spanish-wwm-uncased')

# Transform score into three-dimensional label for Performance evaluation
sentiment_3_labels = transform_scores(df_es, 'sentiment_bert')
df_es['sentiment_3_label_bert'] = sentiment_3_labels

# Evaluate the performance of the model
accuracy_es_bert, unique_predicted_es_bert, confusion_matrix_es_bert, classification_report_es_bert = evaluate_performance(df_es, 'sentiment_3_label_bert', 'Label')

# Print the evaluation results
print('Confusion matrix: ')
print(confusion_matrix_es_bert)
print('Classification report: ')
print(classification_report_es_bert)

Performance evaluation for bert-base-spanish-wwm-uncased
Confusion matrix: 
         negativ  neutral  positiv
negativ        0        7        3
neutral        0        6        4
positiv        0        9        1
Classification report: 
              precision    recall  f1-score   support

     negativ       0.00      0.00      0.00        10
     neutral       0.27      0.60      0.37        10
     positiv       0.12      0.10      0.11        10

    accuracy                           0.23        30
   macro avg       0.13      0.23      0.16        30
weighted avg       0.13      0.23      0.16        30



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


## English: data condensed

### Model 1: sentiment-analysis from bert-base-uncased

In [151]:
# Model loaded above: sentiment_classifier_en = pipeline('sentiment-analysis', model='bert-base-uncased')

# Apply sentiment analysis on the 'data' column  and store the sentiment in a new column "sentiment_bert"
df_en_con['sentiment_bert'] = df_en_con['data'].apply(lambda x: sentiment_classifier_en(x)[0]['score'])

# Print the updated dataframe
df_en_con.head()

Unnamed: 0,data,player,language,publishedAt,Label,sentiment_bert
0,bayerleverkusen took lead minute midfielder pa...,exequiel palacios,en,2023-02-23T20:50:50Z,,0.620365
1,midfielders leandro paredes juventus angel mar...,exequiel palacios,en,2023-03-03T16:42:19Z,neutral,0.695331
2,half goal joshua kimmich canceled penalties pa...,exequiel palacios,en,2023-03-19T18:30:00Z,positiv,0.650257
3,by reuters bayerleverkusen s palacios scored s...,exequiel palacios,en,2023-03-19T18:42:59Z,,0.64237
4,bayerleverkusen s palacios scored second half ...,exequiel palacios,en,2023-03-19T19:05:09Z,positiv,0.652657


### Model 2: Sentiment Intensity Analyzer from nltk

In [152]:
# Create an instance of the VADER sentiment analyzer
sid = SentimentIntensityAnalyzer()

# Function to get sentiment polarity
def get_sentiment(text):
    sentiment_scores = sid.polarity_scores(text)
    return sentiment_scores['compound']


# Apply sentiment analysis to the "data" column and store the sentiment in a new column "sentiment_nltk"
df_en_con['sentiment_nltk'] = df_en_con['data'].apply(get_sentiment)

# Print the updated dataframe
df_en_con.head()


Unnamed: 0,data,player,language,publishedAt,Label,sentiment_bert,sentiment_nltk
0,bayerleverkusen took lead minute midfielder pa...,exequiel palacios,en,2023-02-23T20:50:50Z,,0.620365,-0.0516
1,midfielders leandro paredes juventus angel mar...,exequiel palacios,en,2023-03-03T16:42:19Z,neutral,0.695331,0.4215
2,half goal joshua kimmich canceled penalties pa...,exequiel palacios,en,2023-03-19T18:30:00Z,positiv,0.650257,0.5859
3,by reuters bayerleverkusen s palacios scored s...,exequiel palacios,en,2023-03-19T18:42:59Z,,0.64237,-0.34
4,bayerleverkusen s palacios scored second half ...,exequiel palacios,en,2023-03-19T19:05:09Z,positiv,0.652657,-0.34


#### Evaluate model performance for english condensed model

In [153]:
# Drop rows where 'Label' is NaN or empty
df_en_con.dropna(subset=['Label'], inplace=True)

In [154]:
print('Performance evaluation for bert-base-uncased on english condensed')

# Transform score into three-dimensional label for Performance evaluation
sentiment_3_labels = transform_scores(df_en_con, 'sentiment_bert')
df_en_con['sentiment_3_label_bert'] = sentiment_3_labels

# Evaluate the performance of the model
accuracy_en_bert, unique_predicted_en_bert, confusion_matrix_en_bert, classification_report_en_bert = evaluate_performance(df_en_con, 'sentiment_3_label_bert', 'Label')

# Print the evaluation results
print('Confusion matrix: ')
print(confusion_matrix_en_bert)
print('Classification report: ')
print(classification_report_en_bert)

Performance evaluation for bert-base-uncased on english condensed
Confusion matrix: 
         negativ  neutral  positiv
negativ        0        2        1
neutral        0        6        4
positiv        0        1        9
Classification report: 
              precision    recall  f1-score   support

     negativ       0.00      0.00      0.00         3
     neutral       0.67      0.60      0.63        10
     positiv       0.64      0.90      0.75        10

    accuracy                           0.65        23
   macro avg       0.44      0.50      0.46        23
weighted avg       0.57      0.65      0.60        23



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


In [155]:
print('Performance evaluation for nltk')

# Transform score into three-dimensional label for Performance evaluation
sentiment_3_labels = transform_scores(df_en_con, 'sentiment_3_label_nltk')
df_en_con['sentiment_3_label_nltk'] = sentiment_3_labels

# Evaluate the performance of the model
accuracy_en_nltk, unique_predicted_en_nltk, confusion_matrix_en_nltk, classification_report_en_nltk = evaluate_performance(df_en_con, 'sentiment_3_label_nltk', 'Label')

# Print the evaluation results
print('Confusion matrix: ')
print(confusion_matrix_en_nltk)
print('Classification report: ')
print(classification_report_en_nltk)

Performance evaluation for nltk
Confusion matrix: 
         negativ  neutral  positiv
negativ        0        2        1
neutral        0        6        4
positiv        0        1        9
Classification report: 
              precision    recall  f1-score   support

     negativ       0.00      0.00      0.00         3
     neutral       0.67      0.60      0.63        10
     positiv       0.64      0.90      0.75        10

    accuracy                           0.65        23
   macro avg       0.44      0.50      0.46        23
weighted avg       0.57      0.65      0.60        23



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


# Summary
From the classification reports we see that the best accuracy is achieved by both models on the english data condensed with each an accuracy of 0,65. Next are the bert and nltk model on the english dataset data clean 1 each with an accuracy of 0,43 followed by the spanish beto model with an accuracy of 0,3. The least good model is the spanish bert model with an accuracy of 0,23.
The accuracy of the german bert model is not comparable in this case because the accuracy was determined on two-dimensionable labels.

# Next steps for Bayer04 Leverkusen
As next steps Bayer04 could apply the multilingual model on data condensed with the sentence extracted. This could improve the accuracy of the sentiment model.  Further next steps are ligned out in the multilingual file.