## Imports

In [None]:
from code.data_preprocessing import *
from code.tweets_embedding import *
from code.frnn_owa_eval import *
from scipy.stats import pearsonr  # used explicitly in the test evaluation cells below

## Prepare data

Before the embedding process, we applied some operations to clean the tweets.

In the first, general step, we deleted account tags starting with '@', extra whitespace, newline symbols ('\n'), all numbers, and punctuation marks. We kept hashtags because they can carry useful information and removed only the '#' symbols. We also replaced '&' with the word 'and' and replaced emojis with their textual descriptions.

The second step of tweet preprocessing is stop-word removal. 

Both general preprocessing and stop-word removal are optional for our purposes: during the experimental stage, we examined whether they improved classification results and identified the best setup for each embedding method.
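
The two cleaning steps described above can be sketched with the standard library alone. This is a minimal illustration, not the code from `code.data_preprocessing`: the function names, the tiny stop-word set, and the lowercasing step (inferred from the example output shown later) are assumptions, and emoji replacement is left out because it needs a third-party package.

```python
import re
import string

# Hypothetical, tiny stop-word set for illustration only;
# the list actually used by the notebook is not shown here.
STOP_WORDS = {"a", "an", "the", "is", "are", "it", "me", "so", "up", "with", "his"}


def clean_tweet(text):
    """General cleaning step (emoji handling omitted in this sketch)."""
    text = re.sub(r"@\w+", " ", text)      # drop account tags
    text = text.replace("#", " ")          # keep hashtag words, drop the '#'
    text = text.replace("&", " and ")      # spell out ampersands
    text = re.sub(r"\d+", " ", text)       # drop numbers
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.lower().split())  # collapse whitespace and newlines


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Second, optional step: drop stop-words from an already cleaned tweet."""
    return " ".join(word for word in text.split() if word not in stop_words)


example = "@xandraaa5 @amayaallyn6 shut up hashtags are cool #offended"
cleaned = clean_tweet(example)
print(cleaned)
print(remove_stop_words(cleaned))
```

The ampersand replacement has to run before punctuation is stripped, otherwise '&' would be deleted rather than spelled out.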

In [None]:
# Load the original data and add columns with cleaned tweets (with and without stop-words)
# Each combined dataset, e.g. 'anger_data', concatenates the train ('anger_train') and development ('anger_dev') sets

anger_train, anger_dev, anger_data, anger_test = upload_datasets('../data/SemEval2018-Task1-all-data/English_EI-oc/training/EI-oc-En-anger-train.txt', '../data/SemEval2018-Task1-all-data/English_EI-oc/development/2018-EI-oc-En-anger-dev.txt', '../data/SemEval2018-Task1-all-data/English_EI-oc/test-gold/2018-EI-oc-En-anger-test-gold.txt')
joy_train, joy_dev, joy_data, joy_test = upload_datasets('../data/SemEval2018-Task1-all-data/English_EI-oc/training/EI-oc-En-joy-train.txt', '../data/SemEval2018-Task1-all-data/English_EI-oc/development/2018-EI-oc-En-joy-dev.txt', '../data/SemEval2018-Task1-all-data/English_EI-oc/test-gold/2018-EI-oc-En-joy-test-gold.txt')
sad_train, sad_dev, sad_data, sad_test = upload_datasets('../data/SemEval2018-Task1-all-data/English_EI-oc/training/EI-oc-En-sadness-train.txt', '../data/SemEval2018-Task1-all-data/English_EI-oc/development/2018-EI-oc-En-sadness-dev.txt', '../data/SemEval2018-Task1-all-data/English_EI-oc/test-gold/2018-EI-oc-En-sadness-test-gold.txt')
fear_train, fear_dev, fear_data, fear_test = upload_datasets('../data/SemEval2018-Task1-all-data/English_EI-oc/training/EI-oc-En-fear-train.txt', '../data/SemEval2018-Task1-all-data/English_EI-oc/development/2018-EI-oc-En-fear-dev.txt', '../data/SemEval2018-Task1-all-data/English_EI-oc/test-gold/2018-EI-oc-En-fear-test-gold.txt')

In [119]:
# Example of the dataset

anger_data.head()

| | ID | Tweet | Cleaned_tweet | Cleaned_tweet_wt_stopwords | Class |
|---|---|---|---|---|---|
| 0 | 2017-En-10264 | @xandraaa5 @amayaallyn6 shut up hashtags are c... | shut up hashtags are cool offended | shut hashtags cool offended | 2 |
| 1 | 2017-En-10072 | it makes me so fucking irate jesus. nobody is ... | it makes me so fucking irate jesus nobody is c... | makes fucking irate jesus nobody calling ppl l... | 3 |
| 2 | 2017-En-11383 | Lol Adam the Bull with his fake outrage... | lol adam the bull with his fake outrage | lol adam bull fake outrage | 1 |
| 3 | 2017-En-11102 | @THATSSHAWTYLO passed away early this morning ... | passed away early this morning in a fast and f... | passed away early morning fast furious styled ... | 0 |
| 4 | 2017-En-11506 | @Kristiann1125 lol wow i was gonna say really?... | lol wow i was gonna say really haha have you s... | lol wow gonna say really haha seen chris nah d... | 1 |


In [128]:
# Dataset characteristics

for dataset in [anger_data, joy_data, sad_data, fear_data]:
    print('Characteristics of ', namestr(dataset, globals())[0])
    print('Number of instances: ', len(dataset))
    print('Size of the smallest class: ', min([len(dataset[dataset.Class == i]) for i in range(4)]))
    print('Imbalance Ratio (IR): ', round(max([len(dataset[dataset.Class == i]) for i in range(4)])/min([len(dataset[dataset.Class == i]) for i in range(4)]), 2))
    print('\n')

Characteristics of  anger_data
Number of instances:  2089
Size of the smallest class:  376
Imbalance Ratio (IR):  1.68


Characteristics of  joy_data
Number of instances:  1906
Size of the smallest class:  410
Imbalance Ratio (IR):  1.47


Characteristics of  sad_data
Number of instances:  1930
Size of the smallest class:  348
Imbalance Ratio (IR):  2.2


Characteristics of  fear_data
Number of instances:  2641
Size of the smallest class:  217
Imbalance Ratio (IR):  8.02
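
The imbalance ratio printed above is the size of the largest class divided by the size of the smallest one. As a compact sketch of the same calculation on toy labels (the function name and the toy data are illustrative, not part of the notebook's modules):

```python
from collections import Counter


def imbalance_ratio(labels):
    """Largest class size divided by smallest class size, rounded to 2 decimals."""
    counts = Counter(labels)
    return round(max(counts.values()) / min(counts.values()), 2)


# Toy label column with four intensity classes (0-3).
toy_labels = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 1
print(imbalance_ratio(toy_labels))
```

Unlike the loop above, this version only sees classes that actually occur in the data, so a completely empty class would not be counted.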




## Apply embedding methods to calculate vectors

After preprocessing, we represent each tweet as a vector so that it can be classified. For this purpose, we use the following six word- or sentence-level embedding techniques.

For each embedding method, we have already determined the best tweet preprocessing option: none ('Tweet'), general cleaning ('Cleaned_tweet'), or general cleaning with stop-word removal ('Cleaned_tweet_wt_stopwords').
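
For word-level embeddings, a tweet vector has to be built from the vectors of its words. One common way to do this is mean pooling, sketched below with toy two-dimensional vectors. Whether `get_vector_w2v` pools in exactly this way is an assumption; the names and values here are purely illustrative.

```python
# Toy 2-dimensional word vectors (hypothetical values, for illustration only).
TOY_EMBEDDINGS = {
    "shut": [1.0, 0.0],
    "cool": [0.0, 1.0],
    "offended": [1.0, 1.0],
}


def mean_pool(tokens, embeddings, dim):
    """Average the vectors of all known tokens; return a zero vector if none are known."""
    vectors = [embeddings[token] for token in tokens if token in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# "hashtags" is out of vocabulary here and is simply skipped.
tweet_vector = mean_pool("shut hashtags cool offended".split(), TOY_EMBEDDINGS, dim=2)
print(tweet_vector)
```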

In [None]:
# DeepMoji
# No preprocessing needed for all emotion datasets

anger_data['Vector_deepmoji'] = get_vectors_deepmoji(anger_data, 'Tweet')
anger_test['Vector_deepmoji'] = get_vectors_deepmoji(anger_test, 'Tweet')

joy_data['Vector_deepmoji'] = get_vectors_deepmoji(joy_data, 'Tweet')
joy_test['Vector_deepmoji'] = get_vectors_deepmoji(joy_test, 'Tweet')

sad_data['Vector_deepmoji'] = get_vectors_deepmoji(sad_data, 'Tweet')
sad_test['Vector_deepmoji'] = get_vectors_deepmoji(sad_test, 'Tweet')

fear_data['Vector_deepmoji'] = get_vectors_deepmoji(fear_data, 'Tweet')
fear_test['Vector_deepmoji'] = get_vectors_deepmoji(fear_test, 'Tweet')

In [None]:
# Twitter-roBERTa-based
# Tweet cleaning needed for all emotion datasets

anger_data['Vector_roBERTa'] = anger_data['Cleaned_tweet'].apply(get_vector_roberta)
anger_test['Vector_roBERTa'] = anger_test['Cleaned_tweet'].apply(get_vector_roberta)

joy_data['Vector_roBERTa'] = joy_data['Cleaned_tweet'].apply(get_vector_roberta)
joy_test['Vector_roBERTa'] = joy_test['Cleaned_tweet'].apply(get_vector_roberta)

sad_data['Vector_roBERTa'] = sad_data['Cleaned_tweet'].apply(get_vector_roberta)
sad_test['Vector_roBERTa'] = sad_test['Cleaned_tweet'].apply(get_vector_roberta)

fear_data['Vector_roBERTa'] = fear_data['Cleaned_tweet'].apply(get_vector_roberta)
fear_test['Vector_roBERTa'] = fear_test['Cleaned_tweet'].apply(get_vector_roberta)

In [None]:
# Word2Vec
# With preprocessing and stop-words cleaning for all emotion datasets

anger_data["Vector_w2v"] = anger_data['Cleaned_tweet_wt_stopwords'].apply(lambda x: get_vector_w2v(x))
anger_test["Vector_w2v"] = anger_test['Cleaned_tweet_wt_stopwords'].apply(lambda x: get_vector_w2v(x))
   
joy_data["Vector_w2v"] = joy_data['Cleaned_tweet_wt_stopwords'].apply(lambda x: get_vector_w2v(x))
joy_test["Vector_w2v"] = joy_test['Cleaned_tweet_wt_stopwords'].apply(lambda x: get_vector_w2v(x))

sad_data["Vector_w2v"] = sad_data['Cleaned_tweet_wt_stopwords'].apply(lambda x: get_vector_w2v(x))
sad_test["Vector_w2v"] = sad_test['Cleaned_tweet_wt_stopwords'].apply(lambda x: get_vector_w2v(x))

fear_data["Vector_w2v"] = fear_data['Cleaned_tweet_wt_stopwords'].apply(lambda x: get_vector_w2v(x))
fear_test["Vector_w2v"] = fear_test['Cleaned_tweet_wt_stopwords'].apply(lambda x: get_vector_w2v(x))

In [None]:
# Universal Sentence Encoder 
# With preprocessing for all emotion datasets

anger_data['Vector_use'] = anger_data['Cleaned_tweet'].apply(lambda x: get_vector_use(x))
anger_test['Vector_use'] = anger_test['Cleaned_tweet'].apply(lambda x: get_vector_use(x))

joy_data['Vector_use'] = joy_data['Cleaned_tweet'].apply(lambda x: get_vector_use(x))
joy_test['Vector_use'] = joy_test['Cleaned_tweet'].apply(lambda x: get_vector_use(x))

sad_data['Vector_use'] = sad_data['Cleaned_tweet'].apply(lambda x: get_vector_use(x))
sad_test['Vector_use'] = sad_test['Cleaned_tweet'].apply(lambda x: get_vector_use(x))

fear_data['Vector_use'] = fear_data['Cleaned_tweet'].apply(lambda x: get_vector_use(x))
fear_test['Vector_use'] = fear_test['Cleaned_tweet'].apply(lambda x: get_vector_use(x))

In [None]:
# Sentence-BERT 
# With preprocessing for all emotion datasets

anger_data['Vector_sbert'] = anger_data['Cleaned_tweet'].apply(get_vector_sbert)
anger_test['Vector_sbert'] = anger_test['Cleaned_tweet'].apply(get_vector_sbert)

joy_data['Vector_sbert'] = joy_data['Cleaned_tweet'].apply(get_vector_sbert)
joy_test['Vector_sbert'] = joy_test['Cleaned_tweet'].apply(get_vector_sbert)

sad_data['Vector_sbert'] = sad_data['Cleaned_tweet'].apply(get_vector_sbert)
sad_test['Vector_sbert'] = sad_test['Cleaned_tweet'].apply(get_vector_sbert)

fear_data['Vector_sbert'] = fear_data['Cleaned_tweet'].apply(get_vector_sbert)
fear_test['Vector_sbert'] = fear_test['Cleaned_tweet'].apply(get_vector_sbert)

In [None]:
# BERT 
# With raw tweets 

anger_data['Vector_bert'] = anger_data['Tweet'].apply(get_vector_bert)
anger_test['Vector_bert'] = anger_test['Tweet'].apply(get_vector_bert)

joy_data['Vector_bert'] = joy_data['Tweet'].apply(get_vector_bert)
joy_test['Vector_bert'] = joy_test['Tweet'].apply(get_vector_bert)

sad_data['Vector_bert'] = sad_data['Tweet'].apply(get_vector_bert)
sad_test['Vector_bert'] = sad_test['Tweet'].apply(get_vector_bert)

fear_data['Vector_bert'] = fear_data['Tweet'].apply(get_vector_bert)
fear_test['Vector_bert'] = fear_test['Tweet'].apply(get_vector_bert)

## Perform cross-evaluation of different embedding methods with FRNN OWA classification

We used the FRNN-OWA classifier with each embedding approach. To examine how the number of neighbours 'k' influences the classification results, we tried different 'k' values for each dataset in our experiments. The best 'k' value found for each dataset-embedding combination is used below. We also found that the additive OWA type for both the upper and lower approximations gave the best results for most embeddings, so we use it in the experiments that follow.
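
To make the roles of 'k' and the additive OWA weights concrete, here is a simplified sketch of how an FRNN-OWA class score can be computed from precomputed similarities. It is not the implementation in `code.frnn_owa_eval`: the function names are hypothetical, the similarity measure is left open, and the weight formula assumes the usual linearly decreasing definition of additive weights.

```python
def additive_weights(k):
    """Linearly decreasing OWA weights 2(k - i) / (k(k + 1)), i = 0..k-1; they sum to 1."""
    return [2 * (k - i) / (k * (k + 1)) for i in range(k)]


def soft_max(values, k):
    """OWA 'soft maximum': weighted sum of the k largest values."""
    top = sorted(values, reverse=True)[:k]
    weights = additive_weights(len(top))
    return sum(w * v for w, v in zip(weights, top))


def frnn_owa_scores(sims_by_class, k):
    """sims_by_class maps each class label to the similarities (in [0, 1])
    between one test instance and the training instances of that class."""
    scores = {}
    for label, sims in sims_by_class.items():
        other_sims = [s for other, values in sims_by_class.items()
                      if other != label for s in values]
        upper = soft_max(sims, k)              # closeness to the class itself
        lower = 1.0 - soft_max(other_sims, k)  # distance from all other classes
        scores[label] = (upper + lower) / 2
    return scores


toy_sims = {0: [0.9, 0.8, 0.2], 1: [0.3, 0.1]}
toy_scores = frnn_owa_scores(toy_sims, k=2)
print(max(toy_scores, key=toy_scores.get))  # predicted class for the toy instance
```

A larger 'k' spreads the weights over more neighbours, which smooths the scores at the cost of letting more distant instances contribute.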

Initially, we used only the labels predicted by the FRNN-OWA classifiers, without using confidence scores.

We used 5-fold cross-validation to evaluate our approaches. As the evaluation measure, we chose the Pearson Correlation Coefficient (PCC), which was also the measure used in the competition.
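
For reference, the PCC between gold and predicted labels can be written out in a few lines. This is a plain-Python sketch with an illustrative function name and toy labels; the test evaluation later in the notebook calls `pearsonr` instead.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("PCC is undefined when one sequence is constant")
    return covariance / (spread_x * spread_y)


# Toy gold and predicted intensity classes.
gold = [0, 1, 2, 3, 1, 0]
predicted = [0, 1, 1, 3, 2, 0]
print(pearson(gold, predicted))
```

Because PCC rewards predictions that move in the same direction as the gold labels, predicting class 2 for a true class 3 is penalised less than predicting class 0, which suits ordinal intensity classes.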

In [None]:
# The number of cross-validation folds
K_fold = 5
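
As a reminder of what the fold count controls, the sketch below splits sample indices into k contiguous folds. It is only an illustration: how `cross_validation_ensemble_owa` actually builds its folds (for example with shuffling or stratification) is not shown in this notebook, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for train_idx, test_idx in k_fold_indices(12, 5):
    print(len(train_idx), test_idx)
```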

In [89]:
# We use the 'cross_validation_ensemble_owa' function with a single vector and its corresponding k

# roBERTa-based model
print('Anger dataset with roBERTa-based embedding')
print('PCC: ', cross_validation_ensemble_owa(anger_data, ['Vector_roBERTa'], K_fold, [19], additive(), additive(), 'labels'))
print('Joy dataset with roBERTa-based embedding')
print('PCC: ', cross_validation_ensemble_owa(joy_data, ['Vector_roBERTa'], K_fold, [9], additive(), additive(), 'labels'))
print('Sadness dataset with roBERTa-based embedding')
print('PCC: ', cross_validation_ensemble_owa(sad_data, ['Vector_roBERTa'], K_fold, [23], additive(), additive(), 'labels'))
print('Fear dataset with roBERTa-based embedding')
print('PCC: ', cross_validation_ensemble_owa(fear_data, ['Vector_roBERTa'], K_fold, [9], additive(), additive(), 'labels'))

Anger dataset with roBERTa-based embedding
PCC:  0.6464873634462883
Joy dataset with roBERTa-based embedding
PCC:  0.6735542105170109
Sadness dataset with roBERTa-based embedding
PCC:  0.6611541153014013
Fear dataset with roBERTa-based embedding
PCC:  0.5888929793465107


In [90]:
# DeepMoji model
print('Anger dataset with DeepMoji embedding')
print('PCC: ', cross_validation_ensemble_owa(anger_data, ['Vector_deepmoji'], K_fold, [23], additive(), additive(), 'labels'))
print('Joy dataset with DeepMoji embedding')
print('PCC: ', cross_validation_ensemble_owa(joy_data, ['Vector_deepmoji'], K_fold, [19], additive(), additive(), 'labels'))
print('Sadness dataset with DeepMoji embedding')
print('PCC: ', cross_validation_ensemble_owa(sad_data, ['Vector_deepmoji'], K_fold, [23], additive(), additive(), 'labels'))
print('Fear dataset with DeepMoji embedding')
print('PCC: ', cross_validation_ensemble_owa(fear_data, ['Vector_deepmoji'], K_fold, [21], additive(), additive(), 'labels'))

Anger dataset with DeepMoji embedding
PCC:  0.5438111296817684
Joy dataset with DeepMoji embedding
PCC:  0.6214244583495802
Sadness dataset with DeepMoji embedding
PCC:  0.576854211632575
Fear dataset with DeepMoji embedding
PCC:  0.5043689724239788


In [91]:
# BERT model
print('Anger dataset with BERT embedding')
print('PCC: ', cross_validation_ensemble_owa(anger_data, ['Vector_bert'], K_fold, [19], additive(), additive(), 'labels'))
print('Joy dataset with BERT embedding')
print('PCC: ', cross_validation_ensemble_owa(joy_data, ['Vector_bert'], K_fold, [17], additive(), additive(), 'labels'))
print('Sadness dataset with BERT embedding')
print('PCC: ', cross_validation_ensemble_owa(sad_data, ['Vector_bert'], K_fold, [23], additive(), additive(), 'labels'))
print('Fear dataset with BERT embedding')
print('PCC: ', cross_validation_ensemble_owa(fear_data, ['Vector_bert'], K_fold, [7], additive(), additive(), 'labels'))

Anger dataset with BERT embedding
PCC:  0.4456954766377472
Joy dataset with BERT embedding
PCC:  0.5065254334783876
Sadness dataset with BERT embedding
PCC:  0.4488626587861287
Fear dataset with BERT embedding
PCC:  0.45844506210205427


In [92]:
# SBERT model 
print('Anger dataset with SBERT embedding')
print('PCC: ', cross_validation_ensemble_owa(anger_data, ['Vector_sbert'], K_fold, [19], additive(), additive(), 'labels'))
print('Joy dataset with SBERT embedding')
print('PCC: ', cross_validation_ensemble_owa(joy_data, ['Vector_sbert'], K_fold, [15], additive(), additive(), 'labels'))
print('Sadness dataset with SBERT embedding')
print('PCC: ', cross_validation_ensemble_owa(sad_data, ['Vector_sbert'], K_fold, [23], additive(), additive(), 'labels'))
print('Fear dataset with SBERT embedding')
print('PCC: ', cross_validation_ensemble_owa(fear_data, ['Vector_sbert'], K_fold, [11], additive(), additive(), 'labels'))

Anger dataset with SBERT embedding
PCC:  0.5008545771452125
Joy dataset with SBERT embedding
PCC:  0.5598644655143654
Sadness dataset with SBERT embedding
PCC:  0.5442157010834586
Fear dataset with SBERT embedding
PCC:  0.49480438073320665


In [93]:
# USE model 
print('Anger dataset with USE embedding')
print('PCC: ', cross_validation_ensemble_owa(anger_data, ['Vector_use'], K_fold, [23], additive(), additive(), 'labels'))
print('Joy dataset with USE embedding')
print('PCC: ', cross_validation_ensemble_owa(joy_data, ['Vector_use'], K_fold, [23], additive(), additive(), 'labels'))
print('Sadness dataset with USE embedding')
print('PCC: ', cross_validation_ensemble_owa(sad_data, ['Vector_use'], K_fold, [23], additive(), additive(), 'labels'))
print('Fear dataset with USE embedding')
print('PCC: ', cross_validation_ensemble_owa(fear_data, ['Vector_use'], K_fold, [21], additive(), additive(), 'labels'))

Anger dataset with USE embedding
PCC:  0.5042334104402324
Joy dataset with USE embedding
PCC:  0.5513573121589473
Sadness dataset with USE embedding
PCC:  0.5865315296525251
Fear dataset with USE embedding
PCC:  0.5441177544624183


In [103]:
# Word2Vec model 
print('Anger dataset with Word2Vec embedding')
print('PCC: ', cross_validation_ensemble_owa(anger_data, ['Vector_w2v'], K_fold, [21], additive(), additive(), 'labels'))
print('Joy dataset with Word2Vec embedding')
print('PCC: ', cross_validation_ensemble_owa(joy_data, ['Vector_w2v'], K_fold, [23], additive(), additive(), 'labels'))
print('Sadness dataset with Word2Vec embedding')
print('PCC: ', cross_validation_ensemble_owa(sad_data, ['Vector_w2v'], K_fold, [23], additive(), additive(), 'labels'))
print('Fear dataset with Word2Vec embedding')
print('PCC: ', cross_validation_ensemble_owa(fear_data, ['Vector_w2v'], K_fold, [7], additive(), additive(), 'labels'))

Anger dataset with Word2Vec embedding
PCC:  0.4846692630756858
Joy dataset with Word2Vec embedding
PCC:  0.5161972827682905
Sadness dataset with Word2Vec embedding
PCC:  0.47221836156489855
Fear dataset with Word2Vec embedding
PCC:  0.4311917784564037


## Cross-validation evaluation for ensemble of FRNN OWA models based on different embeddings

We used the FRNN-OWA method both as a standalone method and as part of a classification ensemble. For this purpose, a separate model was trained for every choice of tweet embedding. Each model used the best setup for its dataset and embedding (choice of tweet preprocessing, OWA types, and the number of neighbours 'k').

To determine the test label, we apply a weighted voting function to the outputs of the individual models. The mean performed best among the voting functions we tried and was chosen as the primary one.

### Voting function - the mean of labels

In this approach, all models in the ensemble have equal weights and each FRNN-OWA model returns its predicted label.
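
The equal-weight vote can be sketched as follows. The function name is hypothetical, and rounding the mean half-up to the nearest class is an assumption of this sketch; the notebook's own voting code lives in `code.frnn_owa_eval`.

```python
def mean_vote(labels_per_model):
    """labels_per_model: one list of predicted labels per model, all of equal length.
    Returns the per-instance mean label, rounded half-up to the nearest class."""
    n_models = len(labels_per_model)
    return [int(sum(column) / n_models + 0.5) for column in zip(*labels_per_model)]


# Three toy models voting on three tweets.
toy_predictions = [
    [0, 1, 3],
    [1, 1, 2],
    [2, 1, 3],
]
print(mean_vote(toy_predictions))
```

`int(x + 0.5)` is used instead of the built-in `round` because `round` sends exact halves to the nearest even integer, which would bias ties such as 1.5 and 2.5 towards class 2.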

In [116]:
# Anger dataset with all embeddings and their k values

cross_validation_ensemble_owa(anger_data, ['Vector_roBERTa', 'Vector_deepmoji', 'Vector_bert', 'Vector_sbert', 'Vector_use', 'Vector_w2v'], K_fold, [19, 23, 19, 19, 23, 21], additive(), additive(), 'labels')

0.6842759501834952

In [111]:
# Joy dataset with all embeddings and their k values

cross_validation_ensemble_owa(joy_data, ['Vector_roBERTa', 'Vector_deepmoji', 'Vector_bert', 'Vector_sbert', 'Vector_use', 'Vector_w2v'], K_fold, [9, 19, 17, 15, 23, 23], additive(), additive(), 'labels')

0.7420823382643671

In [112]:
# Sadness dataset with all embeddings and their k values

cross_validation_ensemble_owa(sad_data, ['Vector_roBERTa', 'Vector_deepmoji', 'Vector_bert', 'Vector_sbert', 'Vector_use', 'Vector_w2v'], K_fold, [23, 23, 23, 23, 23, 23], additive(), additive(), 'labels')

0.7393055766899576

In [113]:
# Fear dataset with all embeddings and their k values

cross_validation_ensemble_owa(fear_data, ['Vector_roBERTa', 'Vector_deepmoji', 'Vector_bert', 'Vector_sbert', 'Vector_use', 'Vector_w2v'], K_fold, [9, 21, 7, 11, 21, 7], additive(), additive(), 'labels')

0.6481174352923854

### Voting function - the mean of labels calculated with confidence scores

In this approach, the FRNN-OWA classifier returns confidence scores for its labels, which are used to weight the models' outputs.

A confidence score is a float value, usually between 0 and 1, provided by a classification model for each prediction class. This value indicates how confident the model is in its prediction for a particular class.

In the end, we found that the weighted average with confidence scores performed best. That is, we extend the mean voting function by using confidence scores as weights, so the predicted label is a weighted average of labels.
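
A minimal sketch of such a confidence-weighted vote for a single tweet is given below. The function name and toy numbers are illustrative, and any scaling of the confidence scores done by the notebook's own code is not reproduced here.

```python
def confidence_weighted_vote(labels, confidences):
    """Weighted average of the models' labels for one tweet,
    using each model's confidence score as its weight."""
    total = sum(confidences)
    if total == 0:
        # Fall back to the plain mean if no model reports any confidence.
        return sum(labels) / len(labels)
    return sum(label * conf for label, conf in zip(labels, confidences)) / total


# Three toy models: the first is twice as confident as the other two.
toy_labels = [1, 3, 2]
toy_confidences = [0.5, 0.25, 0.25]
weighted = confidence_weighted_vote(toy_labels, toy_confidences)
print(weighted, int(weighted + 0.5))  # raw average and half-up rounded label
```

With equal confidences this reduces to the plain mean of labels, so the earlier voting function is a special case of this one.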

In [183]:
# Anger dataset with all embeddings and their k values

cross_validation_ensemble_owa(anger_data, ['Vector_roBERTa', 'Vector_deepmoji', 'Vector_bert', 'Vector_sbert', 'Vector_use', 'Vector_w2v'], K_fold, [19, 23, 19, 19, 23, 21], additive(), additive(), 'conf_scores')

0.6314359629929253

In [230]:
# Joy dataset with all embeddings and their k values

cross_validation_ensemble_owa(joy_data, ['Vector_roBERTa', 'Vector_deepmoji', 'Vector_bert', 'Vector_sbert', 'Vector_use', 'Vector_w2v'], K_fold, [9, 19, 17, 15, 23, 23], additive(), additive(), 'conf_scores')

0.6616195692551319

In [231]:
# Sadness dataset with all embeddings and their k values

cross_validation_ensemble_owa(sad_data, ['Vector_roBERTa', 'Vector_deepmoji', 'Vector_bert', 'Vector_sbert', 'Vector_use', 'Vector_w2v'], K_fold, [23, 23, 23, 23, 23, 23], additive(), additive(), 'conf_scores')

0.6643773685191207

In [232]:
# Fear dataset with all embeddings and their k values

cross_validation_ensemble_owa(fear_data, ['Vector_roBERTa', 'Vector_deepmoji', 'Vector_bert', 'Vector_sbert', 'Vector_use', 'Vector_w2v'], K_fold, [9, 21, 7, 11, 21, 7], additive(), additive(), 'conf_scores')

0.5302812588261683

### The best setup for each dataset

The last step of ensemble tuning was to determine the most accurate set of models in the ensemble. For this purpose, we used grid search: the PCC score was calculated for each subset of the six models (features) and the scores were compared. The predicted label was calculated using a rounded average function with weights equal to the scaled confidence scores.

In this way, we identified the best setup for each emotion dataset, which also includes the best 'k' values, the voting function, and the OWA types.
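
The subset search can be sketched with `itertools.combinations`. In this illustration `toy_score` is a stand-in for a call such as `cross_validation_ensemble_owa` on the chosen columns; its values are made up and the function names are hypothetical.

```python
from itertools import combinations


def best_subset(features, score_fn):
    """Score every non-empty subset of features and return the best one."""
    best_score, best_combo = float("-inf"), None
    for size in range(1, len(features) + 1):
        for combo in combinations(features, size):
            score = score_fn(list(combo))
            if score > best_score:
                best_score, best_combo = score, combo
    return best_combo, best_score


# Made-up contribution of each embedding, for illustration only.
TOY_VALUES = {"Vector_roBERTa": 0.3, "Vector_deepmoji": 0.2, "Vector_bert": -0.1}


def toy_score(subset):
    return sum(TOY_VALUES[name] for name in subset)


combo, score = best_subset(list(TOY_VALUES), toy_score)
print(combo, score)
```

With six models there are 2^6 - 1 = 63 non-empty subsets, so an exhaustive search stays cheap as long as a single cross-validation run is fast.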

In [240]:
# For anger - confidence scores; features: roBERTa, DeepMoji, BERT, USE, Word2Vec; alpha = 0.0420

cross_validation_ensemble_owa(anger_data, ['Vector_roBERTa', 'Vector_deepmoji', 'Vector_bert', 'Vector_use', 'Vector_w2v'], K_fold, [19, 23, 19, 23, 21], additive(), additive(), 'conf_scores', 0.0420)

0.6801480669739245

In [241]:
# For joy - confidence scores; features: roBERTa, DeepMoji, BERT, USE, SBERT; alpha = 0.0320

cross_validation_ensemble_owa(joy_data, ['Vector_roBERTa', 'Vector_deepmoji', 'Vector_bert', 'Vector_sbert', 'Vector_use'], K_fold, [9, 19, 17, 15, 23], additive(), additive(), 'conf_scores', 0.0320)

0.7397918225616584

In [242]:
# For sadness - confidence scores; features: roBERTa, DeepMoji, USE, SBERT; alpha = 0.0320

cross_validation_ensemble_owa(sad_data, ['Vector_roBERTa', 'Vector_deepmoji', 'Vector_sbert', 'Vector_use'], K_fold, [23, 23, 23, 23], additive(), additive(), 'conf_scores', 0.0320)

0.7247581287413064

In [243]:
# For fear - confidence scores; features: roBERTa, DeepMoji, SBERT, USE, Word2Vec; alpha = 0.0460

cross_validation_ensemble_owa(fear_data, ['Vector_roBERTa', 'Vector_deepmoji', 'Vector_sbert', 'Vector_use', 'Vector_w2v'], K_fold, [9, 21, 11, 21, 7], additive(), additive(), 'conf_scores', 0.0460)

0.6245784952669604

## Test data evaluation of the best approaches

To measure the effectiveness of the best ensemble, we evaluate it on the test data. We calculate PCC values for each emotion dataset and average the results, as the competition organizers did.

As we can see, results for the test data are predictably worse than those for the combined training and development datasets. 

In [254]:
# For anger

anger_test_labels = test_ensemble_confscores(anger_data, anger_data['Class'], anger_test, ['Vector_roBERTa', 'Vector_deepmoji', 'Vector_bert', 'Vector_use', 'Vector_w2v'], [19, 23, 19, 23, 21], additive(), additive(), 0.0420)
anger_test_PCC = pearsonr(anger_test['Class'], anger_test_labels)[0]
print("Test PCC score for anger data: ", anger_test_PCC)

Test PCC score for anger data:  0.6432847625677238


In [256]:
# For joy

joy_test_labels = test_ensemble_confscores(joy_data, joy_data['Class'], joy_test, ['Vector_roBERTa', 'Vector_deepmoji', 'Vector_bert', 'Vector_sbert', 'Vector_use'], [9, 19, 17, 15, 23], additive(), additive(), 0.0320)
joy_test_PCC = pearsonr(joy_test['Class'], joy_test_labels)[0]
print("Test PCC score for joy data: ", joy_test_PCC)

Test PCC score for joy data:  0.6819870615829331


In [258]:
# For sadness

sad_test_labels = test_ensemble_confscores(sad_data, sad_data['Class'], sad_test, ['Vector_roBERTa', 'Vector_deepmoji', 'Vector_sbert', 'Vector_use'], [23, 23, 23, 23], additive(), additive(), 0.0320)
sad_test_PCC = pearsonr(sad_test['Class'], sad_test_labels)[0]
print("Test PCC score for sad data: ", sad_test_PCC)

Test PCC score for sad data:  0.6900805693475882


In [260]:
# For fear

fear_test_labels = test_ensemble_confscores(fear_data, fear_data['Class'], fear_test, ['Vector_roBERTa', 'Vector_deepmoji', 'Vector_sbert', 'Vector_use', 'Vector_w2v'], [9, 21, 11, 21, 7], additive(), additive(), 0.0460)
fear_test_PCC = pearsonr(fear_test['Class'], fear_test_labels)[0]
print("Test PCC score for fear data: ", fear_test_PCC)

Test PCC score for fear data:  0.5759393410711254


In [261]:
# The average PCC value

print("The average PCC value for 4 datasets: ", (anger_test_PCC+joy_test_PCC+sad_test_PCC+fear_test_PCC)/4)

The average PCC value for 4 datasets:  0.6478229336423427


We submitted the predicted labels for the test data in the required format to the competition web page - https://competitions.codalab.org/competitions/17751#learn_the_details-evaluation. 

After submission, we took **second place** on the competition leaderboard with **PCC = 0.654**.