In [25]:
from transformers import pipeline

We use a handful of sentences to discard unsuitable models quickly. In the next step, we will test the selected models on at least 100k Spanish and English sentences to choose the one that best fits our needs.

Only the useful models are kept here, although we tested approximately 35 models in total.
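For the larger test in the next step, a minimal harness could report accuracy separately for each language. This is a sketch under assumptions: the dataset is a list of `(text, language, gold_label)` tuples, labels are already normalized to `negative`/`neutral`/`positive`, and `classify` is any callable that returns one such label for a text (for example, a wrapper around one of the pipelines below). `evaluate_by_language` and `toy_classifier` are hypothetical names, and the toy gold labels are only illustrative.

```python
from collections import defaultdict
from typing import Callable, Iterable, Tuple


def evaluate_by_language(
    classify: Callable[[str], str],
    dataset: Iterable[Tuple[str, str, str]],
) -> dict:
    """Return {language: accuracy} for a classifier over (text, language, gold) rows."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for text, language, gold in dataset:
        total[language] += 1
        if classify(text) == gold:
            correct[language] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Hypothetical keyword classifier standing in for a real model so the sketch runs offline.
def toy_classifier(text: str) -> str:
    lowered = text.lower()
    if any(w in lowered for w in ("hate", "detesto", "malisimo")):
        return "negative"
    if any(w in lowered for w in ("happy", "feliz", "bueno")):
        return "positive"
    return "neutral"


toy_data = [
    ("Lo que se hizo es malisimo lo detesto.", "es", "negative"),
    ("me parece muy bueno todo estoy agradecido.", "es", "positive"),
    ("I really hate what he did. It was not a good week.", "en", "negative"),
    ("I am so happy to be here", "en", "positive"),
]
print(evaluate_by_language(toy_classifier, toy_data))
```

Reporting per-language accuracy, rather than one global number, makes it visible when a model is strong in English but weak in Spanish, which is exactly the pattern several models below show.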

In [26]:
# Sentences that could come up in a work environment, written in our two main languages (Spanish and English).
# We want to see how the same model behaves in each language and what it tends to predict.

sentences = [ 'Lo que se hizo es malisimo lo detesto.', 
             'I really hate what he did. It was not a good week.',
             'me parece muy bueno todo estoy agradecido.',
             'I think all the people has the comittment to work fine and I am thanks because of that.',
             'I am so happy to be here',
             'Me sentí muy feliz a lo largo de la jornada. Quiero destacar que el trabajo en equipo fue muy positivo y me apoyo aunque no fue del todo fácil. Entiendo que es parte de un período de adaptación.',
             'I felt very happy throughout the day. I want to highlight that the teamwork was very positive and they supported me although it was not entirely easy. I understand that it is part of a period of adaptation.',
             'Creo que el stack tecnologico no es muy bueno y no me quedan claros los requerimentos. Debería haber más esfuerzo por parte de los jefes y líderes del equipo',
             'I think the technological stack is not very good and the requirements are not clear to me. There should be more effort on the part of the bosses and team leaders.'
             ]
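Two pairs in `sentences` are direct translations of each other (indices 5/6 and 7/8). One way to quantify the difference between languages for a single model is to check whether it assigns the same label to both halves of each pair. A minimal sketch, assuming `classify` returns one label string per text (the real pipelines return a list of dicts, so they would need a wrapper such as `lambda t: analyzer_1(t)[0]["label"]`); `pair_agreement` and `TRANSLATION_PAIRS` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

# (Spanish, English) index pairs of direct translations in the `sentences` list above.
TRANSLATION_PAIRS: List[Tuple[int, int]] = [(5, 6), (7, 8)]


def pair_agreement(
    classify: Callable[[str], str],
    texts: Sequence[str],
    pairs: List[Tuple[int, int]] = TRANSLATION_PAIRS,
) -> float:
    """Fraction of translation pairs for which the classifier gives both texts the same label."""
    if not pairs:
        raise ValueError("at least one pair is required")
    same = sum(classify(texts[es]) == classify(texts[en]) for es, en in pairs)
    return same / len(pairs)


# Demo with a stub classifier (a dict lookup) instead of a real model.
demo_texts = ["Estoy feliz", "I am happy", "Lo detesto", "I hate it"]
stub = {
    "Estoy feliz": "positive",
    "I am happy": "positive",
    "Lo detesto": "neutral",
    "I hate it": "negative",
}.get
print(pair_agreement(stub, demo_texts, pairs=[(0, 1), (2, 3)]))
```

With the notebook's real data this would be called as `pair_agreement(lambda t: analyzer_1(t)[0]["label"], sentences)`, using the default index pairs.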

# Models

Most of these models do not handle irony, double negatives, or expressions specific to particular groups or places (slang) well. Since the target is a work environment, we do not worry much about these cases; improving performance on them would require fine-tuning the models for that purpose.

## Base model

In [27]:
# Candidate base model. It must be fine-tuned before it can be used for a specific task such as sentiment analysis.

base_model = pipeline("sentiment-analysis", model="xlm-roberta-base", framework="pt")

Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.out_proj.weight', 'classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
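The warning above means the classification head (the `classifier.*` weights) starts from random values, so the model has not learned any mapping from text to sentiment. With two labels, the pipeline converts the head's two logits into probabilities with a softmax; when the logits are nearly equal, both probabilities sit close to 0.5, which matches the scores of roughly 0.50 to 0.51 in the output below. The logit values in this sketch are made up for illustration.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits from an untrained two-class head: almost identical values.
untrained = softmax([0.02, 0.01])
# Hypothetical logits from a fine-tuned head: a clear margin between the classes.
trained = softmax([-2.5, 3.1])

print([round(p, 3) for p in untrained])
print([round(p, 3) for p in trained])
```

Because the untrained head's margin carries no information, `LABEL_0` versus `LABEL_1` in the next output should be read as noise rather than as a sentiment prediction.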


In [28]:

print('-------------------------------------------------------------------------')
print(f'Model {base_model.model.name_or_path}')
print('-------------------------------------------------------------------------\n')
    
for i in sentences:
    
    print(f'Sentence: {i}')
    print(base_model(i))
    print('--------------------------------------------------------------------------------\n')

-------------------------------------------------------------------------
Model xlm-roberta-base
-------------------------------------------------------------------------

Sentence: Lo que se hizo es malisimo lo detesto.
[{'label': 'LABEL_0', 'score': 0.503447413444519}]
--------------------------------------------------------------------------------

Sentence: I really hate what he did. It was not a good week.
[{'label': 'LABEL_0', 'score': 0.513603150844574}]
--------------------------------------------------------------------------------

Sentence: me parece muy bueno todo estoy agradecido.
[{'label': 'LABEL_0', 'score': 0.5075839757919312}]
--------------------------------------------------------------------------------

Sentence: I think all the people has the comittment to work fine and I am thanks because of that.
[{'label': 'LABEL_1', 'score': 0.5018354654312134}]
--------------------------------------------------------------------------------

Sentence: I am so happy to be her
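The same print loop is repeated for every candidate model in this notebook. A small helper could collect all predictions in one pass instead; this is a sketch, not the code that produced the outputs shown. It assumes each analyzer behaves like a `transformers` text-classification pipeline called on a single string, returning a list with one `{'label': ..., 'score': ...}` dict. `compare_models` and `fake_analyzer` are hypothetical names.

```python
from typing import Callable, Dict, List

Prediction = Dict[str, object]


def compare_models(
    analyzers: Dict[str, Callable[[str], List[Prediction]]],
    texts: List[str],
) -> List[Dict[str, object]]:
    """Run every analyzer on every text and return one flat row per (model, text)."""
    rows = []
    for name, analyzer in analyzers.items():
        for text in texts:
            top = analyzer(text)[0]  # a pipeline called on one string returns a one-element list
            rows.append({
                "model": name,
                "text": text,
                "label": top["label"],
                "score": round(float(top["score"]), 3),
            })
    return rows


# Stub standing in for a real pipeline so the sketch runs without downloading models.
def fake_analyzer(text: str) -> List[Prediction]:
    return [{"label": "positive" if "happy" in text else "negative", "score": 0.9}]


for row in compare_models({"fake": fake_analyzer}, ["I am so happy to be here", "I hate it"]):
    print(row)
```

In this notebook it could be called as `compare_models({"base": base_model}, sentences)`, adding the later analyzers to the same dict as they are defined; the flat rows also load directly into a `pandas.DataFrame` for side-by-side comparison.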

## Spanish

In [29]:
# Lightweight model fine-tuned for Spanish sentiment analysis. Usually predicts Neutral when the sentence is in English.

# Sentiment analysis in Spanish

analyzer_es = pipeline("sentiment-analysis", model="Christiansg/finetuning-sentiment_spanish-amazon-group23", framework="pt", max_length=512, truncation=True)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [30]:

print('-------------------------------------------------------------------------')
print(f'Model {analyzer_es.model.name_or_path}')
print('-------------------------------------------------------------------------\n')
    
for i in sentences:
    
    print(f'Sentence: {i}')
    print(analyzer_es(i))
    print('--------------------------------------------------------------------------------\n')

-------------------------------------------------------------------------
Model Christiansg/finetuning-sentiment_spanish-amazon-group23
-------------------------------------------------------------------------

Sentence: Lo que se hizo es malisimo lo detesto.
[{'label': 'NEG', 'score': 0.9998739957809448}]
--------------------------------------------------------------------------------

Sentence: I really hate what he did. It was not a good week.
[{'label': 'NEU', 'score': 0.9997308850288391}]
--------------------------------------------------------------------------------

Sentence: me parece muy bueno todo estoy agradecido.
[{'label': 'NEU', 'score': 0.999830961227417}]
--------------------------------------------------------------------------------

Sentence: I think all the people has the comittment to work fine and I am thanks because of that.
[{'label': 'NEU', 'score': 0.9953254461288452}]
--------------------------------------------------------------------------------

Sentence:

## Default - English model

In [31]:
# Default model for English sentiment analysis (distilbert-base-uncased-finetuned-sst-2-english). Performs very well on English text.
# Returns a POSITIVE or NEGATIVE label with a confidence score for each input.

analyzer_eng = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
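The warning above recommends pinning both the model name and the revision. A hedged sketch of a registry of candidate models, with the default English model pinned to the revision reported in the warning (`af0f99b`); the other entry is left unpinned because its revision is not recorded in this notebook. `CANDIDATES` and `build_analyzers` are hypothetical names; `build_analyzers` only forwards each spec to `transformers.pipeline`, which accepts `model`, `revision` and `framework` keyword arguments.

```python
from typing import Dict, Optional

# Keyword arguments for transformers.pipeline, one entry per candidate model.
CANDIDATES: Dict[str, Dict[str, Optional[str]]] = {
    "english_default": {
        "model": "distilbert-base-uncased-finetuned-sst-2-english",
        "revision": "af0f99b",  # revision reported in the pipeline warning above
    },
    "multilingual_student": {
        "model": "lxyuan/distilbert-base-multilingual-cased-sentiments-student",
        "revision": None,  # unpinned: replace with a specific commit before production use
    },
}


def build_analyzers(candidates: Dict[str, Dict[str, Optional[str]]] = CANDIDATES) -> dict:
    """Instantiate one sentiment-analysis pipeline per candidate (downloads the models)."""
    from transformers import pipeline  # imported lazily so the registry has no hard dependency

    return {
        name: pipeline(
            "sentiment-analysis",
            framework="pt",
            **{key: value for key, value in spec.items() if value is not None},
        )
        for name, spec in candidates.items()
    }
```

Keeping the specs in one place means a model swap or revision bump is a one-line change instead of an edit to several cells.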


In [32]:

print('-------------------------------------------------------------------------')
print(f'Model {analyzer_eng.model.name_or_path}')
print('-------------------------------------------------------------------------\n')

for i in sentences:

    print(f'Sentence: {i}')
    print(analyzer_eng(i))
    print('--------------------------------------------------------------------------------\n')

-------------------------------------------------------------------------
Model distilbert-base-uncased-finetuned-sst-2-english
-------------------------------------------------------------------------

Sentence: Lo que se hizo es malisimo lo detesto.
[{'label': 'NEGATIVE', 'score': 0.883853554725647}]
--------------------------------------------------------------------------------

Sentence: I really hate what he did. It was not a good week.
[{'label': 'NEGATIVE', 'score': 0.9997681975364685}]
--------------------------------------------------------------------------------

Sentence: me parece muy bueno todo estoy agradecido.
[{'label': 'NEGATIVE', 'score': 0.923772394657135}]
--------------------------------------------------------------------------------

Sentence: I think all the people has the comittment to work fine and I am thanks because of that.
[{'label': 'POSITIVE', 'score': 0.99969482421875}]
---------------------------------------------------------------------

## Multilingual

Using one multilingual model tends to be lighter than using two separate models (one per language). However, a multilingual model may need fine-tuning, mainly for Spanish, because its default language and most of its training are in English.
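To make the trade-off concrete: with two monolingual models, every input first needs a language decision before it can be sent to the right model, and both models must be loaded; a single multilingual model skips both steps. The sketch below shows the two-model routing. The stopword-counting `guess_language` is a deliberately naive, hypothetical heuristic (a real system would use a proper language-identification tool), and the analyzers are stubs.

```python
from typing import Callable, Dict

# Tiny, hypothetical stopword lists; a real language identifier would be far more robust.
STOPWORDS = {
    "es": {"el", "la", "que", "de", "y", "muy", "es", "lo", "me", "no"},
    "en": {"the", "and", "is", "i", "it", "was", "to", "of", "not", "very"},
}


def guess_language(text: str) -> str:
    """Return the language whose stopwords appear most often (ties go to 'en')."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = {lang: sum(w in stops for w in words) for lang, stops in STOPWORDS.items()}
    return "es" if counts["es"] > counts["en"] else "en"


def route(text: str, analyzers: Dict[str, Callable[[str], str]]) -> str:
    """Two-model setup: detect the language, then call the matching monolingual model."""
    return analyzers[guess_language(text)](text)


# Stub analyzers that only report which model handled the text.
stub_analyzers = {"es": lambda t: "spanish-model", "en": lambda t: "english-model"}
print(route("Lo que se hizo es malisimo lo detesto.", stub_analyzers))
print(route("I really hate what he did.", stub_analyzers))
```

Any misclassification by the language step sends text to the wrong model, which is the failure mode already visible when the Spanish model labels English sentences as neutral.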

In [33]:

# Really good multilingual model, lightweight and quite accurate in Spanish and English.

analyzer_1 = pipeline("sentiment-analysis", model="lxyuan/distilbert-base-multilingual-cased-sentiments-student", framework="pt")

In [34]:

print('-------------------------------------------------------------------------')
print(f'Model {analyzer_1.model.name_or_path}')
print('-------------------------------------------------------------------------\n')
    
for i in sentences:
    
    print(f'Sentence: {i}')
    print(analyzer_1(i))
    print('--------------------------------------------------------------------------------\n')

-------------------------------------------------------------------------
Model lxyuan/distilbert-base-multilingual-cased-sentiments-student
-------------------------------------------------------------------------

Sentence: Lo que se hizo es malisimo lo detesto.
[{'label': 'negative', 'score': 0.9654184579849243}]
--------------------------------------------------------------------------------

Sentence: I really hate what he did. It was not a good week.
[{'label': 'negative', 'score': 0.9675229787826538}]
--------------------------------------------------------------------------------

Sentence: me parece muy bueno todo estoy agradecido.
[{'label': 'positive', 'score': 0.986839771270752}]
--------------------------------------------------------------------------------

Sentence: I think all the people has the comittment to work fine and I am thanks because of that.
[{'label': 'positive', 'score': 0.6303771138191223}]
------------------------------------------------------------------

In [35]:

# Multilingual model with 5 labels, from '1 star' (worst) to '5 stars' (best). Its interpretation of Spanish is not especially good,
# but it was trained on other languages with more data and is more accurate on them, mainly English.

analyzer_2 = pipeline("sentiment-analysis", model="nlptown/bert-base-multilingual-uncased-sentiment", framework="pt")
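Because this model returns five star labels while the others return two or three polarity labels, comparing models requires a common scale. One possible mapping, which is our assumption rather than something defined by the model: 1-2 stars as negative, 3 stars as neutral, 4-5 stars as positive. The same helper can also fold in the other label spellings that appear in this notebook (`NEG`/`NEU`/`POS`, `NEGATIVE`/`POSITIVE`, lowercase labels). `normalize_label` is a hypothetical name.

```python
def normalize_label(label: str) -> str:
    """Map the label schemes seen in this notebook to 'negative' / 'neutral' / 'positive'."""
    text = label.strip().lower()
    if text.endswith(("star", "stars")):
        stars = int(text.split()[0])  # '1 star' ... '5 stars'
        if stars <= 2:
            return "negative"
        if stars == 3:
            return "neutral"
        return "positive"
    aliases = {
        "neg": "negative", "negative": "negative",
        "neu": "neutral", "neutral": "neutral",
        "pos": "positive", "positive": "positive",
    }
    if text not in aliases:
        # e.g. 'LABEL_0' from the untrained base model carries no sentiment meaning
        raise ValueError(f"Unknown label: {label!r}")
    return aliases[text]


print(normalize_label("5 stars"), normalize_label("NEG"), normalize_label("POSITIVE"))
```

Raising on unknown labels, instead of guessing, keeps the untrained base model's `LABEL_0`/`LABEL_1` outputs from silently entering a comparison.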

In [36]:

print('-------------------------------------------------------------------------')
print(f'Model {analyzer_2.model.name_or_path}')
print('-------------------------------------------------------------------------\n')
    
for i in sentences:
    
    print(f'Sentence: {i}')
    print(analyzer_2(i))
    print('--------------------------------------------------------------------------------\n')

-------------------------------------------------------------------------
Model nlptown/bert-base-multilingual-uncased-sentiment
-------------------------------------------------------------------------

Sentence: Lo que se hizo es malisimo lo detesto.
[{'label': '1 star', 'score': 0.9083830118179321}]
--------------------------------------------------------------------------------

Sentence: I really hate what he did. It was not a good week.
[{'label': '1 star', 'score': 0.5446393489837646}]
--------------------------------------------------------------------------------

Sentence: me parece muy bueno todo estoy agradecido.
[{'label': '5 stars', 'score': 0.6715044379234314}]
--------------------------------------------------------------------------------

Sentence: I think all the people has the comittment to work fine and I am thanks because of that.
[{'label': '5 stars', 'score': 0.47652143239974976}]
--------------------------------------------------------------------------------



In [37]:

# Well-balanced model: multilingual, good features and very lightweight. Its confidence drops on some long words and sentences,
# but it still captures the overall sentiment and avoids mistakes in the final label.

analyzer_3 = pipeline("sentiment-analysis", model="philschmid/distilbert-base-multilingual-cased-sentiment", framework="pt")

In [38]:

print('-------------------------------------------------------------------------')
print(f'Model {analyzer_3.model.name_or_path}')
print('-------------------------------------------------------------------------\n')
    
for i in sentences:
    
    print(f'Sentence: {i}')
    print(analyzer_3(i))
    print('--------------------------------------------------------------------------------\n')

-------------------------------------------------------------------------
Model philschmid/distilbert-base-multilingual-cased-sentiment
-------------------------------------------------------------------------

Sentence: Lo que se hizo es malisimo lo detesto.
[{'label': 'negative', 'score': 0.9141145348548889}]
--------------------------------------------------------------------------------

Sentence: I really hate what he did. It was not a good week.
[{'label': 'negative', 'score': 0.9255995750427246}]
--------------------------------------------------------------------------------

Sentence: me parece muy bueno todo estoy agradecido.
[{'label': 'positive', 'score': 0.9130302667617798}]
--------------------------------------------------------------------------------

Sentence: I think all the people has the comittment to work fine and I am thanks because of that.
[{'label': 'positive', 'score': 0.9707993268966675}]
----------------------------------------------------------------------

In [39]:
# Tends toward middling scores and performs worse on long sentences or texts with many words. Needs more testing.

analyzer_4 = pipeline("sentiment-analysis", model="arjuntheprogrammer/distilbert-base-multilingual-cased-sentiment-2", framework="pt")

In [40]:

print('-------------------------------------------------------------------------')
print(f'Model {analyzer_4.model.name_or_path}')
print('-------------------------------------------------------------------------\n')
    
for i in sentences:
    
    print(f'Sentence: {i}')
    print(analyzer_4(i))
    print('--------------------------------------------------------------------------------\n')

-------------------------------------------------------------------------
Model arjuntheprogrammer/distilbert-base-multilingual-cased-sentiment-2
-------------------------------------------------------------------------

Sentence: Lo que se hizo es malisimo lo detesto.
[{'label': 'negative', 'score': 0.8638889789581299}]
--------------------------------------------------------------------------------

Sentence: I really hate what he did. It was not a good week.
[{'label': 'negative', 'score': 0.8020395040512085}]
--------------------------------------------------------------------------------

Sentence: me parece muy bueno todo estoy agradecido.
[{'label': 'positive', 'score': 0.9395526647567749}]
--------------------------------------------------------------------------------

Sentence: I think all the people has the comittment to work fine and I am thanks because of that.
[{'label': 'neutral', 'score': 0.6768832206726074}]
-------------------------------------------------------------
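The comment on this model says it tends toward middling scores. Using only the four outputs visible above for each model (the rest of the printed output is cut off), a quick comparison of mean and minimum confidence against `philschmid/distilbert-base-multilingual-cased-sentiment` is consistent with that impression, although four sentences are far too few for a firm conclusion.

```python
from statistics import mean

# Top-label scores for the first four sentences, copied from the outputs above.
scores = {
    "philschmid/distilbert-base-multilingual-cased-sentiment": [
        0.9141145348548889, 0.9255995750427246, 0.9130302667617798, 0.9707993268966675,
    ],
    "arjuntheprogrammer/distilbert-base-multilingual-cased-sentiment-2": [
        0.8638889789581299, 0.8020395040512085, 0.9395526647567749, 0.6768832206726074,
    ],
}

for model, values in scores.items():
    print(f"{model}: mean={mean(values):.3f} min={min(values):.3f}")
```

The same comparison over the full 100k-sentence test set, split by language and sentence length, would show whether the lower scores really concentrate on long sentences.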

In [41]:
# Still under evaluation; no conclusions yet for this model.

analyzer_5 = pipeline("sentiment-analysis", model="philschmid/distilbert-base-multilingual-cased-sentiment-2", framework="pt")

In [42]:

print('-------------------------------------------------------------------------')
print(f'Model {analyzer_5.model.name_or_path}')
print('-------------------------------------------------------------------------\n')
    
for i in sentences:
    
    print(f'Sentence: {i}')
    print(analyzer_5(i))
    print('--------------------------------------------------------------------------------\n')

-------------------------------------------------------------------------
Model philschmid/distilbert-base-multilingual-cased-sentiment-2
-------------------------------------------------------------------------

Sentence: Lo que se hizo es malisimo lo detesto.
[{'label': 'negative', 'score': 0.8163114190101624}]
--------------------------------------------------------------------------------

Sentence: I really hate what he did. It was not a good week.
[{'label': 'negative', 'score': 0.6100201606750488}]
--------------------------------------------------------------------------------

Sentence: me parece muy bueno todo estoy agradecido.
[{'label': 'positive', 'score': 0.920467734336853}]
--------------------------------------------------------------------------------

Sentence: I think all the people has the comittment to work fine and I am thanks because of that.
[{'label': 'positive', 'score': 0.9330447912216187}]
---------------------------------------------------------------------
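To summarize the visible outputs, one can count how many of the first four sentences each model labels with the expected polarity. The expected labels (negative, negative, positive, positive) are our own reading of the sentences, not ground truth from a dataset, and the star labels are mapped with an assumed scale (1-2 stars negative, 3 neutral, 4-5 positive). The untrained `xlm-roberta-base` is left out because its `LABEL_0`/`LABEL_1` outputs carry no sentiment meaning.

```python
# Labels for the first four sentences, copied from the outputs above.
predictions = {
    "Christiansg/finetuning-sentiment_spanish-amazon-group23": ["NEG", "NEU", "NEU", "NEU"],
    "distilbert-base-uncased-finetuned-sst-2-english": ["NEGATIVE", "NEGATIVE", "NEGATIVE", "POSITIVE"],
    "lxyuan/distilbert-base-multilingual-cased-sentiments-student": ["negative", "negative", "positive", "positive"],
    "nlptown/bert-base-multilingual-uncased-sentiment": ["1 star", "1 star", "5 stars", "5 stars"],
    "philschmid/distilbert-base-multilingual-cased-sentiment": ["negative", "negative", "positive", "positive"],
    "arjuntheprogrammer/distilbert-base-multilingual-cased-sentiment-2": ["negative", "negative", "positive", "neutral"],
    "philschmid/distilbert-base-multilingual-cased-sentiment-2": ["negative", "negative", "positive", "positive"],
}
# Our own reading of the four sentences (an assumption, not dataset ground truth).
expected = ["negative", "negative", "positive", "positive"]

# Assumed mapping onto a shared scale; labels already spelled out pass through unchanged.
LABEL_MAP = {
    "neg": "negative", "neu": "neutral", "pos": "positive",
    "1 star": "negative", "2 stars": "negative", "3 stars": "neutral",
    "4 stars": "positive", "5 stars": "positive",
}


def to_polarity(label: str) -> str:
    key = label.lower()
    return LABEL_MAP.get(key, key)


hits = {
    model: sum(to_polarity(p) == e for p, e in zip(labels, expected))
    for model, labels in predictions.items()
}
for model, n in sorted(hits.items(), key=lambda kv: -kv[1]):
    print(f"{n}/4  {model}")
```

On these four sentences the Spanish-only and English-only models fall behind the multilingual ones, mostly because each mislabels the language it was not trained for.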