# C-More

In [1]:
import pandas as pd

from collections import Counter

from nltk.tokenize import TweetTokenizer
import string

Our goal is to use the HuggingFace Inference API, https://huggingface.co/docs/api-inference/quicktour, to analyse the sentiment of tweets written in Portuguese (PT). We have two options to achieve this:

* get the sentiment directly from the tweets in PT
* translate the tweets to EN and get the sentiment of the translated tweets

We will start by replacing some of the most common abbreviations in social media texts with full words.

### 1. Process text

In [2]:
df = pd.read_pickle('data_galp.pkl')

In [3]:
df.head()

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at
0,1550250030476496900,Galp compra por 140 milhões os 25% da Titan So...,0,0,0,0,2022-07-21 22:43:04+00:00
1,1550246814963712001,A Galp convida portugueses a pensar fora do ca...,0,0,0,0,2022-07-21 22:30:17+00:00
2,1550243011350740992,"Mais um ano que renovo o cartão jovem, mais um...",0,0,0,0,2022-07-21 22:15:10+00:00
3,1550242176407425024,@davidkirzner @LiberalNova @LiberalPT Achas? A...,0,0,0,0,2022-07-21 22:11:51+00:00
4,1550240739170439169,#sicnoticias Não sei! Com a França a construir...,0,0,0,0,2022-07-21 22:06:08+00:00


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 125 entries, 0 to 124
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype              
---  ------      --------------  -----              
 0   id          125 non-null    int64              
 1   text        125 non-null    object             
 2   retweets    125 non-null    int64              
 3   replies     125 non-null    int64              
 4   likes       125 non-null    int64              
 5   quotes      125 non-null    int64              
 6   created_at  125 non-null    datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1), int64(5), object(1)
memory usage: 7.0+ KB


We can tokenize our texts to identify some of the most common abbreviations.

In [5]:
# tokenize text and remove punctuation

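def tokens_nopunct(text):
    # reduce_len shortens elongated words; strip_handles drops @mentions
    tokens = TweetTokenizer(reduce_len=True, strip_handles=True).tokenize(text)
    # string.punctuation is ASCII-only, so tokens such as '...' or '“' are kept
    return [token for token in tokens if token not in string.punctuation]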

In [6]:
df['tokens'] = df['text'].map(tokens_nopunct)

In [7]:
counter = Counter()

for tokens in df['tokens']:
    counter.update(tokens)

In [9]:
# most common tokens with length < 4

[token for token in counter.most_common(50) if len(token[0]) < 4]

[('a', 146),
 ('de', 97),
 ('e', 85),
 ('que', 77),
 ('o', 69),
 ('da', 63),
 ('do', 36),
 ('é', 33),
 ('em', 33),
 ('não', 29),
 ('com', 29),
 ('as', 25),
 ('A', 24),
 ('os', 22),
 ('na', 22),
 ('no', 22),
 ('uma', 19),
 ('um', 19),
 ('...', 18),
 ('dos', 17),
 ('EDP', 15),
 ('por', 14),
 ('O', 13),
 ('nos', 12),
 ('ao', 11),
 ('q', 11),
 ('tem', 11),
 ('se', 11),
 ('foi', 11),
 ('à', 9),
 ('só', 8),
 ('me', 8),
 ('Não', 8),
 ('ser', 8),
 ('140', 7),
 ('25', 7),
 ('“', 7),
 ('”', 7)]

In [10]:
# 'q' is one of the most common abbreviations

print(df[df['text'].str.contains(" q ", case=False)]['text'].values)

['A Galp convida portugueses a pensar fora do carro..e quem disser Q isto é só “uma limpeza de imagem”, a Galp, manda “dar banho ao cão”. A máscara 😷 cai. #De-evolution. AbsurdQ https://t.co/TW2XljNvxI'
 '@davidkirzner @LiberalNova @LiberalPT Achas? A moça terminou o curso há pouco tempo e concorreu á Galp... O q estou a dizer é q na Galp trabalham milhares e milhares de pessoas, não é por trabalhar na Galp q somos maus. Nem sequer são eles q decidem preços e afins 😕'
 '@davidkirzner @LiberalNova @LiberalPT David mas eu tenho uma amiga q não é Rica, nunca ligou a Política, é uma excelente pessoa e profissional e trabalha na Galp. Tipo, a Galp tem milhares de funcionários'
 '@Fragoso_1906 sim, a Galp tem uma cena com o continente q tens desconto na fatura e vai para o cartão!'
 '@RuiPaiva5 Já andam todos em pânico. Até lhes tremem as pernas só de pensar no q aconteceu à GALP. https://t.co/puXUVI70da'
 '@tiagojcgodinho @MestredoUnive19 @PSSantiago88 Tens noção q há "ativos" encostados n 

In [11]:
# dictionary with abbreviations to replace

abrev = {" q ": " que "}

In [12]:
# other abbreviations: 'tb'

print(df[df['text'].str.contains(" tb ", case=False)]['text'].values)

["@psocialista @LiberalPT @ppdpsd \nMAL q é feito aos portugueses.\nÑ basta falar na elevada 'carga' fiscal, tb dos baixos salários face a inflação ~9%\nPQ isto tb conta, qd ESTADO é acionista da Galp, em 7%\nhttps://t.co/sYtJAQQAQ4\n'TUGA' SOFRE...\nhttps://t.co/eD0R4EIrXL via @expresso"
 '@jbizarro Acidentes de viação, segundo as estatísticas, tb foram menos, e a Galp, segundo se soube, teve prejuízos avultados, teve menos lucros.🤦']


In [13]:
abrev[" tb "] = " também "

In [14]:
# other abbreviations: 'qd'

print(df[df['text'].str.contains(" qd ", case=False)]['text'].values)

["@psocialista @LiberalPT @ppdpsd \nMAL q é feito aos portugueses.\nÑ basta falar na elevada 'carga' fiscal, tb dos baixos salários face a inflação ~9%\nPQ isto tb conta, qd ESTADO é acionista da Galp, em 7%\nhttps://t.co/sYtJAQQAQ4\n'TUGA' SOFRE...\nhttps://t.co/eD0R4EIrXL via @expresso"]


In [15]:
# other abbreviations to replace

abrev[" qd "] = " quando "
abrev[" ñ "] = " não "
abrev[" pq "] = " porque "
abrev[" hj "] = " hoje "

In [16]:
abrev

{' q ': ' que ',
 ' tb ': ' também ',
 ' qd ': ' quando ',
 ' ñ ': ' não ',
 ' pq ': ' porque ',
 ' hj ': ' hoje '}

This is not an exhaustive list of abbreviations to replace, but it could be updated according to our needs.

Another way of doing this would be using regular expressions to account for all occurrences (and not only lowercase abbreviations with blank spaces at both ends).

We could also try to use a spell checker.
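A minimal sketch of the regular-expression alternative mentioned above. The names `ABBREVIATIONS`, `PATTERN` and `replace_abbreviations` are hypothetical and not part of the notebook. The sketch assumes an abbreviation should be expanded whenever it is not attached to another word character, regardless of case, and it always emits the lowercase full word (so `PQ` becomes `porque`).

```python
import re

# Same abbreviations as in `abrev`, keyed without the padding spaces (illustrative)
ABBREVIATIONS = {
    "q": "que",
    "tb": "também",
    "qd": "quando",
    "ñ": "não",
    "pq": "porque",
    "hj": "hoje",
}

# Longest keys first, so that 'qd' is tried before 'q'.
# (?<!\w) and (?!\w) reject matches that are glued to another word character.
_keys = sorted(ABBREVIATIONS, key=len, reverse=True)
PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(k) for k in _keys) + r")(?!\w)",
    flags=re.IGNORECASE,
)


def replace_abbreviations(text):
    """Expand every standalone abbreviation, ignoring case."""
    return PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1).lower()], text)


print(replace_abbreviations("PQ isto tb conta, qd ESTADO é acionista"))
```

Unlike the space-delimited dictionary, this also catches abbreviations at the start or end of a tweet or next to punctuation. It would still expand an abbreviation that follows a symbol such as `#` or `/`, so hashtags and URLs may need to be protected first.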

In [17]:
# replace abbreviations by word

def replace_abrev(text, dic):
    for k, v in dic.items():
        text = text.replace(k, v)
    return text

In [18]:
df['text_full'] = df['text'].apply(replace_abrev, dic=abrev)

In [19]:
# example 1

print(df[df['text'].str.contains("milhares e milhares", case=False)]['text'].values, 
      '\n\n', 
      df[df['text_full'].str.contains("milhares e milhares", case=False)]['text_full'].values)

['@davidkirzner @LiberalNova @LiberalPT Achas? A moça terminou o curso há pouco tempo e concorreu á Galp... O q estou a dizer é q na Galp trabalham milhares e milhares de pessoas, não é por trabalhar na Galp q somos maus. Nem sequer são eles q decidem preços e afins 😕'] 

 ['@davidkirzner @LiberalNova @LiberalPT Achas? A moça terminou o curso há pouco tempo e concorreu á Galp... O que estou a dizer é que na Galp trabalham milhares e milhares de pessoas, não é por trabalhar na Galp que somos maus. Nem sequer são eles que decidem preços e afins 😕']


In [20]:
# example 2

print(df[df['text'].str.contains("estatísticas", case=False)]['text'].values, 
      '\n\n', 
      df[df['text_full'].str.contains("estatísticas", case=False)]['text_full'].values)

['@jbizarro Acidentes de viação, segundo as estatísticas, tb foram menos, e a Galp, segundo se soube, teve prejuízos avultados, teve menos lucros.🤦'] 

 ['@jbizarro Acidentes de viação, segundo as estatísticas, também foram menos, e a Galp, segundo se soube, teve prejuízos avultados, teve menos lucros.🤦']


### 2. Sentiment Analysis with a multilingual model fine-tuned for PT

In [37]:
import requests

Before we begin, we need to take into consideration the usage limits of the API. The free version of HuggingFace's Inference API is limited to 30000 characters per month: https://huggingface.co/pricing .

In [25]:
# total number of characters of our tweets

sum([len(text) for text in df['text_full']])

20923

Since our tweets total 20923 characters, which is most of the monthly limit, we are going to test these new approaches on a smaller sample (otherwise we would quickly reach the limit).
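Rather than fixing the number of tweets by hand, the sample could be sized from a character budget. The sketch below is only an illustration: `take_within_budget` is a hypothetical helper, and the budget value is an assumption to be chosen by the reader. It keeps tweets in order until the next one would exceed the budget.

```python
def take_within_budget(texts, max_chars):
    """Return (selected, used): the longest prefix of `texts` whose
    total length does not exceed `max_chars`, and that total length."""
    selected, used = [], 0
    for text in texts:
        if used + len(text) > max_chars:
            break  # the next text would exceed the budget
        selected.append(text)
        used += len(text)
    return selected, used


# Toy strings, not the notebook's tweets
sample, used = take_within_budget(["abc", "de", "fghij"], max_chars=5)
print(sample, used)
```

Each request sends characters to the API, so running sentiment analysis on the original text, translating it, and then analysing the translation presumably draws on the budget more than once for the same tweets.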

In [27]:
df_sample = df[0:20].copy()

In [28]:
sum([len(text) for text in df_sample['text_full']])

3666

In [30]:
# HuggingFace token

hf_token = ""

In [98]:
# multilingual model for sentiment analysis:
# https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment

model = "cardiffnlp/twitter-xlm-roberta-base-sentiment"

In [99]:
API_URL = "https://api-inference.huggingface.co/models/" + model
headers = {"Authorization": "Bearer %s" % (hf_token)}

In [33]:
def apply_model(data):
    payload = dict(inputs=data, options=dict(wait_for_model=True))
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

In [100]:
# test

apply_model("Isto é um teste!")

[[{'label': 'Neutral', 'score': 0.8557949066162109},
  {'label': 'Negative', 'score': 0.08614175766706467},
  {'label': 'Positive', 'score': 0.058063309639692307}]]
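
The test response above is a list inside a list, with one dictionary per label. A minimal sketch of extracting the top label with a guard for unexpected responses follows. `top_label` is a hypothetical helper, and the assumption that a failed call returns a dictionary (for example with an `error` key) instead of a list should be checked against the API documentation.

```python
def top_label(response):
    """Return (label, score) for the highest-scoring label."""
    # Assumption: anything that is not a non-empty list signals a failed call
    if not isinstance(response, list) or not response:
        raise ValueError(f"unexpected response: {response!r}")
    scores = response[0]  # the per-label scores are a list inside a list
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]


# The test response shown above
example = [[{'label': 'Neutral', 'score': 0.8557949066162109},
            {'label': 'Negative', 'score': 0.08614175766706467},
            {'label': 'Positive', 'score': 0.058063309639692307}]]
print(top_label(example))
```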

In [83]:
%%time

tweets_sentiment = []

for tweet in df_sample['text_full']:
    
    sentiment_result = apply_model(tweet)[0] # the result is a list inside a list
    tweets_sentiment.append({'sentiment': sentiment_result})

Wall time: 12.1 s


In [84]:
df_sample['sentiment'] = pd.DataFrame(tweets_sentiment)

df_sample.head()

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment
0,1550250030476496900,Galp compra por 140 milhões os 25% da Titan So...,0,0,0,0,2022-07-21 22:43:04+00:00,"[Galp, compra, por, 140, milhões, os, 25, da, ...",Galp compra por 140 milhões os 25% da Titan So...,"[{'label': 'Neutral', 'score': 0.7745079398155..."
1,1550246814963712001,A Galp convida portugueses a pensar fora do ca...,0,0,0,0,2022-07-21 22:30:17+00:00,"[A, Galp, convida, portugueses, a, pensar, for...",A Galp convida portugueses a pensar fora do ca...,"[{'label': 'Negative', 'score': 0.836504757404..."
2,1550243011350740992,"Mais um ano que renovo o cartão jovem, mais um...",0,0,0,0,2022-07-21 22:15:10+00:00,"[Mais, um, ano, que, renovo, o, cartão, jovem,...","Mais um ano que renovo o cartão jovem, mais um...","[{'label': 'Positive', 'score': 0.569248735904..."
3,1550242176407425024,@davidkirzner @LiberalNova @LiberalPT Achas? A...,0,0,0,0,2022-07-21 22:11:51+00:00,"[Achas, A, moça, terminou, o, curso, há, pouco...",@davidkirzner @LiberalNova @LiberalPT Achas? A...,"[{'label': 'Negative', 'score': 0.901339948177..."
4,1550240739170439169,#sicnoticias Não sei! Com a França a construir...,0,0,0,0,2022-07-21 22:06:08+00:00,"[#sicnoticias, Não, sei, Com, a, França, a, co...",#sicnoticias Não sei! Com a França a construir...,"[{'label': 'Neutral', 'score': 0.4744070768356..."


In [85]:
# each label gets a score in the range [0, 1]

# top sentiment
df_sample['top_sentiment'] = df_sample['sentiment'].map(lambda sentiment: max(sentiment, key=lambda x: x['score']))

# top score
df_sample['score'] = df_sample['top_sentiment'].map(lambda x: x['score'])

# top label
df_sample['label'] = df_sample['top_sentiment'].map(lambda x: x['label'])

In [86]:
df_sample.head()

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment,top_sentiment,score,label
0,1550250030476496900,Galp compra por 140 milhões os 25% da Titan So...,0,0,0,0,2022-07-21 22:43:04+00:00,"[Galp, compra, por, 140, milhões, os, 25, da, ...",Galp compra por 140 milhões os 25% da Titan So...,"[{'label': 'Neutral', 'score': 0.7745079398155...","{'label': 'Neutral', 'score': 0.7745079398155212}",0.774508,Neutral
1,1550246814963712001,A Galp convida portugueses a pensar fora do ca...,0,0,0,0,2022-07-21 22:30:17+00:00,"[A, Galp, convida, portugueses, a, pensar, for...",A Galp convida portugueses a pensar fora do ca...,"[{'label': 'Negative', 'score': 0.836504757404...","{'label': 'Negative', 'score': 0.8365047574043...",0.836505,Negative
2,1550243011350740992,"Mais um ano que renovo o cartão jovem, mais um...",0,0,0,0,2022-07-21 22:15:10+00:00,"[Mais, um, ano, que, renovo, o, cartão, jovem,...","Mais um ano que renovo o cartão jovem, mais um...","[{'label': 'Positive', 'score': 0.569248735904...","{'label': 'Positive', 'score': 0.5692487359046...",0.569249,Positive
3,1550242176407425024,@davidkirzner @LiberalNova @LiberalPT Achas? A...,0,0,0,0,2022-07-21 22:11:51+00:00,"[Achas, A, moça, terminou, o, curso, há, pouco...",@davidkirzner @LiberalNova @LiberalPT Achas? A...,"[{'label': 'Negative', 'score': 0.901339948177...","{'label': 'Negative', 'score': 0.9013399481773...",0.90134,Negative
4,1550240739170439169,#sicnoticias Não sei! Com a França a construir...,0,0,0,0,2022-07-21 22:06:08+00:00,"[#sicnoticias, Não, sei, Com, a, França, a, co...",#sicnoticias Não sei! Com a França a construir...,"[{'label': 'Neutral', 'score': 0.4744070768356...","{'label': 'Neutral', 'score': 0.4744070768356323}",0.474407,Neutral


In [88]:
df_sample['label'].value_counts()

Neutral     14
Negative     3
Positive     3
Name: label, dtype: int64

In [92]:
print(df_sample[df_sample['label'] == 'Negative']['text_full'].values)

['A Galp convida portugueses a pensar fora do carro..e quem disser Q isto é só “uma limpeza de imagem”, a Galp, manda “dar banho ao cão”. A máscara 😷 cai. #De-evolution. AbsurdQ https://t.co/TW2XljNvxI'
 '@davidkirzner @LiberalNova @LiberalPT Achas? A moça terminou o curso há pouco tempo e concorreu á Galp... O que estou a dizer é que na Galp trabalham milhares e milhares de pessoas, não é por trabalhar na Galp que somos maus. Nem sequer são eles que decidem preços e afins 😕'
 '#galp Um belo exemplo da areia que nos atiram para os olhos quando dizem que a petrolíferas não ganham ( tanto) dinheiro como deviam, #edp mesma coisa, chulos e miseráveis é o que fulanos são, com a conivência governamental https://t.co/nL08rnxd8p']


In [93]:
print(df_sample[df_sample['label'] == 'Positive']['text_full'].values)

['Mais um ano que renovo o cartão jovem, mais um ano em que a câmara não me da acesso ao codigo de desconto \U0001fae0\U0001fae0 ate hoje a parte boa do CJ é o desconto que dão com o cartão jovem galp fica ao preço da gasolina low cost hehehe'
 '@davidkirzner @LiberalNova @LiberalPT David mas eu tenho uma amiga que não é Rica, nunca ligou a Política, é uma excelente pessoa e profissional e trabalha na Galp. Tipo, a Galp tem milhares de funcionários'
 'Asociádevos ao #GALP e fomentemos xuntos o crecemento económico, a inclusión social, a creación de emprego e o apoio á empregabilidade e á mobilidade laboral nas comunidades pesqueiras e acuícolas. \n\n#galp #mar #pesca #Galicia https://t.co/30c2OZqfZW']


Most of the tweets (14 out of 20) are classified as neutral.

Of the 3 tweets classified as negative, 2 are clearly negative and 1 would be more neutral.

All 3 tweets classified as positive can be considered positive, but 1 of them is in Galician rather than Portuguese, so the language identification we relied on from Twitter is not completely accurate.

### 3. Sentiment Analysis of translated tweets

In [43]:
from transformers import pipeline

from nltk.sentiment.vader import SentimentIntensityAnalyzer

#### 3.1. Translate tweets with a multilingual model to EN

In [101]:
# multilingual model for translation to EN:
# https://huggingface.co/Helsinki-NLP/opus-mt-mul-en

model = "Helsinki-NLP/opus-mt-mul-en"

In [102]:
API_URL = "https://api-inference.huggingface.co/models/" + model

In [97]:
# test

apply_model("Isto é um teste!")

[{'translation_text': 'This is a test!'}]

In [107]:
%%time

tweets_translation = []

for tweet in df_sample['text_full']:
    
    translation_result = apply_model(tweet)
    tweets_translation.append({'translation': translation_result[0]['translation_text']})

Wall time: 1min 2s


In [110]:
df_sample['translation'] = pd.DataFrame(tweets_translation)

df_sample.head()

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment,top_sentiment,score,label,translation
0,1550250030476496900,Galp compra por 140 milhões os 25% da Titan So...,0,0,0,0,2022-07-21 22:43:04+00:00,"[Galp, compra, por, 140, milhões, os, 25, da, ...",Galp compra por 140 milhões os 25% da Titan So...,"[{'label': 'Neutral', 'score': 0.7745079398155...","{'label': 'Neutral', 'score': 0.7745079398155212}",0.774508,Neutral,Galp buys 140 million to 25% of Titan Solar th...
1,1550246814963712001,A Galp convida portugueses a pensar fora do ca...,0,0,0,0,2022-07-21 22:30:17+00:00,"[A, Galp, convida, portugueses, a, pensar, for...",A Galp convida portugueses a pensar fora do ca...,"[{'label': 'Negative', 'score': 0.836504757404...","{'label': 'Negative', 'score': 0.8365047574043...",0.836505,Negative,The Galp invites Portuguese to think out of th...
2,1550243011350740992,"Mais um ano que renovo o cartão jovem, mais um...",0,0,0,0,2022-07-21 22:15:10+00:00,"[Mais, um, ano, que, renovo, o, cartão, jovem,...","Mais um ano que renovo o cartão jovem, mais um...","[{'label': 'Positive', 'score': 0.569248735904...","{'label': 'Positive', 'score': 0.5692487359046...",0.569249,Positive,"More one year that renews the young card, more..."
3,1550242176407425024,@davidkirzner @LiberalNova @LiberalPT Achas? A...,0,0,0,0,2022-07-21 22:11:51+00:00,"[Achas, A, moça, terminou, o, curso, há, pouco...",@davidkirzner @LiberalNova @LiberalPT Achas? A...,"[{'label': 'Negative', 'score': 0.901339948177...","{'label': 'Negative', 'score': 0.9013399481773...",0.90134,Negative,@davidkirzner @LiberalNova @LiberalPT Achas? T...
4,1550240739170439169,#sicnoticias Não sei! Com a França a construir...,0,0,0,0,2022-07-21 22:06:08+00:00,"[#sicnoticias, Não, sei, Com, a, França, a, co...",#sicnoticias Não sei! Com a França a construir...,"[{'label': 'Neutral', 'score': 0.4744070768356...","{'label': 'Neutral', 'score': 0.4744070768356323}",0.474407,Neutral,#sicnotices I don't know! With France to build...


Other models for translation to try later:

* https://huggingface.co/unicamp-dl/translation-pt-en-t5

* https://huggingface.co/Narrativa/mbart-large-50-finetuned-opus-pt-en-translation

* https://huggingface.co/salesken/translation-spanish-and-portuguese-to-english

#### 3.2. Analyse sentiment of translated tweets

In [172]:
# EN model for sentiment analysis:
# https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest

model = "cardiffnlp/twitter-roberta-base-sentiment-latest"

We can also try the multilingual model we used before for sentiment analysis: https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment .

In [173]:
API_URL = "https://api-inference.huggingface.co/models/" + model

In [174]:
# test

apply_model("This a test!")

[[{'label': 'Negative', 'score': 0.05583813413977623},
  {'label': 'Neutral', 'score': 0.7195356488227844},
  {'label': 'Positive', 'score': 0.22462621331214905}]]

In [175]:
%%time

tweets_sentiment = []

for tweet in df_sample['translation']:
    
    sentiment_result = apply_model(tweet)[0] # the result is a list inside a list
    tweets_sentiment.append({'sentiment': sentiment_result})

Wall time: 12.8 s


In [176]:
df_sample['translation_sent'] = pd.DataFrame(tweets_sentiment)

df_sample.head()

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment,top_sentiment,score,label,translation,translation_sent,translation_top_sent,translation_score,translation_label,translation_t5,translation_europarl
0,1550250030476496900,Galp compra por 140 milhões os 25% da Titan So...,0,0,0,0,2022-07-21 22:43:04+00:00,"[Galp, compra, por, 140, milhões, os, 25, da, ...",Galp compra por 140 milhões os 25% da Titan So...,"[{'label': 'Neutral', 'score': 0.7745079398155...","{'label': 'Neutral', 'score': 0.7745079398155212}",0.774508,Neutral,Galp buys 140 million to 25% of Titan Solar th...,"[{'label': 'Negative', 'score': 0.167378321290...","{'label': 'Neutral', 'score': 0.8546541333198547}",0.854654,Neutral,Galp buys for 140 million the 25% of Titan Sol...,Galp buys for 140 million the 25% of the Titan...
1,1550246814963712001,A Galp convida portugueses a pensar fora do ca...,0,0,0,0,2022-07-21 22:30:17+00:00,"[A, Galp, convida, portugueses, a, pensar, for...",A Galp convida portugueses a pensar fora do ca...,"[{'label': 'Negative', 'score': 0.836504757404...","{'label': 'Negative', 'score': 0.8365047574043...",0.836505,Negative,The Galp invites Portuguese to think out of th...,"[{'label': 'Negative', 'score': 0.335718154907...","{'label': 'Neutral', 'score': 0.6241118311882019}",0.624112,Neutral,A Galp invites Portuguese to think outside the...,Galp invites Portuguese to think outside the c...
2,1550243011350740992,"Mais um ano que renovo o cartão jovem, mais um...",0,0,0,0,2022-07-21 22:15:10+00:00,"[Mais, um, ano, que, renovo, o, cartão, jovem,...","Mais um ano que renovo o cartão jovem, mais um...","[{'label': 'Positive', 'score': 0.569248735904...","{'label': 'Positive', 'score': 0.5692487359046...",0.569249,Positive,"More one year that renews the young card, more...","[{'label': 'Negative', 'score': 0.242178291082...","{'label': 'Neutral', 'score': 0.8553747534751892}",0.855375,Neutral,"Another year that renew the young card, anothe...","Another year I renew the young card, another y..."
3,1550242176407425024,@davidkirzner @LiberalNova @LiberalPT Achas? A...,0,0,0,0,2022-07-21 22:11:51+00:00,"[Achas, A, moça, terminou, o, curso, há, pouco...",@davidkirzner @LiberalNova @LiberalPT Achas? A...,"[{'label': 'Negative', 'score': 0.901339948177...","{'label': 'Negative', 'score': 0.9013399481773...",0.90134,Negative,@davidkirzner @LiberalNova @LiberalPT Achas? T...,"[{'label': 'Negative', 'score': 0.154701188206...","{'label': 'Neutral', 'score': 0.8400129675865173}",0.840013,Neutral,What I am saying is that in Galp there are thous,@davidkirzner @LiberalNew @LiberalPT Achas? Th...
4,1550240739170439169,#sicnoticias Não sei! Com a França a construir...,0,0,0,0,2022-07-21 22:06:08+00:00,"[#sicnoticias, Não, sei, Com, a, França, a, co...",#sicnoticias Não sei! Com a França a construir...,"[{'label': 'Neutral', 'score': 0.4744070768356...","{'label': 'Neutral', 'score': 0.4744070768356323}",0.474407,Neutral,#sicnotices I don't know! With France to build...,"[{'label': 'Negative', 'score': 0.242266610264...","{'label': 'Neutral', 'score': 0.8227448463439941}",0.822745,Neutral,I don't know! With France building the World's...,"#sicnoticias I do not know, with France buildi..."


In [177]:
# top sentiment
df_sample['translation_top_sent'] = df_sample['translation_sent'].map(lambda sentiment: max(sentiment, key=lambda x: x['score']))

# top score
df_sample['translation_score'] = df_sample['translation_top_sent'].map(lambda x: x['score'])

# top label
df_sample['translation_label'] = df_sample['translation_top_sent'].map(lambda x: x['label'])

In [178]:
df_sample.head()

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment,top_sentiment,score,label,translation,translation_sent,translation_top_sent,translation_score,translation_label,translation_t5,translation_europarl
0,1550250030476496900,Galp compra por 140 milhões os 25% da Titan So...,0,0,0,0,2022-07-21 22:43:04+00:00,"[Galp, compra, por, 140, milhões, os, 25, da, ...",Galp compra por 140 milhões os 25% da Titan So...,"[{'label': 'Neutral', 'score': 0.7745079398155...","{'label': 'Neutral', 'score': 0.7745079398155212}",0.774508,Neutral,Galp buys 140 million to 25% of Titan Solar th...,"[{'label': 'Negative', 'score': 0.167378321290...","{'label': 'Neutral', 'score': 0.7906758189201355}",0.790676,Neutral,Galp buys for 140 million the 25% of Titan Sol...,Galp buys for 140 million the 25% of the Titan...
1,1550246814963712001,A Galp convida portugueses a pensar fora do ca...,0,0,0,0,2022-07-21 22:30:17+00:00,"[A, Galp, convida, portugueses, a, pensar, for...",A Galp convida portugueses a pensar fora do ca...,"[{'label': 'Negative', 'score': 0.836504757404...","{'label': 'Negative', 'score': 0.8365047574043...",0.836505,Negative,The Galp invites Portuguese to think out of th...,"[{'label': 'Negative', 'score': 0.335718154907...","{'label': 'Neutral', 'score': 0.6248010993003845}",0.624801,Neutral,A Galp invites Portuguese to think outside the...,Galp invites Portuguese to think outside the c...
2,1550243011350740992,"Mais um ano que renovo o cartão jovem, mais um...",0,0,0,0,2022-07-21 22:15:10+00:00,"[Mais, um, ano, que, renovo, o, cartão, jovem,...","Mais um ano que renovo o cartão jovem, mais um...","[{'label': 'Positive', 'score': 0.569248735904...","{'label': 'Positive', 'score': 0.5692487359046...",0.569249,Positive,"More one year that renews the young card, more...","[{'label': 'Negative', 'score': 0.242178291082...","{'label': 'Neutral', 'score': 0.6383802890777588}",0.63838,Neutral,"Another year that renew the young card, anothe...","Another year I renew the young card, another y..."
3,1550242176407425024,@davidkirzner @LiberalNova @LiberalPT Achas? A...,0,0,0,0,2022-07-21 22:11:51+00:00,"[Achas, A, moça, terminou, o, curso, há, pouco...",@davidkirzner @LiberalNova @LiberalPT Achas? A...,"[{'label': 'Negative', 'score': 0.901339948177...","{'label': 'Negative', 'score': 0.9013399481773...",0.90134,Negative,@davidkirzner @LiberalNova @LiberalPT Achas? T...,"[{'label': 'Negative', 'score': 0.154701188206...","{'label': 'Neutral', 'score': 0.7540382742881775}",0.754038,Neutral,What I am saying is that in Galp there are thous,@davidkirzner @LiberalNew @LiberalPT Achas? Th...
4,1550240739170439169,#sicnoticias Não sei! Com a França a construir...,0,0,0,0,2022-07-21 22:06:08+00:00,"[#sicnoticias, Não, sei, Com, a, França, a, co...",#sicnoticias Não sei! Com a França a construir...,"[{'label': 'Neutral', 'score': 0.4744070768356...","{'label': 'Neutral', 'score': 0.4744070768356323}",0.474407,Neutral,#sicnotices I don't know! With France to build...,"[{'label': 'Negative', 'score': 0.242266610264...","{'label': 'Neutral', 'score': 0.6727580428123474}",0.672758,Neutral,I don't know! With France building the World's...,"#sicnoticias I do not know, with France buildi..."


In [179]:
df_sample['translation_label'].value_counts()

Neutral     16
Positive     2
Negative     2
Name: translation_label, dtype: int64

In [182]:
df_sample[(df_sample['label'] == df_sample['translation_label']) & (df_sample['label'] == 'Positive')]

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment,top_sentiment,score,label,translation,translation_sent,translation_top_sent,translation_score,translation_label,translation_t5,translation_europarl
6,1550228150503833600,@davidkirzner @LiberalNova @LiberalPT David ma...,0,1,0,0,2022-07-21 21:16:07+00:00,"[David, mas, eu, tenho, uma, amiga, q, não, é,...",@davidkirzner @LiberalNova @LiberalPT David ma...,"[{'label': 'Positive', 'score': 0.606086671352...","{'label': 'Positive', 'score': 0.6060866713523...",0.606087,Positive,@davidkirzner @LiberalNova @LiberalPT David bu...,"[{'label': 'Negative', 'score': 0.028445854783...","{'label': 'Positive', 'score': 0.6959779262542...",0.695978,Positive,"So I have a friend who is not rich, never linked",@davidkirzner @LiberalNew @LiberalPT David but...
8,1550227697737125889,Asociádevos ao #GALP e fomentemos xuntos o cre...,0,1,0,0,2022-07-21 21:14:19+00:00,"[Asociádevos, ao, #GALP, e, fomentemos, xuntos...",Asociádevos ao #GALP e fomentemos xuntos o cre...,"[{'label': 'Positive', 'score': 0.499892264604...","{'label': 'Positive', 'score': 0.4998922646045...",0.499892,Positive,Join #GALP and together we promote economic gr...,"[{'label': 'Negative', 'score': 0.003218225203...","{'label': 'Positive', 'score': 0.9122379422187...",0.912238,Positive,#galp #mar #fish #Galicia https://t.co/30,Supporters of the #GALP and let us drive forwa...


In [183]:
df_sample[(df_sample['label'] == df_sample['translation_label']) & (df_sample['label'] == 'Negative')]

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment,top_sentiment,score,label,translation,translation_sent,translation_top_sent,translation_score,translation_label,translation_t5,translation_europarl
13,1550176312920346625,#galp Um belo exemplo da areia que nos atiram ...,0,0,0,0,2022-07-21 17:50:08+00:00,"[#galp, Um, belo, exemplo, da, areia, que, nos...",#galp Um belo exemplo da areia que nos atiram ...,"[{'label': 'Negative', 'score': 0.791540443897...","{'label': 'Negative', 'score': 0.7915404438972...",0.79154,Negative,#galp A beautiful example of the arena that at...,"[{'label': 'Negative', 'score': 0.469873368740...","{'label': 'Negative', 'score': 0.4698733687400...",0.469873,Negative,A beautiful example of the sand that throw us to,#galp An excellent example of the sand being t...


In [187]:
df_sample[df_sample['label'] != df_sample['translation_label']][['text_full', 'label', 'translation', 'translation_label']]

Unnamed: 0,text_full,label,translation,translation_label
1,A Galp convida portugueses a pensar fora do ca...,Negative,The Galp invites Portuguese to think out of th...,Neutral
2,"Mais um ano que renovo o cartão jovem, mais um...",Positive,"More one year that renews the young card, more...",Neutral
3,@davidkirzner @LiberalNova @LiberalPT Achas? A...,Negative,@davidkirzner @LiberalNova @LiberalPT Achas? T...,Neutral
10,Se fores ao Avante apoias a invasão da Rússia....,Neutral,"If you're on the move, you'll support Russia's...",Negative


In [188]:
len(df_sample[df_sample['label'] == df_sample['translation_label']])

16

Out of our 20 tweets, the sentiment is the same for the original and translated versions of 16 tweets.

For the remaining 4 tweets:

* 2 negative tweets were considered to be neutral for the translated version
* 1 positive tweet was considered to be neutral for the translated version
* 1 neutral tweet was considered to be negative for the translated version

No tweet flipped polarity: no positive tweet was considered negative in its translated version, and no negative tweet was considered positive.
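
The breakdown above can also be tallied programmatically. A minimal sketch, using toy labels instead of the notebook's data; `label_transitions` is a hypothetical helper that counts (original label, translated label) pairs.

```python
from collections import Counter


def label_transitions(original_labels, translated_labels):
    """Count (original, translated) label pairs, a sparse confusion table."""
    return Counter(zip(original_labels, translated_labels))


# Toy labels for illustration only
original = ["Negative", "Positive", "Neutral", "Neutral"]
translated = ["Neutral", "Positive", "Neutral", "Negative"]

transitions = label_transitions(original, translated)
# Pairs with identical labels are the cases where both versions agree
agreement = sum(n for (a, b), n in transitions.items() if a == b)
print(transitions)
print(agreement)
```

With the notebook's data the call would be `label_transitions(df_sample['label'], df_sample['translation_label'])`; `pd.crosstab` on the same two columns gives the equivalent result as a table.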

We will now try other models for translating text from PT to EN.

#### 3.3. Translate tweets with an implementation of T5 for PT-EN

Note: see https://github.com/google-research/text-to-text-transfer-transformer/

In [120]:
# T5 model for translation to EN:
# https://huggingface.co/unicamp-dl/translation-pt-en-t5

model = "unicamp-dl/translation-pt-en-t5"

In [121]:
API_URL = "https://api-inference.huggingface.co/models/" + model

In [122]:
# test

apply_model("Isto é um teste!")

[{'translation_text': 'This is a test!'}]

In [123]:
%%time

tweets_translation = []

for tweet in df_sample['text_full']:
    
    translation_result = apply_model(tweet)
    tweets_translation.append({'translation': translation_result[0]['translation_text']})

Wall time: 22.9 s


In [124]:
df_sample['translation_t5'] = pd.DataFrame(tweets_translation)

df_sample.head()

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment,top_sentiment,score,label,translation,translation_sent,translation_top_sent,translation_score,translation_label,translation_t5
0,1550250030476496900,Galp compra por 140 milhões os 25% da Titan So...,0,0,0,0,2022-07-21 22:43:04+00:00,"[Galp, compra, por, 140, milhões, os, 25, da, ...",Galp compra por 140 milhões os 25% da Titan So...,"[{'label': 'Neutral', 'score': 0.7745079398155...","{'label': 'Neutral', 'score': 0.7745079398155212}",0.774508,Neutral,Galp buys 140 million to 25% of Titan Solar th...,"[{'label': 'Negative', 'score': 0.080651327967...","{'label': 'Neutral', 'score': 0.8546541333198547}",0.854654,Neutral,Galp buys for 140 million the 25% of Titan Sol...
1,1550246814963712001,A Galp convida portugueses a pensar fora do ca...,0,0,0,0,2022-07-21 22:30:17+00:00,"[A, Galp, convida, portugueses, a, pensar, for...",A Galp convida portugueses a pensar fora do ca...,"[{'label': 'Negative', 'score': 0.836504757404...","{'label': 'Negative', 'score': 0.8365047574043...",0.836505,Negative,The Galp invites Portuguese to think out of th...,"[{'label': 'Negative', 'score': 0.335132241249...","{'label': 'Neutral', 'score': 0.6241118311882019}",0.624112,Neutral,A Galp invites Portuguese to think outside the...
2,1550243011350740992,"Mais um ano que renovo o cartão jovem, mais um...",0,0,0,0,2022-07-21 22:15:10+00:00,"[Mais, um, ano, que, renovo, o, cartão, jovem,...","Mais um ano que renovo o cartão jovem, mais um...","[{'label': 'Positive', 'score': 0.569248735904...","{'label': 'Positive', 'score': 0.5692487359046...",0.569249,Positive,"More one year that renews the young card, more...","[{'label': 'Negative', 'score': 0.024882558733...","{'label': 'Neutral', 'score': 0.8553747534751892}",0.855375,Neutral,"Another year that renew the young card, anothe..."
3,1550242176407425024,@davidkirzner @LiberalNova @LiberalPT Achas? A...,0,0,0,0,2022-07-21 22:11:51+00:00,"[Achas, A, moça, terminou, o, curso, há, pouco...",@davidkirzner @LiberalNova @LiberalPT Achas? A...,"[{'label': 'Negative', 'score': 0.901339948177...","{'label': 'Negative', 'score': 0.9013399481773...",0.90134,Negative,@davidkirzner @LiberalNova @LiberalPT Achas? T...,"[{'label': 'Negative', 'score': 0.117106109857...","{'label': 'Neutral', 'score': 0.8400129675865173}",0.840013,Neutral,What I am saying is that in Galp there are thous
4,1550240739170439169,#sicnoticias Não sei! Com a França a construir...,0,0,0,0,2022-07-21 22:06:08+00:00,"[#sicnoticias, Não, sei, Com, a, França, a, co...",#sicnoticias Não sei! Com a França a construir...,"[{'label': 'Neutral', 'score': 0.4744070768356...","{'label': 'Neutral', 'score': 0.4744070768356323}",0.474407,Neutral,#sicnotices I don't know! With France to build...,"[{'label': 'Negative', 'score': 0.108840383589...","{'label': 'Neutral', 'score': 0.8227448463439941}",0.822745,Neutral,I don't know! With France building the World's...


In [125]:
text_columns = ['text_full', 'translation', 'translation_t5']

df_sample[text_columns]

Unnamed: 0,text_full,translation,translation_t5
0,Galp compra por 140 milhões os 25% da Titan So...,Galp buys 140 million to 25% of Titan Solar th...,Galp buys for 140 million the 25% of Titan Sol...
1,A Galp convida portugueses a pensar fora do ca...,The Galp invites Portuguese to think out of th...,A Galp invites Portuguese to think outside the...
2,"Mais um ano que renovo o cartão jovem, mais um...","More one year that renews the young card, more...","Another year that renew the young card, anothe..."
3,@davidkirzner @LiberalNova @LiberalPT Achas? A...,@davidkirzner @LiberalNova @LiberalPT Achas? T...,What I am saying is that in Galp there are thous
4,#sicnoticias Não sei! Com a França a construir...,#sicnotices I don't know! With France to build...,I don't know! With France building the World's...
5,"@RuiSilv30012076 @LiberalNova @LiberalPT Pois,...","@RuiSilv30012076 @LiberalNova @LiberalPT Yes, ...","Because, your friend was from CDS and gets a f..."
6,@davidkirzner @LiberalNova @LiberalPT David ma...,@davidkirzner @LiberalNova @LiberalPT David bu...,"So I have a friend who is not rich, never linked"
7,Estamos a elaborar a nova Estratexia de Desenv...,We are developing the new Local Development St...,We are developing the new Participatory Local ...
8,Asociádevos ao #GALP e fomentemos xuntos o cre...,Join #GALP and together we promote economic gr...,#galp #mar #fish #Galicia https://t.co/30
9,"Os campos de óleo ""supergigantes"" possuem + de...","The ""supergigant"" oil fields have + 5 billion ...","The ""super giant"" oil fields have + than 5 bil..."


In [130]:
print(df_sample[text_columns].iloc[3].values)

['@davidkirzner @LiberalNova @LiberalPT Achas? A moça terminou o curso há pouco tempo e concorreu á Galp... O que estou a dizer é que na Galp trabalham milhares e milhares de pessoas, não é por trabalhar na Galp que somos maus. Nem sequer são eles que decidem preços e afins 😕'
 "@davidkirzner @LiberalNova @LiberalPT Achas? The girl finished the course a little while ago and competed at Galp... What I'm saying is that in Galp thousands and thousands of people work, it's not about working in Galp that we're bad. Not even they are who decide prices and business"
 'What I am saying is that in Galp there are thous']


In [131]:
print(df_sample[text_columns].iloc[13].values)

['#galp Um belo exemplo da areia que nos atiram para os olhos quando dizem que a petrolíferas não ganham ( tanto) dinheiro como deviam, #edp mesma coisa, chulos e miseráveis é o que fulanos são, com a conivência governamental https://t.co/nL08rnxd8p'
 "#galp A beautiful example of the arena that attracted us to our eyes when they say that oillifters don't earn (as much) money as they should, #edp the same thing, noise and misery is what crazy are, with the government conviction https://t.co/nL08rnxd8p"
 'A beautiful example of the sand that throw us to']


In [136]:
print(df_sample[text_columns].iloc[4].values)

['#sicnoticias Não sei! Com a França a construir a Nuclear maior do Mundo em produção de energia.....que vai exportar, a curto prazo não sei porque não á incentivos a aquecimento ect...cozinhar tudo a indução, até energia do vento tomar conta do assunto ... Á EDP ou Galp sem luvas!'
 "#sicnotices I don't know! With France to build the world's largest nuclear power production...... which will export, the short term I don't know why not to the incentives for heating etc...coincin all the induction, until wind energy take account of the matter ... to EDP or Galp without holes!"
 "I don't know! With France building the World's larges"]


In [141]:
# length of each original and translated tweet

df_sample[text_columns].applymap(lambda x: len(x))

Unnamed: 0,text_full,translation,translation_t5
0,95,90,64
1,199,195,51
2,222,249,57
3,274,284,48
4,280,294,53
5,117,118,50
6,206,207,48
7,239,204,73
8,250,222,41
9,278,240,50


Most of the tweets translated with T5 are truncated to roughly 40 to 75 characters, regardless of the length of the original tweet. The cause is not clear at this point.
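Truncated outputs like these can be flagged automatically by comparing the length of each translation with the length of its source text. The sketch below is only an illustration: the helper name `flag_truncated` and the 0.5 ratio threshold are assumptions, not part of the original analysis.

```python
def flag_truncated(originals, translations, min_ratio=0.5):
    """Return indices whose translation is much shorter than the source text.

    min_ratio is an arbitrary, illustrative threshold: a translation shorter
    than min_ratio * len(original) is flagged as possibly truncated.
    """
    flagged = []
    for i, (source, target) in enumerate(zip(originals, translations)):
        if len(source) > 0 and len(target) / len(source) < min_ratio:
            flagged.append(i)
    return flagged


# Toy strings with the same lengths as rows 0 and 3 of the table above
# (95 vs 64 characters and 274 vs 48 characters).
originals = ["a" * 95, "b" * 274]
translations = ["c" * 64, "d" * 48]
print(flag_truncated(originals, translations))
```

With the notebook's data, the same helper could be called as `flag_truncated(df_sample['text_full'], df_sample['translation_t5'])`.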

#### 3.4. Translate tweets with an mBART model fine-tuned for PT-EN

In [158]:
# mBART model for translation to EN:
# https://huggingface.co/Narrativa/mbart-large-50-finetuned-opus-pt-en-translation

model = "Narrativa/mbart-large-50-finetuned-opus-pt-en-translation"

In [159]:
API_URL = "https://api-inference.huggingface.co/models/" + model

In [160]:
# test

apply_model("Isto é um teste!")

{'error': 'Translation requires a `src_lang` and a `tgt_lang` for this model'}

The model requires a `src_lang` and a `tgt_lang`, which apparently cannot be set through the Inference API. See: https://huggingface.co/docs/api-inference/detailed_parameters#translation-task .

We can try to run this model locally at a later stage, although I am not sure the runtime will be acceptable.
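A local run could look roughly like the sketch below. It has not been executed as part of this notebook. It assumes that the `transformers` library is installed, that the model weights can be downloaded, and that `pt_XX` and `en_XX` are the mBART-50 language codes for Portuguese and English. The helper name `build_mbart_translator` is hypothetical.

```python
def build_mbart_translator(
    model_name="Narrativa/mbart-large-50-finetuned-opus-pt-en-translation",
    src_lang="pt_XX",
    tgt_lang="en_XX",
):
    """Create a local translation pipeline with explicit language codes.

    Assumptions: transformers is installed, the weights can be downloaded,
    and pt_XX / en_XX are the mBART-50 codes for Portuguese and English.
    """
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import pipeline

    return pipeline("translation", model=model_name,
                    src_lang=src_lang, tgt_lang=tgt_lang)


# Example (downloads a large model, so it is left commented out):
# translator = build_mbart_translator()
# translator("Isto é um teste!", max_length=400)
```

Passing the language codes when the pipeline is created avoids the `src_lang`/`tgt_lang` error, because they are then supplied locally rather than through the API request.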

#### 3.5. Translate tweets with a model fine-tuned on the Europarl parallel corpus for PT-EN

In [161]:
# model fine-tuned on the Europarl parallel corpus for translation to EN:
# https://huggingface.co/salesken/translation-spanish-and-portuguese-to-english

model = "salesken/translation-spanish-and-portuguese-to-english"

In [162]:
API_URL = "https://api-inference.huggingface.co/models/" + model

In [163]:
# test

apply_model("Isto é um teste!")

[{'translation_text': 'This is a test!'}]

In [164]:
%%time

tweets_translation = []

for tweet in df_sample['text_full']:
    
    translation_result = apply_model(tweet)
    tweets_translation.append({'translation': translation_result[0]['translation_text']})

Wall time: 46.5 s
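The loop above assumes that every API call succeeds. If a call returns an error dictionary such as the `{'error': ...}` response seen with the mBART model, indexing the result with `[0]` fails. A more defensive version might look like the sketch below. The helper name `collect_translations` and the stand-in `fake_apply_model` are hypothetical and only serve to illustrate the idea.

```python
def collect_translations(texts, apply_fn):
    """Apply a translation function to each text, keeping None on API errors.

    apply_fn is expected to behave like the notebook's apply_model: it returns
    a list such as [{'translation_text': '...'}] on success, or a dict such as
    {'error': '...'} on failure.
    """
    results = []
    for text in texts:
        response = apply_fn(text)
        if isinstance(response, dict) and "error" in response:
            results.append(None)  # keep row alignment with the input
        else:
            results.append(response[0]["translation_text"])
    return results


# Hypothetical stand-in for apply_model, used only to exercise the helper.
def fake_apply_model(text):
    if not text:
        return {"error": "empty input"}
    return [{"translation_text": text.upper()}]


print(collect_translations(["um teste", ""], fake_apply_model))
```

Keeping a `None` placeholder for failed calls preserves the alignment between the results and the rows of `df_sample`.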


In [165]:
df_sample['translation_europarl'] = pd.DataFrame(tweets_translation)

df_sample.head()

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment,top_sentiment,score,label,translation,translation_sent,translation_top_sent,translation_score,translation_label,translation_t5,translation_europarl
0,1550250030476496900,Galp compra por 140 milhões os 25% da Titan So...,0,0,0,0,2022-07-21 22:43:04+00:00,"[Galp, compra, por, 140, milhões, os, 25, da, ...",Galp compra por 140 milhões os 25% da Titan So...,"[{'label': 'Neutral', 'score': 0.7745079398155...","{'label': 'Neutral', 'score': 0.7745079398155212}",0.774508,Neutral,Galp buys 140 million to 25% of Titan Solar th...,"[{'label': 'Negative', 'score': 0.080651327967...","{'label': 'Neutral', 'score': 0.8546541333198547}",0.854654,Neutral,Galp buys for 140 million the 25% of Titan Sol...,Galp buys for 140 million the 25% of the Titan...
1,1550246814963712001,A Galp convida portugueses a pensar fora do ca...,0,0,0,0,2022-07-21 22:30:17+00:00,"[A, Galp, convida, portugueses, a, pensar, for...",A Galp convida portugueses a pensar fora do ca...,"[{'label': 'Negative', 'score': 0.836504757404...","{'label': 'Negative', 'score': 0.8365047574043...",0.836505,Negative,The Galp invites Portuguese to think out of th...,"[{'label': 'Negative', 'score': 0.335132241249...","{'label': 'Neutral', 'score': 0.6241118311882019}",0.624112,Neutral,A Galp invites Portuguese to think outside the...,Galp invites Portuguese to think outside the c...
2,1550243011350740992,"Mais um ano que renovo o cartão jovem, mais um...",0,0,0,0,2022-07-21 22:15:10+00:00,"[Mais, um, ano, que, renovo, o, cartão, jovem,...","Mais um ano que renovo o cartão jovem, mais um...","[{'label': 'Positive', 'score': 0.569248735904...","{'label': 'Positive', 'score': 0.5692487359046...",0.569249,Positive,"More one year that renews the young card, more...","[{'label': 'Negative', 'score': 0.024882558733...","{'label': 'Neutral', 'score': 0.8553747534751892}",0.855375,Neutral,"Another year that renew the young card, anothe...","Another year I renew the young card, another y..."
3,1550242176407425024,@davidkirzner @LiberalNova @LiberalPT Achas? A...,0,0,0,0,2022-07-21 22:11:51+00:00,"[Achas, A, moça, terminou, o, curso, há, pouco...",@davidkirzner @LiberalNova @LiberalPT Achas? A...,"[{'label': 'Negative', 'score': 0.901339948177...","{'label': 'Negative', 'score': 0.9013399481773...",0.90134,Negative,@davidkirzner @LiberalNova @LiberalPT Achas? T...,"[{'label': 'Negative', 'score': 0.117106109857...","{'label': 'Neutral', 'score': 0.8400129675865173}",0.840013,Neutral,What I am saying is that in Galp there are thous,@davidkirzner @LiberalNew @LiberalPT Achas? Th...
4,1550240739170439169,#sicnoticias Não sei! Com a França a construir...,0,0,0,0,2022-07-21 22:06:08+00:00,"[#sicnoticias, Não, sei, Com, a, França, a, co...",#sicnoticias Não sei! Com a França a construir...,"[{'label': 'Neutral', 'score': 0.4744070768356...","{'label': 'Neutral', 'score': 0.4744070768356323}",0.474407,Neutral,#sicnotices I don't know! With France to build...,"[{'label': 'Negative', 'score': 0.108840383589...","{'label': 'Neutral', 'score': 0.8227448463439941}",0.822745,Neutral,I don't know! With France building the World's...,"#sicnoticias I do not know, with France buildi..."


In [168]:
# length of each original and translated tweet

text_columns.append('translation_europarl')

df_sample[text_columns].applymap(lambda x: len(x))

Unnamed: 0,text_full,translation,translation_t5,translation_europarl
0,95,90,64,105
1,199,195,51,182
2,222,249,57,226
3,274,284,48,281
4,280,294,53,313
5,117,118,50,116
6,206,207,48,215
7,239,204,73,230
8,250,222,41,234
9,278,240,50,264


We now get a complete translation for each tweet, so we can perform sentiment analysis on these translated tweets.

#### 3.6. Analyse sentiment of translated tweets

In [189]:
# EN model for sentiment analysis:
# https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest

model = "cardiffnlp/twitter-roberta-base-sentiment-latest"

In [190]:
API_URL = "https://api-inference.huggingface.co/models/" + model

In [191]:
# test

apply_model("This a test!")

[[{'label': 'Negative', 'score': 0.05583813413977623},
  {'label': 'Neutral', 'score': 0.7195356488227844},
  {'label': 'Positive', 'score': 0.22462621331214905}]]

In [192]:
%%time

tweets_sentiment = []

for tweet in df_sample['translation_europarl']:
    
    sentiment_result = apply_model(tweet)[0] # the result is a list inside a list
    tweets_sentiment.append({'sentiment': sentiment_result})

Wall time: 11.7 s


In [193]:
df_sample['translation_europarl_sent'] = pd.DataFrame(tweets_sentiment)

df_sample.head()

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment,...,score,label,translation,translation_sent,translation_top_sent,translation_score,translation_label,translation_t5,translation_europarl,translation_europarl_sent
0,1550250030476496900,Galp compra por 140 milhões os 25% da Titan So...,0,0,0,0,2022-07-21 22:43:04+00:00,"[Galp, compra, por, 140, milhões, os, 25, da, ...",Galp compra por 140 milhões os 25% da Titan So...,"[{'label': 'Neutral', 'score': 0.7745079398155...",...,0.774508,Neutral,Galp buys 140 million to 25% of Titan Solar th...,"[{'label': 'Negative', 'score': 0.167378321290...","{'label': 'Neutral', 'score': 0.7906758189201355}",0.790676,Neutral,Galp buys for 140 million the 25% of Titan Sol...,Galp buys for 140 million the 25% of the Titan...,"[{'label': 'Negative', 'score': 0.119433254003..."
1,1550246814963712001,A Galp convida portugueses a pensar fora do ca...,0,0,0,0,2022-07-21 22:30:17+00:00,"[A, Galp, convida, portugueses, a, pensar, for...",A Galp convida portugueses a pensar fora do ca...,"[{'label': 'Negative', 'score': 0.836504757404...",...,0.836505,Negative,The Galp invites Portuguese to think out of th...,"[{'label': 'Negative', 'score': 0.335718154907...","{'label': 'Neutral', 'score': 0.6248010993003845}",0.624801,Neutral,A Galp invites Portuguese to think outside the...,Galp invites Portuguese to think outside the c...,"[{'label': 'Negative', 'score': 0.192618951201..."
2,1550243011350740992,"Mais um ano que renovo o cartão jovem, mais um...",0,0,0,0,2022-07-21 22:15:10+00:00,"[Mais, um, ano, que, renovo, o, cartão, jovem,...","Mais um ano que renovo o cartão jovem, mais um...","[{'label': 'Positive', 'score': 0.569248735904...",...,0.569249,Positive,"More one year that renews the young card, more...","[{'label': 'Negative', 'score': 0.242178291082...","{'label': 'Neutral', 'score': 0.6383802890777588}",0.63838,Neutral,"Another year that renew the young card, anothe...","Another year I renew the young card, another y...","[{'label': 'Negative', 'score': 0.284317612648..."
3,1550242176407425024,@davidkirzner @LiberalNova @LiberalPT Achas? A...,0,0,0,0,2022-07-21 22:11:51+00:00,"[Achas, A, moça, terminou, o, curso, há, pouco...",@davidkirzner @LiberalNova @LiberalPT Achas? A...,"[{'label': 'Negative', 'score': 0.901339948177...",...,0.90134,Negative,@davidkirzner @LiberalNova @LiberalPT Achas? T...,"[{'label': 'Negative', 'score': 0.154701188206...","{'label': 'Neutral', 'score': 0.7540382742881775}",0.754038,Neutral,What I am saying is that in Galp there are thous,@davidkirzner @LiberalNew @LiberalPT Achas? Th...,"[{'label': 'Negative', 'score': 0.097141742706..."
4,1550240739170439169,#sicnoticias Não sei! Com a França a construir...,0,0,0,0,2022-07-21 22:06:08+00:00,"[#sicnoticias, Não, sei, Com, a, França, a, co...",#sicnoticias Não sei! Com a França a construir...,"[{'label': 'Neutral', 'score': 0.4744070768356...",...,0.474407,Neutral,#sicnotices I don't know! With France to build...,"[{'label': 'Negative', 'score': 0.242266610264...","{'label': 'Neutral', 'score': 0.6727580428123474}",0.672758,Neutral,I don't know! With France building the World's...,"#sicnoticias I do not know, with France buildi...","[{'label': 'Negative', 'score': 0.606779038906..."


In [194]:
# top sentiment
df_sample['translation_europarl_top_sent'] = df_sample['translation_europarl_sent'].map(lambda sentiment: max(sentiment, key=lambda x: x['score']))

# top score
df_sample['translation_europarl_score'] = df_sample['translation_europarl_top_sent'].map(lambda x: x['score'])

# top label
df_sample['translation_europarl_label'] = df_sample['translation_europarl_top_sent'].map(lambda x: x['label'])
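The `max(..., key=...)` call above selects, for each tweet, the dictionary with the highest score. As a small standalone illustration, the snippet below applies the same expression to the response obtained for the test sentence earlier in this section.

```python
# Response for a single text, as returned by the sentiment model in the test above.
sentiment = [
    {"label": "Negative", "score": 0.05583813413977623},
    {"label": "Neutral", "score": 0.7195356488227844},
    {"label": "Positive", "score": 0.22462621331214905},
]

# max() with a key function returns the whole dict with the largest score.
top = max(sentiment, key=lambda item: item["score"])

print(top["label"], round(top["score"], 6))
```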

In [195]:
df_sample.head()

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment,...,translation_sent,translation_top_sent,translation_score,translation_label,translation_t5,translation_europarl,translation_europarl_sent,translation_europarl_top_sent,translation_europarl_score,translation_europarl_label
0,1550250030476496900,Galp compra por 140 milhões os 25% da Titan So...,0,0,0,0,2022-07-21 22:43:04+00:00,"[Galp, compra, por, 140, milhões, os, 25, da, ...",Galp compra por 140 milhões os 25% da Titan So...,"[{'label': 'Neutral', 'score': 0.7745079398155...",...,"[{'label': 'Negative', 'score': 0.167378321290...","{'label': 'Neutral', 'score': 0.7906758189201355}",0.790676,Neutral,Galp buys for 140 million the 25% of Titan Sol...,Galp buys for 140 million the 25% of the Titan...,"[{'label': 'Negative', 'score': 0.119433254003...","{'label': 'Neutral', 'score': 0.8383119702339172}",0.838312,Neutral
1,1550246814963712001,A Galp convida portugueses a pensar fora do ca...,0,0,0,0,2022-07-21 22:30:17+00:00,"[A, Galp, convida, portugueses, a, pensar, for...",A Galp convida portugueses a pensar fora do ca...,"[{'label': 'Negative', 'score': 0.836504757404...",...,"[{'label': 'Negative', 'score': 0.335718154907...","{'label': 'Neutral', 'score': 0.6248010993003845}",0.624801,Neutral,A Galp invites Portuguese to think outside the...,Galp invites Portuguese to think outside the c...,"[{'label': 'Negative', 'score': 0.192618951201...","{'label': 'Neutral', 'score': 0.7435513734817505}",0.743551,Neutral
2,1550243011350740992,"Mais um ano que renovo o cartão jovem, mais um...",0,0,0,0,2022-07-21 22:15:10+00:00,"[Mais, um, ano, que, renovo, o, cartão, jovem,...","Mais um ano que renovo o cartão jovem, mais um...","[{'label': 'Positive', 'score': 0.569248735904...",...,"[{'label': 'Negative', 'score': 0.242178291082...","{'label': 'Neutral', 'score': 0.6383802890777588}",0.63838,Neutral,"Another year that renew the young card, anothe...","Another year I renew the young card, another y...","[{'label': 'Negative', 'score': 0.284317612648...","{'label': 'Neutral', 'score': 0.43263858556747...",0.432639,Neutral
3,1550242176407425024,@davidkirzner @LiberalNova @LiberalPT Achas? A...,0,0,0,0,2022-07-21 22:11:51+00:00,"[Achas, A, moça, terminou, o, curso, há, pouco...",@davidkirzner @LiberalNova @LiberalPT Achas? A...,"[{'label': 'Negative', 'score': 0.901339948177...",...,"[{'label': 'Negative', 'score': 0.154701188206...","{'label': 'Neutral', 'score': 0.7540382742881775}",0.754038,Neutral,What I am saying is that in Galp there are thous,@davidkirzner @LiberalNew @LiberalPT Achas? Th...,"[{'label': 'Negative', 'score': 0.097141742706...","{'label': 'Neutral', 'score': 0.7766045928001404}",0.776605,Neutral
4,1550240739170439169,#sicnoticias Não sei! Com a França a construir...,0,0,0,0,2022-07-21 22:06:08+00:00,"[#sicnoticias, Não, sei, Com, a, França, a, co...",#sicnoticias Não sei! Com a França a construir...,"[{'label': 'Neutral', 'score': 0.4744070768356...",...,"[{'label': 'Negative', 'score': 0.242266610264...","{'label': 'Neutral', 'score': 0.6727580428123474}",0.672758,Neutral,I don't know! With France building the World's...,"#sicnoticias I do not know, with France buildi...","[{'label': 'Negative', 'score': 0.606779038906...","{'label': 'Negative', 'score': 0.6067790389060...",0.606779,Negative


In [196]:
df_sample['translation_europarl_label'].value_counts()

Neutral     15
Positive     3
Negative     2
Name: translation_europarl_label, dtype: int64

In [199]:
df_sample[(df_sample['label'] == df_sample['translation_europarl_label']) & (df_sample['label'] == 'Positive')]

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment,...,translation_sent,translation_top_sent,translation_score,translation_label,translation_t5,translation_europarl,translation_europarl_sent,translation_europarl_top_sent,translation_europarl_score,translation_europarl_label
6,1550228150503833600,@davidkirzner @LiberalNova @LiberalPT David ma...,0,1,0,0,2022-07-21 21:16:07+00:00,"[David, mas, eu, tenho, uma, amiga, q, não, é,...",@davidkirzner @LiberalNova @LiberalPT David ma...,"[{'label': 'Positive', 'score': 0.606086671352...",...,"[{'label': 'Negative', 'score': 0.028445854783...","{'label': 'Positive', 'score': 0.6959779262542...",0.695978,Positive,"So I have a friend who is not rich, never linked",@davidkirzner @LiberalNew @LiberalPT David but...,"[{'label': 'Negative', 'score': 0.032849933952...","{'label': 'Positive', 'score': 0.6411634087562...",0.641163,Positive
8,1550227697737125889,Asociádevos ao #GALP e fomentemos xuntos o cre...,0,1,0,0,2022-07-21 21:14:19+00:00,"[Asociádevos, ao, #GALP, e, fomentemos, xuntos...",Asociádevos ao #GALP e fomentemos xuntos o cre...,"[{'label': 'Positive', 'score': 0.499892264604...",...,"[{'label': 'Negative', 'score': 0.003218225203...","{'label': 'Positive', 'score': 0.9122379422187...",0.912238,Positive,#galp #mar #fish #Galicia https://t.co/30,Supporters of the #GALP and let us drive forwa...,"[{'label': 'Negative', 'score': 0.003946688957...","{'label': 'Positive', 'score': 0.9115272164344...",0.911527,Positive


In [200]:
df_sample[(df_sample['label'] == df_sample['translation_europarl_label']) & (df_sample['label'] == 'Negative')]

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment,...,translation_sent,translation_top_sent,translation_score,translation_label,translation_t5,translation_europarl,translation_europarl_sent,translation_europarl_top_sent,translation_europarl_score,translation_europarl_label
13,1550176312920346625,#galp Um belo exemplo da areia que nos atiram ...,0,0,0,0,2022-07-21 17:50:08+00:00,"[#galp, Um, belo, exemplo, da, areia, que, nos...",#galp Um belo exemplo da areia que nos atiram ...,"[{'label': 'Negative', 'score': 0.791540443897...",...,"[{'label': 'Negative', 'score': 0.469873368740...","{'label': 'Negative', 'score': 0.4698733687400...",0.469873,Negative,A beautiful example of the sand that throw us to,#galp An excellent example of the sand being t...,"[{'label': 'Negative', 'score': 0.875455021858...","{'label': 'Negative', 'score': 0.8754550218582...",0.875455,Negative


In [201]:
df_sample[df_sample['label'] != df_sample['translation_europarl_label']][['text_full', 'label', 'translation_europarl', 'translation_europarl_label']]

Unnamed: 0,text_full,label,translation_europarl,translation_europarl_label
1,A Galp convida portugueses a pensar fora do ca...,Negative,Galp invites Portuguese to think outside the c...,Neutral
2,"Mais um ano que renovo o cartão jovem, mais um...",Positive,"Another year I renew the young card, another y...",Neutral
3,@davidkirzner @LiberalNova @LiberalPT Achas? A...,Negative,@davidkirzner @LiberalNew @LiberalPT Achas? Th...,Neutral
4,#sicnoticias Não sei! Com a França a construir...,Neutral,"#sicnoticias I do not know, with France buildi...",Negative
7,Estamos a elaborar a nova Estratexia de Desenv...,Neutral,We are drawing up the new participatory local ...,Positive


In [202]:
len(df_sample[df_sample['label'] == df_sample['translation_europarl_label']])

15

In [204]:
# text and labels for original and translated tweets

label_columns = ['text_full', 'label', 'translation_label', 'translation_europarl_label']

df_sample[label_columns]

Unnamed: 0,text_full,label,translation_label,translation_europarl_label
0,Galp compra por 140 milhões os 25% da Titan So...,Neutral,Neutral,Neutral
1,A Galp convida portugueses a pensar fora do ca...,Negative,Neutral,Neutral
2,"Mais um ano que renovo o cartão jovem, mais um...",Positive,Neutral,Neutral
3,@davidkirzner @LiberalNova @LiberalPT Achas? A...,Negative,Neutral,Neutral
4,#sicnoticias Não sei! Com a França a construir...,Neutral,Neutral,Negative
5,"@RuiSilv30012076 @LiberalNova @LiberalPT Pois,...",Neutral,Neutral,Neutral
6,@davidkirzner @LiberalNova @LiberalPT David ma...,Positive,Positive,Positive
7,Estamos a elaborar a nova Estratexia de Desenv...,Neutral,Neutral,Positive
8,Asociádevos ao #GALP e fomentemos xuntos o cre...,Positive,Positive,Positive
9,"Os campos de óleo ""supergigantes"" possuem + de...",Neutral,Neutral,Neutral


In [206]:
len(df_sample[(df_sample['label'] == df_sample['translation_label']) & 
              (df_sample['label'] == df_sample['translation_europarl_label'])])

14

Out of our 20 tweets, 14 receive the same sentiment label for the original text and for both translated versions.

No tweet labelled Positive in the original was labelled Negative in either of the translated versions, or vice versa.
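Both checks (full agreement and polarity flips) can be computed with a small helper. The sketch below uses only the labels of the ten rows displayed in the table above, so its counts refer to those ten rows and not to the full sample of 20 tweets. The helper name `agreement_summary` is an assumption.

```python
def agreement_summary(reference, *others):
    """Count rows where all label lists agree, and rows with a polarity flip.

    A polarity flip is a Positive/Negative disagreement between the reference
    label and any of the other labels for the same row.
    """
    opposite = {"Positive": "Negative", "Negative": "Positive"}
    agree, flips = 0, 0
    for row in zip(reference, *others):
        ref, rest = row[0], row[1:]
        if all(label == ref for label in rest):
            agree += 1
        if any(opposite.get(ref) == label for label in rest):
            flips += 1
    return agree, flips


# Labels of rows 0 to 9 as displayed in the table above.
label = ["Neutral", "Negative", "Positive", "Negative", "Neutral",
         "Neutral", "Positive", "Neutral", "Positive", "Neutral"]
translation_label = ["Neutral", "Neutral", "Neutral", "Neutral", "Neutral",
                     "Neutral", "Positive", "Neutral", "Positive", "Neutral"]
translation_europarl_label = ["Neutral", "Neutral", "Neutral", "Neutral", "Negative",
                              "Neutral", "Positive", "Positive", "Positive", "Neutral"]

print(agreement_summary(label, translation_label, translation_europarl_label))
```

On the full data frame, the three columns listed in `label_columns` could be passed directly, for example `agreement_summary(df_sample['label'], df_sample['translation_label'], df_sample['translation_europarl_label'])`.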

As a next step, we could also use the previous approaches to sentiment/emotion analysis (VADER, TextBlob and DepecheMood) and apply them to the translated tweets.

Note: for this small sample (and with a few errors along the way), I have already almost reached the monthly limit of 30000 characters.

If we want to use this strategy in the implementation phase of the project, we will have to consider whether it is worth paying for access: https://huggingface.co/pricing .

Alternatively, instead of using the Inference API, I believe we can download these models and try to run them locally. However, I cannot estimate the processing time that would be required.
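To judge whether a given workload fits within the character limit, the number of characters to be sent can be estimated beforehand. The sketch below is a rough estimate only. It assumes that the quota counts input characters and that each text is sent once per model (`passes`). The helper names are hypothetical.

```python
def characters_used(texts, passes=1):
    """Total number of characters sent if each text is submitted `passes` times."""
    return passes * sum(len(text) for text in texts)


def fits_quota(texts, passes=1, quota=30000):
    """Check whether the request volume stays within a character quota.

    The default quota is the monthly limit quoted in the note above.
    """
    return characters_used(texts, passes) <= quota


# Illustrative input: 125 tweets at the 280-character maximum, sent to two models.
tweets = ["x" * 280] * 125
print(characters_used(tweets, passes=2), fits_quota(tweets, passes=2))
```

In the notebook, `texts` could be `df['text_full']`. The estimate is an upper bound in this example, since most tweets are shorter than 280 characters.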

In [208]:
# save data frame to .pkl

df_sample.to_pickle('data_sentiment_PT.pkl')

#### 3.7. Translate tweets with an implementation of T5 for PT-EN with the pipeline() function

In [2]:
# read dataframe from .pkl

df_sample = pd.read_pickle('data_sentiment_PT.pkl')

In [3]:
from transformers import pipeline

translator = pipeline("translation", model="unicamp-dl/translation-pt-en-t5")

We just downloaded the PyTorch model (850 MB) and the rest of the necessary files.

In [4]:
%%time

translator("Isto é um teste com uma frase muito comprida. Isto é um teste com uma frase muito comprida. Isto é um teste com uma frase muito comprida.")

Your input_length: 37 is bigger than 0.9 * max_length: 20. You might consider increasing your max_length manually, e.g. translator('...', max_length=400)


Wall time: 851 ms


[{'translation_text': 'This is a test with a very long sentence. This is a test'}]

As we saw before, the output is truncated, but **now we can set the maximum output length** through the `max_length` parameter.

In [5]:
%%time

translator("Isto é um teste com uma frase muito comprida. Isto é um teste com uma frase muito comprida. Isto é um teste com uma frase muito comprida.", 
           max_length=400)

Wall time: 2.01 s


[{'translation_text': 'This is a test with a very long sentence. This is a test with a very long sentence. This is a test with a very long sentence.'}]

Everything seems to be working just fine!

In [6]:
df_sample.head()

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment,...,translation_sent,translation_top_sent,translation_score,translation_label,translation_t5,translation_europarl,translation_europarl_sent,translation_europarl_top_sent,translation_europarl_score,translation_europarl_label
0,1550250030476496900,Galp compra por 140 milhões os 25% da Titan So...,0,0,0,0,2022-07-21 22:43:04+00:00,"[Galp, compra, por, 140, milhões, os, 25, da, ...",Galp compra por 140 milhões os 25% da Titan So...,"[{'label': 'Neutral', 'score': 0.7745079398155...",...,"[{'label': 'Negative', 'score': 0.167378321290...","{'label': 'Neutral', 'score': 0.7906758189201355}",0.790676,Neutral,Galp buys for 140 million the 25% of Titan Sol...,Galp buys for 140 million the 25% of the Titan...,"[{'label': 'Negative', 'score': 0.119433254003...","{'label': 'Neutral', 'score': 0.8383119702339172}",0.838312,Neutral
1,1550246814963712001,A Galp convida portugueses a pensar fora do ca...,0,0,0,0,2022-07-21 22:30:17+00:00,"[A, Galp, convida, portugueses, a, pensar, for...",A Galp convida portugueses a pensar fora do ca...,"[{'label': 'Negative', 'score': 0.836504757404...",...,"[{'label': 'Negative', 'score': 0.335718154907...","{'label': 'Neutral', 'score': 0.6248010993003845}",0.624801,Neutral,A Galp invites Portuguese to think outside the...,Galp invites Portuguese to think outside the c...,"[{'label': 'Negative', 'score': 0.192618951201...","{'label': 'Neutral', 'score': 0.7435513734817505}",0.743551,Neutral
2,1550243011350740992,"Mais um ano que renovo o cartão jovem, mais um...",0,0,0,0,2022-07-21 22:15:10+00:00,"[Mais, um, ano, que, renovo, o, cartão, jovem,...","Mais um ano que renovo o cartão jovem, mais um...","[{'label': 'Positive', 'score': 0.569248735904...",...,"[{'label': 'Negative', 'score': 0.242178291082...","{'label': 'Neutral', 'score': 0.6383802890777588}",0.63838,Neutral,"Another year that renew the young card, anothe...","Another year I renew the young card, another y...","[{'label': 'Negative', 'score': 0.284317612648...","{'label': 'Neutral', 'score': 0.43263858556747...",0.432639,Neutral
3,1550242176407425024,@davidkirzner @LiberalNova @LiberalPT Achas? A...,0,0,0,0,2022-07-21 22:11:51+00:00,"[Achas, A, moça, terminou, o, curso, há, pouco...",@davidkirzner @LiberalNova @LiberalPT Achas? A...,"[{'label': 'Negative', 'score': 0.901339948177...",...,"[{'label': 'Negative', 'score': 0.154701188206...","{'label': 'Neutral', 'score': 0.7540382742881775}",0.754038,Neutral,What I am saying is that in Galp there are thous,@davidkirzner @LiberalNew @LiberalPT Achas? Th...,"[{'label': 'Negative', 'score': 0.097141742706...","{'label': 'Neutral', 'score': 0.7766045928001404}",0.776605,Neutral
4,1550240739170439169,#sicnoticias Não sei! Com a França a construir...,0,0,0,0,2022-07-21 22:06:08+00:00,"[#sicnoticias, Não, sei, Com, a, França, a, co...",#sicnoticias Não sei! Com a França a construir...,"[{'label': 'Neutral', 'score': 0.4744070768356...",...,"[{'label': 'Negative', 'score': 0.242266610264...","{'label': 'Neutral', 'score': 0.6727580428123474}",0.672758,Neutral,I don't know! With France building the World's...,"#sicnoticias I do not know, with France buildi...","[{'label': 'Negative', 'score': 0.606779038906...","{'label': 'Negative', 'score': 0.6067790389060...",0.606779,Negative


In [9]:
# maximum length (in characters) of the input texts

max(df_sample['text_full'].map(lambda x: len(x)))

280

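We can set the `max_length` parameter to 300. Note that `max_length` is counted in tokens rather than characters, so 300 should be a comfortable upper bound for tweets of at most 280 characters.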

In [4]:
%%time

# the translator object accepts strings or lists as input

translations = translator(df_sample['text_full'].tolist(), max_length=300)

Wall time: 1min 17s


In [5]:
translations

[{'translation_text': 'Galp buys for 140 million the 25% of Titan Solar that it has not yet held https://t.co/tQNq9YoZMd'},
 {'translation_text': 'A Galp invites Portuguese to think outside the car..and who say that this is just “a cleaning of image”, a Galp, mandates “giving a bath to the dog”. The mask  falls. #De-evolution. AbsurdQ https://t.co/TW2XljNvxI invites Portuguese to think outside the car..and who says that this is just “a cleaning of image”, a Galp, mandates “give a bath to the dog”. The mask  falls. #De-evolution. AbsurdQ https://t.co/TW2XljNvxI invites Portuguese to think outside the car..and who says that this is just “a cleaning of image”, a Galp, mandates “give a bath to the dog”. The mask  falls. #De-evolution. AbsurdQ https://t.co/TW2XljNvxI invites Portuguese to think outside the car..and who'},
 {'translation_text': 'Another year that renew the young card, another year in which the chamber does not give me access to the discount code  to date the good part of the

Some of the translations are inaccurate and contain odd repetitions (see the second tweet), but they are no longer truncated.
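Looping outputs of this kind can be detected by checking whether a long sequence of words occurs more than once in a translation. The sketch below is a simple heuristic: the helper name `has_repeated_ngram` and the n-gram sizes are assumptions. Generation options such as `no_repeat_ngram_size` might reduce the repetitions, but this has not been tested here.

```python
def has_repeated_ngram(text, n=8):
    """Return True if any sequence of n consecutive words occurs more than once."""
    words = text.split()
    seen = set()
    for i in range(len(words) - n + 1):
        ngram = tuple(words[i:i + n])
        if ngram in seen:
            return True
        seen.add(ngram)
    return False


# A looping fragment modelled on the second translation, and a clean translation.
looped = "invites Portuguese to think outside the car " * 3
clean = "Galp buys for 140 million the 25% of Titan Solar"
print(has_repeated_ngram(looped, n=5), has_repeated_ngram(clean, n=5))
```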

In [6]:
df_sample.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 24 columns):
 #   Column                         Non-Null Count  Dtype              
---  ------                         --------------  -----              
 0   id                             20 non-null     int64              
 1   text                           20 non-null     object             
 2   retweets                       20 non-null     int64              
 3   replies                        20 non-null     int64              
 4   likes                          20 non-null     int64              
 5   quotes                         20 non-null     int64              
 6   created_at                     20 non-null     datetime64[ns, UTC]
 7   tokens                         20 non-null     object             
 8   text_full                      20 non-null     object             
 9   sentiment                      20 non-null     object             
 10  top_sentiment               

In [7]:
df_sample["translation_t5_full"] = pd.DataFrame(translations)

In [8]:
df_sample["translation_t5_full"]

0     Galp buys for 140 million the 25% of Titan Sol...
1     A Galp invites Portuguese to think outside the...
2     Another year that renew the young card, anothe...
3     What I am saying is that in Galp there are tho...
4     I don't know! With France building the World's...
5     Because, your friend was from CDS and gets a f...
6     So I have a friend who is not rich, never link...
7     We are developing the new Participatory Local ...
8     #galp #mar #fish #Galicia https://t.co/30c2OZqfZW
9     The "super giant" oil fields have + than 5 bil...
10    But if you use Zara, Primark, H&amp;M, Gap, Sh...
11    Yes, Galp has a scene with the continent that ...
12    Titan 2020 was created in 2020 by Galp and the...
13    A beautiful example of the sand that throw us ...
14    Galp pays 140 million for another 25% of Titan...
15    Galp pays 140 million for another 25% of Titan...
16    Galp passes to detain Titan after buying the r...
17    Galp starts to hold Titan after buying the

We now have the full translations with the T5 model.

Since we did not use the Inference API but instead ran the code locally with the help of the `pipeline()` function, our usage in terms of characters remained the same.

Our **limitation now is runtime and not the number of characters**.
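
Since runtime is now the constraint, it can help to measure it outside of the notebook's `%%time` magic as well. The following is a minimal sketch, not the method used in this notebook: `timed` is an illustrative helper and `fake_translate` is a hypothetical stand-in for the real (slow) local model call.

```python
import time


def timed(func, items):
    """Run func on each item; return (results, total seconds, seconds per item)."""
    start = time.perf_counter()
    results = [func(item) for item in items]
    elapsed = time.perf_counter() - start
    per_item = elapsed / len(items) if items else 0.0
    return results, elapsed, per_item


def fake_translate(text):
    # Hypothetical stand-in for a local model call (e.g. a translation pipeline)
    return text.upper()


results, elapsed, per_item = timed(fake_translate, ["olá", "bom dia"])
print(f"{elapsed:.6f} s total, {per_item:.6f} s per tweet")
```

Multiplying the per-tweet time by the size of the full dataset gives a rough estimate of the total runtime before committing to a long run.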

#### 3.8. Analyse sentiment of translated tweets with the pipeline() function

In [9]:
classifier = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment-latest")

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


This PyTorch model is 480 MB in size.

In [23]:
%%time

classifier("Isto é um teste!", top_k=3) # top_k=3 to get the scores for all the possible labels

Wall time: 59.8 ms


[{'label': 'Neutral', 'score': 0.7641801238059998},
 {'label': 'Positive', 'score': 0.2026584893465042},
 {'label': 'Negative', 'score': 0.03316132351756096}]

In [19]:
%%time

sentiment_analysis = classifier(df_sample['translation_t5_full'].tolist(), top_k=3)

Wall time: 3.39 s


In [20]:
sentiment_analysis

[[{'label': 'Neutral', 'score': 0.8727536797523499},
  {'label': 'Negative', 'score': 0.07500078529119492},
  {'label': 'Positive', 'score': 0.052245523780584335}],
 [{'label': 'Neutral', 'score': 0.576215386390686},
  {'label': 'Negative', 'score': 0.3969547748565674},
  {'label': 'Positive', 'score': 0.026829885318875313}],
 [{'label': 'Negative', 'score': 0.4304381310939789},
  {'label': 'Neutral', 'score': 0.4217706620693207},
  {'label': 'Positive', 'score': 0.14779123663902283}],
 [{'label': 'Negative', 'score': 0.5356427431106567},
  {'label': 'Neutral', 'score': 0.41193896532058716},
  {'label': 'Positive', 'score': 0.0524182990193367}],
 [{'label': 'Neutral', 'score': 0.5460715889930725},
  {'label': 'Negative', 'score': 0.41088423132896423},
  {'label': 'Positive', 'score': 0.043044138699769974}],
 [{'label': 'Neutral', 'score': 0.7235640287399292},
  {'label': 'Positive', 'score': 0.19078437983989716},
  {'label': 'Negative', 'score': 0.08565160632133484}],
 [{'label': 'Posi

In [24]:
df_sample.head()

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment,...,translation_top_sent,translation_score,translation_label,translation_t5,translation_europarl,translation_europarl_sent,translation_europarl_top_sent,translation_europarl_score,translation_europarl_label,translation_t5_full
0,1550250030476496900,Galp compra por 140 milhões os 25% da Titan So...,0,0,0,0,2022-07-21 22:43:04+00:00,"[Galp, compra, por, 140, milhões, os, 25, da, ...",Galp compra por 140 milhões os 25% da Titan So...,"[{'label': 'Neutral', 'score': 0.7745079398155...",...,"{'label': 'Neutral', 'score': 0.7906758189201355}",0.790676,Neutral,Galp buys for 140 million the 25% of Titan Sol...,Galp buys for 140 million the 25% of the Titan...,"[{'label': 'Negative', 'score': 0.119433254003...","{'label': 'Neutral', 'score': 0.8383119702339172}",0.838312,Neutral,Galp buys for 140 million the 25% of Titan Sol...
1,1550246814963712001,A Galp convida portugueses a pensar fora do ca...,0,0,0,0,2022-07-21 22:30:17+00:00,"[A, Galp, convida, portugueses, a, pensar, for...",A Galp convida portugueses a pensar fora do ca...,"[{'label': 'Negative', 'score': 0.836504757404...",...,"{'label': 'Neutral', 'score': 0.6248010993003845}",0.624801,Neutral,A Galp invites Portuguese to think outside the...,Galp invites Portuguese to think outside the c...,"[{'label': 'Negative', 'score': 0.192618951201...","{'label': 'Neutral', 'score': 0.7435513734817505}",0.743551,Neutral,A Galp invites Portuguese to think outside the...
2,1550243011350740992,"Mais um ano que renovo o cartão jovem, mais um...",0,0,0,0,2022-07-21 22:15:10+00:00,"[Mais, um, ano, que, renovo, o, cartão, jovem,...","Mais um ano que renovo o cartão jovem, mais um...","[{'label': 'Positive', 'score': 0.569248735904...",...,"{'label': 'Neutral', 'score': 0.6383802890777588}",0.63838,Neutral,"Another year that renew the young card, anothe...","Another year I renew the young card, another y...","[{'label': 'Negative', 'score': 0.284317612648...","{'label': 'Neutral', 'score': 0.43263858556747...",0.432639,Neutral,"Another year that renew the young card, anothe..."
3,1550242176407425024,@davidkirzner @LiberalNova @LiberalPT Achas? A...,0,0,0,0,2022-07-21 22:11:51+00:00,"[Achas, A, moça, terminou, o, curso, há, pouco...",@davidkirzner @LiberalNova @LiberalPT Achas? A...,"[{'label': 'Negative', 'score': 0.901339948177...",...,"{'label': 'Neutral', 'score': 0.7540382742881775}",0.754038,Neutral,What I am saying is that in Galp there are thous,@davidkirzner @LiberalNew @LiberalPT Achas? Th...,"[{'label': 'Negative', 'score': 0.097141742706...","{'label': 'Neutral', 'score': 0.7766045928001404}",0.776605,Neutral,What I am saying is that in Galp there are tho...
4,1550240739170439169,#sicnoticias Não sei! Com a França a construir...,0,0,0,0,2022-07-21 22:06:08+00:00,"[#sicnoticias, Não, sei, Com, a, França, a, co...",#sicnoticias Não sei! Com a França a construir...,"[{'label': 'Neutral', 'score': 0.4744070768356...",...,"{'label': 'Neutral', 'score': 0.6727580428123474}",0.672758,Neutral,I don't know! With France building the World's...,"#sicnoticias I do not know, with France buildi...","[{'label': 'Negative', 'score': 0.606779038906...","{'label': 'Negative', 'score': 0.6067790389060...",0.606779,Negative,I don't know! With France building the World's...


In [31]:
df_sample['translation_t5_full_sent'] = pd.Series(sentiment_analysis)

In [32]:
# top sentiment
df_sample['translation_t5_full_top_sent'] = df_sample['translation_t5_full_sent'].map(lambda sentiment: max(sentiment, key=lambda x: x['score']))

# top score
df_sample['translation_t5_full_score'] = df_sample['translation_t5_full_top_sent'].map(lambda x: x['score'])

# top label
df_sample['translation_t5_full_label'] = df_sample['translation_t5_full_top_sent'].map(lambda x: x['label'])
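
To make the three `map` calls concrete, here is a minimal standalone sketch applied to a single classifier result. The scores are rounded from the first tweet's output above and deliberately shuffled; the helper name `top_sentiment` is illustrative only. In the output shown above the entries already appear sorted by score, but `max` does not rely on that ordering.

```python
def top_sentiment(scores):
    """Return the {'label', 'score'} dict with the highest score."""
    return max(scores, key=lambda entry: entry["score"])


# One classifier result in the same shape as the output above (order shuffled)
example = [
    {"label": "Negative", "score": 0.0750},
    {"label": "Neutral", "score": 0.8728},
    {"label": "Positive", "score": 0.0522},
]

best = top_sentiment(example)
print(best["label"], best["score"])
```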

In [33]:
df_sample.head()

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment,...,translation_europarl,translation_europarl_sent,translation_europarl_top_sent,translation_europarl_score,translation_europarl_label,translation_t5_full,translation_t5_full_sent,translation_t5_full_top_sent,translation_t5_full_score,translation_t5_full_label
0,1550250030476496900,Galp compra por 140 milhões os 25% da Titan So...,0,0,0,0,2022-07-21 22:43:04+00:00,"[Galp, compra, por, 140, milhões, os, 25, da, ...",Galp compra por 140 milhões os 25% da Titan So...,"[{'label': 'Neutral', 'score': 0.7745079398155...",...,Galp buys for 140 million the 25% of the Titan...,"[{'label': 'Negative', 'score': 0.119433254003...","{'label': 'Neutral', 'score': 0.8383119702339172}",0.838312,Neutral,Galp buys for 140 million the 25% of Titan Sol...,"[{'label': 'Neutral', 'score': 0.8727536797523...","{'label': 'Neutral', 'score': 0.8727536797523499}",0.872754,Neutral
1,1550246814963712001,A Galp convida portugueses a pensar fora do ca...,0,0,0,0,2022-07-21 22:30:17+00:00,"[A, Galp, convida, portugueses, a, pensar, for...",A Galp convida portugueses a pensar fora do ca...,"[{'label': 'Negative', 'score': 0.836504757404...",...,Galp invites Portuguese to think outside the c...,"[{'label': 'Negative', 'score': 0.192618951201...","{'label': 'Neutral', 'score': 0.7435513734817505}",0.743551,Neutral,A Galp invites Portuguese to think outside the...,"[{'label': 'Neutral', 'score': 0.5762153863906...","{'label': 'Neutral', 'score': 0.576215386390686}",0.576215,Neutral
2,1550243011350740992,"Mais um ano que renovo o cartão jovem, mais um...",0,0,0,0,2022-07-21 22:15:10+00:00,"[Mais, um, ano, que, renovo, o, cartão, jovem,...","Mais um ano que renovo o cartão jovem, mais um...","[{'label': 'Positive', 'score': 0.569248735904...",...,"Another year I renew the young card, another y...","[{'label': 'Negative', 'score': 0.284317612648...","{'label': 'Neutral', 'score': 0.43263858556747...",0.432639,Neutral,"Another year that renew the young card, anothe...","[{'label': 'Negative', 'score': 0.430438131093...","{'label': 'Negative', 'score': 0.4304381310939...",0.430438,Negative
3,1550242176407425024,@davidkirzner @LiberalNova @LiberalPT Achas? A...,0,0,0,0,2022-07-21 22:11:51+00:00,"[Achas, A, moça, terminou, o, curso, há, pouco...",@davidkirzner @LiberalNova @LiberalPT Achas? A...,"[{'label': 'Negative', 'score': 0.901339948177...",...,@davidkirzner @LiberalNew @LiberalPT Achas? Th...,"[{'label': 'Negative', 'score': 0.097141742706...","{'label': 'Neutral', 'score': 0.7766045928001404}",0.776605,Neutral,What I am saying is that in Galp there are tho...,"[{'label': 'Negative', 'score': 0.535642743110...","{'label': 'Negative', 'score': 0.5356427431106...",0.535643,Negative
4,1550240739170439169,#sicnoticias Não sei! Com a França a construir...,0,0,0,0,2022-07-21 22:06:08+00:00,"[#sicnoticias, Não, sei, Com, a, França, a, co...",#sicnoticias Não sei! Com a França a construir...,"[{'label': 'Neutral', 'score': 0.4744070768356...",...,"#sicnoticias I do not know, with France buildi...","[{'label': 'Negative', 'score': 0.606779038906...","{'label': 'Negative', 'score': 0.6067790389060...",0.606779,Negative,I don't know! With France building the World's...,"[{'label': 'Neutral', 'score': 0.5460715889930...","{'label': 'Neutral', 'score': 0.5460715889930725}",0.546072,Neutral


In [34]:
df_sample['translation_t5_full_label'].value_counts()

Neutral     15
Negative     3
Positive     2
Name: translation_t5_full_label, dtype: int64

In [35]:
df_sample[(df_sample['label'] == df_sample['translation_t5_full_label']) & (df_sample['label'] == 'Positive')]

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment,...,translation_europarl,translation_europarl_sent,translation_europarl_top_sent,translation_europarl_score,translation_europarl_label,translation_t5_full,translation_t5_full_sent,translation_t5_full_top_sent,translation_t5_full_score,translation_t5_full_label
6,1550228150503833600,@davidkirzner @LiberalNova @LiberalPT David ma...,0,1,0,0,2022-07-21 21:16:07+00:00,"[David, mas, eu, tenho, uma, amiga, q, não, é,...",@davidkirzner @LiberalNova @LiberalPT David ma...,"[{'label': 'Positive', 'score': 0.606086671352...",...,@davidkirzner @LiberalNew @LiberalPT David but...,"[{'label': 'Negative', 'score': 0.032849933952...","{'label': 'Positive', 'score': 0.6411634087562...",0.641163,Positive,"So I have a friend who is not rich, never link...","[{'label': 'Positive', 'score': 0.518101990222...","{'label': 'Positive', 'score': 0.5181019902229...",0.518102,Positive


In [36]:
df_sample[(df_sample['label'] == df_sample['translation_t5_full_label']) & (df_sample['label'] == 'Negative')]

Unnamed: 0,id,text,retweets,replies,likes,quotes,created_at,tokens,text_full,sentiment,...,translation_europarl,translation_europarl_sent,translation_europarl_top_sent,translation_europarl_score,translation_europarl_label,translation_t5_full,translation_t5_full_sent,translation_t5_full_top_sent,translation_t5_full_score,translation_t5_full_label
3,1550242176407425024,@davidkirzner @LiberalNova @LiberalPT Achas? A...,0,0,0,0,2022-07-21 22:11:51+00:00,"[Achas, A, moça, terminou, o, curso, há, pouco...",@davidkirzner @LiberalNova @LiberalPT Achas? A...,"[{'label': 'Negative', 'score': 0.901339948177...",...,@davidkirzner @LiberalNew @LiberalPT Achas? Th...,"[{'label': 'Negative', 'score': 0.097141742706...","{'label': 'Neutral', 'score': 0.7766045928001404}",0.776605,Neutral,What I am saying is that in Galp there are tho...,"[{'label': 'Negative', 'score': 0.535642743110...","{'label': 'Negative', 'score': 0.5356427431106...",0.535643,Negative
13,1550176312920346625,#galp Um belo exemplo da areia que nos atiram ...,0,0,0,0,2022-07-21 17:50:08+00:00,"[#galp, Um, belo, exemplo, da, areia, que, nos...",#galp Um belo exemplo da areia que nos atiram ...,"[{'label': 'Negative', 'score': 0.791540443897...",...,#galp An excellent example of the sand being t...,"[{'label': 'Negative', 'score': 0.875455021858...","{'label': 'Negative', 'score': 0.8754550218582...",0.875455,Negative,A beautiful example of the sand that throw us ...,"[{'label': 'Negative', 'score': 0.819003641605...","{'label': 'Negative', 'score': 0.8190036416053...",0.819004,Negative


In [38]:
df_sample[df_sample['label'] != df_sample['translation_t5_full_label']][['text_full', 'label', 'translation_t5_full', 'translation_t5_full_label']]

Unnamed: 0,text_full,label,translation_t5_full,translation_t5_full_label
1,A Galp convida portugueses a pensar fora do ca...,Negative,A Galp invites Portuguese to think outside the...,Neutral
2,"Mais um ano que renovo o cartão jovem, mais um...",Positive,"Another year that renew the young card, anothe...",Negative
7,Estamos a elaborar a nova Estratexia de Desenv...,Neutral,We are developing the new Participatory Local ...,Positive
8,Asociádevos ao #GALP e fomentemos xuntos o cre...,Positive,#galp #mar #fish #Galicia https://t.co/30c2OZqfZW,Neutral


In [39]:
len(df_sample[df_sample['label'] == df_sample['translation_t5_full_label']])

16

In [40]:
# text and labels for original and translated tweets

label_columns = ['text_full', 'label', 'translation_label', 'translation_europarl_label', 'translation_t5_full_label']

df_sample[label_columns]

Unnamed: 0,text_full,label,translation_label,translation_europarl_label,translation_t5_full_label
0,Galp compra por 140 milhões os 25% da Titan So...,Neutral,Neutral,Neutral,Neutral
1,A Galp convida portugueses a pensar fora do ca...,Negative,Neutral,Neutral,Neutral
2,"Mais um ano que renovo o cartão jovem, mais um...",Positive,Neutral,Neutral,Negative
3,@davidkirzner @LiberalNova @LiberalPT Achas? A...,Negative,Neutral,Neutral,Negative
4,#sicnoticias Não sei! Com a França a construir...,Neutral,Neutral,Negative,Neutral
5,"@RuiSilv30012076 @LiberalNova @LiberalPT Pois,...",Neutral,Neutral,Neutral,Neutral
6,@davidkirzner @LiberalNova @LiberalPT David ma...,Positive,Positive,Positive,Positive
7,Estamos a elaborar a nova Estratexia de Desenv...,Neutral,Neutral,Positive,Positive
8,Asociádevos ao #GALP e fomentemos xuntos o cre...,Positive,Positive,Positive,Neutral
9,"Os campos de óleo ""supergigantes"" possuem + de...",Neutral,Neutral,Neutral,Neutral


In [41]:
len(df_sample[(df_sample['label'] == df_sample['translation_label']) & 
              (df_sample['label'] == df_sample['translation_europarl_label']) & 
              (df_sample['label'] == df_sample['translation_t5_full_label'])
             ])

13

Out of our 20 tweets, 13 keep the same sentiment label for the original text and for all 3 translated versions.
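
The chained `==` comparisons grow quickly as more label columns are added. As an alternative sketch (an assumption about how one might generalise the check, not what the notebook does), a row agrees across all versions when it contains exactly one distinct label. The data below is copied from the first 10 rows of the table above.

```python
import pandas as pd

# Labels for the first 10 tweets, copied from the table above
labels = pd.DataFrame({
    "label": ["Neutral", "Negative", "Positive", "Negative", "Neutral",
              "Neutral", "Positive", "Neutral", "Positive", "Neutral"],
    "translation_label": ["Neutral", "Neutral", "Neutral", "Neutral", "Neutral",
                          "Neutral", "Positive", "Neutral", "Positive", "Neutral"],
    "translation_europarl_label": ["Neutral", "Neutral", "Neutral", "Neutral", "Negative",
                                   "Neutral", "Positive", "Positive", "Positive", "Neutral"],
    "translation_t5_full_label": ["Neutral", "Neutral", "Negative", "Negative", "Neutral",
                                  "Neutral", "Positive", "Positive", "Neutral", "Neutral"],
})

# A row agrees across all versions when it holds exactly one distinct label
all_agree = labels.nunique(axis=1).eq(1)
n_agree = int(all_agree.sum())
print(n_agree, "of", len(labels), "rows agree in every column")
```

With the real data, `df_sample[label_columns[1:]]` (dropping `text_full`) could be used in place of the hand-copied `labels` frame.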

We've managed to translate and analyse the sentiment of our tweets without using HuggingFace's Inference API. We downloaded both models and ran them locally.

#### 3.9. Analyse sentiment of translated tweets with VADER

In [44]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

In [50]:
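# VADER scores (neg, neu, pos, compound) for each translated tweet
df_sample['vader_scores'] = df_sample['translation'].map(lambda tweet: sid.polarity_scores(tweet))

# compound score, between -1 (most negative) and 1 (most positive)
df_sample['vader_compound'] = df_sample['vader_scores'].map(lambda score_dict: score_dict['compound'])

# label from the compound score, with thresholds at +/-0.05
df_sample['vader_label'] = df_sample['vader_compound'].map(lambda comp: 'Positive' if comp >= 0.05 else ('Negative' if comp <= -0.05 else 'Neutral'))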

In [51]:
df_sample['vader_label'].value_counts()

Positive    10
Neutral      8
Negative     2
Name: vader_label, dtype: int64

In [49]:
df_sample['translation_label'].value_counts()

Neutral     16
Positive     2
Negative     2
Name: translation_label, dtype: int64

We get many more positive tweets with VADER (10) than we did with the first machine learning model we used to analyse the sentiment of the translated tweets (2).

We can raise the threshold for the Positive label to bring VADER's results closer to the model's.

In [54]:
df_sample['vader_compound']

0     0.0000
1     0.0000
2     0.2023
3    -0.5423
4     0.2076
5     0.8625
6     0.8847
7     0.0000
8     0.8807
9     0.5106
10    0.4404
11    0.4574
12    0.2500
13    0.1531
14    0.0000
15    0.0000
16   -0.1531
17    0.0000
18    0.0000
19    0.0000
Name: vader_compound, dtype: float64

In [55]:
# threshold >= 0.5 for positive tweets (the negative threshold stays at -0.05)

df_sample['vader_label'] = df_sample['vader_compound'].map(lambda comp: 'Positive' if comp >= 0.5 else ('Negative' if comp <= -0.05 else 'Neutral'))

In [57]:
df_sample['vader_label'].value_counts()

Neutral     14
Positive     4
Negative     2
Name: vader_label, dtype: int64
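
The two labelling lambdas differ only in the positive threshold, so they can be folded into one parameterised helper. This is a minimal sketch using the compound scores listed above; the function name `vader_label` and its parameter names are illustrative, not part of NLTK.

```python
from collections import Counter


def vader_label(compound, pos_threshold=0.05, neg_threshold=-0.05):
    """Map a VADER compound score to a sentiment label."""
    if compound >= pos_threshold:
        return "Positive"
    if compound <= neg_threshold:
        return "Negative"
    return "Neutral"


# Compound scores of the 20 translated tweets, copied from the output above
compounds = [0.0, 0.0, 0.2023, -0.5423, 0.2076, 0.8625, 0.8847, 0.0, 0.8807,
             0.5106, 0.4404, 0.4574, 0.25, 0.1531, 0.0, 0.0, -0.1531, 0.0, 0.0, 0.0]

default_counts = Counter(vader_label(c) for c in compounds)
strict_counts = Counter(vader_label(c, pos_threshold=0.5) for c in compounds)

print(default_counts)
print(strict_counts)
```

Keeping the thresholds as arguments makes it easy to try other cut-offs without rewriting the lambda each time.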

In [58]:
len(df_sample[df_sample['translation_label'] == df_sample['vader_label']])

14

In [60]:
len(df_sample[df_sample['label'] == df_sample['vader_label']])

14

In [61]:
len(df_sample[(df_sample['label'] == df_sample['vader_label']) & 
              (df_sample['translation_label'] == df_sample['vader_label'])])

12

In [59]:
# text and labels for the original and translated tweets with an ML model and VADER

label_columns = ['text_full', 'label', 'translation_label', 'vader_label']

df_sample[label_columns]

Unnamed: 0,text_full,label,translation_label,vader_label
0,Galp compra por 140 milhões os 25% da Titan So...,Neutral,Neutral,Neutral
1,A Galp convida portugueses a pensar fora do ca...,Negative,Neutral,Neutral
2,"Mais um ano que renovo o cartão jovem, mais um...",Positive,Neutral,Neutral
3,@davidkirzner @LiberalNova @LiberalPT Achas? A...,Negative,Neutral,Negative
4,#sicnoticias Não sei! Com a França a construir...,Neutral,Neutral,Neutral
5,"@RuiSilv30012076 @LiberalNova @LiberalPT Pois,...",Neutral,Neutral,Positive
6,@davidkirzner @LiberalNova @LiberalPT David ma...,Positive,Positive,Positive
7,Estamos a elaborar a nova Estratexia de Desenv...,Neutral,Neutral,Neutral
8,Asociádevos ao #GALP e fomentemos xuntos o cre...,Positive,Positive,Positive
9,"Os campos de óleo ""supergigantes"" possuem + de...",Neutral,Neutral,Positive


Out of our 20 tweets, 12 have the same label in all three cases: the ML model on the original text, the ML model on the translated text, and VADER on the translated text.

14 tweets have the same label when using an ML model for the original text and VADER for the translated text.

14 tweets have the same label when comparing an ML model with VADER for the translated text.
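
The pairwise counts above can be produced in one pass instead of one cell per pair. The sketch below is an illustration under stated assumptions: it uses only the first 10 rows shown in the table above (so its counts differ from the 20-tweet totals), and the helper `pairwise_agreement` is hypothetical.

```python
from itertools import combinations


def pairwise_agreement(columns):
    """Count matching labels for every pair of label columns."""
    counts = {}
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        counts[(name_a, name_b)] = sum(a == b for a, b in zip(col_a, col_b))
    return counts


# Labels for the first 10 tweets, copied from the table above
columns = {
    "label": ["Neutral", "Negative", "Positive", "Negative", "Neutral",
              "Neutral", "Positive", "Neutral", "Positive", "Neutral"],
    "translation_label": ["Neutral", "Neutral", "Neutral", "Neutral", "Neutral",
                          "Neutral", "Positive", "Neutral", "Positive", "Neutral"],
    "vader_label": ["Neutral", "Neutral", "Neutral", "Negative", "Neutral",
                    "Positive", "Positive", "Neutral", "Positive", "Positive"],
}

agreement = pairwise_agreement(columns)
for pair, count in agreement.items():
    print(pair, count)
```

With the full sample, the same helper could be fed `{col: df_sample[col].tolist() for col in label_columns[1:]}`.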