# Sentiment Analysis

![Sentiment analysis](https://github.com/senolcemhan98/templates/blob/main/sentiment.png?raw=true)

Model (Hugging Face): https://huggingface.co/pysentimiento/robertuito-sentiment-analysis
- pysentimiento offers models for several languages (es, en, it, pt); this notebook uses Portuguese (pt).
- Base model : BERT
- pysentimiento is an **open-source** library

In [5]:
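from pysentimiento import create_analyzer

# Load the pysentimiento sentiment analyzer for Portuguese
analyzer = create_analyzer(task="sentiment", lang="pt")

def analyze_sentiment(text: str) -> float:
    """Return a sentiment score in [-1, 1] for a single comment."""
    probs = analyzer.predict(text).probas
    # Weighted average of the class probabilities: POS = +1, NEU = 0, NEG = -1
    return probs['POS'] * 1 + probs['NEU'] * 0 + probs['NEG'] * -1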
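
The weighting used in `analyze_sentiment` can be checked without loading the model. The sketch below is only an illustration: the name `score_from_probas`, the `WEIGHTS` table, and the example probabilities are hypothetical, not output from pysentimiento.

```python
import math

# Hypothetical weight table mirroring the formula above: POS = +1, NEU = 0, NEG = -1.
WEIGHTS = {"POS": 1.0, "NEU": 0.0, "NEG": -1.0}

def score_from_probas(probas):
    """Collapse class probabilities into one score (equivalent to P(POS) - P(NEG))."""
    return sum(WEIGHTS[label] * p for label, p in probas.items())

# Made-up probabilities, for illustration only.
mostly_positive = {"POS": 0.70, "NEU": 0.20, "NEG": 0.10}
clearly_neutral = {"POS": 0.05, "NEU": 0.90, "NEG": 0.05}
evenly_split = {"POS": 0.50, "NEU": 0.00, "NEG": 0.50}

print(score_from_probas(mostly_positive))
print(score_from_probas(clearly_neutral))
print(score_from_probas(evenly_split))
```

Because the three probabilities sum to 1, the score always lies between -1 and +1. A score near 0 is ambiguous, though: a confidently neutral prediction and an even split between positive and negative both land there.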





In [25]:
import pandas as pd
from scipy.stats import spearmanr

data = pd.read_csv('./S_Data/order_reviews.csv')
# Keep only the reviews that have a written comment
data = data[data['review_comment_message'].notna()]
data = data[['review_score', 'review_comment_message']]

In [26]:
data['sentiment_score'] = data['review_comment_message'].apply(analyze_sentiment)

In [27]:
data.head()

| | review_score | review_comment_message | sentiment_score |
|---|---|---|---|
| 3 | 5 | Recebi bem antes do prazo estipulado. | 0.031753 |
| 4 | 5 | Parabéns lojas lannister adorei comprar pela I... | 0.986891 |
| 9 | 4 | aparelho eficiente. no site a marca do aparelh... | -0.597534 |
| 12 | 4 | Mas um pouco ,travando...pelo valor ta Boa.\r\n | -0.578317 |
| 15 | 5 | Vendedor confiável, produto ok e entrega antes... | 0.056698 |


In [28]:
data.describe()

| | review_score | sentiment_score |
|---|---|---|
| count | 41753.0 | 41753.0 |
| mean | 3.640409 | 0.253674 |
| std | 1.626383 | 0.700631 |
| min | 1.0 | -0.991368 |
| 25% | 2.0 | -0.260423 |
| 50% | 4.0 | 0.260692 |
| 75% | 5.0 | 0.977939 |
| max | 5.0 | 0.992501 |


In [44]:
# Show the review with the minimum sentiment_score
pd.set_option('display.max_colwidth', None)
min_sentiment_score = data['sentiment_score'].min()
is_min = data['sentiment_score'] == min_sentiment_score

print(f"Comment : {data.loc[is_min, 'review_comment_message']}")
print(f"Review Score : {data.loc[is_min, 'review_score']}")
print(f"Sentiment Score : {data.loc[is_min, 'sentiment_score']}")

Comment : 41817    Saca rolhas de plástico, EXTREMAMENTE FRACO, que não seria capaz de abrir nem uma mamadeira, quanto mais uma garrafa de vinho. Quebrou no 1° uso! DINHEIRO TOTALMENTE JOGADO FORA! PÉSSIMO! Loja targaryen.
Name: review_comment_message, dtype: object
Review Score : 41817    1
Name: review_score, dtype: int64
Sentiment Score : 41817   -0.991368
Name: sentiment_score, dtype: float64


In [45]:
# Show the review with the maximum sentiment_score
max_sentiment_score = data['sentiment_score'].max()
is_max = data['sentiment_score'] == max_sentiment_score

print(f"Comment : {data.loc[is_max, 'review_comment_message']}")
print(f"Review Score : {data.loc[is_max, 'review_score']}")
print(f"Sentiment Score : {data.loc[is_max, 'sentiment_score']}")

Comment : 8425    Adorei a cauterização da trivitt quero pra vida inteira😍😍
Name: review_comment_message, dtype: object
Review Score : 8425    4
Name: review_score, dtype: int64
Sentiment Score : 8425    0.992501
Name: sentiment_score, dtype: float64


<img src="https://github.com/senolcemhan98/templates/blob/main/reviews.gif?raw=true" width="800" />

# Calculate Correlation

The **Spearman rank-order correlation coefficient** is a nonparametric measure of the monotonicity of the relationship between two datasets. Like other correlation coefficients, this one varies between -1 and +1 with **0 implying no correlation**. Correlations of **-1 or +1 imply an exact monotonic relationship**. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.
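
To make the definition concrete: Spearman's coefficient is the Pearson correlation computed on the ranks of the values, with tied values sharing the average of their positions. The sketch below uses only the standard library; the helper names (`average_ranks`, `pearson`, `spearman`) and the sample numbers are hypothetical, and it is meant as an illustration rather than a replacement for `scipy.stats.spearmanr`.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up values: sentiment rises with the star rating, but not linearly.
sentiment = [-0.9, -0.2, 0.1, 0.5, 0.95]
stars = [1, 2, 3, 4, 5]
print(spearman(sentiment, stars))        # exactly monotonic increasing
print(spearman(sentiment, stars[::-1]))  # exactly monotonic decreasing
```

Only the ordering matters, not the spacing between values, which suits a comparison between a continuous sentiment score and an ordinal 1 to 5 star rating.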

In [47]:
correlation, _ = spearmanr(data['sentiment_score'], data['review_score'])
print(f"Spearman's correlation coefficient: {correlation}")

Spearman's correlation coefficient: 0.7283938241629598


# Conclusion

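The Spearman coefficient of about 0.73 indicates a strong positive monotonic relationship between the model's sentiment scores and the customers' review scores.

Note: The model needs further training. It sometimes fails to tell positive comments from negative ones, and it sometimes predicts neutral for a comment that is actually positive from the customer's point of view. For example, "Recebi bem antes do prazo estipulado." ("I received it well before the stipulated deadline") comes with a 5-star review but gets a sentiment score of only about 0.03.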
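
One way to look for such cases is to flag reviews whose star rating and sentiment score disagree. The sketch below rebuilds the five rows shown by `data.head()` (comment text omitted) and applies a cutoff; `NEUTRAL_BAND` and the rating threshold of 4 are hypothetical choices, not values tuned on the full dataset.

```python
import pandas as pd

# The five rows shown by data.head() above, without the comment text.
sample = pd.DataFrame(
    {
        "review_score": [5, 5, 4, 4, 5],
        "sentiment_score": [0.031753, 0.986891, -0.597534, -0.578317, 0.056698],
    },
    index=[3, 4, 9, 12, 15],
)

NEUTRAL_BAND = 0.1  # hypothetical: |score| below this is treated as neutral

high_rating = sample["review_score"] >= 4
near_neutral = sample["sentiment_score"].abs() < NEUTRAL_BAND
negative = sample["sentiment_score"] <= -NEUTRAL_BAND

# High star rating, but the model's score is close to neutral.
neutral_mismatches = sample[high_rating & near_neutral]
# High star rating, but the model's score is negative.
negative_mismatches = sample[high_rating & negative]

print(neutral_mismatches)
print(negative_mismatches)
```

Reading the flagged comments by hand would show whether the disagreement comes from the model or from customers whose text and star rating genuinely differ.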