# Sentiment Analysis Modelling

The goal of this notebook is to test the performance of the [CAMeL-BERT](https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment) model. We first iterate on the model pipeline, then evaluate it on the three datasets collected earlier.

In [50]:
import pandas as pd
import numpy as np

import torch
from transformers import pipeline
from scipy.special import softmax

In [51]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("Available device:", device)

Available device: cpu


## Load and preprocess the data

In [52]:
URL = "https://github.com/swarmsTeam/swarms-ai/raw/main/sentiment-analysis/data/"
CompanyReviews = pd.read_csv(URL + "CompanyReviews.csv", index_col=0)
RestaurantReviews = pd.read_csv(URL + "RestaurantReviewsSample.csv", index_col=0)
appReviews = pd.read_csv(URL + "appReviews.csv", index_col=0)

In [53]:
appReviews = appReviews.dropna()
appReviews.head()

Unnamed: 0,review_description,rating,company
0,سيئ جدا بعد الإصدار الجديد,-1,alahli_bank
1,ابلكيشن زباله بجد,-1,alahli_bank
2,سيئ التطبيق لايعمل,-1,alahli_bank
3,للأسف التطبيق للأسوأ كان جدا رائع وسهل وبسيط ا...,-1,alahli_bank
4,التحديث بطيئ جدا جدا عند الفتح,-1,alahli_bank


## Model pipeline

In [54]:
MODEL = 'CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment'
pipe = pipeline("text-classification", model=MODEL)
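
The `device` computed earlier is never passed to the pipeline, so depending on the installed `transformers` version, inference may run on the CPU even when a GPU is present. A minimal sketch of how the choice could be forwarded, assuming the `device` argument of `transformers.pipeline`, which accepts an integer index (`-1` for CPU, `0` for the first GPU). The helper name `pipeline_device_index` is hypothetical.

```python
def pipeline_device_index(cuda_available: bool) -> int:
    """Return an integer device index for a transformers pipeline (hypothetical helper)."""
    # -1 selects the CPU; 0 selects the first CUDA device
    return 0 if cuda_available else -1

# Intended use in the notebook (not executed here, since it downloads the model):
#   pipe = pipeline("text-classification", model=MODEL,
#                   device=pipeline_device_index(torch.cuda.is_available()))
print(pipeline_device_index(False))
```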



### Running the model on a few examples

In [55]:
examples = ["ده ايفنت مش نافع", "كان حاجة اخر ملل", "معقول", "يعني حاسس انه في العموم مقبول وعادي", "روعة بجد"]
pipe(examples)

[{'label': 'negative', 'score': 0.7197925448417664},
 {'label': 'negative', 'score': 0.9548670053482056},
 {'label': 'neutral', 'score': 0.4663392901420593},
 {'label': 'negative', 'score': 0.6648894548416138},
 {'label': 'positive', 'score': 0.9908993244171143}]

In [56]:
output = pipe(examples)

In [57]:
for example in examples:
  output = pipe(example)[0]
  print(output['label'])

negative
negative
neutral
negative
positive


In [58]:
# Take an explicit copy so the new column is not written to a slice of appReviews
df = appReviews.iloc[:20, :].copy()
df['label'] = None

# Map the model's string labels onto the dataset's rating scale (-1, 0, 1)
for index, review in df.iterrows():
    output = pipe(review['review_description'])[0]
    if output['label'] == 'positive':
        label = 1
    elif output['label'] == 'negative':
        label = -1
    else:
        label = 0
    df.at[index, 'label'] = label

df.head()


Unnamed: 0,review_description,rating,company,label
0,سيئ جدا بعد الإصدار الجديد,-1,alahli_bank,-1
1,ابلكيشن زباله بجد,-1,alahli_bank,-1
2,سيئ التطبيق لايعمل,-1,alahli_bank,-1
3,للأسف التطبيق للأسوأ كان جدا رائع وسهل وبسيط ا...,-1,alahli_bank,-1
4,التحديث بطيئ جدا جدا عند الفتح,-1,alahli_bank,-1
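
The label mapping in the loop above can also be written as a dictionary lookup, which keeps the three cases in one place. A minimal sketch, using the hypothetical names `LABEL_TO_RATING` and `labels_to_ratings`, and replaying the labels printed for the five examples instead of calling the model:

```python
# Hypothetical names; the mapping mirrors the if/elif/else in the loop
LABEL_TO_RATING = {"positive": 1, "negative": -1}

def labels_to_ratings(outputs):
    """Convert pipeline-style outputs to ratings; any other label maps to 0."""
    return [LABEL_TO_RATING.get(out["label"], 0) for out in outputs]

# Labels copied from the example predictions earlier in the notebook (scores omitted)
sample_outputs = [
    {"label": "negative"},
    {"label": "negative"},
    {"label": "neutral"},
    {"label": "negative"},
    {"label": "positive"},
]
print(labels_to_ratings(sample_outputs))  # [-1, -1, 0, -1, 1]
```

Since `pipe` returns one such dictionary per input text when given a list, a whole column could be labelled in a single call rather than row by row.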


In [59]:
print(df['label'] == df['rating'])

0      True
1      True
2      True
3      True
4      True
5      True
6      True
7     False
8      True
9     False
10     True
11     True
12     True
13     True
14     True
15     True
16     True
17     True
18     True
19     True
dtype: bool
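
The boolean comparison can be summarised as an agreement rate. A small sketch that rebuilds the series printed above (two mismatches, at indices 7 and 9) rather than rerunning the model; the variable names are illustrative.

```python
import pandas as pd

# Rebuild the comparison result shown above: True everywhere except rows 7 and 9
matches = pd.Series([i not in (7, 9) for i in range(20)])

agreement = matches.mean()                      # fraction of rows where label == rating
mismatched = matches[~matches].index.tolist()   # rows worth inspecting by hand

print(f"agreement: {agreement:.2f}")  # agreement: 0.90
print(mismatched)                     # [7, 9]
```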


In [60]:
df.iloc[7, :]

Unnamed: 0,7
review_description,اذا قمت بفتح التطبيق يطلب تحديث واذا ضغطت على ...
rating,0
company,alahli_bank
label,-1


In [61]:
df.iloc[7, 0]

'اذا قمت بفتح التطبيق يطلب تحديث واذا ضغطت على التحديث ما يقبل التحديث لاهو اللي فتح ولاهو اللي تحدث وش المشكله'

At index 7, the model classifies the review as negative, while the dataset rates it as neutral. The review complains that the app demands an update and then refuses to install it, so the model's negative label looks like the more reasonable choice in this case.

## Final pipeline

In [62]:
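def get_scores(data):
  """Map each prediction to a score in [0, 1]; higher means more positive."""
  scores = []
  for result in pipe(data):
    s = result['score']  # confidence of the predicted label
    if result['label'] == 'negative':
      s = 1 - s  # confident negatives land near 0
    elif result['label'] == 'neutral':
      s = 0.5  # neutral always sits at the midpoint
    scores.append(s)
  return scores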

In [63]:
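def star_rating(scores):
  """Convert per-review scores in [0, 1] to an overall rating out of 5 stars."""
  if not scores:
    return None  # assumption: an empty input has no rating (avoids dividing by zero)
  # Equivalent to 5 * mean(scores): the average score rescaled to a 5-star range
  star_increment = len(scores) / 5
  total_score = sum(scores)
  return round(total_score / star_increment, 2)

# Example usage:
scores = get_scores(examples)
rating = star_rating(scores)
print(rating)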
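
As a worked example of the two functions, the predictions printed earlier for the five examples can be pushed through the same mapping by hand, without calling the model. This is a sketch: `outputs` copies those printed values rounded to four decimals, and `to_score` is a hypothetical stand-in for the per-item logic of `get_scores`.

```python
outputs = [
    {"label": "negative", "score": 0.7198},
    {"label": "negative", "score": 0.9549},
    {"label": "neutral",  "score": 0.4663},
    {"label": "negative", "score": 0.6649},
    {"label": "positive", "score": 0.9909},
]

def to_score(out):
    """Hypothetical stand-in for the per-item mapping in get_scores."""
    if out["label"] == "negative":
        return 1 - out["score"]
    if out["label"] == "neutral":
        return 0.5
    return out["score"]

scores = [to_score(out) for out in outputs]
# total / (n / 5) is the mean score rescaled to five stars
rating = round(sum(scores) / (len(scores) / 5), 2)
print(rating)  # 2.15
```

The neutral score is fixed at 0.5 regardless of the model's confidence, so the 0.4663 above does not enter the result.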