## Netflix Sentiment Analysis with Hugging Face Transformers

![cover](C:\Users\65814\Deskto\Netflix.jpg)

### Source of Dataset

Kaggle: https://www.kaggle.com/datasets/odins0n/top-20-play-store-app-reviews-daily-update?select=Netflix.csv

### Background

This notebook uses the Hugging Face Transformers model 'distilbert-base-uncased-finetuned-sst-2-english' to classify Netflix app reviews from the Google Play Store as POSITIVE or NEGATIVE.

### Import Essential Libraries

In [1]:
# import required libraries
import pandas as pd
import numpy as np

### Load the Dataset

In [27]:
# load the dataset
data = pd.read_csv('../data/Netflix.csv')

# check the dimension of the dataset
numrow, numcol = data.shape
print('The dataset contains {} rows and {} columns.'.format(numrow, numcol))

# check the attributes in the dataset
attributes = data.columns
print('Attributes in the dataset:', attributes)

# display the first 5 rows of the dataset
display(data.head(n=5))

The dataset contains 10000 rows and 3 columns.
Attributes in the dataset: Index(['reviewId', 'content', 'score'], dtype='object')


|   | reviewId | content | score |
|---|----------|---------|-------|
| 0 | bdd267b4-4231-4a5d-b369-3ac9e5082fc5 | Your device is not part of the Netflix Househo... | 1 |
| 1 | ccbfabb0-606f-4596-b269-9e805ca4d89f | I've been trying to pay for a month since I cr... | 1 |
| 2 | fe550ddd-1ae5-4902-9593-824ecb9b6598 | I give netflix a two because even though it is... | 2 |
| 3 | af88cca4-6b92-4bac-bc1f-ac32e2591ac7 | Abdulrhamam Sekh | 4 |
| 4 | 44dc9335-68e4-4f30-8c0a-241c25be252b | Good | 5 |


In [4]:
# summary of the dataset
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   reviewId  10000 non-null  object
 1   content   9997 non-null   object
 2   score     10000 non-null  int64 
dtypes: int64(1), object(2)
memory usage: 234.5+ KB


In [28]:
# check for duplicate records and missing values in each attribute
print('Number of duplicate records in the dataset:', data.duplicated().sum())
print('Number of missing values in each attribute:')
print(data.isna().sum())

Number of duplicate records in the dataset: 0
Number of missing values in each attribute:
reviewId    0
content     3
score       0
dtype: int64


In [29]:
# remove the records where 'content' is null (only this column has missing values)
data = data.dropna(subset=['content'])
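
Restricting the drop to the `content` column keeps the intent explicit, and it may also be worth filtering out reviews that contain only whitespace. The following is a minimal sketch on a toy frame; the rows are hypothetical and only the column names mirror the dataset.

```python
import pandas as pd

# Toy frame with hypothetical rows; column names mirror the dataset.
toy = pd.DataFrame({
    'reviewId': ['a', 'b', 'c', 'd'],
    'content': ['Good', None, '   ', 'Keeps buffering'],
    'score': [5, 1, 3, 2],
})

# Drop rows only when 'content' is missing; nulls in other columns are ignored.
cleaned = toy.dropna(subset=['content'])

# Optionally also drop reviews that are empty once whitespace is stripped.
cleaned = cleaned[cleaned['content'].str.strip() != '']

print(len(toy), '->', len(cleaned))
```

Whether whitespace-only reviews exist in the real file is not known from the output above, so the second filter is a precaution rather than a required step.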

### Sentiment Analysis

In [12]:
from transformers import pipeline
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification

tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')
model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased-finetuned-sst-2-english')

nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

In [13]:
# sentiment analysis
text = data['content'].tolist()
# truncation=True guards against reviews longer than the model's 512-token limit
results = nlp(text, truncation=True)
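
The call above sends every review to the pipeline in one go. For larger files it can help to process the list in chunks, for example to report progress or to save partial results. The sketch below assumes only that the classifier is a callable taking a list of strings and returning one dict per string, which is how the pipeline behaves; `classify_in_batches` and `fake_classifier` are hypothetical names, and the stand-in classifier exists only so the example runs without downloading a model.

```python
from typing import Callable, Dict, List


def classify_in_batches(classifier: Callable[[List[str]], List[Dict]],
                        texts: List[str],
                        batch_size: int = 64) -> List[Dict]:
    """Call `classifier` on consecutive slices of `texts` and concatenate the results."""
    results: List[Dict] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        results.extend(classifier(batch))
    return results


def fake_classifier(batch: List[str]) -> List[Dict]:
    # Hypothetical stand-in for the real pipeline: a keyword rule, not a model.
    return [{'label': 'POSITIVE' if 'good' in t.lower() else 'NEGATIVE', 'score': 1.0}
            for t in batch]


demo = classify_in_batches(fake_classifier, ['Good', 'Bad', 'good app'], batch_size=2)
print([r['label'] for r in demo])
```

In the notebook, the equivalent call would be `classify_in_batches(nlp, text)`, which returns results in the same order as the input list.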

In [30]:
# add the predicted sentiment labels to the dataframe
data['sentiment'] = [r['label'] for r in results]
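
Each pipeline result is a dict with a `label` and a `score` (the model's confidence), so the confidence can be stored alongside the label. The sketch below uses a toy frame and made-up result values; the column name `sentiment_confidence` is an assumption, chosen to avoid clashing with the existing star-rating column `score`.

```python
import pandas as pd

# Toy frame with a non-contiguous index, as left behind after dropping rows.
toy = pd.DataFrame({'content': ['Good', 'Keeps crashing'], 'score': [5, 1]},
                   index=[0, 2])

# Made-up values in the shape the pipeline returns: one dict per input text.
results = [{'label': 'POSITIVE', 'score': 0.99},
           {'label': 'NEGATIVE', 'score': 0.98}]

# Lists are assigned by position, so the lengths must match exactly.
assert len(results) == len(toy)
toy['sentiment'] = [r['label'] for r in results]
toy['sentiment_confidence'] = [r['score'] for r in results]

print(toy)
```

Because the assignment is positional, it stays correct even though the index has gaps, as long as `results` was produced from the same rows in the same order.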

In [32]:
# display a random sample of the dataset
display(data.sample(n=10))

|      | reviewId | content | score | sentiment |
|------|----------|---------|-------|-----------|
| 4462 | 5a47448a-33d1-4cf9-8879-09c7a1062bbe | A stupid game add takes the whole screen of th... | 1 | NEGATIVE |
| 1317 | b3c439ce-98bf-401b-8eda-446165b041ff | Hi how are you watching movies with Netflix on it | 2 | POSITIVE |
| 502 | aa362273-0df5-439d-84b5-969d2e6cde15 | Super | 5 | POSITIVE |
| 1263 | a7202d0f-633c-4efe-9f15-74f2afd44995 | Thanks for giving me this app | 5 | POSITIVE |
| 2793 | 25c5972e-842a-4a19-8171-85c5563ba23e | Z AA c😪 | 5 | NEGATIVE |
| 8293 | 1202804a-2406-4879-bb5d-417aeac27062 | Getting rid of the basic plan and treating loy... | 1 | NEGATIVE |
| 3580 | d64ba9e5-7a52-4973-84ce-ffbf87c34b0a | Todos os conteúdos ficam carregando infinitame... | 1 | NEGATIVE |
| 9809 | 55a999c0-4e9b-42eb-9a91-a3e78fcb430e | After downloaded the update. My netflix experi... | 4 | NEGATIVE |
| 4382 | efb0182f-1382-4029-ae4f-2d8b58af3465 | Nood Ako Ng Netflix | 5 | NEGATIVE |
| 9557 | 610fd332-2506-4a13-810c-dacd81f48815 | Really glad with the app .it is convenient and... | 5 | POSITIVE |
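
One way to sanity-check the predictions is to compare them with the star ratings. The sketch below reuses the `score` and `sentiment` values of the ten sampled rows shown above. The mapping from stars to a label (4 to 5 stars as positive, 1 to 2 as negative, 3 excluded as neutral) is an assumption, not something defined by the dataset or the model.

```python
import pandas as pd

# Star rating and predicted label of the ten sampled rows shown above.
sample = pd.DataFrame({
    'score': [1, 2, 5, 5, 5, 1, 1, 4, 5, 5],
    'sentiment': ['NEGATIVE', 'POSITIVE', 'POSITIVE', 'POSITIVE', 'NEGATIVE',
                  'NEGATIVE', 'NEGATIVE', 'NEGATIVE', 'NEGATIVE', 'POSITIVE'],
})

# Assumed mapping: 4-5 stars -> POSITIVE, 1-2 stars -> NEGATIVE, 3 stars dropped.
rated = sample[sample['score'] != 3].copy()
rated['rating_label'] = rated['score'].apply(
    lambda s: 'POSITIVE' if s >= 4 else 'NEGATIVE')

# Rows: label implied by the rating; columns: label predicted by the model.
table = pd.crosstab(rated['rating_label'], rated['sentiment'])
agreement = (rated['rating_label'] == rated['sentiment']).mean()

print(table)
print('Agreement: {:.0%}'.format(agreement))
```

Ten rows are far too few to draw conclusions, but several of the disagreements in this sample involve very short or non-English reviews, which an English-only model may not handle reliably. Running the same comparison on the full dataframe would give a more meaningful picture.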
