# SENTIMENT ANALYSIS
Sentiment analysis, also known as 'opinion mining', is the task of classifying opinions expressed in text to determine whether the author's attitude is positive, neutral or negative.
This classification matters because it informs brand decisions and helps build brand reputation over time. It also helps monitor customer feedback, the brand itself and its campaigns.
In this case we will use Facebook comments on the Ponea Health account and on the products offered by Ponea Health.
We will attempt to classify the comments on both a three-point and a five-point ordinal scale.
Since sentiment analysis involves natural language processing, we will download and import a couple of Python libraries that we intend to use.
We will use a neural network in our attempt to build a model that classifies sentiments.
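
As a rough illustration of the two ordinal scales mentioned above, the sketch below encodes each scale as ordered ranks and collapses a five-point label onto the three-point scale. The label names and the helper `to_three_point` are assumptions made for illustration; the notebook does not spell out the scale labels.

```python
# Hypothetical label names for the two ordinal scales (assumed, not from the notebook).
THREE_POINT = ["negative", "neutral", "positive"]
FIVE_POINT = ["very negative", "negative", "neutral", "positive", "very positive"]

# Map each label to its rank so that the ordering is explicit.
THREE_RANK = {label: rank for rank, label in enumerate(THREE_POINT)}
FIVE_RANK = {label: rank for rank, label in enumerate(FIVE_POINT)}


def to_three_point(five_point_label):
    """Collapse a five-point label onto the three-point scale."""
    rank = FIVE_RANK[five_point_label]
    if rank < 2:
        return "negative"
    if rank > 2:
        return "positive"
    return "neutral"


print(to_three_point("very positive"))  # positive
print(THREE_RANK[to_three_point("negative")])  # 0
```

Keeping the ranks as integers preserves the ordering of the classes, which a plain set of category names would lose.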

In [2]:
#importing requisite libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import CountVectorizer
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense,Embedding,LSTM,SpatialDropout1D
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical

In [9]:
#installing facebook-scraper (extracts posts without using the API key)
#pip install facebook_scraper==0.2.49

In [10]:
#enabling GPU accelerator if available
import torch

DEVICE=torch.device("cuda" if torch.cuda.is_available() else "cpu")
#the braces are needed so that the f-string prints the device rather than the literal text DEVICE
print(f'Device Available: {DEVICE}')


### Data Mining and Obtaining Facebook Comments From Ponea Kenya

In [11]:
#getting comments from facebook using facebook_scraper
#scrapes posts and comments without the API key
#options={'comments':True} fetches the comments of each post; a number in place of True limits how many are fetched
#pages is the number of pages the script requests to obtain posts and comments
from facebook_scraper import get_posts

#obtaining data from facebook using the facebook_scraper
listposts=[]
for post in get_posts('PoneaHealth', pages=1000,options={'comments':True}):
    listposts.append(post)
print('Number of Posts:{}'.format(len(listposts)))

Number of Posts:166


In [12]:
#copying raw data into a dataframe
columns=['post_id','time', 'text', 'likes', 'comments', 'shares', 'comments_full']
df_posts=pd.DataFrame(listposts)[columns]

In [13]:
df_posts

Unnamed: 0,post_id,time,text,likes,comments,shares,comments_full
0,525943392459447,2022-02-04 12:32:47,"As we mark World Cancer Day, let's be intentio...",0,0,0,[]
1,504735794580207,2022-01-01 16:28:34,Happy New Year from all of us at Ponea Health....,4,0,0,[]
2,500781688308951,2021-12-26 14:15:09,"Happy Boxing Day, from Ponea Health.\n\n#Ponea...",0,0,0,[]
3,499993968387723,2021-12-25 11:55:27,"Ponea Health wishes you a happy, healthy, and ...",0,0,0,[]
4,497610005292786,2021-12-21 22:03:37,Join us tomorrow evening as we debunk the Omic...,3,0,0,[]
...,...,...,...,...,...,...,...
161,134143058306151,2020-06-12 12:28:12,We are now hiring! Dreaming of creating a star...,6,1,2,"[{'comment_id': '141896854197438', 'comment_ur..."
162,106687097718414,2020-05-15 14:36:36,UNLEARNING WHAT YOU THOUGHT YOU KNEW ABOUT COS...,2,0,0,[]
163,106680087719115,2020-05-15 14:28:20,,4,0,0,[]
164,106678241052633,2020-05-15 14:20:52,,4,0,0,[]


In [14]:
df_posts.shape

(166, 7)

In [15]:
df_posts.head()

Unnamed: 0,post_id,time,text,likes,comments,shares,comments_full
0,525943392459447,2022-02-04 12:32:47,"As we mark World Cancer Day, let's be intentio...",0,0,0,[]
1,504735794580207,2022-01-01 16:28:34,Happy New Year from all of us at Ponea Health....,4,0,0,[]
2,500781688308951,2021-12-26 14:15:09,"Happy Boxing Day, from Ponea Health.\n\n#Ponea...",0,0,0,[]
3,499993968387723,2021-12-25 11:55:27,"Ponea Health wishes you a happy, healthy, and ...",0,0,0,[]
4,497610005292786,2021-12-21 22:03:37,Join us tomorrow evening as we debunk the Omic...,3,0,0,[]


In [16]:
df_posts.tail()

Unnamed: 0,post_id,time,text,likes,comments,shares,comments_full
161,134143058306151,2020-06-12 12:28:12,We are now hiring! Dreaming of creating a star...,6,1,2,"[{'comment_id': '141896854197438', 'comment_ur..."
162,106687097718414,2020-05-15 14:36:36,UNLEARNING WHAT YOU THOUGHT YOU KNEW ABOUT COS...,2,0,0,[]
163,106680087719115,2020-05-15 14:28:20,,4,0,0,[]
164,106678241052633,2020-05-15 14:20:52,,4,0,0,[]
165,106678644385926,2020-05-15 14:20:52,,4,0,0,[]


In [18]:
#saving the dataframe as a .csv file
df_posts.to_csv('facebook_sentiment_test.csv')
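
The comments themselves sit inside the `comments_full` column as a list of dictionaries per post, so they need to be flattened before they can be classified. The sketch below shows one way to do that on a list of post dictionaries shaped like `listposts`. The helper `flatten_comments` and the sample posts are made up for illustration, and the `comment_text` key is an assumption about the scraper's output (only `comment_id` and a truncated `comment_ur...` key are visible in the table above).

```python
def flatten_comments(posts):
    """Return one row (a dict) per comment found in a list of post dicts."""
    rows = []
    for post in posts:
        # comments_full may be missing, None or an empty list for posts with no comments
        for comment in post.get("comments_full") or []:
            rows.append({
                "post_id": post.get("post_id"),
                "comment_id": comment.get("comment_id"),
                # 'comment_text' is an assumed key name
                "comment_text": comment.get("comment_text"),
            })
    return rows


# Made-up posts shaped like the scraper output, for illustration only.
sample_posts = [
    {"post_id": "1", "text": "Hello", "comments_full": []},
    {"post_id": "2", "text": "We are hiring", "comments_full": [
        {"comment_id": "21", "comment_text": "Great news!"},
        {"comment_id": "22", "comment_text": "How do I apply?"},
    ]},
]

comments = flatten_comments(sample_posts)
print(len(comments))  # 2
print(comments[0]["comment_text"])  # Great news!
```

Under the same assumptions, `pd.DataFrame(flatten_comments(listposts))` would give a dataframe with one comment per row.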

# Building the Model to Use
In this instance we will build a neural network that we will then use to predict sentiments from the Facebook comments. Because only a limited number of comments could be extracted from the Ponea Kenya Facebook page, we used dummy data to build the model, which will then be used to predict the polarity of the Facebook comments obtained from that page.

In [4]:
#importing the training data and data understanding
fb1=pd.read_csv('fb_sentiment_train.csv')
#the data contains comments and labels. The labels are O, P and N; they classify the sentiment of each comment. The data has 1000 rows and the sentiments of the comments are randomly distributed.
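
Before a neural network can be trained on these labels, the letters are usually turned into numeric targets. The sketch below one-hot encodes O, P and N in plain Python, which is the same idea that `to_categorical` applies to integer class indices. The sample labels, the alphabetical class order and the helper name `one_hot` are assumptions for illustration.

```python
labels = ["O", "O", "N", "P"]  # made-up sample in the style of the label column

# Assign each distinct label an integer index (alphabetical order, an arbitrary choice).
classes = sorted(set(labels))  # ['N', 'O', 'P']
class_index = {label: i for i, label in enumerate(classes)}


def one_hot(label):
    """Return a list with a 1 at the label's index and 0 elsewhere."""
    vector = [0] * len(classes)
    vector[class_index[label]] = 1
    return vector


encoded = [one_hot(label) for label in labels]
print(class_index)  # {'N': 0, 'O': 1, 'P': 2}
print(encoded)      # [[0, 1, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```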

## Data Exploration and Cleaning

In [16]:
fb1.head()

Unnamed: 0,unnamed: 0,fbpost,label
0,0,drug runners and a us senator have something ...,O
1,1,heres a single to add to kindle just read this...,O
2,2,if you tire of nonfiction check out httpwwwama...,O
3,3,ghost of round island is supposedly nonfiction,O
4,4,why is barnes and nobles version of the kindle...,N


In [6]:
fb1.tail()

Unnamed: 0.1,Unnamed: 0,FBPost,Label
995,995,I liked it. Its youth oriented and I think th...,P
996,996,"I think the point of the commercial is that, e...",P
997,997,Kindle 3 is such a great product. I could not ...,P
998,998,develop a way to share books! that is a big d...,N
999,999,I love my kindle! =),P


In [8]:
fb1.shape

(1000, 3)

### Lowercasing Column names

In [None]:
fb1.columns=fb1.columns.str.lower()

In [11]:
fb1

Unnamed: 0,unnamed: 0,fbpost,label
0,0,Drug Runners and a U.S. Senator have somethin...,O
1,1,"Heres a single, to add, to Kindle. Just read t...",O
2,2,If you tire of Non-Fiction.. Check out http://...,O
3,3,Ghost of Round Island is supposedly nonfiction.,O
4,4,Why is Barnes and Nobles version of the Kindle...,N
...,...,...,...
995,995,I liked it. Its youth oriented and I think th...,P
996,996,"I think the point of the commercial is that, e...",P
997,997,Kindle 3 is such a great product. I could not ...,P
998,998,develop a way to share books! that is a big d...,N


### Lowercasing Comments and Removing Symbols Using RegEx

In [13]:
#we will lowercase the text in the comments and remove symbols using Python's regular expressions module (re)
import re

In [14]:
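fb1['fbpost']=fb1['fbpost'].apply(lambda x:x.lower())
#keep only letters, digits and whitespace (the range is A-Z, not A-z, so that [ \ ] ^ _ ` are removed too)
fb1['fbpost']=fb1['fbpost'].apply(lambda x:re.sub(r'[^a-zA-Z0-9\s]','',x))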
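
The cleaning step can also be tried on a single string in isolation. The sketch below wraps lowercasing and symbol removal in a hypothetical helper `clean_text`, using the character class `[^a-z0-9\s]`, and applies it to two comments from the dataset.

```python
import re

# Anything that is not a lowercase letter, a digit or whitespace is removed.
NON_ALNUM = re.compile(r"[^a-z0-9\s]")


def clean_text(text):
    """Lowercase the text, then strip symbols and punctuation."""
    return NON_ALNUM.sub("", text.lower())


print(clean_text("Ghost of Round Island is supposedly nonfiction."))
# ghost of round island is supposedly nonfiction
print(repr(clean_text("I love my kindle! =)")))
# 'i love my kindle '
```

Whitespace is kept as it is, so a comment that ends in symbols can leave a trailing space behind, as the second example shows.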

In [15]:
#data after cleaning
fb1

Unnamed: 0,unnamed: 0,fbpost,label
0,0,drug runners and a us senator have something ...,O
1,1,heres a single to add to kindle just read this...,O
2,2,if you tire of nonfiction check out httpwwwama...,O
3,3,ghost of round island is supposedly nonfiction,O
4,4,why is barnes and nobles version of the kindle...,N
...,...,...,...
995,995,i liked it its youth oriented and i think thi...,P
996,996,i think the point of the commercial is that ev...,P
997,997,kindle 3 is such a great product i could not b...,P
998,998,develop a way to share books that is a big dr...,N


### Tokenizing The Comments
Language in its original form cannot be processed accurately by a computer, so we need to break it down into pieces the computer can work with. The first step in making sense of the comments data is tokenization: splitting strings into smaller parts called tokens.
A token is a sequence of characters in text that serves as a unit. Depending on how the tokens are created, they may consist of words, emoticons, hashtags, links, or even individual characters. A basic way of breaking language into tokens is to split the text on whitespace and punctuation.
In this instance we will tokenize on whitespace.