# Predicting Sentiment of a YouTube Video using VADER

https://www.youtube.com/watch?v=1qS6J-WbTD8&t=620s

Putin's Speech on Ukraine, US Foreign Policy, and NATO

### Sentiment Analysis
Sentiment analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral.

Supervised learning algorithms such as artificial neural networks (ANNs) learn from examples. Each example pairs a sample with its respective target value. The number of examples needed to train such a model grows with the depth of the network, so the amount of labeled data available is often the limiting factor.

In this exercise, the text data is scraped from the comments section of a video, so it has no labels. We therefore create labels using VADER, an unsupervised, rule-based method for sentiment analysis that is available through the NLTK package.

### More about VADER

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. It uses a combination of lexical features (e.g., words) that are labeled according to their semantic orientation as either positive or negative. VADER therefore reports not only the polarity of a text but also how positive or negative it is.

VADER belongs to the family of sentiment analysis methods that rely on lexicons of sentiment-related words. In this approach, each word in the lexicon is rated as positive or negative and also by how positive or negative it is. More positive words have higher positive ratings and more negative words have lower negative ratings.
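The lexicon lookup described above can be sketched in a few lines. This is a simplified illustration, not the NLTK implementation: `TOY_LEXICON` and `raw_valence` are hypothetical names, the valence values are illustrative assumptions rather than entries read from the lexicon file, and the sketch ignores VADER's additional rules for negation, intensifiers, punctuation, and capitalization.

```python
# Toy lexicon: the words and valence values are illustrative assumptions
# for this sketch, not values read from the VADER lexicon file.
TOY_LEXICON = {
    "great": 3.1,
    "thank": 1.5,
    "horrible": -2.5,
}


def raw_valence(text, lexicon=TOY_LEXICON):
    """Sum the valence of each token found in the lexicon.

    Tokens that are not in the lexicon contribute 0.0.
    """
    return sum(lexicon.get(token, 0.0) for token in text.lower().split())


for comment in ["great man", "russian horrible thank subtitle"]:
    print(comment, "->", raw_valence(comment))
```

A comment with one positive word gets a positive sum, while a comment mixing a strongly negative word with a mildly positive one ends up slightly negative overall.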

VADER produces four sentiment metrics from these word ratings. The first three, positive, neutral, and negative, represent the proportion of the text that falls into each category. The last metric, the compound score, is the sum of the lexicon ratings, normalized to range between -1 and 1.
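As a rough illustration of the normalization step, the sketch below squashes an unbounded valence sum into the range -1 to 1. It assumes the normalization takes the form `x / sqrt(x^2 + alpha)` with `alpha = 15`, which is our understanding of the NLTK implementation; the function name `normalize` and the example input sums are assumptions made for this sketch.

```python
import math


def normalize(raw_sum, alpha=15):
    """Map an unbounded valence sum into the open interval (-1, 1).

    alpha=15 is assumed here to match the constant used by VADER.
    """
    return raw_sum / math.sqrt(raw_sum * raw_sum + alpha)


# Assumed raw valence sums for three short comments.
for raw in (3.1, 1.5, -1.0):
    print(raw, "->", round(normalize(raw), 4))
```

Under these assumptions, raw sums of 3.1, 1.5, and -1.0 map to 0.6249, 0.3612, and -0.25, which agree with the compound scores reported in this notebook for "great man", "thank", and "russian horrible thank subtitle".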


In [3]:
# DataFrame
import pandas as pd

# Matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

# Scikit-learn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
from sklearn.manifold import TSNE
from sklearn.feature_extraction.text import TfidfVectorizer

# Keras
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Activation, Dense, Dropout, Embedding, Flatten, Conv1D, MaxPooling1D, LSTM
from keras import utils
from keras.callbacks import ReduceLROnPlateau, EarlyStopping

# nltk
import nltk
from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer

# Utility
import re
import numpy as np
import os
from collections import Counter
import logging
import time
import pickle
import itertools

import spacy
import string

In [4]:
df = pd.read_csv('cleaned_comments.csv')
df.head()

Unnamed: 0,Comment
0,max lebreton 1 second ago time man bomb shit s...
1,still army hasnt gain ground still
2,great man
3,god im sorry cant help right
4,fight sick b


In [5]:
nltk.download("vader_lexicon")

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/cindyyu/nltk_data...


True

In [6]:
# Add VADER scores: one dict per comment with 'neg', 'neu', 'pos', and 'compound'

from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

df['scores'] = df['Comment'].apply(lambda x: analyzer.polarity_scores(x))

In [8]:
df

Unnamed: 0,Comment,scores
0,max lebreton 1 second ago time man bomb shit s...,"{'neg': 0.47, 'neu': 0.53, 'pos': 0.0, 'compou..."
1,still army hasnt gain ground still,"{'neg': 0.357, 'neu': 0.643, 'pos': 0.0, 'comp..."
2,great man,"{'neg': 0.0, 'neu': 0.196, 'pos': 0.804, 'comp..."
3,god im sorry cant help right,"{'neg': 0.411, 'neu': 0.347, 'pos': 0.243, 'co..."
4,fight sick b,"{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound..."
...,...,...
18756,ya fuck crazy man think,"{'neg': 0.663, 'neu': 0.337, 'pos': 0.0, 'comp..."
18757,russian horrible thank subtitle,"{'neg': 0.438, 'neu': 0.25, 'pos': 0.312, 'com..."
18758,thank,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound..."
18759,best please keep every future speech,"{'neg': 0.0, 'neu': 0.381, 'pos': 0.619, 'comp..."


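Next, we extract the compound score into its own column. Comments with a compound score greater than or equal to zero are labeled positive (`pos`), and comments with a score below zero are labeled negative (`neg`). Note that this two-way split places comments that score exactly zero, which VADER treats as neutral, in the positive class.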

In [10]:
df['compound'] = df['scores'].apply(lambda score_dict: score_dict['compound'])

df.head()

Unnamed: 0,Comment,scores,compound
0,max lebreton 1 second ago time man bomb shit s...,"{'neg': 0.47, 'neu': 0.53, 'pos': 0.0, 'compou...",-0.9657
1,still army hasnt gain ground still,"{'neg': 0.357, 'neu': 0.643, 'pos': 0.0, 'comp...",-0.4168
2,great man,"{'neg': 0.0, 'neu': 0.196, 'pos': 0.804, 'comp...",0.6249
3,god im sorry cant help right,"{'neg': 0.411, 'neu': 0.347, 'pos': 0.243, 'co...",-0.1174
4,fight sick b,"{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound...",-0.7096


In [11]:
# Two-way label: scores >= 0 (including neutral 0.0) become 'pos', the rest 'neg'
df['comp_score'] = df['compound'].apply(lambda c: 'pos' if c >= 0 else 'neg')

df

Unnamed: 0,Comment,scores,compound,comp_score
0,max lebreton 1 second ago time man bomb shit s...,"{'neg': 0.47, 'neu': 0.53, 'pos': 0.0, 'compou...",-0.9657,neg
1,still army hasnt gain ground still,"{'neg': 0.357, 'neu': 0.643, 'pos': 0.0, 'comp...",-0.4168,neg
2,great man,"{'neg': 0.0, 'neu': 0.196, 'pos': 0.804, 'comp...",0.6249,pos
3,god im sorry cant help right,"{'neg': 0.411, 'neu': 0.347, 'pos': 0.243, 'co...",-0.1174,neg
4,fight sick b,"{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound...",-0.7096,neg
...,...,...,...,...
18756,ya fuck crazy man think,"{'neg': 0.663, 'neu': 0.337, 'pos': 0.0, 'comp...",-0.7096,neg
18757,russian horrible thank subtitle,"{'neg': 0.438, 'neu': 0.25, 'pos': 0.312, 'com...",-0.2500,neg
18758,thank,"{'neg': 0.0, 'neu': 0.0, 'pos': 1.0, 'compound...",0.3612,pos
18759,best please keep every future speech,"{'neg': 0.0, 'neu': 0.381, 'pos': 0.619, 'comp...",0.7579,pos


In [12]:
df['comp_score'].value_counts()

pos    11255
neg     7506
Name: comp_score, dtype: int64

The compound score is the sum of all the lexicon ratings in a text, normalized to lie between -1 (most extreme negative) and +1 (most extreme positive). A common convention for turning it into a three-way label is:

positive sentiment : (compound score >= 0.05) 

neutral sentiment : (compound score > -0.05) and (compound score < 0.05) 

negative sentiment : (compound score <= -0.05)
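The thresholds above can be turned into a small labeling function. This is a minimal sketch: the function name `label_sentiment` and the `threshold` parameter are hypothetical and not part of NLTK.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score to 'pos', 'neu', or 'neg'.

    The 0.05 cutoff follows the thresholds listed above.
    """
    if compound >= threshold:
        return "pos"
    if compound <= -threshold:
        return "neg"
    return "neu"


# Compound scores taken from the first rows shown earlier, plus a zero score.
for score in (0.6249, -0.1174, 0.0):
    print(score, "->", label_sentiment(score))
```

Applied to the DataFrame, for instance with `df['compound'].apply(label_sentiment)`, this would keep comments with near-zero scores in their own `neu` class instead of counting them as positive.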

### It can be concluded that the overall sentiment of the comments on the video is positive