# Finding the sentiments of MySynovus App users


The days when one would get data in tabulated spreadsheets are truly behind us. A moment of silence for the data residing in the spreadsheet pockets. Today, more than 80% of the data is unstructured – it is either present in data silos or scattered around the digital archives. Data is being produced as we speak – from every conversation we make in the social media to every content generated from news sources. In order to produce any meaningful actionable insight from data, it is important to know how to work with it in its unstructured form.


One of the first steps in working with text data is to pre-process it. It is an essential step before the data is ready for analysis. Majority of available text data is highly unstructured and noisy in nature – to achieve better insights or to build better algorithms, it is necessary to play with clean data and especially social media data is highly unstructured – it is an informal communication – typos, sarcasm, usage of slang, presence of unwanted content like URLs, Stopwords, Expressions etc. are the usual suspects. I'm coming up with the below 10 steps to clean the reviews which could be made ready for the analysis. Pls remember that these are not limited and we could come up with more steps to pre-process/clean the data. 

In [1]:
#Importing Standard Libraries
import pandas as pd
import numpy as np
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from nltk.stem import WordNetLemmatizer
import nltk
import re
import os
import gc
from nltk.corpus import wordnet, stopwords
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer
from colorama import Fore, Back, Style

# Pre-Processing/ Cleansing:

# Load Dataset 

In [2]:
reviews = pd.read_csv("snv_reviews_complete.csv")
text = reviews['description']  #extracting the descriptions
title = reviews['Title']   #extracting the titles
for row in text[:10]:
    print(row)
    
#let's review the first 10 reviews.

It keeps giving me an error with my fingerprint scanner since the last update. Other than that, it's a good app.
Where there when I need it.
I love being able to check my account. But I lost my phone and I had no way into my account at all till I called.
They stole my id and open account in GA..
every time I open the app it says "application failed to load". stoptyping i have already updated. twice! sTill SUCKS
Very helpful and convenient.
balance is never correct causing me to overdraft my account. happens every month. not cool
synovus mobile apps is too slow! I'm still waiting. almost 1hr now!
glitches every few weeks and i have to delete and redownload because it says invalid credentials
I like the location, and the indoor customer service. I like to bank with them because they have ATM that gives 1$ bills. Is excellent for mortgages. I did mine in 14 days and the person was accessible, trustworthy and friendly. Synovus is a great mortgage local bank.


## 1. Remove Numbers
**Example:** I'm 24 years old and Scicom could be a 100 million dollar company by the time I turn 30. -->  I'm  years old and Scicom could be a  million dollar company by the time I turn.

In [3]:
def removeNumbers(text):
    """ Removes integers """
    text = ''.join([i for i in text if not i.isdigit()])         
    return text

text_removeNumbers = pd.DataFrame(columns=['TextBefore', 'TextAfter', 'Changed'])
text_removeNumbers['TextBefore'] = text.copy()

In [4]:
for index, row in text_removeNumbers.iterrows():
    row['TextAfter'] = removeNumbers(str(row['TextBefore']))

In [5]:
text_removeNumbers['Changed'] = np.where(text_removeNumbers['TextBefore']==text_removeNumbers['TextAfter'], 'no', 'yes')
print("{} of {} ({:.4f}%) reviews have been changed.".format(len(text_removeNumbers[text_removeNumbers['Changed']=='yes']), len(text_removeNumbers), 100*len(text_removeNumbers[text_removeNumbers['Changed']=='yes'])/len(text_removeNumbers)))

200 of 997 (20.0602%) reviews have been changed.


In [6]:
for index, row in text_removeNumbers[text_removeNumbers['Changed']=='yes'].head().iterrows():
    print(row['TextBefore'],'->',row['TextAfter'])

synovus mobile apps is too slow! I'm still waiting. almost 1hr now! -> synovus mobile apps is too slow! I'm still waiting. almost hr now!
I like the location, and the indoor customer service. I like to bank with them because they have ATM that gives 1$ bills. Is excellent for mortgages. I did mine in 14 days and the person was accessible, trustworthy and friendly. Synovus is a great mortgage local bank. -> I like the location, and the indoor customer service. I like to bank with them because they have ATM that gives $ bills. Is excellent for mortgages. I did mine in  days and the person was accessible, trustworthy and friendly. Synovus is a great mortgage local bank.
It says "Application Failed to Load"...I have uninstalled, reinstalled, and still doesn't work. It has done this since DAY 1. Everytime I try to login through the internet, it makes me have to put in a verification code. Synovus and their app sucks with mobile banking...I MISS NBSC -> It says "Application Failed to Load"..

## 2. Replace Repetitions of Punctuation

Capturing these items will help us in our calculations since they tend to carry high emotions

This technique:
 - replaces repetitions of exlamation marks with the tag "multiExclamation"
 - replaces repetitions of question marks with the tag "multiQuestion"
 - replaces repetitions of stop marks with the tag "multiStop"
 
 **Example:** Bitcoin is Awesome!!--> Bitcoin is Awesome multiExclamation

In [7]:
def replaceMultiExclamationMark(text):
    """ Replaces repetitions of exlamation marks """
    text = re.sub(r"(\!)\1+", ' multiExclamation ', text)
    return text

def replaceMultiQuestionMark(text):
    """ Replaces repetitions of question marks """
    text = re.sub(r"(\?)\1+", ' multiQuestion ', text)
    return text

def replaceMultiStopMark(text):
    """ Replaces repetitions of stop marks """
    text = re.sub(r"(\.)\1+", ' multiStop ', text)
    return text

text_replaceRepOfPunct = pd.DataFrame(columns=['TextBefore', 'TextAfter', 'Changed'])
text_replaceRepOfPunct['TextBefore'] = text.copy()

In [8]:
for index, row in text_replaceRepOfPunct.iterrows():
    row['TextAfter'] = replaceMultiExclamationMark(str(row['TextBefore']))
    row['TextAfter'] = replaceMultiQuestionMark(str(row['TextBefore']))
    row['TextAfter'] = replaceMultiStopMark(str(row['TextBefore']))

In [9]:
text_replaceRepOfPunct['Changed'] = np.where(text_replaceRepOfPunct['TextBefore']==text_replaceRepOfPunct['TextAfter'], 'no', 'yes')
print("{} of {} ({:.4f}%) reviews have been changed.".format(len(text_replaceRepOfPunct[text_replaceRepOfPunct['Changed']=='yes']), len(text_replaceRepOfPunct), 100*len(text_replaceRepOfPunct[text_replaceRepOfPunct['Changed']=='yes'])/len(text_replaceRepOfPunct)))

93 of 997 (9.3280%) reviews have been changed.


In [10]:
for index, row in text_replaceRepOfPunct[text_replaceRepOfPunct['Changed']=='yes'].head().iterrows():
    print(row['TextBefore'],'->',row['TextAfter'])

They stole my id and open account in GA.. -> They stole my id and open account in GA multiStop 
It says "Application Failed to Load"...I have uninstalled, reinstalled, and still doesn't work. It has done this since DAY 1. Everytime I try to login through the internet, it makes me have to put in a verification code. Synovus and their app sucks with mobile banking...I MISS NBSC -> It says "Application Failed to Load" multiStop I have uninstalled, reinstalled, and still doesn't work. It has done this since DAY 1. Everytime I try to login through the internet, it makes me have to put in a verification code. Synovus and their app sucks with mobile banking multiStop I MISS NBSC
Synovus cannot send OTP overseas!! The online website constantly requests for a new OTP......I live mostly overseas now and do not have access to US mobile cell number and I am BLOCKED from banking!  Why can't the app generate OTP?  I have previously entered OTP on my laptop and could access banking website for a whil

## 3. Remove Punctuation
**Example:** Adam White, COO of the New York Stock Exchange's crypto venture, Bakkt, had this to say: "What we previously saw only in the domains of sovereign nation states—like money—now via technology and software, we can now do and accomplish in a decentralized way." --> Adam White COO of the New York Stock Exchanges crypto venture Bakkt had this to say What we previously saw only in the domains of sovereign nation states like money now via technology and software we can now do and accomplish in a decentralized way

In [11]:
import string
translator = str.maketrans('', '', string.punctuation)
text_removePunctuation = pd.DataFrame(columns=['TextBefore', 'TextAfter', 'Changed'])
text_removePunctuation['TextBefore'] = text.copy()

In [12]:
for index, row in text_removePunctuation.iterrows():
    row['TextAfter'] = str(row['TextBefore']).translate(translator)

In [13]:
text_removePunctuation['Changed'] = np.where(text_removePunctuation['TextBefore']==text_removePunctuation['TextAfter'], 'no', 'yes')
print("{} of {} ({:.4f}%) reviews have been changed.".format(len(text_removePunctuation[text_removePunctuation['Changed']=='yes']), len(text_removePunctuation), 100*len(text_removePunctuation[text_removePunctuation['Changed']=='yes'])/len(text_removePunctuation)))

885 of 997 (88.7663%) reviews have been changed.


In [14]:
for index, row in text_removePunctuation[text_removePunctuation['Changed']=='yes'].head().iterrows():
    print(row['TextBefore'],'->',row['TextAfter'])

It keeps giving me an error with my fingerprint scanner since the last update. Other than that, it's a good app. -> It keeps giving me an error with my fingerprint scanner since the last update Other than that its a good app
Where there when I need it. -> Where there when I need it
I love being able to check my account. But I lost my phone and I had no way into my account at all till I called. -> I love being able to check my account But I lost my phone and I had no way into my account at all till I called
They stole my id and open account in GA.. -> They stole my id and open account in GA
every time I open the app it says "application failed to load". stoptyping i have already updated. twice! sTill SUCKS -> every time I open the app it says application failed to load stoptyping i have already updated twice sTill SUCKS


In [15]:
for index, row in text_removePunctuation[text_removePunctuation['Changed']=='no'].head().iterrows():
    print(row['TextBefore'],'->',row['TextAfter'])

glitches every few weeks and i have to delete and redownload because it says invalid credentials -> glitches every few weeks and i have to delete and redownload because it says invalid credentials
will not work right it keeps givong errors -> will not work right it keeps givong errors
see -> see
one star I do not recommend -> one star I do not recommend
wonderful app -> wonderful app


## 4. Replace Contractions
This techniques replaces contractions to their equivalents.

**Example:** Millennials aren't buying homes -> Millennials are not buying homes

In [16]:
contraction_patterns = [ (r'won\'t', 'will not'), (r'can\'t', 'cannot'), (r'i\'m', 'i am'), (r'ain\'t', 'is not'), (r'(\w+)\'ll', '\g<1> will'), (r'(\w+)n\'t', '\g<1> not'),
                         (r'(\w+)\'ve', '\g<1> have'), (r'(\w+)\'s', '\g<1> is'), (r'(\w+)\'re', '\g<1> are'), (r'(\w+)\'d', '\g<1> would'), (r'&', 'and'), (r'dammit', 'damn it'), (r'dont', 'do not'), (r'wont', 'will not')
                        , (r'Can\'t', 'cannot'), (r'I\'m', 'i am'), (r'Are\'t', 'Are not') ]
def replaceContraction(text):
    patterns = [(re.compile(regex), repl) for (regex, repl) in contraction_patterns]
    for (pattern, repl) in patterns:
        (text, count) = re.subn(pattern, repl, text)
    return text

text_replaceContractions = pd.DataFrame(columns=['TextBefore', 'TextAfter', 'Changed'])
text_replaceContractions['TextBefore'] = text.copy()

In [17]:
for index, row in text_replaceContractions.iterrows():
    row['TextAfter'] = replaceContraction(str(row['TextBefore']))

In [18]:
text_replaceContractions['Changed'] = np.where(text_replaceContractions['TextBefore']==text_replaceContractions['TextAfter'], 'no', 'yes')
print("{} of {} ({:.4f}%) questions have been changed.".format(len(text_replaceContractions[text_replaceContractions['Changed']=='yes']), len(text_replaceContractions), 100*len(text_replaceContractions[text_replaceContractions['Changed']=='yes'])/len(text_replaceContractions)))

422 of 997 (42.3270%) questions have been changed.


In [19]:
for index, row in text_replaceContractions[text_replaceContractions['Changed']=='yes'].head().iterrows():
    print(row['TextBefore'],'->',row['TextAfter'])

It keeps giving me an error with my fingerprint scanner since the last update. Other than that, it's a good app. -> It keeps giving me an error with my fingerprint scanner since the last update. Other than that, it is a good app.
synovus mobile apps is too slow! I'm still waiting. almost 1hr now! -> synovus mobile apps is too slow! i am still waiting. almost 1hr now!
It says "Application Failed to Load"...I have uninstalled, reinstalled, and still doesn't work. It has done this since DAY 1. Everytime I try to login through the internet, it makes me have to put in a verification code. Synovus and their app sucks with mobile banking...I MISS NBSC -> It says "Application Failed to Load"...I have uninstalled, reinstalled, and still does not work. It has done this since DAY 1. Everytime I try to login through the internet, it makes me have to put in a verification code. Synovus and their app sucks with mobile banking...I MISS NBSC
Application failed to load. Last three months. Can't use it.

## 5. Lowercase
**Example:** The Amazon Fires Have Disastrous Environmental Consequences -> the amazon fires have disastrous environmental consequences

In [20]:
text_lowercase = pd.DataFrame(columns=['TextBefore', 'TextAfter', 'Changed'])
text_lowercase['TextBefore'] = text.copy()

In [21]:
for index, row in text_lowercase.iterrows():
    row['TextAfter'] = str(row['TextBefore']).lower()

In [22]:
text_lowercase['Changed'] = np.where(text_lowercase['TextBefore']==text_lowercase['TextAfter'], 'no', 'yes')
print("{} of {} ({:.4f}%) reviews have been changed.".format(len(text_lowercase[text_lowercase['Changed']=='yes']), len(text_lowercase), 100*len(text_lowercase[text_lowercase['Changed']=='yes'])/len(text_lowercase)))

927 of 997 (92.9789%) reviews have been changed.


In [23]:
for index, row in text_lowercase[text_lowercase['Changed']=='yes'].head().iterrows():
    print(row['TextBefore'],'->',row['TextAfter'])

It keeps giving me an error with my fingerprint scanner since the last update. Other than that, it's a good app. -> it keeps giving me an error with my fingerprint scanner since the last update. other than that, it's a good app.
Where there when I need it. -> where there when i need it.
I love being able to check my account. But I lost my phone and I had no way into my account at all till I called. -> i love being able to check my account. but i lost my phone and i had no way into my account at all till i called.
They stole my id and open account in GA.. -> they stole my id and open account in ga..
every time I open the app it says "application failed to load". stoptyping i have already updated. twice! sTill SUCKS -> every time i open the app it says "application failed to load". stoptyping i have already updated. twice! still sucks


Some question are written only in lowercase. This happens when they start with a number.

In [24]:
for index, row in text_lowercase[text_lowercase['Changed']=='no'].head().iterrows():
    print(row['TextBefore'],'->',row['TextAfter'])

balance is never correct causing me to overdraft my account. happens every month. not cool -> balance is never correct causing me to overdraft my account. happens every month. not cool
glitches every few weeks and i have to delete and redownload because it says invalid credentials -> glitches every few weeks and i have to delete and redownload because it says invalid credentials
will not work right it keeps givong errors -> will not work right it keeps givong errors
see -> see
wonderful app -> wonderful app


## 6. Replace Negations with Antonyms
**Example:** She is not happy with this gift :( --> She is sad with this gift :(

In [25]:
import nltk
from nltk.corpus import wordnet

def replace(word, pos=None):
    """ Creates a set of all antonyms for the word and if there is only one antonym, it returns it """
    antonyms = set()
    for syn in wordnet.synsets(word, pos=pos):
        for lemma in syn.lemmas():
            for antonym in lemma.antonyms():
                antonyms.add(antonym.name())
    if len(antonyms) == 1:
        return antonyms.pop()
    else:
        return None

def replaceNegations(text):
    """ Finds "not" and antonym for the next word and if found, replaces not and the next word with the antonym """
    i, l = 0, len(text)
    words = []
    while i < l:
        word = text[i]
        if word == 'not' and i+1 < l:
            ant = replace(text[i+1])
            if ant:
                words.append(ant)
                i += 2
                continue
        words.append(word)
        i += 1
    return words

def tokenize1(text):
    tokens = nltk.word_tokenize(text)
    tokens = replaceNegations(tokens)
    text = " ".join(tokens)
    return text

text_replaceNegations = pd.DataFrame(columns=['TextBefore', 'TextAfter', 'Changed'])
text_replaceNegations['TextBefore'] = text.copy()

In [26]:
for index, row in text_replaceNegations.iterrows():
    row['TextAfter'] = tokenize1(str(row['TextBefore']))

In [27]:
text_replaceNegations['Changed'] = np.where(text_replaceNegations['TextBefore'].str.replace(" ","")==text_replaceNegations['TextAfter'].str.replace(" ","").str.replace("``",'"').str.replace("''",'"'), 'no', 'yes')
print("{} of {} ({:.4f}%) reviews have been changed.".format(len(text_replaceNegations[text_replaceNegations['Changed']=='yes']), len(text_replaceNegations), 100*len(text_replaceNegations[text_replaceNegations['Changed']=='yes'])/len(text_replaceNegations)))

95 of 997 (9.5286%) reviews have been changed.


In [28]:
for index, row in text_replaceNegations[text_replaceNegations['Changed']=='yes'].head().iterrows():
    print(row['TextBefore'],'->',row['TextAfter'])

edit: this app works properly now. it's pretty good. 1 star to 5 original: The amount you entered did not match the amount detected. Stahppp -> edit : this app works properly now . it 's pretty good . 1 star to 5 original : The amount you entered did disagree the amount detected . Stahppp
Slow, glitchy, unresponsive. The old app was way better, this one is terrible. Somehow it's already on V3.0? Of all the mobile banking apps I've used, this is unfortunately the worst. Edit: App still has very slow load times, even with a fresh install. UI is slow. Account balances do not refresh after a transfer, even with dragging to refresh. The menu options are odd (pay a bill and transfer money are the same button). I still do not recommend this app. -> Slow , glitchy , unresponsive . The old app was way better , this one is terrible . Somehow it 's already on V3.0 ? Of all the mobile banking apps I 've used , this is unfortunately the worst . Edit : App still has very slow load times , even with 

## 7. Handle Capitalized Words
**Example:** Hey SIRI, whos better: you or ALEXA? --> Hey ALL_CAPS_SIRI , whos better : you or ALL_CAPS_ALEXA 

In [29]:
def addCapTag(word):
    """ Finds a word with at least 3 characters capitalized and adds the tag ALL_CAPS_ """
    if(len(re.findall("[A-Z]{3,}", word))):
        word = word.replace('\\', '' )
        transformed = re.sub("[A-Z]{3,}", "ALL_CAPS_"+word, word)
        return transformed
    else:
        return word

def tokenize2(text):
    finalTokens = []
    tokens = nltk.word_tokenize(text)
    for w in tokens:
        finalTokens.append(addCapTag(w))
    text = " ".join(finalTokens)
    return text

text_handleCapWords = pd.DataFrame(columns=['TextBefore', 'TextAfter', 'Changed'])
text_handleCapWords['TextBefore'] = text.copy()

In [30]:
for index, row in text_handleCapWords.iterrows():
    row['TextAfter'] = tokenize2(str(row['TextBefore']))

In [31]:
text_handleCapWords['Changed'] = np.where(text_handleCapWords['TextBefore'].str.replace(" ","")==text_handleCapWords['TextAfter'].str.replace(" ","").str.replace("``",'"').str.replace("''",'"'), 'no', 'yes')
print("{} of {} ({:.4f}%) reviews have been changed.".format(len(text_handleCapWords[text_handleCapWords['Changed']=='yes']), len(text_handleCapWords), 100*len(text_handleCapWords[text_handleCapWords['Changed']=='yes'])/len(text_handleCapWords)))

149 of 997 (14.9448%) reviews have been changed.


In [32]:
for index, row in text_handleCapWords[text_handleCapWords['Changed']=='yes'].head().iterrows():
    print(row['TextBefore'],'->',row['TextAfter'])

every time I open the app it says "application failed to load". stoptyping i have already updated. twice! sTill SUCKS -> every time I open the app it says `` application failed to load '' . stoptyping i have already updated . twice ! sTill ALL_CAPS_SUCKS
I like the location, and the indoor customer service. I like to bank with them because they have ATM that gives 1$ bills. Is excellent for mortgages. I did mine in 14 days and the person was accessible, trustworthy and friendly. Synovus is a great mortgage local bank. -> I like the location , and the indoor customer service . I like to bank with them because they have ALL_CAPS_ATM that gives 1 $ bills . Is excellent for mortgages . I did mine in 14 days and the person was accessible , trustworthy and friendly . Synovus is a great mortgage local bank .
It says "Application Failed to Load"...I have uninstalled, reinstalled, and still doesn't work. It has done this since DAY 1. Everytime I try to login through the internet, it makes me ha

## 8. Remove Stopwords

Stopwords are considered as noise in the data

**Example:** Bitcoin, BAT and Dent are my favourite cryptocurrencies. -> Bitcoin , BAT Dent favourite cryptocurrencies

In [33]:
from nltk.corpus import stopwords
stoplist = stopwords.words('english')

def tokenize(text):
    finalTokens = []
    tokens = nltk.word_tokenize(text)
    for w in tokens:
        if (w not in stoplist):
            finalTokens.append(w)
    text = " ".join(finalTokens)
    return text

text_removeStopwords = pd.DataFrame(columns=['TextBefore', 'TextAfter', 'Changed'])
text_removeStopwords['TextBefore'] = text.copy()

In [34]:
for index, row in text_removeStopwords.iterrows():
    row['TextAfter'] = tokenize(str(row['TextBefore']))

In [35]:
text_removeStopwords['Changed'] = np.where(text_removeStopwords['TextBefore'].str.replace(" ","")==text_removeStopwords['TextAfter'].str.replace(" ","").str.replace("``",'"').str.replace("''",'"'), 'no', 'yes')
print("{} of {} ({:.4f}%) reviews have been changed.".format(len(text_removeStopwords[text_removeStopwords['Changed']=='yes']), len(text_removeStopwords), 100*len(text_removeStopwords[text_removeStopwords['Changed']=='yes'])/len(text_removeStopwords)))

922 of 997 (92.4774%) reviews have been changed.


In [36]:
for index, row in text_removeStopwords[text_removeStopwords['Changed']=='yes'].head().iterrows():
    print(row['TextBefore'],'->',row['TextAfter'])

It keeps giving me an error with my fingerprint scanner since the last update. Other than that, it's a good app. -> It keeps giving error fingerprint scanner since last update . Other , 's good app .
Where there when I need it. -> Where I need .
I love being able to check my account. But I lost my phone and I had no way into my account at all till I called. -> I love able check account . But I lost phone I way account till I called .
They stole my id and open account in GA.. -> They stole id open account GA..
every time I open the app it says "application failed to load". stoptyping i have already updated. twice! sTill SUCKS -> every time I open app says `` application failed load '' . stoptyping already updated . twice ! sTill SUCKS


## 9. Replace Elongated Words
This technique replaces an elongated word with its basic form, unless the word exists in the lexicon.

**Example:** Bitcoin is going to the mooooooooooooooooooooooooooon --> Bitcoin is going to the moon

In [37]:
def replaceElongated(word):
    """ Replaces an elongated word with its basic form, unless the word exists in the lexicon """

    repeat_regexp = re.compile(r'(\w*)(\w)\2(\w*)')
    repl = r'\1\2\3'
    if wordnet.synsets(word):
        return word
    repl_word = repeat_regexp.sub(repl, word)
    if repl_word != word:      
        return replaceElongated(repl_word)
    else:       
        return repl_word
    
def tokenize(text):
    finalTokens = []
    tokens = nltk.word_tokenize(text)
    for w in tokens:
        finalTokens.append(replaceElongated(w))
    text = " ".join(finalTokens)
    return text

text_removeElWords = pd.DataFrame(columns=['TextBefore', 'TextAfter', 'Changed'])
text_removeElWords['TextBefore'] = text.copy()

In [38]:
for index, row in text_removeElWords.iterrows():
    row['TextAfter'] = tokenize(str(row['TextBefore']))

In [39]:
text_removeElWords['Changed'] = np.where(text_removeElWords['TextBefore'].str.replace(" ","")==text_removeElWords['TextAfter'].str.replace(" ","").str.replace("``",'"').str.replace("''",'"'), 'no', 'yes')
print("{} of {} ({:.4f}%) reviews have been changed.".format(len(text_removeElWords[text_removeElWords['Changed']=='yes']), len(text_removeElWords), 100*len(text_removeElWords[text_removeElWords['Changed']=='yes'])/len(text_removeElWords)))

627 of 997 (62.8887%) reviews have been changed.


In [40]:
for index, row in text_removeElWords[text_removeElWords['Changed']=='yes'].head().iterrows():
    print(row['TextBefore'],'->',row['TextAfter'])

It keeps giving me an error with my fingerprint scanner since the last update. Other than that, it's a good app. -> It keeps giving me an error with my fingerprint scanner since the last update . Other than that , it 's a good ap .
every time I open the app it says "application failed to load". stoptyping i have already updated. twice! sTill SUCKS -> every time I open the ap it says `` application failed to load '' . stoptyping i have already updated . twice ! sTill SUCKS
synovus mobile apps is too slow! I'm still waiting. almost 1hr now! -> synovus mobile aps is too slow ! I 'm still waiting . almost 1hr now !
It says "Application Failed to Load"...I have uninstalled, reinstalled, and still doesn't work. It has done this since DAY 1. Everytime I try to login through the internet, it makes me have to put in a verification code. Synovus and their app sucks with mobile banking...I MISS NBSC -> It says `` Application Failed to Load '' ... I have uninstaled , reinstalled , and still does n

## 10. Stemming/Lemmatizing
**Example:** I love swimming!!! --> I love smim!!!

In [41]:
from nltk.stem.porter import PorterStemmer
stemmer = PorterStemmer() #set stemmer
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer() # set lemmatizer

def tokenize(text):
    finalTokens = []
    tokens = nltk.word_tokenize(text)
    for w in tokens:
        finalTokens.append(stemmer.stem(w)) 
    text = " ".join(finalTokens)
    return text

text_stemming = pd.DataFrame(columns=['TextBefore', 'TextAfter', 'Changed'])
text_stemming['TextBefore'] = text.copy()

In [42]:
for index, row in text_stemming.iterrows():
    row['TextAfter'] = tokenize(str(row['TextBefore']))

In [43]:
text_stemming['Changed'] = np.where(text_stemming['TextBefore'].str.replace(" ","")==text_stemming['TextAfter'].str.replace(" ","").str.replace("``",'"').str.replace("''",'"'), 'no', 'yes')
print("{} of {} ({:.4f}%) reviews have been changed.".format(len(text_stemming[text_stemming['Changed']=='yes']), len(text_stemming), 100*len(text_stemming[text_stemming['Changed']=='yes'])/len(text_stemming)))

968 of 997 (97.0913%) reviews have been changed.


In [44]:
for index, row in text_stemming[text_stemming['Changed']=='yes'].head().iterrows():
    print(row['TextBefore'],'->',row['TextAfter'])

It keeps giving me an error with my fingerprint scanner since the last update. Other than that, it's a good app. -> It keep give me an error with my fingerprint scanner sinc the last updat . other than that , it 's a good app .
Where there when I need it. -> where there when I need it .
I love being able to check my account. But I lost my phone and I had no way into my account at all till I called. -> I love be abl to check my account . but I lost my phone and I had no way into my account at all till I call .
They stole my id and open account in GA.. -> they stole my id and open account in ga..
every time I open the app it says "application failed to load". stoptyping i have already updated. twice! sTill SUCKS -> everi time I open the app it say `` applic fail to load '' . stoptyp i have alreadi updat . twice ! still suck


## Combos
Now let's apply all these techniques at the same time. The order is essential here.


In [82]:
def tokenize(text):
    finalTokens = []
    tokens = nltk.word_tokenize(text)
    for w in tokens:
        if (w not in stoplist):
            w = addCapTag(w) # Handle Capitalized Words
            w = w.lower() # Lowercase
            w = replaceElongated(w) # Replace Elongated Words
            w = stemmer.stem(w) # Stemming
            finalTokens.append(w)
    text = " ".join(finalTokens)
    return text

text_combos = pd.DataFrame(columns=['TextBefore', 'TextAfter', 'Changed'])
text_combos['TextBefore'] = text.copy()

In [83]:
for index, row in text_combos.iterrows():
    row['TextAfter'] = replaceContraction(str(row['TextBefore'])) # Replace Contractions
    row['TextAfter'] = removeNumbers(str(row['TextAfter'])) # Remove Integers
    row['TextAfter'] = replaceMultiExclamationMark(str(row['TextAfter'])) # Replace Multi Exclamation Marks
    row['TextAfter'] = replaceMultiQuestionMark(str(row['TextAfter'])) # Replace Multi Question Marks
    row['TextAfter'] = replaceMultiStopMark(str(row['TextAfter'])) # Repalce Multi Stop Marks
    row['TextAfter'] = str(row['TextAfter']).translate(translator) # Remove Punctuation
    row['TextAfter'] = tokenize(str(row['TextAfter']))

In [84]:
text_combos['Changed'] = np.where(text_combos['TextBefore'].str.replace(" ","")==text_combos['TextAfter'].str.replace(" ","").str.replace("``",'"').str.replace("''",'"'), 'no', 'yes')
print("{} of {} ({:.4f}%) reviews have been changed.".format(len(text_combos[text_combos['Changed']=='yes']), len(text_combos), 100*len(text_combos[text_combos['Changed']=='yes'])/len(text_combos)))

992 of 997 (99.4985%) reviews have been changed.


In [85]:
for index, row in text_combos[text_combos['Changed']=='yes'].head().iterrows():
    print(row['TextBefore'],'->',row['TextAfter'])

It keeps giving me an error with my fingerprint scanner since the last update. Other than that, it's a good app. -> it keep give error fingerprint scanner sinc last updat other good ap
Where there when I need it. -> where i need
I love being able to check my account. But I lost my phone and I had no way into my account at all till I called. -> i love abl check account but i lost phone i way account till i call
They stole my id and open account in GA.. -> they stole id open account ga multistop
every time I open the app it says "application failed to load". stoptyping i have already updated. twice! sTill SUCKS -> everi time i open ap say applic fail load stoptyp alreadi updat twice still al_caps_suck


In [86]:
text_combos.head()

Unnamed: 0,TextBefore,TextAfter,Changed
0,"It keeps giving me an error with my fingerprint scanner since the last update. Other than that, it's a good app.",it keep give error fingerprint scanner sinc last updat other good ap,yes
1,Where there when I need it.,where i need,yes
2,I love being able to check my account. But I lost my phone and I had no way into my account at all till I called.,i love abl check account but i lost phone i way account till i call,yes
3,They stole my id and open account in GA..,they stole id open account ga multistop,yes
4,"every time I open the app it says ""application failed to load"". stoptyping i have already updated. twice! sTill SUCKS",everi time i open ap say applic fail load stoptyp alreadi updat twice still al_caps_suck,yes


Oops! I'm not able to clearly see the complete review. I'll increase the display width of the pandas data field

In [87]:
pd.options.display.max_colwidth = 150
text_combos.head(10)

Unnamed: 0,TextBefore,TextAfter,Changed
0,"It keeps giving me an error with my fingerprint scanner since the last update. Other than that, it's a good app.",it keep give error fingerprint scanner sinc last updat other good ap,yes
1,Where there when I need it.,where i need,yes
2,I love being able to check my account. But I lost my phone and I had no way into my account at all till I called.,i love abl check account but i lost phone i way account till i call,yes
3,They stole my id and open account in GA..,they stole id open account ga multistop,yes
4,"every time I open the app it says ""application failed to load"". stoptyping i have already updated. twice! sTill SUCKS",everi time i open ap say applic fail load stoptyp alreadi updat twice still al_caps_suck,yes
5,Very helpful and convenient.,veri help conveni,yes
6,balance is never correct causing me to overdraft my account. happens every month. not cool,balanc never correct caus overdraft account happen everi month cool,yes
7,synovus mobile apps is too slow! I'm still waiting. almost 1hr now!,synovu mobil ap slow still wait almost hr,yes
8,glitches every few weeks and i have to delete and redownload because it says invalid credentials,glitch everi week delet redownload say invalid credenti,yes
9,"I like the location, and the indoor customer service. I like to bank with them because they have ATM that gives 1$ bills. Is excellent for mortgag...",i like locat indoor custom servic i like bank al_caps_atm give bill is excel mortgag i mine day person access trustworthi friendli synovu great mo...,yes


In [88]:
print(reviews.shape)
print(text_combos.shape)

(997, 8)
(997, 3)


let's concatenate the two data sets

In [89]:
cleansed_data = pd.concat([reviews, text_combos],axis =1 , sort = False)
cleansed_data.head(10)

Unnamed: 0,Source,Date,Title,description,Name,Rating,Version,IOS/Android,TextBefore,TextAfter,Changed
0,My Synovus Mobile Banking,8/23/2019,,"It keeps giving me an error with my fingerprint scanner since the last update. Other than that, it's a good app.",Nikki Nyce,3,,Play Store,"It keeps giving me an error with my fingerprint scanner since the last update. Other than that, it's a good app.",it keep give error fingerprint scanner sinc last updat other good ap,yes
1,My Synovus Mobile Banking,8/20/2019,,Where there when I need it.,Leesa Gore,5,,Play Store,Where there when I need it.,where i need,yes
2,My Synovus Mobile Banking,8/19/2019,,I love being able to check my account. But I lost my phone and I had no way into my account at all till I called.,Sylvia Mabrey,4,,Play Store,I love being able to check my account. But I lost my phone and I had no way into my account at all till I called.,i love abl check account but i lost phone i way account till i call,yes
3,My Synovus Mobile Banking,8/14/2019,,They stole my id and open account in GA..,Sherye Epps,1,,Play Store,They stole my id and open account in GA..,they stole id open account ga multistop,yes
4,My Synovus Mobile Banking,8/11/2019,,"every time I open the app it says ""application failed to load"". stoptyping i have already updated. twice! sTill SUCKS",A Google User,1,,Play Store,"every time I open the app it says ""application failed to load"". stoptyping i have already updated. twice! sTill SUCKS",everi time i open ap say applic fail load stoptyp alreadi updat twice still al_caps_suck,yes
5,My Synovus Mobile Banking,8/10/2019,,Very helpful and convenient.,Brenden Mercado,4,,Play Store,Very helpful and convenient.,veri help conveni,yes
6,My Synovus Mobile Banking,8/6/2019,,balance is never correct causing me to overdraft my account. happens every month. not cool,Terrance Green,1,,Play Store,balance is never correct causing me to overdraft my account. happens every month. not cool,balanc never correct caus overdraft account happen everi month cool,yes
7,My Synovus Mobile Banking,8/1/2019,,synovus mobile apps is too slow! I'm still waiting. almost 1hr now!,Estrella Ernfridsson,1,,Play Store,synovus mobile apps is too slow! I'm still waiting. almost 1hr now!,synovu mobil ap slow still wait almost hr,yes
8,My Synovus Mobile Banking,7/31/2019,,glitches every few weeks and i have to delete and redownload because it says invalid credentials,DBoi Graves,2,,Play Store,glitches every few weeks and i have to delete and redownload because it says invalid credentials,glitch everi week delet redownload say invalid credenti,yes
9,My Synovus Mobile Banking,7/29/2019,,"I like the location, and the indoor customer service. I like to bank with them because they have ATM that gives 1$ bills. Is excellent for mortgag...",Dialmarys Velez,5,,Play Store,"I like the location, and the indoor customer service. I like to bank with them because they have ATM that gives 1$ bills. Is excellent for mortgag...",i like locat indoor custom servic i like bank al_caps_atm give bill is excel mortgag i mine day person access trustworthi friendli synovu great mo...,yes


# Finding Sentiments:

Now we got our data in a good shape and in a clean format. We should now focus on calculating the sentiments of the customers reviews. 

There are mainly two approaches for performing sentiment analysis.

- Lexicon-based: count number of positive and negative words in given text and the larger count will be the sentiment of text.

- Machine learning based approach: Develop a classification model, which is trained using the pre-labeled dataset of positive, negative, and neutral.


My idea is combine the two approaches and come up with a multi hybrid approach to to predict the sentiments for the MySynovus reviews.But, here is a problem with Machine Learning approach, we don't have a pre-labled dataset for synovus reviews where we could train different models on the pre-labled data and later come up with the best model to predict the sentiment of the reviews. I'm thinking to use the model trained using thousands of pre-classified tweets.Bottom line point is 

Multi Hybrid Approach:

- Lexicon: Applying Vader Algorithm and TextBlob Algorithm on the description of the review and the title of the review

- ML Based: Applying the model trained on classifying the Tweets  

  

In [90]:
cleansed_data = cleansed_data[cleansed_data['Date'] >= '2019-01-01']

In [91]:
import nltk
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\saira\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [92]:
#Tuning the sentiments of the words specific to our area

new_words = {
    'freeze': -1,
    'error': -1,
    'reinstall': -1,
    'uninstall': -1,
    'fix': -1,
    'garbage': -1,
    'terrible': -1,
    'balance': -1,
    'slow': -1,
    'crashes': -1
}

In [93]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sai = SentimentIntensityAnalyzer()
sai.lexicon.update(new_words)

In [94]:
# Add a new column with scores JSON
#Applying Vader approch to the descriptions of the reviews
cleansed_data['scores'] = cleansed_data['TextAfter'].apply(lambda review:sai.polarity_scores(review))

#Add a new column with the compound score value extracted from score column
cleansed_data['vader_compound'] = cleansed_data['scores'].apply(lambda d:d['compound'])
cleansed_data['vader_positive'] = cleansed_data['scores'].apply(lambda d:d['pos'])
cleansed_data['vader_negative'] = cleansed_data['scores'].apply(lambda d:d['neg'])
cleansed_data['vader_compscore'] = cleansed_data['vader_compound'].apply(lambda score: 'positive' if score>=0.05 else('negative' if score <= -0.05 else 'neutral'))

In [95]:
# def tokenize2(text):
#     finalTokens = []
#     tokens = nltk.word_tokenize(text)
#     for w in tokens:
#         if (w not in stoplist):
#             #w = addCapTag(w) # Handle Capitalized Words
#             #w = w.lower() # Lowercase
#             w = replaceElongated(w) # Replace Elongated Words
#             w = stemmer.stem(w) # Stemming
#             finalTokens.append(w)
#     text = " ".join(finalTokens)
#     return text


# title_combos = pd.DataFrame(columns=['TitleBefore', 'TitleAfter', 'Changed'])
# title_combos['TitleBefore'] = title.copy()

In [96]:
# for index, row in text_combos.iterrows():
#     #row['TitleAfter'] = replaceContraction((row['TitleBefore'])) # Replace Contractions
#     row['TitleAfter'] = removeNumbers(str(row['TitleAfter'])) # Remove Integers
#     row['TitleAfter'] = replaceMultiExclamationMark(str(row['TitleAfter'])) # Replace Multi Exclamation Marks
#     row['TitleAfter'] = replaceMultiQuestionMark(str(row['TitleAfter'])) # Replace Multi Question Marks
#     row['TitleAfter'] = replaceMultiStopMark(str(row['TitleAfter'])) # Repalce Multi Stop Marks
#     row['TitleAfter'] = str(row['TitleAfter']).translate(translator) # Remove Punctuation
#     row['TitleAfter'] = tokenize2(str(row['TitleAfter']))

In [97]:
# Add a new column with scores JSON
#Applying Vader approch to the Titles of the reviews
cleansed_data['Title_scores'] = cleansed_data['Title'].apply(lambda review:sai.polarity_scores(str(review)))
cleansed_data['Title_vader_compound'] = cleansed_data['Title_scores'].apply(lambda d:d['compound'])
cleansed_data['Title_vader_compscore'] = cleansed_data['Title_vader_compound'].apply(lambda score: 'positive' if score>=0.05 else('negative' if score <= -0.05 else 'neutral'))

In [98]:
#APplying Textblob appraoch on the descriptions of the reviews
from textblob import TextBlob
def detect_polarity(TextAfter):
    return TextBlob(TextAfter).sentiment.polarity 
def detect_subjectivity(TextAfter):
    return TextBlob(TextAfter).sentiment.subjectivity

cleansed_data['polarity'] = cleansed_data.TextAfter.apply(detect_polarity)
cleansed_data['subjectivity'] = cleansed_data.TextAfter.apply(detect_subjectivity)
cleansed_data['textblob_score'] = cleansed_data['polarity'].apply(lambda polarity: 'positive' if polarity > 0 else 'negative')

In [99]:
#APplying Textblob appraoch on the Titles of the reviews
from textblob import TextBlob
def detect_polarity(Title):
    return TextBlob(str(Title)).sentiment.polarity 
def detect_subjectivity(Title):
    return TextBlob(str(Title)).sentiment.subjectivity


In [100]:
cleansed_data['Title_polarity'] = cleansed_data['Title'].apply(detect_polarity)
cleansed_data['Title_subjectivity'] = cleansed_data['Title'].apply(detect_subjectivity)
cleansed_data['Title_textblob_score'] = cleansed_data['Title_polarity'].apply(lambda polarity: 'positive' if polarity > 0 else 'negative')
cleansed_data['rating_score'] = np.where(cleansed_data['Rating'] > 2,'positive', 'negative')

In [101]:
pd.set_option('display.max_columns', None)
cleansed_data.head()

Unnamed: 0,Source,Date,Title,description,Name,Rating,Version,IOS/Android,TextBefore,TextAfter,Changed,scores,vader_compound,vader_positive,vader_negative,vader_compscore,Title_scores,Title_vader_compound,Title_vader_compscore,polarity,subjectivity,textblob_score,Title_polarity,Title_subjectivity,Title_textblob_score,rating_score
0,My Synovus Mobile Banking,8/23/2019,,"It keeps giving me an error with my fingerprint scanner since the last update. Other than that, it's a good app.",Nikki Nyce,3,,Play Store,"It keeps giving me an error with my fingerprint scanner since the last update. Other than that, it's a good app.",it keep give error fingerprint scanner sinc last updat other good ap,yes,"{'neg': 0.134, 'neu': 0.671, 'pos': 0.195, 'compound': 0.2263}",0.2263,0.195,0.134,positive,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}",0.0,neutral,0.191667,0.347222,positive,0.0,0.0,negative,positive
1,My Synovus Mobile Banking,8/20/2019,,Where there when I need it.,Leesa Gore,5,,Play Store,Where there when I need it.,where i need,yes,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}",0.0,0.0,0.0,neutral,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}",0.0,neutral,0.0,0.0,negative,0.0,0.0,negative,positive
2,My Synovus Mobile Banking,8/19/2019,,I love being able to check my account. But I lost my phone and I had no way into my account at all till I called.,Sylvia Mabrey,4,,Play Store,I love being able to check my account. But I lost my phone and I had no way into my account at all till I called.,i love abl check account but i lost phone i way account till i call,yes,"{'neg': 0.203, 'neu': 0.619, 'pos': 0.179, 'compound': -0.09}",-0.09,0.179,0.203,negative,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}",0.0,neutral,0.5,0.6,positive,0.0,0.0,negative,positive
3,My Synovus Mobile Banking,8/14/2019,,They stole my id and open account in GA..,Sherye Epps,1,,Play Store,They stole my id and open account in GA..,they stole id open account ga multistop,yes,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}",0.0,0.0,0.0,neutral,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}",0.0,neutral,0.0,0.5,negative,0.0,0.0,negative,negative
4,My Synovus Mobile Banking,8/11/2019,,"every time I open the app it says ""application failed to load"". stoptyping i have already updated. twice! sTill SUCKS",A Google User,1,,Play Store,"every time I open the app it says ""application failed to load"". stoptyping i have already updated. twice! sTill SUCKS",everi time i open ap say applic fail load stoptyp alreadi updat twice still al_caps_suck,yes,"{'neg': 0.212, 'neu': 0.788, 'pos': 0.0, 'compound': -0.5423}",-0.5423,0.0,0.212,negative,"{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}",0.0,neutral,-0.25,0.4,negative,0.0,0.0,negative,negative


In [None]:
# labels = df_train.target.unique().tolist()
# labels.append(NEUTRAL)
# labels

In [None]:
# encoder = LabelEncoder()
# encoder.fit(df_train.target.tolist())

# y_train = encoder.transform(df_train.target.tolist())
# y_test = encoder.transform(df_test.target.tolist())

# y_train = y_train.reshape(-1,1)
# y_test = y_test.reshape(-1,1)

# print("y_train",y_train.shape)
# print("y_test",y_test.shape)

In [None]:
# embedding_matrix = np.zeros((vocab_size, W2V_SIZE))
# for word, i in tokenizer.word_index.items():
#   if word in w2v_model.wv:
#     embedding_matrix[i] = w2v_model.wv[word]
# print(embedding_matrix.shape)

In [None]:
# model = Sequential()
# model.add(embedding_layer)
# model.add(Dropout(0.5))
# model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
# model.add(Dense(1, activation='sigmoid'))

# model.summary()

In [None]:
# %%time
# history = model.fit(x_train, y_train,
#                     batch_size=BATCH_SIZE,
#                     epochs=EPOCHS,
#                     validation_split=0.1,
#                     verbose=1,
#                     callbacks=callbacks)

In [None]:
# %%time
# score = model.evaluate(x_test, y_test, batch_size=BATCH_SIZE)
# print()
# print("ACCURACY:",score[1])
# print("LOSS:",score[0])

In [108]:
# cleansed_data = cleansed_data.replace('positive', 1)
# cleansed_data = cleansed_data.replace('negative', -1)
# cleansed_data = cleansed_data.replace('neutral', 0)
# cleansed_data['Total sentiment'] = np.where((cleansed_data['vader_compscore'] +cleansed_data['textblob_score'] + cleansed_data['Title_textblob_score'] +cleansed_data['Title_vader_compscore'] + cleansed_data['rating_score'] ) > 0, 'positive', 'negative')

Alright, we have successfully classified the sentiments of the reviews, but, how do we determine the accuracy of our approach?

In [109]:
 pd.crosstab(cleansed_data['Total sentiment'],cleansed_data['Rating'], rownames=['Predicted sentiment'], colnames=['Ratings'],margins=True)

Ratings,1,2,3,4,5,All
Predicted sentiment,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
negative,230,65,10,11,21,337
positive,11,7,24,17,55,114
All,241,72,34,28,76,451


337 out of 451 reviews, 74.7%, of the total reviews have negative sentiments. Majority of the one star and two star reviews have negative sentiments which makes sense that customers who are not happy with  one of the aspects of the service/app made them rate One or Two stars. We could confidently say that our model's accuracy is very good. 

# Next Steps

How cool will this be if we build this in real time? Just like AppDynamics, a performance monitoring tool; we could build something similar to AppD which acts like a social media monitoring tool giving us the overall sentiment of customers reviews/posts/comments filtering by the duration of the posts?????