# Using speech recogniser to understand sentiment of a customer

### So in this I used a previously made review dataset for sentiment analysis , that is by imdb as it has good classified and unbiased dataset. Then we use the speech recogniser (Google Speech Recognition) to convert the voice sample review given by a customer into text and then analyse it using the trained classifier to know whether the customer had a positive or negative review of the product.

### First we start off with importing the essential libraries

In [1]:
#import essential libraries
import pandas as pd
import numpy as np
import nltk

### Import required components from the "sklearn" library

In [2]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, precision_score

### Using the "pandas" library, we read the IMDB_Dataset

In [3]:
imdb = pd.read_csv("IMDB_Dataset.csv")
#reading in the dataset

In [4]:
imdb.shape

(50000, 2)

### We check if there are any empty cells or not

In [5]:
imdb.isnull().sum()

review       0
sentiment    0
dtype: int64

### The sentiment labels are then mapped to 1 (for positive) and 0 (for negative)

In [6]:
imdb.sentiment = imdb.sentiment.map({'positive':1,'negative': 0})
imdb.head()
#mapping positive as 1 and negative as 0

Unnamed: 0,review,sentiment
0,One of the other reviewers has mentioned that ...,1
1,A wonderful little production. <br /><br />The...,1
2,I thought this was a wonderful way to spend ti...,1
3,Basically there's a family where a little boy ...,0
4,"Petter Mattei's ""Love in the Time of Money"" is...",1


### This is a function made for pre-processing of given text data (data is cleaned here)

In [7]:
import re
import string

def clean_text_round1(text):
    '''Make text lowercase, remove text in square brackets, remove punctuation and remove words containing numbers.'''
    text = text.lower()
    text = re.sub('\[.*?\]', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub('\w*\d\w*', '', text)
    return text

round1 = lambda x: clean_text_round1(x)

In [8]:
imdb['review']=imdb['review'].apply(round1)

In [9]:
type(imdb['review'][0])

str

###### Uncomment the below code if punkt and wordnet from nltk had not been downloaded before

In [10]:
#nltk.download('punkt')
#nltk.download('wordnet')

### The data is tokenized

In [11]:
#Tokenization
from nltk.tokenize import word_tokenize
def token(list):
    return (word_tokenize(list))
imdb['review']=imdb['review'].apply(token)


### Lemmatization is applied here on the data and then untokenized

In [12]:
#limmatizing the data
from nltk.stem import wordnet
from nltk.stem import WordNetLemmatizer
word_lem=WordNetLemmatizer()
for review in imdb['review']:
    for word in review:
        word_lem.lemmatize(word)
def untokenize(list):
    return ' '.join(list)
imdb['review']=imdb['review'].apply(untokenize)

In [13]:
print(imdb.head())

                                              review  sentiment
0  one of the other reviewers has mentioned that ...          1
1  a wonderful little production br br the filmin...          1
2  i thought this was a wonderful way to spend ti...          1
3  basically theres a family where a little boy j...          0
4  petter matteis love in the time of money is a ...          1


### We start training from here, by splitting the dataset in train and test for validation

In [14]:
X_train, X_test, y_train, y_test = train_test_split(imdb['review'], 
                                                    imdb['sentiment'],test_size=0.33,
                                                    random_state=1)

In [15]:
count_vector = CountVectorizer(stop_words = 'english')

In [16]:
training_data = count_vector.fit_transform(X_train)

In [17]:
testing_data = count_vector.transform(X_test)

In [18]:
#using naivebayes for classification
from sklearn.naive_bayes import MultinomialNB

naive_bayes = MultinomialNB()
naive_bayes.fit(training_data, y_train)
predictions = naive_bayes.predict(testing_data)

In [19]:
#accuracy and precision
print('Accuracy score: ', format(accuracy_score(y_test, predictions)))
print('Precision score: ', format(precision_score(y_test, predictions)))

Accuracy score:  0.8588484848484849
Precision score:  0.8758219524532119


### Speech Recognition API from Google is used here

In [20]:
import speech_recognition as sr
r=sr.Recognizer()
R=sr.Recognizer()
#using speech recogniser

###### Below are customer voice review files

In [21]:
#Audio file. Using only wav format.
demo="ver10.wav"
demo1="ver1.wav"
demo2="ver2.wav"
demo3="ver3.wav"
demo4="ver4.wav"
demo5="ver5.wav"
demo6="ver6.wav"
demo7="ver7.wav"
demo8="ver8.wav"
demo9="ver9.wav"

###### Here audio files are converted from wav to capture data

In [22]:
with sr.AudioFile(demo) as source:
       audio=R.record(source)
with sr.AudioFile(demo1) as source:
       audio1=R.record(source)
with sr.AudioFile(demo2) as source:
       audio2=R.record(source)
with sr.AudioFile(demo3) as source:
       audio3=R.record(source)
with sr.AudioFile(demo4) as source:
       audio4=R.record(source)
with sr.AudioFile(demo5) as source:
       audio5=R.record(source)
with sr.AudioFile(demo6) as source:
       audio6=R.record(source)
with sr.AudioFile(demo7) as source:
       audio7=R.record(source)
with sr.AudioFile(demo8) as source:
       audio8=R.record(source)
with sr.AudioFile(demo9) as source:
       audio9=R.record(source)

###### The API is used to convert the audio to text

In [23]:
l=[]
text1=r.recognize_google(audio3)
l.append(r.recognize_google(audio2))
l.append(r.recognize_google(audio3))
l.append(r.recognize_google(audio4))
l.append(r.recognize_google(audio5))
l.append(r.recognize_google(audio6))
l.append(r.recognize_google(audio7))
l.append(r.recognize_google(audio8))
l.append(r.recognize_google(audio9))
l.append(r.recognize_google(audio))
l
test_df=pd.DataFrame(data=l,columns=['Reviews'])
print(test_df)

                                             Reviews
0  I find this product to be very bad this was a ...
1  I enjoyed this product a lot I want to use it ...
2      the product was well made I will use it again
3  this product was useless I will never ever buy...
4  I didn't like the product that very much but I...
5  this is the worst product that one could ever ...
6  this is the perfect product for me I would lov...
7  the product is an amazing one I will recommend...
8    I hated this product I am not buying it anymore


### Classification of the audio

In [24]:
#Finally doing Sentimental Analysis
test = count_vector.transform(test_df['Reviews'])
predict= naive_bayes.predict(test)
test_df['predictions']=predict
print(test_df)

                                             Reviews  predictions
0  I find this product to be very bad this was a ...            0
1  I enjoyed this product a lot I want to use it ...            1
2      the product was well made I will use it again            0
3  this product was useless I will never ever buy...            0
4  I didn't like the product that very much but I...            0
5  this is the worst product that one could ever ...            0
6  this is the perfect product for me I would lov...            1
7  the product is an amazing one I will recommend...            1
8    I hated this product I am not buying it anymore            0


In [25]:
def class_label(predictions):
    if predictions == 0:
        return "Negative"
    elif predictions==1:
        return "Positive"

In [26]:
test_df['Class_label']=test_df['predictions'].apply(class_label)

### Output of the result

In [27]:
test_df.head(10)

Unnamed: 0,Reviews,predictions,Class_label
0,I find this product to be very bad this was a ...,0,Negative
1,I enjoyed this product a lot I want to use it ...,1,Positive
2,the product was well made I will use it again,0,Negative
3,this product was useless I will never ever buy...,0,Negative
4,I didn't like the product that very much but I...,0,Negative
5,this is the worst product that one could ever ...,0,Negative
6,this is the perfect product for me I would lov...,1,Positive
7,the product is an amazing one I will recommend...,1,Positive
8,I hated this product I am not buying it anymore,0,Negative
