# Sentiment analysis of text data
<p>
<a id= 'top'></a>

In this notebook I read all the text files into a pandas DataFrame, clean the text, and perform sentiment analysis.
1. [Reading the data into a pandas DataFrame](#1)
2. [Cleaning the text data](#2)
3. [Sentiment Analysis](#3)


In [1]:
## Requirements
import pandas as pd
import glob
import numpy as np
import re
from multiprocessing import Pool ## runs the sentiment scoring in parallel worker processes
from textblob import TextBlob, Word ## sentiment analysis
from tqdm import tqdm ## progress bar, a great tool to keep track of loop progress

## Reading the data into a pandas DataFrame
<p>
<a id= '1'></a>
[Return to top](#top)

In [2]:
filelist = glob.glob('Speech_to_text/*.txt')
frames = []
for file in tqdm(filelist):
    dfs = pd.read_table(file, names=['text'])

    ## the gradable ID is the run of digits right before the .txt extension
    gradable_id = int(re.search(r'([0-9]+)\.txt', file).group(1))

    ## label every row read from this file with its ID
    dfs.index = [gradable_id] * len(dfs)
    frames.append(dfs)

## concatenate once, after the loop, instead of growing df on every iteration
df = pd.concat(frames).reset_index()
df.columns = ['gradable_id', 'text']
df.head()

100%|██████████| 5670/5670 [00:08<00:00, 694.77it/s]


| | gradable_id | text |
|---|---|---|
| 0 | 19761 | ['(0.9820305109024048): thank you for calling ... |
| 1 | 20069 | ['(0.9764696955680847): thank you for calling ... |
| 2 | 21858 | ['(0.9416371583938599): state of Tennessee Dep... |
| 3 | 1327183 | ["(0.9171654582023621): thank you for calling ... |
| 4 | 20936 | ['(0.8439109325408936): I have a, Tennessee De... |

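The reading cell above takes each `gradable_id` from the digits in the file name. As a minimal sketch of that step in isolation, the helper below pulls the digits that sit immediately before `.txt`. The name `extract_id` and the example paths are hypothetical and used only for illustration; the real file names in `Speech_to_text/` may look different.

```python
import re

def extract_id(path):
    """Return the digits immediately before '.txt' as an int, or None if there are none.

    Hypothetical helper for illustration; it is not part of the notebook.
    """
    # \. matches a literal dot; a bare '.' would match any character
    match = re.search(r'([0-9]+)\.txt$', path)
    return int(match.group(1)) if match else None

# Hypothetical paths, assumed only for this example
print(extract_id('Speech_to_text/call_19761.txt'))
print(extract_id('Speech_to_text/notes.txt'))
```

Returning `None` for a name without digits makes a badly named file easy to spot, whereas calling `.group()` directly on a failed match raises an `AttributeError`.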

## Cleaning the text data
<p>
<a id= '2'></a>
[Return to top](#top)

In [3]:
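def clean(text):
    '''
    Replace every character that is not a letter (a-z, A-Z, é)
    with a space, lowercase the result and collapse repeated spaces.
    '''
    clean_tag = re.sub("[^a-zA-Zé]", " ", str(text))
    clean_tag = clean_tag.lower().split()

    return " ".join(clean_tag)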

In [4]:
df['clean_text'] = df.text.apply(clean)
df.tail()

| | gradable_id | text | clean_text |
|---|---|---|---|
| 5665 | 1295295 | ['(0.7317875027656555): OK Google Harris my sp... | ok google harris my speaking msst n i recently... |
| 5666 | 2086125 | ['(0.9204361438751221): Define happy peanut th... | define happy peanut this is christy how can i ... |
| 5667 | 544750 | ['(0.959342360496521): hi thank you for callin... | hi thank you for calling fabfitfun this is jam... |
| 5668 | 1120825 | ["(0.9657314419746399): hello thank you for ca... | hello thank you for calling harry s my name is... |
| 5669 | 1183354 | ["(0.9389521479606628): hello thank you for ca... | hello thank you for calling harry s my name is... |

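To make the effect of `clean` concrete, here is a small self-contained sketch that applies the same regular expression to one made-up transcript line. The sample string is an assumption modeled on the format shown in the table and is not taken from the data.

```python
import re

def clean(text):
    # Same logic as the notebook's clean(): keep letters only, lowercase, collapse spaces
    letters_only = re.sub("[^a-zA-Zé]", " ", str(text))
    return " ".join(letters_only.lower().split())

# Made-up sample line (assumption), shaped like the raw 'text' column
sample = "['(0.9820305109024048): Thank you for calling, how can I help?']"
print(clean(sample))     # thank you for calling how can i help
print(clean("Harry's"))  # harry s
```

Because apostrophes are replaced by spaces, possessives and contractions are split into two tokens, which is why the table shows `harry s`.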

## Sentiment Analysis
<p>
<a id= '3'></a>
[Return to top](#top)

In [5]:
def p_s(a):
    '''
    Return the (polarity, subjectivity) pair that TextBlob
    extracts from row a of df.clean_text, after spelling correction.
    '''
    testimonial = TextBlob(df.clean_text[a]).correct()
    polarity, subjectivity = testimonial.sentiment
    return (polarity, subjectivity)

if __name__ == '__main__':
    files = range(len(df)) ## one task per row (5670 files here)

    pool = Pool(20) ## even with 20 processes this step takes up to 30 min for these 5670 files
    po, se = zip(*pool.map(p_s, files)) ## split the list of pairs into two tuples
    pool.close()
    pool.join()

    pol = list(po)
    sub = list(se)
    df = df.assign(polarity = pol)
    df = df.assign(subjectivity = sub)
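
The `zip(*...)` call in the cell above turns a list of `(polarity, subjectivity)` pairs into two separate sequences, one per column. The sketch below shows only that unpacking step. To stay self-contained it uses a toy scorer (`toy_score`, a hypothetical stand-in that is not TextBlob) and the built-in `map`, which here produces the same list-of-pairs shape that `pool.map` returns.

```python
def toy_score(text):
    """Hypothetical stand-in scorer (NOT TextBlob): returns a (polarity, subjectivity) pair."""
    words = text.split()
    positive = sum(w in {"good", "great", "happy"} for w in words)
    negative = sum(w in {"bad", "sad", "angry"} for w in words)
    total = max(len(words), 1)  # avoid dividing by zero on empty text
    return ((positive - negative) / total, (positive + negative) / total)

# Made-up cleaned texts, assumed only for this example
texts = ["great happy call", "bad call", "thank you for calling"]

pairs = list(map(toy_score, texts))   # one (polarity, subjectivity) pair per text
polarity, subjectivity = zip(*pairs)  # two tuples, each as long as texts

print(polarity[1], subjectivity[1])   # -0.5 0.5
```

Both results are tuples, so they can be passed to `df.assign` directly or converted with `list()` first.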

In [6]:
df.to_csv('Sentimentsubjectivity.csv')