# JK301 - Audiovisual Transcription Exploratory Project

This project was developed to explore the feasibility of, and issues related to, collecting full verbal transcription reports of videos of naturally spoken English sentences. The typed transcriptions were spell-corrected, aligned at the word and phoneme levels, and compared to the original text of the spoken sentences. The transcription accuracy of each phoneme and word in each sentence was coded to allow us to estimate the accuracy of the listener at multiple levels of processing over the course of each sentence.
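To make the word-level comparison concrete, here is a minimal sketch that scores a typed response against a target sentence. It uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner, which is an assumption and not the aligner used in this project. The function name `score_words` and the example sentence and response are hypothetical.

```python
from difflib import SequenceMatcher

def score_words(target, response):
    """Return one True/False per target word: True if it was matched in the response."""
    target_words = target.lower().split()
    response_words = response.lower().split()
    hits = [False] * len(target_words)
    # Align the two word sequences; autojunk is disabled so no word is treated as noise
    matcher = SequenceMatcher(a=target_words, b=response_words, autojunk=False)
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            hits[block.a + offset] = True
    return hits

# Hypothetical target sentence and typed response
hits = score_words("she had your dark suit", "she had dark soup")
accuracy = sum(hits) / len(hits)
```

Here "your" was dropped and "suit" was reported as "soup", so three of the five target words count as hits.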

1. Questions for JK301:

     1. Is the system sufficiently robust against common spelling errors?
     
     2. Is the system correctly grading word errors?
     
     3. How well can normal hearing participants report the sentence words spoken without noise? 





### How well can normal hearing participants report the sentence words spoken without noise? 

First, let's load in the data and select only the trials without background noise.

In [407]:
#Load Big P
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
#plt.style.use('ggplot')

from statsmodels.nonparametric.smoothers_lowess import lowess 



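#DataFrame.from_csv is no longer available in pandas; read_csv with the first column as index is the equivalent
bigP = pd.read_csv(os.path.normpath(r'C:\TCDTIMIT\dataOut\JK301\bigPJK301_r1.csv'), index_col=0, parse_dates=True)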

#Recalculate word accuracy based on phoneme accuracy
allPhonsMatch = bigP.groupby('WordCount')['PhonemeHitBool'].transform(lambda x: np.mean(x) ==1)
allPhonsMatch.name = 'AllPhonsMatch'
bigP = bigP.join(allPhonsMatch)


isClear = bigP['BabbleCond'] == 'Off'
isNoisy = bigP['BabbleCond'] == 'On'

#Select only clear trials
bigP = bigP[isClear]

#Group by subject
groupedSubject = bigP.groupby('Subject')

#Mean values
subjectMean = groupedSubject['PhonemeHitBool'].mean()




IOError: File C:\TCDTIMIT\dataOut\JK301\bigPJK301_r1.csv does not exist
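As a small illustration of what the cell above computes, here is a sketch on a toy table. The column names follow `bigP`, but every value is invented, and the assumption is that `bigP` holds one row per phoneme.

```python
import pandas as pd

# Toy stand-in for bigP: one row per phoneme (all values invented for illustration)
toy = pd.DataFrame({
    'Subject':        ['p001'] * 5 + ['p002'] * 5,
    'BabbleCond':     ['Off', 'Off', 'Off', 'On', 'On'] * 2,
    'WordCount':      [0, 0, 1, 2, 2, 3, 3, 4, 5, 5],
    'PhonemeHitBool': [True, True, False, True, False, True, True, True, True, True],
})

# A word counts as correct only if every one of its phonemes was hit
toy['AllPhonsMatch'] = toy.groupby('WordCount')['PhonemeHitBool'].transform(lambda x: x.all())

# Keep only trials without background babble, then average per subject
clear = toy[toy['BabbleCond'] == 'Off']
subject_mean = clear.groupby('Subject')['PhonemeHitBool'].mean()
```

`x.all()` gives the same answer as `np.mean(x) == 1` for boolean data and states the intent a little more directly.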

We can plot the accuracy (fraction of phonemes correctly reported) for each participant. 

In [None]:
#Make plots prettier
#Edited from Randal Olson and many at StackOverflow
# These are the "Tableau 20" colors as RGB.    
tableau20 = [(31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),    
             (44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),    
             (148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),    
             (227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),    
             (188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]    
  
# Scale the RGB values to the [0, 1] range, which is the format matplotlib accepts.    
for i in range(len(tableau20)):    
    r, g, b = tableau20[i]    
    tableau20[i] = (r / 255., g / 255., b / 255.) 


plt.rc('text', color = 'black') 
plt.rc('font', family='sans-serif') 
#The family above is sans-serif, so the preferred font goes in the sans-serif list
plt.rc('font', **{'sans-serif': ['Helvetica Neue']})
plt.rc('axes', titlesize = 18, labelsize = 14,labelcolor ='black')  
plt.rc('lines', linewidth=2,markersize = 10)
plt.rc('xtick',labelsize = 10,color ='black')
plt.rc('ytick',labelsize = 10,color ='black')
def hide_spines():
    """Hides the top and rightmost axis spines from view for all active
    figures and their respective axes."""

    # Retrieve a list of all current figures.
    figures = [x for x in plt._pylab_helpers.Gcf.get_all_fig_managers()]
    for figure in figures:
        # Get all Axis instances related to the figure.
        for ax in figure.canvas.figure.get_axes():
            # Disable spines.
            ax.spines['right'].set_color('none')
            ax.spines['top'].set_color('none')
            # Disable ticks.
            ax.xaxis.set_ticks_position('bottom')
            ax.yaxis.set_ticks_position('left')




In [None]:
subjectMean

In [None]:
%matplotlib inline
ax = subjectMean.plot(kind='bar',figsize = (12,4),title = 'Phoneme Accuracy by Subject',color = tableau20[0])
ax.set_ylabel('Mean Phoneme Accuracy')
hide_spines()

As seen above, the worst participant (p004) had 78% accuracy. The best had 94% accuracy.

### Are the missed words due to working memory loss?

Let's take a look to see if working memory is playing a role in clear voice transcription errors.

In [None]:
bigP.keys()

In [None]:
# Make a column for the number of words in the sentence
bigP['NumWordsInSentence'] = bigP.groupby(['SentenceCount'])['WordIdx'].transform(max)+1

#Sort sentence accuracy by number of words
#The groupby result is already indexed (and sorted) by NumWordsInSentence
sentenceACC = bigP.groupby('NumWordsInSentence')[['PhonemeHitBool']].mean()
#sentenceACC = sentenceACC.sort_values('NumWordsInSentence')
#Smooth accuracy against sentence length; lowess returns (x, fitted) pairs sorted by x
filtered = lowess(sentenceACC['PhonemeHitBool'],sentenceACC.index.values.astype(float))
sentenceACC['Filt'] = filtered[:,1]
#accplot = sentenceACC['PhonemeHitBool'].plot(kind='line',figsize = (12,4),color='black')
ax = sentenceACC[['PhonemeHitBool','Filt']].plot(kind='line',figsize = (12,4),color = [tableau20[1],tableau20[0]])
ax.legend(['Raw','Lowess Smoothed'])
ax.set_ylabel('Phoneme Hit Rate')
ax.set_xlabel('Number of Words in the Sentence')
hide_spines()



We can see about a 30% drop in mean transcription accuracy from 3 to 17 words! This suggests that working memory is an important factor for longer sentences, even when sentences are spoken clearly. 
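The sentence-length calculation can be checked by hand on a toy table. This is a sketch with invented values, and it assumes (as the `+1` in the cell above implies) that `WordIdx` is zero-based.

```python
import pandas as pd

# Toy phoneme-level rows (invented): a 2-word sentence and a 4-word sentence
toy = pd.DataFrame({
    'SentenceCount':  [0, 0, 0, 1, 1, 1, 1, 1],
    'WordIdx':        [0, 1, 1, 0, 1, 2, 3, 3],
    'PhonemeHitBool': [True, True, True, True, False, True, False, False],
})

# WordIdx is assumed zero-based, so the largest index plus one is the sentence length
toy['NumWordsInSentence'] = toy.groupby('SentenceCount')['WordIdx'].transform('max') + 1

# Mean phoneme hit rate for each sentence length
length_acc = toy.groupby('NumWordsInSentence')['PhonemeHitBool'].mean()
```

Because the mean is taken over phoneme rows, longer sentences contribute more rows to their length bin than shorter ones do.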


Performance is fairly level at around 90% for sentences of 4 to 8 words. Let's look only at sentences fewer than 9 words long and see which words these errors are coming from.

In [None]:
#Copy so that later assignments to bigPShort do not act on a view of bigP
bigPShort = bigP[bigP['NumWordsInSentence'] < 9].copy()

### Are some talkers harder to understand, even with fewer than 9 words?

Let's look again at individual subject performance, now including only sentences below 9 words. 

In [None]:
#Group by subject
groupedSubject = bigPShort.groupby('Subject')

#Mean values
subjectMean = groupedSubject['PhonemeHitBool'].mean()

In [None]:
%matplotlib inline
ax = subjectMean.plot(kind='bar',figsize = (12,4),title = 'Phoneme Accuracy by Subject',color = tableau20[0])
ax.set_ylabel('Mean Phoneme Accuracy')
hide_spines()

Nice! The worst performing subject (p004) has 86% phoneme transcription accuracy for clear sentences under 9 words long. Now let's break it down by individual talker. 

In [None]:
groupedSpeaker = bigPShort.groupby(['Subject','Speaker'])
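#Average only the numeric columns; text columns cannot be averaged
speakerMean = groupedSpeaker.mean(numeric_only=True)
ax = speakerMean['PhonemeHitBool'].plot(kind='bar',figsize = (12,4))
#Reference line at 80% accuracy (xmin and xmax are fractions of the axis width)
plt.axhline(y=.80, xmin=0, xmax=1, linewidth=2, color = 'k')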
hide_spines()

Each talker was only heard by one subject, so we won't be able to separate talker from subject completely. However, we could try to figure out which talkers were the most difficult for each subject.

In [None]:
zscore = lambda x: (x - x.mean()) / x.std()
#Z-score each talker's accuracy against the other talkers heard by the same subject
scores = speakerMean.groupby(level='Subject')['PhonemeHitBool'].transform(zscore)
speakerMean['Zscore'] = scores
#Talker label for each row
speakerMean.index.get_level_values('Speaker')

In [None]:
ax = scores.plot(kind='bar',figsize = (12,4))
#Label each bar with its talker
ax.set_xticklabels(speakerMean.index.get_level_values('Speaker'))
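A toy version of the within-subject z-score makes the idea easier to verify. This is a sketch with invented accuracies and hypothetical talker labels (`s01` to `s06`), assuming a table indexed by `Subject` and `Speaker` like `speakerMean`.

```python
import pandas as pd

# Toy per-talker accuracies (invented); each talker is heard by exactly one subject
speaker_mean = pd.DataFrame({
    'Subject': ['p001', 'p001', 'p001', 'p002', 'p002', 'p002'],
    'Speaker': ['s01', 's02', 's03', 's04', 's05', 's06'],
    'PhonemeHitBool': [0.80, 0.90, 1.00, 0.70, 0.75, 0.80],
}).set_index(['Subject', 'Speaker'])

# Standardise each talker against the other talkers heard by the same subject
by_subject = speaker_mean.groupby(level='Subject')['PhonemeHitBool']
speaker_mean['Zscore'] = by_subject.transform(lambda x: (x - x.mean()) / x.std())

# The talker with the lowest z-score was the hardest one for that subject
hardest = speaker_mean.groupby(level='Subject')['Zscore'].idxmin()
```

Standardising within subject removes each subject's overall level, so a z-score says how a talker compares with the other talkers that same subject heard, not how hard the talker is in absolute terms.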

Should we exclude speakers whose speech is harder to understand? Perhaps variation is not a bad thing.

### Which words are participants getting wrong and why?

In [None]:

#Grouped by word
groupedWord = bigPShort.groupby('WordCount')

#Average only the column we need; averaging every column fails on text columns
wordACC = groupedWord['AllPhonsMatch'].mean()
words = groupedWord['TargetWord'].first()

#Grouped by Type
wordHits = wordACC.groupby(words).sum()
wordTotal = wordACC.groupby(words).count()
wordHitRate = wordHits/wordTotal
wordMisses = wordTotal-wordHits

missSorted = wordMisses.sort_values(ascending =False)
ax = missSorted[0:30].plot(kind='bar',figsize = (12,4))
plt.figure()
#Misses by word type as a percent of all missed words
missPercSorted = (wordMisses*100/float(wordMisses.sum())).sort_values(ascending =False)
ax = missPercSorted[0:30].plot(kind='bar',figsize = (12,4))
plt.figure()
#Hit rate for the worst hit rate words heard more than 6 times
missPercSorted = (wordHitRate[wordTotal > 6] *100).sort_values(ascending =True)
ax = missPercSorted[0:30].plot(kind='bar',figsize = (12,4))
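The three plots above rank words by raw miss count, by share of all misses, and by hit rate. A toy sketch with invented word tokens shows why raw counts and hit rates can disagree.

```python
import pandas as pd

# Toy word-level outcomes (invented): one row per spoken word token
tokens = pd.DataFrame({
    'TargetWord':    ['the', 'the', 'the', 'the', 'the', 'apron', 'apron', 'suit'],
    'AllPhonsMatch': [True, True, False, False, False, False, False, True],
})

by_type = tokens.groupby('TargetWord')['AllPhonsMatch']
summary = pd.DataFrame({
    'Hits': by_type.sum(),     # number of correct tokens per word type
    'Total': by_type.count(),  # number of tokens per word type
})
summary['Misses'] = summary['Total'] - summary['Hits']
summary['HitRate'] = summary['Hits'] / summary['Total']

# Raw miss counts favour frequent words; hit rate does not
most_missed = summary['Misses'].idxmax()
lowest_rate = summary['HitRate'].idxmin()
```

In this toy data "the" has the most misses simply because it occurs most often, while "apron" has the lowest hit rate.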


In [None]:
wordGrouped = bigPShort.groupby('WordCount')[['TargetWord','SourceWord','AllPhonsMatch']].first()
trickyWords = wordGrouped[wordGrouped['TargetWord'].isin(missPercSorted.keys()[0:30])]
trickyWords[trickyWords['AllPhonsMatch'] == False]
missedWordsSpelling = wordGrouped[wordGrouped['AllPhonsMatch'] == False]
missedWordsSpelling
#[missPercSorted.keys()[0:30]]

A few of the misses seem to be due to spelling errors, but the vast majority seem to come from the subject reporting the wrong word or not reporting any word. A few correct words are listed as having the wrong phonemes! This is likely due to phoneme alignment dilemmas. For instance, the 'ay' in 'holiday' was assigned to the 'a' in 'aprons' when this subject reported 'aprons' as 'prints'. 

In [None]:
bigPShort[bigPShort['WordCount'] == 14237]

I can correct this issue by setting all phonemes of exact word matches to the target phonemes.

In [None]:
# Find where words match
matchIdx = bigPShort['SourceWord'] == bigPShort['TargetWord']
#Set the source phonemes to match the target
bigPShort.loc[matchIdx,('SourcePhoneme')] = bigPShort.loc[matchIdx,('TargetPhoneme')] 
#Reset the measure of phoneme accuracy
bigPShort.loc[:,'PhonemeHitBool'] = bigPShort['SourcePhoneme'] == bigPShort['TargetPhoneme']
#Reset the measure of all phonemes matching
bigPShort.loc[:,'AllPhonsMatch'] = bigPShort.groupby('WordCount')['PhonemeHitBool'].transform(lambda x: np.mean(x) ==1)
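The effect of this correction can be seen on a toy example. This is a sketch with invented rows and hypothetical phoneme labels: the first word was reported correctly but has one misaligned phoneme, and the second word was genuinely misreported.

```python
import pandas as pd

# Toy aligned rows (invented): 'day' was typed correctly, 'suit' was typed as 'soup'
toy = pd.DataFrame({
    'WordCount':     [0, 0, 1, 1, 1],
    'TargetWord':    ['day', 'day', 'suit', 'suit', 'suit'],
    'SourceWord':    ['day', 'day', 'soup', 'soup', 'soup'],
    'TargetPhoneme': ['d', 'ey', 's', 'uw', 't'],
    'SourcePhoneme': ['d', 'ay', 's', 'uw', 'p'],
})

# Where the typed word equals the target word, trust the word and copy the target phonemes
match = toy['SourceWord'] == toy['TargetWord']
toy.loc[match, 'SourcePhoneme'] = toy.loc[match, 'TargetPhoneme']

# Recompute phoneme hits and the word-level flag from the corrected phonemes
toy['PhonemeHitBool'] = toy['SourcePhoneme'] == toy['TargetPhoneme']
toy['AllPhonsMatch'] = toy.groupby('WordCount')['PhonemeHitBool'].transform(lambda x: x.all())
```

Only the exactly matching word is repaired. The misreported word keeps its partial phoneme credit and is still scored as a word miss.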

In [None]:
wordGrouped = bigPShort.groupby('WordCount')[['TargetWord','SourceWord','AllPhonsMatch']].first()
trickyWords = wordGrouped[wordGrouped['TargetWord'].isin(missPercSorted.keys()[0:30])]
trickyWords[trickyWords['AllPhonsMatch'] == False]
missedWordsSpelling = wordGrouped[wordGrouped['AllPhonsMatch'] == False]
missedWordsSpelling

Pizzarias --> pizzarieas is probably a spelling error. So is for real --> forreal. But such errors seem to make up a very small fraction of overall errors, so we're pretty happy with the outcome.

### How does word accuracy relate to word frequency?

In [None]:
#Word Accuracy sorted by SubtlexUS Frequency 
vals = bigPShort.loc[:,['AllPhonsMatch','TargetWord','1LogGram']].groupby('TargetWord').mean().sort_values(['1LogGram'], ascending = False)
vals = vals.groupby('1LogGram').mean()
#Exponentially weighted moving average of word accuracy across frequency
filtered = vals['AllPhonsMatch'].ewm(span = 50).mean()
ax = filtered.plot(kind='line',figsize = (12,4))
ax.set_ylabel('Mean Word Accuracy')
hide_spines()

Subjects are much better at reporting frequent words than very infrequent ones.
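For readers unfamiliar with the smoothing step, here is a minimal sketch of an exponentially weighted moving average using `Series.ewm(...).mean()`. The accuracies and log-frequency values are invented, and the short span is chosen only so that the effect is visible on six points.

```python
import pandas as pd

# Toy word accuracies indexed by log frequency (invented values)
acc = pd.Series([0.0, 1.0, 0.0, 1.0, 1.0, 1.0],
                index=pd.Index([1.0, 1.5, 2.0, 2.5, 3.0, 3.5], name='1LogGram'))

# Each smoothed point is a weighted mean of the current and earlier points,
# with weights that decay going back; a larger span smooths more heavily
smoothed = acc.ewm(span=3).mean()
```

The average only looks backwards along the index, so the smoothed curve lags the raw data slightly, which is worth keeping in mind when reading where the accuracy starts to change.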

In [None]:
#meanType is built in the next cell, so that cell has to be run first
typeACC = meanType.sort_values('SFreq',ascending = False)['WordHit']
#Smooth word accuracy over frequency rank; a new Series keeps typeACC unchanged
filtered = lowess(typeACC.values,np.arange(0,len(typeACC)))
typeACCFilt = pd.Series(filtered[:,1],index = typeACC.index)
accplot = typeACCFilt.plot(kind='line',figsize = (12,4),color='black')

In [None]:
#Grouped by Speaker
groupedSpeaker = bigP.groupby('Speaker')
speakerMean = groupedSpeaker.mean(numeric_only=True)
speakerMean['PhonemeHitBool'].plot(kind='bar',figsize = (12,4))

#Grouped by word
groupedWord = bigP.groupby('WordCount')
wordMean = groupedWord.mean(numeric_only=True)
wordFirst = groupedWord.first()
wordMean['TargetWord'] = wordFirst['TargetWord']
wordMean['WordHit'] = wordMean['PhonemeHitBool'] == 1
#Grouped by type
groupedType = wordMean.groupby('TargetWord')

#Plot top 20 missed words (each plot gets its own figure so the bars do not overlap)
plt.figure()
missedSorted = groupedType['WordHit'].apply(lambda x: np.sum(x == False)).sort_values(ascending = False)
missplot = missedSorted[0:20].plot(kind='bar',figsize = (12,4))
#Plot top 20 correct words
plt.figure()
hitSorted = groupedType['WordHit'].apply(lambda x: np.sum(x == True)).sort_values(ascending = False)
hitplot = hitSorted[0:20].plot(kind='bar',figsize = (12,4))

#Plot top 20 most common words in corpus
plt.figure()
countSorted = groupedType['WordHit'].count().sort_values(ascending = False)
countplot = countSorted[0:20].plot(kind='bar',figsize = (12,4))

#Accuracy of the 60 most frequent words (sorted by SFreq)
meanType = groupedType.mean()
meanType['TypeCount'] = groupedType['WordHit'].count()
typeACC = meanType.sort_values('SFreq',ascending = False)['WordHit']
plt.figure()
accbar = typeACC[0:60].plot(kind='bar',figsize = (12,4))
#Raw accuracy for all word types, with the lowess-smoothed curve overlaid in black
plt.figure()
accplot = typeACC.plot(kind='line',figsize = (12,4))
filtered = lowess(typeACC.values,np.arange(0,len(typeACC)))
#Build a new Series so the raw typeACC values are not overwritten
typeACCFilt = pd.Series(filtered[:,1],index = typeACC.index)
accplot = typeACCFilt.plot(kind='line',figsize = (12,4),color='black')