# Audio Feature Engineering

This file contains all of the code used to extract speech/audio features for our analysis.


Chunked version: split each participant's data (each transcript and its recording) into chunks of about 1 minute, then perform feature engineering on each chunk.

Because our data sample is relatively small, we chunk the data so that instead of 1 row (set of features) for each teacher we have 5 (1 row for each minute of the experiment). We use the speaker labels in the transcripts to extract just the teacher's audio, so the transcript chunks need to align with the audio chunks. The transcript timestamps are recorded only when the speaker changes, so we use the closest timestamp as the start/end time of each chunk; the chunks are therefore not exactly 1 minute long. We also need to normalize some features by duration (for example, words per minute instead of a raw word count).
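
The boundary snapping described above can be sketched as follows. This is a minimal illustration, assuming the speaker-change timestamps have already been parsed into a sorted list of seconds. The function names `nearest_timestamp` and `chunk_boundaries` and the example timestamps are hypothetical; this is not the code inside `chunk_wav_transcript_file`.

```python
from bisect import bisect_left


def nearest_timestamp(timestamps, target):
    """Return the value in the sorted list `timestamps` closest to `target`."""
    i = bisect_left(timestamps, target)
    # Only the neighbours on either side of the insertion point can be closest
    candidates = timestamps[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda t: abs(t - target))


def chunk_boundaries(timestamps, total_duration, n_chunks=5, chunk_len=60.0):
    """Snap the ideal boundaries (60 s, 120 s, ...) to the nearest speaker change."""
    inner = [nearest_timestamp(timestamps, k * chunk_len) for k in range(1, n_chunks)]
    edges = [0.0] + inner + [total_duration]
    return list(zip(edges[:-1], edges[1:]))


# Hypothetical speaker-change timestamps (seconds) for a 5 minute recording
speaker_changes = [0, 12, 55, 66, 118, 131, 177, 185, 244, 290]
chunks = chunk_boundaries(speaker_changes, total_duration=300.0)
print(chunks)
# [(0.0, 55), (55, 118), (118, 177), (177, 244), (244, 300.0)]
```

The resulting chunks last 55, 63, 59, 67, and 56 seconds, which is why count-based features should be divided by each chunk's duration before chunks are compared.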




In [1]:
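import os

# pandas is used below as pd (pd.read_excel, pd.DataFrame, pd.concat);
# import it explicitly rather than relying on the star import in the next cell
import pandas as pd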


In [2]:
# Functions are saved in a .py file
from speech_feature_extraction import *


## Setup

Mostly similar to the original Audio_Feature_Engineering.ipynb file. However, since we already have .wav files, we don't need to convert the .m4a files; we start from the full .wav files. Then we will do the following steps:
- Chunk the transcript and wav files into approximately 1 minute long segments (approximate because we have to use timestamps from the transcript files that are only timestamped when there is a change in a speaker)
- Extract only the teacher's audio
- Extract features as done before, just using the chunked files instead of the full files



In [3]:
# File paths
wav_path = './data/wav_files/' # Full wav files
transcript_path = './data/transcript_files/' # Transcripts from temi (downloaded as .txt column format with speaker numbers and timestamps)

# Chunked file paths
chunked_wav_path = './data/chunked_wav_files/'
chunked_transcript_path = './data/chunked_transcript_files/'
chunked_teacher_wav_path = './data/chunked_teacher_wav_files/' 



In [4]:
# Create output directories if they don't already exist
for path in (chunked_wav_path, chunked_transcript_path, chunked_teacher_wav_path):
    os.makedirs(path, exist_ok=True)

In [5]:
# List of file names
file_list = os.listdir(wav_path) # List all files in original directory

# Updated list of file names:
# keep only .wav files, skip files that start with '.' (e.g. ipynb checkpoints), and remove the extension
file_list = [x.replace('.wav', '') for x in file_list if x.endswith('.wav') and not x.startswith('.')]

In [6]:
# # Temporarily testing on a small list of files
# file_list = file_list[0:2]

In [7]:
# Print length of file_list (expected number of files/participants)
print(len(file_list))

89


In [8]:
# Check that we have a transcript file for every wav file (prints any file that is missing one)
transcript_file_list = [x.replace('.txt', '') for x in os.listdir(transcript_path) if '.txt' in x]
for file in file_list:
    if file not in transcript_file_list:
        print(file)
        

## 1. Chunk Files into Approximately 1 Minute Long Segments

In [9]:
for file in file_list:
    chunk_wav_transcript_file(file, wav_path, transcript_path,
                             chunked_wav_path, chunked_transcript_path)
    
    

In [10]:
# Check that we have 5 chunked wav files for all original files
chunked_wav_file_list = [x.replace('.wav', '')for x in os.listdir(chunked_wav_path) if '.wav' in x] 

assert len(file_list)*5 == len(chunked_wav_file_list)

        

In [11]:
# Create new file list that has chunks
chunked_file_list = os.listdir(chunked_wav_path) 

# Updated list of file names:
# keep only .wav files, skip files that start with '.' (e.g. ipynb checkpoints), and remove the extension
chunked_file_list = [x.replace('.wav', '') for x in chunked_file_list if x.endswith('.wav') and not x.startswith('.')]

## 2. Extract Only the Teacher's Audio

We use the timestamps from the Temi transcripts to cut out the avatar voices, leaving a .wav file with just the teacher's audio.
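
A minimal sketch of the idea, assuming each transcript row gives a speaker number and a start time in seconds, and that a turn ends where the next one starts. The helper names `teacher_segments` and `keep_segments` and the toy data are hypothetical; the notebook itself relies on `extract_teacher_audio` from `speech_feature_extraction.py`, whose internals are not shown here.

```python
def teacher_segments(turns, teacher_id, total_duration):
    """Return (start, end) pairs in seconds for the teacher's turns.

    turns: list of (speaker_id, start_seconds), sorted by start time.
    Assumption: each turn ends where the next turn starts, and the
    last turn ends at total_duration.
    """
    starts = [start for _, start in turns]
    ends = starts[1:] + [total_duration]
    return [(s, e) for (speaker, _), s, e in zip(turns, starts, ends)
            if speaker == teacher_id]


def keep_segments(samples, sample_rate, segments):
    """Concatenate the slices of `samples` that fall inside `segments`."""
    kept = []
    for start, end in segments:
        kept.extend(samples[int(start * sample_rate):int(end * sample_rate)])
    return kept


# Toy example: speaker 1 is the teacher, 10 samples per second, 6 seconds of audio
turns = [(1, 0.0), (2, 2.0), (1, 3.0), (2, 5.0)]
segments = teacher_segments(turns, teacher_id=1, total_duration=6.0)
teacher_audio = keep_segments(list(range(60)), sample_rate=10, segments=segments)
print(segments)            # [(0.0, 2.0), (3.0, 5.0)]
print(len(teacher_audio))  # 40
```

Because turn boundaries come from transcript timestamps, any timestamp error carries straight into the extracted audio, so it is worth listening to a few output files.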


In [12]:
# Get lookup of teacher speaker numbers
speaker_num_df = pd.read_excel('../speaker_identification.xlsx')
speaker_num_dict = dict(zip(speaker_num_df.file_name, speaker_num_df.teacher))
speaker_num_dict['348th_11.4.21.txt'] = speaker_num_dict['348_11.4.21.txt'] # manual update for one weird file name


In [13]:
# New speaker number dictionary keyed by participant ID (the first 3 characters of the file name)
id_speaker_num_dict = {key[0:3]: val for key, val in speaker_num_dict.items()}

In [14]:
# Extract just the teacher's audio into new wav file
for file in chunked_file_list:
    extract_teacher_audio(file, chunked_wav_path, chunked_teacher_wav_path, chunked_transcript_path, 
                          id_speaker_num_dict[file[0:3]],
                          txt_delimeter = '\t',
                          adjust_timestamps = True
                         )

In [15]:
# Check that we have a teacher-only file for every chunked file
chunked_teacher_audio_file_list = [x.replace('.wav', '')for x in os.listdir(chunked_teacher_wav_path) if '.wav' in x] 

assert len(chunked_file_list) == len(chunked_teacher_audio_file_list)


## 3. Extract Features from Transcripts

We explore creating features from the transcript files, such as:
- **Duration**, including duration in seconds (total, teacher, and student), average duration per speaker turn (total, teacher, and student), and the percent of the time that the teacher is the speaker.
- **Word count**, including total count (total, teacher, and student), percent of words said by the teacher, and word rate in words per second (total, teacher, and student).
- **Line count** (i.e., the number of speaker turns), including the number of speaker changes (total line count) and the number of times the student/teacher speaks (student/teacher line count).


These features could be used to analyze the frequency of student interruptions, speed of speech, and balance of talking between teacher and students.
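
A minimal sketch of how a few of these features can be computed for one chunk, assuming each speaker turn is available as a (speaker number, duration in seconds, text) tuple. The function `summarize_turns` and the toy turns are hypothetical and stand in for `extract_transcript_features`. The output keys reuse column names from the table in this section, whose rate columns are consistent with words per second (for example, 151 words / 66 s ≈ 2.29).

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (e.g. a silent speaker)."""
    return numerator / denominator if denominator else 0.0


def summarize_turns(turns, teacher_id):
    """turns: list of (speaker_id, duration_seconds, text) for one chunk."""
    teacher = [t for t in turns if t[0] == teacher_id]
    total_duration = sum(duration for _, duration, _ in turns)
    teacher_duration = sum(duration for _, duration, _ in teacher)
    total_words = sum(len(text.split()) for _, _, text in turns)
    teacher_words = sum(len(text.split()) for _, _, text in teacher)
    return {
        "Total_Duration": total_duration,
        "Teacher_Duration": teacher_duration,
        "Percent_Time_Teacher": safe_div(teacher_duration, total_duration),
        "Teacher_Percent_Words": safe_div(teacher_words, total_words),
        "Teacher_Word_Rate": safe_div(teacher_words, teacher_duration),  # words per second
        "Total_Speaker_Line_Count": len(turns),
        "Teacher_Line_Count": len(teacher),
    }


# Toy chunk: speaker 1 is the teacher
turns = [
    (1, 4.0, "good morning class today"),
    (2, 6.0, "hi"),
    (1, 2.0, "let us begin"),
]
features = summarize_turns(turns, teacher_id=1)
print(features["Percent_Time_Teacher"])   # 0.5
print(features["Teacher_Percent_Words"])  # 0.875
```

Dividing by duration rather than reporting raw counts keeps chunks of slightly different lengths comparable.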

In [16]:
# Create empty dataframe to save transcript features
transcript_features = pd.DataFrame()

In [17]:
# Extract transcript features
for file in chunked_file_list:
    # Extract features
    temp_df, temp_df_summary = extract_transcript_features(file, chunked_transcript_path, chunked_wav_path, 
                                                           id_speaker_num_dict[file[0:3]],
                                                           txt_delimeter = '\t')
    
    # Add features to final dataframe
    transcript_features = pd.concat([transcript_features, temp_df_summary])


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.iloc[-1]['Timestamp_Pairs'] = (df.iloc[-1]['Timestamp_Pairs'][0], duration)
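
The warning above comes from chained indexing: `df.iloc[-1]['Timestamp_Pairs'] = ...` assigns into a temporary row object, so the update may not reach `df`. A minimal sketch of one way to avoid it is a single `.at` assignment on the last index label. The toy dataframe and the `duration` value are hypothetical; only the `Timestamp_Pairs` column name comes from the warning, and the surrounding code in `speech_feature_extraction.py` is not shown here.

```python
import pandas as pd

# Toy stand-in for the per-chunk transcript dataframe (values are made up)
df = pd.DataFrame({
    "Speaker": [1, 2, 1],
    "Timestamp_Pairs": [(0.0, 20.0), (20.0, 45.0), (45.0, 60.0)],
})
duration = 58.4  # hypothetical true duration of the chunk's audio in seconds

# Single-step assignment: select the row label and the column in one call
last = df.index[-1]
start = df.at[last, "Timestamp_Pairs"][0]
df.at[last, "Timestamp_Pairs"] = (start, duration)

print(df.at[last, "Timestamp_Pairs"])  # (45.0, 58.4)
```

This assumes the dataframe's index labels are unique, so that `df.index[-1]` identifies exactly one row.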


In [18]:
transcript_features

Unnamed: 0,ID,File_Name,Total_Duration,Teacher_Duration,Student_Duration,Percent_Time_Teacher,Average_Speaker_Duration,Average_Teacher_Duration,Average_Student_Duration,Total_Word_Count,Teacher_Word_Count,Student_Word_Count,Teacher_Percent_Words,Total_Word_Rate,Teacher_Word_Rate,Student_Word_Rate,Total_Speaker_Line_Count,Teacher_Line_Count,Student_Line_Count
0,303,303_9.28.21_Chunk_0,66.000000,28.0,38.000000,0.424242,6.600000,5.600000,7.600000,151,132,19,0.874172,2.287879,4.714286,0.500000,10,5,5
0,208,208_1.31.20_S_SC_Chunk_4,68.675510,17.0,51.675510,0.247541,5.722959,2.833333,8.612585,192,147,45,0.765625,2.795756,8.647059,0.870819,12,6,6
0,310,310_10.04.21_Chunk_2,63.000000,32.0,31.000000,0.507937,7.000000,6.400000,7.750000,143,90,53,0.629371,2.269841,2.812500,1.709677,9,5,4
0,314,314_10.08.21_Chunk_3,62.000000,16.0,46.000000,0.258065,6.200000,3.200000,9.200000,176,125,51,0.710227,2.838710,7.812500,1.108696,10,5,5
0,302,302_9.28.21_Chunk_1,53.000000,25.0,28.000000,0.471698,5.300000,6.250000,4.666667,99,52,47,0.525253,1.867925,2.080000,1.678571,10,4,6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
0,221,221_2.19.20_S_SC_Chunk_4,77.390771,43.0,34.390771,0.555622,7.739077,8.600000,6.878154,203,137,66,0.674877,2.623052,3.186047,1.919120,10,5,5
0,354,354_11.4.21_Chunk_2,63.000000,19.0,44.000000,0.301587,7.875000,4.750000,11.000000,195,89,106,0.456410,3.095238,4.684211,2.409091,8,4,4
0,229,229_3.4.20_S_SC_Chunk_3,58.000000,27.0,31.000000,0.465517,5.272727,5.400000,5.166667,199,91,108,0.457286,3.431034,3.370370,3.483871,11,5,6
0,312,312_10.08.21_Chunk_2,57.000000,28.0,29.000000,0.491228,4.384615,4.666667,4.142857,124,77,47,0.620968,2.175439,2.750000,1.620690,13,6,7


In [19]:
assert transcript_features.shape[0] == len(chunked_file_list)

## 4. Speech Feature Engineering

In [20]:
# Create empty dataframe to save audio features
audio_features = pd.DataFrame()

In [21]:
for file in chunked_file_list:
    # Extract features
    temp_df_summary = extract_audio_features(file, chunked_teacher_wav_path, num_mfccs = 13)
    # Add features to final dataframe
    audio_features = pd.concat([audio_features, temp_df_summary])


In [22]:
audio_features

Unnamed: 0,ID,File_Name,number_ of_syllables,number_of_pauses,rate_of_speech,articulation_rate,speaking_duration,original_duration,balance,f0_mean,...,Flatness_Min,Flatness_Std,Zero_Crossing_Rate_Mean,Zero_Crossing_Rate_Max,Zero_Crossing_Rate_Min,Zero_Crossing_Rate_Std,Loudness_Mean,Loudness_Max,Loudness_Min,Loudness_Std
0,303,303_9.28.21_Chunk_0,155,15,4,5,28.8,42,0.7,206.86,...,4.298032e-06,0.033334,0.124893,0.769531,0.000000,0.133817,-42.381813,-18.896294,-98.896294,16.756664
0,208,208_1.31.20_S_SC_Chunk_4,116,30,3,5,22.8,44.7,0.5,229.87,...,7.699496e-07,0.012671,0.097344,0.611816,0.000000,0.085166,-39.711517,-14.173560,-71.287323,14.434524
0,310,310_10.04.21_Chunk_2,107,24,3,4,24.7,39,0.6,263.03,...,2.356776e-06,0.199207,0.104987,0.704102,0.000000,0.120015,-43.865242,-15.438454,-95.438454,19.752428
0,314,314_10.08.21_Chunk_3,126,13,3,5,27.6,37,0.7,250.81,...,6.715505e-07,0.157522,0.110806,0.716309,0.000000,0.092276,-38.000004,-15.369362,-95.369362,19.350574
0,302,302_9.28.21_Chunk_1,63,10,3,4,14.2,25,0.6,235.69,...,3.880040e-06,0.129347,0.106826,0.676758,0.000000,0.116959,-45.842083,-19.206755,-99.206757,21.004633
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
0,221,221_2.19.20_S_SC_Chunk_4,156,20,4,6,28.3,43,0.7,220.83,...,1.658131e-07,0.013824,0.081620,0.589844,0.008301,0.081607,-42.602390,-15.901543,-73.287346,13.523428
0,354,354_11.4.21_Chunk_2,102,7,4,5,19.2,26,0.7,243.79,...,3.729588e-06,0.074127,0.103590,0.749512,0.000000,0.097851,-39.808952,-19.355728,-99.355728,18.414890
0,229,229_3.4.20_S_SC_Chunk_3,99,5,4,5,20.1,27,0.7,250.87,...,2.087527e-06,0.002529,0.076957,0.393555,0.000000,0.049598,-35.727997,-14.570728,-78.515327,17.598198
0,312,312_10.08.21_Chunk_2,105,8,4,5,20.7,28,0.7,260.06,...,2.722868e-06,0.202730,0.091341,0.696777,0.000000,0.087993,-40.587608,-17.150747,-97.150749,21.606583
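
The table above includes frame-level features summarized as mean, max, min, and standard deviation (for example, the `Zero_Crossing_Rate_*` columns). A minimal pure-Python sketch of that pattern follows. The helpers `frame_signal`, `zero_crossing_rate`, and `summarize` are hypothetical, and the frame length, hop size, normalization, and choice of population standard deviation are assumptions that may differ from what `extract_audio_features` does.

```python
from statistics import mean, pstdev


def frame_signal(samples, frame_len, hop):
    """Split `samples` into frames of `frame_len`, advancing by `hop` samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)


def summarize(values, name):
    """Collapse per-frame values into the four summary columns."""
    return {
        f"{name}_Mean": mean(values),
        f"{name}_Max": max(values),
        f"{name}_Min": min(values),
        f"{name}_Std": pstdev(values),  # population standard deviation (assumption)
    }


# Toy signal: one rapidly alternating frame followed by one constant frame
samples = [1, -1, 1, -1, 1, 1, 1, 1]
rates = [zero_crossing_rate(f) for f in frame_signal(samples, frame_len=4, hop=4)]
print(summarize(rates, "Zero_Crossing_Rate"))
```

Summarizing per-frame values this way yields one fixed-length row per chunk regardless of how long the chunk's audio is.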


In [23]:
assert audio_features.shape[0] == len(chunked_file_list)

## 5. Combine Features into a Single Dataframe

In [24]:
# Merge the two dataframes with features into one
df = transcript_features.merge(audio_features, on = ['ID', 'File_Name'])
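
By default `merge` performs an inner join, so a chunk that is missing from either dataframe, or whose keys do not match exactly, is dropped without any message. A minimal sketch of a stricter merge on toy data follows; the toy dataframes and their values are hypothetical, while `validate` and `indicator` are standard `DataFrame.merge` parameters.

```python
import pandas as pd

# Toy stand-ins for transcript_features and audio_features
transcript = pd.DataFrame({
    "ID": [303, 208],
    "File_Name": ["303_Chunk_0", "208_Chunk_4"],
    "Total_Word_Count": [151, 192],
})
audio = pd.DataFrame({
    "ID": [303, 208],
    "File_Name": ["303_Chunk_0", "208_Chunk_4"],
    "f0_mean": [206.86, 229.87],
})

merged = transcript.merge(
    audio,
    on=["ID", "File_Name"],
    how="outer",            # keep unmatched rows so they can be inspected
    validate="one_to_one",  # raise if a key is duplicated on either side
    indicator=True,         # adds a _merge column: both / left_only / right_only
)

# Every chunk should appear in both feature tables
unmatched = merged[merged["_merge"] != "both"]
assert unmatched.empty, unmatched[["ID", "File_Name", "_merge"]]

merged = merged.drop(columns="_merge")
print(merged.shape)  # (2, 4)
```

The key columns also need the same dtype in both dataframes (for example, `ID` stored as an integer in both), otherwise rows will not match.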

In [25]:
# Save to csv file for further analysis
df.to_csv('Teacher_Mindfulness_Audio_Transcript_Features_Chunked_20230324.csv', index = False)

In [26]:
df

Unnamed: 0,ID,File_Name,Total_Duration,Teacher_Duration,Student_Duration,Percent_Time_Teacher,Average_Speaker_Duration,Average_Teacher_Duration,Average_Student_Duration,Total_Word_Count,...,Flatness_Min,Flatness_Std,Zero_Crossing_Rate_Mean,Zero_Crossing_Rate_Max,Zero_Crossing_Rate_Min,Zero_Crossing_Rate_Std,Loudness_Mean,Loudness_Max,Loudness_Min,Loudness_Std
0,303,303_9.28.21_Chunk_0,66.000000,28.0,38.000000,0.424242,6.600000,5.600000,7.600000,151,...,4.298032e-06,0.033334,0.124893,0.769531,0.000000,0.133817,-42.381813,-18.896294,-98.896294,16.756664
1,208,208_1.31.20_S_SC_Chunk_4,68.675510,17.0,51.675510,0.247541,5.722959,2.833333,8.612585,192,...,7.699496e-07,0.012671,0.097344,0.611816,0.000000,0.085166,-39.711517,-14.173560,-71.287323,14.434524
2,310,310_10.04.21_Chunk_2,63.000000,32.0,31.000000,0.507937,7.000000,6.400000,7.750000,143,...,2.356776e-06,0.199207,0.104987,0.704102,0.000000,0.120015,-43.865242,-15.438454,-95.438454,19.752428
3,314,314_10.08.21_Chunk_3,62.000000,16.0,46.000000,0.258065,6.200000,3.200000,9.200000,176,...,6.715505e-07,0.157522,0.110806,0.716309,0.000000,0.092276,-38.000004,-15.369362,-95.369362,19.350574
4,302,302_9.28.21_Chunk_1,53.000000,25.0,28.000000,0.471698,5.300000,6.250000,4.666667,99,...,3.880040e-06,0.129347,0.106826,0.676758,0.000000,0.116959,-45.842083,-19.206755,-99.206757,21.004633
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
440,221,221_2.19.20_S_SC_Chunk_4,77.390771,43.0,34.390771,0.555622,7.739077,8.600000,6.878154,203,...,1.658131e-07,0.013824,0.081620,0.589844,0.008301,0.081607,-42.602390,-15.901543,-73.287346,13.523428
441,354,354_11.4.21_Chunk_2,63.000000,19.0,44.000000,0.301587,7.875000,4.750000,11.000000,195,...,3.729588e-06,0.074127,0.103590,0.749512,0.000000,0.097851,-39.808952,-19.355728,-99.355728,18.414890
442,229,229_3.4.20_S_SC_Chunk_3,58.000000,27.0,31.000000,0.465517,5.272727,5.400000,5.166667,199,...,2.087527e-06,0.002529,0.076957,0.393555,0.000000,0.049598,-35.727997,-14.570728,-78.515327,17.598198
443,312,312_10.08.21_Chunk_2,57.000000,28.0,29.000000,0.491228,4.384615,4.666667,4.142857,124,...,2.722868e-06,0.202730,0.091341,0.696777,0.000000,0.087993,-40.587608,-17.150747,-97.150749,21.606583
