# Practice Assignment 1: Working with audio in time 
Working with (possibly overlapping) time intervals and .wav files is important for almost all code in the project.

Download necessary files here:
https://utdallas.box.com/s/0nwxfi1hwqrruows8a5bxx6cv8e400s8

### This assignment makes sure that you are comfortable with the following:
* The .wav file format
* Concatenating audio files (binary files with header info) of a fixed format (fixed sample rate, number of channels, encoding, etc.)
* Running command-line tools (forced alignment) from Python
* Using the interval tree data structure
* Creating plots in Python
* Uploading your code to GitHub


# Instructions
Given a set of speech files, their transcriptions, and a phone dictionary (Librispeech):
1. Run the Montreal Forced Aligner using os.system(cmd) and get the alignment .TextGrid file for each audio+transcript
2. Pad the audio clips with 3 seconds of silence at the end and then concatenate these padded files together 
3. Create an interval tree of phoneme timings for each audio clip using the .TextGrid files
4. Plot the amplitude vs time of the long concatenated file
5. Instantiate the following set as an interval tree
$$X = (1,2) \cup (3,4) \cup (6,7)$$
6. Place a mark on the plot for each phoneme's start and end time, excluding those lying in $X$

#### 1. Run the Montreal Forced Aligner using os.system(cmd) and get the alignment .TextGrid file for each audio+transcript

* The transcriptions should be text files with the file extension '.lab'. Rename or convert them if this has not already been done.
* Each transcription must be in the same directory as its audio segment and share its base name.
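
A minimal sketch of the conversion step, assuming each transcript currently sits next to its audio clip as a `.txt` file with the same base name. That layout and the helper name `make_lab_files` are assumptions for illustration; the downloaded files may be organized differently.

```python
from pathlib import Path
import shutil
import tempfile


def make_lab_files(corpus_dir):
    """Copy every <name>.txt that has a matching <name>.wav to <name>.lab."""
    created = []
    for txt in sorted(Path(corpus_dir).glob('*.txt')):
        if txt.with_suffix('.wav').exists():
            lab = txt.with_suffix('.lab')
            shutil.copyfile(txt, lab)
            created.append(lab)
    return created


# Demo on a throwaway directory so no real corpus is touched.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / 'clip1.wav').write_bytes(b'')          # placeholder, not real audio
    (tmp / 'clip1.txt').write_text('HELLO WORLD')
    (tmp / 'notes.txt').write_text('no matching audio')
    labs = make_lab_files(tmp)
    lab_names = [p.name for p in labs]
    lab_text = labs[0].read_text()

print(lab_names)
```

Copying rather than renaming keeps the original transcripts intact in case the conversion has to be repeated.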

The code should be similar to:

In [6]:
import os
from pathlib import Path

mfa_path = Path('C:/Deepcut/tests/Jerry/mfa.exe')
corpus_path = Path('C:/Deepcut/tests/Jerry/corpus') #i.e. input path
dictionary_path = Path('C:/Deepcut/tests/Jerry/librispeech-lexicon.txt')
output_path = Path('C:/Deepcut/tests/Jerry/alligned')

cmd = '%s %s %s %s --verbose' % (mfa_path, corpus_path, dictionary_path, output_path)

print('This block will run forced alignment with the command:\n%s' % cmd)
os.system(cmd)

This block will run forced alignment with the command:
C:/Deepcut/Tests/Jerry/mfa.exe C:/Deepcut/Tests/Jerry/corpus C:/Deepcut/Tests/Jerry/librispeech-lexicon.txt C:/Deepcut/Tests/Jerry/alligned --verbose
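
A command-line tool can fail without raising a Python error, so it helps to check its exit status. The sketch below uses `subprocess.run` from the standard library as an alternative to `os.system`, with the Python interpreter itself standing in for `mfa.exe`. The stand-in command and the helper name `run_tool` are assumptions for illustration only.

```python
import subprocess
import sys


def run_tool(args):
    """Run a command given as a list of arguments; raise if it exits non-zero."""
    result = subprocess.run([str(a) for a in args], capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError('command failed (%d): %s' % (result.returncode, result.stderr))
    return result.stdout


# Stand-in for [mfa_path, corpus_path, dictionary_path, output_path, '--verbose'].
out = run_tool([sys.executable, '-c', 'print("aligned")'])
print(out.strip())

# A command that exits with a non-zero status is reported as an error.
try:
    run_tool([sys.executable, '-c', 'import sys; sys.exit(2)'])
    failed = False
except RuntimeError:
    failed = True
```

Passing the arguments as a list also avoids quoting problems when a path contains spaces.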


#### 2. Pad the audio clips with 3 seconds of silence at the end and then concatenate these padded files together 
* If possible, concatenate the NumPy arrays all at once rather than incrementally; each concatenation copies the data into a new array.
* Note that the sample rate (<code>rate</code>) has units of samples per second:
$$sample\;rate\;(sr) = \frac{number\;of\;samples}{duration\;in\;seconds}$$

For an explanation of <code>**</code> (why the result of <code>glob</code> is stored in a list), see https://stackoverflow.com/questions/25336726/why-cant-i-iterate-twice-over-the-same-data

For an explanation of <code>***</code>: the scipy wav module encodes the .wav file using the datatype of the numpy array that it receives (<code>concat</code> in this case), so the dtype passed to <code>wav.write</code> determines the bit depth of the output.  

The code should be similar to:

In [None]:
import scipy.io.wavfile as wav
import numpy as np

corpus_audio = [path for path in corpus_path.glob('*.wav')] # **
output_path = Path('C:/Deepcut/tests/Jerry/all.wav') 

pad_time = 3 # seconds
all_padded = [] # list that stores padded audio data before concat
for file in corpus_audio:
    rate, data = wav.read(file)
    padding = np.zeros(# number of zeros here)
    ###############################################
    # Determine how many zeros to pad at fixed sample rate 
    # Concatenate data and padding and add to end of list (all_padded) 
    ###############################################
    
    
###############################################
# Concatenate all_padded and write as all.wav 
###############################################
concat = np.
wav.write(output_path, rate, concat.astype(np.int32)) # ***
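
The padding arithmetic can be sketched on synthetic data. The example below assumes mono clips that share a sample rate of 16000 Hz; the rate and the two stand-in arrays are made up for illustration and replace what `wav.read` would return.

```python
import numpy as np

rate = 16000        # samples per second (assumed; use the value returned by wav.read)
pad_time = 3        # seconds of silence appended to each clip

# Two synthetic mono clips standing in for the data returned by wav.read.
clips = [
    np.ones(rate // 2, dtype=np.int16),     # 0.5 s
    np.full(rate, 2, dtype=np.int16),       # 1.0 s
]

n_pad = int(pad_time * rate)                # number of samples = duration * sample rate
all_padded = []
for data in clips:
    padding = np.zeros(n_pad, dtype=data.dtype)   # silence with the clip's own dtype
    all_padded.append(np.concatenate([data, padding]))

concat = np.concatenate(all_padded)         # a single concatenation at the end
print(concat.dtype, len(concat) / rate)
```

For stereo data of shape `(n_samples, n_channels)`, the padding would need the shape `(n_pad, n_channels)` instead.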

#### 3. Create an interval tree of phoneme timings for each audio clip using the .TextGrid files
* Be sure to install the intervaltree module in your Python environment: https://pypi.org/project/intervaltree/
* Be sure to install the textgrid module in your Python environment: https://github.com/kylebgorman/textgrid
* Read the docs (and the code, if necessary). Message me with any questions.

In [None]:
from intervaltree import IntervalTree
from textgrid import TextGrid

trees = []
for file in corpus_path.glob('*.TextGrid'):
    tree = IntervalTree()
    ###############################################
    # Determine how to index through textgrid intervals 
    # Add each interval to tree (use tree.addi)
    ###############################################
    intervals = # Read .textgrid file
    for interval in intervals:
        tree.addi()
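
One possible shape for this step, shown as a sketch rather than the expected solution. It assumes the aligner writes a tier named `phones` and marks silence with an empty label; both are assumptions to check against your own .TextGrid files. The helper names `read_phone_intervals` and `build_tree` and the example timings are hypothetical.

```python
from intervaltree import IntervalTree


def read_phone_intervals(textgrid_path, tier_name='phones'):
    """Return (start, end, label) tuples from one tier of a .TextGrid file."""
    # Imported here so the tree-building code below runs without the textgrid package.
    from textgrid import TextGrid
    grid = TextGrid.fromFile(str(textgrid_path))
    tier = grid.getFirst(tier_name)
    return [(float(iv.minTime), float(iv.maxTime), iv.mark) for iv in tier]


def build_tree(phone_intervals):
    """Build an IntervalTree from (start, end, label) tuples."""
    tree = IntervalTree()
    for start, end, label in phone_intervals:
        # Skip unlabeled (silent) spans; zero-length intervals are rejected by addi.
        if label and end > start:
            tree.addi(start, end, label)
    return tree


# Hypothetical timings, in seconds, as they might come out of read_phone_intervals.
example = [(0.0, 0.12, ''), (0.12, 0.20, 'HH'), (0.20, 0.31, 'AH0'), (0.31, 0.45, 'L')]
tree = build_tree(example)
labels_at_025 = sorted(iv.data for iv in tree[0.25])
print(len(tree), labels_at_025)
```

Whether to keep or drop the silent spans is a design choice; dropping them here means only phoneme boundaries end up in the tree.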

#### 4. Plot the amplitude vs time of the long concatenated file
* Plotting is well documented, and the choice of library isn't important.
* Options include matplotlib, plotly, seaborn, and many others. Choose one; they all work similarly.

In [None]:
import #Some plotting library

# Sample k occurs at time k / rate, so the time axis is given by
t = np.arange(len(concat)) / rate
##################
# Plot concat against t
##################
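
As an illustration with matplotlib (any of the libraries above would do), the sketch below plots a synthetic decaying tone in place of the concatenated audio. The sample rate and the signal are made up for the example.

```python
import matplotlib
matplotlib.use('Agg')                    # off-screen backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rate = 8000                              # assumed sample rate of the synthetic signal
t = np.arange(2 * rate) / rate           # sample k sits at time k / rate
signal = np.sin(2 * np.pi * 5 * t) * np.exp(-t)   # decaying 5 Hz tone standing in for concat

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(t, signal, linewidth=0.5)
ax.set_xlabel('Time (s)')
ax.set_ylabel('Amplitude')
ax.set_title('Amplitude vs. time')
fig.tight_layout()
```

Keeping a reference to `ax` makes it easy to add the phoneme marks to the same axes later.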

#### 5. Instantiate the following set as an interval tree
$$X = (1,2) \cup (3,4) \cup (6,7)$$
* Done similarly to part 3.
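
A short sketch of one way to build $X$ with the intervaltree package. The package treats each interval as half-open, including the lower bound but not the upper bound, whereas $X$ is a union of open intervals. The hypothetical helper `in_X` therefore adds a strict comparison so that the endpoints count as lying outside $X$.

```python
from intervaltree import IntervalTree

# Each tuple is (begin, end); the tree stores them as half-open intervals.
X = IntervalTree.from_tuples([(1, 2), (3, 4), (6, 7)])


def in_X(t):
    """True if t lies strictly inside one of the open intervals of X."""
    return any(iv.begin < t < iv.end for iv in X[t])


print(in_X(1.5), in_X(1), in_X(2), in_X(5))
```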

#### 6. Place a mark on the plot for each phoneme's start and end time, excluding those lying in $X$
* This will be a bit harder

In [None]:
###############################################
# Shift each interval tree based on order and lengths of padded data
# Take union of all shifted trees
# Only plot the points that exist in the Union_Tree - X (setminus)
# Equivalently use an if statement
###############################################
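
The shifting logic can be sketched with plain tuples standing in for the interval trees, using the "if statement" variant mentioned in the comments above. All numbers below (sample rate, clip lengths, phoneme timings) are hypothetical.

```python
from itertools import accumulate

rate = 16000                      # assumed sample rate
pad_time = 3                      # seconds of silence after each clip
clip_lengths = [8000, 16000]      # hypothetical sample counts of the unpadded clips

# Hypothetical phoneme intervals per clip, in seconds relative to the clip start.
clip_phones = [
    [(0.10, 0.25, 'HH'), (0.25, 0.40, 'AY1')],
    [(0.20, 0.50, 'B'), (0.50, 0.90, 'AY1')],
]

X = [(1, 2), (3, 4), (6, 7)]      # open intervals to exclude

# The offset of clip i is the total padded duration of the clips before it.
padded = [n / rate + pad_time for n in clip_lengths]
offsets = [0.0] + list(accumulate(padded))[:-1]

marks = []
for offset, phones in zip(offsets, clip_phones):
    for start, end, label in phones:
        for t in (start + offset, end + offset):
            if not any(a < t < b for a, b in X):   # keep only points outside X
                marks.append(t)

marks = sorted(set(marks))        # boundaries shared by adjacent phonemes appear once
print(marks)
```

With matplotlib, each value in `marks` could then be drawn on the waveform with `ax.axvline(t)`.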