In this exercise, you will transform some annotation data into a format useful for training a frame-based "event detection" or "transcription" system.

QUESTION 1: Using the variables given below, calculate (and store in a new variable) the time step in seconds, from one spectrogram frame to the next.

In [None]:
samplerate = 22050    # in Hertz (1/second)
fftlength = 1024      # in samples (unitless)
hop = 0.5             # a ratio (unitless)

hop_len = ... # hop length
hop_size_s = ... # hop length in seconds
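
One possible way to approach this, as a minimal sketch: it assumes that `hop` is the hop length expressed as a fraction of `fftlength`, which the exercise does not state explicitly. Treat it as a hint to check your own answer against, not as the reference solution.

```python
samplerate = 22050   # in Hertz, as given in the exercise
fftlength = 1024     # in samples
hop = 0.5            # ASSUMPTION: a fraction of the FFT length

hop_len = int(fftlength * hop)      # hop length in samples
hop_size_s = hop_len / samplerate   # time step between frames, in seconds

print(hop_len, hop_size_s)
```

Under that assumption the step between frames comes out at roughly 23 milliseconds.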

Next, load the CSV annotation data (code provided).

First, you will need to extract the annotation files. We've provided them in a Zip file, so you'll need to unzip the CSV files before loading them.

In [None]:
# This will unzip the data into a folder NEXT TO wherever this notebook file is stored.

!unzip -n "/home/jovyan/shared_storage/ECS7013P/nips4bplus/temporal_annotations_nips4b.zip"

# The above should work on the jhub server.
# If not on jhub server, you could download the dataset from:
#      https://figshare.com/articles/Transcriptions_of_NIPS4B_2013_Bird_Challenge_Training_Dataset/6798548

In [None]:
annotpath = "temporal_annotations_nips4b/annotation_train001.csv"

import csv, numpy

def processrow(arow):
    "This function takes a data row from text CSV, and converts data types"
    # the raw CSV format is starttime, duration, class
    return {
        'start': float(arow[0]),
        'dur':   float(arow[1]),
        'class': arow[2],
    }

# Now we do the actual loading, and process the CSV file line by line
with open(annotpath, 'r') as infp:
    rdr = csv.reader(infp)
    annots = [processrow(row) for row in rdr]
    
#print("Some of the data:")
#for entry in annots[:5]:
print("The data:")
for entry in annots:
    print(entry)
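
To see what the loader produces without needing the dataset, here is a small sketch that runs the same row conversion on an in-memory CSV string. The two rows are made up for illustration (only the class names appear elsewhere in this notebook), and `sample_csv` and `sample_annots` are hypothetical names.

```python
import csv
import io

def processrow(arow):
    """Convert one raw CSV row (start time, duration, class) to typed fields."""
    return {'start': float(arow[0]), 'dur': float(arow[1]), 'class': arow[2]}

# ASSUMPTION: made-up rows for illustration, not taken from the dataset
sample_csv = "0.50,0.25,Butbut_call\n1.20,0.40,Erirub_call\n"

with io.StringIO(sample_csv) as infp:
    sample_annots = [processrow(row) for row in csv.reader(infp)]

for entry in sample_annots:
    print(entry)
```

Each entry is a dictionary with a float start time, a float duration and a string class label.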

In [None]:
# For convenience, this block loads the list of classes from the CSV file and builds two maps:
# classname --> classid and classid --> classname.
# You should view the file to see how it looks.
class_list_path = "/home/jovyan/shared_storage/ECS7013P/nips4bplus/nips4b_birdchallenge_espece_list.csv"
classname2id = {} # classname --> classid map
id2classname = {} # classid --> classname map
with open(class_list_path, 'r') as infp:
    rdr = csv.reader(infp)
    next(rdr, None)  # skip the headers
    next(rdr, None)  # skip the first line with "Empty" class 
    for row in rdr:
        classname2id[row[1]] = int(row[0]) # map class name to class id
        id2classname[int(row[0])] = row[1] # map class id to class name
classname2id["Unknown"] = len(classname2id) # handle additional "Unknown" class
id2classname[len(id2classname)] = "Unknown" # handle additional "Unknown" class

print("Some of the data:")
print("Butbut_call" + " -> " + str(classname2id["Butbut_call"]))
print("Erirub_call" + " -> " + str(classname2id["Erirub_call"]))
print("4" + " -> " + id2classname[4])
print("22" + " -> " + id2classname[22])

QUESTION 2: Now, using the time step variable that you calculated above, write some code that will take the annotation data and convert it into a numpy matrix of ones and zeros.

The dimensions of the matrix should represent (#frame, #class) where #frame is the number of time frames and #class is the number of classes. The matrix should have a 1 where each class is active at that time frame and 0 where it is not.

__Hints__: 
- The maximum length of an audio file is __5 seconds__; use this to calculate #frame.
- Use the time step variable that you calculated in QUESTION 1 to calculate the __frame rate__ (i.e. the number of frames in one second), which lets you convert a time in seconds to a time frame index.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

audio_max_len_samples = 5*samplerate # each audio file is 5 seconds
max_frames = ... # maximum number of frames per audio file
num_classes = ... # number of classes

frame_rate = ... # frame rate

annotation_matrix = numpy.zeros((max_frames, num_classes))

for entry in annots: # iterate the annotation one by one
    # COMPLETE YOUR CODE HERE
    
# visualize the annotation matrix
plt.imshow(annotation_matrix, aspect='auto', origin='lower')
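
The conversion asked for in QUESTION 2 could be sketched as follows on toy data. Everything prefixed `toy_` is hypothetical, the annotation values are made up, and the hop length of 512 samples is an assumption. Rounding the start down and the end up is one possible design choice, not the only valid one.

```python
import numpy

samplerate = 22050
hop_len = 512                                  # ASSUMPTION: hop length in samples
frame_rate = samplerate / hop_len              # frames per second
max_frames = int(numpy.ceil(5 * frame_rate))   # 5-second maximum length

# ASSUMPTION: a two-class map and made-up annotations, for illustration only
toy_classname2id = {"Butbut_call": 0, "Erirub_call": 1}
toy_annots = [
    {'start': 0.50, 'dur': 0.25, 'class': "Butbut_call"},
    {'start': 1.20, 'dur': 0.40, 'class': "Erirub_call"},
]

toy_matrix = numpy.zeros((max_frames, len(toy_classname2id)))
for entry in toy_annots:
    # Round the start down and the end up so every touched frame is marked
    first = int(numpy.floor(entry['start'] * frame_rate))
    last = int(numpy.ceil((entry['start'] + entry['dur']) * frame_rate))
    last = min(last, max_frames)               # stay inside the matrix
    toy_matrix[first:last, toy_classname2id[entry['class']]] = 1

print(toy_matrix.shape, toy_matrix.sum(axis=0))
```

With the real class list, check how the class ids are numbered before using them directly as column indices.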

QUESTION 3: Once you have done the above, write some code that does the opposite - i.e. converts an annotation matrix back into a list of events given with start time, end time and class label. (This is the kind of post-processing we might do with the *output* from an event detection system.)

__Hints__: For each column (corresponding to one class), find the indices of the nonzero entries (you can use the `numpy.where` function for this), then find chunks of consecutive numbers in the list of indices.

In [None]:
output_dict = list() # the list to store the annotation

# COMPLETE YOUR CODE HERE
        
print("Some of the data:")
for entry in output_dict:
    print(entry)
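
The hint for QUESTION 3 could be sketched like this on a small hand-built matrix. The names `toy_matrix`, `toy_id2classname` and `events` are hypothetical, and the frame rate assumes a hop length of 512 samples. Splitting the index list wherever `numpy.diff` exceeds 1 is one way to find the runs of consecutive frames.

```python
import numpy

frame_rate = 22050 / 512   # ASSUMPTION: frames per second for a 512-sample hop
toy_id2classname = {0: "Butbut_call", 1: "Erirub_call"}   # hypothetical map

# Toy matrix: class 0 active in frames 2-4 and 8-9, class 1 in frames 5-6
toy_matrix = numpy.zeros((12, 2))
toy_matrix[2:5, 0] = 1
toy_matrix[8:10, 0] = 1
toy_matrix[5:7, 1] = 1

events = []
for classid in range(toy_matrix.shape[1]):
    active = numpy.where(toy_matrix[:, classid] > 0)[0]
    if active.size == 0:
        continue
    # Split wherever consecutive indices jump by more than one frame
    breaks = numpy.where(numpy.diff(active) > 1)[0] + 1
    for chunk in numpy.split(active, breaks):
        events.append({
            'start': float(chunk[0] / frame_rate),
            'end': float((chunk[-1] + 1) / frame_rate),   # end of the last active frame
            'class': toy_id2classname[classid],
        })

for entry in events:
    print(entry)
```

Because the outer loop runs over classes, the events come out grouped by class and not sorted by start time.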

QUESTION 4: Compare the original annotations (as loaded) against the annotations produced when you apply your conversion followed by your back-conversion. Are there differences? Of what kind?

- The order of the events may differ.
- Some time values differ slightly, because times are rounded to frame boundaries.
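
The rounding effect can be checked with a short sketch for a single made-up event. It assumes a hop length of 512 samples, a start rounded down and an end rounded up to the nearest frame boundary; other rounding choices shift the times differently.

```python
import math

frame_rate = 22050 / 512       # ASSUMPTION: frames per second for a 512-sample hop
start, dur = 0.50, 0.25        # made-up event, in seconds

first = math.floor(start * frame_rate)         # first active frame index
last = math.ceil((start + dur) * frame_rate)   # one past the last active frame

recovered_start = first / frame_rate
recovered_end = last / frame_rate

# Each boundary moves by less than one frame (about 23 ms here)
print(start - recovered_start, recovered_end - (start + dur))
```

Both printed shifts are nonnegative and smaller than one frame step.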