# The Utilities Module

The `utilities` module of the `ah` package holds functions commonly called by other modules in order to ...

The `utilities` module includes the following functions:
- __add_annotations__: Adds annotations to each pair of repeated structures 
    according to their length and order of occurrence. 
- __create_sdm__: Creates a self-dissimilarity matrix; this matrix is found 
    by creating audio shingles from feature vectors, and finding cosine 
    distance between shingles.
- __find_initial_repeats__: Finds all diagonals present in `thresh_mat`, 
    removing each diagonal as it is found.
- __reconstruct_full_block__: Creates a record of when pairs of repeated
    structures occur, from the first beat in the song to the last beat of the
    song. Pairs of repeated structures are marked with 1's.
- __reformat__: Transforms a binary matrix representation of when repeats 
    occur in a song into a list of repeated structures detailing the length
    and occurrence of each repeat. 
- __stretch_diags__: Fills out the diagonals in a binary self-dissimilarity
    matrix from the diagonal starts and lengths.

### Importing necessary modules

In [13]:
import numpy as np
from utilities import *

## add_annotations

Adds annotations to each pair of repeated structures according to their length and order of occurrence. 

The inputs for the function are:
- __input_mat__: an array containing pairs of repeats. The first two columns refer to the first repeat of the pair. The third and fourth columns refer to the second repeat of the pair. The fifth column refers to the repeat lengths. The sixth column contains any previous annotations, which will be removed.
- __song_length__: an integer denoting the number of shingles in the song

The outputs for the function are:
- __anno_list__: an array of pairs of repeats with annotations marked. 

In [14]:
input_mat = np.array([[2, 5, 8, 11, 4, 0],
                      [7, 10, 14, 17, 4, 0],
                      [2, 5, 15, 18, 4, 0],
                      [8, 11, 15, 18, 4, 0],
                      [9, 12, 16, 19, 4, 0]])

song_length = 19

print("The input array is:", input_mat)
print("The number of shingles is:", song_length)

The input array is: [[ 2  5  8 11  4  0]
 [ 7 10 14 17  4  0]
 [ 2  5 15 18  4  0]
 [ 8 11 15 18  4  0]
 [ 9 12 16 19  4  0]]
The number of shingles is: 19


In [15]:
annotated_array = add_annotations(input_mat,song_length)
print("The array of repeats with annotations is:",annotated_array)

The array of repeats with annotations is: [[ 2  5  8 11  4  1]
 [ 2  5 15 18  4  1]
 [ 8 11 15 18  4  1]
 [ 7 10 14 17  4  2]
 [ 9 12 16 19  4  3]]


## create_sdm

Creates a self-dissimilarity matrix; this matrix is found by creating audio shingles from feature vectors, and finding cosine distance between shingles.

The inputs for the function are:
- __fv_mat__: a matrix of feature vectors in which each column is a time step and each row holds one feature, for example an array with 144 columns (beats) and 12 rows (chroma values)
- __num_fv_per_shingle__: an integer denoting the number of feature vectors per audio shingle

The outputs for the function are:
- __self_dissim_mat__: a self-dissimilarity matrix of pairwise cosine distances between shingles
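
The description above can be illustrated with a minimal NumPy sketch. This is not the code in `utilities`; the name `sketch_sdm` is hypothetical, and the sketch assumes that a shingle is formed by stacking `num_fv_per_shingle` consecutive columns of `fv_mat` and that no shingle is all zeros.

```python
import numpy as np

def sketch_sdm(fv_mat, num_fv_per_shingle):
    """Illustrative stand-in for create_sdm (hypothetical, not the package code)."""
    fv_mat = np.asarray(fv_mat, dtype=float)
    num_shingles = fv_mat.shape[1] - num_fv_per_shingle + 1
    # A shingle stacks num_fv_per_shingle consecutive columns into one long vector.
    shingles = np.stack([
        fv_mat[:, i:i + num_fv_per_shingle].flatten(order="F")
        for i in range(num_shingles)
    ])
    # Cosine distance is 1 minus the cosine similarity of each pair of shingles.
    # Assumes no shingle is all zeros (that would divide by zero here).
    unit = shingles / np.linalg.norm(shingles, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

# Random stand-in data: 12 chroma rows and 144 beats (not real audio features).
rng = np.random.default_rng(0)
fv_mat = rng.random((12, 144))
sdm = sketch_sdm(fv_mat, 3)
print(sdm.shape)
```

Under these assumptions, 144 columns and 3 feature vectors per shingle give 142 shingles, so the sketch returns a 142 x 142 symmetric matrix with zeros on its main diagonal.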

## find_initial_repeats

Identifies all repeated structures in a sequential data stream, which appear as diagonals in `thresh_mat`, and stores the pairs of repeats that correspond to each repeated structure in a list.

The inputs for the function are:
- __thresh_mat__: a thresholded matrix from which diagonals are extracted
- __bandwidth_vec__: a vector of lengths of diagonals to be found
- __thresh_bw__: an integer indicating the smallest allowed diagonal length

The outputs for the function are:
- __all_lst__: an array of pairs of repeats that correspond to diagonals in thresh_mat
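
To make the idea of extracting diagonals concrete, here is a simplified sketch. It is not the package's algorithm: `sketch_find_diagonals` is a hypothetical name, the search only looks at the strict upper triangle, and the output columns (1-based start and end of each repeat, then the length) are an assumption modeled on the first five columns of `input_mat` in the `add_annotations` example.

```python
import numpy as np

def sketch_find_diagonals(thresh_mat, bandwidth_vec, thresh_bw):
    """Illustrative stand-in for find_initial_repeats (hypothetical)."""
    # Work on a copy of the strict upper triangle so each repeat is seen once.
    work = np.triu(np.array(thresh_mat, dtype=int), k=1)
    n = work.shape[0]
    found = []
    for bw in sorted(bandwidth_vec, reverse=True):  # longest diagonals first
        if bw < thresh_bw:
            continue
        steps = np.arange(bw)
        for i in range(n - bw + 1):
            for j in range(i + 1, n - bw + 1):
                if work[i + steps, j + steps].all():
                    # Record 1-based [start1, end1, start2, end2, length].
                    found.append([i + 1, i + bw, j + 1, j + bw, bw])
                    # Remove the diagonal so shorter searches skip it.
                    work[i + steps, j + steps] = 0
    return np.array(found, dtype=int).reshape(len(found), 5)

# Toy thresholded matrix: beats 1-3 repeat at beats 5-7.
thresh_mat = np.eye(8, dtype=int)
for k in range(3):
    thresh_mat[k, 4 + k] = 1
    thresh_mat[4 + k, k] = 1

repeats = sketch_find_diagonals(thresh_mat, bandwidth_vec=[1, 2, 3], thresh_bw=2)
print(repeats)
```

For this toy matrix the sketch finds a single repeat, `[1 3 5 7 3]`. Searching from the longest bandwidth down and erasing each diagonal once it is found mirrors the "removing each diagonal as it is found" behavior described for this function.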

## reconstruct_full_block

Creates a record of when pairs of repeated structures occur, from the first beat in the song to the end. This record is a binary matrix with a block of 1's for each repeat encoded in `pattern_mat`, whose length is encoded in `pattern_key`.

The inputs for the function are:
- __pattern_mat__: a binary matrix with 1's where repeats begin and 0's otherwise
- __pattern_key__: a vector giving the length of each repeat encoded in `pattern_mat`

The outputs for the function are:
- __pattern_block__: a binary matrix representation of `pattern_mat` with blocks of 1's whose lengths are prescribed in `pattern_key`
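
The block reconstruction can be sketched as follows. This is an illustration, not the package's implementation: `sketch_full_block` is a hypothetical name, and it assumes that `pattern_key` holds one repeat length per row of `pattern_mat` and that blocks running past the last column are truncated.

```python
import numpy as np

def sketch_full_block(pattern_mat, pattern_key):
    """Illustrative stand-in for reconstruct_full_block (hypothetical)."""
    pattern_mat = np.asarray(pattern_mat)
    block = np.zeros(pattern_mat.shape, dtype=int)
    # Assumption: pattern_key[row] is the repeat length for that row.
    for row, length in enumerate(pattern_key):
        for start in np.flatnonzero(pattern_mat[row]):
            # Fill a run of 1's beginning where the repeat starts.
            block[row, start:start + length] = 1
    return block

pattern_mat = np.array([[1, 0, 0, 0, 1, 0, 0, 0],
                        [0, 0, 1, 0, 0, 0, 0, 0]])
pattern_key = np.array([3, 2])

pattern_block = sketch_full_block(pattern_mat, pattern_key)
print(pattern_block)
```

Here the first row becomes `[1 1 1 0 1 1 1 0]` (two repeats of length 3) and the second row becomes `[0 0 1 1 0 0 0 0]` (one repeat of length 2).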

## stretch_diags

Creates a binary matrix with full-length diagonals from a binary matrix of diagonal starts and the length of the diagonals.

The inputs for the function are:
- __thresh_diags__: a binary matrix where entries equal to 1 signal the existence of a diagonal
- __band_width__: the length of encoded diagonals

The outputs for the function are:
- __stretch_diag_mat__: a logical matrix with diagonals of length `band_width` starting at each entry prescribed in `thresh_diags`

In [16]:
thresh_diags = np.matrix([[0,0,1,0,0],
                         [0,1,0,0,0],
                         [0,0,1,0,0],
                         [0,0,0,0,0],
                         [0,0,0,0,0]])

band_width = 3

print("The input matrix is:")
print(thresh_diags)
print("The length of the encoded diagonals is:",band_width)

The input matrix is:
[[0 0 1 0 0]
 [0 1 0 0 0]
 [0 0 1 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
The length of the encoded diagonals is: 3


In [17]:
stretched_diagonal = stretch_diags(thresh_diags,band_width)

print("The output matrix is:")
print(stretched_diagonal)

The output matrix is:
[[False False False False False False False]
 [False  True False False False False False]
 [ True False  True False False False False]
 [False  True False  True False False False]
 [False False  True False  True False False]
 [False False False False False False False]
 [False False False False False False False]]
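
The output above can be reproduced with a short sketch, which also shows why a 5 x 5 input yields a 7 x 7 result: the output has `5 + band_width - 1` rows and columns so that every diagonal fits. The function name `sketch_stretch_diags` is hypothetical and this is not the package's code. The row and column swap of the start position is inferred only from the printed output above, where the entry at row 0, column 2 produces a diagonal starting at row 2, column 0.

```python
import numpy as np

def sketch_stretch_diags(thresh_diags, band_width):
    """Illustrative stand-in for stretch_diags (hypothetical)."""
    thresh_diags = np.asarray(thresh_diags)
    # Pad the output so a diagonal starting in the last row or column still fits.
    n = thresh_diags.shape[0] + band_width - 1
    out = np.zeros((n, n), dtype=bool)
    rows, cols = np.nonzero(thresh_diags)
    steps = np.arange(band_width)
    for r, c in zip(rows, cols):
        # Inferred from the printed output: entry (r, c) starts a diagonal at (c, r).
        out[c + steps, r + steps] = True
    return out

thresh_diags = np.array([[0, 0, 1, 0, 0],
                         [0, 1, 0, 0, 0],
                         [0, 0, 1, 0, 0],
                         [0, 0, 0, 0, 0],
                         [0, 0, 0, 0, 0]])

stretched = sketch_stretch_diags(thresh_diags, 3)
print(stretched)
```

Overlapping diagonals simply merge in this sketch, which is why the two diagonals starting at (1, 1) and (2, 2) share entries in the result.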
