# The transform module

The `transform` module of the `ah` package holds functions used to 'transform' signal data into different forms.
It includes the following functions:

- **remove_overlaps**: Removes all pairs of repeats that share a repeat length and annotation marker whenever at least one pair of repeats in that group overlaps in time.

- **\_\_create_anno_remove_overlaps**: Marks rows of repeats with annotation markers at the start indices and zeroes otherwise. After removing the annotations that have overlaps, creates separate arrays for annotations with overlaps and annotations without overlaps. Finally, the annotation markers are checked and fixed if necessary.

- **\_\_create_anno_rows**: Turns rows of repeats into marked rows with annotation markers at the start indices and zeroes otherwise. Then checks whether the correct annotation markers were given and fixes the markers if necessary.

- **\_\_separate_anno_markers**: Expands a vector of non-overlapping repeats into a matrix representation. The matrix representation is a visual record of where all of the repeats in a song start and end.


As the names suggest, many of these functions overlap and perform similar tasks. In fact, all of the other functions are called within `remove_overlaps`.

The functions in the `ah` package are meant to be used alongside other functions in the package, so many examples use multiple functions. In the examples below, the following functions from the [`utilities`](../ah/blob/master/aligned-hierarchies/utilities.py) module are called:
- add_annotations
- \_\_find_song_pattern
- reconstruct_full_block

For more in-depth information on how the functions call one another, an example function pipeline is shown below. Functions from the current module are shown in green.
![Function pipeline for the transform module](pictures/function_pipeline.png)

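## Importing necessary modules

```python
# used for mathematical calculations
import numpy as np

# import transform; `import *` skips names that start with an underscore
# (unless they are listed in __all__), so the helper functions are imported by name
from transform import *
from transform import __create_anno_rows, __create_anno_remove_overlaps, __separate_anno_markers
```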

## remove_overlaps

Removes all pairs of repeats that share a repeat length and annotation marker whenever at least one pair of repeats in that group overlaps in time.

The inputs for the function are:
- __input_mat__ (np.ndarray): A list of pairs of repeats with annotations marked. The first two columns refer to the first repeat, the second two columns refer to the second repeat, the fifth column denotes repeat length, and the last column contains the annotation markers.
- __song_length__ (int): The number of audio shingles in the song.
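Each row of _input\_mat_ packs two repeats into six columns. The sketch below unpacks one row. `describe_row` is a hypothetical helper (not part of `ah`), and it assumes start and end indices are inclusive, so a repeat covering shingles 1 through 15 has length 15, which matches the fifth column of the first row in the example that follows.

```python
import numpy as np


def describe_row(row):
    """Unpack one six-column row of a repeat-pair array.

    Hypothetical helper, not part of `ah`. Assumes the layout
    [start_1, end_1, start_2, end_2, repeat_length, annotation]
    with inclusive start and end shingle indices.
    """
    start_1, end_1, start_2, end_2, length, anno = (int(x) for x in row)
    # With inclusive indices, each repeat must cover exactly `length` shingles
    for start, end in ((start_1, end_1), (start_2, end_2)):
        if end - start + 1 != length:
            raise ValueError(f"repeat {start}-{end} does not have length {length}")
    return {"first": (start_1, end_1), "second": (start_2, end_2),
            "length": length, "annotation": anno}


print(describe_row(np.array([1, 15, 31, 45, 15, 0])))
# {'first': (1, 15), 'second': (31, 45), 'length': 15, 'annotation': 0}
```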

The outputs for the function are:
- __list_no_overlaps__ (np.ndarray): A list of pairs of non-overlapping repeats with annotations marked. All the repeats of a given length and with a specific annotation marker do not overlap in time.
- __matrix_no_overlaps__ (np.ndarray): A matrix representation of _list\_no\_overlaps_ where each row corresponds to a group of repeats.
- __key_no_overlaps__ (np.ndarray): A vector containing the lengths of the repeats in each row of _matrix\_no\_overlaps_.
- __annotations_no_overlaps__ (np.ndarray): A vector containing the annotations of the repeats in each row of _matrix\_no\_overlaps_.
- __all_overlap_lst__ (np.ndarray): The pairs of repeats, with annotations marked, that were removed from _input\_mat_. For each repeat length and annotation marker in this list, at least one pair of repeats overlaps in time.
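The grouping rule can be illustrated with a simplified sketch. `find_overlapping_groups` is a hypothetical helper, not the `ah` implementation: it assumes two repeats overlap when their inclusive index intervals share at least one shingle, groups the repeats by (repeat length, annotation marker), and reports every group that contains an overlap. Under the rule above, all rows belonging to a reported group would be moved out of _input\_mat_.

```python
import numpy as np
from itertools import combinations


def find_overlapping_groups(input_mat):
    """Return the (repeat length, annotation) groups in which at least two
    repeats overlap in time. Simplified sketch, not the ah implementation."""
    groups = {}
    for row in input_mat:
        key = (int(row[4]), int(row[5]))
        # Each row contributes two repeats as inclusive (start, end) intervals;
        # a set keeps a repeat that appears in several rows only once
        groups.setdefault(key, set()).update(
            {(int(row[0]), int(row[1])), (int(row[2]), int(row[3]))})
    overlapping = set()
    for key, repeats in groups.items():
        for (s1, e1), (s2, e2) in combinations(sorted(repeats), 2):
            # Two inclusive intervals overlap iff each starts no later than the other ends
            if s1 <= e2 and s2 <= e1:
                overlapping.add(key)
                break
    return overlapping


example = np.array([[1, 15, 31, 45, 15, 0],
                    [1, 10, 46, 55, 10, 0],
                    [31, 40, 46, 55, 10, 0],
                    [1, 10, 31, 40, 10, 0],
                    [11, 15, 41, 45, 5, 0]])
print(find_overlapping_groups(example))                            # set()
print(find_overlapping_groups(np.array([[1, 10, 6, 15, 10, 1]])))  # {(10, 1)}
```

For the same input used in the example that follows, no group contains an overlap, which is consistent with the empty overlap array in that example's output.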

```python
input_mat = np.array(([1, 15, 31, 45, 15, 0],
                      [1, 10, 46, 55, 10, 0],
                      [31, 40, 46, 55, 10, 0],
                      [1, 10, 31, 40, 10, 0],
                      [11, 15, 41, 45, 5, 0]))
song_length = 55

print("The input array is: \n",input_mat)
print("The number of shingles is:",song_length)
```

```
The input array is: 
 [[ 1 15 31 45 15  0]
 [ 1 10 46 55 10  0]
 [31 40 46 55 10  0]
 [ 1 10 31 40 10  0]
 [11 15 41 45  5  0]]
The number of shingles is: 55
```


```python
output = remove_overlaps(input_mat, song_length)
list_no_overlaps = output[0]
matrix_no_overlaps = output[1]
key_no_overlaps = output[2]
annotations_no_overlaps = output[3]
all_overlap_list = output[4]

print("The array of the non-overlapping repeats is: \n", list_no_overlaps)
print("The matrix representation of the non-overlapping repeats is: \n", matrix_no_overlaps)
print("The lengths of the repeats in matrix_no_overlaps are: \n", key_no_overlaps)
print("The annotations from matrix_no_overlaps are: \n", annotations_no_overlaps)
print("The array of overlapping repeats is: \n", all_overlap_list)
```

```
The array of the non-overlapping repeats is: 
 [[11 15 41 45  5  0]
 [ 1 10 31 40 10  0]
 [ 1 10 46 55 10  0]
 [31 40 46 55 10  0]
 [ 1 15 31 45 15  0]]
The matrix representation of the non-overlapping repeats is: 
 []
The lengths of the repeats in matrix_no_overlaps are: 
 []
The annotations from matrix_no_overlaps are: 
 []
The array of overlapping repeats is: 
 []
```


## \_\_create_anno_remove_overlaps

Turns _k\_mat_ into marked rows with annotation markers at the start indices and zeroes otherwise. It then separates the rows: those whose annotations have no overlaps are returned in _k\_list\_out_, and those whose annotations have overlaps are moved to _overlap\_list_. Lastly, this function checks whether the proper sequence of annotation markers was given and fixes the markers if necessary.

The inputs for the function are:
- __k_mat__ (np.ndarray): A list of pairs of repeats of length _band\_width_ with annotations marked. The first two columns refer to the first repeat, the second two columns refer to the second repeat, the fifth column denotes repeat length, and the last column contains the annotation.
- __song_length__ (int): The number of audio shingles in the song.
- __band_width__ (int): The length of the repeats in _k\_mat_.

The outputs for the function are:
- __pattern_row__ (np.ndarray): A 1-D array marking start indices of non-overlapping repeats with 1 and 0s otherwise.
- __k_list_out__ (np.ndarray): Similar to input _k\_mat_, with overlapping repeats removed.
- __overlap_list__ (np.ndarray): Similar to input _k\_mat_, containing only overlapping repeats.

```python
k_mat = np.array(([1, 5, 41, 45, 5, 1],
                  [36, 40, 51, 55, 5, 1],
                  [1, 5, 36, 40, 5, 1],
                  [11, 15, 41, 45, 5, 1]))
song_length = 55
band_width = 5

print("The input array is: \n",k_mat)
print("The number of shingles is:",song_length)
print("The length of the repeats is:",band_width)
```

```
The input array is: 
 [[ 1  5 41 45  5  1]
 [36 40 51 55  5  1]
 [ 1  5 36 40  5  1]
 [11 15 41 45  5  1]]
The number of shingles is: 55
The length of the repeats is: 5
```


```python
output = __create_anno_remove_overlaps(k_mat, song_length, band_width)
pattern_row = output[0]
k_list_out = output[1]
overlap_list = output[2]

print("The pattern of the repeats is:", pattern_row)
print("The array of non-overlapping repeats is: \n", k_list_out)
print("The array of overlapping repeats is: \n", overlap_list)
```

```
The pattern of the repeats is: [1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0]
The array of non-overlapping repeats is: 
 [[ 1  5 36 40  5  1]
 [ 1  5 41 45  5  1]
 [11 15 41 45  5  1]
 [36 40 51 55  5  1]]
The array of overlapping repeats is: 
 []
```
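The pattern row above marks the start indices 1, 11, 36, 41, and 51. As a rough illustration of how such a row can be built and checked for overlaps, here is a minimal sketch. `sketch_pattern_row` and `starts_overlap` are hypothetical helpers, not the `ah` implementation; the sketch assumes all rows share one annotation marker and that repeats of length _band\_width_ overlap exactly when two marked starts are fewer than _band\_width_ shingles apart.

```python
import numpy as np


def sketch_pattern_row(k_mat, song_length):
    """Mark each repeat's 1-indexed start with its annotation marker (hypothetical)."""
    pattern_row = np.zeros(song_length, dtype=int)
    for row in k_mat:
        # Columns 0 and 2 hold the start indices of the two repeats
        for start in (row[0], row[2]):
            pattern_row[start - 1] = row[5]
    return pattern_row


def starts_overlap(pattern_row, band_width):
    """True if two marked starts are closer than band_width (hypothetical)."""
    starts = np.flatnonzero(pattern_row)
    return bool(np.any(np.diff(starts) < band_width))


example_k_mat = np.array([[1, 5, 41, 45, 5, 1],
                          [36, 40, 51, 55, 5, 1],
                          [1, 5, 36, 40, 5, 1],
                          [11, 15, 41, 45, 5, 1]])
sketch_row = sketch_pattern_row(example_k_mat, 55)
print(np.flatnonzero(sketch_row) + 1)  # [ 1 11 36 41 51]
print(starts_overlap(sketch_row, 5))   # False
```

The gaps between consecutive starts are 10, 25, 5, and 10, none smaller than 5, which is consistent with the empty overlap array in the output above. Distinct variable names are used so the sketch does not overwrite the notebook's `k_mat` or `pattern_row`.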


## \_\_create_anno_rows

`__create_anno_rows` turns a list of pairs of repeats of the same length into marked rows with annotation markers at the start indices and zeroes otherwise. It also checks whether the proper sequence of annotation markers was given and fixes the markers if necessary, looping over all annotations in ascending order.

The inputs for the function are:
- __rep_mat__ (np.ndarray): pairs of repeats of the same length with annotations marked. The first two columns refer to the first repeat, the second two columns refer to the second repeat, the fifth column denotes repeat length, and the last column contains the annotation. 
- __song_length__ (int): the number of shingles in the song

The outputs for the function are:
- __pattern_row__ (np.ndarray): a 1-D array marking start indices of non-overlapping repeats
- __k_list_out__ (np.ndarray): similar to input _rep\_mat_ with overlapping repeats removed
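One way to picture the "fix the annotation markers" step is a relabeling that maps whatever markers were given to consecutive integers while keeping their ascending order. This is a minimal sketch under that assumption (that a proper sequence means 1, 2, 3, ... with no gaps); `relabel_annotations` is hypothetical and not the `ah` implementation.

```python
import numpy as np


def relabel_annotations(anno):
    """Map annotation markers to consecutive integers starting at 1,
    preserving ascending order (hypothetical helper)."""
    # np.unique sorts the distinct markers; return_inverse gives each
    # element's position in that sorted list
    _, inverse = np.unique(anno, return_inverse=True)
    return inverse + 1


print(relabel_annotations(np.array([2, 2, 5, 9, 5])))  # [1 1 2 3 2]
```

Applied to the last column of a repeat array (for example `rep_mat[:, 5]`), markers that already form a proper sequence come back unchanged.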

```python
rep_mat = np.array(([1, 5, 41, 45, 5, 1],
                    [36, 40, 51, 55, 5, 1],
                    [1, 5, 36, 40, 5, 1],
                    [11, 15, 41, 45, 5, 1]))
song_length = 55

print("The input array is: \n", rep_mat)
print("The number of shingles is:", song_length)
```

```python
output = __create_anno_rows(rep_mat, song_length)
pattern_row = output[0]
k_list_out = output[1]

print("The pattern of the repeats is:\n", pattern_row)
print("The output array is: \n", k_list_out)
```

## \_\_separate_anno_markers

`__separate_anno_markers` expands _pattern\_row_, which marks where non-overlapping repeats occur, into a matrix representation of where all of the repeats in a song start and end. The dimensions of this array are twice the number of pairs of repeats by the length of the song. _rep\_mat_ provides the list of annotation markers used to separate the repeats of length _band\_width_ into individual rows. Each row marks the start and end time steps of a repeat with 1's, and 0's otherwise.


The inputs for the function are:
- __rep_mat__ (np.ndarray): pairs of repeats with annotations marked. The first two columns refer to the first repeat, the second two columns refer to the second repeat, the fifth column denotes repeat length, and the last column contains the annotation
- __song_length__ (int): the number of shingles in the song
- __band_width__ (int): the length of the repeats in _rep\_mat_
- __pattern_row__ (np.ndarray): a 1-D array marking start indices of non-overlapping repeats

The outputs for the function are:
- __pattern_mat__ (np.ndarray): a matrix representation in which each row contains a marked group of repeats
- __pattern_key__ (np.ndarray): the lengths of the repeats in each row of _pattern\_mat_
- __anno_id_list__ (np.ndarray): the annotations of the repeats in each row of _pattern\_mat_
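A simplified sketch of the separation step, assuming _pattern\_row_ holds the annotation marker at each start index and 0 elsewhere. `sketch_separate` is hypothetical: it builds one 0/1 row per distinct marker, a matching length vector, and the list of markers. Unlike the description above, it marks only start indices and does not try to reproduce the exact row count or ordering of the `ah` function.

```python
import numpy as np


def sketch_separate(pattern_row, band_width):
    """Split a marked pattern row into one 0/1 row per annotation marker
    (simplified sketch, not the ah implementation)."""
    # Distinct nonzero markers, in ascending order
    anno_id_list = np.unique(pattern_row[pattern_row > 0])
    # Row i is 1 wherever pattern_row carries marker anno_id_list[i]
    pattern_mat = (pattern_row[np.newaxis, :] == anno_id_list[:, np.newaxis]).astype(int)
    # Every row holds repeats of the same length, band_width
    pattern_key = np.full(anno_id_list.size, band_width)
    return pattern_mat, pattern_key, anno_id_list


demo_row = np.array([1, 0, 2, 0, 1, 0, 2, 0])
mat, key, annos = sketch_separate(demo_row, 2)
print(mat)
# [[1 0 0 0 1 0 0 0]
#  [0 0 1 0 0 0 1 0]]
print(key, annos)  # [2 2] [1 2]
```

Broadcasting the pattern row against a column of markers produces the whole matrix in one comparison instead of an explicit loop over annotations.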

```python
rep_mat = np.array(([1, 5, 41, 45, 5, 1],
                    [36, 40, 51, 55, 5, 1],
                    [1, 5, 36, 40, 5, 1],
                    [11, 15, 41, 45, 5, 1]))
song_length = 55
band_width = 5
# reuse the pattern_row returned by __create_anno_rows in the previous example

print("The input array is: \n", rep_mat)
print("The number of shingles is:", song_length)
print("The length of the repeats is:", band_width)
print("The pattern of the repeats is:", pattern_row)
```

```python
output = __separate_anno_markers(rep_mat, song_length, band_width, pattern_row)
pattern_mat = output[0]
pattern_key = output[1]
anno_id_list = output[2]

print("The matrix representation of the repeats is: \n", pattern_mat)
print("The lengths of the repeats in pattern_mat are: \n", pattern_key)
print("The annotations from pattern_mat are: \n", anno_id_list)
```