# Assemble

This journal runs examples for the following functions:
1. **breakup_overlaps_by_intersect** : Extracts the repeats in input_pattern_obj, which 
    holds the starting indices of the repeats, into the essential structure 
    components using bw_vec, which holds the length of each repeat.
    
2. **check_overlaps**: Compares every pair of groups, determining if there are
    any repeats in any pairs of the groups that overlap. 

3. **__num_of_parts** : Determines the number of blocks of consecutive time 
    steps in a list of time steps. A block of consecutive time steps represents 
    a distilled section of a repeat.    
4. **__inds_to_rows** :  Expands a vector containing the starting indices of a 
    piece or two of a repeat into a matrix representation recording when these
    pieces occur in the song with 1's. All remaining entries are marked with 
    0's.
5. **__compare_and_cut** : Compares two rows of repeats labeled RED and BLUE, and
    determines if there are any overlaps in time between them. If there are, 
    then we cut the repeats in RED and BLUE into up to 3 pieces. 
6. **__merge_based_on_length** : Merges repeats that are the same length, as set 
    by full_bw, and are repeats of the same piece of structure.
7. **__merge_rows** : Merges rows that have at least one common repeat; said 
    common repeat(s) must occur at the same time step and be of common length.
8. **hierarchical_structure** : Distills the repeats encoded in MATRIX_NO 
    (and KEY_NO) to the essential structure components and then builds the 
    hierarchical representation

### The pipeline in the assemble module
<img src="assemble_color_sample.png" alt="Chart" style="width:150px;" align = "left"/>
<img src="assemble_pipeline.png" alt="Chart" style="width:340px;" align = "middle"/>


## Import Modules

In [2]:
import numpy as np
import assemble
from inspect import signature 
from search import find_all_repeats
from utilities import reconstruct_full_block

## 1. breakup_overlaps_by_intersect 

### About breakup_overlaps_by_intersect 

The purpose of this function is to create the essential structure components matrix. The essential structure components are the smallest building blocks that form the basis for every repeat in the song. This matrix is created from an array, **input_pattern_obj**, that holds the starting indices of the repeats, and a vector, **bw_vec**, that holds the length of each repeat. 

The function also takes a third parameter, thresh_bw, which is the smallest allowable repeat length. 

The function includes a section that checks how many arguments were passed by retrieving the function’s signature. If fewer than three arguments were passed, thresh_bw is set to 0; otherwise thresh_bw is left untouched. 
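
The argument-count fallback described above can be sketched as follows. This is only an illustration, not the module's code: `breakup_sketch` is a hypothetical stand-in, and it relies on a default value for thresh_bw, which is the usual Python way to get the same fallback to 0. Note that `inspect.signature` reports the parameters a function declares, not the arguments supplied in a particular call.

```python
from inspect import signature


def breakup_sketch(input_pattern_obj, bw_vec, thresh_bw=0):
    """Hypothetical stand-in: returns the thresh_bw value that would be used."""
    return thresh_bw


# Number of parameters the function declares (not the number passed in a call).
n_params = len(signature(breakup_sketch).parameters)

# Calling with two arguments falls back to thresh_bw = 0.
two_arg_result = breakup_sketch([[1, 0]], [[1]])

# Calling with three arguments leaves thresh_bw untouched.
three_arg_result = breakup_sketch([[1, 0]], [[1]], 2)

print(n_params, two_arg_result, three_arg_result)  # prints: 3 0 2
```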

### Arguments
- input_pattern_obj: A binary matrix with 1's where repeats begin and 0's otherwise
- bw_vec: A vector containing the lengths of the repeats encoded in input_pattern_obj
- thresh_bw: A number showing the smallest allowable repeat length 
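
To make the relationship between input_pattern_obj and bw_vec concrete, here is a small sketch that expands one row of starting indices and its repeat length into the time steps the repeats actually cover. The helper `expand_repeats` is hypothetical and written with plain Python lists; it is not part of the assemble module.

```python
def expand_repeats(start_row, length):
    """Mark every time step covered by a repeat (hypothetical helper).

    start_row has 1's where repeats begin; length is the repeat length.
    """
    covered = [0] * len(start_row)
    for start, flag in enumerate(start_row):
        if flag:
            # A repeat starting at `start` covers `length` consecutive steps.
            for t in range(start, min(start + length, len(start_row))):
                covered[t] = 1
    return covered


# First row of the example below: repeats of length 3 starting at 0, 8 and 16.
row = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
covered = expand_repeats(row, 3)
print(covered)
# prints: [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1]
```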

### Returns
- pattern_no_overlaps: A binary matrix with 1's where repeats of essential structure components begin 
- pattern_no_overlaps_key: A vector containing the lengths of the repeats of essential structure components in pattern_no_overlaps 

### Example

#### Input

In [6]:
input_pattern_obj = np.array([[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 ],
                              [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 ],
                              [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
                              [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 ]])
bw_vec = np.array([[3],
                   [5],
                   [8],
                   [8]])
thresh_bw = 0
print("The input array is: \n", input_pattern_obj)
print("The lengths of the repeats in the input array is: \n", bw_vec)
print("The smallest allowable repeat length is: ", thresh_bw)

The input array is: 
 [[1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0]
 [0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
 [1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]]
The lengths of the repeats in the input array is: 
 [[3]
 [5]
 [8]
 [8]]
The smallest allowable repeat length is:  0


#### Output

In [7]:
output = assemble.breakup_overlaps_by_intersect(input_pattern_obj, bw_vec, thresh_bw)
print("The output array is: \n", output[0])
print("The lengths of the repeats in the output array is: \n", output[1])

The output array is: 
 [[1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0]
 [0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]]
The lengths of the repeats in the output array is: 
 [[3]
 [5]]


## 2. check_overlaps
### About check_overlaps:

This function compares every pair of groups and checks for overlaps between those pairs. To check every pair of groups, the function creates compare_left and compare_right. compare_left repeats each row as many times as there are rows, and compare_right repeats the whole input as many times as there are rows. By comparing the corresponding time steps in compare_left and compare_right, it determines if there are any overlaps between groups.
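
The pairing idea can be sketched with plain Python lists. The function `pairwise_overlaps` below is a hypothetical illustration rather than the module's implementation; it assumes that only the strict upper triangle is reported, which is what the printed output in the example further down shows.

```python
def pairwise_overlaps(rows):
    """Return an n x n table that is True where row i and row j (i < j) overlap."""
    n = len(rows)
    # Each row repeated n times: row0, row0, ..., row1, row1, ...
    compare_left = [row for row in rows for _ in range(n)]
    # The whole input repeated n times: row0, row1, ..., row0, row1, ...
    compare_right = [row for _ in range(n) for row in rows]
    # A pair overlaps if both rows have a 1 at the same time step.
    flags = [
        any(a and b for a, b in zip(left, right))
        for left, right in zip(compare_left, compare_right)
    ]
    # Reshape to n x n and keep only pairs with i < j.
    return [[flags[i * n + j] and j > i for j in range(n)] for i in range(n)]


rows = [
    [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1],
]
overlaps = pairwise_overlaps(rows)
for line in overlaps:
    print(line)
```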

### Arguments

- input_mat: An array waiting to be checked for overlaps

### Returns

- overlaps_yn: A logical array where entry (i,j) is 1 if rows i and j of the input matrix overlap and 0 otherwise

### Example

In [9]:
input_mat = np.array([[0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
 [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0],
 [0,0,0,1,1,1,1,1,0,0,0,1,1,1,1,1,0,0,0],
 [1,1,1,0,0,0,0,0,1,1,1,0,0,0,0,0,1,1,1]])
print("The input array waiting to be checked for overlaps is: \n", input_mat)

The input array waiting to be checked for overlaps is: 
 [[0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
 [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0]
 [0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 1 0 0 0]
 [1 1 1 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1]]


In [10]:
output = assemble.check_overlaps(input_mat)
print("The output logical array is: \n", output)

The output logical array is: 
 [[False  True  True  True]
 [False False  True  True]
 [False False False False]
 [False False False False]]


## 3. __num_of_parts

### About __num_of_parts

This function determines the number of blocks of consecutive time steps in a list of time steps. A block of consecutive time steps represents a distilled section of a repeat. This distilled section will be replicated and the starting indices of the repeats within it will be returned. The function first uses the variable breakmark to check whether input_vec contains one block of consecutive time steps or two. These two cases are handled by separate branches of an if-else statement, each of which returns the shifted starting indices and the corresponding lengths.

<img src="num_of_parts.png" alt="Chart" style="width:800px;" align = "middle"/>

As the picture above shows, red and blue are two repeats, and purple is their intersection. We want to know the starting indices and lengths of the parts that remain after cutting purple out of every repeat in red. Cases a and b go into the if branch because input_vec holds a single part there, and case c goes into the else branch because input_vec holds two parts, as the picture shows.
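
The block counting and index shifting can be sketched in plain Python as below. Both helper names are hypothetical, and the sketch handles any number of blocks rather than branching on one versus two as the module does; it returns nested lists rather than NumPy arrays.

```python
def split_into_blocks(time_steps):
    """Split a sorted list of time steps into blocks of consecutive values."""
    blocks = [[time_steps[0]]]
    for t in time_steps[1:]:
        if t == blocks[-1][-1] + 1:
            blocks[-1].append(t)   # still consecutive: extend the current block
        else:
            blocks.append([t])     # gap found: start a new block
    return blocks


def replicate_blocks(input_vec, input_start, input_all_starts):
    """Shift every start in input_all_starts by each block's offset from input_start."""
    blocks = split_into_blocks(input_vec)
    start_mat = [
        [s + (block[0] - input_start) for s in input_all_starts]
        for block in blocks
    ]
    length_vec = [len(block) for block in blocks]
    return start_mat, length_vec


# One block: [3, 4] sits 3 steps after input_start = 0.
print(replicate_blocks([3, 4], 0, [3, 7, 10]))
# prints: ([[6, 10, 13]], [2])

# Two blocks: [3] and [5], offset by 0 and 2 from input_start = 3.
print(replicate_blocks([3, 5], 3, [3, 7, 10]))
# prints: ([[3, 7, 10], [5, 9, 12]], [1, 1])
```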

### Arguments

- input_vec: An array containing one or two parts of a repeat that are overlap(s) in time and may need to be replicated 

- input_start: An array containing the starting index of the part to be replicated 

- input_all_starts: An array containing the starting indices for replication 

### Returns

- start_mat: An array of one or two rows, containing the starting indices of the replicated repeats 
            
- length_vec: A column vector containing the lengths of the replicated parts 

### Examples

**Input 1** which goes into the if statement: 

In [12]:
input_vec = np.array([3,4])
input_start = np.array([0])
input_all_starts = np.array([3,7,10])
print("The input array is: \n", input_vec)
print("The starting indices for the part to be replicated is: \n", input_start)
print("The starting indices for replication is: \n", input_all_starts)

The input array is: 
 [3 4]
The starting indices for the part to be replicated is: 
 [0]
The starting indices for replication is: 
 [ 3  7 10]


**Output 1**

In [13]:
output = assemble.__num_of_parts(input_vec,input_start,input_all_starts)
print("The starting indices of the replicated repeats is: \n", output[0])
print("The lengths of the replicated parts is: ", output[1])

The starting indices of the replicated repeats is: 
 [ 6 10 13]
The lengths of the replicated parts is:  2


**Input 2** which goes into the else statement: 

In [14]:
input_vec = np.array([3,5])
input_start = np.array([3])
input_all_starts = np.array([3,7,10])
print("The input array is: \n", input_vec)
print("The starting indices for the part to be replicated is: \n", input_start)
print("The starting indices for replication is: \n", input_all_starts)

The input array is: 
 [3 5]
The starting indices for the part to be replicated is: 
 [3]
The starting indices for replication is: 
 [ 3  7 10]


**Output 2**

In [15]:
output = assemble.__num_of_parts(input_vec,input_start,input_all_starts)
print("The starting indices of the replicated repeats is: \n", output[0])
print("The lengths of the replicated parts is: \n", output[1])

The starting indices of the replicated repeats is: 
 [[ 3  7 10]
 [ 5  9 12]]
The lengths of the replicated parts is: 
 [[1]
 [1]]


## 4. __inds_to_rows

### About __inds_to_rows

This function expands a vector containing the starting indices of a piece or two of a repeat into a matrix representation recording when these pieces occur in the song with 1's. All remaining entries are marked with 0's. 
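
A minimal sketch of this expansion, using plain Python lists, is shown below. `inds_to_rows_sketch` is a hypothetical name, and treating a flat list as a single row is an assumption made for the illustration.

```python
def inds_to_rows_sketch(start_mat, row_length):
    """Turn rows of starting indices into binary rows of length row_length."""
    # Assumption: a flat list of indices is treated as a single row.
    if start_mat and not isinstance(start_mat[0], (list, tuple)):
        start_mat = [start_mat]
    new_mat = []
    for starts in start_mat:
        row = [0] * row_length
        for s in starts:
            row[s] = 1  # mark the time step where a piece begins
        new_mat.append(row)
    return new_mat


print(inds_to_rows_sketch([0, 1, 6, 7], 10))
# prints: [[1, 1, 0, 0, 0, 0, 1, 1, 0, 0]]
```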

### Arguments

- start_mat: A matrix of one or two rows, containing the starting indices 
            
- row_length: length of the rows, an integer

### Returns

- new_mat: A matrix of one or two rows, with 1's at the starting indices and 0's otherwise 

### Examples

#### Input 

In [16]:
start_mat = np.array([0,1,6,7])
row_length = 10
print("The array containing the starting indices is: \n",start_mat)
print("The length of the rows is: ", row_length)

The array containing the starting indices is: 
 [0 1 6 7]
The length of the rows is:  10


#### Output

In [17]:
output = assemble.__inds_to_rows(start_mat, row_length)
print("The output array is: \n", output)

The output array is: 
 [[1 1 0 0 0 0 1 1 0 0]]


## 5. __compare_and_cut

### About __compare_and_cut

This function compares two rows of repeats labeled RED and BLUE, and determines if there are any overlaps in time between them. The overlapping sections are called PURPLE. 

<img src="Red1Blue1Purple1.png" alt="Chart" style="width:800px;" align = "middle"/>

If there are overlaps, it cuts the repeats in RED and BLUE into up to 3 pieces. The function first determines whether the rows intersect at all; if they do, it compares one repeat in red to one repeat in blue. 
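
For a single red repeat and a single blue repeat, the cut can be illustrated with set operations on the time steps each repeat covers. This is a simplified sketch with a hypothetical helper `covered_steps` and made-up start positions; it covers only the intersection step, not the replication across all repeats in a row.

```python
def covered_steps(start, length):
    """Time steps covered by one repeat that begins at `start`."""
    return set(range(start, start + length))


# Hypothetical single repeats: red starts at 0 with length 5,
# blue starts at 1 with length 3.
red_steps = covered_steps(0, 5)
blue_steps = covered_steps(1, 3)

purple = sorted(red_steps & blue_steps)            # shared time steps
red_minus_purple = sorted(red_steps - blue_steps)  # what is left of red
blue_minus_purple = sorted(blue_steps - red_steps) # what is left of blue

print(purple)             # prints: [1, 2, 3]
print(red_minus_purple)   # prints: [0, 4]
print(blue_minus_purple)  # prints: []
```

Here the leftover of red consists of two separate blocks, one before and one after purple, which is the situation where a repeat is cut into several pieces.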



By using the intersection of the two repeats (purple) and the function \__num_of_parts, we obtain the new starting indices and their lengths. Calling \__inds_to_rows then converts these new starting indices and lengths into binary matrices with ones where repeats start and zeros otherwise. Once we have the new matrices, we call merge_based_on_length to merge repeats that are the same length.

<img src="Red2Blue2Purple2.png" alt="Chart" style="width:800px;" align = "middle"/>


If the merged results have overlapping repeats within a row, we call compare_and_cut again on the row with overlapping repeats. In this case, PURPLE still has overlaps. When compare_and_cut is called on PURPLE, the output is:


<img src="Red3Blue3Purple3.png" alt="Chart" style="width:800px;" align = "middle"/>

After merge_based_on_length is called on these three rows, the output is:

<img src="Purple4.png" alt="Chart" style="width:800px;" align = "middle"/>

Now, this merged row is appended to the original cut red and blue, and the purple that had been cut is deleted. This finally outputs:

<img src="Red2Blue2Purple4.png" alt="Chart" style="width:800px;" align = "middle"/>




### Arguments

- red: A binary row vector encoding a set of repeats with 1's where each repeat starts and 0's otherwise 
            
- red_len: The length of repeats encoded in red 
            
- blue: A binary row vector encoding a set of repeats with 1's where each repeat starts and 0's otherwise 
            
- blue_len: The length of repeats encoded in blue 

### Returns

- union_mat: A binary matrix representation of up to three rows encoding non-overlapping repeats cut from red and blue
- union_length: A vector containing the lengths of the repeats encoded in union_mat

### Examples

#### Input

In [18]:
red = np.array([1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0])
red_len = np.array([5])
blue = np.array([1,1,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0])
blue_len = np.array([3])
print("The first set of repeat is: \n", red)
print("The length of the first set of repeat is: \n", red_len)
print("The second set of repeat is: \n",blue)
print("The length of the second set of repeat is: \n", blue_len)

The first set of repeat is: 
 [1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
The length of the first set of repeat is: 
 [5]
The second set of repeat is: 
 [1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0]
The length of the second set of repeat is: 
 [3]


#### Output

In [19]:
output = assemble.__compare_and_cut(red, red_len, blue, blue_len)
print("The output array containing the non-overlapping repeats is: \n",output[0])
print("The array containing the lengths of repeats is: \n", output[1])

The output array containing the non-overlapping repeats is: 
 [[1 1 1 1 1 0 1 0 1 0 1 1 1 1 1 0 1 0 0 0]
 [1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 0 0 0]
 [0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0]]
The array containing the lengths of repeats is: 
 [[1]
 [1]
 [2]]


## 6. __merge_based_on_length

### About __merge_based_on_length

This function merges repeats that are the same length, as set by full_bw, and are repeats of the same piece of structure. During merging, if there are rows that have at least one common repeat, the function calls \__merge_rows to actually merge them.
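
Since only rows of the same length are candidates for merging, a first step can be pictured as grouping rows by their repeat length. The sketch below shows just that grouping on made-up data; `group_rows_by_length` is a hypothetical helper and the actual merging of rows is not shown.

```python
from collections import defaultdict


def group_rows_by_length(full_mat, full_bw):
    """Collect the rows of full_mat under the repeat length given in full_bw."""
    groups = defaultdict(list)
    for row, length in zip(full_mat, full_bw):
        groups[length].append(row)
    return dict(groups)


# Made-up data: two rows of length 2 and one row of length 1.
full_mat = [
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
]
full_bw = [2, 2, 1]
groups = group_rows_by_length(full_mat, full_bw)
print(groups[2])  # prints: [[1, 0, 0, 1, 0, 0], [0, 0, 0, 1, 0, 0]]
print(groups[1])  # prints: [[0, 1, 0, 0, 0, 0]]
```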

### Arguments

- full_mat: A binary matrix with ones where repeats start and zeroes otherwise
        
- full_bw: The length of repeats encoded in full_mat
    
- target_bw: The length of repeats that we seek to merge

### Returns

- out_mat: A binary matrix with ones where repeats start and zeros otherwise with rows of full_mat merged if appropriate
        
- one_length_vec: The length of the repeats encoded in out_mat

### Examples

#### Input

In [20]:
full_mat = np.array([[0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0],[1,1,1,0,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,0]])
full_bw = np.array([[2],[2]])
target_bw = np.array([[2],[2]])
print("The input array is: \n", full_mat)
print("The length of repeats in input array is: \n", full_bw)
print("The length of repeats we seek to merge is: \n",target_bw)

The input array is: 
 [[0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0]
 [1 1 1 0 0 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0]]
The length of repeats in input array is: 
 [[2]
 [2]]
The length of repeats we seek to merge is: 
 [[2]
 [2]]


#### Output

In [21]:
output = assemble.__merge_based_on_length(full_mat,full_bw,target_bw)
print("The merged array is: \n", output[0])
print("The length of repeats in the output array is: \n", output[1])

The merged array is: 
 [[1 1 1 1 1 1 1 1 0 0 1 1 1 0 1 1 0 0 0 0]]
The length of repeats in the output array is: 
 [2]


## 7. __merge_rows

### About __merge_rows

This function merges rows that have at least one common repeat; said common repeat(s) must occur at the same time step and be of common length. In a while loop, the function checks the unchecked rows one by one, finds the indices of unmerged overlapping rows, unions rows that have starting indices in common, and checks that newly merged rows do not cause overlaps within a row (if there are conflicts, it reruns compare_and_cut). When no unchecked rows remain, the loop exits and the function returns the merged matrix.
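
The union step can be sketched with plain Python lists as follows. `merge_rows_sketch` is a hypothetical illustration that only unions rows sharing a starting index; it assumes all rows have the same repeat length and leaves out the check for overlaps within a merged row.

```python
def merge_rows_sketch(rows):
    """Union rows that share at least one starting index."""
    merged = []
    for row in rows:
        current = list(row)
        remaining = []
        for other in merged:
            if any(a and b for a, b in zip(current, other)):
                # Common start found: fold the other row into the current one.
                current = [int(a or b) for a, b in zip(current, other)]
            else:
                remaining.append(other)
        merged = remaining + [current]
    return merged


input_mat = [
    [0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
]
merge_mat = merge_rows_sketch(input_mat)
print(merge_mat)
# prints: [[1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0]]
```

The two rows share a start at index 2, so they collapse into a single row; rows with no start in common would be returned unchanged.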

### Arguments

- input_mat: A binary matrix with ones where repeats start and zeroes otherwise
        
- input_width: The length of repeats encoded in input_mat

### Returns

- merge_mat: A binary matrix with ones where repeats start and zeroes otherwise

### Examples

#### Input

In [22]:
input_mat = np.array([[0,0,1,1,1,0,0,0,1,0,0,0,1,1,0,0,1,0,0,0],
 [1,1,1,0,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,0]])
input_width = np.array([1])
print("The input array is: \n", input_mat)
print("The length of repeats in the input array is: \n",input_width)

The input array is: 
 [[0 0 1 1 1 0 0 0 1 0 0 0 1 1 0 0 1 0 0 0]
 [1 1 1 0 0 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0]]
The length of repeats in the input array is: 
 [1]


In [23]:
output = assemble.__merge_rows(input_mat,input_width)
print("The merged array is: \n", output )

The merged array is: 
 [[1 1 1 1 1 0 1 0 1 0 1 1 1 1 1 0 1 0 0 0]]
