# Results Log (Species Prediction Models)

## Regular Convolutional Neural Network

**Architecture:** (no regularization, filter width: 2, 80 filters)  
**[Conv/Pool/Drop] -> [Conv/Pool/Drop] -> [Conv/Pool/Drop]**  
Epochs: 60  
Training Accuracy: ~0.78 (training loss appears to still be declining)  
Validation Accuracy: ~0.54 (validation loss/accuracy has stayed approximately flat since epoch 40)  


## Convolutional Neural Network with Dilated Convolution Branch

**Architecture:**  
Regular Convolution Branch: **[Conv/Pool/Drop] -> [Conv/Pool/Drop] -> [Conv/Pool/Drop]**  
Dilated Convolution Branch: **[Conv-Dil: (1,2)/Pool/Drop] -> [Conv-Dil: (1,4)/Pool/Drop] -> [Conv-Dil: (1,8)/Pool/Drop]**  
Epochs: 25  
Training Accuracy: ~0.80  
Validation Accuracy: ~0.53  

**Architecture:**  
Regular Convolution Branch: **[Conv/Pool/Drop] -> [Conv/Pool/Drop] -> [Conv/Pool/Drop]**  
Dilated Convolution Branch: **[Conv-Dil: (1,8)/Pool/Drop] -> [Conv-Dil: (1,4)/Pool/Drop] -> [Conv-Dil: (1,2)/Pool/Drop]**  
Epochs: 40  
Training Accuracy: ~0.95 (@ 40 epochs), ~0.67 (@ 9 epochs)  
Validation Accuracy: ~0.50 (@ 40 epochs), ~0.58 (@ 9 epochs)  


**Architecture:** (no regularization!)  
Regular Convolution Branch: **[Conv/Pool/Drop] -> [Conv/Pool/Drop]**  
Dilated Convolution Branch: **[Conv-Dil: (1,4)/Pool/Drop] -> [Conv-Dil: (1,4)/Pool/Drop]**  
Stacked Output: **Stack/Reshape -> [Conv/Pool/Drop]**  
Epochs: 40  
Training Accuracy: ~0.94 (~0.73 @ 10 epochs)    
Validation Accuracy: ~0.49 (~0.54 @ 10 epochs)     

## LSTM Models
Three current models:  
1) LSTM only - breaks each promoter sequence into shorter subsequences (to pick up short patterns, i.e. TF binding motifs)  
2) Conv + LSTM (sequential) - applies convolutions/pooling to "summarize" the promoter sequence first, then applies an LSTM to capture longer-range patterns across the entire sequence "summary"  
3) Conv + LSTM (parallel) - same as the LSTM-only model, plus a separate branch of multiple convolutional layers (each branch's representation comes from its own dense layer, and the two representations are then concatenated)  
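
As a rough illustration of the subsequence idea in model 1, the sketch below cuts a one-hot promoter batch into fixed-length patches with NumPy. It assumes a `(batch, 4, promoter_length)` layout (which is what the `tf.squeeze(dna,[3])` calls in the cells further down suggest) and uses a hypothetical helper name, `split_into_patches`; it is not the code behind the logged runs. The axis swap is done with `transpose` because a plain reshape keeps the row-major element order and would mix values from different positions.

```python
import numpy as np

def split_into_patches(onehot, patch_len):
    """Cut (batch, 4, length) one-hot sequences into (batch, n_patches, patch_len, 4)."""
    batch, channels, length = onehot.shape
    if length % patch_len != 0:
        raise ValueError("length must be a multiple of patch_len")
    n_patches = length // patch_len
    # Put positions before channels: (batch, length, 4)
    pos_major = onehot.transpose(0, 2, 1)
    # Consecutive positions now sit together, so a reshape yields contiguous patches
    return pos_major.reshape(batch, n_patches, patch_len, channels)

# Toy batch: 2 random sequences of length 20 (real promoters are longer)
rng = np.random.default_rng(0)
bases = rng.integers(0, 4, size=(2, 20))
onehot = np.eye(4, dtype=np.float32)[bases].transpose(0, 2, 1)  # (2, 4, 20)

patches = split_into_patches(onehot, patch_len=5)
print(patches.shape)  # (2, 4, 5, 4)
```

To run every patch through one shared LSTM, the first two axes can be merged with `patches.reshape(-1, patch_len, 4)` and split again afterwards with `reshape(batch, n_patches, -1)`; both reshapes keep each sequence's patches together and in order.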

----

### Conv + LSTM
Architecture: (see below)   
Epochs: 20  
Training Accuracy: ~0.4-0.6 (highly variable...)  
Validation Accuracy: ~0.44 (validation loss doesn't appear to be declining)  

Training for more epochs...  
Epochs (total): 45  
Training Accuracy: ~0.75 (still highly variable...)  
Validation Accuracy: ~0.56  

----
Architecture: (see below, changed pooling dimensions to (1,5) and stride to (1,5))  
Epochs: 20  
Training Accuracy: ~0.5 (highly variable...)  
Validation Accuracy: ~0.4 (validation loss doesn't appear to be declining)  

In [None]:
# LSTM block
conv1 = Conv2D(8,[4,5],activation='linear',
                name='convTrans_1',padding='valid')(dna)
leak1 = LeakyReLU(alpha=.001)(conv1)
pool1 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_1',padding='same')(leak1) 
conv2 = Conv2D(1,[1,1],activation='linear',name='convTrans1x1',padding='same')(pool1)
reshapedSeq = tf.reshape(tf.squeeze(conv2,[1,3]),[tf.shape(conv2)[0],31,8])
lstm1 = RNN(reshapedSeq,n_steps,n_hidden,100)

FC = Dense(50,activation='relu',name='representation')(lstm1)
preds = Dense(num_species,activation='softmax')(FC)

### LSTM + Conv
Architecture: (see below)  
Epochs: 20  
Training Accuracy: ~0.2 (highly variable...)  
Validation Accuracy: ~0.15 (validation loss doesn't appear to be declining) 

In [None]:
reshapedDNA = tf.concat(tf.split(tf.squeeze(dna,[3]),int(promoter_length/n_steps),2),0)
reshapedDNA = tf.reshape(reshapedDNA,[tf.shape(reshapedDNA)[0],tf.shape(reshapedDNA)[2],4])
with tf.variable_scope('LSTM1'):
    lstm1 = RNN(reshapedDNA,n_steps,n_hidden,32)

n_steps2 = int(promoter_length/n_steps)
reshapedLSTM1 = tf.reshape(lstm1,[tf.shape(dna)[0],n_steps2,32])

reshapedLSTM1 = tf.reshape(reshapedLSTM1,[tf.shape(reshapedLSTM1)[0],32,n_steps2])
drop1 = conv_pool_drop(tf.expand_dims(reshapedLSTM1,3),2,4,filter_width,num_filters)
drop2 = conv_pool_drop(drop1,2,4,filter_width,num_filters)

flat = Flatten()(drop2)
FC = Dense(50,activation='relu',name='representation')(flat)
preds = Dense(num_species,activation='softmax')(FC)

### LSTM (only)
Architecture: (see below)  
Epochs: ~20 (stopped)   
Patch Length: 10  
Training Accuracy: ~0.12-0.3 (highly variable...)    
Validation Accuracy: ~0.13    

Architecture: (see below, change 50s to 25)  
Epochs: 20  
Patch Length: 20  
Training Accuracy: ~0.2 (highly variable...)    
Validation Accuracy: ~0.14  

In [None]:
reshapedDNA = tf.concat(tf.split(tf.squeeze(dna,[3]),50,2),0)
reshapedDNA = tf.reshape(reshapedDNA,[tf.shape(reshapedDNA)[0],tf.shape(reshapedDNA)[2],4])
lstm1 = RNN(reshapedDNA,n_steps,n_hidden,100)
reshapedLSTM = tf.reshape(lstm1,[tf.shape(dna)[0],50,100])

flat = Flatten()(reshapedLSTM)
FC = Dense(50,activation='relu',name='representation')(flat)
preds = Dense(num_species,activation='softmax')(FC)

### LSTM (2 Layers)

n_steps = 20  
Architecture: (see below)  
Epochs: 20  
Training Accuracy: ~0.18  
Validation Accuracy: ~0.15    

----

n_steps = 5    
Architecture: (see below)  
Epochs: 20  
Training Accuracy: ~0.18  
Validation Accuracy: ~0.15    

In [None]:

reshapedDNA = tf.concat(tf.split(tf.squeeze(dna,[3]),int(promoter_length/n_steps),2),0)
reshapedDNA = tf.reshape(reshapedDNA,[tf.shape(reshapedDNA)[0],tf.shape(reshapedDNA)[2],4])
with tf.variable_scope('LSTM1'):
    lstm1 = RNN(reshapedDNA,n_steps,n_hidden,32)

n_steps2 = int(promoter_length/n_steps)
reshapedLSTM1 = tf.reshape(lstm1,[tf.shape(dna)[0],n_steps2,32])
with tf.variable_scope('LSTM2'):
    lstm2 = RNN(reshapedLSTM1,n_steps2,n_hidden,64)

FC = Dense(50,activation='relu',name='representation')(lstm2)
preds = Dense(num_species,activation='softmax')(FC)

### LSTM + Conv (Parallel Branches)
Architecture: (see below)  
Epochs: ~20 (stopped)  
Training Accuracy: ~0.6-0.8 (highly variable...)  
Validation Accuracy: ~0.55  


# Results Log (Species Prediction) - 10 Species "Baseline"

## Regular Convolutional Neural Network

Epochs: 60  
Training Accuracy: ~0.7-0.8 (variable...)  
Validation Accuracy: ~0.61  

In [3]:
execfile('analysis.py')
label_names = ['sCer','cEleg','Mouse','Human','sPom','Zebrafish','dMelan','Chicken','aThal','Lizard']
model_name = 'all10_model'
model_dir = 'results/all10/'
testdata_file = 'data/h5datasets/all10/validation_even.h5'
a = getPredictions(model_name,model_dir,testdata_file,label_names,limit=False)
print(a[1])

Using TensorFlow backend.


[[500  37   2  12 344  10  47   1  37  10]
 [ 36 828   3   8  63  16  25   0  20   1]
 [  4   1 680 211   0  14   4  49   8  29]
 [ 11   2 316 541   5  36   2  58   3  26]
 [187  46   0  10 654  27  12   0  64   0]
 [ 32  18  60  66  43 673  27  24  19  38]
 [ 81  54   7  14 105  59 638  11  28   3]
 [  5   1 155 113   2  28   6 668   0  22]
 [107  58  13  24 178  51  14   4 546   5]
 [  8   2 286 173   6 105  13  63   2 342]]


In [7]:
precision, recall = calcPrecisionRecall(a[1].astype(float))
print(precision)
print(recall)

[ 0.51493306  0.79083095  0.44678055  0.4616041   0.46714286  0.66045142
  0.80964467  0.76082005  0.75103164  0.71848739]
[ 0.5    0.828  0.68   0.541  0.654  0.673  0.638  0.668  0.546  0.342]
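
`calcPrecisionRecall` lives in `analysis.py` and is not shown in this log. A minimal NumPy sketch that is consistent with the numbers printed above, assuming rows of the confusion matrix are true species and columns are predicted species (the function name `precision_recall` is a stand-in, not the author's helper):

```python
import numpy as np

# Confusion matrix printed above (rows: true species, columns: predicted species)
cm = np.array([
    [500,  37,   2,  12, 344,  10,  47,   1,  37,  10],
    [ 36, 828,   3,   8,  63,  16,  25,   0,  20,   1],
    [  4,   1, 680, 211,   0,  14,   4,  49,   8,  29],
    [ 11,   2, 316, 541,   5,  36,   2,  58,   3,  26],
    [187,  46,   0,  10, 654,  27,  12,   0,  64,   0],
    [ 32,  18,  60,  66,  43, 673,  27,  24,  19,  38],
    [ 81,  54,   7,  14, 105,  59, 638,  11,  28,   3],
    [  5,   1, 155, 113,   2,  28,   6, 668,   0,  22],
    [107,  58,  13,  24, 178,  51,  14,   4, 546,   5],
    [  8,   2, 286, 173,   6, 105,  13,  63,   2, 342],
], dtype=float)

def precision_recall(cm):
    true_pos = np.diag(cm)
    precision = true_pos / cm.sum(axis=0)  # correct / everything predicted as that class
    recall = true_pos / cm.sum(axis=1)     # correct / everything truly in that class
    return precision, recall

precision, recall = precision_recall(cm)
print(np.round(precision, 3))
print(np.round(recall, 3))
```

Read this way, Mouse and Human have the two lowest precision values, and the largest off-diagonal count in each of their rows is the other species (211 and 316).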


In [None]:
from scipy.cluster.hierarchy import dendrogram, linkage
dendrogram(linkage(a[1]),labels=label_names)

<img src="Notes/images/all10dend.png" alt="Confusion Matrix Dendrogram (All 10 Species)" style="width: 700px;"/>

## Densely Connected Network

Epochs: 50  
Training Accuracy: ~0.76   
Validation Accuracy: ~0.65 (still appears to be room for improvement)  

Additional training...  
Epochs: 50 (100 total)  
Training Accuracy: ~0.85 (still continues to increase...)  
Validation Accuracy: ~0.66 (no signs of increase; same with loss)    

## RNN Models

### LSTM Cell(s)  

**timestep = 20**  
Architecture: (see below)  
Training Accuracy: ~0.5 (variable...)  
Validation Accuracy: ~0.39  

Architecture: (same as below except with **stacked RNNs - 2**)  
Epochs: 20  
Training Accuracy: ~0.5 (variable...)    
Validation Accuracy: ~0.41  

Architecture: (same as above except with **stacked RNNs - 3**)  
Epochs: 
Training Accuracy: ~0.72 (continually increasing...)    
Validation Accuracy: ~0.38 (high of ~0.4 @25 epochs)   

----

**timestep = 50**  
Architecture: (see below)  
Epochs: 60  
Training Accuracy: ~0.60    
Validation Accuracy: ~0.45       

Training for more epochs...  
Epochs: 120  
Training Accuracy: ~0.78 (~0.60 @ 60 epochs)  
Validation Accuracy: ~0.42 (~0.47 @ 60 epochs)    

----

**timestep = 100**  
Architecture: (see below)  
Epochs: 50  
Training Accuracy: ~0.57      
Validation Accuracy: ~0.48 (still appears to be increasing while loss is decreasing; train for more epochs)  


In [None]:
reshapedDNA = tf.reshape(tf.squeeze(dna,[3]),[tf.shape(dna)[0],n_steps,int(promoter_length/n_steps)*4])
with tf.variable_scope('LSTM1'):
    lstm1 = RNN(reshapedDNA,n_steps,n_hidden,32)
    
FC = Dense(50,activation='relu',name='representation')(lstm1)
preds = Dense(num_species,activation='softmax')(FC)

**Subsequence LSTM + concatenated output LSTM (timestep = 5)**  
Architecture: (see below)  
Epochs: 30  
Training Accuracy: ~0.2-0.3 (highly variable)  
Validation Accuracy: ~0.18  

In [None]:
n_steps = 5

reshapedDNA = tf.concat(tf.split(tf.squeeze(dna,[3]),int(promoter_length/n_steps),2),0)
reshapedDNA = tf.reshape(reshapedDNA,[tf.shape(reshapedDNA)[0],tf.shape(reshapedDNA)[2],4])
with tf.variable_scope('LSTM1'):
    lstm1 = RNN(reshapedDNA,n_steps,n_hidden,100)

n_steps2 = int(promoter_length/n_steps)
reshapedLSTM1 = tf.reshape(lstm1,[tf.shape(dna)[0],n_steps2,100])
with tf.variable_scope('LSTM2'):
    lstm2 = RNN(reshapedLSTM1,n_steps2,n_hidden,100)

### GRU Models
Architecture: (see above, same except change in cell type)  
Hidden Units: 128  
Epochs: 80  
Training Accuracy: ~0.48  
Validation Accuracy: ~0.48 (validation loss doesn't seem to be decreasing any further...)  

Hidden Units: 512  
Epochs: 80  
Training Accuracy: ~0.9    
Validation Accuracy: ~0.4    


# Results Log (Species Prediction) - 10 Species "Baseline" (Binary)

## Regular Convolutional Neural Network

Epochs: ~59 (stopped)    
Training Accuracy: ~0.68  
Validation Accuracy: ~0.5  

Epochs: 80  
Training "Exact Match" Accuracy: ~0.5 (highly variable...)  
Validation "Exact Match" Accuracy: ~0.47 (appears to be slowly increasing...same with total loss)    
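
For the binary (multi-label) setup, "exact match" accuracy is usually taken to mean that a sample counts as correct only when every one of its labels is predicted correctly. The sketch below shows one way to compute it alongside precision and recall. The 0.5 threshold, the micro-averaging over all labels, and the function name `multilabel_metrics` are assumptions for illustration; the logged runs may compute these statistics differently.

```python
import numpy as np

def multilabel_metrics(y_true, y_prob, threshold=0.5):
    """Exact-match accuracy plus micro-averaged precision/recall for binary label matrices."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_prob) >= threshold
    # A row is an exact match only if all of its labels agree
    exact_match = np.all(y_pred == y_true, axis=1).mean()
    true_pos = np.logical_and(y_pred, y_true).sum()
    precision = true_pos / max(y_pred.sum(), 1)  # guard against no positive predictions
    recall = true_pos / max(y_true.sum(), 1)     # guard against no positive labels
    return exact_match, precision, recall

# Toy example: 4 samples, 3 labels each
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_prob = [[0.9, 0.2, 0.8], [0.1, 0.7, 0.6], [0.8, 0.3, 0.1], [0.2, 0.1, 0.9]]
exact_match, precision, recall = multilabel_metrics(y_true, y_prob)
print(exact_match, precision, recall)
```

Under these definitions, precision well above recall means the model predicts relatively few positive labels but gets most of those right.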

**4 Conv/Pool/Drop Layers (w/ precision, recall statistics)**    
Epochs: 50  
Training Exact Match Accuracy: ~0.5    
Training Precision: ~0.8 (gets up to as high as 0.85)    
Training Recall: ~0.5    
Validation Exact Match Accuracy: ~0.44   
Validation Precision: ~0.75    
Validation Recall: ~0.44    

In [None]:
drop1 = conv_pool_drop(dna,1,filter_height1,filter_width,num_filters)
drop2 = conv_pool_drop(drop1,2,filter_height2,filter_width,num_filters)
drop3 = conv_pool_drop(drop2,3,filter_height2,filter_width,num_filters)
drop4 = conv_pool_drop(drop3,4,filter_height2,filter_width,num_filters)

**Regular Convolution + LSTM (Only)**  

Output File: all10binBinary_RegConv.output   

Architecture: (see below)  
Epochs: 80  
Training Exact Match Accuracy: ~0.5          
Training Precision: ~0.77 (gets up to as high as 0.78)    
Training Recall: ~0.46 (gets up to as high as 0.25)    
Validation Exact Match Accuracy: ~0.46         
Validation Precision: ~0.69        
Validation Recall: ~0.47      

In [None]:
with tf.variable_scope('RNN2'):
    conv1 = Conv2D(32,[4,5],activation='linear',
                    name='convTrans_1',padding='valid')(dna)
    leak1 = LeakyReLU(alpha=.001)(conv1)
    pool1 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_1',padding='same')(leak1) 
    conv2 = Conv2D(8,[1,5],activation='linear',
                    name='convTrans_2',padding='valid')(pool1)
    leak2 = LeakyReLU(alpha=.001)(conv2)
    pool2 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_2',padding='same')(leak2) 
    reshapedConv = tf.reshape(tf.squeeze(pool2,[1]),[tf.shape(dna)[0],61,16])
    rnn2 = RNN(reshapedConv,61,n_hidden,50,celltype='GRU')
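
The hard-coded `61` and `16` in the reshape follow from the layer arithmetic. The sketch below reproduces it in plain Python, assuming `promoter_length = 500` (a value inferred from the shapes, not stated in this log) and a helper name, `branch_length`, made up for illustration. A `'valid'` convolution shortens the width by `(kernel - 1) * dilation`, and a `'same'` pooling layer with stride 2 leaves `ceil(width / 2)`.

```python
import math

def branch_length(width, kernel=5, dilation=1, n_blocks=2, pool_stride=2):
    """Width left after n_blocks of [valid conv -> 'same' average pool]."""
    for _ in range(n_blocks):
        width -= (kernel - 1) * dilation        # 'valid' convolution
        width = math.ceil(width / pool_stride)  # 'same' pooling rounds up
    return width

promoter_length = 500  # assumption: inferred from the reshape sizes, not stated in the log
regular = branch_length(promoter_length)              # plain convolutions, kernel width 5
dilated = branch_length(promoter_length, dilation=5)  # same kernels with dilation_rate=(1,5)

# The second conv has 8 filters, so each branch holds width * 8 values per sequence.
# Reshaping to 16 features per step halves the number of time steps.
print(regular, regular * 8 // 16)  # 122 61
print(dilated, dilated * 8 // 16)  # 110 55
```

Under that assumption, the 122-step and 110-step variants (no reshape) and the 61-step and 55-step variants (adjacent steps merged into 16 features) all match the constants used in the Combinations cells.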

## Combinations

**Raw Branch + Regular Convolution Branch + Dilated Convolution Branch**  

Output File: all10binBinary3Branches.output  

Architecture: (see below)  
Epochs: 80  
Training Exact Match Accuracy: ~0.6      
Training Precision: ~0.82 (gets up to as high as 0.89)    
Training Recall: ~0.65  
Validation Exact Match Accuracy: ~0.53     
Validation Precision: ~0.75    
Validation Recall: ~0.56   


In [None]:
# Raw Branch
with tf.variable_scope('RNN1'):
    reshapedDNA = tf.reshape(tf.squeeze(dna,[3]),[tf.shape(dna)[0],n_steps,int(promoter_length/n_steps)*4])
    rnn1 = RNN(reshapedDNA,n_steps,n_hidden,50,celltype='GRU')

# Regular Convolution Branch
with tf.variable_scope('RNN2'):
    conv1 = Conv2D(32,[4,5],activation='linear',
                    name='convTrans_1',padding='valid')(dna)
    leak1 = LeakyReLU(alpha=.001)(conv1)
    pool1 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_1',padding='same')(leak1) 
    conv2 = Conv2D(8,[1,5],activation='linear',
                    name='convTrans_2',padding='valid')(pool1)
    leak2 = LeakyReLU(alpha=.001)(conv2)
    pool2 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_2',padding='same')(leak2) 
    reshapedConv = tf.reshape(tf.squeeze(pool2,[1]),[tf.shape(dna)[0],61,16])
    rnn2 = RNN(reshapedConv,61,n_hidden,50,celltype='GRU')

# Dilated Convolution Branch
with tf.variable_scope('RNN3'):
    convDil1 = Conv2D(32,[4,5],activation='linear',
                    name='convTrans_1',padding='valid',dilation_rate=(1,5))(dna)
    leakDil1 = LeakyReLU(alpha=.001)(convDil1)
    print(leakDil1)
    poolDil1 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_1',padding='same')(leakDil1) 
    convDil2 = Conv2D(8,[1,5],activation='linear',
                    name='convTrans_2',padding='valid',dilation_rate=(1,5))(poolDil1)
    leakDil2 = LeakyReLU(alpha=.001)(convDil2)
    poolDil2 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_2',padding='same')(leakDil2) 
    print(poolDil2)
    reshapedDilConv = tf.reshape(tf.squeeze(poolDil2,[1]),[tf.shape(dna)[0],55,16])
    rnn3 = RNN(reshapedDilConv,55,n_hidden,50,celltype='GRU')  

**Raw Branch + Regular Convolution Branch + Dilated Convolution Branch (w/ All Sequence Outputs + Dense Layers at Ends)**  

Output File: all10binBinary3BranchesDense.output  

Architecture: (see below)  
Epochs: 80  
Training Exact Match Accuracy: ~0.8        
Training Precision: ~0.91 (gets up to as high as 0.95)    
Training Recall: ~0.85 (gets up to as high as 0.89)    
Validation Exact Match Accuracy: ~0.51       
Validation Precision: ~0.60      
Validation Recall: ~0.52     

In [None]:
with tf.variable_scope('RNN1'):
    reshapedDNA = tf.reshape(tf.squeeze(dna,[3]),[tf.shape(dna)[0],n_steps,int(promoter_length/n_steps)*4])
    rnn1 = RNN(reshapedDNA,n_steps,n_hidden,16,celltype='GRU',return_seq=True)
    rnn1stacked = tf.reshape(tf.concat(rnn1,axis=1),[tf.shape(dna)[0],n_steps,16])
    rnn1Dense = Dense(100,activation='relu')(Flatten()(rnn1stacked))
    print(rnn1Dense)

# Regular Convolution Branch
with tf.variable_scope('RNN2'):
    conv1 = Conv2D(32,[4,5],activation='linear',
                    name='convTrans_1',padding='valid')(dna)
    leak1 = LeakyReLU(alpha=.001)(conv1)
    pool1 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_1',padding='same')(leak1) 
    conv2 = Conv2D(8,[1,5],activation='linear',
                    name='convTrans_2',padding='valid')(pool1)
    leak2 = LeakyReLU(alpha=.001)(conv2)
    pool2 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_2',padding='same')(leak2) 
    reshapedConv = tf.reshape(tf.squeeze(pool2,[1]),[tf.shape(dna)[0],61,16])
    rnn2 = RNN(reshapedConv,61,n_hidden,16,celltype='GRU',return_seq=True)
    rnn2stacked = tf.reshape(tf.concat(rnn2,axis=1),[tf.shape(dna)[0],61,16])
    print(rnn2stacked)
    rnn2Dense = Dense(100,activation='relu')(Flatten()(rnn2stacked))
    print(rnn2Dense)

# Dilated Convolution Branch
with tf.variable_scope('RNN3'):
    convDil1 = Conv2D(32,[4,5],activation='linear',
                    name='convTrans_1',padding='valid',dilation_rate=(1,5))(dna)
    leakDil1 = LeakyReLU(alpha=.001)(convDil1)
    poolDil1 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_1',padding='same')(leakDil1) 
    convDil2 = Conv2D(8,[1,5],activation='linear',
                    name='convTrans_2',padding='valid',dilation_rate=(1,5))(poolDil1)
    leakDil2 = LeakyReLU(alpha=.001)(convDil2)
    poolDil2 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_2',padding='same')(leakDil2) 
    reshapedDilConv = tf.reshape(tf.squeeze(poolDil2,[1]),[tf.shape(dna)[0],55,16])
    rnn3 = RNN(reshapedDilConv,55,n_hidden,16,celltype='GRU',return_seq=True)    
    rnn3stacked = tf.reshape(tf.concat(rnn3,axis=1),[tf.shape(dna)[0],55,16])
    print(rnn3stacked)
    rnn3Dense = Dense(100,activation='relu')(Flatten()(rnn3stacked))
    print(rnn3Dense)

**Raw Branch + Regular Convolution Branch + Dilated Convolution Branch (w/ Longer Time Steps, t=100)**  

Output File: all10binBinary3BranchesLong.output  

Architecture: (see below)  
Epochs: 80  
Training Exact Match Accuracy: ~0.6          
Training Precision: ~0.8 (gets up to as high as 0.88)    
Training Recall: ~0.6 (gets up to as high as 0.70)      
Validation Exact Match Accuracy: ~0.55         
Validation Precision: ~0.75       
Validation Recall: ~0.57       

In [None]:
# Raw Branch
with tf.variable_scope('RNN1'):
    reshapedDNA = tf.reshape(tf.squeeze(dna,[3]),[tf.shape(dna)[0],n_steps,int(promoter_length/n_steps)*4])
    rnn1 = RNN(reshapedDNA,n_steps,n_hidden,50,celltype='GRU')

# Regular Convolution Branch
with tf.variable_scope('RNN2'):
    conv1 = Conv2D(32,[4,5],activation='linear',
                    name='convTrans_1',padding='valid')(dna)
    leak1 = LeakyReLU(alpha=.001)(conv1)
    pool1 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_1',padding='same')(leak1) 
    conv2 = Conv2D(8,[1,5],activation='linear',
                    name='convTrans_2',padding='valid')(pool1)
    leak2 = LeakyReLU(alpha=.001)(conv2)
    pool2 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_2',padding='same')(leak2) 
    reshapedConv = tf.squeeze(pool2,[1])
    # reshapedConv = tf.reshape(tf.squeeze(pool2,[1]),[tf.shape(dna)[0],61,16])
    rnn2 = RNN(reshapedConv,122,n_hidden,50,celltype='GRU')

# Dilated Convolution Branch
with tf.variable_scope('RNN3'):
    convDil1 = Conv2D(32,[4,5],activation='linear',
                    name='convTrans_1',padding='valid',dilation_rate=(1,5))(dna)
    leakDil1 = LeakyReLU(alpha=.001)(convDil1)
    poolDil1 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_1',padding='same')(leakDil1) 
    convDil2 = Conv2D(8,[1,5],activation='linear',
                    name='convTrans_2',padding='valid',dilation_rate=(1,5))(poolDil1)
    leakDil2 = LeakyReLU(alpha=.001)(convDil2)
    poolDil2 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_2',padding='same')(leakDil2) 
    reshapedDilConv = tf.squeeze(poolDil2,[1])
    # reshapedDilConv = tf.reshape(tf.squeeze(poolDil2,[1]),[tf.shape(dna)[0],55,16])
    rnn3 = RNN(reshapedDilConv,110,n_hidden,50,celltype='GRU')  

**Raw Branch + Regular Convolution Branch + Dilated Convolution Branch (w/ Mean of all Sequence Outputs)**  

Output File: all10binBinary3BranchesMean.output  

Architecture: (see below)  
Epochs: 80  
Training Exact Match Accuracy: ~0.65       
Training Precision: ~0.85  
Training Recall: 0.68 (gets up to as high as 0.72)      
Validation Exact Match Accuracy: ~0.53             
Validation Precision: ~0.7 (stays fairly consistent)         
Validation Recall: ~0.53           

In [None]:
with tf.variable_scope('RNN1'):
    reshapedDNA = tf.reshape(tf.squeeze(dna,[3]),[tf.shape(dna)[0],n_steps,int(promoter_length/n_steps)*4])
    rnn1 = RNN(reshapedDNA,n_steps,n_hidden,50,celltype='GRU',return_seq=True)
    rnn1mean = tf.divide(tf.add_n(rnn1),float(n_steps))

# Regular Convolution Branch
with tf.variable_scope('RNN2'):
    conv1 = Conv2D(32,[4,5],activation='linear',
                    name='convTrans_1',padding='valid')(dna)
    leak1 = LeakyReLU(alpha=.001)(conv1)
    pool1 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_1',padding='same')(leak1) 
    conv2 = Conv2D(8,[1,5],activation='linear',
                    name='convTrans_2',padding='valid')(pool1)
    leak2 = LeakyReLU(alpha=.001)(conv2)
    pool2 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_2',padding='same')(leak2) 
    reshapedConv = tf.reshape(tf.squeeze(pool2,[1]),[tf.shape(dna)[0],61,16])
    rnn2 = RNN(reshapedConv,61,n_hidden,50,celltype='GRU',return_seq=True)
    rnn2mean = tf.divide(tf.add_n(rnn2),61.)

# Dilated Convolution Branch
with tf.variable_scope('RNN3'):
    convDil1 = Conv2D(32,[4,5],activation='linear',
                    name='convTrans_1',padding='valid',dilation_rate=(1,5))(dna)
    leakDil1 = LeakyReLU(alpha=.001)(convDil1)
    print(leakDil1)
    poolDil1 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_1',padding='same')(leakDil1) 
    convDil2 = Conv2D(8,[1,5],activation='linear',
                    name='convTrans_2',padding='valid',dilation_rate=(1,5))(poolDil1)
    leakDil2 = LeakyReLU(alpha=.001)(convDil2)
    poolDil2 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_2',padding='same')(leakDil2) 
    print(poolDil2)
    reshapedDilConv = tf.reshape(tf.squeeze(poolDil2,[1]),[tf.shape(dna)[0],55,16])
    rnn3 = RNN(reshapedDilConv,55,n_hidden,50,celltype='GRU',return_seq=True)  
    rnn3mean = tf.divide(tf.add_n(rnn3),55.)

**Raw Branch + Regular Convolution Branch + Dilated Convolution Branch (w/ Attention)**  

Output File: all10binBinary3BranchesMean.output  

Architecture: (see below)  
Epochs: 80  
Training Exact Match Accuracy: ///         
Training Precision: ///  
Training Recall: /// (gets up to as high as 0.72)      
Validation Exact Match Accuracy: ///             
Validation Precision: /// (stays fairly consistent)         
Validation Recall: ///           

In [None]:
def attention(input_list):

    # one-layer MLP to get representation of each RNN output (i.e. tensor in input_list)
    annotRep = [Dense(50,activation='tanh')(inp) for inp in input_list]

    # context vector
    contextVecs = [tf.Variable(tf.random_normal([50])) for i in range(len(input_list))]

    # multiply context vectors and RNN output representations + apply softmax function
    # (to get normalized importance weights)
    impWeights = [tf.nn.softmax(tf.multiply(annotRep[i],contextVecs[i])) for i \
        in range(len(input_list))]

    # multiply original input tensors with importance weights and add products
    seqVec = tf.add_n([tf.multiply(input_list[i],impWeights[i]) for i in range(len(input_list))])
    
    return seqVec

with tf.variable_scope('RNN1'):
    reshapedDNA = tf.reshape(tf.squeeze(dna,[3]),[tf.shape(dna)[0],n_steps,int(promoter_length/n_steps)*4])
    rnn1 = RNN(reshapedDNA,n_steps,n_hidden,50,celltype='GRU',return_seq=True)
    att1 = attention(rnn1)

# Regular Convolution Branch
with tf.variable_scope('RNN2'):
    conv1 = Conv2D(32,[4,5],activation='linear',
                    name='convTrans_1',padding='valid')(dna)
    leak1 = LeakyReLU(alpha=.001)(conv1)
    pool1 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_1',padding='same')(leak1) 
    conv2 = Conv2D(8,[1,5],activation='linear',
                    name='convTrans_2',padding='valid')(pool1)
    leak2 = LeakyReLU(alpha=.001)(conv2)
    pool2 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_2',padding='same')(leak2) 
    reshapedConv = tf.reshape(tf.squeeze(pool2,[1]),[tf.shape(dna)[0],61,16])
    rnn2 = RNN(reshapedConv,61,n_hidden,50,celltype='GRU',return_seq=True)
    att2 = attention(rnn2)

# Dilated Convolution Branch
with tf.variable_scope('RNN3'):
    convDil1 = Conv2D(32,[4,5],activation='linear',
                    name='convTrans_1',padding='valid',dilation_rate=(1,5))(dna)
    leakDil1 = LeakyReLU(alpha=.001)(convDil1)
    poolDil1 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_1',padding='same')(leakDil1) 
    convDil2 = Conv2D(8,[1,5],activation='linear',
                    name='convTrans_2',padding='valid',dilation_rate=(1,5))(poolDil1)
    leakDil2 = LeakyReLU(alpha=.001)(convDil2)
    poolDil2 = AveragePooling2D((1,2),strides=(1,2),name='AvgPoolTrans_2',padding='same')(leakDil2) 
    reshapedDilConv = tf.reshape(tf.squeeze(poolDil2,[1]),[tf.shape(dna)[0],55,16])
    rnn3 = RNN(reshapedDilConv,55,n_hidden,50,celltype='GRU',return_seq=True)  
    att3 = attention(rnn3)
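
As written, `attention` gives every time step its own dense layer and context vector and applies the softmax across the 50 units of that step, so the weights are not normalized across time. For comparison, here is a NumPy sketch of the more common attention-pooling formulation, in which one shared projection scores each time step and the softmax runs over the time axis. The parameter names (`W`, `b`, `u`) and shapes are illustrative assumptions, and this is not the code behind the (still empty) results above.

```python
import numpy as np

def attention_pool(outputs, W, b, u):
    """Weighted average over time of RNN outputs.

    outputs: (batch, time, hidden); W: (hidden, attn); b: (attn,); u: (attn,)
    """
    hidden_rep = np.tanh(outputs @ W + b)               # (batch, time, attn)
    scores = hidden_rep @ u                              # (batch, time)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)        # softmax over the time axis
    pooled = (weights[..., None] * outputs).sum(axis=1)  # (batch, hidden)
    return pooled, weights

rng = np.random.default_rng(0)
outputs = rng.normal(size=(3, 61, 50))   # e.g. 61 steps of 50 hidden units
W = rng.normal(scale=0.1, size=(50, 50))
b = np.zeros(50)
u = rng.normal(scale=0.1, size=50)

pooled, weights = attention_pool(outputs, W, b, u)
print(pooled.shape, weights.sum(axis=1))
```

With all parameters set to zero the weights are uniform and the result reduces to the plain mean over time steps, i.e. the "Mean of all Sequence Outputs" variant above.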