# TensorFlow Tutorial

## Drug-Kinase Interaction Prediction with CNN

In [108]:
%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
from sklearn.metrics import confusion_matrix
import time
from datetime import timedelta
import math

# PROBLEM:

* This is a multi-label classification problem: each instance may be assigned multiple labels.

https://nickcdryan.com/2017/01/23/multi-label-classification-a-guided-tour/

https://stats.stackexchange.com/questions/12702/what-are-the-measure-for-accuracy-of-multilabel-data?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

## Cost Function

There are two ways to penalize errors on an instance:
* if you do not want to miss any label of an instance, treat the whole prediction as wrong when even a single label is wrong;
* alternatively, count each missed or misclassified label as a separate error.
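The difference between the two views can be made concrete with a small NumPy sketch. The arrays `toy_true` and `toy_pred` are made-up data, and the names `exact_match` and `per_label` are illustrative only; they do not appear elsewhere in this notebook.

```python
import numpy as np

# Made-up toy data: 3 instances with 4 labels each.
toy_true = np.array([[1, 0, 1, 0],
                     [0, 1, 0, 0],
                     [1, 1, 0, 1]])
toy_pred = np.array([[1, 0, 1, 0],   # every label correct
                     [0, 1, 1, 0],   # one label wrong
                     [1, 1, 0, 1]])  # every label correct

# First view: an instance counts only if all of its labels are right.
exact_match = np.mean(np.all(toy_true == toy_pred, axis=1))

# Second view: every label is scored on its own.
per_label = np.mean(toy_true == toy_pred)

print(exact_match)  # fraction of fully correct instances (2 of 3)
print(per_label)    # fraction of correct labels (11 of 12)
```

A single wrong label costs a whole instance under the first view, but only one of twelve label decisions under the second.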

We use the second method:

`sigmoid_cross_entropy_with_logits` is a TensorFlow function that penalizes each output node independently. It uses a binary loss and models the output of the network as an independent Bernoulli distribution per label.
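As a rough illustration of what "independently per label" means, the sketch below computes the same kind of element-wise loss in NumPy, using the numerically stable expression `max(x, 0) - x * z + log(1 + exp(-|x|))`. The helper name `sigmoid_cross_entropy` and the input values are assumptions made for this example.

```python
import numpy as np

def sigmoid_cross_entropy(logits, labels):
    # Numerically stable form of
    #   -labels * log(sigmoid(logits)) - (1 - labels) * log(1 - sigmoid(logits))
    return (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))

# Made-up values: one instance with three labels.
logits = np.array([[2.0, -1.0, 0.0]])
labels = np.array([[1.0, 0.0, 1.0]])

loss = sigmoid_cross_entropy(logits, labels)
print(loss.shape)  # one loss value per instance and per label
print(loss)
```

The result has the same shape as the inputs, so each label contributes its own term before any averaging takes place.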

## Load data

In [109]:
drug_fingerprints_fh = 'sample/sample_fingerprints.csv'
drug_targets_fh      = 'sample/sample_targets.csv'
drug_weights_fh      = 'sample/sample_weights.csv'

### Dimensions of data 

In [110]:
sample_size       = 101
fingerprint_size  = 1024
fingerprint_width = 32
targets_num       = 420
weights_num       = 420
num_channels      = 1

### Helper function that populates data structures with the actual data

In [111]:
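import re

def populate_data(file_handle, data_matrix, data_size):
    # file_handle is the path of a comma- or tab-delimited file.
    # Column 0 of each row is skipped; the next data_size columns
    # are stored as float32 values in data_matrix.
    j = 0
    with open(file_handle) as fh:
        for line in fh:
            result = re.split(r'[,\t]\s*', line.strip())
            for i in range(1, data_size + 1):
                data_matrix[j][i-1] = np.float32(result[i])
            j = j + 1
    # Number of rows read. The file is closed by the with-block.
    print(j)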

### Data structures for loaded data

In [112]:
# Preallocate one zero-filled row per sample.
drug_fingerprints = [[0] * fingerprint_size for _ in range(sample_size)]
drug_targets      = [[0] * targets_num for _ in range(sample_size)]
drug_weights      = [[0] * weights_num for _ in range(sample_size)]

In [113]:
populate_data(drug_weights_fh, drug_weights, weights_num)
populate_data(drug_targets_fh, drug_targets, targets_num)
populate_data(drug_fingerprints_fh, drug_fingerprints, fingerprint_size)

IndexError: list index out of range
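In `populate_data`, a list `IndexError` can be raised in two places: `data_matrix[j]` fails when a file has more rows than `sample_size`, and `result[i]` fails when a row has fewer than `data_size + 1` columns. One way to tell which case applies is to inspect the shape of the file first. The helper `table_shape` below is hypothetical (it is not part of this notebook) and the sample lines are made up.

```python
import re

def table_shape(lines):
    """Return (row_count, min_columns, max_columns) for delimited text lines."""
    widths = [len(re.split(r'[,\t]\s*', line.strip()))
              for line in lines if line.strip()]
    return len(widths), min(widths), max(widths)

# Made-up sample: an identifier followed by values; the last row is short.
sample = ["drug_a,0,1,0\n", "drug_b,1,1,0\n", "drug_c,0,0\n"]
rows, min_cols, max_cols = table_shape(sample)
print(rows, min_cols, max_cols)  # 3 3 4
```

With a real file, an open file object can be passed instead of the list. Loading is only safe when `rows <= sample_size` and `min_cols >= data_size + 1`.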

In [98]:
drug_fingerprints = np.array(drug_fingerprints)
drug_targets      = np.array(drug_targets)
drug_weights      = np.array(drug_weights)

In [99]:
type(drug_fingerprints)

numpy.ndarray

In [100]:
type(drug_fingerprints[0])

numpy.ndarray

In [101]:
type(drug_fingerprints[0][0])

numpy.float32

# TensorFlow

## Placeholders

Placeholder for the flat array holding the **fingerprint** of each compound. `None` means that this tensor can hold an arbitrary number of fingerprint arrays.

In [102]:
x = tf.placeholder(tf.float32, [None, fingerprint_size])

`x` is first defined as a flat vector of size `fingerprint_size`; it is then reshaped into a 2D matrix (an image) so that it can be fed to the convolutional layers.

In [103]:
x_image = tf.reshape(x, [-1, fingerprint_width, fingerprint_width, num_channels])
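The effect of this reshape can be checked with NumPy, which uses the same row-major ordering. This is only a sketch with made-up input: each entry of the toy batch is set to its own flat index so that its position after reshaping is easy to see.

```python
import numpy as np

fingerprint_size = 1024
fingerprint_width = 32
num_channels = 1

# Two made-up fingerprints whose entries are just their flat indices.
batch = np.arange(2 * fingerprint_size, dtype=np.float32)
batch = batch.reshape(2, fingerprint_size)

# -1 lets the batch dimension be inferred, as in the tf.reshape call.
images = batch.reshape(-1, fingerprint_width, fingerprint_width, num_channels)
print(images.shape)  # (2, 32, 32, 1)

# Bit k of a fingerprint lands at row k // 32, column k % 32.
k = 70
print(images[0, k // fingerprint_width, k % fingerprint_width, 0])  # 70.0
```

Neighbouring rows of the resulting "image" are therefore bits that lie 32 positions apart in the original fingerprint.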

Placeholder for the true labels (true targets) for each compound. (Here we have 420 targets - kinases).

In [104]:
y_true = tf.placeholder(tf.float32, [None, targets_num])

Placeholder for weights. The weights are intended for the cross-entropy cost function; note that the cost defined later in this notebook does not apply them yet.

In [105]:
cross_entropy_weights = tf.placeholder(tf.float32, [None, weights_num])

## Variables to Optimize

In [106]:
def new_weights(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.05))

In [107]:
def new_biases(length):
    return tf.Variable(tf.constant(0.05, shape=[length]))

## Helper Function to Create Convolutional Layer

In [34]:
def new_conv_layer(input,              # The previous layer.
                   num_input_channels, # Num. channels in prev. layer.
                   filter_size,        # Width and height of each filter.
                   num_filters,        # Number of filters.
                   use_pooling=True):  # Use 2x2 max-pooling.

    # Shape of the filter-weights for the convolution.
    shape = [filter_size, filter_size, num_input_channels, num_filters]

    # Create new weights aka. filters with the given shape.
    weights = new_weights(shape=shape)

    # Create new biases, one for each filter.
    biases = new_biases(length=num_filters)

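    # Create the TensorFlow operation for convolution.
    # The strides are 1 in all dimensions, so the filter moves
    # one pixel at a time. E.g. strides=[1, 2, 2, 1] would mean
    # that the filter is moved 2 pixels across the x- and y-axis.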
    layer = tf.nn.conv2d(input=input,
                         filter=weights,
                         strides=[1, 1, 1, 1],
                         padding='SAME')
    
    # Add the biases to the results of the convolution.
    layer += biases

    # Use pooling to down-sample the image resolution?
    if use_pooling:
        # This is 2x2 max-pooling, which means that we
        # consider 2x2 windows and select the largest value
        # in each window. Then we move 2 pixels to the next window.
        layer = tf.nn.max_pool(value=layer,
                               ksize=[1, 2, 2, 1],
                               strides=[1, 2, 2, 1],
                               padding='SAME')

    # Rectified Linear Unit (ReLU).
    layer = tf.nn.relu(layer)

    return layer, weights

## Helper Function for Flattening a Layer

A convolutional layer produces an output tensor with 4 dimensions. We will add fully-connected layers after the convolution layers, so we need to reduce the 4-dim tensor to 2-dim which can be used as input to the fully-connected layer.

In [35]:
def flatten_layer(layer):
    # Get the shape of the input layer.
    layer_shape = layer.get_shape()

    # The shape of the input layer is assumed to be:
    # layer_shape == [num_images, img_height, img_width, num_channels]

    # The number of features is: img_height * img_width * num_channels
    # We can use a function from TensorFlow to calculate this.
    num_features = layer_shape[1:4].num_elements()
    
    # Reshape the layer to [num_images, num_features].
    # Note that we just set the size of the second dimension
    # to num_features and the size of the first dimension to -1
    # which means the size in that dimension is calculated
    # so the total size of the tensor is unchanged from the reshaping.
    layer_flat = tf.reshape(layer, [-1, num_features])

    # The shape of the flattened layer is now:
    # [num_images, img_height * img_width * num_channels]

    # Return both the flattened layer and the number of features.
    return layer_flat, num_features
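The same flattening can be sketched in NumPy to show what happens to the shape. The input array here is a made-up stand-in for a convolutional output; its dimensions are chosen for illustration.

```python
import numpy as np

# Made-up activations: 5 inputs, an 8 x 8 spatial grid, 32 channels.
layer = np.zeros((5, 8, 8, 32), dtype=np.float32)

# Number of features per input: height * width * channels.
num_features = int(np.prod(layer.shape[1:4]))

# -1 keeps the batch dimension; everything else is merged.
layer_flat = layer.reshape(-1, num_features)
print(num_features, layer_flat.shape)  # 2048 (5, 2048)
```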

## Helper Function for Creating a New Fully-Connected Layer

This function creates a new fully-connected layer in the computational graph for TensorFlow. Nothing is actually calculated here, we are just adding the mathematical formulas to the TensorFlow graph.

It is assumed that the input is a 2-dim tensor of shape `[num_images, num_inputs]`. The output is a 2-dim tensor of shape `[num_images, num_outputs]`.

In [36]:
def new_fc_layer(input,          # The previous layer.
                 num_inputs,     # Num. inputs from prev. layer.
                 num_outputs,    # Num. outputs.
                 use_relu=True): # Use Rectified Linear Unit (ReLU)?

    # Create new weights and biases.
    weights = new_weights(shape=[num_inputs, num_outputs])
    biases = new_biases(length=num_outputs)

    # Calculate the layer as the matrix multiplication of
    # the input and weights, and then add the bias-values.
    layer = tf.matmul(input, weights) + biases

    # Use ReLU?
    if use_relu:
        layer = tf.nn.relu(layer)

    return layer
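For intuition, here is a minimal NumPy sketch of the computation that such a layer describes. The function name `fc_layer` and the random inputs are assumptions for this example; the weights are drawn from a plain normal distribution rather than the truncated normal used by `new_weights`.

```python
import numpy as np

def fc_layer(inputs, weights, biases, use_relu=True):
    # Matrix multiplication of inputs and weights, plus the biases.
    layer = inputs @ weights + biases
    if use_relu:
        # ReLU: negative values are set to zero.
        layer = np.maximum(layer, 0.0)
    return layer

rng = np.random.default_rng(0)
inputs = rng.normal(size=(4, 6))                # [num_images, num_inputs]
weights = rng.normal(scale=0.05, size=(6, 3))   # [num_inputs, num_outputs]
biases = np.full(3, 0.05)

out = fc_layer(inputs, weights, biases)
print(out.shape)  # [num_images, num_outputs]
```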

# Design Computational Graph for CNN

We will create a CNN with two convolutional layers followed by two fully-connected layers. Specifically:
* **First Layer**

In [37]:
filter_size1 = 7
num_filters1 = 16

* **Second Layer**

In [38]:
filter_size2 = 3
num_filters2 = 32

* **Fully Connected Layer**

In [39]:
fc_size = 576

### Convolutional Layer #1

In [40]:
layer_conv1, weights_conv1 = \
    new_conv_layer(input=x_image,
                   num_input_channels=num_channels,
                   filter_size=filter_size1,
                   num_filters=num_filters1,
                   use_pooling=True)

In [41]:
layer_conv1

<tf.Tensor 'Relu:0' shape=(?, 16, 16, 16) dtype=float32>

### Convolutional Layer #2

In [42]:
layer_conv2, weights_conv2 = \
    new_conv_layer(input=layer_conv1,
                   num_input_channels=num_filters1,
                   filter_size=filter_size2,
                   num_filters=num_filters2,
                   use_pooling=True)

In [43]:
layer_conv2

<tf.Tensor 'Relu_1:0' shape=(?, 8, 8, 32) dtype=float32>

### Flatten Layer

In [44]:
layer_flat, num_features = flatten_layer(layer_conv2)
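The tensor shapes reported in this section follow from simple arithmetic: with `'SAME'` padding the spatial output size is `ceil(size / stride)`, so each stride-1 convolution keeps the width and each 2x2 max-pooling halves it. The helper `same_padding_output` below is a hypothetical name used only for this sketch.

```python
import math

def same_padding_output(size, stride):
    # With 'SAME' padding the output size is ceil(size / stride).
    return math.ceil(size / stride)

width = 32  # fingerprint_width
shapes = []
for num_filters in (16, 32):  # num_filters1, num_filters2
    width = same_padding_output(width, 1)  # convolution, stride 1
    width = same_padding_output(width, 2)  # 2x2 max-pooling, stride 2
    shapes.append((width, width, num_filters))

# Features per input after flattening the last convolutional layer.
num_features = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(shapes, num_features)  # [(16, 16, 16), (8, 8, 32)] 2048
```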

In [45]:
layer_flat

<tf.Tensor 'Reshape_1:0' shape=(?, 2048) dtype=float32>

### Fully-Connected Layer 1

In [46]:
layer_fc1 = new_fc_layer(input=layer_flat,
                         num_inputs=num_features,
                         num_outputs=fc_size,
                         use_relu=True)

In [47]:
layer_fc1

<tf.Tensor 'Relu_2:0' shape=(?, 576) dtype=float32>

### Fully-Connected Layer 2

In [48]:
layer_fc2 = new_fc_layer(input=layer_fc1,
                         num_inputs=fc_size,
                         num_outputs=targets_num,
                         use_relu=False)

In [49]:
layer_fc2

<tf.Tensor 'add_3:0' shape=(?, 420) dtype=float32>

## Predicted Classes

In [50]:
y_pred =tf.round(tf.nn.sigmoid(layer_fc2))
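Rounding the sigmoid output amounts to thresholding each probability at 0.5, so a target is predicted exactly when its logit is positive. The NumPy sketch below uses made-up logits; like `tf.round`, `np.round` rounds halves to the nearest even value, so a probability of exactly 0.5 becomes 0.

```python
import numpy as np

# Made-up logits for five targets.
logits = np.array([-2.0, -0.1, 0.0, 0.1, 3.0])

probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
pred = np.round(probs)                 # threshold at 0.5
print(pred)  # [0. 0. 0. 1. 1.]
```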

In [51]:
y_pred

<tf.Tensor 'Round:0' shape=(?, 420) dtype=float32>

# Cost Function to be Optimized

## tf.nn.sigmoid_cross_entropy_with_logits

In [52]:
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(logits=layer_fc2,
                                                        labels=y_true)

In [53]:
cross_entropy

<tf.Tensor 'logistic_loss:0' shape=(?, 420) dtype=float32>

In [54]:
cost = tf.reduce_mean(cross_entropy)
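The `cross_entropy_weights` placeholder is fed during training but does not enter `cost` as defined above. One possible way to use it, shown here as an assumption rather than as this notebook's method, is a weighted mean of the per-label losses. All values below are made up.

```python
import numpy as np

# Made-up per-label losses and weights for 2 compounds and 3 targets.
losses = np.array([[0.2, 1.5, 0.7],
                   [0.1, 0.4, 2.0]])
weights = np.array([[1.0, 0.0, 1.0],
                    [1.0, 1.0, 0.0]])

# Plain mean over all entries, as in the cost above.
unweighted_cost = losses.mean()

# Weighted mean: entries with weight 0 are ignored entirely.
weighted_cost = (losses * weights).sum() / weights.sum()
print(unweighted_cost, weighted_cost)
```

Other weighting schemes are possible; this one simply masks or rescales individual compound-target pairs.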

In [55]:
cost

<tf.Tensor 'Mean:0' shape=() dtype=float32>

### Optimization Method

In [56]:
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(cost)

### Performance Measures

In [57]:
#accuracy =tf.metrics.accuracy(labels=y_true,predictions=y_pred, weights=cross_entropy_weights)

In [58]:
accuracy, accuracy_ops =tf.metrics.accuracy(labels=y_true,predictions=y_pred)

In [59]:
stream_vars = [i for i in tf.local_variables()]

In [60]:
print(stream_vars)

[<tf.Variable 'accuracy/total:0' shape=() dtype=float32_ref>, <tf.Variable 'accuracy/count:0' shape=() dtype=float32_ref>]


### Create TensorFlow session

In [61]:
session = tf.Session()
init = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
session.run(init)

In [62]:
train_batch_size = 64

In [63]:
def fetch_batch(batch_size):
    chosen = np.random.randint(len(drug_fingerprints), size = batch_size)
    X_batch = drug_fingerprints[chosen, :]
    y_batch = drug_targets[chosen, :]
    weights_batch = drug_weights[chosen, :]
    return X_batch, y_batch, weights_batch

In [64]:
# Counter for total number of iterations performed so far.
total_iterations = 0

def optimize(num_iterations):
    # Ensure we update the global variable rather than a local copy.
    global total_iterations

    # Start-time used for printing time-usage below.
    start_time = time.time()

    for i in range(total_iterations,
                   total_iterations + num_iterations):

        # Get a batch of training examples.
        # x_batch now holds a batch of fingerprints and
        # y_true_batch are the true labels for those compounds.
        x_batch, y_true_batch, weights_batch = fetch_batch(train_batch_size)

        # Put the batch into a dict with the proper names
        # for placeholder variables in the TensorFlow graph.
        feed_dict_train = {x: x_batch,
                           y_true: y_true_batch,
                          cross_entropy_weights: weights_batch}

        # Run the optimizer using this batch of training data.
        # TensorFlow assigns the variables in feed_dict_train
        # to the placeholder variables and then runs the optimizer.
        session.run(optimizer, feed_dict=feed_dict_train)

        # Print status every iteration (i % 1 is always 0).
        if i % 1 == 0:
            # Update the streaming accuracy with this batch. The value
            # returned is cumulative over all batches seen so far.
            acc_ops = session.run(accuracy_ops, feed_dict=feed_dict_train)

            # Message for printing.
            msg = "Optimization Iteration: , Training Accuracy:"

            # Print it.
            print(msg)
            print(acc_ops)
            print(i+1)
            print('[total, count]:',session.run(stream_vars)) 

    # Update the total number of iterations performed.
    total_iterations += num_iterations

    # Ending time.
    end_time = time.time()

    # Difference between start and end-times.
    time_dif = end_time - start_time

    # Print the time-usage.
    print("Time usage: " + str(timedelta(seconds=int(round(time_dif)))))

In [65]:
optimize(num_iterations=201)

Optimization Iteration: , Training Accuracy:
0.39401042
1
[total, count]: [10591.0, 26880.0]
Optimization Iteration: , Training Accuracy:
0.4124814
2
[total, count]: [22175.0, 53760.0]
Optimization Iteration: , Training Accuracy:
0.43451142
3
[total, count]: [35039.0, 80640.0]
Optimization Iteration: , Training Accuracy:
0.45207402
4
[total, count]: [48607.0, 107520.0]
Optimization Iteration: , Training Accuracy:
0.47118303
5
[total, count]: [63327.0, 134400.0]
Optimization Iteration: , Training Accuracy:
0.48868427
6
[total, count]: [78815.0, 161280.0]
Optimization Iteration: , Training Accuracy:
0.5062872
7
[total, count]: [95263.0, 188160.0]
Optimization Iteration: , Training Accuracy:
0.5225028
8
[total, count]: [112359.0, 215040.0]
Optimization Iteration: , Training Accuracy:
0.5390666
9
[total, count]: [130411.0, 241920.0]
Optimization Iteration: , Training Accuracy:
0.5534933
10
[total, count]: [148779.0, 268800.0]
Optimization Iteration: , Training Accuracy:
0.5672754
11
...

Optimization Iteration: , Training Accuracy:
0.9148924
91
[total, count]: [2237900.0, 2446080.0]
Optimization Iteration: , Training Accuracy:
0.91581666
92
[total, count]: [2264778.0, 2472960.0]
Optimization Iteration: , Training Accuracy:
0.91671985
93
[total, count]: [2291653.0, 2499840.0]
Optimization Iteration: , Training Accuracy:
0.9176058
94
[total, count]: [2318533.0, 2526720.0]
Optimization Iteration: , Training Accuracy:
0.9184731
95
[total, count]: [2345413.0, 2553600.0]
Optimization Iteration: , Training Accuracy:
0.9193224
96
[total, count]: [2372293.0, 2580480.0]
Optimization Iteration: , Training Accuracy:
0.9201541
97
[total, count]: [2399173.0, 2607360.0]
Optimization Iteration: , Training Accuracy:
0.9209681
98
[total, count]: [2426051.0, 2634240.0]
Optimization Iteration: , Training Accuracy:
0.9217664
99
[total, count]: [2452931.0, 2661120.0]
Optimization Iteration: , Training Accuracy:
0.9225487
100
[total, count]: [2479811.0, 2688000.0]
...

Optimization Iteration: , Training Accuracy:
0.9559813
176
[total, count]: [4522633.0, 4730880.0]
Optimization Iteration: , Training Accuracy:
0.95623004
177
[total, count]: [4549513.0, 4757760.0]
Optimization Iteration: , Training Accuracy:
0.9564759
178
[total, count]: [4576393.0, 4784640.0]
Optimization Iteration: , Training Accuracy:
0.9567191
179
[total, count]: [4603273.0, 4811520.0]
Optimization Iteration: , Training Accuracy:
0.95695955
180
[total, count]: [4630153.0, 4838400.0]
Optimization Iteration: , Training Accuracy:
0.9571973
181
[total, count]: [4657033.0, 4865280.0]
Optimization Iteration: , Training Accuracy:
0.9574325
182
[total, count]: [4683913.0, 4892160.0]
Optimization Iteration: , Training Accuracy:
0.9576651
183
[total, count]: [4710793.0, 4919040.0]
Optimization Iteration: , Training Accuracy:
0.9578952
184
[total, count]: [4737673.0, 4945920.0]
Optimization Iteration: , Training Accuracy:
0.95812196
185
[total, count]: [4764549.0, 4972800.0]
...
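The accuracy printed above is `total / count` accumulated over all batches so far, not the accuracy of the latest batch. `count` grows by 26880 = 64 x 420 (batch size times number of targets) per iteration, so the accuracy of an individual batch can be recovered from the differences between consecutive `[total, count]` pairs. The sketch below does this for a few pairs copied from the output; `batch_accuracies` is a hypothetical helper name.

```python
# (total, count) pairs copied from the training output above.
early = [(10591.0, 26880.0), (22175.0, 53760.0), (35039.0, 80640.0)]
late = [(4522633.0, 4730880.0), (4549513.0, 4757760.0)]

def batch_accuracies(pairs):
    """Accuracy of each batch after the first, from consecutive pairs."""
    return [(t1 - t0) / (c1 - c0)
            for (t0, c0), (t1, c1) in zip(pairs, pairs[1:])]

print(batch_accuracies(early))    # batches 2 and 3
print(batch_accuracies(late))     # batch 177
print(late[-1][0] / late[-1][1])  # cumulative value printed at 177
```

For iteration 177 the batch itself is classified perfectly on these sample data, while the cumulative figure is still held down by the early iterations.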

# How to correctly use accuracy in TensorFlow?

https://stackoverflow.com/questions/46409626/how-to-properly-use-tf-metrics-accuracy

The function `tf.metrics.accuracy` calculates how often predictions match labels. It creates two local variables, `total` and `count`, and returns two operations: one that reads the accuracy as `total / count`, and an update operation that first adds the current batch to both variables.

In [2]:
logits = tf.placeholder(tf.int64, [2,3])
labels = tf.Variable([[0, 1, 0], [1, 0, 1]])

acc, acc_op = tf.metrics.accuracy(labels=tf.argmax(labels, 1),   
                                  predictions=tf.argmax(logits,1))

### Initialize local variables
Since `tf.metrics.accuracy` creates two local variables, `total` and `count`, we need to call `local_variables_initializer()` to initialize them.

In [3]:
sess = tf.Session()

sess.run(tf.local_variables_initializer())
sess.run(tf.global_variables_initializer())

stream_vars = [i for i in tf.local_variables()]
print(stream_vars)

[<tf.Variable 'accuracy/total:0' shape=() dtype=float32_ref>, <tf.Variable 'accuracy/count:0' shape=() dtype=float32_ref>]


In [4]:
a=[[0,1,0],[1,0,1]]

In [5]:
np.argmax(a,axis=1)

array([1, 0])

In [6]:
b = [[1,0,0],[0,1,0]]

In [7]:
np.argmax(b,axis=1)

array([0, 1])

In [8]:
print('acc:',sess.run(acc, {logits:[[0,1,0],[1,0,1]]}))

acc: 0.0


In the example above, even though the predictions match the labels exactly, `total` and `count` are still zero, so the accuracy is 0.0. Running `acc` only reads the two variables; it does not update them.

In [9]:
print('[total, count]:',sess.run(stream_vars)) 

[total, count]: [0.0, 0.0]


In [10]:
print('ops:', sess.run(acc_op, {logits:[[0,1,0],[1,0,1]]})) 
print('[total, count]:',sess.run(stream_vars)) 

ops: 1.0
[total, count]: [2.0, 2.0]


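Above, we feed the same data again, this time running `acc_op`, which updates `total` and `count` and returns the updated accuracy.

Now we feed a new batch that is completely mismatched. Because `acc` does not update the variables, it still reports 1.0.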

In [11]:
print('acc:', sess.run(acc,{logits:[[1,0,0],[0,1,0]]}))

acc: 1.0


In [12]:
print('op:',sess.run(acc_op,{logits:[[0,1,0],[0,1,0]]}))

op: 0.75


In [13]:
print('[total, count]:',sess.run(stream_vars)) 

[total, count]: [3.0, 4.0]
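The bookkeeping behind these numbers can be mimicked in plain Python. The class `StreamingAccuracy` below is an illustrative stand-in written for this walkthrough, not TensorFlow code: `value()` plays the role of running `acc`, and `update()` plays the role of running `acc_op`.

```python
class StreamingAccuracy:
    """Pure-Python stand-in for the total/count bookkeeping."""

    def __init__(self):
        self.total = 0.0
        self.count = 0.0

    def value(self):
        # Like running `acc`: read the state without changing it.
        return self.total / self.count if self.count else 0.0

    def update(self, labels, predictions):
        # Like running `acc_op`: accumulate, then return the new value.
        self.total += sum(l == p for l, p in zip(labels, predictions))
        self.count += len(labels)
        return self.value()

metric = StreamingAccuracy()
true_classes = [1, 0]                      # argmax of [[0, 1, 0], [1, 0, 1]]

first_read = metric.value()                # nothing accumulated yet
first_update = metric.update(true_classes, [1, 0])
second_read = metric.value()               # reading changes nothing
second_update = metric.update(true_classes, [1, 1])

print(first_read, first_update, second_read, second_update)  # 0.0 1.0 1.0 0.75
print(metric.total, metric.count)                            # 3.0 4.0
```

The last update sees one correct prediction out of two, which moves the totals from `[2.0, 2.0]` to `[3.0, 4.0]` and the accuracy to 0.75.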
