# RGB #Trial 3

A neural network is a computational tool that can learn complex patterns from a large quantity of data, for applications such as classification and feature imitation.

A neural network consists of input nodes followed by layers of nodes, each of which computes a weighted sum of the nodes before it: every incoming node value is multiplied by a scalar weight. The weights are initially assigned at random. The partial derivative of the error with respect to each weight then measures how strongly that weight influences the final output. Each weight is updated based on this partial derivative and on the size of the classification error. Repeated over many iterations, this gradient descent process improves the classification accuracy. With sufficient training data, neural networks can then be used to identify scat samples in the wild from their hyperspectral spectra.
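
The update rule described above can be sketched for a single node with plain NumPy. This is a minimal illustration rather than the notebook's network: the toy data, the learning rate `lr`, and the function name `train_single_node` are assumptions made for the example, and a squared error stands in for the classification loss.

```python
import numpy as np

def train_single_node(x, y, lr=0.1, steps=200, seed=0):
    """Fit y ~ x @ w with gradient descent on the mean squared error."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=x.shape[1])      # weights start out random
    losses = []
    for _ in range(steps):
        pred = x @ w                     # weighted sum of the input nodes
        error = pred - y
        losses.append(np.mean(error ** 2))
        grad = 2 * x.T @ error / len(y)  # partial derivative of the loss w.r.t. each weight
        w -= lr * grad                   # step against the gradient
    return w, losses

# Toy data (hypothetical): three inputs, target is a fixed weighted sum of them.
rng = np.random.default_rng(1)
x = rng.uniform(size=(50, 3))
true_w = np.array([0.5, -1.0, 2.0])
y = x @ true_w

w, losses = train_single_node(x, y)
print(losses[0], losses[-1])             # the loss shrinks as training proceeds
```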

## Setting up

#### 1. Specifying network-specific variables
Other than the variables in the next cell, the rest of the code is common to both the RGB and hyperspectral networks.

In [1]:
save_model_path = '../training_rgb'
data_path ="../Training data/RGB.csv"
n_o_input = 3

#### 2. Importing dependencies

In [2]:
%matplotlib inline
%load_ext autoreload
%autoreload 2
%config InlineBackend.figure_format = 'retina'


import random

import pandas as pd
import pickle as pkl
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from random import randrange



The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Data processing
#### 3. Importing data and splitting it into features and labels

In [3]:
data = pd.read_csv(data_path)

data.head()

target_fields ='Class code'
data = data.drop(["Class","Sample","Session.Sample"],axis=1)
features0, targets0 = data.drop(target_fields, axis=1), data[target_fields]
features, targets  = np.array(features0) , np.array(targets0)

#### 4. Function to generate a non-repeating list of random numbers

In [4]:
def random_no_repeat(lower,upper,length):
    assert length<=(upper-lower)+1
    output = []
    while len(output)<length:
        x = random.randint(lower,upper)
        if not(x in output):
            output.append(x)
    return output
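
For reference, the standard library can produce the same kind of list directly. The sketch below assumes only that distinct integers from an inclusive range are needed; the name `random_no_repeat_sample` is hypothetical and is not used elsewhere in the notebook.

```python
import random

def random_no_repeat_sample(lower, upper, length):
    """Return `length` distinct integers drawn from lower..upper inclusive."""
    # random.sample raises ValueError if length exceeds the population size
    return random.sample(range(lower, upper + 1), length)

picks = random_no_repeat_sample(0, 6, 2)
print(picks)
```

Unlike the rejection loop above, this does not slow down when `length` approaches the size of the range.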

#### 5. Pick spectral data from random sessions for the validation and test sets

In [5]:
validation_length = 1
test_length = 1
n_o_sample = 47
size_of_a_sample = 7

#Pick random labels for validation & test set
val_labels = []
test_labels = [] 
for num in range(n_o_sample):
    rand_upper = size_of_a_sample-1
    rand_len = test_length+validation_length
    ran_list = random_no_repeat(0,rand_upper,rand_len)
    
    rand_valid = np.array(ran_list[0:validation_length])
    val_labels.extend(n_o_sample*rand_valid+num)
    
    rand_t_start = validation_length
    rand_t_end = validation_length+test_length
    rand_test= np.array(ran_list[rand_t_start:rand_t_end])
    test_labels.extend(n_o_sample*rand_test+num)

#Labels for training set by removing those allocated to test & validation
train_labels = list(
    set([x for x in range(n_o_sample*size_of_a_sample)])
    - set(val_labels)
    - set(test_labels))

### Checkpoint 1
Check that the random session index for each sample is picked correctly: for a given sample number, the labels of the validation and test entries should be the same.

In [6]:
num = 0
print(num) #Sample number
print(n_o_sample*rand_valid+num,n_o_sample*rand_test+num) #Index of sample from random trials
print(targets[n_o_sample*rand_valid+num],targets[n_o_sample*rand_test+num]) #Labels of random samples

0
[94] [282]
[1] [1]
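
The same check can be run over every sample rather than one. The sketch below is self-contained and assumes the row layout implied by the expression `n_o_sample*session+num`, i.e. rows are ordered session by session with 47 rows per session; `split_indices` is a hypothetical helper, not part of the notebook.

```python
import random

def split_indices(n_o_sample=47, size_of_a_sample=7, seed=0):
    """Pick one validation and one test session per sample; the rest is training."""
    rng = random.Random(seed)
    val, test = [], []
    for sample in range(n_o_sample):
        val_session, test_session = rng.sample(range(size_of_a_sample), 2)
        # row index = n_o_sample * session + sample
        val.append(n_o_sample * val_session + sample)
        test.append(n_o_sample * test_session + sample)
    train = sorted(set(range(n_o_sample * size_of_a_sample)) - set(val) - set(test))
    return train, val, test

train, val, test = split_indices()
# The sets must not overlap, and each sample appears once in val and once in test.
assert not (set(val) & set(test))
assert sorted(i % 47 for i in val) == list(range(47))
assert sorted(i % 47 for i in test) == list(range(47))
print(len(train), len(val), len(test))  # 235 47 47
```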


#### 6. The samples are ordered in related groups. Shuffling the data removes any bias introduced by that ordering

In [7]:
from random import shuffle
shuffle(train_labels)
shuffle(val_labels)
shuffle(test_labels)
train_x, train_y = features[train_labels] ,targets[train_labels]
val_x , val_y = features[val_labels] , targets[val_labels]
test_x, test_y = features[test_labels] ,targets[test_labels]

### Checkpoint 2
Check that the labels are in the same order (disable the shuffle function first)


In [8]:
print(test_y)
print(val_y)

[1 2 2 5 1 2 1 2 5 4 1 2 2 3 1 3 3 3 5 3 5 4 3 2 1 3 4 2 3 1 5 2 6 7 8 2 4
 3 6 4 8 7 3 3 2 2 1]
[1 3 2 5 6 2 2 7 3 2 1 3 2 3 3 2 3 8 3 2 1 2 1 3 5 2 2 2 5 8 4 5 1 1 4 3 6
 4 3 1 1 4 5 4 3 2 7]


## Structuring the networks
#### 7. The 1st layer of the hyperspectral network
For the hyperspectral data set, an additional layer was added at the start of the network to reduce the data's complexity. The first hidden layer traditionally looks for the simplest features, in this case the individual absorption features. Looking for correlations at the level of individual wavelengths, however, is computationally expensive. Because absorption features appear as clusters of neighbouring wavelengths, a convolution-like approach can be used instead: each node sees only a small window of adjacent wavelengths, which conveys the sequential relationship between inputs and reduces the computational cost. This structure was built from scratch by stacking small dense layers, one per window, rather than by using a true convolution. A convolution shares one set of weights across all positions, and the resulting spatial invariance would disregard the way absorption features can vary across the spectrum.
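
The windowing can be sketched in NumPy to show how many outputs the layer produces and that every window has its own weights. This is an illustrative sketch only: it omits dropout, uses random weights in place of trained ones, and the helper names `window_starts` and `locally_connected` are assumptions.

```python
import numpy as np

def window_starts(n_o_input, filter_width, stride_size):
    """Start offsets of each window, plus a final right-aligned window if needed."""
    starts = list(range(0, n_o_input - filter_width + 1, stride_size))
    if (n_o_input - filter_width) % stride_size > 0:
        starts.append(n_o_input - filter_width)
    return starts

def locally_connected(x, weights, biases, starts, filter_width, relu_alpha=0.2):
    """One output per window; each window has its own weights (no weight sharing)."""
    outputs = []
    for k, start in enumerate(starts):
        z = x[:, start:start + filter_width] @ weights[k] + biases[k]
        outputs.append(np.maximum(relu_alpha * z, z))   # leaky ReLU
    return np.stack(outputs, axis=1)

starts = window_starts(571, 30, 1)
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=(len(starts), 30))
biases = np.zeros(len(starts))
x = rng.uniform(size=(4, 571))           # 4 hypothetical spectra
out = locally_connected(x, weights, biases, starts, 30)
print(len(starts), out.shape)            # 542 (4, 542)
```

With 571 inputs, a filter width of 30 and a stride of 1, the layer reduces each spectrum to 542 values.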


A leaky ReLU (rectified linear unit) was also used as the activation function in both networks. This provides the non-linearity required for problem-solving without the high computational cost of functions like softmax. Its small non-zero slope for negative inputs also avoids the "dying ReLU" problem of the conventional ReLU, where a node whose input stays negative receives zero gradient in backpropagation and stops learning.
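
As a small illustration, the activation written in the code as `tf.maximum(relu_alpha * x, x)` can be reproduced in NumPy together with its slope. The function names are assumptions for the example; the value `alpha = 0.2` matches the default `relu_alpha` used in the notebook.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    """Pass positive values through; scale negative values by alpha."""
    return np.maximum(alpha * x, x)

def leaky_relu_grad(x, alpha=0.2):
    """Slope of the leaky ReLU: 1 for positive inputs, alpha otherwise."""
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.0, 1.5])
activated = leaky_relu(x)
slopes = leaky_relu_grad(x)
print(activated)
print(slopes)
```

The slope for negative inputs is `alpha` rather than zero, so those nodes still receive a gradient during training.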

In [9]:
def hyperspectral(input_,n_o_input, keep_prob, filter_width = 1, stride_size =1, relu_alpha = 0.2):
    n_o_strides = int((n_o_input-filter_width)/stride_size) +1  #round down
   
    Hyper_layer = []
    
    def dense(input_,start,keep_prob, filter_width, stride_size, relu_alpha):
        nn_input = tf.slice(input_,[0,start],[-1,filter_width])
        
        dropout1 = tf.nn.dropout(nn_input, keep_prob)
        dense1 = tf.layers.dense(dropout1, 1)
        relu1 = tf.maximum(relu_alpha * dense1, dense1)        
        return relu1
    
    for step in range(n_o_strides):
        start = step*stride_size
        output = dense(input_,start,keep_prob, filter_width, stride_size, relu_alpha)
        Hyper_layer.append(output)
    
    if (n_o_input-filter_width)%stride_size>0:
        start = n_o_input-filter_width
        output = dense(input_,start,keep_prob, filter_width, stride_size, relu_alpha)
        Hyper_layer.append(output)
        
    Hyper_l_stacked = tf.concat(Hyper_layer,1)
    
    print("Hyper_l_stacked",Hyper_l_stacked)
    # Return the actual number of outputs, including any extra right-aligned window
    return Hyper_l_stacked , len(Hyper_layer)

#### 8. The remaining layers, which are common to both the hyperspectral and RGB networks
The remaining structure, consisting of 2 layers, was similar for the RGB and hyperspectral networks. Both output layers consisted of 8 nodes, one for each of the 8 sub-classes of droppings and abiotic elements. The droppings were divided into 5 classes by type of excrement and animal of origin, and the abiotic elements into 3 classes by material type. [See table ... from the report] In both networks, the size of the hidden layer before the output was calculated by multiplying the size of the preceding layer by ⅔, a common rule of thumb in neural network design.
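
The sizing rule, as implemented in the next cell, truncates two thirds of the previous layer size and then adds one. A minimal sketch follows, with `hidden_layer_size` as a hypothetical helper name.

```python
def hidden_layer_size(n_prev):
    """Two thirds of the previous layer size, truncated, plus one."""
    return int(n_prev * 2 / 3) + 1

print(hidden_layer_size(3))    # 3   (RGB network: 3 nodes in the first layer)
print(hidden_layer_size(542))  # 362 (hyperspectral: 542 windows for width 30, stride 1)
```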


In [10]:
def Classifier(input_,n_o_class,n_o_input, keep_prob,relu_alpha = 0.2):
    print("n_o_input",n_o_input)
    if n_o_input == 3:
        is_RGB = True
    elif n_o_input == 571:
        is_RGB = False
    else:
        raise ValueError('Unsupported n_o_input (expected 3 or 571): '+str(n_o_input))
    
    if is_RGB:
        dense0 = tf.layers.dense(input_, 3)    
        relu0 = tf.maximum(relu_alpha * dense0, dense0)
        first_layer_out = tf.nn.dropout(relu0, keep_prob)
    else:
        # Pass relu_alpha through instead of hard-coding 0.2
        first_layer_out,n_o_input= hyperspectral(input_,n_o_input, keep_prob, filter_width = 30, stride_size =1, relu_alpha = relu_alpha)

    hidden_size = n_o_input*2/3
    hidden_nodes = int(hidden_size)+1 # truncate, then add one
    print("hidden size:",str(hidden_nodes))
    
    
    dense1 = tf.layers.dense(first_layer_out, hidden_nodes)    
    relu1 = tf.maximum(relu_alpha * dense1, dense1)
    dropout1 = tf.nn.dropout(relu1, keep_prob)
    
    
    class_logits = tf.layers.dense(dropout1, n_o_class)    
    
    return class_logits

## Setting up for training
#### 9. Function to format neural network input

In [11]:
def one_hot_indv(x,n_o_class):    
    output = np.zeros(n_o_class)
    output[x-1]=1
    return output

def one_hot_encode(x,n_o_class):
    output = []
    for y in x:
        output.append(one_hot_indv(y,n_o_class))
        
    return np.array(output)
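
An equivalent vectorised encoding can be written with `np.eye`. The sketch assumes, as the functions above do, that class codes are 1-based integers; the name `one_hot_encode_vectorised` is hypothetical.

```python
import numpy as np

def one_hot_encode_vectorised(labels, n_o_class):
    """One-hot encode 1-based integer class codes."""
    labels = np.asarray(labels)
    # Row k of the identity matrix is the one-hot vector for class code k + 1
    return np.eye(n_o_class)[labels - 1]

encoded = one_hot_encode_vectorised([1, 3, 8], 8)
print(encoded.shape)  # (3, 8)
```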

#### 10. Function to split data into smaller batches for training

In [12]:
def get_batches(x, y,batch_size=10):
    n_batches = len(x)//batch_size
    
    for ii in range(0, n_batches*batch_size, batch_size):
        # If we're not on the last batch, grab data with size batch_size
        if ii != (n_batches-1)*batch_size:
            X, Y = x[ii: ii+batch_size], y[ii: ii+batch_size] 
        # On the last batch, grab the rest of the data
        else:
            X, Y = x[ii:], y[ii:]
        yield X, Y
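
A short usage example shows how the last batch absorbs the remainder. The function is repeated so that the snippet runs on its own, and the 25-element arrays are made-up data.

```python
import numpy as np

def get_batches(x, y, batch_size=10):
    # Same logic as the cell above, repeated so this snippet is self-contained
    n_batches = len(x) // batch_size
    for ii in range(0, n_batches * batch_size, batch_size):
        if ii != (n_batches - 1) * batch_size:
            yield x[ii:ii + batch_size], y[ii:ii + batch_size]
        else:
            yield x[ii:], y[ii:]   # last batch takes everything that is left

x = np.arange(25)
y = np.arange(25)
sizes = [len(bx) for bx, _ in get_batches(x, y, batch_size=10)]
print(sizes)  # [10, 15]

# With fewer rows than batch_size, no batch is produced at all
empty = list(get_batches(x[:5], y[:5], batch_size=10))
print(len(empty))  # 0
```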

#### 11. Function to define loss value for training
Inspired by the aggregate loss scoring used in semi-supervised GANs, an aggregated score was used as the network's training loss. It combines the loss of the main-class classification with that of the subclass classification (as a weighted sum in the code below); the main classes are manure vs non-manure, and the subclasses are the specific animal classes or material types. The aim of this aggregated score was to teach the network the relationship between the groups of subclasses.
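
The aggregation can be sketched in NumPy. The sketch follows the TensorFlow cell below, which sums the subclass logits within each group to obtain two main-class logits and then adds the two cross-entropy costs. It assumes the column order used by that cell's `tf.split` call (the first `n_class` columns are the non-manure group); `aggregate_loss` and `softmax_cross_entropy` are hypothetical names.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Per-row cross entropy between one-hot labels and softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

def aggregate_loss(logits, labels, n_class=3, m_class=5, sub_scaling=1.0):
    """Subclass cost plus sub_scaling times the main-class cost."""
    assert logits.shape[1] == n_class + m_class
    sub_cost = softmax_cross_entropy(logits, labels).mean()
    # Collapse the 8 subclass columns into 2 main-class columns by summing each group
    main_logits = np.stack([logits[:, :n_class].sum(axis=1),
                            logits[:, n_class:].sum(axis=1)], axis=1)
    main_labels = np.stack([labels[:, :n_class].sum(axis=1),
                            labels[:, n_class:].sum(axis=1)], axis=1)
    main_cost = softmax_cross_entropy(main_logits, main_labels).mean()
    return sub_cost + sub_scaling * main_cost

logits = np.zeros((2, 8))          # an uninformative network output
labels = np.eye(8)[[0, 5]]         # one sample from each main class
loss = aggregate_loss(logits, labels)
print(loss)                        # ln(8) + ln(2) for a uniform guess
```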


In [13]:
def model_loss(input_,target_,m_class, n_class,n_o_input, keep_prob,relu_alpha = 0.2,sub_scaling = 1):
    
    n_o_class = m_class+n_class
    
    #raw output
    logits= Classifier(input_,n_o_class,n_o_input, keep_prob,relu_alpha = relu_alpha)
    subclass_softmax = tf.nn.softmax(logits)
    
    #Reduce outputs from 8 subclasses to 2 main classes
    n_class_logit, m_class_logit = tf.split(logits, [n_class, m_class], 1)
    m_class_logit1 =tf.reduce_sum(m_class_logit,1, keepdims =True) 
    n_class_logit1 =tf.reduce_sum(n_class_logit,1, keepdims =True) 
    main_class_logits = tf.concat([n_class_logit1, m_class_logit1], 1)
    main_class_softmax = tf.nn.softmax(main_class_logits)
    
    #Reduce labels from 8 subclasses to 2 main classes
    n_class_label, m_class_label = tf.split(target_, [n_class, m_class], 1)
    m_class_label1 =tf.reduce_sum(m_class_label,1, keepdims =True) 
    n_class_label1 =tf.reduce_sum(n_class_label,1, keepdims =True) 
    main_class_labels = tf.concat([n_class_label1, m_class_label1], 1)

    #Aggregated cost
    sub_class_cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=target_))
    main_class_cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=main_class_logits, labels=main_class_labels))
    total_cost = sub_class_cost + sub_scaling*main_class_cost

    optimizer = tf.train.AdamOptimizer().minimize(total_cost)
    
    #Accuracy value
    subclass_accuracy = tf.equal(tf.argmax(logits, 1), tf.argmax(target_, 1))
    subclass_accuracy =  tf.reduce_mean(tf.cast(subclass_accuracy, tf.float32), name='accuracy') #raw score 
    
    main_class_accuracy = tf.equal(tf.argmax(main_class_logits, 1), tf.argmax(main_class_labels, 1))
    main_class_accuracy = tf.reduce_mean(tf.cast(main_class_accuracy, tf.float32), name='accuracy') #raw score 
    
    confidence_sub_class =  tf.reduce_sum(tf.multiply(subclass_softmax,target_),1)
    confidence_main_class =  tf.reduce_sum(tf.multiply(main_class_softmax,main_class_labels),1)
    
    return optimizer,total_cost,subclass_softmax,main_class_softmax,subclass_accuracy,main_class_accuracy,confidence_sub_class,confidence_main_class

## Training the Model
#### 12. Defining the training parameters

In [14]:
tf.reset_default_graph()

n_class = 3
m_class = 5
n_o_class = m_class+n_class
input_ = tf.placeholder(tf.float32,  [None,n_o_input],name = 'x')
target_ = tf.placeholder(tf.float32,[None,n_o_class],name='y')
keep_prob = tf.placeholder(tf.float32,name='keep_prob')

epochs = 120
keep_probability = 0.95
sub_scaling = 1 
optimizer,total_cost,subclass_softmax,main_class_softmax,subclass_accuracy,main_class_accuracy,confidence_sub_class,confidence_main_class =model_loss(input_,target_,m_class, n_class,n_o_input, keep_prob,relu_alpha = 0.2,sub_scaling = sub_scaling) 

n_o_input 3
hidden size: 3


#### 13. Doing the actual training
The neural network was modified based on its classification accuracy on the validation set. These modifications were made to find the "optimal" structure with the best classification performance.  
- Prediction accuracy is the percentage of readings in each batch whose subclass is correctly classified  
- Prediction confidence is the softmax probability that the network assigns to the correct class (the cross-entropy loss is the negative log of this value). In statistics, a prediction at 95% confidence would be correct 95% of the time that such a confidence value was given. While the network's classification confidence doesn't directly translate to a statistical confidence, the confidence value of a well-trained neural network is a good indication of future prediction accuracy.
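
The confidence value computed in `model_loss` (softmax output multiplied by the one-hot label and summed per row) can be reproduced in NumPy as a minimal sketch. The helper name `class_confidence` and the all-zero logits are assumptions for illustration.

```python
import numpy as np

def class_confidence(logits, one_hot_labels):
    """Softmax probability assigned to the true class of each row."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    return (probs * one_hot_labels).sum(axis=1)

logits = np.zeros((1, 8))            # all 8 subclasses equally likely
labels = np.eye(8)[[0]]
confidence = class_confidence(logits, labels)
print(confidence)                    # [0.125]
```

A uniform guess over 8 subclasses gives a confidence of 0.125, which is a useful baseline when reading the subclass confidence values in the training log.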

In [15]:
print('Checking the Training on a Single Batch...')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(epochs):
        batch_i = 1
        for batch_features, label_pre_one_hot in get_batches(train_x, train_y,batch_size = 10):
            batch_labels = one_hot_encode(label_pre_one_hot,n_o_class)
            
            optimizer_p= sess.run([optimizer], feed_dict = {input_:batch_features,target_:batch_labels,keep_prob:keep_probability})

            batch_i += 1 
        print('Validation Epoch {:>2} - '.format(epoch + 1), end='')
        
        random_index = random_no_repeat(0,len(val_x)-1,10)
        valid_labels = one_hot_encode(val_y[random_index],n_o_class) 
        subclass_accuracy_p,main_class_accuracy_p,confidence_sub_class_p,confidence_main_class_p= sess.run([subclass_accuracy,main_class_accuracy,confidence_sub_class,confidence_main_class], feed_dict = {input_:val_x[random_index],target_:valid_labels,keep_prob:1})
        print("Main Class Accuracy: {:.5f}, , Main Class Confidence {:.5f}, Subclass Accuracy: {:.5f}, Subclass Confidence: {:.5f}".format(np.mean(main_class_accuracy_p),np.mean(confidence_main_class_p),np.mean(subclass_accuracy_p),np.mean(confidence_sub_class_p)))
    saver = tf.train.Saver()
    save_path = saver.save(sess, save_model_path)

Checking the Training on a Single Batch...
Validation Epoch  1 - Main Class Accuracy: 0.70000, , Main Class Confidence 0.52164, Subclass Accuracy: 0.30000, Subclass Confidence: 0.12754
Validation Epoch  2 - Main Class Accuracy: 0.70000, , Main Class Confidence 0.53589, Subclass Accuracy: 0.20000, Subclass Confidence: 0.12891
Validation Epoch  3 - Main Class Accuracy: 0.50000, , Main Class Confidence 0.50237, Subclass Accuracy: 0.20000, Subclass Confidence: 0.12774
Validation Epoch  4 - Main Class Accuracy: 0.70000, , Main Class Confidence 0.55851, Subclass Accuracy: 0.20000, Subclass Confidence: 0.13057
Validation Epoch  5 - Main Class Accuracy: 0.50000, , Main Class Confidence 0.50836, Subclass Accuracy: 0.00000, Subclass Confidence: 0.12648
Validation Epoch  6 - Main Class Accuracy: 0.70000, , Main Class Confidence 0.56701, Subclass Accuracy: 0.20000, Subclass Confidence: 0.13409
Validation Epoch  7 - Main Class Accuracy: 0.60000, , Main Class Confidence 0.54955, Subclass Accuracy: 0

## Testing
#### 14. Testing the neural network
When further modifications no longer improved the validation accuracy, we were satisfied that we had achieved the "optimal" structure. 
The network was then tested with an unseen set of spectral data to confirm the network's performance. 
In the validation phase, modifications were made based on the network's previous performance, so some of the improvements may be specific to the validation data set. 
In the testing phase, the use of unseen data produces accuracy values that are more representative of real-world performance. 


In [16]:
keep_probability = 1
print('Testing...')
with tf.Session() as sess:
    loader = tf.train.import_meta_graph(save_model_path + '.meta')
    loader.restore(sess, save_model_path)
    
    for epoch in range(1):
        batch_i = 1
        for batch_features, label_pre_one_hot in get_batches(val_x, val_y,batch_size = 1):# TODO: still uses the validation set; switch to test_x, test_y for the final test
            batch_labels = one_hot_encode(label_pre_one_hot,n_o_class)      
            
            subclass_accuracy_p,main_class_accuracy_p,confidence_sub_class_p,confidence_main_class_p= sess.run([subclass_accuracy,main_class_accuracy,confidence_sub_class,confidence_main_class], feed_dict = {input_:batch_features,target_:batch_labels,keep_prob:1})
            print('Sample {}, Class {},'.format(batch_i,label_pre_one_hot[0]), end='')
            print("Main Class Accuracy: {:.5f}, Main Class Confidence: {:.5f}, Subclass Accuracy: {:.5f}, Subclass Confidence: {:.5f}".format(np.mean(main_class_accuracy_p),np.mean(confidence_main_class_p),np.mean(subclass_accuracy_p),np.mean(confidence_sub_class_p)))

            batch_i += 1 

Testing...
INFO:tensorflow:Restoring parameters from ../training_rgb
Sample 1, Class 1,Main Class Accuracy: 1.00000, Main Class Confidence: 0.81529, Subclass Accuracy: 0.00000, Subclass Confidence: 0.10683
Sample 2, Class 3,Main Class Accuracy: 1.00000, Main Class Confidence: 0.88250, Subclass Accuracy: 0.00000, Subclass Confidence: 0.18709
Sample 3, Class 2,Main Class Accuracy: 1.00000, Main Class Confidence: 0.76355, Subclass Accuracy: 1.00000, Subclass Confidence: 0.19517
Sample 4, Class 5,Main Class Accuracy: 0.00000, Main Class Confidence: 0.13865, Subclass Accuracy: 0.00000, Subclass Confidence: 0.14136
Sample 5, Class 6,Main Class Accuracy: 0.00000, Main Class Confidence: 0.24839, Subclass Accuracy: 0.00000, Subclass Confidence: 0.09118
Sample 6, Class 2,Main Class Accuracy: 1.00000, Main Class Confidence: 0.77179, Subclass Accuracy: 1.00000, Subclass Confidence: 0.19469
Sample 7, Class 2,Main Class Accuracy: 1.00000, Main Class Confidence: 0.74078, Subclass Accuracy: 1.00000, S