<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#useful-definitions" data-toc-modified-id="useful-definitions-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>useful definitions</a></span></li><li><span><a href="#setting-the-DNN-stage" data-toc-modified-id="setting-the-DNN-stage-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>setting the DNN stage</a></span></li></ul></div>

# useful definitions 

[Reference](https://lilianweng.github.io/lil-log/2017/09/28/anatomize-deep-learning-with-information-theory.html)


The Kullback-Leibler (KL) divergence may be expressed by the formula

$D_{KL} [p(X)||p(Y)] = \sum p(x) \log\left( \frac{p(x)}{p(y)} \right) $
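As a small numerical illustration of this formula, the sketch below evaluates the sum for two discrete distributions given as arrays. The function name `kl_divergence` and the example values are illustrative assumptions, and natural logarithms are used, so results are in nats.

```python
import numpy as np

def kl_divergence(p, r):
    """Discrete KL divergence sum_i p_i * log(p_i / r_i), in nats.

    Terms with p_i == 0 contribute 0; the result is infinite when some
    r_i == 0 while p_i > 0.
    """
    p = np.asarray(p, dtype=float)
    r = np.asarray(r, dtype=float)
    support = p > 0  # 0 * log(0 / r) is taken to be 0
    if np.any(r[support] == 0):
        return np.inf
    return float(np.sum(p[support] * np.log(p[support] / r[support])))

p = [0.5, 0.5]
r = [0.9, 0.1]
print(kl_divergence(p, p))  # identical distributions
print(kl_divergence(p, r))
print(kl_divergence(r, p))  # swapping the arguments changes the value
```

Swapping the arguments gives a different value, which is why the quantity is a divergence rather than a true distance.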

Mutual information measures how much one may learn about a random variable $X$ from knowledge of another variable $Y$. It is defined as

$ I(X,Y) = D_{KL} [p(X,Y)||p(X)p(Y)] $

$I(X,Y) = \sum p(X,Y) \log \left(\frac{p(X,Y)}{p(X)p(Y)}\right) = \sum p(X,Y) \log \left(\frac{p(X|Y)}{p(X)}\right) $

$I(X,Y) = H(X) - H(X|Y) $
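The two expressions for $I(X,Y)$ can be checked against each other numerically. The sketch below computes the mutual information from a small joint probability table and compares it with $H(X) - H(X|Y)$. The helper names `mutual_information` and `entropy` and the table values are illustrative assumptions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a probability table of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]  # 0 log 0 is taken to be 0
    return float(-np.sum(p * np.log(p)))

def mutual_information(pxy):
    """I(X,Y) in nats from a joint probability table pxy[x, y]."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1, keepdims=True)  # marginal p(x), shape (nx, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal p(y), shape (1, ny)
    mask = pxy > 0                       # skip empty cells
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px * py)[mask])))

pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])

h_x = entropy(pxy.sum(axis=1))
h_x_given_y = entropy(pxy) - entropy(pxy.sum(axis=0))  # H(X|Y) = H(X,Y) - H(Y)

mi = mutual_information(pxy)
print(mi, h_x - h_x_given_y)  # the two values agree up to rounding

# For independent variables the joint factorizes and I(X,Y) vanishes.
mi_independent = mutual_information(np.outer([0.3, 0.7], [0.6, 0.4]))
print(mi_independent)
```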


The generalized KL divergence for Tsallis statistics can be written as

$D_{KL_q} [p(x)||p(y)] = \sum p(x)  \frac{\left(\frac{p(x)}{p(y)}\right)^{q-1}  -1}{q-1} $
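A minimal sketch of this generalized divergence is given below, assuming discrete distributions and a second distribution that is strictly positive wherever the first one is. The function name `tsallis_kl` and the example values are illustrative. The case $q = 1$ is handled separately because $\frac{u^{q-1} - 1}{q-1} \rightarrow \log u$ as $q \rightarrow 1$, which recovers the ordinary KL formula.

```python
import numpy as np

def tsallis_kl(p, r, q):
    """Generalized KL: sum_i p_i * ((p_i / r_i)**(q - 1) - 1) / (q - 1).

    Assumes r_i > 0 wherever p_i > 0.
    """
    p = np.asarray(p, dtype=float)
    r = np.asarray(r, dtype=float)
    support = p > 0
    ratio = p[support] / r[support]
    if q == 1.0:
        # Limit q -> 1: the deformed logarithm becomes the natural log.
        return float(np.sum(p[support] * np.log(ratio)))
    return float(np.sum(p[support] * (ratio ** (q - 1.0) - 1.0) / (q - 1.0)))

p = [0.5, 0.5]
r = [0.9, 0.1]
for q in (0.5, 1.0, 1.000001, 2.0):
    print(q, tsallis_kl(p, r, q))
```

The values printed for $q = 1$ and for $q$ slightly above 1 should be nearly identical, which is a convenient sanity check when sweeping over $q$.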


Then the generalized mutual information in Tsallis statistics becomes

$ I_q(X,Y) = D_{KL_q} [p(X,Y)||p(X) p(Y)] = \sum p(X,Y)  \frac{\left(\frac{p(X,Y)}{p(X)p(Y)}\right)^{q-1}  -1}{q-1} $
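To make the definition concrete, the sketch below evaluates $I_q(X,Y)$ on a small joint table for a few values of $q$. The function name `tsallis_mutual_information` and the table are illustrative assumptions; $q = 1$ falls back to the ordinary mutual information.

```python
import numpy as np

def tsallis_mutual_information(pxy, q):
    """I_q(X,Y) from a joint probability table pxy[x, y]."""
    pxy = np.asarray(pxy, dtype=float)
    # Product of the marginals p(x) p(y), same shape as pxy.
    prod = pxy.sum(axis=1, keepdims=True) * pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    ratio = pxy[mask] / prod[mask]
    if q == 1.0:
        return float(np.sum(pxy[mask] * np.log(ratio)))
    return float(np.sum(pxy[mask] * (ratio ** (q - 1.0) - 1.0) / (q - 1.0)))

pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
for q in (0.5, 1.0, 1.5, 2.0):
    print(q, tsallis_mutual_information(pxy, q))
```

For independent variables the ratio inside the power equals 1 in every cell, so $I_q$ vanishes for every $q$, just as in the $q = 1$ case.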



# setting the DNN stage

The multidimensional variables $X$ and $Y$ designate the dataset and its labels, respectively.

Following Tishby's approach we shall treat each layer of the dense network as a single random variable $T_i$, since in dense networks each neuron is connected to the whole set of inputs as well as the outputs.

We are interested in understanding the evolution of the layer weights in a simple dense neural network using the information plane: we plot $I(X,T)$ against $I(Y,T)$, then compare the result with plots of $I_q(X,T)$ against $I_q(Y,T)$ for a representative set of $q$ values.
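One way to obtain a single point of the information plane is a binning estimator, sketched below under strong assumptions: every input sample is treated as distinct and the layer output $T$ is a deterministic function of $X$, so that $H(T|X) = 0$ and $I(X,T) = H(T)$. This is an illustration on synthetic data, not necessarily the estimator used later in this notebook. The names `discrete_entropy` and `information_plane_point` and the default `nbins=30` are hypothetical.

```python
import numpy as np

def discrete_entropy(rows):
    """Entropy in nats of the empirical distribution over the rows of a 2-D array."""
    _, counts = np.unique(rows, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def information_plane_point(activations, y, nbins=30):
    """Binned estimates of (I(X,T), I(Y,T)) for one layer.

    activations: array of shape (n_samples, n_units) with the layer outputs.
    y: integer labels of shape (n_samples,).
    """
    edges = np.linspace(activations.min(), activations.max(), nbins + 1)
    t = np.digitize(activations, edges[1:-1])  # bin index of every unit
    h_t = discrete_entropy(t)                  # I(X,T) = H(T) under the assumptions
    # H(T|Y) = sum_y p(y) H(T | Y = y)
    h_t_given_y = 0.0
    for label in np.unique(y):
        rows = t[y == label]
        h_t_given_y += len(rows) / len(t) * discrete_entropy(rows)
    return h_t, h_t - h_t_given_y

# Synthetic stand-in for one layer: 3 tanh units whose mean depends on the label.
rng = np.random.RandomState(0)
y = rng.randint(0, 2, size=500)
activations = np.tanh(y[:, None] + 0.5 * rng.randn(500, 3))
i_xt, i_yt = information_plane_point(activations, y)
print(i_xt, i_yt)
```

With many units and few samples, almost every sample lands in its own combination of bins, so both estimates saturate. The number of bins therefore has a large effect on the resulting plot.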

In [21]:
import numpy as np
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt

import datetime

import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='3'

from keras.datasets import mnist
from keras.datasets import reuters
from keras.datasets import imdb
from keras.models import Sequential, load_model
from keras.layers.core import Dense, Dropout, Activation
from keras.utils import np_utils

%matplotlib inline

In [2]:
(X_train, y_train), (X_test, y_test) = mnist.load_data()

ntrain = 60000
ntest = 10000

#reducing dataset
X_train = X_train[0:ntrain]
y_train = y_train[0:ntrain]

X_train = X_train.reshape(ntrain, 784)
X_test = X_test.reshape(ntest, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# normalizing 
X_train /= 255
X_test /= 255

#one-hot encoding
n_classes = 10
Y_train = np_utils.to_categorical(y_train, n_classes)
Y_test = np_utils.to_categorical(y_test, n_classes)

#model
model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))                            

model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')




In [51]:
def Write_Files(model, dirname, model_filename):
    """Save a model inside a directory under the current working directory.

    Parameters
    ----------------------------------------
    model: Keras model
    dirname: Name of the directory that you
             want to put your files in (created if missing).
    model_filename: Name of the file.

    """
    # Build the target path from dirname instead of a hard-coded 'data' folder.
    path = os.path.join(os.getcwd(), dirname)
    # exist_ok=True makes this a no-op when the directory already exists.
    os.makedirs(path, exist_ok=True)
    # Any I/O error propagates with its original traceback.
    model.save(os.path.join(path, model_filename))

In [56]:
epocas = 20  # number of training epochs
caminho = os.getcwd()
save_dir = caminho
for i in range(epocas):
    # Train one epoch at a time so the model can be saved after every epoch.
    model.fit(X_train, Y_train,
              batch_size=128, epochs=1,
              verbose=2,
              validation_data=(X_test, Y_test))
    model_filename = 'keras_mnistepoca' + str(i) + '.h5'
    Write_Files(model, 'data', model_filename)

Defining the joint probability $P(X,Y)$

In [57]:
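def PX(nbins):
    """Histogram every input feature (pixel) of X_train.

    Returns the per-feature counts and the per-feature bin edges.
    """
    hists = []
    breaks = []

    for i in range(X_train.shape[1]):
        counts, edges = np.histogram(X_train[:, i], bins=nbins)
        hists.append(counts)
        breaks.append(edges)

    return hists, breaks

# computing the joint probability P(X, Y) = P(X | Y) P(Y)

def PXY(nbins):
    """Per-feature joint probability of a binned pixel value and the label.

    Returns an array of shape (n_classes, n_features, nbins) whose entry
    [y, i, b] estimates P(X_i in bin b, Y = y). This is a per-feature
    estimate, not the joint over the full 784-dimensional X.
    """
    # P(Y): relative frequency of each label.
    Py = np.bincount(y_train, minlength=n_classes) / len(y_train)

    joint = []
    for y in range(n_classes):
        X_given_y = X_train[y_train == y]  # samples with label y

        pxdadoy = []  # P(X_i | Y = y) for every feature i
        for i in range(X_train.shape[1]):
            # Fixed range keeps bins comparable across classes (inputs lie in [0, 1]).
            counts, _ = np.histogram(X_given_y[:, i], bins=nbins, range=(0.0, 1.0))
            pxdadoy.append(counts / counts.sum())

        joint.append(np.array(pxdadoy) * Py[y])

    return np.array(joint)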

We use the Markov chain

$Y \rightarrow X \rightarrow T_i$

to get

$P(T_i,X) = P(T_i|X) P(X)$

and

$P(T_i,Y) = \sum P(X,Y) P(T_i | X) $
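The two identities above can be checked on a toy example with small discrete tables. In the sketch below, all sizes and probabilities are made-up illustrative values, and the array names are hypothetical. The sum over $X$ becomes a matrix product.

```python
import numpy as np

# Toy sizes: 3 input states, 2 labels, 2 layer states.
p_xy = np.array([[0.20, 0.10],
                 [0.05, 0.25],
                 [0.30, 0.10]])        # p_xy[x, y], entries sum to 1
p_t_given_x = np.array([[0.9, 0.1],
                        [0.2, 0.8],
                        [0.5, 0.5]])   # p_t_given_x[x, t], rows sum to 1

p_x = p_xy.sum(axis=1)                 # P(X)
p_tx = p_t_given_x * p_x[:, None]      # P(T, X) = P(T|X) P(X), indexed [x, t]
p_ty = p_t_given_x.T @ p_xy            # P(T, Y) = sum_x P(x, y) P(t|x), indexed [t, y]

print(p_tx.sum(), p_ty.sum())          # both are valid joint distributions
print(p_tx.sum(axis=0), p_ty.sum(axis=1))  # both marginalize to the same P(T)
```

The second identity only holds because $T_i$ depends on $Y$ through $X$ alone, which is exactly what the Markov chain states.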

Next, we extract the outputs of the activation functions to obtain $T_i$.

In [59]:
from keras import backend as K

inp = model.input                                           # input placeholder
outputs = [layer.output for layer in model.layers]          # all layer outputs
functors = [K.function([inp, K.learning_phase()], [out]) for out in outputs]    # evaluation functions

# Testing
input_shape = model.input_shape[1:]                         # (784,), batch dimension dropped
test = np.random.random(input_shape)[np.newaxis,...]
layer_outs = [func([test, 1.]) for func in functors]        # 1. = training phase, 0. = test phase
print(layer_outs)