# Autoencoder for network data

An autoencoder is a non-linear dimensionality reduction tool (a form of deep representation learning). It is an artificial neural network, or more precisely two neural networks trained simultaneously:  
- the first half of the layers (the encoder) maps the high-dimensional input data to a lower-dimensional representation space, gradually reducing the dimensionality from layer to layer;  
- the second half (the decoder) transforms the low-dimensional encoded representation back to the original dimensionality.

The architectures of the two parts are usually symmetric, as can be seen in the figure below.

<img src="autoencoder.png">
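The shape of the data as it passes through such a network can be sketched in plain NumPy. This is only an illustration under assumed values: the layer sizes (100, 50, 10), the random weights, and the helper `dense` are made up for the example, and nothing is trained here.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, n_out, rng):
    """One randomly initialised dense layer with ReLU (hypothetical helper, illustration only)."""
    w = rng.normal(scale=0.1, size=(x.shape[1], n_out))
    b = np.zeros(n_out)
    return np.maximum(x @ w + b, 0.0)

x = rng.random((4, 100))   # 4 made-up samples with 100 features each

# encoder: the dimensionality shrinks from layer to layer
h = x
for n in [50, 10]:
    h = dense(h, n, rng)
code = h                   # low-dimensional representation

# decoder: mirror image of the encoder, back to the input size
for n in [50, 100]:
    h = dense(h, n, rng)
reconstruction = h

print(x.shape, code.shape, reconstruction.shape)
```

Training then consists of adjusting the weights so that `reconstruction` is as close as possible to `x`, which forces `code` to retain the information needed to rebuild the input.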

Three useful blog posts for getting started:  

https://ramhiser.com/post/2018-05-14-autoencoders-with-keras/

https://www.datacamp.com/community/tutorials/autoencoder-keras-tutorial  

https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798

(figure credit: the third post)

In [1]:
import numpy as np
import keras
from keras.datasets import mnist
from keras.models import Model, Sequential
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D, Flatten, Reshape
from keras import regularizers
import pickle

Using TensorFlow backend.


**The code below reads a temporal network (each row is a time slice, each column an edge weight) and trains an autoencoder to compress it into a smaller set of representative feature timelines.**

In [2]:
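def save(filename, X):
    """Pickle the object X to the file `filename`."""
    #the context manager closes the file even if pickling fails
    with open(filename, 'wb') as outfile:
        pickle.dump(X, outfile)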

In [3]:
def load(filename):
    """Load and return a pickled object from the file `filename`."""
    with open(filename, 'rb') as infile:
        return pickle.load(infile)

In [4]:
X=load('Taipeiexchange1.pkl')

In [5]:
x_train = X[0:400, 0:100]  #first 400 time slices, first 100 edge-weight columns
x_test = X[400:, 0:100]    #remaining time slices, used for validation

In [32]:
#design autoencoder
input_dim = x_train.shape[1]
#encoding_dim = [50,15,5,2] #sizes of the layers
#encoding_dim = [30,10,3] #sizes of the layers
encoding_dim = [80,60,40,25,15] #sizes of the layers
#encoding_dim = [50,25,12] #sizes of the layers

compression_factor = float(input_dim) / encoding_dim[-1]
print("Compression factor: %s" % compression_factor)

#define autoencoder architecture
autoencoder = Sequential()

#encoder: only the first layer needs an explicit input shape
autoencoder.add(
    Dense(encoding_dim[0], input_shape=(input_dim,), activation='relu')
)
for l in range(1,len(encoding_dim)):
    autoencoder.add(Dense(encoding_dim[l], activation='relu'))

#decoder: mirrors the encoder layer sizes in reverse order
for l in range(len(encoding_dim)-1,0,-1):
    autoencoder.add(Dense(encoding_dim[l-1], activation='relu'))

#output layer: sigmoid keeps every reconstructed value in [0, 1]
autoencoder.add(Dense(input_dim, activation='sigmoid'))

autoencoder.summary()

Compression factor: 6.66666666667
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_31 (Dense)             (None, 80)                8080      
_________________________________________________________________
dense_32 (Dense)             (None, 60)                4860      
_________________________________________________________________
dense_33 (Dense)             (None, 40)                2440      
_________________________________________________________________
dense_34 (Dense)             (None, 25)                1025      
_________________________________________________________________
dense_35 (Dense)             (None, 15)                390       
_________________________________________________________________
dense_36 (Dense)             (None, 25)                400       
_________________________________________________________________
dense_37 (Dense)             (None, 40)   
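
The `Param #` column of the summary can be checked by hand: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A small sketch of that arithmetic, using the layer sizes from `encoding_dim` and an input dimension of 100 (the helper name `dense_params` is made up for this example):

```python
# layer widths of the full autoencoder: input, encoder, mirrored decoder, output
layer_sizes = [100, 80, 60, 40, 25, 15, 25, 40, 60, 80, 100]

def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

counts = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]

print(counts[:6])      # [8080, 4860, 2440, 1025, 390, 400]
print(sum(counts[:5])) # 16795, the five encoder layers together
```

The first six values agree with the rows of the summary shown above.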

In [33]:
#fix NumPy's random seed for reproducibility
np.random.seed(0)

In [34]:
#binary cross-entropy pairs with the sigmoid output layer; it assumes the data lie in [0, 1]
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
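
A sigmoid output trained with binary cross-entropy can only reproduce values between 0 and 1. Whether the pickled matrix is already scaled that way is not shown here, so the following is a sketch of one possible preprocessing step, not necessarily what was used: column-wise min-max scaling with statistics taken from the training rows only. The helpers `fit_minmax` and `apply_minmax` and the demo arrays are hypothetical.

```python
import numpy as np

def fit_minmax(train):
    """Column-wise minimum and range, computed on the training rows only."""
    lo = train.min(axis=0)
    span = train.max(axis=0) - lo
    span[span == 0] = 1.0          # constant columns: avoid division by zero
    return lo, span

def apply_minmax(data, lo, span):
    """Scale to [0, 1]; values outside the training range are clipped."""
    return np.clip((data - lo) / span, 0.0, 1.0)

# made-up stand-ins for x_train and x_test
rng = np.random.default_rng(0)
train_demo = rng.normal(loc=5.0, scale=2.0, size=(400, 100))
test_demo = rng.normal(loc=5.0, scale=2.0, size=(269, 100))

lo, span = fit_minmax(train_demo)
train_scaled = apply_minmax(train_demo, lo, span)
test_scaled = apply_minmax(test_demo, lo, span)
```

Fitting the scaler on the training rows alone keeps information from the test period out of the preprocessing.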

In [35]:
autoencoder.fit(x_train, x_train,
                epochs=400,
                batch_size=50,
                shuffle=True,
                validation_data=(x_test, x_test),verbose=1)

Train on 400 samples, validate on 269 samples
Epoch 1/400
...

<keras.callbacks.History at 0xb38280750>

In [36]:
#extract the encoder layers from the trained autoencoder
def encoderLayers(input_):
    #chain the first len(encoding_dim) layers, which form the encoder
    output_=input_
    for l in range(len(encoding_dim)):
        output_=autoencoder.layers[l](output_)
    return output_

input_layer = Input(shape=(input_dim,))
#the encoder reuses the trained layers, so it shares their weights
encoder = Model(input_layer, encoderLayers(input_layer))

encoder.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_3 (InputLayer)         (None, 100)               0         
_________________________________________________________________
dense_31 (Dense)             (None, 80)                8080      
_________________________________________________________________
dense_32 (Dense)             (None, 60)                4860      
_________________________________________________________________
dense_33 (Dense)             (None, 40)                2440      
_________________________________________________________________
dense_34 (Dense)             (None, 25)                1025      
_________________________________________________________________
dense_35 (Dense)             (None, 15)                390       
Total params: 16,795
Trainable params: 16,795
Non-trainable params: 0
_________________________________________________________________


In [37]:
#encode the test data, and reconstruct it with the full autoencoder
encoded_x = encoder.predict(x_test)
decoded_x = autoencoder.predict(x_test)
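
One way to judge how well the test data are reconstructed is the mean squared error of each time slice. The sketch below uses made-up stand-in arrays in place of `x_test` and `decoded_x`, and the helper name `reconstruction_error` is hypothetical.

```python
import numpy as np

def reconstruction_error(original, reconstructed):
    """Mean squared error per row (one value per time slice)."""
    return np.mean((original - reconstructed) ** 2, axis=1)

# made-up stand-ins for x_test and decoded_x
rng = np.random.default_rng(0)
x_demo = rng.random((5, 100))
decoded_demo = np.clip(x_demo + rng.normal(scale=0.05, size=x_demo.shape), 0.0, 1.0)

errors = reconstruction_error(x_demo, decoded_demo)
print(errors.shape)   # (5,)
```

With the real arrays the call would be `reconstruction_error(x_test, decoded_x)`; time slices with unusually large errors are the ones the 15-dimensional code represents worst.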

In [38]:
#encode the whole time series (training rows followed by test rows)
y=encoder.predict(np.concatenate((x_train,x_test)))

In [39]:
y[:10]

array([[ 1.3363421 , -0.        ,  0.78147084, -0.        ,  5.585493  ,
         1.2504551 , -0.        , -0.        ,  0.01923567, -0.        ,
         1.7564486 ,  4.062417  ,  3.8543825 ,  2.4002512 ,  1.2071981 ],
       [ 0.7709331 , -0.        ,  2.8277085 , -0.        ,  3.6389208 ,
         1.0418524 , -0.        , -0.        , -0.        , -0.        ,
         0.8897324 ,  4.085449  ,  4.341785  ,  2.5373921 ,  2.4504175 ],
       [-0.        , -0.        ,  4.1625543 , -0.        ,  0.4345499 ,
         0.46726698, -0.        , -0.        , -0.        , -0.        ,
         1.0089458 ,  3.8056495 ,  1.9283712 ,  5.624664  ,  2.9422386 ],
       [-0.        , -0.        ,  4.468466  , -0.        ,  0.5748289 ,
         0.2916113 , -0.        , -0.        , -0.        , -0.        ,
         1.2474947 ,  3.8932695 ,  1.8872777 ,  6.0076885 ,  3.491469  ],
       [-0.        , -0.        ,  4.6276383 , -0.        ,  0.8261027 ,
         0.589389  , -0.        , -0.        , 
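
In the rows printed above, several of the 15 encoded features are zero in every row shown, which is what a ReLU unit that never activates would produce. A minimal sketch for listing such columns over a whole encoded matrix follows; the helper `inactive_columns` and the small demo array are made up for illustration.

```python
import numpy as np

def inactive_columns(codes, tol=1e-12):
    """Indices of columns whose absolute value never exceeds tol."""
    return np.flatnonzero(np.all(np.abs(codes) <= tol, axis=0))

# made-up stand-in for the encoded matrix y
codes_demo = np.array([[1.3, 0.0, 0.8],
                       [0.7, 0.0, 2.8],
                       [0.0, 0.0, 4.1]])

print(inactive_columns(codes_demo))   # [1]
```

Applied to the real output, `inactive_columns(y)` would show how many of the 15 dimensions actually carry information.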

In [40]:
#save the outcome
save('Tapeiexchange2.pkl',y)