# Deep Learning, part 3: Other important examples

- Generative models: autoencoders and GANs
- Working with tabular data, data integration
- Recurrent NN and attention mechanisms
- Reinforcement learning

### Generative models: autoencoders, VAEs and GANs

- **Generative models.** These NN models learn the distribution of the training data so that they can generate new, similar samples. Closely related architectures (such as autoencoders) are also used for dimensionality reduction and dataset transformations.
- A popular use of NNs is to take the fitted weights of a trained network and reuse them on other datasets. This is called **transfer learning**.
- To learn, a NN needs a target to compare its predictions against (through a loss function). In that sense, NN training is usually framed as supervised learning.
- It is, however, possible to perform unsupervised learning with NNs, and the most popular method is the autoencoder. More precisely, autoencoders are **self-supervised**: they generate their own labels from the training data.
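To make "self-supervised" concrete: the training pairs are built from unlabeled data alone. This is a minimal NumPy sketch on random data; the denoising variant and its `noise_std` value are illustrative assumptions, not part of this notebook's model.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))  # unlabeled data: 100 samples x 50 features

# Plain autoencoder: the target ("label") is the input itself
ae_inputs, ae_targets = X, X

# Denoising autoencoder (assumed variant): corrupt the input, keep the clean data as target
noise_std = 0.1  # assumed corruption level
dae_inputs = X + rng.normal(scale=noise_std, size=X.shape)
dae_targets = X
```

In both cases no human annotation is needed: the "labels" are derived from `X` itself.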


**Autoencoders**

- A dimensionality reduction (or compression) NN that is trained to reproduce its input as its output.
- It compresses the input into a lower-dimensional code and then reconstructs the output from this representation.
- Three components: encoder, code and decoder. The encoder compresses the input and produces the code; the decoder then reconstructs the input using only this code.
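The encoder, code and decoder can be sketched in a few lines of NumPy. This assumes a purely linear autoencoder (no activation functions) trained with plain gradient descent on synthetic data that secretly lies on a 3-dimensional subspace; the Keras model later in this notebook adds nonlinearities, batch normalization and the Adam optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples x 20 features that actually live on a 3-dim subspace
n, d, k = 200, 20, 3
latent = rng.normal(size=(n, k))
mixing = rng.normal(size=(k, d))
X = latent @ mixing

# Encoder (d -> k) and decoder (k -> d) weights, small random init
W_enc = rng.normal(scale=0.1, size=(d, k))
W_dec = rng.normal(scale=0.1, size=(k, d))

def reconstruct(X, W_enc, W_dec):
    code = X @ W_enc            # encoder: compress each sample to a k-dim code
    return code, code @ W_dec   # decoder: rebuild the input from the code only

lr, losses = 0.02, []
for step in range(2000):
    code, X_hat = reconstruct(X, W_enc, W_dec)
    err = X_hat - X                        # the input is its own target
    losses.append(np.mean(err ** 2))       # MSE reconstruction loss
    grad_out = 2 * err / err.size          # dLoss/dX_hat
    grad_dec = code.T @ grad_out           # dLoss/dW_dec
    grad_enc = X.T @ (grad_out @ W_dec.T)  # dLoss/dW_enc (chain rule through the code)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print(f"initial MSE {losses[0]:.3f}, final MSE {losses[-1]:.5f}")
```

Because the code has as many dimensions as the hidden subspace, the reconstruction error can shrink towards zero; with `k` smaller than the true dimensionality the reconstruction would stay lossy.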


In [1]:
```python
from IPython.display import Image
Image(url="../img/AE.png", width=400, height=400)
```

Autoencoder properties and uses:
- Data-specific: only able to meaningfully compress data similar to what they have been trained on. An autoencoder trained on handwritten digits won't compress landscape photos.
- Lossy: the output of the autoencoder is not exactly the same as the input; it is a close but degraded reconstruction.
- Data denoising: by learning the relevant features, they can denoise/normalize a dataset.
- Clustering: clustering algorithms struggle with high-dimensional data, so AEs are a useful preprocessing step.
- Generative models: Variational Autoencoders (VAEs) learn the parameters (mean and variance) of a probability distribution over the latent code. By sampling points from this distribution and decoding them, we can also use the VAE as a generative model.
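The two ingredients that turn an AE into a VAE can be sketched in NumPy: sampling the code with the reparameterization trick, and the KL-divergence term that keeps the learned latent distribution close to a standard normal. This assumes a Gaussian latent space where the encoder outputs a mean `mu` and a log-variance `log_var` per latent dimension (a common choice, and one possible starting point for the VAE task below); the numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims, one value per sample."""
    return -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=-1)

# Hypothetical encoder outputs for 4 samples in a 2-dim latent space
mu = np.array([[0.0, 0.0], [1.0, -1.0], [0.5, 0.2], [0.0, 0.0]])
log_var = np.array([[0.0, 0.0], [0.0, 0.0], [-1.0, -1.0], [1.0, 1.0]])

z = sample_latent(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
print(z.shape)  # (4, 2)
print(kl)       # first sample already matches N(0, I), so its KL is 0
```

In a full VAE the loss is the reconstruction error plus this KL term; to generate new data, one samples `z` from N(0, I) and passes it through the decoder.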


**Tabular data**
So far we have applied NNs only to images and text (after converting them to numbers). Let's see an example of working directly with tabular data, which is more common in 'omics research.

In [2]:
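```python
from pathlib import Path
import pandas as pd

# adjust this path to where the CLL data lives on your machine
data_loc = Path(r'D:\windata\work\biopycourse\data\cll_data')
# rows = genes (Ensembl IDs), columns = patient samples
df = pd.read_csv(data_loc / "cll_mrna.txt", index_col=0, sep="\t")
df = df.dropna(axis='columns')  # drop samples (columns) with any missing value
print(df.shape)
df.head()
```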

```
(5000, 136)
```

| Gene (Ensembl ID) | mRNA.H045 | mRNA.H109 | mRNA.H024 | mRNA.H056 | mRNA.H079 | mRNA.H164 | mRNA.H059 | mRNA.H167 | mRNA.H113 | mRNA.H049 | ... | mRNA.H271 | mRNA.H006 | mRNA.H084 | mRNA.H260 | mRNA.H192 | mRNA.H070 | mRNA.H255 | mRNA.H135 | mRNA.H247 | mRNA.H066 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ENSG00000244734 | 4.558644 | 2.721512 | 9.938456 | 13.278004 | 6.086874 | 2.571839 | 4.938961 | 1.528848 | 2.286122 | 2.504699 | ... | 4.199712 | 8.607476 | 10.682876 | 12.365431 | 6.731859 | 3.254823 | 3.269304 | 1.528848 | 8.826116 | 4.06359 |
| ENSG00000158528 | 11.741854 | 13.287432 | 2.341006 | 3.232874 | 11.94082 | 11.506818 | 5.483675 | 2.618869 | 2.812801 | 2.504699 | ... | 3.743776 | 3.948041 | 2.651553 | 12.776441 | 3.071359 | 1.528848 | 12.427299 | 1.528848 | 3.121428 | 12.465548 |
| ENSG00000198478 | 8.921456 | 2.721512 | 12.381452 | 8.106266 | 4.889503 | 12.756213 | 3.59389 | 4.11949 | 5.220041 | 2.884897 | ... | 2.226109 | 5.306285 | 9.321213 | 10.534619 | 6.091324 | 1.528848 | 2.907226 | 1.528848 | 8.087886 | 11.948637 |
| ENSG00000175445 | 12.686458 | 10.925985 | 1.528848 | 1.528848 | 13.340588 | 10.885547 | 11.194029 | 11.599981 | 2.286122 | 2.884897 | ... | 2.226109 | 9.034459 | 9.397879 | 11.78652 | 1.528848 | 4.436292 | 11.425088 | 1.528848 | 5.680739 | 10.767604 |
| ENSG00000174469 | 2.644946 | 12.648355 | 1.528848 | 13.56521 | 5.476914 | 10.975187 | 7.944246 | 2.618869 | 2.286122 | 12.940957 | ... | 13.723207 | 10.394117 | 12.091816 | 9.442299 | 4.948473 | 12.418931 | 1.528848 | 12.815852 | 9.970217 | 10.721614 |


In [108]:
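```python
# transpose so that rows are samples (136) and columns are genes (5000), as Keras expects
X_train = df.T
```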

**Goal**: reduce the dimensionality of this dataset from 5000 genes to a 16-dimensional code, so that the 136 samples can be clustered more effectively.

New concepts:
- parametrizing the model through hyperparameter variables
- batch normalization layers
- naming layers and models
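A minimal NumPy sketch of what a batch normalization layer computes during training: each feature is standardized with the mean and variance of the current batch, then rescaled and shifted by learnable `gamma` and `beta`. This assumes `eps = 1e-3` (the Keras default `epsilon`); at inference time Keras instead uses running averages of the mean and variance, which are the non-trainable parameters that show up in the model summaries below.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Batch normalization forward pass in training mode (per-feature statistics over the batch)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, (almost) unit variance per feature
    return gamma * x_hat + beta, mean, var   # learnable rescale and shift

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 128))  # 64 samples, 128 units
gamma, beta = np.ones(128), np.zeros(128)                # initial values: identity transform
out, batch_mean, batch_var = batch_norm_train(batch, gamma, beta)
print(out.mean(axis=0)[:3].round(6), out.std(axis=0)[:3].round(3))
```

Keeping the activations of the 128-unit layer on a common scale tends to make training faster and less sensitive to initialization.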

In [116]:
```python
import numpy as np
import tensorflow as tf

from tensorflow.keras.models import Model
from tensorflow.keras.layers import BatchNormalization, Concatenate, Dense, Input, Lambda, Dropout

# Hyperparameters
input_size = X_train.shape[1]  # number of genes (5000)
# activation; "elu" (https://keras.io/activations/) may cope better with vanishing gradients
#act = "elu"
act = "relu"
# size of the intermediate dense layers
ds = 128
# latent space dimension size
ls = 16
# dropout rate, in [0, 1]
dropout = 0.2
# seed NumPy and TensorFlow for reproducibility
np.random.seed(42)
tf.random.set_seed(42)
```


In [117]:
```python
# Define the encoder: 5000 -> 128 -> 16
inputs_layer = Input(shape=(input_size,), name='input')
x = Dense(ds, activation=act)(inputs_layer)
x = BatchNormalization()(x)
coded_layer = Dense(ls, name='coded_layer')(x)  # linear activation: the 16-dim code

encoder = Model(inputs_layer, coded_layer, name='encoder')
encoder.summary()
```

```
Model: "encoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input (InputLayer)           [(None, 5000)]            0         
_________________________________________________________________
dense_45 (Dense)             (None, 128)               640128    
_________________________________________________________________
batch_normalization_29 (Batc (None, 128)               512       
_________________________________________________________________
coded_layer (Dense)          (None, 16)                2064      
Total params: 642,704
Trainable params: 642,448
Non-trainable params: 256
_________________________________________________________________
```


In [118]:
```python
# Define the decoder: 16 -> 128 -> 5000 (mirror image of the encoder)
decoder_inputs_layer = Input(shape=(ls,), name='latent_inputs')
x = Dense(ds, activation=act)(decoder_inputs_layer)
x = BatchNormalization()(x)
x = Dropout(dropout)(x)
output_layer = Dense(input_size)(x)  # linear output: reconstructed expression values

decoder = Model(decoder_inputs_layer, output_layer, name='decoder')
decoder.summary()
```

```
Model: "decoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
latent_inputs (InputLayer)   [(None, 16)]              0         
_________________________________________________________________
dense_46 (Dense)             (None, 128)               2176      
_________________________________________________________________
batch_normalization_30 (Batc (None, 128)               512       
_________________________________________________________________
dropout_13 (Dropout)         (None, 128)               0         
_________________________________________________________________
dense_47 (Dense)             (None, 5000)              645000    
Total params: 647,688
Trainable params: 647,432
Non-trainable params: 256
_________________________________________________________________
```


In [119]:
```python
# Define the autoencoder: chain encoder and decoder, 5000 -> 16 -> 5000
outputs = decoder(encoder(inputs_layer))
autoencoder = Model(inputs_layer, outputs, name='autoencoder')
autoencoder.summary()
```

```
Model: "autoencoder"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input (InputLayer)           [(None, 5000)]            0         
_________________________________________________________________
encoder (Functional)         (None, 16)                642704    
_________________________________________________________________
decoder (Functional)         (None, 5000)              647688    
Total params: 1,290,392
Trainable params: 1,289,880
Non-trainable params: 512
_________________________________________________________________
```
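The parameter counts in these summaries can be checked by hand. A small sketch, assuming the standard Keras conventions: a `Dense` layer holds a weight matrix plus a bias vector, and a `BatchNormalization` layer holds four vectors per unit (trainable `gamma` and `beta`, non-trainable moving mean and moving variance). The latter is where the 256 non-trainable parameters per model come from.

```python
def dense_params(n_in, n_out):
    """Dense layer: one weight per input-output pair plus one bias per output unit."""
    return n_in * n_out + n_out

def batchnorm_params(units):
    """BatchNormalization: (trainable, non-trainable) parameter counts."""
    trainable = 2 * units      # gamma (scale) and beta (shift)
    non_trainable = 2 * units  # moving mean and moving variance (not updated by gradients)
    return trainable, non_trainable

input_size, ds, ls = 5000, 128, 16
bn_train, bn_fixed = batchnorm_params(ds)

encoder_total = dense_params(input_size, ds) + bn_train + bn_fixed + dense_params(ds, ls)
decoder_total = dense_params(ls, ds) + bn_train + bn_fixed + dense_params(ds, input_size)
print(encoder_total, decoder_total, encoder_total + decoder_total)  # 642704 647688 1290392
```

Almost all parameters sit in the two layers that touch the 5000 genes, which is typical for autoencoders on wide 'omics tables.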


In [None]:
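```python
# compile and run
import tensorflow as tf
from tensorflow.keras.optimizers import Adam

# the old `decay=0.001` argument, expressed as an inverse-time schedule:
# lr = 0.001 / (1 + 0.001 * step)
lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=0.001, decay_steps=1, decay_rate=0.001)
adam = Adam(learning_rate=lr_schedule, beta_1=0.9, beta_2=0.999, amsgrad=False)
# reconstruction is a regression task, so track the mean absolute error instead of accuracy
autoencoder.compile(loss='mse', optimizer=adam, metrics=['mae'])
# optional: hold out 10% of the samples to monitor overfitting
#history = autoencoder.fit(X_train, X_train, epochs=200, batch_size=64, shuffle=True, validation_split=0.1)
history = autoencoder.fit(X_train, X_train, epochs=200, batch_size=64, shuffle=True)
autoencoder.save('autoencoder.h5')
```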

In [123]:
```python
# project every sample onto the 16-dim latent space
encoded_X_train = encoder.predict(X_train)
```

In [124]:
```python
encoded_X_train.shape
```

```
(136, 16)
```

Task:
- Run KMeans both before and after dimensionality reduction, and plot their silhouette scores. Is there an improvement?
- (advanced) Extend the AE above into a VAE, and repeat the clustering assessment.
- By tuning the hyperparameters above, try to improve the model fit and re-assess clustering performance.
- (really advanced) Search for a VAE-GAN implementation and re-run.
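For the first task, the usual tools are scikit-learn's `KMeans` and `sklearn.metrics.silhouette_score`. To show what the score measures, here is a NumPy-only sketch of the Euclidean silhouette coefficient; `silhouette_score_np` is a hypothetical helper (not a library function) written to follow the same definition, with samples in singleton clusters scored 0. For each sample, `a` is the mean distance to its own cluster and `b` the mean distance to the nearest other cluster.

```python
import numpy as np

def silhouette_score_np(X, labels):
    """Mean silhouette coefficient (Euclidean); hypothetical helper for illustration."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    sq = (X ** 2).sum(axis=1)
    # pairwise distances without building an (n, n, features) array
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0))
    np.fill_diagonal(dist, 0.0)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum() - 1                  # exclude the sample itself
        if n_same == 0:
            continue                             # singleton cluster: score stays 0
        a = dist[i, same].sum() / n_same         # mean intra-cluster distance
        b = min(dist[i, labels == c].mean()      # mean distance to the nearest other cluster
                for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
good_labels = np.repeat([0, 1], 20)  # matches the two blobs
bad_labels = np.tile([0, 1], 20)     # interleaved, ignores the blobs
print(silhouette_score_np(blobs, good_labels))  # close to 1
print(silhouette_score_np(blobs, bad_labels))   # around 0 (slightly negative)
```

With KMeans labels in hand, one would compare something like `silhouette_score_np(X_train.values, labels_raw)` against `silhouette_score_np(encoded_X_train, labels_latent)` (both label arrays hypothetical). With only 136 samples the distance matrix is 136 x 136, so this is cheap even on the 5000-gene input.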

## What are GANs?

- Generative adversarial networks (GANs) train two networks against each other: a generator and a discriminator. Yann LeCun called this "the most interesting idea in the last 10 years in Machine Learning".
- Generator model: its goal is to fool the discriminator, so the generative neural network is trained to maximize the discriminator's classification error (between real and generated data).
- Discriminator model: its goal is to detect generated (fake) data, so the discriminative neural network is trained to minimize the same classification error.
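The two objectives above can be written as binary cross-entropy losses on the discriminator's output. A minimal NumPy sketch, assuming the discriminator returns the probability that a sample is real. Note that `generator_loss` uses the commonly used "non-saturating" form (the generator minimizes -log D(G(z)), i.e. it is trained as if its samples were labeled real) rather than literally maximizing the discriminator's error; this is a swapped-in variant, not a direct translation of the bullet above.

```python
import numpy as np

def bce(p, target, eps=1e-12):
    """Binary cross-entropy between predicted probabilities p and 0/1 targets."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))

def discriminator_loss(d_real, d_fake):
    # D wants real samples -> 1 and generated samples -> 0
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    # G wants D to label its samples as real (non-saturating form)
    return bce(d_fake, np.ones_like(d_fake))

# Hypothetical outputs of a confident discriminator
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.2, 0.05])
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))  # D loss low, G loss high

# At the theoretical equilibrium D outputs 0.5 for every sample
half = np.full(3, 0.5)
print(discriminator_loss(half, half), generator_loss(half))  # 2*ln 2 ≈ 1.386 and ln 2 ≈ 0.693
```

Training alternates between a gradient step on `discriminator_loss` (updating D only) and a step on `generator_loss` (updating G only).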

Example for MNIST:
- https://machinelearningmastery.com/how-to-develop-a-generative-adversarial-network-for-an-mnist-handwritten-digits-from-scratch-in-keras/

### Recurrent Neural Networks

These networks process a sequence one element (time step) at a time, looping information back through the same nodes at every step. They are mainly used to classify or model sequential input and, like other NNs, are trained by backpropagating the error. In a feed-forward network the information passes through each node a single time. A recurrent network, on the other hand, takes as input not just the current example it sees, but also what it has computed at previous time steps. An RNN therefore has a notion of time and memory.

For example, the output (hidden state) at time step t can be defined from the current input and the previous output:
`output_t = relu(dot(W, input_t) + dot(U, output_(t-1)))`
Such a network is trained by unrolling this loop over the sequence and backpropagating the error through all the steps, a method called backpropagation through time (BPTT).

A traditional deep neural network uses different parameters at each layer, while an RNN shares the same parameters (here `W` and `U`) across all time steps. The output of every time step does not necessarily need to be kept: in sentiment analysis, for example, we only care about the output after the last word, not after every word.
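The recurrence above can be unrolled by hand. A minimal NumPy sketch of a simple RNN with random weights, assuming an all-zero initial state and adding a bias term `b` (as Keras's `SimpleRNN` layer does). The same `W`, `U` and `b` are reused at every time step.

```python
import numpy as np

def simple_rnn(inputs, W, U, b):
    """Unrolled simple RNN: output_t = relu(W @ input_t + U @ output_(t-1) + b)."""
    state = np.zeros(U.shape[0])  # output before the first step: all zeros
    outputs = []
    for x_t in inputs:            # iterate over time steps
        state = np.maximum(0.0, W @ x_t + U @ state + b)  # same W, U, b at every step
        outputs.append(state)
    return np.stack(outputs)

rng = np.random.default_rng(0)
timesteps, input_dim, units = 5, 3, 4
inputs = rng.normal(size=(timesteps, input_dim))
W = rng.normal(size=(units, input_dim))
U = rng.normal(size=(units, units))
b = np.zeros(units)

all_outputs = simple_rnn(inputs, W, U, b)
print(all_outputs.shape)         # (5, 4): one output per time step
final_output = all_outputs[-1]   # all that e.g. a sentiment classifier would use
```

The number of parameters (`W`, `U`, `b`) does not depend on the sequence length, which is what lets the same RNN process sequences of any length.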

Features:
- they can be bidirectional (processing the sequence both forwards and backwards)
- they can be deep (several recurrent layers stacked on top of each other)
- RNNs can be combined with CNNs to solve complex problems, from speech or image recognition to machine translation.

In [None]:
```python
# Sentiment classification of IMDB movie reviews with an LSTM
# (tensorflow.keras is used for consistency with the cells above)
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, LSTM
from tensorflow.keras.datasets import imdb

max_features = 20000  # vocabulary size: keep only the most common words
maxlen = 80  # cut texts after this number of words (among top max_features most common words)
batch_size = 32

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)

print('Build model...')
model = Sequential()
model.add(Embedding(max_features, 128))  # word index -> 128-dim vector
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))  # returns only the last time step's output
model.add(Dense(1, activation='sigmoid'))  # probability that the review is positive

# try using different optimizers and different optimizer configs
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=15,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
print('Test loss:', score)
print('Test accuracy:', acc)
```
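The `pad_sequences` call above turns reviews of different lengths into a fixed-size `(samples, maxlen)` array. The sketch below is a hypothetical NumPy re-implementation (`pad_pre` is not a Keras function) of its default behaviour, `padding='pre'` and `truncating='pre'` with zeros as the fill value: short reviews get zeros in front, and long reviews keep only their last `maxlen` words.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences; keep only the last `maxlen` tokens of long ones."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        kept = seq[-maxlen:]              # 'pre' truncation drops the beginning
        if len(kept):
            out[i, -len(kept):] = kept    # 'pre' padding puts the fill value in front
    return out

reviews = [[12, 7, 3], [5, 9, 14, 2, 8, 1]]  # hypothetical word-index sequences
print(pad_pre(reviews, maxlen=4))  # rows [0, 12, 7, 3] and [14, 2, 8, 1]
```

Padding at the front means the real words sit at the end of each row, right before the LSTM produces the final output that the classifier uses.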

In [None]:
```python
# install/upgrade numpy in the current environment (IPython %conda magic)
%conda install "numpy>=1.19.1"
```

In [129]:
```python
import numpy
print(numpy.__version__)
```

```
1.20.1
```
