Reference: Deep Learning with Python, 1st ed., by F. Chollet

In [6]:
import numpy as np

from keras import Input, layers, models, optimizers  # Input is used from 7.1.1 onward
from keras.models import Sequential, Model  # used from 7.1.1 onward


**Chapter 7 covers**:
* The Keras functional API
* Using Keras callbacks
* Working with the TensorBoard visualization tool

**Functional API** 

The Sequential model assumes that the network has exactly one input and exactly one output, and that it consists of a linear stack of layers. This assumption is too inflexible in a number of cases. Scenarios where Sequential models are not enough:
1) multi-input models (e.g. multi-modal input data)
2) multi-output models
3) neural networks with non-linear topologies (e.g. residual connections)

The functional API is a more general and flexible way of building such models.

![Getting Started](resid.PNG)

7.1.1: Intro to the functional API

In the functional API, you directly manipulate tensors, and you use layers as functions that take tensors and return tensors (hence, the name functional API).

In [17]:
# sequential model: .add appends a layer to the current model of type "Sequential"
# each feature vector is 64-dimensional; the number of such vectors (the batch size) is left unspecified

seq_model = models.Sequential()
seq_model.add(layers.Dense(32, activation='relu', input_shape=(64,)))
seq_model.add(layers.Dense(32, activation='relu'))
seq_model.add(layers.Dense(10, activation='softmax'))

seq_model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_15 (Dense)            (None, 32)                2080      
                                                                 
 dense_16 (Dense)            (None, 32)                1056      
                                                                 
 dense_17 (Dense)            (None, 10)                330       
                                                                 
Total params: 3,466
Trainable params: 3,466
Non-trainable params: 0
_________________________________________________________________
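
The `Param #` column above can be checked by hand: a Dense layer holds one weight per (input, unit) pair plus one bias per unit. The following is a minimal sketch of that arithmetic; the helper name `dense_param_count` is hypothetical and not part of Keras.

```python
def dense_param_count(input_dim, units):
    """Weights (input_dim * units) plus one bias per unit."""
    return input_dim * units + units

# Input width followed by the number of units of each Dense layer.
layer_sizes = [64, 32, 32, 10]
counts = [dense_param_count(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(counts)       # [2080, 1056, 330]
print(sum(counts))  # 3466
```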


Instantiating a "Model" object:
* The Model class turns an input tensor and an output tensor into a model.
* Keras retrieves every layer involved in going from input_tensor to output_tensor, bringing them together into a graph-like data structure: a Model. This only works if output_tensor was actually obtained by transforming input_tensor; passing an unrelated input/output tensor pair raises an error.

In [16]:
# corresponding functional API code
# each layer is called on a tensor and returns a new tensor

input_tensor = Input(shape=(64,)) 
x = layers.Dense(32, activation='relu')(input_tensor)                  
x = layers.Dense(32, activation='relu')(x)                             
output_tensor = layers.Dense(10, activation='softmax')(x)      

model = Model(input_tensor, output_tensor)   

model.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 64)]              0         
                                                                 
 dense_12 (Dense)            (None, 32)                2080      
                                                                 
 dense_13 (Dense)            (None, 32)                1056      
                                                                 
 dense_14 (Dense)            (None, 10)                330       
                                                                 
Total params: 3,466
Trainable params: 3,466
Non-trainable params: 0
_________________________________________________________________


When it comes to compiling, training, or evaluating such an instance of Model, the API is the same as that of Sequential.

In [18]:
model.compile(optimizer='rmsprop', loss='categorical_crossentropy') # compiles the model

# generate dummy x_train, y_train data
x_train = np.random.random((1000, 64)) # 1000 feature vectors of length 64 (one row = one feature vector)
y_train = np.random.random((1000, 10)) # 1000 random target vectors of length 10 (rows do not sum to 1, so they are not valid class distributions)

# Trains the model for 10 epochs with the RMSprop optimizer, using mini-batches of size 128
model.fit(x_train, y_train, epochs=10, batch_size=128)    

# Evaluates the model on the training data (returns the loss, since no metrics were compiled)
score = model.evaluate(x_train, y_train)   

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [19]:
score

75.34786224365234
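
Because `y_train` above is uniform random noise, its rows are not valid class distributions, which makes the categorical cross-entropy score hard to interpret. As a sketch under stated assumptions, dummy targets that do sum to 1 per row could be built from random integer class labels; the names `rng`, `labels` and `y_onehot` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
num_samples, num_classes = 1000, 10

# One random integer class id per sample.
labels = rng.integers(0, num_classes, size=num_samples)

# Row i of the identity matrix is the one-hot vector for class i.
y_onehot = np.eye(num_classes)[labels]

print(y_onehot.shape)  # (1000, 10)
```

Each row of `y_onehot` contains a single 1, so it could be passed to `model.fit` in place of `y_train`.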

7.1.2: Multi-input models

* The functional API can be used to build models that have multiple inputs.
* Typically, such models at some point merge their different input branches using a layer that can combine several tensors: by adding them (keras.layers.add), concatenating them (keras.layers.concatenate), and so on.
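
To make the two merge operations concrete, here is a NumPy stand-in for what adding and concatenating do to the branch outputs. This is only a sketch of the tensor arithmetic, not the Keras layers themselves, and the array names are hypothetical.

```python
import numpy as np

batch = 4
branch_a = np.ones((batch, 32))
branch_b = np.full((batch, 32), 2.0)
branch_c = np.zeros((batch, 16))

# Adding is element-wise, so both branches must have the same shape.
added = branch_a + branch_b

# Concatenating joins along the last axis, so feature widths may differ.
concatenated = np.concatenate([branch_a, branch_c], axis=-1)

print(added.shape, concatenated.shape)  # (4, 32) (4, 48)
```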

We build the following question-answering model using the functional API. It has two inputs: a question (text) and a news article (text) that provides the information used to answer the question.


![Getting Started](qand1.PNG)

## Vectorizing text
Vectorizing text is the process of transforming text into numeric tensors. This can be done in multiple ways:
- Segment text into words, and transform each word into a vector.
- Segment text into characters, and transform each character into a vector.
- Extract n-grams of words or characters, and transform each n-gram into a vector. N-grams are overlapping groups of multiple consecutive words or characters.
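
As a small illustration of the last option, n-grams can be extracted by sliding a window of length n over the token list. The helper name `ngrams` is hypothetical.

```python
def ngrams(tokens, n):
    """Return all overlapping runs of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

words = "the cat sat on the mat".split()

print(ngrams(words, 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]

print(ngrams(list("cat"), 2))
# [('c', 'a'), ('a', 't')]
```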

The units into which text is broken down are called **tokens**, and breaking text into tokens is called **tokenization**. Each token is then associated with a vector; one simple way of doing so is one-hot encoding.
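
A minimal sketch of word-level one-hot encoding follows. The sample sentences, the `max_length` cutoff and the choice to leave index 0 unused are assumptions made for illustration.

```python
import numpy as np

samples = ["the cat sat on the mat", "the dog ate my homework"]

# Build a word -> index vocabulary; index 0 is left unused here.
token_index = {}
for sample in samples:
    for word in sample.split():
        if word not in token_index:
            token_index[word] = len(token_index) + 1

max_length = 6  # keep at most this many tokens per sample

# results[i, j] is the one-hot vector of the j-th word of sample i.
results = np.zeros((len(samples), max_length, len(token_index) + 1))
for i, sample in enumerate(samples):
    for j, word in enumerate(sample.split()[:max_length]):
        results[i, j, token_index[word]] = 1.0

print(results.shape)  # (2, 6, 10)
```

Even with a vocabulary of only nine words, each token vector is almost entirely zeros.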


Problem: One-hot encoded vectors are sparse and very high-dimensional. Therefore, we embed words into a low-dimensional dense vector space.

These word representations are called **word embeddings** (e.g. GloVe). There are two ways to obtain word embeddings:
- Use pre-trained word-embeddings
- Learn word-embeddings from data using an Embedding layer

## Embedding layer vs Dense layer
Reference: https://stackoverflow.com/questions/47868265/what-is-the-difference-between-an-embedding-layer-and-a-dense-layer

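Summary: Both perform the same matrix multiplication when the Dense layer (without bias or activation) is applied to one-hot encoded inputs. However, the Embedding layer implements it as a direct lookup of rows of the weight matrix, which is faster and avoids building the one-hot vectors.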
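
The equivalence can be checked numerically. The sketch below uses plain NumPy rather than Keras layers, with a small random weight matrix `W` standing in for the learned weights (an assumption made for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10, 4
W = rng.normal(size=(vocab_size, embed_dim))  # one row per token

token_ids = np.array([3, 7, 3])

one_hot = np.eye(vocab_size)[token_ids]  # shape (3, 10)
via_matmul = one_hot @ W                 # dense-style multiplication
via_lookup = W[token_ids]                # embedding-style row lookup

print(np.allclose(via_matmul, via_lookup))  # True
```

The lookup touches only the selected rows, while the multiplication spends most of its work multiplying by zeros.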

Details about the code below:

* In the Input() function: "shape=(32,)" indicates that the expected input will be batches of 32-dimensional vectors. "shape=(None,)" indicates a dimension whose size is not fixed in advance; here, it means integer sequences of variable length.

In [23]:
# dictionary lengths for reference text, question and final answer (i.e. output)
text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500

# text_input is a symbolic tensor for batches of variable-length integer sequences (token indices)
# the input is named "text"
text_input = Input(shape=(None,), dtype='int32', name='text') 

# maps each token index in "text_input" to a word embedding of size 64
# Embedding takes (input_dim, output_dim) = (vocabulary size, embedding size)
embedded_text = layers.Embedding(text_vocabulary_size, 64)(text_input)  

# encode the sequence of word embeddings into a single vector with an LSTM
encoded_text = layers.LSTM(32)(embedded_text)

# same process for the question, with different layer instances
question_input = Input(shape=(None,), dtype='int32', name='question')  
embedded_question = layers.Embedding(question_vocabulary_size, 32)(question_input)
encoded_question = layers.LSTM(16)(embedded_question)


# concatenate the encoded text and the encoded question
concatenated = layers.concatenate([encoded_text, encoded_question], axis=-1) 

# softmax classifier over the answer vocabulary
answer = layers.Dense(answer_vocabulary_size, activation='softmax')(concatenated)  


# model with two inputs and one output
model_QA = Model([text_input, question_input], answer)  


model_QA.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['acc'])


model_QA.summary()

Model: "model_3"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 text (InputLayer)              [(None, None)]       0           []                               
                                                                                                  
 question (InputLayer)          [(None, None)]       0           []                               
                                                                                                  
 embedding_4 (Embedding)        (None, None, 64)     640000      ['text[0][0]']                   
                                                                                                  
 embedding_5 (Embedding)        (None, None, 32)     320000      ['question[0][0]']               
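
A two-input model needs one array per input. The sketch below builds dummy NumPy data for it; `num_samples`, `max_length` and the random contents are assumptions made for illustration, and the `fit` calls are left as comments because they rely on the `model_QA` object defined above.

```python
import numpy as np

text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500

num_samples = 1000
max_length = 100  # hypothetical sequence length

rng = np.random.default_rng(0)

# Random token indices standing in for tokenized text and questions.
text = rng.integers(1, text_vocabulary_size, size=(num_samples, max_length))
question = rng.integers(1, question_vocabulary_size, size=(num_samples, max_length))

# One-hot encoded answers, one class per sample.
answer_ids = rng.integers(0, answer_vocabulary_size, size=num_samples)
answers = np.eye(answer_vocabulary_size)[answer_ids]

# Keys match the `name` arguments given to the Input layers.
inputs = {'text': text, 'question': question}

# Either a list (in input order) or the dict keyed by input name can be fed:
# model_QA.fit([text, question], answers, epochs=10, batch_size=128)
# model_QA.fit(inputs, answers, epochs=10, batch_size=128)
```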
                                                                                            

## Conv1D, Conv2D and Conv3D in Keras
References:
1) https://stackoverflow.com/questions/42883547/intuitive-understanding-of-1d-2d-and-3d-convolutions-in-convolutional-neural-n?noredirect=1&lq=1
2) https://datascience.stackexchange.com/questions/51470/what-are-the-differences-between-convolutional1d-convolutional2d-and-convoluti


Input shape (batch_size, input_dim, channels):
input_dim can span 1, 2 or 3 axes. Accordingly, use Conv1D, Conv2D or Conv3D, respectively.

Examples:
- 1D input (used for audio signals): 1-second stereo audio signal sampled at 44100 Hz. Here, input_dim = 44100 (44100 sampled points) and channels = 2 (left and right channel amplitudes stored for each sample in the 1D array)
- 2D input (used for images): 32x32 RGB image. Here, input_dim = 32x32 (number of points/positions/pixels) and channels = 3 (RGB intensities stored for each pixel in the 2D array)
- 3D input (used for videos): 1-second video of 32x32 RGB images at 24 frames per second. Here, input_dim = 24x32x32 (frames x height x width) and channels = 3 (RGB intensities stored for each pixel of each frame in the 3D array)
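
The shapes can be sketched as NumPy arrays, assuming the channels-last layout that Keras uses by default and treating the video's frame axis as a third convolution axis, so that the RGB values stay in the channel axis. The array names are illustrative only.

```python
import numpy as np

batch_size = 1

# Conv1D input: (batch, steps, channels)
audio = np.zeros((batch_size, 44100, 2))
# Conv2D input: (batch, height, width, channels)
image = np.zeros((batch_size, 32, 32, 3))
# Conv3D input: (batch, frames, height, width, channels)
video = np.zeros((batch_size, 24, 32, 32, 3))

# The number of axes left after dropping batch and channels picks the layer.
ranks = {}
for name, arr in [("audio", audio), ("image", image), ("video", video)]:
    ranks[name] = arr.ndim - 2
    print(name, arr.shape, "-> Conv%dD" % ranks[name])
```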

**Channels**:
In the case of 1D, 2D and 3D inputs, the kernels are 1D (vector), 2D (matrix) and 3D (array) per channel, respectively; each kernel additionally spans all input channels.
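
To illustrate how a 1D kernel covers a window of steps and all input channels at once, here is a naive NumPy sketch of a "valid" 1D convolution without bias. The function name `conv1d_valid` is hypothetical; the kernel layout (kernel_size, in_channels, filters) mirrors the one Keras uses for Conv1D weights.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Naive 'valid' 1D convolution (cross-correlation, no bias).

    x:      (steps, in_channels)
    kernel: (kernel_size, in_channels, filters)
    returns (steps - kernel_size + 1, filters)
    """
    kernel_size, _, filters = kernel.shape
    out_steps = x.shape[0] - kernel_size + 1
    out = np.zeros((out_steps, filters))
    for t in range(out_steps):
        window = x[t:t + kernel_size]  # (kernel_size, in_channels)
        # Sum over the window positions and the input channels.
        out[t] = np.tensordot(window, kernel, axes=([0, 1], [0, 1]))
    return out

x = np.ones((8, 2))          # 8 time steps, 2 channels
kernel = np.ones((3, 2, 4))  # window of 3 steps, 2 input channels, 4 filters
y = conv1d_valid(x, kernel)

print(y.shape)  # (6, 4)
print(y[0])     # [6. 6. 6. 6.]
```

Each output value sums 3 steps times 2 channels of ones, which is why every entry equals 6.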
