# Keras Functional API

For single-input, single-output models, the Keras Sequential API will suffice. More complicated models, however, can have several inputs or several outputs. For example, you may need to predict the price of a product from its image, description, and metadata, or to predict both the genre and the release date from a movie synopsis. A model can also be internally complex, with a nonlinear topology such as a U-Net or an Inception module. All of these need a different API.

<img src="img/inception.png" width="300"/>
Inception module

<img src="img/unet.png" width="500"/>
U-Net for semantic segmentation

The functional API lets you use Keras layers like functions: calling a layer on a tensor returns a new tensor, which you can then pass to the next layer. A simple chain looks like this:

```python 
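from keras import Input, layers

x = Input(shape=(64,))     # define the input tensor (64 features is an assumed example size)
x2 = layers.Dense(32)(x)   # new layer, called like a function on the previous tensor
x3 = layers.Dense(10)(x2)  # new layer, called on the previous output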
```
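A chain of tensors only becomes a trainable model once it is wrapped in a `Model` that names its inputs and outputs. The following is a minimal sketch of that step, assuming a recent Keras install (`keras`, or equivalently `tensorflow.keras`); the sizes (64 input features, 10 classes) and the variable names are made up for illustration.

```python
from keras import Input, Model, layers

# Assumed toy sizes: 64 input features, 10 output classes.
inputs = Input(shape=(64,))
hidden = layers.Dense(32, activation='relu')(inputs)
outputs = layers.Dense(10, activation='softmax')(hidden)

# Model walks the graph from `inputs` to `outputs` and collects every layer on the way.
model = Model(inputs=inputs, outputs=outputs)
model.summary()
```

From here the model can be compiled and trained just like a Sequential one.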

The following is an example of a three-output model built on a stack of LSTM layers:

```python 
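from keras import Input, Model, layers

vocab_size = 50000      # assumed vocabulary size
num_income_groups = 10  # assumed number of income brackets

text_inputs = Input(shape=(None,), dtype='int32', name='text')  # variable-length sequence of token ids
embedding = layers.Embedding(vocab_size, 256)(text_inputs)  # Embedding(vocabulary size, vector size)

# Stacked recurrent layers: all but the last must return the full sequence
x = layers.LSTM(256, return_sequences=True)(embedding)
x = layers.LSTM(256, return_sequences=True)(x)
x = layers.LSTM(256, return_sequences=True)(x)
x = layers.LSTM(256, return_sequences=True)(x)
x = layers.LSTM(256)(x)
x = layers.Dense(128, activation='relu')(x)

# create predictions: one output head per task
age_prediction = layers.Dense(1, activation=None, name='age')(x)  # regression
income_prediction = layers.Dense(num_income_groups, activation='softmax', name='income')(x)  # multiclass
gender_prediction = layers.Dense(1, activation='sigmoid', name='gender')(x)  # binary

# build the model, declaring its inputs and outputs
model = Model(inputs=text_inputs,
              outputs=[age_prediction, income_prediction, gender_prediction])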
```

But this alone will cause an issue at training time: the losses are imbalanced. The model combines a regression (age), a multiclass classification (income), and a binary classification (gender). Each requires a different loss function, and those loss functions have different scales. Since gradient descent minimizes a single scalar, the losses must be combined into one value; Keras minimizes their weighted sum.

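There are two ways to specify this in `compile`: as a list, in the same order as the model's outputs, or as a dict keyed by output name (which requires the output layers to be named).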

```python 
model.compile(optimizer='rmsprop',
              loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'],
              loss_weights=[0.25, 1., 10.])
```
 
Or, using dicts keyed by output name:
 
```python 
model.compile(optimizer='rmsprop',
              loss={'age': 'mse',
                    'income': 'categorical_crossentropy',
                    'gender': 'binary_crossentropy'},
              loss_weights={'age': 0.25,
                            'income': 1.,
                            'gender': 10.})
```

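The loss weights are additional hyperparameters to tune. A reasonable starting point is to choose them so that each weighted loss ends up on a similar scale.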
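To make the weighting concrete, here is a plain-Python sketch of the arithmetic. It is not how Keras computes losses internally; the loss functions are simplified re-implementations and the batch of two samples is made up. It only illustrates how a large-scale loss (the age MSE) would dominate an unweighted sum, and how the weights rebalance it.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error over a batch of scalars."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_crossentropy(y_true, y_pred):
    """Mean binary cross-entropy; y_pred holds probabilities in (0, 1)."""
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_pred)) / len(y_true)

def categorical_crossentropy(y_true, y_pred):
    """Mean categorical cross-entropy; rows are one-hot targets and predicted probabilities."""
    return -sum(t * math.log(p)
                for row_t, row_p in zip(y_true, y_pred)
                for t, p in zip(row_t, row_p)) / len(y_true)

# Made-up batch of two samples per output
losses = {
    'age': mse([30.0, 40.0], [32.0, 37.0]),  # (4 + 9) / 2 = 6.5
    'income': categorical_crossentropy([[1, 0, 0], [0, 1, 0]],
                                       [[0.5, 0.3, 0.2], [0.25, 0.5, 0.25]]),
    'gender': binary_crossentropy([1, 0], [0.5, 0.5]),
}
loss_weights = {'age': 0.25, 'income': 1.0, 'gender': 10.0}

# The single scalar that gradient descent would minimize
total = sum(loss_weights[name] * value for name, value in losses.items())

for name, value in losses.items():
    print(f"{name}: raw={value:.4f} weighted={loss_weights[name] * value:.4f}")
print(f"total: {total:.4f}")
```

With these toy numbers the raw age loss is roughly nine times larger than either classification loss, so without weights the optimizer would mostly attend to the regression task.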

## Residual Connections 

For CNNs, residual connections are an important pattern for reusing earlier outputs. They address a common issue with deep networks: as network depth increases, accuracy saturates and then degrades rapidly. Unexpectedly, this degradation is not caused by overfitting, since adding more layers to a suitably deep model leads to higher training error. One contributing factor is vanishing gradients: the gradient signal fades as it propagates back through many layers, which makes the network hard to train.

A residual connection makes the output of an earlier layer available as input to a later layer, creating a shortcut in an otherwise sequential network. The earlier output is summed with the later layer's output, which requires both tensors to have the same shape. This is loosely similar to the way an LSTM carries information forward, except that the shortcut is purely linear.

Residual connections are generally beneficial for networks with more than about 10 layers.

```python 
x = ...  # 4D input tensor; it needs 128 channels so that its shape matches y for the sum
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.add([x, y])  # sum the original input with the output of the conv stack
```
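The sum above only works when both tensors have the same shape. When a block changes the channel count or downsamples, one common option is to reshape the shortcut with a 1x1 convolution before adding it. A minimal sketch, assuming a recent Keras install and a made-up 32x32 input with 64 channels:

```python
from keras import Input, Model, layers

# Assumed input: 32x32 feature map with 64 channels.
inputs = Input(shape=(32, 32, 64))

# Main branch: widens to 128 channels and halves the spatial size.
y = layers.Conv2D(128, 3, activation='relu', padding='same')(inputs)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.MaxPooling2D(2, strides=2)(y)

# Shortcut branch: a 1x1 convolution with stride 2 and no activation
# projects the input to the same shape as y.
residual = layers.Conv2D(128, 1, strides=2, padding='same')(inputs)

outputs = layers.add([y, residual])
model = Model(inputs, outputs)
print(model.output_shape)
```

The shortcut deliberately has no activation, so it stays a linear projection of the input.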

## Layer Sharing 

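Layer sharing means calling the same layer instance on several inputs, so that one set of weights is learned and reused instead of training a separate copy for each input. An example is comparing two sentences for similarity: because the relationship is symmetric, a single LSTM can learn one sentence representation and process both inputs (a Siamese LSTM).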

```python 
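from keras import Input, Model, layers

lstm = layers.LSTM(32)  # define one LSTM, instantiated once

left_input = Input(shape=(None, 128))
left_output = lstm(left_input)

right_input = Input(shape=(None, 128))
right_output = lstm(right_input)  # same instance, so the same weights are reused

merged = layers.concatenate([left_output, right_output], axis=-1)
predictions = layers.Dense(1, activation='sigmoid')(merged)

model = Model([left_input, right_input], predictions)
model.compile(optimizer='rmsprop', loss='binary_crossentropy')  # fit requires a compiled model

# left_data, right_data, and targets are placeholders for your own arrays
model.fit([left_data, right_data], targets)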
```
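One way to check that the two branches really share weights is to count the model's variables. The sketch below rebuilds the same architecture, assuming a recent Keras install, and inspects it without training; if each branch had its own LSTM, the LSTM variables would appear twice.

```python
from keras import Input, Model, layers

shared_lstm = layers.LSTM(32)  # one instance, used for both branches

left_input = Input(shape=(None, 128))
right_input = Input(shape=(None, 128))
merged = layers.concatenate([shared_lstm(left_input), shared_lstm(right_input)], axis=-1)
predictions = layers.Dense(1, activation='sigmoid')(merged)
model = Model([left_input, right_input], predictions)

# The shared LSTM is listed once among the model's layers.
lstm_layers = [layer for layer in model.layers if isinstance(layer, layers.LSTM)]
print(len(lstm_layers))

# LSTM contributes kernel, recurrent kernel, and bias; Dense contributes kernel and bias.
print(len(model.trainable_weights))
print(model.count_params())
```

Gradients from both branches update this single set of LSTM weights during training.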

Here is a Siamese CNN example. It shares one convolutional base (which learns image representations) between the images from two closely placed cameras that, together, sense depth.

```python 
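from keras import Input, applications, layers

# Convolutional base, instantiated once so both inputs share its weights
xception_base = applications.Xception(weights=None, include_top=False)

left_input = Input(shape=(250, 250, 3))
right_input = Input(shape=(250, 250, 3))

# Calling the same model twice reuses the same weights
left_features = xception_base(left_input)
right_features = xception_base(right_input)

# The merged features combine information from the left and right views
merged = layers.concatenate([left_features, right_features], axis=-1)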
```