## Functional API
- enables building NNs with complex topologies, or with multiple inputs or outputs
- non-sequential NNs

### Non-sequential NNs
- Wide & Deep NN
    - connects part or all of the inputs directly to the output layer
    - This architecture allows the network to learn both deep patterns (via the deep path) and simple rules (via the short path from the input layer to the output layer)
    - In contrast: a regular MLP forces all data through the deep path (the full stack of layers), so simple patterns can end up distorted by the sequence of transformations
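
To make the two paths concrete, here is a minimal NumPy sketch of a Wide & Deep forward pass. It is not the Keras implementation: the `dense` helper is hypothetical, its weights are random and untrained, and the batch size of 4 is arbitrary. It only shows how the raw inputs skip the hidden layers and rejoin the deep path just before the output layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, n_out, activation=None):
    """Hypothetical stand-in for a Dense layer with random, untrained weights."""
    w = rng.normal(size=(x.shape[1], n_out))
    b = np.zeros(n_out)
    z = x @ w + b
    return np.maximum(z, 0.0) if activation == "relu" else z

x = rng.normal(size=(4, 8))            # toy batch: 4 samples, 8 features
hidden1 = dense(x, 30, "relu")         # deep path, first hidden layer
hidden2 = dense(hidden1, 30, "relu")   # deep path, second hidden layer
concat = np.concatenate([x, hidden2], axis=1)  # wide path: raw inputs rejoin here
output = dense(concat, 1)              # output layer sees both paths

print(concat.shape)  # (4, 38)
print(output.shape)  # (4, 1)
```

The first 8 columns of `concat` are the untouched inputs, so the output layer can fit simple rules on them directly while the remaining 30 columns carry the deep features.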

In [1]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

housing = fetch_california_housing()

X_train_full, X_test, y_train_full, y_test = train_test_split(housing.data, housing.target)
X_train, X_val, y_train, y_val = train_test_split(X_train_full, y_train_full)

In [2]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)

In [6]:
from tensorflow.keras.layers import Input, Dense, Concatenate, concatenate
from tensorflow.keras import Model

input_ = Input(shape=X_train.shape[1:]) # input object. Models may have multiple inputs
# Dense hidden layer with 30 neurons, called like a function with input_ as its argument.
# The call tells Keras how to connect the layers together, hence "Functional API".
hidden1 = Dense(30, activation="relu")(input_)
hidden2 = Dense(30, activation="relu")(hidden1)
concat = Concatenate()([input_, hidden2]) # concatenate the raw input with the output of the second hidden layer
output = Dense(1)(concat) # output layer with a single neuron and no activation function, for regression
model = Model(inputs=[input_], outputs=[output]) # instantiate the model w/ specifications for which inputs and outputs to use
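
As a sanity check on the architecture above, the number of trainable parameters can be worked out by hand. This is a sketch in plain Python, assuming the 8 features of the California housing dataset; `dense_params` is a hypothetical helper, not a Keras function. The result should match what `model.summary()` reports for the model built in the cell above.

```python
n_features = 8   # California housing has 8 input features
n_hidden = 30    # neurons per hidden layer, as in the cell above

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

hidden1_params = dense_params(n_features, n_hidden)   # 8*30 + 30
hidden2_params = dense_params(n_hidden, n_hidden)     # 30*30 + 30
concat_width = n_features + n_hidden                  # 8 + 30 (Concatenate has no weights)
output_params = dense_params(concat_width, 1)         # 38*1 + 1
total_params = hidden1_params + hidden2_params + output_params

print(hidden1_params, hidden2_params, output_params, total_params)
# 270 930 39 1239
```

Note that the output layer has 38 weights rather than 30: the extra 8 come from the wide path.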

In [4]:
from tensorflow.keras.optimizers import SGD

# MSE loss only: accuracy is a classification metric and is not meaningful for regression
model.compile(loss="mean_squared_error", optimizer=SGD(learning_rate=1e-3))

history=model.fit(X_train, y_train, epochs=30, validation_data=(X_val, y_val))



What if I want to send a subset of features through the wide path and a different subset through the deep path?
- Solution: use multiple inputs (e.g. send 5 features through the wide path and 6 features through the deep path)
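
A quick NumPy sketch of how 8 features can be split into a 5-feature wide input and a 6-feature deep input. It uses the same slices as the training cell further down (`[:, :5]` and `[:, 2:]`); the array `X` here is a toy stand-in, not the housing data. Because 5 + 6 is more than 8, the two subsets necessarily overlap.

```python
import numpy as np

X = np.arange(3 * 8).reshape(3, 8)  # toy stand-in: 3 samples, 8 features

X_A = X[:, :5]   # wide input: features 0 to 4
X_B = X[:, 2:]   # deep input: features 2 to 7

print(X_A.shape, X_B.shape)  # (3, 5) (3, 6)

# Column indices that are sent through both paths
wide_cols = set(range(8)[:5])
deep_cols = set(range(8)[2:])
shared = sorted(wide_cols & deep_cols)
print(shared)  # [2, 3, 4]
```

The widths of the slices have to match the `shape` given to each `Input`, otherwise Keras rejects the data when training starts.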

In [15]:
input_A = Input(shape=[5], name="wide_input") # 5 features through wide path
input_B = Input(shape=[6], name="deep_input") # another 6 features through deep path
hidden1 = Dense(30, activation="relu")(input_B)
hidden2 = Dense(30, activation="relu")(hidden1)
concat = concatenate([input_A, hidden2])
output = Dense(1, name="main_output")(concat)
model = Model(inputs=[input_A, input_B], outputs=[output])

In [16]:
model.compile(loss="mse", optimizer=SGD(learning_rate=1e-3))

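# wide input: features 0 to 4; deep input: features 2 to 7 (the subsets overlap on features 2 to 4)
X_train_A, X_train_B = X_train[:, :5], X_train[:, 2:]
X_val_A, X_val_B = X_val[:, :5], X_val[:, 2:]
X_test_A, X_test_B = X_test[:, :5], X_test[:, 2:]
# pretend the first 3 test instances are new, unseen data
X_new_A, X_new_B = X_test_A[:3], X_test_B[:3]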

history = model.fit((X_train_A, X_train_B), y_train, epochs=20,
                    validation_data=((X_val_A, X_val_B), y_val))
mse_test = model.evaluate((X_test_A, X_test_B), y_test)
y_pred = model.predict((X_new_A, X_new_B))



### Scenarios where a model can have multiple outputs
- The task itself requires it, e.g. locating and identifying an object in a photo, which is both regression (the location) and classification (the identity)
- Multiple independent tasks on the same data, e.g. one task is classifying the person's facial expression and another is identifying whether they're wearing glasses or not
- Regularization technique (i.e. a training constraint whose objective is to reduce overfitting and thus improve the model's ability to generalize). An example is adding auxiliary outputs to ensure that the underlying part of the network learns something useful on its own, without relying on the rest of the network

In [18]:
input_A = Input(shape=[5], name="wide_input")
input_B = Input(shape=[6], name="deep_input")
hidden1 = Dense(30, activation="relu")(input_B)
hidden2 = Dense(30, activation="relu")(hidden1)
concat = concatenate([input_A, hidden2])
output = Dense(1, name="main_output")(concat)
aux_output = Dense(1, name="aux_output")(hidden2)
model = Model(inputs=[input_A, input_B], outputs=[output, aux_output])

In [19]:
model.compile(loss=["mse", "mse"], loss_weights=[0.9, 0.1], optimizer=SGD(learning_rate=1e-3)) # one loss for each output (main_output and aux_output)
# higher loss weight for main_output because it is the output I care about the most; aux_output is only for regularization
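
To see what the loss weights do, here is a small NumPy sketch of the quantity that gets minimized: Keras combines the per-output losses as a weighted sum. The targets and predictions below are made up for illustration, and `mse` is a hypothetical helper rather than the Keras loss object.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over a batch."""
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.array([1.0, 2.0, 3.0])        # same targets for both outputs
y_pred_main = np.array([1.5, 2.0, 2.5])   # made-up main_output predictions
y_pred_aux = np.array([2.0, 2.0, 2.0])    # made-up aux_output predictions

main_loss = mse(y_true, y_pred_main)      # (0.25 + 0 + 0.25) / 3
aux_loss = mse(y_true, y_pred_aux)        # (1 + 0 + 1) / 3

# Weighted sum with the same weights as loss_weights=[0.9, 0.1]
total_loss = 0.9 * main_loss + 0.1 * aux_loss

print(round(main_loss, 4), round(aux_loss, 4), round(total_loss, 4))
# 0.1667 0.6667 0.2167
```

With these weights, an error on the auxiliary output costs nine times less than the same error on the main output, which is why it acts as a gentle constraint rather than a competing objective.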

In [22]:
history = model.fit([X_train_A, X_train_B], [y_train, y_train], epochs=20,
                    validation_data=([X_val_A, X_val_B], [y_val, y_val]))



In [23]:
total_loss, main_loss, aux_loss = model.evaluate([X_test_A, X_test_B], [y_test, y_test])



In [24]:
y_pred_main, y_pred_aux = model.predict([X_test_A, X_test_B])