# Machine learning and neural networks

Please install the following packages in your virtual environment by running the cells:

In [None]:
!pip install pydot
!pip install graphviz

In [None]:
!pip install tensorflow

## Example: Image Classifier
Images are represented as 3D arrays of integers in the range [0, 255], e.g. an array of shape 300 x 100 x 3, where the last axis holds the three colour channels.
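To make this representation concrete, here is a minimal sketch that builds a synthetic image with NumPy. The random pixel values and the height x width x channels axis order are assumptions made for illustration; they do not come from the image shown in this notebook.

```python
import numpy as np

# A synthetic 300 x 100 RGB image with 8-bit values (assumed axis order:
# height x width x channels).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 100, 3), dtype=np.uint8)

print(image.shape)   # (300, 100, 3)
print(image.dtype)   # uint8
print(image.size)    # 90000 numbers in total
pixel = image[0, 0]  # the three channel values of the top-left pixel
```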

In [None]:
from IPython.display import clear_output, Image, HTML, display

Image(url= "https://i.imgur.com/BAtqITz.jpg", width=500) # from Stanford

In [None]:
def pred():
    #
    #
    #
    return label

Since this task of recognizing a visual concept (e.g. cat) is relatively trivial for a human to perform, it is worth considering the challenges involved from the perspective of a Computer Vision algorithm. As we present an (inexhaustive) list of challenges below, keep in mind the raw representation of images as a 3-D array of brightness values:

* **Viewpoint variation.** A single instance of an object can be oriented in many ways with respect to the camera.
* **Scale variation.** Visual classes often exhibit variation in their size (size in the real world, not only in terms of their extent in the image).
* **Deformation.** Many objects of interest are not rigid bodies and can be deformed in extreme ways.
* **Occlusion.** The objects of interest can be occluded. Sometimes only a small portion of an object (as little as a few pixels) could be visible.
* **Illumination conditions.** The effects of illumination are drastic on the pixel level.
* **Background clutter.** The objects of interest may blend into their environment, making them hard to identify.
* **Intra-class variation.** The classes of interest can often be relatively broad, such as chair. There are many different types of these objects, each with their own appearance.

In [None]:
Image(url= "https://i.imgur.com/j5jPEv0.jpg", width=800) # from Stanford

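The solution is to use data-driven methods: instead of hand-coding rules for every class, we give the computer many labelled examples and let a learning algorithm infer the rules from them.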

## What is TensorFlow?

In [None]:
Image(url= "https://i.imgur.com/4nk5b4c.jpg", width=700) # from Google

TensorFlow is the flow of tensors in a computational graph.
* Library for defining computation graphs
* Calculating gradients

### Tensors
TensorFlow has its own data structure, the tensor, chosen for performance and ease of use. You can think of a TensorFlow tensor as an n-dimensional array or list.

**Example:**

Tensors have a shape that is described by a vector, e.g. $[ 10000, 256, 256, 3 ]$:
* 10000 Images
* Each Image has 256 Rows
* Each Row has 256 Pixels
* Each Pixel has 3 channels (RGB)
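The shape above can be explored with a short sketch. It uses NumPy rather than TensorFlow, and it allocates a much smaller batch of 4 images (an arbitrary choice) so that it runs with little memory.

```python
import math
import numpy as np

shape = (10000, 256, 256, 3)  # images, rows, pixels per row, channels
n_values = math.prod(shape)
print(n_values)  # 1966080000

# A much smaller batch with the same layout, so that it fits in memory easily.
batch = np.zeros((4, 256, 256, 3), dtype=np.uint8)
print(batch.ndim)         # 4 axes
print(batch[0].shape)     # one image: (256, 256, 3)
print(batch[0, 0].shape)  # one row of that image: (256, 3)
```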

### Computational Graph

The central idea of TensorFlow is that all numerical computations are expressed as a computational graph. In other words, the backbone of any TensorFlow program is a graph, and anything that happens in your model is represented by it.

Computational graphs are an abstract way of describing computations as a directed graph:
* The edges correspond to multidimensional arrays (tensors).
* The nodes create or manipulate these tensors according to specific rules (operations, or ops):
  * Operations on tensors (like math operations) 
  * Generating tensors (like variables and constants). 
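The graph idea, together with gradient calculation, can be sketched in a few lines of plain Python. This is a hand-written toy for scalar values, not TensorFlow's implementation; the names `Node`, `constant`, `add`, `mul` and `backward` are hypothetical helpers introduced only for this illustration.

```python
class Node:
    """One node of a tiny computational graph holding a scalar value."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = value              # result of the forward computation
        self.parents = parents          # nodes whose outputs feed this node
        self.local_grads = local_grads  # d(self)/d(parent) for each parent
        self.grad = 0.0                 # d(output)/d(self), set by backward()


def constant(value):
    return Node(float(value))


def add(a, b):
    return Node(a.value + b.value, (a, b), (1.0, 1.0))


def mul(a, b):
    return Node(a.value * b.value, (a, b), (b.value, a.value))


def backward(output):
    """Propagate gradients from the output node back to every input."""
    # Topological order, so each node is handled after all of its consumers.
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in zip(node.parents, node.local_grads):
            parent.grad += node.grad * local  # chain rule


w = constant(3.0)
x = constant(2.0)
b = constant(1.0)
y = add(mul(w, x), b)  # y = w * x + b

backward(y)
print(y.value)  # 7.0
print(w.grad)   # dy/dw = x = 2.0
print(x.grad)   # dy/dx = w = 3.0
print(b.grad)   # dy/db = 1.0
```

The edges here carry plain floats instead of tensors, but the structure is the same: operations are nodes, and gradients flow backwards along the edges.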
  


In [None]:
Image(url= "https://i.imgur.com/2Ys4yTu.jpg", width=650)

To implement this model we will use `Keras`. The `Sequential` model in `Keras` is a linear stack of layers.

You can create a Sequential model by passing a list of layer instances to the constructor:

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals

import pathlib
import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf


from tensorflow import keras
from tensorflow.keras import layers

from sklearn import datasets
print(tf.__version__)

In [None]:
np.random.seed(0)
tf.random.set_seed(0)

## Keras models
The model object is how you tell Keras where the model starts and stops: where data comes in and where predictions come out. Build the tf.keras.Sequential model by stacking layers. 

In [None]:
Image(url= "https://i.imgur.com/GtL0Ehr.jpg", width=850)

In [None]:
# LeNet-5 Architecture
model = keras.Sequential([
    layers.Conv2D(6, (5, 5), activation='relu', input_shape=(32, 32, 1), name= "Convolution_1"),
    layers.MaxPooling2D((2, 2), name= "Subsampling_1"),
    layers.Conv2D(16, (5, 5), activation='relu', name= "Convolution_2"),
    layers.MaxPooling2D((2, 2) ,name= "Subsampling_2"),
    layers.Flatten(name= "FullyConnection_1"),
    layers.Dense(84, activation='relu', name= "FullyConnection_2"),
    layers.Dense(10, activation='softmax')],
    name='LeNet-5') 

### Summarize the model

The summary will tell you the names of the layers, as well as how many units they have and how many parameters are in the model.
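The parameter counts reported by the summary can be reproduced by hand. The sketch below assumes the Keras defaults used in the model above (no padding, stride 1 for the convolutions, one bias per filter or unit); `conv2d_params` and `dense_params` are hypothetical helper names.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel per (input channel, filter) pair plus one bias per filter.
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(inputs, units):
    # One weight per (input, unit) pair plus one bias per unit.
    return inputs * units + units


conv1 = conv2d_params(5, 5, 1, 6)    # 32x32x1 -> 28x28x6
conv2 = conv2d_params(5, 5, 6, 16)   # 14x14x6 -> 10x10x16
flat = 5 * 5 * 16                    # size after the second 2x2 pooling
dense1 = dense_params(flat, 84)
dense2 = dense_params(84, 10)
total = conv1 + conv2 + dense1 + dense2
print(conv1, conv2, dense1, dense2, total)  # 156 2416 33684 850 37106
```

Pooling and flattening layers have no trainable parameters, so they do not contribute to the total.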

In [None]:
model.summary()

### Visualize a model

The plot will show how the layers connect to each other.

In [None]:
#from tensorflow.keras.utils import plot_model
tf.keras.utils.plot_model(model, to_file ='model.png')
from matplotlib import pyplot as plt
img = plt.imread('model.png')
plt.imshow(img)
plt.show()

## Regression:
### Providing the Data

Next up we'll feed in some data. The relationship between $x$ and $y$ is that
$$y = 0.6x^3 - 5.2x^2 - 3x + 2 + \text{noise}$$

NumPy is a Python library whose n-dimensional arrays are the standard way of holding this kind of numerical data.


In [None]:
x = np.arange(0, 10, 0.16)
y = 0.6*x**3 - 5.2*x**2 - 3*x + 2
y_noise = y + np.random.normal(0, 1.8, size=(len(x),))

In [None]:
fig = plt.figure (figsize = (12,6))
plt.plot(x,y, label = "Ground truth", color = 'black')
plt.scatter(x, y_noise, label = "Ground truth + noise", color = 'black')
plt.legend(fontsize = 16, loc = 'upper left')
plt.xlabel ('x', fontsize = 16)
plt.ylabel ('y', fontsize = 16)
plt.tick_params(axis='both', which='major', labelsize=16)

### Split the data:

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_val, y_train, y_val = train_test_split(x, y_noise, test_size=0.25, random_state=42)

print("The number of training data is: ", x_train.shape[0])
print("The number of validation data is: ", x_val.shape[0])

In [None]:
fig = plt.figure (figsize = (12,6))
plt.plot(x,y, label = "Ground truth", color = 'black')
plt.scatter(x_train, y_train, label = "Training Data", color = 'black')
plt.scatter(x_val, y_val, label = "Test Data", color = 'green')
plt.legend(fontsize = 16, loc = 'upper left')
plt.xlabel ('x', fontsize = 16)
plt.ylabel ('y', fontsize = 16)
plt.tick_params(axis='both', which='major', labelsize=16)

## At the Beginning: the Neuron
Making the comparison to the original, biological neurons, we explained in the lecture how scientists developed a simple model to simulate their behavior$^1$.

The mathematical equation for a neuron
$$ y = f(wx) $$

where:
* $x$ : input
* $f$ : activation function
* $w$ : weight
* $y$ : output

Assuming that there is no activation function, the neuron can approximate a linear function.
$$ y = wx $$
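As a minimal sketch of the two equations above, the function below evaluates a single neuron for a scalar input. The name `neuron` and the choice of `tanh` as an example activation are assumptions made for illustration.

```python
import math


def neuron(x, w, f=None):
    """Return f(w * x); with f=None the neuron is purely linear."""
    z = w * x
    return z if f is None else f(z)


print(neuron(2.0, 3.0))             # 6.0, linear case: y = w x
print(neuron(2.0, 3.0, math.tanh))  # tanh squashes the output into (-1, 1)
```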

In [None]:
Image(url= "https://i.imgur.com/KKYGVtc.jpg", width=450)

In [None]:
model = keras.Sequential(
    [layers.Dense(input_shape=[1], units= 1,activation=None,use_bias=False)],
    name = 'linear') 
model.summary()

When the computer is trying to 'learn' this relationship, it starts with a guess, maybe $y = 10x$. The **LOSS function** compares the guessed answers with the known correct answers and measures how well or how badly the guess did.

It then uses the **OPTIMIZER function** to make another guess. Based on how the loss function went, it will try to minimize the loss.

In [None]:
loss = tf.keras.losses.MSE

In [None]:
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)

In [None]:
metrics=[keras.metrics.MSE]

In [None]:
model.compile(loss=loss, # Loss function to minimize 
              optimizer=optimizer,# Optimizer
              metrics=metrics) # List of metrics to monitor

The process of training the neural network, where it 'learns' the relationship between $x$ and $y$, happens in the `model.fit` call. This is where it goes through the loop of making a guess, measuring how good or bad it is, and using the optimizer to make another guess. It repeats this for the number of epochs you specify.
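The loop described above can be written out by hand for the bias-free model $y = wx$. This is only a sketch of full-batch gradient descent on the MSE, not what Keras executes internally; the synthetic data (true slope 3), the starting guess and the learning rate are assumptions chosen so that the example converges quickly.

```python
import numpy as np

rng = np.random.default_rng(0)
x_demo = np.linspace(0.0, 1.0, 50)
y_demo = 3.0 * x_demo + rng.normal(0.0, 0.05, size=x_demo.shape)  # true slope 3

w = 10.0             # initial guess for the weight
learning_rate = 0.1  # step size of the optimizer
losses = []
for epoch in range(200):
    y_pred = w * x_demo                                # make a guess
    loss = np.mean((y_demo - y_pred) ** 2)             # measure it (MSE)
    grad = np.mean(-2.0 * x_demo * (y_demo - y_pred))  # d(loss)/dw
    w -= learning_rate * grad                          # optimizer step
    losses.append(loss)

print(w)                       # ends up close to the true slope of 3
print(losses[0] > losses[-1])  # the loss has decreased
```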

In [None]:
history = model.fit(x_train, y_train, epochs=500, batch_size=47, validation_data=(x_val, y_val))  

In [None]:
fig = plt.figure (figsize = (12,6))
plt.plot(x,y, label = "Ground truth", color = 'black')
plt.scatter(x_train, y_train, label = "Ground truth + noise", color = 'black')
plt.scatter(x_val, y_val, label = "Test Data", color = 'green')
plt.plot(x, model.predict(x), color = 'red', ls = '--', lw = 3, 
         label = 'Fitted function')
plt.legend(fontsize = 16, loc = 'upper left')
plt.xlabel ('x', fontsize = 16)
plt.ylabel ('y', fontsize = 16)
plt.tick_params(axis='both', which='major', labelsize=16)

In [None]:
# The returned "history" object holds a record
# of the loss values and metric values during training
fig = plt.figure (figsize = (12,6))
plt.plot(history.history['loss'], label = "train_loss", color = 'black')
plt.legend(fontsize = 16, loc = 'upper right')
#print('\nhistory dict:', history.history)

In [None]:
model.weights

## How can we reduce the loss?

### Usage of bias:
Next we create a network that again has a single layer with a single neuron, but this time the neuron also has a bias. The model that can be represented by this network is: 
$$ y = wx + b$$

In [None]:
Image(url= "https://i.imgur.com/lPDTbj0.jpg", width=500)

In [None]:
model = keras.Sequential([layers.Dense(input_shape=[1], units= 1,use_bias=True)])
                          
model.summary()

In [None]:
loss = tf.keras.losses.MSE
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)
metrics=[keras.metrics.MSE]
model.compile(loss=loss, # Loss function to minimize 
              optimizer=optimizer,# Optimizer
              metrics=metrics) # List of metrics to monitor

In [None]:
history = model.fit(x_train, y_train, epochs=500, batch_size=47, validation_data=(x_val, y_val))

In [None]:
fig = plt.figure (figsize = (12,6))
plt.plot(x,y, label = "Ground truth", color = 'black')
plt.scatter(x_train, y_train, label = "Ground truth + noise", color = 'black')
plt.scatter(x_val, y_val, label = "Test Data", color = 'green')
plt.plot(x, model.predict(x), color = 'red', ls = '--', lw = 3, 
         label = 'Fitted function')
plt.legend(fontsize = 16, loc = 'upper left')
plt.xlabel ('x', fontsize = 16)
plt.ylabel ('y', fontsize = 16)
plt.tick_params(axis='both', which='major', labelsize=16)

In [None]:
# The returned "history" object holds a record
# of the loss values and metric values during training
fig = plt.figure (figsize = (12,6))
plt.plot(history.history['loss'], label = "train_loss", color = 'black')
plt.legend(fontsize = 16, loc = 'upper right')
#print('\nhistory dict:', history.history)

In [None]:
model.weights

### Usage of initializers

Initializers define how the initial weights of Keras layers are set.

The keyword arguments used for passing initializers to layers will depend on the layer. Usually it is simply `kernel_initializer` and `bias_initializer`:

In [None]:
model = keras.Sequential([layers.Dense(input_shape=[1], units= 1,use_bias=True,
                                      kernel_initializer = keras.initializers.Constant(value=-20. ),
                                      bias_initializer = keras.initializers.Constant(value= -14.),
                                      )])

model.summary()

In [None]:
model.weights

In [None]:
loss = tf.keras.losses.MSE
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)
metrics=[keras.metrics.MSE]
model.compile(loss=loss, # Loss function to minimize 
              optimizer=optimizer,# Optimizer
              metrics=metrics) # List of metrics to monitor

In [None]:
history = model.fit(x_train, y_train, epochs=500, batch_size=47, validation_data=(x_val, y_val))

In [None]:
fig = plt.figure (figsize = (12,6))
plt.plot(x,y, label = "Ground truth", color = 'black')
plt.scatter(x_train, y_train, label = "Ground truth + noise", color = 'black')
plt.scatter(x_val, y_val, label = "Test Data", color = 'green')
plt.plot(x, model.predict(x), color = 'red', ls = '--', lw = 3, 
         label = 'Fitted function')
plt.legend(fontsize = 16, loc = 'upper left')
plt.xlabel ('x', fontsize = 16)
plt.ylabel ('y', fontsize = 16)
plt.tick_params(axis='both', which='major', labelsize=16)

In [None]:
# The returned "history" object holds a record
# of the loss values and metric values during training
fig = plt.figure (figsize = (12,6))
plt.plot(history.history['loss'], label = "train_loss", color = 'black')
plt.legend(fontsize = 16, loc = 'upper right')
#print('\nhistory dict:', history.history)

In [None]:
model.weights

**Task: Try to initialize the model with a weight and a bias that make the loss as low as possible.**
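One possible way to find good starting values, assuming the loss is the MSE, is an ordinary least-squares line fit with `np.polyfit`. The sketch regenerates data with the same formula as this notebook but its own random seed, so the numbers will differ slightly from those obtained on `x_train` and `y_train`; `mse` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
x_demo = np.arange(0, 10, 0.16)
y_demo = 0.6 * x_demo**3 - 5.2 * x_demo**2 - 3 * x_demo + 2
y_demo = y_demo + rng.normal(0, 1.8, size=x_demo.shape)

# Degree-1 least-squares fit: returns the slope first, then the intercept.
w_best, b_best = np.polyfit(x_demo, y_demo, 1)


def mse(w, b):
    """Mean squared error of the line y = w x + b on the demo data."""
    return np.mean((y_demo - (w * x_demo + b)) ** 2)


print(w_best, b_best)
print(mse(w_best, b_best))  # lowest MSE any straight line can reach here
print(mse(-20.0, -14.0))    # MSE of the constant initializers used above
```

The fitted values could then be passed to `keras.initializers.Constant` as in the cell above.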

### Usage of other loss functions

#### Mean Squared Error (MSE)

It is perhaps the simplest and most common metric for regression evaluation. It is defined by the equation:

$$\text{MSE} = \frac{1}{N} \sum^{N}_{i=1}(y_i - \hat{y}_i)^2$$


#### Mean Absolute Error (MAE)
In MAE the error is calculated as an average of absolute differences between the target values and the predictions. The MAE is a linear score which means that all the individual differences are weighted equally in the average. Mathematically, it is calculated using this formula:
$$\text{MAE} = \frac{1}{N} \sum^{N}_{i=1}|y_i - \hat{y}_i|$$
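Both formulas can be checked with a small NumPy sketch. The four target values and predictions are made up for illustration; one prediction is deliberately off by 10 to show how differently the two losses react.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 14.0])  # one prediction is off by 10

mse = np.mean((y_true - y_pred) ** 2)
mae = np.mean(np.abs(y_true - y_pred))
print(mse)  # 25.0
print(mae)  # 2.5
```

The single large error contributes 100 to the sum of squares but only 10 to the sum of absolute values, which is why the MSE is ten times larger here.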

In [None]:
model = keras.Sequential([layers.Dense(input_shape=[1], units= 1,activation=None,use_bias=True)])
model.summary()

In [None]:
loss = tf.keras.losses.MAE
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)
metrics=[keras.metrics.MAE]
model.compile(loss=loss, # Loss function to minimize 
              optimizer=optimizer,# Optimizer
              metrics=metrics) # List of metrics to monitor

In [None]:
history = model.fit(x_train, y_train, epochs=500, batch_size=47, validation_data=(x_val, y_val))

In [None]:
fig = plt.figure (figsize = (12,6))
plt.plot(x,y, label = "Ground truth", color = 'black')
plt.scatter(x_train, y_train, label = "Ground truth + noise", color = 'black')
plt.scatter(x_val, y_val, label = "Test Data", color = 'green')
plt.plot(x, model.predict(x), color = 'red', ls = '--', lw = 3, 
         label = 'Fitted function')
plt.legend(fontsize = 16, loc = 'upper left')
plt.xlabel ('x', fontsize = 16)
plt.ylabel ('y', fontsize = 16)
plt.tick_params(axis='both', which='major', labelsize=16)

In [None]:
# The returned "history" object holds a record
# of the loss values and metric values during training
fig = plt.figure (figsize = (12,6))
plt.plot(history.history['loss'], label = "train_loss", color = 'black')
plt.legend(fontsize = 16, loc = 'upper right')
#print('\nhistory dict:', history.history)

**Q: MAE or MSE?**
* Do you have outliers in the data?
    * Use MAE

* Are you sure they are outliers?
    * Use MAE

* Or are they just unexpected values we should still care about?
    * Use MSE
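The rule of thumb above can be illustrated with the simplest possible model, a constant prediction. In the sketch below (with made-up values, one of them an outlier) the constant that minimizes the MSE follows the mean, while the constant that minimizes the MAE follows the median.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # 100 is the outlier
candidates = np.linspace(0.0, 100.0, 1001)      # constant predictions, step 0.1

mse_per_c = [np.mean((values - c) ** 2) for c in candidates]
mae_per_c = [np.mean(np.abs(values - c)) for c in candidates]

best_mse = candidates[int(np.argmin(mse_per_c))]
best_mae = candidates[int(np.argmin(mae_per_c))]
print(best_mse, np.mean(values))    # MSE optimum sits at the mean, near 22
print(best_mae, np.median(values))  # MAE optimum sits at the median, near 3
```

The outlier drags the MSE-optimal prediction far away from the bulk of the data, whereas the MAE-optimal prediction barely notices it.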

### Usage of a multi-layer network
Perhaps a multi-layer network can help us fit the function better. The model that can be represented by this network is: 
$$y = w_3(w_2(w_1x + b_1)+ b_2)+ b_3 $$
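Expanding the brackets in this equation gives $y = (w_3 w_2 w_1)x + (w_3 w_2 b_1 + w_3 b_2 + b_3)$, which is again a straight line. The sketch below checks this numerically; the weight and bias values are arbitrary numbers chosen for illustration.

```python
import numpy as np

w1, b1 = 2.0, 1.0
w2, b2 = -0.5, 3.0
w3, b3 = 4.0, -2.0


def three_layers(x):
    """Three stacked one-neuron layers without activation functions."""
    return w3 * (w2 * (w1 * x + b1) + b2) + b3


# Expanding the brackets gives a single slope and a single intercept.
w_eff = w3 * w2 * w1
b_eff = w3 * w2 * b1 + w3 * b2 + b3

xs_demo = np.linspace(-5.0, 5.0, 11)
print(w_eff, b_eff)  # -4.0 8.0
print(np.allclose(three_layers(xs_demo), w_eff * xs_demo + b_eff))  # True
```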

In [None]:
Image(url= "https://i.imgur.com/q4ozgnk.jpg", width=800)

In [None]:
model = keras.Sequential([layers.Dense(input_shape=[1],units= 1,use_bias=True),
                         layers.Dense(units= 1,use_bias=True),
                         layers.Dense(units= 1,use_bias=True)])

model.summary()

In [None]:
loss = tf.keras.losses.MSE
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)
metrics=[keras.metrics.MSE]
model.compile(loss=loss, # Loss function to minimize 
              optimizer=optimizer,# Optimizer
              metrics=metrics) # List of metrics to monitor

In [None]:
history = model.fit(x_train, y_train, epochs=500, batch_size=47, validation_data=(x_val, y_val))

In [None]:
fig = plt.figure (figsize = (12,6))
plt.plot(x,y, label = "Ground truth", color = 'black')
plt.scatter(x_train, y_train, label = "Ground truth + noise", color = 'black')
plt.scatter(x_val, y_val, label = "Test Data", color = 'green')
plt.plot(x, model.predict(x), color = 'red', ls = '--', lw = 3, 
         label = 'Fitted function')
plt.legend(fontsize = 16, loc = 'upper left')
plt.xlabel ('x', fontsize = 16)
plt.ylabel ('y', fontsize = 16)
plt.tick_params(axis='both', which='major', labelsize=16)

In [None]:
# The returned "history" object holds a record
# of the loss values and metric values during training
fig = plt.figure (figsize = (12,6))
plt.plot(history.history['loss'], label = "train_loss", color = 'black')
plt.legend(fontsize = 16, loc = 'upper right')
#print('\nhistory dict:', history.history)

**Q: We used multi-layer networks, but the model is still linear, why is that? Can we say that this network is a deep network?**

### Usage of activation function:
However, simply outputting a weighted sum of the inputs limits the tasks that can be performed by the neural network. Therefore, a better processing of the data would be to map the weighted sum to a nonlinear space.

ReLU: An activation function that allows a model to solve nonlinear problems. The model that can be represented by this network is: 
$$y = \max(w_1x + b_1,0) $$
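A short NumPy sketch of this single ReLU neuron is given below. The weight and bias are hypothetical values picked to show the clipping; note that the output can never be negative, whatever the parameters are.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


w1, b1 = -3.0, 2.0                  # hypothetical weight and bias
x_demo = np.linspace(0.0, 10.0, 6)  # 0, 2, 4, 6, 8, 10
y_demo = relu(w1 * x_demo + b1)
print(y_demo)        # only x = 0 has a positive pre-activation
print(y_demo.min())  # never below zero
```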

In [None]:
model = keras.Sequential([layers.Dense(input_shape=[1], units= 1,activation='relu',use_bias=True)])

model.summary()

In [None]:
loss = tf.keras.losses.MSE
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)
metrics=[keras.metrics.MSE]
model.compile(loss=loss, # Loss function to minimize 
              optimizer=optimizer,# Optimizer
              metrics=metrics) # List of metrics to monitor

In [None]:
history = model.fit(x_train, y_train, epochs=500, batch_size=47, validation_data=(x_val, y_val))

In [None]:
fig = plt.figure (figsize = (12,6))
plt.plot(x,y, label = "Ground truth", color = 'black')
plt.scatter(x_train, y_train, label = "Ground truth + noise", color = 'black')
plt.scatter(x_val, y_val, label = "Test Data", color = 'green')
plt.plot(x, model.predict(x), color = 'red', ls = '--', lw = 3, 
         label = 'Fitted function')
plt.legend(fontsize = 16, loc = 'upper left')
plt.xlabel ('x', fontsize = 16)
plt.ylabel ('y', fontsize = 16)
plt.tick_params(axis='both', which='major', labelsize=16)

In [None]:
# The returned "history" object holds a record
# of the loss values and metric values during training
fig = plt.figure (figsize = (12,6))
plt.plot(history.history['loss'], label = "train_loss", color = 'black')
plt.legend(fontsize = 16, loc = 'upper right')
#print('\nhistory dict:', history.history)

In [None]:
model.weights

**Q: Why is the result worse than the linear model?**

## Universal approximation theorem: 
The universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons (i.e., a multilayer perceptron) can approximate any continuous function on compact subsets of $\mathbb{R}^n$ to arbitrary accuracy, under mild assumptions on the activation function.
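To give an intuition for the theorem in one dimension, the sketch below constructs the weights of a single hidden layer of 12 ReLU units by hand, so that the network is the piecewise-linear interpolant of the cubic used in this notebook. This is an illustration under stated assumptions (evenly spaced knots, hidden weights fixed to 1, an output bias), not the result of training and not a proof.

```python
import numpy as np


def f(x):
    return 0.6 * x**3 - 5.2 * x**2 - 3 * x + 2  # the cubic used in this notebook


def relu(z):
    return np.maximum(z, 0.0)


n_hidden = 12
knots = np.linspace(0.0, 10.0, n_hidden + 1)  # 13 knots, 12 segments
slopes = np.diff(f(knots)) / np.diff(knots)   # slope of each segment

# Hidden layer: unit k computes relu(1 * x - knots[k]).
# Output layer: weight k is the change of slope at knots[k];
# the output bias is the function value at the first knot.
out_weights = np.diff(slopes, prepend=0.0)
out_bias = f(knots[0])


def network(x):
    hidden = relu(x[:, None] - knots[None, :-1])  # shape (len(x), n_hidden)
    return hidden @ out_weights + out_bias


x_dense = np.linspace(0.0, 10.0, 1001)
max_error = np.max(np.abs(network(x_dense) - f(x_dense)))
print(np.allclose(network(knots), f(knots)))  # exact at the knots
print(max_error)                              # shrinks as n_hidden grows
```

Increasing `n_hidden` makes the segments shorter and the maximum error smaller, which is the one-dimensional picture behind the theorem.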

In [None]:
Image(url= "https://i.imgur.com/BLmMNo4.jpg", width=500)

In [None]:
model = keras.Sequential([layers.Dense(input_shape=[1], units= 12,use_bias=True,activation='relu'),
                         layers.Dense(units= 1,use_bias=False)])

model.summary()

In [None]:
loss = tf.keras.losses.MSE
optimizer = tf.keras.optimizers.Adam(learning_rate=0.005)
metrics=[keras.metrics.MSE]
model.compile(loss=loss, # Loss function to minimize 
              optimizer=optimizer,# Optimizer
              metrics=metrics) # List of metrics to monitor

In [None]:
history = model.fit(x_train, y_train, epochs=1500, batch_size=47, validation_data=(x_val, y_val))

In [None]:
fig = plt.figure (figsize = (12,6))
plt.plot(x,y, label = "Ground truth", color = 'black')
plt.scatter(x_train, y_train, label = "Ground truth + noise", color = 'black')
plt.scatter(x_val, y_val, label = "Test Data", color = 'green')
plt.plot(x, model.predict(x), color = 'red', ls = '--', lw = 3, 
         label = 'Fitted function')
plt.legend(fontsize = 16, loc = 'upper left')
plt.xlabel ('x', fontsize = 16)
plt.ylabel ('y', fontsize = 16)
plt.tick_params(axis='both', which='major', labelsize=16)

In [None]:
# The returned "history" object holds a record
# of the loss values and metric values during training
fig = plt.figure (figsize = (12,6))
plt.plot(history.history['loss'], label = "train_loss", color = 'black')
plt.legend(fontsize = 16, loc = 'upper right')
#print('\nhistory dict:', history.history)

## Let's go deeper

In [None]:
model = keras.Sequential([layers.Dense(input_shape=[1], units= 20,use_bias=True,activation='relu'),
                         layers.Dense(units= 10,use_bias=True,activation='relu'),
                         layers.Dense(units= 10,use_bias=True,activation='relu'),
                         layers.Dense(units= 1,use_bias=True)])

model.summary()

In [None]:
loss = tf.keras.losses.MAE
optimizer = tf.keras.optimizers.Adam(learning_rate=0.002)
metrics=[keras.metrics.MAE]
model.compile(loss=loss, # Loss function to minimize 
              optimizer=optimizer,# Optimizer
              metrics=metrics) # List of metrics to monitor

In [None]:
history = model.fit(x_train, y_train, epochs=2200, batch_size=47, validation_data=(x_val, y_val))

In [None]:
fig = plt.figure (figsize = (12,6))
plt.plot(x,y, label = "Ground truth", color = 'black')
plt.scatter(x_train, y_train, label = "Ground truth + noise", color = 'black')
plt.scatter(x_val, y_val, label = "Test Data", color = 'green')
plt.plot(x, model.predict(x), color = 'red', ls = '--', lw = 3, 
         label = 'Fitted function')
plt.legend(fontsize = 16, loc = 'upper left')
plt.xlabel ('x', fontsize = 16)
plt.ylabel ('y', fontsize = 16)
plt.tick_params(axis='both', which='major', labelsize=16)

In [None]:
# The returned "history" object holds a record
# of the loss values and metric values during training
fig = plt.figure (figsize = (12,6))
plt.plot(history.history['loss'], label = "train_loss", color = 'black')
plt.legend(fontsize = 16, loc = 'upper right')
#print('\nhistory dict:', history.history)

In [None]:
xs = np.array([-1.5,-1,-0.5,-6.6,7.5])
ys = 0.6*xs**3 - 5.2*xs**2 - 3*xs + 2  # same ground-truth function as the training data
ys_noise = ys + np.random.normal(0, 1.5, size=(len(xs),))

In [None]:
fig = plt.figure (figsize = (10,6))
plt.scatter(xs, ys_noise, label = "Ground truth + noise", color = 'green')
plt.scatter(xs,model.predict(xs), label = "Prediction", color = 'blue')
plt.legend(fontsize = 16, loc = 'upper left')
plt.xlabel ('x', fontsize = 16)
plt.ylabel ('y', fontsize = 16)
plt.tick_params(axis='both', which='major', labelsize=16)

In [None]:
fig = plt.figure (figsize = (10,6))
plt.scatter([4], model.predict(np.array([4.0])), label = "Prediction at x = 4", color = 'green')
plt.plot(x,y, label = "Ground truth", color = 'black')
plt.scatter(x, y_noise, label = "Ground truth + noise", color = 'black')
plt.plot(x, model.predict(x), color = 'red', ls = '--', lw = 3, 
         label = 'Fitted function')
plt.legend(fontsize = 16, loc = 'upper left')
plt.xlabel ('x', fontsize = 16)
plt.ylabel ('y', fontsize = 16)
plt.tick_params(axis='both', which='major', labelsize=16)

## Improvement of generalization

### Regularization
L2 regularization is perhaps the most common form of regularization. It can be implemented by penalizing the squared magnitude of all parameters directly in the objective function.

http://rasbt.github.io/mlxtend/user_guide/general_concepts/regularization-linear/
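The penalty term can be sketched in NumPy as follows. The weight vector and the data loss are hypothetical numbers; the formula, the factor times the sum of squared weights, is the one `keras.regularizers.l2` adds to the loss for each regularized layer.

```python
import numpy as np


def l2_penalty(weights, lam):
    """lam times the sum of squared weights."""
    return lam * np.sum(np.square(weights))


weights = np.array([1.0, -2.0, 3.0])  # hypothetical kernel of one layer
data_loss = 0.5                       # hypothetical MAE on the training data
total_loss = data_loss + l2_penalty(weights, 0.25)
print(l2_penalty(weights, 0.25))  # 3.5
print(total_loss)                 # 4.0
```

Because large weights are penalized quadratically, the optimizer is pushed towards smaller weights and therefore smoother fitted functions.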


In [None]:
model = keras.Sequential([layers.Dense(input_shape=[1], units= 20,use_bias=True,activation='relu',
                                      kernel_regularizer=keras.regularizers.l2(0.25)),
                         layers.Dense(units= 10,use_bias=True,activation='relu',
                                     kernel_regularizer=keras.regularizers.l2(0.25)),
                         layers.Dense(units= 10,use_bias=True,activation='relu',
                                     kernel_regularizer=keras.regularizers.l2(0.3)),
                         layers.Dense(units= 1,use_bias=True)])

model.summary()

In [None]:
loss = tf.keras.losses.MAE
optimizer = tf.keras.optimizers.Adam(learning_rate=0.002)
metrics=[keras.metrics.MAE]
model.compile(loss=loss, # Loss function to minimize 
              optimizer=optimizer,# Optimizer
              metrics=metrics) # List of metrics to monitor

In [None]:
history = model.fit(x_train, y_train, epochs=2200, batch_size=47, validation_data=(x_val, y_val))

In [None]:
fig = plt.figure (figsize = (12,6))
plt.plot(x,y, label = "Ground truth", color = 'black')
plt.scatter(x_train, y_train, label = "Ground truth + noise", color = 'black')
plt.scatter(x_val, y_val, label = "Test Data", color = 'green')
plt.plot(x, model.predict(x), color = 'red', ls = '--', lw = 3, 
         label = 'Fitted function')
plt.legend(fontsize = 16, loc = 'upper left')
plt.xlabel ('x', fontsize = 16)
plt.ylabel ('y', fontsize = 16)
plt.tick_params(axis='both', which='major', labelsize=16)

In [None]:
# The returned "history" object holds a record
# of the loss values and metric values during training
fig = plt.figure (figsize = (12,6))
plt.plot(history.history['loss'], label = "train_loss", color = 'black')
plt.legend(fontsize = 16, loc = 'upper right')
#print('\nhistory dict:', history.history)

In [None]:
display(Image(url='./image67.gif',width=500))