##### Portions Copyright 2019 The TensorFlow Authors.
This notebook was edited from the TensorFlow Authors' original by Michael Glass and Jung Hee Kim.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Easy Neural Network to Compute a Linear Equation

This neural network will learn to compute a simple equation:
 y = 2*x + 1

The neural network is trained by giving it examples of the input X value, where we already know the correct Y value. 

*   When an X value flows into the network, some predicted Y value comes out.
*   The predicted Y is compared with the correct Y.
* The difference (the error) is used to adjust the parameters in the network. The parameters are adjusted a little bit to nudge the network's output in the correct direction.
* This process is repeated. Eventually the network's parameters are adjusted to the point where the output is (close to) correct for every X value in the training data.
* Then we try some test data: we feed into the network some X values it has not seen before. 

Essentially we have produced a subroutine which computes the function, but we never explicitly wrote down the equation for the computer.
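The loop described above can be sketched in plain Python, without TensorFlow. This is only an illustration under stated assumptions: the starting values, the `learning_rate`, and the number of passes are choices made for this sketch, and the update rule is a simple per-example correction rather than whatever Keras does internally.

```python
# Training data for y = 2*x + 1 (the relationship this notebook learns).
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

w, b = 0.0, 0.0        # arbitrary starting parameters (assumption)
learning_rate = 0.05   # hypothetical step size chosen for this sketch

for epoch in range(500):                 # repeat the process many times
    for x, y_correct in zip(xs, ys):
        y_predicted = w * x + b          # an X value flows in, a predicted Y comes out
        error = y_correct - y_predicted  # compare predicted Y with correct Y
        w += learning_rate * error * x   # nudge the weight a little
        b += learning_rate * error       # nudge the bias a little

# Test data: an X value that was not in the training data.
x_test = 10.0
prediction = w * x_test + b
print("w =", w, "b =", b, "prediction for x = 10:", prediction)
```

The equation never appears inside the loop; the parameters drift toward 2 and 1 only because of the repeated corrections.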



## Imports
Python imports:

* Import `TensorFlow` and call it `tf` for ease of use.

* Import `numpy`, for storing and manipulating arrays of numbers. (It is more flexible and efficient than using Python lists.)

* The framework for defining a neural network as a set of Sequential layers is called ``keras``, so we import that too.

In [None]:
import tensorflow as tf
import numpy as np
from tensorflow import keras

## Define and Compile the Neural Network

Next we will create the simplest possible neural network. It has 1 layer, and that layer has 1 neuron, and the input is just 1 value.

In [None]:
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])

Let's explain the one line of Python code above.
* We construct a ``Sequential`` object, which contains a whole neural network. We put it in the variable ``model``.
```
   model = ...Sequential( ... )
```
* This neural network contains a Python list of layers:
```
   model = Sequential( [ layer1, layer2, ... ] )
```
* Here we have a single layer, which is a ``Dense`` object with only 1 neuron in the layer:
```
   model = Sequential( [ keras.layers.Dense(units=1...) ] )
```
* The `input_shape` describes how the X values for one example case are organized. If one example case is a single array of 13 numbers (13 numerical attributes), we would write [13]. If each example has 50 pixel values organized as a 2-D 10 x 5 picture, we would write [10, 5]. In our case each example has 1 number, so we write [1].
```
   model = Sequential( [ Dense(units=1, input_shape=[1]) ] )
```
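To make the three shapes mentioned above concrete, here is a small NumPy sketch. The arrays are filled with zeros purely as placeholders, and the variable names are made up for this illustration.

```python
import numpy as np

# One example with 13 numerical attributes: input_shape would be [13].
one_row = np.zeros(13)

# One example that is a 10 x 5 picture (50 pixel values): input_shape would be [10, 5].
one_picture = np.zeros((10, 5))

# One example holding a single number, as in this notebook: input_shape is [1].
one_number = np.zeros(1)

for name, example in [("row", one_row), ("picture", one_picture), ("number", one_number)]:
    # list(example.shape) is what we would pass as input_shape for this kind of example.
    print(name, "shape:", list(example.shape), "values:", example.size)
```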



Keras neural network objects have a ``summary`` method, to show you what is inside.

In [None]:
model.summary()

The above should show that our neural network has one Dense layer. The output shape says this layer has one output, meaning it has one neuron. 

What is the meaning of the number of parameters? It shows us how many parameters are in each layer. For one neuron with one input:
* The weight multiplier $w$, which is applied to the input value
* The bias value $b$ for the neuron.

By default the activation function is a simple linear equation. Since we have one $x$-value input this network computes: $y = wx + b$.

However, when we start, the network has no idea of the proper values of $w$ and $b$. Both of these parameters are adjustable, meaning the neural network will adjust their values as it learns.
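As a rough check on the parameter count reported by `summary()`, the arithmetic for a fully connected layer can be sketched as follows. The helper `dense_param_count` is hypothetical (it is not part of Keras) and assumes every neuron has one weight per input plus one bias.

```python
def dense_param_count(n_inputs, units):
    """Hypothetical helper: count weights and biases in a fully connected layer."""
    weights = n_inputs * units  # each neuron has one weight per input
    biases = units              # each neuron has one bias
    return weights + biases

# The layer in this notebook: 1 input, 1 neuron, so w and b.
print(dense_param_count(1, 1))   # 2

# A larger made-up layer: 13 inputs feeding 4 neurons.
print(dense_param_count(13, 4))  # 56
```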

---


Now we compile the neural network. When we do so, we have to specify 2 functions, a loss and an optimizer.

We know that in our function, the relationship between the numbers is y=2x+1.

When the computer is trying to 'learn' that function, it makes a guess...maybe y=10x+10. The **loss** function compares the guessed answers against the known correct answers and measures how well or how badly it did.

We will use the MEAN SQUARED ERROR loss function. It looks at the square of each error ($\Delta y$), the difference between the predicted output y-value and the correct output y-value.
* When the output of one training case is near the goal (small error), the correction is small.
* Training cases which produce larger errors are generally more important.
* The correction increases with the square of the error, so the correction will emphasize the training cases where the neural network is still not getting it right.
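As an illustration of the mean squared error, the sketch below scores the guess y=10x+10 mentioned above against the correct values of y=2x+1. It is plain Python written for this explanation, not the Keras implementation.

```python
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys_correct = [2 * x + 1 for x in xs]    # the known correct answers
ys_guessed = [10 * x + 10 for x in xs]  # the guess y = 10x + 10

# Error for each training case, then its square.
errors = [guess - correct for guess, correct in zip(ys_guessed, ys_correct)]
squared_errors = [e ** 2 for e in errors]

# The loss is the average of the squared errors.
mse = sum(squared_errors) / len(squared_errors)

print("errors:        ", errors)
print("squared errors:", squared_errors)
print("mean squared error:", mse)
```

The largest error (at x = 4) contributes far more to the loss than the smallest (at x = -1), which is the emphasis described in the list above.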

It then uses the **optimizer** function to adjust the parameters in the neural network, nudging them in the direction of the correct output.

We will use the STOCHASTIC GRADIENT DESCENT optimizer function. Gradient descent uses the derivative of the function y=f(z) which is inside each neuron. Using the derivative we can estimate how much a small change in input will change the output.

$\Delta y \approx f'(z) \, \Delta z$

The loss function helped estimate how much output change $\Delta y$ we need to achieve to get the error near zero. The derivative allows us to estimate the input change which would achieve that output change. The input value $x$ from the training data is multiplied by the weight $w$, so we adjust the weight. The neuron also utilizes the bias value $b$, another adjustable value. The job of the optimizer function is to guess small adjustments to the weight and bias values to correct the $\Delta y$ error value. (Stochastic gradient descent is a modification of gradient descent.)
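One way to see the role of the derivative is to compute it by hand for our one-neuron network and take a single step. The sketch below uses plain (full-batch) gradient descent on the mean squared error, starting from the guess y=10x+10; the `learning_rate` of 0.01 and the helper names are assumptions made for this illustration, and Keras may organize the computation differently.

```python
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-1.0, 1.0, 3.0, 5.0, 7.0, 9.0]

def mse(w, b):
    """Mean squared error of the line y = w*x + b on the training data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradients(w, b):
    """Derivatives of the mean squared error with respect to w and b."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return grad_w, grad_b

w, b = 10.0, 10.0  # the guess y = 10x + 10
grad_w, grad_b = gradients(w, b)

# Check the derivative numerically: a small change in w changes the loss
# by roughly (derivative) * (change in w).
eps = 1e-6
grad_w_numeric = (mse(w + eps, b) - mse(w - eps, b)) / (2 * eps)

# One small adjustment in the direction that reduces the loss.
learning_rate = 0.01  # hypothetical step size
loss_before = mse(w, b)
w_new = w - learning_rate * grad_w
b_new = b - learning_rate * grad_b
loss_after = mse(w_new, b_new)

print("analytic vs numeric derivative:", grad_w, grad_w_numeric)
print("loss before:", loss_before, "loss after:", loss_after)
```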

In [None]:
model.compile(optimizer='sgd', loss='mean_squared_error')

Remember that the `model` variable contains a neural network object.  The above line of Python code calls its `compile()` method.


## Providing the Data

Next we feed in some data. In this case we make 6 training cases, each with one X and one Y value. You can see that the relationship between these is y=2x+1, so where x = -1, y = -1, and so on.

We will use numpy arrays. One array contains the X values, the other contains the Y values. Neural networks use floating point numbers.

In [None]:
xs = np.array([-1.0,  0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-1.0,  1.0, 3.0, 5.0, 7.0, 9.0], dtype=float)

In the above code, the `np.array()` function converts a Python list into a NumPy array object.
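As a quick sanity check on the training data, the Y values can also be computed from the X values instead of being typed by hand. This is a sketch for checking the data only; the network itself is never given the rule.

```python
import numpy as np

xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)

# NumPy applies the arithmetic to every element of the array at once.
ys = 2 * xs + 1

print(ys)
print(xs.dtype, xs.shape)  # floating point values, 6 training cases
```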

## Training the Neural Network

The process of training the neural network, where it 'learns' the relationship between the Xs and Ys, is in the ``model.fit()`` method. This is where it will go through the loop we spoke about above: making a guess, measuring how good or bad it is (aka the loss), using the optimizer to adjust the weights, making another guess, and so on.

It will do it for the number of epochs you specify. Each epoch runs through all the training data once.

When you run this code, you'll see the loss on the right hand side.

First let us train for 1 epoch.

In [None]:
model.fit(xs, ys, epochs=1)

The average error is likely pretty large; remember that it started with an arbitrary, untrained weight and bias. Let us see how well the model computes our function $y=2x+1$ for an input value $x$=1.5. We can use the `model.predict()` method.

The input is the same size and shape as we specified when we built the network: a 1-element array. Since a neural network could have multiple outputs, the output will also be in the form of an array.

In [None]:
# Pass a NumPy array; some newer Keras versions reject a plain Python list here.
print(model.predict(np.array([1.5])))

Not very impressive. But we will train the model for 49 more epochs. Look at the loss value as it trains.

In [None]:
model.fit(xs, ys, epochs=49)

Now the loss should be a much smaller number. You have a model that has been trained to learn the relationship between X and Y. You can use the `model.predict` method again to have it figure out the Y for a previously unknown X. Let us try this with several different input values.

In [None]:
for i in [-2.0, 1.5, 6.0]:
    # Wrap each input in a NumPy array; some newer Keras versions reject a plain list.
    print(i, 'predicts', model.predict(np.array([i])))

In the above code, `model.predict(np.array([...]))` feeds to the network an array of the input numbers for one case. It returns the array of output numbers from the last layer.

Again, our neural network has only one input number, so the array of one input number $i$ is `np.array([i])`.

How well is it working? 
 
It helps to know that the `loss` that is printed out is the average of the squares of all the errors. If an error was 0.2, the square is 0.04. So a `loss` of 0.04 means the errors were in the neighborhood of 0.2.
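The rule of thumb above amounts to taking the square root of the loss. A minimal sketch, using made-up error values purely for illustration:

```python
import math

# A loss of 0.04 corresponds to a typical error of about 0.2.
loss = 0.04
typical_error = math.sqrt(loss)
print("typical error for loss 0.04:", typical_error)

# With mixed errors (hypothetical values), the square root of the loss
# still lands in the neighborhood of the individual errors.
errors = [0.1, -0.3, 0.2]
mixed_loss = sum(e ** 2 for e in errors) / len(errors)
print("loss:", mixed_loss, "typical error:", math.sqrt(mixed_loss))
```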

The key here is that neural networks always work with approximations. The real-number (floating point number) inputs are multiplied by real-number weights, then added up, then put through the activation function in the neuron. The error is used to adjust the weights by fractional amounts. So it is quite likely that this network will never be trained to produce exact values.


We can do better, however. Let us try training the network for more epochs.

Keep in mind that the `model` variable contains a neural network object which has been partially trained. We can use the `fit()` method to train it some more. 



In [None]:
model.fit(xs, ys, epochs=300)

And we can test again. We will give it a few more values:

In [None]:
for i in [-5.0, -2.0, 1.5, 6.0, 10.0]:
    # Wrap each input in a NumPy array; some newer Keras versions reject a plain list.
    print(i, 'predicts', model.predict(np.array([i])))

Are these values good enough now? 

A more advanced topic is to add a function, called after each epoch, which evaluates whether the network is good enough to stop training.
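The idea of a stopping check can be sketched without Keras. In the plain-Python loop below, the hypothetical function `good_enough` is called after each epoch and ends training once the loss falls below a threshold. The threshold, `learning_rate`, and `max_epochs` are assumptions chosen for this sketch. In Keras the same idea is usually expressed as a callback passed to `fit()`; that API is not shown here.

```python
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-1.0, 1.0, 3.0, 5.0, 7.0, 9.0]

def mean_squared_error(w, b):
    """Average of the squared errors of y = w*x + b on the training data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def good_enough(epoch, loss, threshold=1e-4):
    """Hypothetical check called after each epoch; True means stop training."""
    return loss < threshold

w, b = 0.0, 0.0       # arbitrary starting parameters (assumption)
learning_rate = 0.01  # hypothetical step size
max_epochs = 2000     # upper limit in case the check never passes
epochs_run = 0

for epoch in range(max_epochs):
    # One epoch: one gradient descent step using all the training data.
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

    epochs_run = epoch + 1
    loss = mean_squared_error(w, b)
    if good_enough(epoch, loss):
        break  # the check says the network is good enough

print("stopped after", epochs_run, "epochs with loss", loss)
```

Stopping on a loss threshold avoids guessing the number of epochs in advance, at the cost of having to choose the threshold instead.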