# TensorFlow Playground

The goal of this exercise is to help you build some code-free intuition about neural network models. For this purpose we will use a tool developed by Google called the TensorFlow Playground: a code-free, user-friendly interface that lets you test fully-connected neural network models on toy datasets.

Before we start the exercise, follow this <a href="https://playground.tensorflow.org/"> link </a> to enter the playground!

Here is a short video explaining the different features of the playground:


Throughout the whole exercise, leave the settings as they are unless you are specifically asked to modify them.

## Part 1 : Classification

### The Circle

1. Set DATA to Circle, keep 1 hidden layer with a single neuron, set Activation to Linear and the learning rate to 0.03. Start training by clicking the play button. What happens? Is this neural network equivalent to a model we have studied before?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_1.PNG" />
</details>

<a href="https://playground.tensorflow.org/#activation=linear&batchSize=10&dataset=circle&regDataset=reg-gauss&learningRate=0.03&regularizationRate=0&noise=0&networkShape=1&seed=0.46827&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false">solution</a>

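This model is essentially a linear classifier, much like a logistic regression: whatever weights it learns, its decision boundary is a straight line, so it cannot separate the inner circle from the outer ring.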
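To see this outside the playground, here is a minimal NumPy sketch. It uses a logistic regression as a stand-in for the one-neuron linear network (an assumption: the playground's own loss and output activation differ in details) and a circle dataset that only roughly mimics the playground's generator (inner points below radius 2.5, outer points between 3.5 and 5). A single straight line can only cut the plane into two half-planes, so the accuracy stays far from 100%.

```python
import numpy as np

def make_circle(n=500, radius=5.0, seed=0):
    """Hypothetical circle dataset (roughly mimicking the playground): label 1 inside, 0 in the ring."""
    rng = np.random.default_rng(seed)
    half = n // 2
    r = np.concatenate([
        rng.uniform(0.0, 0.5 * radius, half),         # inner disk
        rng.uniform(0.7 * radius, radius, n - half),  # outer ring
    ])
    theta = rng.uniform(0.0, 2 * np.pi, n)
    X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    y = np.concatenate([np.ones(half), np.zeros(n - half)])
    return X, y

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Plain batch gradient descent on the logistic (cross-entropy) loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        err = p - y                              # gradient of the loss w.r.t. the logits
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w, b

X, y = make_circle()
w, b = fit_logistic(X, y)
accuracy = np.mean(((X @ w + b) > 0) == (y == 1))
print(f"linear model accuracy on the circle: {accuracy:.2f}")
```

Whatever line gradient descent settles on, some inner and outer points always end up on the same side of it.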

2. Add more neurons to the hidden layer and start training again. Do the results improve? Why?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_2.PNG" />
</details>

<a href="https://playground.tensorflow.org/#activation=linear&batchSize=10&dataset=circle&regDataset=reg-gauss&learningRate=0.03&regularizationRate=0&noise=0&networkShape=8&seed=0.46827&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false">solution</a>

The results do not improve. With a linear activation, adding neurons to this layer amounts to taking a weighted sum of several linear models, and a weighted sum of linear functions is still a linear function: we are still fitting a linear model.

3. Try adding more hidden layers. Does it change the results? Why? What are we missing here?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_3.PNG" />
</details>

<a href="https://playground.tensorflow.org/#activation=linear&batchSize=10&dataset=circle&regDataset=reg-gauss&learningRate=0.03&regularizationRate=0&noise=0&networkShape=8,2&seed=0.46827&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false">solution</a>

Adding hidden layers amounts to composing functions; however, composing several linear functions still results in a linear function:

$$
\begin{align}
f(x)=&ax + b\\
g(x)=&cx + d\\
f \circ g (x) =& a(cx + d) +b\\
f \circ g (x) =& \underbrace{ac}_{\text{a'}}x + \underbrace{ad + b}_{\text{b'}}
\end{align}
$$
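The same argument holds in matrix form, whatever the width or depth. Here is a quick NumPy check (a sketch: the weights are random placeholders, and the 2 → 8 → 2 → 1 shape only mirrors the `networkShape=8,2` configuration above). With identity activations, the whole network collapses to a single affine map $W'x + b'$.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical random weights for a 2 -> 8 -> 2 -> 1 network with linear activations
W1, b1 = rng.normal(size=(8, 2)), rng.normal(size=8)
W2, b2 = rng.normal(size=(2, 8)), rng.normal(size=2)
W3, b3 = rng.normal(size=(1, 2)), rng.normal(size=1)

def linear_network(X):
    """Forward pass where every activation is the identity."""
    h1 = X @ W1.T + b1
    h2 = h1 @ W2.T + b2
    return h2 @ W3.T + b3

# Collapse the three layers into one affine map, exactly like a' and b' above
W_eff = W3 @ W2 @ W1                # shape (1, 2)
b_eff = W3 @ (W2 @ b1 + b2) + b3    # shape (1,)

X = rng.normal(size=(100, 2))
same = np.allclose(linear_network(X), X @ W_eff.T + b_eff)
print(same)
```

This prints `True`: the 8 neurons of the first layer and the extra layer add parameters, but not expressive power.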

To handle a non-linear classification problem like the circle, we need a non-linear model! In neural networks, the non-linearity comes from the activation function. In the next question we will use a non-linear activation function.

4. Let's go back to a single hidden layer with 2 neurons, but this time set the activation to ReLU. What changes?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_4.PNG" />
</details>

<a href="https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=circle&regDataset=reg-gauss&learningRate=0.03&regularizationRate=0&noise=0&networkShape=2&seed=0.46827&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false">solution</a>

This time the predictions take a different shape: the model is learning non-linear patterns in the data. However, it is not yet complex enough to fit the circle well.

5. Increase the number of neurons on the hidden layer. What happens? What is the minimum number of neurons on the hidden layer that gives acceptable predictions? Can you understand why?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_5.PNG" />
</details>

<a href="https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=circle&regDataset=reg-gauss&learningRate=0.03&regularizationRate=0&noise=0&networkShape=3&seed=0.46827&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false">solution</a>

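With three neurons on the hidden layer, we are able to predict the circle well. Roughly speaking, each ReLU neuron contributes one straight edge to the decision boundary, and in two dimensions three straight edges is the minimum needed to enclose a bounded region (a triangle) around the inner cluster. Note that the problem is not too difficult, since the data is perfectly separable.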
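Here is a sketch of the "three edges" idea with hand-picked weights (hypothetical, not what the playground actually learns), on a circle dataset that only roughly mimics the playground's. Each ReLU neuron switches on beyond one straight line; the three lines are 120 degrees apart, and the output predicts the inner class only where the summed activations stay small. The lines enclose a triangle with clipped corners around the inner cluster, something two lines alone cannot do, since two lines never enclose a bounded region of the plane.

```python
import numpy as np

def make_circle(n=1000, radius=5.0, seed=1):
    """Hypothetical circle dataset (roughly mimicking the playground): label 1 inside, 0 in the ring."""
    rng = np.random.default_rng(seed)
    half = n // 2
    r = np.concatenate([rng.uniform(0.0, 0.5 * radius, half),
                        rng.uniform(0.7 * radius, radius, n - half)])
    theta = rng.uniform(0.0, 2 * np.pi, n)
    X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    y = np.concatenate([np.ones(half), np.zeros(n - half)])
    return X, y

def relu(z):
    return np.maximum(z, 0.0)

# Hand-picked (not trained) weights: three unit normals 120 degrees apart
angles = np.deg2rad([0.0, 120.0, 240.0])
W_hidden = np.column_stack([np.cos(angles), np.sin(angles)])  # shape (3, 2)
b_hidden = np.full(3, -1.5)  # each neuron turns on 1.5 units away from the origin

def three_relu_net(X):
    h = relu(X @ W_hidden.T + b_hidden)  # the 3 hidden ReLU activations
    return 1.0 - h.sum(axis=1)           # output > 0 means "inner class"

X, y = make_circle()
pred = three_relu_net(X) > 0
accuracy = np.mean(pred == (y == 1))
print(f"accuracy with 3 hand-placed ReLU neurons: {accuracy:.2f}")
```

A few outer points near the clipped corners fall inside the region, which is why three neurons give acceptable rather than perfect predictions.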

6. Let's make the problem more difficult: set noise to the max value. What happens to our data?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_6.PNG" />
</details>

<a href="https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=circle&regDataset=reg-gauss&learningRate=0.03&regularizationRate=0&noise=50&networkShape=3&seed=0.58015&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false">solution</a>

Adding noise makes the data imperfect: the two classes now overlap, with some blue dots in the orange zone and vice versa.

7. Increase the number of neurons on the hidden layer to the maximum and start training. What happens?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_7.PNG" />
</details>

<a href="https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=circle&regDataset=reg-gauss&learningRate=0.03&regularizationRate=0&noise=50&networkShape=8&seed=0.58015&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false">solution</a>

After a while, the test loss increases while the training loss keeps decreasing: the model is overfitting, even though this is barely visible in the shape of the predictions because the model is still quite simple. Having only one hidden layer caps the level of complexity the network can reach. Adding more neurons to that layer lets the network explore more "flavors" of this level of complexity, but that's it.

Picture it like a building with many floors and many rooms on each floor. The higher you go, the more luxurious the rooms are. Adding neurons to a layer is like exploring more rooms on the same floor, but it will not take you to the next one; adding layers unlocks completely new horizons, whose complexity grows exponentially at each floor.

8. Let's now add a new hidden layer with the maximum number of neurons and start training. What happens? Keep increasing the number of hidden layers until you reach the limit. What happens? What is this phenomenon called?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_8.PNG" />
</details>

<a href="https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=circle&regDataset=reg-gauss&learningRate=0.03&regularizationRate=0&noise=50&networkShape=8,8&seed=0.58015&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false">solution</a>

Adding a hidden layer with many neurons unlocks a new level of complexity for our model. You can see it in the little squares representing each neuron: the output space gets more complicated on the second layer. This is the result of composing non-linear functions: each composition unlocks increasingly complex non-linear behavior. As you keep adding layers, the model becomes complex enough to fit the noise in the training data, and the gap between training and test loss widens: this phenomenon is overfitting.
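The "floors" analogy can be made concrete with a well-known 1-D toy construction (hand-picked weights, not something the playground trains). A layer of just two ReLU neurons can implement a "tent" function on [0, 1], and stacking that layer composes the tent with itself. The sketch below counts the linear pieces of the result for increasing depth.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tent(x):
    """One hidden layer of 2 ReLU neurons: 2x on [0, 0.5], 2 - 2x on [0.5, 1]."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def count_linear_pieces(depth, n_points=2**12 + 1):
    """Stack `depth` tent layers and count the linear pieces on [0, 1]."""
    x = np.linspace(0.0, 1.0, n_points)  # grid aligned with every breakpoint
    y = x
    for _ in range(depth):
        y = tent(y)
    slope_signs = np.sign(np.diff(y))
    # the pieces alternate between going up and going down,
    # so pieces = 1 + number of slope sign changes
    return 1 + int(np.count_nonzero(slope_signs[1:] != slope_signs[:-1]))

for depth in range(1, 6):
    print(depth, count_linear_pieces(depth))
```

Each extra layer of two neurons doubles the number of linear pieces (2, 4, 8, 16, 32 for depths 1 to 5). By contrast, a single hidden layer of n ReLU neurons on a 1-D input has at most n kinks, hence at most n + 1 pieces: adding rooms on one floor grows complexity linearly, adding floors can grow it exponentially.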

9. Try adding new input features to this model. Does it solve our problem?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/config_9.PNG" />
</details>

<a href="https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=circle&regDataset=reg-gauss&learningRate=0.03&regularizationRate=0&noise=50&networkShape=8,8&seed=0.58015&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=true&xSquared=true&ySquared=true&cosX=true&sinX=true&cosY=true&sinY=true&collectStats=false&problem=classification&initZero=false&hideText=false">solution</a>

Although adding more hidden layers and neurons lets the neural network engineer its own non-linear features, feeding the model non-linear transformations of the data can achieve equivalent results with a less complex architecture.

In this sense, making the neural network's architecture more complex is a form of automated feature engineering. The difference is that the features you create manually are not necessarily well aligned with the problem you are trying to solve, while the features the network builds are directly driven by minimizing the loss function.
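To see why the extra features help, here is a sketch on a noise-free circle dataset that only roughly mimics the playground's (an assumption: inner points below radius 2.5, outer points between 3.5 and 5). Once $x_1^2$ and $x_2^2$ are available, a purely linear rule on them separates the two classes; the weights are picked by hand rather than learned.

```python
import numpy as np

def make_circle(n=500, radius=5.0, seed=2):
    """Hypothetical noise-free circle dataset: label 1 inside, 0 in the ring."""
    rng = np.random.default_rng(seed)
    half = n // 2
    r = np.concatenate([rng.uniform(0.0, 0.5 * radius, half),
                        rng.uniform(0.7 * radius, radius, n - half)])
    theta = rng.uniform(0.0, 2 * np.pi, n)
    X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    y = np.concatenate([np.ones(half), np.zeros(n - half)])
    return X, y

X, y = make_circle()

# Engineered features, like ticking X1^2 and X2^2 in the playground
F = X ** 2  # columns: x1^2, x2^2

# A linear decision rule on the new features, weights picked by hand:
# predict "inner" when -x1^2 - x2^2 + 9 > 0, i.e. when the radius is below 3
w, b = np.array([-1.0, -1.0]), 9.0
pred = (F @ w + b) > 0
accuracy = np.mean(pred == (y == 1))
print(f"linear rule on squared features: accuracy {accuracy:.2f}")
```

The threshold 9 (radius 3) is an arbitrary choice between the two classes' radii; on a noisy version of the data, no threshold would be perfect.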

### The Spiral

1. Let's switch to a more difficult problem: the spiral. Set DATA to Spiral with noise 0, deactivate all features except $x_1$ and $x_2$, keep only one hidden layer with 8 neurons, and start training. What happens?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/spiral_1.PNG" />
</details>

<a href="https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=spiral&regDataset=reg-gauss&learningRate=0.03&regularizationRate=0&noise=0&networkShape=8&seed=0.02522&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false">solution</a>

Training the model with only one hidden layer will not yield great results because the data is too complex: we need additional levels of complexity in the output space, in other words more hidden layers.


2. Now add a second hidden layer to the network with 4, 6, then 8 neurons. Do the results improve? What does this tell you about the effect of adding layers to the network, as opposed to adding neurons to a layer?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/spiral_2a.PNG" />
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/spiral_2b.PNG" />
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/spiral_2c.PNG" />
</details>

<a href="https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=spiral&regDataset=reg-gauss&learningRate=0.03&regularizationRate=0&noise=0&networkShape=8,4&seed=0.02522&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false">solution a</a>

<a href="https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=spiral&regDataset=reg-gauss&learningRate=0.03&regularizationRate=0&noise=0&networkShape=8,6&seed=0.02522&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false">solution b</a>

<a href="https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=spiral&regDataset=reg-gauss&learningRate=0.03&regularizationRate=0&noise=0&networkShape=8,8&seed=0.02522&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false">solution c</a>

Adding more neurons to the second hidden layer yields steadily better results, although it may lead to overfitting.

If you add more neurons to the second hidden layer while keeping the number of neurons on the first hidden layer low, the network will not be able to reach its full potential, because what happens in layers near the top is intrinsically limited by what happens in earlier layers near the bottom. It is therefore usually good practice to have a decreasing (or constant) number of neurons per layer as you move up towards the top layer.

3. With two hidden layers of 8 neurons each, you should get OK predictions on the spiral; however, the model seems to overfit! What could we add to the model to limit this overfitting without changing the architecture (the number of hidden layers and of neurons)?

<details>
<summary>Spoiler</summary>
Try setting Regularization to L2 with a regularization rate of 0.01.
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/spiral_3.PNG" />
</details>

<a href="https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=spiral&regDataset=reg-gauss&learningRate=0.03&regularizationRate=0.01&noise=0&networkShape=8,8&seed=0.02522&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false">solution</a>

Regularization works exactly as in Ridge and Lasso regression: a penalty that grows with the magnitude of the parameters (the sum of their squares for L2, as in Ridge; the sum of their absolute values for L1, as in Lasso) is added to the loss function. It discourages large weights and leads to a smoother, less overfitted output space.
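The shrinking effect is easiest to see on Ridge itself. This sketch solves Ridge in closed form, minimizing $\|Xw - y\|^2 + \lambda \|w\|^2$ (no intercept), on synthetic data with made-up coefficients, and shows the weight norm decreasing as $\lambda$ grows. The playground adds the same kind of penalty to the network's loss, scaled by the regularization rate; its exact constant is not assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
# Hypothetical true coefficients, plus a little noise
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.5, size=50)

def ridge_weights(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 (closed form, no intercept)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

lams = [0.0, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge_weights(X, y, lam)) for lam in lams]
for lam, norm in zip(lams, norms):
    print(f"lambda={lam:>6}: ||w|| = {norm:.3f}")
```

Larger penalties pull every weight towards zero, which is what smooths the decision boundary in the playground.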

4. Add noise to the data (30, for example) and train the same model again. Does it still perform well?

<a href="https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=spiral&regDataset=reg-gauss&learningRate=0.03&regularizationRate=0.01&noise=30&networkShape=8,8&seed=0.97645&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false">solution</a>

The model we have built no longer overfits: it captures the true relation between $X$ and $Y$ rather faithfully, so adding noise to the data does not affect it too much.

5. What is the effect of adding new features to this model?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/spiral_5.PNG" />
</details>

<a href="https://playground.tensorflow.org/#activation=relu&batchSize=10&dataset=spiral&regDataset=reg-gauss&learningRate=0.03&regularizationRate=0.01&noise=30&networkShape=8,8&seed=0.97645&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=true&xSquared=true&ySquared=true&cosX=false&sinX=true&cosY=false&sinY=true&collectStats=false&problem=classification&initZero=false&hideText=false">solution</a>

As with the circle problem, adding new features helps create a more complex output space without changing the model's architecture.

## Part 2 : Regression

1. Switch the problem type to Regression and try solving the Plane dataset on your own. What is the simplest architecture that gets excellent performance?

<details>
<summary>Spoiler</summary>
<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/reg_1.PNG" />
</details>

<a href="https://playground.tensorflow.org/#activation=linear&batchSize=10&dataset=spiral&regDataset=reg-plane&learningRate=0.03&regularizationRate=0.01&noise=30&networkShape=1&seed=0.68404&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=regression&initZero=false&hideText=false">solution</a>

A simple model with one hidden layer, one neuron, and a linear activation function should do the trick: the target is a plane, i.e. a linear function of the inputs, which is exactly what this model computes.
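A one-neuron linear network computes $w_2(w_1 \cdot x + b_1) + b_2$, which is just an affine function $a \cdot x + c$, i.e. a linear regression. This sketch fits such a model by least squares on a hypothetical plane target (the coefficients are made up, not the playground's formula) and recovers the plane exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-6, 6, size=(200, 2))
# Hypothetical noise-free plane target (not the playground's exact formula)
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.1

# Linear regression = what a single linear neuron can represent
A = np.column_stack([X, np.ones(len(X))])  # extra column for the intercept
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # approximately [0.5, -0.3, 0.1]
```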

2. Switch the data to Multi Gaussian. What is the simplest architecture you can find that gets good predictions (train and test loss under 0.03)?

<a href="https://playground.tensorflow.org/#activation=tanh&regularization=L2&batchSize=10&dataset=spiral&regDataset=reg-gauss&learningRate=0.03&regularizationRate=0.01&noise=0&networkShape=8,4&seed=0.16387&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=true&xSquared=true&ySquared=true&cosX=false&sinX=true&cosY=false&sinY=true&collectStats=false&problem=regression&initZero=false&hideText=false">solution example</a>

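This problem is a little more difficult: you will need two hidden layers with a few neurons each to achieve good performance (the example solution above also uses a tanh activation, L2 regularization, and some additional features). However, if you add noise, the problem becomes very difficult to fit well.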

## Sum it up

Try to summarize what we have learned intuitively about neural networks:

* What happens if we use a linear activation function?

Using a linear activation function results in a linear model, however many layers and neurons we use, and does not take advantage of the capabilities of neural networks.

* What is the effect of adding neurons to a layer?

Adding a neuron to a layer lets the model create one additional "feature" at a given level of complexity.

* What is the effect of adding hidden layers?

Adding a hidden layer adds one more level of non-linearity, by applying one more activation function to the previous layer's output; the complexity of the outputs can grow exponentially with the number of layers.

* In a model with several layers, is it more useful to add more neurons on the layers near the bottom or near the top?

It is more useful to add neurons towards the bottom, because the complexity of the outputs of earlier neurons limits the complexity of the outputs of later neurons. It is generally good practice to have more neurons on the bottom layers and to progressively decrease the number of neurons going up the network.

* If the model overfits, what can we do to limit overfitting?

We can reduce the number of neurons and hidden layers in the network.

We can also introduce regularization, such as L2 (as in Ridge) or L1 (as in Lasso).

* Would you say that using neural network models compensates the need for feature engineering?

It does, to a large extent. The outputs of the neurons in the network may be interpreted as new features that later neurons combine into even more complex features, leading to the final prediction. Moreover, these "features" are built by neurons whose parameters are optimized to minimize the loss function, so they are linked to the target variable without having to be explicitly coded!

In a way this is great, because feature engineering is difficult and neural networks do it for us. The major downside is that it all happens in what may be called a "black box" model: depending on the data, neural network models may make predictions using features that are not at all interpretable, or not even well aligned with our final goal! In the following lectures, we will learn some tools that make it possible to interpret how our models make their predictions.

* When you use additional features to feed the model, do you need to use as many neurons and layers? Would adding more neurons and layers be an alternative to using additional features?

Adding new features may let you use a less complex architecture. The upside is that you know exactly which input features are used, which makes the model more interpretable; on the other hand, you may miss some very useful features that the model could have created for you! And yes, adding neurons and layers is an alternative to hand-crafted features, since the network then builds its own.

I hope this exercise helped you better understand deep neural networks. Now let's learn the basics of TensorFlow to pave the way for building deep learning models with our own code!