# Deep Learning week - Day 1 - Playground

### Objectives:

- Get a visual representation of Neural Networks
- Get a better intuition of what Neural Networks are doing

<hr>

This first exercise does not require much code.

# 1 - The data

Let's go on the [Playground](https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=spiral&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=2&seed=0.23545&showTestData=false&discretize=false&percTrainData=70&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false&regularization_hide=true&showTestData_hide=false&stepButton_hide=false&activation_hide=false&problem_hide=false&batchSize_hide=true&dataset_hide=false&resetButton_hide=false&discretize_hide=false&playButton_hide=false&learningRate_hide=true&regularizationRate_hide=true&percTrainData_hide=false&numHiddenLayers_hide=false) and select the following type of data : 

- A classification problem 
- The circle dataset (blue dots inside a circle of orange dots)
- Ratio of training to test data : 70%
- No noise (=0)
- Do not show test data (right panel) and do not discretize the output
- Activation function: `ReLU` (💡 In general, try it first: it tends to work well on many problems!) [Note: Playground only lets you select **one** activation function, which is used for **all** the **hidden** layers]


# 2 - The features

Here, select only the features $X_1$ and $X_2$ - unselect the other features if necessary.

❓ Question ❓ If you were to use the other variables, such as $X_1^{2}$, $X_2^{2}$, $X_1 X_2$, $\sin(X_1)$ and $\sin(X_2)$, which classic Machine Learning operation would that correspond to?

<details>
    <summary>Answer</summary>

It corresponds to feature engineering: you create new features by transforming the raw ones (squaring them, multiplying them together, or applying a sine).

</details>
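
To make this concrete, here is a minimal sketch of that feature engineering in plain Python. The function name `engineer_features` and the sample point are illustrative assumptions, not something Playground exposes; the seven outputs simply mirror the feature toggles mentioned in the question.

```python
import math

def engineer_features(x1, x2):
    """Return the raw inputs plus hand-crafted transformations of them."""
    return {
        "X1": x1,
        "X2": x2,
        "X1^2": x1 ** 2,
        "X2^2": x2 ** 2,
        "X1*X2": x1 * x2,
        "sin(X1)": math.sin(x1),
        "sin(X2)": math.sin(x2),
    }

# Hypothetical sample point, chosen only for illustration
features = engineer_features(2.0, -3.0)
print(features["X1^2"], features["X2^2"], features["X1*X2"])  # 4.0 9.0 -6.0
```

For a circle centred on the origin, $X_1^2 + X_2^2$ is the squared distance to the centre, which is why such features can make the classes much easier to separate.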

In this exercise, and again tomorrow, we will only use the raw input features $X_1$ and $X_2$.

# 3 - Fit

Build a model that has : 
- three hidden layers
- 5 neurons on the first hidden layer
- 4 neurons on the second hidden layer
- 3 neurons on the last hidden layer
- In Playground, the output layer is not represented: for such a binary classification task, it will automatically be a dense layer with 1 neuron activated by a sigmoid

Fit it and stop the iterations once the loss has stabilized.


❓ Question ❓ 

- Look at the individual neurons after the fit and try to understand what each one specializes in.
- What do you think about the overall shape of your results? Try different activation functions to compare. Can you make it work with "Linear"?

> YOUR ANSWER HERE
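
One way to check your intuition about the "Linear" activation is the small sketch below, written in plain Python rather than Keras. It assumes a hypothetical 2 -> 3 -> 1 network with arbitrary weights (none of these values come from Playground) and shows that two stacked linear layers give exactly the same output as one linear layer with combined weights.

```python
def dense_linear(weights, biases, inputs):
    """One dense layer with a linear (identity) activation: W @ x + b."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

# Hypothetical weights for a 2 -> 3 -> 1 network (values chosen arbitrarily)
W1 = [[1.0, -2.0], [0.5, 0.0], [3.0, 1.0]]
b1 = [0.1, -0.2, 0.3]
W2 = [[2.0, -1.0, 0.5]]
b2 = [0.4]

# Collapse both layers into one: W = W2 @ W1 and b = W2 @ b1 + b2
W = [[sum(W2[i][k] * W1[k][j] for k in range(3)) for j in range(2)] for i in range(1)]
b = [sum(W2[i][k] * b1[k] for k in range(3)) + b2[i] for i in range(1)]

x = [0.7, -1.3]
two_layers = dense_linear(W2, b2, dense_linear(W1, b1, x))
one_layer = dense_linear(W, b, x)
print(abs(two_layers[0] - one_layer[0]) < 1e-9)  # True
```

Because the composition is itself a single linear map, the decision boundary stays a straight line however many linear layers are stacked.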

# 4 - Neural network in Keras

Let's write the same model - at least the architecture - in Keras. It corresponds to the following code.

In [8]:
from tensorflow.keras import models
from tensorflow.keras import layers

model = models.Sequential()

model.add(layers.Dense(5, activation='relu', input_dim=2)) # First hidden layer with 5 neurons
model.add(layers.Dense(4, activation='relu')) # Second hidden layer with 4 neurons
model.add(layers.Dense(3, activation='relu')) # Third hidden layer with 3 neurons

# Output layer: a single sigmoid neuron outputs a probability,
# as needed for a two-class classification problem
model.add(layers.Dense(1, activation='sigmoid')) 

# For now, let's skip the model.compile() and the model.fit()

The `input_dim` of the first layer is the number of input features, which is 2 here ($X_1$ and $X_2$). Only the first layer needs it: because the model is `Sequential`, each subsequent layer infers its input size from the output size of the previous layer.
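
The chaining of layer sizes also determines how many parameters each layer holds. The sketch below is a plain-Python illustration of that arithmetic; the helper name `dense_param_counts` is hypothetical and not part of Keras. Each dense layer has one weight per (input, neuron) pair plus one bias per neuron.

```python
def dense_param_counts(layer_sizes):
    """Parameters per dense layer: inputs * units weights, plus one bias per unit."""
    return [
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

# 2 input features, then hidden layers of 5, 4 and 3 neurons, then 1 output neuron
counts = dense_param_counts([2, 5, 4, 3, 1])
print(counts, sum(counts))  # [15, 24, 15, 4] 58
```

Calling `model.summary()` on the Keras model above should report the same per-layer counts in its `Param #` column.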

# 5 - Change the dataset

Change the dataset to "XOR - Exclusive Or".

❓ Question ❓ 
- Go back to Playground and try to design a model with two hidden layers (you are free to choose the number of neurons per layer yourself) that has a very small **test loss**. 
- Once you have your model on Playground, write it below with the Keras library

In [15]:
from tensorflow.keras import models
from tensorflow.keras import layers

model = models.Sequential()

# First hidden layer, fed by the 2 input features
model.add(layers.Dense(5, activation='relu', input_dim=2))

# Two more hidden layers
model.add(layers.Dense(4, activation='relu'))
model.add(layers.Dense(4, activation='relu'))

# Output layer
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()

# Parameters per layer = weights + biases:
# (2*5 + 5) + (5*4 + 4) + (4*4 + 4) + (4*1 + 1) = 15 + 24 + 20 + 5 = 64

Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_40 (Dense)            (None, 5)                 15        
                                                                 
 dense_41 (Dense)            (None, 4)                 24        
                                                                 
 dense_42 (Dense)            (None, 4)                 20        
                                                                 
 dense_43 (Dense)            (None, 1)                 5         
                                                                 
Total params: 64
Trainable params: 64
Non-trainable params: 0
_________________________________________________________________


❓ Question ❓ Try to repeat the same process with the **Spiral**! 

64

☝️ Many more weights are needed to fit this dataset, aren't they?

With deep enough models, you can fit pretty much any pattern. The real problem is avoiding **overfitting**: add a good deal of noise and you _may_ see your model learn "too much" from this noise. We will see how to manage overfitting on the next day of the module.

&nbsp;

<details>
    <summary>A picture of overfitting in Playground</summary>
    
<img src='https://github.com/lewagon/data-images/blob/master/DL/playground-overfitting.png?raw=true' width=700 style='margin:auto'>
</details>

# 6 - Regression

Now, switch the problem type to a regression problem. 

This time, the last layer will no longer look like  
`model.add(layers.Dense(1, activation='sigmoid'))`

but instead  
`model.add(layers.Dense(1, activation='linear'))`

meaning that the model outputs a single value that can lie anywhere between $-\infty$ and $+\infty$.
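
The difference between the two output activations can be sketched in plain Python. The helper names `sigmoid` and `linear` and the sample values are illustrative assumptions; they mimic what the activation does to the output neuron's raw value, not the Keras implementation itself.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def linear(z):
    """Identity activation: the value passes through unchanged."""
    return z

# Hypothetical pre-activation values of the output neuron
for z in [-5.0, 0.0, 5.0]:
    print(z, round(sigmoid(z), 4), linear(z))
```

The sigmoid column always stays between 0 and 1, which suits a probability, while the linear column can take any value, which is what a regression target needs.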

❓ Question ❓ On Playground, find a neural network that fits the second regression dataset well, and write its architecture in Keras:

In [13]:
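# YOUR CODE HERE
# Caution: `model` is still an existing Sequential model here, so these layers
# are appended to it (see the summary below) and `input_dim` has no effect.
# Start from a fresh `models.Sequential()` to define a standalone network.
# Also note that with only linear activations, the whole stack behaves
# like a single linear layer.
model.add(layers.Dense(10, activation='linear', input_dim=2))

model.add(layers.Dense(1, activation='linear'))
model.add(layers.Dense(1, activation='linear'))

model.add(layers.Dense(1, activation='linear'))

model.summary()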

Model: "sequential_8"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_24 (Dense)            (None, 5)                 15        
                                                                 
 dense_25 (Dense)            (None, 4)                 24        
                                                                 
 dense_26 (Dense)            (None, 4)                 20        
                                                                 
 dense_27 (Dense)            (None, 1)                 5         
                                                                 
 dense_28 (Dense)            (None, 10)                20        
                                                                 
 dense_29 (Dense)            (None, 1)                 11        
                                                                 
 dense_30 (Dense)            (None, 1)                

### 🏁 You are now ready to do the same thing directly with Keras!
Don't forget to commit and push your notebook.