## Time to get your hands dirty: the TensorFlow playground


Today we'll play with the TensorFlow playground. We'll cover TensorFlow itself next time; for now, it's enough to know that it is one of the main open-source deep learning libraries.

The playground provides some datasets and lets you see what's going on during training in a nice, intuitive way.

- We have a function $f(x_1,x_2)$ that returns +1 (blue) or -1 (orange).
- We are given a dataset and split it into training and testing. 
- Hit play, and the network starts training.
- The thickness of each line tells you how strong that synapse (connection weight) is.
- You can monitor progress on the top-right.
- The goal of the game is to minimize the loss function **on the test set** (I don't care about the loss function on the training set!).
- You can change everything but the type of problem, the test/train ratio, and the noise level.
- So feel free to play with the activation function, number of layers, number of neurons per layer, learning rate, regularization, input features (feature engineering), etc.
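To make "loss on the test set" concrete, here is a minimal sketch of scoring one model on two disjoint halves of a dataset. It assumes the playground's loss is the mean of ½(prediction − label)² and that a dataset has 500 points. The circular labelling rule and the hand-made model are hypothetical stand-ins, not the playground's datasets or a trained network.

```python
import math
import random

def make_points(n, seed=0):
    """Toy dataset (hypothetical): +1 (blue) inside the circle x1^2 + x2^2 < 9, else -1 (orange)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x1, x2 = rng.uniform(-6, 6), rng.uniform(-6, 6)
        points.append((x1, x2, 1 if x1 * x1 + x2 * x2 < 9 else -1))
    return points

def split(points, train_fraction=0.5, seed=0):
    """Shuffle, then cut into (train, test), like the playground's train/test ratio."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def loss(model, points):
    """Assumed loss: mean of 0.5 * (prediction - label)**2 over the given points."""
    return sum(0.5 * (model(x1, x2) - label) ** 2 for x1, x2, label in points) / len(points)

def hand_made_model(x1, x2):
    """Hypothetical model: positive inside the circle, negative outside, always in (-1, 1)."""
    return math.tanh(9 - x1 * x1 - x2 * x2)

train, test = split(make_points(500))
print(f"train loss {loss(hand_made_model, train):.3f}, test loss {loss(hand_made_model, test):.3f}")
```

Only the second number counts for the leaderboard; a network can drive the first one down simply by memorizing the training points.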


1) Let's start from [this dataset](https://playground.tensorflow.org/#activation=linear&regularization=L2&batchSize=10&dataset=xor&regDataset=reg-plane&learningRate=0.01&regularizationRate=0&noise=35&networkShape=1&seed=0.50246&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false). Try to get the lowest loss on the test set. Take a screenshot or we won't believe you! Let's draw a leaderboard.

2) Then move on to [this dataset](https://playground.tensorflow.org/#activation=relu&regularization=L2&batchSize=10&dataset=spiral&regDataset=reg-plane&learningRate=0.1&regularizationRate=0.01&noise=50&networkShape=3,2&seed=0.65406&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false), which is more challenging. 

---
---
---


At the beginning, I chose a dataset and set the goal of minimizing its loss.  
I then combined my deep learning knowledge with suggestions from ChatGPT about optimization strategies.

I selected the **Gaussian dataset**, which generates two clusters of points centered around different means in 2D space.  
The decision boundary is relatively simple: a smooth curve, or even a nearly straight line, separating the two clusters.

---

### Feature Selection

The next step was to identify the most relevant features for the problem. Here is my reasoning:

| Feature             | Description           | Usefulness for Gaussian Dataset          |
|---------------------|-----------------------|------------------------------------------|
| `x`, `y`            | Raw input coordinates | ✅ Essential, should be kept               |
| `x*y`               | Interaction term      | ✅ Helps model curvature                   |
| `x²`, `y²`          | Quadratic terms       | ✅ Important for capturing elliptical shapes |
| `sin(x)`            | Periodic function     | ❌ Not useful in this case                 |
| `cos(x)`            | Periodic function     | ❌ Not useful                             |

Therefore, I decided to keep the features `x`, `y`, `x*y`, `x²`, and `y²`, discarding the periodic ones that would add complexity without benefit.
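The reason quadratic terms help with elliptical boundaries can be shown with a small sketch: an ellipse that is curved in (x, y) becomes a *linear* rule once x² and y² are available as inputs. The ellipse weights below are hand-picked for illustration; they are not what the playground learned.

```python
def features(x, y):
    """Map a raw point to the five inputs kept above: x, y, x*y, x^2, y^2."""
    return [x, y, x * y, x * x, y * y]

def linear_score(weights, bias, x, y):
    """A single linear unit applied to the engineered features."""
    return sum(w * f for w, f in zip(weights, features(x, y))) + bias

# Hypothetical ellipse x^2/4 + y^2 = 1, written as a linear rule in (x^2, y^2):
# score = 1 - x^2/4 - y^2 is positive inside the ellipse and negative outside.
ellipse_weights = [0.0, 0.0, 0.0, -0.25, -1.0]
ellipse_bias = 1.0

for point in [(0.0, 0.0), (1.5, 0.5), (3.0, 0.0), (0.0, 1.5)]:
    side = "inside" if linear_score(ellipse_weights, ellipse_bias, *point) > 0 else "outside"
    print(point, side)
```

With only `x` and `y` as inputs, a single linear unit could draw nothing but a straight line; the squared features let even the first layer express curvature.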

---

### Network Architecture

Regarding the network design, a common rule of thumb is that each hidden layer should have more neurons than there are input features.  
I manually chose a network with **4 hidden layers** of **8 neurons** each.
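As a rough sense of model size, this sketch counts the weights and biases of the chosen fully connected network and checks the width heuristic. It assumes a single output neuron (blue vs. orange) and the five input features selected above.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network, given [inputs, hidden..., outputs]."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

n_features = 5                       # x, y, x*y, x^2, y^2
hidden = [8, 8, 8, 8]                # 4 hidden layers of 8 neurons
shape = [n_features] + hidden + [1]  # assumption: one output neuron

# Rule of thumb from the text: every hidden layer wider than the number of inputs.
assert all(width > n_features for width in hidden)
print(count_parameters(shape))  # 273
```

The count is 48 (input to layer 1) + 3 × 72 (between hidden layers) + 9 (to the output) = 273.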

Initially, I was concerned about overfitting, but both the **training loss** and **test loss** dropped to (nearly) zero. Since the test loss tracked the training loss, the model is generalizing well and not overfitting.

---

### Activation Function

I chose the **`tanh`** activation function because:

- It handles nonlinearity smoothly, which is essential for the Gaussian dataset.
- Being smooth and bounded, it keeps training stable on noisy data (my noise level is 35).
- Its output is zero-centered, which helps the network converge more efficiently.
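The "zero-centered" point can be checked numerically. This sketch feeds inputs that are symmetric around zero to both tanh and the logistic sigmoid and compares the average outputs. It also verifies the identity tanh(z) = 2·sigmoid(2z) − 1, which shows tanh is a rescaled, shifted sigmoid.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean(values):
    return sum(values) / len(values)

inputs = [-3.0, -1.0, 0.0, 1.0, 3.0]  # symmetric around zero
tanh_out = [math.tanh(z) for z in inputs]
sigm_out = [sigmoid(z) for z in inputs]

print(f"mean tanh output:    {mean(tanh_out):+.3f}")  # about 0: outputs balanced around zero
print(f"mean sigmoid output: {mean(sigm_out):+.3f}")  # about 0.5: outputs always positive

# tanh is a rescaled sigmoid: tanh(z) = 2 * sigmoid(2z) - 1
assert all(abs(math.tanh(z) - (2 * sigmoid(2 * z) - 1)) < 1e-12 for z in inputs)
```

Because sigmoid outputs are all positive, the next layer's weight gradients tend to share a sign, which can slow convergence; tanh avoids that bias.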

---

### Regularization

I used **L2 regularization (Ridge)** for the following reasons:

- Since I already selected relevant features, there was no need for aggressive pruning (which L1 encourages).
- With this much noise, L2 is more stable than L1: it shrinks all weights gradually instead of pushing some of them abruptly to zero.
- The decision boundary is smooth, so L2 helps keep weights small and distributed rather than forcing sparsity.
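The difference between "small and distributed" (L2) and "sparse" (L1) shows up directly in the penalties and their gradients. In this minimal sketch, the L2 penalty uses the common 0.5·w² convention so its gradient is simply w; that factor is an assumption and only rescales the regularization rate.

```python
def l1_penalty(weights):
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    # 0.5 factor (assumed convention) so that the gradient is exactly w
    return 0.5 * sum(w * w for w in weights)

def l1_grad(w):
    # Subgradient of |w|: the same pull for every nonzero weight, so small weights reach 0
    return (w > 0) - (w < 0)

def l2_grad(w):
    # Gradient of 0.5 * w^2: pull proportional to the weight, so small weights barely move
    return w

for w in [2.0, 0.05]:
    print(f"w={w:<5} L1 pull={l1_grad(w):.2f}  L2 pull={l2_grad(w):.2f}")

concentrated = [1.0, 0.0]
spread = [0.5, 0.5]
print(l1_penalty(concentrated), l1_penalty(spread))  # 1.0 1.0 -> L1 is indifferent
print(l2_penalty(concentrated), l2_penalty(spread))  # 0.5 0.25 -> L2 prefers spread-out weights
```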

---

### Training Hyperparameters

I chose the following hyperparameters by hand:

- **Regularization rate** = 0.01 → balances penalizing large weights without overly restricting model capacity.
- **Learning rate** = 0.003 → slow enough to allow stable convergence without unwanted oscillations.
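To see how these two numbers interact, here is a sketch of a plain SGD step on loss + (λ/2)·w², with η = 0.003 and λ = 0.01. It is a generic weight-decay update, assumed for illustration; the playground's internal update may differ in details.

```python
def sgd_step(w, data_grad, learning_rate=0.003, reg_rate=0.01):
    """One SGD step on loss + (reg_rate / 2) * w**2, i.e. L2 weight decay."""
    return w - learning_rate * (data_grad + reg_rate * w)

# With no pull from the data, the L2 term alone multiplies a weight by
# (1 - 0.003 * 0.01) = 0.99997 per step.
w = 1.0
for _ in range(10_000):
    w = sgd_step(w, data_grad=0.0)
print(f"{w:.3f}")  # 0.741, roughly exp(-0.3)
```

The decay per step is tiny, so the regularization only wins against weights the data does not actively support.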

---

### Batch Size

Finally, I selected an intermediate batch size because:

- Very small batches produce noisy gradients and unstable training.
- Very large batches give smoother but fewer updates per epoch and can generalize worse on noisy data.
- An intermediate batch size strikes the best balance between stability and generalization.

I chose a batch size of 14, slightly above the playground's default of 10.
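The "noisy gradients" side of the trade-off can be simulated: averaging B per-example gradients shrinks the spread of the estimate roughly like 1/√B. This sketch uses a hypothetical setup in which each example's gradient is drawn from N(0.5, 1.0). It says nothing about the generalization side of the argument.

```python
import random
import statistics

def batch_gradient_spread(batch_size, n_batches=2000, seed=0):
    """Std. dev. of mini-batch gradient estimates (hypothetical per-example gradients ~ N(0.5, 1))."""
    rng = random.Random(seed)
    estimates = [
        statistics.fmean(rng.gauss(0.5, 1.0) for _ in range(batch_size))
        for _ in range(n_batches)
    ]
    return statistics.stdev(estimates)

for b in (1, 10, 14, 100):
    print(f"batch size {b:>3}: gradient std ~ {batch_gradient_spread(b):.3f}"
          f"  (theory 1/sqrt(B) = {b ** -0.5:.3f})")
```

Going from 10 to 14 reduces the noise only modestly (about 0.32 to 0.27 in this setup), which fits the idea of a small, intermediate adjustment.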


![TensorFlow playground result on the Gaussian dataset](TensorFlowoptimization.png)


## Spiral Dataset Optimization Strategy

Then I selected the **Spiral dataset**, a classic benchmark for testing a model’s ability to learn nonlinear and intertwined decision boundaries. The goal was to minimize the loss while maintaining generalization, despite the high noise level and fixed 50/50 train-test split.

### Feature Selection

Given the spiral’s complex, circular structure, I carefully selected features that could help the model capture nonlinear, curved patterns:

| Feature             | Description              | Usefulness for Spiral Dataset             |
|---------------------|--------------------------|-------------------------------------------|
| `x`, `y`            | Raw input coordinates    | Essential for spatial positioning         |
| `x²`, `y²`          | Quadratic terms          | Help model radial curvature               |
| `x*y`               | Interaction term         | Captures diagonal and rotational trends   | 


These features allow the network to learn curved boundaries, which are essential for separating the spiral arms.

### Network Architecture

To model the spiral’s complexity, I used a deep feedforward neural network with the following structure:

- 4 hidden layers
  - Layer 1: 8 neurons
  - Layer 2: 6 neurons
  - Layer 3: 4 neurons
  - Layer 4: 2 neurons

This architecture balances depth and compression, enabling the network to learn high-level abstractions and then refine them into a compact decision boundary.
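The narrowing 8 → 6 → 4 → 2 structure can be traced with a forward pass. This minimal sketch uses random, untrained weights and the five spiral features in the table's order. It applies ReLU on the hidden layers and tanh on a single output neuron; the output activation is an assumption made so that the score lands in (−1, 1).

```python
import math
import random

LAYER_SIZES = [5, 8, 6, 4, 2, 1]  # inputs (x, y, x^2, y^2, x*y), hidden 8-6-4-2, one output

def init_network(sizes, seed=0):
    """Random weights and small constant biases per layer (arbitrary scale, for illustration)."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)], [0.1] * n_out)
        for n_in, n_out in zip(sizes, sizes[1:])
    ]

def forward(network, inputs):
    """ReLU on hidden layers, tanh on the output layer (assumed)."""
    activations = inputs
    for i, (weights, biases) in enumerate(network):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        is_output = i == len(network) - 1
        activations = [math.tanh(z) if is_output else max(0.0, z) for z in pre]
    return activations

x, y = 1.0, -2.0
net = init_network(LAYER_SIZES)
print([len(weights) for weights, _ in net])      # neurons per layer: [8, 6, 4, 2, 1]
print(forward(net, [x, y, x * x, y * y, x * y]))  # a single score between -1 and 1
```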

### Activation Function

I chose **ReLU** as the activation function because:

- It introduces nonlinearity without saturating for positive inputs (unlike sigmoid or tanh, which flatten out for large inputs).
- It speeds up convergence and works well with deeper networks.
- Despite the spiral’s complexity, ReLU performed well when combined with the right features and architecture.
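Saturation is easiest to see in the derivatives. This sketch compares the slope of tanh with the slope of ReLU as the input grows: tanh's gradient collapses toward zero, while ReLU's stays at 1 for any positive input. The flip side is that ReLU's gradient is exactly 0 for negative inputs.

```python
import math

def tanh_grad(z):
    # d/dz tanh(z) = 1 - tanh(z)^2, which vanishes as |z| grows
    return 1.0 - math.tanh(z) ** 2

def relu_grad(z):
    # d/dz max(0, z): 1 for positive inputs, 0 for negative ones
    return 1.0 if z > 0 else 0.0

for z in [0.5, 2.0, 5.0, 10.0]:
    print(f"z={z:>4}: tanh' = {tanh_grad(z):.2e}   ReLU' = {relu_grad(z):.0f}")
```

In a deep network these slopes multiply layer by layer during backpropagation, which is why non-saturating activations help deeper stacks like the 4-layer one above.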

### Regularization

I applied **L2 regularization** with a rate of **0.001** to:

- Prevent overfitting in the presence of high noise (noise level 50).
- Encourage smaller, more distributed weights.
- Maintain smooth decision boundaries without forcing sparsity.

### Training Hyperparameters

- **Learning rate**: 0.03, fast enough to converge but low enough to avoid instability.
- **Epochs**: 824. Training continued until both training and test loss stabilized at 0.048; the absence of a gap between them indicates good generalization.
- **Batch size**: 25, a balanced choice that smooths gradient updates while preserving generalization.
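The "fast enough, but low enough" trade-off for the learning rate can be illustrated on the simplest possible loss, a quadratic f(w) = ½·c·w². Each gradient step multiplies w by (1 − η·c), so it converges only when η < 2/c. The curvature c = 20 is an arbitrary assumption; the real loss surface is far more complex, so this shows only the qualitative behavior.

```python
def gradient_descent(learning_rate, curvature=20.0, w0=1.0, steps=50):
    """Minimize f(w) = 0.5 * curvature * w**2 (minimum at w = 0) with plain gradient descent.

    Each step is w <- w - lr * curvature * w = (1 - lr * curvature) * w,
    so it converges only when |1 - lr * curvature| < 1, i.e. lr < 2 / curvature.
    """
    w = w0
    for _ in range(steps):
        w -= learning_rate * curvature * w
    return w

for lr in (0.003, 0.03, 0.3):
    print(f"lr={lr}: |w| after 50 steps = {abs(gradient_descent(lr)):.3g}")
```

With these assumptions, 0.003 is stable but slow, 0.03 converges quickly, and 0.3 overshoots further on every step and diverges.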




![TensorFlow playground result on the Spiral dataset](TensorFlowoptimization_2.png)