## Training a Multi-Layer Perceptron (MLP) with Random Data

### Introduction
In this notebook, we will use a randomly generated dataset to train a Multi-Layer Perceptron (MLP) using PyTorch. Because the labels are random, the goal is to practise the end-to-end workflow of training a neural network rather than to reach high accuracy, and no pre-existing dataset such as MNIST is needed.

### Mathematical Formulation

We will generate random inputs $\mathbf{X} \in \mathbb{R}^{N \times D}$, where $N$ is the number of samples and $D$ is the number of features, and corresponding labels $\mathbf{y} \in \{0, 1\}^{N}$, so each target is a random binary label (0 or 1).

The neural network will have:
1. An input layer with $D$ input features.
2. Two hidden layers with ReLU activations.
3. An output layer producing a probability (output between 0 and 1) using the sigmoid function.
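
Written out for a single input $\mathbf{x} \in \mathbb{R}^{D}$, the forward pass can be sketched as follows. The hidden widths $H_1$ and $H_2$ and the parameter names $\mathbf{W}_k$, $\mathbf{b}_k$ are notation assumed here for illustration:

$$
\begin{aligned}
\mathbf{h}_1 &= \mathrm{ReLU}(\mathbf{W}_1 \mathbf{x} + \mathbf{b}_1), & \mathbf{W}_1 &\in \mathbb{R}^{H_1 \times D} \\
\mathbf{h}_2 &= \mathrm{ReLU}(\mathbf{W}_2 \mathbf{h}_1 + \mathbf{b}_2), & \mathbf{W}_2 &\in \mathbb{R}^{H_2 \times H_1} \\
\hat{y} &= \sigma(\mathbf{w}_3^{\top} \mathbf{h}_2 + b_3), & \sigma(z) &= \frac{1}{1 + e^{-z}}
\end{aligned}
$$

With the binary cross-entropy loss used later in this notebook, the quantity minimized over the $N$ training samples is

$$
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right].
$$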

---

### 1. Install PyTorch and Dependencies

```python
# Install PyTorch and the other packages imported below (skip if already installed)
!pip install torch matplotlib numpy
```

### 2. Import Libraries


```python
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
import numpy as np
```

### 3. Generate Random Data

Here, we will generate synthetic data with $N = 1000$ samples and $D = 20$ features. The labels will be binary (0 or 1) and randomly assigned.

- $\mathbf{X}$ is a tensor of shape $N \times D$, where each row represents an input sample with $D$ features.
- $\mathbf{y}$ is a tensor of shape $N$, where each value is either 0 or 1, representing the label for each sample.


```python
# Generate random data (N samples, D features)
N = 1000  # Number of samples
D = 20    # Number of features

# Random input data (features), drawn from a standard normal distribution
X = torch.randn(N, D)

# Random binary labels (0 or 1), cast to float because BCE loss expects float targets
y = torch.randint(0, 2, (N,)).float()
```


### 4. Define the MLP Model

We will define a simple MLP with:
1. An input layer of size $D$,
2. Two hidden layers (with 64 and 32 neurons respectively),
3. An output layer with 1 neuron, applying a sigmoid activation to output a probability.
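
One way to write the architecture listed above is sketched below. The class name `MLP` and the constructor arguments `hidden1` and `hidden2` are names chosen here for illustration; only the layer sizes (64, 32, 1) and the activations come from the description.

```python
import torch
import torch.nn as nn


class MLP(nn.Module):
    """D -> 64 -> 32 -> 1 network with ReLU hidden layers and a sigmoid output."""

    def __init__(self, input_dim, hidden1=64, hidden2=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden1),
            nn.ReLU(),
            nn.Linear(hidden1, hidden2),
            nn.ReLU(),
            nn.Linear(hidden2, 1),
            nn.Sigmoid(),  # squashes the single output into (0, 1)
        )

    def forward(self, x):
        return self.net(x)


model = MLP(input_dim=20)  # D = 20, as in the data-generation step
sample_probs = model(torch.randn(5, 20))  # shape (5, 1): one probability per row
print(model)
```

Note that the output has shape `(batch, 1)`, so targets need the same shape (or the output needs squeezing) before they are passed to the loss.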

### 5. Define Loss and Optimizer

For this binary classification task, we will use **Binary Cross-Entropy Loss** and the **Stochastic Gradient Descent (SGD)** optimizer.
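
A minimal sketch of this setup follows. The learning rate `lr=0.01` is an assumed value, not one given in this notebook, and the `nn.Sequential` model is a stand-in with the architecture from step 4 so that the cell runs on its own. The last lines check a useful reference point: a model that always predicts 0.5 has a BCE loss of $\ln 2 \approx 0.693$.

```python
import math

import torch
import torch.nn as nn
import torch.optim as optim

# Stand-in model (20 -> 64 -> 32 -> 1) so this cell runs on its own.
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)

criterion = nn.BCELoss()  # expects probabilities in [0, 1] and float targets
optimizer = optim.SGD(model.parameters(), lr=0.01)  # lr=0.01 is an assumed value

# Reference point: predicting 0.5 for every sample gives a loss of ln(2).
uninformed = torch.full((4, 1), 0.5)
targets = torch.tensor([[0.0], [1.0], [1.0], [0.0]])
baseline_loss = criterion(uninformed, targets).item()
print(f"baseline loss: {baseline_loss:.4f}  (ln 2 = {math.log(2):.4f})")
```

Since the labels in this notebook are random, a validation loss close to this baseline is what one would expect even after training.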


### 6. Split the Data into Training and Validation Sets

We will randomly split the data into 80% for training and 20% for validation.
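
The split can be sketched with a random permutation of the sample indices, as below. The variable names (`perm`, `train_idx`, `X_train`, and so on) and the fixed seed are choices made here for illustration, and the data is regenerated so the cell runs on its own.

```python
import torch

torch.manual_seed(0)  # assumed seed, only to make the split reproducible

N, D = 1000, 20
X = torch.randn(N, D)
y = torch.randint(0, 2, (N,)).float()

# Shuffle the indices once, then cut at the 80% mark.
perm = torch.randperm(N)
n_train = int(0.8 * N)
train_idx, val_idx = perm[:n_train], perm[n_train:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]

print(X_train.shape, X_val.shape)
```

Indexing both `X` and `y` with the same index tensor keeps each sample paired with its label.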

### 7. Train the Model

Now, we will define the training loop and train the model for several epochs.
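
A mini-batch training loop along these lines is sketched below. The number of epochs, batch size, learning rate, and seed are all assumed values, and the data, split, and model are rebuilt inside the cell so that it runs on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # assumed seed for reproducibility

# Data and 80/20 split; labels get shape (N, 1) to match the model output.
N, D = 1000, 20
X = torch.randn(N, D)
y = torch.randint(0, 2, (N,)).float().unsqueeze(1)
perm = torch.randperm(N)
n_train = int(0.8 * N)
X_train, y_train = X[perm[:n_train]], y[perm[:n_train]]
X_val, y_val = X[perm[n_train:]], y[perm[n_train:]]

model = nn.Sequential(
    nn.Linear(D, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)  # assumed learning rate

num_epochs = 20  # assumed
batch_size = 32  # assumed
train_loader = DataLoader(
    TensorDataset(X_train, y_train), batch_size=batch_size, shuffle=True
)

train_losses, val_losses = [], []
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for xb, yb in train_loader:
        optimizer.zero_grad()            # clear gradients from the previous step
        loss = criterion(model(xb), yb)  # forward pass and loss
        loss.backward()                  # backpropagate
        optimizer.step()                 # update the weights
        running_loss += loss.item() * xb.size(0)
    train_losses.append(running_loss / len(X_train))  # mean loss per sample

    model.eval()
    with torch.no_grad():
        val_losses.append(criterion(model(X_val), y_val).item())

    if (epoch + 1) % 5 == 0:
        print(f"epoch {epoch + 1:2d}  train loss {train_losses[-1]:.4f}  "
              f"val loss {val_losses[-1]:.4f}")
```

Because the labels carry no information about the inputs, the validation loss should stay near $\ln 2 \approx 0.693$; any drop in training loss below that mostly reflects memorization of the training set.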


### 8. Evaluate the Model

After training, we will evaluate the model on the validation set.
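
Evaluation could look like the sketch below. The helper name `evaluate` and the 0.5 decision threshold are assumptions made for illustration; the example call uses an untrained stand-in model and fresh random data so the cell runs on its own.

```python
import torch
import torch.nn as nn


def evaluate(model, X_val, y_val, criterion, threshold=0.5):
    """Return (loss, accuracy) on a held-out set without tracking gradients."""
    model.eval()
    with torch.no_grad():
        probs = model(X_val)                   # shape (M, 1)
        loss = criterion(probs, y_val).item()
        preds = (probs >= threshold).float()   # probability -> hard 0/1 label
        accuracy = (preds == y_val).float().mean().item()
    return loss, accuracy


# Illustration with an untrained stand-in model and random validation data.
torch.manual_seed(0)
X_val = torch.randn(200, 20)
y_val = torch.randint(0, 2, (200, 1)).float()
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)
val_loss, val_acc = evaluate(model, X_val, y_val, nn.BCELoss())
print(f"validation loss: {val_loss:.4f}, accuracy: {val_acc:.2%}")
```

With random labels, accuracy near 50% is the expected outcome; a much higher validation accuracy would more likely point to a leak between the training and validation sets than to real learning.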

### 9. Visualize Some Predictions

Finally, let's visualize some of the predictions made by the model on the validation data.
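
One possible visualization, sketched below, draws the predicted probability for the first few validation samples as bars and overlays the true labels as points. The number of samples shown (`k = 20`), the plot styling, and the untrained stand-in model are all assumptions made so the cell runs on its own.

```python
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in validation data and model.
X_val = torch.randn(200, 20)
y_val = torch.randint(0, 2, (200,))
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)

model.eval()
with torch.no_grad():
    probs = model(X_val).squeeze(1)  # shape (200,)

k = 20  # number of validation samples to display (assumed)
idx = torch.arange(k).numpy()

fig, ax = plt.subplots(figsize=(8, 3))
ax.bar(idx, probs[:k].numpy(), color="tab:blue", label="predicted probability")
ax.scatter(idx, y_val[:k].numpy(), color="tab:red", zorder=3, label="true label")
ax.axhline(0.5, color="gray", linestyle="--", linewidth=1)  # decision threshold
ax.set_xlabel("validation sample")
ax.set_ylabel("P(y = 1)")
ax.set_ylim(-0.05, 1.05)
ax.legend(loc="upper right")
plt.tight_layout()
plt.show()
```

A bar above the dashed line with a red point at 1, or below it with a red point at 0, marks a correct prediction.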