# PyTorch Mastery — 1. Simple Neural Network

Welcome! In this mini-tutorial, we build a simple fully connected (feed-forward) neural network for binary classification using synthetic data. The flow:

1) Import libraries
2) Generate a toy dataset
3) Train/test split and feature scaling
4) Convert to PyTorch tensors
5) Create TensorDataset and DataLoader
6) Define a small neural network
7) Choose loss function and optimizer
8) Train the model
9) Evaluate accuracy on the test set

Skim the code, then read the explanation cells that follow each code block.

In [13]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader, Dataset
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

## Imports explained

- `import torch`: Core PyTorch library for tensors and autograd.
- `import torch.nn as nn`: Neural network building blocks (layers, loss functions).
- `import torch.optim as optim`: Optimizers like SGD/Adam for updating weights.
- `import torch.nn.functional as F`: Functional API with stateless versions of activations and losses (e.g., `F.relu`). This notebook calls `torch.relu` directly, so `F` is imported but not used.
- `from torch.utils.data import TensorDataset, DataLoader, Dataset`:
  - `TensorDataset`: Wraps feature and label tensors into indexable pairs.
  - `DataLoader`: Batches and shuffles data, and iterates efficiently.
  - `Dataset`: Base class to build custom datasets (e.g., `MyDataset`) when you need per-item logic, transforms, or reading from disk.
- `from sklearn.datasets import make_classification`: Generates a synthetic classification dataset.
- `from sklearn.model_selection import train_test_split`: Splits data into train and test sets.
- `from sklearn.preprocessing import StandardScaler`: Standardizes features to zero mean and unit variance.

In [None]:
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, 
    n_redundant=2, n_classes=2, random_state=42
)

## Create a synthetic dataset

- `make_classification(...)` creates `X` (features) and `y` (labels) for a binary classification problem.
- `n_samples=1000`: number of rows.
- `n_features=10`: total input features.
- `n_informative=5`: features that actually matter for the classes.
- `n_redundant=2`: linear combinations of informative features.
- `n_classes=2`: binary classification.
- `random_state=42`: makes the random generation reproducible.
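
A quick sanity check of the generated arrays can be useful before moving on. The following is a minimal sketch that only inspects shapes and class counts; it repeats the `make_classification` call from the cell above so it runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    n_redundant=2, n_classes=2, random_state=42
)

print(X.shape, y.shape)   # (1000, 10) (1000,)
print(np.unique(y))       # the two class labels, 0 and 1
print(np.bincount(y))     # samples per class; roughly balanced by default
```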

In [15]:
import pandas as pd

dataframe = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(X.shape[1])])
dataframe['target'] = y
dataframe.head()

| | feature_0 | feature_1 | feature_2 | feature_3 | feature_4 | feature_5 | feature_6 | feature_7 | feature_8 | feature_9 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.1251 | 1.178124 | 0.493516 | 0.79088 | -0.614278 | 1.34702 | 1.419515 | 1.357325 | 0.966041 | -1.981139 | 1 |
| 1 | -0.564641 | 3.638629 | -1.522415 | -1.541705 | 1.616697 | 4.78131 | 3.190292 | -0.890254 | 1.438826 | -3.828748 | 0 |
| 2 | 0.516313 | 2.165426 | -0.628486 | -0.386923 | 0.492518 | 1.442381 | 1.332905 | -1.958175 | -0.348803 | -1.804124 | 0 |
| 3 | 0.537282 | 0.966618 | -0.11542 | 0.670755 | -0.958516 | 0.87144 | 0.508186 | -1.034471 | -1.654176 | -1.910503 | 1 |
| 4 | 0.278385 | 1.065828 | -1.724917 | -2.235667 | 0.715107 | 0.731249 | -0.674119 | 0.59833 | -0.524283 | 1.04761 | 0 |


In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


## Split into train and test

- `train_test_split(...)` splits arrays into training and testing subsets.
- `test_size=0.2`: 20% of data goes to testing.
- `random_state=42`: reproducible split.
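
To confirm what the split produces, the sizes of the resulting arrays can be checked directly. The sketch below also passes `stratify=y`, which is an optional variation not used in this notebook; it asks `train_test_split` to keep the class proportions the same in both subsets.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    n_redundant=2, n_classes=2, random_state=42
)

# stratify=y is an addition for illustration; the notebook's split omits it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)     # (800, 10) (200, 10)
print(y_train.mean(), y_test.mean())   # fraction of class 1 in each subset
```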

In [5]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

## Standardize features

- `StandardScaler` shifts each feature to mean 0 and scales to unit variance.
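- Fit on `X_train` only, then apply the learned mean and standard deviation to both train and test.
- Fitting on the full dataset would let test-set statistics leak into training and make the test score look better than it should.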
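
What `transform` does to the test set can be reproduced by hand: each feature is shifted and scaled with the training mean and standard deviation, $z = (x - \mu_{\text{train}}) / \sigma_{\text{train}}$. The sketch below uses made-up random data (an assumption, not the notebook's dataset) to show that equivalence.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data standing in for X_train / X_test.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(800, 10))
X_test = rng.normal(loc=5.0, scale=3.0, size=(200, 10))

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)   # learns mean/std from train only
X_test_s = scaler.transform(X_test)         # reuses the train statistics

# Manual version of the same transform, using train statistics.
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)

print(np.allclose(X_test_s, manual))                 # True
print(np.allclose(X_train_s.mean(axis=0), 0.0))      # True
print(np.allclose(X_train_s.std(axis=0), 1.0))       # True
```

The scaled test set will generally not have exactly zero mean or unit variance, because its own statistics were never used.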

In [6]:
X_train_tensor = torch.FloatTensor(X_train)
y_train_tensor = torch.LongTensor(y_train)
X_test_tensor = torch.FloatTensor(X_test)
y_test_tensor = torch.LongTensor(y_test)

## Convert NumPy arrays to PyTorch tensors

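- `torch.FloatTensor(X_train)`: features as 32-bit floats. NumPy arrays from scikit-learn are `float64`, while `nn.Linear` weights are `float32` by default, so the dtypes must be made to match.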
- `torch.LongTensor(y_train)`: labels as integer class indices (required by `CrossEntropyLoss`).
- Do the same for test data.

Note: `CrossEntropyLoss` expects raw, unnormalized scores (logits) as `FloatTensor` and target labels as `LongTensor` with values in `[0, num_classes-1]`.
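
The dtype conversion can be made explicit with `torch.tensor(..., dtype=...)`, which is an alternative to the legacy `FloatTensor`/`LongTensor` constructors used above. A minimal sketch on small made-up arrays (`X_np` and `y_np` are illustrative names, not from the notebook):

```python
import numpy as np
import torch

X_np = np.random.rand(4, 10)        # NumPy floats default to float64
y_np = np.array([0, 1, 1, 0])       # integer class labels

X_t = torch.tensor(X_np, dtype=torch.float32)
y_t = torch.tensor(y_np, dtype=torch.long)

print(X_np.dtype, "->", X_t.dtype)  # float64 -> torch.float32
print(y_t.dtype)                    # torch.int64

# Same values as the constructor used in the notebook.
print(torch.equal(X_t, torch.FloatTensor(X_np)))
```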

In [7]:
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32)


In [14]:
class MyDataset(Dataset):
    def __init__(self, X, y):
        """
        Initialize dataset with features and labels.
        Converts numpy arrays to PyTorch tensors.
        """
        self.X = torch.FloatTensor(X)
        self.y = torch.LongTensor(y)

    def __len__(self):
        """Return the total number of samples."""
        return len(self.X)

    def __getitem__(self, idx):
        """
        Retrieve a single sample at index `idx`.
        Must return (feature, label).
        """
        return self.X[idx], self.y[idx]


## Optional: Custom Dataset class (`MyDataset`)

This custom dataset mirrors what `TensorDataset` does for in-memory tensors, but gives you flexibility to add preprocessing, on-the-fly transforms, or load from disk.

- `__init__(self, X, y)`: stores features and labels as tensors (`FloatTensor` for features, `LongTensor` for integer class labels).
- `__len__(self)`: returns the total number of samples so the `DataLoader` knows how many batches to create.
- `__getitem__(self, idx)`: returns a single `(feature, label)` pair at index `idx`. You could augment data, normalize per item, or apply custom logic here.

How to use it instead of `TensorDataset`:

- Replace:
  - `train_dataset = TensorDataset(X_train_tensor, y_train_tensor)`
  - `test_dataset = TensorDataset(X_test_tensor, y_test_tensor)`
- With:
  - `train_dataset = MyDataset(X_train, y_train)`
  - `test_dataset = MyDataset(X_test, y_test)`

Note: In this version `MyDataset` converts NumPy arrays to tensors inside `__init__`. If you already have tensors (like `X_train_tensor`), you could modify `MyDataset` to accept tensors directly to avoid double conversion.
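
One way to act on the note above is to use `torch.as_tensor`, which accepts both NumPy arrays and existing tensors, and to add an optional per-item transform. The sketch below is illustrative only: `TransformDataset` and `add_noise` are hypothetical names, and the zero-valued data is made up.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class TransformDataset(Dataset):
    """Hypothetical variant of MyDataset with an optional per-item transform."""

    def __init__(self, X, y, transform=None):
        # as_tensor accepts NumPy arrays or tensors and casts to the given dtype.
        self.X = torch.as_tensor(X, dtype=torch.float32)
        self.y = torch.as_tensor(y, dtype=torch.long)
        self.transform = transform

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        x = self.X[idx]
        if self.transform is not None:
            x = self.transform(x)   # applied each time an item is fetched
        return x, self.y[idx]


def add_noise(x, std=0.01):
    """Illustrative augmentation: add small Gaussian noise to the features."""
    return x + std * torch.randn_like(x)


X = torch.zeros(8, 10)
y = torch.arange(8) % 2
ds = TransformDataset(X, y, transform=add_noise)
loader = DataLoader(ds, batch_size=4)

xb, yb = next(iter(loader))
print(xb.shape, yb.shape)   # torch.Size([4, 10]) torch.Size([4])
```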

## Create datasets and data loaders

- Option A (used in this notebook):
  - `TensorDataset(features, labels)`: pairs up tensors so each index returns `(X[i], y[i])`.
  - `DataLoader(..., batch_size=32, shuffle=True)`: mini-batch iterator; reshuffles the training data every epoch so batches differ between epochs and updates do not depend on the original sample order.
- Option B (optional custom dataset):
  - Use `MyDataset(X, y)` when you need to apply transforms or custom logic in `__getitem__`.
  - `DataLoader(MyDataset(...), batch_size=32, shuffle=True)` works the same.

The test loader typically does not need shuffling.
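
To see what a `DataLoader` yields, it helps to pull a single batch and look at its shapes. This is a minimal sketch with random stand-in tensors sized like the training split (800 samples, 10 features); the data itself is made up.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in tensors with the same shapes as the training split.
X = torch.randn(800, 10)
y = torch.randint(0, 2, (800,))

dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, shuffle=True)

print(len(dataset))   # 800 samples
print(len(loader))    # 25 batches, since 800 / 32 = 25

X_batch, y_batch = next(iter(loader))
print(X_batch.shape, y_batch.shape)   # torch.Size([32, 10]) torch.Size([32])
```

If the dataset size were not a multiple of the batch size, the last batch would simply be smaller unless `drop_last=True` is passed.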

In [8]:
class NeuralNet(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 32)
        self.fc2 = nn.Linear(32, 16)
        self.fc3 = nn.Linear(16, 2)  # 2 classes
        
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = NeuralNet(input_dim=10)


## Define the neural network

- Subclass `nn.Module` to define layers and the forward pass.
- `fc1`: Linear layer mapping input_dim → 32 hidden units.
- `fc2`: Linear layer mapping 32 → 16.
- `fc3`: Linear layer mapping 16 → 2 output logits (for 2 classes).
- In `forward`:
  - Apply `ReLU` after `fc1` and `fc2` for non-linearity.
  - `fc3` returns raw scores (logits). Don’t apply `softmax` here because `CrossEntropyLoss` handles it internally.
- Instantiate the model with `input_dim=10` to match `X`.
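
A cheap way to verify the architecture is to push a dummy batch through it and count the parameters. The sketch below redefines the same `NeuralNet` so it runs standalone; the dummy input is random.

```python
import torch
import torch.nn as nn


class NeuralNet(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 32)
        self.fc2 = nn.Linear(32, 16)
        self.fc3 = nn.Linear(16, 2)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.fc3(x)   # raw logits, no softmax


model = NeuralNet(input_dim=10)

dummy = torch.randn(4, 10)            # batch of 4 samples, 10 features each
logits = model(dummy)
print(logits.shape)                   # torch.Size([4, 2])

# Weights + biases: (10*32 + 32) + (32*16 + 16) + (16*2 + 2) = 914
n_params = sum(p.numel() for p in model.parameters())
print(n_params)                       # 914
```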

In [9]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)


## Loss function and optimizer

- `nn.CrossEntropyLoss()`:
  - Expects logits of shape `(batch_size, num_classes)` and targets of shape `(batch_size,)` with class indices.
  - Internally applies `log_softmax` + `nll_loss`.
- `optim.Adam(model.parameters(), lr=0.001)`: Adam optimizer over all model parameters; `lr=0.001` is also Adam's default learning rate and a reasonable starting point.
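
The statement that `CrossEntropyLoss` applies `log_softmax` followed by `nll_loss` can be checked numerically. For a batch of $N$ samples with logits $z_i$ and labels $y_i$, the loss is $-\frac{1}{N}\sum_i \log \operatorname{softmax}(z_i)_{y_i}$. A small sketch with hand-picked logits (made up for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up logits for 3 samples and 2 classes, plus their class indices.
logits = torch.tensor([[2.0, 0.5],
                       [0.1, 1.2],
                       [1.0, 1.0]])
targets = torch.tensor([0, 1, 0])

ce = nn.CrossEntropyLoss()(logits, targets)

# The same computation spelled out in two steps.
log_probs = F.log_softmax(logits, dim=1)
manual = F.nll_loss(log_probs, targets)

print(torch.allclose(ce, manual))   # True
```

This is also why applying `softmax` inside the model would be a mistake: the loss would then normalize the scores a second time.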

In [10]:
epochs = 20
for epoch in range(epochs):
    model.train()
    total_loss = 0
    for X_batch, y_batch in train_loader:
        optimizer.zero_grad()
        outputs = model(X_batch)
        loss = criterion(outputs, y_batch)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {total_loss/len(train_loader):.4f}")


Epoch [1/20], Loss: 0.6911
Epoch [2/20], Loss: 0.6419
Epoch [3/20], Loss: 0.5868
Epoch [4/20], Loss: 0.5190
Epoch [5/20], Loss: 0.4507
Epoch [6/20], Loss: 0.3891
Epoch [7/20], Loss: 0.3462
Epoch [8/20], Loss: 0.3207
Epoch [9/20], Loss: 0.3006
Epoch [10/20], Loss: 0.2865
Epoch [11/20], Loss: 0.2712
Epoch [12/20], Loss: 0.2595
Epoch [13/20], Loss: 0.2481
Epoch [14/20], Loss: 0.2376
Epoch [15/20], Loss: 0.2283
Epoch [16/20], Loss: 0.2192
Epoch [17/20], Loss: 0.2122
Epoch [18/20], Loss: 0.2050
Epoch [19/20], Loss: 0.1998
Epoch [20/20], Loss: 0.1932


## Training loop

- `epochs = 20`: number of full passes through the training data.
- `model.train()`: sets the model to training mode (affects layers like dropout/batchnorm; none here but good practice).
- For each batch:
  - `optimizer.zero_grad()`: clears old gradients.
  - `outputs = model(X_batch)`: forward pass to get logits.
  - `loss = criterion(outputs, y_batch)`: compute loss.
  - `loss.backward()`: backpropagates gradients.
  - `optimizer.step()`: updates weights.
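  - `total_loss += loss.item()`: accumulates the batch loss as a Python float.
- After each epoch, print the average loss per batch (`total_loss / len(train_loader)`).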
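
The reason for clearing gradients every step is that `backward()` adds to any gradients already stored on the parameters. The sketch below shows this in isolation with a hypothetical one-layer model (`layer`), which is not part of the notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)      # tiny stand-in model
x = torch.ones(2, 3)

# First backward pass: gradients are stored on the parameters.
layer(x).sum().backward()
g1 = layer.weight.grad.clone()

# Second backward pass WITHOUT clearing: gradients are summed.
layer(x).sum().backward()
g2 = layer.weight.grad.clone()
print(torch.allclose(g2, 2 * g1))                 # True

# After clearing, a fresh backward pass gives the single-pass gradient again.
layer.zero_grad()
layer(x).sum().backward()
print(torch.allclose(layer.weight.grad, g1))      # True
```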

In [11]:
model.eval()
correct = 0
total = 0

with torch.no_grad():
    for X_batch, y_batch in test_loader:
        outputs = model(X_batch)
        _, predicted = torch.max(outputs, 1)
        total += y_batch.size(0)
        correct += (predicted == y_batch).sum().item()

print(f"Test Accuracy: {100 * correct / total:.2f}%")


Test Accuracy: 94.00%


## Evaluation on the test set

- `model.eval()`: evaluation mode (dropout is turned off and batchnorm uses its running statistics instead of batch statistics, if such layers are present).
- `torch.no_grad()`: turns off gradient tracking for faster inference and lower memory use.
- For each batch in `test_loader`:
  - `outputs = model(X_batch)`: logits.
  - `_, predicted = torch.max(outputs, 1)`: class with highest logit per sample.
  - Accumulate `correct` and `total` to compute accuracy.
- Print final test accuracy.
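
The prediction step can be tried on a few hand-written logits. The sketch below (values made up for illustration) shows that `torch.max(outputs, 1)` and `argmax` pick the same classes, and that applying `softmax` first does not change the prediction.

```python
import torch

# Made-up logits for 3 samples and their true labels.
logits = torch.tensor([[1.5, -0.3],
                       [0.2, 0.9],
                       [-1.0, -2.0]])
labels = torch.tensor([0, 1, 1])

_, pred_max = torch.max(logits, 1)        # as in the notebook
pred_argmax = logits.argmax(dim=1)        # equivalent, slightly shorter
probs = torch.softmax(logits, dim=1)      # probabilities, rows sum to 1

print(pred_argmax)                                     # tensor([0, 1, 0])
print(torch.equal(pred_max, pred_argmax))              # True
print(torch.equal(probs.argmax(dim=1), pred_argmax))   # True

correct = (pred_argmax == labels).sum().item()
print(correct, "of", labels.size(0), "correct")        # 2 of 3 correct
```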

## Wrap-up and next steps

You trained a small feed-forward neural network on a synthetic binary classification dataset. Key takeaways:

- Standardize features for stable training.
- Use `CrossEntropyLoss` with raw logits and integer class labels; it works for two classes as well as many, and no `softmax` is needed in the model.
- Keep the model simple first; verify it learns, then iterate.

Try these extensions:

- Add `nn.Dropout` or `nn.BatchNorm1d` between layers.
- Tune hyperparameters: hidden sizes, epochs, learning rate, batch size.
- Add an `accuracy` calculation during training.
- Plot the training loss curve and confusion matrix for the test set.
- Switch to a real dataset (e.g., from your `Dataset/` folder) and adjust `input_dim` accordingly.
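
As a starting point for the first extension, here is one possible way to add `nn.BatchNorm1d` and `nn.Dropout` to the network. This is a sketch, not a tuned design: the class name `RegularizedNet`, the layer placement, and `p_drop=0.2` are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class RegularizedNet(nn.Module):
    """Hypothetical variant of NeuralNet with batchnorm and dropout."""

    def __init__(self, input_dim, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 32),
            nn.BatchNorm1d(32),     # normalizes the 32 hidden activations
            nn.ReLU(),
            nn.Dropout(p_drop),     # randomly zeroes activations in train mode
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.Linear(16, 2),       # logits for 2 classes
        )

    def forward(self, x):
        return self.net(x)


model = RegularizedNet(input_dim=10)
x = torch.randn(8, 10)

# In eval mode, dropout is off and batchnorm uses running statistics,
# so repeated forward passes give identical outputs.
model.eval()
with torch.no_grad():
    out_a = model(x)
    out_b = model(x)
print(torch.equal(out_a, out_b))   # True

# In train mode the output shape is the same, but dropout adds randomness.
model.train()
print(model(x).shape)              # torch.Size([8, 2])
```

With layers like these in the model, the `model.train()` and `model.eval()` calls in the training and evaluation cells start to matter.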