<a href="https://colab.research.google.com/github/reitezuz/18NES2-2025/blob/main/week_01/NN_libraries.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to deep-learning libraries in Python on a simple example (Breast Cancer Dataset)

### Deep Learning Libraries:

- **TensorFlow, PyTorch**: Low-level deep learning frameworks for defining and optimizing neural networks.
- **Keras**: High-level API that simplifies deep learning model creation. It runs on TensorFlow by default, and Keras 3 can also use JAX or PyTorch as a backend.
- **Lightning**: High-level API built on PyTorch, providing structured training loops and scalability.

### Other Useful Libraries:

- **Scikit-learn (sklearn)**: Utilities for dataset handling, preprocessing, and evaluation.
- **NumPy**: Efficient numerical computing, handling arrays and tensor-like structures.
- **Pandas**: Ideal for working with structured datasets, especially those containing categorical variables or missing values. Provides tools for data manipulation, filtering, and aggregation.


### Visualization Libraries:

- **Matplotlib:** The most widely used Python library for static visualizations (line plots, histograms, scatter plots, etc.).
- **Seaborn:** Built on Matplotlib, provides beautiful and easy-to-use statistical visualizations.
- **Plotly:** Interactive visualizations, useful for dashboards and deep exploration of data.
- **TensorBoard:** Built-in visualization tool for TensorFlow, used for monitoring model training and performance metrics.

### Machine Learning Workflow
1. Process the data
  - 1. Load, observe and analyze the data
  - 2. Clean and preprocess the data
2. Define the model
  - 1. Define the model architecture (type of the model, number of layers, activation functions)
  - 2. Set model hyperparameters (learning rate, batch size, optimizer, loss function,...)
3. Train the model
4. Evaluate the model and make predictions
   - 1. Assess model performance on training, validation and test data
   - 2. Use the trained model to predict on new unseen data

## 1. Process the data
1. Load and analyze the data
 - load the Breast Cancer Dataset (https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic, https://scikit-learn.org/stable/api/sklearn.datasets.html, https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_breast_cancer.html)
    - a nice dataset of 569 samples with 30 input features (real-valued, non-negative) and one output feature (binary label)



In [1]:
from sklearn.datasets import load_breast_cancer
import numpy as np

# Load and observe the data:
cancer = load_breast_cancer()  # dict
print(cancer.keys())

# Observe shape:
print("Input data shape:", cancer['data'].shape)
print("Target data shape:", cancer['target'].shape, "\n")

# Analyze the label distribution:
print("Target names:", cancer['target_names'])
print("Target distribution:", np.bincount(cancer['target']), "\n")

# Observe input data:
print("Feature names:", cancer['feature_names'] )
print("Minimum, mean and maximum input data values: ", np.min(cancer['data']), np.mean(cancer['data']), np.max(cancer['data']))
#print("Feature means:", np.mean(cancer['data'], axis=0))
#print("Feature description:", cancer['DESCR'])  # description of the data
print()

# Check for missing values
print("Number of missing values in the data:", np.sum(np.isnan(cancer['data'])))
print("Number of missing values in the labels:", np.sum(np.isnan(cancer['target'])))
print()

import pandas as pd
df = pd.DataFrame(cancer['data'], columns=cancer['feature_names']) # Create a Pandas DataFrame for easier analysis
#print(df.describe())                                               # Calculate statistics for each column

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])
Input data shape: (569, 30)
Target data shape: (569,) 

Target names: ['malignant' 'benign']
Target distribution: [212 357] 

Feature names: ['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
Minimum, mean and maximum input data values:  0.0 61.890712339519624 4254.0

Number of missing values in the data: 0
Number of missing values in the labels: 0
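The class counts printed above give a useful reference point: a classifier that always predicts the majority class would already be right on well over half of the samples, so any trained model should be judged against that baseline. A minimal sketch, with the counts copied from the output above (the variable names are illustrative only):

```python
# Class counts copied from the "Target distribution" output above
counts = {"malignant": 212, "benign": 357}

total = sum(counts.values())
proportions = {name: n / total for name, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the most frequent class
majority_class = max(counts, key=counts.get)
baseline_accuracy = counts[majority_class] / total

print(f"Class proportions: {proportions}")
print(f"Majority class: {majority_class}, baseline accuracy: {baseline_accuracy:.4f}")
```

The classes are moderately imbalanced, which is one reason to look at precision and recall in addition to accuracy.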



2. Preprocess and clean the data

In [2]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X, y = cancer.data, cancer.target

# Preprocess the data:
# 1. Reshape (vectorize) the input data to have the shape (number of samples, number of features) ... Already in correct format
# 2. Convert the data into floating-point numbers ... Already in correct format
# 3. Resolve missing values, incorrect values... not needed (data is clean)
# 4. One-hot encode the labels (for multi-class problems)... not needed (binary classification)
# 5. Data augmentation,... not used here

# 6. Split the dataset into training, validation and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

# 7. Normalize the features to improve training stability
#    (fit the scaler on the training set only, then apply it to the validation and test sets)
# - StandardScaler: Centers data around zero and scales to unit variance
# - MinMaxScaler: Scales features to a given range (default [0,1], here [-1,1])
scaler = StandardScaler()
#scaler = MinMaxScaler(feature_range=(-1, 1)) # alternative option
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_val = scaler.transform(X_val)
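To make the fit/transform split explicit, here is a small NumPy sketch of what standardization does: the per-feature mean and standard deviation are estimated on the training split only and then reused for the other splits. The data is synthetic and the `*_demo` names are placeholders, not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in data (assumption): 100 "training" and 20 "test" rows, 3 features
X_train_demo = rng.normal(loc=[10.0, 200.0, 0.5], scale=[2.0, 50.0, 0.1], size=(100, 3))
X_test_demo = rng.normal(loc=[10.0, 200.0, 0.5], scale=[2.0, 50.0, 0.1], size=(20, 3))

# "fit": estimate per-feature statistics on the training split only
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)   # population std (ddof=0), as StandardScaler uses

# "transform": apply the training statistics to every split
X_train_scaled = (X_train_demo - mean) / std
X_test_scaled = (X_test_demo - mean) / std

print("Train mean per feature:", X_train_scaled.mean(axis=0).round(6))
print("Train std per feature: ", X_train_scaled.std(axis=0).round(6))
print("Test mean per feature: ", X_test_scaled.mean(axis=0).round(3))
```

The training split ends up with zero mean and unit variance by construction, while the test split is only approximately standardized. That is expected: using test statistics for scaling would leak information from the test set into preprocessing.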


## Define and train the model in Keras
- MLP model for binary classification:
  - activations:
    - 'sigmoid' activation function in the output layer
    - 'relu' or 'tanh' in the hidden layers
  - loss function: 'binary_crossentropy'
  - metrics: 'accuracy', 'Precision', 'Recall', 'F1-score'...
  - optimizer: 'adam', 'sgd',...
  - batch size: 16, 32, 64, 128, 256
  - epochs: experimentally determined
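The sigmoid output, the binary cross-entropy loss and the metrics listed above can be written out in a few lines of NumPy. This is only a sketch for intuition: the function names are illustrative, the labels and scores are made up, and the clipping constant `eps` is an assumption rather than a value taken from any particular library.

```python
import numpy as np

def sigmoid(z):
    """Map raw scores (logits) to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)]; p is clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def precision_recall_f1(y_true, y_pred):
    """Metrics for the positive class (label 1) from hard 0/1 predictions."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return float(precision), float(recall), float(f1)

# Toy labels and raw scores (made up for illustration)
y_true = np.array([1, 0, 1, 1, 0, 0])
logits = np.array([2.0, -1.0, 0.5, -0.5, 1.5, -2.0])

probs = sigmoid(logits)
y_pred = (probs > 0.5).astype(int)   # threshold the probabilities at 0.5

print("Loss:", binary_crossentropy(y_true, probs))
print("Accuracy:", float(np.mean(y_pred == y_true)))
print("Precision, recall, F1:", precision_recall_f1(y_true, y_pred))
# A model that outputs 0.5 for every sample has loss ln(2), about 0.693
print("Loss at p = 0.5:", binary_crossentropy(y_true, np.full(6, 0.5)))
```

The value ln(2) is a handy reference: a freshly initialized network that is still close to guessing typically starts with a loss around 0.69.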


In [3]:
import numpy as np
import keras
from keras import layers
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer

##########################################################################
# 1. Load, observe and analyze the data:
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target  # Features and labels

##########################################################################
# 2. Preprocess and clean the data:
# Split the dataset into training, validation and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

# Standardize the features to improve training stability
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_val = scaler.transform(X_val)

##########################################################################
# 3. Define the MLP model using Keras
# 3a.  Define the model architecture (type of the model, number of layers, activation functions)
model = keras.Sequential([
    layers.InputLayer(shape=(X_train.shape[1],)),  # Input Layer
    layers.Dense(30, activation='relu'),  # First hidden layer
    layers.Dense(15, activation='relu'),  # Second hidden layer
    layers.Dense(1, activation='sigmoid')  # Output layer for binary classification
])

#3b. Set model hyperparameters (Appropriate loss function, optimizer and its parameters, evaluation metrics)
#model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01), loss='binary_crossentropy', metrics=['accuracy'])
##########################################################################
# 4. Train the model
history = model.fit(X_train, y_train, epochs=50, batch_size=16, validation_data=(X_val, y_val))

##########################################################################
# 5. Evaluate the model and make predictions
# 5a. Evaluate the model on the train, validation and test sets
test_loss, test_acc = model.evaluate(X_test, y_test)
train_loss, train_acc = model.evaluate(X_train, y_train)
val_loss, val_acc = model.evaluate(X_val, y_val)
print(f"Train Accuracy: {train_acc:.4f} | Train Loss: {train_loss:.4f}")
print(f"Validation Accuracy: {val_acc:.4f} | Validation Loss: {val_loss:.4f}")
print(f"Test Accuracy: {test_acc:.4f} | Test Loss: {test_loss:.4f}")

# 5b. Make predictions
predictions = (model.predict(X_test) > 0.5).astype(int)
print(predictions[:10].T)

Epoch 1/50
23/23 ━━━━━━━━━━━━━━━━━━━━ 7s 139ms/step - accuracy: 0.5219 - loss: 0.6926 - val_accuracy: 0.8242 - val_loss: 0.5547
Epoch 2/50
23/23 ━━━━━━━━━━━━━━━━━━━━ 5s 14ms/step - accuracy: 0.8422 - loss: 0.5162 - val_accuracy: 0.8901 - val_loss: 0.4507
Epoch 3/50
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.9259 - loss: 0.4004 - val_accuracy: 0.9451 - val_loss: 0.3829
Epoch 4/50
23/23 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.9322 - loss: 0.3387 - val_accuracy: 0.9121 - val_loss: 0.3343
Epoch 5/50
23/23 ━━━━━━━━━━━━━━━━━━━━ 1s 11ms/step - accuracy: 0.9470 - loss: 0.2938 - val_accuracy: 0.9121 - val_loss: 0.2989
Epoch 6/50
23/23 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - accuracy: 0.9624 - loss: 0.2647 - val_accuracy: 0.9121 - val_loss: 0.2728
Epoch 7/50
23/23 ━━━

### Keras backend
- The default backend is TensorFlow, but it can be changed:

In [4]:
from keras import backend

print(f"Current backend: {backend.backend()}")

Current backend: tensorflow


In [5]:
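# Restart the kernel before running this cell! KERAS_BACKEND is only read when Keras
# is first imported, so setting it after `import keras` has no effect.
import os
os.environ["KERAS_BACKEND"] = "torch"  # options: "tensorflow", "jax", "torch"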
from keras import backend

print(f"Current backend: {backend.backend()}")


Current backend: tensorflow
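The output above still reports `tensorflow`, which is what one would expect if Keras had already been imported in the running kernel: the backend is chosen when Keras is first imported and cannot be switched afterwards. A small guard along the following lines can make that situation visible. The helper name `select_keras_backend` is hypothetical and not part of Keras.

```python
import os
import sys

def select_keras_backend(name):
    """Set KERAS_BACKEND if Keras has not been imported yet.

    Returns True if the variable was set in time, False if Keras is already
    loaded (in which case the kernel must be restarted first).
    """
    if "keras" in sys.modules:
        print("Keras is already imported; restart the kernel, then set the backend.")
        return False
    os.environ["KERAS_BACKEND"] = name
    return True

if select_keras_backend("torch"):
    print("KERAS_BACKEND =", os.environ["KERAS_BACKEND"])
    # A subsequent `import keras` would now pick up the requested backend,
    # provided that backend library (here PyTorch) is installed.
```

Calling the helper at the very top of a fresh notebook, before any Keras or TensorFlow import, is the intended usage.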


## Define and train the same model in TensorFlow

In [6]:
##########################################################################
# 1. Load, observe and analyze the data:
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target  # Features and labels

##########################################################################
# 2. Preprocess and clean the data:
# Split the dataset into training, validation and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

# Standardize the features to improve training stability
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train).astype(np.float32)
X_test = scaler.transform(X_test).astype(np.float32)
X_val = scaler.transform(X_val).astype(np.float32)

##########################################################################
# 3. Define the MLP model using pure TensorFlow
import tensorflow as tf
class MLPModel(tf.Module):
    def __init__(self): # Initialize weights and biases for each layer
        super().__init__()
        self.w1 = tf.Variable(tf.random.normal([X_train.shape[1], 30]), dtype=tf.float32)
        self.b1 = tf.Variable(tf.zeros([30]), dtype=tf.float32)
        self.w2 = tf.Variable(tf.random.normal([30, 15]), dtype=tf.float32)
        self.b2 = tf.Variable(tf.zeros([15]), dtype=tf.float32)
        self.w3 = tf.Variable(tf.random.normal([15, 1]), dtype=tf.float32)
        self.b3 = tf.Variable(tf.zeros([1]), dtype=tf.float32)

    def __call__(self, x): # Forward pass through the network
        x = tf.nn.relu(tf.matmul(x, self.w1) + self.b1)
        x = tf.nn.relu(tf.matmul(x, self.w2) + self.b2)
        x = tf.nn.sigmoid(tf.matmul(x, self.w3) + self.b3)
        return x

# Initialize the model
model = MLPModel()

# Define the loss function and optimizer
loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

##########################################################################
# 4. Train the model
epochs = 50
batch_size = 16
train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train)).batch(batch_size)
val_dataset = tf.data.Dataset.from_tensor_slices((X_val, y_val)).batch(batch_size)

def train_step(model, x, y):
    with tf.GradientTape() as tape:
        predictions = model(x)                    # forward pass
        loss = loss_fn(y[:, None], predictions)   # compute the loss
    # Compute gradients of the loss with respect to model parameters ... backward pass
    gradients = tape.gradient(loss, [model.w1, model.b1, model.w2, model.b2, model.w3, model.b3])
    # Apply the computed gradients to update model parameters using the optimizer
    optimizer.apply_gradients(zip(gradients, [model.w1, model.b1, model.w2, model.b2, model.w3, model.b3]))
    return loss

# Train the model for the specified number of epochs
# (the value printed per epoch is the loss of the last mini-batch, not the epoch average)
for epoch in range(epochs):
    for x_batch, y_batch in train_dataset:
        train_loss = train_step(model, x_batch, y_batch)
    print(f"Epoch {epoch+1}, Train Loss: {train_loss.numpy():.10f}")

##########################################################################
# 5. Evaluate the model and make predictions
# 5a. Evaluate the model on the train, validation and test sets
def compute_loss_and_accuracy(model, X, y):
    predictions = model(X)
    loss = loss_fn(y[:, None], predictions).numpy()
    accuracy = np.mean((predictions.numpy().flatten() > 0.5).astype(int) == y)
    return loss, accuracy

train_loss, train_acc = compute_loss_and_accuracy(model, X_train, y_train)
val_loss, val_acc = compute_loss_and_accuracy(model, X_val, y_val)
test_loss, test_acc = compute_loss_and_accuracy(model, X_test, y_test)

print(f"Train Accuracy: {train_acc:.4f} | Train Loss: {train_loss:.4f}")
print(f"Validation Accuracy: {val_acc:.4f} | Validation Loss: {val_loss:.4f}")
print(f"Test Accuracy: {test_acc:.4f} | Test Loss: {test_loss:.4f}")

# 5b. Make predictions
predictions = (model(X_test).numpy().flatten() > 0.5).astype(int)
print("Sample Predictions:", predictions[:10])

Epoch 1, Train Loss: 2.3737542629
Epoch 2, Train Loss: 1.6443748474
Epoch 3, Train Loss: 1.4101877213
Epoch 4, Train Loss: 1.3540862799
Epoch 5, Train Loss: 1.3431926966
Epoch 6, Train Loss: 1.3382281065
Epoch 7, Train Loss: 1.3356841803
Epoch 8, Train Loss: 1.3337393999
Epoch 9, Train Loss: 1.3323987722
Epoch 10, Train Loss: 1.3317096233
Epoch 11, Train Loss: 1.3312586546
Epoch 12, Train Loss: 1.3309545517
Epoch 13, Train Loss: 1.3308404684
Epoch 14, Train Loss: 1.3307992220
Epoch 15, Train Loss: 1.3307912350
Epoch 16, Train Loss: 1.3308336735
Epoch 17, Train Loss: 1.3307907581
Epoch 18, Train Loss: 1.3305640221
Epoch 19, Train Loss: 1.3303245306
Epoch 20, Train Loss: 1.3301476240
Epoch 21, Train Loss: 1.3299490213
Epoch 22, Train Loss: 1.3297969103
Epoch 23, Train Loss: 1.3296734095
Epoch 24, Train Loss: 1.3295692205
Epoch 25, Train Loss: 1.3294806480
Epoch 26, Train Loss: 1.3294019699
Epoch 27, Train Loss: 1.3293309212
Epoch 28, Train Loss: 1.3292657137
Epoch 29, Train Loss: 1.32920
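The loss printed above plateaus near 1.33. One plausible contributor (an assumption, not verified against this run) is the weight initialization: `tf.random.normal` draws weights with standard deviation 1, whereas Keras `Dense` layers default to Glorot uniform initialization, which uses a much narrower range. The NumPy sketch below compares the scale of first-layer pre-activations under both schemes, using synthetic standardized inputs as a stand-in for the real features.

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 30, 30                      # first hidden layer of the MLP above
X_demo = rng.standard_normal((1000, fan_in))  # stand-in for standardized features

# Scheme used in the TensorFlow cell above: standard normal weights (stddev = 1)
W_normal = rng.standard_normal((fan_in, fan_out))

# Glorot (Xavier) uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))
limit = np.sqrt(6.0 / (fan_in + fan_out))
W_glorot = rng.uniform(-limit, limit, size=(fan_in, fan_out))

std_normal = (X_demo @ W_normal).std()
std_glorot = (X_demo @ W_glorot).std()

print(f"Glorot limit: {limit:.4f}")
print(f"Pre-activation std, N(0, 1) weights: {std_normal:.2f}")
print(f"Pre-activation std, Glorot weights:  {std_glorot:.2f}")
```

With unit-variance weights the pre-activations are several times larger than with Glorot weights. Large values propagated through the layers can push the final sigmoid toward 0 or 1, where its gradient is small, so scaling down the initial weights is a common thing to try when a hand-written model trains much worse than its Keras counterpart.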

### GradientTape in TensorFlow
- A tensor in TensorFlow (tf.Tensor) is a multi-dimensional array similar to a NumPy array, but it can be placed on accelerators such as GPUs and TPUs
- A trainable tensor (tf.Variable) holds mutable state; trainable variables are tracked by GradientTape automatically
- tf.GradientTape() provides reverse-mode automatic differentiation (not symbolic differentiation):
    - It records the operations executed inside its context onto a "tape"
    - During the backward pass, it traverses the recorded operations in reverse and applies the chain rule, giving derivatives that are exact up to floating-point precision.

In [7]:
import tensorflow as tf

# Define a simple function: f(x) = x^2 + 3x + 5
def f(x):
    return x**2 + 3*x + 5

# Define input tensor
x = tf.Variable(2.0)  # Start with x = 2.0
#x = tf.Variable(np.array([-1, 0, 1, 2], dtype=np.float32))
#x = tf.Variable(np.array([[2, 1], [5, 0]], dtype=np.float32))

# Compute the derivative using GradientTape
with tf.GradientTape() as tape:
    y = f(x)  # Compute function value

# Compute df/dx ... backward pass
dy_dx = tape.gradient(y, x)

# Print the result
print(f"Function value at x=2: f(2) = {y.numpy()}")
print(f"Derivative at x=2: f'(2) = {dy_dx.numpy()}")  # Should be 2(2) + 3 = 7

Function value at x=2: f(2) = 15.0
Derivative at x=2: f'(2) = 7.0
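A simple way to sanity-check a gradient returned by automatic differentiation is to compare it with a finite-difference approximation. The sketch below does this in plain Python for the same function; the helper name `central_difference` and the step size `h` are illustrative choices.

```python
def f(x):
    return x**2 + 3*x + 5

def central_difference(func, x, h=1e-5):
    """Approximate func'(x) with the symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 2.0
numeric = central_difference(f, x0)
analytic = 2 * x0 + 3            # derivative of x^2 + 3x + 5

print(f"Numerical derivative at x={x0}: {numeric:.6f}")
print(f"Analytic derivative at x={x0}:  {analytic:.6f}")
```

Finite differences need extra function evaluations for every input dimension and are sensitive to the step size, which is why frameworks use automatic differentiation for training and reserve this kind of check for debugging.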


## Define and train the model in PyTorch:

In [8]:
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer
import numpy as np

##########################################################################
# 1. Load, observe and analyze the data
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target

##########################################################################
# 2. Preprocess and clean the data:
# Split dataset into train, validation, and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train).astype(np.float32)
X_test = scaler.transform(X_test).astype(np.float32)
X_val = scaler.transform(X_val).astype(np.float32)

# Convert to PyTorch tensors
X_train, X_test, X_val = map(torch.tensor, (X_train, X_test, X_val))
y_train, y_test, y_val = map(lambda y: torch.tensor(y, dtype=torch.float32).view(-1, 1), (y_train, y_test, y_val))

##########################################################################
# 3. Define the MLP model using pure PyTorch
class MLP(nn.Module):
    def __init__(self, input_dim): # Initialize weights and biases for each layer
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(input_dim, 30)
        self.fc2 = nn.Linear(30, 15)
        self.fc3 = nn.Linear(15, 1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x): # Forward pass through the network
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.sigmoid(self.fc3(x))
        return x

# Initialize model
model = MLP(input_dim=X_train.shape[1])

# Define loss function and optimizer
loss_fn = nn.BCELoss()  # Binary Cross Entropy Loss
optimizer = optim.SGD(model.parameters(), lr=0.01)

##########################################################################
# 4. Train the model
epochs = 50
batch_size = 16
train_dataset = torch.utils.data.TensorDataset(X_train, y_train)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

val_dataset = torch.utils.data.TensorDataset(X_val, y_val)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size)

def train(model, optimizer, loss_fn, train_loader, val_loader, epochs):
    for epoch in range(epochs):
        model.train()
        train_loss = 0
        for X_batch, y_batch in train_loader:
            optimizer.zero_grad()
            y_pred = model(X_batch)          # forward pass
            loss = loss_fn(y_pred, y_batch)  # compute the loss
            loss.backward()                  # backward pass: Compute gradients of the loss with respect to model parameters
            optimizer.step()                 # Apply the computed gradients to update model parameters using the optimizer
            train_loss += loss.item()

        # Validation step
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for X_batch, y_batch in val_loader:
                y_pred = model(X_batch)
                val_loss += loss_fn(y_pred, y_batch).item()

        print(f"Epoch {epoch+1}, Train Loss: {train_loss/len(train_loader):.4f}, Validation Loss: {val_loss/len(val_loader):.4f}")

# Train the model for the specified number of epochs
train(model, optimizer, loss_fn, train_loader, val_loader, epochs)

##########################################################################
# 5. Evaluate the model and make predictions
# 5a. Evaluate the model on the train, validation and test sets
def evaluate(model, X, y):
    model.eval()
    with torch.no_grad():
        y_pred = model(X)
        loss = loss_fn(y_pred, y).item()
        accuracy = ((y_pred > 0.5).float() == y).float().mean().item()
    return loss, accuracy

train_loss, train_acc = evaluate(model, X_train, y_train)
val_loss, val_acc = evaluate(model, X_val, y_val)
test_loss, test_acc = evaluate(model, X_test, y_test)

print(f"Train Accuracy: {train_acc:.4f} | Train Loss: {train_loss:.4f}")
print(f"Validation Accuracy: {val_acc:.4f} | Validation Loss: {val_loss:.4f}")
print(f"Test Accuracy: {test_acc:.4f} | Test Loss: {test_loss:.4f}")

# 5b. Make predictions
y_pred = (model(X_test) > 0.5).float()
print("Sample Predictions:", y_pred[:10].T.numpy())


Epoch 1, Train Loss: 0.6986, Validation Loss: 0.6926
Epoch 2, Train Loss: 0.6852, Validation Loss: 0.6809
Epoch 3, Train Loss: 0.6724, Validation Loss: 0.6693
Epoch 4, Train Loss: 0.6589, Validation Loss: 0.6571
Epoch 5, Train Loss: 0.6448, Validation Loss: 0.6440
Epoch 6, Train Loss: 0.6295, Validation Loss: 0.6295
Epoch 7, Train Loss: 0.6126, Validation Loss: 0.6132
Epoch 8, Train Loss: 0.5933, Validation Loss: 0.5947
Epoch 9, Train Loss: 0.5711, Validation Loss: 0.5736
Epoch 10, Train Loss: 0.5465, Validation Loss: 0.5501
Epoch 11, Train Loss: 0.5190, Validation Loss: 0.5242
Epoch 12, Train Loss: 0.4909, Validation Loss: 0.4967
Epoch 13, Train Loss: 0.4600, Validation Loss: 0.4680
Epoch 14, Train Loss: 0.4281, Validation Loss: 0.4385
Epoch 15, Train Loss: 0.3955, Validation Loss: 0.4092
Epoch 16, Train Loss: 0.3660, Validation Loss: 0.3809
Epoch 17, Train Loss: 0.3350, Validation Loss: 0.3539
Epoch 18, Train Loss: 0.3076, Validation Loss: 0.3289
Epoch 19, Train Loss: 0.2821, Validat
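One detail of the loop above: the summed batch losses are divided by the number of batches. Because the last batch of each epoch is smaller than the others, this is a close approximation to the per-sample mean loss rather than the exact value. The sketch below derives the split and batch sizes (assuming scikit-learn rounds the test size up) and uses made-up per-batch losses to show the difference.

```python
import math

n_total = 569
n_test = math.ceil(0.2 * n_total)           # first split: test set
n_trainval = n_total - n_test
n_val = math.ceil(0.2 * n_trainval)         # second split: validation set
n_train = n_trainval - n_val

batch_size = 16
n_batches = math.ceil(n_train / batch_size)
last_batch = n_train - (n_batches - 1) * batch_size

batch_sizes = [batch_size] * (n_batches - 1) + [last_batch]
# Hypothetical per-batch mean losses: 0.30 everywhere, 0.90 on the small last batch
batch_losses = [0.30] * (n_batches - 1) + [0.90]

mean_of_batch_means = sum(batch_losses) / n_batches
per_sample_mean = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / n_train

print(f"train/val/test sizes: {n_train}/{n_val}/{n_test}")
print(f"batches per epoch: {n_batches}, last batch size: {last_batch}")
print(f"mean of batch means: {mean_of_batch_means:.4f}")
print(f"per-sample mean:     {per_sample_mean:.4f}")
```

The number of batches per epoch agrees with the `23/23` progress counter in the Keras log. For monitoring purposes the discrepancy is usually negligible; weighting each batch loss by its size gives the exact mean when it matters.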

### **Autograd in PyTorch**
- A **tensor (`torch.Tensor`)** in PyTorch is a multi-dimensional array similar to a NumPy array that can also be moved to a GPU for accelerated computation.
- A **trainable tensor (`torch.nn.Parameter`)** is a tensor subclass that is registered as a model parameter when assigned to a `Module` attribute and has `requires_grad=True` by default.
- **`torch.autograd` enables automatic differentiation**:
    - It dynamically builds a computational graph, recording operations on tensors with `requires_grad=True`.
    - During the **backward pass**, it traverses the graph in reverse and applies the **chain rule**, giving derivatives that are exact up to floating-point precision (this is automatic, not symbolic, differentiation).
- **Differences from TensorFlow**:
    - PyTorch builds the computational graph dynamically as operations run (eager execution). TensorFlow 2 also runs eagerly by default and can additionally compile functions into static graphs with `tf.function`.
    - In PyTorch, gradients accumulate in `.grad` by default and must be reset manually, typically with `optimizer.zero_grad()`. In TensorFlow, `tape.gradient()` returns freshly computed gradients on each call, so nothing accumulates; a tape can be used for only one `gradient()` call unless `persistent=True` is set.
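To illustrate the ideas in this list, here is a toy reverse-mode autodiff in plain Python. It is a teaching sketch only, not how PyTorch is implemented: the `Value` class and its attributes are made up for this example, and it supports just scalar addition and multiplication. Like PyTorch, it adds each new gradient onto `.grad`, which shows why the gradients have to be reset between steps.

```python
class Value:
    """A scalar that records how it was computed (toy reverse-mode autodiff)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents      # pairs of (input Value, local derivative)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, ((self, other.data), (other, self.data)))

    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self):
        # Order the graph so that inputs come before the nodes that use them
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)

        upstream = {id(self): 1.0}   # d(output)/d(output) = 1
        for node in reversed(order): # walk from the output back to the inputs
            g = upstream.get(id(node), 0.0)
            node.grad += g           # accumulate, mimicking PyTorch's .grad behaviour
            for parent, local in node._parents:
                # chain rule: pass the gradient on, scaled by the local derivative
                upstream[id(parent)] = upstream.get(id(parent), 0.0) + g * local

def f(x):
    return x * x + 3 * x + 5

x = Value(2.0)
f(x).backward()
print("After first backward: ", x.grad)   # 7.0
f(x).backward()
print("After second backward:", x.grad)   # 14.0, the gradients were accumulated
x.grad = 0.0                               # analogous to optimizer.zero_grad()
f(x).backward()
print("After reset:          ", x.grad)   # 7.0
```

The graph is rebuilt on every call to `f(x)`, which is the "dynamic graph" behaviour described above.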


In [9]:
import torch

# Define a simple function: f(x) = x^2 + 3x + 5
def f(x):
    return x**2 + 3*x + 5

# Define input tensor – uncomment one of the following:
x = torch.tensor(2.0, requires_grad=True)  # Scalar
#x = torch.tensor([-1, 0, 1, 2], dtype=torch.float32, requires_grad=True)  # Vector
#x = torch.tensor([[2, 1], [5, 0]], dtype=torch.float32, requires_grad=True)  # Matrix

# Compute function value
y = f(x)

# Compute df/dx ... backward pass
# For scalar output: use backward() directly
# For non-scalar output: provide gradient of same shape
if y.numel() == 1:
    y.backward()
else:
    y.backward(torch.ones_like(y))

# Print results
print(f"Function value f(x):\n{y.detach().numpy()}")
print(f"Derivative f'(x):\n{x.grad.numpy()}")

Function value f(x):
15.0
Derivative f'(x):
7.0


## Define and train the model in PyTorch Lightning

In [10]:
!pip install pytorch-lightning

Collecting pytorch-lightning
  Downloading pytorch_lightning-2.5.5-py3-none-any.whl.metadata (20 kB)
Collecting torchmetrics>0.7.0 (from pytorch-lightning)
  Downloading torchmetrics-1.8.2-py3-none-any.whl.metadata (22 kB)
Collecting lightning-utilities>=0.10.0 (from pytorch-lightning)
  Downloading lightning_utilities-0.15.2-py3-none-any.whl.metadata (5.7 kB)
Downloading pytorch_lightning-2.5.5-py3-none-any.whl (832 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 832.4/832.4 kB 23.3 MB/s eta 0:00:00
Downloading lightning_utilities-0.15.2-py3-none-any.whl (29 kB)
Downloading torchmetrics-1.8.2-py3-none-any.whl (983 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 983.2/983.2 kB 61.2 MB/s eta 0:00:00
Installing collected packages: lightning-utilities, torchmetrics, pytorch-lightning
Successfully installed lightning-utilities-0.15.2 pytorch-lightning-2.5.5 torchmetrics-1.8.2


In [11]:
import torch
import torch.nn as nn
import torch.optim as optim
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer
import numpy as np

##########################################################################
# 1. Load, observe and analyze the data
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target


##########################################################################
# 2. Preprocess and clean the data:
# Split the dataset into training, validation and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train).astype(np.float32)
X_test = scaler.transform(X_test).astype(np.float32)
X_val = scaler.transform(X_val).astype(np.float32)

# Convert to PyTorch tensors
X_train, X_test, X_val = map(torch.tensor, (X_train, X_test, X_val))
y_train, y_test, y_val = map(lambda y: torch.tensor(y, dtype=torch.float32).view(-1, 1), (y_train, y_test, y_val))

# Create DataLoaders
batch_size = 16
train_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=batch_size, shuffle=True)
val_loader = DataLoader(TensorDataset(X_val, y_val), batch_size=batch_size)
test_loader = DataLoader(TensorDataset(X_test, y_test), batch_size=batch_size)

##########################################################################
# 3. Define the MLP model using PyTorch Lightning
class MLP(pl.LightningModule):
    def __init__(self, input_dim):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(input_dim, 30)
        self.fc2 = nn.Linear(30, 15)
        self.fc3 = nn.Linear(15, 1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        self.loss_fn = nn.BCELoss()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.sigmoid(self.fc3(x))
        return x

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_pred = self(x)                                 # forward pass
        loss = self.loss_fn(y_pred, y)                   # compute the loss
        self.log("train_loss", loss, prog_bar=True)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_pred = self(x)
        loss = self.loss_fn(y_pred, y)
        self.log("val_loss", loss, prog_bar=True)

    def configure_optimizers(self):
        return optim.SGD(self.parameters(), lr=0.01)

# Initialize model
model = MLP(input_dim=X_train.shape[1])

# 4. Train the model using PyTorch Lightning Trainer
epochs = 50
trainer = pl.Trainer(max_epochs=epochs, log_every_n_steps=10)
trainer.fit(model, train_loader, val_loader)

# 5. Evaluate the model
def evaluate(model, dataloader):
    model.eval()
    total_loss = 0
    correct = 0
    with torch.no_grad():
        for x, y in dataloader:
            y_pred = model(x)
            loss = model.loss_fn(y_pred, y).item()
            correct += ((y_pred > 0.5).float() == y).sum().item()
            total_loss += loss
    accuracy = correct / len(dataloader.dataset)
    return total_loss / len(dataloader), accuracy

train_loss, train_acc = evaluate(model, train_loader)
val_loss, val_acc = evaluate(model, val_loader)
test_loss, test_acc = evaluate(model, test_loader)

print(f"Train Accuracy: {train_acc:.4f} | Train Loss: {train_loss:.4f}")
print(f"Validation Accuracy: {val_acc:.4f} | Validation Loss: {val_loss:.4f}")
print(f"Test Accuracy: {test_acc:.4f} | Test Loss: {test_loss:.4f}")

# 5b. Make predictions
y_pred = (model(X_test) > 0.5).float()
print("Sample Predictions:", y_pred[:10].T.numpy())

INFO:pytorch_lightning.utilities.rank_zero:💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:pytorch_lightning.callbacks.model_summary:
  | Name    | Type    | Params | Mode 
--------------------------------------------
0 | fc1     | Linear  | 930    | train
1 | fc2     | Linear  | 465    | train
2 | fc3     | Linear  | 16     | train
3 | relu    | ReLU    | 0      | train
4 | sigmoid | Sigmoid | 0      | train
5 | loss_fn | BCELoss | 0      | train
--------------------------------------------
1.4 K     Tra

Sanity Checking: |          | 0/? [00:00<?, ?it/s]

Training: |          | 0/? [00:00<?, ?it/s]

Validation: |          | 0/? [00:00<?, ?it/s]


INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=50` reached.


Train Accuracy: 0.9890 | Train Loss: 0.0663
Validation Accuracy: 0.9670 | Validation Loss: 0.1275
Test Accuracy: 0.9649 | Test Loss: 0.1033
Sample Predictions: [[1. 0. 0. 1. 1. 0. 0. 0. 0. 1.]]
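The parameter counts in the Lightning model summary above (930, 465 and 16) follow directly from the layer sizes: a fully connected layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. A quick check in plain Python (the helper name `dense_param_count` is illustrative):

```python
def dense_param_count(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

layer_sizes = [30, 30, 15, 1]   # input features, two hidden layers, output unit
counts = [dense_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print("Parameters per layer:", counts)
print("Total parameters:", sum(counts))
```

The same architecture is used in the Keras, TensorFlow and PyTorch versions, so all four models have the same number of trainable parameters.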


# **Do the environments use GPU implicitly?**
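- **TensorFlow/Keras** - Yes, if a GPU is available, operations are placed on it automatically.  
- **PyTorch** - No, tensors and models are created on the CPU unless explicitly moved to the GPU (`model.to("cuda")`, `tensor.to("cuda")`).  
- **PyTorch Lightning** - Yes. In Lightning 2.x the `Trainer` defaults to `accelerator="auto"`, which selects a GPU when one is available (the training log above reports `GPU available: True (cuda), used: True` although no accelerator was specified). Pass `accelerator="gpu"` or `accelerator="cpu"` to choose explicitly.  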

In [12]:
# Tensorflow:
import tensorflow as tf
print("Is TensorFlow using GPU?", tf.config.list_physical_devices('GPU'))

#import os
#os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # hide all GPUs to force CPU; set this before TensorFlow initializes the GPU

Is TensorFlow using GPU? [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


In [13]:
# Pytorch:
import torch
print("Is CUDA available?", torch.cuda.is_available())

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
X_train, y_train = X_train.to(device), y_train.to(device)

Is CUDA available? True


In [14]:
# Lightning:
trainer = pl.Trainer(accelerator="gpu", devices=1)  # explicitly request one GPU
#trainer = pl.Trainer(accelerator="cpu")            # force CPU

INFO:pytorch_lightning.utilities.rank_zero:💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs


### **Using GPU in Google Colab**
By default, Google Colab runs on a CPU. To enable GPU acceleration:

1. **Go to:** `Runtime` → `Change runtime type`
2. **Select:** `Hardware accelerator` → `GPU`
3. **Click:** `Save`

To verify that a GPU is available in PyTorch or TensorFlow, run:


In [15]:
import torch
print("Is GPU available?", torch.cuda.is_available())
print("GPU name:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "No GPU found")

Is GPU available? True
GPU name: Tesla T4


In [16]:
import tensorflow as tf

if tf.config.list_physical_devices('GPU'):
  print("GPU is available and being used by TensorFlow")
else:
  print("GPU is not available or not being used by TensorFlow")

GPU is available and being used by TensorFlow


# Current versions of packages:

In [17]:
import tensorflow as tf
import keras
import torch

print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")
print(f"PyTorch version: {torch.__version__}")

TensorFlow version: 2.19.0
Keras version: 3.10.0
PyTorch version: 2.8.0+cu126
