# Learning paradigms with PyTorch

This notebook explores various learning paradigms in deep learning, implemented using PyTorch. Deep learning has evolved to include diverse techniques that extend beyond traditional supervised learning. These paradigms enable models to perform better on complex tasks, adapt to new tasks with limited data, and leverage shared knowledge across multiple tasks.

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
import numpy as np

## Transfer learning
Transfer learning is a machine learning technique in which a model already trained on a large dataset is reused or fine-tuned on a new, often smaller, dataset. Instead of starting from scratch, we leverage the knowledge captured in the pre-trained model to improve the performance and efficiency of a new model. This is particularly valuable because training deep neural networks from scratch typically requires vast amounts of data and computational resources.

Key concepts:
- **Pre-trained model**: A neural network model that has already been trained on a large dataset.
- **Fine-tuning**: Adjusting the weights of the pre-trained model to adapt it to a new dataset or task.

We will start by training a model from scratch, saving its weights, and then using that model in various transfer learning scenarios.

### Pre-trained model
We will define a simple feedforward neural network model and train it on synthetic data. After training, we will save the model's state dictionary (its weights) so that we can use them later in different transfer learning scenarios.

In [2]:
# Generate synthetic data
np.random.seed(42)
X = np.random.rand(1000, 20)
y = np.random.randint(2, size=1000)

# Convert data to PyTorch tensors
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).unsqueeze(1)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the model
class FFNNModel(nn.Module):
    def __init__(self, input_size):
        super(FFNNModel, self).__init__()
        self.fc1 = nn.Linear(input_size, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 32)
        self.fc4 = nn.Linear(32, 1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        x = self.sigmoid(self.fc4(x))
        return x
    
# Instantiate the model, define loss and optimizer
model = FFNNModel(input_size=X_train.shape[1])
criterion = nn.BCELoss()  # Binary Cross-Entropy Loss
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
for epoch in range(10):
    model.train()
    optimizer.zero_grad()
    output = model(X_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item()}")

# Evaluate the model
model.eval()
with torch.no_grad():
    test_output = model(X_test)
    test_loss = criterion(test_output, y_test)
    print(f"Test Loss: {test_loss.item()}")

# Save the model weights
torch.save(model.state_dict(), 'pretrained_model.pth')

Epoch 1, Loss: 0.692634105682373
Epoch 2, Loss: 0.6919400095939636
Epoch 3, Loss: 0.6915565729141235
Epoch 4, Loss: 0.691327691078186
Epoch 5, Loss: 0.6911556124687195
Epoch 6, Loss: 0.6909562945365906
Epoch 7, Loss: 0.6907436847686768
Epoch 8, Loss: 0.6905004978179932
Epoch 9, Loss: 0.6902583241462708
Epoch 10, Loss: 0.6900127530097961
Test Loss: 0.6926771402359009


**Explanation**

In this section, we created a simple FFNN and trained it on synthetic binary classification data. Here is a breakdown of the steps:

- **Step 1**: Generate data - We generate synthetic data with 20 features and a binary target variable. Features and labels are drawn independently at random, so there is no real signal to learn. The data is then split into training and testing sets.
- **Step 2**: Model definition - We define a model class `FFNNModel` with three hidden layers and an output layer. The hidden layers use ReLU activation, and the output layer uses the sigmoid activation function for binary classification.
- **Step 3**: Model training - The model is trained for 10 epochs (one full-batch gradient step per epoch) using the Adam optimizer and binary cross-entropy loss function. The loss stays close to ln 2 ≈ 0.693, the value obtained by always predicting 0.5, which is expected for random labels.
- **Step 4**: Saving the model - The trained model's state dictionary is saved to a file, which will be used later for transfer learning.
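
As a quick sanity check on the recorded losses, the sketch below (not part of the original notebook; all variable names are illustrative) computes the binary cross-entropy of a predictor that ignores its input and always outputs 0.5. For any set of binary labels this equals ln 2, which is the level the training and test losses above sit at.

```python
import math
import torch
import torch.nn as nn

# Random binary labels, drawn independently of any features
torch.manual_seed(0)
labels = torch.randint(0, 2, (1000, 1)).float()

# A "model" that ignores its input and always predicts probability 0.5
constant_pred = torch.full_like(labels, 0.5)

chance_loss = nn.BCELoss()(constant_pred, labels).item()
print(f"BCE of a constant 0.5 prediction: {chance_loss:.4f}")
print(f"ln(2):                            {math.log(2):.4f}")
```

A loss that does not move clearly below this value suggests the model has not found any signal in the data.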

### Types of transfer learning

Transfer learning can be applied in several ways, depending on how the pre-trained model is used and the nature of the new task. Let's explore different types of transfer learning techniques using the pre-trained model we just saved.

#### Model as a fixed pre-trained model
In this approach, we use the pre-trained model directly without any changes. This is typically done when the new task is very similar to the original task for which the model was trained. The pre-trained model's layers are kept unchanged, and the model is used as-is without further training to make predictions on the new data.

In [3]:
# Generate new synthetic data
X_fixed = np.random.rand(100, 20)
y_fixed = np.random.randint(2, size=100)

# Convert data to PyTorch tensors
X_fixed = torch.tensor(X_fixed, dtype=torch.float32)
y_fixed = torch.tensor(y_fixed, dtype=torch.float32).unsqueeze(1)

# Define a new model with the same architecture as the trained model
class FFNNModelFixed(nn.Module):
    def __init__(self, input_size):
        super(FFNNModelFixed, self).__init__()
        self.fc1 = nn.Linear(input_size, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 32)
        self.fc4 = nn.Linear(32, 1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        x = self.sigmoid(self.fc4(x))
        return x
model_fixed = FFNNModelFixed(input_size=X_fixed.shape[1])

# Load the pre-trained weights
model_fixed.load_state_dict(torch.load('pretrained_model.pth'))

# Evaluate the model on new data
model_fixed.eval()
with torch.no_grad():
    test_output = model_fixed(X_fixed)
    test_loss = criterion(test_output, y_fixed)
    print(f"Fixed pre-trained model test loss: {test_loss.item()}")

# Use the pre-trained model directly for prediction
with torch.no_grad():
    predictions = model_fixed(X_fixed)
    print(f"Predictions from fixed pre-trained model: {predictions[:5].squeeze()}")

Fixed pre-trained model test loss: 0.6959823369979858
Predictions from fixed pre-trained model: tensor([0.5286, 0.5261, 0.5318, 0.5276, 0.5363])


**Explanation**

Here, we use the previously trained model as-is, without any further training:

- **Step 1**: Load the data for a similar task.
- **Step 2**: Model definition: We define a new model with the same architecture as the pre-trained model to ensure compatibility with the saved weights.
- **Step 3**: Load the pre-trained weights using `load_state_dict`.
- **Step 4**: Evaluate the model on the new dataset without further training.
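- **Step 5**: Prediction - The pre-trained model is used to make predictions on the new data. Because the synthetic labels are random, the test loss stays near ln 2 ≈ 0.693 and the predictions cluster around 0.5, so this cell demonstrates the mechanics of reusing a model rather than actual generalization.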
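
The save-and-reload step can be checked in isolation. The sketch below is a minimal illustration, assuming a hypothetical `make_ffnn` factory with the same layer sizes as the network above and an in-memory buffer in place of `pretrained_model.pth`. `load_state_dict` matches parameters by name and shape, so any module with the same layout can receive the weights.

```python
import io
import torch
import torch.nn as nn

def make_ffnn(input_size):
    """Hypothetical factory with the same layer sizes as the notebook's network."""
    return nn.Sequential(
        nn.Linear(input_size, 128), nn.ReLU(),
        nn.Linear(128, 64), nn.ReLU(),
        nn.Linear(64, 32), nn.ReLU(),
        nn.Linear(32, 1), nn.Sigmoid(),
    )

torch.manual_seed(0)
source = make_ffnn(20)

# Save the weights to an in-memory buffer instead of a file
buffer = io.BytesIO()
torch.save(source.state_dict(), buffer)
buffer.seek(0)

# A freshly initialised copy receives the saved weights
target = make_ffnn(20)
target.load_state_dict(torch.load(buffer))

source.eval()
target.eval()
x_demo = torch.rand(5, 20)
with torch.no_grad():
    same_outputs = torch.equal(source(x_demo), target(x_demo))
print(f"Outputs identical after reload: {same_outputs}")
```

In the notebook this means the original `FFNNModel` class could be instantiated again directly, without defining a second class with identical layers.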

#### Feature extraction transfer learning
In this approach, we use the pre-trained model as a feature extractor. We freeze the lower layers (which capture general features) and add new layers on top to adapt to the new task. The output of the pre-trained model (before the final layer) is fed into a new model designed for the new task, allowing the model to learn task-specific features without retraining the entire network. This method is particularly useful when the new task is related to the original task but requires a different output or representation.

In [4]:
# Generate new synthetic data (related but different task)
np.random.seed(42)
X_feature = np.random.rand(300, 20)
y_feature = np.random.randint(3, size=300)

# Convert data to PyTorch tensors
X_feature = torch.tensor(X_feature, dtype=torch.float32)
y_feature = torch.tensor(y_feature, dtype=torch.long)  # For multiclass classification

# Split the data into training and testing sets
X_train_feature, X_test_feature, y_train_feature, y_test_feature = train_test_split(X_feature, y_feature, test_size=0.2, random_state=42)


# Define a new model with the same architecture as the trained model
class FFNNModelFeature(nn.Module):
    def __init__(self, input_size):
        super(FFNNModelFeature, self).__init__()
        self.fc1 = nn.Linear(input_size, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 32)
        self.fc4 = nn.Linear(32, 1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        x = self.sigmoid(self.fc4(x))
        return x
model_feature_extraction = FFNNModelFeature(input_size=X_feature.shape[1])

# Load the pre-trained weights
model_feature_extraction.load_state_dict(torch.load('pretrained_model.pth'))

# Freeze the layers of the pre-trained model
for param in model_feature_extraction.parameters():
    param.requires_grad = False

# Replace the last layer for the new task
model_feature_extraction.fc4 = nn.Linear(32, 3)  # Assuming 3 classes for the new task

# Define a new loss function and optimizer
criterion_new = nn.CrossEntropyLoss()
optimizer_new = optim.Adam(model_feature_extraction.fc4.parameters(), lr=0.001)  # Only optimize the new layer

# Train the new model
for epoch in range(5):
    model_feature_extraction.train()
    optimizer_new.zero_grad()
    output = model_feature_extraction(X_train_feature)
    loss = criterion_new(output, y_train_feature)
    loss.backward()
    optimizer_new.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item()}")

# Evaluate the model
model_feature_extraction.eval()
with torch.no_grad():
    test_output = model_feature_extraction(X_test_feature)
    test_loss = criterion_new(test_output, y_test_feature)
    print(f"Feature extractor model test loss: {test_loss.item()}")

Epoch 1, Loss: 1.0973738431930542
Epoch 2, Loss: 1.0973533391952515
Epoch 3, Loss: 1.0973336696624756
Epoch 4, Loss: 1.0973143577575684
Epoch 5, Loss: 1.0972952842712402
Feature extractor model test loss: 1.1009647846221924


**Explanation**

In this approach, we use the pre-trained model as a feature extractor for a related but different task:

- **Step 1**: Load the data for a related but different task.
- **Step 2**: Load the pre-trained model and freeze its layers to retain the pre-trained features.
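- **Step 3**: Replace the last layer of the pre-trained model with a new layer tailored to the new task (here, a 3-class output). Note that `forward` still passes this layer through the sigmoid, so `CrossEntropyLoss` receives values in (0, 1) instead of raw logits; returning the raw `fc4` output is the usual choice for this loss.
- **Step 4**: Train only the new layer on the new dataset.
- **Step 5**: Evaluate the model on the new dataset. With random labels, the loss stays near ln 3 ≈ 1.099, the value for a uniform prediction over three classes.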
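
The freeze-and-replace pattern can be sketched with the head returning raw logits, which is what `nn.CrossEntropyLoss` expects. This is an illustration under stated assumptions: `backbone`, `head`, and the demo tensors are hypothetical names, and the backbone is randomly initialised here, whereas in practice it would carry the saved weights.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical backbone mirroring fc1-fc3 of the pre-trained network
backbone = nn.Sequential(
    nn.Linear(20, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
)
for param in backbone.parameters():
    param.requires_grad = False  # freeze the feature extractor

head = nn.Linear(32, 3)  # new trainable head that emits raw logits

X_demo = torch.rand(64, 20)
y_demo = torch.randint(0, 3, (64,))

criterion_demo = nn.CrossEntropyLoss()  # applies log-softmax internally
optimizer_demo = optim.Adam(head.parameters(), lr=0.001)

backbone_before = [p.clone() for p in backbone.parameters()]
head_before = head.weight.detach().clone()

backbone.eval()
for _ in range(3):
    optimizer_demo.zero_grad()
    with torch.no_grad():
        features = backbone(X_demo)  # fixed features from the frozen layers
    logits = head(features)
    loss = criterion_demo(logits, y_demo)
    loss.backward()
    optimizer_demo.step()

backbone_unchanged = all(
    torch.equal(before, after)
    for before, after in zip(backbone_before, backbone.parameters())
)
head_changed = not torch.equal(head_before, head.weight)
print(f"Backbone unchanged: {backbone_unchanged}, head updated: {head_changed}")
```

Keeping the backbone and the head as separate modules makes it explicit which parameters the optimizer is allowed to touch.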

#### Fine-tuning transfer learning

Fine-tuning is a more flexible approach to transfer learning, where we start with a pre-trained model but allow some or all layers to be further trained on the new task. This approach allows the model to adapt more closely to the new task while retaining the knowledge learned from the pre-trained model in the original task. Fine-tuning is often used when the new task is sufficiently different from the original task, and the pre-trained model needs to be adjusted to better fit the new data.

In [5]:
# Generate new synthetic data (related task)
np.random.seed(42)
X_finetune = np.random.rand(400, 20)
y_finetune = np.random.randint(3, size=400)

# Convert data to PyTorch tensors
X_finetune = torch.tensor(X_finetune, dtype=torch.float32)
y_finetune = torch.tensor(y_finetune, dtype=torch.long)  # Use long type for classification

# Split the data into training and testing sets
X_train_finetune, X_test_finetune, y_train_finetune, y_test_finetune = train_test_split(X_finetune, y_finetune, test_size=0.2, random_state=42)

# Define a new model with the same architecture as the trained model
class PretrainedModel(nn.Module):
    def __init__(self, input_size):
        super(PretrainedModel, self).__init__()
        self.fc1 = nn.Linear(input_size, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 32)
        self.fc4 = nn.Linear(32, 1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        x = self.sigmoid(self.fc4(x))
        return x
pretrained_model_finetune = PretrainedModel(input_size=X_finetune.shape[1])

# Load the pre-trained model
pretrained_model_finetune.load_state_dict(torch.load('pretrained_model.pth'))

# Freeze part of the pre-trained model. parameters() yields fc1.weight, fc1.bias, fc2.weight, ...,
# so the first two entries are the weight and bias of fc1 (the first layer only)
for param in list(pretrained_model_finetune.parameters())[:2]:
    param.requires_grad = False

# Modify the model: replace fc4 with a new dense layer and add a new output layer for the new task
# Note: forward() above never calls fc5, so the network returns the 16 sigmoid outputs of fc4
pretrained_model_finetune.fc4 = nn.Linear(32, 16)  # New dense layer with 16 units
pretrained_model_finetune.fc5 = nn.Linear(16, 3)   # New output layer for the 3-class classification

# Define a new loss function and optimizer for fine-tuning
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(filter(lambda p: p.requires_grad, pretrained_model_finetune.parameters()), lr=0.001)

# Fine-tune the model
for epoch in range(5):
    pretrained_model_finetune.train()
    optimizer.zero_grad()
    output = pretrained_model_finetune(X_train_finetune)
    loss = criterion(output, y_train_finetune)
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item()}")

# Evaluate the fine-tuned model
pretrained_model_finetune.eval()
with torch.no_grad():
    test_output = pretrained_model_finetune(X_test_finetune)
    test_loss = criterion(test_output, y_test_finetune)
    print(f"Fine-tuned model test loss: {test_loss.item()}")
    _, predicted = torch.max(test_output, 1)
    accuracy = (predicted == y_test_finetune).float().mean().item()
    print(f"Fine-tuned model accuracy: {accuracy}")

Epoch 1, Loss: 2.7765142917633057
Epoch 2, Loss: 2.7724404335021973
Epoch 3, Loss: 2.7685468196868896
Epoch 4, Loss: 2.764758586883545
Epoch 5, Loss: 2.7609825134277344
Fine-tuned model test loss: 2.7544312477111816
Fine-tuned model accuracy: 0.3375000059604645


**Explanation**

- **Step 1**: Generate or load data for a related task and split it into training and testing sets.
- **Step 2**: Define and load the pre-trained model.
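- **Step 3**: Freeze the first layer (`fc1`) to retain its pre-trained features.
- **Step 4**: Modify the architecture if needed, especially the output layer to match the new task's requirements. We can also add new layers before the final output layer to allow the model to learn more complex features specific to the new task. In this cell, `forward` is not updated to call the new `fc5`, so the model returns the 16 sigmoid activations of `fc4`. The recorded loss near ln 16 ≈ 2.77 therefore reflects a 16-way prediction rather than a 3-class one.
- **Step 5**: Fine-tune the model by training only the unfrozen layers on the new dataset.
- **Step 6**: Evaluate the fine-tuned model on the test set. With random labels, an accuracy close to 1/3 is the expected chance level.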
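
For the added layers to take effect, the forward pass has to call them. The sketch below shows one way this could look; `FineTuneSketch` and the demo tensors are hypothetical, and a randomly initialised `nn.ModuleDict` stands in for the checkpoint that `torch.load('pretrained_model.pth')` would return. The old `fc4` entries are dropped before loading because their shape no longer matches the new layer.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

class FineTuneSketch(nn.Module):
    """Hypothetical variant of the network whose forward pass uses fc4 and fc5."""
    def __init__(self, input_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 32)
        self.fc4 = nn.Linear(32, 16)           # new dense layer
        self.fc5 = nn.Linear(16, num_classes)  # new output layer
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        x = self.relu(self.fc4(x))
        return self.fc5(x)  # raw logits; CrossEntropyLoss applies log-softmax

# Stand-in for the saved checkpoint: same keys and shapes, random values
pretrained_state = nn.ModuleDict({
    "fc1": nn.Linear(20, 128), "fc2": nn.Linear(128, 64),
    "fc3": nn.Linear(64, 32), "fc4": nn.Linear(32, 1),
}).state_dict()

net = FineTuneSketch(input_size=20, num_classes=3)
# Skip the old output layer (shape mismatch) and load the remaining weights
compatible = {k: v for k, v in pretrained_state.items() if not k.startswith("fc4.")}
load_result = net.load_state_dict(compatible, strict=False)

# Freeze whole layers by name rather than by position in parameters()
for layer in (net.fc1, net.fc2):
    for param in layer.parameters():
        param.requires_grad = False

trainable = [p for p in net.parameters() if p.requires_grad]
optimizer_ft = optim.Adam(trainable, lr=0.001)
criterion_ft = nn.CrossEntropyLoss()

X_demo = torch.rand(64, 20)
y_demo = torch.randint(0, 3, (64,))
fc1_before = net.fc1.weight.detach().clone()
fc5_before = net.fc5.weight.detach().clone()

for _ in range(3):
    optimizer_ft.zero_grad()
    logits = net(X_demo)
    loss = criterion_ft(logits, y_demo)
    loss.backward()
    optimizer_ft.step()

print(logits.shape)                             # torch.Size([64, 3])
print(torch.equal(fc1_before, net.fc1.weight))  # frozen layer is unchanged
print(load_result.missing_keys)                 # layers absent from the checkpoint
```

Freezing by layer name avoids the ambiguity of slicing `parameters()`, where each layer contributes two entries (weight and bias).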

## Knowledge distillation (Teacher-student model)

Knowledge distillation transfers knowledge from a large, pre-trained model (the teacher) to a smaller and simpler model (the student). The idea is that the student model learns to mimic the teacher's behavior, achieving similar performance with fewer parameters, which makes it more efficient for deployment on devices with limited resources.

### Response-based knowledge distillation

Response-based knowledge distillation focuses on the output predictions (responses) of the teacher model. The student model is trained to mimic the probability distribution (soft labels) produced by the teacher model rather than the hard labels.
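
To make the difference between hard and soft labels concrete, here is a small illustration for a multi-class teacher. The logits are made up for the example, and dividing them by a temperature before the softmax is a common distillation practice rather than something the cell below does.

```python
import torch
import torch.nn.functional as F

# Hypothetical logits from a 3-class teacher for a single example
teacher_logits_demo = torch.tensor([[4.0, 1.0, 0.5]])

hard_label = teacher_logits_demo.argmax(dim=1)         # keeps only the winning class
soft_t1 = F.softmax(teacher_logits_demo / 1.0, dim=1)  # standard softmax
soft_t4 = F.softmax(teacher_logits_demo / 4.0, dim=1)  # temperature 4 flattens the distribution

print("Hard label:        ", hard_label)
print("Soft labels (T=1): ", soft_t1)
print("Soft labels (T=4): ", soft_t4)
```

The soft labels retain how the teacher ranks the non-winning classes, which is the extra information the student learns from.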

In [6]:
# Generate synthetic data
X = np.random.rand(1000, 20)
y = np.random.randint(2, size=1000)

# Convert data to PyTorch tensors
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).unsqueeze(1)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the Teacher model
class TeacherModel(nn.Module):
    def __init__(self, input_size):
        super(TeacherModel, self).__init__()
        self.fc1 = nn.Linear(input_size, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 32)
        self.fc4 = nn.Linear(32, 1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.relu(self.fc3(x))
        x = self.sigmoid(self.fc4(x))
        return x

# Instantiate and train the teacher model
teacher_model = TeacherModel(input_size=X_train.shape[1])
criterion = nn.BCELoss()
optimizer = optim.Adam(teacher_model.parameters(), lr=0.001)

for epoch in range(10):
    teacher_model.train()
    optimizer.zero_grad()
    output = teacher_model(X_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()
    print(f"Teacher Model - Epoch {epoch+1}, BCE Loss: {loss.item()}")

# Generate soft labels (probability distribution) from the Teacher model
teacher_model.eval()
with torch.no_grad():
    teacher_logits = teacher_model(X_train)
    # Despite the variable name, these are probabilities: the teacher ends in a sigmoid.
    # In the binary case each output p already defines the distribution [p, 1 - p], so no softmax is needed.
    # A multi-class teacher would instead use F.softmax(logits / T, dim=1) with a temperature T.
    teacher_soft_labels = teacher_logits
    # Evaluate the teacher model
    teacher_output = teacher_model(X_test)
    teacher_loss = criterion(teacher_output, y_test)
    print(f"Response-based teacher model test loss: {teacher_loss.item()}")
    accuracy = ((teacher_output > 0.5).float() == y_test).float().mean().item()
    print(f"Response-based teacher model accuracy: {accuracy}")

# Define the Student model
class StudentModel(nn.Module):
    def __init__(self, input_size):
        super(StudentModel, self).__init__()
        self.fc1 = nn.Linear(input_size, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.sigmoid(self.fc3(x))
        return x

# Instantiate the Student model
student_model = StudentModel(input_size=X_train.shape[1])

# Define the KL-divergence loss and the optimizer for the Student model
criterion_kld = nn.KLDivLoss(reduction='batchmean')
optimizer_student = optim.Adam(student_model.parameters(), lr=0.001)

# Train the Student model on the soft labels
for epoch in range(10):
    student_model.train()
    optimizer_student.zero_grad()
    student_output = student_model(X_train)
    loss = criterion_kld(student_output, teacher_soft_labels)
    loss.backward()
    optimizer_student.step()
    print(f"Student Model - Epoch {epoch+1}, KLD Loss: {loss.item()}")

# Evaluate the Student model
student_model.eval()
with torch.no_grad():
    student_output = student_model(X_test)
    student_loss = criterion(student_output, y_test)
    print(f"Response-based student model test loss: {student_loss.item()}")
    accuracy = ((student_output > 0.5).float() == y_test).float().mean().item()
    print(f"Response-based student model accuracy: {accuracy}")

Teacher Model - Epoch 1, BCE Loss: 0.6930196285247803
Teacher Model - Epoch 2, BCE Loss: 0.6917576789855957
Teacher Model - Epoch 3, BCE Loss: 0.6908623576164246
Teacher Model - Epoch 4, BCE Loss: 0.6902269124984741
Teacher Model - Epoch 5, BCE Loss: 0.6897659301757812
Teacher Model - Epoch 6, BCE Loss: 0.6894090175628662
Teacher Model - Epoch 7, BCE Loss: 0.6891055107116699
Teacher Model - Epoch 8, BCE Loss: 0.6888608336448669
Teacher Model - Epoch 9, BCE Loss: 0.6886587738990784
Teacher Model - Epoch 10, BCE Loss: 0.6884830594062805
Response-based teacher model test loss: 0.7004870772361755
Response-based teacher model accuracy: 0.48500001430511475
Student Model - Epoch 1, KLD Loss: -0.6050364375114441
Student Model - Epoch 2, KLD Loss: -0.6076161861419678
Student Model - Epoch 3, KLD Loss: -0.610180139541626
Student Model - Epoch 4, KLD Loss: -0.6127474308013916
Student Model - Epoch 5, KLD Loss: -0.6153145432472229
Student Model - Epoch 6, KLD Loss: -0.6179109215736389
Student Mode

**Explanation**

- **Step 1**: Define and train the teacher model on the original task if it is not already pre-trained.
- **Step 2**: Generate soft labels (probability distribution) using the trained teacher model. Since our problem is binary, we don't need to apply softmax: the teacher's sigmoid output p already defines the distribution [p, 1 - p].
- **Step 3**: Define the student model, which is a smaller network.
- **Step 4**: Train the student model using the soft labels from the teacher model with Kullback-Leibler divergence loss to minimize the difference between the student's output and the teacher's soft labels. Note that `nn.KLDivLoss` expects log-probabilities as input and a full probability distribution as target. The cell passes single sigmoid probabilities for both, so the logged values are negative and are not true KL divergences, which can never be negative.
- **Step 5**: Evaluate the student model on the test data.
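
One way to obtain a proper KL divergence from sigmoid outputs is to expand each probability into a two-class distribution and pass the student's log-probabilities to the loss. The helper `binary_kd_loss` below is a hypothetical sketch of that idea, with made-up probabilities, and is not the notebook's own method.

```python
import torch
import torch.nn.functional as F

def binary_kd_loss(student_prob, teacher_prob, eps=1e-7):
    """KL(teacher || student) for sigmoid outputs of shape (N, 1)."""
    student_prob = student_prob.clamp(eps, 1 - eps)
    teacher_prob = teacher_prob.clamp(eps, 1 - eps)
    # Expand each scalar probability p into the distribution [p, 1 - p]
    student_dist = torch.cat([student_prob, 1 - student_prob], dim=1)
    teacher_dist = torch.cat([teacher_prob, 1 - teacher_prob], dim=1)
    # F.kl_div expects log-probabilities as input and probabilities as target
    return F.kl_div(student_dist.log(), teacher_dist, reduction="batchmean")

# Made-up probabilities for three examples
teacher_prob = torch.tensor([[0.9], [0.2], [0.6]])
student_prob = torch.tensor([[0.7], [0.4], [0.5]])

loss_mismatch = binary_kd_loss(student_prob, teacher_prob).item()
loss_match = binary_kd_loss(teacher_prob, teacher_prob).item()

# Passing raw probabilities as the input, as in the cell above, yields a negative value
loss_naive = F.kl_div(student_prob, teacher_prob, reduction="batchmean").item()

print(f"Student differs from teacher: {loss_mismatch:.4f}")
print(f"Student equals teacher:       {loss_match:.4f}")
print(f"Raw probabilities as input:   {loss_naive:.4f}")
```

With this formulation the loss is zero exactly when the student reproduces the teacher's probabilities and positive otherwise, so it can be read as a distance to minimize.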

## Multi-task learning (MTL)

Multi-task learning (MTL) is a technique where a single model is trained to perform multiple tasks simultaneously. Instead of training separate models for each task, MTL uses a unified architecture to learn from related tasks concurrently. The core idea behind MTL is that by learning multiple tasks together, the model can exploit commonalities and shared patterns across tasks, leading to improved generalization and performance.

In [7]:
# Generate synthetic data
np.random.seed(42)
X_mtl = np.random.rand(1000, 20)

# Task 1: Binary classification labels
y_classification = np.random.randint(2, size=1000)

# Task 2: Regression labels
y_regression = np.random.rand(1000)

# Convert data to PyTorch tensors
X_mtl = torch.tensor(X_mtl, dtype=torch.float32)
y_classification = torch.tensor(y_classification, dtype=torch.float32).unsqueeze(1)
y_regression = torch.tensor(y_regression, dtype=torch.float32).unsqueeze(1)

# Split the data into training and testing sets
# A single call keeps the features and both label sets aligned on the same split
X_train_mtl, X_test_mtl, y_train_class, y_test_class, y_train_reg, y_test_reg = train_test_split(
    X_mtl, y_classification, y_regression, test_size=0.2, random_state=42)

# Create a DataLoader for training
train_dataset = TensorDataset(X_train_mtl, y_train_class, y_train_reg)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# Define the MTL model
class MTLModel(nn.Module):
    def __init__(self, input_size):
        super(MTLModel, self).__init__()
        # Shared layers
        self.shared_fc1 = nn.Linear(input_size, 64)
        self.shared_fc2 = nn.Linear(64, 32)
        
        # Task 1: Classification
        self.classification_fc = nn.Linear(32, 1)
        
        # Task 2: Regression
        self.regression_fc = nn.Linear(32, 1)
        
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        
    def forward(self, x):
        shared = self.relu(self.shared_fc1(x))
        shared = self.relu(self.shared_fc2(shared))
        
        # Task 1: Classification
        classification_output = self.sigmoid(self.classification_fc(shared))
        
        # Task 2: Regression
        regression_output = self.regression_fc(shared)
        
        return classification_output, regression_output

# Initialize model, loss functions, and optimizer
mtl_model = MTLModel(input_size=X_train_mtl.shape[1])
criterion_class = nn.BCELoss()
criterion_reg = nn.MSELoss()
optimizer_mtl = optim.Adam(mtl_model.parameters(), lr=0.001)

# Train the model
for epoch in range(10):
    mtl_model.train()
    for batch in train_loader:
        X_batch, y_class_batch, y_reg_batch = batch
        
        optimizer_mtl.zero_grad()
        class_pred, reg_pred = mtl_model(X_batch)
        
        loss_class = criterion_class(class_pred, y_class_batch)
        loss_reg = criterion_reg(reg_pred, y_reg_batch)
        
        loss = loss_class + loss_reg
        loss.backward()
        optimizer_mtl.step()
    
    # Note: these are the losses of the last mini-batch in the epoch, not epoch averages
    print(f"Epoch {epoch+1}, Classification Loss: {loss_class.item()}, Regression Loss: {loss_reg.item()}")

# Evaluate the model on the test set
mtl_model.eval()
with torch.no_grad():
    class_pred, reg_pred = mtl_model(X_test_mtl)
    
    test_class_loss = criterion_class(class_pred, y_test_class)
    test_reg_loss = criterion_reg(reg_pred, y_test_reg)
    
    test_class_accuracy = (class_pred.round() == y_test_class).float().mean()
    
    print(f"Test Classification Accuracy: {test_class_accuracy.item()}")
    print(f"Test Regression MSE: {test_reg_loss.item()}")

# Generate predictions for both tasks on the test data
classification_predictions = class_pred[:5]  # Predictions for the classification task
regression_predictions = reg_pred[:5]  # Predictions for the regression task
print("Classification Predictions (first 5):", classification_predictions)
print("Regression Predictions (first 5):", regression_predictions)

Epoch 1, Classification Loss: 0.6874178051948547, Regression Loss: 0.09434368461370468
Epoch 2, Classification Loss: 0.6925233602523804, Regression Loss: 0.11183096468448639
Epoch 3, Classification Loss: 0.6853673458099365, Regression Loss: 0.112230584025383
Epoch 4, Classification Loss: 0.6908676624298096, Regression Loss: 0.08216945081949234
Epoch 5, Classification Loss: 0.7056833505630493, Regression Loss: 0.06934796273708344
Epoch 6, Classification Loss: 0.6926990747451782, Regression Loss: 0.10169712454080582
Epoch 7, Classification Loss: 0.6919596791267395, Regression Loss: 0.08397369086742401
Epoch 8, Classification Loss: 0.6817010045051575, Regression Loss: 0.09836194664239883
Epoch 9, Classification Loss: 0.6826958656311035, Regression Loss: 0.08766327798366547
Epoch 10, Classification Loss: 0.7005208730697632, Regression Loss: 0.07402367144823074
Test Classification Accuracy: 0.5149999856948853
Test Regression MSE: 0.08783645927906036
Classification Predictions (first 5): ten

**Explanation**

- **Step 1**: Generate data for multiple tasks (binary classification and regression) and split the data into training and testing sets.
- **Step 2**: Define the MTL model with shared layers and task-specific outputs.
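- **Step 3**: Define separate loss functions for each task (binary cross-entropy for classification and MSE for regression) and a single optimizer over all parameters.
- **Step 4**: Train the model by minimizing the sum of the two losses, so that gradients from both tasks update the shared layers.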
- **Step 5**: Evaluate the model on the test set for both tasks and generate predictions.
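
The effect of combining the losses can be checked directly: the gradient that reaches a shared layer is the weighted sum of the per-task gradients. The sketch below uses hypothetical names (`shared`, `class_head`, `reg_head`, `w_class`, `w_reg`) and made-up task weights, whereas the notebook weights both losses equally. It also uses `nn.BCEWithLogitsLoss` on raw logits instead of a sigmoid followed by `nn.BCELoss`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

shared = nn.Sequential(nn.Linear(20, 32), nn.ReLU())  # shared trunk
class_head = nn.Linear(32, 1)                         # classification head (logits)
reg_head = nn.Linear(32, 1)                           # regression head

X_demo = torch.rand(16, 20)
y_class_demo = torch.randint(0, 2, (16, 1)).float()
y_reg_demo = torch.rand(16, 1)

features = shared(X_demo)
loss_class = nn.BCEWithLogitsLoss()(class_head(features), y_class_demo)
loss_reg = nn.MSELoss()(reg_head(features), y_reg_demo)

# Hypothetical task weights; tuning them balances the tasks against each other
w_class, w_reg = 1.0, 0.5
total_loss = w_class * loss_class + w_reg * loss_reg

# Gradient of each task loss with respect to the first shared layer
shared_weight = shared[0].weight
grad_class = torch.autograd.grad(loss_class, shared_weight, retain_graph=True)[0]
grad_reg = torch.autograd.grad(loss_reg, shared_weight, retain_graph=True)[0]

total_loss.backward()
combined_ok = torch.allclose(
    shared_weight.grad, w_class * grad_class + w_reg * grad_reg, atol=1e-6
)
print(f"Shared gradient equals weighted sum of task gradients: {combined_ok}")
```

When one task's loss is on a much larger scale than the other's, its gradient dominates the shared layers, which is the usual motivation for introducing task weights.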