<a href="https://colab.research.google.com/github/jcmachicao/knowledge_engineering/blob/main/U3__TransferLearning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Modern Deep Learning Training Techniques in PyTorch

This notebook demonstrates three advanced strategies:

1. Transfer Learning & Fine-Tuning
2. Self-Supervised Learning (Contrastive & Masked Prediction)
3. Curriculum & Active Learning

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models, transforms, datasets
from torch.utils.data import DataLoader, TensorDataset
import wandb  # Weights & Biases logging (can be disabled when running offline)

In [2]:
wandb.init(project="modern-dl-training", name="demo_notebook", reinit=True)
device = 'cuda' if torch.cuda.is_available() else 'cpu'



The `wandb.init()` function automatically creates a new project in your Weights & Biases account if a project with the specified name (`"modern-dl-training"` in this case) doesn't already exist.
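
If you are running without network access or without a W&B account, logging can be switched off before `wandb.init()` is called. The following is a minimal sketch, assuming the installed wandb version honors the `WANDB_MODE` environment variable:

```python
import os

# Must be set before wandb.init() runs.
# "disabled": wandb.init / wandb.log / wandb.finish become no-ops.
# "offline":  runs are written locally and can be uploaded later with `wandb sync`.
os.environ["WANDB_MODE"] = "disabled"

print("W&B mode:", os.environ["WANDB_MODE"])
```

With this in place, the `wandb.log(...)` calls in the demos below can stay in the code unchanged.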

The name of the final layer depends on the model architecture. torchvision's ResNet models, used in this example, call it `fc`; other architectures use different names (VGG and MobileNet, for instance, expose a `classifier` module instead).

There isn't a single universal glossary that lists all layer names for all possible models. However, you can inspect the model's structure programmatically to see the names of its layers.

For example, printing the model with `print(model)` lists every layer together with its attribute name.

In many pre-trained classification models, the final layer is a fully connected (dense) layer that maps the extracted features to class scores. In torchvision's ResNet models this layer is named `fc`.

When performing transfer learning, we often want to adapt the pre-trained model to a new task with a different number of output classes. We do this by replacing this final `fc` layer with a new fully connected layer that has the desired number of output neurons (in this example, 3 for a 3-class classification).

Freezing every pre-trained parameter before the replacement (`for p in model.parameters(): p.requires_grad = False`) keeps the learned features fixed. The new `fc` layer is created afterwards and is trainable by default, so it is the only part of the model that is trained on the new data.

In [3]:
# --------------------------------------------------
# 1. TRANSFER LEARNING & FINE-TUNING
# --------------------------------------------------

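def transfer_learning_example():
    # Load ResNet-18 with pre-trained ImageNet weights
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    # Freeze all pre-trained layers
    for p in model.parameters():
        p.requires_grad = False

    # Replace the final layer for 3-class classification (trainable by default)
    model.fc = nn.Linear(model.fc.in_features, 3)
    model = model.to(device)

    # Optimize only the parameters of the new head
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    # Dummy data standing in for a batch of 32 RGB images
    X = torch.randn(32, 3, 224, 224, device=device)
    y = torch.randint(0, 3, (32,), device=device)

    # One training step
    optimizer.zero_grad()
    outputs = model(X)
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()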

    wandb.log({"transfer_learning_loss": loss.item()})
    print("[Transfer Learning] Loss:", loss.item())

In [4]:
# --------------------------------------------------
# 2. SELF-SUPERVISED LEARNING
# --------------------------------------------------

def contrastive_learning_example():
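    # Minimal contrastive (InfoNCE-style) loss, as used in SimCLR-like methods.
    # Random unit vectors stand in for encoder outputs of two augmented views.
    z1 = F.normalize(torch.randn(8, 128), dim=1)
    z2 = F.normalize(torch.randn(8, 128), dim=1)

    sim_matrix = torch.mm(z1, z2.T) / 0.5  # cosine similarities scaled by temperature 0.5
    labels = torch.arange(8)  # positives lie on the diagonal: z1[i] pairs with z2[i]
    loss = F.cross_entropy(sim_matrix, labels)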

    wandb.log({"contrastive_loss": loss.item()})
    print("[Contrastive Learning] Loss:", loss.item())


def masked_prediction_example():
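    vocab_size, mask_id = 100, 100  # id 100 is reserved as a dedicated mask token
    x = torch.randint(0, vocab_size, (4, 10))
    mask = torch.rand(x.shape) < 0.3  # mask roughly 30% of the positions
    x_masked = x.clone()
    x_masked[mask] = mask_id

    # No context encoder here: this only illustrates the masking and the loss
    embedding = nn.Embedding(vocab_size + 1, 32)  # +1 row for the mask token
    linear = nn.Linear(32, vocab_size)

    emb = embedding(x_masked)
    pred = linear(emb)

    # Compute the loss only at the masked positions
    loss = nn.CrossEntropyLoss()(pred[mask], x[mask])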

    wandb.log({"masked_loss": loss.item()})
    print("[Masked Prediction] Loss:", loss.item())
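
Because `contrastive_learning_example` feeds two unrelated random batches into the loss, its value stays close to `log(8)`, the chance level for 8 candidates. The sketch below shows how the same loss behaves when the second batch really is a noisy view of the first. The helper name `info_nce` and the noise level are assumptions for illustration, not part of the notebook.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    """Cross-entropy over scaled cosine similarities; row i's positive is column i."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature
    targets = torch.arange(z1.size(0))
    return F.cross_entropy(logits, targets)

torch.manual_seed(0)
anchor = torch.randn(8, 128)
unrelated = torch.randn(8, 128)                 # no relation to the anchors
augmented = anchor + 0.1 * torch.randn(8, 128)  # stand-in for a second view of the same inputs

loss_random = info_nce(anchor, unrelated)
loss_aligned = info_nce(anchor, augmented)
print(f"unrelated pairs: {loss_random.item():.3f}")
print(f"matching pairs:  {loss_aligned.item():.3f}")
```

In a real pipeline the two batches would come from an encoder applied to two augmentations of the same images, and training would push the loss from the first regime toward the second.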

In [9]:
# --------------------------------------------------
# 3. CURRICULUM LEARNING
# --------------------------------------------------

def curriculum_learning_example():
    X = torch.linspace(-5, 5, 500).unsqueeze(1).to(device)
    y = torch.sin(X) + 0.1 * torch.randn_like(X)  # randn_like creates the noise on the same device as X

    # Small MLP regressor: 1 input -> 100 hidden units -> 1 output
    model = nn.Sequential(nn.Linear(1, 100), nn.ReLU(), nn.Linear(100, 1)).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    criterion = nn.MSELoss()

    for epoch in range(5):
        difficulty = 1 + epoch  # progressively expand input range

        # Select the samples inside the current range; the views keep the
        # batches as (N, 1) column vectors so they match the linear layers
        mask = (X.abs() < difficulty).view(-1)
        X_batch, y_batch = X[mask], y[mask]
        X_batch = X_batch.view(-1, 1)
        y_batch = y_batch.view(-1, 1)

        optimizer.zero_grad()
        loss = criterion(model(X_batch), y_batch)
        loss.backward()
        optimizer.step()

        wandb.log({"curriculum_loss": loss.item(), "epoch": epoch})
        print(f"[Curriculum] Epoch {epoch} | Range [-{difficulty}, {difficulty}] | Loss={loss.item():.4f}")
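
To see what the curriculum in `curriculum_learning_example` actually feeds the model, it helps to count how many samples pass the range filter at each epoch. This sketch reuses the same input grid and mask; the list name `subset_sizes` is introduced here for illustration.

```python
import torch

X = torch.linspace(-5, 5, 500).unsqueeze(1)

subset_sizes = []
for epoch in range(5):
    difficulty = 1 + epoch                  # same schedule as in the demo
    mask = (X.abs() < difficulty).view(-1)  # samples inside the current range
    subset_sizes.append(int(mask.sum()))
    print(f"epoch {epoch}: |x| < {difficulty} -> {subset_sizes[-1]} of {len(X)} samples")
```

Each epoch trains and reports its loss on a larger, harder subset, so the per-epoch losses logged by the demo are not directly comparable with one another.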

In [6]:
# --------------------------------------------------
# 4. ACTIVE LEARNING (conceptual demo)
# --------------------------------------------------

def active_learning_example():
    probs = torch.softmax(torch.randn(10, 3), dim=1)
    entropy = -torch.sum(probs * torch.log(probs + 1e-8), dim=1)
    topk = torch.topk(entropy, k=3)
    wandb.log({"avg_entropy": entropy.mean().item()})
    print("[Active Learning] Most uncertain samples:", topk.indices.tolist())
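
The entropy ranking above is the selection step of an active learning loop. A sketch of how that step could split an unlabeled pool follows. The random probabilities are a hypothetical stand-in for model predictions, and `select_most_uncertain` is a helper name introduced for this example.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for model predictions on an unlabeled pool (10 samples, 3 classes)
pool_probs = torch.softmax(torch.randn(10, 3), dim=1)

def select_most_uncertain(probs, k):
    """Return the indices of the k rows with the highest predictive entropy."""
    entropy = -torch.sum(probs * torch.log(probs + 1e-8), dim=1)
    return torch.topk(entropy, k=k).indices, entropy

query_idx, entropy = select_most_uncertain(pool_probs, k=3)

# Split the pool: queried samples go to an annotator, the rest stay unlabeled
keep = torch.ones(len(pool_probs), dtype=torch.bool)
keep[query_idx] = False
remaining_idx = torch.nonzero(keep).squeeze(1)

print("query for labels:", query_idx.tolist())
print("still unlabeled: ", remaining_idx.tolist())
```

In a full loop, the queried samples would be labeled, added to the training set, and the model retrained before the next round of selection.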

In [7]:
# --------------------------------------------------
# Run all demos
# --------------------------------------------------
transfer_learning_example()
contrastive_learning_example()
masked_prediction_example()


[Transfer Learning] Loss: 1.2010308504104614
[Contrastive Learning] Loss: 1.9420629739761353
[Masked Prediction] Loss: 4.663577079772949


In [10]:
curriculum_learning_example()
active_learning_example()

[Curriculum] Epoch 0 | Range [-1, 1] | Loss=0.8028
[Curriculum] Epoch 1 | Range [-2, 2] | Loss=1.0259
[Curriculum] Epoch 2 | Range [-3, 3] | Loss=0.4883
[Curriculum] Epoch 3 | Range [-4, 4] | Loss=0.8958
[Curriculum] Epoch 4 | Range [-5, 5] | Loss=1.8250
[Active Learning] Most uncertain samples: [2, 8, 0]


In [11]:
wandb.finish()

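Run history:

| Metric | History |
|---|---|
| avg_entropy | ▁ |
| contrastive_loss | ▁ |
| curriculum_loss | ▃▄▁▃█ |
| epoch | ▁▃▅▆█ |
| masked_loss | ▁ |
| transfer_learning_loss | ▁ |

Run summary:

| Metric | Value |
|---|---|
| avg_entropy | 0.86986 |
| contrastive_loss | 1.94206 |
| curriculum_loss | 1.82495 |
| epoch | 4.0 |
| masked_loss | 4.66358 |
| transfer_learning_loss | 1.20103 |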
