<a href="https://colab.research.google.com/github/shazam-25/deep-learning-concepts/blob/main/modern_training_trick.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [16]:
import torch
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# Initialize model and optimizer
model = torch.nn.Linear(10, 1)  # Input size 10, Output size 1
optimizer = optim.Adam(model.parameters(), lr=0.01) # Adam optimizer
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)  # First cycle lasts 10 epochs; each later cycle is twice as long

# Training loop
for epoch in range(10):
  # A real loop would run the forward pass and loss.backward() here.
  # No gradients exist in this demo, so optimizer.step() changes nothing;
  # it is called only to keep the optimizer-then-scheduler order PyTorch expects.
  optimizer.step()
  scheduler.step()  # Step the scheduler once per epoch
  print(f'Epoch {epoch+1}, Adjust learning rate: {optimizer.param_groups[0]["lr"]}')

# Input tensor: one sample with 10 features, matching the model's input size
input = torch.tensor([[1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0]])

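# He (Kaiming) initialization
layer = torch.nn.Linear(10, 10)  # Standalone example layer, separate from `model`
# mode='fan_out' draws weights with variance 2 / fan_out (fan_out = 10 output units here)
torch.nn.init.kaiming_normal_(layer.weight, mode='fan_out', nonlinearity='relu')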

# Mixed precision training
# Autocast and GradScaler only take effect when the model and data live on a
# CUDA device. Without CUDA, enabled=False switches both off and the forward
# pass runs in ordinary float32. This is a simplified forward pass, not a
# full training step.
try:
    use_cuda = torch.cuda.is_available()
    device = 'cuda' if use_cuda else 'cpu'
    scaler = torch.amp.GradScaler('cuda', enabled=use_cuda)  # Would scale the loss in a real training step
    model = model.to(device)
    with torch.amp.autocast('cuda', enabled=use_cuda):
        output = model(input.to(device))
    print(f'Output: {output}')
    print("\nMixed precision setup successful (requires CUDA enabled device and input tensor).")
except Exception as e:
    print(f"\nMixed precision example failed. Error: {e}")

Epoch 1, Adjust learning rate: 0.009755282581475769
Epoch 2, Adjust learning rate: 0.009045084971874737
Epoch 3, Adjust learning rate: 0.007938926261462366
Epoch 4, Adjust learning rate: 0.006545084971874737
Epoch 5, Adjust learning rate: 0.005
Epoch 6, Adjust learning rate: 0.003454915028125263
Epoch 7, Adjust learning rate: 0.0020610737385376348
Epoch 8, Adjust learning rate: 0.0009549150281252633
Epoch 9, Adjust learning rate: 0.00024471741852423234
Epoch 10, Adjust learning rate: 0.01
Output: tensor([[6.0109]], grad_fn=<AddmmBackward0>)

Mixed precision setup successful (requires CUDA enabled device and input tensor).
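
The learning rates printed above can be reproduced from the cosine annealing rule, lr = eta_min + (base_lr - eta_min) * (1 + cos(pi * T_cur / T_i)) / 2, where T_cur counts epochs since the last restart and T_i is the current cycle length. The sketch below is a plain-Python reimplementation for illustration, not PyTorch's own code. It assumes eta_min = 0 (the scheduler default) and one scheduler step per epoch, and `cosine_warm_restart_lr` is a hypothetical helper name.

```python
import math

def cosine_warm_restart_lr(epoch, base_lr=0.01, t_0=10, t_mult=2, eta_min=0.0):
    """Learning rate after `epoch` scheduler steps (hypothetical helper)."""
    t_cur, t_i = epoch, t_0
    # Skip over completed cycles; each cycle is t_mult times longer than the last.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    # Cosine decay from base_lr toward eta_min within the current cycle.
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i)) / 2

lrs = [cosine_warm_restart_lr(e) for e in range(1, 11)]
for e, lr in enumerate(lrs, start=1):
    print(f"Epoch {e}: {lr:.6f}")
```

With T_0=10 and T_mult=2, the restart at epoch 10 returns the rate to 0.01 and the second cycle runs for 20 epochs, so under this rule the next restart would fall at epoch 30.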

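
The `GradScaler` created in the cell above is never used there, so its purpose may not be obvious. The sketch below illustrates the idea behind loss scaling using only the standard library: a small gradient value underflows to zero when stored in float16, but survives if it is multiplied by a scale factor first and divided back in higher precision. This illustrates the principle and is not PyTorch's implementation. `to_float16` is a hypothetical helper, the gradient value is made up, and 65536 is used because it is `GradScaler`'s default `init_scale`.

```python
import struct

def to_float16(x):
    """Round a Python float to the nearest float16 value (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

tiny_grad = 1e-8   # made-up gradient, below float16's smallest subnormal (about 6e-8)
scale = 65536.0    # 2**16, GradScaler's default init_scale

# Stored directly in float16, the value is too small and rounds to zero.
unscaled = to_float16(tiny_grad)

# Scaled first, it fits in float16; dividing by the scale in float64 recovers it.
recovered = to_float16(tiny_grad * scale) / scale

print(f"stored directly: {unscaled}")
print(f"scaled, stored, unscaled: {recovered:.3e}")
```

In a real training step the same idea is applied to the loss before `backward()`, so that every gradient is scaled at once and then unscaled before the optimizer update.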