# A Quick Tutorial for Implicit Deep Learning

This tutorial introduces the **Implicit Deep Learning** (IDL) framework using the `idl` package in 3 main parts:

1. **A Simple Example**
    - Implicit Model
    - Implicit RNN
    - State-driven Implicit Modeling (SIM)
2. **Custom Activation for Implicit model**
3. **Implicit model as a layer**

## 1. A Simple Example

This section shows how to use the package. Each example below needs only a few lines of code.

Before proceeding, please ensure you have installed the required packages by following the [installation](https://github.com/HoangP8/Implicit-Deep-Learning?tab=readme-ov-file#installation) instructions.

#### 1a. `ImplicitModel`

`ImplicitModel` is the most fundamental implicit model. Unlike traditional feed-forward architectures, which compute hidden states layer by layer, it finds its hidden state by solving a fixed-point equation. For details on its parameters and the underlying intuition, please refer to the [documentation](https://implicit-deep-learning.readthedocs.io/en/latest/api/idl.html).

In this example, we demonstrate how to use the model for a simple regression task.

In [13]:
import torch
import torch.nn as nn
import torch.optim as optim
from idl import ImplicitModel

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Random input and output data
x = torch.randn(5, 64).to(device)  # (batch_size=5, input_dim=64)
y = torch.randn(5, 10).to(device)  # (batch_size=5, output_dim=10)

# Initialize the model
model = ImplicitModel(input_dim=64,
                      output_dim=10, 
                      hidden_dim=128)
model.to(device)

# Define MSE loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    optimizer.zero_grad() 
    output = model(x)  # Forward pass
    loss = criterion(output, y)  # Compute MSE loss
    loss.backward() 
    optimizer.step()
    
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}")
        
# Inference step
model.eval()  
with torch.no_grad():  
    x_test = torch.randn(1, 64).to(device)
    y_pred = model(x_test)  
    print(f"Inference result: \n {y_pred}")

Epoch [1/10], Loss: 1.5919
Epoch [2/10], Loss: 1.0334
Epoch [3/10], Loss: 0.4830
Epoch [4/10], Loss: 0.1951
Epoch [5/10], Loss: 0.1479
Epoch [6/10], Loss: 0.1692
Epoch [7/10], Loss: 0.1399
Epoch [8/10], Loss: 0.0868
Epoch [9/10], Loss: 0.0465
Epoch [10/10], Loss: 0.0318
Inference result: 
 tensor([[-0.0525,  0.5056, -0.1804, -0.2234, -0.2438, -0.4717, -0.2398, -0.4559,
          0.0045, -0.1295]], device='cuda:0')


The `ImplicitModel` has its forward and backward passes **fully packaged**, ensuring that the training and inference steps work **as normal**, with no additional modifications required. You only need to define the model with the appropriate `input_dim`, `output_dim`, and `hidden_dim`, and use it just like any other model.
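
To make the fixed-point equation mentioned above concrete, here is a minimal sketch in plain PyTorch that does not use the `idl` package. It assumes the common implicit-model form `x = relu(A x + B u)`, `y = C x + D u`, with `A` rescaled so that its infinity norm is below 1. The weights, the sizes, and the simple iteration are illustrative assumptions, not the package's internals.

```python
import torch

torch.manual_seed(0)
dtype = torch.float64

n, p, q, batch = 8, 4, 3, 5  # hidden, input, output sizes and batch size (illustrative)

# Hypothetical weights. A is rescaled so that its infinity norm is 0.5 (< 1),
# which makes the map x -> relu(A x + B u) a contraction.
A = torch.randn(n, n, dtype=dtype)
A = 0.5 * A / torch.linalg.matrix_norm(A, ord=float("inf"))
B = torch.randn(n, p, dtype=dtype)
C = torch.randn(q, n, dtype=dtype)
D = torch.randn(q, p, dtype=dtype)

U = torch.randn(p, batch, dtype=dtype)   # inputs, one column per sample
X = torch.zeros(n, batch, dtype=dtype)   # initial guess for the hidden state

# Plain fixed-point iteration: repeat the update until it stops changing.
for step in range(100):
    X_next = torch.relu(A @ X + B @ U)
    diff = (X_next - X).abs().max().item()
    X = X_next
    if diff < 1e-8:
        break

# The prediction is a linear read-out of the hidden state and the input.
Y = C @ X + D @ U

residual = (X - torch.relu(A @ X + B @ U)).abs().max().item()
print(f"stopped after {step + 1} iterations, residual = {residual:.2e}")
```

Because the ReLU is 1-Lipschitz and the infinity norm of `A` is 0.5 here, each iteration at least halves the distance to the solution, so the loop stops well before the iteration cap.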

#### 1b. `ImplicitRNN`

`ImplicitRNN` uses an implicit layer to define recurrence within a standard RNN framework. For more details, please refer to the [documentation](https://implicit-deep-learning.readthedocs.io/en/latest/api/rnn.html).

Its usage is very similar to `ImplicitModel`. Below, we provide an example where the model learns to predict a single output from an input sequence in a simple regression task.

In [14]:
import torch
import torch.nn as nn
import torch.optim as optim
from idl import ImplicitRNN

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Random input and output sequence 
x = torch.randn(50, 20, 1).to(device)  # (batch_size=50, seq_len=20, input_dim=1)
y = torch.randn(50, 1).to(device)  # (batch_size=50, output_dim=1)

# Initialize the ImplicitRNN model
model = ImplicitRNN(input_dim=1, output_dim=1, hidden_dim=10, implicit_hidden_dim=10)
model.to(device)

# Define MSE loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    optimizer.zero_grad()
    output = model(x)  # Forward pass
    loss = criterion(output, y)  # Compute MSE loss
    loss.backward()
    optimizer.step()

    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}")

# Inference step
model.eval()
with torch.no_grad():
    x_test = torch.randn(1, 20, 1).to(device)
    y_pred = model(x_test)
    print(f"Inference result: {y_pred}")

Epoch [1/10], Loss: 0.8179
Epoch [2/10], Loss: 0.8017
Epoch [3/10], Loss: 0.7861
Epoch [4/10], Loss: 0.7708
Epoch [5/10], Loss: 0.7557
Epoch [6/10], Loss: 0.7392
Epoch [7/10], Loss: 0.7199
Epoch [8/10], Loss: 0.6989
Epoch [9/10], Loss: 0.6883
Epoch [10/10], Loss: 0.6879
Inference result: tensor([[-0.5798]], device='cuda:0')
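
As a rough picture of what "an implicit layer defining the recurrence" can mean, the sketch below unrolls a recurrence whose per-step update is obtained from a fixed-point equation. It is written in plain PyTorch under assumptions of our own: the cell form, the weight scaling, and every name here (`implicit_cell`, `readout`, and so on) are hypothetical and are not taken from the `idl` source.

```python
import torch

torch.manual_seed(0)
dtype = torch.float64

input_dim, hidden_dim, implicit_dim = 1, 10, 10
batch, seq_len = 4, 20

# Hypothetical parameters of one implicit cell acting on v_t = [x_t, h_{t-1}].
A = torch.randn(implicit_dim, implicit_dim, dtype=dtype)
A = 0.5 * A / torch.linalg.matrix_norm(A, ord=float("inf"))  # infinity norm 0.5 < 1
B = 0.1 * torch.randn(implicit_dim, input_dim + hidden_dim, dtype=dtype)
C = 0.1 * torch.randn(hidden_dim, implicit_dim, dtype=dtype)
D = 0.1 * torch.randn(hidden_dim, input_dim + hidden_dim, dtype=dtype)
readout = torch.randn(hidden_dim, 1, dtype=dtype)


def implicit_cell(v, n_iter=60):
    """Approximately solve z = relu(z A^T + v B^T), then return the new hidden state."""
    z = torch.zeros(v.shape[0], implicit_dim, dtype=dtype)
    for _ in range(n_iter):
        z = torch.relu(z @ A.T + v @ B.T)
    return z @ C.T + v @ D.T


x = torch.randn(batch, seq_len, input_dim, dtype=dtype)
h = torch.zeros(batch, hidden_dim, dtype=dtype)

# Standard RNN unrolling: the hidden state is carried from step to step,
# and each step's update is computed by the implicit cell.
for t in range(seq_len):
    h = implicit_cell(torch.cat([x[:, t, :], h], dim=-1))

y = h @ readout  # one output per sequence
print(y.shape)   # torch.Size([4, 1])
```

The shapes mirror the example above: a `(batch, seq_len, input_dim)` input is reduced to a single `(batch, 1)` output taken from the last hidden state.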


#### 1c. `SIM`

`SIM` (State-driven Implicit Modeling) is a training method that helps implicit models learn from pre-trained explicit networks by matching their internal state vectors. For more details, please refer to the [documentation](https://implicit-deep-learning.readthedocs.io/en/latest/api/sim.html).

In [6]:
import torch
from torch import nn
import numpy as np

# First define a simple feed forward network
class Model(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(input_size, 64, bias=False),
            nn.ReLU(),
            nn.Linear(64, 32, bias=False),
            nn.ReLU(),
            nn.Linear(32, 16, bias=False),
            nn.ReLU(),
            nn.Linear(16, output_size, bias=False),
        )

    def forward(self, x):
        if x.ndim == 4:
            x = x.squeeze(1).flatten(start_dim=-2)
        return self.linear_relu_stack(x)

    def scale_network(self, factor=0.9):
        """
        Scale all weights of the network by the maximum infinity norm so that a solution to the SIM convex optimization problem exists.
        """
        layers_indices = [0, 2, 4]

        max_norm = max(
            torch.linalg.norm(self.linear_relu_stack[i].weight, np.inf)
            for i in layers_indices
        )

        for i in layers_indices:
            weight = self.linear_relu_stack[i].weight
            # Rescale so that the largest infinity norm across layers equals `factor`
            scaled_weight = torch.nn.Parameter(weight * factor / max_norm)
            self.linear_relu_stack[i].weight = scaled_weight

        scaled_norm = max(
            torch.linalg.norm(self.linear_relu_stack[i].weight, np.inf)
            for i in layers_indices
        )
        print(f"Original norm : {max_norm}, Scaled norm: {scaled_norm}")

        return self
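
`scale_network` relies on the matrix infinity norm. As a standalone illustration (not the package's code), the sketch below checks that this norm is the largest absolute row sum and shows one way to rescale a matrix so that the norm equals a chosen target such as 0.9. The matrix `W` and the target value are arbitrary choices made for the example.

```python
import torch

torch.manual_seed(0)

W = torch.randn(32, 64)
factor = 0.9  # target infinity norm (illustrative)

# For a 2-D tensor, ord=inf gives the largest absolute row sum.
norm_inf = torch.linalg.norm(W, float("inf"))
row_sum_max = W.abs().sum(dim=1).max()

# Multiplying by factor / norm_inf makes the infinity norm equal to `factor`.
W_scaled = W * (factor / norm_inf)
scaled_norm = torch.linalg.norm(W_scaled, float("inf"))

print(f"original norm: {norm_inf.item():.4f}, scaled norm: {scaled_norm.item():.4f}")
```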

In [8]:
import torch
import torch.optim.lr_scheduler as lr_scheduler
from torchvision import transforms, datasets

EPOCHS = 5
LR = 0.01
BATCH_SIZE = 128
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load Fashion-MNIST dataset
transform = transforms.Compose([transforms.ToTensor()])
train_loader = torch.utils.data.DataLoader(
    datasets.FashionMNIST('./data', train=True, download=True, transform=transform),
    batch_size=BATCH_SIZE
)
test_loader = torch.utils.data.DataLoader(
    datasets.FashionMNIST('./data', train=False, download=True, transform=transform),
    batch_size=BATCH_SIZE
)

# Define model
model = Model(input_size=784, output_size=10).to(device)

# Define optimizer, loss function, and lr scheduler
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
loss_fn = torch.nn.CrossEntropyLoss(reduction="sum")
scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS, eta_min=0.0001)

# Train model
for epoch in range(EPOCHS):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    model.eval()
    # if epoch == EPOCHS - 1:
    #     # Scale the network to ensure that a solution to the SIM convex optimization problem exists
    #     model = model.scale_network(0.9)
    with torch.no_grad():
        test_loss = 0
        correct = 0
        for batch_idx, (data, target) in enumerate(test_loader):
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += loss_fn(output, target).item()
            _, preds = torch.max(output, 1)
            correct += torch.sum(preds == target.data).item()
        test_loss = (test_loss / len(test_loader.dataset))
        accuracy = correct / len(test_loader.dataset) * 100
        print(f'Test Epoch {epoch+1}: Average loss: {test_loss:.4f}, Accuracy: {accuracy:.2f}%')
        

Test Epoch 1: Average loss: 0.5134, Accuracy: 81.49%
Test Epoch 2: Average loss: 0.4346, Accuracy: 84.36%
Test Epoch 3: Average loss: 0.4271, Accuracy: 84.78%
Test Epoch 4: Average loss: 0.4380, Accuracy: 84.19%
Test Epoch 5: Average loss: 0.4171, Accuracy: 85.36%


In [9]:
from idl.sim import SIM
from idl.sim.solvers import CVXSolver
from torch.utils.data import DataLoader, Subset
import random

# Take only a subset of the training dataset to train the state-driven model
selected_indices = random.sample(
    range(len(train_loader.dataset)), 2000
)
subset = Subset(train_loader.dataset, selected_indices)
subset_loader = DataLoader(subset, batch_size=1000, shuffle=True)

sim = SIM(activation_fn=torch.nn.functional.relu, device=device, dtype=torch.float32)

solver = CVXSolver(regen_states=True)

# Train SIM
sim.train(solver=solver, model=model, dataloader=subset_loader)

# Evaluate SIM
sim.evaluate(test_loader) * 100

100%|██████████| 4/4 [06:45<00:00, 101.25s/it]
100%|██████████| 1/1 [00:31<00:00, 31.86s/it]


Test accuruacy: 0.8252


np.float64(82.52000000000001)
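
To illustrate what "matching internal state vectors" involves, the sketch below records the pre- and post-activation states of a small feed-forward network with forward hooks, then fits a linear map that reproduces the pre-activations from the states and the inputs. Plain least squares (`torch.linalg.lstsq`) is used here as a stand-in for the constrained convex problem. The network, the sizes, and the variable names are assumptions made for the example, and nothing here calls `idl`.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A small explicit network standing in for the pre-trained model.
net = nn.Sequential(
    nn.Linear(20, 16, bias=False), nn.ReLU(),
    nn.Linear(16, 8, bias=False), nn.ReLU(),
    nn.Linear(8, 3, bias=False),
).double()

pre_acts, post_acts = [], []


def record(module, inputs, output):
    # For a ReLU module, inputs[0] is the pre-activation and output is the state.
    pre_acts.append(inputs[0].detach())
    post_acts.append(output.detach())


hooks = [m.register_forward_hook(record) for m in net if isinstance(m, nn.ReLU)]

U = torch.randn(200, 20, dtype=torch.float64)  # one sample per row
with torch.no_grad():
    Y = net(U)
for hook in hooks:
    hook.remove()

Z = torch.cat(pre_acts, dim=1)   # stacked pre-activations, shape (200, 24)
X = torch.cat(post_acts, dim=1)  # stacked states, shape (200, 24)

# Fit Z ≈ [X, U] @ M by ordinary least squares.
XU = torch.cat([X, U], dim=1)
M = torch.linalg.lstsq(XU, Z).solution
rel_err = ((XU @ M - Z).norm() / Z.norm()).item()
print(f"relative state-matching error: {rel_err:.2e}")
```

For a feed-forward network, each pre-activation is an exact linear function of the previous state or of the input, so the fit reproduces the states up to numerical precision. The convex formulation adds constraints and an objective on top of this matching condition.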

## 2. Custom Activation for Implicit model
The default activation of the implicit model is ReLU. To use a different activation, subclass `ImplicitFunctionInf` and override its `phi` method (the activation) and its `dphi` method (the derivative of the activation). Below is an example using the SiLU activation.

In [17]:
# ImplicitFunctionInf: implicit function that enforces well-posedness of the implicit model
from idl import ImplicitModel, ImplicitFunctionInf 
import torch

class ImplicitFunctionInfSiLU(ImplicitFunctionInf):
    """
    An implicit function that uses the SiLU nonlinearity.
    """
    
    @staticmethod
    def phi(X):
        return X * torch.sigmoid(X)

    @staticmethod
    def dphi(X):
        grad = X.clone().detach()
        sigmoid = torch.sigmoid(grad)
        return sigmoid * (1 + grad * (1 - sigmoid))


# Initialize the model
model = ImplicitModel(input_dim=64,
                      output_dim=10, 
                      hidden_dim=128,
                      f=ImplicitFunctionInfSiLU)

# From here, train the model as usual
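
Since `dphi` has to be the derivative of `phi`, a quick comparison against autograd can catch a mistake before training. The sketch below uses standalone functions with the same formulas as the SiLU example above rather than the class methods, so it runs without the `idl` package. The test grid and the tolerance are arbitrary choices.

```python
import torch


def phi(X):
    # SiLU activation
    return X * torch.sigmoid(X)


def dphi(X):
    # Hand-written derivative of SiLU
    s = torch.sigmoid(X)
    return s * (1 + X * (1 - s))


# Evaluate both derivatives on a grid of points in double precision.
X = torch.linspace(-6.0, 6.0, steps=101, dtype=torch.float64, requires_grad=True)
(autograd_grad,) = torch.autograd.grad(phi(X).sum(), X)

max_err = (dphi(X.detach()) - autograd_grad).abs().max().item()
print(f"max |dphi - autograd| = {max_err:.2e}")
```

The same check can be reused for any other activation by swapping in its `phi` and `dphi`.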

## 3. Implicit model as a layer
An `ImplicitModel` can be used as a layer inside a larger model and trained together with the rest of the network. No changes to the training loop are needed, as the example below shows:


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from idl import ImplicitModel

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define a larger model that includes ImplicitModel as a layer
class MLPWithImplicit(nn.Module):
    def __init__(self, input_dim, hidden_dim, implicit_hidden_dim, output_dim):
        super(MLPWithImplicit, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.implicit_layer = ImplicitModel(input_dim=hidden_dim, output_dim=output_dim, hidden_dim=implicit_hidden_dim)
        self.activation = nn.ReLU()

    def forward(self, x):
        x = self.activation(self.fc1(x))
        x = self.implicit_layer(x)  # Pass through ImplicitModel
        return x

# Random input and output data
x = torch.randn(5, 64).to(device)  # (batch_size=5, input_dim=64)
y = torch.randn(5, 10).to(device)  # (batch_size=5, output_dim=10)

# Initialize the model
model = MLPWithImplicit(input_dim=64, hidden_dim=128, implicit_hidden_dim=64, output_dim=10)
model.to(device)

# Define MSE loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    optimizer.zero_grad() 
    output = model(x)  # Forward pass
    loss = criterion(output, y)  # Compute MSE loss
    loss.backward() 
    optimizer.step()
    
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}")
        
# Inference step
model.eval()  
with torch.no_grad():  
    x_test = torch.randn(1, 64).to(device)  
    y_pred = model(x_test)  
    print(f"Inference result: \n {y_pred}")

Epoch [1/10], Loss: 0.7000
Epoch [2/10], Loss: 0.3324
Epoch [3/10], Loss: 0.1701
Epoch [4/10], Loss: 0.0583
Epoch [5/10], Loss: 0.0516
Epoch [6/10], Loss: 0.0430
Epoch [7/10], Loss: 0.0322
Epoch [8/10], Loss: 0.0248
Epoch [9/10], Loss: 0.0216
Epoch [10/10], Loss: 0.0203
Inference result: 
 tensor([[-0.7536, -0.1237,  0.1082, -0.7727,  0.6030,  1.1026,  0.3494,  0.2714,
          0.3646,  0.4176]], device='cuda:0')
