In [6]:
import sys
sys.path.append('../')

# Backbones

Backbones serve as the feature extractor in the meta template. The `backbones` folder provides several implementations. The most basic one is `FCNet`, a fully connected network. The others, `ConvNet` and `ResNet`, are convolutional neural networks. Let's explore them one by one.

## Fully connected network

`FCNet` is a fully connected network with the following parameters:
- `x_dim`, the dimension of the input
- `layer_dim`, a list of integers that specifies the dimensions of the hidden layers
- `dropout`, the dropout rate
- `fast_weights`, a boolean that specifies whether to use fast weights. All backbones take this parameter; it is tied to the MAML model (see [methods](methods.ipynb)). See [Fast weights](#fast-weights) for more details.

One block of `FCNet` is defined in the function `full_block` in `backbones/blocks.py`. It consists of a linear layer, a batch normalization layer, a ReLU layer and a dropout layer.

The number of blocks is determined by the number of elements in `layer_dim`.
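To make the block structure concrete, here is a minimal sketch of how such an encoder could be assembled. The names `full_block_sketch` and `build_encoder` are hypothetical stand-ins written for this illustration, not the code in `backbones/blocks.py`; the default dropout of 0.2 is taken from the printed model further down.

```python
import torch
from torch import nn


def full_block_sketch(in_features, out_features, dropout=0.2):
    """Hypothetical stand-in for `full_block`: Linear -> BatchNorm1d -> ReLU -> Dropout."""
    return nn.Sequential(
        nn.Linear(in_features, out_features),
        nn.BatchNorm1d(out_features),
        nn.ReLU(),
        nn.Dropout(p=dropout),
    )


def build_encoder(x_dim, layer_dim, dropout=0.2):
    """Stack one block per entry of `layer_dim`, chaining the dimensions."""
    dims = [x_dim] + list(layer_dim)
    blocks = [
        full_block_sketch(d_in, d_out, dropout)
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]
    return nn.Sequential(*blocks)


encoder = build_encoder(x_dim=32, layer_dim=[64, 64])
out = encoder(torch.randn(8, 32))  # a batch of 8 inputs with 32 features each
print(len(encoder), tuple(out.shape))
```

With `layer_dim=[64, 64]` the sketch produces two blocks, and the output has as many features as the last entry of `layer_dim`.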

In [5]:
from backbones.fcnet import FCNet

model = FCNet(x_dim=32, layer_dim=[64,64])
print(model)

FCNet(
  (encoder): Sequential(
    (0): Sequential(
      (0): Linear(in_features=32, out_features=64, bias=True)
      (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
      (3): Dropout(p=0.2, inplace=False)
    )
    (1): Sequential(
      (0): Linear(in_features=64, out_features=64, bias=True)
      (1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
      (3): Dropout(p=0.2, inplace=False)
    )
  )
)


## Changing the backbone

## Changes to the original code

1. I included `**kwargs` in the `__init__` function of the backbones so that the `run.py` script works with backbones other than `FCNet`.
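The pattern can be sketched as follows. This assumes the change is needed because the script passes the same keyword arguments to whichever backbone is selected; `TinyBackbone` and its `hidden_dim` parameter are hypothetical and exist only for this illustration.

```python
from torch import nn


class TinyBackbone(nn.Module):
    """Hypothetical backbone, used only to illustrate the **kwargs pattern."""

    def __init__(self, x_dim, hidden_dim=64, **kwargs):
        # **kwargs absorbs arguments meant for other backbones
        # (for example `layer_dim`), so the constructor does not raise.
        super().__init__()
        self.encoder = nn.Linear(x_dim, hidden_dim)
        self.ignored = sorted(kwargs)  # kept only to show what was swallowed

    def forward(self, x):
        return self.encoder(x)


# Without **kwargs this call would fail with a TypeError.
backbone = TinyBackbone(x_dim=32, layer_dim=[64, 64], dropout=0.2)
print(backbone.ignored)
```

The trade-off is that misspelled arguments are silently ignored as well, so this is a convenience rather than a safeguard.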

## Fast Weights

Fast weights in the backbones make the MAML algorithm easier to implement. The idea is that the model holds two sets of weights: the initial weights and the fast weights. The initial weights are updated in the outer loop of MAML. The fast weights are updated in the inner loop and are temporary: they live for one episode only. This way the same model can be used in both the inner and the outer loop.
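One common way to realize this idea is to attach an optional `fast` tensor to each parameter and have the layer prefer it when it is set. The sketch below assumes that design; `LinearFW` is a hypothetical name for this illustration, and the layers in `backbones/blocks.py` may be written differently.

```python
import torch
import torch.nn.functional as F
from torch import nn


class LinearFW(nn.Linear):
    """Hypothetical linear layer that uses fast weights when they are set."""

    def __init__(self, in_features, out_features):
        super().__init__(in_features, out_features)
        # Temporary per-episode weights; None means "use the initial weights".
        self.weight.fast = None
        self.bias.fast = None

    def forward(self, x):
        if self.weight.fast is not None and self.bias.fast is not None:
            return F.linear(x, self.weight.fast, self.bias.fast)
        return super().forward(x)


layer = LinearFW(4, 2)
x = torch.randn(3, 4)

slow_out = layer(x)  # computed with the initial weights

# Pretend the inner loop produced these adapted weights.
layer.weight.fast = torch.zeros_like(layer.weight)
layer.bias.fast = torch.ones_like(layer.bias)
fast_out = layer(x)  # computed with the fast weights

# At the end of the episode the fast weights are discarded.
layer.weight.fast = None
layer.bias.fast = None
restored_out = layer(x)
```

The initial weights are never overwritten here, which is what lets the outer loop keep optimizing them across episodes.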

### Fast Weights in MAML
![MAML for multiple tasks](../images/maml_pseudocode.png)

MAML operates on a two-level learning process:
- The **inner loop** uses fast weights for quick task-specific adaptation.
- The **outer loop** updates the "slow" (initial) weights, improving the model's generalization across tasks.

### Inner Loop
- **Evaluation (Line 5)**: For each task $\mathcal{T}_i$, the algorithm evaluates the gradient of the loss function $\mathcal{L}_{\mathcal{T}_i}$ with respect to the initial parameters $\theta$ using a small subset of $K$ examples. This gradient tells us how to update the parameters to improve performance on this task.
- **Compute Adapted Parameters (Line 6)**: The fast weights $\theta_i'$ are computed by adjusting the initial parameters $\theta$ using the evaluated gradient. The step size hyperparameter $\alpha$ determines how large a step to take along the negative gradient. The result is a new set of parameters adapted specifically to task $\mathcal{T}_i$; these adapted parameters are what we call "fast weights". They are fast in the sense that they are computed rapidly from just a few examples of the current task and are discarded after use.
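The two inner-loop steps can be sketched on a toy problem. The linear model, the squared-error `task_loss`, the data and the value of `alpha` below are all assumptions chosen for this illustration, not settings from the repository.

```python
import torch


def task_loss(params, x, y):
    """Mean squared error of a linear model y_hat = x @ params."""
    return ((x @ params - y) ** 2).mean()


# Initial ("slow") parameters theta and a K = 3 shot support set.
theta = torch.zeros(2, requires_grad=True)
x_support = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_support = torch.tensor([1.0, 2.0, 3.0])
alpha = 0.1  # inner-loop step size

# Line 5: gradient of the task loss with respect to theta.
loss_before = task_loss(theta, x_support, y_support)
(grad,) = torch.autograd.grad(loss_before, theta, create_graph=True)

# Line 6: fast weights, one gradient step away from theta.
theta_fast = theta - alpha * grad

loss_after = task_loss(theta_fast, x_support, y_support)
print(loss_before.item(), loss_after.item())
```

`create_graph=True` keeps the gradient itself differentiable, so a later outer-loop update can backpropagate through this step. Note that `theta` is not modified; only the temporary `theta_fast` changes.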

### Outer Loop
After all tasks in the batch have been processed, the initial parameters $\theta$ are updated. The update uses the sum over tasks of the losses $\mathcal{L}_{\mathcal{T}_i}$ evaluated at the fast weights $\theta_i'$, differentiated with respect to the initial parameters $\theta$. The gradient therefore flows back through the inner-loop step; treating $\theta_i'$ as independent of $\theta$ gives the cheaper first-order approximation. The step size hyperparameter $\beta$ controls the size of this update. The updated $\theta$ is a better starting point for new tasks, which improves the model's ability to adapt to them quickly.
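A minimal sketch of one such meta-update on toy data follows. The helper `maml_outer_step`, the linear model, the split of each task into a support and a query set, and the values of `alpha` and `beta` are assumptions made for this illustration.

```python
import torch


def task_loss(params, x, y):
    """Mean squared error of a linear model y_hat = x @ params."""
    return ((x @ params - y) ** 2).mean()


def maml_outer_step(theta, tasks, alpha=0.1, beta=0.05):
    """One meta-update of theta over a batch of (support, query) tasks."""
    meta_loss = 0.0
    for (x_s, y_s), (x_q, y_q) in tasks:
        # Inner loop: one gradient step on the support set gives the fast weights.
        (grad,) = torch.autograd.grad(
            task_loss(theta, x_s, y_s), theta, create_graph=True
        )
        theta_fast = theta - alpha * grad
        # Loss of the adapted parameters, summed over tasks.
        meta_loss = meta_loss + task_loss(theta_fast, x_q, y_q)
    # Outer loop: differentiate with respect to theta, through the inner step.
    (meta_grad,) = torch.autograd.grad(meta_loss, theta)
    with torch.no_grad():
        new_theta = theta - beta * meta_grad
    return new_theta.requires_grad_(), meta_loss.item()


def make_task(slope):
    """Toy 1-D regression task y = slope * x with separate support and query data."""
    x_s, x_q = torch.tensor([[1.0], [2.0]]), torch.tensor([[3.0]])
    return (x_s, slope * x_s.squeeze(1)), (x_q, slope * x_q.squeeze(1))


tasks = [make_task(2.0), make_task(-1.0)]
theta = torch.zeros(1, requires_grad=True)

losses = []
for _ in range(50):
    theta, meta_loss = maml_outer_step(theta, tasks)
    losses.append(meta_loss)
```

On this toy problem `theta` settles between the two task slopes, at a point from which a single inner step serves both tasks reasonably well. The fast weights are recomputed from scratch in every call and never stored.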