# Get activations from a foveated model

Here we will demonstrate two methods for getting activations. The first uses the model class directly.

Let's load a pre-trained model.

In [None]:
%load_ext autoreload
%autoreload 2

from fovi import get_model_from_base_fn
from fovi.fovinet import FoviNet

device = 'cuda'

# base_fn = 'fovi-alexnet_a-1_res-64_rfmult-2_in1k'
base_fn = 'fovi-dinov3-splus_a-2.78_res-64_in1k'
model = get_model_from_base_fn(base_fn, device=device)

Model with base_fn fovi-dinov3-splus_a-2.78_res-64_in1k not found in ../models
Attempting to download fovi-dinov3-splus_a-2.78_res-64_in1k from HuggingFace Hub...


Fetching 5 files:   0%|          | 0/5 [00:00<?, ?it/s]

Model with base_fn fovi-dinov3-splus_a-2.78_res-64_in1k downloaded from HuggingFace Hub to /home/nblauch/.cache/fovi/fovi-dinov3-splus_a-2.78_res-64_in1k
adjusting FOV for fixation: 16.0 (full: 16.0)




minimum k to use all inputs: 103
Note: horizontal flip always done in the loader, to avoid differences across fixations
Number of coords per layer: [3976, 64]


### Now we can create some fake data and get activations.
First, let's see which layers are available to hook.

In [2]:
model.list_available_layers()

['',
 'backbone',
 'backbone.embeddings',
 'backbone.embeddings.patch_embeddings',
 'backbone.embeddings.patch_embeddings.parametrizations',
 'backbone.embeddings.patch_embeddings.parametrizations.weight',
 'backbone.embeddings.patch_embeddings.parametrizations.weight.0',
 'backbone.rope_embeddings',
 'backbone.layer',
 'backbone.layer.0',
 'backbone.layer.0.norm1',
 'backbone.layer.0.attention',
 'backbone.layer.0.attention.k_proj',
 'backbone.layer.0.attention.k_proj.parametrizations',
 'backbone.layer.0.attention.k_proj.parametrizations.weight',
 'backbone.layer.0.attention.k_proj.parametrizations.weight.0',
 'backbone.layer.0.attention.v_proj',
 'backbone.layer.0.attention.v_proj.parametrizations',
 'backbone.layer.0.attention.v_proj.parametrizations.weight',
 'backbone.layer.0.attention.v_proj.parametrizations.weight.0',
 'backbone.layer.0.attention.q_proj',
 'backbone.layer.0.attention.q_proj.parametrizations',
 'backbone.layer.0.attention.q_proj.parametrizations.weight',
 'backb
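The full listing includes parametrization submodules, which are rarely useful hook targets. Here is a minimal sketch of narrowing it down, assuming `list_available_layers()` returns a plain list of strings, as the output above suggests. The short `names` list is copied from that output and stands in for the real return value.

```python
# Stand-in for model.list_available_layers(); entries copied from the output above.
names = [
    '',
    'backbone',
    'backbone.embeddings',
    'backbone.embeddings.patch_embeddings',
    'backbone.embeddings.patch_embeddings.parametrizations',
    'backbone.layer.0',
    'backbone.layer.0.attention.k_proj',
    'backbone.layer.0.attention.k_proj.parametrizations.weight.0',
]

# Drop the root entry ('') and anything under a parametrizations container.
hookable = [n for n in names if n and 'parametrizations' not in n.split('.')]
print(hookable)
```

The same filter should apply unchanged to the real list, provided the assumption about its return type holds.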

Let's hook the fourth backbone block (`backbone.layers.3`), the full backbone (conv layers), and the projector (MLP).

In [3]:
import torch

inputs = torch.rand((10, 3, 256, 256)).to(device)
outputs, acts = model.get_activations(inputs, layer_names=['backbone.layers.3', 'backbone', 'projector'])

Note that the intermediate backbone block retains a spatial dimension ($n=60$), whereas the full backbone has been globally pooled and has no spatial dimension, similarly to the projector.

Note also that each activation tensor contains a fixation dimension as the second dimension.

In [4]:
{k: v.shape for k, v in acts.items()}

{'backbone.layers.3': torch.Size([10, 4, 1, 384]),
 'backbone': torch.Size([10, 4, 1, 384]),
 'projector': torch.Size([10, 4, 1024])}
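For intuition about what hooking a layer by name involves, here is a minimal sketch using plain PyTorch forward hooks on a toy model. It is not FoviNet's implementation of `get_activations`. The helper `collect_activations` and the toy `nn.Sequential` are hypothetical and exist only for illustration.

```python
from collections import OrderedDict

import torch
import torch.nn as nn


def collect_activations(model, inputs, layer_names):
    """Run one forward pass and return (outputs, {layer_name: activation})."""
    modules = dict(model.named_modules())  # '' maps to the root module
    acts, handles = {}, []
    for name in layer_names:
        # Bind `name` as a default argument so each hook stores under its own key.
        def hook(module, args, output, name=name):
            acts[name] = output.detach()
        handles.append(modules[name].register_forward_hook(hook))
    try:
        with torch.no_grad():
            outputs = model(inputs)
    finally:
        # Always remove the hooks so later forward passes are unaffected.
        for handle in handles:
            handle.remove()
    return outputs, acts


# Toy stand-in model (hypothetical), not a foveated network.
toy = nn.Sequential(OrderedDict([
    ('backbone', nn.Linear(8, 16)),
    ('projector', nn.Linear(16, 4)),
]))
outputs, acts = collect_activations(toy, torch.rand(10, 8), ['backbone', 'projector'])
print({k: tuple(v.shape) for k, v in acts.items()})
```

Unlike the activations above, the toy tensors have no fixation dimension, because the toy model processes a single view per input.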

# Using the trainer class

An even more streamlined way of getting activations is to use the Trainer class.

For this to work, you will need to define paths to existing dataset files. For now, these must be FFCV files. Soon, we will allow for standard image datasets. 

When loading a trainer from a pre-trained model, it is generally easiest to use the utility `get_trainer_from_base_fn`, which handles a few basic things under the hood so we don't need to manually edit the config to turn off distributed training, etc.

In [None]:
from fovi import get_trainer_from_base_fn
from fovi.paths import DATASETS_DIR

# base_fn = 'fovi-alexnet_a-1_res-64_rfmult-2_in1k'
base_fn = 'fovi-dinov3-splus_a-2.78_res-64_in1k'
# edit the paths to those storing your ImageNet-1K FFCV files
# in general, any kwarg you pass in will be used to update the loaded config file
kwargs = {
    'data.train_dataset': f'{DATASETS_DIR}/ffcv/imagenet/train_compressed.ffcv',
    'data.val_dataset': f'{DATASETS_DIR}/ffcv/imagenet/val_compressed.ffcv',
}
trainer = get_trainer_from_base_fn(base_fn, load=True, model_dirs=['../models'], **kwargs)


Model with base_fn fovi-dinov3-splus_a-2.78_res-64_in1k not found in ../models
Attempting to download fovi-dinov3-splus_a-2.78_res-64_in1k from HuggingFace Hub...


Fetching 5 files:   0%|          | 0/5 [00:00<?, ?it/s]

Model with base_fn fovi-dinov3-splus_a-2.78_res-64_in1k downloaded from HuggingFace Hub to /home/nblauch/.cache/fovi/fovi-dinov3-splus_a-2.78_res-64_in1k
adjusting FOV for fixation: 16.0 (full: 16.0)
minimum k to use all inputs: 103
Note: horizontal flip always done in the loader, to avoid differences across fixations
Number of coords per layer: [3976, 64]
FoviNet(
  (network): BackboneProjectorWrapper(
    (backbone): DINOv3ViTModel(
      (embeddings): DINOv3ViTEmbeddings(
        (patch_embeddings): ParametrizedKNNPartitioningPatchEmbedding(
        	in_channels=3
        	out_channels=384
        	k=103
        	n_ref=256
        	in_coords=SamplingCoords(length=3976, fov=16.0, cmf_a=2.785765, resolution=44, style=isotropic)
        	out_coords=SamplingCoords(length=64, fov=16.0, cmf_a=2.785765, resolution=6, style=isotropic)
        	sample_cortex=geodesic
        )
      )
      (rope_embeddings): FoviDinoV3RoPE()
      (layer): ModuleList(
        (0-5): 6 x DINOv3ViTLayer(
    

In [7]:
outputs, activations, targets = trainer.compute_activations(
    trainer.val_loader,
    layer_names=['backbone.layers.3', 'backbone', 'projector'],
    max_batches=4,
    do_postproc=True,
)

  1%|█▎                                                                                                                                                                                | 3/391 [00:10<22:36,  3.50s/it]


In [8]:
{k: v.shape for k, v in activations.items()}

{'backbone.layers.3': (512, 20, 1, 384),
 'backbone': (512, 20, 1, 384),
 'projector': (512, 20, 1024)}
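A common next step is to collapse the fixation dimension before fitting a downstream readout. The sketch below averages over fixations with NumPy. It assumes the activations are NumPy arrays with the layout shown above (batch, fixation, then the remaining dimensions), and it uses a random placeholder array rather than real activations. Mean pooling is only one possible choice and is not necessarily what the model's fixation aggregator does.

```python
import numpy as np

# Placeholder with the same layout as activations['backbone'] above (smaller batch).
rng = np.random.default_rng(0)
backbone_acts = rng.standard_normal((8, 20, 1, 384))

# Average over the fixation axis (axis 1), then flatten the remaining dims per image.
pooled = backbone_acts.mean(axis=1)             # shape (8, 1, 384)
features = pooled.reshape(pooled.shape[0], -1)  # shape (8, 384)
print(features.shape)
```

The resulting `features` array has one row per image, which is the shape most linear readouts and similarity analyses expect.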

Note that we also now have the network outputs, which have been aggregated over fixations (since we passed `do_postproc=True`, which applies the fixation aggregator head).

In [9]:
outputs.shape

(512, 1000)

We can quickly check our top-1 accuracy (note: this is an unstable estimate, since we used only a small number of batches).

In [10]:
trainer.val_meters['top_1_val'](torch.tensor(outputs), torch.tensor(targets))

tensor(0.7305, device='cuda:0')
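As a sanity check independent of the trainer's meters, top-1 accuracy can be computed by hand, and a rough standard error indicates how unstable an estimate from 512 images is. This is a minimal sketch assuming `outputs` holds one row of class scores per image and `targets` holds integer class indices. The tiny arrays are made up for illustration, and `top1_accuracy` and `binomial_standard_error` are hypothetical helpers, not part of the library. The standard error additionally assumes independent images.

```python
import numpy as np


def top1_accuracy(scores, targets):
    """Fraction of rows whose highest-scoring class equals the target index."""
    predictions = np.asarray(scores).argmax(axis=1)
    return float((predictions == np.asarray(targets)).mean())


def binomial_standard_error(p, n):
    """Standard error of a proportion p estimated from n independent samples."""
    return (p * (1.0 - p) / n) ** 0.5


# Made-up scores for 4 images over 3 classes.
scores = np.array([
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
])
targets = np.array([1, 0, 1, 0])
acc = top1_accuracy(scores, targets)
print(acc)  # 0.75

# Uncertainty of the estimate reported above (0.7305 from 512 images).
se = binomial_standard_error(0.7305, 512)
print(round(se, 3))
```

Under these assumptions the standard error is about 0.02, so the reported accuracy is only pinned down to within a few percentage points.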