## Read data

Your `X_train` and `X_test` should have the shape (n_samples, n_channels, seq_len=512).

In [None]:
import numpy as np

data = [np.load(f'data/HandMovementDirection/{variable}_{set_name}.npy')
        for variable in ['X', 'y'] for set_name in ['train', 'test']]

X_train, X_test, y_train, y_test = data

print("X_train dims: ", X_train.shape)
print("X_test dims: ", X_test.shape)

If the original sequence length is different, resize it, for example with the following function:


In [None]:
import torch
import torch.nn.functional as F

def resize(X):
    X_scaled = F.interpolate(torch.tensor(X, dtype=torch.float), size=512, mode='linear', align_corners=False)
    return X_scaled.numpy()
    
X_train, X_test = resize(X_train), resize(X_test)

print("X_train dims: ", X_train.shape)
print("X_test dims: ", X_test.shape)
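
To make the resizing step concrete, here is a minimal NumPy-only sketch of linear resampling along the time axis. The helper `resize_numpy` and the synthetic array `X_demo` are illustrative names, not part of the package. The sampling grid used here aligns the endpoints, which differs slightly from `F.interpolate` with `align_corners=False`, so the values will not match the function above exactly.

```python
import numpy as np

def resize_numpy(X, new_len=512):
    """Linearly resample the last axis of an (N, D, T) array to new_len points."""
    n_samples, n_channels, old_len = X.shape
    # both grids span [0, 1], so the first and last time steps are preserved
    old_grid = np.linspace(0.0, 1.0, old_len)
    new_grid = np.linspace(0.0, 1.0, new_len)
    # treat every (sample, channel) pair as an independent 1d series
    flat = X.reshape(-1, old_len)
    resized = np.stack([np.interp(new_grid, old_grid, row) for row in flat])
    return resized.reshape(n_samples, n_channels, new_len)

# synthetic data: 4 samples, 10 channels, 400 time steps
X_demo = np.random.default_rng(0).normal(size=(4, 10, 400))
X_demo_resized = resize_numpy(X_demo)
print(X_demo_resized.shape)  # (4, 10, 512)
```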

## Load model

In [None]:
from mantis.architecture import Mantis8M
    
device = 'cpu' # set device
network = Mantis8M(device=device) # init model
network = network.from_pretrained("paris-noah/Mantis-8M") # load weights

## Extract deep features with an adapter

If the number of channels is too large, one approach is to reduce the dimensionality first, then pass the transformed input to the foundation model. This resembles a dimensionality reduction problem, with two key nuances:

 1. The data is three-dimensional.
 2. The dimensionality reduction algorithm should preserve the temporal patterns, as these are important for the foundation model.
 
In this package, we provide several simple solutions to address these challenges, while leaving the development of more sophisticated and efficient solutions for future work.

### Dimension reduction along channels

Given a 3d dataset of size $(N, D, T)$, an intuitive approach would be to reshape it to $(N, D \times T)$ in order to apply classical dimension reduction algorithms like PCA. However, this may disrupt the temporal structure, as the new features will combine different time stamps and different channels at the same time.

Therefore, we reshape the data to $(N \times T, D)$, allowing the dimension reduction algorithm to focus on correlations between channels over all time steps, effectively capturing spatial correlations while preserving temporal information. In the case of a linear feature transformation, we eventually learn a projection matrix $W \in \mathbb{R}^{D' \times D}$ that linearly combines the original $D$ channels into $D'$ new channels, which intuitively preserves most of the temporal patterns.
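
The reshaping described above can be sketched in plain NumPy. This is only an illustration on synthetic data, with PCA written out via an SVD; the variable names and sizes are assumptions, and it is not the code used inside the package.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, T, D_new = 8, 6, 32, 3
X = rng.normal(size=(N, D, T))

# (N, D, T) -> (N, T, D) -> (N * T, D): every row is one time step of one sample
X_2d = X.transpose(0, 2, 1).reshape(N * T, D)

# PCA via SVD of the centred data; rows of W are the top D_new principal directions
mean = X_2d.mean(axis=0)
_, _, Vt = np.linalg.svd(X_2d - mean, full_matrices=False)
W = Vt[:D_new]  # shape (D_new, D)

# project the channels and restore the 3d layout: (N * T, D_new) -> (N, D_new, T)
X_reduced = ((X_2d - mean) @ W.T).reshape(N, T, D_new).transpose(0, 2, 1)
print(X_reduced.shape)  # (8, 3, 32)
```

Because the same $W$ is applied at every time step, each new channel is a fixed linear combination of the original channels, so the time axis itself is never mixed.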

We have implemented this approach as a wrapper called `MultichannelProjector` that supports any 2d unsupervised dimension reduction algorithm that follows the `scikit-learn` convention, with an `n_components` argument as well as `fit` and `transform` methods.

In [None]:
from mantis.adapters import MultichannelProjector
from sklearn.decomposition import PCA

adapter = MultichannelProjector(new_num_channels=5, base_projector=PCA)

We also provide shortcuts for three algorithms: `PCA` and `TruncatedSVD` from `sklearn.decomposition`, and `SparseRandomProjection` from `sklearn.random_projection`.

In [None]:
# for PCA
adapter = MultichannelProjector(new_num_channels=5, base_projector='pca')
# for TruncatedSVD
adapter = MultichannelProjector(new_num_channels=5, base_projector='svd')
# for SparseRandomProjection
adapter = MultichannelProjector(new_num_channels=5, base_projector='rand')

### Patch-wise dimension reduction

Since the previous approach completely ignores interdependencies between measurements at different timestamps, one simple way to overcome this issue is to split the time dimension into non-overlapping patches. In other words, we reshape $(N, D, T)$ data into $(N \times P, S \times D)$, where $P$ is the number of patches, $S$ is the patch size, and $T = P \times S$, and then apply a dimension reduction algorithm, which results in a projected data matrix of size $(N \times P, S \times D')$. Finally, we reshape the transformed input to $(N, D', T)$.
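
The patch-wise reshaping can be sketched in NumPy as follows. The data is synthetic, PCA is again written via an SVD, and the way the $S \times D'$ components are laid out when reshaping back is an assumption made for illustration; the package may order them differently.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, T, S, D_new = 8, 6, 32, 8, 3
P = T // S  # number of patches, so that T = P * S
X = rng.normal(size=(N, D, T))

# (N, D, T) -> (N, D, P, S) -> (N, P, S, D) -> (N * P, S * D)
X_2d = X.reshape(N, D, P, S).transpose(0, 2, 3, 1).reshape(N * P, S * D)

# PCA via SVD, keeping S * D_new components
mean = X_2d.mean(axis=0)
_, _, Vt = np.linalg.svd(X_2d - mean, full_matrices=False)
Z = (X_2d - mean) @ Vt[:S * D_new].T  # shape (N * P, S * D_new)

# assumed layout: view the components as (S, D_new) and undo the first reshape
X_reduced = Z.reshape(N, P, S, D_new).transpose(0, 3, 1, 2).reshape(N, D_new, T)
print(X_reduced.shape)  # (8, 3, 32)
```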

We implemented this approach through an optional argument `patch_window_size` in `MultichannelProjector`.

In [None]:
from mantis.adapters import MultichannelProjector

adapter = MultichannelProjector(new_num_channels=5, base_projector='pca', patch_window_size=8)

### Channel selection

As an alternative to dimension reduction, we can use feature selection to reduce the number of channels. We have implemented a very simple approach where we reshape the data to $(N \times T, D)$, sort the channels by variance in descending order, and keep the first $D'$ of them.
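
A minimal NumPy sketch of this selection rule is given below. It uses synthetic data with hand-picked channel scales (an assumption made so that the ranking is unambiguous) and is not the implementation of `VarianceBasedSelector`.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, T, D_new = 8, 6, 32, 3
# give each channel a different scale so that the variance ranking is clear
scales = np.array([1.0, 5.0, 0.5, 3.0, 0.1, 2.0])
X = rng.normal(size=(N, D, T)) * scales[None, :, None]

# (N, D, T) -> (N * T, D), then compute the variance of every channel (column)
channel_var = X.transpose(0, 2, 1).reshape(N * T, D).var(axis=0)

# indices of the D_new channels with the largest variance, in descending order
selected = np.argsort(channel_var)[::-1][:D_new]
X_selected = X[:, selected, :]
print(selected, X_selected.shape)
```

Unlike the projection-based adapters, the output channels here are original channels, which keeps them directly interpretable.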

In [None]:
from mantis.adapters import VarianceBasedSelector

adapter = VarianceBasedSelector(new_num_channels=5)

### Scaling channels

This step is optional and depends on the application. If the channels are measured in different units, it makes sense to scale them before applying an adapter.
If all channels are measured in the same units, scaling is not necessary and can sometimes even be harmful, as it may change the channel importance ranking.

In [None]:
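# optional: scale each channel using statistics of the training data
from sklearn.preprocessing import StandardScaler

def to_2d(X):  # (N, D, T) -> (N * T, D): one column per channel
    return X.transpose(0, 2, 1).reshape(-1, X.shape[1])

def to_3d(X_2d, shape):  # (N * T, D) -> (N, D, T)
    return X_2d.reshape(shape[0], shape[2], shape[1]).transpose(0, 2, 1)

scaler = StandardScaler()
X_train = to_3d(scaler.fit_transform(to_2d(X_train)), X_train.shape)
X_test = to_3d(scaler.transform(to_2d(X_test)), X_test.shape)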

### Apply an adapter

All these adapters follow the same pipeline:

In [None]:
adapter.fit(X_train)
X_reduced_train, X_reduced_test = adapter.transform(X_train), adapter.transform(X_test)

print("X_reduced_train dims: ", X_reduced_train.shape)
print("X_reduced_test dims: ", X_reduced_test.shape)

Extract deep features, learn a classifier, and evaluate the performance:

In [None]:
from sklearn.ensemble import RandomForestClassifier
from mantis.trainer import MantisTrainer

model = MantisTrainer(device=device, network=network) # init trainer
Z_train = model.transform(X_reduced_train)
Z_test = model.transform(X_reduced_test)

predictor = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
predictor.fit(Z_train, y_train)
y_pred = predictor.predict(Z_test)
print(f'Accuracy on the test set is {np.mean(y_test == y_pred)}')

## Differentiable adapter for fine-tuning

Previously, we demonstrated the use of standalone adapters, which are applied before passing the input to the foundation model. In this section, we show how to use a differentiable adapter, implemented as a PyTorch module, that is learned through backpropagation as part of the overall network.

The advantage of such an adapter is that the dimensionality reduction layer is optimized according to the ultimate classification loss. However, this approach is significantly slower, as the adapter and classification head are learned jointly, requiring a forward pass through the foundation model for every optimization step, even if the foundation model's weights are frozen.

We have implemented a simple adapter that applies a learnable projection matrix $W \in \mathbb{R}^{D' \times D}$, linearly combining the original $D$ channels into $D'$ new channels.

In [None]:
from mantis.adapters import LinearChannelCombiner

adapter = LinearChannelCombiner(num_channels=X_train.shape[1], new_num_channels=5)

Alternatively, you can define your own PyTorch module:

In [None]:
from torch import nn

class MyAdapter(nn.Module):
    def __init__(self, num_channels, new_num_channels):
        super().__init__()
        self.num_channels = num_channels
        self.new_num_channels = new_num_channels

    def forward(self, x):
        raise NotImplementedError
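
As one possible way to fill in the template, the sketch below mixes channels with a bias-free linear layer. The class name `LinearAdapterSketch` is hypothetical, and the sketch assumes that the adapter receives a tensor of shape (batch, D, T) and must return a tensor of shape (batch, D', T). It is not the implementation of `LinearChannelCombiner`.

```python
import torch
from torch import nn

class LinearAdapterSketch(nn.Module):
    """Hypothetical adapter: mixes D input channels into D' output channels."""

    def __init__(self, num_channels, new_num_channels):
        super().__init__()
        self.num_channels = num_channels
        self.new_num_channels = new_num_channels
        # a bias-free linear layer stores the matrix W of shape (D', D)
        self.linear = nn.Linear(num_channels, new_num_channels, bias=False)

    def forward(self, x):
        # (batch, D, T) -> (batch, T, D) -> mix channels -> (batch, D', T)
        return self.linear(x.transpose(1, 2)).transpose(1, 2)

# quick shape check on random input (assumed sizes: 10 channels, length 512)
adapter_sketch = LinearAdapterSketch(num_channels=10, new_num_channels=5)
x = torch.randn(4, 10, 512)
print(adapter_sketch(x).shape)  # torch.Size([4, 5, 512])
```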

### Adapter + head fine-tuning 

In [None]:
fine_tuning_type = 'adapter_head'

# fine-tune the model
model.fit(X_train, y_train, fine_tuning_type=fine_tuning_type, adapter=adapter)

# evaluate performance
y_pred = model.predict(X_test)
print(f'Accuracy on the test set is {np.mean(y_test == y_pred)}')

### Full fine-tuning

In [None]:
fine_tuning_type = 'full'

# fine-tune the model
model.fit(X_train, y_train, fine_tuning_type=fine_tuning_type, adapter=adapter)

# evaluate performance
y_pred = model.predict(X_test)
print(f'Accuracy on the test set is {np.mean(y_test == y_pred)}')