# Egocentric zone-aware action recognition across environments

This notebook is meant to run on Google Colab.

**NOTE**: The results in the paper were averaged over three runs with different seeds, so the numbers from a single run may differ slightly.
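To compare your own runs against averaged numbers, you can aggregate the per-seed accuracies yourself. The snippet below is a minimal sketch: `summarize_runs` is a hypothetical helper that is not part of the EgoZAR repository, and the accuracy values are placeholders, not results from the paper.

```python
from statistics import mean, stdev


def summarize_runs(accuracies):
    """Return (mean, sample standard deviation) of per-seed accuracies."""
    if len(accuracies) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return mean(accuracies), stdev(accuracies)


# Placeholder values only: replace them with the accuracy of each of your runs.
placeholder_accuracies = [50.0, 51.0, 52.0]
avg, spread = summarize_runs(placeholder_accuracies)
print(f"mean={avg:.2f}, std={spread:.2f}")
```

Reporting the spread next to the mean makes it easier to judge whether a difference from the published numbers is within seed-to-seed variation.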

## Set up the environment

In [None]:
!pip install gdown

## Download the code

In [None]:
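# Clone over HTTPS: the SSH URL requires a configured GitHub key, which Colab lacks by default
!git clone https://github.com/sapeirone/EgoZAR.git
# Use the %cd magic: `!cd` runs in a throwaway subshell and does not change the notebook's working directory
%cd EgoZAR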

## Data preparation

### EK100 UDA TBN features
Download the official TBN features from [here](https://github.com/epic-kitchens/C4-UDA-for-Action-Recognition).

In [None]:
!mkdir -p data
# Quote the URL so the shell does not treat `&` as a command separator
!wget -O ek100.zip "https://www.dropbox.com/scl/fo/us8zy3r2rufqriig0pbii/ABeUdV83UNmJ5US-oCxAPno?rlkey=yzbuczl198z067pnotx1zxvuo&e=1&dl=0"
!unzip ek100.zip -d data/
!rm ek100.zip
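Before training, it can help to confirm that the archive actually extracted into `data/`. The following is a small sanity-check sketch. `count_files_by_suffix` is a hypothetical helper, and it makes no assumption about which file names or extensions the archive contains; it only reports what it finds.

```python
from collections import Counter
from pathlib import Path


def count_files_by_suffix(root):
    """Count regular files under `root`, grouped by file extension."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist or is not a directory")
    return Counter(p.suffix or "<none>" for p in root.rglob("*") if p.is_file())


if Path("data").is_dir():
    for suffix, n in sorted(count_files_by_suffix("data").items()):
        print(f"{suffix}: {n} file(s)")
else:
    print("data/ not found: the download or unzip step may have failed")
```

An empty listing, or a missing `data/` directory, usually means the download returned something other than a zip archive.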

### CLIP features

Download the pre-extracted CLIP ViT-L/14 features for EgoZAR. 

You can also extract these features with different CLIP variants using the `save_CLIP_features.py` script.

In [None]:
!mkdir -p clip_features

# Source train features
!gdown 1TBmxIuoERx1v1xkrBfNfvWeIHNYUzRZS

# Target validation features
!gdown 186X1PBb1RuzBeXObbCs60DEaJJ4vlLdg

## Source Only baseline

In [1]:
!python train.py --modality=RGB --modality=Flow --modality=Audio

2024-09-21 14:06:35 hyperion __main__[554860] INFO Initializing the datasets...
2024-09-21 14:06:35 hyperion dataset[554860] INFO Loading EK100 dataset for split source and mode train.
2024-09-21 14:06:41 hyperion dataset[554860] INFO Loading EK100 dataset for split target and mode val.
2024-09-21 14:06:44 hyperion __main__[554860] INFO Building the clusters...
2024-09-21 14:06:46 hyperion __main__[554860] INFO Building the EgoZAR architecture for the following modalities ['RGB', 'Flow', 'Audio']...
2024-09-21 14:06:46 hyperion __main__[554860] INFO
2024-09-21 14:06:46 hyperion __main__[554860] INFO Starting the training loop for 30 epochs...
2024-09-21 14:06:46 hyperion __main__[554860] IN
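The command above passes `--modality` three times, and the log reports the modalities as a list. One common way to get that behavior is an `argparse` option with `action="append"`. The sketch below illustrates the pattern only: the real parser in `train.py` is not shown in this notebook, so `build_parser` and its settings are assumptions.

```python
import argparse


def build_parser():
    """Illustrative parser only; the real train.py may define its options differently."""
    parser = argparse.ArgumentParser(description="Repeated-flag sketch")
    parser.add_argument(
        "--modality",
        action="append",                    # each occurrence is appended to a list
        choices=["RGB", "Flow", "Audio"],   # assumed set, taken from the command above
        required=True,
        help="Repeat the flag once per input modality.",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ["--modality=RGB", "--modality=Flow", "--modality=Audio"]
)
print(args.modality)
```

With this pattern, dropping a modality is a matter of omitting one `--modality=...` flag from the command.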

## EgoZAR

In [3]:
!python train.py --modality=RGB --modality=Flow --modality=Audio --ca --use-input-features=N --use-egozar-motion-features=Y --use-egozar-acz-features=Y \
    --disent-loss-weight=1.0 \
    --disent-n-clusters=4

2024-09-21 14:29:24 hyperion __main__[559989] INFO Initializing the datasets...
2024-09-21 14:29:24 hyperion dataset[559989] INFO Loading EK100 dataset for split source and mode train.
2024-09-21 14:29:31 hyperion dataset[559989] INFO Loading EK100 dataset for split target and mode val.
2024-09-21 14:29:34 hyperion __main__[559989] INFO Building the clusters...
2024-09-21 14:29:36 hyperion __main__[559989] INFO Building the EgoZAR architecture for the following modalities ['RGB', 'Flow', 'Audio']...
2024-09-21 14:29:37 hyperion __main__[559989] INFO
2024-09-21 14:29:37 hyperion __main__[559989] INFO Starting the training loop for 30 epochs...
2024-09-21 14:29:37 hyperion __main__[559989] IN
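The EgoZAR command mixes three kinds of options: a bare switch (`--ca`), `Y`/`N` toggles, and numeric values. The sketch below shows one way such options could be parsed with `argparse`. It is an assumption-laden illustration, not the repository's code: the `yes_no` converter and every default value are hypothetical, and only the flag names are taken from the command above.

```python
import argparse


def yes_no(value):
    """Convert a Y/N style string to a bool (hypothetical helper)."""
    normalized = value.strip().upper()
    if normalized in {"Y", "YES"}:
        return True
    if normalized in {"N", "NO"}:
        return False
    raise argparse.ArgumentTypeError(f"expected Y or N, got {value!r}")


# Flag names come from the command above; all defaults here are placeholders.
parser = argparse.ArgumentParser(description="Illustrative EgoZAR-style options")
parser.add_argument("--ca", action="store_true")
parser.add_argument("--use-input-features", type=yes_no, default=True)
parser.add_argument("--use-egozar-motion-features", type=yes_no, default=False)
parser.add_argument("--use-egozar-acz-features", type=yes_no, default=False)
parser.add_argument("--disent-loss-weight", type=float, default=0.0)
parser.add_argument("--disent-n-clusters", type=int, default=4)

opts = parser.parse_args([
    "--ca",
    "--use-input-features=N",
    "--use-egozar-motion-features=Y",
    "--use-egozar-acz-features=Y",
    "--disent-loss-weight=1.0",
    "--disent-n-clusters=4",
])
print(vars(opts))
```

Using a converter function as `type` rejects values other than `Y` or `N` at parse time, instead of letting a typo silently change the configuration.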