<div align="center">

# Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives

[Simone Alberto Peirone](https://scholar.google.com/citations?user=K0efPssAAAAJ), [Francesca Pistilli](https://scholar.google.com/citations?user=7MJdvzYAAAAJ), [Antonio Alliegro](https://scholar.google.com/citations?user=yQqW5q0AAAAJ), [Tatiana Tommasi](https://scholar.google.com/citations?user=ykFtI-QAAAAJ), [Giuseppe Averta](https://scholar.google.com/citations?user=i4rm0tYAAAAJ)

</div>

<div align="center">

<a href='https://arxiv.org/abs/2502.02487' style="margin: 10px"><img src='https://img.shields.io/badge/Paper-Arxiv:2502.02487-red'></a>
<a href='https://sapeirone.github.io/hier-egopack/' style="margin: 10px"><img src='https://img.shields.io/badge/Project-Page-Green'></a>
<a target="_blank" href="https://colab.research.google.com/github/sapeirone/hier-egopack/blob/main/quickstart.ipynb" style="margin: 10px">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

✨ <strong>This paper extends our previous work <a href="https://sapeirone.github.io/EgoPack/">"A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives" (CVPR 2024)</a></strong> ✨
</div>
<br>

<div align="center">
<strong>Abstract:</strong>

Our comprehension of video streams depicting human activities is naturally multifaceted: in just a few moments, we can grasp what is happening, identify the relevance and interactions of objects in the scene, and forecast what will happen soon, everything all at once. To endow autonomous systems with such a holistic perception, learning how to correlate concepts, abstract knowledge across diverse tasks, and leverage task synergies when learning novel skills is essential.
A significant step in this direction is EgoPack, a unified framework for understanding human activities across diverse tasks with minimal overhead. EgoPack promotes information sharing and collaboration among downstream tasks, essential for efficiently learning new skills.
In this paper, we introduce Hier-EgoPack, which advances EgoPack by also enabling reasoning across diverse temporal granularities, expanding its applicability to a broader range of downstream tasks.
To achieve this, we propose a novel hierarchical architecture for temporal reasoning equipped with a GNN layer specifically designed to tackle the challenges of multi-granularity reasoning effectively.
We evaluate our approach on multiple Ego4D benchmarks involving both clip-level and frame-level reasoning, demonstrating how our hierarchical unified architecture effectively solves these diverse tasks simultaneously.
</div>

This notebook lets you quickly set up a Google Colab environment for running Hier-EgoPack on the Moment Queries task.

In [None]:
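# Clone over HTTPS: an SSH clone requires a configured key, which a fresh Colab runtime does not have
!git clone https://github.com/sapeirone/hier-egopack.git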

## Step 0: Dataset Access
To access the Ego4D dataset, you need to fill out the Ego4D registration form. You will then receive an access key and its associated secret by email. Paste them into the cell below.

In [None]:
import os
os.environ['AWS_ACCESS_KEY_ID'] = ""
os.environ['AWS_SECRET_ACCESS_KEY'] = ""
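
Before moving on, it can help to confirm that both variables are actually filled in, since the AWS CLI setup in the next cell reads them from the environment. The snippet below is a minimal sketch; `missing_credentials` is a hypothetical helper introduced here for illustration and is not part of hier-egopack.

```python
import os


def missing_credentials(env=None):
    """Return the names of the required AWS variables that are unset or empty.

    `env` defaults to the process environment; passing a dict makes the
    check easy to try out without touching real credentials.
    """
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_credentials()
if missing:
    print("Please set:", ", ".join(missing))
else:
    print("AWS credentials found in the environment.")
```

If the check reports a missing variable, fill in the key and secret received by email and re-run the cell above.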

In [None]:
%%bash

# Set up the AWS CLI
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip -o awscliv2.zip >/dev/null
sudo ./aws/install >/dev/null 2>&1
aws configure set aws_access_key_id "$AWS_ACCESS_KEY_ID" && aws configure set aws_secret_access_key "$AWS_SECRET_ACCESS_KEY"
rm "awscliv2.zip"

In [None]:
!pip install ego4d
!mkdir -p /content/hier-egopack/data/ego4d/raw/annotations/v1
!ego4d --output_directory=/content/hier-egopack/data/ego4d/raw --datasets annotations --benchmarks fho moments -y --version v1
# adjust the directory structure to match the one required by hier-egopack
!mv /content/hier-egopack/data/ego4d/raw/v1/annotations/* /content/hier-egopack/data/ego4d/raw/annotations/v1/
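
After the `mv` step, a quick listing of the target directory shows whether the annotations ended up where hier-egopack expects them. This is a minimal sketch that assumes the annotations are stored as `.json` files; `list_annotation_files` is a hypothetical helper, and the directory path is the one used in the cell above.

```python
from pathlib import Path


def list_annotation_files(root):
    """Return the sorted names of the JSON files directly under `root`.

    Returns an empty list when the directory does not exist.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("*.json"))


# Directory created and populated by the cell above.
ANNOTATIONS_DIR = "/content/hier-egopack/data/ego4d/raw/annotations/v1"

files = list_annotation_files(ANNOTATIONS_DIR)
print(f"{len(files)} annotation file(s) found in {ANNOTATIONS_DIR}")
for name in files:
    print(" -", name)
```

An empty listing usually means that the download failed (for example, because of invalid credentials) or that the files are still in the original `raw/v1/annotations` folder.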

## Step 1: Download EgoVLP pre-extracted features

Download the features pre-extracted with EgoVLP and unpack them into the features directory.

In [None]:
!mkdir -p /content/hier-egopack/data/ego4d/raw/features/egovlp/
!wget http://sapeirone.it/data/hier-egopack/egovlp_trainval_features.zip

!unzip -j /content/egovlp_trainval_features.zip -d /content/hier-egopack/data/ego4d/raw/features/egovlp/
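
Since `unzip -j` flattens the archive into a single folder, counting the extracted files is a simple way to check that the extraction completed. The sketch below groups files by extension; the feature file format is not stated in this notebook, so no specific extension is assumed, and `count_by_suffix` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path


def count_by_suffix(directory):
    """Count the regular files in `directory`, grouped by file extension.

    Files without an extension are counted under "<none>". A missing
    directory yields an empty Counter.
    """
    directory = Path(directory)
    if not directory.is_dir():
        return Counter()
    return Counter(
        p.suffix or "<none>" for p in directory.iterdir() if p.is_file()
    )


# Directory populated by the unzip command above.
FEATURES_DIR = "/content/hier-egopack/data/ego4d/raw/features/egovlp/"

counts = count_by_suffix(FEATURES_DIR)
print(f"{sum(counts.values())} file(s) in {FEATURES_DIR}")
for suffix, n in counts.most_common():
    print(f"  {suffix}: {n}")
```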

## Step 2: Installing dependencies

In [None]:
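# Pin PyTorch to 2.4.0 so that it matches the PyG wheel index used below (torch-2.4.0+cu124)
!pip3 install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124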
!pip3 install torch_geometric
!pip install ego4d
!pip3 install pyg_lib torch_scatter torch_sparse torch_cluster torch_spline_conv -f https://data.pyg.org/whl/torch-2.4.0+cu124.html
!cd /content/hier-egopack && pip3 install -r requirements.txt

# Build and install the NMS (non-maximum suppression) extension
!cd /content/hier-egopack/libs/utils/ && python setup.py install --user
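
Once the installation cells have finished, it is worth checking that the packages are visible to the Python runtime before starting a long training run. The sketch below uses `importlib.util.find_spec`, which locates a package without importing it; the list of names mirrors the `pip` commands above (assuming the import names match the package names), and `missing_packages` is a hypothetical helper.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be located by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level import names of the packages installed in the cells above.
REQUIRED = [
    "torch",
    "torchvision",
    "torchaudio",
    "torch_geometric",
    "pyg_lib",
    "torch_scatter",
    "torch_sparse",
    "torch_cluster",
    "torch_spline_conv",
    "ego4d",
]

missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages were found.")
```

If a package is reported as missing right after a successful install, restarting the Colab runtime and re-running the check is a reasonable first step.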

## Step 3: Single-task training (optional)

In [None]:
!cd /content/hier-egopack/ && python train_single_task.py --config-name=mq features=ego4d/egovlp eval_interval=-1 lr_warmup=True

## Step 4: EgoPack task training

In [None]:
# Download the multi-task learning (MTL) checkpoint
!wget https://sapeirone.it/data/hier-egopack/ego4d_ar-ego4d_lta-ego4d_oscc-ego4d_pnr_2024-11-09-13-36.pth
!mv ego4d_ar-ego4d_lta-ego4d_oscc-ego4d_pnr_2024-11-09-13-36.pth ckpt.pth

In [None]:
!cd /content/hier-egopack/ && python train_egopack.py --config-name=mq resume_from=/content/ckpt.pth lr_warmup=True egopack.depth=1 egopack.hidden_size=256 egopack.conv_depth=1 egopack.k=4
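
The training command above passes its settings as `key=value` arguments after `--config-name=mq`. When trying several settings, it can be convenient to assemble the command from a dictionary instead of editing a long line by hand. The sketch below only builds and prints the command; `build_command` is a hypothetical helper, and the override names and values are copied from the cell above.

```python
import shlex


def build_command(script, config_name, overrides):
    """Build the argument list for a training run.

    `overrides` maps option names (e.g. "egopack.k") to values; each pair
    is rendered as a `key=value` argument in insertion order.
    """
    args = ["python", script, f"--config-name={config_name}"]
    args += [f"{key}={value}" for key, value in overrides.items()]
    return args


# Same settings as the EgoPack training cell above.
overrides = {
    "resume_from": "/content/ckpt.pth",
    "lr_warmup": True,
    "egopack.depth": 1,
    "egopack.hidden_size": 256,
    "egopack.conv_depth": 1,
    "egopack.k": 4,
}

cmd = build_command("train_egopack.py", "mq", overrides)
print(shlex.join(cmd))
```

To launch the run from Python rather than with a `!` shell line, the list can be passed to `subprocess.run(cmd, cwd="/content/hier-egopack", check=True)`.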