Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation (CVPR 2024 Oral)
This repo contains a reference implementation of our proposed method, Action Segmentation Optimal Transport (ASOT), accepted to CVPR 2024. We also include the full training/evaluation pipelines for the unsupervised learning experiments in the paper.
ASOT is a post-processing technique that decodes temporally consistent class predictions from a noisy affinity matrix. We use ASOT to generate pseudo-labels within an unsupervised learning pipeline based on joint representation learning and clustering.
For example, given an affinity matrix that looks like this
ASOT generates pseudo-labels that look like this
Unlike previous methods that use optimal transport to generate pseudo-labels, we account for the temporal consistency property inherent to video data. Furthermore, we use unbalanced optimal transport to handle the long-tail action class distributions prevalent in video datasets. Standard (balanced) optimal transport produces pseudo-labels that look like this
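For reference, the balanced OT baseline can be sketched in a few lines of NumPy. This is an illustrative Sinkhorn implementation with uniform marginals (function and parameter names are ours, not the repo's): the uniform column marginal is exactly the equal-class-mass assumption that ASOT's unbalanced formulation relaxes, and there is no temporal-consistency term.

```python
import numpy as np

def sinkhorn_pseudolabels(affinity, n_iters=50, eps=0.05):
    """Balanced entropic OT (Sinkhorn) over a frames x clusters affinity matrix.

    Illustrative baseline only: uniform row (frame) and column (class)
    marginals enforce equal mass per class, and nothing encourages
    temporally consistent segments.
    """
    T, K = affinity.shape
    M = np.exp(affinity / eps)          # Gibbs kernel (affinity acts as a negative cost)
    r, c = np.ones(T) / T, np.ones(K) / K
    u = np.ones(T)
    for _ in range(n_iters):
        v = c / (M.T @ u)               # scale columns to match the class marginal
        u = r / (M @ v)                 # scale rows to match the frame marginal
    P = u[:, None] * M * v[None, :]     # transport plan (soft assignments)
    return P.argmax(axis=1)             # hard per-frame pseudo-labels

# toy example: 6 frames, 2 actions, block-structured affinities
A = np.array([[.9, .1], [.8, .2], [.7, .3],
              [.2, .8], [.1, .9], [.3, .7]])
print(sinkhorn_pseudolabels(A))         # -> [0 0 0 1 1 1]
```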
Our unsupervised segmentation pipeline uses a joint representation learning and clustering formulation. A frame-wise feature extractor (MLP) and action cluster embeddings are jointly learned by using pseudo-labels generated per batch.
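Schematically, one training step of such a pipeline might look like the following PyTorch sketch. All names and shapes here are ours, not the repository's API, and a plain argmax stands in where ASOT would decode pseudo-labels from the affinity matrix.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical shapes: input feature dim, embedding dim, clusters, frames
D_in, D_emb, K, T = 64, 32, 5, 100
torch.manual_seed(0)

# Frame-wise MLP feature extractor and learnable action cluster embeddings,
# optimized jointly (as described above)
mlp = nn.Sequential(nn.Linear(D_in, 128), nn.ReLU(), nn.Linear(128, D_emb))
clusters = nn.Parameter(torch.randn(K, D_emb))
opt = torch.optim.Adam(list(mlp.parameters()) + [clusters], lr=1e-3)

frames = torch.randn(T, D_in)                 # one video's pre-extracted features
for step in range(10):
    z = F.normalize(mlp(frames), dim=1)       # frame embeddings
    c = F.normalize(clusters, dim=1)          # cluster embeddings
    affinity = z @ c.T                        # (T, K) noisy affinity matrix
    with torch.no_grad():
        # placeholder pseudo-labels; ASOT would decode temporally
        # consistent, unbalanced-OT labels from `affinity` here
        pseudo = affinity.argmax(dim=1)
    loss = F.cross_entropy(affinity / 0.1, pseudo)
    opt.zero_grad(); loss.backward(); opt.step()
```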
See the main paper for results and SOTA comparison.
We provide a self-contained example showing how ASOT is used for post-processing in examples/. Run example.py and feel free to tune the ASOT parameters.
Some setup is required to run the unsupervised learning pipeline. Steps involve installing dependencies, setting up datasets and running the train/eval scripts.
Our datasets consist of pre-extracted per-frame video features, frame-wise labels, and a mapping file from class IDs to action class names. Download instructions and the folder structure are described in this section.
Breakfast, YTI, 50 Salads: click here to find links to download the datasets. Desktop assembly: click here to download the dataset.
The data directory should at the minimum have the following structure.
data                     # root path for all datasets
├─ dataset_name/         # root path for a single dataset
│  ├─ features/          # pre-extracted visual frame features
│  │  ├─ fname1.npy      # can also be .txt
│  │  ├─ fname2.npy      # can also be .txt
│  │  ├─ ...
│  ├─ groundTruth/       # frame-wise labels
│  │  ├─ fname1
│  │  ├─ fname2
│  │  ├─ ...
│  ├─ mapping/           # class mapping
│  │  ├─ mapping.txt     # class-to-action ID mapping
dataset_name can be one of Breakfast, desktop_assembly, FS, or YTI. It should be easy to set up new datasets as long as the folder structure is set up correctly.
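Given this layout, a minimal loading sketch could look like the following; all paths and function names here are illustrative, not code from the repo, and for brevity it handles only .npy features (the datasets may also ship .txt).

```python
import numpy as np
from pathlib import Path

def load_dataset(root, dataset_name):
    """Load pre-extracted features, frame-wise labels, and the class mapping
    for one dataset following the folder structure above. Illustrative only."""
    base = Path(root) / dataset_name
    # class-ID -> action-name mapping: one "id name" pair per line (assumed format)
    mapping = {}
    for line in (base / "mapping" / "mapping.txt").read_text().splitlines():
        idx, name = line.split(maxsplit=1)
        mapping[int(idx)] = name
    data = {}
    for feat_file in sorted((base / "features").glob("*.npy")):
        feats = np.load(feat_file)                          # (T, D) frame features
        labels = (base / "groundTruth" / feat_file.stem).read_text().split()
        data[feat_file.stem] = (feats, labels)
    return mapping, data
```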
Dependencies: numpy, scipy, scikit-learn, matplotlib, pytorch, pytorch-lightning, wandb.
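Assuming a pip-based environment (a conda setup works equally well), something like the following installs them; note that PyTorch's PyPI package is named torch:

```shell
pip install numpy scipy scikit-learn matplotlib torch pytorch-lightning wandb
```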
We provide bash scripts and Python commands to run the unsupervised learning experiments described in the paper. All hyperparameters in the scripts/commands below are set according to the paper and should be consistent with the reported results.
Breakfast: run bash run_bf.sh. This runs the training code for each activity class separately.
YouTube Instructions (YTI): run bash run_yti.sh. This runs the training code for each activity class separately.
50 Salads (eval-level granularity): python3 train.py -d FSeval -ac all -c 12 -ne 30 --seed 0 --group main_results --rho 0.15 -lat 0.11 -vf 5 -lr 1e-3 -wd 1e-4 -ua
50 Salads (mid-level granularity): python3 train.py -d FS -ac all -c 19 -ne 30 -g 0 --seed 0 --group main_results --rho 0.15 -lat 0.15 -vf 5 -lr 1e-3 -wd 1e-4 -ua
Desktop assembly: python3 train.py -d desktop_assembly -ac all -c 22 -ne 30 --seed 0 --group main_results --rho 0.25 -lat 0.16 -vf 5 -lr 1e-3 -wd 1e-4 -r 0.02 -ls 512 128 40 -ua