Self-Supervised Video Similarity Learning

This repository contains the PyTorch implementation of the paper Self-Supervised Video Similarity Learning. It provides code for training a video similarity learning network with self-supervision. To facilitate reproduction of the paper's results, it also provides the evaluation code, the extracted features for the employed video datasets, and pre-trained models.

Prerequisites

  • Python 3
  • PyTorch
  • Torchvision
  • FFMpeg

Preparation

Installation

  • Clone this repo
$ git clone https://github.com/gkordo/s2vs.git
$ cd s2vs
  • Install the required packages
$ pip install -r requirements.txt

Training

  • Extract the frames from the videos in the dataset used for training.
$ ffmpeg -nostdin -y -i <path_to_video> -vf fps=1 -start_number 0 -q 0 ${video_id}/%05d.jpg
  • Edit scripts/train_ssl.sh to configure the training session.

  • Choose the augmentation types to include during training through the --augmentations argument. Its value is a string containing any of: GT for Global Transformations, FT for Frame Transformations, TT for Temporal Transformations, and ViV for Video-in-Video.

  • Run the script as follows

$ bash scripts/train_ssl.sh
  • Once training finishes, a model.pth file is saved in a path based on the provided experiment_path argument.
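
The frame-extraction step above handles one video at a time. As a minimal sketch of how it could be repeated over a whole training set, the helper below builds the same ffmpeg flags for every video in a directory. It assumes the video id is the file name without its extension and that each video gets its own output folder; the function names, the extension list, and the directory layout are illustrative and not part of this repository.

```python
import subprocess
from pathlib import Path

# Assumed set of container formats; adjust to the dataset at hand.
VIDEO_EXTENSIONS = {".mp4", ".flv", ".webm", ".mkv", ".avi"}


def build_ffmpeg_command(video_path, frames_root, fps=1):
    """Return the ffmpeg argument list that writes `fps` frames per second as JPEGs."""
    video_path = Path(video_path)
    # Assumption: the video id is the file stem, e.g. abc.mp4 -> abc/
    out_dir = Path(frames_root) / video_path.stem
    return [
        "ffmpeg", "-nostdin", "-y",
        "-i", str(video_path),
        "-vf", f"fps={fps}",
        "-start_number", "0",
        "-q", "0",
        str(out_dir / "%05d.jpg"),
    ]


def extract_frames(videos_dir, frames_root):
    """Run ffmpeg once per video file found directly inside `videos_dir`."""
    for video_path in sorted(Path(videos_dir).iterdir()):
        if video_path.suffix.lower() not in VIDEO_EXTENSIONS:
            continue
        cmd = build_ffmpeg_command(video_path, frames_root)
        # ffmpeg does not create the output folder, so create it first.
        Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)


# Example call (hypothetical paths):
# extract_frames("/data/train_videos", "/data/train_frames")
```

Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.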

Evaluation

  • Download the datasets from the original sources.

  • Determine the pattern under which the video files are stored, expressed in terms of the video ids, e.g. {id}/video.* if the dataset follows this layout:

Dataset_dir
├── video_id1
│   └── video.mp4
├── video_id2
│   └── video.flv
│     ⋮
└── video_idN
    └── video.webm
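  • Run the evaluation script on the video files, providing the dataset name, the dataset path, and the storage pattern
$ python evaluation.py --dataset FIVR-200K --dataset_path <path_to_dataset> --pattern '{id}/video.*' --model_path <path_to_model>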

or run the script with the provided features

$ python evaluation.py --dataset FIVR-200K --dataset_hdf5 <path_to_hdf5> --model_path <path_to_model>
  • If no value is given to the --model_path argument, then the pretrained s2vs_dns model is used.
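
To check a --pattern value before launching an evaluation, the sketch below treats the pattern as a glob template: {id} is replaced by a video id and the result is matched against the dataset directory. This only illustrates the idea. How evaluation.py resolves the pattern internally is not shown here, and the function name resolve_video_path is hypothetical.

```python
import glob
import os


def resolve_video_path(dataset_dir, pattern, video_id):
    """Return the first file matching `pattern` for `video_id`, or None if absent."""
    # Substitute the id placeholder, e.g. '{id}/video.*' -> 'video_id1/video.*'
    relative = pattern.replace("{id}", video_id)
    # Escape the directory part so only the pattern itself is treated as a glob.
    full_pattern = os.path.join(glob.escape(dataset_dir), relative)
    matches = sorted(glob.glob(full_pattern))
    return matches[0] if matches else None


# Example call (hypothetical path):
# resolve_video_path("/data/FIVR-200K", "{id}/video.*", "video_id1")
```

A result of None for many ids usually means the pattern does not match the on-disk layout.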

Use our pretrained models

  • Usage of the models is similar to that of DnS and ViSiL.

  • Load our pretrained models as follows:

import torch

feat_extractor = torch.hub.load('gkordo/s2vs:main', 'resnet50_LiMAC')
s2vs_dns = torch.hub.load('gkordo/s2vs:main', 's2vs_dns')
s2vs_vcdb = torch.hub.load('gkordo/s2vs:main', 's2vs_vcdb')

Citation

If you use this code for your research, please consider citing our papers:

@inproceedings{kordopatis2023s2vs,
  title={Self-Supervised Video Similarity Learning},
  author={Kordopatis-Zilos, Giorgos and Tolias, Giorgos and Tzelepis, Christos and Kompatsiaris, Ioannis and Patras, Ioannis and Papadopoulos, Symeon},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year={2023}
}

@inproceedings{kordopatis2019visil,
  title={{ViSiL}: Fine-grained Spatio-Temporal Video Similarity Learning},
  author={Kordopatis-Zilos, Giorgos and Papadopoulos, Symeon and Patras, Ioannis and Kompatsiaris, Ioannis},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2019}
}

Visualization

For visualization examples of augmentation and similarity matrices, as well as model usage in code, have a look at this Colab notebook.

Related Projects

DnS - computational efficiency w/ selector network

ViSiL - original ViSiL approach

FIVR-200K - download our FIVR-200K dataset

License

This project is licensed under the MIT License - see the LICENSE file for details

Contact for further details

Giorgos Kordopatis-Zilos (kordogeo@fel.cvut.cz)