Implementation of our ECCV 2016 Paper (Video Summarization with Long Short-term Memory)
Switch branches/tags
Nothing to show
Clone or download
Latest commit 0ee0a09 Apr 23, 2017
Failed to load latest commit information.
codes update code and data Apr 22, 2017
data data Apr 23, 2017
models update code and data Apr 22, 2017 update code and data Apr 22, 2017 Update Apr 23, 2017

Video Summarization with LSTM

This repository provides the data and implementation for video summarization with LSTM, i.e. vsLSTM and dppLSTM in our paper:

Video Summarization with Long Short-term Memory
Ke Zhang*, Wei-Lun Chao*, Fei Sha, and Kristen Grauman.
In Proceedings of the European Conference on Computer Vision (ECCV), 2016, Amsterdam, The Netherlands. (*Equal contribution) [pdf] [supp]

If you find the codes or other related resources from this repository useful, please cite the following paper:

  title={Video summarization with long short-term memory},
  author={Zhang, Ke and Chao, Wei-Lun and Sha, Fei and Grauman, Kristen},


  • MAC OS X or Linux
  • NVIDIA GPU with compute capability 3.5+
  • Python 2.7+
  • Theano 0.7+
  • Matlab


Download the data and unzip to ./data/

Note that we down-sampled the original video by 2fps.

  1. file name: in the format 'Data_$Dataset$_google_p5.h5', e.g. Data_SumMe_google_p5.h5, means the frame level feature of SumMe dataset.
  2. the index of videos are stored as ‘idx’ in the file, in most cases it’s from 1 to n, where n is the number of videos in the dataset (except for Youtube dataset).
  3. feature & ground-truth: the feature is indexed as ‘fea_i’ , the importance is indexed as ‘gt_1_i’ (real number, from the original dataset), and the keyframe we used is indexed as ‘gt_2_i’ (binary value transferred from the original dataset) for the i-th video in the dataset.

Original videos and annotations for each dataset are also available from the the authors' project page


dppLSTM for video summarization

We have enclosed pre-trained models in the ./model directory download the model and run the following commands:

Download the pre-trained models and unzip it to ./models and run the following commands:

cd ./codes
THEANO_FLAGS=device=gpu0,floatX=float32 python 

This will automatically run summarization on the video data using pre-trained model, and save the results in ./res_LSTM/ as dppLSTM_$DATASET$_2_inference.h5

If you want to train the model on your own data, just uncomment Line 85 in

train(model_idx = model_idx, train_set = train_set, val_set = val_set, model_saved = model_file)


For both SumMe and TVSum datasets, you can find the code for evaluation provided by the author:

We also provided the evaluation code with wrappers that help adapt to the datasets above

To run evaluation on the predicted summarization, start the matlab and run the following commands:

cd ./codes
dppLSTM_eval('../data/', '$DATASET$', '/dppLSTM_$DATASET$_2_inference.h5')