DeepGaze MR: Model presented in the ECCV 2020 paper "Measuring the Importance of Temporal Features in Video Saliency"

DeepGaze MR

This repository provides the video saliency model DeepGaze MR and the meta-benchmark proposed in the following paper:

@InProceedings{tangemann2020,
  author = {Matthias Tangemann and Matthias Kümmerer and Thomas S.A. Wallis and Matthias Bethge},
  title = {Measuring the Importance of Temporal Features in Video Saliency},
  booktitle = {The European Conference on Computer Vision (ECCV)},
  month = {August},
  year = {2020}
}

Usage

DeepGaze MR is available on PyTorch Hub, so it can be used without cloning this repository. However, make sure that all dependencies listed in requirements.txt are installed.

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = torch.hub.load('mtangemann/deepgazemr', 'DeepGazeMR', pretrained=True)
model.to(device)

When loading the model with pretrained=True (the default), you will get the model with the weights and center bias from the LEDOV dataset. This is exactly the model used in the paper.

Input videos are expected as float tensors of shape T x C x H x W with values in the range [0.0, 1.0]. DeepGaze MR takes care of correctly normalizing the features for the VGG network. The following example shows how to preprocess a video loaded with scikit-video:

import skvideo.io
import torch

video = skvideo.io.vread('file.mp4')  # T x H x W x C, uint8
video = torch.from_numpy(video).type(torch.float32)
video = video.permute(0, 3, 1, 2) / 255.0  # -> T x C x H x W in [0, 1]
video = video.to(device)

There are two ways to use DeepGaze MR for predicting human gaze. The forward method expects a clip of 16 frames and returns the predicted probability distribution of human gaze on the last frame of the window:

clip = video[0:16]
prediction = model.forward(clip)
print(prediction.shape)  # e.g. [360, 640], matching the input resolution
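Since the prediction is a 2-D probability map, simple post-processing is straightforward. For example, the highest-probability gaze location can be read off with argmax (a small sketch; most_likely_gaze is a hypothetical helper, not part of the model's API):

```python
import torch

def most_likely_gaze(prob_map):
    """Return (y, x) of the highest-probability location in a 2-D gaze map."""
    flat_idx = int(prob_map.argmax())        # index into the flattened map
    return divmod(flat_idx, prob_map.shape[1])  # convert to row, column
```

For instance, applied to the prediction above, this yields the pixel coordinates of the most likely fixation.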

The predict method predicts gaze for full videos. It avoids recomputing features for frames that are shared between consecutive windows, which makes it much faster than naively calling forward for every window. predict returns an iterator over the predictions for all frames of the input video. Due to the windowed approach, the predictions for the first 15 frames are None.

for i, prediction in enumerate(model.predict(video)):
  if prediction is not None:
    ...  # process the prediction for frame i

When transferring DeepGaze MR to datasets other than LEDOV, you have to provide the correct center bias for that dataset (even when using the pretrained model). The center bias is expected to be a probability distribution of shape H x W. If its resolution does not match the resolution of the input video, DeepGaze MR will rescale it accordingly.

center_bias = torch.load(...)  # normalized tensor of shape HxW

model = torch.hub.load('mtangemann/deepgazemr', 'DeepGazeMR', pretrained=True, center_bias=center_bias)
model.to(device)

Alternatively, the custom center bias can also be passed to the forward and predict methods:

prediction = model.forward(clip, center_bias=center_bias)

# or

iterator = model.predict(video, center_bias=center_bias)
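If no empirical center bias is available for a new dataset, a centered Gaussian prior normalized to a probability distribution can serve as a rough stand-in (a sketch under assumptions: the Gaussian shape and the sigma_frac value are arbitrary choices, not taken from the paper; a center bias estimated from real fixation data is preferable):

```python
import torch

def gaussian_center_bias(height, width, sigma_frac=0.25):
    """Isotropic Gaussian centered on the frame, normalized to sum to 1."""
    ys = torch.arange(height, dtype=torch.float32) - (height - 1) / 2
    xs = torch.arange(width, dtype=torch.float32) - (width - 1) / 2
    sigma_y = sigma_frac * height
    sigma_x = sigma_frac * width
    g = torch.exp(-(ys[:, None] ** 2) / (2 * sigma_y ** 2)
                  - (xs[None, :] ** 2) / (2 * sigma_x ** 2))
    return g / g.sum()  # shape H x W, a valid probability distribution
```

The resulting tensor can be passed as the center_bias argument shown above.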

Meta-Benchmark

DeepGaze MR was used to create a meta-benchmark based on the LEDOV and DIEM datasets. The meta-benchmark consists of those frames for which at least 1 bit of explainable information was not explained by DeepGaze MR. The files in the meta-benchmark directory contain the frame indices of the meta-benchmark as described in the paper.

Each CSV file contains three columns named video, frame and mask, giving the video name, the frame number and whether the respective frame is included in the meta-benchmark (True = included, False = not included). Please note that we resampled the videos of the LEDOV dataset to 30 Hz in our study; the given frame numbers refer to the resampled videos.

To evaluate your model on this meta-benchmark, first apply it to the full videos of the respective dataset. Then compute the performance using standard image saliency metrics for the frames contained in the meta-benchmark. Final results are obtained by first averaging the scores over frames, and then over videos.
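The averaging scheme above could be sketched as follows (illustrative only: benchmark_average and the frame_scores mapping are hypothetical names, and the per-frame saliency metric itself is up to you):

```python
import csv
from collections import defaultdict

def benchmark_average(csv_path, frame_scores):
    """Average a per-frame metric over the meta-benchmark:
    first over the included frames of each video, then over videos.
    frame_scores maps (video, frame) -> metric score (hypothetical)."""
    per_video = defaultdict(list)
    with open(csv_path) as f:
        for row in csv.DictReader(f):
            if row["mask"] == "True":  # only frames included in the benchmark
                per_video[row["video"]].append(
                    frame_scores[(row["video"], int(row["frame"]))]
                )
    video_means = [sum(s) / len(s) for s in per_video.values()]
    return sum(video_means) / len(video_means)
```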

Contact

If you have any questions, please contact matthias.tangemann@bethgelab.org or create an issue on GitHub.

License

Licensed under the MIT License.

If you use this model in your work, please cite the original paper mentioned above.
