In [2]:
# change to the root directory of the project
import os
if os.path.basename(os.getcwd()) == "examples":
    os.chdir('..')
print(os.getcwd())

import numpy as np
import pandas as pd
import random
import seaborn as sns
import matplotlib.pyplot as plt

from scandy.models.LocationModel import LocationModel
from scandy.models.ObjectModel import ObjectModel
from scandy.utils.dataclass import Dataset
import scandy.utils.functions as uf

from neurolib.utils.parameterSpace import ParameterSpace
from neurolib.optimize.evolution import Evolution

from IPython import display

/beegfs/home/users/n/nicolas-roth/ScanDy


# Visualization of simulated scanpaths

The visualizations created in this notebook are stored in the `ScanDy/visualizations/` folder.

## Load dataset
ScanDy assumes that the video information (saliency maps, object segmentation masks, optical flow) has already been precomputed. The paths to the precomputed maps can be provided when initializing the dataset. If no paths are given, it assumes the following file structure:

```
DATAPATH/
├── videos/                 # Folder containing the videos (only for visualization)
├── featuremaps/            # Folder containing the precomputed saliency maps
    ├── molin/              #   The subfolder names are arbitrary,
    ├── TASEDnet/           #   but must match params['featuretype'].
    └── /.../ 
├── polished_segmentation/  # Folder with object segmentation masks
├── optical_flow/           # Folder with optical flow maps (e.g. PWC net)
└── gt_fov_maps_333/        # Optional, if NSS scores are to be computed 
                            # (smoothed human gaze positions)
```
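
Before initializing the dataset, it can help to verify that the expected folders exist. The following is a minimal sketch, assuming the default layout shown above; `missing_subfolders` is a hypothetical helper and not part of ScanDy, and the `featuretype` argument mirrors the value set later via `params["featuretype"]`.

```python
from pathlib import Path

# Folder names copied from the tree above; "gt_fov_maps_333" is optional
# (only needed for NSS scores) and therefore not checked here.
EXPECTED = ["videos", "featuremaps", "polished_segmentation", "optical_flow"]


def missing_subfolders(datapath, featuretype=None):
    """Return the expected subfolders that do not exist under datapath.

    Hypothetical helper for illustration, not a ScanDy function.
    """
    root = Path(datapath)
    expected = list(EXPECTED)
    if featuretype is not None:
        # The saliency maps live in a subfolder named after the feature type.
        expected.append(f"featuremaps/{featuretype}")
    return [name for name in expected if not (root / name).is_dir()]
```

Calling `missing_subfolders("/path/to/DATAPATH", "molin")` returns an empty list when all expected folders are present.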

In [3]:
datadict = {
    "PATH": "/scratch/nroth/VidCom/VidCom/", 
    'FPS' : 30,
    'PX_TO_DVA' : 0.06,
    'FRAMES_ALL_VIDS' : 300,
    'gt_foveation_df' : '2021-12-04_VidCom_GT_fov_df',
    "outputpath" : os.getcwd()+"/visualizations/"  # the visualizations are saved here
}
VidCom = Dataset(datadict)

## Initialize a model and specify parameters

We initialize an instance of the object-based model family. Here we use the low-level saliency maps from Molin et al., which corresponds to the `O_ll` model in the paper.

In [4]:
O_ll = ObjectModel(VidCom)
# low level features
O_ll.params["featuretype"] = "molin"

We initialize the free model parameters with the average parameters from the evolutionary optimization described in the manuscript.

In [5]:
O_ll.params["ddm_thres"] = 1.873
O_ll.params["ddm_sig"] = 0.241
O_ll.params["att_dva"] = 13.72
O_ll.params["ior_decay"] = 198.9
O_ll.params["ior_inobj"] = 0.76


## Run and evaluate for a single video

Given the model and the dataset, we can now run the scanpath simulation. We first run it only once and fix the random seed for reproducibility.

Running this should only take a few seconds.

In [6]:
O_ll.run('field03', seeds = [10])

Let's have a look at the events (i.e. saccadic decisions and resulting foveations) of the simulated scanpath.

In [7]:
O_ll.evaluate_all_to_df()
O_ll.result_df

Unnamed: 0,nfov,video,subject,frame_start,frame_end,duration_ms,x_start,y_start,x_end,y_end,object,sac_amp_dva,sac_angle_h,sac_angle_p,fov_category,ret_times
0,0,field03,seed010,0,3,133.333333,480,270,483,269,Ground,,,,B,
1,1,field03,seed010,4,6,100.0,283,446,277,447,Object 1,16.024494,138.491171,,D,
2,2,field03,seed010,7,13,233.333333,274,418,258,417,Object 1,1.749286,-95.906141,125.602687,I,
3,3,field03,seed010,14,23,333.333333,267,435,265,436,Object 1,1.207477,63.434949,159.34109,I,
4,4,field03,seed010,24,31,266.666667,369,122,358,129,Ground,19.846491,-71.674603,-135.109552,B,
5,5,field03,seed010,32,39,266.666667,266,280,260,286,Ground,10.609147,121.352762,-166.972635,B,
6,6,field03,seed010,40,48,300.0,208,433,189,434,Object 1,9.355576,109.4808,-11.871961,R,566.666667
7,7,field03,seed010,49,58,333.333333,187,381,186,387,Object 1,3.182263,-92.161079,158.35812,I,
8,8,field03,seed010,59,68,333.333333,280,265,272,262,Ground,9.240779,-52.386043,39.775036,B,
9,9,field03,seed010,69,76,266.666667,570,301,569,301,Ground,18.032471,7.456065,59.842109,B,
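
The values printed above are consistent with simple relations between the columns and the dataset constants `FPS = 30` and `PX_TO_DVA = 0.06`. The sketch below reconstructs these relations from the printed rows; it is inferred from the table, not taken from the ScanDy source, and all function names are hypothetical.

```python
import math

FPS = 30          # frames per second, as in datadict
PX_TO_DVA = 0.06  # degrees of visual angle per pixel, as in datadict


def duration_ms(frame_start, frame_end, fps=FPS):
    """Foveation duration, counting both the first and the last frame."""
    return (frame_end - frame_start + 1) * 1000.0 / fps


def saccade_amp_dva(x0, y0, x1, y1, px_to_dva=PX_TO_DVA):
    """Saccade amplitude: pixel distance from the previous foveation's end
    point (x0, y0) to the next foveation's start point (x1, y1), in DVA."""
    return math.hypot(x1 - x0, y1 - y0) * px_to_dva


def saccade_angle_deg(x0, y0, x1, y1):
    """Saccade direction relative to the horizontal axis, in degrees."""
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


def angle_change_deg(previous_angle, current_angle):
    """Change in direction relative to the previous saccade,
    wrapped to the interval [-180, 180)."""
    return (current_angle - previous_angle + 180.0) % 360.0 - 180.0


# Row 2 ends at (258, 417) and row 3 starts at (267, 435).
print(duration_ms(14, 23))
print(saccade_amp_dva(258, 417, 267, 435))
print(saccade_angle_deg(258, 417, 267, 435))
print(angle_change_deg(-95.906141, 63.434949))
```

Under these assumptions, the four printed values agree with `duration_ms`, `sac_amp_dva`, `sac_angle_h`, and `sac_angle_p` of row 3 up to rounding.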


We can now qualitatively assess how reasonable this predicted scanpath is by plotting it on top of the "observed" video.

## Visualize the different modules for a single scanpath

The models have a method that visualizes what's going on in the different modules (I-V) of the model while the scanpath is simulated. The creation of the gif will take multiple minutes.

In [8]:
O_ll.write_sgl_output_gif('field03_Oll_mean_sglrun', slowgif=True, dpi=100)


Saved to /beegfs/home/users/n/nicolas-roth/ScanDy/visualizations/field03_Oll_mean_sglrun_slow.gif


The result is saved as a gif in the `outputpath` specified in the `Dataset` and can be displayed in the notebook with the following command:

In [11]:
display.Image(VidCom.outputpath + 'field03_Oll_mean_sglrun_slow.gif')

This gif shows the simulated gaze position (green cross) on top of visualizations of the different modules of the model. The bottom left panel shows the object masks on top of the original video (played at 10 fps instead of 30 fps).

(I) Precomputed low-level saliency map with anisotropic center bias. Low values are shown in dark, high values in bright colors. 

(II) Gaze-dependent visual sensitivity map: a Gaussian with a uniform spread across the currently foveated object. Black means not sensitive (0), white means fully sensitive (1).

(III) Visualization of the inhibition of return value of each object (attribute of the `ObjectFile` instance). White means no inhibition (0), black means fully inhibited (1).

(IV) Visualization of the decision variable of each object (attribute of the `ObjectFile` instance). The saturation of the object mask represents the amount of accumulated evidence (white corresponds to 0, dark blue/red/green/orange to the decision threshold $\theta$).

(V) The red circle indicates the next gaze position. The pixel values indicate, for each object, how likely each position within that object is to be selected as the saccade target (calculated from the features (I) and the sensitivity (II), $F\times S$).
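
Panel (IV) shows evidence accumulating towards the decision threshold $\theta$. As a generic illustration of this idea, the sketch below simulates a single noisy accumulator that triggers a decision once it reaches a threshold. It is a textbook-style drift-diffusion sketch and not ScanDy's implementation: the function name, the constant drift of 0.1, and the interpretation of `ddm_sig` as the noise standard deviation per step are assumptions made for illustration.

```python
import numpy as np


def steps_to_threshold(drift, threshold, noise_sd, rng, max_steps=10_000):
    """Accumulate noisy evidence and return the step at which the
    threshold is first reached, or None if it is never reached."""
    evidence = 0.0
    for step in range(1, max_steps + 1):
        # Deterministic drift plus Gaussian noise in every step.
        evidence += drift + noise_sd * rng.standard_normal()
        if evidence >= threshold:
            return step
    return None


# Threshold and noise level borrowed from ddm_thres and ddm_sig above;
# the drift value is arbitrary.
rng = np.random.default_rng(10)
print(steps_to_threshold(drift=0.1, threshold=1.873, noise_sd=0.241, rng=rng))
```

With `noise_sd=0` the crossing time is fully determined by `threshold / drift`; the noise term is what makes decision times vary between runs with different seeds.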

## Simulate and visualize multiple scanpaths
Due to the stochasticity of the scanpath generation, a single run is not sufficient to assess the quality of the model predictions. We therefore run the simulation multiple times and plot the scanpaths on top of the video.

In [18]:
O_ll.run('field03', seeds=list(range(1, 13)), overwrite_old=True)
O_ll.evaluate_all_to_df(overwrite_old=True)

Unnamed: 0,nfov,video,subject,frame_start,frame_end,duration_ms,x_start,y_start,x_end,y_end,object,sac_amp_dva,sac_angle_h,sac_angle_p,fov_category,ret_times
0,0,field03,seed001,0,4,166.666667,480,270,470,278,Ground,,,,B,
1,1,field03,seed001,5,8,133.333333,287,446,278,447,Object 1,14.905261,137.447049,,D,
2,2,field03,seed001,9,16,266.666667,280,403,281,396,Object 1,2.642726,-87.397438,135.155514,I,
3,3,field03,seed001,17,23,233.333333,336,294,344,288,Ground,6.953014,-61.665758,25.731680,B,
4,4,field03,seed001,24,39,533.333333,232,152,237,169,Ground,10.570903,-129.472460,-67.806702,B,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
337,24,field03,seed012,221,230,333.333333,251,360,336,380,Object 3,5.208954,-114.498574,42.736417,I,
338,25,field03,seed012,231,241,366.666667,399,330,482,326,Object 3,4.825806,-38.437301,76.061273,I,
339,26,field03,seed012,242,249,266.666667,446,469,522,471,Object 3,8.847712,104.130480,142.567781,I,
340,27,field03,seed012,250,262,433.333333,552,363,674,352,Object 3,6.725355,-74.475889,-178.606369,I,
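
With several seeds in one table, the events can be summarized per simulated scanpath. The following is a minimal sketch that uses only the nine rows visible in the truncated output above (the full `result_df` has 341 rows), so the numbers are illustrative and not a summary of the whole simulation. It assumes the column names shown in the output.

```python
import pandas as pd

# The nine rows visible in the truncated output above.
events = pd.DataFrame({
    "subject": ["seed001"] * 5 + ["seed012"] * 4,
    "duration_ms": [166.666667, 133.333333, 266.666667, 233.333333,
                    533.333333, 333.333333, 366.666667, 266.666667,
                    433.333333],
    "fov_category": ["B", "D", "I", "B", "B", "I", "I", "I", "I"],
})

# Number of foveations and mean foveation duration per seed.
per_seed = events.groupby("subject")["duration_ms"].agg(["count", "mean"])

# Relative frequency of each foveation category across all seeds.
category_share = events["fov_category"].value_counts(normalize=True)

print(per_seed)
print(category_share)
```

The same two lines of aggregation could be applied directly to `O_ll.result_df` after running the cell above.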


In [19]:
O_ll.video_output_gif('field03', 'field03_Oll_mean', slowgif=False, dpi=100)

Saved to /beegfs/home/users/n/nicolas-roth/ScanDy/examples/field03_Oll_mean.gif


In this visualization every color corresponds to a different simulated scanpath (i.e., a different random seed). The video is shown with 30 fps, as in the eye tracking data collection.

In [None]:
display.Image(VidCom.outputpath + 'field03_Oll_mean.gif')

## Location-based model

Lastly, we repeat the above steps for the location-based model with low-level features, `L_ll`.

In [3]:
L_ll = LocationModel(VidCom)

L_ll.params["featuretype"] = "molin"
L_ll.params["ddm_thres"] = 0.355
L_ll.params["ddm_sig"] = 0.013
L_ll.params["att_dva"] = 12.77
L_ll.params["ior_decay"] = 226.5
L_ll.params["ior_dva"] = 6.82

L_ll.run('field03', seeds = [10])
L_ll.evaluate_all_to_df()
L_ll.result_df

Unnamed: 0,nfov,video,subject,frame_start,frame_end,duration_ms,x_start,y_start,x_end,y_end,object,sac_amp_dva,sac_angle_h,sac_angle_p,fov_category,ret_times
0,0,field03,seed010,0,0,33.333333,480,270,480,270,Ground,,,,B,
1,1,field03,seed010,1,2,66.666667,387,260,389,262,Ground,5.612165,-173.862744,,B,
2,2,field03,seed010,3,5,100.0,296,400,294,405,Object 1,9.984728,123.976544,-62.160712,D,
3,3,field03,seed010,6,13,266.666667,284,400,282,404,Object 1,0.67082,-153.434949,82.588507,I,
4,4,field03,seed010,14,18,166.666667,195,187,190,189,Ground,14.02743,-111.846918,41.58803,B,
5,5,field03,seed010,19,31,433.333333,191,189,205,190,Ground,0.06,0.0,111.846918,B,
6,6,field03,seed010,32,38,233.333333,446,102,450,112,Ground,15.39383,-20.059425,-20.059425,B,
7,7,field03,seed010,39,45,233.333333,451,98,454,93,Ground,0.84214,-85.914383,-65.854959,B,
8,8,field03,seed010,46,48,100.0,679,324,682,317,Ground,19.348116,45.753848,131.668232,B,
9,9,field03,seed010,49,55,233.333333,677,323,687,321,Ground,0.468615,129.805571,84.051723,B,


In [4]:
L_ll.write_sgl_output_gif('field03_Lll_mean_sglrun', slowgif=True, dpi=100)

Saved to /beegfs/home/users/n/nicolas-roth/ScanDy/examples/field03_Lll_mean_sglrun_slow.gif


Analogous to the object-based model above, this gif shows the simulated gaze position (green cross) on top of visualizations of the different modules of the model. The bottom left panel shows the original video (played at 10 fps instead of 30 fps).

(I) Precomputed low-level saliency map with anisotropic center bias. Low values are shown in dark, high values in bright colors. 

(II) Gaze-dependent Gaussian visual sensitivity map. Black means not sensitive (0), white means fully sensitive (1).

(III) Inhibition of return map (value calculated for every pixel). White means no inhibition (0), black means fully inhibited (1).

(IV) Visualization of the decision variable of each pixel-location. The saturation of a pixel represents the amount of accumulated evidence (white corresponds to 0, dark red to the decision threshold $\theta$).

(V) The red circle indicates the next gaze position. The pixel values indicate the optical flow.

In [None]:
display.Image(VidCom.outputpath + 'field03_Lll_mean_sglrun_slow.gif')

In [5]:
L_ll.run('field03', seeds=list(range(1, 13)), overwrite_old=True)
L_ll.evaluate_all_to_df(overwrite_old=True)
L_ll.video_output_gif('field03', 'field03_Lll_mean', slowgif=False, dpi=100)

Saved to /beegfs/home/users/n/nicolas-roth/ScanDy/examples/field03_Lll_mean.gif


In [None]:
display.Image(VidCom.outputpath + 'field03_Lll_mean.gif')

Comparing the resulting scanpaths on just this one video ("field03" is part of the test set) already suggests that the location-based model does not appropriately capture how humans explore the scene. The object-based model, on the other hand, produces scanpaths that are hard to distinguish from human scanpaths.
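
As a rough numerical complement to the visual comparison, one can compute the share of frames spent on objects rather than on `Ground`. The sketch below uses only the ten events printed above for seed 10 of each model, so it illustrates the computation and is no substitute for a proper evaluation; `object_frame_share` is a hypothetical helper.

```python
def object_frame_share(events):
    """Fraction of foveated frames whose target is not 'Ground'.

    events: iterable of (frame_start, frame_end, object) tuples,
    with both frame indices inclusive.
    """
    total = 0
    on_object = 0
    for frame_start, frame_end, obj in events:
        n_frames = frame_end - frame_start + 1
        total += n_frames
        if obj != "Ground":
            on_object += n_frames
    return on_object / total


# (frame_start, frame_end, object) copied from the two seed010 tables above.
o_ll_events = [
    (0, 3, "Ground"), (4, 6, "Object 1"), (7, 13, "Object 1"),
    (14, 23, "Object 1"), (24, 31, "Ground"), (32, 39, "Ground"),
    (40, 48, "Object 1"), (49, 58, "Object 1"), (59, 68, "Ground"),
    (69, 76, "Ground"),
]
l_ll_events = [
    (0, 0, "Ground"), (1, 2, "Ground"), (3, 5, "Object 1"),
    (6, 13, "Object 1"), (14, 18, "Ground"), (19, 31, "Ground"),
    (32, 38, "Ground"), (39, 45, "Ground"), (46, 48, "Ground"),
    (49, 55, "Ground"),
]

print(object_frame_share(o_ll_events))  # 39 of 77 frames on objects
print(object_frame_share(l_ll_events))  # 11 of 56 frames on objects
```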