<a href="https://colab.research.google.com/github/sylvainma/Summarizer/blob/hdf5-dataset-generation/summarizer/datasets/KTS_to_uniform.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# KTS to uniform segmentation
This notebook edits an HDF5 dataset in place, replacing its KTS (Kernel Temporal Segmentation) segmentation with a uniform one.  
Afterwards, each video's `change_points` and `n_frame_per_seg` datasets describe uniform segments of `secs_per_segment` seconds (configurable below).  

----

Run this cell only if you are using this notebook on its own, i.e. you don't already have the [Summarizer](https://github.com/sylvainma/Summarizer) code and datasets locally.

In [1]:
!git clone -l -s --single-branch --branch hdf5-dataset-generation https://github.com/sylvainma/Summarizer.git summarizer
%cd summarizer
!ls

Cloning into 'summarizer'...
remote: Enumerating objects: 429, done.
remote: Counting objects: 100% (429/429), done.
remote: Compressing objects: 100% (254/254), done.
remote: Total 879 (delta 297), reused 292 (delta 175), pack-reused 450
Receiving objects: 100% (879/879), 543.27 KiB | 1.67 MiB/s, done.
Resolving deltas: 100% (576/576), done.
/content/summarizer
README.md  summarizer


Retrieving datasets.  
You may choose to use your own, in which case you can ignore this cell, and specify the name of your dataset in the next one (parameters). 

In [2]:
%cd summarizer/datasets
!pip install -q h5py hdf5storage numpy
!python download_datasets.py

/content/summarizer/summarizer/datasets
     |████████████████████████████████| 61kB 4.2MB/s
Downloading summarizer_dataset_summe_google_pool5.h5...
Downloading summarizer_dataset_tvsum_google_pool5.h5...
Downloading summarizer_dataset_LOL_google_pool5.h5...


In [0]:
#@title Parameters for uniform segmentation
#@markdown ---
#@markdown Segment length in seconds:
secs_per_segment = 2 #@param {type:"slider", min:1, max:100, step:1}
#@markdown ---
#@markdown Frames per second in original videos:
fps = 30.0 #@param {type:"number"}
#@markdown ---
#@markdown Dataset name:
dataset = 'summarizer_dataset_summe_google_pool5.h5' #@param ['summarizer_dataset_summe_google_pool5.h5', 'summarizer_dataset_tvsum_google_pool5.h5'] {allow-input: true}

Opening the HDF5 dataset for editing.

In [0]:
import h5py
h5_file = h5py.File(dataset, 'r+')

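We define a lambda function that returns the typical number of frames between two consecutive picks (the subsampled frame indices stored for each video). It averages the gaps between consecutive picks, leaving out the final one.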

In [0]:
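import numpy as np
# Mean gap between consecutive picks. np.roll wraps the last pick around to the
# front, so [1:-1] drops that wrap-around term and also the final gap.
trimmed_mean_diff = lambda x: np.mean((x - np.roll(x, 1))[1:-1])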
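
As a quick illustration of what this helper computes, the sketch below applies it to a small, made-up `example_picks` array (hypothetical values, not taken from any dataset) and compares it with `np.diff`. The final gap is presumably excluded because the last pick of a video may not fall on the regular sampling stride; that motivation is an assumption, not something the notebook states.

```python
import numpy as np

# Same helper as in the cell above, repeated so this sketch runs on its own.
trimmed_mean_diff = lambda x: np.mean((x - np.roll(x, 1))[1:-1])

# Hypothetical picks: one frame kept every 15, with a shorter final gap.
example_picks = np.array([0, 15, 30, 45, 50])

gaps = np.diff(example_picks)        # gaps between consecutive picks: [15, 15, 15, 5]
mean_all_gaps = gaps.mean()          # 12.5, pulled down by the short final gap
mean_trimmed = np.mean(gaps[:-1])    # 15.0, final gap left out
from_helper = trimmed_mean_diff(example_picks)

print(mean_all_gaps, mean_trimmed, from_helper)
```

On this input the helper agrees with `np.mean(np.diff(x)[:-1])`, which may be an easier form to read.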

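Iterating over the videos in the dataset. For each video, `change_points` is rebuilt from every n-th pick and `n_frame_per_seg` is recomputed from it; both datasets are deleted and recreated.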

In [0]:
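for video in h5_file:
  # Subsampled frame indices stored for this video
  picks = h5_file[video]['picks'][...]
  # Average number of original frames between two consecutive picks
  keyshot_frequency = trimmed_mean_diff(picks)
  # Number of picks per segment (at least 1, so the slice step below is valid)
  changepoint_duration = max(1, int(round(secs_per_segment * fps / keyshot_frequency)))
  # Segment starts: every changepoint_duration-th pick, except the last boundary
  segment_limits = picks[::changepoint_duration][:-1]
  # Segment ends: the next segment's start; the last segment runs to the final pick
  segment_ends = np.append(picks[::changepoint_duration][1:len(segment_limits)], [picks[-1]])
  change_points = np.vstack((segment_limits, segment_ends)).transpose()
  del h5_file[video]['change_points']
  h5_file.create_dataset(f'{video}/change_points', data = change_points.astype(np.int32))
  # Segment lengths in frames (end - start)
  n_frame_per_seg = change_points[:, 1] - change_points[:, 0]
  del h5_file[video]['n_frame_per_seg']
  h5_file.create_dataset(f'{video}/n_frame_per_seg', data = n_frame_per_seg.astype(np.int32))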
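
To see what the loop produces without touching an HDF5 file, here is a minimal sketch of the same computation on synthetic picks. The function name `uniform_change_points` and the `demo_picks` values are hypothetical, introduced only for illustration; the arithmetic is intended to mirror the cell above.

```python
import numpy as np

def uniform_change_points(picks, secs_per_segment, fps):
    """Return (change_points, n_frame_per_seg) for one video's picks."""
    picks = np.asarray(picks)
    # Typical gap between consecutive picks, ignoring the final gap.
    stride = np.mean(np.diff(picks)[:-1])
    # How many picks make up one segment of the requested duration.
    picks_per_segment = max(1, int(round(secs_per_segment * fps / stride)))
    boundaries = picks[::picks_per_segment]
    starts = boundaries[:-1]
    # Each segment ends where the next one starts; the last ends at the final pick.
    ends = np.append(boundaries[1:len(starts)], picks[-1])
    change_points = np.column_stack((starts, ends)).astype(np.int32)
    n_frame_per_seg = change_points[:, 1] - change_points[:, 0]
    return change_points, n_frame_per_seg

# Hypothetical video: 20 picks, one every 15 frames (0, 15, ..., 285).
demo_picks = np.arange(0, 300, 15)
demo_change_points, demo_n_frames = uniform_change_points(
    demo_picks, secs_per_segment=2, fps=30.0)

print(demo_change_points.tolist())
# [[0, 60], [60, 120], [120, 180], [180, 285]]
print(demo_n_frames.tolist())
# [60, 60, 60, 105]
```

With 2-second segments at 30 fps, each segment spans 60 frames, i.e. 4 picks. The last segment absorbs the leftover frames, so it can be longer than the others.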

Closing the HDF5 dataset after editing to release the file lock.

In [0]:
h5_file.close()
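
As an alternative to calling `close()` by hand, `h5py.File` can be used as a context manager, which closes the file even if an error occurs partway through. The sketch below shows that pattern on a throwaway file; the helper `replace_dataset`, the path `demo.h5`, and the group name `video_1` are hypothetical and not part of the Summarizer code.

```python
import os
import tempfile

import h5py
import numpy as np

def replace_dataset(h5_path, name, data):
    """Overwrite dataset `name` in the file at `h5_path` with int32 `data`."""
    with h5py.File(h5_path, "r+") as f:   # file is closed when the block exits
        if name in f:
            del f[name]                   # HDF5 datasets cannot be resized freely, so recreate
        f.create_dataset(name, data=np.asarray(data, dtype=np.int32))

# Build a small throwaway file to demonstrate on (hypothetical contents).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as f:
    f.create_dataset("video_1/change_points",
                     data=np.array([[0, 99]], dtype=np.int32))

replace_dataset(demo_path, "video_1/change_points", [[0, 60], [60, 99]])

# Read the result back in read-only mode.
with h5py.File(demo_path, "r") as f:
    stored = f["video_1/change_points"][...]

print(stored.tolist())
```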

Don't forget to download the newly created HDF5 dataset:

In [8]:
!ls -lah | grep $dataset

-rw-r--r-- 1 root root  36M Jun 13 19:45 summarizer_dataset_summe_google_pool5.h5
