<a href="https://colab.research.google.com/github/ledduy610/uit-vsum/blob/main/VSUM_SUM_GAN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Unsupervised Video Summarization

Papers:
* AC-SUM-GAN: Connecting Actor-Critic and Generative Adversarial Networks for Unsupervised Video Summarization (IEEE TCSVT 2020) https://www.iti.gr/~bmezaris/publications/csvt20_preprint.pdf
* A Stepwise, Label-based Approach for Improving the Adversarial Training in Unsupervised Video Summarization (2019) https://www.iti.gr/~bmezaris/publications/acmmm2019_preprint.pdf
* Unsupervised Video Summarization via Attention-Driven Adversarial Learning (2020) https://www.iti.gr/~bmezaris/publications/mmm2020_lncs11961_1_preprint.pdf
* Unsupervised Video Summarization with Adversarial LSTM Networks (CVPR 2017) http://web.engr.oregonstate.edu/~sinisa/research/publications/cvpr17_summarization.pdf


Repos
* https://github.com/e-apostolidis
* https://github.com/e-apostolidis/AC-SUM-GAN
* https://github.com/e-apostolidis/SUM-GAN-sl
* https://github.com/e-apostolidis/SUM-GAN-AAE
* https://github.com/e-apostolidis/PoR-Summarization-Measure
* https://github.com/j-min/Adversarial_Video_Summary


# New section

In [None]:
from google.colab import drive
drive.mount('/content/drive') 

## Link to GDrive

In [None]:

# Only the last assignment takes effect; comment out the one that does not apply.
#szRootDir = '/content/drive/MyDrive/0.Desktop/VSUM-Colab/' #Duy
szRootDir = '/content/drive/MyDrive/temp/' #An

%cd $szRootDir 
!pwd

## Clone repos
* https://github.com/e-apostolidis/SUM-GAN-sl.git
* https://github.com/e-apostolidis/SUM-GAN-AAE.git
* https://github.com/e-apostolidis/AC-SUM-GAN.git
* https://github.com/e-apostolidis/PoR-Summarization-Measure.git
* https://github.com/j-min/Adversarial_Video_Summary.git

In [None]:
!git clone https://github.com/e-apostolidis/SUM-GAN-sl.git

In [None]:
!git clone https://github.com/e-apostolidis/SUM-GAN-AAE.git

In [None]:
!git clone https://github.com/e-apostolidis/AC-SUM-GAN.git

In [None]:
!git clone https://github.com/e-apostolidis/PoR-Summarization-Measure.git

In [None]:
!git clone https://github.com/j-min/Adversarial_Video_Summary.git

## Data
Structured h5 files with the video features and annotations of the SumMe and TVSum datasets are available within the "data" folder. The GoogleNet features of the video frames were extracted by [Ke Zhang](https://github.com/kezhang-cs) and [Wei-Lun Chao](https://github.com/pujols) and the h5 files were obtained from [Kaiyang Zhou](https://github.com/KaiyangZhou/pytorch-vsumm-reinforce). These files have the following structure:
<pre>
/key
    /features                 2D-array with shape (n_steps, feature-dimension)
    /gtscore                  1D-array with shape (n_steps), stores ground truth importance score (used for training, e.g. regression loss)
    /user_summary             2D-array with shape (num_users, n_frames), each row is a binary vector (used for test)
    /change_points            2D-array with shape (num_segments, 2), each row stores indices of a segment
    /n_frame_per_seg          1D-array with shape (num_segments), indicates number of frames in each segment
    /n_frames                 number of frames in original video
    /picks                    positions of subsampled frames in original video
    /n_steps                  number of subsampled frames
    /gtsummary                1D-array with shape (n_steps), ground truth summary provided by user (used for training, e.g. maximum likelihood)
    /video_name (optional)    original video name, only available for SumMe dataset
</pre>
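
As a rough illustration of how these fields relate to each other, the sketch below rebuilds `picks`, `n_steps`, and `n_frame_per_seg` from toy values in plain Python. The sampling rate of 15, the inclusive segment boundaries, and the helper `expand_scores` are assumptions made for this example; they are not read from the h5 files.

```python
# Toy values only; in the real files these are HDF5 datasets.
n_frames = 100
sampling_rate = 15  # assumption: one frame kept out of every 15

# picks: positions of the subsampled frames in the original video
picks = list(range(0, n_frames, sampling_rate))
n_steps = len(picks)  # number of subsampled frames

# change_points: one (first_frame, last_frame) row per segment,
# assumed here to be inclusive on both ends
change_points = [(0, 39), (40, 74), (75, 99)]
n_frame_per_seg = [end - start + 1 for start, end in change_points]


def expand_scores(step_scores, picks, n_frames):
    """Spread one score per subsampled frame over all original frames."""
    frame_scores = [0.0] * n_frames
    for i, start in enumerate(picks):
        # Each score covers the frames up to the next pick (or the video end).
        end = picks[i + 1] if i + 1 < len(picks) else n_frames
        for f in range(start, end):
            frame_scores[f] = step_scores[i]
    return frame_scores


# Hypothetical per-step scores, one per subsampled frame
step_scores = [float(i) for i in range(n_steps)]
frame_scores = expand_scores(step_scores, picks, n_frames)
print(n_steps, n_frame_per_seg, len(frame_scores))
```

With these toy numbers the segment lengths add up to `n_frames`, which is a quick sanity check worth running on a real file as well.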
Original videos and annotations for each dataset are also available in the authors' project webpages:
- TVSum dataset: https://github.com/yalesong/tvsum
- SumMe dataset: https://gyglim.github.io/me/vsum/index.html#benchmark

### SUM-GAN-sl

* https://github.com/e-apostolidis/SUM-GAN-sl

In [None]:
szRootDir = szRootDir + '/SUM-GAN-sl' #Duy
%cd $szRootDir 
!pwd

### Training
To train the model on one of the aforementioned datasets over a number of randomly created splits (each split uses 80% of the data for training and 20% for testing), use the corresponding JSON file in the "data/splits" directory. This file contains the 5 randomly generated splits that were used in the SUM-GAN-sl authors' experiments.
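
A minimal sketch of how such 80/20 splits could be generated is given below. The field names `train_keys` and `test_keys`, the video key names, and the function `make_splits` are assumptions for illustration; the JSON files shipped in "data/splits" should be inspected for their actual layout.

```python
import json
import random


def make_splits(video_keys, n_splits=5, train_fraction=0.8, seed=0):
    """Create n_splits random train/test partitions of the video keys."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_train = int(round(len(video_keys) * train_fraction))
    splits = []
    for _ in range(n_splits):
        shuffled = list(video_keys)
        rng.shuffle(shuffled)
        splits.append({
            "train_keys": sorted(shuffled[:n_train]),  # assumed field name
            "test_keys": sorted(shuffled[n_train:]),   # assumed field name
        })
    return splits


# Hypothetical key names for a 25-video dataset
keys = [f"video_{i}" for i in range(1, 26)]
splits = make_splits(keys)
print(json.dumps(splits[0]["test_keys"]))
```

Each split is drawn independently, so the same video may appear in the test set of more than one split.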

For training the model using a single split, run:
<pre>
python main.py --split_index N (with N being the index of the split)
</pre>
Alternatively, to train the model for all 5 splits, use the 'run_splits.sh' script according to the following:
<pre>
chmod +x run_splits.sh    # Makes the script executable.
./run_splits.sh           # Runs the script.
</pre>
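
Inside a notebook it can be more convenient to drive the same loop from Python. The sketch below assumes that training all splits simply means calling `main.py` once per split index and that indices start at 0; it does not reproduce the actual contents of `run_splits.sh`. The helpers `split_commands` and `run_all` are hypothetical.

```python
import subprocess
import sys


def split_commands(n_splits=5, script="main.py"):
    """Build one command per split index (0-based indexing is an assumption)."""
    return [[sys.executable, script, "--split_index", str(i)]
            for i in range(n_splits)]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop as soon as one training run fails
            subprocess.run(cmd, check=True)


commands = split_commands()
run_all(commands)  # dry run: only prints the commands
```

Calling `run_all(commands, dry_run=False)` from the `model` directory would start the actual training runs one after another.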
Please note that after each training epoch the algorithm performs an evaluation step, and uses the trained model to compute the importance scores for the frames of each test video. These scores are then used by the provided evaluation scripts to assess the overall performance of the model (in F-Score).
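
To make the F-Score concrete, here is a minimal sketch that compares a binary machine summary with binary user summaries, frame by frame. It assumes the importance scores have already been converted into a 0/1 frame selection, and it shows both mean and max aggregation over users because the aggregation used by the evaluation scripts is not stated here. The function `f_score` and the toy vectors are hypothetical.

```python
def f_score(machine_summary, user_summary):
    """F-score between two equal-length binary frame-selection vectors."""
    overlap = sum(m * u for m, u in zip(machine_summary, user_summary))
    if overlap == 0:
        return 0.0
    precision = overlap / sum(machine_summary)  # share of selected frames that match
    recall = overlap / sum(user_summary)        # share of user frames that were found
    return 2 * precision * recall / (precision + recall)


machine = [1, 1, 0, 0, 1, 0]
users = [
    [1, 0, 0, 1, 1, 0],  # shares 2 of 3 selected frames with the machine summary
    [1, 1, 0, 0, 1, 0],  # identical to the machine summary
]

per_user = [f_score(machine, u) for u in users]
mean_f = sum(per_user) / len(per_user)
max_f = max(per_user)
print(per_user, mean_f, max_f)
```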

The progress of the training can be monitored via the TensorBoard platform and by:
- opening a command line (cmd) and running: tensorboard --logdir=/path/to/log-directory --host=localhost
- opening a browser and pasting the returned URL from cmd

In [None]:
!pip install tensorboardX

In [None]:
%cd "$szRootDir/data/SumMe"
!unrar x eccv16_dataset_summe_google_pool5.rar

In [None]:
%cd "$szRootDir/data/TVSum"
!unrar x eccv16_dataset_tvsum_google_pool5.rar

In [None]:
import os
 
print("pwd=%s" % os.getcwd()) # old-style formatting

# Courtesy of: https://stackoverflow.com/questions/49264194/import-py-file-in-another-directory-in-jupyter-notebook
import sys  
sys.path.insert(0, szRootDir + "/model/")
print(sys.path)

%cd $szRootDir/model
!pwd 

In [None]:
%cd $szRootDir/model
!pwd 
!python main.py --split_index 1

In [None]:
!pwd
!ls

# Solver.evaluate does not create the subdirectories under the exp folder, so we create them ourselves

!mkdir -p ../exp1/SumMe/models/split1
!mkdir -p ../exp1/SumMe/results/split1

In [None]:
from configs import get_config  # imported here so this cell runs on its own

sys.argv = 'main.py --split_index 1'.split()
config = get_config(mode='train')
test_config = get_config(mode='test')

print(config)
print(test_config)
print('split_index:', config.split_index)

In [None]:

# -*- coding: utf-8 -*-
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
import json
from tqdm import tqdm, trange

from layers import Summarizer, Discriminator
from utils import TensorboardWriter

# labels for training the GAN part of the model
original_label = torch.tensor(1.0).cuda()
summary_label = torch.tensor(0.0).cuda()

class Solver(object):
    def __init__(self, config=None, train_loader=None, test_loader=None):
        """Class that Builds, Trains and Evaluates SUM-GAN-sl model"""
        self.config = config
        self.train_loader = train_loader
        self.test_loader = test_loader

    def build(self):

        # Build Modules
        self.linear_compress = nn.Linear(
            self.config.input_size,
            self.config.hidden_size).cuda()
        self.summarizer = Summarizer(
            input_size=self.config.hidden_size,
            hidden_size=self.config.hidden_size,
            num_layers=self.config.num_layers).cuda()
        self.discriminator = Discriminator(
            input_size=self.config.hidden_size,
            hidden_size=self.config.hidden_size,
            num_layers=self.config.num_layers).cuda()
        self.model = nn.ModuleList([
            self.linear_compress, self.summarizer, self.discriminator])

        if self.config.mode == 'train':
            # Build Optimizers
            self.s_e_optimizer = optim.Adam(
                list(self.summarizer.s_lstm.parameters())
                + list(self.summarizer.vae.e_lstm.parameters())
                + list(self.linear_compress.parameters()),
                lr=self.config.lr)
            self.d_optimizer = optim.Adam(
                list(self.summarizer.vae.d_lstm.parameters())
                + list(self.linear_compress.parameters()),
                lr=self.config.lr)
            self.c_optimizer = optim.Adam(
                list(self.discriminator.parameters())
                + list(self.linear_compress.parameters()),
                lr=self.config.discriminator_lr)

            self.writer = TensorboardWriter(str(self.config.log_dir))

    def reconstruction_loss(self, h_origin, h_sum):
        """L2 loss between original-regenerated features at cLSTM's last hidden layer"""

        return torch.norm(h_origin - h_sum, p=2)

    def prior_loss(self, mu, log_variance):
        """KL( q(e|x) || N(0,1) )"""
        return 0.5 * torch.sum(-1 + log_variance.exp() + mu.pow(2) - log_variance)

    def sparsity_loss(self, scores):
        """Summary-Length Regularization"""

        return torch.abs(torch.mean(scores) - self.config.regularization_factor)

    criterion = nn.MSELoss()

    def train(self):
        step = 0
        for epoch_i in trange(self.config.n_epochs, desc='Epoch', ncols=80):
            s_e_loss_history = []
            d_loss_history = []
            c_original_loss_history = []
            c_summary_loss_history = []
            for batch_i, image_features in enumerate(tqdm(
                    self.train_loader, desc='Batch', ncols=80, leave=False)):

                self.model.train()

                # [batch_size=1, seq_len, 1024]
                # [seq_len, 1024]
                image_features = image_features.view(-1, self.config.input_size)

                # [seq_len, 1024]
                image_features_ = Variable(image_features).cuda()

                #---- Train sLSTM, eLSTM ----#
                if self.config.verbose:
                    tqdm.write('\nTraining sLSTM and eLSTM...')

                # [seq_len, 1, hidden_size]
                original_features = self.linear_compress(image_features_.detach()).unsqueeze(1)

                scores, h_mu, h_log_variance, generated_features = self.summarizer(original_features)

                h_origin, original_prob = self.discriminator(original_features)
                h_sum, sum_prob = self.discriminator(generated_features)

                tqdm.write(f'original_p: {original_prob.item():.3f}, summary_p: {sum_prob.item():.3f}')

                reconstruction_loss = self.reconstruction_loss(h_origin, h_sum)
                prior_loss = self.prior_loss(h_mu, h_log_variance)
                sparsity_loss = self.sparsity_loss(scores)

                tqdm.write(f'recon loss {reconstruction_loss.item():.3f}, prior loss: {prior_loss.item():.3f}, sparsity loss: {sparsity_loss.item():.3f}')

                s_e_loss = reconstruction_loss + prior_loss + sparsity_loss

                self.s_e_optimizer.zero_grad()
                s_e_loss.backward()
                # Gradient clipping
                torch.nn.utils.clip_grad_norm_(self.model.parameters(), self.config.clip)
                self.s_e_optimizer.step()

                s_e_loss_history.append(s_e_loss.data)

                #---- Train dLSTM (generator) ----#
                if self.config.verbose:
                    tqdm.write('Training dLSTM...')

                # [seq_len, 1, hidden_size]
                original_features = self.linear_compress(image_features_.detach()).unsqueeze(1)

                scores, h_mu, h_log_variance, generated_features = self.summarizer(original_features)

                h_origin, original_prob = self.discriminator(original_features)
                h_sum, sum_prob = self.discriminator(generated_features)

                tqdm.write(f'original_p: {original_prob.item():.3f}, summary_p: {sum_prob.item():.3f}')

                reconstruction_loss = self.reconstruction_loss(h_origin, h_sum)
                g_loss = self.criterion(sum_prob, original_label)

                tqdm.write(f'recon loss {reconstruction_loss.item():.3f}, g loss: {g_loss.item():.3f}')

                d_loss = reconstruction_loss + g_loss

                self.d_optimizer.zero_grad()
                d_loss.backward()
                # Gradient clipping
                torch.nn.utils.clip_grad_norm_(self.model.parameters(), self.config.clip)
                self.d_optimizer.step()

                d_loss_history.append(d_loss.data)

                #---- Train cLSTM ----#
                if self.config.verbose:
                    tqdm.write('Training cLSTM...')

                self.c_optimizer.zero_grad()

                # Train with original loss
                # [seq_len, 1, hidden_size]
                original_features = self.linear_compress(image_features_.detach()).unsqueeze(1)
                h_origin, original_prob = self.discriminator(original_features)
                c_original_loss = self.criterion(original_prob, original_label)
                c_original_loss.backward()

                # Train with summary loss
                scores, h_mu, h_log_variance, generated_features = self.summarizer(original_features)
                h_sum, sum_prob = self.discriminator(generated_features.detach())
                c_summary_loss = self.criterion(sum_prob, summary_label)
                c_summary_loss.backward()
                
                tqdm.write(f'original_p: {original_prob.item():.3f}, summary_p: {sum_prob.item():.3f}')
                tqdm.write(f'gen loss: {g_loss.item():.3f}')
                
                # Gradient clipping
                torch.nn.utils.clip_grad_norm_(self.model.parameters(), self.config.clip)
                self.c_optimizer.step()

                c_original_loss_history.append(c_original_loss.data)
                c_summary_loss_history.append(c_summary_loss.data)
                

                if self.config.verbose:
                    tqdm.write('Plotting...')

                self.writer.update_loss(reconstruction_loss.data, step, 'recon_loss')
                self.writer.update_loss(prior_loss.data, step, 'prior_loss')
                self.writer.update_loss(sparsity_loss.data, step, 'sparsity_loss')
                self.writer.update_loss(g_loss.data, step, 'gen_loss')

                self.writer.update_loss(original_prob.data, step, 'original_prob')
                self.writer.update_loss(sum_prob.data, step, 'sum_prob')

                step += 1

            s_e_loss = torch.stack(s_e_loss_history).mean()
            d_loss = torch.stack(d_loss_history).mean()
            c_original_loss = torch.stack(c_original_loss_history).mean()
            c_summary_loss = torch.stack(c_summary_loss_history).mean()

            # Plot
            if self.config.verbose:
                tqdm.write('Plotting...')
            self.writer.update_loss(s_e_loss, epoch_i, 's_e_loss_epoch')
            self.writer.update_loss(d_loss, epoch_i, 'd_loss_epoch')
            self.writer.update_loss(c_original_loss, step, 'c_original_loss')
            self.writer.update_loss(c_summary_loss, step, 'c_summary_loss')

            # Save parameters at checkpoint
            ckpt_path = str(self.config.save_dir) + f'/epoch-{epoch_i}.pkl'
            tqdm.write(f'Save parameters at {ckpt_path}')
            torch.save(self.model.state_dict(), ckpt_path)

            self.evaluate(epoch_i)


    def evaluate(self, epoch_i):

        self.model.eval()

        out_dict = {}

        for video_tensor, video_name in tqdm(
                self.test_loader, desc='Evaluate', ncols=80, leave=False):

            # [seq_len, batch=1, 1024]
            video_tensor = video_tensor.view(-1, self.config.input_size)
            video_feature = Variable(video_tensor).cuda()

            # [seq_len, 1, hidden_size]
            video_feature = self.linear_compress(video_feature.detach()).unsqueeze(1)

            # [seq_len]
            with torch.no_grad():
                scores = self.summarizer.s_lstm(video_feature).squeeze(1)
                scores = scores.cpu().numpy().tolist()

                out_dict[video_name] = scores

            score_save_path = self.config.score_dir.joinpath(
                f'{self.config.video_type}_{epoch_i}.json')
            with open(score_save_path, 'w') as f:
                tqdm.write(f'Saving score at {str(score_save_path)}.')
                json.dump(out_dict, f)
            score_save_path.chmod(0o777)

    def pretrain(self):
        pass



In [None]:
from configs import get_config
#from solver import Solver
from data_loader import get_loader

In [None]:
sys.argv = 'main.py --split_index 1'.split()
config = get_config(mode='train')
test_config = get_config(mode='test')

print(config)
print(test_config)
print('split_index:', config.split_index)

In [None]:
train_loader = get_loader(config.mode, config.split_index)
test_loader = get_loader(test_config.mode, test_config.split_index)
solver = Solver(config, train_loader, test_loader)

In [None]:
solver.build()
solver.evaluate(-1)	# evaluates the summaries generated using the initial random weights of the network 
solver.train()
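
The three regularization terms of the `Solver` class defined earlier (`reconstruction_loss`, `prior_loss`, `sparsity_loss`) can be checked by hand on toy numbers. The sketch below restates the same expressions for plain Python lists instead of tensors; the input values and the `regularization_factor` of 0.5 are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def reconstruction_loss(h_origin, h_sum):
    """L2 norm of the difference between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h_origin, h_sum)))


def prior_loss(mu, log_variance):
    """Same expression as Solver.prior_loss, written for plain lists."""
    return 0.5 * sum(-1 + math.exp(lv) + m ** 2 - lv
                     for m, lv in zip(mu, log_variance))


def sparsity_loss(scores, regularization_factor):
    """Distance between the mean frame score and the target summary length."""
    return abs(sum(scores) / len(scores) - regularization_factor)


recon = reconstruction_loss([3.0, 0.0], [0.0, 4.0])  # sqrt(9 + 16)
prior = prior_loss([0.0, 0.0], [0.0, 0.0])           # zero mean, unit variance
sparse = sparsity_loss([1.0, 1.0], 0.5)              # every frame selected
print(recon, prior, sparse)
```

The prior term vanishes when the encoder output already matches a standard normal, and the sparsity term grows as the mean score drifts away from the target fraction.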

## Adversarial_Video_Summary

* https://github.com/j-min/Adversarial_Video_Summary

Does not run.

In [None]:
szRootDir = '/content/drive/MyDrive/0.Desktop/VSUM-Colab/Adversarial_Video_Summary' #Duy
%cd $szRootDir 
!pwd

In [None]:
import os
 
print("pwd=%s" % os.getcwd()) # old-style formatting

# Courtesy of: https://stackoverflow.com/questions/49264194/import-py-file-in-another-directory-in-jupyter-notebook
import sys  
sys.path.insert(0, szRootDir + "/layers/")
print(sys.path)

### Train

In [None]:
!pip install tensorboardX

In [None]:
#https://github.com/j-min/Adversarial_Video_Summary/blob/master/train.py

from configs import get_config
from solver import Solver
from data_loader import get_loader

In [None]:
import sys
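# argparse skips sys.argv[0] (the program name), so each flag and value must be its own list element
sys.argv = ['train.py', '--mode', 'train']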

In [None]:
config = get_config(mode='train')
test_config = get_config(mode='test')
print(config)


In [None]:
train_loader = get_loader(config.video_root_dir, config.mode)
test_loader = get_loader(test_config.video_root_dir, test_config.mode)
solver = Solver(config, train_loader, test_loader)

solver.build()
solver.train()

In [None]:
szDataFile = "/content/drive/MyDrive/0.Desktop/VSUM-Colab/SUM-GAN-sl/data/TVSum/eccv16_dataset_tvsum_google_pool5.h5"

In [None]:
import h5py

print(h5py.__version__)

fFile = h5py.File(szDataFile, "r")

In [None]:
print(fFile.keys())

In [None]:
i = 0
videoList = list(fFile.keys())
videoData = fFile[videoList[i]]
print(videoData)

In [None]:
# Read the scalar n_frames (number of frames in the original video)
videoData['n_frames'][()]
#videoData['gtsummary'].shape