# TransNet: A deep network for fast detection of common shot transitions
This repository contains the code for the paper *TransNet: A deep network for fast detection of common shot transitions*.

If you use it in your work, please cite:


    @article{soucek2019transnet,
        title={TransNet: A deep network for fast detection of common shot transitions},
        author={Sou{\v{c}}ek, Tom{\'a}{\v{s}} and Moravec, Jaroslav and Loko{\v{c}}, Jakub},
        journal={arXiv preprint arXiv:1906.03363},
        year={2019}
    }

## How to use it?

First, install *tensorflow*:

    pip install tensorflow

If you want to run **TransNet** directly on video files, the *ffmpeg-python* bindings need to be installed as well (they call the `ffmpeg` executable, which must be available on your system):

    pip install ffmpeg-python

Optionally, install *pillow* for visualization:

    pip install pillow

Tested with *tensorflow* v1.12.0.

In [None]:
!pip install tensorflow
!pip install ffmpeg-python
!pip install pillow

In [None]:
import ffmpeg
import numpy as np
import tensorflow as tf

from transnet import TransNetParams, TransNet
from transnet_utils import draw_video_with_predictions, scenes_from_predictions

In [None]:
# initialize the network
params = TransNetParams()
params.CHECKPOINT_PATH = "./model/transnet_model-F16_L3_S2_D256"

net = TransNet(params)

In [None]:
# export video into numpy array using ffmpeg
video_stream, err = (
    ffmpeg
    .input('test.mp4')
    .output('pipe:', format='rawvideo', pix_fmt='rgb24', s='{}x{}'.format(params.INPUT_WIDTH, params.INPUT_HEIGHT))
    .run(capture_stdout=True)
)
video = np.frombuffer(video_stream, np.uint8).reshape([-1, params.INPUT_HEIGHT, params.INPUT_WIDTH, 3])
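
The reshape above relies on the `rawvideo`/`rgb24` layout: each frame occupies exactly height × width × 3 bytes, so the number of frames follows from the length of the byte stream. The sketch below illustrates this on synthetic bytes instead of real ffmpeg output. The helper name `frames_from_raw` and the 48x27 frame size are assumptions made for the example; the notebook itself reads the size from `params.INPUT_WIDTH` and `params.INPUT_HEIGHT`.

```python
import numpy as np


def frames_from_raw(raw: bytes, width: int, height: int) -> np.ndarray:
    """Hypothetical helper: view a raw rgb24 byte stream as [frames, height, width, 3]."""
    bytes_per_frame = height * width * 3  # rgb24 stores 3 bytes per pixel
    if len(raw) % bytes_per_frame != 0:
        # A truncated stream would otherwise fail inside reshape with a less clear error.
        raise ValueError("byte stream does not contain a whole number of frames")
    return np.frombuffer(raw, np.uint8).reshape([-1, height, width, 3])


# Synthetic stand-in for the ffmpeg output: 5 frames of an assumed 48x27 size.
width, height, n_frames = 48, 27, 5
raw = (np.arange(n_frames * height * width * 3) % 256).astype(np.uint8).tobytes()

frames = frames_from_raw(raw, width, height)
print(frames.shape)  # (5, 27, 48, 3)
```

Checking the length before reshaping is optional, but it may make a truncated or mis-sized stream easier to diagnose.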

In [None]:
# predict transitions using the neural network
predictions = net.predict_video(video)

In [None]:
print(predictions.shape)

In [None]:
# plot the first 256 channels of one frame's output as a 16x16 grid
from matplotlib import pyplot as plt

# index of the frame to visualize; to inspect every frame, wrap the
# plotting code in `for frame_idx in range(predictions.shape[0]):`
frame_idx = 0

square = 16
ix = 1
for _ in range(square):
    for _ in range(square):
        # specify the subplot and turn off the axis ticks
        ax = plt.subplot(square, square, ix)
        ax.set_xticks([])
        ax.set_yticks([])
        # plot a single channel (default colormap)
        plt.imshow(predictions[frame_idx, :, :, ix - 1])
        ix += 1
# show the figure
plt.show()

In [None]:
# from matplotlib import pyplot as plt

# plt.plot(predictions, marker='o', linestyle='--')
# #plt.xlim(0,100)
# plt.show()

## Visualize results

The function `draw_video_with_predictions` displays all video frames with a confidence bar for each frame. A green bar marks a detected shot boundary (the predicted value exceeds the threshold); a red bar is shown otherwise.

The function `scenes_from_predictions` returns a list of scenes for a given video. Each scene is a tuple of (start frame, end frame).

As described in the paper, a threshold of `0.1` is used.
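
To make the role of the threshold concrete, here is a minimal sketch of how per-frame scores could be split into scenes. It is not the repository's `scenes_from_predictions`; the helper `split_into_scenes` and the example scores are hypothetical, and the sketch assumes one score per frame and treats every frame above the threshold as part of a transition.

```python
import numpy as np


def split_into_scenes(scores, threshold=0.1):
    """Hypothetical helper: return (start frame, end frame) tuples of frames
    whose score does not exceed the threshold."""
    is_transition = np.asarray(scores) > threshold
    scenes = []
    start = None  # first frame of the scene currently being collected
    for idx, flagged in enumerate(is_transition):
        if not flagged and start is None:
            start = idx  # a new scene begins
        elif flagged and start is not None:
            scenes.append((start, idx - 1))  # the scene ends before the transition
            start = None
    if start is not None:
        # close the scene that runs to the end of the video
        scenes.append((start, len(is_transition) - 1))
    return scenes


# Made-up scores for eight frames, for illustration only.
scores = np.array([0.0, 0.02, 0.9, 0.03, 0.01, 0.5, 0.6, 0.0])
scenes = split_into_scenes(scores)
print(scenes)  # [(0, 1), (3, 4), (7, 7)]
```

In this sketch the transition frames themselves belong to no scene; the library function may assign boundary frames differently.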

In [None]:
# # Draw all frames; slice `video` and `predictions` identically to show only part of the video.
# draw_video_with_predictions(video[:], predictions[:], threshold=0.1)

In [None]:
# # Generate the list of scenes from predictions; returns tuples of (start frame, end frame)
# scenes = scenes_from_predictions(predictions, threshold=0.1)

# # Show all scenes; slice the list to show only some of them.
# scenes[:]