![](pictures/logo_ipp.png)

# RAMP on advertising spotting

*Augustin Combes, Lucas Duchassin, Marc Veron-Tarabeux, Richard Boggio, Thibaut Valour*

## Introduction

Identifying commercial blocks in news videos is a key task in television broadcast analysis and monitoring, but doing it manually is tedious and time-consuming. Machine learning methods can automate it. The aim of this RAMP challenge is to classify shots from TV news broadcasts as commercial or non-commercial, which is a semantic video classification problem.

The dataset for this challenge contains features extracted from 150 hours of broadcast news videos from five different news channels: CNNIBN, NDTV 24X7, TIMESNOW, BBC, and CNN. Three Indian and two international news channels were recorded concurrently, and the feature file preserves the order of occurrence of shots.

The extracted features cover both the audio and the visual content of each video shot: short-time energy, zero-crossing rate, spectral centroid, spectral flux, spectral roll-off frequency, fundamental frequency, video shot length, screen text distribution, motion distribution, frame difference distribution, edge change ratio, and MFCC bag of audio words.

In this notebook, we will first present the available data and features. Then, we will present a baseline model for this challenge that aims to predict whether a given video shot is a commercial or not. The goal is to maximize the AUC score on the test set.

![](pictures/news.jpg)


## Software prerequisites

This starting kit requires a few dependencies, which can be installed by running the following command at the root of the starting kit:

    pip install -r requirements.txt

In [2]:
import numpy as np
import pandas as pd

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.base import BaseEstimator, TransformerMixin

from problem import get_train_test

## The Data

Regarding video features, commercial shots are usually short in length with fast visual transitions and overlaid text bands. The length of the shot is directly used as a feature, and the placement of overlaid text bands is represented by a 15-dimensional text distribution feature. Motion distribution, frame change distribution, and edge change ratio are used to capture the dynamic nature of commercial shots.

Motion distribution is obtained by computing dense optical flow and constructing a distribution of flow magnitudes over the entire shot. Frame difference distribution is used to capture sudden changes in pixel intensities. Edge change ratio is defined as the ratio of displaced edge pixels to the total number of edge pixels in a frame.
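
To make the frame difference distribution more concrete, here is a minimal NumPy sketch. It is not the extraction code used to build the dataset: the function name `frame_difference_features`, the use of the mean absolute pixel difference per frame pair, the histogram range, and the synthetic random frames are all assumptions made for illustration. Only the number of bins (32) is taken from the feature table.

```python
import numpy as np


def frame_difference_features(frames, n_bins=32, max_diff=255.0):
    """Summarise absolute inter-frame differences for one shot (hypothetical helper).

    frames: array of shape (n_frames, height, width) with grayscale intensities.
    Returns the mean, the variance, and a normalised histogram of the
    per-frame-pair mean absolute differences.
    """
    frames = np.asarray(frames, dtype=float)
    # Mean absolute pixel difference between each pair of consecutive frames.
    diffs = np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))
    hist, _ = np.histogram(diffs, bins=n_bins, range=(0.0, max_diff))
    hist = hist / max(hist.sum(), 1)  # turn counts into a distribution
    return diffs.mean(), diffs.var(), hist


# Synthetic "shot" of 25 random frames, used only to exercise the function.
rng = np.random.default_rng(0)
shot = rng.integers(0, 256, size=(25, 48, 64))
mean_diff, var_diff, hist = frame_difference_features(shot)
print(mean_diff, var_diff, hist.shape)
```

A shot with rapid cuts or strong motion would put more mass in the higher bins, whereas a static shot concentrates everything in the first bin.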

Regarding audio features, several standard audio signal processing descriptors are used: short-time energy, zero-crossing rate, spectral centroid, spectral roll-off, spectral flux, and fundamental frequency. In addition, a bag-of-audio-words feature captures the distribution of audio words.
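
As an illustration of two of these audio descriptors, the sketch below computes short-time energy and zero-crossing rate on synthetic signals and then summarises them by their mean and variance, as in the feature table. The frame length, hop size, sampling rate, and helper names are assumptions chosen for the example; the dataset's actual extraction settings are not given here.

```python
import numpy as np


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into (possibly overlapping) frames of length frame_len."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def short_time_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    positive = frames >= 0
    return np.mean(positive[:, 1:] != positive[:, :-1], axis=1)


sr = 8000  # assumed sampling rate in Hz
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)  # 440 Hz sine wave
noise = np.random.default_rng(0).normal(0.0, 0.5, size=sr)  # white noise

tone_frames = frame_signal(tone, frame_len=400, hop=200)
noise_frames = frame_signal(noise, frame_len=400, hop=200)

ste_tone = short_time_energy(tone_frames)
zcr_tone = zero_crossing_rate(tone_frames)
zcr_noise = zero_crossing_rate(noise_frames)

# Per-signal summaries analogous to the "mean and variance" features.
print(ste_tone.mean(), ste_tone.var())
print(zcr_tone.mean(), zcr_noise.mean())
```

The noisy signal crosses zero far more often than the pure tone, which is the kind of contrast these descriptors are meant to expose.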


#### Feature table

- 1. Shot Length
- 2-3. Motion Distribution (Mean and Variance)
- 4-5. Frame Difference Distribution (Mean and Variance)
- 6-7. Short time energy (Mean and Variance)
- 8-9. Zero-crossing rate (ZCR) (Mean and Variance)
- 10-11. Spectral Centroid (Mean and Variance)
- 12-13. Spectral Roll off (Mean and Variance)
- 14-15. Spectral Flux (Mean and Variance)
- 16-17. Fundamental Frequency (Mean and Variance)
- 18-58. Motion Distribution (40 bins)
- 59-91. Frame Difference Distribution (32 bins)
- 92-122. Text area distribution (15 bins for Mean and 15 bins for Variance)
- 123-4123. Bag of Audio Words (4000 bins)
- 4124-4125. Edge Change Ratio (Mean and Variance)
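
Since the bag of audio words accounts for about 4000 of the roughly 4125 features, it can be convenient to handle that group separately. The sketch below shows one way to split features by column-name prefix. It uses a toy DataFrame whose column names are copied from the training data preview; the values and the variable names `X_compact` and `X_audio_words` are made up for illustration.

```python
import pandas as pd

# Toy frame with a handful of columns named like those in the training data.
X = pd.DataFrame({
    "shot_length": [39.0, 354.0],
    "motion_distribution_mean": [0.36, 1.26],
    "ZCR_mean": [0.135, 0.127],
    "bag_of_audio_words_3999": [0.0, 0.0],
    "bag_of_audio_words_4000": [0.0, 0.0],
    "edge_change_ratio_mean": [0.41, 0.45],
})

# Columns belonging to the bag-of-audio-words group.
baw_cols = [c for c in X.columns if c.startswith("bag_of_audio_words_")]

X_compact = X.drop(columns=baw_cols)  # everything except the audio words
X_audio_words = X[baw_cols]           # only the audio words

print(X_compact.shape, X_audio_words.shape)
```

The same pattern applies to any other group, for example the `motion_distribution` columns.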

In [3]:
X_train, X_test, y_train, y_test = get_train_test()

X_train, X_test = X_train.drop(columns=['channel']), X_test.drop(columns=['channel'])
X_train, X_test = X_train.fillna(0), X_test.fillna(0) 

X_train.head()

| Unnamed: 0 | shot_length | motion_distribution_mean | motion_distribution_variance | frame_difference_distribution_mean | frame_difference_distribution_variance | short_time_energy_mean | short_time_energy_variance | ZCR_mean | ZCR_variance | spectral_centroid_mean | ... | bag_of_audio_words_3993 | bag_of_audio_words_3994 | bag_of_audio_words_3995 | bag_of_audio_words_3996 | bag_of_audio_words_3997 | bag_of_audio_words_3998 | bag_of_audio_words_3999 | bag_of_audio_words_4000 | edge_change_ratio_mean | edge_change_ratio_variance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39.0 | 0.358888 | 0.179183 | 2.207158 | 1.448997 | 0.011593 | 0.01 | 0.135016 | 0.092428 | 3539.445312 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.41314 | 0.485093 |
| 1 | 354.0 | 1.261189 | 0.785149 | 5.846283 | 5.492986 | 0.018532 | 0.011248 | 0.126942 | 0.069577 | 3912.496582 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.453395 | 0.277466 |
| 2 | 39.0 | 0.977077 | 1.535457 | 7.017412 | 10.836514 | 0.017693 | 0.010044 | 0.122276 | 0.054648 | 3576.808838 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.709121 | 0.679148 |
| 3 | 164.0 | 2.187399 | 2.278633 | 13.425022 | 16.9335 | 0.015677 | 0.009293 | 0.083365 | 0.047887 | 3843.954834 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.201829 | 0.504669 |
| 4 | 25.0 | 6.117249 | 5.267529 | 26.422398 | 14.66175 | 0.01366 | 0.006832 | 0.07075 | 0.046628 | 3334.407227 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.581734 | 0.411938 |


In [4]:
print(f'Proportion of 1s in the training set: {y_train.mean():.2f}')

Proportion of 1s in the training set: 0.63
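
With 63% of the training shots labelled 1, a classifier that always predicts the majority class already reaches about 0.63 accuracy while carrying no ranking information (AUC of 0.5). The sketch below illustrates this reference point on synthetic labels with the same class proportion. The data is made up; only the 0.63 proportion comes from the output above.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

# Synthetic labels with the same class balance as the training set (63% ones).
y = np.array([1] * 63 + [0] * 37)
X = np.zeros((len(y), 1))  # features are ignored by the dummy model

dummy = DummyClassifier(strategy="most_frequent").fit(X, y)

acc = accuracy_score(y, dummy.predict(X))
auc = roc_auc_score(y, dummy.predict_proba(X)[:, 1])
print(f"Majority-class accuracy: {acc:.2f}, AUC: {auc:.2f}")
```

Any model worth submitting should clearly beat both of these numbers, which is one reason AUC is a more informative target than raw accuracy on this dataset.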


## Baseline model

In [5]:
class Scaler(BaseEstimator, TransformerMixin):
    """Wrap a scikit-learn scaler so that transform returns a DataFrame."""

    def __init__(self, scaler):
        self.scaler = scaler

    def fit(self, X, y=None):
        self.scaler.fit(X)
        return self

    def transform(self, X_df):
        # Scale the values, then restore the original index and column names.
        res = self.scaler.transform(X_df)
        return pd.DataFrame(res, index=X_df.index, columns=X_df.columns)


scaling = Scaler(StandardScaler())
logreg = LogisticRegression(max_iter=200)

pipe = make_pipeline(scaling, logreg)

In [6]:
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)
probas = pipe.predict_proba(X_test)

In [7]:
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
print(f"AUC score: {roc_auc_score(y_test, probas[:, 1])}")

Accuracy: 0.8839497981976004
AUC score: 0.9380675274395616
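
A single train/test split gives only one estimate of the AUC. The sketch below shows how a cross-validated AUC could be computed locally with scikit-learn. It runs on synthetic data from `make_classification`, and the 5-fold shuffled stratified split is an assumption: the challenge defines its own cross-validation scheme, and because the feature file preserves shot order, neighbouring shots may be correlated, so a shuffled split can be optimistic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real features, with roughly 63% positives.
X, y = make_classification(
    n_samples=500, n_features=20, weights=[0.37, 0.63], random_state=0
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# scoring="roc_auc" uses predicted scores rather than hard labels.
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"AUC per fold: {np.round(scores, 3)}")
print(f"Mean AUC: {scores.mean():.3f} (std {scores.std():.3f})")
```

The spread across folds gives a rough idea of how much of a difference between two models is just noise.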


In [8]:
!ramp-test --submission starting_kit

[38;5;178m[1mTesting TV Commercial Classification Challenge[0m
[38;5;178m[1mReading train and test files from ./data/ ...[0m
[38;5;178m[1mReading cv ...[0m
[38;5;178m[1mTraining submissions\starting_kit ...[0m
[38;5;178m[1mCV fold 0[0m
	[38;5;178m[1mscore   auc  bal_acc   f1  pw_prec  pw_rec       time[0m
	[38;5;10m[1mtrain[0m  [38;5;10m[1m[38;5;150m0.9[0m4[0m     [38;5;150m0.75[0m  [38;5;150m1.0[0m      [38;5;150m0.9[0m    [38;5;150m0.9[0m2  [38;5;150m56.880427[0m
	[38;5;12m[1mvalid[0m  [38;5;12m[1m[38;5;105m0.9[0m4[0m     [38;5;105m0.74[0m  [38;5;105m1.0[0m      [38;5;105m0.9[0m    [38;5;105m0.9[0m2   [38;5;105m3.431410[0m
	[38;5;1m[1mtest[0m   [38;5;1m[1m[38;5;218m0.9[0m4[0m     [38;5;218m0.74[0m  [38;5;218m1.0[0m      [38;5;218m0.9[0m    [38;5;218m0.9[0m2   [38;5;218m0.9[0m21266
[38;5;178m[1mCV fold 1[0m
	[38;5;178m[1mscore   auc  bal_acc   f1  pw_prec  pw_rec       time[0m
	[38;5;10m[1mtrain[0m  [38;