<img src="banner.png" height = "200">
This tutorial demonstrates the multilingual nature of the RUDDER dataset and shows how to use it for research. The complete description is available on the <a href="https://rudder-2021.github.io/">project webpage</a>.


We begin by importing the required packages. <br/>
**Note**: This tutorial only shows how to use the extracted features, not how to extract them.

In [1]:
import pickle
import json
import numpy as np
import seaborn as sn
import pandas as pd
import matplotlib.pyplot as plt
import random

The dataset contains captions in multiple Indian languages, namely Marathi, Hindi, Telugu, Tamil, Kannada and Malayalam. The cell below loads the captions and prints those of one randomly chosen entry in every language available for it.

In [2]:
# Load the captions for the different languages
with open('rudder_captions.json', encoding='utf-8') as f:
    data = json.load(f)

cap_id, captions = random.choice(list(data.items()))

for language in captions:
    print(cap_id, language, captions[language])

0012a8f4a2.txt HINDI बोतलबंद पानी पर प्रकाश
0012a8f4a2.txt KANNADA ಬಾಟಲ್ ನೀರಿನ ಮೇಲೆ ಬೆಳಕು
0012a8f4a2.txt MALAYALAM കുപ്പിവെള്ളത്തിൽ വെളിച്ചം
0012a8f4a2.txt MARATHI लेजरचा प्रकाश बाटलीच्या पाण्यावर सोडा
0012a8f4a2.txt TAMIL பாட்டில் தண்ணீரில் ஒளி
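
To see how unevenly the languages are covered, it can help to count the captions per language. The sketch below assumes the structure suggested by the output above: a dictionary that maps each caption id to a `{LANGUAGE: caption}` dictionary. The names `toy_data` and `captions_per_language` are hypothetical, and the made-up entries only stand in for the real `data`.

```python
from collections import Counter

def captions_per_language(data):
    """Count how many entries carry a caption in each language."""
    counts = Counter()
    for captions in data.values():
        counts.update(captions.keys())
    return counts

# Toy stand-in for the dictionary loaded from rudder_captions.json (made-up ids and text)
toy_data = {
    "id_a.txt": {"HINDI": "caption a (hi)", "TAMIL": "caption a (ta)"},
    "id_b.txt": {"HINDI": "caption b (hi)", "MARATHI": "caption b (mr)"},
    "id_c.txt": {"TELUGU": "caption c (te)"},
}

counts = captions_per_language(toy_data)
for language, n in sorted(counts.items()):
    print(language, n)
```

Passing the real `data` instead of `toy_data` should give the per-language totals, provided the file follows the assumed structure.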


As described on the website and in the arXiv submission, not all videos are available in all languages. Below we show the number of videos that have audio in both languages of each language pair. <br/>
**Note**: The dataset is not just bilingual; we show the counts for pairs of languages only for ease of visualisation. The code can be extended to three or four languages.

In [None]:
languages = ['HINDI', 'MARATHI', 'TAMIL', 'TELUGU']

# Collect the ids of the entries available in each language
language_count = {language: [] for language in languages}
for cap_id in data:
    for language in languages:
        if language in data[cap_id]:
            language_count[language].append(cap_id)

# Cell (i, j) holds the number of ids present in both language i and language j
language_intersect = np.empty([len(languages), len(languages)], dtype=int)
for idx, language in enumerate(languages):
    for idx2, language2 in enumerate(languages):
        language_intersect[idx][idx2] = len(np.intersect1d(language_count[language], language_count[language2]))

# Plot the pairwise counts as an annotated heatmap
plot_data = pd.DataFrame(language_intersect, index=languages, columns=languages)
plt.figure(figsize=(10, 7))
sn.heatmap(plot_data, annot=True, fmt='d')
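
One way to extend the pairwise count to three or four languages is to intersect the id sets of every language group of the chosen size. This is only a sketch: `overlap_counts` and `toy_ids` are hypothetical names, and `toy_ids` stands in for the per-language id lists (`language_count`) built in the cell above.

```python
from itertools import combinations

def overlap_counts(ids_by_language, group_size):
    """Map each group of `group_size` languages to the number of ids shared by all of them."""
    result = {}
    for group in combinations(sorted(ids_by_language), group_size):
        shared = set.intersection(*(set(ids_by_language[lang]) for lang in group))
        result[group] = len(shared)
    return result

# Toy stand-in for the per-language id lists (made-up ids)
toy_ids = {
    "HINDI": ["a", "b", "c"],
    "MARATHI": ["b", "c"],
    "TAMIL": ["c", "d"],
    "TELUGU": ["c"],
}

for group, n in overlap_counts(toy_ids, 3).items():
    print(group, n)
```

Calling `overlap_counts(language_count, 3)` on the real lists would give one count per language triple. Those counts no longer fit a two-dimensional heatmap, so a table or bar chart may suit them better.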

So far we have covered the description of the dataset. We now demonstrate how to use the features extracted with various publicly available pre-trained models. We release these features, extracted on our dataset, for public use.

The following features are provided for our dataset:

| **S.no** 	| **Feature Type** 	|  **Feature Name**  	|                     **Pre-trained model used**                     	|
|:----:	|:------------:	|:--------------:	|:--------------------------------------------------------------:	|
|   1  	|     Video    	|       I3D      	|   <a href="https://github.com/piergiaj/pytorch-i3d">link</a>   	|
|   2  	|     Video    	|   RGB Resnext  	|                                                                	|
|   3  	|     Video    	|    RGB Senet   	|                                                                	|
|   4  	|     Video    	|  Action r2p1d  	| <a href="https://github.com/moabitcoin/ig65m-pytorch">link</a> 	|
|   5  	|     Video    	| Scene densenet 	|                                                                	|
|   6  	|     Audio    	|  Audio VGGish  	|  <a href="https://github.com/harritaylor/torchvggish">link</a> 	|
|   7  	|    Caption   	|      GloVe     	|   <a href="https://nlp.stanford.edu/projects/glove/">link</a>  	|

In [None]:
# Load I3D features
with open('feats/aggregated_i3d_25fps_256px_stride25_offset0_inner_stride1/i3d-avg.pickle', 'rb') as infile:
    data = pickle.load(infile)

vid_id, feat = random.choice(list(data.items()))

print("I3D", feat.shape)

In [None]:
# Load RGB features (ResNeXt)
with open('feats/aggregated_imagenet_25fps_256px_stride1_offset0/resnext101_32x48d-avg.pickle', 'rb') as infile:
    data = pickle.load(infile)

vid_id, feat = random.choice(list(data.items()))

print("Resnext", feat.shape)

In [None]:
# Load RGB features (SENet)
with open('feats/aggregated_imagenet_25fps_256px_stride1_offset0/senet154-avg.pickle', 'rb') as infile:
    data = pickle.load(infile)

vid_id, feat = random.choice(list(data.items()))

print("SENET", feat.shape)

In [None]:
# Load action features (r2p1d)
with open('feats/aggregated_r2p1d_30fps_256px_stride32_offset0_inner_stride1/r2p1d-ig65m-avg.pickle', 'rb') as infile:
    data = pickle.load(infile)

vid_id, feat = random.choice(list(data.items()))

print("R2p1D", feat.shape)

In [None]:
# Load scene features (DenseNet)
with open('feats/aggregated_scene_25fps_256px_stride1_offset0/densenet161-avg.pickle', 'rb') as infile:
    data = pickle.load(infile)

vid_id, feat = random.choice(list(data.items()))

print("Densenet", feat.shape)
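
The five cells above repeat the same steps, so they could be folded into one helper. The sketch below reuses the file paths from those cells. `VIDEO_FEATURE_FILES`, `load_features` and `feature_shapes` are hypothetical names. It assumes, as the cells above do, that each pickle holds a dictionary mapping a video id to an array with a `.shape` attribute. Files that are not found are skipped instead of raising an error.

```python
import os
import pickle

# Paths copied from the cells above
VIDEO_FEATURE_FILES = {
    "I3D": "feats/aggregated_i3d_25fps_256px_stride25_offset0_inner_stride1/i3d-avg.pickle",
    "Resnext": "feats/aggregated_imagenet_25fps_256px_stride1_offset0/resnext101_32x48d-avg.pickle",
    "SENET": "feats/aggregated_imagenet_25fps_256px_stride1_offset0/senet154-avg.pickle",
    "R2p1D": "feats/aggregated_r2p1d_30fps_256px_stride32_offset0_inner_stride1/r2p1d-ig65m-avg.pickle",
    "Densenet": "feats/aggregated_scene_25fps_256px_stride1_offset0/densenet161-avg.pickle",
}

def load_features(path):
    """Unpickle one feature file and return its contents."""
    with open(path, "rb") as infile:
        return pickle.load(infile)

def feature_shapes(feature_files):
    """Return {name: shape of one feature} for every file that exists; skip the rest."""
    shapes = {}
    for name, path in feature_files.items():
        if not os.path.exists(path):
            print(name, "not found, skipping:", path)
            continue
        feats = load_features(path)
        # Assumption: a non-empty dict of id -> array, as in the cells above
        first_feat = next(iter(feats.values()))
        shapes[name] = first_feat.shape
    return shapes

for name, shape in feature_shapes(VIDEO_FEATURE_FILES).items():
    print(name, shape)
```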

In [5]:
# Load the audio features for each language
languages = ['HINDI', 'MARATHI', 'TAMIL', 'TELUGU']

for language in languages:
    with open('feats/aggregated_audio_feats/' + language + '_TFT.pickle', 'rb') as infile:
        data = pickle.load(infile)
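
The loop above rebinds `data` on every iteration, so only the features of the last language remain in memory. A minimal sketch that keeps all of them follows. It uses the same file naming as the cell above. `load_audio_features` and its `root` parameter are hypothetical, and languages whose file is not found are skipped.

```python
import os
import pickle

def load_audio_features(languages, root="feats/aggregated_audio_feats"):
    """Return {language: unpickled features} for every language file found under `root`."""
    audio_feats = {}
    for language in languages:
        path = os.path.join(root, language + "_TFT.pickle")
        if not os.path.exists(path):
            print("not found, skipping:", path)
            continue
        with open(path, "rb") as infile:
            audio_feats[language] = pickle.load(infile)
    return audio_feats

audio_feats = load_audio_features(["HINDI", "MARATHI", "TAMIL", "TELUGU"])
print(sorted(audio_feats))
```

Keeping one entry per language makes it possible to compare the audio features of the same video across languages, as long as the pickles share their keys; we have not verified that here.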

**For any communication regarding RUDDER**<br/>
**Contact**<br/>
Jayaprakash A [jayaprakash at cse dot iitb dot ac dot in] or<br/>
Abhishek [abhishek at cse dot iitb dot ac dot in]