# Section 2 - Coding

In this section we load and manipulate an "unconventional" data file: you will write a simple loader for it and then process and query the data it reads.

A "section2_data.txt" file is attached to this IPython notebook. It contains a few rows from a computer vision dataset. Each row corresponds to one frame of a video and holds that frame's metadata and annotations.
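Judging by the records printed further down in this notebook, each row appears to be a standalone JSON object, one per line. The following is a minimal sketch of parsing a single row under that assumption. `sample_line` is a hypothetical row modeled on the first record printed in this notebook, not text read from the actual file.

```python
import json

# Hypothetical row, modeled on the first record printed in this notebook.
sample_line = (
    '{"_i": 0, "frame": "frame_000.png", "video": "video000", '
    '"value": 39, "labels": ["bird"]}'
)

# One JSON object per line becomes one Python dict per row.
row = json.loads(sample_line)
print(row["video"], row["value"], row["labels"])
```

Because every line parses on its own, the file can be read line by line without loading it as one large JSON document.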

The rest of this notebook poses some questions about reading and processing this data.

Feel free to use any existing Python library to answer the questions.

In [19]:
#more section2_data.txt

## Section 1 - Design a data loader

Design a data structure that, given a file path such as `"section2_data.txt"`, reads and parses the contents of that file.

#### Q1 - Design the data structure with the following properties:
- The data structure is either callable or indexable. It accepts a single integer parameter and returns the parsed contents of the row at that index.
- The data structure returns the number of rows in the file (and therefore in memory) when queried with the Python expression `len(my_data_struct)`.


In [20]:
import json

class MyDataStruct:
    def __init__(self, filename):
        self.data = []
        with open(filename, 'r') as f:
            for line in f:
                row = json.loads(line)
                self.data.append(row)
                
    def __getitem__(self, index):
        return self.data[index]
    
    def __len__(self):
        return len(self.data)


my_data_struct = MyDataStruct('section2_data.txt')
print(len(my_data_struct))
my_data_struct[0]

51


{'_i': 0,
 'frame': 'frame_000.png',
 'video': 'video000',
 'value': 39,
 'labels': ['bird']}
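The question allows the structure to be callable instead of indexable. As a rough sketch of that option, the hypothetical `CallableDataStruct` below adds `__call__` on top of indexing. The class name, the blank-line check, and the two sample rows are assumptions made for illustration; they are not part of the original solution or data file.

```python
import json
import os
import tempfile


class CallableDataStruct:
    """Hypothetical variant: rows are reachable via reader(i) as well as reader[i]."""

    def __init__(self, filename):
        with open(filename, "r") as f:
            # Skip blank lines so a trailing empty line does not break json.loads.
            self.data = [json.loads(line) for line in f if line.strip()]

    def __getitem__(self, index):
        return self.data[index]

    def __call__(self, index):
        # Delegate to indexing so both access styles behave identically.
        return self[index]

    def __len__(self):
        return len(self.data)


# Two made-up rows written to a temporary file, so the sketch runs on its own.
sample_rows = [
    {"_i": 0, "frame": "frame_000.png", "video": "video000", "value": 39, "labels": ["bird"]},
    {"_i": 1, "frame": "frame_001.png", "video": "video000", "value": 12, "labels": ["dog"]},
]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    for sample in sample_rows:
        tmp.write(json.dumps(sample) + "\n")
    tmp_path = tmp.name

demo = CallableDataStruct(tmp_path)
os.remove(tmp_path)  # the rows are already in memory

print(len(demo), demo(1) == demo[1])
```

Routing `__call__` through `__getitem__` keeps a single lookup path, so negative indices and `IndexError` behave the same in both styles.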

#### Q2 - Prove that you can initialize the reader and then calculate its length `len(reader)` and print the 26th and 43rd elements of the dataset.

In [21]:
reader = MyDataStruct('section2_data.txt')
print(len(reader))
# Indices are zero-based: the 26th element is index 25, the 43rd is index 42.
print(reader[25])
print(reader[42])

51
{'_i': 25, 'frame': 'frame_003.png', 'video': 'video003', 'value': 24, 'labels': ['panda']}
{'_i': 42, 'frame': 'frame_002.png', 'video': 'video004', 'value': 32, 'labels': ['panda', 'bird', 'cat']}


## Section 2 - Process the data

#### Q1 - Write an algorithm that will generate a dictionary with key/value pairs, where the keys are the name of each unique video in the dataset and the value is the number of frames of that video.

In [22]:
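### YOUR SOLUTION
def video_frame_count(my_data_struct):
    # Use a descriptive name; `dict` would shadow the built-in type.
    frame_counts = {}
    for i in range(len(my_data_struct)):
        video_name = my_data_struct[i]['video']
        # Start at 0 for an unseen video, then count this frame.
        frame_counts[video_name] = frame_counts.get(video_name, 0) + 1
    return frame_counts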

video_frame_count(my_data_struct)

{'video000': 7, 'video001': 10, 'video002': 5, 'video003': 18, 'video004': 11}
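The same tally can be expressed with the standard library's `collections.Counter`. This is one possible alternative, not the notebook's original solution. `video_frame_count_counter` and `sample_rows` are hypothetical names, and the rows are made up so the block runs without the data file.

```python
from collections import Counter

# Hypothetical rows; only the 'video' field matters for this count.
sample_rows = [
    {"video": "video000"},
    {"video": "video001"},
    {"video": "video000"},
]


def video_frame_count_counter(rows):
    # Counter tallies each video name in a single pass over the rows.
    return dict(Counter(row["video"] for row in rows))


print(video_frame_count_counter(sample_rows))
# -> {'video000': 2, 'video001': 1}
```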

#### Q2 - Write an algorithm that will generate a dictionary with key/value pairs, where the keys are the name of each unique video in the dataset and the value is the sum of the `value` field of all the frames containing a `dog`.

In [23]:
### YOUR SOLUTION
def video_value_sum_with_dog(my_data_struct):
    video_values = {}
    for i in range(len(my_data_struct)):
        row = my_data_struct[i]
        # Only frames annotated with a dog contribute to the sum.
        if 'dog' in row['labels']:
            video_name = row['video']
            video_values[video_name] = video_values.get(video_name, 0) + row['value']
    return video_values

video_value_sum_with_dog(my_data_struct)

{'video000': 96,
 'video001': 69,
 'video002': 91,
 'video003': 129,
 'video004': 49}
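The solution above creates a key only for videos that have at least one `dog` frame. In this dataset all five videos do, so the result covers every video. If the question is read as requiring a key for every unique video, a sketch along the following lines would report `0` for videos without the label. `video_value_sum_with_label`, its `label` parameter, and `sample_rows` are hypothetical and used only for illustration.

```python
# Hypothetical rows; video001 has no frame labeled 'dog'.
sample_rows = [
    {"video": "video000", "value": 10, "labels": ["dog"]},
    {"video": "video000", "value": 5, "labels": ["cat"]},
    {"video": "video001", "value": 7, "labels": ["bird"]},
]


def video_value_sum_with_label(rows, label="dog"):
    totals = {}
    for row in rows:
        # Register the video first, so it appears even without the label.
        totals.setdefault(row["video"], 0)
        if label in row["labels"]:
            totals[row["video"]] += row["value"]
    return totals


print(video_value_sum_with_label(sample_rows))
# -> {'video000': 10, 'video001': 0}
```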

#### Q3 - Last, create an algorithm that returns a dictionary with the count of each of the animal types in the dataset, excluding occurrences in `video003` and rows where the `value` is odd.

In [24]:
### YOUR SOLUTION
def animal_count(my_data_struct):
    count = {}
    for i in range(len(my_data_struct)):
        row = my_data_struct[i]
        # Skip every frame of video003 and every row whose value is odd.
        if row['video'] != "video003" and row['value'] % 2 == 0:
            for animal in row['labels']:
                count[animal] = count.get(animal, 0) + 1
    return count

animal_count(my_data_struct)

{'dog': 10, 'cat': 7, 'bird': 6, 'frog': 8, 'panda': 4}
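To make the two exclusion rules easy to check by hand, here is a small sketch on made-up rows, using `Counter.update` to add all labels of a qualifying row at once. `animal_count_counter`, its `excluded_video` parameter, and `sample_rows` are hypothetical; the real dataset is not used here.

```python
from collections import Counter

# Hypothetical rows covering each case of the filter.
sample_rows = [
    {"video": "video000", "value": 4, "labels": ["dog", "cat"]},  # kept
    {"video": "video003", "value": 2, "labels": ["dog"]},         # excluded video
    {"video": "video001", "value": 7, "labels": ["cat"]},         # odd value
    {"video": "video001", "value": 8, "labels": ["dog"]},         # kept
]


def animal_count_counter(rows, excluded_video="video003"):
    counts = Counter()
    for row in rows:
        # Keep a row only if it is outside the excluded video and its value is even.
        if row["video"] != excluded_video and row["value"] % 2 == 0:
            counts.update(row["labels"])
    return dict(counts)


print(animal_count_counter(sample_rows))
# -> {'dog': 2, 'cat': 1}
```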