# Section 2 - Coding

In this section we will load and manipulate "unconventional" data files. You will need to write simple loading functionality and then process and query a data file.

There is a "section2_data.txt" file attached to this IPython notebook. The data file contains a few rows from a computer vision dataset. Each row corresponds to a frame in a video and contains some metadata and annotations for that frame.

The following notebook will pose some questions about reading and processing this data.

Feel free to use any existing Python library to answer the questions.

In [2]:
!head section2_data.txt

{"_i": 0, "frame": "frame_000.png", "video": "video000", "value": 39, "labels": ["bird"]}
{"_i": 1, "frame": "frame_001.png", "video": "video000", "value": 33, "labels": ["frog", "dog"]}
{"_i": 2, "frame": "frame_002.png", "video": "video000", "value": 25, "labels": ["panda", "panda"]}
{"_i": 3, "frame": "frame_003.png", "video": "video000", "value": 28, "labels": ["dog", "dog"]}
{"_i": 4, "frame": "frame_004.png", "video": "video000", "value": 16, "labels": ["cat"]}
{"_i": 5, "frame": "frame_005.png", "video": "video000", "value": 32, "labels": ["bird", "frog", "bird"]}
{"_i": 6, "frame": "frame_006.png", "video": "video000", "value": 35, "labels": ["bird", "dog"]}
{"_i": 7, "frame": "frame_000.png", "video": "video001", "value": 25, "labels": ["dog", "bird"]}
{"_i": 8, "frame": "frame_001.png", "video": "video001", "value": 16, "labels": ["dog", "panda", "bird"]}
{"_i": 9, "frame": "frame_002.png", "video": "video001", "value": 23, "labels": ["panda"]}
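
Each line in the output above looks like a standalone JSON object (a layout often called JSON Lines), so one way to parse a single row is `json.loads`. This is a minimal sketch, assuming every line in the file is valid JSON like the ten shown; `sample_line` is copied from the second row above and `parsed_row` is just an illustrative name.

```python
import json

# One line copied verbatim from the `head` output above.
sample_line = '{"_i": 1, "frame": "frame_001.png", "video": "video000", "value": 33, "labels": ["frog", "dog"]}'

# json.loads turns the text into a plain dict without executing it as code.
parsed_row = json.loads(sample_line)

print(parsed_row["video"], parsed_row["value"], parsed_row["labels"])
```

The result is an ordinary dictionary, so fields such as `labels` come back as Python lists.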


## Section 1 - Design a data loader

Design a data structure that, given a file path such as `"section2_data.txt"`, reads and parses the contents of the file above.

#### Q1 - Design the data structure with the following properties:
- The data structure is either callable or indexable. It accepts a single integer parameter and returns the parsed contents of the row at that index.
- The data structure returns the number of rows in the file (and in memory) when called with the Python command `len(my_data_struct)`.


In [3]:
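## YOUR SOLUTION
import json

class DataLoader:
    def __init__(self, file_path):
        # Parse every non-empty line as one JSON object and keep the rows in memory.
        self.data = []
        with open(file_path, 'r') as f:
            for line in f:
                line = line.strip()
                if line:
                    # json.loads only parses data; eval would execute arbitrary code.
                    self.data.append(json.loads(line))

    def __getitem__(self, index):
        # Return the parsed row at the given integer index.
        return self.data[index]

    def __len__(self):
        # Number of rows loaded from the file.
        return len(self.data)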

In [4]:
my_data = DataLoader('/home/windows/Desktop/Kirana_club_intern/section2_data.txt')

In [6]:
row = my_data[0]
row

{'_i': 0,
 'frame': 'frame_000.png',
 'video': 'video000',
 'value': 39,
 'labels': ['bird']}

In [7]:
num_rows = len(my_data)
num_rows

51

#### Q2 - Prove that you can initialize the reader and then calculate its length `len(reader)` and print the 26th and 43rd elements of the dataset.

In [12]:
## YOUR SOLUTION
# Reuse the reader that was initialized from the file above
reader = my_data

# Calculate the length of the reader
print(len(reader))

51


In [13]:
# Print the 26th element
print(reader[25])

# Print the 43rd element
print(reader[42])

{'_i': 25, 'frame': 'frame_003.png', 'video': 'video003', 'value': 24, 'labels': ['panda']}
{'_i': 42, 'frame': 'frame_002.png', 'video': 'video004', 'value': 32, 'labels': ['panda', 'bird', 'cat']}
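
The 26th and 43rd elements are read at indices 25 and 42 because Python sequences are zero-based. The off-by-one conversion can be sketched with a small helper; `nth_element` is a hypothetical name introduced here for illustration, and it works on any indexable object.

```python
def nth_element(sequence, n):
    """Return the n-th element, counting from 1 (hypothetical helper)."""
    if n < 1:
        raise IndexError("n must be at least 1")
    # Position n in 1-based counting is index n - 1 in Python.
    return sequence[n - 1]

# Tiny stand-in list used only to demonstrate the conversion.
letters = ["a", "b", "c"]
second = nth_element(letters, 2)
print(second)
```

With the reader from this notebook, `nth_element(reader, 26)` would be equivalent to `reader[25]`.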


## Section 2 - Process the data

#### Q1 - Write an algorithm that generates a dictionary of key/value pairs, where the keys are the names of the unique videos in the dataset and each value is the number of frames in that video.

In [14]:
### YOUR SOLUTION
def video_frame_count(my_data_struct):
    frame_count = {}
    for row in my_data_struct:
        video = row['video']
        if video not in frame_count:
            frame_count[video] = 0
        frame_count[video] += 1
    return frame_count

In [15]:
video_frame_count(my_data)

{'video000': 7, 'video001': 10, 'video002': 5, 'video003': 18, 'video004': 11}
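
The same per-video tally can also be expressed with `collections.Counter` from the standard library. This is an alternative sketch, not the solution above; `video_frame_count_counter` and `sample_rows` are names introduced here, and `sample_rows` holds only three rows copied from the `head` output rather than the full file.

```python
from collections import Counter

# Three rows copied from the `head` output (a small subset, not the whole dataset).
sample_rows = [
    {"_i": 6, "frame": "frame_006.png", "video": "video000", "value": 35, "labels": ["bird", "dog"]},
    {"_i": 7, "frame": "frame_000.png", "video": "video001", "value": 25, "labels": ["dog", "bird"]},
    {"_i": 8, "frame": "frame_001.png", "video": "video001", "value": 16, "labels": ["dog", "panda", "bird"]},
]

def video_frame_count_counter(rows):
    # Counter tallies how many times each video name occurs, one per frame row.
    return dict(Counter(row["video"] for row in rows))

sample_counts = video_frame_count_counter(sample_rows)
print(sample_counts)
```

Because the function only iterates over its argument, it should accept either a plain list of rows or the loader object.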

#### Q2 - Write an algorithm that generates a dictionary of key/value pairs, where the keys are the names of the unique videos in the dataset and each value is the sum of the `value` field over all frames of that video containing a `dog`.

In [16]:
### YOUR SOLUTION
def video_value_sum_with_dog(my_data_struct):
    video_dog_value_dict = {}
    for row in my_data_struct:
        if "dog" in row["labels"]:
            video_name = row["video"]
            if video_name not in video_dog_value_dict:
                video_dog_value_dict[video_name] = row["value"]
            else:
                video_dog_value_dict[video_name] += row["value"]
    return video_dog_value_dict

In [17]:
video_value_sum_with_dog(my_data)

{'video000': 96,
 'video001': 69,
 'video002': 91,
 'video003': 129,
 'video004': 49}
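
The function above adds a video to the result only once it sees a frame labelled `dog`, so a video with no dog frames would be missing from the dictionary. If the question is read as requiring every unique video as a key, one possible variant starts each video at zero. This is a sketch under that assumption; `video_value_sum_with_dog_all_videos` and `sample_rows` are names introduced here, and the rows are a small subset copied from the `head` output.

```python
from collections import defaultdict

# Four rows copied from the `head` output; video001 has no dog frame in this subset.
sample_rows = [
    {"_i": 4, "frame": "frame_004.png", "video": "video000", "value": 16, "labels": ["cat"]},
    {"_i": 5, "frame": "frame_005.png", "video": "video000", "value": 32, "labels": ["bird", "frog", "bird"]},
    {"_i": 6, "frame": "frame_006.png", "video": "video000", "value": 35, "labels": ["bird", "dog"]},
    {"_i": 9, "frame": "frame_002.png", "video": "video001", "value": 23, "labels": ["panda"]},
]

def video_value_sum_with_dog_all_videos(rows):
    totals = defaultdict(int)
    for row in rows:
        # Adding 0 still creates the key, so every video appears in the result.
        totals[row["video"]] += row["value"] if "dog" in row["labels"] else 0
    return dict(totals)

dog_totals = video_value_sum_with_dog_all_videos(sample_rows)
print(dog_totals)
```

On the full dataset both versions should agree whenever every video has at least one dog frame, as the output above suggests.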

#### Q3 - Last, create an algorithm that returns a dictionary with the count of each of the animal types in the dataset, excluding occurrences in `video003` and rows where the `value` is odd.

In [19]:
### YOUR SOLUTION
def animal_count(my_data_struct):
    animal_counts = {}
    for row in my_data_struct:
        if row["video"] == "video003" or row["value"] % 2 != 0:
            continue
        for label in row["labels"]:
            if label in animal_counts:
                animal_counts[label] += 1
            else:
                animal_counts[label] = 1
    return animal_counts

In [20]:
animal_count(my_data)

{'dog': 10, 'cat': 7, 'bird': 6, 'frog': 8, 'panda': 4}
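
The filtering and counting steps can also be sketched with `collections.Counter`. Note that a label repeated within one row, such as `["dog", "dog"]`, is counted once per occurrence, which matches the behavior of `animal_count` above. The names `animal_count_counter`, `excluded_video`, and `sample_rows` are introduced here for illustration, and the rows are a small subset copied from the `head` output.

```python
from collections import Counter

# Four rows copied from the `head` output (a small subset, not the whole dataset).
sample_rows = [
    {"_i": 2, "frame": "frame_002.png", "video": "video000", "value": 25, "labels": ["panda", "panda"]},
    {"_i": 3, "frame": "frame_003.png", "video": "video000", "value": 28, "labels": ["dog", "dog"]},
    {"_i": 4, "frame": "frame_004.png", "video": "video000", "value": 16, "labels": ["cat"]},
    {"_i": 5, "frame": "frame_005.png", "video": "video000", "value": 32, "labels": ["bird", "frog", "bird"]},
]

def animal_count_counter(rows, excluded_video="video003"):
    counts = Counter()
    for row in rows:
        # Skip rows from the excluded video and rows whose value is odd.
        if row["video"] == excluded_video or row["value"] % 2 != 0:
            continue
        # update() adds one per label occurrence, including repeats in a row.
        counts.update(row["labels"])
    return dict(counts)

sample_animals = animal_count_counter(sample_rows)
print(sample_animals)
```

The row with value 25 is dropped because the value is odd, so `panda` does not appear in the result for this subset.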