# Numerical representation of images
In this notebook we study how images are represented as arrays of numbers in the computer.

## Importing modules

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import cv2
import requests
import os
import shutil
from tqdm import tqdm
import zipfile

## Fetching the Tiny ImageNet data set

In this notebook we use the [Tiny ImageNet data set](https://tiny-imagenet.herokuapp.com/) to illustrate how images are stored and processed in computers. As explained on the linked web page, this data set is a smaller version of the [ImageNet](http://www.image-net.org/) data set, with 500 training, 50 validation and 50 test images for each of its 200 classes.
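Before fetching any data, it may help to see what "an image as an array of numbers" looks like in practice. The following is a minimal sketch, not part of the original notebook: `tiny_image` is a made-up 2x2 picture, and the height x width x channel layout with 8-bit values is an assumption that matches how color images are commonly loaded. The totals at the end read the split sizes quoted above as per-class counts.

```python
import numpy as np

# Hypothetical 2x2 RGB image: axis 0 is the row, axis 1 the column,
# axis 2 the color channel (red, green, blue), each stored as an 8-bit integer.
tiny_image = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype=np.uint8,
)

print(tiny_image.shape)   # (2, 2, 3)
print(tiny_image[0, 1])   # the pixel in row 0, column 1 (pure green)

# Data set size, assuming the quoted split sizes are counts per class.
n_classes = 200
n_train, n_val, n_test = (n_classes * k for k in (500, 50, 50))
print(n_train, n_val, n_test)
```

Indexing a single pixel returns its three channel values, which is the basic operation the rest of the notebook builds on.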

#### Generating tree structure

In [2]:
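def generate_tree(generate_dirs, overwrite=False):
    """Create every directory in generate_dirs; if overwrite is True, delete existing ones first."""
    if overwrite:
        # Remove existing directories (and their contents) before recreating them.
        pbar = tqdm(generate_dirs)
        for d in pbar:
            if os.path.isdir(d):
                pbar.set_description(f"Removing directory {d}")
                shutil.rmtree(d)
            else:
                pbar.set_description(f"Directory {d} does not exist. Skipping.")
    # os.mkdir does not create missing parents, so list parents before children.
    pbar = tqdm(generate_dirs)
    for d in pbar:
        if not os.path.isdir(d):
            pbar.set_description(f"Generating directory {d}")
            os.mkdir(d)
        else:
            pbar.set_description(f"Directory {d} already exists. Skipping")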

In [3]:
parent_dir = "/home/rio/data_sets"
tiny_imagenet_dir = os.path.join(parent_dir, "tiny_imagenet")
generate_dirs = [parent_dir,tiny_imagenet_dir]
overwrite=False
generate_tree(generate_dirs,overwrite)

Generating directory /home/rio/data_sets/tiny_imagenet: 100%|██████████| 2/2 [00:00<00:00, 815.30it/s]
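As an aside, the standard library can create a nested directory tree in one call. The sketch below is an alternative, not the function used in this notebook: `ensure_dirs` is a hypothetical helper, and it runs inside a temporary directory so that it does not touch any real data path.

```python
import os
import tempfile


def ensure_dirs(dirs):
    """Create each directory, including missing parents, if it does not exist yet."""
    for d in dirs:
        # exist_ok=True makes the call a no-op for directories that already exist.
        os.makedirs(d, exist_ok=True)


# Demonstrate in a throwaway location rather than a real data directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = os.path.join(tmp, "data_sets", "tiny_imagenet")
    ensure_dirs([demo_dir])   # creates both levels at once
    ensure_dirs([demo_dir])   # calling it again does not raise
    created = os.path.isdir(demo_dir)

print(created)
```

With `os.makedirs`, only the deepest directory needs to be listed, whereas `os.mkdir` requires each parent to exist already.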


#### Downloading and extracting data


In [4]:
def fetch_data(url, save_path, unzip_path, chunk_size=128):
    # Stream the archive to disk in chunks so it never has to fit in memory.
    r = requests.get(url, stream=True)
    r.raise_for_status()  # stop on an HTTP error instead of saving an error page
    with open(save_path, 'wb') as fd:
        pbar = tqdm(r.iter_content(chunk_size=chunk_size), desc=f"Downloading data to {save_path}")
        for chunk in pbar:
            fd.write(chunk)
    # Extract member by member so that tqdm can report progress.
    with zipfile.ZipFile(save_path, 'r') as zip_file:
        members = zip_file.namelist()
        pbar = tqdm(members, total=len(members), desc=f"Extracting to {unzip_path}")
        for file in pbar:
            zip_file.extract(member=file, path=unzip_path)

In [5]:
%%time
tiny_imagenet_url = "http://cs231n.stanford.edu/tiny-imagenet-200.zip"
save_path = os.path.join(tiny_imagenet_dir, "tiny-imagenet-200.zip")
unzip_path = os.path.join(tiny_imagenet_dir)
fetch_data(tiny_imagenet_url, save_path, unzip_path,chunk_size=128)

Downloading data to /home/rio/data_sets/tiny_imagenet/tiny-imagenet-200.zip: 1938282it [00:47, 40446.39it/s]
Extracting to /home/rio/data_sets/tiny_imagenet: 100%|██████████| 120609/120609 [00:08<00:00, 13929.20it/s]

CPU times: user 30.3 s, sys: 3.68 s, total: 34 s
Wall time: 58.2 s
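The extraction step can be tried without downloading anything. The sketch below is an illustration under stated assumptions, not the notebook's own code: it builds a small made-up archive in a temporary directory, extracts it with `ZipFile.extractall`, and checks that every file member arrived. The helper name `extract_all` and the file names inside the archive are hypothetical.

```python
import os
import tempfile
import zipfile


def extract_all(zip_path, target_dir):
    """Extract every member of zip_path into target_dir and return the member names."""
    with zipfile.ZipFile(zip_path, "r") as zf:
        names = zf.namelist()
        zf.extractall(target_dir)
    return names


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny stand-in archive with two made-up text files.
    zip_path = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("demo/a.txt", "first file")
        zf.writestr("demo/sub/b.txt", "second file")

    out_dir = os.path.join(tmp, "out")
    names = extract_all(zip_path, out_dir)
    # Every member should now exist as a file under out_dir.
    extracted_ok = all(os.path.isfile(os.path.join(out_dir, n)) for n in names)

print(len(names), extracted_ok)
```

`extractall` is shorter than looping over `namelist()`, at the cost of the per-file progress bar shown in the cell above.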





## Loading and displaying images

Let us start 