# PetFinder.my: Train image shape information

Before training a model, it helps to know some basic facts about the images you are going to use.

For example:
- RGB or grayscale?
- How many pixels? 1280 x 960?
- What is the aspect ratio? 4:3?

etc.
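
As a quick illustration of the aspect-ratio question, a width and height can be reduced to lowest terms with the greatest common divisor. This is a minimal sketch: `reduce_aspect` is a hypothetical helper name, not part of this notebook, and it takes width first. Note that `aspect_from_id`, defined later in this notebook, returns the ratio in (height, width) order instead.

```python
import math
from typing import Tuple

def reduce_aspect(width: int, height: int) -> Tuple[int, int]:
    """Reduce width:height to lowest terms (hypothetical helper for illustration)."""
    divisor = math.gcd(width, height)
    return width // divisor, height // divisor

print(reduce_aspect(1280, 960))  # (4, 3)
```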

This notebook (and its output CSV file / dataset) reports the size and aspect ratio of each training image.

Based on these results, I believe that image pre-processing, such as extracting only the area around the pet or resizing the image without changing its aspect ratio, will be necessary for this competition.

I hope this notebook will be of some help to you :)


# Index
1. [Import libraries](#Import-libraries)
2. [Constant declaration](#Constant-declaration)
3. [Function declaration](#Function-declaration)
4. [Load data](#Load-data)
5. [Get shape of train image](#Get-shape-of-train-image)
6. [Number of pixels](#Number-of-pixels)
7. [Preprocess example](#Preprocess-example)

## Import libraries

In [None]:
import os
import ast
from typing import Tuple

from IPython.display import display
from PIL import Image
import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## Constant declaration

In [None]:
DIRECTORY_PATH = '../input/petfinder-pawpularity-score'
TRAIN_FOLDER_PATH = os.path.join(DIRECTORY_PATH, 'train')
TRAIN_CSV_PATH = os.path.join(DIRECTORY_PATH, 'train.csv')
TEST_FOLDER_PATH = os.path.join(DIRECTORY_PATH, 'test')
TEST_CSV_PATH = os.path.join(DIRECTORY_PATH, 'test.csv')

## Function declaration

In [None]:
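def image_from_id(imgid: str, is_test: bool = False) -> np.ndarray:
    """Load a train/test image by Id as an RGB array of shape [height, width, 3]."""
    folder = TEST_FOLDER_PATH if is_test else TRAIN_FOLDER_PATH
    img_path = os.path.join(folder, f'{imgid}.jpg')
    img = cv2.imread(img_path)  # returns None if the file cannot be read
    if img is None:
        raise FileNotFoundError(f'Could not read image: {img_path}')
    # OpenCV loads images as BGR; convert to RGB so PIL and matplotlib show true colors
    return cv2.cvtColor(img, cv2.COLOR_BGR2RGB)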

def shape_from_id(imgid: str, is_test: bool = False) -> Tuple[int, int]:
    # Returns (height, width)
    img = image_from_id(imgid, is_test=is_test)
    return img.shape[0], img.shape[1]

def aspect_from_id(imgid: str, is_test: bool = False) -> Tuple[int, int]:
    # Returns the reduced (height, width) ratio, e.g. (3, 4) for a 960 x 1280 image
    height, width = shape_from_id(imgid, is_test=is_test)
    gcd = int(np.gcd(height, width))  # plain int so the tuple is written to CSV as "(3, 4)"
    return height // gcd, width // gcd

def convert_ndarray_to_PIL(img: np.ndarray) -> Image.Image:
    return Image.fromarray(np.uint8(img)).convert('RGB')

def square_image_zfill(img: np.ndarray, img_size: int) -> np.ndarray:
    # [height, width, 3]
    if img.shape[0] < img.shape[1]:
        diff = img.shape[1] - img.shape[0]
        zarr1 = np.zeros((diff//2, img.shape[1], 3), dtype=np.uint8)
        zarr2 = np.zeros((diff//2 + diff%2, img.shape[1], 3), dtype=np.uint8)
        img_square = np.concatenate([zarr1, img, zarr2], axis=0)
    elif img.shape[0] > img.shape[1]:
        diff = img.shape[0] - img.shape[1]
        zarr1 = np.zeros((img.shape[0], diff//2, 3), dtype=np.uint8)
        zarr2 = np.zeros((img.shape[0], diff//2 + diff%2, 3), dtype=np.uint8)
        img_square = np.concatenate([zarr1, img, zarr2], axis=1)
    else:
        img_square = img
    return cv2.resize(img_square, (img_size, img_size))

## Load data

In [None]:
df_train = pd.read_csv(TRAIN_CSV_PATH, dtype=str)
df_test = pd.read_csv(TEST_CSV_PATH, dtype=str)

pd.set_option('display.max_columns', None)

display('train.csv')
display(df_train.head(5))
display('test.csv')
display(df_test.head(5))

train_display_id = df_train["Id"][0]
display(f'train image - Id: {train_display_id}')
display(convert_ndarray_to_PIL(image_from_id(train_display_id)))
test_display_id = df_test["Id"][0]
display(f'test image - Id: {test_display_id}')
display(convert_ndarray_to_PIL(image_from_id(test_display_id, is_test=True)))

## Get shape of train image

In [None]:
SHAPE_CSV_PATH = '../input/petfinder-image-shape/train_img_shape.csv'
if os.path.exists(SHAPE_CSV_PATH):
    # Reuse the cached result; with dtype=str the tuples are loaded as strings
    df_train_img_shape = pd.read_csv(SHAPE_CSV_PATH, dtype=str)
else:
    # Compute the shapes from scratch (reads every training image, so this is slow)
    df_train_img_shape = df_train.loc[:, ['Id']].copy()
    df_train_img_shape['img_shape'] = df_train_img_shape['Id'].apply(shape_from_id)
    df_train_img_shape['aspect_ratio'] = df_train_img_shape['Id'].apply(aspect_from_id)
    df_train_img_shape.to_csv('train_img_shape.csv', index=False)

display(df_train_img_shape.head(5))
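# Temporarily lift the row limit so the full value_counts output is shown
pd.set_option('display.max_rows', None)

display('aspect ratio')
# Keep only aspect ratios that occur in at least 5 images
df_value_counts = df_train_img_shape['aspect_ratio'].value_counts()
display(df_value_counts[df_value_counts >= 5])
pd.set_option('display.max_rows', 60)  # restore the pandas default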
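
The aspect ratios can also be summarized by orientation. The sketch below assumes the ratios are in (height, width) order, as returned by `aspect_from_id`, and are stored either as tuples or as strings like those loaded from the CSV. `orientation` is a hypothetical helper, and the sample values are made up for illustration rather than taken from the competition data.

```python
import ast
from collections import Counter
from typing import Tuple, Union

def orientation(aspect: Union[str, Tuple[int, int]]) -> str:
    """Classify a (height, width) ratio as portrait, landscape, or square."""
    if isinstance(aspect, str):
        aspect = ast.literal_eval(aspect)  # "(4, 3)" -> (4, 3)
    height, width = aspect
    if height > width:
        return 'portrait'
    if height < width:
        return 'landscape'
    return 'square'

# Made-up sample values, not taken from the competition data
sample_ratios = ['(4, 3)', '(3, 4)', (1, 1), '(16, 9)']
counts = Counter(orientation(r) for r in sample_ratios)
print(counts)
```

With the DataFrame from this cell, the same function could be applied as `df_train_img_shape['aspect_ratio'].apply(orientation).value_counts()`.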

## Number of pixels

In [None]:
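# img_shape is a string when loaded from the cached CSV but already a tuple when computed above
df_train_img_shape['img_shape'] = df_train_img_shape['img_shape'].apply(
    lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
# num_pixels = height * width
df_train_img_shape['num_pixels'] = df_train_img_shape['img_shape'].apply(lambda x: x[0] * x[1])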

pd.set_option('display.float_format', '{:.0f}'.format)
display(df_train_img_shape['num_pixels'].describe())
# Show the images with the smallest and largest pixel counts
num_pixels = df_train_img_shape['num_pixels']
display(df_train_img_shape[num_pixels == num_pixels.min()])
display(df_train_img_shape[num_pixels == num_pixels.max()])

## Preprocess example

In [None]:
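IMG_SIZE = 224
plt.figure(figsize=(12, 9))
for i in range(12):
    ax = plt.subplot(3, 4, i+1)  # plt.subplot returns an Axes, not a Figure
    train_display_id = df_train["Id"][i]
    # Zero-pad to a square, then resize to IMG_SIZE x IMG_SIZE
    img = square_image_zfill(image_from_id(train_display_id), IMG_SIZE)
    ax.imshow(img)
    ax.axis('off')
plt.show()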
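
`square_image_zfill` pads the full-resolution image and then resizes it. An alternative, sketched below, is to resize first so that the longer side equals the target size and then pad, which keeps the padding arrays small. This is only an outline of the geometry under that assumption: `letterbox_geometry` is a hypothetical helper that is not part of this notebook, and the actual resize would still be done with something like `cv2.resize`.

```python
from typing import Tuple

def letterbox_geometry(height: int, width: int, size: int) -> Tuple[int, int, int, int]:
    """Return (new_height, new_width, pad_top, pad_left) for an aspect-preserving fit.

    The image is scaled so its longer side equals `size`; the shorter side is
    then centered, with any odd leftover pixel going to the bottom or right.
    """
    longer_side = max(height, width)
    new_height = max(1, round(height * size / longer_side))
    new_width = max(1, round(width * size / longer_side))
    pad_top = (size - new_height) // 2
    pad_left = (size - new_width) // 2
    return new_height, new_width, pad_top, pad_left

# A 960 x 1280 (height x width) image fitted into a 224 x 224 square
print(letterbox_geometry(960, 1280, 224))  # (168, 224, 28, 0)
```

The bottom and right padding follow as `size - new_height - pad_top` and `size - new_width - pad_left`.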