Google Landmark Recognition - Image Shape Distribution
---
The training set contains ~1.5 million images of varying shapes. While most images have an aspect ratio close to 1, some are horizontal or vertical strips. This notebook presents some examples of such images.

Convolutional neural networks (CNNs), which are used in state-of-the-art recognition solutions, are traditionally trained on square images. The images therefore have to be resized during the preprocessing step. The method used to make these vertical or horizontal strips square (cropping, zero-padding, or a combination of both) may affect the performance of the CNN.
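
To make the trade-off between the two methods concrete, here is a minimal sketch of zero-padding and center cropping applied to a strip-shaped array. The helper names `pad_to_square` and `center_crop_to_square` and the 80x800 dummy image are assumptions made for illustration; they are not part of this notebook's pipeline.

```python
import numpy as np

def pad_to_square(img, fill=0):
    """Pad the shorter side with `fill` so the image becomes square (content centered)."""
    h, w = img.shape[:2]
    side = max(h, w)
    top = (side - h) // 2
    left = (side - w) // 2
    out = np.full((side, side) + img.shape[2:], fill, dtype=img.dtype)
    out[top:top + h, left:left + w] = img
    return out

def center_crop_to_square(img):
    """Crop the longer side around the center so the image becomes square."""
    h, w = img.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return img[top:top + side, left:left + side]

# Hypothetical horizontal strip: 80 px high, 800 px wide, 3 channels.
strip = np.ones((80, 800, 3), dtype=np.uint8)
padded = pad_to_square(strip)
cropped = center_crop_to_square(strip)

print(padded.shape, cropped.shape)
# Share of the padded square that is real content, and share of the strip kept by the crop.
print(strip.size / padded.size, cropped.size / strip.size)
```

For a strip like this one, padding keeps all the content but fills most of the square with zeros, while cropping discards most of the image. This is why the choice of method may matter for such images.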

In [None]:
import os
import imageio
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
# Load the train CSV with per-image properties (e.g. xsize, ysize)
base_dir = '../input/landmark-recognition-2020/'
train_csv = pd.read_csv('../input/landmark-recognition-multiprocessing-image-size/train_featured.csv')

In [None]:
# Swap the xsize and ysize columns, which were assigned the wrong way round in a previous kernel.
ys_temp = train_csv.xsize.copy()
train_csv.xsize = train_csv.ysize
train_csv.ysize = ys_temp
del ys_temp

In [None]:
train_csv.head()

## Xsize vs Ysize
---

In [None]:
g = sns.jointplot(x="xsize", y="ysize", data=train_csv)

In [None]:
print('Range of xsize and ysize:')
print('---'*8)
print(train_csv.xsize.min(), '<= xsize <=', train_csv.xsize.max())
print(train_csv.ysize.min(), '<= ysize <=', train_csv.ysize.max())

In [None]:
print('The 3 most common values of xsize and ysize:')
xcount = train_csv.xsize.value_counts()
ycount = train_csv.ysize.value_counts()
print(xcount[:3])
print(ycount[:3])
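
The two size columns can also be combined into a single aspect ratio, which gives a direct way to check how many images are close to square and how many are strips. The sketch below uses a small made-up DataFrame in place of `train_csv`. It assumes that `xsize` is the width and `ysize` the height; the column name `aspect` and the thresholds of 3 and 1/3 are hypothetical choices, not values derived from the dataset.

```python
import pandas as pd

# Made-up sizes standing in for train_csv (not real dataset rows).
demo = pd.DataFrame({
    "xsize": [800, 800, 600, 800, 150],
    "ysize": [600, 800, 800, 90, 700],
})

# Aspect ratio as width / height: > 1 is landscape, < 1 is portrait.
demo["aspect"] = demo["xsize"] / demo["ysize"]

# Hypothetical thresholds for calling an image a strip.
horizontal_strips = demo[demo["aspect"] > 3]
vertical_strips = demo[demo["aspect"] < 1 / 3]

print(demo["aspect"].round(2).tolist())
print(len(horizontal_strips), len(vertical_strips))
```

A ratio-based filter scales with image size, whereas fixed pixel thresholds on each column do not. Which is more appropriate depends on how the images will be resized later.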

In [None]:
train_csv[(train_csv.ysize < 100) & (train_csv.xsize > 600)]

In [None]:
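def load_image(idx):
    """Read a train image by id; files are nested in folders named after the id's first three characters."""
    impath = base_dir + 'train/' + '/'.join(list(idx[:3])) + '/' + idx + '.jpg'
    return imageio.imread(impath)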
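
The path construction in `load_image` can be checked without reading any file. The following sketch builds the same nested layout with `os.path.join`; the helper name `image_path`, its `split` parameter, and the id `abc123def456` are made up for illustration.

```python
import os

def image_path(base_dir, image_id, split="train"):
    """Build <base_dir>/<split>/<c0>/<c1>/<c2>/<image_id>.jpg from the id's first three characters."""
    return os.path.join(base_dir, split, *image_id[:3], image_id + ".jpg")

# Made-up id, used only to show the resulting layout.
print(image_path("../input/landmark-recognition-2020", "abc123def456"))
```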

In [None]:
plt.imshow(load_image(train_csv.loc[43027, 'id']))
plt.show()

In [None]:
plt.imshow(load_image(train_csv.loc[491551, 'id']))
plt.show()

In [None]:
plt.imshow(load_image(train_csv.loc[1289336, 'id']))
plt.show()

In [None]:
train_csv[(train_csv.xsize < 200) & (train_csv.ysize > 600)]

In [None]:
plt.imshow(load_image(train_csv.loc[338829, 'id']))
plt.show()

In [None]:
plt.imshow(load_image(train_csv.loc[546929, 'id']))
plt.show()

In [None]:
plt.imshow(load_image(train_csv.loc[965744, 'id']))
plt.show()