# Image cropping

Using the information in `anno_train.csv` and `anno_test.csv`, crop the source images and save them to a `cars_cropped` folder, keeping the same directory tree: `cars_cropped/{nom_marque}/{image*}`, where `{nom_marque}` is the brand (class) folder name.

### How it works

Both annotation files contain bounding-box annotations for the images. We use these coordinates to crop each image to the vehicle before resizing it, which limits background noise and should help the model learn.

Approach inspired by https://www.kaggle.com/hengzheng/dog-breeds-classifier#crop-and-save-pictures
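As a minimal sketch of the crop-then-resize step described above, the snippet below applies it to a synthetic image rather than a real dataset file. The helper name `crop_and_resize` and the synthetic image are illustrative assumptions, not part of the original notebook; the box values are those of the first row of `anno_train.csv`.

```python
from PIL import Image


def crop_and_resize(image, box, size=(224, 224)):
    """Crop `image` to `box` = (xmin, ymin, xmax, ymax), then resize to `size`.

    Hypothetical helper, not part of the original notebook.
    """
    xmin, ymin, xmax, ymax = (int(v) for v in box)
    if xmax <= xmin or ymax <= ymin:
        raise ValueError(f"Degenerate bounding box: {box}")
    return image.crop((xmin, ymin, xmax, ymax)).convert("RGB").resize(size)


# Synthetic 640x480 image: blue background, red rectangle standing in for a car.
demo = Image.new("RGB", (640, 480), (0, 0, 255))
demo.paste((255, 0, 0), (39, 116, 569, 375))

cropped = crop_and_resize(demo, (39, 116, 569, 375))
print(cropped.size)
```

Note that resizing straight to 224x224 does not preserve the aspect ratio of the box, so wide vehicles are squeezed horizontally.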

In [None]:
import os
import pandas as pd
from PIL import Image

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
drive_path = "/content/drive/MyDrive/TP_cars"

In [None]:
os.listdir(drive_path)

['anno_train.csv',
 'anno_test.csv',
 'names.csv',
 'car_data',
 'Incept_20e196c',
 'cars_cropped']

---

In [None]:
# Check train
print(len(os.listdir(drive_path+'/car_data/train/')))

n = 0
for folder in os.listdir(drive_path+'/car_data/train/'):
  n+=len(os.listdir(drive_path+'/car_data/train/'+folder))
print(n)

# 196, 8104

196
8104


In [None]:
# Check test
print(len(os.listdir(drive_path+'/car_data/test/')))

n = 0
for folder in os.listdir(drive_path+'/car_data/test/'):
  n+=len(os.listdir(drive_path+'/car_data/test/'+folder))
print(n)

# 196, 8041

196
8041
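The two checks above repeat the same counting loop. One possible way to factor it out is sketched below; `count_images` is a hypothetical helper name, and the temporary directory only stands in for `car_data/train` so that the snippet runs on its own.

```python
import os
import tempfile


def count_images(root):
    """Return (number of class folders, total number of files) under `root`."""
    folders = [d for d in sorted(os.listdir(root))
               if os.path.isdir(os.path.join(root, d))]
    n_files = sum(len(os.listdir(os.path.join(root, d))) for d in folders)
    return len(folders), n_files


# Tiny throwaway tree standing in for car_data/train: 2 class folders, 3 files.
with tempfile.TemporaryDirectory() as root:
    for folder, names in {"class_a": ["1.jpg", "2.jpg"], "class_b": ["3.jpg"]}.items():
        os.makedirs(os.path.join(root, folder))
        for name in names:
            open(os.path.join(root, folder, name), "w").close()
    demo_counts = count_images(root)

print(demo_counts)
```

On the real data, `count_images(drive_path + '/car_data/train/')` should match the `196` and `8104` printed above, assuming each class folder contains only image files.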


### CROPPING TRAIN

In [None]:
train_annotations = pd.read_csv(drive_path+'/anno_train.csv', header=None)
train_annotations

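| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 00001.jpg | 39 | 116 | 569 | 375 | 14 |
| 1 | 00002.jpg | 36 | 116 | 868 | 587 | 3 |
| 2 | 00003.jpg | 85 | 109 | 601 | 381 | 91 |
| 3 | 00004.jpg | 621 | 393 | 1484 | 1096 | 134 |
| 4 | 00005.jpg | 14 | 36 | 133 | 99 | 106 |
| ... | ... | ... | ... | ... | ... | ... |
| 8139 | 08140.jpg | 3 | 44 | 423 | 336 | 78 |
| 8140 | 08141.jpg | 138 | 150 | 706 | 523 | 196 |
| 8141 | 08142.jpg | 26 | 246 | 660 | 449 | 163 |
| 8142 | 08143.jpg | 78 | 526 | 1489 | 908 | 112 |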
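Filtering the whole annotation table once per image rescans every row each time. A possible alternative, sketched here with the standard `csv` module instead of pandas, builds a filename-to-box dictionary once. The column order (filename, xmin, ymin, xmax, ymax, then a last column assumed to be the class id) follows how the cropping code uses columns 0 to 4; `load_boxes` is a hypothetical name, and the two sample rows are the first two rows shown above.

```python
import csv
import io


def load_boxes(csv_file):
    """Map each filename to its (xmin, ymin, xmax, ymax) bounding box.

    Assumes a header-less CSV whose columns are:
    filename, xmin, ymin, xmax, ymax, class id (the last one is ignored here).
    """
    boxes = {}
    for row in csv.reader(csv_file):
        xmin, ymin, xmax, ymax = (int(v) for v in row[1:5])
        boxes[row[0]] = (xmin, ymin, xmax, ymax)
    return boxes


# First two rows of anno_train.csv as displayed above.
sample = io.StringIO("00001.jpg,39,116,569,375,14\n00002.jpg,36,116,868,587,3\n")
boxes = load_boxes(sample)
print(boxes["00001.jpg"])  # (39, 116, 569, 375)
```

With the real file, the same function could be called as `load_boxes(open(drive_path + '/anno_train.csv', newline=''))`, and the resulting tuple can be passed directly to `Image.crop`.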


In [None]:
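%%time

train_folder = drive_path+'/car_data/train/'
train_cropped_folder = drive_path+'/cars_cropped/train/'

train_folders = os.listdir(train_folder)

n_image = 0   # images cropped during this run
n_folder = 0
for n_folder, folder in enumerate(train_folders, start=1):
  print(f"Dossier : {n_folder}/{len(train_folders)}")

  # Create the output class folder if needed (no error if it already exists)
  os.makedirs(train_cropped_folder+folder, exist_ok=True)

  for filename in os.listdir(train_folder+folder):

    # Skip images already cropped, so an interrupted run can be resumed
    # without losing the rest of a partially processed folder
    if os.path.exists(train_cropped_folder+folder+'/'+filename):
      continue

    # Take the matching annotation as a single row so its fields are scalars
    img_annotation = train_annotations[train_annotations[0] == filename].iloc[0]

    image = Image.open(train_folder+folder+'/'+filename)

    # Bounding box: columns 1 to 4 hold xmin, ymin, xmax, ymax
    xmin = int(img_annotation[1])
    ymin = int(img_annotation[2])

    xmax = int(img_annotation[3])
    ymax = int(img_annotation[4])

    # Crop to the vehicle, force 3 channels, then resize to 224x224
    image = image.crop((xmin, ymin, xmax, ymax))
    image = image.convert('RGB')
    image = image.resize((224, 224))

    image.save(train_cropped_folder+folder+'/'+filename)

    n_image += 1

print('Fin cropping train : ')
print(n_folder, n_image)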
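After a long run on Colab, it can be worth confirming that every source image has a cropped counterpart. The sketch below is one way to do that; `missing_crops` is a hypothetical helper, and the temporary folders only stand in for `car_data/train` and `cars_cropped/train` so that the snippet is self-contained.

```python
import os
import tempfile


def missing_crops(src_root, dst_root):
    """List relative paths present under `src_root` but absent under `dst_root`."""
    missing = []
    for folder in sorted(os.listdir(src_root)):
        src_dir = os.path.join(src_root, folder)
        if not os.path.isdir(src_dir):
            continue
        for filename in sorted(os.listdir(src_dir)):
            if not os.path.exists(os.path.join(dst_root, folder, filename)):
                missing.append(os.path.join(folder, filename))
    return missing


# Throwaway trees: the source has two images, the destination only one of them.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    os.makedirs(os.path.join(src, "class_a"))
    os.makedirs(os.path.join(dst, "class_a"))
    for name in ("1.jpg", "2.jpg"):
        open(os.path.join(src, "class_a", name), "w").close()
    open(os.path.join(dst, "class_a", "1.jpg"), "w").close()
    demo_missing = missing_crops(src, dst)

print(demo_missing)
```

An empty list from `missing_crops(train_folder, train_cropped_folder)` would indicate that the cropped tree mirrors the source tree.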

### CROPPING TEST

In [None]:
test_annotations = pd.read_csv(drive_path+'/anno_test.csv', header=None)
test_annotations

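| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 00001.jpg | 30 | 52 | 246 | 147 | 181 |
| 1 | 00002.jpg | 100 | 19 | 576 | 203 | 103 |
| 2 | 00003.jpg | 51 | 105 | 968 | 659 | 145 |
| 3 | 00004.jpg | 67 | 84 | 581 | 407 | 187 |
| 4 | 00005.jpg | 140 | 151 | 593 | 339 | 185 |
| ... | ... | ... | ... | ... | ... | ... |
| 8036 | 08037.jpg | 49 | 57 | 1169 | 669 | 63 |
| 8037 | 08038.jpg | 23 | 18 | 640 | 459 | 16 |
| 8038 | 08039.jpg | 33 | 27 | 602 | 252 | 17 |
| 8039 | 08040.jpg | 33 | 142 | 521 | 376 | 38 |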


In [None]:
%%time

test_folder = drive_path+'/car_data/test/'
test_cropped_folder = drive_path+'/cars_cropped/test/'

test_folders = os.listdir(test_folder)

n_image = 0
n_folder = 0
for n_folder, folder in enumerate(test_folders, start=1):
  print(f"Dossier : {n_folder}/{len(test_folders)}")

  # Create the output class folder if needed (no error if it already exists)
  os.makedirs(test_cropped_folder+folder, exist_ok=True)

  for filename in os.listdir(test_folder+folder):

    # Take the matching annotation as a single row so its fields are scalars
    img_annotation = test_annotations[test_annotations[0] == filename].iloc[0]

    image = Image.open(test_folder+folder+'/'+filename)

    # Bounding box: columns 1 to 4 hold xmin, ymin, xmax, ymax
    xmin = int(img_annotation[1])
    ymin = int(img_annotation[2])

    xmax = int(img_annotation[3])
    ymax = int(img_annotation[4])

    # Crop to the vehicle, force 3 channels, then resize to 224x224
    image = image.crop((xmin, ymin, xmax, ymax))
    image = image.convert('RGB')
    image = image.resize((224, 224))
    image.save(test_cropped_folder+folder+'/'+filename)

    n_image += 1

print('Fin cropping test : ')
print(n_folder, n_image)

Dossier : 1/196
Dossier : 2/196
Dossier : 3/196
Dossier : 4/196
Dossier : 5/196
Dossier : 6/196
Dossier : 7/196
Dossier : 8/196
Dossier : 9/196
Dossier : 10/196
Dossier : 11/196
Dossier : 12/196
Dossier : 13/196
Dossier : 14/196
Dossier : 15/196
Dossier : 16/196
Dossier : 17/196
Dossier : 18/196
Dossier : 19/196
Dossier : 20/196
Dossier : 21/196
Dossier : 22/196
Dossier : 23/196
Dossier : 24/196
Dossier : 25/196
Dossier : 26/196
Dossier : 27/196
Dossier : 28/196
Dossier : 29/196
Dossier : 30/196
Dossier : 31/196
Dossier : 32/196
Dossier : 33/196
Dossier : 34/196
Dossier : 35/196
Dossier : 36/196
Dossier : 37/196
Dossier : 38/196
Dossier : 39/196
Dossier : 40/196
Dossier : 41/196
Dossier : 42/196
Dossier : 43/196
Dossier : 44/196
Dossier : 45/196
Dossier : 46/196
Dossier : 47/196
Dossier : 48/196
Dossier : 49/196
Dossier : 50/196
Dossier : 51/196
Dossier : 52/196
Dossier : 53/196
Dossier : 54/196
Dossier : 55/196
Dossier : 56/196
Dossier : 57/196
Dossier : 58/196
Dossier : 59/196
Dossie