# Aircraft recognition using satellite images

In this first notebook, we gather and explore the raw data. We then slide a window over the labeled images to create a set of labeled image tiles.

In [3]:
import os
import skimage
import pandas as pd
import numpy as np
import skimage.exposure
import skimage.io
import scipy.stats
import tqdm
from matplotlib import pyplot as plt

In [2]:
import khumeia
%env TP_ISAE_DATA= ./data/`
khumeia.download_train_data()

env: TP_ISAE_DATA=./data/`
[2018-11-13 08:11:25,385][tp-isae][get_data][INFO] Downloading training data
[2018-11-13 08:11:25,388][tp-isae][get_data][INFO] Downloading data from tp_isae_train_data.tar.gz to ./data/`
[2018-11-13 08:11:25,388][tp-isae][get_data][INFO] Downloading https://storage.googleapis.com/isae-deep-learning/tp_isae_train_data.tar.gz
[2018-11-13 08:11:27,075][tp-isae][get_data][INFO] Extracting tar gz
[2018-11-13 08:11:31,146][tp-isae][get_data][INFO] Done. Your training data is located here ./data/`/raw/trainval


## Hands on data & framework

### Explore data with pandas

In [4]:
# Index the environment directly so that a missing variable raises a clear KeyError
RAW_DATA_DIR = os.path.join(os.environ["TP_ISAE_DATA"], "raw")
image_ids = pd.read_csv(os.path.join(RAW_DATA_DIR, "trainval_ids.csv"))
train_labels = pd.read_csv(os.path.join(RAW_DATA_DIR, "trainval_labels.csv"))

`image_ids` contains the 25 unique image ids: there are 25 images with planes in them.
`train_labels` contains one row per labeled plane: the image id, the size of the square containing the plane, and the (x, y) coordinates of a vertex of that square.
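
A quick way to get a feel for the labels is to count them per image and to look at the spread of the square sizes. The sketch below runs on a small made-up DataFrame: the column names `image_id`, `x`, `y` and `size` are assumptions for illustration and may differ from the actual headers of `trainval_labels.csv` (check `train_labels.columns` first).

```python
import pandas as pd

# Toy stand-in for train_labels; column names and values are hypothetical.
toy_labels = pd.DataFrame({
    "image_id": ["img_a", "img_a", "img_b"],
    "x": [120, 400, 64],
    "y": [80, 310, 64],
    "size": [72, 60, 90],
})

# Number of labeled planes per image.
labels_per_image = toy_labels.groupby("image_id").size()

# Summary statistics of the bounding-square size.
size_stats = toy_labels["size"].describe()

print(labels_per_image)
print(size_stats[["min", "mean", "max"]])
```

The same two calls should apply to the real `train_labels` once the column names are adjusted.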

In [42]:
from khumeia.data.item import SatelliteImage
TRAINVAL_DATA_DIR = os.path.join(RAW_DATA_DIR, "trainval")
trainval_satellite_images = SatelliteImage.list_items_from_path(TRAINVAL_DATA_DIR)
trainval_collection = trainval_satellite_images[:2]

[2018-11-13 09:03:43,842][tp-isae][item][INFO] Looking in ./data/`/raw/trainval


### Data viz

In [19]:
%matplotlib inline
import khumeia.visualisation
from matplotlib import pyplot as plt

In [22]:
item = trainval_satellite_images[0]
print(item)
image = item.image
labels = item.labels

{
    "label_file": "./data/`/raw/trainval/USGS_ATL.json",
    "image_shape": [
        7852,
        6689,
        3
    ],
    "nb_labels": 35,
    "image_file": "./data/`/raw/trainval/USGS_ATL.jpg",
    "image_id": "USGS_ATL",
    "class": "SatelliteImage"
}


In [None]:
image = khumeia.visualisation.draw_bboxes_on_image(image, labels, color=(0, 255, 0))
plt.figure(figsize=(10,10))
plt.title(item.image_id)
plt.imshow(image)
plt.show()

## Generating training sets

In [24]:
import json
from khumeia.data.dataset import TilesDataset, SlidingWindow
from khumeia.data.sampler import RandomSampler  # explicit import instead of a wildcard



In [43]:
dataset = TilesDataset(items=trainval_collection)

### Using sliding windows

In [28]:
from khumeia.utils import list_utils 

In [44]:
sliding_window = SlidingWindow(
    tile_size=64,
    stride=64,
    discard_background=False,
    padding=0,
    label_assignment_mode="center")

dataset.generate_candidates_tiles(sliding_windows=sliding_window)

[2018-11-13 09:04:58,400][tp-isae][dataset][INFO] Generating a pool of candidates tiles


[2018-11-13 09:05:00,530][tp-isae][dataset][INFO] Candidates tiles generated ! Now sample them using Dataset.sample_tiles_from_candidates


This code lays a regular grid of candidate tiles over each image, using the parameters of the sliding window defined above (tile size, stride, padding and label assignment mode).
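
To make the idea concrete, here is a minimal pure-Python sketch of a sliding window grid. It is not khumeia's implementation: the function names are hypothetical, padding is ignored, and the "center" rule shown (a tile is labeled `aircraft` when the center of a bounding box falls inside it) is an assumption about what `label_assignment_mode="center"` means.

```python
def sliding_window_origins(height, width, tile_size, stride):
    """Return the (row, col) top-left corner of every tile fully inside the image."""
    return [
        (row, col)
        for row in range(0, height - tile_size + 1, stride)
        for col in range(0, width - tile_size + 1, stride)
    ]


def label_by_center(origin, tile_size, box_centers):
    """Label a tile "aircraft" if any bounding-box center falls inside it (assumed rule)."""
    row, col = origin
    for center_row, center_col in box_centers:
        inside_rows = row <= center_row < row + tile_size
        inside_cols = col <= center_col < col + tile_size
        if inside_rows and inside_cols:
            return "aircraft"
    return "background"


# Image shape of USGS_ATL printed earlier: 7852 x 6689 pixels.
origins = sliding_window_origins(7852, 6689, tile_size=64, stride=64)
print(len(origins))  # 122 x 104 = 12688 tiles

# One hypothetical aircraft centered at (row=100, col=200).
first_label = label_by_center((64, 192), 64, [(100, 200)])
second_label = label_by_center((0, 0), 64, [(100, 200)])
print(first_label, second_label)  # aircraft background
```

With `stride` equal to `tile_size` the tiles do not overlap; a smaller stride would produce overlapping tiles and therefore more candidates.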

In [45]:
item = dataset.items[1]
image = item.image
labels = item.labels

tiles = list_utils.filter_tiles_by_item(dataset.candidate_tiles, item)

In [None]:
aircrafts_tiles = list_utils.filter_tiles_by_label(tiles, "aircraft")
background_tiles = list_utils.filter_tiles_by_label(tiles, "background")

image = khumeia.visualisation.draw_bboxes_on_image(image, background_tiles, color=(255,0,0))
image = khumeia.visualisation.draw_bboxes_on_image(image, aircrafts_tiles, color=(0,0,255))
image = khumeia.visualisation.draw_bboxes_on_image(image, labels, color=(0,255,0))

plt.figure(figsize=(10,10))
plt.title(item.image_id)
plt.imshow(image)
plt.show()

In [None]:
# A demo using a higher-level function of the framework
item = dataset.items[0]
tiles = dataset.candidate_tiles

image = khumeia.visualisation.draw_item_with_tiles(item, tiles)
plt.figure(figsize=(10,10))
plt.title(item.image_id)
plt.imshow(image)
plt.show()

### Selecting among candidate tiles

In [50]:
# Randomly sample up to 4000 tiles (without replacement) from the candidate tiles
sampler = RandomSampler(nb_tiles_max=4000, with_replacement=False)
dataset.sample_tiles_from_candidates(tiles_samplers=sampler)

[2018-11-13 09:07:35,640][tp-isae][dataset][INFO] Sampling tiles


[2018-11-13 09:07:35,664][tp-isae][sampler][INFO] Sampling
[2018-11-13 09:07:35,688][tp-isae][dataset][INFO] Tiles sampled, now generate the dataset using Dataset.generate_tiles_dataset
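
As an illustration of what sampling with a cap and without replacement amounts to, here is a small standard-library sketch. It is not the `RandomSampler` code: `sample_tiles` and its `seed` argument are hypothetical names, and the parameters only mirror `nb_tiles_max` and `with_replacement` from the call above.

```python
import random


def sample_tiles(candidates, nb_tiles_max, with_replacement=False, seed=None):
    """Draw up to nb_tiles_max elements from candidates."""
    rng = random.Random(seed)
    if not candidates:
        return []
    if with_replacement:
        # The same tile may be drawn several times.
        return [rng.choice(candidates) for _ in range(nb_tiles_max)]
    # Without replacement, at most len(candidates) tiles can be drawn.
    return rng.sample(candidates, min(nb_tiles_max, len(candidates)))


candidates = [f"tile_{i}" for i in range(10)]
sampled = sample_tiles(candidates, nb_tiles_max=4, seed=0)
capped = sample_tiles(candidates, nb_tiles_max=50, seed=0)
print(len(sampled), len(capped))  # 4 10
```

Note that a uniform draw keeps the class proportions of the candidate pool, so when background tiles vastly outnumber aircraft tiles the sample will be dominated by background as well.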


### Dumping data

In [54]:
# Dump the sampled tiles to disk in a layout usable by Keras' ImageDataGenerator

dataset_dir = os.path.join(os.path.expandvars("$TP_ISAE_DATA"), "dataset")

# Comment out the next line to skip the dump
dataset.generate_tiles_dataset(output_dir=dataset_dir, remove_first=True)

[2018-11-13 09:31:32,690][tp-isae][dataset][INFO] Generating a dataset of tiles at location ./data/`/dataset
[2018-11-13 09:31:32,695][tp-isae][dataset][INFO] Dumping tiles to ./data/`/dataset


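
Keras' `ImageDataGenerator.flow_from_directory` expects one sub-directory per class. Assuming the dump follows that layout (worth checking by listing the output directory), a quick sanity check is to count the files in each class folder. The helper below is a hypothetical sketch using only the standard library; the folder names `aircraft` and `background` in the demo are assumptions.

```python
import tempfile
from pathlib import Path


def count_files_per_class(dataset_dir):
    """Map each sub-directory name (one per class) to its number of files."""
    root = Path(dataset_dir)
    return {
        class_dir.name: sum(1 for f in class_dir.iterdir() if f.is_file())
        for class_dir in sorted(root.iterdir())
        if class_dir.is_dir()
    }


# Demo on a throwaway directory with hypothetical class folders.
with tempfile.TemporaryDirectory() as tmp:
    for label, nb_files in [("aircraft", 2), ("background", 5)]:
        class_dir = Path(tmp) / label
        class_dir.mkdir()
        for i in range(nb_files):
            (class_dir / f"tile_{i}.jpg").touch()
    counts = count_files_per_class(tmp)

print(counts)  # {'aircraft': 2, 'background': 5}
```

Calling `count_files_per_class(dataset_dir)` on the real output directory would show how balanced the dumped classes are before training.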