# Training a model on the Cambridge Air Photos Collection

This notebook uses the [TensorFlow for Poets](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/) tutorial to train a new model for classifying images in the Tribune collection.

First we'll clone the code repository.

In [1]:
!git clone https://github.com/googlecodelabs/tensorflow-for-poets-2

Cloning into 'tensorflow-for-poets-2'...
remote: Enumerating objects: 417, done.
remote: Total 417 (delta 0), reused 0 (delta 0), pack-reused 417
Receiving objects: 100% (417/417), 33.97 MiB | 14.18 MiB/s, done.
Resolving deltas: 100% (155/155), done.
Checking out files: 100% (142/142), done.


Now let's move into the new directory.

In [2]:
cd tensorflow-for-poets-2

/home/jovyan/tensorflow-for-poets-2


## Our categories

For our initial experiment we're going to try to distinguish between two categories: protests and portraits.

In [3]:
img_sets = {
    'protests': ['FL4520808', 'FL4520807', 'FL4520809', 'FL4520810', 'FL4520811', 'FL4520812', 'FL4520813', 'FL4520814', 'FL4520816', 'FL4520817', 'FL4520818', 'FL4520820', 'FL4520821', 'FL4520822', 'FL4520823', 'FL4520825', 'FL4520826', 'FL4520827', 'FL4520828', 'FL4520829', 'FL4520830', 'FL4520832', 'FL4520833', 'FL4520834', 'FL4520835', 'FL4520836', 'FL4562467', 'FL4562470', 'FL4562473', 'FL4562477', 'FL4562493', 'FL4562496', 'FL4562498', 'FL4562502', 'FL4562504', 'FL4562506', 'FL4562507', 'FL4562514', 'FL4562526', 'FL4562531', 'FL4562534', 'FL4562538', 'FL4562543', 'FL4562548', 'FL4431373', 'FL4431375', 'FL4431376', 'FL4431377', 'FL4431405', 'FL4431403', 'FL4534782','FL4534784','FL4534786','FL4534787','FL4534789','FL4548906','FL4548908','FL4548910','FL4548914','FL4548915','FL4548916','FL4548918','FL4548919','FL4548920','FL4548924','FL4581459','FL4581460','FL4581461','FL4581462','FL4581463','FL4581468','FL4581469','FL4581470','FL4581471','FL4581473','FL4581474','FL4581475','FL4581477','FL4581478','FL4581481','FL4544430','FL4544432','FL4544435','FL4544437','FL4544438','FL4544439','FL4544441','FL4544448','FL4528139','FL4528140','FL4528141','FL4528142','FL4528143','FL4528144','FL4527324','FL4527326','FL4527329','FL4527333','FL4527335','FL4530238'],
    'portraits': ['FL4549209', 'FL4564140', 'FL4549684', 'FL4545567', 'FL4488477', 'FL4545569', 'FL4534794', 'FL4510388', 'FL4513567', 'FL4513591', 'FL4513594', 'FL4468261', 'FL4531198', 'FL4531240', 'FL4517378', 'FL4517384', 'FL4529746', 'FL4512049', 'FL4512055', 'FL4485185', 'FL4487605', 'FL4487592', 'FL4485540', 'FL4484944', 'FL4484950', 'FL4481774', 'FL4481787', 'FL4478835', 'FL4486661', 'FL4486662', 'FL4474330', 'FL4474354', 'FL4480349', 'FL4480384', 'FL4486300', 'FL4473256', 'FL4474185', 'FL4474152', 'FL4479422', 'FL4479449', 'FL4474018', 'FL4472433', 'FL4479794', 'FL4466608', 'FL4466614', 'FL4450989', 'FL4489424', 'FL4480459', 'FL4588049', 'FL4492349', 'FL4502482', 'FL4491527', 'FL4444441', 'FL4490697', 'FL4433631', 'FL4434468', 'FL4430650', 'FL4430652', 'FL4468274', 'FL4529677', 'FL4532361', 'FL4495950', 'FL8797006', 'FL4522775', 'FL4517556', 'FL4517563', 'FL4518600', 'FL4515829', 'FL4515847', 'FL4519602', 'FL4424262', 'FL4424263', 'FL4424264', 'FL4424278', 'FL4424279', 'FL4588015', 'FL4588016', 'FL4588017', 'FL4537870', 'FL4537872', 'FL4537873', 'FL4537874', 'FL4537878', 'FL4537880', 'FL4537881', 'FL4537882', 'FL4537883', 'FL4537888', 'FL4537889', 'FL4537891', 'FL4537895', 'FL4537896', 'FL4537897', 'FL4537899', 'FL4537902', 'FL4537906', 'FL4537907', 'FL4537909', 'FL4537911', 'FL4540963', 'FL4540964', 'FL4540966', 'FL4540970', 'FL4540972', 'FL4540973', 'FL4540975', 'FL4539968', 'FL4539969', 'FL4539970', 'FL4539971', 'FL4539972', 'FL4539974', 'FL4539988', 'FL4539989', 'FL4490339', 'FL4538816', 'FL4538817', 'FL4538818', 'FL4538825', 'FL4538826', 'FL4538827', 'FL4538828', 'FL4538829', 'FL4538838', 'FL4538839', 'FL4538840', 'FL4538841']
}
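
Before downloading anything, it can help to check how many identifiers each category holds and whether any identifier appears more than once, since a lopsided or duplicated training set may skew the results. The following is only a minimal sketch: `summarise_sets` and `sample_sets` are hypothetical names introduced here, and the sample is a tiny stand-in for the full `img_sets` dictionary above.

```python
from collections import Counter


def summarise_sets(img_sets):
    """Return per-category counts and any identifiers listed more than once."""
    counts = {name: len(ids) for name, ids in img_sets.items()}
    seen = Counter(img_id for ids in img_sets.values() for img_id in ids)
    duplicates = sorted(img_id for img_id, n in seen.items() if n > 1)
    return counts, duplicates


# Hypothetical miniature of img_sets, used only to keep the sketch self-contained
sample_sets = {
    'protests': ['FL4520808', 'FL4520807'],
    'portraits': ['FL4549209', 'FL4564140', 'FL4549684'],
}

counts, duplicates = summarise_sets(sample_sets)
print(counts, duplicates)
```

In the notebook itself you would call `summarise_sets(img_sets)` on the real dictionary instead of the sample.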

Download the training images.

In [4]:
import os
from urllib.parse import urlparse
from tqdm import tqdm_notebook
import requests
# Download training images
for img_set in ['protests', 'portraits']:
    img_dir = os.path.join('tf_files', 'tribune', img_set)
    os.makedirs(img_dir, exist_ok=True)
    for img in tqdm_notebook(img_sets[img_set]):
        img_url = 'https://s3-ap-southeast-2.amazonaws.com/wraggetribune/images/500/{0}-500.jpg'.format(img)
        parsed = urlparse(img_url)
        filename = os.path.join(img_dir, os.path.basename(parsed.path))
        response = requests.get(img_url, stream=True)
        with open(filename, 'wb') as fd:
            for chunk in response.iter_content(chunk_size=128):
                fd.write(chunk)
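
The download step can also be wrapped in a few small helpers, which makes the URL pattern explicit and stops on a failed request instead of saving an error response under a `.jpg` name. This is a sketch rather than the notebook's own method: `image_url`, `local_path` and `download_image` are hypothetical names, and the `timeout` value is an arbitrary assumption.

```python
import os

# URL pattern taken from the download cell in this notebook
BASE_URL = 'https://s3-ap-southeast-2.amazonaws.com/wraggetribune/images/500/{0}-500.jpg'


def image_url(img_id):
    """Build the URL of the 500px version of an image."""
    return BASE_URL.format(img_id)


def local_path(img_id, img_set, root=os.path.join('tf_files', 'tribune')):
    """Build the path where a training image is saved."""
    return os.path.join(root, img_set, '{0}-500.jpg'.format(img_id))


def download_image(img_id, img_set, timeout=30):
    """Download one image, raising an exception if the request fails."""
    # Imported here so the path helpers work even without requests installed
    import requests

    path = local_path(img_id, img_set)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    response = requests.get(image_url(img_id), stream=True, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    with open(path, 'wb') as fd:
        for chunk in response.iter_content(chunk_size=8192):
            fd.write(chunk)
    return path
```

Calling `download_image('FL4520808', 'protests')` would then save that image under `tf_files/tribune/protests/`.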

In [37]:
ls tf_files/tribune

[0m[01;34mearthworks[0m/  [01;34mnature_reserve[0m/  [01;34mportraits[0m/  [01;34mprotests[0m/


Run this in a terminal, because Jupyter's `!` shell commands don't support background processes.

I'm assuming this won't be possible on Binder?

```
tensorboard --logdir tf_files/training_summaries &
```
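
If no terminal is available, one possible alternative is to start TensorBoard as a child process from Python. This is an untested sketch, not something this notebook has tried: `tensorboard_command` and `start_tensorboard` are hypothetical helper names, the port is an assumption, and whether the TensorBoard page would then be reachable on Binder is unknown.

```python
import subprocess


def tensorboard_command(logdir='tf_files/training_summaries', port=6006):
    """Build the same command as above, with an explicit port."""
    return ['tensorboard', '--logdir', logdir, '--port', str(port)]


def start_tensorboard(logdir='tf_files/training_summaries', port=6006):
    """Start TensorBoard without blocking the notebook cell."""
    # Popen returns immediately; TensorBoard keeps running as a child process
    return subprocess.Popen(
        tensorboard_command(logdir, port),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


# Usage (left commented out so that nothing is launched by accident):
# proc = start_tensorboard()
# ...
# proc.terminate()  # stop TensorBoard when finished
```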

## Train the model

In [38]:
%%bash
IMAGE_SIZE=224
ARCHITECTURE="mobilenet_0.50_${IMAGE_SIZE}"

python -m scripts.retrain \
  --bottleneck_dir=tf_files/bottlenecks \
  --how_many_training_steps=500 \
  --model_dir=tf_files/models/ \
  --summaries_dir=tf_files/training_summaries/"${ARCHITECTURE}" \
  --output_graph=tf_files/tribune_graph.pb \
  --output_labels=tf_files/tribune_labels.txt \
  --architecture="${ARCHITECTURE}" \
  --image_dir=tf_files/tribune

INFO:tensorflow:Looking for images in '.ipynb_checkpoints'
INFO:tensorflow:Looking for images in 'earthworks'
INFO:tensorflow:Looking for images in 'nature_reserve'
INFO:tensorflow:Looking for images in 'portraits'
INFO:tensorflow:Looking for images in 'protests'
2019-03-05 02:19:29.639171: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
INFO:tensorflow:100 bottleneck files created.
INFO:tensorflow:Creating bottleneck at tf_files/bottlenecks/nature_reserve/aae95.jpg_mobilenet_0.50_224.txt
INFO:tensorflow:Creating bottleneck at tf_files/bottlenecks/nature_reserve/aah81.jpg_mobilenet_0.50_224.txt
INFO:tensorflow:Creating bottleneck at tf_files/bottlenecks/nature_reserve/aag71.jpg_mobilenet_0.50_224.txt
INFO:tensorflow:Creating bottleneck at tf_files/bottlenecks/nature_reserve/aag77.jpg_mobilenet_0.50_224.txt
INFO:tensorflow:Creating bottleneck at tf_files/bottlenecks/nature_reserve/aag33.jpg
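
The log above shows the retrain script looking for images in `.ipynb_checkpoints`, a folder that Jupyter creates itself. It may well be harmless, but if you'd rather the script only saw real categories, something like the following could clear those folders before training. This is a sketch under that assumption: `remove_checkpoint_dirs` is a hypothetical helper, and the demonstration runs on a throwaway directory rather than on `tf_files/tribune`.

```python
import os
import shutil
import tempfile


def remove_checkpoint_dirs(image_dir):
    """Delete .ipynb_checkpoints folders under image_dir and return their paths."""
    removed = []
    for root, dirs, _files in os.walk(image_dir):
        if '.ipynb_checkpoints' in dirs:
            path = os.path.join(root, '.ipynb_checkpoints')
            shutil.rmtree(path)
            dirs.remove('.ipynb_checkpoints')  # don't descend into the deleted folder
            removed.append(path)
    return removed


# Demonstration on a temporary directory that mimics the training folder layout
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, '.ipynb_checkpoints'))
    os.makedirs(os.path.join(demo_dir, 'portraits'))
    removed = remove_checkpoint_dirs(demo_dir)
    remaining = sorted(os.listdir(demo_dir))

print(remaining)
```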

## Test the trained model

First let's test against the training set.

In [7]:
# Make a list of all the test images
import os
import random
from IPython.display import display, HTML
imgs = []
data_dir = 'tf_files/tribune/'
for img_dir in [d for d in os.listdir(data_dir) if os.path.isdir(os.path.join(data_dir, d))]:
    for img in [i for i in os.listdir(os.path.join(data_dir, img_dir)) if i.endswith('.jpg')]:
        imgs.append(os.path.join(data_dir, img_dir, img))

In [39]:
# Choose one image at random
img = random.choice(imgs)
display(HTML('<img src="tensorflow-for-poets-2/{0}"><br>{0}'.format(img)))

In [40]:
!python -m scripts.label_image --graph=tf_files/tribune_graph.pb --labels=tf_files/tribune_labels.txt --image=$img

2019-03-05 02:21:05.395653: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

Evaluation time (1-image): 1.816s

earthworks (score=0.63679)
nature reserve (score=0.36291)
protests (score=0.00030)
portraits (score=0.00000)
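
To compare results across several images, the printed scores can be turned into data. The sketch below assumes the output always follows the `label (score=0.12345)` pattern seen above; `parse_scores` is a hypothetical helper, not part of the tutorial's scripts, and the sample text is copied from the output of the cell above.

```python
import re

# Matches lines such as "nature reserve (score=0.36291)"
SCORE_LINE = re.compile(r'^(?P<label>.+?) \(score=(?P<score>\d+\.\d+)\)$')


def parse_scores(output):
    """Extract (label, score) pairs from label_image's printed output."""
    scores = []
    for line in output.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:
            scores.append((match.group('label'), float(match.group('score'))))
    return scores


# Sample text copied from the output shown above
sample_output = """
Evaluation time (1-image): 1.816s

earthworks (score=0.63679)
nature reserve (score=0.36291)
protests (score=0.00030)
portraits (score=0.00000)
"""

scores = parse_scores(sample_output)
top_label, top_score = max(scores, key=lambda pair: pair[1])
print(top_label, top_score)
```

In a notebook, the text could be captured with `output = !python -m scripts.label_image ...` and passed in as `parse_scores('\n'.join(output))`.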


## Test against a randomly selected image from the complete collection

Let's see how our model goes against images it's never seen before...

In [10]:
# Load Tribune images data
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/GLAM-Workbench/ozglam-data-records-of-resistance/master/data/images.csv')

In [11]:
# Set up a directory for test images
test_dir = os.path.join('tf_files', 'tribune_tests')
os.makedirs(test_dir, exist_ok=True)
images = df['images']

In [12]:
# Get a random image
img = images.sample(1).iloc[0]
img_url = 'https://s3-ap-southeast-2.amazonaws.com/wraggetribune/images/500/{0}-500.jpg'.format(img)
filename = os.path.join(test_dir, '{}-500.jpg'.format(img))
response = requests.get(img_url, stream=True)
with open(filename, 'wb') as fd:
    for chunk in response.iter_content(chunk_size=128):
        fd.write(chunk)
display(HTML('<img src="tensorflow-for-poets-2/{0}"><br>{0}'.format(filename)))

In [43]:
display(HTML('<img src="tensorflow-for-poets-2/tf_files/tribune_tests/luskville.png">'))


In [44]:
!python -m scripts.label_image --graph=tf_files/tribune_graph.pb --labels=tf_files/tribune_labels.txt --image="/home/jovyan/tensorflow-for-poets-2/tf_files/tribune_tests/luskville.png"

2019-03-05 02:23:44.905957: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA

Evaluation time (1-image): 1.760s

nature reserve (score=0.99554)
earthworks (score=0.00396)
protests (score=0.00050)
portraits (score=0.00001)
