# Generate Data for Training

Before running this notebook, grab the LabelMe data from [https://vision.csi.miamioh.edu/labelme.zip](https://vision.csi.miamioh.edu/labelme.zip) and extract it somewhere. 
> *NOTE:* I prefer to keep my data on a large external drive and soft-link it to a local `data` folder. For example:
```bash
ln -s /media/${USER}/external-drive/data ./data
```

The next cell **will fail** on your system as written. Replace the paths to the LabelMe input and the data output with folders on your own system (e.g. on a large drive).

In [None]:
!ln -sf "/media/femianjc/My Book/independant_12_layers/data/training/" ./data
!ln -sf "/media/femianjc/My Book/labelme/" ./labelme
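
If you would rather create the links from Python than from the shell, the following is a minimal sketch. The helper name `link_folder` and the example paths are hypothetical and not part of the original notebook. Unlike `ln -sf`, it refuses to link to a folder that does not exist, and it raises an error if `link_name` is a real file or directory rather than a link.

```python
import os

def link_folder(target, link_name):
    """Create (or replace) a symbolic link at link_name pointing to target."""
    if not os.path.isdir(target):
        raise IOError('Target folder does not exist: {}'.format(target))
    if os.path.islink(link_name):
        os.remove(link_name)  # replace a stale link, similar to `ln -sf`
    os.symlink(target, link_name)
    return link_name

# Hypothetical usage; substitute the folders on your own drive:
# link_folder('/path/to/your/training-data', './data')
# link_folder('/path/to/your/labelme', './labelme')
```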

Each image has a corresponding XML file.
My script to produce training data takes a list of XML files, so let's generate a comprehensive list.

In [None]:
%%bash
pushd ./labelme/Annotations/ >/dev/null
ls -1 */*.xml > files.txt
popd >/dev/null
mv labelme/Annotations/files.txt .
echo Found $(cat files.txt | wc -l) files...
echo
head files.txt
echo ...
tail files.txt
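
The same listing can be produced in pure Python, which may help on systems without a bash shell. This is only a sketch: the function names `list_annotation_files` and `write_file_list` are hypothetical, and `sorted()` may order names slightly differently from `ls` depending on your locale.

```python
import glob
import os

def list_annotation_files(annotations_dir):
    """Return sorted paths matching */*.xml, relative to annotations_dir."""
    pattern = os.path.join(annotations_dir, '*', '*.xml')
    return sorted(os.path.relpath(p, annotations_dir) for p in glob.glob(pattern))

def write_file_list(annotations_dir, out_path='files.txt'):
    """Write one relative XML path per line; return how many were written."""
    names = list_annotation_files(annotations_dir)
    with open(out_path, 'w') as f:
        f.writelines(name + '\n' for name in names)
    return len(names)

# Hypothetical usage, matching the folder layout used in this notebook:
# print('Found {} files...'.format(write_file_list('labelme/Annotations')))
```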

When I produced the data for the labelers, I highlighted the part of the image I wanted them to label. As a result, the LabelMe tool may have recorded the path to the highlighted image instead of the original, so we need to fix that.

In [None]:
%pylab notebook
import os

In [None]:
xml = 'facades-2017-07-21/honolulu_hawaii-002943-000004-8HfFc2j4u0BaBAaaYeNy1w-facade-01-highlighted.xml'
hl_jpg = os.path.join('labelme/Images', xml.replace('.xml', '.jpg'))
nohl_jpg = hl_jpg.replace('highlighted', 'original')
mask_jpg = hl_jpg.replace('highlighted', 'mask')

figsize(10, 5)
subplot(221)
imshow(imread(nohl_jpg))
title("No highlights")
subplot(222)
imshow(imread(hl_jpg))
title("Highlighted")
subplot(223)
imshow(imread(mask_jpg))
title("Mask")
show()
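
The naming convention used in the cell above can be wrapped in a small helper, which also makes it easy to check whether all three images exist before plotting. This is a sketch under the assumption that every annotation follows the `...-highlighted.xml` pattern. The names `image_variants` and `missing_variants` are hypothetical.

```python
import os

def image_variants(xml_name, images_dir='labelme/Images'):
    """Map a '...-highlighted.xml' annotation name to its three JPEG paths."""
    highlighted = os.path.join(images_dir, xml_name.replace('.xml', '.jpg'))
    return {
        'highlighted': highlighted,
        'original': highlighted.replace('highlighted', 'original'),
        'mask': highlighted.replace('highlighted', 'mask'),
    }

def missing_variants(xml_name, images_dir='labelme/Images'):
    """Return the variant names whose image file is not on disk."""
    paths = image_variants(xml_name, images_dir)
    return sorted(kind for kind, path in paths.items() if not os.path.exists(path))
```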

It may be difficult to tell, but the images above are the result of the following process:
1. I asked the labelers to outline the dominant, camera-facing facades (within 15 deg). 
2. I automatically calculated the homography using the approach of [Affara et al.](TBD).
3. I warped the image and rendered out the (supposedly rectified) images shown in the figure above. 
4. The labelers continued labeling features in the (supposedly rectified) images.
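
To make step 3 more concrete, here is a minimal sketch of what warping by a homography means for individual points. It does not estimate the homography and is not the method of Affara et al.; it only applies a given 3x3 matrix `H` to pixel coordinates. The function name `apply_homography` is hypothetical.

```python
def apply_homography(H, points):
    """Map (x, y) points through a 3x3 homography H given as nested lists."""
    mapped = []
    for x, y in points:
        # Multiply H by the homogeneous point (x, y, 1).
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        # Divide by w to return to ordinary image coordinates.
        mapped.append((u / float(w), v / float(w)))
    return mapped
```

Rectifying a whole image applies the same mapping to every pixel location, usually through an image library's perspective-warp routine rather than a Python loop.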

## Replace 'highlighted' with 'original' in the XMLs

**ALERT:** I have **already backed up** my data, so I feel comfortable modifying it in place. You should back up yours before running the cells below.
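
One simple way to take such a backup is to copy the annotation folder into a timestamped directory. This is a sketch, not the procedure I used: the function name `backup_annotations` and the `backups` folder are hypothetical, and you may prefer to place the copy on a different drive.

```python
import os
import shutil
import time

def backup_annotations(src='labelme/Annotations', backup_root='backups'):
    """Copy src into a new timestamped folder under backup_root; return its path."""
    stamp = time.strftime('%Y%m%d-%H%M%S')
    dest = os.path.join(backup_root, 'Annotations-' + stamp)
    # copytree raises an error if dest already exists, so nothing is overwritten.
    shutil.copytree(src, dest)
    return dest
```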

In [None]:
# Read the list of annotation files, closing files.txt when done.
with open('files.txt') as f:
    xmls = [os.path.join('labelme/Annotations', line.strip()) for line in f]

In [None]:
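import sys

for i, xml in enumerate(xmls):
    if 'highlighted' in xml:
        # Point the annotation at the original image instead of the highlighted one.
        with open(xml) as f:
            contents = f.read().replace('highlighted', 'original')
        xmls[i] = xml.replace('highlighted', 'original')
        with open(xmls[i], 'w') as f:
            f.write(contents)
    # Single-line progress counter; works in both Python 2 and Python 3.
    sys.stdout.write('\r{: 3} of {}'.format(i + 1, len(xmls)))
    sys.stdout.flush()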

In [None]:
with open('files.txt', 'w') as f:
    f.writelines([os.path.relpath(xml, 'labelme/Annotations') + '\n' for xml in xmls])
print("Updated files.txt")

In [None]:
!tail files.txt

# Generate Prepared Training Data

In [None]:
!mkdir -p ./data/labelme-out

**NOTE:** The next script outputs only the labels used by the i12 classifier. I have additional labels available, but I am not producing them yet.

**NOTE:** The next script takes a very long time to run, so before running it we should work out a few things:
- Crop in to the largest facade in the image, so we do not waste training time
- Add some extra labels
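
As a starting point for the cropping idea, the sketch below picks the largest polygon by area and returns a padded bounding box around it. It assumes the facade outlines are already available as lists of `(x, y)` vertices. How they are read from the annotations is not shown, and the function names are hypothetical. The returned box is not clamped to the image bounds.

```python
def polygon_area(points):
    """Area of a simple polygon given as [(x, y), ...], via the shoelace formula."""
    total = 0.0
    for i, (x0, y0) in enumerate(points):
        x1, y1 = points[(i + 1) % len(points)]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def largest_polygon_bbox(polygons, pad=0):
    """Return (left, top, right, bottom) around the largest polygon, padded by pad."""
    biggest = max(polygons, key=polygon_area)
    xs = [x for x, _ in biggest]
    ys = [y for _, y in biggest]
    return (min(xs) - pad, min(ys) - pad, max(xs) + pad, max(ys) + pad)
```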

In [None]:
figure(figsize=(24,18))
%run -m pyfacades.models.independant_12_layers.import_labelme -- --plot --from-labelme=./labelme --files files.txt --ignore=background -o ./data/labelme-out --resume