The jigsaw CNN learns a representation by reassembling an image from its patches.
This is achieved by:
- randomly cropping a square from the image,
- segmenting the crop into 9 patches (each itself a smaller random crop),
- permuting the patches,
- predicting which permutation was applied to the patches.

The aim is to learn about structure, colour and texture without labels.
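The steps above can be sketched as follows. This is a minimal illustration with NumPy, not the repo's actual data pipeline; `make_jigsaw_example` and its crop/patch sizes are hypothetical (the sizes follow the common 225 px crop / 64 px patch convention for this task).

```python
import numpy as np

def make_jigsaw_example(image, perm, crop_size=225, patch_size=64, rng=None):
    """Build one jigsaw training example (hypothetical helper, not the repo's API).

    image: HxWx3 uint8 array. Returns (patches, perm); the network's job is
    to predict which permutation produced the patch ordering.
    """
    rng = rng or np.random.default_rng()
    h, w, _ = image.shape
    # 1. Random square crop from the image.
    y = rng.integers(0, h - crop_size + 1)
    x = rng.integers(0, w - crop_size + 1)
    crop = image[y:y + crop_size, x:x + crop_size]
    # 2. Split the crop into a 3x3 grid; jitter each patch with a smaller
    #    random crop inside its grid cell.
    cell = crop_size // 3
    patches = []
    for row in range(3):
        for col in range(3):
            cell_img = crop[row * cell:(row + 1) * cell,
                            col * cell:(col + 1) * cell]
            py = rng.integers(0, cell - patch_size + 1)
            px = rng.integers(0, cell - patch_size + 1)
            patches.append(cell_img[py:py + patch_size, px:px + patch_size])
    # 3. Permute the patches; the index of `perm` in the permutation set
    #    is the classification label.
    patches = np.stack([patches[i] for i in perm])
    return patches, perm
```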
python -m jigsaw.train --gpu 3 "/path/to/train/*.jpg" "/path/to/test/*.jpg"
Note that the path globs must be quoted or the shell will expand them. Images are automatically rescaled, cropped and turned into patches at runtime. Check `--help` for more details. Training on the CPU is not supported; you must specify a GPU ID.
This is what the first-layer filters look like after 350k batches. They look good but need some more fine-tuning.
To identify an n-permutation we only need n-1 elements, so I've made the task harder by randomly zeroing one of the patches (i.e. dropout for patches). Permutations are generated in a different manner than specified in the paper, but the average Hamming distance is almost the same, at 0.873 (see scripts/perm-gen.py).
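One common way to build such a permutation set is a greedy max-Hamming selection from a random candidate pool. The sketch below shows that approach plus how the average pairwise Hamming distance can be measured; it is an assumption for illustration, and scripts/perm-gen.py may work differently.

```python
import itertools
import numpy as np

def greedy_max_hamming_perms(n_classes=100, n=9, pool_size=10000, seed=0):
    """Greedily pick n_classes permutations of n elements that are far
    apart in Hamming distance (a sketch, not the repo's generator)."""
    rng = np.random.default_rng(seed)
    pool = np.array([rng.permutation(n) for _ in range(pool_size)])
    chosen = [pool[0]]
    for _ in range(n_classes - 1):
        # Distance from each candidate to its nearest already-chosen perm;
        # pick the candidate that maximises that distance.
        d = np.min([(pool != c).sum(axis=1) for c in chosen], axis=0)
        chosen.append(pool[np.argmax(d)])
    return np.stack(chosen)

def mean_hamming(perms):
    """Average pairwise Hamming distance, normalised to [0, 1]."""
    n = len(perms)
    total = sum(
        (perms[i] != perms[j]).mean()
        for i, j in itertools.combinations(range(n), 2)
    )
    return total / (n * (n - 1) / 2)
```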
Training could be made faster by precalculating batches.