This project tackles the Kaggle knowledge competition First Steps With Julia using the Caffe framework.
Martin Keršner, m.kersner@gmail.com
Current top score: 0.72284
- Add new networks
- Create a full cycle (create model, train, evaluate, submit results)
- Automatic submission
- Save the current model when the user terminates training
- Redirect logs or save them while printing (a log-parsing sketch follows this list); the final log for each training run should contain:
- Solver parameters
- Caffe log
- Plotted training loss
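The last point can be covered with a short script. The following is a rough sketch, assuming the usual Caffe log lines of the form `Iteration N, loss = X` and a hypothetical log file name; it extracts the training loss and plots it with matplotlib.

```python
# Rough sketch: parse training loss out of a saved Caffe log and plot it.
# 'caffe_training.log' is a placeholder file name; the standard Caffe log
# format "Iteration N, loss = X" is assumed.
import re
import matplotlib.pyplot as plt

iters, losses = [], []
with open('caffe_training.log') as f:
    for line in f:
        m = re.search(r'Iteration (\d+), loss = ([\d.eE+-]+)', line)
        if m:
            iters.append(int(m.group(1)))
            losses.append(float(m.group(2)))

plt.plot(iters, losses)
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.savefig('training_loss.png')
```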
The dataset (which can be downloaded with data/download_data.py) consists of images depicting alphanumeric characters (lowercase [a-z], uppercase [A-Z] and digits [0-9]), i.e. 62 classes altogether. The images differ in dimensions, font, and background and foreground colors. The given data are split into training (6,283 samples) and testing (6,220 samples) subsets. The dataset without augmentation will be denoted as orig.
The data originally come from:
T. E. de Campos, B. R. Babu and M. Varma, Character recognition in natural images, Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal, February 2009.
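For reference, a minimal sketch of a 62-class label mapping (digits, uppercase, lowercase) is shown below; the exact ordering used by the competition files and the repository scripts may differ.

```python
# Minimal sketch of a 62-class label mapping; the ordering is an assumption
# and may not match the one used by the competition or the repository scripts.
import string

CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 characters
CHAR_TO_LABEL = {c: i for i, c in enumerate(CLASSES)}
LABEL_TO_CHAR = {i: c for c, i in CHAR_TO_LABEL.items()}

assert len(CLASSES) == 62
```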
Most deep networks require a fixed input size, so the first step is to resize the images (data/resize_images.py) to match the particular network.
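A minimal resize sketch (not the repository's data/resize_images.py, whose interface may differ) could look like this, with the target size chosen to match the network's input layer:

```python
# Sketch: resize every image in src_dir to a fixed size and write it to dst_dir.
# The 32x32 target is only an example; use whatever the network's input layer expects.
import os
from PIL import Image

def resize_images(src_dir, dst_dir, size=(32, 32)):
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    for name in os.listdir(src_dir):
        if not name.lower().endswith(('.png', '.jpg', '.bmp')):
            continue
        img = Image.open(os.path.join(src_dir, name)).convert('RGB')
        img.resize(size, Image.LANCZOS).save(os.path.join(dst_dir, name))
```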
The distribution of classes within the training subset is not balanced (shown in the figure below). Some classes have more than 300 samples, whereas many others have fewer than 60.
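The per-class counts behind the figure can be reproduced with a short script; the sketch below assumes the competition's trainLabels.csv (with ID and Class columns) stored under data/.

```python
# Sketch: count training samples per class from trainLabels.csv (columns: ID, Class).
import csv
from collections import Counter

counts = Counter()
with open('data/trainLabels.csv') as f:
    for row in csv.DictReader(f):
        counts[row['Class']] += 1

for cls, n in counts.most_common():
    print(cls, n)
```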
The number of training samples per class is insufficient for proper training and subsequent identification. One way to tackle this problem is to augment the data and thus obtain a higher overall number of training samples. The following methods can be employed:
- Rotation
- Applying filters
Many characters in the dataset are slightly rotated and almost none of them are perfectly upright. Some are even upside down. Therefore, we can anticipate that a modest rotation of the images could help build a more robust model.
As a start, we rotate each training image (by about 10°, using data/rotate_dataset.m) once to the left and once to the right, adding two rotated copies per original image. This results in a training dataset of 18,849 images (three times the original size). This augmented dataset will be denoted as rot_18849.
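The same augmentation can be sketched in Python as well (the repository uses data/rotate_dataset.m); the ±10° angles match the text, while the file naming and border filling are assumptions.

```python
# Python sketch of the rotation augmentation described above: for every image,
# keep the original and add copies rotated by -10 and +10 degrees.
import os
from PIL import Image

ANGLES = (-10, 10)  # degrees

def augment_by_rotation(src_dir, dst_dir):
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    for name in os.listdir(src_dir):
        if not name.lower().endswith(('.png', '.jpg', '.bmp')):
            continue
        img = Image.open(os.path.join(src_dir, name))
        img.save(os.path.join(dst_dir, name))  # keep the original image
        base, ext = os.path.splitext(name)
        for angle in ANGLES:
            rotated = img.rotate(angle, resample=Image.BILINEAR)
            rotated.save(os.path.join(dst_dir, '%s_rot%+d%s' % (base, angle, ext)))
```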
Training from scratch
LeNet is the first network we try for character identification.
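Training from scratch with pycaffe boils down to a few calls; the sketch below uses a placeholder solver file name, not necessarily the one in this repository.

```python
# Minimal pycaffe sketch for training LeNet from scratch;
# 'lenet_solver.prototxt' is a placeholder name.
import caffe

caffe.set_mode_gpu()  # or caffe.set_mode_cpu()
solver = caffe.get_solver('lenet_solver.prototxt')
solver.solve()        # runs for max_iter defined in the solver
solver.net.save('lenet_scratch.caffemodel')
```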
Date | Data | Train/val ratio | Training time | # iterations | Solver | Caffe accuracy | Kaggle accuracy |
---|---|---|---|---|---|---|---|
2015/11/26 | orig | 50:50 | ? | 10,000 | ? | ? | 0.63711 |
Fine-tuning
BVLC CaffeNet is a replication of AlexNet with a few differences. The model was trained on the ImageNet dataset. The model and more information can be found here.
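Fine-tuning differs from training from scratch mainly in initializing the weights from the pretrained model; in Caffe the last fully connected layer is typically renamed and its num_output set to 62 so that it is reinitialized for the new classes. A minimal pycaffe sketch with placeholder file names:

```python
# Minimal pycaffe fine-tuning sketch: load pretrained BVLC CaffeNet weights and
# continue training on the character data. File names are placeholders.
import caffe

caffe.set_mode_gpu()
solver = caffe.get_solver('caffenet_finetune_solver.prototxt')
solver.net.copy_from('bvlc_reference_caffenet.caffemodel')  # pretrained ImageNet weights
solver.solve()
solver.net.save('caffenet_finetuned.caffemodel')
```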
Date | Data | Train/val ratio | Training time | # iterations | Solver | Caffe accuracy | Kaggle accuracy |
---|---|---|---|---|---|---|---|
2015/12/04 | orig | 50:50 | ? | 10,000 | ? | ? | 0.69895 |
2015/12/07 | rot_18849 | 50:50 | 1h 58m | 10,000 | params | 0.944211 | 0.72284 |