
Identify characters from Google Street View images

This project tackles the Kaggle (knowledge) competition First Steps With Julia using the Caffe framework.

Martin Keršner, m.kersner@gmail.com

Current top score: 0.72284

TODO

  • Add new networks
  • Create a full cycle (create model, train, evaluate, submit result)
  • Automatic submission
  • Save the current model when the user terminates training
  • Redirect logs or save them while printing; the final log for each training run should contain:
    • Solver parameters
    • Caffe log
    • Plotted training loss

Data description

The dataset (downloadable via data/download_data.py) consists of images depicting alphanumeric characters (lowercase [a-z], uppercase [A-Z], and digits [0-9]), i.e. 62 different classes altogether. The images differ in dimensions, font, and background and foreground colors. The given data are split into training (6,283 samples) and testing (6,220 samples) subsets. The dataset without augmentation will be denoted orig.
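
For reference, a minimal sketch of how the 62 class labels might be indexed for training; the ordering below is an assumption, not something fixed by the competition:

```python
import string

# 62 classes: digits, uppercase, lowercase (this ordering is an assumption)
CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase

# map each character to an integer label and back
char_to_label = {c: i for i, c in enumerate(CLASSES)}
label_to_char = {i: c for i, c in enumerate(CLASSES)}

assert len(CLASSES) == 62
print(char_to_label['A'])  # -> 10 under this ordering
```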

The data come originally from:

T. E. de Campos, B. R. Babu and M. Varma, Character recognition in natural images, Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal, February 2009.

Data augmentation

Most deep networks require a fixed input size, so the first step is to resize the images (data/resize_images.py) to the dimensions expected by the particular network.
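
A minimal Python sketch of such a resizing pass, assuming Pillow is available; the directory names and the 256×256 target size are placeholders (the actual size depends on the network, e.g. 28×28 for LeNet):

```python
import os
from PIL import Image

SRC_DIR = 'data/train'          # hypothetical input directory
DST_DIR = 'data/train_resized'  # hypothetical output directory
SIZE = (256, 256)               # assumed target size; network-dependent

os.makedirs(DST_DIR, exist_ok=True)
for name in os.listdir(SRC_DIR):
    img = Image.open(os.path.join(SRC_DIR, name))
    # high-quality resampling; ignores aspect ratio, as most pipelines do here
    img.resize(SIZE, Image.LANCZOS).save(os.path.join(DST_DIR, name))
```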

The distribution of classes within the training subset is imbalanced (shown in the figure below). Some classes exceed 300 samples, whereas many don't even contain 60 samples.
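
A quick way to inspect this distribution, assuming the training labels come in the Kaggle-style trainLabels.csv with columns ID and Class (the path is an assumption):

```python
import csv
from collections import Counter

# count training samples per character class
with open('data/trainLabels.csv') as f:  # assumed Kaggle label file
    counts = Counter(row['Class'] for row in csv.DictReader(f))

for char, n in counts.most_common():
    print(char, n)
```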

The number of training samples per class is insufficient for proper training and subsequent identification. One way to tackle this problem is to augment the data and thereby obtain a higher overall number of training samples. To that end, the following methods can be employed:

  • Rotation
  • Applying filters

Rotation

Many characters in the dataset are slightly rotated and almost none of them are perfectly upright; some are even upside down. We can therefore anticipate that a modest rotation of the images will help build a more robust model.

For a start, we rotate each training image once to the left and once to the right by about 10° (data/rotate_dataset.m), which triples the training set to 18,849 images. This augmented dataset will be denoted rot_18849.
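
The repository performs this step in MATLAB; a rough Python equivalent with Pillow might look like this (the paths are assumptions, and uncovered corners are filled with black by default):

```python
import os
from PIL import Image

SRC_DIR = 'data/train_resized'  # hypothetical input directory
DST_DIR = 'data/train_rot'      # hypothetical output directory
ANGLE = 10                      # degrees, as in the description above

os.makedirs(DST_DIR, exist_ok=True)
for name in os.listdir(SRC_DIR):
    img = Image.open(os.path.join(SRC_DIR, name))
    base, ext = os.path.splitext(name)
    img.save(os.path.join(DST_DIR, name))                              # original
    img.rotate(ANGLE).save(os.path.join(DST_DIR, base + '_l' + ext))   # rotated left
    img.rotate(-ANGLE).save(os.path.join(DST_DIR, base + '_r' + ext))  # rotated right
```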

LeNet

Training from scratch

LeNet is the first network we try for character identification.

| Date | Data | Train/val ratio | Training time | # iterations | Solver | Caffe ACC | Kaggle ACC |
|------|------|-----------------|---------------|--------------|--------|-----------|------------|
| 2015/11/26 | orig | 50:50 | ? | 10,000 | ? | ? | 0.63711 |
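
Training itself is not shown in this README; a minimal pycaffe sketch, assuming a prepared solver definition at models/lenet/solver.prototxt (a hypothetical path), could be:

```python
import caffe

caffe.set_mode_gpu()  # or caffe.set_mode_cpu()

# load the solver definition and run the full training loop
# (max_iter comes from the prototxt, e.g. 10,000 as in the table above)
solver = caffe.SGDSolver('models/lenet/solver.prototxt')
solver.solve()
```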

BVLC Caffenet

Fine-tuning

BVLC CaffeNet is a replication of AlexNet with a few differences. The model was trained on the ImageNet dataset. The model and more information can be found here.

| Date | Data | Train/val ratio | Training time | # iterations | Solver | Caffe ACC | Kaggle ACC |
|------|------|-----------------|---------------|--------------|--------|-----------|------------|
| 2015/12/04 | orig | 50:50 | ? | 10,000 | ? | ? | 0.69895 |
| 2015/12/07 | rot_18849 | 50:50 | 1h 58m | 10,000 | params | 0.944211 | 0.72284 |
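
A sketch of how such fine-tuning is typically done with pycaffe, initializing the network from the pretrained ImageNet weights before training; the solver and weights paths below are assumptions:

```python
import caffe

caffe.set_mode_gpu()

solver = caffe.SGDSolver('models/caffenet/solver.prototxt')  # hypothetical path
# copy weights for layers whose names match the pretrained model;
# renamed layers (e.g. the 62-way classifier) start from scratch
solver.net.copy_from('models/bvlc_reference_caffenet.caffemodel')
solver.solve()
```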