This project tackles the Kaggle knowledge competition First Steps With Julia using the Caffe framework.
Martin Keršner, m.kersner@gmail.com
Current top score: 0.72284
- Add new networks
- Create a full cycle (create model, train, evaluate, submit results)
- Automatic submission
- Save the current model when the user terminates training
- Redirect logs or save them while printing (a log-parsing sketch follows this list); the final log for each training run should contain:
- Solver parameters
- Caffe log
- Plotted training loss
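The last point can be covered with a short script. The following is a rough sketch, assuming the usual Caffe log lines of the form `Iteration N, loss = X` and a hypothetical log file name; it extracts the training loss and plots it with matplotlib.

```python
# Rough sketch: parse training loss out of a saved Caffe log and plot it.
# 'caffe_training.log' is a placeholder file name; the standard Caffe log
# format "Iteration N, loss = X" is assumed.
import re
import matplotlib.pyplot as plt

iters, losses = [], []
with open('caffe_training.log') as f:
    for line in f:
        m = re.search(r'Iteration (\d+), loss = ([\d.eE+-]+)', line)
        if m:
            iters.append(int(m.group(1)))
            losses.append(float(m.group(2)))

plt.plot(iters, losses)
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.savefig('training_loss.png')
```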
The dataset (which can be downloaded with data/download_data.py) consists of images depicting alphanumeric characters (lowercase [a-z], uppercase [A-Z] and digits [0-9]), i.e. 62 classes altogether. The images differ in dimensions, font, and background and foreground colors. The given data are split into training (6,283 samples) and testing (6,220 samples) subsets. The dataset without augmentation will be denoted as orig.
The data originally come from:
T. E. de Campos, B. R. Babu and M. Varma, Character recognition in natural images, Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal, February 2009.
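For reference, a minimal sketch of a 62-class label mapping (digits, uppercase, lowercase) is shown below; the exact ordering used by the competition files and the repository scripts may differ.

```python
# Minimal sketch of a 62-class label mapping; the ordering is an assumption
# and may not match the one used by the competition or the repository scripts.
import string

CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 characters
CHAR_TO_LABEL = {c: i for i, c in enumerate(CLASSES)}
LABEL_TO_CHAR = {i: c for c, i in CHAR_TO_LABEL.items()}

assert len(CLASSES) == 62
```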
Most deep networks require a fixed input size, so the first step is to resize the images (data/resize_images.py) to match the particular network.
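A minimal resize sketch (not the repository's data/resize_images.py, whose interface may differ) could look like this, with the target size chosen to match the network's input layer:

```python
# Sketch: resize every image in src_dir to a fixed size and write it to dst_dir.
# The 32x32 target is only an example; use whatever the network's input layer expects.
import os
from PIL import Image

def resize_images(src_dir, dst_dir, size=(32, 32)):
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    for name in os.listdir(src_dir):
        if not name.lower().endswith(('.png', '.jpg', '.bmp')):
            continue
        img = Image.open(os.path.join(src_dir, name)).convert('RGB')
        img.resize(size, Image.LANCZOS).save(os.path.join(dst_dir, name))
```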
The distribution of classes within the training subset is not balanced (shown in the figure below). Some classes have more than 300 samples, whereas many others have fewer than 60.
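The per-class counts behind the figure can be reproduced with a short script; the sketch below assumes the competition's trainLabels.csv (with ID and Class columns) stored under data/.

```python
# Sketch: count training samples per class from trainLabels.csv (columns: ID, Class).
import csv
from collections import Counter

counts = Counter()
with open('data/trainLabels.csv') as f:
    for row in csv.DictReader(f):
        counts[row['Class']] += 1

for cls, n in counts.most_common():
    print(cls, n)
```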
The number of training samples per class is insufficient for proper training and subsequent identification. One way to tackle this problem is to augment the data and thus obtain a higher overall number of training samples. The following methods can be employed:
- Rotation
- Applying filters
Many characters in the dataset are slightly rotated and almost none of them are perfectly upright. Some are even upside down. Therefore, we can anticipate that a modest rotation of the images could help build a more robust model.
As a start, we rotate each training image (by about 10°, using data/rotate_dataset.m) once to the left and once to the right, adding two rotated copies per original image. This results in a training dataset of 18,849 images (three times the original size). This augmented dataset will be denoted as rot_18849.
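The same augmentation can be sketched in Python as well (the repository uses data/rotate_dataset.m); the ±10° angles match the text, while the file naming and border filling are assumptions.

```python
# Python sketch of the rotation augmentation described above: for every image,
# keep the original and add copies rotated by -10 and +10 degrees.
import os
from PIL import Image

ANGLES = (-10, 10)  # degrees

def augment_by_rotation(src_dir, dst_dir):
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    for name in os.listdir(src_dir):
        if not name.lower().endswith(('.png', '.jpg', '.bmp')):
            continue
        img = Image.open(os.path.join(src_dir, name))
        img.save(os.path.join(dst_dir, name))  # keep the original image
        base, ext = os.path.splitext(name)
        for angle in ANGLES:
            rotated = img.rotate(angle, resample=Image.BILINEAR)
            rotated.save(os.path.join(dst_dir, '%s_rot%+d%s' % (base, angle, ext)))
```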
Training from scratch
LeNet is the first network we try for character identification.
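Training from scratch with pycaffe boils down to a few calls; the sketch below uses a placeholder solver file name, not necessarily the one in this repository.

```python
# Minimal pycaffe sketch for training LeNet from scratch;
# 'lenet_solver.prototxt' is a placeholder name.
import caffe

caffe.set_mode_gpu()  # or caffe.set_mode_cpu()
solver = caffe.get_solver('lenet_solver.prototxt')
solver.solve()        # runs for max_iter defined in the solver
solver.net.save('lenet_scratch.caffemodel')
```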
Date | Data | Train/val ratio | Training time | # iterations | Solver | Caffe accuracy | Kaggle accuracy |
---|---|---|---|---|---|---|---|
2015/11/26 | orig | 50:50 | ? | 10,000 | ? | ? | 0.63711 |
Fine-tuning
BVLC CaffeNet is a replication of AlexNet with a few differences. The model was trained on the ImageNet dataset. The model and more information can be found here.
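Fine-tuning differs from training from scratch mainly in initializing the weights from the pretrained model; in Caffe the last fully connected layer is typically renamed and its num_output set to 62 so that it is reinitialized for the new classes. A minimal pycaffe sketch with placeholder file names:

```python
# Minimal pycaffe fine-tuning sketch: load pretrained BVLC CaffeNet weights and
# continue training on the character data. File names are placeholders.
import caffe

caffe.set_mode_gpu()
solver = caffe.get_solver('caffenet_finetune_solver.prototxt')
solver.net.copy_from('bvlc_reference_caffenet.caffemodel')  # pretrained ImageNet weights
solver.solve()
solver.net.save('caffenet_finetuned.caffemodel')
```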
Date | Data | Train/val ratio | Training time | # iterations | Solver | Caffe accuracy | Kaggle accuracy |
---|---|---|---|---|---|---|---|
2015/12/04 | orig | 50:50 | ? | 10,000 | ? | ? | 0.69895 |
2015/12/07 | rot_18849 | 50:50 | 1h 58m | 10,000 | params | 0.944211 | 0.72284 |