
Doubling number of conv layers improves accuracy #38

Closed
Chazzz opened this issue Jan 11, 2019 · 13 comments

Chazzz (Contributor) commented Jan 11, 2019

I'm not sure the title is tremendously surprising to anyone, but I cleared 76% word accuracy with a deeper network. More interestingly, using a deeper network and terminating around epoch 25 yields a 74-75% word accuracy model, which is better and faster than training a smaller network to the bitter end.

[Screenshot: TensorBoard plot of word accuracy during training, 2019-01-11]

Relevant code:

# 'pool' holds the output of the previous level; before the loop it is the
# input image batch with a channel axis added, as in the upstream model.
# Each level now applies two conv -> batch-norm -> ReLU blocks before pooling.
for i in range(numLayers):
    # first conv of the level: featureVals[i] -> featureVals[i + 1] channels
    kernel = tf.Variable(tf.truncated_normal([kernelVals[i], kernelVals[i], featureVals[i], featureVals[i + 1]], stddev=0.1))
    conv = tf.nn.conv2d(pool, kernel, padding='SAME', strides=(1, 1, 1, 1))
    conv_norm = tf.layers.batch_normalization(conv, training=self.is_train)
    relu = tf.nn.relu(conv_norm)
    # second (added) conv of the level: keeps the channel count at featureVals[i + 1]
    kernel2 = tf.Variable(tf.truncated_normal([kernelVals[i], kernelVals[i], featureVals[i + 1], featureVals[i + 1]], stddev=0.1))
    conv2 = tf.nn.conv2d(relu, kernel2, padding='SAME', strides=(1, 1, 1, 1))
    conv_norm2 = tf.layers.batch_normalization(conv2, training=self.is_train)
    relu2 = tf.nn.relu(conv_norm2)
    # downsample once per level
    pool = tf.nn.max_pool(relu2, (1, poolVals[i][0], poolVals[i][1], 1), (1, strideVals[i][0], strideVals[i][1], 1), 'VALID')
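For reference, the per-level hyperparameter lists the loop reads from look roughly like the following in the upstream SimpleHTR model (these are the original five-level defaults, shown only for context, not the exact values behind the accuracy numbers above):

kernelVals = [5, 5, 3, 3, 3]                      # conv kernel size per level
featureVals = [1, 32, 64, 128, 128, 256]          # channel counts between levels
strideVals = poolVals = [(2, 2), (2, 2), (1, 2), (1, 2), (1, 2)]  # max-pool shapes/strides
numLayers = len(strideVals)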

Arideno commented Jan 11, 2019

Well done, thank you. One question: how did you produce the plot above?

Chazzz commented Jan 11, 2019

TensorBoard plus a bunch of hooks that aren't committed anywhere.
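For reference, a minimal TF1.x sketch of how such scalar logging for TensorBoard could look; these are not the actual hooks (which were never committed), and the tag name and log directory below are made up:

import tensorflow as tf

# assumed log directory; view with `tensorboard --logdir ./logs`
writer = tf.summary.FileWriter('./logs')

def log_scalar(tag, value, step):
    # write one scalar (e.g. validation word accuracy) per epoch
    summary = tf.Summary(value=[tf.Summary.Value(tag=tag, simple_value=value)])
    writer.add_summary(summary, step)
    writer.flush()

# e.g. after each validation pass:
# log_scalar('word_accuracy', wordAccuracy, epoch)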

githubharald (Owner) commented Jan 11, 2019

Thanks for sharing the results of your experiments.
I'd like to keep the model as simple and minimalistic as possible, but I'll link to this issue from the "Improve accuracy" section so that others can benefit from your findings.

Chazzz commented Jan 12, 2019

Expanding the layers a bit more, I hit a top word accuracy of 78% using layer depth/width values similar to VGG16, but with batch normalization. Based on my other hyperparameter runs, increasing the model size beyond that won't meaningfully impact accuracy without a ResNet-like approach (obviously outside the scope of this project).
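Purely for illustration, "depth/width values similar to VGG16" could mean scaling the feature lists toward something like the following; these exact numbers are an assumption, not the configuration that produced the 78% figure:

kernelVals = [5, 5, 3, 3, 3]
featureVals = [1, 64, 128, 256, 512, 512]   # VGG16-style channel widths
strideVals = poolVals = [(2, 2), (2, 2), (1, 2), (1, 2), (1, 2)]
numLayers = len(strideVals)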

githubharald (Owner) commented Jan 12, 2019
When increasing the model size, at some point the model is able to perfectly learn the training data without improving validation accuracy, i.e. it overfits. Therefore you could try to make the task a bit harder while training by using data augmentation. At the moment, the model is very sensitive to small translations (see this article) [1]. By adding random translations, validation accuracy should get better.

[1] However, this behaviour has improved since the new pretrained model was uploaded.
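A minimal sketch of such a random-translation augmentation on a grayscale word image (illustrative only; the helper name, shift range, and white padding value are assumptions, not SimpleHTR's actual preprocessing code):

import random
import numpy as np

def random_translate(img, max_shift=3):
    # shift a grayscale image (H x W uint8 array) by up to max_shift pixels
    # in x and y, padding the uncovered region with white (255)
    dx = random.randint(-max_shift, max_shift)
    dy = random.randint(-max_shift, max_shift)
    h, w = img.shape
    out = np.full_like(img, 255)
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0], max(0, dx):max(0, dx) + src.shape[1]] = src
    return out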

Chazzz commented Jan 12, 2019

Nice article, by the way. My results show that even with more layers the model only overfits by about 5% (i.e. the gap between training and validation accuracy stays around 5%, even with data augmentation off!), and accuracy takes about a 1% hit when data augmentation is turned off. If anything the model overfits too little: by not fitting the training set closely enough, it effectively underfits the test set. He et al., 2015 demonstrated that increasing the number of layers is not sufficient to guarantee overfitting, and I would expect their results to apply to SimpleHTR as well.

RajPratim21 commented Mar 4, 2019

@Chazzz can you give me a rough idea of how long it took to train the system, along with your hardware details? I am planning to apply a range of image augmentations such as translation, Gaussian noise, and random cropping to make the model more robust.
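For illustration, the noise and cropping augmentations mentioned above could be sketched like this (function names and parameter values are assumptions, not code from this repository):

import random
import numpy as np

def add_gaussian_noise(img, sigma=10.0):
    # add zero-mean Gaussian noise and clip back to the valid grayscale range
    noisy = img.astype(np.float32) + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def random_crop(img, max_crop=3):
    # crop up to max_crop pixels from each side; the result can be resized
    # back to the model's input size afterwards
    h, w = img.shape
    top, left = random.randint(0, max_crop), random.randint(0, max_crop)
    bottom, right = h - random.randint(0, max_crop), w - random.randint(0, max_crop)
    return img[top:bottom, left:right]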

Chazzz commented Mar 4, 2019

Hi @RajPratim21, I trained the above on a GTX 980 Ti; as shown in the graph in my initial post, training took between 40 and 80 minutes. Let me know if there are other system details that would be of interest.

@jevinruv

> TensorBoard plus a bunch of hooks that aren't committed anywhere.

Would it be possible to share the code for the TensorBoard integration? Thanks!

Chazzz commented Mar 25, 2019

@jevinruv Let me check, it should be possible.

@jevinruv

> @jevinruv Let me check, it should be possible.

Thank you, looking forward to it!

Chazzz commented Mar 27, 2019

@jevinruv
