ghost changed the title from "what is the handling between RNN and CTC in your model?" to "[SOLVED] what is the handling between RNN and CTC in your model?" on Oct 18, 2018
Hi Stardut,
UPDATE: SOLVED. See the next comment.
In your code at https://github.com/stardut/ctc-ocr-tensorflow/blob/master/model.py
and https://github.com/stardut/ctc-ocr-tensorflow/blob/master/train.py
I have a few questions of understanding.
I am trying to implement your sample in C++ with cuDNN,
see: https://devtalk.nvidia.com/default/topic/1027434/cudnn/ctc-connectionist-temporal-classification-example-code/post/5289703/#5289703
I think the operation you apply after the RNN is a fully-connected layer? If not, which operation do you apply before the CTC loss, and why?
What I understood is:
image_width = 28
image_height = 28
self.inputs = [64 28 224] = [batch_size, max_time = image_height, 8 * image_width]
self.seq_len = [64] is an array in which every entry has the value 224.
tf.nn.dynamic_rnn outputs [batch_size, max_time, cell.output_size] => [64 28 256], 3D,
which is then reshaped to [batch_size * max_time, cell.output_size] = [1792 256], 2D.
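As a sanity check on the shapes, here is a minimal sketch with dummy values (my own illustration, not your actual code):

```python
import numpy as np

# dummy RNN output: [batch_size, max_time, cell.output_size]
outputs = np.zeros((64, 28, 256))
# flatten batch and time together, so one weight matrix
# can later be applied to every time step of every example
h_state = outputs.reshape(-1, 256)
print(h_state.shape)  # (1792, 256)
```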
w = tf.Variable(tf.truncated_normal([self.num_units, self.num_class], stddev=0.1))
[256 11], 2D: the weights for the fully-connected layer.
b = tf.constant(0.1, dtype=tf.float32, shape=[self.num_class])
This is the bias, added to the outputs of the fully-connected layer.
logits = tf.matmul(h_state, w) + b
[1792 256] * [256 11] => [1792 11]
Matrix multiplication for the fully-connected layer: input_x * weights + bias. I also found this here: layer_fcbr
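Again as a dummy shape check (a sketch under my assumptions, not your code):

```python
import numpy as np

# fully-connected projection applied to all 1792 = 64 * 28 rows at once
h_state = np.zeros((1792, 256))   # [batch_size * max_time, num_units]
w = np.zeros((256, 11))           # [num_units, num_class]
b = np.zeros(11)                  # [num_class], broadcast over the rows
logits = h_state @ w + b
print(logits.shape)  # (1792, 11)
```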
logits = tf.reshape(logits, [self.batch_size, -1, self.num_class])
[1792 11] => [64 ? 11]
1792 / 64 = 28, so => [64 28 11]
=> [self.batch_size, 28, self.num_class]
In another implementation here this was done:
logits = tf.reshape(net, [-1, rnn_seq_length, num_classes])
So the 28 here is obviously the RNN sequence length.
self.logits = tf.transpose(logits, (1, 0, 2))
=> [RNN seq length, self.batch_size, self.num_class] = [28 64 11]
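The reshape back to 3-D and the transpose to time-major, once more as a dummy shape check (my own sketch):

```python
import numpy as np

logits = np.zeros((1792, 11))
logits = logits.reshape(64, -1, 11)   # [batch_size, max_time, num_class] = (64, 28, 11)
logits = logits.transpose(1, 0, 2)    # [max_time, batch_size, num_class] = (28, 64, 11)
print(logits.shape)  # (28, 64, 11)
```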
So you define a fully-connected layer which has 28 x 256 = 7168 inputs and 28 x 11 = 308 outputs, using 7168 x 308 = 2,207,744 weight connections (inputs * outputs).
The softmax activation is then done inside tf.nn.ctc_loss.
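So, as I understand it, the loss step looks roughly like this in TF 1.x (a hedged sketch; the placeholder names are mine, not from model.py):

```python
import tensorflow as tf  # TF 1.x API

# time-major, pre-softmax logits: [max_time, batch_size, num_class]
logits = tf.placeholder(tf.float32, [28, 64, 11])
labels = tf.sparse_placeholder(tf.int32)    # ground-truth label indices
seq_len = tf.placeholder(tf.int32, [64])    # per-example sequence lengths

# tf.nn.ctc_loss applies the softmax internally, so logits stay unscaled
loss = tf.reduce_mean(tf.nn.ctc_loss(labels, logits, seq_len, time_major=True))
```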
So my network (in C++ using cuDNN on a GPU) looks like this:
3 main layers; batch_size=64, word_size=1; I=inputs, O=outputs:
0. RNN TANH (I=28, O=7168) layers=1 hidden=256 seq=28 OUT: width=256 height=28
   transposed=1 (time_major=false)
1. FullyConnected (I=7168, O=308) neurons=2,207,744 in=7168 out=308
   width in=256, height in=28 = seq len T
   width out=11 = A (= alphabet), height out=28 = seq len T, activation=SOFTMAX
2. CTC Loss (I=308, O=1) labels=1
   width in=11, height in=28 = seq len T
   width out=1, height out=1, deterministic=1
Current results:
RNN:TANH, learning rate 0.001: the loss unfortunately increases...
RNN:TANH, learning rate 0.0001: the loss unfortunately increases...
LSTM, learning rate 0.001: the loss remains around 877...
Learning rate 0.01: iteration=130 CTCLoss=inf MaxGradient=1.00000
When I run your Python code with word_size=1 and learning rate 0.001 I get:
Unfortunately I cannot run that many iterations, because TensorFlow is really slow on my system.
Thank you very much!