[SOLVED] what is the handling between RNN and CTC in your model? #2

Closed · ghost opened this issue Oct 16, 2018 · 1 comment

ghost commented Oct 16, 2018

Hi stardut,

UPDATE: SOLVED. See next comment.

In your code at https://github.com/stardut/ctc-ocr-tensorflow/blob/master/model.py
and https://github.com/stardut/ctc-ocr-tensorflow/blob/master/train.py
I have a few questions to check my understanding.
I am trying to reimplement your example in C++ with cuDNN;
see: https://devtalk.nvidia.com/default/topic/1027434/cudnn/ctc-connectionist-temporal-classification-example-code/post/5289703/#5289703

I think the operation you apply after the RNN is a fully-connected layer? If not, what operation do you perform before the CTC loss, and why?

What I understood is:
image_width = 28
image_height = 28
self.inputs = [64 28 224] [batch_size, max_time=image_height, 8*image_width]
self.seq_len = [64] is an array in which every entry has the value 224
tf.nn.dynamic_rnn outputs [batch_size, max_time, cell.output_size] => [64 28 256] (3D)
then reshaped to [batch_size * max_time, cell.output_size] => [1792 256] (2D)
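
Here is how I read those lines as one runnable snippet (TF 1.x; the GRU cell type and placeholder setup are my guesses, but any cell with num_units=256 gives the same shapes):

import tensorflow as tf

batch_size, max_time, feat = 64, 28, 224   # feat = 8 * image_width
num_units = 256

inputs = tf.placeholder(tf.float32, [batch_size, max_time, feat])
seq_len = tf.placeholder(tf.int32, [batch_size])

cell = tf.nn.rnn_cell.GRUCell(num_units)
outputs, _ = tf.nn.dynamic_rnn(cell, inputs, sequence_length=seq_len,
                               dtype=tf.float32)  # [64, 28, 256]
h_state = tf.reshape(outputs, [-1, num_units])    # [1792, 256]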

w = tf.Variable(tf.truncated_normal([self.num_units, self.num_class], stddev=0.1))
Shape [256 11] (2D): the weights for the fully-connected layer.

b = tf.constant(0.1, dtype=tf.float32, shape=[self.num_class])
This is the bias, added to the outputs of the fully-connected layer.

logits = tf.matmul(h_state, w) + b
[1792 256] * [256 11] => [1792 11]
Matrix multiplication for the fully-connected layer: input_x * weights + bias. I also found this in layer_fcbr.

logits = tf.reshape(logits, [self.batch_size, -1, self.num_class])
[1792 11] => [64 ? 11]
Since 1792/64 = 28, this is [64 28 11]
=> [self.batch_size, 28, self.num_class]

In another implementation I found, this was done:
logits = tf.reshape(net, [-1, rnn_seq_length, num_classes])
So 28 here is obviously the RNN seq length.

self.logits = tf.transpose(logits, (1, 0, 2))
=> [RNN seq length, self.batch_size, self.num_class] [28 64 11]
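
Continuing my snippet from above, the whole projection between the RNN and the CTC loss then collapses to this (my reading of the repo, with num_class = 11):

num_class = 11
w = tf.Variable(tf.truncated_normal([num_units, num_class], stddev=0.1))
b = tf.constant(0.1, dtype=tf.float32, shape=[num_class])
logits = tf.matmul(h_state, w) + b                        # [1792, 11]
logits = tf.reshape(logits, [batch_size, -1, num_class])  # [64, 28, 11]
logits = tf.transpose(logits, (1, 0, 2))                  # [28, 64, 11], time-major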

So you define a fully-connected layer with 28x256 = 7168 inputs and 28x11 = 308 outputs, using 7168x308 = 2,207,744 weight connections (inputs x outputs).

The softmax activation is then applied inside tf.nn.ctc_loss.
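
So the loss step would look roughly like this (a sketch: tf.nn.ctc_loss takes unnormalized logits, treats them as time-major by default, and reserves the last of the num_class classes as the CTC blank):

labels = tf.sparse_placeholder(tf.int32)  # ground truth as a SparseTensor
# the third argument is the number of logit frames per batch element
loss = tf.reduce_mean(tf.nn.ctc_loss(labels, logits, seq_len))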

So my network (in C++, using cuDNN on a GPU) looks like this:
three main layers; batch_size=64, word_size=1:

0. RNN TANH      (I=  28,O=7168) layers=1  hidden=256  seq=28   OUT=width=256 height=28
                            transposed=1  (time major=false)
1. FullyConnected(I=7168,O= 308) neurons=2207744  in= 7168  out=  308    
                            width in=256 height in=28=seq len T 
                            width out=11 =A(=Alphabet)  height out=28=seq len T   activation=SOFTMAX 
2. CTC Loss      (I= 308,O=   1) labels=1   
                              width in=11 height in=28=seq len T 
                              width out=1 height out=1   deterministic=1

I=inputs O=outputs

Current results:
RNN: TANH, learning rate = 0.001:

 iteration=    1490   CTCLoss=786.307678   MaxGradient=0.000119
 iteration=    4150   CTCLoss=779.627380   MaxGradient=0.000180
 iteration=    6770   CTCLoss=773.462158   MaxGradient=0.000181
 iteration=   10020   CTCLoss=713.224548   MaxGradient=0.000182
 iteration=   12430   CTCLoss=738.838501   MaxGradient=0.000183
 iteration=   16970   CTCLoss=743.546448   MaxGradient=0.000183
 iteration=   21460   CTCLoss=752.958435   MaxGradient=0.000183
 iteration=   28110   CTCLoss=776.493713   MaxGradient=0.000122
 iteration=   32380   CTCLoss=787.156616   MaxGradient=0.000122
 iteration=   40260   CTCLoss=797.081116   MaxGradient=0.000183
 iteration=   50920   CTCLoss=733.585999   MaxGradient=0.000122 
 iteration=   60120   CTCLoss=807.688660   MaxGradient=0.000183

Unfortunately, the loss increases...

RNN: TANH, learning rate = 0.0001:

iteration=     100   CTCLoss=91.204422   MaxGradient=0.0000000
iteration=    1100   CTCLoss=334.184631   MaxGradient=0.000000
iteration=    2650   CTCLoss=373.010071   MaxGradient=0.000000
iteration=    4380   CTCLoss=389.112732   MaxGradient=0.000000
iteration=    6200   CTCLoss=400.843048   MaxGradient=0.000020
iteration=   10240   CTCLoss=382.037598   MaxGradient=0.000018
iteration=   14200   CTCLoss=428.568726   MaxGradient=0.000100
iteration=   16620   CTCLoss=451.009247   MaxGradient=0.000229
iteration=   19850   CTCLoss=765.182373   MaxGradient=0.000183  
iteration=   26500   CTCLoss=779.201782   MaxGradient=0.000183
iteration=   37600   CTCLoss=780.723083   MaxGradient=0.000183
iteration=   49220   CTCLoss=783.665039   MaxGradient=0.000183 

Unfortunately, the loss increases...

LSTM, learning rate = 0.001:

iteration=    7380   CTCLoss=905.169983   MaxGradient=0.000183
iteration=   21060   CTCLoss=871.665466   MaxGradient=0.000183
iteration=   27840   CTCLoss=875.463440   MaxGradient=0.000183
iteration=   31730   CTCLoss=877.157898   MaxGradient=0.000183
iteration=   40160   CTCLoss=878.621460   MaxGradient=0.000183

The loss remains around 877...

Learning rate = 0.01:
iteration= 130 CTCLoss=inf MaxGradient=1.00000


When I run your Python code with word_size=1 and learning rate 0.001, I get:

train step: 1, word error: 9.046875, loss: 54.338867
train step: 2, word error: 6.546875, loss: 43.808159
train step: 3, word error: 5.015625, loss: 33.899433
train step: 4, word error: 4.031250, loss: 24.553177
train step: 5, word error: 3.718750, loss: 21.032534
train step: 6, word error: 2.984375, loss: 17.196867
train step: 7, word error: 2.687500, loss: 15.107779
train step: 8, word error: 2.203125, loss: 13.060326
train step: 9, word error: 1.765625, loss: 11.079235
train step: 10, word error: 1.828125, loss: 9.643269
train step: 11, word error: 1.812500, loss: 8.474516
train step: 12, word error: 1.796875, loss: 7.690924

Unfortunately I cannot run that many iterations, because TensorFlow is really slow on my system.

Thank you very much!


ghost commented Oct 18, 2018

ghost closed this as completed Oct 18, 2018
ghost changed the title from "what is the handling between RNN and CTC in your model?" to "[SOLVED] what is the handling between RNN and CTC in your model?" Oct 18, 2018