
lstm + ctc for mnist #2

Closed
anxingle opened this issue Aug 28, 2016 · 16 comments

@anxingle

Hi, igormq. It was very helpful to read your blog post about CTC in TensorFlow. Thanks a million. But I have some confusion about the CTC module:
1. If the sequence is A B B * B * B (* is blank), tf.ctc.ctc_greedy_decoder() should return A B B B, but the docs say the result is A B if merge_repeated=True.
2. My code uses an LSTM to classify MNIST data, with just one layer and 28 time steps, but the CTC loss doesn't work at all. Can you help me define the right call style? The code is very simple; I promise you'll get it as soon as you see it.
Thanks again.

@anxingle
Author

I wrote it just as your code shows, and it works well if I comment out the CTC functions. I really don't know what's wrong with it.

@igormq
Owner

igormq commented Aug 29, 2016

Thank you @anxingle, I'm very glad that you liked my post. Answering your questions:

  1. Yes, you are absolutely right; that is the default behavior of TensorFlow's implementation. But in Graves' thesis he writes that you first have to merge the repeated labels and only then remove the blank labels, as we can see on page 57 of his thesis (see the sketch after this list). I have no clue why the TensorFlow team implemented it that way.
  2. I read your code, but it would be better if you sent me your error log and your code with the CTC implementation (not as a comment), because in your code I didn't see the seq_len placeholder or the sparse placeholder for y. Could you do that?
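
To make the difference concrete, here is a tiny sketch of the two decoding rules (my own illustration, not code from the post or from TensorFlow):

    def graves_collapse(path, blank='*'):
        # Graves (thesis, p. 57): merge repeated labels first, then drop blanks.
        out, prev = [], None
        for label in path:
            if label != prev and label != blank:
                out.append(label)
            prev = label
        return out

    def tf_merge_repeated(path, blank='*'):
        # The behavior described in the TensorFlow docs for merge_repeated=True:
        # drop blanks first, then merge any adjacent repeats that remain.
        no_blanks = [label for label in path if label != blank]
        return [label for i, label in enumerate(no_blanks)
                if i == 0 or label != no_blanks[i - 1]]

    print(graves_collapse(list('ABB*B*B')))    # ['A', 'B', 'B', 'B']
    print(tf_merge_repeated(list('ABB*B*B')))  # ['A', 'B']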

@anxingle
Author

Thank you very much. I will do what you suggested as soon as I can.

@anxingle
Author

I've added the entire code and the error log (error.txt).

@anxingle
Author

I tried tf.int64.

@igormq
Owner

igormq commented Aug 29, 2016

Could you send me your dataset?

@anxingle
Author

I have pushed the MNIST dataset into the data directory; you can just git clone the repository.
I am really grateful to you.

@anxingle
Author

It took almost an hour. Thanks, GFW.

@igormq
Owner

igormq commented Aug 29, 2016

Why are you trying to use CTC as a cost function? CTC is used when you don't have an alignment between your input and output and/or when the output length varies across samples. So for a one-to-one relationship (like one image, one digit), CTC probably isn't the best solution for you. But if you intend to use this code for continuous handwriting recognition, CTC will work better. I'm looking at your code and making some changes; I'll give you feedback as soon as possible, ok?

@anxingle
Author

Thank you for your reply. But in this code I have 28 inputs, so it's a problem of many inputs (maybe later I'll add multiple labels) mapping to one label. My senior implemented a multi-label recognition framework with mxnet warpctc, and he told me it should be the best solution.
So nice!

@igormq
Owner

igormq commented Aug 29, 2016

Yes, but CTC only works when there is more than one label. I'll show you working code, but I don't think CTC will outperform the softmax layer for this example.

@anxingle
Author

Got it! I'll change to another dataset!

@igormq
Owner

igormq commented Aug 29, 2016

I made a working version and put it in a gist. Your major issue was the sparse placeholder and the sequence length placeholder. The targets required by CTC must not be one-hot encoded; you must provide them as plain labels, and you must feed the sparse placeholder with a tuple of (indices, values, shape) (which is generated by sparse_tuple_from). In the case of MNIST, for a batch you will have a target like:

    y = (
        [[0, 0], [1, 0], [2, 0], ..., [batch_size-1, 0]],    # indices
        [label_1, label_2, label_3, ..., label_batch_size],  # values
        [batch_size, 1]                                      # shape
    )

And the seq_len placeholder tells the run the length of each sample in the batch; for MNIST, the network is fed 28 inputs of length 28, so:

    seq_len = [28 for _ in xrange(batch_size)]
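
For reference, here is a minimal sketch of what a sparse_tuple_from helper can look like (my own reconstruction; the version in the gist may differ):

    import numpy as np

    def sparse_tuple_from(sequences, dtype=np.int32):
        # Build the (indices, values, shape) tuple expected by a
        # tf.sparse_placeholder from a list of label sequences.
        indices, values = [], []
        for n, seq in enumerate(sequences):
            indices.extend(zip([n] * len(seq), range(len(seq))))
            values.extend(seq)
        indices = np.asarray(indices, dtype=np.int64)
        values = np.asarray(values, dtype=dtype)
        shape = np.asarray([len(sequences), indices[:, 1].max() + 1],
                           dtype=np.int64)
        return indices, values, shape

    # Single-digit MNIST targets, e.g. labels 3, 7 and 1:
    # sparse_tuple_from([[3], [7], [1]]) ->
    #   indices [[0, 0], [1, 0], [2, 0]], values [3, 7, 1], shape [3, 1]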

I hope this helps. If you have any questions, I'll be happy to answer them.

@igormq
Owner

igormq commented Aug 29, 2016

You can use this dataset, whose images have more than one digit, where the number of digits differs from image to image. CTC may work better with this dataset.

@anxingle
Author

I don't even know how to express my appreciation! Thanks a lot.

@igormq
Owner

igormq commented Aug 30, 2016

You're welcome. If you have any questions, please feel free to ask.

@igormq igormq closed this as completed Aug 30, 2016