
fix: 🐛 SAR decoder indices #578

Closed · wants to merge 2 commits

Conversation

@khalidMindee (Contributor) commented Nov 4, 2021

Fixed: indices ranging from 0 to vocab_size + 1 imply a one-hot vector embedding with a depth of vocab_size + 2.

@fg-mindee self-requested a review November 4, 2021 12:04
@fg-mindee self-assigned this Nov 4, 2021
@fg-mindee added the "type: bug" (Something isn't working) and "module: models" (Related to doctr.models) labels Nov 4, 2021
@fg-mindee added this to the 0.4.1 milestone Nov 4, 2021
@fg-mindee changed the title from "fix: 🐛 indices starting from 0 to vocab_size+1 means a one hot vector embedding with depth of v…" to "fix: 🐛 SAR decoder indices" Nov 4, 2021
@fg-mindee (Contributor)

Hey there 👋

If that's indeed an issue, we'll need to fix the PyTorch implementation as well!
However, the loop has an extra iteration for the SOS symbol, so I'd like to check with @charlesmindee on this one!

@fg-mindee added the "framework: tensorflow" (Related to TensorFlow backend) and "topic: text recognition" (Related to the task of text recognition) labels Nov 4, 2021
@fg-mindee (Contributor) left a comment

Thanks! Mind doing the same for PyTorch as well please?

- embeded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 1), **kwargs)
+ embeded_symbol = self.embed(tf.one_hot(symbol, depth=self.vocab_size + 2), **kwargs)
@fg-mindee (Contributor)

Would you mind changing the PyTorch implementation as well please? 🙏

@charlesmindee (Collaborator) commented Nov 5, 2021

Hi @khalidMindee,

I am not sure this is an issue; in the paper they mention this for the embedding of the hidden state:

The outputs are computed by the following transformation:

y_t = φ(h'_t, g_t) = softmax(W_o [h'_t; g_t])

where h'_t is the current hidden state and g_t is the output of the attention module.
W_o is a linear transformation, which embeds features into the output space of 94 classes,
corresponding to 10 digits, 52 case-sensitive letters, 31 punctuation characters, and an “END” token.

Which means the embedding has a size of len(vocab) + 1 (for EOS, whose index is len(vocab)), and the start symbol (index len(vocab) + 1) does not seem to be embedded.

Am I missing something here 🤔?

@fg-mindee (Contributor)

For clarification, let me elaborate a bit because there are a few things to discuss:

  • the symbol is initialized at self.vocab_size + 1. When we switch to one-hot, if depth=self.vocab_size + 1, TF accepts the out-of-bounds index and leaves the whole one-hot vector as zeros (see the sketch after this list). The question in this PR is: is that on purpose @charlesmindee?
  • should we include SOS, or any extra token? According to the paper, I'd say no
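
For reference, here is a minimal TensorFlow sketch of the behaviour described above (toy vocab_size, not doctr code): with depth=vocab_size + 1, the index vocab_size + 1 is out of bounds and tf.one_hot silently returns an all-zero vector, whereas depth=vocab_size + 2 gives it a dedicated slot.

import tensorflow as tf

vocab_size = 4                                   # toy value for illustration
symbol = tf.constant(vocab_size + 1)             # index used to initialize the decoding symbol

# out-of-bounds index: tf.one_hot does not raise, it returns all zeros
print(tf.one_hot(symbol, depth=vocab_size + 1).numpy())  # [0. 0. 0. 0. 0.]
# with one extra slot, the same index maps to its own (SOS-like) position
print(tf.one_hot(symbol, depth=vocab_size + 2).numpy())  # [0. 0. 0. 0. 0. 1.]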

@charlesmindee (Collaborator)

I see your point @fg-mindee, we can indeed switch to a depth of vocab_size + 2, but I think we also need to modify it here:

self.embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, self.vocab_size + 1))

And if we make this switch, we also need to retrain the model, since it modifies this dense layer, or am I mistaken?
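
As a rough illustration of the retraining point (a hypothetical sketch with toy values, not the actual doctr code): the kernel of the embedding dense layer has shape (one_hot_depth, embedding_units), so bumping the one-hot depth to vocab_size + 2 changes the weight shape and a pretrained checkpoint would no longer match.

import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embedding_units = 4, 8               # toy values for illustration
embed = layers.Dense(embedding_units, use_bias=False, input_shape=(None, vocab_size + 2))

symbol = tf.constant([[vocab_size + 1]])         # batch of one SOS index
embedded = embed(tf.one_hot(symbol, depth=vocab_size + 2))
print(embed.kernel.shape)                        # (6, 8) -> depends on the one-hot depth
print(embedded.shape)                            # (1, 1, 8)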

@fg-mindee (Contributor)

Well we have two options:

  • not changing the dense layer, and only initializing the symbol with self.vocab_size (i.e. with EOS), keeping the one-hot the same way. This is the best option I guess 👈 (sketched below), but it assumes the symbol is meant to be initialized with EOS (if it's supposed to be initialized with no token at all, we need to keep it as is)
  • changing the dense layer, but this effectively adds a class (SOS I guess) and differs from the paper, so I don't think that's the solution

We need to check how the symbol is meant to be initialized, and act accordingly :)
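
A minimal sketch of the first option (hypothetical, toy values, not the actual doctr code): initializing the symbol with the EOS index (self.vocab_size) keeps it within a one-hot of depth vocab_size + 1, so the dense layer and the pretrained weights stay untouched.

import tensorflow as tf

vocab_size = 4                                   # toy value for illustration
symbol = tf.constant(vocab_size)                 # EOS index instead of vocab_size + 1
print(tf.one_hot(symbol, depth=vocab_size + 1).numpy())  # [0. 0. 0. 0. 1.] -> the EOS slot is set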

@charlesmindee (Collaborator) commented Nov 5, 2021

This is what the paper mentions for the SOS symbol:

The encoder and decoder do not share parameters. Initially, the holistic feature hW is fed into the decoder LSTM at time step 0. Then a “START” token is input into the LSTM at step 1. From step 2, the output of the previous step is fed into the LSTM until the “END” token is received. All the LSTM inputs are represented by one-hot vectors, followed by a linear transformation Ψ(). During training, the inputs of the decoder LSTMs are replaced by the ground-truth character sequence. The outputs are computed by the following transformation: y_t = φ(h'_t, g_t) = softmax(W_o [h'_t; g_t]) (1), where h'_t is the current hidden state and g_t is the output of the attention module. W_o is a linear transformation, which embeds features into the output space of 94 classes, corresponding to 10 digits, 52 case-sensitive letters, 31 punctuation characters, and an “END” token.

@fg-mindee (Contributor)

So actually, we should just make the comment in the code more specific, but the code itself should stay the same. Or did I miss something?

@fg-mindee (Contributor)

So actually, we should just make the comment in the code more specific, but the code itself should stay the same. Or did I miss something?

@charlesmindee? (just to know whether we should close the PR, or iterate on it before the 0.4.1 release)

@fg-mindee (Contributor)

@charlesmindee do you think we should close this PR?

@charlesmindee (Collaborator)

I think we should keep it this way since it seems to stick to the paper's description, but again I may be mistaken here.
If you agree, we can indeed close this PR.

@fg-mindee (Contributor)

Alright @khalidMindee, would you mind editing the comment above this line to specify that this is on purpose, so that the one-hot doesn't have any non-zero values? (same in PyTorch if possible)

Or do you prefer that we close this PR and handle this on our own?

@khalidMindee (Contributor, Author)

Yes, no problem with closing the PR.

@fg-mindee (Contributor)

Yes, no problem with closing the PR.

The question was more about the other way: is it OK for you to edit this PR and adapt the comment instead? 😅

@fg-mindee (Contributor)

Closing this in favour of #617

@fg-mindee closed this Nov 12, 2021
@fg-mindee deleted the fix_SARDecoder_embedding branch November 12, 2021 16:35