
How to develop beam search with a set of predefined responses? #81

Closed
amirj opened this issue Mar 19, 2017 · 8 comments

Comments

@amirj
Contributor

amirj commented Mar 19, 2017

In the original SmartReply paper:

First, the elements of R (possible set of responses) are organized into a trie. Then, we conduct a left-to-right beam search, but only retain hypotheses that appear in the trie. This search process has complexity O(bl) for beam size b and maximum response length l. Both b and l are typically in the range of 10-30, so this method dramatically reduces the time to find the top responses and is a critical element of making this system deployable.

Would you please consider this feature and add a detailed task list to this issue for interested contributors?
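
A minimal sketch of the trie-constrained search described in the quote (all names here are hypothetical, not part of this repo; `log_prob_fn` stands in for one decoder step of the model):

```python
def build_trie(responses):
    """Build a nested-dict trie over the tokenized candidate responses R."""
    root = {}
    for tokens in responses:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node["<eos>"] = {}  # marks the end of a complete response
    return root

def trie_beam_search(log_prob_fn, trie, beam_size, max_len):
    """Left-to-right beam search that only retains hypotheses in the trie.

    log_prob_fn(prefix, token) -> the model's log-probability of `token`
    given the decoded `prefix` (an assumed interface).
    """
    beam = [([], 0.0, trie)]  # (prefix, score, current trie node)
    completed = []
    for _ in range(max_len):
        candidates = []
        for prefix, score, node in beam:
            for tok, child in node.items():  # only trie-valid extensions
                new_score = score + log_prob_fn(prefix, tok)
                if tok == "<eos>":
                    completed.append((prefix, new_score))
                else:
                    candidates.append((prefix + [tok], new_score, child))
        beam = sorted(candidates, key=lambda c: -c[1])[:beam_size]
        if not beam:
            break
    return sorted(completed, key=lambda c: -c[1])
```

Each step expands at most b hypotheses through their trie children only, which gives the O(bl) complexity quoted above.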

@chenghuige

chenghuige commented Mar 19, 2017

I agree with @amirj that this is an interesting feature, and we might add more features to improve decoding speed.
I don't think trie-based beam search will improve speed much in the general case. It is mostly used for selecting responses from a candidate set, rather than generating a sequence by choosing one word from the full vocabulary at each step, so it ensures response quality; the performance improvement is relative to scoring every candidate and then sorting to pick the highest-probability responses.
So it is useful in a specific setting, and it is hard to implement with in-graph beam search (we may need to write a C++ op to generate candidate words at each step?); an out-of-graph approach like the one im2txt uses might be simpler.
For improving decoding and beam search speed, vocabulary size is the key factor, since something like the following is costly for a large vocabulary, especially the xw_plus_b projection:

```python
# Project the decoder output onto the full vocabulary, then normalize.
logits = tf.nn.xw_plus_b(output, self.w, self.v)  # [batch, vocab_size]
logprobs = tf.nn.log_softmax(logits)
```

You have to compute an output for every word in the vocabulary to get the final probabilities.
For training, TensorFlow provides sampled_softmax_loss, which improves performance a lot.
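
A rough sketch of that training-time alternative (TF 1.x API; the shapes and variable names are assumptions, not this repo's code). Note that sampled_softmax_loss expects weights of shape [vocab, dim], the transpose of the xw_plus_b projection above:

```python
import tensorflow as tf

vocab_size, hidden_dim = 50000, 512
w_t = tf.get_variable("proj_w_t", [vocab_size, hidden_dim])  # transposed layout
b = tf.get_variable("proj_b", [vocab_size])

decoder_output = tf.placeholder(tf.float32, [None, hidden_dim])
target_ids = tf.placeholder(tf.int64, [None, 1])

# Only num_sampled classes are scored per step instead of the full vocabulary.
loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
    weights=w_t, biases=b,
    labels=target_ids, inputs=decoder_output,
    num_sampled=512, num_classes=vocab_size))
```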
But for decoding, TensorFlow seems to lack the techniques mentioned in "On Using Very Large Target Vocabulary for Neural Machine Translation"; you still have to compute logits for every word in the vocabulary.
With a trie-based method you could consider only a small vocabulary at each step, but then you cannot get the exact probability at each step.
Another approach is self-normalization, which avoids the full-vocabulary computation at inference time, but it requires changing the training cost function:
http://sebastianruder.com/word-embeddings-softmax/index.html#selfnormalisation
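
A sketch of the self-normalization idea from that link (the alpha * (log Z)^2 penalty form is an assumption based on the post, and the snippet reuses the output, w, v names from the projection above): training pushes log Z toward 0, so at decode time raw logits approximate log-probabilities and the full softmax can be skipped:

```python
import tensorflow as tf

alpha = 0.1  # penalty weight, a hyperparameter

logits = tf.nn.xw_plus_b(output, w, v)        # [batch, vocab_size]
log_z = tf.reduce_logsumexp(logits, axis=-1)  # log partition function Z
# target_ids: [batch] int word ids
xent = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=target_ids, logits=logits)
# Standard cross-entropy plus a term that keeps log Z near zero.
loss = tf.reduce_mean(xent + alpha * tf.square(log_z))
```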

@dennybritz
Contributor

But for decoding, TensorFlow seems to lack the techniques mentioned in "On Using Very Large Target Vocabulary for Neural Machine Translation"; you still have to compute logits for every word in the vocabulary.

I think the vocabulary issue has kind of been "solved" by using subword units / word pieces, for example.

You typically only need 16k-32k word pieces to cover almost all of the vocabulary.

As for the original question: it's an interesting feature, but it doesn't seem very common to me, since it needs a set of predefined responses, which you have for very few tasks. It seems very specific to response retrieval (not generation). I think a large refactoring of the beam search may be necessary to support this.
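
A hypothetical illustration of the subword approach with the sentencepiece library (an assumption for illustration, not this repo's tooling): a ~32k-piece vocabulary covers an effectively open word vocabulary, since unknown words decompose into known pieces:

```python
import sentencepiece as spm

# Train a 32k-piece subword model on a raw text corpus.
spm.SentencePieceTrainer.Train(
    "--input=corpus.txt --model_prefix=subword --vocab_size=32000")

sp = spm.SentencePieceProcessor()
sp.Load("subword.model")
print(sp.EncodeAsPieces("unrecognized words decompose into known pieces"))
```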

@amirj
Contributor Author

amirj commented Mar 24, 2017

@dennybritz Thank you.

@amirj amirj closed this as completed Mar 24, 2017
@amirj
Contributor Author

amirj commented Apr 19, 2017

So it is useful in a specific setting, and it is hard to implement with in-graph beam search (we may need to write a C++ op to generate candidate words at each step?); an out-of-graph approach like the one im2txt uses might be simpler.

@chenghuige Would you please elaborate on how to develop this out of the graph? Does im2txt leverage an out-of-graph approach?

@chenghuige

@amirj In im2txt/inference_utils/caption_generator.py, im2txt does out-of-graph beam search, calling sess.run() for each step:

```python
softmax, new_states, metadata = self.model.inference_step(sess,
                                                          input_feed,
                                                          state_feed)
```

But I think transferring the softmax (a large tensor) out of the graph at each step might make inference really slow. In any case, this way you can fully control the process and do beam search with a trie.
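
A sketch of how that out-of-graph loop can apply the trie filter in Python (model.inference_step is the hypothetical per-step interface above; start_id, trie_root, and the other names are placeholders, with the trie keyed by token ids):

```python
import numpy as np

beam = [([start_id], initial_state, 0.0, trie_root)]
for _ in range(max_len):
    input_feed = np.array([tokens[-1] for tokens, _, _, _ in beam])
    state_feed = np.array([state for _, state, _, _ in beam])
    softmax, new_states, _ = model.inference_step(sess, input_feed, state_feed)

    candidates = []
    for i, (tokens, _, score, node) in enumerate(beam):
        for tok_id, child in node.items():  # trie-valid next words only
            candidates.append((tokens + [tok_id],
                               new_states[i],
                               score + np.log(softmax[i][tok_id]),
                               child))
    beam = sorted(candidates, key=lambda c: -c[2])[:beam_size]
```

Since only the trie-valid entries of each softmax row are used, having the graph return just those few logits instead of the full distribution would address the transfer cost mentioned above.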

@nikita-solvvy

I have a similar feature request (tensorflow/tensorflow#11602). At first glance it might seem very specific, but I think it's very useful to be able to determine scores for various output sequences.
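
A sketch of scoring a fixed output sequence by forced decoding, using the same hypothetical inference_step interface as above: feed the candidate's own previous token at each step and sum the log-probabilities the model assigns to its next token:

```python
import numpy as np

def score_sequence(sess, model, initial_state, token_ids):
    """Return the model's total log-probability of `token_ids`."""
    state, total = initial_state, 0.0
    for prev, cur in zip(token_ids[:-1], token_ids[1:]):
        softmax, new_states, _ = model.inference_step(sess, [prev], [state])
        state = new_states[0]
        total += np.log(softmax[0][cur])
    return total
```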

@whiskyboy

Have you solved this problem? I ran into the same problem and have no idea how to put a trie data structure into TensorFlow.

@yuanzhigang10

Have you solved this problem? I also ran into the same problem and have no idea how to put a trie data structure into TensorFlow.
