
How to develop beam search with a set of predefined responses? #81

Closed
amirj opened this issue Mar 19, 2017 · 8 comments

Comments

@amirj
Contributor

amirj commented Mar 19, 2017

In the original SmartReply paper:

First, the elements of R (possible set of responses) are organized into a trie. Then, we conduct a left-to-right beam search, but only retain hypotheses that appear in the trie. This search process has complexity O(bl) for beam size b and maximum response length l. Both b and l are typically in the range of 10-30, so this method dramatically reduces the time to find the top responses and is a critical element of making this system deployable.

Would you please consider this feature and add a detailed task list to this issue for interested contributors?
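
A minimal sketch of the trie-constrained search described in the quote (all names here are hypothetical, not part of this repo; `log_prob_fn` stands in for one decoder step of the model):

```python
def build_trie(responses):
    """Build a nested-dict trie over the tokenized candidate responses R."""
    root = {}
    for tokens in responses:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node["<eos>"] = {}  # marks the end of a complete response
    return root

def trie_beam_search(log_prob_fn, trie, beam_size, max_len):
    """Left-to-right beam search that only retains hypotheses in the trie.

    log_prob_fn(prefix, token) -> the model's log-probability of `token`
    given the decoded `prefix` (an assumed interface).
    """
    beam = [([], 0.0, trie)]  # (prefix, score, current trie node)
    completed = []
    for _ in range(max_len):
        candidates = []
        for prefix, score, node in beam:
            for tok, child in node.items():  # only trie-valid extensions
                new_score = score + log_prob_fn(prefix, tok)
                if tok == "<eos>":
                    completed.append((prefix, new_score))
                else:
                    candidates.append((prefix + [tok], new_score, child))
        beam = sorted(candidates, key=lambda c: -c[1])[:beam_size]
        if not beam:
            break
    return sorted(completed, key=lambda c: -c[1])
```

Each step expands at most b hypotheses through their trie children only, which gives the O(bl) complexity quoted above.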

@chenghuige

chenghuige commented Mar 19, 2017

I agree with @amirj that this is an interesting feature, and we might add more features to improve decoding speed.
I don't think trie-based beam search will improve speed much in the general case. It is mostly used for selecting responses from a candidate set, rather than generating a sequence by choosing one word from the full vocabulary at each step, so it ensures response quality; the performance improvement is relative to scoring every candidate and then sorting to pick the highest-probability responses.
So it is useful in a specific setting, and it is hard to implement with in-graph beam search (we may need to write a C++ op to generate candidate words at each step?); an out-of-graph approach like the one im2txt uses might be simpler.
For improving decoding and beam search speed, vocabulary size is the key factor, since something like the following is costly for a large vocabulary, especially the xw_plus_b projection:

```python
# Project the decoder output onto the full vocabulary, then normalize.
logits = tf.nn.xw_plus_b(output, self.w, self.v)  # [batch, vocab_size]
logprobs = tf.nn.log_softmax(logits)
```

You have to compute an output for every word in the vocabulary to get the final probabilities.
For training, TensorFlow provides sampled_softmax_loss, which improves performance a lot.
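
A rough sketch of that training-time alternative (TF 1.x API; the shapes and variable names are assumptions, not this repo's code). Note that sampled_softmax_loss expects weights of shape [vocab, dim], the transpose of the xw_plus_b projection above:

```python
import tensorflow as tf

vocab_size, hidden_dim = 50000, 512
w_t = tf.get_variable("proj_w_t", [vocab_size, hidden_dim])  # transposed layout
b = tf.get_variable("proj_b", [vocab_size])

decoder_output = tf.placeholder(tf.float32, [None, hidden_dim])
target_ids = tf.placeholder(tf.int64, [None, 1])

# Only num_sampled classes are scored per step instead of the full vocabulary.
loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
    weights=w_t, biases=b,
    labels=target_ids, inputs=decoder_output,
    num_sampled=512, num_classes=vocab_size))
```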
But for decoding, TensorFlow seems to lack the techniques mentioned in "On Using Very Large Target Vocabulary for Neural Machine Translation"; you still have to compute logits for every word in the vocabulary.
With a trie-based method you could consider only a small vocabulary at each step, but then you cannot get the exact probability at each step.
Another approach is self-normalization, which avoids the full-vocabulary computation at inference time, but it requires changing the training cost function:
http://sebastianruder.com/word-embeddings-softmax/index.html#selfnormalisation
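
A sketch of the self-normalization idea from that link (the alpha * (log Z)^2 penalty form is an assumption based on the post, and the snippet reuses the output, w, v names from the projection above): training pushes log Z toward 0, so at decode time raw logits approximate log-probabilities and the full softmax can be skipped:

```python
import tensorflow as tf

alpha = 0.1  # penalty weight, a hyperparameter

logits = tf.nn.xw_plus_b(output, w, v)        # [batch, vocab_size]
log_z = tf.reduce_logsumexp(logits, axis=-1)  # log partition function Z
# target_ids: [batch] int word ids
xent = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=target_ids, logits=logits)
# Standard cross-entropy plus a term that keeps log Z near zero.
loss = tf.reduce_mean(xent + alpha * tf.square(log_z))
```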

@dennybritz
Contributor

But for decoding, TensorFlow seems to lack the techniques mentioned in "On Using Very Large Target Vocabulary for Neural Machine Translation"; you still have to compute logits for every word in the vocabulary.

I think the vocabulary issue has kind of been "solved" by using subword units / word pieces, for example.

You typically only need 16k-32k word pieces to cover almost all of the vocabulary.

As for the original question: it's an interesting feature, but it doesn't seem very common to me, since it needs a set of predefined responses, which you have for very few tasks. It seems very specific to response retrieval (not generation). I think a large refactoring of the beam search may be necessary to support this.
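
A hypothetical illustration of the subword approach with the sentencepiece library (an assumption for illustration, not this repo's tooling): a ~32k-piece vocabulary covers an effectively open word vocabulary, since unknown words decompose into known pieces:

```python
import sentencepiece as spm

# Train a 32k-piece subword model on a raw text corpus.
spm.SentencePieceTrainer.Train(
    "--input=corpus.txt --model_prefix=subword --vocab_size=32000")

sp = spm.SentencePieceProcessor()
sp.Load("subword.model")
print(sp.EncodeAsPieces("unrecognized words decompose into known pieces"))
```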

@amirj
Contributor Author

amirj commented Mar 24, 2017

@dennybritz Thank you.

@amirj amirj closed this as completed Mar 24, 2017
@amirj
Contributor Author

amirj commented Apr 19, 2017

So it is useful in a specific setting, and it is hard to implement with in-graph beam search (we may need to write a C++ op to generate candidate words at each step?); an out-of-graph approach like the one im2txt uses might be simpler.

@chenghuige Would you please elaborate on how to develop this out of the graph? Does im2txt leverage an out-of-graph approach?

@chenghuige

@amirj In im2txt/inference_utils/caption_generator.py, im2txt does out-of-graph beam search, calling sess.run() for each step:

```python
softmax, new_states, metadata = self.model.inference_step(sess,
                                                          input_feed,
                                                          state_feed)
```

But I think transferring the softmax (a large tensor) out of the graph at each step might make inference really slow. In any case, this way you can fully control the process and do beam search with a trie.
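
A sketch of how that out-of-graph loop can apply the trie filter in Python (model.inference_step is the hypothetical per-step interface above; start_id, trie_root, and the other names are placeholders, with the trie keyed by token ids):

```python
import numpy as np

beam = [([start_id], initial_state, 0.0, trie_root)]
for _ in range(max_len):
    input_feed = np.array([tokens[-1] for tokens, _, _, _ in beam])
    state_feed = np.array([state for _, state, _, _ in beam])
    softmax, new_states, _ = model.inference_step(sess, input_feed, state_feed)

    candidates = []
    for i, (tokens, _, score, node) in enumerate(beam):
        for tok_id, child in node.items():  # trie-valid next words only
            candidates.append((tokens + [tok_id],
                               new_states[i],
                               score + np.log(softmax[i][tok_id]),
                               child))
    beam = sorted(candidates, key=lambda c: -c[2])[:beam_size]
```

Since only the trie-valid entries of each softmax row are used, having the graph return just those few logits instead of the full distribution would address the transfer cost mentioned above.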

@nikita-solvvy

I have a similar feature request (tensorflow/tensorflow#11602). At first glance it might seem very specific, but I think it's very useful to be able to determine scores for various output sequences.
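
A sketch of scoring a fixed output sequence by forced decoding, using the same hypothetical inference_step interface as above: feed the candidate's own previous token at each step and sum the log-probabilities the model assigns to its next token:

```python
import numpy as np

def score_sequence(sess, model, initial_state, token_ids):
    """Return the model's total log-probability of `token_ids`."""
    state, total = initial_state, 0.0
    for prev, cur in zip(token_ids[:-1], token_ids[1:]):
        softmax, new_states, _ = model.inference_step(sess, [prev], [state])
        state = new_states[0]
        total += np.log(softmax[0][cur])
    return total
```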

@whiskyboy

Have you solved this problem? I ran into the same problem and have no idea how to put a trie data structure into TensorFlow.

@yuanzhigang10

Have you solved this problem? I also ran into the same problem and have no idea how to put a trie data structure into TensorFlow.
