A Keras implementation of a Bidirectional LSTM for "Neural Sentence Embedding Using Only In-Domain Sentences for Out-of-Domain Sentence Detection in Dialog Systems".
The sentence-embedding network (a Bi-LSTM) is trained using only in-domain (ID) sentences.
After the sentence representations are learned, they are used to train an autoencoder aimed at out-of-domain (OOD) sentence detection.
The dataset consists of 12,000 ID sentences from 6 domains (food, cloth, education, store, lifeservice, cafe) and 6,000 OOD sentences from 3 domains (hotel, leisure, accommodation).
Only ID sentences are used to train the autoencoder and the Bi-LSTM (neural sentence embedding).
- The embedding layer of the Bi-LSTM network (neural sentence embedding) is initialized with a pre-trained word2vec model.
- Domain-category classification is used as an auxiliary task to train the Bi-LSTM (neural sentence embedding) for OOD sentence detection.
- After training the Bi-LSTM (neural sentence embedding), its last hidden layer is used as the sentence representation.
- The sentence representations are used to train an autoencoder that detects OOD sentences by reconstruction error.
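The embedding network described above can be sketched as follows. This is a minimal illustration, not the exact training script: the vocabulary size, sequence length, and layer widths are assumed values, and the word2vec matrix is replaced with random weights for self-containment.

```python
import numpy as np
from tensorflow.keras import layers, Model

VOCAB_SIZE = 5000   # assumed vocabulary size
EMBED_DIM = 100     # must match the word2vec dimensionality
MAX_LEN = 20        # assumed maximum sentence length
NUM_DOMAINS = 6     # the six ID domain categories

# In the real setup this matrix comes from the pre-trained word2vec model;
# here it is random for illustration.
embedding_matrix = np.random.normal(size=(VOCAB_SIZE, EMBED_DIM)).astype("float32")

inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM,
                     weights=[embedding_matrix], trainable=True)(inputs)
x = layers.Bidirectional(layers.LSTM(64))(x)  # last hidden states of both directions
sentence_repr = layers.Dense(128, activation="tanh", name="sentence_repr")(x)
# Auxiliary task: domain-category classification over the 6 ID domains.
outputs = layers.Dense(NUM_DOMAINS, activation="softmax")(outputs_input := sentence_repr)

classifier = Model(inputs, outputs)
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# After training, a second model exposes the representation layer,
# which is what the autoencoder consumes.
encoder = Model(inputs, classifier.get_layer("sentence_repr").output)
```

Sentence representations for the autoencoder are then obtained with `encoder.predict(token_ids)`.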
The autoencoder is tested on 6,000 OOD sentences and 2,400 ID sentences.
Accuracy : 71.4%
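The detection step can be sketched as below: an autoencoder trained only on ID sentence representations, with sentences flagged as OOD when their reconstruction error exceeds a threshold. The dimensions, bottleneck size, and percentile-based threshold rule are illustrative assumptions, and random vectors stand in for real representations.

```python
import numpy as np
from tensorflow.keras import layers, Model

REPR_DIM = 128  # dimensionality of the sentence representation

inputs = layers.Input(shape=(REPR_DIM,))
h = layers.Dense(32, activation="relu")(inputs)   # bottleneck
outputs = layers.Dense(REPR_DIM)(h)
autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Train on ID representations only (random stand-ins here).
id_repr = np.random.normal(size=(256, REPR_DIM)).astype("float32")
autoencoder.fit(id_repr, id_repr, epochs=1, verbose=0)

def reconstruction_error(x):
    """Per-sentence mean squared reconstruction error."""
    recon = autoencoder.predict(x, verbose=0)
    return np.mean((x - recon) ** 2, axis=1)

# One common choice: set the threshold from the ID training errors.
threshold = np.percentile(reconstruction_error(id_repr), 95)

def is_ood(x):
    """A sentence is OOD when its reconstruction error exceeds the threshold."""
    return reconstruction_error(x) > threshold
```

Accuracy is then measured by running `is_ood` over the held-out ID and OOD representations.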
This implementation differs from the original paper in the following ways:
- It uses a different dataset: https://aihub.or.kr/aidata/85
- It uses a different word2vec model: https://github.com/Kyubyong/wordvectors
References:
https://arxiv.org/abs/1807.11567
https://www.tensorflow.org/tutorials/generative/autoencoder?hl=ko