A Keras implementation of a Bidirectional LSTM for "Neural Sentence Embedding Using Only In-Domain Sentences for Out-of-Domain Sentence Detection in Dialog Systems".
The sentence-embedding network (a Bi-LSTM) is trained using only in-domain (ID) sentences.
After the sentence representations are learned, they are used to train an autoencoder aimed at out-of-domain (OOD) sentence detection.
The dataset consists of 12,000 ID sentences from 6 domains (food, cloth, education, store, lifeservice, cafe) and 6,000 OOD sentences from 3 domains (hotel, leisure, accommodation).
Only ID sentences are used to train the autoencoder and the Bi-LSTM (neural sentence embedding).
- The embedding layer of the Bi-LSTM network (neural sentence embedding) is initialized with a pre-trained word2vec model.
- Domain-category classification is used as an auxiliary task to train the Bi-LSTM (neural sentence embedding) for OOD sentence detection.
- After training the Bi-LSTM (neural sentence embedding), its last hidden layer is used as the sentence representation.
- The sentence representations are used to train an autoencoder that detects OOD sentences by reconstruction error.
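The embedding network described above can be sketched as follows. This is a minimal illustration, not the exact training script: the vocabulary size, sequence length, and layer widths are assumed values, and the word2vec matrix is replaced with random weights for self-containment.

```python
import numpy as np
from tensorflow.keras import layers, Model

VOCAB_SIZE = 5000   # assumed vocabulary size
EMBED_DIM = 100     # must match the word2vec dimensionality
MAX_LEN = 20        # assumed maximum sentence length
NUM_DOMAINS = 6     # the six ID domain categories

# In the real setup this matrix comes from the pre-trained word2vec model;
# here it is random for illustration.
embedding_matrix = np.random.normal(size=(VOCAB_SIZE, EMBED_DIM)).astype("float32")

inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, EMBED_DIM,
                     weights=[embedding_matrix], trainable=True)(inputs)
x = layers.Bidirectional(layers.LSTM(64))(x)  # last hidden states of both directions
sentence_repr = layers.Dense(128, activation="tanh", name="sentence_repr")(x)
# Auxiliary task: domain-category classification over the 6 ID domains.
outputs = layers.Dense(NUM_DOMAINS, activation="softmax")(outputs_input := sentence_repr)

classifier = Model(inputs, outputs)
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# After training, a second model exposes the representation layer,
# which is what the autoencoder consumes.
encoder = Model(inputs, classifier.get_layer("sentence_repr").output)
```

Sentence representations for the autoencoder are then obtained with `encoder.predict(token_ids)`.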
The autoencoder is tested on 6,000 OOD sentences and 2,400 ID sentences.
Accuracy : 71.4%
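The detection step can be sketched as below: an autoencoder trained only on ID sentence representations, with sentences flagged as OOD when their reconstruction error exceeds a threshold. The dimensions, bottleneck size, and percentile-based threshold rule are illustrative assumptions, and random vectors stand in for real representations.

```python
import numpy as np
from tensorflow.keras import layers, Model

REPR_DIM = 128  # dimensionality of the sentence representation

inputs = layers.Input(shape=(REPR_DIM,))
h = layers.Dense(32, activation="relu")(inputs)   # bottleneck
outputs = layers.Dense(REPR_DIM)(h)
autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Train on ID representations only (random stand-ins here).
id_repr = np.random.normal(size=(256, REPR_DIM)).astype("float32")
autoencoder.fit(id_repr, id_repr, epochs=1, verbose=0)

def reconstruction_error(x):
    """Per-sentence mean squared reconstruction error."""
    recon = autoencoder.predict(x, verbose=0)
    return np.mean((x - recon) ** 2, axis=1)

# One common choice: set the threshold from the ID training errors.
threshold = np.percentile(reconstruction_error(id_repr), 95)

def is_ood(x):
    """A sentence is OOD when its reconstruction error exceeds the threshold."""
    return reconstruction_error(x) > threshold
```

Accuracy is then measured by running `is_ood` over the held-out ID and OOD representations.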
This implementation differs from the original paper in the following ways:
- It uses a different dataset: https://aihub.or.kr/aidata/85
- It uses a different word2vec model: https://github.com/Kyubyong/wordvectors
References:
https://arxiv.org/abs/1807.11567
https://www.tensorflow.org/tutorials/generative/autoencoder?hl=ko