Baseline for the Chinese Natural Language Inference (CNLI) dataset

Description

This is the code we used to establish a baseline for the Chinese Natural Language Inference (CNLI) corpus.

Data

The CNLI dataset can be downloaded here.

Both the train and dev sets are in tab-separated format. Each line of the train (or dev) file is one instance, with its fields in this order:

sentence-id premise hypothesis label
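
The four-field layout above can be read with a few lines of Python. This is a minimal sketch, not the repository's data_reader.py: the names `Instance` and `read_instances` are hypothetical, and it assumes the files have no header row and exactly four tab-separated fields per line. The sample sentence and label value in the usage note are made up for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class Instance(NamedTuple):
    """One line of the train or dev file (field names are illustrative)."""
    sentence_id: str
    premise: str
    hypothesis: str
    label: str


def read_instances(lines: Iterable[str]) -> Iterator[Instance]:
    """Yield one Instance per non-empty, tab-separated line."""
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(
                "line %d: expected 4 fields, got %d" % (line_no, len(fields))
            )
        yield Instance(*fields)
```

Any iterable of strings works as input, so an open file handle can be passed directly, for example `list(read_instances(open(path, encoding="utf-8")))`, where `path` points to the downloaded train or dev file. A line such as `"1\t一个男人在弹吉他\t有人在演奏乐器\tentailment"` would become one `Instance`.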

Model

This repository includes the baseline model for the Chinese Natural Language Inference (CNLI) dataset. We chose the Decomposable Attention Model as our baseline. More details about the model can be found in the original paper.
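
For readers unfamiliar with the model, the sketch below illustrates the soft-alignment ("attend") step that the Decomposable Attention Model is built around. It is an illustration only, not the code in decomposable_att.py. As a simplifying assumption, raw word vectors are compared with a plain dot product, whereas the original paper first passes them through a feed-forward network; the function names are hypothetical, and the later compare and aggregate steps are omitted.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def weighted_sum(weights: Vector, vectors: List[Vector]) -> Vector:
    """Sum of the vectors, each scaled by its weight."""
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]


def attend(premise: List[Vector], hypothesis: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Soft-align every premise token with the hypothesis and vice versa.

    Returns (beta, alpha): beta[i] is the hypothesis summary aligned to
    premise token i, and alpha[j] is the premise summary aligned to
    hypothesis token j.
    """
    # scores[i][j]: unnormalized attention between premise i and hypothesis j.
    scores = [[dot(a, b) for b in hypothesis] for a in premise]
    beta = [weighted_sum(softmax(row), hypothesis) for row in scores]
    columns = [[row[j] for row in scores] for j in range(len(hypothesis))]
    alpha = [weighted_sum(softmax(col), premise) for col in columns]
    return beta, alpha
```

In the full model, each token is then compared with its aligned summary, and the comparison vectors are summed and fed to a classifier that predicts the label.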

Requirements

python 3.5
tensorflow 1.4.0
jieba 0.39

Training

Data Preprocessing
We use jieba to tokenize the sentences. During training, we use the pre-trained SGNS embeddings introduced in Analogical Reasoning on Chinese Morphological and Semantic Relations. You can download sgns.merge.word from here.
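
As an illustration of how pre-trained embeddings like these can be loaded, here is a minimal sketch. It assumes the file uses the common word2vec text layout (an optional "vocab_size dimension" header line, then one word followed by its space-separated values per line); that layout is an assumption, so check the downloaded file first. The name `load_text_embeddings` is hypothetical and is not taken from data_reader.py.

```python
from typing import Dict, Iterable, List, Optional, Set


def load_text_embeddings(
    lines: Iterable[str], vocab: Optional[Set[str]] = None
) -> Dict[str, List[float]]:
    """Read word vectors from text lines of the form 'word v1 v2 ... vn'.

    If vocab is given, only words in it are kept, which saves memory
    when the embedding file is much larger than the dataset vocabulary.
    """
    embeddings: Dict[str, List[float]] = {}
    for line_no, line in enumerate(lines):
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # blank or malformed line
        if line_no == 0 and len(parts) == 2:
            continue  # assumed header: "<vocab_size> <dimension>"
        word, values = parts[0], parts[1:]
        if vocab is not None and word not in vocab:
            continue
        embeddings[word] = [float(v) for v in values]
    return embeddings
```

The keys of the returned dictionary are looked up with the tokens produced by the tokenizer, for example the list returned by `jieba.lcut(sentence)`. Tokens missing from the dictionary need a fallback, such as a shared unknown-word vector.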

Main Scripts
config.py: the parameter configuration.
decomposable_att.py: implementation of the Decomposable Attention Model.
data_reader.py: preparing data for the model.
train.py: training the Decomposable Attention Model.

Running Model
You can train the model with the following command:

python3 train.py

Results

We adopt early stopping on the dev set. The best results are shown in the following table:

data	accuracy(%)
train	64.88
dev	58.70
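
Early stopping on the dev set, as mentioned above, can be sketched as follows. This is a generic illustration and not the logic in train.py: the class name `EarlyStopping` and the `patience` parameter are hypothetical, and the stopping criterion actually used for the reported numbers is not stated here.

```python
class EarlyStopping:
    """Stop training once dev accuracy has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3) -> None:
        self.patience = patience  # assumed value; tune per experiment
        self.best_score = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def update(self, epoch: int, dev_accuracy: float) -> bool:
        """Record one epoch's dev accuracy; return True if training should stop."""
        if dev_accuracy > self.best_score:
            self.best_score = dev_accuracy
            self.best_epoch = epoch
            self.bad_epochs = 0  # improvement resets the counter
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `update` would be called once per epoch after evaluating on the dev set, and the model checkpoint saved at `best_epoch` is the one whose results are reported.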

Reporting issues

Please let us know if you encounter any problems.
