TensorFlow Implementation of Recurrent Convolutional Neural Network for Relation Extraction
Latest commit 7ed81f5 May 16, 2018


Recurrent Convolutional Neural Network for Relation Extraction

TensorFlow implementation of a deep learning approach to the relation extraction challenge (SemEval-2010 Task #8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals) via a Recurrent Convolutional Neural Network.

(Figure: RCNN model architecture)

Implementation of Recurrent Structure

(Figure: recurrent structure)

  • A bidirectional RNN (Bi-RNN) is used to produce the left and right context vectors.
  • Each context vector is obtained by shifting the Bi-RNN outputs by one position and inserting a zero state at the boundary, marking the start (or end) of the context.
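The shift-and-pad step above can be sketched in plain Python (framework-agnostic; in the repo this is done on Bi-RNN output tensors, and the function name here is ours):

```python
def context_vectors(fw_outputs, bw_outputs, zero_state):
    """Build RCNN left/right context vectors from Bi-RNN outputs.

    fw_outputs[i]: forward RNN output at word i (encodes words 0..i)
    bw_outputs[i]: backward RNN output at word i (encodes words i..n-1)
    """
    # Left context of word i is the forward output at word i-1;
    # the first word gets a zero state (start of context).
    left = [zero_state] + fw_outputs[:-1]
    # Right context of word i is the backward output at word i+1;
    # the last word gets a zero state (end of context).
    right = bw_outputs[1:] + [zero_state]
    return left, right

# Toy example with scalar "states" standing in for vectors.
fw = [1, 2, 3]          # forward outputs for a 3-word sentence
bw = [7, 8, 9]          # backward outputs
left, right = context_vectors(fw, bw, 0)
# left  == [0, 1, 2]
# right == [8, 9, 0]
```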

Usage

Train

  • The training data is located in "SemEval2010_task8_all_data/SemEval2010_task8_training/TRAIN_FILE.TXT".

  • "GoogleNews-vectors-negative300" is used as the pre-trained word2vec model.

  • Display help message:

     $ python train.py --help
     optional arguments:
     	-h, --help            show this help message and exit
     	--train_dir TRAIN_DIR
     							Path of train data
     	--dev_sample_percentage DEV_SAMPLE_PERCENTAGE
     							Percentage of the training data to use for validation
     	--max_sentence_length MAX_SENTENCE_LENGTH
     							Max sentence length in train(98)/test(70) data
     							(Default: 100)
     	--word2vec WORD2VEC   Word2vec file with pre-trained embeddings
     	--text_embedding_dim TEXT_EMBEDDING_DIM
     							Dimensionality of word embedding (Default: 300)
     	--position_embedding_dim POSITION_EMBEDDING_DIM
     							Dimensionality of position embedding (Default: 100)
     	--filter_sizes FILTER_SIZES
     							Comma-separated filter sizes (Default: 2,3,4,5)
     	--num_filters NUM_FILTERS
     							Number of filters per filter size (Default: 128)
     	--dropout_keep_prob DROPOUT_KEEP_PROB
     							Dropout keep probability (Default: 0.5)
     	--l2_reg_lambda L2_REG_LAMBDA
     							L2 regularization lambda (Default: 3.0)
     	--batch_size BATCH_SIZE
     							Batch Size (Default: 64)
     	--num_epochs NUM_EPOCHS
     							Number of training epochs (Default: 100)
     	--display_every DISPLAY_EVERY
     							Number of iterations to display training info.
     	--evaluate_every EVALUATE_EVERY
     							Evaluate model on dev set after this many steps
     	--checkpoint_every CHECKPOINT_EVERY
     							Save model after this many steps
     	--num_checkpoints NUM_CHECKPOINTS
     							Number of checkpoints to store
     	--learning_rate LEARNING_RATE
     							Which learning rate to start with. (Default: 1e-3)
     	--allow_soft_placement [ALLOW_SOFT_PLACEMENT]
     							Allow soft device placement
     	--noallow_soft_placement
     	--log_device_placement [LOG_DEVICE_PLACEMENT]
     							Log placement of ops on devices
     	--nolog_device_placement
  • Train Example:

    $ python train.py --word2vec "GoogleNews-vectors-negative300.bin"
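The "GoogleNews-vectors-negative300.bin" file passed above uses the word2vec binary format: a text header ("vocab_size dim"), then, for each word, a space-terminated byte string followed by dim raw little-endian float32 values. A minimal stdlib-only reader sketch (for illustration of the format; the repo may load the file differently):

```python
import io
import struct

def read_word2vec_bin(f):
    """Read a word2vec-format binary stream into a {word: vector} dict."""
    vocab_size, dim = map(int, f.readline().split())
    vectors = {}
    for _ in range(vocab_size):
        # The word is a space-terminated byte string (skip stray newlines).
        chars = []
        while True:
            ch = f.read(1)
            if ch == b" ":
                break
            if ch != b"\n":
                chars.append(ch)
        word = b"".join(chars).decode("utf-8")
        # The vector is dim little-endian float32 values.
        vectors[word] = struct.unpack(f"<{dim}f", f.read(4 * dim))
    return vectors

# Round-trip a tiny two-word "model" built in memory.
buf = io.BytesIO()
buf.write(b"2 3\n")
for word, vec in [("king", (0.1, 0.2, 0.3)), ("queen", (0.4, 0.5, 0.6))]:
    buf.write(word.encode() + b" " + struct.pack("<3f", *vec))
buf.seek(0)
vectors = read_word2vec_bin(buf)
# vectors["king"] is approximately (0.1, 0.2, 0.3)
```

In practice a library loader such as gensim's `KeyedVectors.load_word2vec_format(path, binary=True)` is the usual way to read the real 3.5 GB file.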

Evaluation

  • The test data is located in "SemEval2010_task8_all_data/SemEval2010_task8_testing_keys/TEST_FILE_FULL.TXT".

  • You must pass the "checkpoint_dir" argument, the path of the checkpoint (trained model) directory, as in the example below.

  • Evaluation Example:

     $ python eval.py --checkpoint_dir "runs/1523902663/checkpoints"
  • Official Evaluation of SemEval 2010 Task #8

    1. After running the evaluation as in the example above, "prediction.txt" and "answer.txt" are written to the "result" directory.
    2. Install Perl.
    3. Move to SemEval2010_task8_all_data/SemEval2010_task8_scorer-v1.2:
      $ cd SemEval2010_task8_all_data/SemEval2010_task8_scorer-v1.2
    4. Check your prediction file format:
      $ perl semeval2010_task8_format_checker.pl ../../result/prediction.txt
    5. Score your prediction:
      $ perl semeval2010_task8_scorer-v1.2.pl ../../result/prediction.txt ../../result/answer.txt
    6. The scorer prints three evaluation results; the official one, "(9+1)-WAY EVALUATION TAKING DIRECTIONALITY INTO ACCOUNT -- OFFICIAL", is the last. See the scorer's README for details.

SemEval-2010 Task #8

  • Given: a pair of nominals
  • Goal: recognize the semantic relation between these nominals.
  • Example:
    • "There were apples, pears and oranges in the bowl."
      CONTENT-CONTAINER(pears, bowl)
    • "The cup contained tea from dried ginseng."
      ENTITY-ORIGIN(tea, ginseng)
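In the task's data files, each record pairs a tab-indexed sentence, with the two nominals marked inline by `<e1>`/`<e2>` tags, with its relation label on the following line. A minimal parsing sketch using the second example above (the record layout follows the SemEval-2010 Task 8 distribution; the helper name is ours):

```python
import re

# One record in the TRAIN_FILE.TXT layout (illustrative, built from the
# document's own example sentence).
SAMPLE = '1\t"The cup contained <e1>tea</e1> from dried <e2>ginseng</e2>."\nEntity-Origin(e1,e2)\nComment:\n'

def parse_example(block):
    """Parse one (sentence, relation) record in SemEval-2010 Task 8 format."""
    lines = block.strip().split("\n")
    idx, sentence = lines[0].split("\t", 1)
    sentence = sentence.strip('"')
    relation = lines[1].strip()
    # The two nominals are marked inline with <e1>...</e1> and <e2>...</e2>.
    e1 = re.search(r"<e1>(.*?)</e1>", sentence).group(1)
    e2 = re.search(r"<e2>(.*?)</e2>", sentence).group(1)
    return int(idx), sentence, e1, e2, relation

idx, sentence, e1, e2, relation = parse_example(SAMPLE)
# e1 == "tea", e2 == "ginseng", relation == "Entity-Origin(e1,e2)"
```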

The Inventory of Semantic Relations

  1. Cause-Effect (CE): an event or object leads to an effect (those cancers were caused by radiation exposures)
  2. Instrument-Agency (IA): an agent uses an instrument (phone operator)
  3. Product-Producer (PP): a producer causes a product to exist (a factory manufactures suits)
  4. Content-Container (CC): an object is physically stored in a delineated area of space (a bottle full of honey was weighed)
  5. Entity-Origin (EO): an entity is coming or is derived from an origin, e.g., position or material (letters from foreign countries)
  6. Entity-Destination (ED): an entity is moving towards a destination (the boy went to bed)
  7. Component-Whole (CW): an object is a component of a larger whole (my apartment has a large kitchen)
  8. Member-Collection (MC): a member forms a nonfunctional part of a collection (there are many trees in the forest)
  9. Message-Topic (MT): an act of communication, written or spoken, is about a topic (the lecture was about semantics)
  10. Other: none of the above nine relations is suitable.
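Because each of the nine relations is directional while Other is not, models for this task typically treat it as 19-way classification (9 relations × 2 directions + Other). A sketch of the full label set (the ordering of label ids is an arbitrary choice, not necessarily the repo's):

```python
RELATIONS = [
    "Cause-Effect", "Instrument-Agency", "Product-Producer",
    "Content-Container", "Entity-Origin", "Entity-Destination",
    "Component-Whole", "Member-Collection", "Message-Topic",
]

# Nine relations x two directions, plus the undirected Other class.
LABELS = ["Other"] + [
    f"{rel}({a},{b})"
    for rel in RELATIONS
    for a, b in [("e1", "e2"), ("e2", "e1")]
]

label2id = {label: i for i, label in enumerate(LABELS)}
# len(LABELS) == 19
```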

Distribution for Dataset

  • SemEval-2010 Task #8 Dataset [Download]

    | Relation           | Train Data      | Test Data       | Total Data       |
    |--------------------|-----------------|-----------------|------------------|
    | Cause-Effect       | 1,003 (12.54%)  | 328 (12.07%)    | 1,331 (12.42%)   |
    | Instrument-Agency  | 504 (6.30%)     | 156 (5.74%)     | 660 (6.16%)      |
    | Product-Producer   | 717 (8.96%)     | 231 (8.50%)     | 948 (8.85%)      |
    | Content-Container  | 540 (6.75%)     | 192 (7.07%)     | 732 (6.83%)      |
    | Entity-Origin      | 716 (8.95%)     | 258 (9.50%)     | 974 (9.09%)      |
    | Entity-Destination | 845 (10.56%)    | 292 (10.75%)    | 1,137 (10.61%)   |
    | Component-Whole    | 941 (11.76%)    | 312 (11.48%)    | 1,253 (11.69%)   |
    | Member-Collection  | 690 (8.63%)     | 233 (8.58%)     | 923 (8.61%)      |
    | Message-Topic      | 634 (7.92%)     | 261 (9.61%)     | 895 (8.35%)      |
    | Other              | 1,410 (17.63%)  | 454 (16.71%)    | 1,864 (17.39%)   |
    | Total              | 8,000 (100.00%) | 2,717 (100.00%) | 10,717 (100.00%) |

Reference