FAST neural relation extraction. We have implemented several methods of neural relation extraction. This project aims to help beginners in relation extraction start experiments quickly. The source code is based on, but not limited to, Lin et al. (2016).
- Python (2 or 3)
- Numpy
- Tensorflow (>=1.4)
- sklearn
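If the dependencies are missing, they can usually be installed with pip (on PyPI, sklearn is distributed as scikit-learn; exact versions may vary):

pip install numpy "tensorflow>=1.4" scikit-learn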
We also split the network into four layers: the Embedding layer, Encoder layer, Selector layer, and Classifier layer. Among them, the Encoder layer and the Selector layer are the main parts we want to extend. The role of each layer is as follows:
- Embedding layer: transforms text into word embeddings and position embeddings...
- Encoder layer: uses a CNN or PCNN to extract sentence features.
- Selector layer: uses selective attention to compute a weight for each sentence, then produces the bag feature.
- Classifier layer: uses cross-entropy to compute the loss of the network.
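To show how the four layers fit together, here is a minimal TensorFlow 1.x sketch. The hyperparameters, variable names, and the plain-CNN encoder are illustrative assumptions, not the actual implementation in model.py:

```python
import tensorflow as tf

# Illustrative sizes only; the real configuration lives in main.py / model.py.
VOCAB_SIZE, WORD_DIM, POS_DIM = 114042, 50, 5
MAX_LEN, NUM_FILTERS, NUM_REL = 120, 230, 53

# Embedding layer: word ids and relative-position ids -> dense vectors.
words = tf.placeholder(tf.int32, [None, MAX_LEN])   # token ids per sentence
pos1 = tf.placeholder(tf.int32, [None, MAX_LEN])    # distance to entity 1 (shifted to be >= 0)
pos2 = tf.placeholder(tf.int32, [None, MAX_LEN])    # distance to entity 2 (shifted to be >= 0)
labels = tf.placeholder(tf.int32, [None])           # one relation label per bag
scope = tf.placeholder(tf.int32, [None, 2])         # [start, end) sentence range of each bag

word_emb = tf.get_variable("word_emb", [VOCAB_SIZE, WORD_DIM])
pos1_emb = tf.get_variable("pos1_emb", [2 * MAX_LEN, POS_DIM])
pos2_emb = tf.get_variable("pos2_emb", [2 * MAX_LEN, POS_DIM])
x = tf.concat([tf.nn.embedding_lookup(word_emb, words),
               tf.nn.embedding_lookup(pos1_emb, pos1),
               tf.nn.embedding_lookup(pos2_emb, pos2)], axis=-1)

# Encoder layer: a plain CNN with max pooling (PCNN would instead pool three
# pieces split at the entity positions).
conv = tf.layers.conv1d(x, NUM_FILTERS, kernel_size=3, padding="same")
sent_repr = tf.nn.tanh(tf.reduce_max(conv, axis=1))  # [num_sentences, NUM_FILTERS]

# Selector layer: selective attention over the sentences of one bag.
rel_mat = tf.get_variable("rel_mat", [NUM_REL, NUM_FILTERS])

def attend_one_bag(i):
    sents = sent_repr[scope[i][0]:scope[i][1]]        # sentences in this bag
    query = tf.gather(rel_mat, labels[i])             # embedding of the bag's relation
    alpha = tf.nn.softmax(tf.reduce_sum(sents * query, axis=1))  # weight per sentence
    return tf.reduce_sum(alpha[:, None] * sents, axis=0)         # weighted bag feature

bag_repr = tf.map_fn(attend_one_bag, tf.range(tf.shape(scope)[0]), dtype=tf.float32)

# Classifier layer: softmax over relations, trained with cross-entropy loss.
logits = tf.matmul(bag_repr, rel_mat, transpose_b=True)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
```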
- init_data.py: converts plain text into numpy format.
- engine.py: controls the process of training and testing.
- main.py: the entry point of the project; initializes the arguments.
- model.py: integrates all layers.
- utils.py: contains code for common use.
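To give a sense of how these files fit together, the sketch below outlines a hypothetical main.py using only the flags that appear in the commands further down; the helper functions on init_data and engine are assumptions, and the real wiring may differ:

```python
# Hypothetical outline of main.py; init_data.run, engine.clean_model_dir,
# engine.train and engine.test are assumed names, not the repository's API.
import argparse
import engine
import init_data

def str2bool(v):
    # Accept --flag=True / --flag=False style values from the command line.
    return str(v).lower() in ("true", "1", "yes")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--is_train", type=str2bool, default=True)
    parser.add_argument("--preprocess", type=str2bool, default=False)
    parser.add_argument("--clean", type=str2bool, default=False)
    args = parser.parse_args()

    if args.clean:
        engine.clean_model_dir()      # assumed helper: clear existing model files
    if args.preprocess:
        init_data.run()               # assumed helper: plain text -> numpy format
    if args.is_train:
        engine.train(args)            # assumed helper: training loop
    else:
        engine.test(args)             # assumed helper: evaluation
```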
Before you run the model for the first time, you should initialize the dataset:
python main.py --is_train=True --preprocess=True [--clean=True]
The --clean option determines whether to clear existing model files. After the data has been initialized once, there is no need to initialize it again.
Run the model:
python main.py --is_train=True
Test the model:
python main.py --is_train=False
Our dataset is based on Lin et al. (2016), but we keep the entity types. The files are listed below:
- relation2id.txt: relation mapping file
- type2id.txt: entity type mapping file
- vec.txt: word2vec file
- train.txt: training set
- test.txt: testing set
The dataset can be downloaded from here. The data format is as follows:
entity1_mid entity2_mid entity1 entity2 entity1_type entity2_type relation_label sentence ###END###
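For reference, here is a small parsing sketch for one line of this format; the field names follow the description above, and the real init_data.py may handle the records differently:

```python
def parse_line(line):
    # Split one whitespace-separated record of the dataset format above.
    parts = line.strip().split()
    assert parts[-1] == "###END###"
    return {
        "entity1_mid":  parts[0],
        "entity2_mid":  parts[1],
        "entity1":      parts[2],
        "entity2":      parts[3],
        "entity1_type": parts[4],
        "entity2_type": parts[5],
        "relation":     parts[6],
        # everything between the relation label and the ###END### marker
        "sentence": " ".join(parts[7:-1]),
    }
```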