EventExtraction

Extract service-related events from social media news.

Usage

Preparation

  1. Clone this repository
git clone --recursive http://192.168.1.104:12345/serviceecosystem/eventextraction.git
  2. Download the ALBERT pre-trained model
wget https://storage.googleapis.com/albert_zh/albert_tiny_489k.zip

Note: this is ALBERT-tiny. For more pre-trained ALBERT models, see https://github.com/brightmart/albert_zh
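
Then unzip the downloaded archive (the target directory below is just a suggestion; the paths in later commands should point at wherever you extract it):

unzip albert_tiny_489k.zip -d albert_tiny/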

Format your file

If your data is news data in the following format (a title, a tab, then a timestamp):

<title> \t timestamp
鲜丰水果完成红杉领投B轮融资,已拥有1100+家全国门店	2018-01-22

(The Chinese sample title reads: "Xianfeng Fruit completes a Sequoia-led Series B round and now has 1,100+ stores nationwide.")

Simply run the preprocess script:

python src/preprocess.py

After this you need to split your data into train and test (dev) sets; a minimal split sketch follows the notes below. Preprocessing gives you a JSON file with records like these:

{
"text": "<sentence1>,<sentence2>,<timestamp>",
"labels": [[0, 1, "LOC"], [2, 3, "ORG"], [15, 25, "Cause"]]
}
{
"text": "<sentence1>,<timestamp>",
"labels": [[0, 1, "LOC"], [2, 3, "ORG"], [14, 24, "Cause"]]
}

Note:

  1. You must attach the timestamp (if you do not need it, just treat it as a placeholder).
  2. The sentence-pair relationship is labeled on the timestamp. (Don't worry, input_fn handles this.)
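
A minimal split sketch, assuming the preprocessed output has one JSON record per line; the file names and the 90/10 ratio are illustrative, not from the source:

import random

random.seed(42)

# Hypothetical file names -- adjust to your actual preprocess output.
with open("data/all.json", encoding="utf-8") as f:
    records = [line for line in f if line.strip()]

random.shuffle(records)
cut = int(len(records) * 0.9)  # 90% train, 10% dev

with open("data/train.json", "w", encoding="utf-8") as f:
    f.writelines(records[:cut])
with open("data/dev.json", "w", encoding="utf-8") as f:
    f.writelines(records[cut:])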

Train joint learning model

cd src/albert_zh
python joint_learning.py --do_train true \
    --data_dir your/path/to/data/directory/ \
    --vocab_file your/path/to/vocab.txt  \
    --bert_config_file your/path/to/albert_config_tiny.json \
    --max_seq_length 128 \
    --train_batch_size 32 \
    --learning_rate 1e-5 \
    --num_train_epochs 3 \
    --init_checkpoint your/path/to/albert_model.ckpt \
    --output_dir your/path/to/output_dir/

Normally you do not need to modify max_seq_length, train_batch_size, or learning_rate.

Using the model to do prediction or evaluation

Do Predict

cd src/albert_zh
python joint_learning.py --do_predict true \
    --data_dir your/path/to/data_dir/ \
    --vocab_file your/path/vocab.txt \
    --bert_config_file your/path/to/albert_config_tiny.json \
    --max_seq_length 128 \
    --output_dir your/path/to/output_dir/ \
    --predict_batch_size 32

If you need to do evaluation, just change --do_predict to --do_eval.

How to use other pre-trained models (BERT, RoBERTa)?

I use ALBERT as the pre-trained model in my joint learning model because ALBERT is much smaller than BERT, which makes training, prediction, and evaluation much faster.

This code can easily be modified to work with BERT or RoBERTa; you only need to change some imports in src/albert_zh/joint_learning.py.
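
A sketch of the idea, assuming (as in google-research/bert and albert_zh) that both codebases expose top-level modeling.py and tokenization.py modules; the path below is hypothetical:

import sys
sys.path.insert(0, "/path/to/bert")  # hypothetical checkout of google-research/bert

import modeling       # now resolves to BERT's modeling.py instead of albert_zh's
import tokenization   # same for the tokenizer

You would then pass BERT's config file, vocab, and checkpoint on the command line instead of the ALBERT ones.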

What should I do if I only want to do NER/Text-Pair-Classification?

You can modify the source code. In src/albert_zh/joint_learning.py, you can find

total_loss = task1_loss + task2_loss

where task1_loss is the NER loss and task2_loss is the text-pair-classification loss. You can weight the losses to control which task is more important.

Specifically, if you set the task1_loss weight to 0, the model only performs the text-pair-classification task.
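
For example, inside joint_learning.py you could replace that line with a weighted sum (the weight variable names are illustrative, not from the source):

# Weighted combination of the two task losses.
ner_weight = 0.0  # 0.0 disables the NER task entirely
cls_weight = 1.0  # keep the text-pair-classification task

total_loss = ner_weight * task1_loss + cls_weight * task2_loss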

Can I use this model to do single sentence classification?

Of course you can.

What you need to do is change get_labelrs and make sure that none of your input sentences contains a comma (,), since commas separate the fields of text.

How can I use custom NER tags and sentence-pair labels?

You can modify get_labeles and get_labelrs, respectively.
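
A minimal sketch of what these functions might return; the label sets below are assumptions inferred from the JSON example above, not the actual defaults in src/albert_zh/joint_learning.py:

def get_labeles():
    # NER tags: BIO-style tags over the types seen in the JSON example (illustrative).
    return ["O", "B-LOC", "I-LOC", "B-ORG", "I-ORG", "B-Cause", "I-Cause"]

def get_labelrs():
    # Sentence-pair relationship labels (binary here as an assumption).
    return ["0", "1"]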
