Bert-Training-and-News-Classification

Pretrain BERT and apply it to character-level Chinese news text classification.
The experiment builds the BERT model on the Transformer encoder implemented by Facebook (link: fairseq).

Bert Training

Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Generate Data

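Run python gen_bert_index_data.py chars_vocab_path raw_text_data index_num_data. The arguments are, in order, the character vocabulary path, the raw text data, and the output path for the indexed data. For example: python gen_bert_index_data.py corpus/chars.lst data/train.txt idx_data/train.txt.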
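The character-to-index step can be pictured roughly as follows. This is a minimal sketch, not the code in gen_bert_index_data.py: it assumes the vocabulary file holds one character per line and that unknown characters map to a dedicated `<unk>` entry. The helper names `load_vocab` and `text_to_indices` are hypothetical.

```python
def load_vocab(lines):
    """Map each character to its line index (assumes one character per line)."""
    return {line.rstrip("\n"): idx for idx, line in enumerate(lines)}


def text_to_indices(text, vocab, unk_index):
    """Replace every character with its vocabulary index, or unk_index if unseen."""
    return [vocab.get(ch, unk_index) for ch in text]


# Tiny in-memory vocabulary standing in for a file such as corpus/chars.lst.
vocab = load_vocab(["<unk>\n", "新\n", "闻\n"])
indices = text_to_indices("新闻报", vocab, unk_index=vocab["<unk>"])
print(indices)  # [1, 2, 0]
```

The last character is not in the toy vocabulary, so it falls back to the `<unk>` index.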

Train Model

Change the training data path, validation data path, or other settings in Configs/bert.json, or keep the default settings.
Run python bert_train.py (uses the default BERT configuration file, Configs/bert.json) or python bert_train.py bert_config_json_file_path.
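The "optional config path with a default" behavior described above could be implemented along these lines. This is a sketch under assumptions: the function names are hypothetical, and the actual argument handling and JSON keys used by bert_train.py are not documented here.

```python
import json

DEFAULT_CONFIG = "Configs/bert.json"


def resolve_config_path(argv, default=DEFAULT_CONFIG):
    """Return the first command-line argument if given, else the default path."""
    return argv[1] if len(argv) > 1 else default


def load_config(path):
    """Read a JSON configuration file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Typical use inside a script: config = load_config(resolve_config_path(sys.argv))
print(resolve_config_path(["bert_train.py"]))
print(resolve_config_path(["bert_train.py", "my_bert.json"]))
```

The same pattern would apply to the classification scripts, with Configs/para_cls.json as the default.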

Test Model

Change the test data path or prediction output path in Configs/bert.json, or keep the default settings.
Run python bert_test.py, then compare the mask predictions and is_next_sent labels with the raw text stored in the data directory and compute the prediction accuracy.
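One way to compute the accuracy of the mask predictions is sketched below. The output format of bert_test.py is not specified here, so this assumes the predicted and reference texts are aligned character sequences and that the masked positions are known. The function name `masked_accuracy` is hypothetical.

```python
def masked_accuracy(predicted, reference, mask_positions):
    """Fraction of masked positions where the prediction matches the reference."""
    if not mask_positions:
        return 0.0
    correct = sum(predicted[i] == reference[i] for i in mask_positions)
    return correct / len(mask_positions)


reference = list("今天天气好")
predicted = list("今天天气坏")
# Positions 1 and 4 were masked; only position 1 was predicted correctly.
acc = masked_accuracy(predicted, reference, mask_positions=[1, 4])
print(acc)  # 0.5
```

The accuracy of the is_next_sent label can be computed the same way, by treating every sentence pair as one position.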

Chinese News Classification

Train Model

Change the training data path, validation data path, or other settings in Configs/para_cls.json, or keep the default settings.
Run python Chinese_news_cls_train.py (uses the default configuration file, Configs/para_cls.json) or python Chinese_news_cls_train.py custom_config_json_file_path.

Test Model

Change the test data path or other settings in Configs/para_cls.json, or keep the default settings.
Run python Chinese_news_cls_test.py (uses the default configuration file, Configs/para_cls.json) or python Chinese_news_cls_test.py custom_config_json_file_path.

Report F1 Score

Run the function gen_csv_report in report.py to generate the report CSV, which contains the confusion matrix.
Run the function compute_macro_F1 or compute_micro_F1 to get the macro or micro F1 score from the confusion matrix.
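For reference, macro and micro F1 can be derived from a confusion matrix as sketched below. This is not the code in report.py, whose signatures are not shown here. The sketch assumes rows are true classes and columns are predicted classes, and the names `macro_f1` and `micro_f1` are hypothetical.

```python
def per_class_counts(cm):
    """Return (tp, fp, fn) per class; rows = true class, columns = predicted."""
    n = len(cm)
    counts = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted class differs
        counts.append((tp, fp, fn))
    return counts


def f1(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(cm):
    """Unweighted mean of the per-class F1 scores."""
    counts = per_class_counts(cm)
    return sum(f1(*c) for c in counts) / len(counts)


def micro_f1(cm):
    """F1 computed from TP, FP, and FN summed over all classes."""
    counts = per_class_counts(cm)
    return f1(*(sum(col) for col in zip(*counts)))


cm = [[5, 1],
      [2, 2]]
print(round(macro_f1(cm), 4), round(micro_f1(cm), 4))  # 0.6703 0.7
```

For single-label classification, micro F1 equals overall accuracy (here 7 correct out of 10), while macro F1 gives every class equal weight regardless of its size.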
