Skip to content

Code & Train/Test Data for “Classifying Free Texts Into Pre-Defined Sections Using AI in Regulatory Documents: A Case Study with Drug Labeling Documents” for publication in Chemical Research in Toxicology in 2023.

Notifications You must be signed in to change notification settings

gaeolus/drug_label

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

37 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This repository contains the code, data, and results for the paper titled "Classifying Free Texts Into Pre-Defined Sections Using AI in Regulatory Documents: A Case Study with Drug Labeling Documents" by Magnus Gray, Joshua Xu, Weida Tong, and Leihong Wu.

The following describes the files and file structure.

Folders:

'data': contains the data files used for training and testing the models

	'binary': contains the data files (.csv) for the binary classification task
	'multi': contains the data files (.csv) for the multi-class task

'results': contains the primary results for the BERT-based models

	'binary': contains the BERT results for the binary classification task

		'eval_logs': contains the testing results (.txt)
		'graphs': contains plots of training loss and accuracy (.png)
		'train_logs': contains the training history logs (.txt)

	'multi': contains the BERT results for the multi-calss task

		'eval_logs': contains the testing results (.txt)
		'graphs': contains plots of training loss and accuracy (.png)
		'train_logs': contains the training history logs (.txt)

'rf': contains the results for the random forest models

'shap': contains the results of the SHAP model explainability analysis

'svm': contains the results for the SVM models

Python Files:

'binary.py': the code for fine-tuning and evaluating the binary BERT-based models

'multi.py': the code for fine-tuning and evaluating the multi-class BERT-based models

'rf.py': the code for fine-tuning and evaluating the random forest models

'shap.py': the code for the SHAP model explainability analysis

'svm.py': the code for fine-tuning and evaluating the SVM models

About

Code & Train/Test Data for “Classifying Free Texts Into Pre-Defined Sections Using AI in Regulatory Documents: A Case Study with Drug Labeling Documents” for publication in Chemical Research in Toxicology in 2023.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%