Skip to content

Training Conditional Random Fields (CRF) by Maximizing Area Under the ROC Curve (AUC)

License

Notifications You must be signed in to change notification settings

realbigws/DeepCNF_AUC

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 

Repository files navigation

-----------------------
DeepCNF_Package (v1.02)
date: 2015.05.30
-----------------------

Title:
Training Conditional Random Fields by Maximizing AUC

Author: Sheng Wang 
Contact email: realbigws@gmail.com 

References:
1. AUC-maximized Deep Convolutional Neural Fields for Protein Sequence Labeling
	Sheng Wang#, Siqi Sun, Jinbo Xu
	ECML/PKDD, 2016
	https://link.springer.com/chapter/10.1007/978-3-319-46227-1_1

2. AUCpreD: Proteome-level Protein Disorder Prediction by AUC-maximized Deep Convolutional Neural Fields
	Sheng Wang#, Jianzhu Ma, Jinbo Xu
	ECCB, 2016
	Bioinformatics, 2016
	https://academic.oup.com/bioinformatics/article-abstract/32/17/i672/2450776

--------------------------------------------------------


=========
Abstract:
=========

1. The training program of Deep Convolutional Neural Filed ( DeepCNF ) 
for maximizing (1) log probability, (2) posterior probability , or (3) AUC value.

2. The predicting program of DeepCNF for a given feature file 
by using a trained model.



========
Install:
========

1. download the package

git clone https://github.com/realbigws/DeepCNF_AUC/
cd DeepCNF_AUC/

+++++++++++++++++++++

2. compile

2.1 compile under single CPU

cd source_code/
        make
cd ../

-------------

2.2 compile under MPI/OpenMP

cd source_code/
	make mpi
cd ../



======
Usage:
======

1. DeepCNF_Train

Usage :

mpirun -np NP ./DeepCNF_Train -r train_file -t test_file
             -w window_str -d node_str -s state_num -l feat_num [-S feat_select]
            [-n finetune_num] [-f finetune_reg] [-m init_model] [-o out_root]
            [-G gate_function] [-W label_weight] [-M method] [-D AUC_degree]
Options:

-np NP :             number of processors.

-r train_file :      training file.

-t test_file :       testing file.

-w window_str :      window string for DeepCNF. e.g., '5,5'

-d node_str :        node string for DeepCNF. e.g., '40,20'

-s state_num :       state number.

-l feat_num :        feature number at each position.

-S feat_select :     feature selection. e.g., '1-7,9' [default uses all features]

-n finetune_num :    fine-tune iteration number. [default is 200]

-f finetune_reg :    fine-tune regularizer. [default is 0.5]

-m init_model :      file for initial model parameters. [optional, and default is NULL]

-o out_root :        output directory for trained models. [optional, and default is 'MODELS/']

-G gate_function :   gate function type: sigmoid (1), tanh (2), and relu (0). [default is 1]

-M method :          maximize log_prob (0), posterior_prob (1), or AUC (2). [default is 0]

-W label_weight :    label weight. e.g., '0.1,0.9'. [default is 1 for each label]

-D AUC_degree :      degree (1-30) of polynomials for AUC [default is 3].

-------------------------------------


2. DeepCNF_Pred

Usage :

./DeepCNF_Pred -i input_file -w window_str -d node_str -s state_num -l feat_num
               -m init_model [-S feat_select]
Options:

-i input_file :      input feature file.

-w window_str :      window string for DeepCNF. e.g., '5,5'

-d node_str :        node string for DeepCNF. e.g., '40,20'

-s state_num :       state number.

-l feat_num :        feature number at each position.

-m init_model :      file for trained model.

-S feat_select :     feature selection. e.g., '1-7,9' [default uses all features]

---------------------------------------



========
Example:
========

1. Maximize log probability:

cd examples/
	./DeepCNF_Train -r 2fzlA.reso -t 2fzlA.reso -w 5 -d 5 -s 3 -l 40 -n 50
cd ../

---------------------

2. Maximize AUC value:

cd examples/
	./DeepCNF_Train -r 1kq1S.feat -t 1kq1S.feat -w 5 -d 5 -s 2 -l 129 -n 50 -M 2 
cd ../

---------------------

3. Predict the labels of a given feature file by using a certain model:

cd examples/
	./DeepCNF_Pred -i 1kq1S.feat -w 5 -d 5 -s 2 -l 129 -m model.10
cd ../


---------------------

4. make feature file for DeepCNF

cd examples/
	./DeepCNF_FeatMake 5fgnA.profile 5fgnA.label > 5fgnA.feat
cd ../



==================
Input file format:
==================

1. Training data format

//-> data format ( length, features, labels [should be authentic] )
//-> example below:

78
feat1 feat2 feat3, ..., featN
....  (with 78 feature lines)
0
1
0
2
...  ( with 78 label lines)
54
feat1 feat2 feat3, ..., featN
....  (with 54 feature lines)
0
1
0
2
...  ( with 54 label lines)

----------------------------------


2. Testing data format

//-> data format ( length, features, labels [can be anything] )
//-> example below:

78
feat1 feat2 feat3, ..., featN
....  (with 78 feature lines)
-1
-1
-1
-1
...  ( with 78 label lines)
54
feat1 feat2 feat3, ..., featN
....  (with 54 feature lines)
-1
-1
-1
-1
...  ( with 54 label lines)



About

Training Conditional Random Fields (CRF) by Maximizing Area Under the ROC Curve (AUC)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published