Massively parallel interrogation of protein fragment secretability using SECRiFY reveals features influencing secretory system transit
This repository contains the Python code for training, evaluating, and visualizing a convolutional neural network for secretability prediction. The corresponding publication can be found at https://www.nature.com/articles/s41467-021-26720-y
Packages used:
tensorflow v1.10.0
numpy v1.14.5
sklearn v0.23.1
Download the results tables found online and put them in a directory named initial_data. From these tables, you can prepare the data files used by the scripts by running:
python create_data_files.py
Uses files:
initial_data/Pp_resultstable_enriched.txt
initial_data/Pp_resultstable_depleted.txt
initial_data/Sc_resultstable_enriched.txt
initial_data/Sc_resultstable_depleted.txt
The files containing the different folds for cross-validation are saved in the directory data.
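The number of folds and the splitting procedure used by create_data_files.py are not described here. As a rough illustration of what a cross-validation split does, the sketch below deals items into folds at random. The helper names make_folds and train_test_split, the fold count of 5, and the seed are assumptions made for this example and are not taken from the repository.

```python
import random


def make_folds(items, n_folds=5, seed=0):
    """Shuffle items and deal them round-robin into n_folds lists.

    Hypothetical helper: the real script may split differently.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Slice i takes every n_folds-th item starting at offset i,
    # so fold sizes differ by at most one.
    return [shuffled[i::n_folds] for i in range(n_folds)]


def train_test_split(folds, test_index):
    """Use one fold as the test set and merge the others for training."""
    test = folds[test_index]
    train = [x for i, fold in enumerate(folds) if i != test_index for x in fold]
    return train, test


if __name__ == "__main__":
    fragments = ["frag%d" % i for i in range(12)]  # placeholder identifiers
    folds = make_folds(fragments, n_folds=5)
    for test_index in range(len(folds)):
        train, test = train_test_split(folds, test_index)
        print(test_index, len(train), len(test))
```

Each item ends up in exactly one test fold, so every prediction in a cross-validation run is made by a model that did not see that item during training.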
The neural network model is constructed and trained by executing the main.py script.
python main.py <dataset> <varlen_reduction_strategy>
Args:
dataset: one of pp, sc
varlen_reduction_strategy: one of global_maxp, k_maxp, gru, zero_padding
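The option names suggest different ways of turning a variable-length feature map into a fixed-size representation. The sketch below shows the general idea for three of them on plain Python lists, assuming that global_maxp means global max pooling, k_maxp means k-max pooling, and zero_padding means padding with zeros up to a fixed length. These readings are assumptions based on the names, the functions are hypothetical, and the implementation in main.py may differ. The gru option, presumably a recurrent layer, is not sketched.

```python
def global_max_pool(feature_map):
    """Max over positions per channel: (length, channels) -> (channels,)."""
    return [max(channel) for channel in zip(*feature_map)]


def k_max_pool(feature_map, k):
    """Keep the k largest values per channel, in their original order."""
    pooled = []
    for channel in zip(*feature_map):
        # Indices of the k largest activations in this channel.
        top = sorted(range(len(channel)), key=lambda i: channel[i], reverse=True)[:k]
        # Re-sort the indices so positional order is preserved.
        pooled.append([channel[i] for i in sorted(top)])
    return pooled


def zero_pad(feature_map, target_length):
    """Append all-zero rows until the map has target_length positions.

    Assumes target_length >= len(feature_map).
    """
    n_channels = len(feature_map[0])
    padding = [[0.0] * n_channels for _ in range(target_length - len(feature_map))]
    return [list(row) for row in feature_map] + padding


if __name__ == "__main__":
    # Toy feature map: 3 sequence positions, 2 channels.
    fm = [[1.0, 5.0], [3.0, 2.0], [4.0, 0.0]]
    print(global_max_pool(fm))
    print(k_max_pool(fm, 2))
    print(zero_pad(fm, 5))
```

Global max pooling keeps one value per channel, k-max pooling keeps k values per channel together with their relative order, and zero padding keeps everything but fixes the length in advance.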
Predictions over all folds are stored in a single file in the predictions directory. Trained model parameters for all folds are stored in the parameters directory, with one file per fold. Finally, the AUROC (area under the ROC curve) and AUPRC (area under the precision-recall curve) are calculated over all predictions.
It is also possible to calculate the AUROC and AUPRC afterwards. To do so, execute:
python eval.py <prediction_file>
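For readers unfamiliar with the two metrics, the following sketch computes them from a list of binary labels and predicted scores using only the standard library. It is not the code in eval.py, and the format of the prediction file is not assumed here. AUROC is computed as the fraction of positive/negative pairs that are ranked correctly, and AUPRC is estimated as average precision, which is one common estimator among several. The function names are hypothetical.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.

    Assumes at least one positive and one negative label.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at each positive, ranked by score.

    Assumes at least one positive label and no tied scores.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print("AUROC:", auroc(labels, scores))
    print("AUPRC (average precision):", average_precision(labels, scores))
```

The pairwise AUROC loop is quadratic in the number of examples, which is fine for illustration. For large prediction files, a library routine such as those in sklearn.metrics is the more practical choice.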
Using the stored model parameters (use the stored files with their original names), feature contributions can be calculated by running the following command. Note that the output is written to stdout.
python vis_calc_ig.py <parameter_file> <dataset_name>
Args:
parameter_file: one of the parameter directories (ending in foldX) in the parameters directory
dataset_name: one of pp, sc
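The script name suggests that contributions are computed with integrated gradients. Assuming that is the case, the sketch below illustrates the technique on a toy logistic model with an analytic gradient. The model, weights, all-zero baseline, and number of steps are made up for the example and have nothing to do with the trained network or with how vis_calc_ig.py is implemented.

```python
import math


def model(x, w, bias):
    """Toy differentiable model: logistic regression output in (0, 1)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def model_grad(x, w, bias):
    """Analytic gradient of the toy model with respect to the input x."""
    p = model(x, w, bias)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, bias, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum.

    attribution_i = (x_i - baseline_i) * mean over the path of dF/dx_i,
    where the path runs in a straight line from baseline to x.
    """
    grad_sum = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = model_grad(point, w, bias)
        grad_sum = [total + g for total, g in zip(grad_sum, grad)]
    return [(xi - b) * total / steps for xi, b, total in zip(x, baseline, grad_sum)]


if __name__ == "__main__":
    x = [1.0, -2.0, 0.5]          # hypothetical input features
    baseline = [0.0, 0.0, 0.0]    # hypothetical all-zero reference input
    w, bias = [0.8, -0.3, 1.5], -0.2
    attributions = integrated_gradients(x, baseline, w, bias)
    print(attributions)
    # Completeness check: attributions should sum to F(x) - F(baseline).
    print(sum(attributions), model(x, w, bias) - model(baseline, w, bias))
```

A useful sanity check for any integrated-gradients implementation is the completeness property shown at the end: the attributions should sum, up to numerical error, to the difference between the model output at the input and at the baseline.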
Then, the calculated contributions can be normalized using vis_normalize_ig.py. Note that the output is written to stdout.
python vis_normalize_ig.py <old_file_name> > <normalized_file_name>
Finally, the normalized values can be aggregated by running:
python vis_quantify_ig.py <normalized_file_name>
We recommend then copying the output into an Excel spreadsheet to start analyzing the results.