Supplementary scripts for the paper "Combining machine learning with structure-based protein design to predict and engineer post-translational modifications of protein therapeutics"
Documentation for the Rosetta SimpleMetric can be found at PTMPredictionMetric. All models are included in Rosetta and can be used through the RosettaScripts XML interface.
A conda environment with the required libraries can be created with build_conda.sh.
For compiling Rosetta with TensorFlow, see the information on this page.
To train models from scratch, use either the training.py or the multi_training.py script, e.g. python ./training.py -p NlinkedGlycosylation.
Before training, the ptm_data.csv.gz file found in ./data/ needs to be uncompressed.
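The decompression step can be done with any gzip tool. As a minimal sketch in Python, assuming the training scripts expect the uncompressed file next to the archive (this location and the helper name decompress_gz are assumptions, not part of the repository):

```python
import gzip
import shutil
from pathlib import Path
from typing import Optional, Union


def decompress_gz(src: Union[str, Path], dst: Optional[Union[str, Path]] = None) -> Path:
    """Decompress a .gz file and return the path of the uncompressed copy.

    Hypothetical helper for illustration. If dst is not given, the output
    is written next to the archive with the trailing ".gz" removed.
    The original archive is left in place.
    """
    src = Path(src)
    # ptm_data.csv.gz -> ptm_data.csv
    dst = Path(dst) if dst is not None else src.with_suffix("")
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        # Stream in chunks so a large CSV is never fully held in memory.
        shutil.copyfileobj(fin, fout)
    return dst


# Example call, run from the repository root:
#   decompress_gz("./data/ptm_data.csv.gz")
```

After this, the training command shown above can be run as usual.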
A function for calculating the features used is in calc_features.py (requires PyRosetta to be installed in the conda environment).
PDB files can be found in ./data, and deamidation probabilities can be calculated with ./deamidation.sh, which uses the ./XML/deamidation.xml protocol.
For designing the acquired glycosylation sites, use the ./influenza_design.sh script. PDB files can be found in ./data, and glycosylation probabilities can be calculated with ./influenza.sh, which uses the ./XML/influenza.xml protocol.
To run the Monte Carlo optimization, use the run_phospho_opt.sh script. The input PDB structure is in ./data and the RosettaScripts protocol is in ./XML.
All trained models can be found in ./models/.