GitHub - jordantwells42/structural-replacement: A suite of scripts for performing structural replacement i.e. removing a structural feature of a protein and replacing it with a ligand, utilizing PyRosetta.

Structural Replacement

This document will guide you through running the structural-replacement scripts for your project. A more detailed tutorial that walks through a demo example can be found here: https://docs.google.com/document/d/1NEq-mbIoxclpstKW4C55wvxyhdNPPFbA7jrmUidYLdk/edit?usp=sharing

Acknowledgements:

Thanks to Nick Polizzi for creating the van der Mer structural unit and providing access to the vdM database.

Prerequisites:

You do not need to know how to use any of the following, but the scripts require them in order to run.

Python 3.8+
A local installation of PyRosetta
The following Python libraries (these can be installed in a terminal by typing “pip install lib-name”; more info can be found at https://pypi.org/project/pip/)
- numpy - very useful for math operations
- pandas - allows the use of the DataFrame, which is essentially a powerful spreadsheet/table
- scikit-learn - allows use of a nearest neighbors implementation, only needed if using vdMs
- ProDy - used to create pdb files of vdMs, again only needed if using vdMs
If you are using vdMs, an installation of a reduced subset of the vdM database (~300 MB)

General Usage:

Each script is run with the path to the config file as an argument, for example:

run a_script.py path/to/conf.txt

Config File:

The config file contains all of the information needed for the scripts to run.

Information in the [DEFAULT] category is used by all scripts, and each script has its own respective category.

REQUIRED options need to be filled in to fit your own specific project’s needs, while OPTIONAL options do NOT need to be filled in and can be left as is. The defaults for the optional options are the settings I have found to work best.

Additionally, many of the options in the config file are paths to locations on your computer. Importantly, these paths are relative to where your scripts are located, NOT to the config file.
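
The [DEFAULT] category plus per-script categories matches the layout read by Python's standard configparser module, although this README does not say which parser the scripts actually use. The following is a minimal sketch under that assumption. The section name create_table, the option name WorkingDirectory, the example paths, and the helper resolve_path are hypothetical; only MoleculeSDFs is an option named in this document.

```python
import configparser
from pathlib import Path

# Hypothetical config text. Apart from MoleculeSDFs, the section and
# option names here are made up for illustration.
EXAMPLE_CONF = """
[DEFAULT]
WorkingDirectory = ../my_project

[create_table]
MoleculeSDFs = ../my_project/ligands.sdf
"""


def resolve_path(value, scripts_dir):
    """Resolve a config path against the scripts directory, not the config file."""
    return (Path(scripts_dir) / value).resolve()


config = configparser.ConfigParser()
config.read_string(EXAMPLE_CONF)

section = config["create_table"]
# Options set in [DEFAULT] are visible from every other section.
working_dir = section["WorkingDirectory"]
# Assumed location of the scripts folder; adjust to your own checkout.
sdf_path = resolve_path(section["MoleculeSDFs"], scripts_dir="/repo/scripts")
print(working_dir)
print(sdf_path)
```

If a path in your config file is not found, check it against the location of the scripts folder first, since that is the directory the paths are measured from.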

Running the Scripts:

Once the config file has been set up with all of the necessary information, you can run the scripts. The necessary commands and a brief outline of what each script does are given below. More information on what each script does and how to use it can be found here.

run create_table.py conf.txt

This will create a spreadsheet from the ligand sdf files provided in the MoleculeSDFs option, create Rosetta .params files to make them Rosetta-readable, and create .pdb files for each ligand
If you want to have several conformers for each ligand, simply have all of the conformers for each ligand in one file and pass that to MoleculeSDFs
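
If your conformers currently live in separate files, they can be merged into one multi-record SDF file before being passed to MoleculeSDFs. The sketch below relies only on the SDF convention that each record ends with a line containing `$$$$`. The helper names split_sdf_records and combine_conformers are hypothetical and not part of the repository, and the records shown are placeholders rather than real mol blocks.

```python
def split_sdf_records(sdf_text):
    """Split SDF text into one string per record (conformer)."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":  # SDF record terminator
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):
        # Keep a final record that is missing its terminator.
        records.append("\n".join(current))
    return records


def combine_conformers(sdf_texts):
    """Merge several SDF texts into one, ending every record with $$$$."""
    records = []
    for text in sdf_texts:
        records.extend(split_sdf_records(text))
    return "".join(record + "\n$$$$\n" for record in records)


# Placeholder records, not real mol blocks.
conformer_a = "indole conformer 1 (placeholder)\nM  END\n$$$$\n"
conformer_b = "indole conformer 2 (placeholder)\nM  END"
combined = combine_conformers([conformer_a, conformer_b])
print(len(split_sdf_records(combined)))
```

Writing `combined` to a single .sdf file gives one file per ligand that holds all of its conformers.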

Manual input of atom alignments

Here you will fill in the “Molecule Atoms” and “Target Atoms” columns in the generated spreadsheet by adding the atom labels for each, which can be found with software such as PyMOL by clicking on the atoms.
For example, if I wanted to align indole to tryptophan, I’d go into PyMOL and find a substructure they share in common (such as the six-membered ring), choose corresponding atoms on that substructure, and list the atom labels.
This would ultimately look like “C1-C5-C7” in the “Molecule Atoms” column and “CD2-CZ2-CZ3” in the “Target Atoms” column (by choosing three corresponding atoms on the six-membered ring).
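
Before running the next script, it can help to check that the two columns list the same number of atoms. The sketch below is a hypothetical helper, not code from the repository. It assumes labels are separated by hyphens, as in the example above, and that the first label in “Molecule Atoms” corresponds to the first label in “Target Atoms”, and so on.

```python
def pair_alignment_atoms(molecule_atoms, target_atoms):
    """Pair atom labels from the two spreadsheet columns, in order."""
    mol = [label.strip() for label in molecule_atoms.split("-")]
    tgt = [label.strip() for label in target_atoms.split("-")]
    if "" in mol or "" in tgt:
        raise ValueError("Empty atom label; check for a stray or doubled hyphen.")
    if len(mol) != len(tgt):
        raise ValueError(
            f"{len(mol)} molecule atoms but {len(tgt)} target atoms were listed."
        )
    # Assumed convention: position i in one column matches position i in the other.
    return list(zip(mol, tgt))


pairs = pair_alignment_atoms("C1-C5-C7", "CD2-CZ2-CZ3")
for molecule_atom, target_atom in pairs:
    print(molecule_atom, "->", target_atom)
```

A mismatch in the number of labels raises an error here, which is easier to diagnose than a failed or distorted alignment later on.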

run grade_conformers.py conf.txt

This will align your ligands to your structural feature based on the alignments you have entered, and check for backbone collisions with the ligand.
It will also turn the spreadsheet into a pandas DataFrame that includes information about which conformers passed the collision check, and save it as a .pkl file.

run precalculate_vdm_space.py conf.txt

This will use a Resfile to pre-calculate all of the possible interactions in the active site
The output of this will be three new files: VDMSpaceFileStem-info.pkl, VDMSpaceFileStem-coords.pkl, and VDMSpaceFileStem-trees.pkl.
This step can take up to an hour to run, but its results can be reused several times. If you change the resfile or the settings, it will have to be run again.
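
Because this step is slow, it can be worth checking whether the three output files already exist before running it again. The sketch below is a hypothetical helper built only from the file names listed above. It cannot tell whether the resfile or settings have changed since the files were written, so in that case the step still has to be rerun by hand.

```python
from pathlib import Path

# Suffixes taken from the three output file names listed above.
SUFFIXES = ("info", "coords", "trees")


def vdm_space_files(stem):
    """Return the three expected output paths for a given file stem."""
    return [Path(f"{stem}-{suffix}.pkl") for suffix in SUFFIXES]


def needs_precalculation(stem):
    """True if any of the three output files is missing."""
    return not all(path.is_file() for path in vdm_space_files(stem))


# "VDMSpaceFileStem" stands in for whatever stem your config file uses.
for path in vdm_space_files("VDMSpaceFileStem"):
    print(path)
```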

run evaluate_conformers_on_vdm_space.py conf.txt

This will use the output files from the previous script and give scores for each ligand, as well as store viewable .pdb files of the ligands and each of the interactions identified

run to_csv.py [path/to/dataframe.pkl] [path/to/new_spreadsheet.csv]

If you are not comfortable using pandas or pickle files, use this script at any point to convert a pickled DataFrame (.pkl) into an easily readable .csv file.
The VDMSpace files are not DataFrames, so this will not produce anything useful for them.
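
For readers who would rather do the conversion themselves, the sketch below shows one way to go from a pickled DataFrame to a .csv file with pandas. It is not the code of to_csv.py, and the function name pickle_to_csv is hypothetical. The demo uses a throwaway DataFrame with the two column names mentioned earlier in this document.

```python
import tempfile
from pathlib import Path

import pandas as pd


def pickle_to_csv(pkl_path, csv_path):
    """Convert a pickled pandas DataFrame into a .csv file."""
    # Only unpickle files you trust; loading a pickle can execute code.
    obj = pd.read_pickle(pkl_path)
    if not isinstance(obj, pd.DataFrame):
        # For example, the VDMSpace files are not DataFrames.
        raise TypeError(f"{pkl_path} holds a {type(obj).__name__}, not a DataFrame")
    obj.to_csv(csv_path)


# Demo with a throwaway DataFrame in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    pkl = Path(tmp) / "demo.pkl"
    csv = Path(tmp) / "demo.csv"
    demo = pd.DataFrame(
        {"Molecule Atoms": ["C1-C5-C7"], "Target Atoms": ["CD2-CZ2-CZ3"]}
    )
    demo.to_pickle(pkl)
    pickle_to_csv(pkl, csv)
    round_trip = pd.read_csv(csv, index_col=0)

print(round_trip)
```

The type check makes the failure explicit when the file is something other than a DataFrame, instead of writing out a meaningless .csv.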

Protocol Capture

Pyrosetta: PyRosetta-4 2020 Rosetta PyRosetta4.Release.python38.ubuntu 2020.25+release.d2d9f90b8cbcacfd7a1f69aefa5de610b100e8a9 2020-06-19T14:33:13
Python: 3.8.5
numpy: 1.21.1
pandas: 1.3.0
sklearn: 0.24.2
ProDy: 2.0
Biopython: 1.79
