A suite of scripts for performing structural replacement i.e. removing a structural feature of a protein and replacing it with a ligand, utilizing PyRosetta.

jordantwells42/structural-replacement

Structural Replacement

This document will guide you through running the structural-replacement scripts for your project. A more detailed tutorial that walks through a demo example can be found here: https://docs.google.com/document/d/1NEq-mbIoxclpstKW4C55wvxyhdNPPFbA7jrmUidYLdk/edit?usp=sharing


Acknowledgements:

Thanks to Nick Polizzi for creating the van der Mer structural unit and providing access to the vdM database.


Prerequisites:

You do not need to know how to use any of the following, but the scripts require them in order to run.

  • Python 3.8+
  • A local installation of PyRosetta
  • The following Python libraries (these can be installed in a terminal by typing “pip install lib-name”; more info can be found at https://pypi.org/project/pip/)
    • numpy - very useful for math operations
    • pandas - allows the use of the DataFrame, which is essentially a powerful spreadsheet/table
    • scikit-learn - allows use of a nearest neighbors implementation, only needed if using vdMs
    • ProDy - used to create pdb files of vdMs, again only needed if using vdMs
  • If you are using vdMs, an installation of a reduced subset of the vdM database (~300 MB)
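
Before running anything, it can help to confirm that the libraries above are visible to your Python installation. The following is a minimal sketch and not part of the repository: the function name and the two lists are made up for illustration. It only checks that each package can be found, using the import names rather than the pip names (scikit-learn is imported as sklearn, ProDy as prody).

```python
import importlib.util

# Import names, not pip names: scikit-learn -> "sklearn", ProDy -> "prody".
REQUIRED = ["numpy", "pandas", "pyrosetta"]
VDM_ONLY = ["sklearn", "prody"]  # only needed if using vdMs


def missing_libraries(import_names):
    """Return the names from import_names that Python cannot locate."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("vdM-only", VDM_ONLY)):
        missing = missing_libraries(names)
        status = "all found" if not missing else "missing " + ", ".join(missing)
        print(f"{label}: {status}")
```

Anything reported as missing can be installed with pip as described above, except PyRosetta, which has its own installation procedure.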


General Usage:

Each script is run with the path to the config file as an argument, such as

run a_script.py path/to/conf.txt


Config File:

The config file contains all of the information needed for the scripts to run.

Information in the [DEFAULT] category is used by all scripts, and each script has its own respective category.

REQUIRED options need to be filled in to fit your own specific project’s needs, while OPTIONAL options do NOT need to be filled in and can be left as is. The default settings in the optional options are what I have found to work the best.

Additionally, many of the options in the config file are paths to locations on your computer. Importantly, these paths are relative to where your scripts are located, NOT to the config file.
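
To make the [DEFAULT] behavior and the path rule concrete, here is a small sketch. It assumes the config file is INI-style text that Python's built-in configparser can read, which the [DEFAULT] category suggests but this README does not state. The section name create_table, the WorkingDir option, and both helper functions are hypothetical; MoleculeSDFs is an option named later in this document, shown with a placeholder value.

```python
import configparser
from pathlib import Path

EXAMPLE = """
[DEFAULT]
# Hypothetical shared option, visible from every section
WorkingDir = output

[create_table]
# MoleculeSDFs is named in this README; the value is a placeholder
MoleculeSDFs = ligands/all_conformers.sdf
"""


def load_section(config_text, section):
    """Parse INI-style text and return one section, DEFAULT values included."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser[section]


def resolve_from_script_dir(option_value, script_dir):
    """Interpret a relative path against the scripts' folder, not the config's."""
    return (Path(script_dir) / option_value).resolve()


section = load_section(EXAMPLE, "create_table")
print(section["MoleculeSDFs"])  # the section's own option
print(section["WorkingDir"])    # inherited from [DEFAULT]
print(resolve_from_script_dir(section["MoleculeSDFs"], Path.cwd()))
```

Under this assumption, an option set in [DEFAULT] does not need to be repeated in each script's category, and a relative path such as ligands/all_conformers.sdf is looked up starting from the folder that holds the scripts.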


Running the Scripts:

Once the config file has been set up with all of the necessary information, you are ready to run the scripts. The necessary commands are listed below, each with a brief outline of what the script does. If you would like more information on what each script does or how to use them, that can be found here.

run create_table.py conf.txt
  • This will create a spreadsheet from the ligand sdf files provided in the MoleculeSDFs option, create Rosetta .params files to make them Rosetta-readable, and create .pdb files for each ligand
  • If you want to have several conformers for each ligand, simply have all of the conformers for each ligand in one file and pass that to MoleculeSDFs
Manual input of atom alignments
  • Here you will fill in the “Molecule Atoms” and “Target Atoms” columns in the generated spreadsheet with the atom labels for each, which can be found in software such as PyMOL by clicking on the atoms.
  • For example, if I wanted to align indole to Tryptophan, I’d go into PyMOL and find a substructure they share in common (such as the six-membered ring), choose corresponding atoms on that substructure, and list the atom labels.
  • This would ultimately look like “C1-C5-C7” in the “Molecule Atoms” column and “CD2-CZ2-CZ3” in the “Target Atoms” column (by choosing three corresponding atoms on the six-membered ring)
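
A quick way to sanity-check what you typed into the two columns is sketched below. It assumes the hyphen-separated format from the example above; the function names are hypothetical and this is not how the repository's scripts necessarily parse the cells. The point is that the two cells must list the same number of atoms, and that atoms are matched by position.

```python
def parse_atom_labels(cell):
    """Split a cell such as "C1-C5-C7" into ["C1", "C5", "C7"]."""
    labels = [label.strip() for label in cell.split("-")]
    if any(not label for label in labels):
        raise ValueError(f"empty atom label in {cell!r}")
    return labels


def paired_atoms(molecule_cell, target_cell):
    """Pair ligand atoms with target atoms by their position in each cell."""
    molecule = parse_atom_labels(molecule_cell)
    target = parse_atom_labels(target_cell)
    if len(molecule) != len(target):
        raise ValueError(
            f"{len(molecule)} molecule atoms but {len(target)} target atoms"
        )
    return list(zip(molecule, target))


print(paired_atoms("C1-C5-C7", "CD2-CZ2-CZ3"))
# [('C1', 'CD2'), ('C5', 'CZ2'), ('C7', 'CZ3')]
```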
run grade_conformers.py conf.txt
  • This will align your ligands to your structural feature based on the alignments you have entered, and check for backbone collisions with the ligand
  • This will turn the spreadsheet into a Pandas DataFrame and include some information about which conformers passed the collision check. It will save this as a .pkl file
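
For intuition about what aligning on three atom pairs means, the sketch below superimposes one set of points onto another with the Kabsch algorithm, a standard least-squares method. This is an illustration only: this README does not say which alignment method grade_conformers.py uses, and the function name and coordinates are made up.

```python
import numpy as np


def kabsch_align(mobile, target):
    """Return (R, t) such that mobile @ R.T + t best matches target.

    mobile and target are (N, 3) arrays of paired atom coordinates,
    e.g. the three ligand atoms and the three matching target atoms.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    # Covariance of the centered point sets
    H = (mobile - mobile_center).T @ (target - target_center)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a rotation, not a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = target_center - R @ mobile_center
    return R, t


# Made-up coordinates: the target is the ligand rotated 90 degrees about z
# and then shifted.
ligand_atoms = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
rotation_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
target_atoms = ligand_atoms @ rotation_z.T + np.array([1.0, 2.0, 3.0])

R, t = kabsch_align(ligand_atoms, target_atoms)
aligned = ligand_atoms @ R.T + t
print(np.allclose(aligned, target_atoms))
```

The same R and t would then be applied to every atom of the ligand, not only the three used to compute them.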
run precalculate_vdm_space.py conf.txt
  • This will use a Resfile to pre-calculate all of the possible interactions in the active site
  • The output of this will be three new files VDMSpaceFileStem-info.pkl, VDMSpaceFileStem-coords.pkl, and VDMSpaceFileStem-trees.pkl.
  • This step will take up to an hour to run, but its results can be reused several times. If you wish to change the resfile or the settings, it will have to be run again
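
Because this step is slow, it can be worth checking whether its three output files already exist before rerunning it. The sketch below is a hypothetical helper, not part of the repository; it assumes the files are named by appending the three suffixes listed above to whatever stem you configured.

```python
from pathlib import Path

# Suffixes taken from the output file names listed above
SUFFIXES = ("-info.pkl", "-coords.pkl", "-trees.pkl")


def vdm_space_files(stem):
    """Return the three expected output paths for a given file stem."""
    return [Path(f"{stem}{suffix}") for suffix in SUFFIXES]


def needs_precalculation(stem):
    """True if at least one of the three expected files is missing."""
    return not all(path.is_file() for path in vdm_space_files(stem))


# "path/to/VDMSpaceFileStem" is a placeholder for your own stem
for path in vdm_space_files("path/to/VDMSpaceFileStem"):
    print(path)
```

Note that this only checks for the files' presence. It cannot tell whether they were produced with your current resfile and settings, so rerun the step after changing either.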
run evaluate_conformers_on_vdm_space.py conf.txt
  • This will use the output files from the previous script to score each ligand, and store viewable .pdb files of the ligands and each of the interactions identified
run to_csv.py [path/to/dataframe.pkl] [path/to/new_spreadsheet.csv]
  • If you are not comfortable using Pandas or pickle files, use this script at any point to convert a .pkl file DataFrame into an easily-readable .csv
  • The VDMSpace files are not DataFrames so this will not produce anything useful for them
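
If you would rather do the conversion yourself, the same idea can be sketched in a few lines of pandas. This is an assumption about what such a conversion looks like, not the contents of to_csv.py; the function name is hypothetical, and dropping the index column is a choice made here for readability.

```python
import pandas as pd


def pickle_to_csv(pkl_path, csv_path):
    """Load a pickled DataFrame and write it out as a .csv file."""
    frame = pd.read_pickle(pkl_path)
    if not isinstance(frame, pd.DataFrame):
        # e.g. the VDMSpace files, which are not DataFrames
        raise TypeError(f"{pkl_path} does not contain a DataFrame")
    frame.to_csv(csv_path, index=False)
    return frame
```

Only unpickle files you created yourself or trust, since loading a pickle file can execute arbitrary code.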


Protocol Capture

  • PyRosetta: PyRosetta-4 2020 Rosetta PyRosetta4.Release.python38.ubuntu 2020.25+release.d2d9f90b8cbcacfd7a1f69aefa5de610b100e8a9 2020-06-19T14:33:13
  • Python: 3.8.5
  • numpy: 1.21.1
  • pandas: 1.3.0
  • sklearn: 0.24.2
  • ProDy: 2.0
  • Biopython: 1.79