# Structure-based Screening Pipeline

## Task 1: Get query structure from PDB 

Protein structures can be downloaded from the Protein Data Bank ([PDB](http://www.rcsb.org)), the largest freely available repository of 3D protein structures. You can select a suitable structure either programmatically or manually. For an automated approach, see the TeachOpenCADD talktorial [T008: Protein data acquisition: Protein Data Bank](https://projects.volkamerlab.org/teachopencadd/talktorials/T008_query_pdb.html).

To keep things simple, you can choose a structure manually. On the website, enter ‘EGFR’ in the search field at the upper right to get a list of matching entries. To refine your search, you can, e.g., restrict the results to human structures determined by X-ray crystallography with a good resolution (< 2 Å) and a recent deposition date (after 2010).

For kinases, more detailed information is available on the [KLIFS](https://klifs.net/) webpage. There you will also find the DFG-loop state of the available structures as well as some quality criteria, such as the number of missing residues.

Your task is to choose a suitable EGFR structure for docking. Make sure that your structure is complete (no missing residues) around the active site. To keep it simple, try to select a structure containing only one biological unit. Note down the PDB ID of your structure. Using the example code below, you can retrieve the structure from the PDB using the ID.


```python
from opencadd.structure.core import Structure

# retrieve structure from the Protein Data Bank
pdb_id = "2ito"
structure = Structure.from_pdbid(pdb_id)
# element information may be missing, but it is important for the subsequent PDBQT conversion
if not hasattr(structure.atoms, "elements"):
    structure.add_TopologyAttr("elements", structure.atoms.types)
structure
```
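The completeness criterion can also be checked directly in the downloaded PDB file, because unresolved residues are listed in its `REMARK 465` records. The following is a minimal sketch that uses only the standard library; the function names `missing_residues` and `missing_in_range` are illustrative and not part of any package used in this pipeline.

```python
import re

# Matches data lines of the REMARK 465 block, e.g. "REMARK 465     GLY A   696".
# An optional model number may precede the residue name.
_MISSING = re.compile(
    r"^REMARK 465\s+(?:\d+\s+)?([A-Z0-9]{1,3})\s+(\S)\s+(-?\d+)[A-Z]?\s*$"
)


def missing_residues(pdb_lines):
    """Return (chain, residue name, residue number) for each unresolved residue."""
    found = []
    for line in pdb_lines:
        match = _MISSING.match(line)
        if match:
            resname, chain, number = match.groups()
            found.append((chain, resname, int(number)))
    return found


def missing_in_range(pdb_lines, chain, first, last):
    """Missing residues of one chain whose number lies in [first, last]."""
    return [
        entry
        for entry in missing_residues(pdb_lines)
        if entry[0] == chain and first <= entry[2] <= last
    ]
```

Assuming the structure was saved under a hypothetical file name such as `2ito.pdb`, `missing_residues(open("2ito.pdb"))` lists all gaps, and `missing_in_range` restricts the check to the residue range you consider part of the active site.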


## Task 2: Binding site detection

If a co-crystallized ligand is not present in your protein structure, you need to specify the binding site manually. Several pocket detection algorithms have been developed to find protein pockets suitable for binding drug-like molecules.

Helpful talktorial: [T014-Binding site detection](https://projects.volkamerlab.org/teachopencadd/talktorials/T014_binding_site_detection.html)


### Task 2.1
Use the DoGSiteScorer available at [proteins.plus](http://www.proteins.plus) to identify pockets in your protein-ligand complex. Check the “subpockets” box to receive more fine-grained pocket results. Is the binding site of the co-crystallized ligand scored the best (is it considered druggable)? Does the predicted binding site cover the co-crystallized ligand well?


### Task 2.2
[*optional*] Download the results from DoGSiteScorer. In the zipped archive, locate the PDB file of your favorite predicted binding site inside the residues directory. Open the PDB file in a text editor and note down the geometric center and radius of the binding site. You can visualize the geometric center and radius in PyMOL with the `pseudoatom` command. After creating the pseudoatom at the right coordinates, display it as a sphere and change the radius accordingly (Tip: Use the `sphere_transparency` setting to better visualize all components together). Are you confident about the selected parameters or would you adjust them?
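As a cross-check for the values noted down from the file, the center and radius can be recomputed from the atom coordinates in the pocket PDB file. The sketch below assumes that the centroid of all `ATOM`/`HETATM` records and the largest distance from it are a reasonable approximation; DoGSiteScorer may define its reported values differently, and the function name is illustrative.

```python
import math


def pocket_center_and_radius(pdb_lines):
    """Centroid of all ATOM/HETATM coordinates and the largest distance to it."""
    coords = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # PDB fixed-width columns 31-38, 39-46 and 47-54 hold x, y and z
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    if not coords:
        raise ValueError("no ATOM/HETATM records found")
    center = tuple(sum(axis) / len(coords) for axis in zip(*coords))
    radius = max(math.dist(center, point) for point in coords)
    return center, radius
```

The returned center can then be passed to PyMOL, for example as `pseudoatom pocket_center, pos=[x, y, z]` with the three numbers filled in.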

### Task 2.3
[*optional*] Like proteins.plus, PyMOL can also be used to visualize volumetric maps. In the downloaded DoGSiteScorer results, locate your preferred binding pocket in CCP4 format and load it together with the protein-ligand complex into PyMOL.




## Task 3: Docking

Before we can start with the docking, we need to prepare our structures. 

### Task 3.1
To automatically prepare the structure, use Protoss (proteins.plus). Protoss adds missing hydrogen atoms to protein structures (PDB-format) and detects reasonable protonation states, tautomers, and hydrogen coordinates of both protein and ligand molecules. Upload your chosen structure to the server and store the optimized PDB structure.

### Task 3.2

For the docking calculations, we need to separate the ligand and the protein. The ligand must be extracted into its own file and removed from the protein structure; otherwise, the binding site would still be occupied and there would be no space to dock into.

**TODO**
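One possible way to do this separation without additional packages is to split the PDB records by record type and residue name. This is only a sketch under simplifying assumptions: the protein consists solely of `ATOM` records, the ligand is identified by a single residue name, and everything else (water, ions, further ligands) is discarded. The function name `split_complex` and the residue name `LIG` in the usage note are hypothetical.

```python
def split_complex(pdb_lines, ligand_resname):
    """Split PDB records into protein lines and the lines of one ligand."""
    protein, ligand = [], []
    for line in pdb_lines:
        record = line[:6].strip()      # record type, PDB columns 1-6
        resname = line[17:20].strip()  # residue name, PDB columns 18-20
        if record == "ATOM":
            protein.append(line)
        elif record == "HETATM" and resname == ligand_resname:
            ligand.append(line)
        # all other records (water, ions, other ligands, headers) are dropped
    return protein, ligand
```

Calling `split_complex(lines, "LIG")` on the lines of the Protoss output returns two lists that can be written to separate files. Note that modified residues stored as `HETATM` records would be lost with this simple rule, so check the protein file afterwards.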

### Task 3.3 
Programs based on the AutoDock software require protein and ligand to be prepared in PDBQT format (AutoDock FAQ). This file format is very similar to the PDB format but additionally stores information about atom types and partial charges. Luckily, the OpenBabel package provides functionality for converting between different file formats and calculating partial charges (command: `obabel protein.pdb -O protein.pdbqt --partialcharge gasteiger`). Check if the generated PDBQT files for protein and ligand contain information about the assigned charges for each atom. In theory, we could also use OpenBabel to add hydrogens. However, Protoss already added missing hydrogens considering protein and ligand in complex, which is more accurate than protonating protein and ligand separately.
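The check for assigned charges can be scripted. The sketch below assumes the usual PDBQT layout, in which the partial charge occupies columns 71-76 and the AutoDock atom type columns 78-79 of each atom record; the function names are illustrative.

```python
def pdbqt_charges(pdbqt_lines):
    """Return (atom type, partial charge or None) for each atom record."""
    atoms = []
    for line in pdbqt_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        charge_field = line[70:76].strip()  # PDBQT columns 71-76
        atom_type = line[77:79].strip()     # PDBQT columns 78-79
        try:
            charge = float(charge_field)
        except ValueError:
            charge = None  # e.g. a plain PDB line without a charge column
        atoms.append((atom_type, charge))
    return atoms


def all_atoms_charged(pdbqt_lines):
    """True if the file has atom records and each one carries a charge value."""
    atoms = pdbqt_charges(pdbqt_lines)
    return bool(atoms) and all(charge is not None for _, charge in atoms)
```

If `all_atoms_charged` returns `False` for one of your files, the conversion most likely did not write charges and should be repeated. It is also worth looking at the values themselves, since a file in which every charge is zero passes this check but is probably not what you want.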

### Task 3.4
For docking your filtered compounds, you need the 3D structures of the molecules. Calculate the 3D structure of each molecule and save them in a single SDF file.

Tip:
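```python
from rdkit import Chem
from rdkit.Chem import AllChem

w = Chem.SDWriter('data/foo.sdf')
for m in mols:
    m = Chem.AddHs(m)               # explicit hydrogens are needed for a sensible 3D geometry
    AllChem.EmbedMolecule(m)        # calculates 3D structure
    AllChem.UFFOptimizeMolecule(m)  # improves quality of the conformation
    w.write(m)
w.close()                           # flush and close the SDF file
```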


### Task 3.5

With everything in hand, we can finally run our docking calculation using Smina with the following command:
```
smina --ligand test_ligand.sdf --receptor protein.pdbqt --out docking_poses.sdf --autobox_ligand ligand.pdbqt --num_modes 1
```
Your docked compounds will all be in `docking_poses.sdf`. 
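The smina call above can also be launched from Python, which is convenient when the docking is part of a notebook. This is a small sketch that uses only the flags from the command above; the function names are illustrative, and it assumes that the `smina` executable is available on your `PATH`.

```python
import subprocess


def build_smina_command(ligands, receptor, out, autobox_ligand, num_modes=1):
    """Assemble the argument list for the smina call shown above."""
    return [
        "smina",
        "--ligand", str(ligands),
        "--receptor", str(receptor),
        "--out", str(out),
        "--autobox_ligand", str(autobox_ligand),
        "--num_modes", str(num_modes),
    ]


def run_smina(*args, **kwargs):
    """Run smina and return its console output; raises if smina reports an error."""
    command = build_smina_command(*args, **kwargs)
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `run_smina("test_ligand.sdf", "protein.pdbqt", "docking_poses.sdf", "ligand.pdbqt")` reproduces the command line above. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.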

## Task 4: Visualize docking results 

To visualize your docking results, we will use the NGLView package. 

Helpful talktorial: [T015-Protein ligand docking](https://projects.volkamerlab.org/teachopencadd/talktorials/T015_protein_ligand_docking.html)


**TODO**

To interpret your docking results, there are several indicators you can look at if you have time:  
- Which compounds/poses have the best docking scores?
- Visually inspect the results using a molecule viewer to analyze the binding mode and compare them to the co-crystallized ligand or other known EGFR inhibitors. 
- Is the binding site well covered? Do protein and pose fit each other (in shape and physico-chemically)? Are key interactions (e.g., the hinge binding motif) present? You may also use PoseView for inspection.
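To answer the first question, the docking scores can be read from the output SDF file and sorted. The sketch below assumes that smina stores the score of each pose in an SD data field named `minimizedAffinity` and that lower (more negative) values indicate better predicted binding; the function name is illustrative.

```python
def read_docking_scores(sdf_text, tag="minimizedAffinity"):
    """Return (molecule title, score) pairs from SDF text, best score first."""
    scores = []
    record = []
    for line in sdf_text.splitlines():
        if line.strip() != "$$$$":
            record.append(line)
            continue
        # "$$$$" closes a record; its first line is the molecule title
        name = record[0].strip() if record else ""
        for index, entry in enumerate(record[:-1]):
            # a data item header looks like "> <tag>", the value is on the next line
            if entry.startswith(">") and f"<{tag}>" in entry:
                scores.append((name, float(record[index + 1])))
                break
        record = []
    return sorted(scores, key=lambda item: item[1])
```

With the file from Task 3.5, `read_docking_scores(open("docking_poses.sdf").read())` returns the compounds ranked by score. The ranking is only a starting point; the visual inspection described in the other bullet points should still guide the final selection.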

Based on these (and other criteria you can think of), you can filter your compounds down further. In the end, you should select three compounds that seem to be most promising to you.
