# Using PDBsum data to highlight changes in protein-protein interactions

This notebook builds on some of the basics covered in [Working with PDBsum in Jupyter Basics](Working%20with%20PDBsum%20in%20Jupyter%20Basics.ipynb) in order to compare protein-protein interactions of the same pair of proteins in two different structures.

----

### Retrieving protein-protein interface reports / the lists of interactions

This time we need two different protein-protein interface reports, also known as lists of interactions. We could get them directly from pages such as [this one](http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=6ah3&template=interfaces.html&o=RESIDUE&l=3) by following the 'List of interactions' link in the bottom right of each page and copying the contents into this session. However, [the previous notebook in this series](Working%20with%20PDBsum%20in%20Jupyter%20Basics.ipynb) covered how to use curl to fetch the file into this session by providing a few details in the URL. We'll modify that approach so that we can supply the needed information and retrieve both sets of data in much the same way.

Here, the example data has the chains named with the same chain designations in the different PDB files; however, this doesn't have to be the case and the code is written to allow for that.  
**The code is also written so that you can change the settings and easily adapt it to examining the interaction of protein-protein pairs in your favorite related structures.**

Source of the data to use:  
From [Xue et al 2019 Structural basis of nucleosome recognition and modification by MLL methyltransferases](https://pubmed.ncbi.nlm.nih.gov/31485071/):
>"Here we report cryo-electron microscopy structures of human MLL1 and MLL3 catalytic modules associated with nucleosome core particles that contain H2BK120ub1 or unmodified H2BK120. These structures demonstrate that the MLL1 and MLL3 complexes both make extensive contacts with the histone-fold and DNA regions of the nucleosome; this allows ease of access to the histone H3 tail, which is essential for the efficient methylation of H3K4. The H2B-conjugated ubiquitin binds directly to RBBP5, orienting the association between MLL1 or MLL3 and the nucleosome. The MLL1 and MLL3 complexes display different structural organizations at the interface between the WDR5, RBBP5 and MLL1 (or the corresponding MLL3) subunits, which accounts for the opposite roles of WDR5 in regulating the activity of the two enzymes. These findings transform our understanding of the structural basis for the regulation of MLL activity at the nucleosome level, and highlight the pivotal role of nucleosome regulation in histone-tail modification."

PDB codes for entries from that publication for possible comparison structures:
6KIU, 6KIW, 6KIV, 6KIX,  6KIZ  
(There are also 6PWV, 6PWX, and 6PWW, which are closely related to those structures and come from a different group.)

----   
    

Let's define the structures to use.

In [1]:
structure1 = "6kiz"
structure2 = "6kix"

Let's define the chains to use.

In [2]:
structure1_chain1 = "R"
structure1_chain2 = "N" # "K" yields different results; not yet sure which illustrates the point better
structure2_chain1 = structure1_chain1
structure2_chain2 = structure1_chain2

(As noted earlier, different structures don't necessarily designate the same chains with the same alphanumeric character. In this example they are the same; however, if the chains of interest in the second structure were instead `C` and `E`, the following code would have been used in the cell above to assign the chains.

```python
structure1_chain1 = "R"
structure1_chain2 = "N" # "K" yields different results; not yet sure which illustrates the point better
structure2_chain1 = "C"
structure2_chain2 = "E"
```
)

Next, let's define what we want the data files saved as:

In [3]:
structure1_data_name = "datai_6kiz_R_N.txt"
structure2_data_name = "datai_6kix_R_N.txt"

Code based on the first notebook to get the interaction data files for the protein pair in each of the two structures:

In [4]:
def get_protein_inter_data_files(pdb_code, chain1, chain2, output_file_name):
    '''
    Takes a PDB entry accession identifier (the alphanumeric PDB code), a chain
    identifier for chain 1 and a chain identifier for chain 2, along with a
    name to give the file produced when the data is retrieved and saved.

    The proteins have to interact in the structure for meaningful data to be returned.
    '''
    source_url = "http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetIface.pl"
    # IPython shell escape: POST the PDB code and chain pair, follow redirects (-L),
    # and save the response under the requested file name (-o).
    !curl -L -o {output_file_name} --data "pdb={pdb_code.lower()}&chain1={chain1}&chain2={chain2}" {source_url}

# Get data file for structure #1
get_protein_inter_data_files(structure1, structure1_chain1, structure1_chain2, structure1_data_name)
# Get data file for structure #2
get_protein_inter_data_files(structure2, structure2_chain1, structure2_chain2, structure2_data_name)

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7528    0  7502  100    26   9307     32 --:--:-- --:--:-- --:--:--  9328
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6068    0  6042  100    26   3345     14  0:00:01  0:00:01 --:--:--  3358
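
For readers who would rather stay in pure Python than shell out to curl, the form data posted by the function above can be sketched with the standard library. This is only an illustration: `build_iface_request` is a hypothetical helper, and it assumes the `GetIface.pl` endpoint accepts the same `pdb`, `chain1`, and `chain2` form fields that the curl command sends. Nothing goes over the network here; the request object is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

SOURCE_URL = "http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetIface.pl"

def build_iface_request(pdb_code, chain1, chain2, source_url=SOURCE_URL):
    """Hypothetical helper: build (but do not send) a POST request that
    mirrors the form data in the curl call used in this notebook."""
    form = {"pdb": pdb_code.lower(), "chain1": chain1, "chain2": chain2}
    payload = urlencode(form).encode("ascii")
    # Supplying `data` makes urllib treat the request as a POST.
    return Request(source_url, data=payload)

req = build_iface_request("6KIZ", "R", "N")
print(req.get_method(), req.data.decode())  # POST pdb=6kiz&chain1=R&chain2=N
```

Passing the request to `urllib.request.urlopen` and writing the response to a file would presumably reproduce what curl does here, although that has not been checked against the server.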


We can list the files present to verify the data files were obtained.

In [5]:
ls

 data_6kix.txt
 data_6kiz.txt
'Using PDBsum data to highlight changes in protein-protein interactions.ipynb'
'Working with PDBsum in Jupyter Basics.ipynb'


Get the scripts that will examine the two data files for similarities and differences.

In [6]:
# Get a file if not yet retrieved / check if file exists
import os
file_needed = "similarities_in_proteinprotein_interactions.py"
if not os.path.isfile(file_needed):
    !curl -OL https://raw.githubusercontent.com/fomightez/structurework/master/pdbsum-utilities/{file_needed}
file_needed = "differences_in_proteinprotein_interactions.py"
if not os.path.isfile(file_needed):
    !curl -OL https://raw.githubusercontent.com/fomightez/structurework/master/pdbsum-utilities/{file_needed}

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 21812  100 21812    0     0  43192      0 --:--:-- --:--:-- --:--:-- 43106
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 15199  100 15199    0     0  65512      0 --:--:-- --:--:-- --:--:-- 65512


Because we already defined the names of the data files as `structure1_data_name` and `structure2_data_name`, the script knows to use those files when comparing the residues involved in interactions of the same pair of proteins **in both structures**. (The `-i` flag of `%run` runs the script in the notebook's namespace, which is how the script sees those variables.)

In [7]:
%run -i similarities_in_proteinprotein_interactions.py


Obtaining script containing a function to use to parse the data files from PDBsum ...

Parsing data files from PDBsum ...

Collecting similarities for chain vs chain interactions in the two structures ...

Determination of SIMILARITIES Completed.

************************RESULTS************************
The following interacting pairs of residues occur in both structures:
(289:R, 374:N)
(250:R, 372:N)
(225:R, 379:N)
(225:R, 377:N)
(224:R, 378:N)
(181:R, 380:N)
(228:R, 375:N)
(205:R, 331:N)
(208:R, 331:N)
(204:R, 327:N)
(196:R, 332:N)
(208:R, 332:N)
(223:R, 377:N)

The following residues of chain R contribute to interactions with
chain N in both structures 6kiz & 6kix, yet have differing sets of partners:
225
228
208
205
250
224

The following residues of chain N contribute to interactions with
chain R in both structures 6kiz & 6kix, yet have differing sets of partners:
374
378
54
376
375
379
377
331
380
332
55
The differing sets of partners are detailed by running the 'difference' script.

This script reveals two types of data under the 'similar' heading. The first is residue-residue interactions between the two chains that occur in both structures. These shared interactions are listed first.

Following that, residues are listed for each chain in turn that still interact with the other chain in both structures, but with a different set of partner residues. In the simplest case, a residue on this list interacts with one residue of the other chain in one structure and with a different residue of that chain in the other structure. Such a residue is 'similar' in that it still contributes to the interaction with the other chain, even though its partner has shifted. Often a residue interacts with more than one residue; if its list of interaction partners differs between the two structures, it will also be on this list.
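
The two kinds of 'similar' result can be pictured with plain Python sets. The sketch below is not the code in `similarities_in_proteinprotein_interactions.py`; it is an illustration on made-up residue numbers, with each interaction written as a `(chain 1 residue, chain 2 residue)` tuple, and `partners_by_residue` is a hypothetical helper.

```python
from collections import defaultdict

def partners_by_residue(pairs, position):
    """Map each residue at `position` (0 = chain 1, 1 = chain 2) of the
    pair tuples to the set of residues it contacts on the other chain."""
    partners = defaultdict(set)
    for pair in pairs:
        partners[pair[position]].add(pair[1 - position])
    return partners

# Made-up interaction pairs: (chain 1 residue, chain 2 residue).
structure1_pairs = {(10, 50), (11, 51), (12, 52)}
structure2_pairs = {(10, 50), (11, 53), (13, 52)}

# First kind: residue pairs present in both structures.
shared_pairs = structure1_pairs & structure2_pairs

# Second kind: chain 1 residues that contact the other chain in both
# structures but with a different set of partners.
p1 = partners_by_residue(structure1_pairs, 0)
p2 = partners_by_residue(structure2_pairs, 0)
shifted = sorted(r for r in p1.keys() & p2.keys() if p1[r] != p2[r])

print(shared_pairs)  # {(10, 50)}
print(shifted)       # [11]
```

Calling `partners_by_residue` with `position=1` gives the same view from the side of the second chain.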

The residue-level details of how the interaction partners differ will be made clear when the 'differences' script is run next.

**Note the 'similarities' script highlights protein-protein interactions at the residue level.  
Keep in mind it doesn't distinguish by interaction type at this time.** For example, if a hydrogen bond between atoms of a residue pair is disrupted, yet van der Waals interactions between other atoms of those residues remain, the residue-level interaction will still be highlighted here. **Whether the residue-residue interactions are all of the same type can be explored by further examination**, either by following the approach outlined in the first notebook in this series to make a dataframe that lists the types or, if you care about just a few residues, by inspecting the interactions for those residues in the raw data files, such as `data_6kiz.txt` compared with `data_6kix.txt` in this example case.
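
One way to follow up on interaction type is to group the type labels by residue pair and compare the groups between structures. The sketch below uses made-up records and a hypothetical `types_by_pair` helper. The tuple layout `(chain 1 residue, chain 2 residue, interaction type)` and the type labels are assumptions for illustration, not the format of the PDBsum data file; in practice the records would come from the dataframe built as in the first notebook.

```python
from collections import defaultdict

def types_by_pair(records):
    """Group interaction-type labels by (chain 1 residue, chain 2 residue)."""
    by_pair = defaultdict(set)
    for res1, res2, kind in records:
        by_pair[(res1, res2)].add(kind)
    return by_pair

# Made-up records: (chain 1 residue, chain 2 residue, interaction type).
structure1_records = [
    (10, 50, "hydrogen bond"),
    (10, 50, "non-bonded contact"),
    (11, 51, "non-bonded contact"),
]
structure2_records = [
    (10, 50, "non-bonded contact"),
    (11, 51, "non-bonded contact"),
]

t1 = types_by_pair(structure1_records)
t2 = types_by_pair(structure2_records)

# For pairs found in both structures, record which types were lost or gained.
changes = {}
for pair in sorted(t1.keys() & t2.keys()):
    lost, gained = t1[pair] - t2[pair], t2[pair] - t1[pair]
    if lost or gained:
        changes[pair] = (lost, gained)

print(changes)  # {(10, 50): ({'hydrogen bond'}, set())}
```

Here the pair `(10, 50)` would still be reported as shared at the residue level even though its hydrogen bond is gone in the second structure.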

Where a residue has shifted what it interacts with, the details will be made clear when the 'differences' script is run next.

In [8]:
%run -i differences_in_proteinprotein_interactions.py


Parsing data files from PDBsum ...

Collecting differences for chain vs chain interactions in the two structures ...

Determination of DIFFERENCES Completed.

************************RESULTS************************
The following are residue pairings where both members exclusively
interact only in 6kiz :
(210:R, 335:N)

The following are residue pairings where both members exclusively
interact only in 6kix:
(140:R, 52:N)

The following residues of chain R contribute only to interactions
with chain N in 6kiz:
288
162
227
198
201
266
206
210
249
The following residues of chain R contribute only to interactions
with chain N in 6kix:
200
140
143
207
183
184
The following residues of chain N contribute only to interactions
with chain R in 6kiz:
373
335
The following residues of chain N contribute only to interaction
with chain R in 6kix:
371
52
334

If you've previously run the script `similarities_in_proteinprotein_interactions.py`
you received a report listing residues for each chain that

The output of this script includes three ways in which residues in chain 1 differ in interactions with chain 2 in the two structures. 

- The first section lists residue pairings where both members interact exclusively in one structure or the other.
- The second section lists individual residues of both chains that contribute to the corresponding chain-chain interaction in only one of the structures.
- Finally, the third section details the differing sets of partners for residues that interact with the other chain in both structures, yet with different sets of residues. For example, residue 376 of chain #2 (chain designated N) interacts with different residues in the two structures: in structure #1 (6kiz) it interacts with residue 228 of chain #1, and in structure #2 (6kix) it interacts with residue 225.
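
The three sections map naturally onto set operations. The sketch below is an illustration on made-up residue numbers, not the code in `differences_in_proteinprotein_interactions.py`. The helper names are hypothetical, and the reading of 'both members exclusively interact' as 'neither residue takes part in the other structure's interface' is an assumption, although it appears consistent with the output above.

```python
def residues(pairs, position):
    """Residues of one chain (0 = chain 1, 1 = chain 2) seen in the pairs."""
    return {pair[position] for pair in pairs}

def exclusive_pairs(pairs, other_pairs):
    """Pairs whose two members both take no part in `other_pairs`."""
    other1, other2 = residues(other_pairs, 0), residues(other_pairs, 1)
    return {(a, b) for a, b in pairs if a not in other1 and b not in other2}

def partners(pairs, residue, position):
    """Residues on the other chain contacted by `residue`."""
    return {pair[1 - position] for pair in pairs if pair[position] == residue}

# Made-up interaction pairs: (chain 1 residue, chain 2 residue).
s1 = {(10, 50), (11, 51), (12, 52)}
s2 = {(10, 50), (11, 53), (13, 54)}

# Section 1: pairings found in only one structure, both members exclusive.
print(exclusive_pairs(s1, s2), exclusive_pairs(s2, s1))  # {(12, 52)} {(13, 54)}

# Section 2: residues contributing in only one structure, per chain.
print(sorted(residues(s1, 0) - residues(s2, 0)))  # [12]
print(sorted(residues(s2, 1) - residues(s1, 1)))  # [53, 54]

# Section 3: differing partner sets for a residue present in both.
print(partners(s1, 11, 0), partners(s2, 11, 0))  # {51} {53}
```

Residue 11 of the first chain plays the role that residue 376 of chain N plays in the example above: it is in the interface in both structures, but its partner changes.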

Hopefully, you are now ready to go back up, edit the assignments at the top of this notebook, and run the code again to highlight the similarities and differences in interactions of protein pairs in your favorite pairs of structures.

Read on in this notebook if you want some advanced options.

-----

### Making the reports more human readable

The generated report can be made more informative after the fact by running the script again while capturing all the output, and then using the string `.replace()` method to change the chain designations that come from the PDBsum data into protein names. The PDBsum data file doesn't include the protein names, yet you can easily add them afterward to make the report easier to read. Building this into the script call would add more trouble than it is probably worth, since this trick does the job.  
An example is in the next few cells. When you run the first one, you won't see any output; however, it will be captured.

In [9]:
%%capture out
%run -i similarities_in_proteinprotein_interactions.py

This next cell will substitute the protein names into the description of the interactions of residues that shift to some extent in their associations with residues of the other chain.

In [10]:
import sys  # needed for sys.stderr, in case it is not already imported in this session
sys.stderr.write(out.stderr.replace("chain R","WDR5").replace("chain N","RBBP5"))


Parsing data files from PDBsum ...

Collecting similarities for chain vs chain interactions in the two structures ...

Determination of SIMILARITIES Completed.

************************RESULTS************************
The following interacting pairs of residues occur in both structures:
(289:R, 374:N)
(250:R, 372:N)
(225:R, 379:N)
(225:R, 377:N)
(224:R, 378:N)
(181:R, 380:N)
(228:R, 375:N)
(205:R, 331:N)
(208:R, 331:N)
(204:R, 327:N)
(196:R, 332:N)
(208:R, 332:N)
(223:R, 377:N)

The following residues of WDR5 contribute to interactions with
RBBP5 in both structures 6kiz & 6kix, yet have differing sets of partners:
225
228
208
205
250
224

The following residues of RBBP5 contribute to interactions with
WDR5 in both structures 6kiz & 6kix, yet have differing sets of partners:
374
378
54
376
375
379
377
331
380
332
55
The differing sets of partners are detailed by running the 'difference' script.

That was done for the report generated by the 'similarities' script. You can follow much the same process with the report from the 'differences' script.
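
If there are more than two designations to swap, chaining `.replace()` calls gets unwieldy. A small sketch of the same idea driven by a dictionary follows. `relabel_chains` is a hypothetical helper, and the sample text is two lines copied from the 'differences' report shown earlier.

```python
def relabel_chains(report_text, chain_names):
    """Replace each chain designation in the report with its protein name."""
    for designation, protein in chain_names.items():
        report_text = report_text.replace(designation, protein)
    return report_text

sample = ("The following residues of chain R contribute only to interactions\n"
          "with chain N in 6kiz:")
relabeled = relabel_chains(sample, {"chain R": "WDR5", "chain N": "RBBP5"})
print(relabeled)
```

With a captured run of the 'differences' script, the text to pass in would be the captured output (for instance `out.stderr`, as in the cell above). Because this is plain substring replacement, a designation that is a prefix of another one would need to be handled with care.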


----