## Interface statistics basics & comparing Interface statistics for two structures

### Preface

#### Getting the table of Interface statistics for a structure from under PDBsum's 'Prot-prot' tab via the command line.

[Here is the page](http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=6kiv&template=interfaces.html&c=999) you'd see if you looked at the 'Prot-Prot' tab for PDB id code [6kiv](https://www.rcsb.org/structure/6KIV). At the bottom of [that page](http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=6kiv&template=interfaces.html&c=999) is a table with the heading 'Interface statistics'.  
First, in this notebook we are going to bring that table into Python via a couple of routes using the same script.

Then, in the remainder of this notebook, that process will serve as the basis for another, related script that makes a single summary dataframe for conveniently comparing the 'Interface statistics' of two different structures.


### Basic use of the protein-protein interface statistics-to-dataframe script

The script is easy to use. You just need the script in the current working directory and then call it, providing a PDB code to retrieve the interface statistics table as a Pandas dataframe.

Running the next cell will copy the script from Github into the current working directory.

In [1]:
import os
file_needed = "pdbsum_prot_interface_statistics_to_df.py"
if not os.path.isfile(file_needed):
    !curl -OL https://raw.githubusercontent.com/fomightez/structurework/master/pdbsum-utilities/{file_needed}

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17613  100 17613    0     0  27306      0 --:--:-- --:--:-- --:--:-- 27306


Next, call that script, providing a PDB identification code for a structure at the Protein Data Bank to examine.

In [2]:
%run pdbsum_prot_interface_statistics_to_df.py 6kiz

Interface statistics for provided structure read and converted to a dataframe...

A dataframe of the data has been saved as a file
in a manner where other Python programs can access it (pickled form).
RESULTING DATAFRAME is stored as ==> 'int_stats_pickled_df.pkl'

That will have generated a dataframe and saved it as a serialized Python object (a 'pickle'), which is a fancy way of saying it was saved to a file in a special form so that it can be restored later as the same Python object. To use the generated dataframe, we need to read it into the namespace of this running notebook by running the next cell.

In [3]:
import pandas as pd
df = pd.read_pickle("int_stats_pickled_df.pkl")

Now by running the next cell we'll display the dataframe.

In [4]:
df

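| | Chains | No. of interface residues | Interface area (Å2) | No. of salt bridges | No. of disulphide bonds | No. of hydrogen bonds | No. of non-bonded contacts |
|---|---|---|---|---|---|---|---|
| 0 | A:B | 35:29 | 2477:2525 | 3 | - | 6 | 114 |
| 1 | A:E | 4:4 | 543:543 | 2 | - | 2 | 9 |
| 2 | A:G | 4:6 | 583:549 | - | - | 3 | 14 |
| 3 | A:N | 1:1 | 46:39 | - | - | 1 | 3 |
| 4 | B:D | 8:7 | 458:467 | - | - | 2 | 29 |
| 5 | B:G | 5:5 | 372:368 | - | - | 2 | 22 |
| 6 | B:H | 2:3 | 220:200 | 1 | - | 1 | 6 |
| 7 | B:N | 3:3 | 196:183 | - | - | - | 9 |
| 8 | C:D | 30:30 | 2395:2434 | 2 | - | 8 | 105 |
| 9 | G:H | 25:31 | 2527:2500 | 4 | - | 12 | 127 |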


That shows the Interface statistics table has been converted to a Pandas dataframe.

So far this isn't much of an improvement, since the same table can be viewed at PDBsum [here](http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=6kiz&template=interfaces.html&c=999). However, if you know some Pandas, you can already see that a dataframe is much more useful going forward than the table on the PDBsum page.
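As a small illustration of why a dataframe is more useful, here is a minimal sketch that turns some of the text cells into numbers. It assumes the cells are stored as strings such as `"35:29"` and `"-"`; the demo dataframe copies two rows from the table above, and the added column names (`residues_chain_1`, `residues_chain_2`) are made up for this example rather than produced by the script.

```python
import pandas as pd

# Two rows copied from the table above; in the notebook you could use `df` itself.
df_demo = pd.DataFrame({
    "Chains": ["A:B", "A:G"],
    "No. of interface residues": ["35:29", "4:6"],
    "No. of salt bridges": ["3", "-"],
    "No. of hydrogen bonds": ["6", "3"],
})

# Split each colon-separated pair into one integer column per chain.
# These new column names are illustrative only.
pair = df_demo["No. of interface residues"].str.split(":", expand=True).astype(int)
df_demo["residues_chain_1"] = pair[0]
df_demo["residues_chain_2"] = pair[1]

# Treat the "-" placeholder as zero so the counts can be used numerically.
for col in ["No. of salt bridges", "No. of hydrogen bonds"]:
    df_demo[col] = pd.to_numeric(df_demo[col].replace("-", "0")).astype(int)

# With numeric columns, sorting and filtering behave as expected.
ranked = df_demo.sort_values("No. of hydrogen bonds", ascending=False)
print(ranked)
```

Whether `-` should really be read as zero depends on what you are analyzing, so treat that replacement as a choice rather than a rule.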

Additionally, the script used to make this dataframe does the bulk of the work behind the scenes when two structures are compared below.

First, to complete the 'Basics' section, the following will demonstrate using the script in the notebook as a function to go directly to a dataframe without needing to save and read the Pandas dataframe as a serialized Python object (pickle) first. This provides a more convenient way to use the script if you are working in a Jupyter notebook.

#### Basics part 2: Using the main function of the protein-protein interface statistics-to-dataframe script within a notebook

First we'll fetch the script if it isn't already here. Running the next cell won't cause any issues if the script has already been retrieved.

In [5]:
import os
file_needed = "pdbsum_prot_interface_statistics_to_df.py"
if not os.path.isfile(file_needed):
    !curl -OL https://raw.githubusercontent.com/fomightez/structurework/master/pdbsum-utilities/{file_needed}

This is going to rely on approaches very similar to those illustrated [here](https://github.com/fomightez/patmatch-binder/blob/6f7630b2ee061079a72cd117127328fd1abfa6c7/notebooks/PatMatch%20with%20more%20Python.ipynb#Passing-results-data-into-active-memory-without-a-file-intermediate) and [here](https://github.com/fomightez/patmatch-binder/blob/6f7630b2ee061079a72cd117127328fd1abfa6c7/notebooks/Sending%20PatMatch%20output%20directly%20to%20Python.ipynb##Running-Patmatch-and-passing-the-results-to-Python-without-creating-an-output-file-intermediate). See the first notebook in this series, [Working with PDBsum in Jupyter Basics](Working%20with%20PDBsum%20in%20Jupyter%20Basics.ipynb), for a related, more fully-explained example with a different script.

By running the following command, we'll bring the main function into the namespace of the notebook in a way that lets us call that function later.

In [6]:
from pdbsum_prot_interface_statistics_to_df import pdbsum_prot_interface_statistics_to_df

The next cell will make the dataframe by calling the function and supplying it with a PDB code as an argument. Then the `df` line at the bottom allows for displaying the produced dataframe.

In [7]:
df = pdbsum_prot_interface_statistics_to_df("6kiz")
df

Interface statistics for provided structure read and converted to a dataframe...

A dataframe of the data has been saved as a file
in a manner where other Python programs can access it (pickled form).
RESULTING DATAFRAME is stored as ==> 'int_stats_pickled_df.pkl'

Returning a dataframe with the information as well.

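| | Chains | No. of interface residues | Interface area (Å2) | No. of salt bridges | No. of disulphide bonds | No. of hydrogen bonds | No. of non-bonded contacts |
|---|---|---|---|---|---|---|---|
| 0 | A:B | 35:29 | 2477:2525 | 3 | - | 6 | 114 |
| 1 | A:E | 4:4 | 543:543 | 2 | - | 2 | 9 |
| 2 | A:G | 4:6 | 583:549 | - | - | 3 | 14 |
| 3 | A:N | 1:1 | 46:39 | - | - | 1 | 3 |
| 4 | B:D | 8:7 | 458:467 | - | - | 2 | 29 |
| 5 | B:G | 5:5 | 372:368 | - | - | 2 | 22 |
| 6 | B:H | 2:3 | 220:200 | 1 | - | 1 | 6 |
| 7 | B:N | 3:3 | 196:183 | - | - | - | 9 |
| 8 | C:D | 30:30 | 2395:2434 | 2 | - | 8 | 105 |
| 9 | G:H | 25:31 | 2527:2500 | 4 | - | 12 | 127 |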


That ends the coverage of the 'basics' where PDBsum's Interface statistics is converted to a dataframe. We can use that process as the basis for further efforts. The remainder of this notebook will demonstrate using that as a basis for making a summary dataframe to compare the interface statistics for two structures.
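Both routes shown above end with the same dataframe because pickling preserves the Python object. The following is a minimal sketch of that round trip, using a small demo dataframe (rows copied from the table above) and a temporary file whose name, `demo_df.pkl`, is made up for this example.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the dataframe the script produces.
original = pd.DataFrame({
    "Chains": ["A:B", "A:E"],
    "No. of interface residues": ["35:29", "4:4"],
    "No. of non-bonded contacts": [114, 9],
})

# Write the dataframe to a pickle file and read it straight back.
with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "demo_df.pkl")
    original.to_pickle(pkl_path)
    restored = pd.read_pickle(pkl_path)

# `equals` checks that values, labels, and dtypes all survived the round trip.
same = restored.equals(original)
print(same)
```

Only read pickle files that you created yourself or otherwise trust, since unpickling can execute arbitrary code.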


### Comparing Interface statistics for two structures

Next, the remainder of this Jupyter notebook demonstrates use of the script `pdbsum_prot_interface_statistics_comparing_two_structures.py` to compare the Interface Statistics for two structures conveniently.  
The script above forms the core function behind this and so if you are looking for more information, make sure you have looked at the top of this notebook first. The comparison script simply uses that script to get the table for two structures and then rearranges the data for easy viewing as a dataframe.

Running the next cell will copy the script from Github into the current working directory, if it isn't there already.

In [8]:
import os
file_needed = "pdbsum_prot_interface_statistics_comparing_two_structures.py"
if not os.path.isfile(file_needed):
    !curl -OL https://raw.githubusercontent.com/fomightez/structurework/master/pdbsum-utilities/{file_needed}

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 15275  100 15275    0     0  23829      0 --:--:-- --:--:-- --:--:-- 23792


Next, call that script, providing PDB identification codes for the two structures at the Protein Data Bank to examine.

In [9]:
%run pdbsum_prot_interface_statistics_comparing_two_structures.py 6kiv 6kiz

Parsing interaction statistics from PDBsum ...
Interface statistics for provided structures read and converted to a single dataframe...

Keep in mind this only compares portions in the structure for which there was experimental data.
You'll want to explore the 'Missing Residues' of any chains of interest.


A dataframe of the data has been saved as a file
in a manner where other Python programs can access it (pickled form).
RESULTING DATAFRAME is stored as ==> 'int_stats_comparison_pickled_df.pkl'

That will have generated a dataframe and saved it as a serialized Python object. As in the upper section of this notebook, we need to read the generated object into the namespace of this running notebook by running the next cell.

In [10]:
import pandas as pd
df = pd.read_pickle("int_stats_comparison_pickled_df.pkl")

Now by running the next cell we'll display it.

In [11]:
df

| Chains | No. of interface residues, 6kiv | No. of interface residues, 6kiz | Interface area (Å2), 6kiv | Interface area (Å2), 6kiz | No. of salt bridges, 6kiv | No. of salt bridges, 6kiz | No. of disulphide bonds, 6kiv | No. of disulphide bonds, 6kiz | No. of hydrogen bonds, 6kiv | No. of hydrogen bonds, 6kiz | No. of non-bonded contacts, 6kiv | No. of non-bonded contacts, 6kiz |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A:B | 43:43 | 35:29 | 2546:2614 | 2477:2525 | 3 | 3 | - | - | 8 | 6 | 163 | 114.0 |
| A:E | 6:5 | 4:4 | 613:607 | 543:543 | 3 | 2 | - | - | 2 | 2 | 14 | 9.0 |
| A:G | 9:11 | 4:6 | 690:644 | 583:549 | - | - | - | - | 2 | 3 | 26 | 14.0 |
| A:N | 3:3 | 1:1 | 124:118 | 46:39 | - | - | - | - | - | 1 | 7 | 3.0 |
| B:D | 9:8 | 8:7 | 539:513 | 458:467 | 1 | - | - | - | 2 | 2 | 54 | 29.0 |
| B:G | 5:6 | 5:5 | 394:388 | 372:368 | - | - | - | - | 1 | 2 | 23 | 22.0 |
| B:H | 3:2 | 2:3 | 249:236 | 220:200 | 1 | 1 | - | - | 1 | 1 | 6 | 6.0 |
| B:N | 3:3 | 3:3 | 140:133 | 196:183 | 1 | - | - | - | - | - | 5 | 9.0 |
| C:D | 37:42 | 30:30 | 2433:2531 | 2395:2434 | 1 | 2 | - | - | 7 | 8 | 128 | 105.0 |
| C:E | 11:10 | 8:6 | 671:712 | 577:612 | - | - | - | - | 1 | 3 | 30 | 21.0 |


The produced summary table (dataframe) makes it much easier to compare the interactions for each pair of chains between the two structures.

Note that `NaN` (meaning 'not a number') is filled in for columns where the particular structure doesn't show an interaction for that chain pairing. For example, [6kiz](http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=6kiz&template=interfaces.html&c=999) doesn't show chains C and K interacting; see row `C:K` above. In both structures, chain C is histone H2A and chain K is KMT2A, so the difference isn't because the chain labels refer to different proteins. While the 6kiz structure has seven fewer residues experimentally observed for chain C, the interactions in 6kiv don't involve those residues, and so the 'missing residues' don't account for the loss of this interaction. Clearly, the structures differ with regard to this interaction.
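To see how `NaN` can arise when two tables are lined up by chain pair, here is a minimal sketch using `pd.concat`. It is not the comparison script's actual implementation; the labels `struct1` and `struct2`, the chain pairs `P:Q` and `X:Y`, and all of the numbers are made up for illustration.

```python
import pandas as pd

# Made-up statistics for two hypothetical structures, indexed by chain pair.
first = pd.DataFrame(
    {"No. of non-bonded contacts": [163, 40]},
    index=pd.Index(["P:Q", "X:Y"], name="Chains"),
)
second = pd.DataFrame(
    {"No. of non-bonded contacts": [114]},  # no "X:Y" interface in this one
    index=pd.Index(["P:Q"], name="Chains"),
)

# Place the tables side by side; `keys` adds a structure label to each column.
combined = pd.concat([first, second], axis=1, keys=["struct1", "struct2"])

# Put the statistic name on the top level and the structure label beneath it.
combined = combined.swaplevel(axis=1).sort_index(axis=1)

# Chain pairs that are missing from at least one structure show up as NaN.
only_in_one = combined[combined.isna().any(axis=1)]
print(combined)
print(only_in_one.index.tolist())
```

Because `NaN` is a floating-point value, a column that contains it is stored as floats. That would be consistent with the `114.0`-style values in the last column of the summary table, although the script may produce them in a different way.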

Let's do the same by using the main function of the script to allow skipping saving the file intermediate.

#### Comparing part 2: Using the main function of the protein-protein interface statistics comparing script within a notebook

First, we'll fetch the script if it isn't already here. Running the next cell won't cause any issues if the script has already been retrieved.

In [12]:
import os
file_needed = "pdbsum_prot_interface_statistics_comparing_two_structures.py"
if not os.path.isfile(file_needed):
    !curl -OL https://raw.githubusercontent.com/fomightez/structurework/master/pdbsum-utilities/{file_needed}

By running the following command, we'll bring the main function into the namespace of the notebook in a way that lets us call that function later.

In [13]:
from pdbsum_prot_interface_statistics_comparing_two_structures import pdbsum_prot_interface_statistics_comparing_two_structures

The next cell will make the dataframe by calling the function and supplying it with **two** PDB codes as arguments. Then the `df` line at the bottom allows for displaying the produced dataframe.

In [14]:
df = pdbsum_prot_interface_statistics_comparing_two_structures("6kiv","6kiz")
df

Parsing interaction statistics from PDBsum ...
Interface statistics for provided structures read and converted to a single dataframe...

Keep in mind this only compares portions in the structure for which there was experimental data.
You'll want to explore the 'Missing Residues' of any chains of interest.


A dataframe of the data has been saved as a file
in a manner where other Python programs can access it (pickled form).
RESULTING DATAFRAME is stored as ==> 'int_stats_comparison_pickled_df.pkl'
Returning a dataframe with the information as well.

| Chains | No. of interface residues, 6kiv | No. of interface residues, 6kiz | Interface area (Å2), 6kiv | Interface area (Å2), 6kiz | No. of salt bridges, 6kiv | No. of salt bridges, 6kiz | No. of disulphide bonds, 6kiv | No. of disulphide bonds, 6kiz | No. of hydrogen bonds, 6kiv | No. of hydrogen bonds, 6kiz | No. of non-bonded contacts, 6kiv | No. of non-bonded contacts, 6kiz |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A:B | 43:43 | 35:29 | 2546:2614 | 2477:2525 | 3 | 3 | - | - | 8 | 6 | 163 | 114.0 |
| A:E | 6:5 | 4:4 | 613:607 | 543:543 | 3 | 2 | - | - | 2 | 2 | 14 | 9.0 |
| A:G | 9:11 | 4:6 | 690:644 | 583:549 | - | - | - | - | 2 | 3 | 26 | 14.0 |
| A:N | 3:3 | 1:1 | 124:118 | 46:39 | - | - | - | - | - | 1 | 7 | 3.0 |
| B:D | 9:8 | 8:7 | 539:513 | 458:467 | 1 | - | - | - | 2 | 2 | 54 | 29.0 |
| B:G | 5:6 | 5:5 | 394:388 | 372:368 | - | - | - | - | 1 | 2 | 23 | 22.0 |
| B:H | 3:2 | 2:3 | 249:236 | 220:200 | 1 | 1 | - | - | 1 | 1 | 6 | 6.0 |
| B:N | 3:3 | 3:3 | 140:133 | 196:183 | 1 | - | - | - | - | - | 5 | 9.0 |
| C:D | 37:42 | 30:30 | 2433:2531 | 2395:2434 | 1 | 2 | - | - | 7 | 8 | 128 | 105.0 |
| C:E | 11:10 | 8:6 | 671:712 | 577:612 | - | - | - | - | 1 | 3 | 30 | 21.0 |


Because the function was supplied with the same PDB codes as when the script was run using `%run` (commandline-like), we see the same result. This route is convenient when working in a Jupyter notebook.
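Once the comparison dataframe is in the notebook, its two-level column header can be used to pull out just the parts of interest. The sketch below rebuilds a small piece of the table above by hand (three rows, two statistics, stored as integers for simplicity) so that it runs on its own. It assumes the real dataframe has the statistic name on the top column level and the PDB code on the second, as the displayed header suggests.

```python
import pandas as pd

# Two-level column header: statistic name on top, PDB code beneath.
columns = pd.MultiIndex.from_product(
    [["No. of hydrogen bonds", "No. of non-bonded contacts"], ["6kiv", "6kiz"]]
)

# Values copied from rows A:B, A:E, and B:D of the table above.
demo = pd.DataFrame(
    [[8, 6, 163, 114], [2, 2, 14, 9], [2, 2, 54, 29]],
    index=pd.Index(["A:B", "A:E", "B:D"], name="Chains"),
    columns=columns,
)

# One statistic, with both structures side by side.
contacts = demo["No. of non-bonded contacts"]

# Every statistic for a single structure.
only_6kiz = demo.xs("6kiz", axis=1, level=1)

# Change in non-bonded contacts going from 6kiv to 6kiz, per chain pair.
contact_change = contacts["6kiz"] - contacts["6kiv"]
print(contact_change)
```

With the real dataframe, columns that contain `-` or colon-separated pairs would need converting to numbers before arithmetic like the last step.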

-----

Enjoy.