# GSD Compare protein-protein interaction statistics of ys RNase MRP vs. RNase P via PDBsum data

This notebook adapts a generic notebook I made for examining protein-protein interaction statistics in pairs of related structures, [Interface statistics basics & comparing Interface statistics for two structures](../Interface%20statistics%20basics%20and%20comparing%20Interface%20statistics%20for%20two%20structures.ipynb), so that it compares the protein-protein interaction statistics of yeast RNase MRP and RNase P.

It is important to note that **Chain K in the two structures is actually two different proteins**: it is Snm1p in RNase MRP (PDB id: 7c7a) and Rpr2p in RNase P (PDB id: 6ah3). Therefore, the data involving that chain should not be compared, because doing so would be comparing 'apples' to 'oranges'.
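Because of this, it can help to drop the chain pairs involving chain K before comparing the two structures. The sketch below assumes, based on the output shown further down, that the summary dataframe's index holds chain-pair labels such as `C:K`. The helper name `drop_chain_pairs` is hypothetical and is not part of the script used in this notebook.

```python
def drop_chain_pairs(pair_labels, chain_id="K"):
    """Return the chain-pair labels (e.g. 'C:K') that do NOT involve chain_id.

    Assumes each label is two chain IDs joined by a colon, as in the
    'Chains' index of the summary dataframe produced in this notebook.
    """
    kept = []
    for label in pair_labels:
        chains = str(label).split(":")  # 'C:K' -> ['C', 'K']
        if chain_id not in chains:
            kept.append(label)
    return kept


# Example using chain-pair labels that appear in the summary table
labels = ["B:D", "B:F", "C:K", "D:J", "D:K", "E:G"]
print(drop_chain_pairs(labels))  # ['B:D', 'B:F', 'D:J', 'E:G']
```

Once the dataframe exists, something like `df.loc[drop_chain_pairs(df.index)]` should keep only the rows that do not involve chain K (again assuming the index holds those labels).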

----

**Step #1:** Get the necessary script.  
Running the next cell will copy the script from GitHub into the current working directory if it isn't already there.

In [1]:
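import os
# Name of the script that provides the comparison function
file_needed = "pdbsum_prot_interface_statistics_comparing_two_structures.py"
# Download it from GitHub only if it is not already in the working directory
if not os.path.isfile(file_needed):
    !curl -OL https://raw.githubusercontent.com/fomightez/structurework/master/pdbsum-utilities/{file_needed}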

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 15275  100 15275    0     0  24246      0 --:--:-- --:--:-- --:--:-- 24246
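The `!curl` line relies on Jupyter's shell escape. If you prefer to stay in plain Python (for example, outside a notebook), a minimal sketch using the standard library's `urllib.request.urlretrieve` could look like the following. The helper name `fetch_if_missing` is hypothetical; the URL is the same raw GitHub location used above.

```python
import os
import urllib.request

# Same raw GitHub location that the curl command above downloads from
BASE_URL = "https://raw.githubusercontent.com/fomightez/structurework/master/pdbsum-utilities/"


def fetch_if_missing(file_needed, base_url=BASE_URL):
    """Download base_url + file_needed into the working directory unless present.

    Returns True if a download happened, False if the file already existed.
    """
    if os.path.isfile(file_needed):
        return False  # nothing to do; keep the existing copy
    urllib.request.urlretrieve(base_url + file_needed, file_needed)
    return True


# Usage (needs network access the first time):
# fetch_if_missing("pdbsum_prot_interface_statistics_comparing_two_structures.py")
```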


**Step #2:** Make the script callable here.  
By running the following command, we'll bring the main function into the notebook's namespace so that we can call it later.

(This relies on approaches very similar to those illustrated [here](https://github.com/fomightez/patmatch-binder/blob/6f7630b2ee061079a72cd117127328fd1abfa6c7/notebooks/PatMatch%20with%20more%20Python.ipynb#Passing-results-data-into-active-memory-without-a-file-intermediate) and [here](https://github.com/fomightez/patmatch-binder/blob/6f7630b2ee061079a72cd117127328fd1abfa6c7/notebooks/Sending%20PatMatch%20output%20directly%20to%20Python.ipynb##Running-Patmatch-and-passing-the-results-to-Python-without-creating-an-output-file-intermediate). See the first notebook in this series, [Working with PDBsum in Jupyter Basics](../Working%20with%20PDBsum%20in%20Jupyter%20Basics.ipynb), for a related, more fully-explained example with a different script.)

In [2]:
from pdbsum_prot_interface_statistics_comparing_two_structures import pdbsum_prot_interface_statistics_comparing_two_structures
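The plain `from ... import ...` above works because the script sits in the current working directory. If the script were stored somewhere else, one alternative is to load it from an explicit path with the standard library's `importlib.util`. This is a sketch of that option, not how the original notebook does it; `load_function_from_file` is a hypothetical helper.

```python
import importlib.util
import os


def load_function_from_file(path, func_name):
    """Load the Python file at `path` as a module and return `func_name` from it."""
    # Use the file's base name (without .py) as the module name
    module_name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the script's top-level code once
    return getattr(module, func_name)


# Usage, assuming the script lives at the given (relative or absolute) path:
# compare = load_function_from_file(
#     "pdbsum_prot_interface_statistics_comparing_two_structures.py",
#     "pdbsum_prot_interface_statistics_comparing_two_structures",
# )
```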

**Step #3:** Run the script to make the summary dataframe.  
The next cell makes the dataframe by calling the function with **two** PDB codes as arguments. The `df` line at the bottom then displays the resulting dataframe, which summarizes the interaction statistics for both structures.

In [3]:
df = pdbsum_prot_interface_statistics_comparing_two_structures("7c7a","6ah3")
df

Obtaining script containing a function to use to parse the interaction statistics from PDBsum ...
Parsing interaction statistics from PDBsum ...
Interface statistics for provided structures read and converted to a single dataframe...

Keep in mind this only compares portions in the structure for which there was experimental data.
You'll want to explore the 'Missing Residues' of any chains of interest.


A dataframe of the data has been saved as a file
in a manner where other Python programs can access it (pickled form).
RESULTING DATAFRAME is stored as ==> 'int_stats_comparison_pickled_df.pkl'
Returning a dataframe with the information as well.

| Chains | No. of interface residues, 6ah3 | No. of interface residues, 7c7a | Interface area (Å²), 6ah3 | Interface area (Å²), 7c7a | No. of salt bridges, 6ah3 | No. of salt bridges, 7c7a | No. of disulphide bonds, 6ah3 | No. of disulphide bonds, 7c7a | No. of hydrogen bonds, 6ah3 | No. of hydrogen bonds, 7c7a | No. of non-bonded contacts, 6ah3 | No. of non-bonded contacts, 7c7a |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| B:D | | 17:17 | | 944:965 | | - | | - | | 6 | | 71.0 |
| B:F | 2:4 | 4:3 | 157:141 | 150:165 | - | - | - | - | 2 | 1 | 9.0 | 13.0 |
| B:G | 21:23 | 20:17 | 1202:1244 | 1100:1173 | 4 | 5 | - | - | 1 | 4 | 77.0 | 48.0 |
| B:I | 3:6 | 2:4 | 241:216 | 203:176 | 1 | - | - | - | 1 | 1 | 27.0 | 8.0 |
| B:L | | 31:31 | | 1850:1836 | | 1 | | - | | 6 | | 113.0 |
| C:K | 37:38 | 21:20 | 2129:2137 | 1267:1259 | - | - | - | - | 6 | 4 | 146.0 | 91.0 |
| D:J | 39:40 | 39:39 | 2268:2336 | 2289:2369 | 3 | 1 | - | - | 10 | 11 | 167.0 | 149.0 |
| D:K | 26:25 | 26:27 | 1478:1505 | 1632:1628 | - | 3 | - | - | 6 | 10 | 135.0 | 142.0 |
| D:L | | 11:11 | | 649:662 | | - | | - | | 4 | | 30.0 |
| E:G | 3:4 | 1:1 | 152:155 | 90:80 | - | - | - | - | 1 | - | 14.0 | 1.0 |
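In this table, a blank cell appears to mean that the chain pair has no interface listed for that structure, and `-` most likely means none. The interface-residue and interface-area columns hold two numbers joined by a colon, which look like one value per chain of the pair. The helpers below are a hypothetical sketch, assuming that format, for turning such cells into numbers, for example to get one total per interface.

```python
def parse_pair_value(cell):
    """Split a 'first:second' cell such as '37:38' into a tuple of ints.

    Returns None for empty cells, '-' entries, and pandas NaN values
    (which become the string 'nan').
    """
    if cell is None:
        return None
    text = str(cell).strip()
    if text in ("", "-", "nan"):
        return None
    first, second = text.split(":")
    return int(first), int(second)


def pair_total(cell):
    """Sum both sides of a 'first:second' cell, or return None if there is no value."""
    parsed = parse_pair_value(cell)
    return None if parsed is None else sum(parsed)


# Interface residues for chains D:J in 6ah3 and 7c7a, taken from the table
print(pair_total("39:40"), pair_total("39:39"))  # 79 78
```

Assuming the column keys are `(metric, PDB id)` tuples, as the displayed headers suggest, `df[("No. of interface residues", "7c7a")].map(pair_total)` would apply the helper to a whole column.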


**Step #4:** [Optional] Make a file of the summary table for downloading.

Save the dataframe as a tab-delimited text file that can be used later or opened in Excel. The next cell makes the file.

In [4]:
fn = "ysRNaseMRPvRNaseP_interaction_statistics.tsv"  # filename for the tab-delimited text table file
df.to_csv(fn, sep='\t')

Now download the file to your local machine if you want to store the results locally.  
(This notebook should run fairly quickly, so it is also easy to simply rerun it whenever you want to view the summary again.)

-----

Enjoy.