# GSD Compare protein-protein interactions statistics of ys RNase MRP from two different groups via PDBsum data

This notebook adapts the one I made to look at the interaction statistics for RNase MRP vs. RNase P, [GSD Compare protein-protein interactions statistics of ys RNase MRP vs. RNase P via PDBsum data](GSD%20Compare%20protein-protein%20interactions%20statistics%20of%20ys%20RNase%20MRP%20v%20RNase%20P%20via%20PDBsum%20data.ipynb), to compare the protein-protein interactions in the yeast RNase MRP Cryo-EM structures published by two different groups in 2020. Since the single structure from one group, [PDB id 6w6v](https://www.rcsb.org/structure/6w6v), doesn't include any substrate-like ligand, I'll use the structure from Lan *et al.*, 2020 that also doesn't include a substrate-like ligand, [PDB id 7c79](https://www.rcsb.org/structure/7c79).

References for the two structures:

- [Cryo-EM structure of catalytic ribonucleoprotein complex RNase MRP.
Perederina A, Li D, Lee H, Bator C, Berezin I, Hafenstein SL, Krasilnikov AS.
Nat Commun. 2020 Jul 10;11(1):3474. doi: 10.1038/s41467-020-17308-z.
PMID: 32651392](https://pubmed.ncbi.nlm.nih.gov/32651392/)

- [Structural insight into precursor ribosomal RNA processing by ribonuclease MRP.
Lan P, Zhou B, Tan M, Li S, Cao M, Wu J, Lei M.
Science. 2020 Aug 7;369(6504):656-663. doi: 10.1126/science.abc0149. Epub 2020 Jun 25.
PMID: 32586950](https://pubmed.ncbi.nlm.nih.gov/32586950/)

**Note:** [PDB id 6w6v](https://www.rcsb.org/structure/6w6v) from Perederina *et al.*, 2020 lacks a model for chain C (Pop3p) because the data for that protein were not resolved well enough for it to be modeled.

----

**Step #1:** Get the necessary script.  
Running the next cell will copy the script from GitHub into the current working directory, if it isn't there already.

In [2]:
import os
file_needed = "pdbsum_prot_interface_statistics_comparing_two_structures.py"
if not os.path.isfile(file_needed):
    !curl -OL https://raw.githubusercontent.com/fomightez/structurework/master/pdbsum-utilities/{file_needed}
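The cell above relies on the notebook's `!curl` shell escape. As a rough sketch of the same download-only-if-missing logic in plain Python (for use outside Jupyter), something like the following could work. The helper name `fetch_if_missing` and its injectable `fetch` parameter are illustrative assumptions, not part of the script or repository.

```python
import os
from urllib.request import urlretrieve


def fetch_if_missing(url, dest, fetch=urlretrieve):
    """Download `url` to `dest` unless `dest` already exists.

    Returns True if a download was attempted, False if the file was
    already present. `fetch` is injectable (hypothetical convenience)
    so the logic can be exercised without network access.
    """
    if os.path.isfile(dest):
        return False
    fetch(url, dest)
    return True


file_needed = "pdbsum_prot_interface_statistics_comparing_two_structures.py"
url = (
    "https://raw.githubusercontent.com/fomightez/structurework/"
    "master/pdbsum-utilities/" + file_needed
)
# Uncomment to actually download into the current working directory:
# fetch_if_missing(url, file_needed)
```

The download call is left commented out so that running the sketch has no side effects until you opt in.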

**Step #2:** Make the script callable here.  
By running the following command, we'll bring the main function into the notebook's namespace so that we can call it later.

(This relies on approaches very similar to those illustrated [here](https://github.com/fomightez/patmatch-binder/blob/6f7630b2ee061079a72cd117127328fd1abfa6c7/notebooks/PatMatch%20with%20more%20Python.ipynb#Passing-results-data-into-active-memory-without-a-file-intermediate) and [here](https://github.com/fomightez/patmatch-binder/blob/6f7630b2ee061079a72cd117127328fd1abfa6c7/notebooks/Sending%20PatMatch%20output%20directly%20to%20Python.ipynb##Running-Patmatch-and-passing-the-results-to-Python-without-creating-an-output-file-intermediate). See the first notebook in this series, [Working with PDBsum in Jupyter Basics](../Working%20with%20PDBsum%20in%20Jupyter%20Basics.ipynb), for a related, more fully-explained example with a different script.)

In [3]:
from pdbsum_prot_interface_statistics_comparing_two_structures import pdbsum_prot_interface_statistics_comparing_two_structures

**Step #3:** Run the script to make the summary dataframe.  
The next cell will make the dataframe by calling the function and supplying it with **two** PDB codes as arguments. The `df` line at the bottom then displays the resulting dataframe, which summarizes the interaction statistics for both structures.

In [4]:
df = pdbsum_prot_interface_statistics_comparing_two_structures("6w6v","7c79")
df

Parsing interaction statistics from PDBsum ...
Interface statistics for provided structures read and converted to a single dataframe...

Keep in mind this only compares portions in the structure for which there was experimental data.
You'll want to explore the 'Missing Residues' of any chains of interest.


A dataframe of the data has been saved as a file
in a manner where other Python programs can access it (pickled form).
RESULTING DATAFRAME is stored as ==> 'int_stats_comparison_pickled_df.pkl'
Returning a dataframe with the information as well.

| Chains | No. of interface residues (6w6v) | No. of interface residues (7c79) | Interface area (Å²) (6w6v) | Interface area (Å²) (7c79) | No. of salt bridges (6w6v) | No. of salt bridges (7c79) | No. of disulphide bonds (6w6v) | No. of disulphide bonds (7c79) | No. of hydrogen bonds (6w6v) | No. of hydrogen bonds (7c79) | No. of non-bonded contacts (6w6v) | No. of non-bonded contacts (7c79) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| B:F | 5:3 | 4:3 | 147:172 | 153:168 | - | - | - | - | 2 | 1.0 | 15.0 | 15.0 |
| B:G | 15:14 | 21:19 | 1012:1036 | 1044:1126 | 2 | 6 | - | - | 1 | 5.0 | 23.0 | 56.0 |
| B:I | 3:4 | 2:4 | 215:195 | 212:186 | - | - | - | - | - | 1.0 | 13.0 | 12.0 |
| B:L | 17:19 | 32:25 | 1321:1315 | 1793:1778 | - | - | - | - | 2 | 6.0 | 55.0 | 119.0 |
| D:J | 35:35 | 34:37 | 2251:2269 | 2237:2306 | 2 | 1 | - | - | 10 | 11.0 | 124.0 | 127.0 |
| D:K | 25:22 | 26:25 | 1634:1712 | 1605:1605 | 1 | 4 | - | - | 9 | 10.0 | 97.0 | 122.0 |
| E:G | 3:3 |  | 119:114 |  | - |  | - |  | 1 |  | 6.0 |  |
| E:H | 7:7 | 5:5 | 327:316 | 236:219 | 1 | - | - | - | 1 | 1.0 | 31.0 | 17.0 |
| E:I | 20:21 | 22:22 | 1259:1285 | 1287:1343 | 1 | 2 | - | - | 4 | 3.0 | 79.0 | 96.0 |
| E:J | 7:6 | 10:5 | 528:544 | 516:544 | 3 | 1 | - | - | 4 | 2.0 | 19.0 | 24.0 |


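Note that while several pairs of interactions, such as D:J, B:F, and E:I, are very similar between the two structures, some are quite different. For example, B:L and B:G show roughly twice as many non-bonded contacts in 7c79 as in 6w6v, and E:G is only reported for 6w6v.
Some of these differences are explained by 'missing residues'.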
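To make "very similar" versus "quite different" a little more concrete, here is a minimal sketch that flags chain pairs whose non-bonded contact counts differ substantially between the two structures. It uses a small stand-in dataframe with values copied from the table above, and it assumes the real `df` has two-level columns of (statistic, PDB id). The 50% threshold is arbitrary and only for illustration. On the real `df`, the placeholder `-` entries and empty cells would likely need converting first, for example with `pd.to_numeric(..., errors="coerce")`.

```python
import pandas as pd

# Stand-in for part of `df`: non-bonded contact counts copied from the
# table above. Two-level (statistic, PDB id) columns are an assumption
# about how the real dataframe is laid out.
stat = "No. of non-bonded contacts"
demo = pd.DataFrame(
    {
        (stat, "6w6v"): [15, 23, 55, 124, 79],
        (stat, "7c79"): [15, 56, 119, 127, 96],
    },
    index=pd.Index(["B:F", "B:G", "B:L", "D:J", "E:I"], name="Chains"),
)

contacts = demo[stat]  # sub-frame with one column per structure
rel_change = (contacts["7c79"] - contacts["6w6v"]) / contacts["6w6v"]

threshold = 0.5  # arbitrary cutoff, chosen only for this illustration
flagged = rel_change[rel_change.abs() > threshold].index.tolist()
print(flagged)
```

With these numbers, only the B:G and B:L pairs exceed the cutoff, which matches what stands out when reading the table by eye.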

**Step #4:** [Optional] Make a file of the summary table for downloading.

Save the dataframe as a tab-delimited file that can be used later or opened in Excel. The next cell will make the file.

In [5]:
fn = "ysRNaseMRP_6w6v_v_7c79_interaction_statistics.tsv" #filename for the tab-delimited text table file
df.to_csv(fn, sep='\t')
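If you later want to load the tab-delimited file back into pandas, the header needs some care, assuming the dataframe has two-level columns (statistic, PDB id) as the display suggests. The sketch below round-trips a tiny stand-in dataframe (salt-bridge counts copied from the table) through a temporary file; the stand-in data and file name are illustrative only.

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in with the assumed two-level column layout.
stat = "No. of salt bridges"
demo = pd.DataFrame(
    {(stat, "6w6v"): [2, 1], (stat, "7c79"): [6, 4]},
    index=pd.Index(["B:G", "D:K"], name="Chains"),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_interaction_statistics.tsv")
    demo.to_csv(path, sep="\t")
    # header=[0, 1] rebuilds the two column levels;
    # index_col=0 restores the chain-pair labels as the index.
    restored = pd.read_csv(path, sep="\t", header=[0, 1], index_col=0)

print(restored)
```

For the file written above, the equivalent call would presumably be `pd.read_csv(fn, sep='\t', header=[0, 1], index_col=0)`.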

Now download the file to your local machine if you want to store the results locally.   
(This notebook should run fairly fast, so it should be easy to just run things again if you'd rather return to the summary that way.)
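Another way back to the summary is the pickled copy that the script's output reports saving as `int_stats_comparison_pickled_df.pkl`. A minimal sketch of the pickle round trip, using a one-row stand-in dataframe and a temporary file rather than the real output:

```python
import os
import tempfile

import pandas as pd

# One-row stand-in; the values are the B:F interface residue counts from the table.
demo = pd.DataFrame(
    {"6w6v": ["5:3"], "7c79": ["4:3"]},
    index=pd.Index(["B:F"], name="Chains"),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_pickled_df.pkl")
    demo.to_pickle(path)             # what saving in pickled form amounts to
    restored = pd.read_pickle(path)  # dtypes and index come back unchanged

# In a session where the script has already run, the analogous call would be:
# df_again = pd.read_pickle("int_stats_comparison_pickled_df.pkl")
```

Unlike the tab-delimited file, the pickle preserves the column structure and dtypes without extra read options, though it is only meant to be opened from Python.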

-----

Enjoy.