# GSD Highlight changes in the protein-protein interactions of ys RNase MRP v ys RNase MRP from two different groups via PDBsum data

This is an effort to adapt the notebook I made to look at protein-protein interactions in RNase MRP vs. RNase P, [GSD Highlight changes in the protein-protein interactions of ys RNase MRP v RNase P via PDBsum data](GSD%20Highlight%20changes%20in%20the%20protein-protein%20interactions%20of%20ys%20RNase%20MRP%20v%20RNase%20P%20via%20PDBsum%20data.ipynb), so that it compares the protein-protein interactions in the yeast RNase MRP cryo-EM structures published by two different groups in 2020. Since the single structure from one group, [PDB id 6w6v](https://www.rcsb.org/structure/6w6v), doesn't include any substrate-like ligand, I'll use the structure from Lan et al., 2020 that also lacks a substrate-like ligand, [PDB id 7c79](https://www.rcsb.org/structure/7c79).

References for the two structures:

- [Cryo-EM structure of catalytic ribonucleoprotein complex RNase MRP.
Perederina A, Li D, Lee H, Bator C, Berezin I, Hafenstein SL, Krasilnikov AS.
Nat Commun. 2020 Jul 10;11(1):3474. doi: 10.1038/s41467-020-17308-z.
PMID: 32651392](https://pubmed.ncbi.nlm.nih.gov/32651392/)

- [Structural insight into precursor ribosomal RNA processing by ribonuclease MRP.
Lan P, Zhou B, Tan M, Li S, Cao M, Wu J, Lei M.
Science. 2020 Aug 7;369(6504):656-663. doi: 10.1126/science.abc0149. Epub 2020 Jun 25.
PMID: 32586950](https://pubmed.ncbi.nlm.nih.gov/32586950/)


----


**Step #1:** Make a table with a matrix of the protein-protein combinations in the pair of cryo-EM structures of RNase MRP. In this notebook I used the process outlined at the top of [Automagically making a table of all protein-protein interactions for two structures](../Automagically%20making%20a%20table%20of%20all%20protein-protein%20interactions%20for%20two%20structures.ipynb) to do that; however, I include a step to do some custom edits of the list of pairs.

When first contemplating making the table in a more automated manner, I made some notes about edge cases I need to catch, and I'm leaving them here for myself until I am sure I am catching everything:  
(Note that a pair can only be left out when the interactions data file is 'empty' for both structures. If one structure has no interactions for a pair and the other has some, that is definitely an important difference to catch. For example, Pop1p [chain B] and Pop5p [chain E] interact in RNase P but not in RNase MRP. The flip side is that in RNase MRP Pop1p gains interactions with Rmp1 [chain L] and Pop4p [chain D] not seen in RNase P.)
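
A minimal sketch of that edge case, assuming each structure's interactions are available as a list of chain-ID tuples. The chain pairs below are made up for illustration and are not PDBsum results, and `partition_pairs` is a hypothetical helper, not part of the script used in this notebook:

```python
# Illustrative chain-ID pairs only; NOT real PDBsum output.
rnase_p_pairs = [("B", "E"), ("B", "F")]
rnase_mrp_pairs = [("B", "F"), ("B", "L"), ("B", "D")]


def partition_pairs(pairs1, pairs2):
    """Split pairs into those shared by both structures and those unique to each."""
    set1, set2 = set(pairs1), set(pairs2)
    return {
        "both": sorted(set1 & set2),
        "only_first": sorted(set1 - set2),   # lost in the second structure
        "only_second": sorted(set2 - set1),  # gained in the second structure
        "union": sorted(set1 | set2),        # every pair that needs a report
    }


result = partition_pairs(rnase_p_pairs, rnase_mrp_pairs)
print("only in first: ", result["only_first"])
print("only in second:", result["only_second"])
```

Building the table from the union keeps both the "lost" and the "gained" pairs. One caveat under this sketch's assumptions: `("B", "E")` and `("E", "B")` count as different pairs, so the chain order has to be consistent between the two lists.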

The next cell is used to define the structures of interest. The PDB code identifiers are supplied.

In [1]:
structure1 = "6w6v"
structure2 = "7c79"

The next cell gets the script `pdb_code_to_prot_prot_interactions_via_PDBsum.py` (see [here](https://github.com/fomightez/structurework/tree/master/pdbsum-utilities)), which retrieves the 'Interface Summary' information for each individual structure. This is equivalent to the Summary on the left side of the 'Prot-prot' tab at PDBsum. The next cell also imports the main function of that script.

In [2]:
import os
file_needed = "pdb_code_to_prot_prot_interactions_via_PDBsum.py"
if not os.path.isfile(file_needed):
    !curl -OL https://raw.githubusercontent.com/fomightez/structurework/master/pdbsum-utilities/pdb_code_to_prot_prot_interactions_via_PDBsum.py
from pdb_code_to_prot_prot_interactions_via_PDBsum import pdb_code_to_prot_prot_interactions_via_PDBsum
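
Outside a notebook, where the `!curl` shell escape isn't available, the same fetch-if-missing step could be sketched with the standard library. `fetch_if_missing` is a hypothetical helper name introduced here for illustration:

```python
import os
import urllib.request


def fetch_if_missing(url, dest=None):
    """Download `url` to `dest` unless that file already exists.

    Returns True if a download happened, False if the file was already present.
    """
    if dest is None:
        # Default to the last path component of the URL, as `curl -O` does.
        dest = url.rsplit("/", 1)[-1]
    if os.path.isfile(dest):
        return False
    urllib.request.urlretrieve(url, dest)
    return True


# Example call (needs network access), using the URL from the cell above:
# fetch_if_missing(
#     "https://raw.githubusercontent.com/fomightez/structurework/master/"
#     "pdbsum-utilities/pdb_code_to_prot_prot_interactions_via_PDBsum.py"
# )
```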

In [3]:
structure1_il = pdb_code_to_prot_prot_interactions_via_PDBsum(structure1)
structure2_il = pdb_code_to_prot_prot_interactions_via_PDBsum(structure2)
i_union = set(structure1_il).union(set(structure2_il))

Next, the union of all the pairs is used to make a table like the one constructed at the top of [Using snakemake to highlight changes in multiple protein-protein interactions via PDBsum data](Using%20snakemake%20to%20highlight%20changes%20in%20multiple%20protein-protein%20interactions%20via%20PDBsum%20data.ipynb).

In [4]:
s = ""
for pair in list(i_union):
    s+= f"{structure1} {pair[0]} {pair[1]} {structure2} {pair[0]} {pair[1]}\n"
%store s >int_matrix.txt

Writing 's' (str) to file 'int_matrix.txt'.
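
The `%store` magic only works inside IPython/Jupyter. As a rough plain-Python equivalent, the sketch below builds the same six-column lines and writes them with `open()`. The helper name `build_matrix_text` and the output file name `int_matrix_example.txt` are made up for this example, and the pairs are sorted only so the line order is reproducible (a set has no guaranteed order):

```python
def build_matrix_text(structure1, structure2, pairs):
    """Return one line per chain pair: 'id1 chainA chainB id2 chainA chainB'."""
    return "".join(
        f"{structure1} {a} {b} {structure2} {a} {b}\n"
        for a, b in sorted(pairs)
    )


# Two example chain pairs; in the notebook this would be `i_union`.
example_pairs = {("B", "G"), ("D", "K")}
text = build_matrix_text("6w6v", "7c79", example_pairs)

# Written to a separate file so the real int_matrix.txt is not clobbered.
with open("int_matrix_example.txt", "w") as handle:
    handle.write(text)

print(text, end="")
```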


**Step #2:** Copy the Snakefile that processes the table of interactions into this directory.

In [5]:
!cp ../Snakefile .

**Step #3:** Run snakemake and it will process the `int_matrix.txt` file to extract the information and make individual notebooks corresponding to analysis of the interactions for each line.  

In [6]:
!snakemake --cores 1

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	all
	17	convert_scripts_to_nb_and_run_using_jupytext
	1	make_archive
	1	read_table_and_create_py
	20

[Mon Feb  1 18:40:36 2021]
rule read_table_and_create_py:
    input: int_matrix.txt
    output: interactions_report_for_6w6v_B_G_7c79_B_G.py, interactions_report_for_6w6v_D_K_7c79_D_K.py, interactions_report_for_6w6v_D_J_7c79_D_J.py, interactions_report_for_6w6v_E_H_7c79_E_H.py, interactions_report_for_6w6v_H_J_7c79_H_J.py, interactions_report_for_6w6v_C_K_7c79_C_K.py, interactions_report_for_6w6v_E_J_7c79_E_J.py, interactions_report_for_6w6v_F_G_7c79_F_G.py, interactions_report_for_6w6v_B_D_7c79_B_D.py, interactions_report_for_6w6v_B_F_7c79_B_F.py, interactions_report_for_6w6v_B_L_7c79_B_L.py, interactions_report_for_6w6v_B_I_7c79_B_I.py, inter

For those knowledgeable about snakemake: I set the number of cores to one because with eight I occasionally hit a race condition in which some of the auxiliary scripts fetched by the notebooks would be overwritten while another notebook was accessing them, causing failures. Using one core avoids that hazard. I will add, though, that in most cases, if you use multiple cores, you can easily get the additional files and a new archive made by running snakemake again with your chosen number of cores.

I never saw a race hazard with my clean rule, and so if you want to quickly start over you can run `!snakemake --cores 8 clean`.

**Step #4:** Verify the Jupyter notebooks with the reports were generated.  

If something is not correct, run `!snakemake --cores 8 clean` in a cell to reset things, and then run the `!snakemake --cores 1` step above again.

**Step #5:** If you don't want to fix the reports by adding the protein names (see below), download the archive.
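
For the verification step, a quick check that every line of `int_matrix.txt` has a matching report notebook can be sketched as follows. This assumes the notebooks share the stem of the `.py` files listed in the snakemake log (for example `interactions_report_for_6w6v_B_G_7c79_B_G`) with an `.ipynb` extension. That naming, and the helper names `expected_reports` and `missing_reports`, are assumptions for illustration rather than something the Snakefile is confirmed to guarantee:

```python
from pathlib import Path


def expected_reports(matrix_text, suffix=".ipynb"):
    """List the report file names implied by each six-field line of the table."""
    names = []
    for line in matrix_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        names.append("interactions_report_for_" + "_".join(fields) + suffix)
    return names


def missing_reports(matrix_path="int_matrix.txt", directory="."):
    """Return the expected report names that are not present in `directory`."""
    text = Path(matrix_path).read_text()
    return [
        name
        for name in expected_reports(text)
        if not (Path(directory) / name).is_file()
    ]
```

Calling `missing_reports()` in the working directory returns an empty list when every expected notebook is present. Any names it does return are the reports to look for after re-running snakemake.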


-----

Please continue on with the notebook [GSD Adding protein names to protein-protein interactions reports for ys RNase MRP v RNase P](GSD%20Adding%20protein%20names%20to%20protein-protein%20interactions%20reports%20for%20ys%20RNase%20MRP%20v%20RNase%20P.ipynb) to swap the protein names into the reports for easier reading. Since chain K in both of these structures is Snm1p, I'll uncomment the line for that when running that notebook so that chain K will be correctly swapped with Snm1p in the resulting reports.




-----

-----

Enjoy.