# GSD Highlight changes in the protein-protein interactions of yeast RNase MRP vs. RNase P via PDBsum data

This notebook adapts the generic notebook I made for comparing protein-protein interactions in pairs of related structures, applying it to the combinations of protein-protein interactions in yeast RNase MRP vs. RNase P.

----


**Step #1:** Make a table with a matrix of the protein-protein combinations in the pairs of cryo-EM structures of RNase MRP and RNase P.

I used the PDBsum pages for protein-protein interactions to make this table. I could have generated the combinations computationally; however, several of them would not actually be relevant, and I would then need to sort out which ones return an 'empty' list of interactions from PDBsum for both structures. That shouldn't be too hard, and it could be added down the road so that one could just supply two PDB code IDs and let all the reports get generated.

Note that the list has to be empty for *both* structures before a combination is dropped, because if one structure has no interactions for a pair and the other has some, that is definitely an important difference to catch. For example, Pop1p [chain B] and Pop5p [chain E] interact in RNase P but not in RNase MRP. The flip side is that in RNase MRP, Pop1p gains interactions with Rmp1 [chain L] and Pop4p [chain D] not seen in RNase P. So that nothing is missed due to an error in handling those steps, I decided to construct the table myself. Plus, certain combinations were the original impetus for this effort, so those were actually added first, and then I expanded out to check the other interactions.
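As a minimal sketch of the computational route described above (not what I used here), candidate lines could be generated from a list of chain IDs and then filtered so that a pair is dropped only when PDBsum lists no interactions for it in *both* structures. The helper names and the idea of feeding in per-structure interaction counts are hypothetical, fetching and parsing the PDBsum pages is not shown, and it assumes a chain ID denotes the same protein in both structures, as in the hand-made table.

```python
from itertools import combinations


def chain_pair_lines(pdb_a, pdb_b, chains):
    """Return one int_matrix.txt-style line per unordered pair of chains.

    Hypothetical helper: assumes a chain ID refers to the same protein in
    both structures, so each pair is written identically for pdb_a and pdb_b.
    """
    return [
        f"{pdb_a} {c1} {c2} {pdb_b} {c1} {c2}"
        for c1, c2 in combinations(chains, 2)
    ]


def keep_pair(n_interactions_a, n_interactions_b):
    """Drop a pair only when PDBsum lists no interactions in both structures.

    A pair with interactions in just one structure (e.g., Pop1p/Pop5p, present
    in RNase P but not RNase MRP) is exactly the kind of difference to keep.
    """
    return n_interactions_a > 0 or n_interactions_b > 0
```

For example, `chain_pair_lines("7c7a", "6ah3", ["B", "E", "F"])` gives the three lines `7c7a B E 6ah3 B E`, `7c7a B F 6ah3 B F`, and `7c7a E F 6ah3 E F`.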

```python
s='''7c7a F G 6ah3 F G
7c7a F B 6ah3 F B
7c7a G B 6ah3 G B
7c7a E B 6ah3 E B
7c7a I B 6ah3 I B
7c7a F I 6ah3 F I
7c7a G E 6ah3 G E
7c7a I E 6ah3 I E
'''
%store s >int_matrix.txt
```

```
Writing 's' (str) to file 'int_matrix.txt'.
```
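`%store` is an IPython magic, so it only works inside Jupyter/IPython. A minimal plain-Python sketch of the same step might look like the following; it also checks that each line has the six whitespace-separated fields used in the table (PDB code, chain, chain for the first structure, then the same for the second). `write_int_matrix` is a hypothetical name.

```python
from pathlib import Path


def write_int_matrix(text, path="int_matrix.txt"):
    """Write the interaction table to disk after a basic format check.

    Hypothetical helper: each non-blank line is expected to hold six fields,
    e.g. '7c7a F G 6ah3 F G' (PDB code, chain, chain, PDB code, chain, chain).
    The check runs before anything is written, so a bad table is not saved.
    """
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip() and len(line.split()) != 6:
            raise ValueError(f"line {number} does not have 6 fields: {line!r}")
    Path(path).write_text(text)
    return Path(path)
```

Calling `write_int_matrix(s)` would then stand in for `%store s >int_matrix.txt`.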


**Step #2:** Copy the Snakefile that processes the table of interactions into this directory.

```bash
!cp ../Snakefile .
```

**Step #3:** Run snakemake and it will process the `int_matrix.txt` file to extract the information and make individual notebooks corresponding to analysis of the interactions for each line.  
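Judging from the file names in the snakemake log below, each line of `int_matrix.txt` becomes a `details` wildcard made by joining its fields with underscores (e.g., `7c7a_E_B_6ah3_E_B`), which names a script `interactions_report_for_<details>.py` that jupytext turns into a `.ipynb` notebook. The sketch below reproduces that naming so you can predict which reports to expect; it is a reconstruction from the log, not code taken from the Snakefile, and `expected_reports` is a hypothetical name.

```python
from pathlib import Path


def expected_reports(matrix_path="int_matrix.txt"):
    """Map each int_matrix.txt line to the report notebook name it should yield.

    Reconstructed from names in the snakemake log (the line '7c7a E B 6ah3 E B'
    -> details '7c7a_E_B_6ah3_E_B'); not the Snakefile's own code.
    """
    reports = []
    for line in Path(matrix_path).read_text().splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            details = "_".join(fields)
            reports.append(f"interactions_report_for_{details}.ipynb")
    return reports
```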

```bash
!snakemake --cores 1
```

```
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	all
	2	convert_scripts_to_nb_and_run_using_jupytext
	1	make_archive
	4

[Fri Jan 22 16:47:47 2021]
rule convert_scripts_to_nb_and_run_using_jupytext:
    input: interactions_report_for_7c7a_E_B_6ah3_E_B.py
    output: interactions_report_for_7c7a_E_B_6ah3_E_B.ipynb
    jobid: 6
    wildcards: details=7c7a_E_B_6ah3_E_B

[jupytext] Reading interactions_report_for_7c7a_E_B_6ah3_E_B.py in format py
[jupytext] Executing notebook with kernel python3
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/bin/jupytext", line 8, in <module>
    sys.exit(jupytext())
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/jupytext/cli.py", line 402, in jupytext
    exit_code += jupytext_single_file(nb_file, args, log)
  File 
```

Note that at present, I am seeing an issue with this step for this rule:
    
```python
rule convert_scripts_to_nb_and_run_using_jupytext:
    input: interactions_report_for_7c7a_E_B_6ah3_E_B.py
    output: interactions_report_for_7c7a_E_B_6ah3_E_B.ipynb
    jobid: 6
    wildcards: details=7c7a_E_B_6ah3_E_B
...
nbconvert.preprocessors.execute.CellExecutionError: An error occurred while executing the following cell:
------------------
%run -i similarities_in_proteinprotein_interactions.py
```

I suspect this is because 7c7a doesn't have any interactions between chains E and B (Pop5p and Pop1p), and my script `similarities_in_proteinprotein_interactions.py` isn't set up to handle that gracefully yet. (Most likely `differences_in_proteinprotein_interactions.py` has the same issue.)
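The idea of handling this gracefully can be sketched as follows: represent each structure's interactions for a chain pair as a set (for example, of interacting residue pairs) and treat an empty set as a result to report rather than an error. The function name, the set representation, and the message wording are all hypothetical; this is not the code in `similarities_in_proteinprotein_interactions.py`.

```python
def compare_interactions(label_a, interactions_a, label_b, interactions_b):
    """Summarize shared and unique interactions for one chain pair.

    Hypothetical sketch: each interactions_* argument is a set of hashable
    items, e.g. (residue_in_chain_1, residue_in_chain_2) tuples. Empty sets
    are reported in the summary instead of raising an error.
    """
    if not interactions_a and not interactions_b:
        return f"No interactions listed for this chain pair in {label_a} or {label_b}."
    if not interactions_a:
        return f"No interactions in {label_a}; {len(interactions_b)} only in {label_b}."
    if not interactions_b:
        return f"No interactions in {label_b}; {len(interactions_a)} only in {label_a}."
    shared = interactions_a & interactions_b
    return (
        f"{len(shared)} shared; "
        f"{len(interactions_a - shared)} only in {label_a}; "
        f"{len(interactions_b - shared)} only in {label_b}."
    )
```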

For those knowledgeable about snakemake: I set the number of cores to one because, with eight, I occasionally hit a race condition in which auxiliary scripts used by the notebooks were overwritten while another notebook was accessing them, causing failures. Using one core avoids that hazard. That said, in most cases, if you do use multiple cores and some jobs fail, you can easily get the missing files and a new archive made by running snakemake again with your chosen number of cores.
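One way to sidestep that race condition while still using several cores, sketched here under the assumption that the collisions come from notebooks sharing the same copies of the auxiliary scripts in one directory, is to give each job its own scratch directory holding private copies of those scripts. `isolated_workdir` is a hypothetical helper and is not something the current Snakefile does.

```python
import shutil
import tempfile
from pathlib import Path


def isolated_workdir(details, aux_scripts):
    """Create a private scratch directory for one job and copy helper scripts into it.

    Hypothetical helper: 'details' is the job's wildcard (e.g. '7c7a_E_B_6ah3_E_B')
    and 'aux_scripts' are paths to the shared auxiliary scripts. Each job then
    reads and writes its own copies, so concurrent jobs cannot overwrite them.
    """
    workdir = Path(tempfile.mkdtemp(prefix=f"job_{details}_"))
    for script in aux_scripts:
        shutil.copy(script, workdir)
    return workdir
```

Because `tempfile.mkdtemp` always creates a new, uniquely named directory, two jobs for the same line still get separate copies.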

I never saw a race condition with my clean rule, so if you want to quickly start over you can run `!snakemake --cores 8 clean`.

**Step #4:** Verify the Jupyter notebooks with the reports were generated.  
You can go to the dashboard and see the output of running snakemake. To do that, click on the Jupyter logo in the upper left of this notebook. On that page, look in the notebooks directory, and you should see files that begin with `interactions_report_` and end with `.ipynb`. You can examine some of them to ensure all is as expected.

If things seem to be working and you haven't run your data yet, run `!snakemake --cores 8 clean` in a cell to reset things, then edit and save `int_matrix.txt` to contain your information, and then run the `!snakemake --cores 1` step above again.

**Step #5:** If this was anything other than the demonstration run, download the archive containing all the Jupyter notebooks bundled together.  
For ease in downloading, all the created notebooks have been saved as a compressed archive so that you only need to retrieve and keep track of one file. The file you are looking for begins with `interactions_report_nbs`, followed by a date/time stamp, and ends with `.tar.gz`. The snakemake run will actually highlight this archive towards the very bottom of the run, following the words 'Be sure to download'.  
**Download that file from this remote, temporary session to your local computer.** You should see this archive file ending in `.tar.gz` on the dashboard. Check the box next to it to select it, and then select `Download` to bring it from the remote JupyterHub session to your computer. If you don't retrieve that file and the session ends, you'll need to re-run everything to get the results again.
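Checking for the report notebooks and locating the archive can also be done from a notebook cell. This minimal sketch assumes it runs in the directory where snakemake ran; `generated_reports` and `newest_archive` are hypothetical helpers that only use the file-name patterns described above.

```python
from pathlib import Path


def generated_reports(directory="."):
    """Sorted names of report notebooks matching interactions_report_*.ipynb."""
    return sorted(p.name for p in Path(directory).glob("interactions_report_*.ipynb"))


def newest_archive(directory="."):
    """Most recently modified interactions_report_nbs*.tar.gz, or None if absent."""
    archives = list(Path(directory).glob("interactions_report_nbs*.tar.gz"))
    return max(archives, key=lambda p: p.stat().st_mtime) if archives else None
```

Running `print(generated_reports())` and `print(newest_archive())` in a cell shows what is ready to download.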

You should be able to unpack that archive using your favorite software for extracting compressed files. If that is proving difficult, you can always reopen a session like you did to run this series of notebooks, upload the archive, and then run the following command in a Jupyter notebook cell to unpack it:

```bash
!tar xzf interactions_report_nbs*
```

(If you are running that command on the command line, leave off the exclamation point.)
You can then examine the files in the session or download the individual Jupyter notebooks, similar to the advice given above on how to download the archive.
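If neither archive software nor a shell is handy, Python's standard-library `tarfile` module can do the same unpacking as `tar xzf`. A minimal sketch, assuming the archive is one you made yourself (only extract archives you trust); `unpack_reports` is a hypothetical name.

```python
import tarfile


def unpack_reports(archive, destination="."):
    """Extract a .tar.gz archive of report notebooks; return the .ipynb names inside.

    Only use this on trusted archives, such as the one produced by this workflow.
    """
    with tarfile.open(archive, "r:gz") as tar:
        names = [m.name for m in tar.getmembers() if m.name.endswith(".ipynb")]
        tar.extractall(path=destination)
    return names
```

For example, `unpack_reports("interactions_report_nbs<date/time stamp>.tar.gz")`, with the placeholder replaced by the actual file name.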

In the next notebook in this series, [Making the multiple reports generated via snakemake clearer by adding protein names](Making%20the%20multiple%20reports%20generated%20via%20snakemake%20clearer%20by%20adding%20protein%20names.ipynb), I work through how to make the reports more human-readable by replacing the chain designations with the actual names of the proteins. This is similar to the approach discussed at the bottom of the previous notebook, [Using PDBsum data to highlight changes in protein-protein interactions](Using%20PDBsum%20data%20to%20highlight%20changes%20in%20protein-protein%20interactions.ipynb); however, here it will be applied to all the notebooks at once, selected by file names beginning with `interactions_report_for_` and ending with `.ipynb`.
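The core of that swap is a lookup from chain ID to protein name. A minimal sketch, using only the chain assignments mentioned in this notebook (B = Pop1p, D = Pop4p, E = Pop5p, L = Rmp1) and falling back to the chain ID for any chain not listed. `CHAIN_NAMES` and `readable_label` are hypothetical, it assumes the chain assignments are the same in both structures, and the next notebook's approach may differ.

```python
# Hypothetical mapping built only from chain assignments stated in this notebook;
# chains without a known name here (e.g. F, G, I) fall back to "chain X".
CHAIN_NAMES = {"B": "Pop1p", "D": "Pop4p", "E": "Pop5p", "L": "Rmp1"}

PREFIX = "interactions_report_for_"
SUFFIX = ".ipynb"


def readable_label(report_name, names=CHAIN_NAMES):
    """Turn 'interactions_report_for_7c7a_E_B_6ah3_E_B.ipynb' into a readable label.

    Assumes six underscore-separated fields after the prefix:
    PDB code, chain, chain, PDB code, chain, chain.
    """
    if not (report_name.startswith(PREFIX) and report_name.endswith(SUFFIX)):
        raise ValueError(f"not a report notebook name: {report_name!r}")
    stem = report_name[len(PREFIX):-len(SUFFIX)]
    pdb_a, a1, a2, pdb_b, b1, b2 = stem.split("_")

    def name(chain):
        return names.get(chain, f"chain {chain}")

    return f"{pdb_a}: {name(a1)}-{name(a2)} vs. {pdb_b}: {name(b1)}-{name(b2)}"
```

For example, `readable_label("interactions_report_for_7c7a_E_B_6ah3_E_B.ipynb")` returns `"7c7a: Pop5p-Pop1p vs. 6ah3: Pop5p-Pop1p"`.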

-----

Please continue on with the next notebook in this series, [Making the multiple reports generated via snakemake clearer by adding protein names](Making%20the%20multiple%20reports%20generated%20via%20snakemake%20clearer%20by%20adding%20protein%20names.ipynb).

-----

Enjoy.

```python
import time


def execute_something():
    # Print a dot as a periodic sign of activity, then wait.
    print('.')
    time.sleep(480)  # 60 seconds times 8 minutes


# Loop forever; interrupt the kernel to stop it.
while True:
    execute_something()
```
