# Using snakemake to highlight changes in multiple protein-protein interactions via PDBsum data

This notebook builds on some of the basics covered in [Using PDBsum data to highlight changes in protein-protein interactions](Using%20PDBsum%20data%20to%20highlight%20changes%20in%20protein-protein%20interactions.ipynb) in order to compare many combinations of protein-protein interactions of pairs of proteins in different structures.

----

The previous notebook, [Using PDBsum data to highlight changes in protein-protein interactions](Using%20PDBsum%20data%20to%20highlight%20changes%20in%20protein-protein%20interactions.ipynb), stepped through making reports about the interactions between two chains as they occur in two different, related structures.  
Is there a way to scale this up to make reports for many chains in several pairs of related structures? This may be especially helpful for the cases where the structures involved are large complexes and several pairs are of interest, such as all those pairs contributing interactions in a region of interest for a researcher.

This notebook spells out a way to do this with minimal effort. In fact, you only need to know the PDB code identifiers of the structures you are interested in and the chain designations in each structure. You'll fill out a table to define the structures and chains, kick off the process, and get Jupyter notebooks containing the reports for each pair of protein-protein interactions.

Additionally, in an advanced notebook in this series, [Automagically making a table of all protein-protein interactions for two structures](Automagically%20making%20a%20table%20of%20all%20protein-protein%20interactions%20for%20two%20structures.ipynb), I show how, for cases where the protein designations are the same in the two structures, you can automate construction of such a table covering **all the interactions, with only the PDB codes for each structure needed**, and pass that table through a snakemake pipeline similar to the one in this notebook, where just a sampling of interactions for combinations of structures is demonstrated.

-----

**Step #1:** Make a table with columns separated by spaces and each line as a row. Each row specifies two structures and two chains from each of those structures, for a total of six items per line. The order on each line is important. Specify the PDB code for structure #1 followed by the two chain designations whose interactions you want to examine, and then on the rest of the line put the PDB code for structure #2 followed by the corresponding chain designations, in the same order as in the first half of the line. The following illustrates the content of such a table. Note that it directs making reports for combinations of three related structures, covering how three pairs of chains interact. This is just a sampling of the many chains in these three structures.

```text
6kiv N R 6kiz N R
6kiv N R 6kix N R
6kiz N R 6kix N R
6kiv K N 6kiz K N
6kiv K N 6kix K N
6kiv R K 6kiz R K
6kiv R K 6kix R K
```

You can open a text file in Jupyter and directly edit the file to make your table. For the sake of the demonstration, this will be done using code within this notebook found in the cell below.

If it helps, you can think of the columns on each line as the following, using the nomenclature from the first few code cells of the previous notebook, [Using PDBsum data to highlight changes in protein-protein interactions](Using%20PDBsum%20data%20to%20highlight%20changes%20in%20protein-protein%20interactions.ipynb).

```text
structure1 structure1_chain1 structure1_chain2 structure2 structure2_chain1 structure2_chain2
```
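
If you want to sanity-check a table before handing it to snakemake, the six-column layout described above can be sketched in plain Python. This is only an illustration, not part of the pipeline: the names `InteractionRow` and `parse_table` are hypothetical, and the check assumes nothing beyond "six whitespace-separated items per non-blank line".

```python
from typing import NamedTuple


class InteractionRow(NamedTuple):
    """One line of the table: two structures, each with a pair of chains."""
    structure1: str
    structure1_chain1: str
    structure1_chain2: str
    structure2: str
    structure2_chain1: str
    structure2_chain2: str


def parse_table(text):
    """Split space-separated table text into rows, rejecting malformed lines."""
    rows = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        items = line.split()
        if not items:
            continue  # skip blank lines
        if len(items) != 6:
            raise ValueError(
                f"line {line_number}: expected 6 items, found {len(items)}"
            )
        rows.append(InteractionRow(*items))
    return rows


# Two lines taken from the demonstration table.
demo = """6kiv N R 6kiz N R
6kiv K N 6kix K N
"""
rows = parse_table(demo)
for row in rows:
    print(row.structure1, row.structure1_chain1, row.structure1_chain2,
          "vs", row.structure2, row.structure2_chain1, row.structure2_chain2)
```

A line with a missing chain designation raises a `ValueError` naming the offending line, which is easier to act on than a failure partway through a pipeline run.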

I have now made a notebook that will actually generate this table in the case where both structures use the same chain designations (or essentially the same), see the notebook [Automagically making a table of all protein-protein interactions for two structures](Automagically%20making%20a%20table%20of%20all%20protein-protein%20interactions%20for%20two%20structures.ipynb) if that is the case for the structures that interest you.

**Step #2:** Save the table with the following name, `int_matrix.txt`. It has to have that name for the table to be recognized and processed to make the Jupyter notebook files with the reports.

The following will do that here using this notebook; however, you can, and will want to, skip running this if you have already made your own table, because running it will replace your file. Alternatively, you can edit the code below to make a table with the contents that interest you.

In [1]:
s='''6kiv N R 6kiz N R
6kiv N R 6kix N R
6kiz N R 6kix N R
6kiv K N 6kiz K N
6kiv K N 6kix K N
6kiv R K 6kiz R K
6kiv R K 6kix R K
'''
%store s >int_matrix.txt

Writing 's' (str) to file 'int_matrix.txt'.
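
Because the cell above replaces any existing `int_matrix.txt`, you may prefer a variant that refuses to overwrite unless told to. The following is a minimal sketch in plain Python, assuming you hold your rows as tuples of six strings; the function name `write_table` and its `overwrite` parameter are hypothetical and not part of this notebook series. The demonstration writes into a temporary directory so it does not touch your real file.

```python
import tempfile
from pathlib import Path


def write_table(rows, path="int_matrix.txt", overwrite=False):
    """Write rows (each a sequence of six strings) as a space-separated table."""
    target = Path(path)
    if target.exists() and not overwrite:
        raise FileExistsError(
            f"{target} already exists; pass overwrite=True to replace it"
        )
    lines = [" ".join(row) for row in rows]
    target.write_text("\n".join(lines) + "\n")
    return target


my_rows = [
    ("6kiv", "N", "R", "6kiz", "N", "R"),
    ("6kiv", "K", "N", "6kix", "K", "N"),
]

# Demonstrate in a throwaway directory rather than the working directory.
with tempfile.TemporaryDirectory() as tmp:
    out = write_table(my_rows, Path(tmp) / "int_matrix.txt")
    print(out.read_text())
```

To write the real file, call `write_table(my_rows)` from the directory that holds the `Snakefile`.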


**Step #3:** Run snakemake and it will process the `int_matrix.txt` file to extract the information and make individual notebooks corresponding to analysis of the interactions for each line. This will be very similar to running the previous notebooks in this series with the items spelled out on each line.  
The file snakemake uses by default, named `Snakefile`, is already here and that is what will run when the next command is executed.  
It will take about a minute or less to complete if you are running the demonstration.

In [2]:
!snakemake --cores 1

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 23817  100 23817    0     0   116k      0 --:--:-- --:--:-- --:--:--  116k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 27684  100 27684    0     0   143k      0 --:--:-- --:--:-- --:--:--  143k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 24679  100 24679    0     0   122k      0 --:--:-- --:--:-- --:--:--  122k
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job                                             count    min threads    max threads
---

(For those familiar with snakemake: I set the number of cores to one because with eight I found that a race condition would occasionally ensue, where some of the auxiliary scripts fetched in the course of running the report-generating notebooks would be overwritten while they were being accessed by another notebook, causing failures. Using one core avoids that hazard. I will add, though, that in most cases if you use multiple cores, you can easily get the additional files and a new archive made by running snakemake again with your chosen number of cores. I never saw a race hazard with my clean rule, so if you want to quickly start over you can run `!snakemake --cores 8 clean`.)

**Step #4:** Verify the Jupyter notebooks with the reports were generated.  
If you ran the demo ones, you can click [here](interactions_report_for_6kiv_K_N_6kiz_K_N.ipynb) to open one of them.  For the others...  
You can go to the dashboard and see the output of running snakemake. To do that, click on the Jupyter logo in the upper left of this notebook, and on that page look in the notebooks directory, where you should see files that begin with `interactions_report_` and end with `.ipynb`. You can examine some of them to ensure all is as expected.
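
If you would rather check from code than from the dashboard, here is a rough sketch that lists which reports appear to be missing. It assumes each report is named `interactions_report_for_` plus the six table items joined by underscores plus `.ipynb`; that pattern is inferred from the demo report linked above, not read from the `Snakefile`, so treat it as an assumption. The function names are hypothetical.

```python
from pathlib import Path


def expected_report_names(table_text):
    """Build the assumed report file name for each six-item table line."""
    names = []
    for line in table_text.splitlines():
        items = line.split()
        if len(items) == 6:
            names.append(
                "interactions_report_for_" + "_".join(items) + ".ipynb"
            )
    return names


def missing_reports(table_text, directory="."):
    """Return the expected report names not found in the given directory."""
    folder = Path(directory)
    return [
        name for name in expected_report_names(table_text)
        if not (folder / name).exists()
    ]


# Two lines taken from the demonstration table.
demo = """6kiv K N 6kiz K N
6kiv R K 6kix R K
"""
for name in expected_report_names(demo):
    print(name)
```

After a run, `missing_reports(Path("int_matrix.txt").read_text())` should return an empty list if every line of your table produced a notebook under that naming assumption.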

If things seem to be working and you haven't run your data yet, run `!snakemake --cores 8 clean` in a cell to reset things, then edit and save `int_matrix.txt` so that it holds your information, and then run the `!snakemake --cores 1` step above again.

**Step #5:** If this was anything other than the demonstration run, download the archive containing all the Jupyter notebooks bundled together.  
For ease in downloading, all the created notebooks have been saved as a compressed archive so that you only need to retrieve and keep track of one file. The file you are looking for begins with `interactions_report_nbs` in front of a date/time stamp and ends with `.tar.gz`. The snakemake run will actually highlight this archive towards the very bottom of the run, following the words 'Be sure to download'.  
**Download that file from this remote, temporary session to your local computer.** You should see this archive file ending in `.tar.gz` on the dashboard. Toggle next to it to select it and then select `Download` to bring it from the remote JupyterHub session to your computer. If you don't retrieve that file and the session ends, you'll need to re-run to get the results again.

You should be able to unpack that archive using your favorite software for extracting compressed files. If that proves difficult, you can always reopen a session like the one you used to run this series of notebooks, upload the archive, and then run the following command in a Jupyter notebook cell to unpack it:

```bash
!tar xzf interactions_report_nbs*
```

(If you are running that command on the command line, leave off the exclamation mark.)
You can then examine the files in the session or download the individual Jupyter notebooks similar to the advice on how to download the archive given above.
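
If `tar` is not available where you are working, the same unpacking can be sketched with Python's standard library. This is an illustration only: the function name `unpack_latest_archive` is hypothetical, and picking the most recently modified match when several archives are present is my assumption, not something the pipeline does.

```python
import glob
import os
import tarfile


def unpack_latest_archive(directory=".", destination="."):
    """Extract the newest interactions_report_nbs*.tar.gz found in directory."""
    pattern = os.path.join(directory, "interactions_report_nbs*.tar.gz")
    archives = glob.glob(pattern)
    if not archives:
        raise FileNotFoundError(f"no archive matching {pattern}")
    # Assumption: when several archives exist, the newest one is wanted.
    latest = max(archives, key=os.path.getmtime)
    with tarfile.open(latest, "r:gz") as archive:
        archive.extractall(destination)
    return latest
```

Calling `unpack_latest_archive()` in the directory where you saved the download extracts the notebooks alongside the archive and returns the path of the archive it used.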

In the next notebook in this series, [Making the multiple reports generated via snakemake clearer by adding protein names](Making%20the%20multiple%20reports%20generated%20via%20snakemake%20clearer%20by%20adding%20protein%20names.ipynb), I work through how to make the reports more human readable by swapping the chain designations with the actual names of the proteins. This is similar to the approach to making a report more human readable discussed at the bottom of the previous notebook, [Using PDBsum data to highlight changes in protein-protein interactions](Using%20PDBsum%20data%20to%20highlight%20changes%20in%20protein-protein%20interactions.ipynb); however, it will be applied to all the notebooks at once, selected by file names beginning with `interactions_report_for_` and ending with `.ipynb`.

-----

Please continue on with the next notebook in this series, [Making the multiple reports generated via snakemake clearer by adding protein names](Making%20the%20multiple%20reports%20generated%20via%20snakemake%20clearer%20by%20adding%20protein%20names.ipynb).

Or, if you are interested in using PDBsum's interface statistics tables with Python or in easily comparing those statistics for two structures, see [Interface statistics basics & comparing Interface statistics for two structures](Interface%20statistics%20basics%20and%20comparing%20Interface%20statistics%20for%20two%20structures.ipynb).

-----