## Automagically making a table of all protein-protein interactions for two structures

If two structures use the same, or essentially the same, protein chain designations, you can use Python to make a table of all the pairs of protein-protein interactions in the two structures. That table can be used as input for the pipeline described in an earlier notebook in this series, [Using snakemake to highlight changes in multiple protein-protein interactions via PDBsum data](Using%20snakemake%20to%20highlight%20changes%20in%20multiple%20protein-protein%20interactions%20via%20PDBsum%20data.ipynb). This notebook will step through this process.

It is important to note this won't work straight away if the protein chain designations for the same or closely related proteins differ between the two structures. Elements of the process used in this notebook could be adapted to handle that; however, that would require some programming knowledge beyond what will be covered here. I assume this will rarely be needed, and in those cases a table could more easily be made by hand, following along with this notebook as well as [Using snakemake to highlight changes in multiple protein-protein interactions via PDBsum data](Using%20snakemake%20to%20highlight%20changes%20in%20multiple%20protein-protein%20interactions%20via%20PDBsum%20data.ipynb).

The process relies on the fact that, under the 'Prot-prot' tab for every structure, PDBsum lists the interacting pairs of protein chains in an 'Interface summary' on the left side of the browser page. For example, look on the left of http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=6kiv&template=interfaces.html&c=999 . That link is where you end up if you click on the 'Prot-prot' tab from [the main PDBsum page for 6kiv](http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=6kiv&template=main.html), the PDBsum entry for the PDB identifier 6kiv. A utility script, [pdb_code_to_prot_prot_interactions_via_PDBsum.py](https://github.com/fomightez/structurework/tree/master/pdbsum-utilities), is used to collect the designations listed there for each individual structure involved. Then in this notebook a little Python is used to generate the table file that can be used as described in [Using snakemake to highlight changes in multiple protein-protein interactions via PDBsum data](Using%20snakemake%20to%20highlight%20changes%20in%20multiple%20protein-protein%20interactions%20via%20PDBsum%20data.ipynb).
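To view the 'Prot-prot' page for other structures, the address can be built from the PDB code. The following is a minimal sketch that assumes the URL pattern of the 6kiv link above holds for other entries; `prot_prot_url` is a hypothetical helper written for illustration and is not part of the utility script.

```python
# Hypothetical helper (not part of the utility script): build the PDBsum
# 'Prot-prot' page URL for a PDB code, following the pattern of the 6kiv link.
PDBSUM_BASE = "http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl"


def prot_prot_url(pdb_code):
    """Return the presumed URL of the PDBsum 'Prot-prot' page for `pdb_code`."""
    # Lowercasing is an assumption based on the lowercase code in the example link.
    return f"{PDBSUM_BASE}?pdbcode={pdb_code.lower()}&template=interfaces.html&c=999"


print(prot_prot_url("6kiv"))
```

Pasting the printed address into a browser should show the 'Interface summary' for that structure on the left side of the page.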

An example follows. It is meant to be adaptable to use the PDB codes of structures that interest you. You may wish to work through the demonstration first so you know what to expect.

----

The next cell is used to define the structures of interest. The PDB code identifiers are supplied.

In [1]:
structure1 = "6kiz"
structure2 = "6kix"

The next cell gets the script `pdb_code_to_prot_prot_interactions_via_PDBsum.py` (see [here](https://github.com/fomightez/structurework/tree/master/pdbsum-utilities)) that will get the 'Interface summary' information for each individual structure. This is equivalent to the 'Interface summary' on the left side of the 'Prot-prot' tab.

In [2]:
!curl -OL https://raw.githubusercontent.com/fomightez/structurework/master/pdbsum-utilities/pdb_code_to_prot_prot_interactions_via_PDBsum.py

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10861  100 10861    0     0  15995      0 --:--:-- --:--:-- --:--:-- 15995


Import the main function of that script by running the next cell.

In [3]:
from pdb_code_to_prot_prot_interactions_via_PDBsum import pdb_code_to_prot_prot_interactions_via_PDBsum

The next cell gets the interaction summary for each structure in order to collect the pairs needed to build the table described at the top of [Using snakemake to highlight changes in multiple protein-protein interactions via PDBsum data](Using%20snakemake%20to%20highlight%20changes%20in%20multiple%20protein-protein%20interactions%20via%20PDBsum%20data.ipynb).

In [4]:
structure1_il = pdb_code_to_prot_prot_interactions_via_PDBsum(structure1)
structure2_il = pdb_code_to_prot_prot_interactions_via_PDBsum(structure2)
i_union = set(structure1_il).union(set(structure2_il))

In this case the pairs for both structures are the same; however, the code is written so that it will not fail if extra proteins are present in one of them. Specifically, the interacting pairs are collected for both structures because, if one had an additional chain, taking the union of the two listings ensures that the combinations for all chains end up in the collection of pairs `i_union`.
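As a small illustration of why the union is taken, here is a sketch with made-up chain pairs. The pairs below are not real PDBsum results, and representing each pair as a tuple of two chain letters is an assumption for the sake of the example.

```python
# Made-up chain pairs for illustration only; not real PDBsum results.
structure1_pairs = [("A", "B"), ("A", "E"), ("C", "D")]
structure2_pairs = [("A", "B"), ("C", "D"), ("C", "F")]  # one pair not in the first

# The union keeps every pair seen in either structure, without duplicates.
all_pairs = set(structure1_pairs).union(structure2_pairs)

# Sets are unordered, so sort before printing to get a stable display.
print(sorted(all_pairs))
```

The pair found in only one of the listings still ends up in the result, so no interaction is left out of the table.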

Next, the union of all the pairs is used to make a table like the one constructed at the top of [Using snakemake to highlight changes in multiple protein-protein interactions via PDBsum data](Using%20snakemake%20to%20highlight%20changes%20in%20multiple%20protein-protein%20interactions%20via%20PDBsum%20data.ipynb).

In [5]:
s = ""
for pair in list(i_union):
    s+= f"{structure1} {pair[0]} {pair[1]} {structure2} {pair[0]} {pair[1]}\n"
%store s >int_matrix.txt

Writing 's' (str) to file 'int_matrix.txt'.


The table has now been stored as `int_matrix.txt`. Open the file from the Jupyter dashboard to verify.  
That's the file that needed to be made. The rest of the process picks up with 'Step #3' of [Using snakemake to highlight changes in multiple protein-protein interactions via PDBsum data](Using%20snakemake%20to%20highlight%20changes%20in%20multiple%20protein-protein%20interactions%20via%20PDBsum%20data.ipynb).
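The `%store` magic is only available in IPython and Jupyter. If you want to build the same kind of table in a plain Python script, something like the following sketch could work. The function `make_table`, the example pairs, and the file name `int_matrix_example.txt` are all hypothetical and chosen so that nothing here overwrites the real `int_matrix.txt`; sorting the pairs is an optional choice so that reruns give lines in the same order.

```python
def make_table(code1, code2, pairs):
    """Return one table line per chain pair, in sorted order."""
    lines = [f"{code1} {a} {b} {code2} {a} {b}" for a, b in sorted(pairs)]
    # End every line with a newline; an empty input gives an empty string.
    return "".join(line + "\n" for line in lines)


# Made-up pairs for illustration; in the notebook this would be `i_union`.
example_pairs = {("C", "D"), ("A", "B")}
table = make_table("6kiz", "6kix", example_pairs)

# Write the table to a separate example file using plain Python file handling.
with open("int_matrix_example.txt", "w") as handle:
    handle.write(table)

print(table, end="")
```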

To make that clear, the following cell runs the snakemake pipeline. Consult the subsequent steps of [Using snakemake to highlight changes in multiple protein-protein interactions via PDBsum data](Using%20snakemake%20to%20highlight%20changes%20in%20multiple%20protein-protein%20interactions%20via%20PDBsum%20data.ipynb) to see what to do after it has processed all the possible pairs.

In [6]:
!snakemake --cores 1

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	all
	21	convert_scripts_to_nb_and_run_using_jupytext
	1	make_archive
	1	read_table_and_create_py
	24

[Tue Jan 26 03:12:07 2021]
rule read_table_and_create_py:
    input: int_matrix.txt
    output: interactions_report_for_6kiz_C_D_6kix_C_D.py, interactions_report_for_6kiz_N_R_6kix_N_R.py, interactions_report_for_6kiz_A_E_6kix_A_E.py, interactions_report_for_6kiz_A_G_6kix_A_G.py, interactions_report_for_6kiz_C_F_6kix_C_F.py, interactions_report_for_6kiz_D_F_6kix_D_F.py, interactions_report_for_6kiz_B_D_6kix_B_D.py, interactions_report_for_6kiz_A_N_6kix_A_N.py, interactions_report_for_6kiz_A_B_6kix_A_B.py, interactions_report_for_6kiz_N_T_6kix_N_T.py, interactions_report_for_6kiz_B_N_6kix_B_N.py, interactions_report_for_6kiz_F_H_6kix_F_H.py, inter

Now change the structures used to your favorites and re-run the notebook. If the chains are the same in your two structures, you'll generate all the reports for all the interacting pairs of proteins upon doing that.
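If the chain designations differ between your two structures, one possible adaptation is to translate the chain letters of the first structure into those of the second when writing each line. This is only a rough sketch: the mapping `chain_map` and the pairs are made up, you would have to work out the real correspondence yourself, and it assumes the table simply takes the second structure's chain letters in its last two columns.

```python
# Hypothetical mapping from chain IDs in structure 1 to the corresponding
# chain IDs in structure 2; the real correspondence must be determined by you.
chain_map = {"A": "K", "B": "L"}

# Made-up interacting pairs from structure 1, for illustration only.
pairs_1 = [("A", "B"), ("A", "C")]

rows = []
for a, b in pairs_1:
    # Chains absent from the mapping are assumed to keep the same designation.
    rows.append(f"6kiz {a} {b} 6kix {chain_map.get(a, a)} {chain_map.get(b, b)}")

print("\n".join(rows))
```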

------

Enjoy!