## Making the multiple reports generated via snakemake clearer by adding protein names

This notebook builds on the previous notebook, [Using snakemake to highlight changes in multiple protein-protein interactions via PDBsum data](notebooks/Using%20snakemake%20to%20highlight%20changes%20in%20multiple%20protein-protein%20interactions%20via%20PDBsum%20data.ipynb); however, you don't need to have just run that Jupyter notebook, because there is an option to retrieve a previously generated set of resulting notebooks and use those instead.


----

If you just ran the previous notebook in this session, you don't need to run the next cell; however, it is set up so that running it again causes no problems.  
If you don't yet have notebooks resulting from the previous notebook, the cell will retrieve a set of previously generated demonstration results so that the rest of the cells in this notebook will work.

In [1]:
# Check whether result notebooks already seem to be present. If none
# are found, retrieve a demonstration set and unpack it.
prefix_for_report_nbs = "interactions_report_for_"
import os
import sys
import fnmatch
interactions_report_nb_pattern = f"{prefix_for_report_nbs}*.ipynb"
interactions_report_nbs = []
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, interactions_report_nb_pattern):
        interactions_report_nbs.append(file)
if not interactions_report_nbs:
    !curl -OL https://gist.githubusercontent.com/fomightez/a335d9aa051c92ab289bd9bda34c577c/raw/d8bbc3cb34bbbb765c252a2f94f8dee5787b65a1/interactions_report_nbsJan2220210159.tar.gz
    !tar xzf interactions_report_nbsJan2220210159.tar.gz
    # verify it worked and provide feedback
    for file in os.listdir('.'):
        if fnmatch.fnmatch(file, interactions_report_nb_pattern):
            interactions_report_nbs.append(file)
    if interactions_report_nbs:
        sys.stderr.write("A set of notebooks with reports like those the "
            "previous notebook would make have been retrieved.\nYou should be "
            "able to now run this notebook.")
    else:
        sys.stderr.write("No notebooks are present. THIS ISN'T GOING TO WORK!")
else:
    sys.stderr.write("A set of notebooks with reports are present and\n"
        "executing the cells in this notebook should work now.")

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5489  100  5489    0     0   9943      0 --:--:-- --:--:-- --:--:--  9943


A set of notebooks with reports like those the previous notebook would make have been retrieved.
You should be able to now run this notebook.

To define the replacements that will make the notebooks more readable, the following cell relates each piece of text to change to its new value in a set of key-value pairings. If you ran the demonstration and are continuing to use it, you can just run the cell. If you are trying to make this notebook edit your own reports, you'll need to change the text on the left of each colon to match your chain designations and the text on the right to match the corresponding protein names. (The case of the letters in the text on the left side will be ignored.)

In [2]:
chain2name_pairs = {
    "Chain R":"WDR5",
    "Chain N":"RBBP5",
    "Chain K":"MLL1",
}

Then run the next cell to go through every report notebook and replace each occurrence of the text on the left side of a colon with the text on the right side.

In [3]:
# Make a list of the report-containing notebooks and then make the text swaps
prefix_for_report_nbs = "interactions_report_for_"
import os
import sys
import fnmatch
import re
interactions_report_nb_pattern = f"{prefix_for_report_nbs}*.ipynb"
interactions_report_nbs = []
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, interactions_report_nb_pattern):
        interactions_report_nbs.append(file)
def make_swaps(file_name,key_value_pairs):
    '''
    Takes a file name and a dictionary of key-value pairs, and replaces every
    occurrence of each key in the file (matched case-insensitively) with the
    corresponding value.
    Saves the edited text over the original file. Nothing is returned.
    '''
    output_file_name = "temp.txt"
    with open(file_name, 'r') as thefile:
        nb_text=thefile.read()
    for k,v in key_value_pairs.items():
        #nb_text=nb_text.replace(k.lower(),v) # if wasn't case insensitive for key
        # case-insensitive string replacement from https://stackoverflow.com/a/919067/8508004
        insensitive = re.compile(re.escape(k), re.IGNORECASE)
        nb_text = insensitive.sub(v, nb_text)
    with open(output_file_name, 'w') as output_file:
        output_file.write(nb_text)
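    # replace the original file with the edited version (os.replace avoids
    # shelling out, so unusual characters in file names cause no problems)
    os.replace(output_file_name, file_name)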
    # Feedback
    sys.stderr.write("Chain designations swapped for names in {}.\n".format(file_name))

for nbn in interactions_report_nbs:
    make_swaps(nbn,chain2name_pairs)

Chain designations swapped for names in interactions_report_for_6kiz_N_R_6kix_N_R.ipynb.
Chain designations swapped for names in interactions_report_for_6kiv_R_K_6kix_R_K.ipynb.
Chain designations swapped for names in interactions_report_for_6kiv_N_R_6kiz_N_R.ipynb.
Chain designations swapped for names in interactions_report_for_6kiv_R_K_6kiz_R_K.ipynb.
Chain designations swapped for names in interactions_report_for_6kiv_K_N_6kix_K_N.ipynb.
Chain designations swapped for names in interactions_report_for_6kiv_K_N_6kiz_K_N.ipynb.
Chain designations swapped for names in interactions_report_for_6kiv_N_R_6kix_N_R.ipynb.


Check the notebook files by examining them. The occurrences of 'chain' followed by the chain designations should have been swapped for the protein names.
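
As a complement to examining the notebooks by eye, the check can also be scripted. The following is a rough sketch, not part of the original workflow: the helper `count_leftovers` is a hypothetical name, and the `designations` list simply repeats the keys of the demonstration `chain2name_pairs` so that the block runs on its own. It assumes the report notebooks sit in the current directory and use the `interactions_report_for_` prefix.

```python
import glob
import re

def count_leftovers(text, designations):
    """Return {designation: count} for designations still found in text."""
    counts = {}
    for designation in designations:
        hits = re.findall(re.escape(designation), text, flags=re.IGNORECASE)
        if hits:
            counts[designation] = len(hits)
    return counts

# Same keys as the demonstration `chain2name_pairs`; adjust to match your own.
designations = ["Chain R", "Chain N", "Chain K"]

for nb_file in sorted(glob.glob("interactions_report_for_*.ipynb")):
    with open(nb_file, "r") as handle:
        leftovers = count_leftovers(handle.read(), designations)
    if leftovers:
        print(f"{nb_file}: leftover designations found: {leftovers}")
    else:
        print(f"{nb_file}: no chain designations left")
```

If any notebook reports leftovers, it may be worth checking whether the keys in your pairings match the wording actually used in the reports.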

-----

Now you can go back to the previous notebook, run it for the interactions that interest you in your own pairs of structures, and then return here and edit the pairings in this notebook so that your reports show the protein names.