# GSD Examining for ec equivalents of Pop1p contacts to Pop6p and Pop7p

This effort is based on my notebook [Using snakemake with multiple chains or structures to report if residues interacting with a specific chain have equivalent residues in hhsuite-generated alignments](https://nbviewer.jupyter.org/github/fomightez/hhsuite3-binder/blob/main/notebooks/Using%20snakemake%20with%20multiple%20chains%20or%20structures%20to%20report%20if%20residues%20interacting%20with%20a%20specific%20chain%20have%20equivalent%20residues%20in%20hhsuite-generated%20alignments.ipynb). The aim is to look at the ec equivalents of residues in Pop1p that contact Pop6p and Pop7p.

----



**Step #1:** Make a table, with columns separated by spaces and one row per line, that specifies structures, chains, and hhr results file(s).

```text
6agb B F results_S288C_POP1.hhr 1 True True
6agb B G results_S288C_POP1.hhr 1 True True
6ah3 B F results_S288C_POP1.hhr 1 True True
6ah3 B G results_S288C_POP1.hhr 1 True True
7c79 B F results_S288C_POP1.hhr 1 True True
7c79 B G results_S288C_POP1.hhr 1 True True
7c7a B F results_S288C_POP1.hhr 1 True True
7c7a B G results_S288C_POP1.hhr 1 True True
6w6v B F results_S288C_POP1.hhr 1 True True
6w6v B G results_S288C_POP1.hhr 1 True True
```
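As a quick sanity check before saving the table, the rows can be parsed and checked for a consistent number of fields. The sketch below is not part of the pipeline. The function name `parse_matrix` is hypothetical, and it assumes only what the example rows show: seven whitespace-separated fields, beginning with a structure identifier, two chain designations, and an `.hhr` file name. Nothing is assumed about the meaning of the last three fields.

```python
def parse_matrix(text, expected_fields=7):
    """Split each non-empty line on whitespace and check the field count.

    Returns a list of rows, each a list of strings. `expected_fields=7`
    is taken from the example rows above, not from the Snakefile itself.
    """
    rows = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != expected_fields:
            raise ValueError(
                f"line {line_number}: expected {expected_fields} fields, "
                f"found {len(fields)}: {line!r}"
            )
        rows.append(fields)
    return rows


example = """6agb B F results_S288C_POP1.hhr 1 True True
6agb B G results_S288C_POP1.hhr 1 True True
"""
# Only the first four fields are unpacked by name; the rest are kept as-is.
for structure, chain_a, chain_b, hhr_file, *rest in parse_matrix(example):
    print(structure, chain_a, chain_b, hhr_file)
```

A row with a missing or extra field raises a `ValueError` naming the offending line, which is easier to act on than a failure partway through the snakemake run.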



**Step #2:** Save the table with the name `equiv_check_matrix.txt`. It has to have that name for the table to be recognized and processed to make the Jupyter notebook files with the reports.

Running the following cell will generate an `equiv_check_matrix.txt` file here with the indicated content. Skip running it if you have already made your own table, because running it will replace your file. Alternatively, you can edit the code below to make a table with the contents that interest you.

In [1]:
s='''6agb B F results_S288C_POP1.hhr 1 True True
6agb B G results_S288C_POP1.hhr 1 True True
6ah3 B F results_S288C_POP1.hhr 1 True True
6ah3 B G results_S288C_POP1.hhr 1 True True
7c79 B F results_S288C_POP1.hhr 1 True True
7c79 B G results_S288C_POP1.hhr 1 True True
7c7a B F results_S288C_POP1.hhr 1 True True
7c7a B G results_S288C_POP1.hhr 1 True True
6w6v B F results_S288C_POP1.hhr 1 True True
6w6v B G results_S288C_POP1.hhr 1 True True
'''
%store s >equiv_check_matrix.txt

Writing 's' (str) to file 'equiv_check_matrix.txt'.


**Step #3:** Get the HH-suite3-generated results files (`*.hhr` files).  

In [2]:
# PUT THE *.hhr FILE, `results_S288C_POP1.hhr`, IN THE DIRECTORY WITH THIS NOTEBOOK
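
Before moving on, it can help to confirm that every `.hhr` file named in the table is actually in place. This is a minimal sketch rather than anything the Snakefile does itself. The helper name `missing_hhr_files` is hypothetical, and it assumes the `.hhr` file name is the fourth field on each row, as in the example table.

```python
import os


def missing_hhr_files(matrix_path="equiv_check_matrix.txt", directory="."):
    """Return the .hhr file names listed in the table that are not in `directory`."""
    missing = []
    with open(matrix_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 4:
                continue  # skip blank or incomplete lines
            hhr_name = fields[3]  # assumption: fourth field is the .hhr file
            already_noted = hhr_name in missing
            if not already_noted and not os.path.isfile(os.path.join(directory, hhr_name)):
                missing.append(hhr_name)
    return missing


# Only report if the table from Step #2 is present in this directory.
if os.path.isfile("equiv_check_matrix.txt"):
    print(missing_hhr_files() or "All listed .hhr files are present.")
```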

**Step #4:** Copy the Snakemake Snakefile to this directory

In [3]:
import os
file_needed = "equiv_snakefile"
if not os.path.isfile(file_needed):
    !curl -OL https://raw.githubusercontent.com/fomightez/hhsuite3-binder/main/notebooks/equiv_snakefile

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 21926  100 21926    0     0  31294      0 --:--:-- --:--:-- --:--:-- 31278


**Step #5:** Run snakemake, pointing it at the corresponding Snakefile, `equiv_snakefile`. It will process the `equiv_check_matrix.txt` file to extract the information and make an individual notebook with the analysis of the interactions for each line. This is very similar to running the previous notebooks in this series with the items spelled out on each line.  
The file snakemake uses in this pipeline, named `equiv_snakefile`, is already here. A Snakefile is closely related to a Python script, and you can examine its text if you wish.  
It will take about a minute or less to complete if you are running the demonstration.

In [4]:
!snakemake -s equiv_snakefile --cores 1

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job                                             count    min threads    max threads
--------------------------------------------  -------  -------------  -------------
all                                                 1              1              1
convert_scripts_to_nb_and_run_using_jupytext       10              1              1
make_archive                                        1              1              1
read_table_and_create_py                            1              1              1
total                                              13              1              1

[Mon Jul 19 21:07:55 2021]
rule read_table_and_create_py:
    input: equiv_check_matrix.txt
    output: equivalents_report_for_6agb_B_F_results_S288C_POP1.hhr.py, equivalents_report

(For those knowledgeable about snakemake, I will note that I set the number of cores to one because with eight I occasionally hit a race condition: some of the auxiliary scripts fetched in the course of running the report-generating notebooks would overwrite each other while they were being accessed by another notebook, causing failures. Using one core avoids that hazard. I will add, though, that in most cases if you use multiple cores, you can easily get the additional files and a new archive made by running snakemake again with your chosen number of cores. I never saw a race hazard with my clean rule, so if you want to quickly start over you can run `!snakemake -s equiv_snakefile --cores 8 clean`.)

#### Make the reports clearer by substituting in the names of the proteins in place of the Chain designations.

In [5]:
chain2name_pairs = {
    "Chain B":"Pop1p",
    "Chain F":"Pop6p",
    "Chain G":"Pop7p",
}

In [6]:
# Make a list of the report-containing notebooks.
prefix_for_report_nbs = "equivalents_report_for_"
import os
import sys
import fnmatch
import re
equivalents_report_nb_pattern = f"{prefix_for_report_nbs}*.ipynb"
equivalents_report_nbs = []
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, equivalents_report_nb_pattern):
        equivalents_report_nbs.append(file)
def make_swaps(file_name,key_value_pairs):
    '''
    Takes a file name and a dictionary, and replaces every occurrence of each
    key in the file's text with the corresponding value, ignoring case.
    Saves the edited file under the original name. Nothing is returned.
    '''
    output_file_name = "temp.txt"
    with open(file_name, 'r') as thefile:
        nb_text=thefile.read()
    for k,v in key_value_pairs.items():
        #nb_text=nb_text.replace(k.lower(),v) # if wasn't case insensitive for key
        # case-insensitive string replacement from https://stackoverflow.com/a/919067/8508004
        insensitive = re.compile(re.escape(k), re.IGNORECASE)
        nb_text = insensitive.sub(v, nb_text)
    with open(output_file_name, 'w') as output_file:
        output_file.write(nb_text)
    # replace the original file with edited
    !mv {output_file_name} {file_name}
    # Feedback
    sys.stderr.write("Chain designations swapped for names in {}.\n".format(file_name))

for nbn in equivalents_report_nbs:
    make_swaps(nbn,chain2name_pairs)

Chain designations swapped for names in equivalents_report_for_6agb_B_G_results_S288C_POP1.hhr.ipynb.
Chain designations swapped for names in equivalents_report_for_6agb_B_F_results_S288C_POP1.hhr.ipynb.
Chain designations swapped for names in equivalents_report_for_7c7a_B_F_results_S288C_POP1.hhr.ipynb.
Chain designations swapped for names in equivalents_report_for_6w6v_B_G_results_S288C_POP1.hhr.ipynb.
Chain designations swapped for names in equivalents_report_for_6ah3_B_F_results_S288C_POP1.hhr.ipynb.
Chain designations swapped for names in equivalents_report_for_6w6v_B_F_results_S288C_POP1.hhr.ipynb.
Chain designations swapped for names in equivalents_report_for_7c7a_B_G_results_S288C_POP1.hhr.ipynb.
Chain designations swapped for names in equivalents_report_for_6ah3_B_G_results_S288C_POP1.hhr.ipynb.
Chain designations swapped for names in equivalents_report_for_7c79_B_G_results_S288C_POP1.hhr.ipynb.
Chain designations swapped for names in equivalents_report_for_7c79_B_F_results_S2
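
To see the case-insensitive substitution in isolation, here is a small sketch that works on a string instead of a notebook file. The helper name `swap_names` is hypothetical. It follows the same `re.escape` plus `re.IGNORECASE` approach as `make_swaps` above, with one deliberate difference: the replacement is supplied as a function so that each value is inserted literally.

```python
import re


def swap_names(text, key_value_pairs):
    """Replace each key with its value, ignoring the case of the key."""
    for key, value in key_value_pairs.items():
        # re.escape keeps any special characters in the key literal.
        pattern = re.compile(re.escape(key), re.IGNORECASE)
        # A function replacement inserts `value` as-is, so a backslash in a
        # value is not interpreted as a replacement-template escape.
        text = pattern.sub(lambda match, value=value: value, text)
    return text


pairs = {"Chain B": "Pop1p", "Chain F": "Pop6p"}
print(swap_names("Residues of chain B contacting CHAIN F", pairs))
# prints: Residues of Pop1p contacting Pop6p
```

Because the substitution is applied to the raw notebook text, every occurrence of a key is changed, including any inside code cells of the report notebooks.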

#### Make a new archive with the substituted files

In [7]:
# delete the archive withOUT the substituted protein names
!rm equivalents_report_nbs*.tar.gz
import datetime
now = datetime.datetime.now()
results_archive = f"equivalents_report_nbs{now.strftime('%b%d%Y%H%M')}.tar.gz"
!tar -czf {results_archive}  {" ".join(equivalents_report_nbs)}
sys.stderr.write(f"Be sure to download {results_archive}.")

Be sure to download equivalents_report_nbsJul1920212108.tar.gz.

**Step #6:** Verify the Jupyter notebooks with the reports were generated.  
You can go to the dashboard and see the output of running snakemake. To do that, click on the Jupyter logo in the upper left of this notebook; on the page that opens, look in the notebooks directory, where you should see files that begin with `equivalents_report_` and end with `.ipynb`. You can examine some of them to ensure all is as expected.

If things seem to be working and you haven't run your data yet, run `!snakemake -s equiv_snakefile --cores 8 clean` in a cell to reset things, then edit and save `equiv_check_matrix.txt` so that it has your information, and then run the `!snakemake -s equiv_snakefile --cores 1` step above again.
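
The same check can be done in a cell instead of by eye on the dashboard. The sketch below builds the notebook names expected from the table and lists any that are absent. The helper names are hypothetical, and the naming pattern is inferred from the file names reported in the swap output above (`equivalents_report_for_<structure>_<chain>_<chain>_<hhr file>.ipynb`), not read from the Snakefile.

```python
import os


def expected_report_notebooks(matrix_text, prefix="equivalents_report_for_"):
    """Build the report notebook name expected for each row of the table."""
    names = []
    for line in matrix_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        structure, chain_a, chain_b, hhr_file = fields[:4]
        names.append(f"{prefix}{structure}_{chain_a}_{chain_b}_{hhr_file}.ipynb")
    return names


def missing_reports(matrix_text, directory="."):
    """Return the expected notebook names that are not present in `directory`."""
    present = set(os.listdir(directory))
    return [name for name in expected_report_notebooks(matrix_text) if name not in present]


# Only report if the table from Step #2 is present in this directory.
if os.path.isfile("equiv_check_matrix.txt"):
    with open("equiv_check_matrix.txt") as handle:
        print(missing_reports(handle.read()) or "All expected report notebooks are present.")
```

If a multi-core run left some notebooks missing, the returned list shows which rows still need a report.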

**Step #7:** If this was anything other than the demonstration run, download the archive containing all the Jupyter notebooks bundled together.  
For ease in downloading, all the created notebooks have been saved as a compressed archive so that you only need to retrieve and keep track of one file. The file you are looking for begins with `equivalents_report_nbs` in front of a date/time stamp and ends with `.tar.gz`. The snakemake run highlights its archive near the very bottom of its output, following the words 'Be sure to download'; the archive remade above with the substituted protein names is announced the same way.  
**Download that file from this remote, temporary session to your local computer.** You should see this archive file ending in `.tar.gz` on the dashboard. Toggle the checkbox next to it to select it, and then select `Download` to bring it from the remote JupyterHub session to your computer. If you don't retrieve that file and the session ends, you'll need to re-run to get the results again.

You should be able to unpack that archive using your favorite software for extracting compressed files. If that proves difficult, you can always reopen a session like the one you used to run this series of notebooks, upload the archive, and then run the following command in a Jupyter notebook cell to unpack it:

```bash
!tar xzf equivalents_report_nbs*
```

(If you are running that command on the command line, leave off the exclamation mark.)
You can then examine the files in the session or download the individual Jupyter notebooks, following the advice given above on how to download the archive.
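
If neither a desktop extraction tool nor `tar` is handy, Python's standard `tarfile` module can unpack the archive as well. This is a minimal sketch under the assumption that the archive sits in the current directory and matches the `equivalents_report_nbs*.tar.gz` naming described above. The function name `unpack_latest_archive` is hypothetical.

```python
import glob
import os
import tarfile


def unpack_latest_archive(pattern="equivalents_report_nbs*.tar.gz", destination="."):
    """Extract the most recently modified archive matching `pattern`.

    Returns the list of member names, or an empty list if nothing matched.
    """
    archives = sorted(glob.glob(pattern), key=os.path.getmtime)
    if not archives:
        return []
    with tarfile.open(archives[-1], "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(destination)
    return names


print(unpack_latest_archive() or "No matching archive found here.")
```

Picking the most recently modified match is a design choice for the case where more than one archive has accumulated in the directory.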


-----

Enjoy.

-----