### Shared Interaction Network

In this notebook we provide an example of applying KIN to a set of MD trajectories and crystal structures. We previously generated contacts from each frame of a truncated MD trajectory for every protein structure in the family (simulation_contacts). Here we convert them into a shared interaction network and into a missing interaction network, i.e. a network of interactions that are preserved in the family but absent in the protein of choice. The protein of choice for both projections is TEM-1 (PDB 1M40).

In [16]:
import os
from kin.msa_indexing import parse_fasta
from kin.msa_indexing import indexing_pdb_to_msa
from kin.msa_indexing import clean_up_sequence
from kin.msa_indexing import parse_contact_output

**1. Contact indexing according to the MSA**

First, we need to convert the sequence indexing of each protein to a shared indexing based on the multiple sequence alignment (MSA). The steps for building the MSA with MODELLER are described in comparative_data/msa_scores. We will make one set of contacts from the crystal structures and two sets of contacts from MD with different MD cutoffs (cutoff 10 and cutoff 50). The MD cutoff is a measure of how consistently an interaction is retained within the MD trajectory; cutoffs of 10 and 50 are passed to parse_contact_output as retention fractions of 0.1 and 0.5. For further information on selecting the MD cutoff, please see the corresponding paper.
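The reindexing itself is done by indexing_pdb_to_msa. As a rough illustration of the idea only (not the kin implementation), the sketch below maps consecutive residue numbers of one protein onto 1-based MSA columns by walking its gapped alignment row. It assumes contiguous residue numbering starting at first_resid, "-" or "." as gap characters, and an optional trailing "*" as in PIR/.ali files; the helpers residue_to_msa_column and reindex_contacts are hypothetical.

```python
def residue_to_msa_column(aligned_seq, first_resid=1, gap_chars="-."):
    """Map residue numbers of one protein to 1-based MSA column indices.

    Hypothetical helper: assumes contiguous residue numbering starting at
    first_resid and an aligned row where gaps are '-' or '.'.
    """
    aligned_seq = aligned_seq.rstrip("*")  # PIR/.ali rows may end with '*'
    mapping = {}
    resid = first_resid
    for column, char in enumerate(aligned_seq, start=1):
        if char in gap_chars:
            continue  # gap: no residue of this protein in this column
        mapping[resid] = column
        resid += 1
    return mapping


def reindex_contacts(contacts, mapping):
    """Translate (resid_i, resid_j, type) contacts to MSA columns.

    Contacts involving residues that are not in the mapping are dropped.
    """
    return [
        (mapping[i], mapping[j], kind)
        for i, j, kind in contacts
        if i in mapping and j in mapping
    ]


# Toy alignment row: residues 1-4 sit in columns 1, 3, 4 and 7
mapping = residue_to_msa_column("M-KV--L*")
print(mapping)  # {1: 1, 2: 3, 3: 4, 4: 7}
print(reindex_contacts([(1, 4, "hbond"), (2, 9, "vdw")], mapping))  # [(1, 7, 'hbond')]
```

Because residues aligned to the same column in different proteins receive the same index, contacts from different family members become directly comparable.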

***Crystal structure contact reindexing***

In [2]:
INPUT_DIRECTORY_CS = "/Users/dariiayehorova/lk_research/tools-project/contact_analysis/crystal_contacts"
OUTPUT_DIRECTORY_CS = "/Users/dariiayehorova/lk_research/tools-project/contact_analysis/dynamic_contacts_processing/msa_index_contacts/retention_10"

MSA_SEQ_FILE = "/Users/dariiayehorova/lk_research/tools-project/contact_analysis/dynamic_contacts_processing/bettaLac.ali"

sequence_dict = parse_fasta(MSA_SEQ_FILE)
os.makedirs(OUTPUT_DIRECTORY_CS, exist_ok=True)

# Crystal structure contacts: one .txt file per system
if os.path.isdir(INPUT_DIRECTORY_CS):
    for filename in os.listdir(INPUT_DIRECTORY_CS):
        if filename.endswith(".txt"):
            file_path = os.path.join(INPUT_DIRECTORY_CS, filename)
            # System name (e.g. 1M40_TEM-1) is used to look up the sequence in the MSA
            SYSTEM_NAME = filename.split(".txt")[0]
            output_file_path = os.path.join(
                OUTPUT_DIRECTORY_CS, f"{SYSTEM_NAME}_msa_crystal.csv"
            )
            print("Processing ", SYSTEM_NAME)
            # Convert residue nomenclature from MSA to the MDAnalysis
            # format and remove non-standard residues
            seq, short_seq = clean_up_sequence(sequence_dict, SYSTEM_NAME)
            # Parse the crystal structure contacts
            pdb_df_cs = parse_contact_output(file_path, contact_type="crystal")
            # Index the contacts according to the MSA and save as CSV
            msa_df_cs = indexing_pdb_to_msa(seq, pdb_df_cs)
            msa_df_cs.to_csv(output_file_path, index=False)

Processing  6WGP_XCC-2
Processing  3W4Q_PenA-1
Processing  5F82_GES-5
Processing  6J25_CTX-M-64
Processing  5NE2_L2-2
Processing  5NJ2_BlaC
Processing  3ZNY_CTX-M-96
Processing  5E43_SROS-1
Processing  7QLP_TEM-171
Processing  3QHY_BcI-2
Processing  4EUZ_SFC-1
Processing  4YFM_MAB-1
Processing  6BN3_CTX-M-151
Processing  3V3R_GES-11
Processing  3ZNW_PER-2
Processing  6W34_BcI-248
Processing  5NPO_TEM-135
Processing  6TD0_KPC-2
Processing  3V3S_GES-18
Processing  6NJ1_CKA-1
Processing  3P98_TEM-72
Processing  4UA6_CTX-M-14
Processing  1BSG_SAL-1
Processing  4EWF_SPH-1
Processing  1BUE_NmcA
Processing  5HW3_BVA-1
Processing  2ZQ7_CTX-M-44
Processing  6WJM_DBA-1
Processing  2CC1_MFO-1
Processing  2WK0_PenP-2
Processing  3W4P_BPS-1d
Processing  6NIQ_RPA-1
Processing  6QWA_KPC-3
Processing  6WGR_PC1-159
Processing  1N9B_SHV-3
Processing  1HTZ_TEM-52
Processing  7A6Z_BlaC-13
Processing  1YLW_CTX-M-16
Processing  5A92_CTX-M-97
Processing  5GHX_PenP-1
Processing  3BYD_OXY-1-1
Processing  6MK6_

***MD-based contact reindexing***

In [None]:
INPUT_DIRECTORY_MD = "/Users/dariiayehorova/lk_research/tools-project/contact_analysis/simulation_contacts"
OUTPUT_DIRECTORY_10 = "/Users/dariiayehorova/lk_research/tools-project/contact_analysis/dynamic_contacts_processing/msa_index_contacts/retention_10"
OUTPUT_DIRECTORY_50 = "/Users/dariiayehorova/lk_research/tools-project/contact_analysis/dynamic_contacts_processing/msa_index_contacts/retention_50"

MSA_SEQ_FILE = "/Users/dariiayehorova/lk_research/tools-project/contact_analysis/dynamic_contacts_processing/bettaLac.ali"

sequence_dict = parse_fasta(MSA_SEQ_FILE)
outputs = [OUTPUT_DIRECTORY_10, OUTPUT_DIRECTORY_50]
for output in outputs:
    os.makedirs(output, exist_ok=True)
    # The cutoff (in %) is encoded in the directory name, e.g. retention_10 -> "10"
    cutoff = os.path.basename(output).split("_")[-1]

    for subdir in os.listdir(INPUT_DIRECTORY_MD):
        subdir_path = os.path.join(INPUT_DIRECTORY_MD, subdir)
        if os.path.isdir(subdir_path):
            for filename in os.listdir(subdir_path):
                if filename.endswith(".csv"):
                    file_path = os.path.join(subdir_path, filename)
                    SYSTEM_NAME = filename.split("_all")[0]
                    output_file_path = os.path.join(
                        output, f"{SYSTEM_NAME}_msa_md_{cutoff}.csv"
                    )

                    # Convert residue nomenclature from MSA to the MDAnalysis
                    # format and remove non-standard residues
                    seq, short_seq = clean_up_sequence(sequence_dict, SYSTEM_NAME)

                    # Parse the MD contacts and apply the desired cutoff
                    # (percent -> fraction, e.g. 10 -> 0.1)
                    cutoff_value = float(cutoff) / 100
                    pdb_df_md = parse_contact_output(
                        file_path, contact_type="md", retention_percent=cutoff_value
                    )
                    # Index the contacts according to the MSA
                    msa_df_md = indexing_pdb_to_msa(seq, pdb_df_md)

                    # Save as CSV
                    msa_df_md.to_csv(output_file_path, index=False)
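
Each output directory name carries the cutoff in percent, which the loop converts to a fraction (e.g. 10 becomes 0.1) before passing it to parse_contact_output. Assuming the cutoff is the minimum fraction of trajectory frames in which a contact has to be observed (please see the corresponding paper for the exact definition used by KIN), the filter amounts to the sketch below; retained_contacts and the per-frame sets of contacts are hypothetical.

```python
from collections import Counter


def retained_contacts(frames, retention):
    """Keep contacts observed in at least `retention` (0-1) of the frames.

    frames: one iterable of hashable contacts per trajectory frame
    (hypothetical input layout). Returns {contact: fraction of frames}.
    """
    n_frames = len(frames)
    if n_frames == 0:
        return {}
    counts = Counter()
    for frame in frames:
        counts.update(set(frame))  # count each contact at most once per frame
    return {
        contact: n / n_frames
        for contact, n in counts.items()
        if n / n_frames >= retention
    }


frames = [
    {(45, 70, "hbond"), (60, 62, "vdw")},
    {(45, 70, "hbond")},
    {(45, 70, "hbond")},
    set(),
]
print(retained_contacts(frames, 0.1))  # both contacts, at 0.75 and 0.25
print(retained_contacts(frames, 0.5))  # {(45, 70, 'hbond'): 0.75}
```

Under this reading, a cutoff of 50 keeps only interactions that persist through at least half of the trajectory, while a cutoff of 10 also keeps more transient ones.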



**2. Form an interaction network for a protein of choice**

Since we want to display the shared interaction network on a structure, we need to select a reference structure for the interaction network. A good choice is a protein that, in your opinion, is a good representative of the family, or a protein of particular interest. Two methods are implemented to calculate the conservation score of a contact: method 1 (**conservation_uniform=True**) disregards whether the ranked contacts are present in the structure of interest and evaluates all contacts based on their abundance in the family; method 2 (**conservation_uniform=False**) does not penalize the contact conservation score if the contact could not form because the residues are absent in the structure. The two methods have been shown to be largely equivalent in structurally homogeneous families, but the second method can be more beneficial for protein groups with high structural variation.
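
To make the difference between the two methods concrete, here is a toy sketch of both scoring schemes as described above, not the KIN implementation: a contact's score is the number of structures in which it is observed, divided either by the number of all structures (uniform=True) or only by the number of structures that have residues at both alignment positions (uniform=False). The function name and the dictionary inputs are hypothetical.

```python
def conservation_scores(contacts_by_structure, residues_by_structure, uniform=True):
    """Fraction of structures in which each MSA-indexed contact is observed.

    contacts_by_structure: {structure: set of (column_i, column_j)}
    residues_by_structure: {structure: set of MSA columns that hold a residue}
    uniform=True  -> denominator is every structure in the family
    uniform=False -> denominator counts only structures that have residues
                     at both columns, i.e. where the contact could form
    """
    all_contacts = set().union(*contacts_by_structure.values())
    scores = {}
    for contact in all_contacts:
        i, j = contact
        observed = sum(contact in c for c in contacts_by_structure.values())
        if uniform:
            possible = len(contacts_by_structure)
        else:
            possible = sum(
                i in residues_by_structure[s] and j in residues_by_structure[s]
                for s in contacts_by_structure
            )
        scores[contact] = observed / possible
    return scores


contacts = {"A": {(10, 50)}, "B": {(10, 50), (12, 30)}, "C": {(12, 30)}, "D": set()}
full = {10, 12, 30, 50}
residues = {"A": full, "B": full, "C": {10, 12, 30}, "D": full}  # C lacks column 50

print(conservation_scores(contacts, residues, uniform=True)[(10, 50)])   # 0.5
print(conservation_scores(contacts, residues, uniform=False)[(10, 50)])  # 0.6666666666666666
```

Structure C cannot form the (10, 50) contact at all, so only the second method stops counting it against that contact; this is why the two methods diverge mainly in structurally heterogeneous families.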

Network processing capabilities are analogous for crystal structure and MD-based data. Here we will focus on MD-based network processing.

In [20]:
from matplotlib import colors
from kin.pymol_projections import project_pymol_res_res_scores
from kin.pymol_projections import project_pymol_per_res_scores
from kin.msa_network import common_network
from kin.msa_network import plot_per_res_score
from kin.msa_network import per_res_score
from kin.msa_network import filter_network
from matplotlib import pyplot as plt

To form a shared interaction network we will use the common_network function. As this function combines several steps, we will go through its input and output variables.

Generate a network for the TEM-1 protein

**Input variables are:**

        input_files - path to the directory with the MSA-indexed contacts
        "1M40_TEM-1" - name of the protein of interest (should be consistent with the name in the MSA file)
        network_index - indexing of the output network: PDB indexing of the protein of choice or MSA indexing
        conservation_uniform - chooses between the two conservation methods; default is True
        missing_network - if True, a network of the missing contacts is also generated
        no_vdw - if True, the network is generated without van der Waals contacts
        (this is the only filter available as part of network generation, because the number
        of moderately conserved vdW interactions can be very large; additional filtering can be applied with the filter_network function)

**Output variables are:**

        conservation_tem - dictionary with the conserved interactions and their conservation scores
        colors_int_type - dictionary with the contacts and the pymol-format colors that correspond to the interaction types
        properties - dictionary with the contacts and their properties
        miss_net - dictionary with the preserved interactions that are missing in TEM-1 and their conservation scores
        miss_colors - dictionary with the missing interactions and the pymol-format colors that correspond to the interaction types
        miss_prop - dictionary with the missing interactions and their properties
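
Conceptually, miss_net holds the family-conserved interactions that the protein of choice does not form. A minimal sketch of that idea on plain dictionaries is shown below; the score threshold, the set of the reference protein's own MSA-indexed contacts and the function name are all hypothetical, and KIN may define the missing network differently.

```python
def missing_interactions(family_scores, reference_contacts, threshold=0.5):
    """Conserved contacts (score >= threshold) absent from the reference.

    family_scores: {contact: conservation score in the family}
    reference_contacts: set of contacts formed by the protein of choice
    Returns the missing contacts sorted from most to least conserved.
    """
    missing = {
        contact: score
        for contact, score in family_scores.items()
        if score >= threshold and contact not in reference_contacts
    }
    return dict(sorted(missing.items(), key=lambda item: item[1], reverse=True))


family_scores = {(10, 50): 0.9, (12, 30): 0.6, (15, 44): 0.7, (20, 40): 0.3}
tem1_contacts = {(12, 30)}
print(missing_interactions(family_scores, tem1_contacts))
# {(10, 50): 0.9, (15, 44): 0.7}
```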

Additionally, we can apply a variety of filters to either network with filter_network: by minimum conservation score, by interaction type, and by side chain or main chain contacts.

Interactions can be filtered by any of the following interaction types: "vdw", "hbond", "saltbridge", "hydrophobic", "pipi", "cationpi".

For example, a minimum conservation score of 0.5 keeps only interactions preserved in at least 50% of the structures.
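
The three filters can be sketched in plain Python on a hypothetical layout in which each interaction is keyed by (residue_i, residue_j, interaction_type) and a separate dictionary records whether it is a side chain or main chain contact ("sc-sc", "sc-mc", "mc-mc"). This illustrates the logic only; it is not the filter_network implementation, and filter_contacts and its parameters are our own names.

```python
INTERACTION_TYPES = {"vdw", "hbond", "saltbridge", "hydrophobic", "pipi", "cationpi"}


def filter_contacts(network, chain_types, threshold=0.0, exclude_types=(), drop_mc_mc=False):
    """Filter {(res_i, res_j, int_type): score} by score, type and chain contact.

    chain_types: {(res_i, res_j, int_type): "sc-sc" | "sc-mc" | "mc-mc"}
    (hypothetical layout).
    """
    unknown = set(exclude_types) - INTERACTION_TYPES
    if unknown:
        raise ValueError(f"unknown interaction types: {sorted(unknown)}")
    kept = {}
    for key, score in network.items():
        int_type = key[2]
        if score < threshold or int_type in exclude_types:
            continue  # not conserved enough, or an excluded interaction type
        if drop_mc_mc and chain_types.get(key) == "mc-mc":
            continue  # main chain-main chain contact
        kept[key] = score
    return kept


network = {
    (45, 70, "hydrophobic"): 0.8,
    (45, 70, "hbond"): 0.9,
    (60, 62, "hydrophobic"): 0.7,
    (80, 90, "hydrophobic"): 0.3,
}
chain_types = {key: "sc-sc" for key in network}
chain_types[(60, 62, "hydrophobic")] = "mc-mc"

# Hydrophobic contacts conserved in >= 50% of structures, no mc-mc contacts
print(filter_contacts(
    network, chain_types, threshold=0.5,
    exclude_types=["hbond", "saltbridge", "pipi", "cationpi"], drop_mc_mc=True,
))  # {(45, 70, 'hydrophobic'): 0.8}
```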

In [21]:

input_files_10 = "/Users/dariiayehorova/lk_research/tools-project/contact_analysis/dynamic_contacts_processing/msa_index_contacts/retention_10"
input_files_50 = "/Users/dariiayehorova/lk_research/tools-project/contact_analysis/dynamic_contacts_processing/msa_index_contacts/retention_50"
# The output interaction network can use the PDB indexing of the protein of choice or the MSA indexing
network_index = "pdb"
input_files_list = [
    input_files_10,
    input_files_50,
]
# Output directories for the PyMOL scripts
os.makedirs("shared_network", exist_ok=True)
os.makedirs("missing_network", exist_ok=True)

for input_files in input_files_list:
    # Make a shared interaction network
    (
        conservation_tem,
        colors_int_type,
        properties,
        miss_net,
        miss_colors,
        miss_prop,
    ) = common_network(
        input_files,
        "1M40_TEM-1",
        network_index,
        conservation_uniform=True,
        missing_network=True,
        no_vdw=True,
    )
    # Cutoff label from the directory name, e.g. retention_10 -> "10"
    retention_number = os.path.basename(input_files).split("_")[-1]

    # Apply filters to get the network of hydrophobic interactions
    # preserved in at least 50% of the structures, excluding
    # main chain-main chain contacts (vdW contacts are already removed by no_vdw=True)
    new_network, new_colors = filter_network(
        conservation_tem,
        colors_int_type,
        properties,
        min_score=0.5,
        network_index="pdb",
        no_main_chain=True,
        int_exclude=["hbond", "saltbridge", "pipi", "cationpi"],
    )

    # Output a PyMOL visualization of the filtered conserved interactions
    # network as residue-residue interactions
    projection_output = (
        f"shared_network/TEM1_hydrophobic_no_mc-mc_{retention_number}.pml"
    )
    project_pymol_res_res_scores(new_network, projection_output, new_colors)

    # Calculate and visualize per-residue scores; use a new variable name so the
    # per_res_score function is not overwritten before the next iteration
    per_res_scores = per_res_score(new_network)
    print(per_res_scores)
    project_pymol_per_res_scores(
        per_res_scores,
        f"shared_network/TEM1_hydrophobic_no_mc-mc_per_res_{retention_number}.pml",
    )

    # Output a PyMOL visualization of the missing interactions network
    miss_projection = f"missing_network/tem1_missing_nvdw_{retention_number}.pml"
    project_pymol_res_res_scores(miss_net, miss_projection, miss_colors)
    properties_filename = (
        f"shared_network/properties_nvdw_{retention_number}.csv"
    )


/Users/dariiayehorova/lk_research/tools-project/contact_analysis/dynamic_contacts_processing/msa_index_contacts/retention_10
1M40_TEM-1
no contact for this residue pair
Function: conservation_nextwork_dict
Elapsed time: 0.011764 seconds


TypeError: filter_network() got an unexpected keyword argument 'min_score'

**3. Comparison between the crystal structure and MD contact conservation networks**