# Presentation filters

This notebook shows how to use the JSON files to build filters.

First, let's look at what the JSON files contain.

In [1]:
import json

# Use a context manager so the file is closed after loading
with open("./../example/structures.json") as f:
    structures = json.load(f)

In [17]:
print(structures["1ASY"].keys())

dict_keys(['Nmodels', 'RNAprot_hb', 'bptype', 'breaks', 'canonized', 'hetnames', 'interface_hetatoms', 'interface_protein', 'intraRNA_hb', 'mapping', 'method', 'missing_atoms', 'nachains', 'protchain', 'resolution', 'sequence', 'ss', 'stacking'])


Each entry of `structures.json` (keyed by PDB ID) contains:
* Nmodels: the number of models;
* RNAprot_hb: the hydrogen bonds between the nucleic acid (NA) and the protein, by chain, residue and base/phosphate/sugar;
* bptype: the base-pairing type of each residue;
* breaks: the breaks in the NA chains;
* canonized: the residues that were canonized by the pipeline;
* hetnames: the names of the hetero atoms;
* interface_hetatoms: the interface between the NA and the hetero atoms, by chain, residue and base/phosphate/sugar;
* interface_protein: the interface between the NA and the protein, by chain, residue and base/phosphate/sugar;
* intraRNA_hb: the hydrogen bonds within the NA, by chain and residue, with residue n-1, n+1 or other residues;
* mapping: the renumbering created by the pipeline, as new number : old number;
* method: the experimental method used to obtain the structure;
* missing_atoms: the NA residues with missing atoms;
* nachains: the NA chains;
* protchain: the protein chains;
* resolution: the resolution of the experimental structure;
* sequence: the NA sequence of each chain;
* ss: the secondary structure of the NA, by chain and residue;
* stacking: the stacking of each NA residue with residue n-1, n+1 or other residues.
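The metadata fields can already serve as a simple filter. The sketch below keeps the structures obtained with a given method and below a resolution cutoff. `select_structures` is a hypothetical helper, and it assumes that `method` is stored as a string and `resolution` as a number (or `None` when undefined, e.g. for NMR entries); check this against the actual file. The toy data is invented and only mimics the layout of `structures.json`.

```python
def select_structures(structures, method=None, max_resolution=None):
    """Return the IDs of the structures matching a method and a resolution cutoff.

    Hypothetical helper: assumes "method" is a string and "resolution" a number
    (or None when it is not defined).
    """
    selected = []
    for pdb_id, entry in structures.items():
        if method is not None and entry.get("method") != method:
            continue
        if max_resolution is not None:
            resolution = entry.get("resolution")
            # Skip entries without a numeric resolution or above the cutoff
            if not isinstance(resolution, (int, float)) or resolution > max_resolution:
                continue
        selected.append(pdb_id)
    return selected


# Invented toy data following the structures.json layout
toy = {
    "toy_1": {"method": "X-RAY", "resolution": 2.1, "nachains": ["B"]},
    "toy_2": {"method": "X-RAY", "resolution": 3.5, "nachains": ["R"]},
    "toy_3": {"method": "NMR", "resolution": None, "nachains": ["A"]},
}
print(select_structures(toy, method="X-RAY", max_resolution=3.0))  # ['toy_1']
```

With the real data, the method string must be the one actually stored in the `method` field.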

In [18]:
with open("./../example/fragments.json") as f:
    fragments = json.load(f)

In [26]:
print(fragments['CCC']['1'].keys())

dict_keys(['structure', 'chain', 'model', 'indices', 'resid', 'seq', 'missing_atoms'])


In `fragments.json`, each fragment of each motif has the following fields:
* structure: the structure the fragment comes from;
* chain: the NA chain of the structure the fragment comes from;
* model: the model, for instance when the structure is an NMR structure;
* indices: the nucleotide indices, as assigned by the pipeline;
* resid: the residue numbers in the original structure;
* seq: the original sequence of the fragment;
* missing_atoms: information about the missing atoms in the original residues.
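Because every fragment records its `structure`, fragments can be traced back to the entries of `structures.json`. The sketch below groups fragment identifiers by source structure; `fragments_by_structure` is a hypothetical helper that assumes the `fragments[motif][fragment_id]["structure"]` layout shown above, and the toy data is invented.

```python
from collections import defaultdict


def fragments_by_structure(fragments):
    """Group (motif, fragment_id) pairs by the structure they come from.

    Hypothetical helper: assumes fragments[motif][fragment_id] is a dict
    with a "structure" key, as in fragments.json.
    """
    groups = defaultdict(list)
    for motif, motif_fragments in fragments.items():
        for frag_id, frag in motif_fragments.items():
            groups[frag["structure"]].append((motif, frag_id))
    return dict(groups)


# Invented toy data following the fragments.json layout
toy_fragments = {
    "CCC": {
        "1": {"structure": "toy_1", "chain": "B"},
        "2": {"structure": "toy_2", "chain": "R"},
    },
    "AAA": {
        "1": {"structure": "toy_1", "chain": "B"},
    },
}
print(fragments_by_structure(toy_fragments))
# {'toy_1': [('CCC', '1'), ('AAA', '1')], 'toy_2': [('CCC', '2')]}
```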

In [28]:
with open("./../example/fragments_clust.json") as f:
    fragments_clust = json.load(f)

In [38]:
print(fragments_clust['CCC']['1'].keys())

dict_keys(['chain', 'clust0.2', 'clust0.2_center', 'clust1.0', 'clust1.0_center', 'clust3.0', 'clust3.0_center', 'indices', 'missing_atoms', 'model', 'resid', 'seq', 'structure'])


Finally, in `fragments_clust.json`, each fragment of each motif has:
* chain: the NA chain of the structure the fragment comes from;
* clust0.2: the number of the cluster at 0.2 Å;
* clust0.2_center: a boolean telling whether the fragment is the center of its cluster at 0.2 Å, obtained with the fastclust method;
* clust1.0: the number of the cluster at 1.0 Å;
* clust1.0_center: a boolean telling whether the fragment is the center of its cluster at 1.0 Å, obtained with the fastclust method;
* clust3.0: the number of the cluster at 3.0 Å;
* clust3.0_center: a boolean telling whether the fragment is the center of its cluster at 3.0 Å, obtained with the fastclust method;
* structure: the structure the fragment comes from;
* model: the model, for instance when the structure is an NMR structure;
* indices: the nucleotide indices, as assigned by the pipeline;
* resid: the residue numbers in the original structure;
* seq: the original sequence of the fragment;
* missing_atoms: information about the missing atoms in the original residues.
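The cluster fields make it easy to pick one representative per cluster or to list the members of a cluster. The sketch below assumes the key names shown above (`clust<cutoff>` and `clust<cutoff>_center`), a JSON boolean for the center flag, and cluster numbers that compare equal to the value passed in; `cluster_centers` and `cluster_members` are hypothetical helpers and the toy data is invented.

```python
def cluster_centers(fragments_clust, motif, cutoff="1.0"):
    """Return the IDs of the fragments that are cluster centers at `cutoff` (in Å)."""
    center_key = "clust{}_center".format(cutoff)
    return [
        frag_id
        for frag_id, frag in fragments_clust[motif].items()
        if frag.get(center_key)
    ]


def cluster_members(fragments_clust, motif, cluster, cutoff="1.0"):
    """Return the IDs of the fragments belonging to `cluster` at `cutoff` (in Å)."""
    cluster_key = "clust{}".format(cutoff)
    return [
        frag_id
        for frag_id, frag in fragments_clust[motif].items()
        if frag.get(cluster_key) == cluster
    ]


# Invented toy data following the fragments_clust.json layout
toy_clust = {
    "CCC": {
        "1": {"clust1.0": 1, "clust1.0_center": True},
        "2": {"clust1.0": 1, "clust1.0_center": False},
        "3": {"clust1.0": 2, "clust1.0_center": True},
    }
}
print(cluster_centers(toy_clust, "CCC"))     # ['1', '3']
print(cluster_members(toy_clust, "CCC", 1))  # ['1', '2']
```

Selecting only the centers is one way to obtain a non-redundant set of fragments at the chosen cutoff.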

Using this information, we can build a filter that selects all base-paired nucleotides (Watson-Crick pairing) in contact with the protein.

In [52]:
results = []
for structure, entry in structures.items():
    for chain in entry["nachains"]:
        chain = "chain_" + chain
        # Residue numbers of the nucleotides in contact with the protein (model 1);
        # the number is the second "_"-separated field of each key
        nucl_contact = entry["interface_protein"]["model_1"][chain]
        nucl_contact = {element.split("_")[1] for element in nucl_contact}

        # Residue numbers of the nucleotides whose base-pairing type is Watson-Crick
        nucl_bp = {
            key.split("_")[1]
            for key, value in entry["bptype"][chain].items()
            if value[0] == "WC"
        }

        # Keep the nucleotides that satisfy both conditions
        nucl_contact_bp = nucl_contact & nucl_bp

        results.append([structure, chain, nucl_contact_bp])

for res in results:
    print("Structure: {}, chain: {}, nucleotides: {}".format(res[0], res[1], res[2]))

Structure: 1A1T, chain: chain_B, nucleotides: set()
Structure: 1A34, chain: chain_B, nucleotides: {'6', '4', '5'}
Structure: 1A34, chain: chain_C, nucleotides: {'7', '6', '8'}
Structure: 1A4T, chain: chain_A, nucleotides: set()
Structure: 1A9N, chain: chain_Q, nucleotides: {'5', '2', '1', '3', '19'}
Structure: 1A9N, chain: chain_R, nucleotides: {'5', '2', '1', '3', '4', '19'}
Structure: 1AQ3, chain: chain_R, nucleotides: {'1', '2', '11'}
Structure: 1AQ3, chain: chain_S, nucleotides: set()
Structure: 1AQ4, chain: chain_R, nucleotides: {'2', '11'}
Structure: 1AQ4, chain: chain_S, nucleotides: {'2', '11'}
Structure: 1ASY, chain: chain_R, nucleotides: {'39', '66', '11', '68', '71', '27', '1', '70', '28', '12', '29', '69'}
Structure: 1ASY, chain: chain_S, nucleotides: {'39', '66', '11', '68', '71', '27', '1', '70', '28', '12', '69', '24'}
Structure: 1AUD, chain: chain_B, nucleotides: set()
Structure: 1B23, chain: chain_R, nucleotides: {'50', '65', '62', '49', '48', '63', '1', '2', '3', '61'