# Interactome3D
Provides 3D structural data on the interactomes of 8 organisms: *Arabidopsis thaliana, Caenorhabditis elegans, Escherichia coli, Drosophila melanogaster, Helicobacter pylori, Homo sapiens, Mus musculus and Saccharomyces cerevisiae*

* Update frequency: twice a year (every 6 months)
* Current version: 2020_01
* [Documentation](https://interactome3d.irbbarcelona.org/help.php)


## Downloadable File Formats
<img src="files.png">

## Modeling Strategy

<img src = "modeling_strategy.png">


The modeling pipeline can handle two types of input data: **a set of interactions provided by the user, or a list of organisms** for the modeling of either their entire interactomes or functional sub-parts of the interactomes.

The first pipeline step collects structures for each individual protein in the set. The available experimental structures in the PDB are identified first, and the structural coverage of the protein space is then increased with high-quality homology models. All individual proteins are then classified into three categories: complete experimental structures (covering >80% of the protein length with 100% sequence identity), complete homology models (>80% coverage), and partial experimental structures or models (the rest). For the last category, the fragments are grouped together to cover the greatest possible length of the protein.
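
The three-way classification above can be sketched as a small function. This is a minimal sketch that assumes each entry is described by its type (`"Structure"` or `"Model"`, as in the `TYPE` column of `proteins.dat`), its sequence identity and its coverage, both in percent; the function name `classify` and the category labels are hypothetical.

```python
def classify(stype: str, seq_ident: float, coverage: float) -> str:
    """Place a structure or model in one of the three categories described above.

    Thresholds follow the text: "complete" means > 80% coverage, and a complete
    experimental structure must also have 100% sequence identity.
    """
    complete = coverage > 80.0
    if stype == "Structure" and complete and seq_ident == 100.0:
        return "complete experimental structure"
    if stype == "Model" and complete:
        return "complete homology model"
    return "partial structure or model"


# Values taken from rows of proteins.dat shown further below
print(classify("Structure", 100.0, 90.1))  # A0A024RAD5 (6s7o_G) -> complete experimental structure
print(classify("Model", 18.0, 91.7))       # A0A024RBG1 (1mk1_A) -> complete homology model
print(classify("Structure", 100.0, 74.1))  # V9H1G0 (3u5o_P) -> partial structure or model
```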

Next, experimentally determined structures are identified for each interaction; when none are available, suitable templates are searched for to model them. All pairs of contacting proteins that share over 30% sequence identity with the protein pair to be modeled (the target interaction) are considered as potential templates, and a battery of filtering criteria is applied. The remaining templates are then sorted with a scoring function that considers the completeness of the template and its sequence identity to the target. To improve structural coverage, partial templates involving structural domains of the two partners are also included, based on the domain-domain interaction preferences observed in the PDB. The final models are built with Modeller, and the resulting atomic-resolution structures are checked for the presence of structural knots. Finally, all structures and models are ranked by their completeness and quality.
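
The template search can be illustrated with a toy filter-and-sort. This is a sketch under stated assumptions: each candidate is a pair of contacting chains with an identity to each target partner and a hypothetical `completeness` value; only the 30% identity cutoff comes from the text. The actual filtering criteria and scoring function are not given here, so the sort key (completeness first, then the weaker of the two identities) is a stand-in, not Interactome3D's scoring function. The template IDs are placeholders.

```python
from dataclasses import dataclass


@dataclass
class TemplateCandidate:
    pdb_id: str
    ident1: float        # % identity of template chain 1 to target protein 1
    ident2: float        # % identity of template chain 2 to target protein 2
    completeness: float  # hypothetical: % of the target pair covered by the template


MIN_IDENTITY = 30.0  # "over 30% sequence identity", as stated in the text


def rank_templates(candidates):
    # Keep only pairs where both chains exceed the identity cutoff
    usable = [c for c in candidates if c.ident1 > MIN_IDENTITY and c.ident2 > MIN_IDENTITY]
    # Stand-in score: prefer more complete templates, break ties by the weaker identity
    return sorted(usable, key=lambda c: (c.completeness, min(c.ident1, c.ident2)), reverse=True)


candidates = [
    TemplateCandidate("tmp1", 45.0, 60.0, 70.0),
    TemplateCandidate("tmp2", 25.0, 90.0, 95.0),  # dropped: chain 1 is below 30%
    TemplateCandidate("tmp3", 80.0, 35.0, 90.0),
]
print([c.pdb_id for c in rank_templates(candidates)])  # ['tmp3', 'tmp1']
```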

## Homo sapiens: Representative Set 
[2020_01 release](https://interactome3d.irbbarcelona.org/downloadset.php?queryid=human&release=current&path=representative)

```python
import pandas as pd

# proteins.dat is a tab-separated table
df = pd.read_csv('data/proteins.dat', sep='\t')
df
```

| | UNIPROT_AC | RANK_MAJOR | RANK_MINOR | TYPE | PDB_ID | CHAIN | SEQ_IDENT | COVERAGE | SEQ_BEGIN | SEQ_END | GA431 | MPQS | ZDOPE | FILENAME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A0A024R3Z2 | 1 | 0 | Structure | 1a5r | A | 100.0 | 100.0 | 1 | 101 | -1.00 | -1.000 | -1.00 | A0A024R3Z2-EXP-1a5r_A.pdb |
| 1 | A0A024RAD5 | 1 | 0 | Structure | 6s7o | G | 100.0 | 90.1 | 42 | 452 | -1.00 | -1.000 | -1.00 | A0A024RAD5-EXP-6s7o_G.pdb |
| 2 | A0A024RAV5 | 1 | 0 | Structure | 2msd | B | 100.0 | 98.4 | 1 | 185 | -1.00 | -1.000 | -1.00 | A0A024RAV5-EXP-2msd_B.pdb |
| 3 | A0A024RBG1 | 1 | 0 | Model | 1mk1 | A | 18.0 | 91.7 | 5 | 170 | 0.98 | 1.135 | -0.17 | A0A024RBG1-MDL-A0A024RBG1.17-1mk1_A.pdb |
| 4 | A0A075B5G3 | 1 | 0 | Structure | 4ov6 | F | 100.0 | 100.0 | 1 | 99 | -1.00 | -1.000 | -1.00 | A0A075B5G3-EXP-4ov6_F.pdb |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16267 | U5YJM1 | 1 | 0 | Structure | 2bnq | A | 100.0 | 81.9 | 25 | 300 | -1.00 | -1.000 | -1.00 | U5YJM1-EXP-2bnq_A.pdb |
| 16268 | V9H1G0 | 1 | 1 | Structure | 3u5o | P | 100.0 | 74.1 | 2 | 21 | -1.00 | -1.000 | -1.00 | V9H1G0-EXP-3u5o_P.pdb |
| 16269 | V9HW68 | 1 | 0 | Model | 1hzh | H | 85.0 | 96.0 | 20 | 470 | 1.00 | 0.000 | 0.00 | V9HW68-MDL-V9HW68.6-1hzh_H.pdb |
| 16270 | V9JN72 | 1 | 0 | Structure | 5xtd | k | 100.0 | 99.0 | 1 | 97 | -1.00 | -1.000 | -1.00 | V9JN72-EXP-5xtd_k.pdb |


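## Understanding Protein Filenames

`EXP` marks an experimental structure and `MDL` a homology model; for models, the PDB ID and chain are those of the template used for modeling.

    U5YJM1-EXP-2bnq_A.pdb          = UniprotAC-EXP-PDBID_Chain
    V9HW68-MDL-V9HW68.6-1hzh_H.pdb = UniprotAC-MDL-UniprotAC.ModelNo-PDBID_Chain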
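
The two filename layouts above can be split with a regular expression. This is a minimal sketch based only on the examples in this notebook: it assumes UniProt accessions contain no hyphens (isoform accessions such as `P12345-2` would need a different pattern) and that the only type tags are `EXP` and `MDL`. The pattern and the function name `parse_protein_filename` are hypothetical.

```python
import re

# UniprotAC-EXP-PDBID_Chain.pdb  or  UniprotAC-MDL-UniprotAC.ModelNo-PDBID_Chain.pdb
PROTEIN_FILE = re.compile(
    r"^(?P<uniprot_ac>[^-]+)-(?P<kind>EXP|MDL)-"
    r"(?:(?P<model_ac>[^-.]+)\.(?P<model_no>\d+)-)?"  # only present for models
    r"(?P<pdb_id>[0-9A-Za-z]{4})_(?P<chain>[^.]*)\.pdb$"  # chain may be empty
)


def parse_protein_filename(name: str) -> dict:
    m = PROTEIN_FILE.match(name)
    if m is None:
        raise ValueError(f"unrecognized protein filename: {name}")
    info = m.groupdict()
    if info["model_no"] is not None:
        info["model_no"] = int(info["model_no"])
    return info


print(parse_protein_filename("U5YJM1-EXP-2bnq_A.pdb"))
print(parse_protein_filename("V9HW68-MDL-V9HW68.6-1hzh_H.pdb"))
```

Filenames with an empty chain field, such as `Q9YNA8-MDL-Q9YNA8.8-1bax_.pdb` in the models table, parse with `chain == ""`.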

Note: Experimental PDB structures are also listed with the model-quality score columns (GA341, MPQS and ZDOPE, from Modeller), but for structures these columns hold -1, as in the rows above.

### Ranking of structures/models:
**Structures with lower rank major index are, in general, of a higher quality.** The criteria used to rank structures and models for single proteins are the following:

1. **Complete experimental structures** are ranked first in the list and in order of **decreasing coverage**. A structure is considered to be complete if it covers more than 80% of the protein sequence length. **A complete experimental structure always has the rank minor index equal to 0.**
2. **Complete homology models** are ranked just after complete experimental structures in **decreasing order of sequence identity** of the template to the target. **A complete homology model always has the rank minor index equal to 0.**
3. **Partial experimental structures and homology models** are ranked after complete homology models. In this case, Interactome3D tries to group structures that together cover the largest portion of the protein sequence length. The groups are generated by a greedy algorithm that considers the structures in decreasing order of the following scoring function:

<img src="Scoring_function.png">

which takes into account the coverage of the structure and the sequence identity of the template to the target protein. The parameter alpha was set to 0.95 based on empirical observations.
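
The ordering rules in the list above can be written as a sort key. A minimal sketch, assuming each entry carries its type, coverage and sequence identity, and using the list's definition of "complete" (coverage > 80%). Because the scoring function for partial entries appears only as an image, it is taken here as a precomputed, hypothetical `partial_score` field, and the greedy grouping step itself is not reproduced. Entry names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    stype: str                  # "Structure" or "Model", as in the TYPE column
    coverage: float             # % of the protein sequence covered
    seq_ident: float            # % identity of the template to the target
    partial_score: float = 0.0  # hypothetical: precomputed score for partial entries


def rank_key(e: Entry):
    complete = e.coverage > 80.0
    if complete and e.stype == "Structure":
        return (0, -e.coverage)    # 1. complete structures, decreasing coverage
    if complete and e.stype == "Model":
        return (1, -e.seq_ident)   # 2. complete models, decreasing identity
    return (2, -e.partial_score)   # 3. partial entries, decreasing score


entries = [
    Entry("model_a", "Model", 95.0, 40.0),
    Entry("partial_b", "Structure", 60.0, 100.0, partial_score=0.7),
    Entry("exp_c", "Structure", 85.0, 100.0),
    Entry("exp_d", "Structure", 99.0, 100.0),
    Entry("model_e", "Model", 90.0, 75.0),
]
print([e.name for e in sorted(entries, key=rank_key)])
# ['exp_d', 'exp_c', 'model_e', 'model_a', 'partial_b']
```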

**Every group of structures/models is assigned the same rank major number, and its members receive different rank minor numbers in order of their starting residue.
When creating the representative set, only the structures/models with rank major index equal to 1 are retained.** This means that the representative set can also contain more than one structure for the same protein. This happens when a protein only has structures/models with a coverage lower than 80%; in that case, the program selects a group of (potentially) overlapping structures/models that together span the largest portion of the protein sequence.
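
Selecting the representative set amounts to keeping the rank-major-1 rows, which can leave several rows for a partially covered protein. A minimal sketch over hypothetical rows shaped like `proteins.dat` (only `UNIPROT_AC`, `RANK_MAJOR` and `RANK_MINOR`); that minor indices of a partial group start at 1 is an assumption. On a full, non-representative table loaded with pandas, the equivalent filter would be `df.loc[df["RANK_MAJOR"] == 1]`, and `value_counts()` on its `UNIPROT_AC` column would show proteins with several entries.

```python
from collections import Counter

# Hypothetical rows (UNIPROT_AC, RANK_MAJOR, RANK_MINOR); minor indices of the
# partial group are assumed to start at 1
rows = [
    ("P_COMPLETE", 1, 0),  # complete structure: a single rank-major-1 entry
    ("P_COMPLETE", 2, 0),  # lower-ranked alternative, dropped
    ("P_PARTIAL", 1, 1),   # partial coverage: the best group shares rank major 1
    ("P_PARTIAL", 1, 2),
    ("P_PARTIAL", 2, 1),   # second-best group, dropped
]

# Representative set: keep only rank major 1
representative = [r for r in rows if r[1] == 1]

# Proteins represented by more than one structure/model
per_protein = Counter(ac for ac, _, _ in representative)
multi = sorted(ac for ac, n in per_protein.items() if n > 1)

print(len(representative))  # 3
print(multi)                # ['P_PARTIAL']
```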

```python
# Keep only the homology models
df.loc[df["TYPE"] == "Model"]
```

| | UNIPROT_AC | RANK_MAJOR | RANK_MINOR | TYPE | PDB_ID | CHAIN | SEQ_IDENT | COVERAGE | SEQ_BEGIN | SEQ_END | GA431 | MPQS | ZDOPE | FILENAME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | A0A024RBG1 | 1 | 0 | Model | 1mk1 | A | 18.0 | 91.7 | 5 | 170 | 0.98 | 1.135 | -0.17 | A0A024RBG1-MDL-A0A024RBG1.17-1mk1_A.pdb |
| 5 | A0A075B6H7 | 1 | 0 | Model | 5dk3 | A | 80.0 | 82.8 | 21 | 116 | 1.00 | 1.711 | -1.09 | A0A075B6H7-MDL-A0A075B6H7.2-5dk3_A.pdb |
| 6 | A0A075B6H8 | 1 | 0 | Model | 1vge | L | 73.0 | 81.2 | 23 | 117 | 1.00 | 1.642 | -0.72 | A0A075B6H8-MDL-A0A075B6H8.1-1vge_L.pdb |
| 7 | A0A075B6H9 | 1 | 0 | Model | 2otu | A | 71.0 | 82.3 | 21 | 118 | 1.00 | 1.624 | -1.01 | A0A075B6H9-MDL-A0A075B6H9.1-2otu_A.pdb |
| 8 | A0A075B6I0 | 1 | 0 | Model | 1qok | A | 38.0 | 99.2 | 2 | 122 | 1.00 | 1.326 | 0.05 | A0A075B6I0-MDL-A0A075B6I0.1-1qok_A.pdb |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16257 | Q9Y6Z7 | 1 | 1 | Model | 4yli | A | 52.0 | 55.2 | 123 | 275 | 1.00 | 1.234 | -0.76 | Q9Y6Z7-MDL-Q9Y6Z7.10-4yli_A.pdb |
| 16258 | Q9YNA8 | 1 | 1 | Model | 1bax | | 35.0 | 14.0 | 1 | 93 | 1.00 | 0.000 | 0.00 | Q9YNA8-MDL-Q9YNA8.8-1bax_.pdb |
| 16265 | U3KPV4 | 1 | 0 | Model | 1lzj | A | 46.0 | 85.0 | 44 | 332 | 1.00 | 1.180 | -0.11 | U3KPV4-MDL-U3KPV4.1-1lzj_A.pdb |
| 16269 | V9HW68 | 1 | 0 | Model | 1hzh | H | 85.0 | 96.0 | 20 | 470 | 1.00 | 0.000 | 0.00 | V9HW68-MDL-V9HW68.6-1hzh_H.pdb |


```python
# interactions.dat is also tab-separated
interactions = pd.read_csv('data/interactions.dat', sep='\t')
interactions
```


| | PROT1 | PROT2 | RANK_MAJOR | RANK_MINOR | TYPE | PDB_ID | BIO_UNIT | CHAIN1 | MODEL1 | SEQ_IDENT1 | ... | SEQ_END1 | DOMAIN1 | CHAIN2 | MODEL2 | SEQ_IDENT2 | COVERAGE2 | SEQ_BEGIN2 | SEQ_END2 | DOMAIN2 | FILENAME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A0A024RAD5 | P04843 | 1 | 0 | Structure | 6s7o | 1 | G | 0 | 100.0 | ... | 452 | - | E | 0 | 98.9 | 93.2 | 29 | 594 | - | A0A024RAD5-P04843-EXP-6s7o.pdb1-G-0-E-0.pdb |
| 1 | A0A024RAD5 | P04844 | 1 | 0 | Structure | 6s7t | 1 | G | 0 | 100.0 | ... | 452 | - | F | 0 | 95.4 | 41.5 | 369 | 630 | - | A0A024RAD5-P04844-EXP-6s7t.pdb1-G-0-F-0.pdb |
| 2 | A0A024RAD5 | P46977 | 1 | 0 | Structure | 6s7o | 1 | G | 0 | 100.0 | ... | 452 | - | A | 0 | 93.8 | 99.1 | 7 | 705 | - | A0A024RAD5-P46977-EXP-6s7o.pdb1-G-0-A-0.pdb |
| 3 | A0A024RAD5 | P61803 | 1 | 0 | Structure | 6s7o | 1 | G | 0 | 100.0 | ... | 452 | - | D | 0 | 100.0 | 97.3 | 4 | 113 | - | A0A024RAD5-P61803-EXP-6s7o.pdb1-G-0-D-0.pdb |
| 4 | A0A024RAD5 | Q8TCJ2 | 1 | 0 | Structure | 6s7t | 1 | G | 0 | 100.0 | ... | 452 | - | A | 0 | 93.8 | 91.2 | 63 | 815 | - | A0A024RAD5-Q8TCJ2-EXP-6s7t.pdb1-G-0-A-0.pdb |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 15021 | Q9Y6U3 | Q9Y6U3 | 1 | 0 | Structure | 5a1m | 1 | A | 0 | 100.0 | ... | 350 | - | A | 1 | 100.0 | 14.5 | 247 | 350 | - | Q9Y6U3-Q9Y6U3-EXP-5a1m.pdb1-A-0-A-1.pdb |
| 15022 | Q9Y6X8 | Q9Y6X8 | 1 | 0 | Structure | 3nau | 1 | B | 0 | 100.0 | ... | 501 | - | A | 0 | 100.0 | 6.7 | 446 | 501 | - | Q9Y6X8-Q9Y6X8-EXP-3nau.pdb1-B-0-A-0.pdb |
| 15023 | Q9Y6X9 | Q9Y6X9 | 1 | 0 | Structure | 5of9 | 1 | B | 0 | 97.6 | ... | 551 | - | A | 0 | 98.0 | 53.0 | 5 | 551 | - | Q9Y6X9-Q9Y6X9-EXP-5of9.pdb1-B-0-A-0.pdb |
| 15024 | Q9Y6Y0 | Q9Y6Y0 | 1 | 0 | Structure | 6n34 | 1 | B | 0 | 95.2 | ... | 128 | - | A | 0 | 96.0 | 19.3 | 6 | 129 | - | Q9Y6Y0-Q9Y6Y0-EXP-6n34.pdb1-B-0-A-0.pdb |


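## Understanding Interaction Filenames

    A0A024RAD5-P04843-EXP-6s7o.pdb1-G-0-E-0.pdb = Protein1-Protein2-EXP-PDBID.pdbBioUnit-Chain1-Model1-Chain2-Model2

The number after `.pdb` is the biological unit (`BIO_UNIT` column), and the two model numbers correspond to the `MODEL1` and `MODEL2` columns of `interactions.dat`.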
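
The interaction filename can be split the same way as the protein filename. A minimal sketch that covers only the experimental (`EXP`) layout shown in this notebook; filenames of modeled interactions are not shown here and may follow a different layout. The pattern and the function name `parse_interaction_filename` are hypothetical, and accessions are assumed to contain no hyphens.

```python
import re

# Protein1-Protein2-EXP-PDBID.pdbBioUnit-Chain1-Model1-Chain2-Model2.pdb
INTERACTION_FILE = re.compile(
    r"^(?P<prot1>[^-]+)-(?P<prot2>[^-]+)-EXP-"
    r"(?P<pdb_id>[^.]+)\.pdb(?P<bio_unit>\d+)-"
    r"(?P<chain1>[^-]*)-(?P<model1>\d+)-(?P<chain2>[^-]*)-(?P<model2>\d+)\.pdb$"
)


def parse_interaction_filename(name: str) -> dict:
    m = INTERACTION_FILE.match(name)
    if m is None:
        raise ValueError(f"unrecognized interaction filename: {name}")
    info = m.groupdict()
    for key in ("bio_unit", "model1", "model2"):
        info[key] = int(info[key])  # numeric fields as integers
    return info


print(parse_interaction_filename("A0A024RAD5-P04843-EXP-6s7o.pdb1-G-0-E-0.pdb"))
# Homodimer built from chain A of model 0 and chain A of model 1
print(parse_interaction_filename("Q9Y6U3-Q9Y6U3-EXP-5a1m.pdb1-A-0-A-1.pdb"))
```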

```python
# Look up the interactions in which A0A024RAD5 (from proteins.dat) is listed as PROT1
interactions_A0A024RAD5 = interactions.loc[interactions['PROT1'] == 'A0A024RAD5']
interactions_A0A024RAD5
```

| | PROT1 | PROT2 | RANK_MAJOR | RANK_MINOR | TYPE | PDB_ID | BIO_UNIT | CHAIN1 | MODEL1 | SEQ_IDENT1 | ... | SEQ_END1 | DOMAIN1 | CHAIN2 | MODEL2 | SEQ_IDENT2 | COVERAGE2 | SEQ_BEGIN2 | SEQ_END2 | DOMAIN2 | FILENAME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A0A024RAD5 | P04843 | 1 | 0 | Structure | 6s7o | 1 | G | 0 | 100.0 | ... | 452 | - | E | 0 | 98.9 | 93.2 | 29 | 594 | - | A0A024RAD5-P04843-EXP-6s7o.pdb1-G-0-E-0.pdb |
| 1 | A0A024RAD5 | P04844 | 1 | 0 | Structure | 6s7t | 1 | G | 0 | 100.0 | ... | 452 | - | F | 0 | 95.4 | 41.5 | 369 | 630 | - | A0A024RAD5-P04844-EXP-6s7t.pdb1-G-0-F-0.pdb |
| 2 | A0A024RAD5 | P46977 | 1 | 0 | Structure | 6s7o | 1 | G | 0 | 100.0 | ... | 452 | - | A | 0 | 93.8 | 99.1 | 7 | 705 | - | A0A024RAD5-P46977-EXP-6s7o.pdb1-G-0-A-0.pdb |
| 3 | A0A024RAD5 | P61803 | 1 | 0 | Structure | 6s7o | 1 | G | 0 | 100.0 | ... | 452 | - | D | 0 | 100.0 | 97.3 | 4 | 113 | - | A0A024RAD5-P61803-EXP-6s7o.pdb1-G-0-D-0.pdb |
| 4 | A0A024RAD5 | Q8TCJ2 | 1 | 0 | Structure | 6s7t | 1 | G | 0 | 100.0 | ... | 452 | - | A | 0 | 93.8 | 91.2 | 63 | 815 | - | A0A024RAD5-Q8TCJ2-EXP-6s7t.pdb1-G-0-A-0.pdb |


```python
# Check whether A0A024R3Z2 (from proteins.dat) appears as PROT2 in interactions.dat
interactions_A0A024R3Z2 = interactions.loc[interactions['PROT2'] == 'A0A024R3Z2']
interactions_A0A024R3Z2
```

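| | PROT1 | PROT2 | RANK_MAJOR | RANK_MINOR | TYPE | PDB_ID | BIO_UNIT | CHAIN1 | MODEL1 | SEQ_IDENT1 | ... | SEQ_END1 | DOMAIN1 | CHAIN2 | MODEL2 | SEQ_IDENT2 | COVERAGE2 | SEQ_BEGIN2 | SEQ_END2 | DOMAIN2 | FILENAME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

The result is empty: no row has A0A024R3Z2 as PROT2.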
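
Because a protein can appear in either the `PROT1` or the `PROT2` column, a lookup that checks only one column can miss interactions. A minimal sketch of a two-column lookup; the helper name `interactions_of` is hypothetical, and the small `toy` DataFrame reuses partner pairs copied from the output above. With the real table, pass `interactions` instead of `toy`.

```python
import pandas as pd


def interactions_of(table: pd.DataFrame, uniprot_ac: str) -> pd.DataFrame:
    """Return every row in which `uniprot_ac` is either partner."""
    mask = (table["PROT1"] == uniprot_ac) | (table["PROT2"] == uniprot_ac)
    return table.loc[mask]


# Partner pairs copied from the interactions table above
toy = pd.DataFrame({
    "PROT1": ["A0A024RAD5", "A0A024RAD5", "Q9Y6U3"],
    "PROT2": ["P04843", "P61803", "Q9Y6U3"],
})

print(len(interactions_of(toy, "P04843")))      # 1 (found only via PROT2)
print(len(interactions_of(toy, "A0A024RAD5")))  # 2
print(len(interactions_of(toy, "A0A024R3Z2")))  # 0
```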


## Accessing data through the RESTful API

The API returns its results as XML.

```python
import requests
from bs4 import BeautifulSoup
from Bio.PDB import PDBParser

# Interactome3D API endpoints: structures for one protein, and for one protein pair
url_protein = "https://interactome3d.irbbarcelona.org/api/getProteinStructures?uniprot_ac=A0A5B9"
url_interaction = "https://interactome3d.irbbarcelona.org/api/getInteractionStructures?queryProt1=A0A5B9&queryProt2=P01848"
```
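
Instead of hard-coding query strings, the two URLs above can be assembled from their parameters. A small standard-library sketch; the endpoint and parameter names (`getProteinStructures` with `uniprot_ac`, `getInteractionStructures` with `queryProt1`/`queryProt2`) are copied from the URLs above, while the helper name `api_url` is hypothetical.

```python
from urllib.parse import urlencode

API_BASE = "https://interactome3d.irbbarcelona.org/api/"


def api_url(endpoint: str, **params: str) -> str:
    # urlencode escapes the values and joins the pairs with '&'
    return f"{API_BASE}{endpoint}?{urlencode(params)}"


print(api_url("getProteinStructures", uniprot_ac="A0A5B9"))
print(api_url("getInteractionStructures", queryProt1="A0A5B9", queryProt2="P01848"))
```

With `requests`, the same effect comes from passing the parameters directly, e.g. `requests.get(API_BASE + "getProteinStructures", params={"uniprot_ac": "A0A5B9"})`.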

```python
response = requests.get(url_protein)
response.raise_for_status()  # stop early on HTTP errors
print(type(response))
```

```
<class 'requests.models.Response'>
```


```python
# Use a BeautifulSoup object to parse the content of the XML response
soup = BeautifulSoup(response.text, 'html.parser')
print(soup.prettify())
```

```xml
<?xml version="1.0"?>
<interactome3d version="2020_01">
 <protein_structure_list>
  <protein_structure>
   <uniprot_ac>
    A0A5B9
   </uniprot_ac>
   <rank_major>
    72
   </rank_major>
   <rank_minor>
    1
   </rank_minor>
   <stype>
    Structure
   </stype>
   <pdb_id>
    6mja
   </pdb_id>
   <chain_id>
    D
   </chain_id>
   <seq_id>
    94.5
   </seq_id>
   <coverage>
    71.9
   </coverage>
   <prot_start>
    1
   </prot_start>
   <prot_end>
    128
   </prot_end>
   <ga341>
    -1
   </ga341>
   <mpqs>
    -1
   </mpqs>
   <zdope>
    -1
   </zdope>
   <filename>
    A0A5B9-EXP-6mja_D.pdb
   </filename>
  </protein_structure>
  <protein_structure>
   <uniprot_ac>
    A0A5B9
   </uniprot_ac>
   <rank_major>
    34
   </rank_major>
   <rank_minor>
    1
   </rank_minor>
   <stype>
    Structure
   </stype>
   <pdb_id>
    5eu6
   </pdb_id>
   <chain_id>
    E
   </chain_id>
   <seq_id>
    96.9
   </seq_id>
   <coverage>
    72.5
   </coverage>
   <prot_start>
    1
   </pro
```
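
BeautifulSoup is convenient for inspecting the response, but the records can also be pulled into plain dictionaries with the standard-library `xml.etree.ElementTree`. A minimal sketch, assuming the element names seen in the prettified output above (`protein_structure` records with one child element per field); since `html.parser` lowercases tag names, the capitalization in the raw XML is an assumption. The sample string is trimmed from that output, and the function name `protein_structures` is hypothetical. For the live response, `protein_structures(response.content)` would be the call.

```python
import xml.etree.ElementTree as ET

# Two records trimmed from the getProteinStructures response shown above
SAMPLE = """<?xml version="1.0"?>
<interactome3d version="2020_01">
 <protein_structure_list>
  <protein_structure>
   <uniprot_ac>A0A5B9</uniprot_ac><rank_major>72</rank_major><rank_minor>1</rank_minor>
   <pdb_id>6mja</pdb_id><chain_id>D</chain_id><coverage>71.9</coverage>
  </protein_structure>
  <protein_structure>
   <uniprot_ac>A0A5B9</uniprot_ac><rank_major>34</rank_major><rank_minor>1</rank_minor>
   <pdb_id>5eu6</pdb_id><chain_id>E</chain_id><coverage>72.5</coverage>
  </protein_structure>
 </protein_structure_list>
</interactome3d>"""


def protein_structures(xml_text) -> list:
    root = ET.fromstring(xml_text)
    # One dict per <protein_structure>, mapping child tag -> stripped text
    return [
        {child.tag: (child.text or "").strip() for child in record}
        for record in root.iter("protein_structure")
    ]


records = protein_structures(SAMPLE)
# A lower rank major means, in general, a higher-quality structure
best = min(records, key=lambda r: int(r["rank_major"]))
print(len(records), best["pdb_id"], best["chain_id"])  # 2 5eu6 E
```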

```python
# QUIET=True suppresses warnings about minor format problems in PDB files
parser = PDBParser(QUIET=True)
# structure = parser.get_structure("PHA-L", "1FAT.pdb")
```