<a href="https://colab.research.google.com/github/huhlim/alphafold-multistate/blob/main/AlphaFold_multistate_human_kinase_database.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AlphaFold-multistate: human kinase database
This database provides protein models of human kinases in multiple conformational states. The models were generated with a multi-state modeling protocol based on AlphaFold2, which was originally developed for modeling active and inactive conformations of GPCRs. [[Colab Notebook]](https://colab.research.google.com/github/huhlim/alphafold-multistate/blob/main/AlphaFold_multistate.ipynb) [[ref]](https://onlinelibrary.wiley.com/doi/10.1002/prot.26382) The protocol was extended to kinases following the same idea, but using structure databases annotated with kinase conformational states.

The conformational states are defined as in [Kincore](http://dunbrack.fccc.edu/kincore/home) [[ref]](https://doi.org/10.1073/pnas.1814279116), which classifies kinase conformations by structural characteristics of the DFG motif: (1) the direction of DFG-Asp and (2) the dihedral angles of the motif. According to [Modi & Dunbrack](https://academic.oup.com/nar/article/50/D1/D654/6395339), the original AlphaFold2 is strongly biased toward the *DFGin_BLAminus* state. In contrast, the multi-state modeling protocol, which uses templates from state-annotated kinase structure databases, can predict kinase models in multiple states.

However, although the protocol predicted some alternative conformations, it could not produce every possible state (see **Figure 1**), because (1) AlphaFold2 has a much stronger state bias for kinases than for GPCRs, (2) the structural differences between kinase states (as small as a single side-chain torsion angle) are subtler than those between GPCR states, and (3) a given kinase may not adopt every conformational state. Therefore, this database includes only the successfully modeled conformations.


![Figure 1. Confusion matrix of the multi-state modeling of kinases](https://github.com/huhlim/alphafold-multistate/blob/main/images/kinase.confusion_matrix_small.png?raw=true)

Figure 1. Confusion matrix of the multi-state modeling of kinases. 
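
The state labels used throughout this notebook (for example *DFGin_BLAminus*) combine the two classification criteria: the part before the underscore is the spatial label of the DFG motif (DFGin, DFGinter, DFGout, or Other), and the part after it is the dihedral-angle cluster. As a minimal sketch, a label can be split as follows; `split_state_label` is a hypothetical helper for illustration and is not part of the database.

```python
def split_state_label(label):
    """Split a label such as 'DFGin_BLAminus' into (spatial, dihedral)."""
    # partition() splits at the first underscore only
    spatial, _, dihedral = label.partition("_")
    if not spatial or not dihedral:
        raise ValueError(f"Unexpected state label: {label!r}")
    return spatial, dihedral


# Labels taken from the color table used later in this notebook
for label in ("DFGin_BLAminus", "DFGinter_BABtrans", "DFGout_BBAminus", "Other_Other"):
    spatial, dihedral = split_state_label(label)
    print(f"{label:<20s} spatial={spatial:<9s} dihedral={dihedral}")
```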

In [None]:
#@title Initialize database
#@markdown This step downloads the database files and installs the packages needed to run this notebook. It takes less than a minute.

%%bash

if [[ ! -e human_kinases.zip ]]; then
    pip install gdown py3dmol &> /dev/null

    gdown 1MH5nNx7I0RL9THv_wqk9sByfOHLpqy_z
    gdown 14Rx2faKL1CBb4mMBHuYIzs4rmdf-48TI
    gdown 1cFNibkpu7uDyssnPqctHmQdsl5I14hZS
    unzip -qq human_kinases.zip
fi

In [None]:
#@title Indexing database
#@markdown This step collects database information: (1) the available kinase models, (2) the templates used for modeling, and (3) the position of the DFG motif for visualization.

import glob
import pathlib

uniprot_s = {}
for _fn in glob.glob("human_kinases/*/*.pdb"):
    name, fn = _fn.split("/")[-2:]
    uniprot_id, resrange = name.split(".")
    if uniprot_id not in uniprot_s:
        uniprot_s[uniprot_id] = {}
    if resrange not in uniprot_s[uniprot_id]:
        uniprot_s[uniprot_id][resrange] = []
    uniprot_s[uniprot_id][resrange].append(_fn)

modeling_s = {}
with open("modeling_info.dat") as fp:
    for line in fp:
        x = line.strip().split()
        uniprot_id, resrange = x[0].split(".")
        state = x[1]
        plddt = float(x[2])
        template = x[3]
        seq_id = float(x[4])
        #
        if uniprot_id not in modeling_s:
            modeling_s[uniprot_id] = {}
        if resrange not in modeling_s[uniprot_id]:
            modeling_s[uniprot_id][resrange] = {}
        #
        modeling_s[uniprot_id][resrange][state] = (plddt, template, seq_id)

xDFG_s = {}
with open("xDFG.dat") as fp:
    for line in fp:
        x = line.strip().split()
        xDFG_s[x[0]] = x[1]
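
The indexing loop relies on a path layout of `human_kinases/<uniprot_id>.<residue_range>/<file>.pdb`, and the search step later reads the state from the second-to-last dot-separated field of the file name. The sketch below gathers both conventions into one helper. The function name `parse_model_path` is hypothetical, and the example path uses a made-up ID and file name purely for illustration.

```python
from pathlib import PurePosixPath


def parse_model_path(path):
    """Return (uniprot_id, residue_range, state) for a model file path."""
    p = PurePosixPath(path)
    # The directory name is "<uniprot_id>.<residue_range>"
    uniprot_id, resrange = p.parent.name.split(".")
    # The file name ends with "<state>.pdb"
    state = p.name.split(".")[-2]
    return uniprot_id, resrange, state


# Made-up path that follows the assumed layout (not a real database entry)
example = "human_kinases/P00000.0521-0779/model.DFGin_BLAminus.pdb"
print(parse_model_path(example))
```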

In [None]:
#@title Database search
#@markdown This step searches the database. Please enter the UniProt ID (*uniprot_id*) of a human kinase.
#@markdown If the kinase has multiple domains, the cell raises an error and lists the available domains; select one of them as *residue_range* (*e.g.*, 0521-0779). Otherwise, leave *residue_range* as "None".

import sys

uniprot_id = 'P31749' #@param {type: "string"}
if uniprot_id not in uniprot_s:
    raise KeyError(f"There is no structure for {uniprot_id}")
residue_range = "None" #@param {type: "string"}
# Compare strings by value; "is" tests object identity and is unreliable for string literals
if residue_range != "None" and (residue_range not in uniprot_s[uniprot_id]):
    raise KeyError(f"There is no structure for {uniprot_id}, {residue_range}")

if residue_range == "None":
    record_s = uniprot_s[uniprot_id]
else:
    record_s = {residue_range: uniprot_s[uniprot_id][residue_range]}
n_record = len(record_s)

if n_record == 1:
    residue_range = list(record_s)[0]
    n_model = len(record_s[residue_range])
    sys.stdout.write(f"Found {n_model} models for {uniprot_id} that covers residues {residue_range}.\n\n")
else:
    sys.stdout.write(f"Found {n_record} domains for {uniprot_id}.\n")
    for residue in sorted(record_s):
        sys.stdout.write(f" - residues: {residue}\n")
    raise KeyError("Please select a residue_range")

pdb_fn_s = sorted(uniprot_s[uniprot_id][residue_range])
#
sys.stdout.write("%-20s |  pLDDT | Template (sequence identity)\n"%("State"))
sys.stdout.write("-"*60 + "\n")
for pdb_fn in pdb_fn_s:
    state = pdb_fn.split("/")[-1].split(".")[-2]
    info = modeling_s[uniprot_id][residue_range][state]
    sys.stdout.write(f"{state:<20s} | {info[0]:6.2f} | {info[1]} {info[2]:5.1f}\n")

print()

xDFG = xDFG_s.get(f"{uniprot_id}.{residue_range}", None)

color_s = {}
color_s["DFGin_ABAminus"] = "#2471A3"
color_s["DFGin_BLAminus"] = "#154360"
color_s["DFGin_BLAplus"] = "#5499C7"
color_s["DFGin_BLBminus"] = "#A9CCE3"
color_s["DFGin_BLBplus"] = "#D4E6A1"
color_s["DFGin_BLBtrans"] = "#3498DB"
color_s["DFGin_Other"] = "#1B4F72"
color_s["DFGinter_BABtrans"] = "#17A589"
color_s["DFGinter_Other"] = "#A3E4D7"
color_s["DFGout_BBAminus"] = "#E74C3C"
color_s["DFGout_Other"] = "#F1948A"
color_s["Other_Other"] = "#D7BDE2"

import py3Dmol
def show_pdb(pdb_fn_s):
    """Overlay all models in one viewer, colored and labeled by state."""
    view = py3Dmol.view(js='https://3dmol.org/build/3Dmol.js', height=640)
    # Backbone atoms excluded from the stick representation
    BB = ['C', 'O', 'N']
    for i, pdb_fn in enumerate(pdb_fn_s):
        # File names end with "<state>.pdb"
        state = pdb_fn.split("/")[-1].split(".")[-2]
        color = color_s[state]
        with open(pdb_fn) as fp:
            view.addModel(fp.read(), 'pdb')
        view.setStyle({"model": i}, {"cartoon": {"color": color}})
        if xDFG is not None:
            # Draw the residue read from xDFG.dat as sticks, skipping backbone C, O, and N
            view.addStyle({'and': [{'resi': [xDFG]}, {'atom': BB, 'invert': True}]},
                          {'stick': {'radius': 0.2}})
        # One screen-space label per state, stacked vertically
        view.addLabel(state,
                      {
                       "fontColor": color,
                       "backgroundColor": "white",
                       "backgroundOpacity": 0.0,
                       "font": "Arial",
                       "fontSize": 20,
                       "position": {"x": 5.0, "y": i*25, "z": 0.0},
                       "useScreen": True,
                       })
    #
    view.zoomTo()
    return view

show_pdb(pdb_fn_s).show()
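
The table printed by this step lists pLDDT, template, and sequence identity for each state. If a single representative model is wanted, one possible choice is the state with the highest pLDDT. The sketch below assumes the `(plddt, template, seq_id)` tuple layout built in the indexing step. `pick_highest_plddt` is a hypothetical helper, and the numbers and template names in `toy_records` are placeholders, not database values.

```python
def pick_highest_plddt(state_records):
    """Return (state, record) for the entry with the highest pLDDT.

    state_records maps a state label to a (plddt, template, seq_id) tuple.
    """
    if not state_records:
        raise ValueError("No states available")
    # The first tuple element is the pLDDT
    state = max(state_records, key=lambda s: state_records[s][0])
    return state, state_records[state]


# Placeholder values for illustration only
toy_records = {
    "DFGin_BLAminus": (90.0, "templateA", 50.0),
    "DFGout_BBAminus": (85.0, "templateB", 40.0),
}
state, (plddt, template, seq_id) = pick_highest_plddt(toy_records)
print(f"{state}: pLDDT={plddt:.2f}, template={template} ({seq_id:.1f})")
```

In the notebook, the argument would be `modeling_s[uniprot_id][residue_range]`. Since pLDDT is a confidence estimate, this picks the most confident model rather than any particular functional state.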

In [None]:
#@title Download
#@markdown This step allows you to download the models shown above as a zip file.

from google.colab import files

name = f"{uniprot_id}.{residue_range}"
out_fn = f"{name}.multi_state.zip"
!zip -FSr $out_fn human_kinases/$name
files.download(out_fn)
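
The download cell depends on the `zip` command and on `google.colab.files`, which is only available in Colab. When running elsewhere, a similar archive can be written with the standard-library `zipfile` module. This is a minimal sketch: `zip_models` is a hypothetical helper, and it assumes the `human_kinases/<name>` directory layout used in this notebook.

```python
import pathlib
import zipfile


def zip_models(name, root="human_kinases", out_fn=None):
    """Write all PDB files under <root>/<name> to a zip file and return its path."""
    root = pathlib.Path(root)
    src = root / name
    if not src.is_dir():
        raise FileNotFoundError(f"No model directory: {src}")
    out_fn = out_fn or f"{name}.multi_state.zip"
    with zipfile.ZipFile(out_fn, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for pdb in sorted(src.rglob("*.pdb")):
            # Store entries as "<root name>/<name>/<file>", like the zip command above
            zf.write(pdb, arcname=pdb.relative_to(root.parent))
    return out_fn


# Example call with the variables defined in the search step:
# zip_models(f"{uniprot_id}.{residue_range}")
```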
