## Orange Team CQ#1.7

### Query: 
What genes show high phenotypic similarity to the 11 Fanconi Anemia core complex genes (set FA-core)?

### Services:
BioLink API (Monarch) - https://api.monarchinitiative.org/api/

Simsearch - https://monarchinitiative.org/simsearch

### Approach:
Take all 27 human FA genes. For each gene, retrieve the phenotypically similar genes from mouse, zebrafish, worm, and fly. Each query returns a list of genes, each with a similarity score. Sum the scores for each returned gene across all 27 FA genes. Take the top X (10) phenotypically similar genes from each organism, then get the human orthologs of those genes. Note that this notebook loads the full FA gene set (`fa_all`) rather than only the 11 core complex genes named in the query.
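
The aggregation step described above can be sketched as follows. This is a minimal illustration on made-up scores, not the notebook's actual implementation: the function names `sum_scores` and `top_n_by_prefix` and the toy input are assumptions introduced here.

```python
from collections import defaultdict

def sum_scores(per_gene_hits):
    """Sum similarity scores per model-organism gene across all query genes.

    per_gene_hits maps a query gene CURIE to a list of (hit CURIE, score) pairs.
    """
    totals = defaultdict(int)
    for hits in per_gene_hits.values():
        for hit_id, score in hits:
            totals[hit_id] += score
    return dict(totals)

def top_n_by_prefix(totals, prefixes, n=10):
    """Return the n highest-scoring hits for each CURIE prefix (one prefix per organism)."""
    result = {}
    for prefix in prefixes:
        subset = [(k, v) for k, v in totals.items() if k.startswith(prefix + ":")]
        result[prefix] = sorted(subset, key=lambda kv: kv[1], reverse=True)[:n]
    return result

# Toy input: hypothetical scores, for illustration only
toy_hits = {
    "NCBIGene:2175": [("MGI:88276", 70), ("MGI:98726", 62), ("ZFIN:ZDB-GENE-011026-1", 55)],
    "NCBIGene:2187": [("MGI:88276", 64), ("MGI:98726", 63)],
}
totals = sum_scores(toy_hits)
top = top_n_by_prefix(totals, ["MGI", "ZFIN"], n=1)
print(top)
```

Grouping by CURIE prefix keeps the ranking separate for each organism, so a species with systematically higher scores does not crowd out the others.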

### Author
Gstupp

In [1]:
import os, sys
# change this path
sys.path.insert(0, "/home/gstupp/projects/NCATS-Tangerine/mvp-module-library/")

In [14]:
from BioLink import biolink_client
from GeneStore.gene_store import FanconiGeneImporter
import requests
from itertools import chain
import pandas as pd
from pprint import pprint
from tqdm import tqdm, tqdm_notebook
from collections import defaultdict
from IPython.display import display, HTML

def pretty_print(df):
    return display( HTML( df.to_html().replace("\\n","<br>") ) )

pd.options.display.max_rows = 99
pd.options.display.max_colwidth = 10000

In [3]:
gene_df = FanconiGeneImporter("fa_all").gene_set_df()

b = biolink_client.BioLinkWrapper()
gene_df.head()

| | gene_curie | gene_symbol |
|---|---|---|
| 0 | NCBIGene:2175 | FANCA |
| 1 | NCBIGene:2187 | FANCB |
| 2 | NCBIGene:2176 | FANCC |
| 3 | NCBIGene:2178 | FANCE |
| 4 | NCBIGene:2188 | FANCF |


In [4]:
from Modules.phenotype_similarity import PhenotypicSimilarity
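# Query with FANCA (NCBIGene:2175) and FANCB (NCBIGene:2187); '10090' is the NCBI taxon ID for mouse
ps = PhenotypicSimilarity(['NCBIGene:2175', 'NCBIGene:2187'], '10090')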

In [6]:
ps.phenogene_score[:10]

id
MGI:1347466    136
MGI:88180      132
MGI:88276      134
MGI:109344     125
MGI:1330810    128
MGI:104671     127
MGI:88039      133
MGI:98297      127
MGI:1920563    127
MGI:95523      126
dtype: int64
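
The ten entries shown above are not ordered by score. A minimal sketch of ranking them, using a plain dict copied from that output rather than `phenogene_score` itself (whose exact type is not documented here; the `dtype: int64` line suggests a pandas Series, in which case `sort_values(ascending=False)` would give the same ordering):

```python
# Scores copied from the output above
scores = {
    "MGI:1347466": 136, "MGI:88180": 132, "MGI:88276": 134,
    "MGI:109344": 125, "MGI:1330810": 128, "MGI:104671": 127,
    "MGI:88039": 133, "MGI:98297": 127, "MGI:1920563": 127,
    "MGI:95523": 126,
}

# Sort by score, highest first; sorted() is stable, so ties keep their original order
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
top3 = ranked[:3]
print(top3)
```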

In [15]:
pretty_print(ps.explain_phenotypically_similar_gene("MGI:105373"))

Unnamed: 0,id,score,label,explanation,input_gene
14,MGI:105373,67,Ptch1,Breast carcinoma -> carcinoma phenotype <- Basal cell carcinoma Abnormality of the fallopian tube -> pelvic region of trunk phenotype <- abnormal kidney pelvis morphology Micrognathia -> abnormal mandible morphology <- mandibular cysts Patent ductus arteriosus -> abnormal vascular development <- abnormal vascular development Growth hormone deficiency -> abnormal diencephalon morphology <- abnormal diencephalon morphology Decreased fertility in males -> Decreased fertility in males <- male infertility Frontal bossing -> Abnormality of calvarial morphology <- shortened head Ataxia -> ataxia <- ataxia Spina bifida -> open neural tube <- open neural tube Anal atresia -> orifice phenotype <- abnormal mandibular prominence morphology Finger syndactyly -> syndactyly <- syndactyly Pancytopenia -> decreased leukocyte cell number <- decreased transitional stage B cell number Absent thumb -> zone of long bone phenotype <- abnormal long bone hypertrophic chondrocyte zone Acute lymphoblastic leukemia -> hematopoietic/lymphoid malignancies/disorder phenotype <- Lymphoma Abnormal lung lobation -> abnormal respiratory system development <- abnormal nasal placode morphology Aganglionic megacolon -> Abnormality of ganglion <- abnormal geniculate ganglion morphology Intrauterine growth retardation -> abnormal prenatal growth/weight/body size <- increased fetal size Hypoplasia of the ulna -> decreased length of long bones <- Short long bone Abnormality of the preputium -> prepuce of penis phenotype <- small male preputial glands Hypertelorism -> ocular hypertelorism <- Hypertelorism Hypoplasia of the premaxilla -> skeleton of upper jaw phenotype <- abnormal upper incisor morphology Irregular hyperpigmentation -> abnormal(ly) quality pigmentation <- belly spot Multiple cafe-au-lait spots -> Localized skin lesion <- dermal cysts Increased hemoglobin -> increased myeloid cell number <- increased neutrophil cell number Inguinal hernia -> herniated abdominal wall <- 
omphalocele,NCBIGene:2187
0,MGI:105373,66,Ptch1,Breast carcinoma -> carcinoma phenotype <- Basal cell carcinoma Umbilical hernia -> herniated abdominal wall <- omphalocele Facial asymmetry -> abnormal head shape <- shortened head Micrognathia -> abnormal mandible morphology <- mandibular cysts Horseshoe kidney -> abnormal kidney morphology <- abnormal kidney pelvis morphology Patent ductus arteriosus -> abnormal vascular development <- abnormal vascular development Azoospermia -> abnormal(ly) process quality spermatogenesis <- Abnormal spermatogenesis Neutropenia -> abnormal neutrophil cell number <- increased neutrophil cell number Ataxia -> ataxia <- ataxia Amyotrophic lateral sclerosis -> abnormal neuron morphology <- abnormal neuronal precursor proliferation Scoliosis -> abnormal spine curvature <- kyphosis Finger syndactyly -> syndactyly <- syndactyly Pancytopenia -> decreased leukocyte cell number <- decreased transitional stage B cell number Absent thumb -> zone of long bone phenotype <- abnormal long bone hypertrophic chondrocyte zone Anemic pallor -> abnormal skin appearance <- skin lesions Acute lymphoblastic leukemia -> hematopoietic/lymphoid malignancies/disorder phenotype <- Lymphoma Deficient excision of UV-induced pyrimidine dimers in DNA -> cellular response to stimulus phenotype <- smoothened signaling pathway involved in dorsal/ventral neural tube patterning phenotype Anteverted nares -> olfactory system phenotype <- abnormal nasal placode morphology Hypopigmented skin patches -> Localized skin lesion <- dermal cysts Aganglionic megacolon -> Abnormality of ganglion <- abnormal geniculate ganglion morphology Intrauterine growth retardation -> abnormal prenatal growth/weight/body size <- increased fetal size Hypoplasia of the ulna -> decreased length of long bones <- Short long bone Abnormality of the preputium -> prepuce of penis phenotype <- small male preputial glands Hypertelorism -> ocular hypertelorism <- Hypertelorism Nystagmus -> Abnormal ear morphology <- small otic 
vesicle Metrorrhagia -> abnormal female reproductive system physiology <- female infertility Hypoplasia of the premaxilla -> skeleton of upper jaw phenotype <- abnormal upper incisor morphology,NCBIGene:2175


In [16]:
import pickle
with open("gene_genes_1.7_orthologs.pkl", "wb") as f:
    pickle.dump(ps, f)

In [None]:
###### not done below here ######

In [12]:
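# NOTE: `prefixes` (CURIE prefixes, one per organism) and `s` (summed score per gene)
# are not defined in the cells shown here; presumably they come from an earlier session.
top10 = dict()  # despite the name, the slice below keeps the top 20 per prefix
for prefix in prefixes:
    ss = {k:v for k,v in s.items() if k.startswith(prefix)}
    top10[prefix] = sorted(ss.items(), key=lambda x:x[1], reverse=True)[:20]
# Flatten the per-prefix lists into one list of {'gene', 'score'} records
ss = list(chain(*top10.values()))
ss = [{'gene': s[0], 'score': s[1]} for s in ss]
ss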

[{'gene': 'MGI:88276', 'score': 1433},
 {'gene': 'MGI:1347466', 'score': 1403},
 {'gene': 'MGI:88039', 'score': 1396},
 {'gene': 'MGI:99851', 'score': 1387},
 {'gene': 'MGI:88064', 'score': 1370},
 {'gene': 'MGI:95729', 'score': 1366},
 {'gene': 'MGI:1330810', 'score': 1347},
 {'gene': 'MGI:105373', 'score': 1344},
 {'gene': 'MGI:98297', 'score': 1340},
 {'gene': 'MGI:104327', 'score': 1335},
 {'gene': 'MGI:96677', 'score': 1333},
 {'gene': 'MGI:97490', 'score': 1323},
 {'gene': 'MGI:95523', 'score': 1318},
 {'gene': 'MGI:98834', 'score': 1316},
 {'gene': 'MGI:108072', 'score': 1315},
 {'gene': 'MGI:88180', 'score': 1305},
 {'gene': 'MGI:1352467', 'score': 1297},
 {'gene': 'MGI:98726', 'score': 1296},
 {'gene': 'MGI:97902', 'score': 1295},
 {'gene': 'MGI:104993', 'score': 1294},
 {'gene': 'ZFIN:ZDB-GENE-011026-1', 'score': 1179},
 {'gene': 'ZFIN:ZDB-GENE-080405-1', 'score': 1172},
 {'gene': 'ZFIN:ZDB-GENE-040426-1716', 'score': 1169},
 {'gene': 'ZFIN:ZDB-GENE-030131-6378', 'score': 112

In [13]:
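# `query_orthologs` is a helper not defined in the cells shown; judging by the output,
# it returns the human (NCBITaxon:9606) ortholog CURIEs for a model-organism gene.
for s in tqdm_notebook(ss):
    s['orthologs'] = query_orthologs(s['gene'], "NCBITaxon:9606")
ss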




[{'gene': 'MGI:88276', 'orthologs': ['HGNC:2514'], 'score': 1433},
 {'gene': 'MGI:1347466', 'orthologs': ['HGNC:3800'], 'score': 1403},
 {'gene': 'MGI:88039', 'orthologs': ['HGNC:583'], 'score': 1396},
 {'gene': 'MGI:99851', 'orthologs': ['HGNC:1539'], 'score': 1387},
 {'gene': 'MGI:88064', 'orthologs': ['HGNC:644'], 'score': 1370},
 {'gene': 'MGI:95729', 'orthologs': ['HGNC:4319'], 'score': 1366},
 {'gene': 'MGI:1330810', 'orthologs': ['HGNC:15979'], 'score': 1347},
 {'gene': 'MGI:105373', 'orthologs': ['HGNC:9585'], 'score': 1344},
 {'gene': 'MGI:98297', 'orthologs': ['HGNC:10848'], 'score': 1340},
 {'gene': 'MGI:104327', 'orthologs': ['HGNC:7866'], 'score': 1335},
 {'gene': 'MGI:96677', 'orthologs': ['HGNC:6342'], 'score': 1333},
 {'gene': 'MGI:97490', 'orthologs': ['HGNC:8620'], 'score': 1323},
 {'gene': 'MGI:95523', 'orthologs': ['HGNC:3689'], 'score': 1318},
 {'gene': 'MGI:98834', 'orthologs': ['HGNC:11998'], 'score': 1316},
 {'gene': 'MGI:108072', 'orthologs': ['HGNC:12036'], 's

In [14]:
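# `get_obj` is a helper not defined in the cells shown; it is used here to look up
# the label for each gene and for each of its human orthologs.
for s in tqdm_notebook(ss):
    s['label'] = get_obj(s['gene'])['label']
    s['ortholog_labels'] = [get_obj(x)['label'] for x in s['orthologs']]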




## This is the output!!!

In [15]:
ss = sorted(ss, key=lambda x: x['score'], reverse=True)
print("\n".join([",".join([x['orthologs'][0],x['ortholog_labels'][0], str(x['score'])]) for x in ss[:20]]))

HGNC:2514,CTNNB1,1433
HGNC:3800,FOXC1,1403
HGNC:583,APC,1396
HGNC:1539,CBFB,1387
HGNC:644,AR,1370
HGNC:4319,GLI3,1366
HGNC:15979,TP63,1347
HGNC:9585,PTCH1,1344
HGNC:10848,SHH,1340
HGNC:7866,NOG,1335
HGNC:6342,KIT,1333
HGNC:8620,PAX6,1323
HGNC:3689,FGFR2,1318
HGNC:11998,TP53,1316
HGNC:12036,TRAF6,1315
HGNC:1071,BMP4,1305
HGNC:3467,ESR1,1297
HGNC:11768,TGFB2,1296
HGNC:9967,RET,1295
HGNC:6554,LEPR,1294


### Demo with one gene

In [16]:
## NCBIGene:7042 is TGFB2, not FANCC (FANCC is NCBIGene:2176)
phenotypes = get_phenotype_from_gene_verbose("NCBIGene:7042")
phenotypes

[('HP:0001677', 'Coronary artery atherosclerosis'),
 ('HP:0002107', 'Pneumothorax'),
 ('HP:0005162', 'Left ventricular failure'),
 ('HP:0001647', 'Bicuspid aortic valve'),
 ('HP:0002974', 'Radioulnar synostosis'),
 ('HP:0011106', 'Hypovolemia'),
 ('HP:0000965', 'Cutis marmorata'),
 ('HP:0002616', 'Aortic root aneurysm'),
 ('HP:0010772', 'Anomalous pulmonary venous return'),
 ('HP:0001643', 'Patent ductus arteriosus'),
 ('HP:0000494', 'Downslanted palpebral fissures'),
 ('HP:0004757', 'Paroxysmal atrial fibrillation'),
 ('HP:0002140', 'Ischemic stroke'),
 ('HP:0000525', 'Abnormality iris morphology'),
 ('HP:0000023', 'Inguinal hernia'),
 ('HP:0000218', 'High palate'),
 ('HP:0001629', 'Ventricular septal defect'),
 ('HP:0001199', 'Triphalangeal thumb'),
 ('HP:0002138', 'Subarachnoid hemorrhage'),
 ('HP:0001166', 'Arachnodactyly'),
 ('HP:0006695', 'Atrioventricular canal defect'),
 ('HP:0002943', 'Thoracic scoliosis'),
 ('HP:0001640', 'Cardiomegaly'),
 ('HP:0000822', 'Hypertension'),
 ('H

In [17]:
d = get_phenotypically_similar_genes([x[0] for x in phenotypes], "10090", return_all=True)
genes = get_phenotypically_similar_genes([x[0] for x in phenotypes], "10090", return_all=False)
genes

[('MGI:96817', 68, 'Lox'),
 ('MGI:106923', 66, 'Tll1'),
 ('MGI:1913761', 66, 'Chtop'),
 ('MGI:1891209', 65, 'Efemp2'),
 ('MGI:3050795', 65, 'Mkl2'),
 ('MGI:5560774', 64, 'b2b2736Clo'),
 ('MGI:97531', 64, 'Pdgfrb'),
 ('MGI:95489', 64, 'Fbn1'),
 ('MGI:2446294', 64, 'Megf8'),
 ('MGI:109340', 63, 'Pitx2'),
 ('MGI:98726', 63, 'Tgfb2'),
 ('MGI:1920563', 62, 'Rpgrip1l'),
 ('MGI:1928901', 62, 'Pdzk1'),
 ('MGI:2154244', 62, 'Plxnd1'),
 ('MGI:1345643', 61, 'Sufu'),
 ('MGI:107718', 61, 'Dnah5'),
 ('MGI:1347466', 60, 'Foxc1'),
 ('MGI:1347465', 60, 'Foxh1'),
 ('MGI:95586', 60, 'Fst'),
 ('MGI:1919247', 60, 'Smg9'),
 ('MGI:109448', 59, 'Cfc1'),
 ('MGI:97788', 59, 'Psph'),
 ('MGI:5570107', 59, 'b2b2821Clo'),
 ('MGI:1927166', 59, 'Chst11'),
 ('MGI:5646601', 59, 'b2b3077Clo'),
 ('MGI:97350', 59, 'Nkx2-5'),
 ('MGI:98715', 59, 'Ift88'),
 ('MGI:98754', 59, 'Timp3'),
 ('MGI:1298393', 59, 'Sh3pxd2a'),
 ('MGI:1920145', 59, 'Setd5'),
 ('MGI:99604', 58, 'Fgf8'),
 ('MGI:88452', 58, 'Col2a1'),
 ('MGI:96257', 58, 

In [18]:
match = d['b'][0]
(match['id'],match['label'])

('MGI:96817', 'Lox')

In [19]:
match['matches'][:2]

[{'a': {'IC': 9.717293049024157,
   'id': 'HP:0004950',
   'label': 'Peripheral arterial stenosis'},
  'b': {'IC': 10.102227593739434,
   'id': 'MP:0010457',
   'label': 'pulmonary artery stenosis'},
  'lcs': {'IC': 7.114954719349155,
   'id': 'MP:0006135',
   'label': 'artery stenosis'}},
 {'a': {'IC': 5.482052841731362, 'id': 'HP:0001640', 'label': 'Cardiomegaly'},
  'b': {'IC': 8.084227593739431,
   'id': 'MP:0000276',
   'label': 'heart right ventricle hypertrophy'},
  'lcs': {'IC': 5.482052841731362,
   'id': 'MP:0000274',
   'label': 'enlarged heart'}}]
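
Each match pairs a query phenotype (`a`) with a phenotype of the matched gene (`b`) through a shared ancestor term (`lcs`, presumably the least common subsumer), each with an information content (`IC`) value. A minimal sketch of ranking matches by the IC of the shared term and printing them in the `query -> shared <- match` form used in the explanation table earlier; the helper name `summarize` is an assumption introduced here, and the data is copied from the output above:

```python
matches = [
    {"a": {"IC": 9.717293049024157, "id": "HP:0004950",
           "label": "Peripheral arterial stenosis"},
     "b": {"IC": 10.102227593739434, "id": "MP:0010457",
           "label": "pulmonary artery stenosis"},
     "lcs": {"IC": 7.114954719349155, "id": "MP:0006135",
             "label": "artery stenosis"}},
    {"a": {"IC": 5.482052841731362, "id": "HP:0001640", "label": "Cardiomegaly"},
     "b": {"IC": 8.084227593739431, "id": "MP:0000276",
           "label": "heart right ventricle hypertrophy"},
     "lcs": {"IC": 5.482052841731362, "id": "MP:0000274",
             "label": "enlarged heart"}},
]

def summarize(matches):
    """Return (lcs IC, query label, match label, lcs label) rows, most specific first."""
    rows = [(m["lcs"]["IC"], m["a"]["label"], m["b"]["label"], m["lcs"]["label"])
            for m in matches]
    return sorted(rows, key=lambda r: r[0], reverse=True)

summary = summarize(matches)
for ic, a_label, b_label, lcs_label in summary:
    print(f"{a_label} -> {lcs_label} <- {b_label} (LCS IC={ic:.2f})")
```

A higher IC on the shared term means the two phenotypes meet at a more specific concept, so those matches carry more weight as evidence of similarity.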

In [20]:
# The query gene (NCBIGene:7042) and Lox (the top match, MGI:96817) are "phenotypically similar" because of these shared ancestor phenotypes
[(x['lcs']['id'],x['lcs']['label']) for x in match['matches']]

[('MP:0006135', 'artery stenosis'),
 ('MP:0000274', 'enlarged heart'),
 ('MP:0011572', 'abnormal aorta bulb morphology'),
 ('MP:0009868', 'abnormal descending thoracic aorta morphology'),
 ('MP:0006049', 'semilunar valve regurgitation'),
 ('HP:0009131', 'Abnormality of the musculature of the thorax'),
 ('MP:0001958', 'emphysema'),
 ('MP:0001634', 'internal hemorrhage')]

In [21]:
human_orthologs = query_orthologs(match['id'], taxon="NCBITaxon:9606")
human_orthologs

['HGNC:6664']

In [22]:
for human_gene, pgenes in gene_genes.items():
    pgenes = [x for x in pgenes if "MGI:98726" == x[0]]
    print(human_gene, get_obj(human_gene)['label'], pgenes)

NCBIGene:80233 FAAP100 []
NCBIGene:79728 PALB2 [('MGI:98726', 61, 'Tgfb2')]
NCBIGene:675 BRCA2 [('MGI:98726', 57, 'Tgfb2')]
NCBIGene:2176 FANCC [('MGI:98726', 61, 'Tgfb2')]
NCBIGene:2178 FANCE [('MGI:98726', 61, 'Tgfb2')]
NCBIGene:378708 CENPS []
NCBIGene:91442 FAAP24 []
NCBIGene:2072 ERCC4 [('MGI:98726', 56, 'Tgfb2')]
NCBIGene:84464 SLX4 [('MGI:98726', 62, 'Tgfb2')]
NCBIGene:5888 RAD51 [('MGI:98726', 62, 'Tgfb2')]
NCBIGene:55159 RFWD3 [('MGI:98726', 63, 'Tgfb2')]
NCBIGene:2175 FANCA [('MGI:98726', 62, 'Tgfb2')]
NCBIGene:2188 FANCF [('MGI:98726', 63, 'Tgfb2')]
NCBIGene:10459 MAD2L2 [('MGI:98726', 63, 'Tgfb2')]
NCBIGene:2187 FANCB [('MGI:98726', 63, 'Tgfb2')]
NCBIGene:29089 UBE2T [('MGI:98726', 63, 'Tgfb2')]
NCBIGene:57697 FANCM [('MGI:98726', 63, 'Tgfb2')]
NCBIGene:7516 XRCC2 [('MGI:98726', 63, 'Tgfb2')]
NCBIGene:55120 FANCL [('MGI:98726', 63, 'Tgfb2')]
NCBIGene:199990 FAAP20 []
NCBIGene:83990 BRIP1 [('MGI:98726', 61, 'Tgfb2')]
NCBIGene:2177 FANCD2 [('MGI:98726', 62, 'Tgfb2')]
NCBIGene
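
The loop above checks one target gene against every query gene. A minimal sketch of turning that check into a reusable summary, assuming `gene_genes` maps each human gene CURIE to a list of `(id, score, label)` tuples as the printed output suggests; the helper name `support_for` is an assumption introduced here, and the sample rows are copied from the output above:

```python
def support_for(gene_genes, target_id):
    """Map each query gene that returned target_id to the score it received."""
    scores = {}
    for human_gene, hits in gene_genes.items():
        for hit_id, score, _label in hits:
            if hit_id == target_id:
                scores[human_gene] = score
    return scores

# First four rows of the output above
gene_genes = {
    "NCBIGene:80233": [],
    "NCBIGene:79728": [("MGI:98726", 61, "Tgfb2")],
    "NCBIGene:675": [("MGI:98726", 57, "Tgfb2")],
    "NCBIGene:2176": [("MGI:98726", 61, "Tgfb2")],
}

support = support_for(gene_genes, "MGI:98726")
total = sum(support.values())
print(f"{len(support)} of {len(gene_genes)} query genes hit MGI:98726, summed score {total}")
```

Reporting how many query genes contribute to a summed score helps separate a hit supported by most of the set from one driven by a few high individual scores.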

In [23]:
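## Version 2 : Get orthologs first
# Start from the mouse gene MGI:88276 (human ortholog HGNC:2514, CTNNB1, per the results above)
# and search for phenotypically similar human ("9606") entities.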
phenotypes = get_phenotype_from_gene("MGI:88276")
get_phenotypically_similar_genes(phenotypes, "9606")

[('MONDO:0008112', 58, 'Goldenhar syndrome'),
 ('MONDO:0009046', 58, 'Fraser syndrome'),
 ('MONDO:0008965', 58, 'CHARGE syndrome'),
 ('MONDO:0009910', 57, 'Wiedemann-Rautenstrauch syndrome'),
 ('MONDO:0008678', 57, 'Williams syndrome'),
 ('MONDO:0010561', 57, 'Coffin-Lowry syndrome'),
 ('MONDO:0021002', 57, 'syndactyly (disease)'),
 ('MONDO:0019391', 56, 'Fanconi anemia'),
 ('MONDO:0002457', 56, 'Treacher-Collins syndrome'),
 ('MONDO:0009997', 56, 'Roberts syndrome'),
 ('MONDO:0002378', 56, 'dermoid cyst'),
 ('MONDO:0013099',
  56,
  'combined pituitary hormone deficiencies, genetic forms'),
 ('MONDO:0009736', 56, 'Neu-Laxova syndrome 1'),
 ('MONDO:0010731', 56, 'Simpson-Golabi-Behmel syndrome'),
 ('MONDO:0003119', 56, 'histiocytoid hemangioma'),
 ('MONDO:0002933', 56, 'osteosclerosis'),
 ('MONDO:0005096', 56, 'squamous cell carcinoma'),
 ('MONDO:0002601', 55, 'teratoma'),
 ('MONDO:0007534', 55, 'Beckwith-Wiedemann syndrome'),
 ('MONDO:0010802',
  55,
  'pancreatic hypoplasia-diabetes-