## Orange QB1.4

### Query:
Return sequences of FA genes associated with functional post-translational modifications.

In [1]:
# import BioThings Explorer (currently a local module; it will become an independent Python package)
from visual import pathViewer

In [2]:
# pathViewer is a Python class that graphically displays API connection maps and explores bio-entity relationships
k = pathViewer()

#### Show All Available IDs
Call the **available_ids** function to retrieve all IDs used in BioThings Explorer, along with their descriptions.

In [3]:
k.available_ids()

Preferred Name,URI,Description,Identifier pattern,Type
DrugBank,http://identifiers.org/drugbank/,"The DrugBank database is a bioinformatics and chemoinformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. This collection references drug information.",^DB\d{5}$,Entity
PubMed,http://identifiers.org/pubmed/,PubMed is a service of the U.S. National Library of Medicine that includes citations from MEDLINE and other life science journals for biomedical articles back to the 1950s.,^\d+$,Entity
CHEMBL ID,http://identifiers.org/chembl/,,,Entity
dbSNP,http://identifiers.org/dbsnp/,The dbSNP database is a repository for both single base nucleotide subsitutions and short deletion and insertion polymorphisms.,^rs\d+$,Entity
RXCUI,http://identifiers.org/rxcui/,,,Entity
Protein Data Bank,http://identifiers.org/pdb/,The Protein Data Bank is the single worldwide archive of structural data of biological macromolecules.,^[0-9][A-Za-z0-9]{3}$,Entity
Experimental Factor Ontology,http://identifiers.org/efo/,"The Experimental Factor Ontology (EFO) provides a systematic description of many experimental variables available in EBI databases. It combines parts of several biological ontologies, such as anatomy, disease and chemical compounds. The scope of EFO is to support the annotation, analysis and visualization of data handled by the EBI Functional Genomics Team.",^\d{7}$,Entity
RefSeq,http://identifiers.org/refseq/,"The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products.",^((AC|AP|NC|NG|NM|NP|NR|NT|NW|XM|XP|XR|YP|ZP)_\d+|(NZ\_[A-Z]{4}\d+))(\.\d+)?$,Entity
ClinVar Variant,http://identifiers.org/clinvar/,"ClinVar archives reports of relationships among medically important variants and phenotypes. It records human variation, interpretations of the relationship specific variations to human health, and supporting evidence for each interpretation. Each ClinVar record (RCV identifier) represents an aggregated view of interpretations of the same variation and condition from one or more submitters. Submissions for individual variation/phenotype combinations (SCV identifier) are also collected and made available separately. This collection references the Variant identifier.",^\d+$,Entity
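
The Identifier pattern column holds regular expressions, so a candidate ID can be checked against its pattern before it is used as an input. The following is a minimal sketch using only Python's `re` module, with three patterns copied from the table above. The `ID_PATTERNS` dictionary and the `matches_pattern` helper are illustrative names and are not part of BioThings Explorer.

```python
import re

# Patterns copied from the "Identifier pattern" column above.
ID_PATTERNS = {
    "DrugBank": r"^DB\d{5}$",
    "dbSNP": r"^rs\d+$",
    "Protein Data Bank": r"^[0-9][A-Za-z0-9]{3}$",
}

def matches_pattern(id_type, value):
    """Return True if value matches the listed pattern for id_type (hypothetical helper)."""
    return re.match(ID_PATTERNS[id_type], value) is not None

print(matches_pattern("DrugBank", "DB00001"))  # True
print(matches_pattern("dbSNP", "12345"))       # False: the 'rs' prefix is missing
```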


#### Show How APIs/Endpoints/Bio-Entities can be connected together
The graph is **interactive**.

In [4]:
k.show_api_road_map()

#### Find Path
The query above asks for a path connecting **NCBI Gene IDs** to **PTM info**.
The **find_path** function finds how two biological entities or concepts can be connected through API endpoints.

In [5]:
k.find_path('NCBI Gene', 'PTM Object')

Path 0: [{'input': 'NCBI Gene', 'endpoint': 'http://mygene.info/v3/gene/geneid', 'output': 'UniProt Knowledgebase'}, {'input': 'UniProt Knowledgebase', 'endpoint': 'http://www.ebi.ac.uk/proteins/api/features/accession?categories=PTM', 'output': 'PTM Object'}]
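
Each path is a list of steps, and the output type of one step is the input type of the next. As a rough sketch of how to read that structure, the snippet below copies Path 0 from the output above and prints one line per step. `describe_path` is a hypothetical helper written for this illustration, not a pathViewer method.

```python
# Path 0 as printed above, copied verbatim.
path = [
    {'input': 'NCBI Gene',
     'endpoint': 'http://mygene.info/v3/gene/geneid',
     'output': 'UniProt Knowledgebase'},
    {'input': 'UniProt Knowledgebase',
     'endpoint': 'http://www.ebi.ac.uk/proteins/api/features/accession?categories=PTM',
     'output': 'PTM Object'},
]

def describe_path(steps):
    """Return one readable line per step, checking that each output feeds the next input."""
    lines = []
    for i, step in enumerate(steps):
        if i > 0 and steps[i - 1]['output'] != step['input']:
            raise ValueError(f"step {i + 1} does not accept the output of step {i}")
        lines.append(f"{i + 1}. {step['input']} -> {step['output']} via {step['endpoint']}")
    return lines

for line in describe_path(path):
    print(line)
```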


#### Find output for single input
Use the **find_output** function to find the output for a single input along the selected path.

In [6]:
# 675 is an NCBI Gene ID in the FA gene list
# find_output takes two parameters:
# path: the path to follow, selected from the results of 'find_path'
# value: the input value
k.find_output(path=k.paths[0], value='675')
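
Conceptually, following a path means feeding the outputs of one step into the next. The sketch below illustrates that chaining idea only; it is not how pathViewer is implemented, which this notebook does not show. The `follow_path` function, the step names, and the resolver functions are all hypothetical, and the resolvers return placeholder strings rather than real API data.

```python
def follow_path(steps, value, resolvers):
    """Feed value through each step in turn; a step may yield several outputs per input."""
    current = [value]
    for step in steps:
        resolve = resolvers[step['endpoint']]
        next_values = []
        for v in current:
            next_values.extend(resolve(v))
        current = next_values
    return current

# Hypothetical stand-ins for the two endpoints; no network calls are made.
steps = [{'endpoint': 'gene_to_uniprot'}, {'endpoint': 'uniprot_to_ptm'}]
resolvers = {
    'gene_to_uniprot': lambda gene_id: [f'uniprot-for-{gene_id}'],
    'uniprot_to_ptm': lambda acc: [f'ptm-a-of-{acc}', f'ptm-b-of-{acc}'],
}

print(follow_path(steps, '675', resolvers))
# ['ptm-a-of-uniprot-for-675', 'ptm-b-of-uniprot-for-675']
```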

#### Print result summary

In [7]:
k.result_summary()

Your exploration starts from NCBI Gene: 675. 
 It goes through 2 API Endpoints. 
 The final output comes from API Endpoint http://www.ebi.ac.uk/proteins/api/features/accession?categories=PTM. 
 You can access the final output by calling the 'final_results' object in pathViewer Class.



#### Explore results
The final results are stored in a dictionary whose keys are the inputs and whose values are the final outputs.

In [8]:
k.final_results

{'675': [{'begin': '70',
   'category': 'PTM',
   'description': 'Phosphoserine',
   'end': '70',
   'evidences': [{'code': 'ECO:0000244',
     'source': {'alternativeUrl': 'http://europepmc.org/abstract/MED/23186163',
      'id': '23186163',
      'name': 'PubMed',
      'url': 'http://www.ncbi.nlm.nih.gov/pubmed/23186163'}}],
   'type': 'MOD_RES'},
  {'begin': '445',
   'category': 'PTM',
   'description': 'Phosphoserine',
   'end': '445',
   'evidences': [{'code': 'ECO:0000244',
     'source': {'alternativeUrl': 'http://europepmc.org/abstract/MED/23186163',
      'id': '23186163',
      'name': 'PubMed',
      'url': 'http://www.ncbi.nlm.nih.gov/pubmed/23186163'}}],
   'type': 'MOD_RES'},
  {'begin': '492',
   'category': 'PTM',
   'description': 'Phosphoserine',
   'end': '492',
   'evidences': [{'code': 'ECO:0000244',
     'source': {'alternativeUrl': 'http://europepmc.org/abstract/MED/23186163',
      'id': '23186163',
      'name': 'PubMed',
      'url': 'http://www.ncbi.nlm.nih

#### Explore a list of inputs
The FA gene list loaded here contains 27 NCBI Gene IDs. All of them can be passed to the **find_output** function at once.

In [9]:
import pandas as pd

def get_fa_gene_list():
    """Return the FA genes from the GitHub text file as bare NCBI Gene IDs."""
    gene_list_url = 'https://raw.githubusercontent.com/NCATS-Tangerine/cq-notebooks/master/FA_gene_sets/FA_4_all_genes.txt'
    # the file has no header row; the IDs are in the first column
    gl = pd.read_table(gene_list_url, header=None)
    # strip the 'NCBIGene:' prefix so only the numeric ID remains
    return [_gene.replace('NCBIGene:', '') for _gene in gl[0].values.tolist()]

gene_list = get_fa_gene_list()

print(gene_list)

['2175', '2187', '2176', '2178', '2188', '2189', '55120', '57697', '2177', '55215', '29089', '675', '83990', '79728', '5889', '84464', '2072', '5888', '672', '10459', '7516', '55159', '80233', '91442', '199990', '378708', '201254']


In [10]:
k.find_output(path=k.paths[0], value=gene_list)

In [11]:
k.result_summary()

Your exploration starts from NCBI Gene: ['2175', '2187', '2176', '2178', '2188', '2189', '55120', '57697', '2177', '55215', '29089', '675', '83990', '79728', '5889', '84464', '2072', '5888', '672', '10459', '7516', '55159', '80233', '91442', '199990', '378708', '201254']. 
 It goes through 2 API Endpoints. 
 The final output comes from API Endpoint http://www.ebi.ac.uk/proteins/api/features/accession?categories=PTM. 
 You can access the final output by calling the 'final_results' object in pathViewer Class.



Each key in final_results represents one input (e.g. an NCBI Gene ID).
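
Because every input maps to a list of PTM records, a per-gene count is a quick way to see which inputs returned nothing. A small sketch, using an abbreviated sample in the same shape as final_results (two fields per record, values taken from the notebook output); `sample_results` is an illustrative name and would be replaced by `k.final_results` in the notebook:

```python
# Abbreviated sample in the shape of final_results: input ID -> list of PTM records.
sample_results = {
    '10459': [],
    '199990': [
        {'begin': '113', 'description': 'Phosphoserine'},
        {'begin': '137', 'description': 'Phosphoserine'},
    ],
}

# Number of PTM records returned for each input.
ptm_counts = {gene_id: len(records) for gene_id, records in sample_results.items()}

# Inputs for which the path returned no PTM records.
genes_without_ptm = sorted(g for g, n in ptm_counts.items() if n == 0)

print(ptm_counts)         # {'10459': 0, '199990': 2}
print(genes_without_ptm)  # ['10459']
```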

In [12]:
k.final_results

{'10459': [],
 '199990': [{'begin': '113',
   'category': 'PTM',
   'description': 'Phosphoserine',
   'end': '113',
   'evidences': [{'code': 'ECO:0000244',
     'source': {'alternativeUrl': 'http://europepmc.org/abstract/MED/23186163',
      'id': '23186163',
      'name': 'PubMed',
      'url': 'http://www.ncbi.nlm.nih.gov/pubmed/23186163'}}],
   'type': 'MOD_RES'},
  {'begin': '137',
   'category': 'PTM',
   'description': 'Phosphoserine',
   'end': '137',
   'evidences': [{'code': 'ECO:0000244',
     'source': {'alternativeUrl': 'http://europepmc.org/abstract/MED/23186163',
      'id': '23186163',
      'name': 'PubMed',
      'url': 'http://www.ncbi.nlm.nih.gov/pubmed/23186163'}}],
   'type': 'MOD_RES'}],
 '201254': [{'begin': '1',
   'category': 'PTM',
   'description': 'N-acetylmethionine',
   'end': '1',
   'evidences': [{'code': 'ECO:0000244',
     'source': {'alternativeUrl': 'http://europepmc.org/abstract/MED/22814378',
      'id': '22814378',
      'name': 'PubMed',
      

#### Retrieve output for a specific NCBI Gene ID in the FA Gene List

In [13]:
k.final_results['199990']

[{'begin': '113',
  'category': 'PTM',
  'description': 'Phosphoserine',
  'end': '113',
  'evidences': [{'code': 'ECO:0000244',
    'source': {'alternativeUrl': 'http://europepmc.org/abstract/MED/23186163',
     'id': '23186163',
     'name': 'PubMed',
     'url': 'http://www.ncbi.nlm.nih.gov/pubmed/23186163'}}],
  'type': 'MOD_RES'},
 {'begin': '137',
  'category': 'PTM',
  'description': 'Phosphoserine',
  'end': '137',
  'evidences': [{'code': 'ECO:0000244',
    'source': {'alternativeUrl': 'http://europepmc.org/abstract/MED/23186163',
     'id': '23186163',
     'name': 'PubMed',
     'url': 'http://www.ncbi.nlm.nih.gov/pubmed/23186163'}}],
  'type': 'MOD_RES'}]
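
The nested records can be flattened into simple rows for further analysis. The sketch below copies the two records returned for '199990' above and extracts the position, the description, and the supporting PubMed IDs. `flatten_ptm_records` is a hypothetical helper, and converting `begin` and `end` with `int` assumes the positions are numeric, as they are in the output shown here.

```python
# The two records returned for '199990' above, copied verbatim.
records = [
    {'begin': '113', 'category': 'PTM', 'description': 'Phosphoserine', 'end': '113',
     'evidences': [{'code': 'ECO:0000244',
                    'source': {'alternativeUrl': 'http://europepmc.org/abstract/MED/23186163',
                               'id': '23186163', 'name': 'PubMed',
                               'url': 'http://www.ncbi.nlm.nih.gov/pubmed/23186163'}}],
     'type': 'MOD_RES'},
    {'begin': '137', 'category': 'PTM', 'description': 'Phosphoserine', 'end': '137',
     'evidences': [{'code': 'ECO:0000244',
                    'source': {'alternativeUrl': 'http://europepmc.org/abstract/MED/23186163',
                               'id': '23186163', 'name': 'PubMed',
                               'url': 'http://www.ncbi.nlm.nih.gov/pubmed/23186163'}}],
     'type': 'MOD_RES'},
]

def flatten_ptm_records(records):
    """Return (begin, end, description, pubmed_ids) tuples, one per PTM record."""
    rows = []
    for rec in records:
        # Keep only evidence entries whose source is PubMed.
        pubmed_ids = [ev['source']['id'] for ev in rec.get('evidences', [])
                      if ev.get('source', {}).get('name') == 'PubMed']
        rows.append((int(rec['begin']), int(rec['end']), rec['description'], pubmed_ids))
    return rows

for row in flatten_ptm_records(records):
    print(row)
# (113, 113, 'Phosphoserine', ['23186163'])
# (137, 137, 'Phosphoserine', ['23186163'])
```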