# OmicIDX analytical capabilities

The OmicIDX REST API can compute very fast summaries of query
results. In practice, this functionality is useful for understanding
large sets of OmicIDX records without having to retrieve every
matching record. In this notebook, I will give examples
of how to use this functionality to perform basic reporting.

In [1]:
import requests

In [38]:
from typing import List

class OResult(object):
    def __init__(self, results):
        self._data = results
     
    @property
    def hits(self) -> List[dict]:
        """Returns the list of records (hits) included in this response"""
        return self._data['hits']
    
    @property
    def total(self):
        """Returns the total number of records found"""
        return self._data['stats']['total']
    
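    @property
    def facets(self):
        """Returns the facet summaries, keyed by facet name"""
        return self._data['facets']

    @property
    def facet_names(self):
        """Returns the names of the facets present in the response"""
        return list(self._data['facets'].keys())

    @property
    def cursor(self):
        """Returns the cursor value supplied with the response"""
        return self._data['cursor']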
    

class OClient(object):
    def __init__(self, base_url="https://api.omicidx.cancerdatasci.org"):
        self.base_url = base_url
        
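    def get(self, path, **query_parameters):
        # Raise on HTTP or network errors rather than printing "error" and returning None
        response = requests.get(self.base_url + path, params=query_parameters)
        response.raise_for_status()
        return OResult(results=response.json())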
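
The accessors on `OResult` imply a response body with at least the top-level keys `hits`, `stats`, `facets`, and `cursor`. As a minimal offline sketch of that shape, the payload below is hand-built: the total and the titles are taken from the query output in this notebook, the `cursor` value is a hypothetical placeholder, and a real response may well carry more fields. `summarize_payload` is an illustrative helper, not part of the API.

```python
# Hand-built payload mirroring the keys that OResult reads.
# Assumption: a real response may contain additional fields.
sample_payload = {
    "hits": [
        {"title": "Bacillus subtilis subsp. natto BEST195 genome sequencing project"},
        {"title": "Subsurface mine microbial mat metagenome"},
    ],
    "stats": {"total": 226283},
    "facets": {},  # assumed to be empty when no facets are requested
    "cursor": "HYPOTHETICAL-CURSOR-TOKEN",  # placeholder, not a real token
}


def summarize_payload(payload: dict) -> dict:
    """Collect the summary fields without inspecting individual records."""
    return {
        "total": payload["stats"]["total"],
        "returned": len(payload["hits"]),
        "facet_names": list(payload.get("facets", {}).keys()),
        "has_cursor": bool(payload.get("cursor")),
    }


summary = summarize_payload(sample_payload)
print(summary)
```

The point of the sketch is that `total` describes the whole result set, while `hits` holds only the records returned in this response.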

In [41]:
client = OClient()
res = client.get('/sra/studies')
print("total hits: ", res.total)
print("Titles.............")
for hit in res.hits:
    print(hit['title'])

total hits:  226283
Titles.............
Bacillus subtilis subsp. natto BEST195 genome sequencing project
Model organism for prokaryotic cell differentiation and development
Comprehensive identification and characterization of the nucleosome             structure
Comprehensive identification and characterization of the transcripts, their             expression levels and sub-cellular localizations
Comprehensive identification and characterization of the transcripts, their             expression levels and sub-cellular localizations
Comprehensive identification and characterization of the transcripts, their             expression levels and sub-cellular localizations
Comprehensive identification and characterization of the binding sites of             polymerase II
Comprehensive identification and characterization of the binding sites of             polymerase II
Subsurface mine microbial mat metagenome
Oryza sativa Japonica group genome sequencing project by QTL Genomics Research Center
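
Only ten titles come back for 226283 total hits, so results are evidently delivered a page at a time, and the response carries a `cursor` value. The sketch below shows one way such a cursor could be followed. It rests on assumptions that should be checked against the API documentation: that the cursor is passed back as a query parameter named `cursor`, and that a missing or empty cursor marks the last page. `iter_hits`, `fake_fetch`, and the fake pages are hypothetical and run entirely offline.

```python
from typing import Callable, Iterator


def iter_hits(fetch_page: Callable[[dict], dict], max_pages: int = 3,
              cursor_param: str = "cursor") -> Iterator[dict]:
    """Yield hits page by page, following the cursor returned with each page.

    `fetch_page` takes a dict of query parameters and returns a decoded
    response body. `cursor_param` is an ASSUMED query-parameter name.
    """
    params: dict = {}
    for _ in range(max_pages):
        page = fetch_page(params)
        yield from page["hits"]
        cursor = page.get("cursor")
        if not cursor:  # assumption: no cursor means this was the last page
            break
        params = {cursor_param: cursor}


# Offline stand-in for the API so the sketch runs without network access.
fake_pages = {
    None: {"hits": [{"accession": "DRP000001"}, {"accession": "DRP000002"}],
           "cursor": "page-2"},
    "page-2": {"hits": [{"accession": "DRP000003"}], "cursor": None},
}


def fake_fetch(params: dict) -> dict:
    # Looks up the page by the default (assumed) "cursor" parameter.
    return fake_pages[params.get("cursor")]


accessions = [hit["accession"] for hit in iter_hits(fake_fetch)]
print(accessions)
```

The `max_pages` cap keeps an exploratory loop from walking the entire result set by accident. For summaries over all matching records, the facets shown next avoid paging altogether.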

In [42]:
res2 = client.get('/sra/studies', facets=['center_name', 'study_type'])

In [43]:
facets = res2.facets
for facet_name in res2.facet_names:
    print(f'facet: {facet_name}')
    facet = res2.facets[facet_name]
    for item in facet['buckets']:
        print(f"    {item['key']} : {item['doc_count']}")

facet: center_name
    BioProject : 140823
    GEO : 38518
    DOE - JOINT GENOME INSTITUTE : 2590
    UMIGS : 2557
    JGI : 2368
    JCVI : 1634
    WUGSC : 1402
    BI : 976
    SC : 969
    The Wellcome Trust Sanger Institute : 883
facet: study_type
    Other : 95040
    Whole Genome Sequencing : 66450
    Metagenomics : 31662
    Transcriptome Analysis : 31087
    Population Genomics : 807
    Epigenetics : 624
    Cancer Genomics : 290
    Exome Sequencing : 279
    Pooled Clone Sequencing : 31
    Synthetic Genomics : 10
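
Bucket counts are easier to interpret relative to the total number of hits. The sketch below computes each bucket's share and the number of records that fall outside the listed buckets. `facet_report` is a hypothetical helper; the bucket values are copied from the `center_name` output above, and the total of 226283 is taken from the first query, on the assumption that the faceted query matches the same set of studies.

```python
def facet_report(buckets, total):
    """Return per-bucket shares of `total` plus the count not covered by any bucket."""
    rows = [
        {"key": b["key"], "doc_count": b["doc_count"], "share": b["doc_count"] / total}
        for b in buckets
    ]
    uncovered = total - sum(b["doc_count"] for b in buckets)
    return rows, uncovered


# center_name buckets copied from the facet output above
center_buckets = [
    {"key": "BioProject", "doc_count": 140823},
    {"key": "GEO", "doc_count": 38518},
    {"key": "DOE - JOINT GENOME INSTITUTE", "doc_count": 2590},
    {"key": "UMIGS", "doc_count": 2557},
    {"key": "JGI", "doc_count": 2368},
    {"key": "JCVI", "doc_count": 1634},
    {"key": "WUGSC", "doc_count": 1402},
    {"key": "BI", "doc_count": 976},
    {"key": "SC", "doc_count": 969},
    {"key": "The Wellcome Trust Sanger Institute", "doc_count": 883},
]

rows, uncovered = facet_report(center_buckets, total=226283)
for row in rows[:3]:
    print(f"{row['key']:<30} {row['doc_count']:>7} {row['share']:6.1%}")
print("studies outside the listed buckets:", uncovered)  # 33563
```

The ten `center_name` buckets sum to 192720, so 33563 studies belong to centers that are not listed. That suggests the facet returns only the most frequent values, although the notebook does not state the limit.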


In [44]:
import pandas as pd

In [45]:
pd.DataFrame.from_dict(res2.facets['center_name']['buckets'])

                                   key  doc_count
0                           BioProject     140823
1                                  GEO      38518
2         DOE - JOINT GENOME INSTITUTE       2590
3                                UMIGS       2557
4                                  JGI       2368
5                                 JCVI       1634
6                                WUGSC       1402
7                                   BI        976
8                                   SC        969
9  The Wellcome Trust Sanger Institute        883


In [33]:
pd.DataFrame.from_dict(res2.hits)

Unnamed: 0,pubmed_ids,attributes,BioProject,study_type,alias,identifiers,abstract,accession,title,center_name,...,published,received,sample_count,experiment_count,run_count,total_bases,total_spots,mean_bases_per_run,taxon_ids,description
0,"[20398357, 25329997]",[],PRJDA38027,Whole Genome Sequencing,DRP000001,"[{'id': 'PRJDA38027', 'namespace': 'BioProject...",<b><i>Bacillus subtilis</i> subsp. <i>natto</i...,DRP000001,Bacillus subtilis subsp. natto BEST195 genome ...,KEIO,...,2015-07-31T15:20:44,2009-06-20T02:48:02,1,1,1,730668528,10148174,730668500.0,[645657],
1,[20398357],[],PRJDA39275,Whole Genome Sequencing,DRP000002,"[{'id': 'PRJDA39275', 'namespace': 'BioProject...",,DRP000002,Model organism for prokaryotic cell differenti...,KEIO,...,2010-03-24T03:11:55,2009-08-04T07:37:05,1,1,1,598805064,8316737,598805100.0,[224308],
2,[20400770],[],PRJDA34559,Transcriptome Analysis,DRP000003,"[{'id': 'PRJDA34559', 'namespace': 'BioProject...",Comprehensive identification and characterizat...,DRP000003,Comprehensive identification and characterizat...,UT-MGS,...,2010-10-14T04:53:29,2009-08-06T07:54:04,1,1,9,7461074340,207252065,829008300.0,[9606],Although recent studies have revealed that the...
3,[20400770],[],PRJDA34559,Transcriptome Analysis,DRP000004,"[{'id': 'PRJDA34559', 'namespace': 'BioProject...",Comprehensive identification and characterizat...,DRP000004,Comprehensive identification and characterizat...,UT-MGS,...,2010-10-14T04:54:44,2009-08-06T07:55:04,1,1,3,1976458608,54901628,658819500.0,[9606],Although recent studies have revealed that the...
4,[20400770],[],PRJDA34559,Transcriptome Analysis,DRP000005,"[{'id': 'PRJDA34559', 'namespace': 'BioProject...",Comprehensive identification and characterizat...,DRP000005,Comprehensive identification and characterizat...,UT-MGS,...,2010-10-14T04:55:26,2009-08-06T07:55:24,1,1,3,1668749004,46354139,556249700.0,[9606],Although recent studies have revealed that the...
5,[20400770],[],PRJDA34559,Transcriptome Analysis,DRP000006,"[{'id': 'PRJDA34559', 'namespace': 'BioProject...",Comprehensive identification and characterizat...,DRP000006,Comprehensive identification and characterizat...,UT-MGS,...,2010-10-14T04:56:11,2009-08-06T07:55:39,1,1,3,1696349916,47120831,565450000.0,[9606],Although recent studies have revealed that the...
6,[20400770],[],PRJDA34559,Transcriptome Analysis,DRP000007,"[{'id': 'PRJDA34559', 'namespace': 'BioProject...",Comprehensive identification and characterizat...,DRP000007,Comprehensive identification and characterizat...,UT-MGS,...,2010-10-14T04:57:44,2009-08-06T07:55:32,1,2,2,1702328760,47286910,851164400.0,[9606],Although recent studies have revealed that the...
7,[20400770],[],PRJDA34559,Transcriptome Analysis,DRP000008,"[{'id': 'PRJDA34559', 'namespace': 'BioProject...",Comprehensive identification and characterizat...,DRP000008,Comprehensive identification and characterizat...,UT-MGS,...,2010-10-14T04:58:50,2009-08-06T07:55:45,1,2,2,1643442516,45651181,821721300.0,[9606],Although recent studies have revealed that the...
8,[22303444],[],PRJNA39577,Metagenomics,DRP000009,"[{'id': 'PRJNA39577', 'namespace': 'BioProject...",This is a metagenomic project to figure out th...,DRP000009,Subsurface mine microbial mat metagenome,JAMSTEC,...,2011-03-30T04:00:00,2009-08-21T03:01:03,1,1,3,259621154,1016012,86540380.0,[527640],This is a metagenomic project to figure out th...
9,[20423466],[],PRJDA39809,Whole Genome Sequencing,DRP000010,"[{'id': 'PRJDA39809', 'namespace': 'BioProject...",Oryza sativa has important syntenic relationsh...,DRP000010,Oryza sativa Japonica group genome sequencing ...,NIAS,...,2015-07-31T15:20:47,2009-08-21T06:09:02,1,1,9,8941218259,272124169,993468700.0,[39947],
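
The hits table covers only the records returned in this response, so any tally over it is a client-side summary of those ten studies, not of all matching studies. As a small sketch of that distinction, the code below sums `run_count` per `study_type` using three fields copied from the table above. `runs_by_study_type` is a hypothetical helper written with the standard library only.

```python
from collections import defaultdict

# accession, study_type and run_count copied from the ten hits shown above
hits = [
    {"accession": "DRP000001", "study_type": "Whole Genome Sequencing", "run_count": 1},
    {"accession": "DRP000002", "study_type": "Whole Genome Sequencing", "run_count": 1},
    {"accession": "DRP000003", "study_type": "Transcriptome Analysis", "run_count": 9},
    {"accession": "DRP000004", "study_type": "Transcriptome Analysis", "run_count": 3},
    {"accession": "DRP000005", "study_type": "Transcriptome Analysis", "run_count": 3},
    {"accession": "DRP000006", "study_type": "Transcriptome Analysis", "run_count": 3},
    {"accession": "DRP000007", "study_type": "Transcriptome Analysis", "run_count": 2},
    {"accession": "DRP000008", "study_type": "Transcriptome Analysis", "run_count": 2},
    {"accession": "DRP000009", "study_type": "Metagenomics", "run_count": 3},
    {"accession": "DRP000010", "study_type": "Whole Genome Sequencing", "run_count": 9},
]


def runs_by_study_type(records):
    """Sum run_count per study_type over the given records."""
    totals = defaultdict(int)
    for record in records:
        totals[record["study_type"]] += record["run_count"]
    return dict(totals)


summary = runs_by_study_type(hits)
for study_type, runs in summary.items():
    print(f"{study_type}: {runs}")
```

For counts across the full result set, the server-side facets remain the better tool, since they do not require downloading the records first.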
