# Use Cases

## Retrieve a single entry

We are going to start from scratch with a nuts-and-bolts approach to accessing _OmicIDX_ data. 
For this first example, let us assume that we know the accession number of an SRA study of interest.
In this case, the example accession is `SRP012682`. Most folks will not recognize the significance
of a specific SRA accession at a glance, but accessions can be broken down into components. In particular, the first three
letters tell us the source of the accession as well as the type.

The first letter will be an `S`, an `E`, or a `D` and signify that the accession originates with NCBI (S) in the US, 
EBI (E) in Europe, or DDBJ (D) in Japan, respectively. 

The third letter signifies the type of record in the SRA database. 

- `P` for study (or project)
- `S` for sample
- `X` for experiment
- `R` for run

So, the accession `SRP012682` means that we are going to be fetching information about an SRA study (`P`) from a
project that was originally submitted to NCBI (`S`) in the US.
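The decoding rules above fit in a few lines of Python. The helper below, `describe_accession`, is a hypothetical name used only for illustration (it is not part of _OmicIDX_). It inspects only the first and third letters and does not check that the accession actually exists.

```python
# Hypothetical helper (not part of OmicIDX): decode the source archive and
# record type from an SRA accession using its first and third letters.
SOURCES = {'S': 'NCBI', 'E': 'EBI', 'D': 'DDBJ'}
RECORD_TYPES = {'P': 'study', 'S': 'sample', 'X': 'experiment', 'R': 'run'}


def describe_accession(accession):
    """Return (source, record_type) for an accession such as 'SRP012682'."""
    if len(accession) < 4:
        raise ValueError(f'accession too short: {accession!r}')
    source = SOURCES.get(accession[0])
    record_type = RECORD_TYPES.get(accession[2])
    if source is None or record_type is None:
        raise ValueError(f'unrecognized accession prefix: {accession[:3]!r}')
    return source, record_type


print(describe_accession('SRP012682'))  # ('NCBI', 'study')
print(describe_accession('SRS000237'))  # ('NCBI', 'sample')
```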

We will be using the Python `requests` library to access the _OmicIDX_ API. As a first step, be sure to install the
library:

```bash
pip install requests
```

Once installed, we can proceed with this introduction.

```python
import requests
```

In general, API requests will include at least the `host` part of a URL and the `path` part. In this case, an additional component
of the API call is a *known* accession. Here, I assign these separate components to variables and then concatenate them into the full URL.

```python
host = 'http://api.omicidx.cancerdatasci.org'
path = '/sra/study/'
accession = 'SRP012682'
url = host + path + accession
print(url)
```

```
http://api.omicidx.cancerdatasci.org/sra/study/SRP012682
```


Now, we can use the `requests.get()` function to retrieve the contents of the URL.

```python
response = requests.get(url)
type(response)
```

```
requests.models.Response
```

We can check whether the query "worked" by checking the `status_code` attribute.

```python
response.status_code
```

```
200
```

A status code of `200` means that the API call was successful. We can also look at a few details about the response by examining the response `headers`.

```python
response.headers
```

```
{'date': 'Sun, 08 Dec 2019 19:55:47 GMT', 'server': 'uvicorn', 'content-length': '1736', 'content-type': 'application/json'}
```

In particular, note that the `content-type` is `application/json`. The content type header, if set appropriately by the server,
tells us how to interpret the content. In this case, the content is JSON. The Python requests library includes a convenience
method, `Response.json()`, that parses the JSON body and returns the result (here, a Python `dict`). Perhaps it is easier to understand by looking at an
example.

```python
study_record = response.json()
```

```python
study_record
```

```
{'pubmed_ids': [],
 'attributes': [{'value': 'PRJNA75897', 'tag': 'parent_bioproject'}],
 'BioProject': 'PRJNA75899',
 'study_type': 'Other',
 'alias': 'phs000424',
 'identifiers': [{'id': 'PRJNA75899', 'namespace': 'BioProject'},
  {'id': 'phs000424', 'namespace': 'dbGaP'},
  {'id': 'phs000424', 'namespace': 'dbGaP'}],
 'abstract': 'Lay Description.  The aim of the Genotype-Tissue Expression (GTEx) Project is to increase our understanding of how changes in our genes affect human health and disease with the ultimate goal of improving health care for future generations. GTEx will create a database that researchers can use to study how inherited changes in genes lead to common diseases.  GTEx researchers are studying genes in different tissues obtained from many different people. The GTEx project also includes a study of the GTEx donor consent process - this study will help ensure that the consent process and other aspects of the project effectively address the concerns and expectations 
```

So, the record that we grabbed from the API is the record for the GTEx project. See the `title` key for "proof".

```python
study_record['title']
```

```
'Genotype-Tissue Expression (GTEx) Common Fund Project'
```
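Some fields in the record are lists of small dicts rather than plain values. In the study record above, for example, `identifiers` lists the same dbGaP id twice. A minimal sketch for collapsing such a list into a `namespace -> ids` mapping, dropping exact duplicates, might look like this. The helper name `identifiers_by_namespace` is hypothetical, and the input is copied from the record printed above; with live data you would pass `study_record['identifiers']`.

```python
# Hypothetical helper: group an OmicIDX 'identifiers' list by namespace,
# keeping each id only once and preserving first-seen order.
def identifiers_by_namespace(identifiers):
    grouped = {}
    for item in identifiers:
        ids = grouped.setdefault(item['namespace'], [])
        if item['id'] not in ids:
            ids.append(item['id'])
    return grouped


# 'identifiers' as returned for SRP012682 in the output above
identifiers = [{'id': 'PRJNA75899', 'namespace': 'BioProject'},
               {'id': 'phs000424', 'namespace': 'dbGaP'},
               {'id': 'phs000424', 'namespace': 'dbGaP'}]
print(identifiers_by_namespace(identifiers))
# {'BioProject': ['PRJNA75899'], 'dbGaP': ['phs000424']}
```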

The API documentation for the "endpoint" we just used to fetch a specific study is [here](http://api.omicidx.cancerdatasci.org/docs#/SRA/get_study_accession_sra_study__accession__get).
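The status-code check, the content-type check, and the JSON parsing used in this walkthrough can be bundled into one function. This is a hedged sketch using only the standard library so it can be tried offline; `parse_json_response` is a hypothetical name, not an _OmicIDX_ or `requests` function. (For the status part alone, `requests` already provides `Response.raise_for_status()`.)

```python
import json


def parse_json_response(status_code, headers, body):
    """Return the decoded JSON body, or raise if the call did not succeed.

    With requests, call it as:
        parse_json_response(response.status_code, response.headers, response.text)
    """
    # 2xx status codes indicate success.
    if not 200 <= status_code < 300:
        raise RuntimeError(f'API call failed with status {status_code}')
    # Header names are case-insensitive, so normalize them before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    content_type = lowered.get('content-type', '')
    if not content_type.startswith('application/json'):
        raise ValueError(f'expected JSON, got content-type {content_type!r}')
    return json.loads(body)


headers = {'server': 'uvicorn', 'content-type': 'application/json'}
body = '{"title": "Genotype-Tissue Expression (GTEx) Common Fund Project"}'
record = parse_json_response(200, headers, body)
print(record['title'])  # Genotype-Tissue Expression (GTEx) Common Fund Project
```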

Replacing the `study` part of the URL with `sample`, `experiment`, or `run` will perform the same accession
lookup for those types of accessions. For example, we fetch information about a sample by doing:

```python
accession = 'SRS000237'
path = '/sra/sample/'
url = host + path + accession
print(url)
```

```
http://api.omicidx.cancerdatasci.org/sra/sample/SRS000237
```
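Because only the record type changes in the path, and the record type is encoded in the third letter of the accession, the URL can be derived from the accession alone. The sketch below assumes that every accession type follows the `/sra/<type>/<accession>` pattern described above; `accession_url` is a hypothetical helper, not part of the API.

```python
# Hypothetical helper: build an OmicIDX lookup URL from an SRA accession,
# assuming the /sra/<record type>/<accession> path pattern shown above.
HOST = 'http://api.omicidx.cancerdatasci.org'
PATH_BY_LETTER = {'P': 'study', 'S': 'sample', 'X': 'experiment', 'R': 'run'}


def accession_url(accession, host=HOST):
    # The third letter of the accession encodes the record type.
    record_type = PATH_BY_LETTER.get(accession[2:3])
    if record_type is None:
        raise ValueError(f'cannot infer record type from {accession!r}')
    return f'{host}/sra/{record_type}/{accession}'


print(accession_url('SRP012682'))
# http://api.omicidx.cancerdatasci.org/sra/study/SRP012682
print(accession_url('SRS000237'))
# http://api.omicidx.cancerdatasci.org/sra/sample/SRS000237
```

With network access, `requests.get(accession_url('SRS000237')).json()` should then return the same sample record as above.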


```python
response = requests.get(url)
sample_record = response.json()
```

```python
sample_record
```

```
{'attributes': [{'value': 'A6', 'tag': 'strain'},
  {'value': 'type strain of Pseudarthrobacter chlorophenolicus',
   'tag': 'type-material'}],
 'xrefs': [],
 'identifiers': [{'id': 'SAMN00000051', 'namespace': 'BioSample'},
  {'id': 'FOOW_sample', 'namespace': 'DOE Joint Genome Institute'}],
 'status': 'live',
 'insdc': True,
 'organism': 'Pseudarthrobacter chlorophenolicus A6',
 'accession': 'SRS000237',
 'description': 'Arthrobacter chlorophenolicus A6 isolate',
 'BioSample': 'SAMN00000051',
 'alias': 'FOOW_sample',
 'taxon_id': '452863',
 'title': 'Generic sample from Arthrobacter chlorophenolicus A6',
 'lastupdate': '2019-06-20T20:14:28',
 'published': '2008-04-04T12:45:00',
 'received': '2008-04-04T12:45:00',
 'study': {'pubmed_ids': [],
  'attributes': [],
  'BioProject': 'PRJNA20011',
  'study_type': 'Whole Genome Sequencing',
  'alias': 'FOOW_study',
  'identifiers': [{'id': 'PRJNA20011', 'namespace': 'BioProject'},
   {'id': 'FOOW_study', 'namespace': 'JGI'}],
  'abstract': '
```
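The `attributes` field stores tag/value pairs as a list of dicts. Turning that list into an ordinary dict makes individual attributes easier to look up. A minimal sketch follows; `attributes_to_dict` is a hypothetical helper, and letting the last value win when a tag repeats is an assumption about how you want duplicates handled.

```python
def attributes_to_dict(attributes):
    """Convert [{'tag': ..., 'value': ...}, ...] into {tag: value}.

    If a tag appears more than once, the last value wins.
    """
    return {item['tag']: item['value'] for item in attributes}


# 'attributes' as returned for SRS000237 in the output above
attributes = [{'value': 'A6', 'tag': 'strain'},
              {'value': 'type strain of Pseudarthrobacter chlorophenolicus',
               'tag': 'type-material'}]
print(attributes_to_dict(attributes)['strain'])  # A6
```

With live data, `attributes_to_dict(sample_record['attributes'])` should give the same mapping.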

## Finding related records

If we want to find the related samples in the GTEx project, we can use another type of API call that fetches records related to a parent
record. See the [API documentation](http://api.omicidx.cancerdatasci.org/docs#/SRA/get_study_samples_sra_study__accession__samples_get) for details of the next API call.

```python
study_accession = 'SRP012682'
path = f'/sra/study/{study_accession}/samples'
url = host + path
print(url)
```

```
http://api.omicidx.cancerdatasci.org/sra/study/SRP012682/samples
```


```python
response = requests.get(url)
first_ten_gtex_samples = response.json()
```

```python
len(first_ten_gtex_samples['hits'])
```

```
10
```

```python
first_ten_gtex_samples['hits'][0]
```

```
{'attributes': [{'value': 'phs000424', 'tag': 'gap_accession'},
  {'value': 'GTEx', 'tag': 'submitter handle'},
  {'value': 'GTEx', 'tag': 'biospecimen repository'},
  {'value': 'Genotype-Tissue Expression (GTEx)', 'tag': 'study name'},
  {'value': 'Cross-Sectional', 'tag': 'study design'},
  {'value': 'GTEX-WEY5-1826-SM-5CHRT_rep2',
   'tag': 'biospecimen repository sample id'},
  {'value': 'GTEX-WEY5-1826-SM-5CHRT_rep2', 'tag': 'submitted sample id'},
  {'value': 'GTEX-WEY5', 'tag': 'submitted subject id'},
  {'value': '1664162', 'tag': 'gap_sample_id'},
  {'value': '706583', 'tag': 'gap_subject_id'},
  {'value': 'female', 'tag': 'sex'},
  {'value': 'Skin - Sun Exposed (Lower leg)', 'tag': 'body site'},
  {'value': 'Skin', 'tag': 'histological type'},
  {'value': 'RNA:Total RNA', 'tag': 'analyte type'},
  {'value': 'No', 'tag': 'is tumor'},
  {'value': '1', 'tag': 'gap_consent_code'},
  {'value': 'GRU', 'tag': 'gap_consent_short_name'}],
 'xrefs': [],
 'identifiers': [{'id': 'SAMN038
```

```python
first_ten_gtex_samples['stats']  # the maximum reported "total" is currently 10000; I consider this a bug and will fix it
```

```
{'total': 10000, 'took': 512}
```

```python
for hit in first_ten_gtex_samples['hits']:
    print('accession: ', hit['accession'], '\n', '  title: ', hit['title'])
```

```
accession:  SRS1017133 
   title:  Non-tumor RNA:Total RNA sample from Skin - Sun Exposed (Lower leg) of a human female participant in the dbGaP study "Genotype-Tissue Expression (GTEx)"
accession:  SRS1017134 
   title:  Non-tumor RNA:Total RNA sample from Testis of a human male participant in the dbGaP study "Genotype-Tissue Expression (GTEx)"
accession:  SRS1017135 
   title:  Non-tumor RNA:Total RNA sample from Thyroid of a human male participant in the dbGaP study "Genotype-Tissue Expression (GTEx)"
accession:  SRS1017136 
   title:  Non-tumor RNA:Total RNA sample from Thyroid of a human male participant in the dbGaP study "Genotype-Tissue Expression (GTEx)"
accession:  SRS1017137 
   title:  Non-tumor RNA:Total RNA sample from Stomach of a human male participant in the dbGaP study "Genotype-Tissue Expression (GTEx)"
accession:  SRS1017138 
   title:  Non-tumor RNA:Total RNA sample from Esophagus - Mucosa of a human male participant in the dbGaP study "Genotype-Tissue Expression (
```
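Each hit carries a tag/value `attributes` list like the one shown for the first GTEx sample, so the tissue for each sample can be read from the `body site` tag. The hedged sketch below tallies body sites across a list of hits with `collections.Counter`. `count_body_sites` is a hypothetical name, and the two hits are trimmed-down stand-ins containing only the fields the function reads; with live data you would pass `first_ten_gtex_samples['hits']` instead.

```python
from collections import Counter


def count_body_sites(hits):
    """Count samples per 'body site' attribute; hits without the tag are skipped."""
    counts = Counter()
    for hit in hits:
        for item in hit.get('attributes', []):
            if item['tag'] == 'body site':
                counts[item['value']] += 1
    return counts


# Trimmed-down stand-ins for API hits (only the fields used above).
hits = [
    {'accession': 'SRS1017135',
     'attributes': [{'value': 'Thyroid', 'tag': 'body site'}]},
    {'accession': 'SRS1017136',
     'attributes': [{'value': 'Thyroid', 'tag': 'body site'}]},
]
print(count_body_sites(hits))  # Counter({'Thyroid': 2})
```

Note that this only counts the hits in the current response (ten here), not every sample in the study.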

## Searching

[TODO] in more detail

See [search documentation](http://api.omicidx.cancerdatasci.org/docs#/Search) for details.


```python
path = '/sra/studies/search'
url = host + path
```

```python
response = requests.get(url, params={"q": 'cancer AND title:breast'})  # note that the capitalization of "AND" matters
```
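`requests` URL-encodes the `params` dict into the query string of the request URL. To see roughly what is sent, the standard library's `urllib.parse.urlencode` performs the same kind of form encoding (spaces become `+`, `:` becomes `%3A`). After a live request, `response.url` shows the exact URL that `requests` built.

```python
from urllib.parse import urlencode

host = 'http://api.omicidx.cancerdatasci.org'
path = '/sra/studies/search'

# Encode the search query the way a form-style query string expects.
query = urlencode({'q': 'cancer AND title:breast'})
print(query)  # q=cancer+AND+title%3Abreast
print(f'{host}{path}?{query}')
```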

```python
study_search_results = response.json()
print(study_search_results['stats'])
for hit in study_search_results['hits']:
    print(f"accession: {hit['accession']}\n  title: {hit['title']}")
```

```
{'total': 786, 'took': 3.0}
accession: DRP000030
  title: human epigenomics sequencing project of breast cancer and normal cell lines
accession: DRP003592
  title: Effects of 4T1 breast cancer on normal organ gene expressions
accession: DRP003950
  title: Time-course mRNA expression analysis of human breast cancer MCF-7 cells treated with tamoxifen up to 12 weeks
accession: DRP005227
  title: Elucidation of the genome-wide chromatin interactions for therapy-resistance of ER positive breast cancer
accession: DRP005235
  title: Elucidation of chromatin interactions for hormone therapy- resistance of ER positive breast cancer
accession: ERP000258
  title: Mouse Breast Cancer
accession: ERP000380
  title: ChIP-seq for FOXA1, ER and CTCF in breast cancer cell lines
accession: ERP000604
  title: Transcriptome analysis of MCF-7 breast cancer cell population to reveal the transcriptional diversity at the single cell level
accession: ERP001755
  title: Whole exome sequencing suggests much of no
```
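Since the first letter of each accession identifies the archive of origin, search hits can be grouped by source. A minimal sketch follows; `hits_by_source` is a hypothetical helper, and the accessions are copied from the search output above. With live data, `hits_by_source(hit['accession'] for hit in study_search_results['hits'])` should work the same way, although it only covers the hits in the current response rather than all 786 matches.

```python
from collections import Counter

SOURCES = {'S': 'NCBI', 'E': 'EBI', 'D': 'DDBJ'}


def hits_by_source(accessions):
    """Count accessions by archive of origin, based on the first letter."""
    return Counter(SOURCES.get(acc[:1], 'unknown') for acc in accessions)


# Accessions copied from the search output above.
accessions = ['DRP000030', 'DRP003592', 'DRP003950', 'DRP005227',
              'DRP005235', 'ERP000258', 'ERP000380', 'ERP000604', 'ERP001755']
print(hits_by_source(accessions))  # Counter({'DDBJ': 5, 'EBI': 4})
```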

## Basic analytics


See the [search documentation](http://api.omicidx.cancerdatasci.org/docs#/Search) for details.
[TODO]