# Mouse Phenotype API

Example queries and simple data processing for the [mousephenotype.org](https://www.mousephenotype.org) API.

## Endpoints

- `/genes`: Gene search endpoint
- `/geneBundles`: Gene bundled data endpoint

## Example queries

### Setting things up

In [None]:
import requests
from IPython.display import HTML, display
import tabulate
from tqdm.notebook import trange

impc_api_url = "http://localhost:8080"
impc_api_search_url = f"{impc_api_url}/genes"
impc_api_gene_bundle_url = f"{impc_api_url}/geneBundles"

### 1. Extract all measured phenotypes related to this gene
Using [Cib2 - MGI:1929293](https://www.mousephenotype.org/data/genes/MGI:1929293)
> Hint 💡: For any query that relies directly on an MGI Accession ID you can use the `/geneBundles` endpoint directly.

In [None]:
mgi_accession_id = "MGI:1929293"

# https://www.gentar.org/impc-dev-api/geneBundles/MGI:1929293
gene_bundle_url = f"{impc_api_gene_bundle_url}/{mgi_accession_id}"
gene_bundle = requests.get(gene_bundle_url).json()

### 2. Extract all genes related to a phenotype **and** 3. Extract all genes having a particular phenotype or a set of phenotypes (e.g. relevant to a disease)

Using increased basophil cell number ([MP:0002606](https://www.mousephenotype.org/data/phenotypes/MP:0002606)) and increased circulating cholesterol level ([MP:0005178](https://www.mousephenotype.org/data/phenotypes/MP:0005178)).
> Hint 💡: Any query that performs a search needs to hit the `/genes` endpoint first, then follow the bundle URLs in the response to get the actual data.

In [None]:
target_mp_terms = ['MP:0002606', 'MP:0005178']

## Results are paginated with the page and size parameters; by default the endpoint returns the first 20 hits
gene_by_phenotypes_query = f"{impc_api_search_url}/search/findAllBySignificantMpTermIdsContains?mpTermIds={','.join(target_mp_terms)}&page=0&size=20"
genes_with_clinical_chemistry_phenotypes = requests.get(gene_by_phenotypes_query).json()
print(f"Genes with {target_mp_terms}: {genes_with_clinical_chemistry_phenotypes['page']['totalElements']}")
list_of_genes = []

for gene in genes_with_clinical_chemistry_phenotypes['_embedded']['genes']:
    gene_dict = {"gene_accession_id": gene['mgiAccessionId'], "gene_name": gene['markerName'], "gene_bundle_url": gene["_links"]["geneBundle"]['href']}
    list_of_genes.append(gene_dict)

display(HTML(tabulate.tabulate([i.values() for i in list_of_genes], headers=list_of_genes[0].keys(), tablefmt='html')))

Genes with ['MP:0002606', 'MP:0005178']: 278


gene_accession_id,gene_name,gene_bundle_url
MGI:108086,histone deacetylase 1,http://localhost:8080/geneBundles/MGI:108086
MGI:108404,"amyloid beta (A4) precursor protein-binding, family B, member 3",http://localhost:8080/geneBundles/MGI:108404
MGI:102784,transforming growth factor beta 1 induced transcript 1,http://localhost:8080/geneBundles/MGI:102784
MGI:105985,a disintegrin and metallopeptidase domain 26A (testase 3),http://localhost:8080/geneBundles/MGI:105985
MGI:1914743,methyltransferase like 16,http://localhost:8080/geneBundles/MGI:1914743
MGI:107567,immunity-related GTPase family M member 1,http://localhost:8080/geneBundles/MGI:107567
MGI:109128,"hepatic nuclear factor 4, alpha",http://localhost:8080/geneBundles/MGI:109128
MGI:1917084,"RAB43, member RAS oncogene family",http://localhost:8080/geneBundles/MGI:1917084
MGI:107628,"PR domain containing 2, with ZNF domain",http://localhost:8080/geneBundles/MGI:107628
MGI:1346879,mitogen-activated protein kinase kinase kinase 10,http://localhost:8080/geneBundles/MGI:1346879
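
The query above only reads page 0, while `totalElements` reports many more hits. Below is a minimal sketch of walking every page, assuming each response carries the `page.totalPages` and `_embedded.genes` fields seen in this notebook. `iter_all_genes` and `fetch_page` are hypothetical names, and the fetcher is passed in so the loop can be tried without a running API; the sample pages use made-up accession IDs.

```python
from typing import Callable, Dict, Iterator


def iter_all_genes(fetch_page: Callable[[int], Dict]) -> Iterator[Dict]:
    """Yield every gene from a paginated search, one page at a time.

    fetch_page(page_number) is a hypothetical callable returning the parsed
    JSON of one page (assumed to contain 'page' and, when there are hits,
    '_embedded').
    """
    page_number = 0
    while True:
        payload = fetch_page(page_number)
        # Assumption: '_embedded' may be missing when a page has no hits.
        for gene in payload.get("_embedded", {}).get("genes", []):
            yield gene
        page_number += 1
        if page_number >= payload["page"]["totalPages"]:
            break


# Stand-in for the API: two fake pages with made-up accession IDs.
fake_pages = [
    {"_embedded": {"genes": [{"mgiAccessionId": "MGI:0000001"}]}, "page": {"totalPages": 2}},
    {"_embedded": {"genes": [{"mgiAccessionId": "MGI:0000002"}]}, "page": {"totalPages": 2}},
]
all_gene_ids = [gene["mgiAccessionId"] for gene in iter_all_genes(lambda n: fake_pages[n])]
print(all_gene_ids)
```

Against the real API, `fetch_page` could wrap `requests.get(...).json()` using the search URL from the cell above with its `page` parameter set to the requested page number.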


### 4. Extract all phenotypes which are present in a particular gene set (e.g. genes together in a pathway)
Using [MGI:2444773](https://www.mousephenotype.org/data/genes/MGI:2444773), [MGI:1351500](https://www.mousephenotype.org/data/genes/MGI:1351500), [MGI:2157522](https://www.mousephenotype.org/data/genes/MGI:2157522), [MGI:2141861](https://www.mousephenotype.org/data/genes/MGI:2141861), [MGI:3588194](https://www.mousephenotype.org/data/genes/MGI:3588194), [MGI:1918313](https://www.mousephenotype.org/data/genes/MGI:1918313), [MGI:2444431](https://www.mousephenotype.org/data/genes/MGI:2444431), [MGI:1913658](https://www.mousephenotype.org/data/genes/MGI:1913658), [MGI:1922354](https://www.mousephenotype.org/data/genes/MGI:1922354), [MGI:1917336](https://www.mousephenotype.org/data/genes/MGI:1917336).
> Hint 💡: The light-weight `/genes` endpoint contains all the searchable fields. If you don't need any extra data, there is no need to use the heavy-weight `/geneBundles` endpoint.

In [None]:
# Gene set listed above (MGI:1351500 replaces a duplicated MGI:2444773 entry)
target_genes = ['MGI:2444773', 'MGI:1351500', 'MGI:2157522', 'MGI:2141861', 'MGI:3588194', 'MGI:1918313', 'MGI:2444431', 'MGI:1913658', 'MGI:1922354', 'MGI:1917336']

genes_in_gene_list_query = f"{impc_api_search_url}/search/findAllByMgiAccessionIdIn?mgiAccessionIds={','.join(target_genes)}"

genes_in_gene_list = requests.get(genes_in_gene_list_query).json()
list_of_mp_terms_vs_gene_index = {}

for gene in genes_in_gene_list['_embedded']['genes']:
    mp_terms = gene['significantMpTerms']
    gene_acc_id = gene["mgiAccessionId"]
    if mp_terms is None:
        continue
    for mp_term in mp_terms:
        mp_term_name = mp_term['mpTermName']
        if mp_term_name not in list_of_mp_terms_vs_gene_index:
            list_of_mp_terms_vs_gene_index[mp_term_name] = {"mp_term": mp_term_name, "genes": []}
        list_of_mp_terms_vs_gene_index[mp_term_name]["genes"].append(gene_acc_id)
genes_by_mp_term = list(list_of_mp_terms_vs_gene_index.values())
display(HTML(tabulate.tabulate([i.values() for i in genes_by_mp_term], headers=genes_by_mp_term[0].keys(), tablefmt='html')))

mp_term,genes
decreased bone mineral density,"['MGI:2444773', 'MGI:3588194', 'MGI:1913658']"
decreased bone mineral content,"['MGI:2444773', 'MGI:3588194', 'MGI:1913658']"
abnormal bone structure,['MGI:3588194']
impaired glucose tolerance,['MGI:3588194']
increased effector memory T-helper cell number,['MGI:3588194']
male infertility,['MGI:3588194']
decreased lean body mass,['MGI:3588194']
short tibia,['MGI:3588194']
abnormal rib morphology,['MGI:3588194']
defective growth and differentiation process,['MGI:3588194']
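
To see which phenotypes are shared across the gene set, the same index can be filtered by the number of genes per phenotype. Below is a minimal sketch on hand-written sample records shaped like the `/genes` hits used above (`mgiAccessionId`, `significantMpTerms`, `mpTermName`). `shared_phenotypes` and `min_genes` are hypothetical names, and the records are illustrative, not a live API response.

```python
from collections import defaultdict
from typing import Dict, List


def shared_phenotypes(genes: List[dict], min_genes: int = 2) -> Dict[str, List[str]]:
    """Map each phenotype name to the genes showing it, keeping only
    phenotypes seen in at least min_genes genes."""
    genes_by_term = defaultdict(list)
    for gene in genes:
        # significantMpTerms can be None for genes without significant hits.
        for mp_term in gene.get("significantMpTerms") or []:
            genes_by_term[mp_term["mpTermName"]].append(gene["mgiAccessionId"])
    return {term: ids for term, ids in genes_by_term.items() if len(ids) >= min_genes}


# Illustrative records with placeholder IDs and phenotype names.
sample_genes = [
    {"mgiAccessionId": "MGI:0000001", "significantMpTerms": [{"mpTermName": "phenotype A"}]},
    {"mgiAccessionId": "MGI:0000002", "significantMpTerms": [{"mpTermName": "phenotype A"}, {"mpTermName": "phenotype B"}]},
    {"mgiAccessionId": "MGI:0000003", "significantMpTerms": None},
]
print(shared_phenotypes(sample_genes))
```

Setting `min_genes=1` reproduces the full phenotype-to-genes index built in the cell above.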


## 7. Extract images with a particular phenotype or a set of phenotypes **and** 8. How many images are available with a particular phenotype (Way to access: Phenotype2Gene2Images)
> Warning ⚠️: The IMPC data has no direct relationship between images and phenotypes, but it is possible to get all the images for every gene that has a significant hit for a given phenotype.
>
> Hint 💡: The images live inside each individual gene bundle under the field `geneImages`. The easiest way to query images by phenotype is to first hit the `/genes` endpoint to find the matching genes, then fetch the gene bundle of each one.

Using *abnormal femur morphology* ([MP:0000559](https://www.mousephenotype.org/data/phenotypes/MP:0000559)) and *abnormal digit morphology* ([MP:0002110](https://www.mousephenotype.org/data/phenotypes/MP:0002110)).

### First let's get the genes from the light-weight `/genes` endpoint:

In [None]:
target_mp_terms = ['MP:0002110', 'MP:0000559']

## Results are paginated with the page and size parameters; by default the endpoint returns the first 20 hits
gene_by_phenotypes_query = f"{impc_api_search_url}/search/findAllBySignificantMpTermIdsContains?mpTermIds={','.join(target_mp_terms)}&page=0&size=20"
genes_with_morphology_mps = requests.get(gene_by_phenotypes_query).json()
print(f"Genes with {target_mp_terms}: {genes_with_morphology_mps['page']['totalElements']}")
list_of_gene_bundle_urls = [gene["_links"]["geneBundle"]['href'] for gene in genes_with_morphology_mps['_embedded']['genes']]

Genes with ['MP:0002110', 'MP:0000559']: 57


### Now let's get the bundles from the heavy-weight `/geneBundles` endpoint (this may take a while):

In [None]:
gene_bundles = []
for gene_bundle_url in list_of_gene_bundle_urls:
    gene_bundle = requests.get(gene_bundle_url).json()
    gene_bundles.append(gene_bundle)

images_with_morphology_mps = []

## Doing just the first 20 and filtering out fields on the images
display_fields = ['geneSymbol', 'parameterName', 'biologicalSampleGroup', 'colonyId', 'zygosity', 'sex', 'downloadUrl', 'externalSampleId', 'thumbnailUrl']
for gene_bundle in gene_bundles[:20]:
    if "geneImages" in gene_bundle and gene_bundle["geneImages"] is not None:
        images = gene_bundle["geneImages"]
        for image in images:
            display_image = {k:v for k,v in image.items() if k in display_fields}
            images_with_morphology_mps.append(display_image)

images_table = []

## Displaying just the first 20 images
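for i in images_with_morphology_mps[:20]:
    # Use .get() so an image lacking a field yields an empty cell instead of a KeyError
    row = [f"<img src='{i.get('thumbnailUrl', '')}' />"] + [i.get(field) for field in display_fields]
    images_table.append(row)

# Take the headers from display_fields so the table also renders when no images were found
display(HTML(tabulate.tabulate(images_table, headers=["thumbnail"] + display_fields, tablefmt='unsafehtml')))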

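
Question 8 asks how many images are available, which the cell above does not compute. Below is a minimal sketch of counting images per gene from already-downloaded bundles, assuming each bundle has an optional `geneImages` list whose items carry `geneSymbol`, as in the cell above. `count_images_by_gene` is a hypothetical helper and the sample bundles are made up.

```python
from collections import Counter
from typing import Dict, List


def count_images_by_gene(gene_bundles: List[dict]) -> Dict[str, int]:
    """Count images per gene symbol across a list of gene bundles."""
    counts = Counter()
    for bundle in gene_bundles:
        # geneImages may be absent or None for genes without images.
        for image in bundle.get("geneImages") or []:
            counts[image.get("geneSymbol", "unknown")] += 1
    return dict(counts)


# Made-up bundles covering the present, None, and missing cases.
sample_bundles = [
    {"geneImages": [{"geneSymbol": "GeneA"}, {"geneSymbol": "GeneA"}]},
    {"geneImages": None},
    {"geneImages": [{"geneSymbol": "GeneB"}]},
    {},
]
image_counts = count_images_by_gene(sample_bundles)
print(image_counts, "total:", sum(image_counts.values()))
```

Because images are linked to genes rather than to phenotypes, the total counts every image of each matching gene, not only images that show the phenotype.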

### 9. Which parameters have been measured for a particular gene

In [None]:
# Using Cib2 MGI:1929293
cib2_gene_query = f"{impc_api_search_url}/search/getGeneByMgiAccessionId?mgiAccessionId=MGI:1929293"
cib2_gene_data = requests.get(cib2_gene_query).json()
cib2_gene_parameters = cib2_gene_data["testedParameters"]
headers = {'parameterName': 'Parameter name', 'parameterStableId': 'Parameter stable id', 'pipelineName': 'Pipeline name', 'pipelineStableId': 'Pipeline stable id', 'procedureName': 'Procedure name', 'procedureStableId': 'Procedure stable id'}
print('Tested parameters for Cib2')
display(HTML(tabulate.tabulate(cib2_gene_parameters, headers=headers, tablefmt='unsafehtml')))
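
The list of tested parameters can be long, so grouping it by procedure may make it easier to scan. Below is a minimal sketch, assuming each entry has the `procedureName` and `parameterName` keys used in the `headers` mapping above. `group_parameters_by_procedure` is a hypothetical helper and the sample entries are invented placeholders.

```python
from collections import defaultdict
from typing import Dict, List


def group_parameters_by_procedure(parameters: List[dict]) -> Dict[str, List[str]]:
    """Return {procedure name: sorted list of parameter names}."""
    grouped = defaultdict(list)
    for parameter in parameters:
        grouped[parameter["procedureName"]].append(parameter["parameterName"])
    return {procedure: sorted(names) for procedure, names in grouped.items()}


# Placeholder entries shaped like items of `testedParameters`.
sample_parameters = [
    {"procedureName": "Procedure A", "parameterName": "Parameter 2"},
    {"procedureName": "Procedure A", "parameterName": "Parameter 1"},
    {"procedureName": "Procedure B", "parameterName": "Parameter 3"},
]
parameters_by_procedure = group_parameters_by_procedure(sample_parameters)
print(parameters_by_procedure)
```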

### 10. Which parameters identified a significant finding for a particular knockout

### 11. How many genes have been measured in a particular pipeline

In [None]:
# Using IMPC_001 -> IMPC Standard Early Adult Pipeline
genes_by_tested_pipeline_query = f"{impc_api_search_url}/search/findAllByTestedPipelineId?pipelineId=IMPC_001&size=0"
impc_001_tested_genes_req = requests.get(genes_by_tested_pipeline_query, headers={"Accept": "application/json"})
print(f'Total measured genes for IMPC_001: {impc_001_tested_genes_req.json()["page"]["totalElements"]}')


### 12. Extract all genes and corresponding phenotypes related to a particular organ system (via top-level MP terms for organ systems; similar to the search by significant phenotypes)

In [None]:
# using MP:0005391 (vision/eye phenotype)
target_mp_terms = ['MP:0005391']

## Results are paginated with the page and size parameters; by default the endpoint returns the first 20 hits
gene_by_top_level_phenotypes_query = f"{impc_api_search_url}/search/findAllBySignificantTopLevelMpTermIdsContains?mpTermIds={','.join(target_mp_terms)}&page=0&size=20"

genes_with_vision_eye_phenotypes = requests.get(gene_by_top_level_phenotypes_query).json()
print(f"Genes with {target_mp_terms}: {genes_with_vision_eye_phenotypes['page']['totalElements']}")
list_of_genes = []

for gene in genes_with_vision_eye_phenotypes['_embedded']['genes']:
    gene_dict = {"gene_accession_id": gene['mgiAccessionId'], "gene_name": gene['markerName'], "gene_bundle_url": gene["_links"]["geneBundle"]['href']}
    list_of_genes.append(gene_dict)

display(HTML(tabulate.tabulate([i.values() for i in list_of_genes], headers=list_of_genes[0].keys(), tablefmt='html')))

### 13. Full table of genes and all identified phenotypes

In [None]:
page_size = 1000
complete_list_gene_page_info_query = f"{impc_api_search_url}?size={page_size}&page=0"

complete_list_gene_first_page_info = requests.get(complete_list_gene_page_info_query).json()

total_pages = complete_list_gene_first_page_info["page"]["totalPages"]
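# "_embedded" is a dict; the gene records live under its "genes" key
complete_gene_list = complete_list_gene_first_page_info["_embedded"]["genes"]

# Page 0 is already loaded, so continue from page 1
for page_number in trange(1, total_pages):
    page_query = f"{impc_api_search_url}?size={page_size}&page={page_number}"
    page_genes = requests.get(page_query).json()["_embedded"]["genes"]
    complete_gene_list += page_genes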

print(len(complete_gene_list))
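
The loop above collects gene records, while a flat gene/phenotype table needs one row per pair. Below is a minimal sketch, assuming each gene record has `mgiAccessionId`, `markerName`, and `significantMpTerms` (possibly `None`) as in the earlier cells. `gene_phenotype_rows` is a hypothetical helper and the sample records are placeholders.

```python
from typing import Iterable, List, Tuple


def gene_phenotype_rows(genes: Iterable[dict]) -> List[Tuple[str, str, str]]:
    """Flatten gene records into (accession ID, gene name, phenotype name) rows."""
    rows = []
    for gene in genes:
        # Genes without significant phenotypes contribute no rows.
        for mp_term in gene.get("significantMpTerms") or []:
            rows.append((gene["mgiAccessionId"], gene["markerName"], mp_term["mpTermName"]))
    return rows


# Placeholder records standing in for `complete_gene_list`.
sample_gene_list = [
    {
        "mgiAccessionId": "MGI:0000001",
        "markerName": "example gene 1",
        "significantMpTerms": [{"mpTermName": "phenotype A"}, {"mpTermName": "phenotype B"}],
    },
    {"mgiAccessionId": "MGI:0000002", "markerName": "example gene 2", "significantMpTerms": None},
]
rows = gene_phenotype_rows(sample_gene_list)
print(rows)
```

The resulting rows can be passed to `tabulate.tabulate` with `headers=["gene_accession_id", "gene_name", "mp_term"]` to render the table like the other cells do.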