<a href="https://colab.research.google.com/github/pythseq/2015/blob/master/python3_rest_course.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Ensembl REST API course

Welcome to the Ensembl REST API course. This Jupyter notebook contains all you need to complete this course, including sample code, space to write your own code and sample answers, which can all be run within the notebook in Python.

To run the code within a cell, select it and hit Run.

## REST calls in the browser

Our first REST call will be the [ping call](http://rest.ensembl.org/documentation/info/ping) which checks your connection to the database:

[http://rest.ensembl.org/info/ping?content-type=application/json](http://rest.ensembl.org/info/ping?content-type=application/json)

Our next call will [get information about a gene](http://rest.ensembl.org/documentation/info/lookup), using the Ensembl gene ID:

[http://rest.ensembl.org/lookup/id/ENSG00000157764?content-type=application/json](http://rest.ensembl.org/lookup/id/ENSG00000157764?content-type=application/json)

## Exercises 1

The first set of exercises uses only URLs in the browser. You cannot do these within the Python notebook, but we have provided a box where you can note down your answers.

1. Find an endpoint which you can use to lookup information about a gene using its symbol.
2. Create a URL to find information about the gene *ESPN* in human.
3. Expand your results to include information about transcripts.

Answers:

1. GET lookup/symbol/:species/:symbol

2. https://rest.ensembl.org/lookup/symbol/homo_sapiens/ESPN?content-type=application/json

3. https://rest.ensembl.org/lookup/symbol/homo_sapiens/ESPN?expand=1;content-type=application/json
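
The answer URLs differ only in their query string, so they can also be assembled in code. The sketch below uses Python's standard library; `build_url` is a hypothetical helper name chosen for illustration, not part of the Ensembl API. Note that `urlencode` joins parameters with `&`, whereas the URLs above use `;`.

```python
from urllib.parse import urlencode

def build_url(server, endpoint, **params):
    """Join a server, an endpoint path and optional query parameters."""
    url = server.rstrip("/") + "/" + endpoint.lstrip("/")
    if params:
        # safe="/" keeps the slash in "application/json" readable
        url += "?" + urlencode(params, safe="/")
    return url

server = "https://rest.ensembl.org"
endpoint = "lookup/symbol/homo_sapiens/ESPN"

# Python keyword names cannot contain "-", so content-type is passed via a dict
content_type = {"content-type": "application/json"}

basic = build_url(server, endpoint, **content_type)
expanded = build_url(server, endpoint, expand=1, **content_type)

print(basic)
print(expanded)
```

Pasting either printed URL into the browser should behave like the hand-written versions.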

# Making requests with Python

To make a request, you'll need to specify the server and extension, using the requests module.

In [None]:
import requests, sys

server = "http://rest.ensembl.org"
ext = "/lookup/id/ENSG00000157764"
 
r = requests.get(server+ext, headers={ "Content-Type" : "application/json"})

print (r)

Never assume that your request has worked. Check the response code every time: if the request failed, the code tells you what kind of failure it was.

In [None]:
import requests, sys

server = "http://rest.ensembl.org"
ext = "/lookup/id/ENSG00000157764"
 
r = requests.get(server+ext, headers={ "Content-Type" : "application/json"})

if not r.ok:
    r.raise_for_status()
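
When a request fails, the numeric code is the first clue to what went wrong. The sketch below uses only the standard library to turn a code into a readable label. `explain_status` is a hypothetical helper, and the grouping into classes follows the general HTTP convention rather than anything specific to Ensembl.

```python
from http import HTTPStatus

def explain_status(code):
    """Return a short human-readable description of an HTTP status code."""
    phrase = HTTPStatus(code).phrase
    if code < 300:
        kind = "success"
    elif code < 400:
        kind = "redirect"
    elif code < 500:
        kind = "client error: check the URL and parameters"
    else:
        kind = "server error: try again later"
    return f"{code} {phrase} ({kind})"

# A few codes you may meet when calling a REST API
for code in (200, 400, 404, 429, 503):
    print(explain_status(code))
```

In the notebook you could call `explain_status(r.status_code)` before `r.raise_for_status()` to print a friendlier message.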


If you get responses in JSON (recommended), you can then decode them into Python objects. I've also imported the pretty print (pprint) module, which makes the decoded JSON easy to read. You'll find this useful during the exercises to see how the JSON is structured.

In [None]:
import requests, sys, json
from pprint import pprint

server = "http://rest.ensembl.org"
ext = "/lookup/id/ENSG00000157764"
 
r = requests.get(server+ext, headers={ "Content-Type" : "application/json"})

if not r.ok:
    r.raise_for_status()

decoded = r.json()

pprint (decoded)

A helper function lets you make the request, check the status and decode the JSON in a single line of your script. If you're using lots of REST calls, defining the function at the beginning of your script will save you a lot of time.

In [None]:
import requests, sys, json
from pprint import pprint

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text


server = "http://rest.ensembl.org/"
ext = "lookup/id/ENSG00000157764?"
con = "application/json"
get_gene = fetch_endpoint(server, ext, con)

pprint (get_gene)

## Exercises 2

1. Write a script to **lookup** the gene called *ESPN* in human and print the results in json.

In [None]:
# Exercise 2.1

## Exercises 2 – answers

1. Write a script to **lookup** the gene called *ESPN* in human and print the results in json.

In [None]:
#!/usr/bin/env python

# Get modules needed for script
import sys, requests, json
from pprint import pprint

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

# define the gene name
gene_name = "ESPN"

# define the general URL parameters
server = "http://rest.ensembl.org/"

# define REST query to get the gene ID from the gene name
ext_get_lookup = "lookup/symbol/homo_sapiens/" + gene_name + "?"

# define the content type
con = "application/json"

# submit the query
get_lookup = fetch_endpoint(server, ext_get_lookup, con)

pprint (get_lookup)

# Using results

Since the decoded JSON is a Python dictionary, you can pull out a single datapoint using its key. For example, the lookup for ENSG00000157764 returns:

```
{
  "source": "ensembl_havana",
  "object_type": "Gene",
  "logic_name": "ensembl_havana_gene",
  "version": 12,
  "species": "homo_sapiens",
  "description": "B-Raf proto-oncogene, serine/threonine kinase [Source:HGNC Symbol;Acc:HGNC:1097]",
  "display_name": "BRAF",
  "assembly_name": "GRCh38",
  "biotype": "protein_coding",
  "end": 140924764,
  "seq_region_name": "7",
  "db_type": "core",
  "strand": -1,
  "id": "ENSG00000157764",
  "start": 140719327
}
```

We can add this to our previous script:

In [None]:
import requests, sys, json
from pprint import pprint

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text


server = "http://rest.ensembl.org/"
ext = "lookup/id/ENSG00000157764?"
con = "application/json"
get_gene = fetch_endpoint(server, ext, con)

symbol = get_gene['display_name']
print (symbol)
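
The same idea works for any key in the response. The sketch below runs without a network connection on a subset of the BRAF record shown above. It shows the difference between `[]` and `.get()`, and derives the gene length from the coordinates. The key `no_such_key` is deliberately made up to show what happens when a key is absent.

```python
# A subset of the lookup response shown above
gene = {
    "display_name": "BRAF",
    "id": "ENSG00000157764",
    "seq_region_name": "7",
    "start": 140719327,
    "end": 140924764,
    "strand": -1,
}

# Indexing with [] raises KeyError if the key is missing
symbol = gene["display_name"]

# .get() returns a default instead of raising
missing = gene.get("no_such_key", "not in this response")

# Ensembl coordinates are inclusive, so the length is end - start + 1
length = gene["end"] - gene["start"] + 1

print(symbol, missing, length)
```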

If the output of an endpoint contains a list, you can use a for loop to move through it.

In [None]:
import requests, sys, json
from pprint import pprint

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

# Run a query to get all the genes overlapping a locus.
# This will get a list as output.
locus = "17:63992802..64038237"
server = "http://rest.ensembl.org/"
ext_get_overlap = "overlap/region/human/" + locus + "?feature=gene"
con = "application/json"
get_overlap = fetch_endpoint(server, ext_get_overlap, con)

# Move through all the genes in the list and print details
for gene in get_overlap:
  print (gene['id'], gene['external_name'], sep=" ")
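
Not every record in a list is guaranteed to carry every key, so it can help to loop defensively. The sketch below uses placeholder records (made up for illustration, not real API output) to show filtering a list of dictionaries and supplying a default for a missing key.

```python
# Placeholder records for illustration only, NOT real API output
genes = [
    {"id": "GENE_A", "external_name": "alpha", "biotype": "protein_coding"},
    {"id": "GENE_B", "biotype": "lncRNA"},  # this record has no external_name
    {"id": "GENE_C", "external_name": "gamma", "biotype": "protein_coding"},
]

# Keep only the protein-coding genes
coding = [g for g in genes if g.get("biotype") == "protein_coding"]

# Print every gene, using "-" where the name is missing
for g in genes:
    print(g["id"], g.get("external_name", "-"), sep=" ")

coding_ids = [g["id"] for g in coding]
print(coding_ids)
```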

Where a dictionary contains a dictionary, you may need to layer up your keys to get the next level down.

In [None]:
import requests, sys, json
from pprint import pprint

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

# Get a gene tree using its ID
gene_tree = "ENSGT00390000003602"
server = "http://rest.ensembl.org/"
ext_gt = "genetree/id/" + gene_tree + "?"
con = "application/json"
get_gt = fetch_endpoint(server, ext_gt, con)

# The json looks like:
# {
#   "type": "gene tree",
#   "rooted": 1,
#   "tree": {
#     "taxonomy": {
#       "id": 7742,
#       "scientific_name": "Vertebrata",
#       "timetree_mya": 615,
#       "common_name": "Vertebrates"
# So we need to go down three levels to get the common name
taxonomy = get_gt['tree']['taxonomy']['common_name']

print(taxonomy)
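
Chained keys raise a `KeyError` as soon as one level is missing. The sketch below shows one way to walk nested dictionaries safely. `dig` is a hypothetical helper name, and the sample data is the fragment of gene tree JSON quoted in the comments above.

```python
def dig(data, *keys, default=None):
    """Follow a chain of keys through nested dictionaries."""
    current = data
    for key in keys:
        # Stop early if we hit something that is not a dict or lacks the key
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# The fragment of the gene tree response shown above
sample = {
    "type": "gene tree",
    "rooted": 1,
    "tree": {
        "taxonomy": {
            "id": 7742,
            "scientific_name": "Vertebrata",
            "timetree_mya": 615,
            "common_name": "Vertebrates",
        }
    },
}

print(dig(sample, "tree", "taxonomy", "common_name"))
print(dig(sample, "tree", "no_such_level", "common_name", default="unknown"))
```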

## Exercises 3

1\. Write a script to lookup the gene called *ESPN* in human and print the stable ID of this gene.

In [None]:
# Exercise 3.1

2\. Get all variants that are associated with the phenotype 'Coffee consumption'. For each variant print

   a. the p-value for the association
   
   b. the PMID for the publication which describes the association between that variant and ‘Coffee consumption’
   
   c. the risk allele and the associated gene.

In [None]:
# Exercise 3.2

3\. Get the mouse homologue of the human BRCA2 and print the ID and sequence of both.

Note that the JSON for the endpoint you need is several layers deep, containing nested lists (appear as square brackets [ ] in the JSON) and key value sets (dictionary; appear as curly brackets { } in the JSON). Pretty print (pprint) comes in very useful here for the intermediate stage when you're trying to work out the json.

In [None]:
# Exercise 3.3

## Exercises 3 – answers

1\. Write a script to lookup the gene called *ESPN* in human and print the stable ID of this gene.

In [None]:
import requests, sys, json
from pprint import pprint

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

# Define the gene name
gene_name = "ESPN"

# define the general URL parameters
server = "http://rest.ensembl.org/"
con = "application/json"
ext_get_lookup = "lookup/symbol/homo_sapiens/" + gene_name + "?"

# submit the query
get_lookup = fetch_endpoint(server, ext_get_lookup, con)

print (get_lookup['id'])

2\. Get all variants that are associated with the phenotype 'Coffee consumption'. For each variant print:

   a. the p-value for the association
   
   b. the PMID for the publication which describes the association between that variant and ‘Coffee consumption’
   
   c. the risk allele and the associated gene.

In [None]:
import requests, sys, json
from pprint import pprint

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

print ("Variant\tp-value\tPub-med ID\tRisk allele\tGene")

# define the general URL parameters
server = "http://rest.ensembl.org/"
ext_phen = "phenotype/term/homo_sapiens/coffee consumption?"
con = "application/json"

# submit the query
get_phen = fetch_endpoint(server, ext_phen, con)

for variant in get_phen:
    # use .get() for attributes that may be absent from some records
    variant_id = variant['Variation']
    pv = str(variant['attributes'].get('p_value'))
    pmid = str(variant['attributes'].get('external_reference'))
    risk = str(variant['attributes'].get('risk_allele'))
    gene = str(variant['attributes'].get('associated_gene'))
 
    print (variant_id + "\t" + pv + "\t" + pmid + "\t" + risk + "\t" + gene)
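
The phenotype term in this answer contains a space. The requests module normally escapes such characters for you, but escaping the term explicitly makes the final URL predictable and also copes with other special characters. A minimal sketch using the standard library:

```python
from urllib.parse import quote

term = "coffee consumption"

# quote() percent-encodes characters that are not safe in a URL path
ext_phen = "phenotype/term/homo_sapiens/" + quote(term) + "?"

print(ext_phen)
```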

3\. Get the mouse homologue of the human BRCA2 and print the ID and the aligned sequence of both.

Note that the JSON for the endpoint you need is several layers deep, containing nested lists (appear as square brackets [ ] in the JSON) and key value sets (appear as curly brackets { } in the JSON). Pretty print (pprint) comes in very useful here for the intermediate stage when you're trying to work out the json.

In [None]:
import requests, sys, json
from pprint import pprint

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

gene = "BRCA2"

# define the general URL parameters
server = "http://rest.ensembl.org/"
ext_hom = "homology/symbol/human/" + gene + "?target_species=mouse"
con = "application/json"

get_hom = fetch_endpoint(server, ext_hom, con)

for datum in get_hom['data']:
    for homology in datum['homologies']:
        source_id = homology['source']['id']
        source_species = homology['source']['species']
        source_seq = homology['source']['align_seq']
        target_id = homology['target']['id']
        target_seq = homology['target']['align_seq']
        target_species = homology['target']['species']
        
        print (">", source_id + " " + source_species + "\n" + source_seq + "\n>", target_id + " " + target_species + "\n" + target_seq)

# Other content types

If you specify another content type (not JSON), the helper function will return the response as text. This can be used to get:
* Sequence in FASTA
* Gene trees and homologues in various formats
* Alignments

```
    if content_type == 'application/json':
        return r.json()
    else:
        return r.text
```
The [REST documentation](https://github.com/Ensembl/ensembl-rest/wiki/Output-formats) lists how you specify the output formats.

For example, to get genome features in BED, you need to specify the content type as `text/x-bed` not just `bed`:

In [None]:
import requests, sys, json

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

locus = "5:62797383..62927669"
server = "http://rest.ensembl.org/"
ext_get_bed = "overlap/region/human/" + locus + "?feature=repeat"

get_bed = fetch_endpoint(server, ext_get_bed, "text/x-bed")

# print the bed file
print (get_bed)
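
Because non-JSON responses come back as plain text, you have to split them up yourself if you want individual values. The sketch below parses tab-separated BED text into dictionaries. `parse_bed` is a hypothetical helper, and the two sample lines are made up for illustration rather than taken from the API.

```python
def parse_bed(text):
    """Turn BED text into a list of dictionaries, one per feature."""
    features = []
    for line in text.splitlines():
        # Skip blank lines and the header lines that some BED files carry
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        features.append({
            "chrom": fields[0],
            "start": int(fields[1]),  # BED starts are 0-based
            "end": int(fields[2]),    # BED ends are exclusive
            "name": fields[3] if len(fields) > 3 else None,
        })
    return features

# Made-up sample lines, for illustration only
sample_bed = "chr5\t100\t250\trepeat_one\nchr5\t300\t420\trepeat_two\n"

features = parse_bed(sample_bed)
for f in features:
    print(f["name"], f["end"] - f["start"])
```

With the real script you would pass `get_bed` to `parse_bed` instead of `sample_bed`.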

## Exercises 4 

1\. Get the gene tree predicted for the gene ENSG00000189221 in full nh format. 

In [None]:
# Exercise 4.1

2\. Get the sequence of the gene ENSG00000157764 in FASTA.

In [None]:
# Exercise 4.2

## Exercises 4 – answers 

1\. Get the gene tree predicted for the gene ENSG00000189221 in full nh format. 

In [None]:
import requests, sys, json

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

gene_id = "ENSG00000189221"

# define the general URL parameters
server = "http://rest.ensembl.org/"
ext_gt = "genetree/member/id/" + gene_id + "?nh_format=full;"
gt_content_type = "text/x-nh"
get_gt = fetch_endpoint(server, ext_gt, gt_content_type)

print (get_gt)

2\. Get the sequence of the gene ENSG00000157764 in FASTA.

In [None]:
import requests, sys, json

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

gene = "ENSG00000157764"
server = "http://rest.ensembl.org/"
ext_get_seq = "sequence/id/" + gene + "?"

get_seq = fetch_endpoint(server, ext_get_seq, "text/x-fasta")

# print the FASTA record (header line and sequence)
print (get_seq)
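
FASTA text is easy to take apart if you need the sequence on its own, for example to measure its length. The sketch below is a minimal parser under the usual FASTA convention (a `>` header line followed by sequence lines). `parse_fasta` is a hypothetical helper and the sample record is made up.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header to its joined sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Start a new record; drop the leading ">"
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Sequence lines are wrapped, so join them back together
    return {h: "".join(parts) for h, parts in records.items()}

# Made-up record, for illustration only
sample_fasta = ">example_id some description\nACGTAC\nGGTA\n"

parsed = parse_fasta(sample_fasta)
for header, seq in parsed.items():
    print(header, len(seq))
```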

# Linking endpoints together

If you can pull a datapoint from the json, you can use it as input for another endpoint.

For example, this script gets the symbol of a gene then looks up xrefs associated with it:

In [None]:
import requests, sys, json
from pprint import pprint

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text


server = "http://rest.ensembl.org/"
gene_ext = "lookup/id/ENSG00000157764?"
con = "application/json"
get_gene = fetch_endpoint(server, gene_ext, con)

symbol = get_gene['display_name']

xrefs_ext = "xrefs/symbol/human/" + symbol + "?"
get_xrefs = fetch_endpoint(server, xrefs_ext, con)

pprint (get_xrefs)

## Linking to other REST APIs

You can use multiple REST APIs in a single script. The example below takes the output from an Ensembl REST API endpoint, getting the phenotype associated with a variant, then inputs the PubMed ID into the [Europe PMC REST API](https://europepmc.org/RestfulWebService) to get the full text of the paper in XML:

In [None]:
import requests, sys, json
from pprint import pprint

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

var = "rs632180"

# get the Ensembl server and extension
ens_serv = "http://rest.ensembl.org/"
ens_ext = "variation/human/" + var + "?phenotypes=1"
con = "application/json"

get_var = fetch_endpoint(ens_serv, ens_ext, con)

# move through the phenotypes (in this example there is only one but we still need this)
# and pull out the PubMed ID, stripping off the "PMID:"
for phen in get_var['phenotypes']:
  pmid = phen['study'].replace("PMID:", "")

  # get the EuropePMC server and search extension and run the endpoint 
  pmc_serv = "https://www.ebi.ac.uk/europepmc/webservices/rest/"
  pmc_search = "search?query=" + pmid + "&format=json"
  get_pmc = fetch_endpoint(pmc_serv, pmc_search, con)

  # since this was a search, this has given us a lot of results
  # we need to move through the list and use an if to find the one where the id matches our input
  # and pull out the pmcid
  for result in get_pmc['resultList']['result']:
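    # not every search result has a pmid or a pmcid, so check before using them
    if pmid == result.get('pmid') and 'pmcid' in result: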
      pmcid = result['pmcid']
       
      # we can use the pmcid as input for the full text XML endpoint
      # we need a new extension
      pmc_text = pmcid + "/fullTextXML"
      get_text = fetch_endpoint(pmc_serv, pmc_text, "application/xml")
      print(get_text)


## Exercises 5

1\. Using the script from 3.1, add a call to fetch and print the sequence for the gene *ESPN* in FASTA.

In [None]:
# Exercise 5.1

2\. Print the stable ID of any regulatory features that overlap the region 1000 bp upstream of the ESPN gene. (Hints: lookup the gene first to get the coordinates, then check the strand of the gene to see which way is upstream, and use this to create coordinates to query.)

In [None]:
# Exercise 5.2

## Exercises 5 - answers

1\. Using the script from 3.1, add a call to fetch and print the sequence for the gene *ESPN* in FASTA.

In [None]:
import requests, sys, json
from pprint import pprint


def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

# Get the gene name
gene_name = "ESPN"

# define the general URL parameters
server = "http://rest.ensembl.org/"
con = "application/json"
ext_get_lookup = "lookup/symbol/homo_sapiens/" + gene_name + "?"

# submit the query
get_lookup = fetch_endpoint(server, ext_get_lookup, con)

# define the REST query to get the sequence from the gene
ext_get_seq = "sequence/id/" + get_lookup['id'] + "?"
get_seq = fetch_endpoint(server, ext_get_seq, "text/x-fasta")

# print the FASTA record (header line and sequence)
print (get_seq)

2\. Print the stable ID of any regulatory features that overlap the region 1000 bp upstream of the *ESPN* gene. (Hints: get the gene info first, then check the strand of the gene to see which way is upstream.)

In [None]:
import requests, sys, json
from pprint import pprint

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

server = "http://rest.ensembl.org/"
con = "application/json"

gene_name = "ESPN"
ext_get_lookup = "lookup/symbol/homo_sapiens/" + gene_name + "?"
get_lookup = fetch_endpoint(server, ext_get_lookup, con)

if get_lookup['strand'] == 1:
    locus = str(get_lookup['seq_region_name']) + ":" + str(get_lookup['start'] - 1000) + "-" + str(get_lookup['start'])

else:
    locus =  str(get_lookup['seq_region_name']) + ":" + str(get_lookup['end']) + "-" + str(get_lookup['end'] + 1000)

overlap_ext = "overlap/region/human/" + locus + "?feature=regulatory;"

get_overlap = fetch_endpoint(server, overlap_ext, con)

for rf in get_overlap:
    id = rf['id']
    print (id)
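
The strand logic in this answer can be isolated into a small function, which makes it easy to check without calling the API. The sketch below mirrors the answer's approach. `upstream_region` and its `flank` parameter are hypothetical names, and clamping the start to 1 is an added safeguard that is not in the answer.

```python
def upstream_region(seq_region_name, start, end, strand, flank=1000):
    """Build a region string covering `flank` bp upstream of a gene."""
    if strand == 1:
        # Forward strand: upstream is before the start coordinate
        region_start, region_end = start - flank, start
    else:
        # Reverse strand: upstream is after the end coordinate
        region_start, region_end = end, end + flank
    # Coordinates cannot be lower than 1
    region_start = max(1, region_start)
    return f"{seq_region_name}:{region_start}-{region_end}"

# BRAF coordinates from the lookup shown earlier (reverse strand)
print(upstream_region("7", 140719327, 140924764, -1))

# A made-up forward-strand gene close to the start of a chromosome
print(upstream_region("1", 500, 2000, 1))
```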

# Using POST

POST allows you to run a query with multiple inputs at once. The output will be a dictionary of dictionaries.

In [None]:
import requests, sys
from pprint import pprint

server = "http://rest.ensembl.org"
ext = "/lookup/id"
headers={ "Content-Type" : "application/json", "Accept" : "application/json"}
r = requests.post(server+ext, headers=headers, data='{ "ids" : ["ENSG00000157764", "ENSG00000248378" ] }')

# error checking removed for space
 
decoded = r.json()
pprint (decoded)


We can also write a helper function for POST. You can define both helper functions in your script and use whichever one you need.

In [None]:
def fetch_endpoint_POST(server, request, data, content_type='application/json'):

    r = requests.post(server+request,
                      headers={ "Accept" : content_type},
                      data=data )

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

### Optional parameters

To add optional parameters to your POST query, append them to the extension as a query string after a `?`. For example, if you wanted to mask UTRs when running the [sequence_id_post](http://rest.ensembl.org/documentation/info/sequence_id_post) endpoint, you could specify your extension as:

        ext = "sequence/id?mask_feature=1"


### Input

Your input list for POST queries needs to be a JSON list. You can create this from a list in Python using the [JSON module](https://docs.python.org/3/library/json.html):

        data = json.dumps({ "ids" : my_list })
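
If your list is long, you may want to send it in several smaller POST requests; check the documentation of each endpoint for how many IDs it accepts at once. The sketch below splits a list into JSON bodies of a chosen size. `batched_payloads` and `batch_size` are hypothetical names, and the batch size of 1 is only there to make the split visible.

```python
import json

def batched_payloads(ids, batch_size):
    """Yield JSON request bodies containing at most batch_size IDs each."""
    for i in range(0, len(ids), batch_size):
        yield json.dumps({"ids": ids[i:i + batch_size]})

my_list = ["ENSG00000157764", "ENSG00000248378"]

payloads = list(batched_payloads(my_list, batch_size=1))
for body in payloads:
    print(body)
```

Each item in `payloads` can be passed as the `data` argument of a POST helper function.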

### Output

The output from POST queries will be a dictionary of dictionaries. To access items, you could use your input list as your keys, or you could move through the dictionary with:

		for key, value in post_query.items():
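
A fuller version of that loop might look like the sketch below, which runs on a hand-written stand-in for a POST response. It assumes that an ID the server cannot find comes back with an empty (`None`) value; treat that as an assumption to verify against the endpoint you use. `PLACEHOLDER_ID` is made up.

```python
# Hand-written stand-in for a POST response, for illustration only
post_query = {
    "ENSG00000157764": {"display_name": "BRAF", "biotype": "protein_coding"},
    "PLACEHOLDER_ID": None,  # assumed shape for an ID that was not found
}

found = {}
missing = []

for key, value in post_query.items():
    if value is None:
        # Nothing was returned for this input
        missing.append(key)
    else:
        found[key] = value["display_name"]

print("found:", found)
print("missing:", missing)
```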

## Example

The following script inputs a list of variants in HGVS format into the VEP and gets out the IDs of known colocated variants, including failed variants (an optional parameter):

In [None]:
import requests, sys, json
from pprint import pprint

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

def fetch_endpoint_POST(server, request, data, content_type):

    r = requests.post(server+request,
                      headers={ "Accept" : content_type},
                      data=data )

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

# define the server, extension and content type
server = "http://rest.ensembl.org/"
con = "application/json"
vep_ext = "vep/homo_sapiens/hgvs?failed=1"

# create the list of HGVS annotations
hgvs = ["ENST00000366667.6:c.776T>C", "ENST00000335295.4:c.20A>T", "ENST00000415952.1:c.-149-34206G>T"]

# convert the list into json format
hgvs_json = json.dumps({ "hgvs_notations" : hgvs })

# run the query
post_vep = fetch_endpoint_POST(server, vep_ext, hgvs_json, con)

# move through the results
for variant in post_vep:
    
    # get the data
    hgvs_input = variant['input']
    colocated_list = []
    # not every result has colocated variants, so fall back to an empty list
    for colocated in variant.get('colocated_variants', []):
        colocated_list.append(colocated['id'])
    print (hgvs_input + ": " + (', '.join(colocated_list)))

## Exercises 6

1\. Fetch all the transcripts of *ESPN* using the lookup endpoint. Fetch the cDNA sequences of all transcripts using a single POST request, and print in FASTA format.

In [None]:
# Exercise 6.1

2\. You have the following list of variants:
```rs1415919662, rs957333053, rs762944488, rs1372123943, rs553810871, rs1451237599, rs751376931```
Get the variant class, evidence attributes, source and the most_severe_consequence for all variants using the variation POST endpoint.

In [None]:
# Exercise 6.2

## Exercises 6 – answers

1\. Fetch all the transcripts of *ESPN* using the lookup endpoint. Fetch the cDNA sequences of all transcripts using a single POST request, and print in FASTA format.

In [None]:
import requests, sys, json
from pprint import pprint

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

def fetch_endpoint_POST(server, request, data, content_type):

    r = requests.post(server+request,
                      headers={ "Accept" : content_type},
                      data=data )

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text
    
# Get the gene name
gene_name = "ESPN"
transcripts = []

# define the general URL parameters
server = "http://rest.ensembl.org/"
con = "application/json"
ext_get_gene = "lookup/symbol/homo_sapiens/" + gene_name + "?expand=1;"

get_gene = fetch_endpoint(server, ext_get_gene, con)

for transcript in get_gene['Transcript']:
    transcripts.append(transcript['id'])
 
data = json.dumps({ "ids" : transcripts })

ext_sequence = 'sequence/id?type=cdna'

sequences = fetch_endpoint_POST(server, ext_sequence, data, "text/x-fasta")

print(sequences)

# Alternatively:

#sequences = fetch_endpoint_POST(server, ext_sequence, data, con)
#for query in sequences:
#    print (">", query['id'])
#    print (query['seq'])

2\. You have the following list of variants:
```rs1415919662, rs957333053, rs762944488, rs1372123943, rs553810871, rs1451237599, rs751376931```
Get the variant class, evidence attributes, source and the most_severe_consequence for all variants using the variant POST endpoint.

In [None]:
import requests, sys, json
from pprint import pprint

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

def fetch_endpoint_POST(server, request, data, content_type):

    r = requests.post(server+request,
                      headers={ "Accept" : content_type},
                      data=data )

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

server = "http://rest.ensembl.org/"
con = "application/json"

variant_ids = ["rs1415919662", "rs957333053", "rs762944488", "rs1372123943", "rs553810871", "rs1451237599", "rs751376931"]

data = json.dumps({ "ids" : variant_ids })

var_ext = "variation/homo_sapiens"

post_variants = fetch_endpoint_POST(server, var_ext, data, con)

print ("ID\tClass\tEvidence\tSource\tMost severe consequence")

for key, value in post_variants.items():
 
    # value is the dictionary of results for the variant named by key
    var_id = value['name']
    cls = value['var_class']
    evidence = value['evidence']
    source = value['source']
    severe = value['most_severe_consequence']
 
    print (var_id + "\t" + cls + "\t" + ", ".join(evidence) + "\t" + source + "\t" + severe)

# Rate limiting

Requests are rate limited to prevent a single user from monopolising the resources. We are allowed 55000 requests over an hour (3600 seconds): an average of about 15 requests per second. This limit is per **IP address**, so another person in your organisation can use up your allowance for the REST API.

The number of requests remaining is shown in the headers.

In [None]:
import requests, sys
from pprint import pprint

server = "http://rest.ensembl.org/"
con = "application/json"
ext_ping = "info/ping?"

ping = requests.get(server+ext_ping, headers={ "Accept" : con})

pprint(ping.headers)

We can pull this parameter out to see how it changes over time. In this example script we run ping repeatedly and get the number of queries remaining from the header:

In [None]:
import requests, sys

server = "http://rest.ensembl.org/"
con = "application/json"
ext_ping = "info/ping?"

x = 0

while x < 25:

    # submit the query
    ping = requests.get(server+ext_ping, headers={ "Accept" : con})
    x += 1
    print ("count:", x, "status:", ping.status_code, "remaining:", ping.headers['X-RateLimit-Remaining'])

If we're using the REST API in a long script or pipeline, we want to ensure we do not exceed our limits. To do this we can add wait steps to the helper functions, dependent on the remaining limits. These versions of the helper functions add a 1s wait whenever the remaining queries drop below 50000:

In [None]:
import requests, sys, time, json

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if int(r.headers['X-RateLimit-Remaining']) < 50000:
        time.sleep(1)

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

def fetch_endpoint_POST(server, request, data, content_type):

    r = requests.post(server+request,
                      headers={ "Accept" : content_type},
                      data=data )

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if int(r.headers['X-RateLimit-Remaining']) < 50000:
        time.sleep(1)

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text


Now, we can run our ping with these:

In [None]:
import requests, sys, time, json, datetime

def fetch_endpoint(server, request, content_type):

    r = requests.get(server+request, headers={ "Accept" : content_type})

    if not r.ok:
        r.raise_for_status()
        sys.exit()

    if int(r.headers['X-RateLimit-Remaining']) < 550000:
        time.sleep(1)

    if content_type == 'application/json':
        return r.json()
    else:
        return r.text

server = "http://rest.ensembl.org/"
con = "application/json"
ext_ping = "info/ping?"

x = 0

while x < 25:
  ping = fetch_endpoint (server, ext_ping, con)
  print ("count:", x, ", time", datetime.datetime.now())
  x += 1

If you're doing anything on a large scale, such as moving through a long (maybe genome-wide) list or putting lots of queries together into a pipeline, we recommend using the version of helper functions with the wait step incorporated.
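
The decision about whether to wait can be pulled out into its own function, so it can be tested on plain dictionaries without making any requests. The sketch below is one possible version, not the course's implementation. `seconds_to_wait`, `threshold` and `pause` are hypothetical names, and it assumes the server may send a `Retry-After` header when the limit has been exceeded. It also checks the arithmetic behind the figure of about 15 requests per second.

```python
def seconds_to_wait(headers, threshold=50000, pause=1.0):
    """Decide how long to pause, based on rate-limit response headers."""
    # Assumption: if the server says when to retry, respect that first
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return float(retry_after)

    # Otherwise pause briefly once the remaining allowance drops too low
    remaining = headers.get("X-RateLimit-Remaining")
    if remaining is not None and int(remaining) < threshold:
        return pause

    return 0.0

# 55000 requests spread evenly over 3600 seconds
average_rate = round(55000 / 3600, 1)
print(average_rate)

print(seconds_to_wait({"X-RateLimit-Remaining": "54990"}))
print(seconds_to_wait({"X-RateLimit-Remaining": "49999"}))
print(seconds_to_wait({"Retry-After": "2.5"}))
```

Inside a helper function you could then write `time.sleep(seconds_to_wait(r.headers))` in place of the fixed check.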