# Asynchronous queries on BioMart using `apybiomart` 


[BioMart](https://www.ensembl.org/biomart/martview) is a web-based tool for retrieving and exporting datasets from Ensembl. It is one of the most useful resources in bioinformatics, because it makes it easy to retrieve data and then analyse them further or integrate them into downstream analyses. For these reasons, researchers have developed packages to perform BioMart queries from within R and Python programs, namely [biomaRt](https://bioconductor.org/packages/release/bioc/html/biomaRt.html) and [pybiomart](https://github.com/jrderuiter/pybiomart).  

I've been using both for different needs, and I was very happy with them. Lately, however, I've been working on a database that needs to access many different online resources, and therefore performs asynchronous requests (exploiting the great `asyncio` Python library). Since the database was developed in Python, I tried to use `pybiomart` for these tasks, but unfortunately it was not suited to async calls. I decided to modify it to better fit my needs, and that's when `apybiomart` was born!  

This package simply modifies the original `pybiomart` behaviour: it retains the basic capabilities and extends them to handle asynchronous requests to the BioMart service. As a result, multiple queries can be performed at the same time.  
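
To give an idea of what "multiple queries at the same time" means in practice, here is a minimal `asyncio` sketch. It does not call `apybiomart` or BioMart at all: `fake_query` is a hypothetical stand-in that only sleeps to simulate network latency, and the dataset names are used purely as labels.

```python
import asyncio
import time
from typing import List


async def fake_query(dataset: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for a network request: waits, then returns a label."""
    await asyncio.sleep(delay)  # simulates waiting for the remote server
    return f"result for {dataset}"


async def run_all(datasets: List[str]) -> List[str]:
    # gather() runs all coroutines concurrently and keeps the input order
    return list(await asyncio.gather(*(fake_query(name) for name in datasets)))


if __name__ == "__main__":
    names = ["hsapiens_gene_ensembl", "drerio_gene_ensembl", "oanatinus_gene_ensembl"]
    start = time.perf_counter()
    results = asyncio.run(run_all(names))
    elapsed = time.perf_counter() - start
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

Because the three simulated requests wait concurrently, the total time should be close to a single delay rather than the sum of all three, which is the benefit one would expect from asynchronous queries against a real service.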

## Installation  
Apybiomart **only supports Python 3**, and can be installed through `pip`:  
```bash
pip install apybiomart
```

## Usage  
After installation, apybiomart can be easily imported into Python:  

In [1]:
import apybiomart

Now it is possible to use its functions:  

* explore the list of available marts, datasets, attributes and filters using the synchronous `find_*()` functions;  
* perform a single synchronous query using the `query()` function;  
* perform multiple asynchronous queries using the `aquery()` function.  

### Marts, datasets, attributes and filters  

BioMart contains different databases, called *marts*, each of which in turn contains several *datasets*, each related to a particular species. When querying a dataset, the returned data can be restricted in two ways: by selecting one or more *attributes*, i.e. the particular types of information to retrieve, and by applying *filters*, which retain only the data satisfying one or more specific criteria (more information on BioMart's [help page](https://www.ensembl.org/info/data/biomart/index.html)).  
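
To make the relationship between datasets, attributes and filters more concrete, the sketch below assembles an XML document in the general shape that, as far as I know, BioMart's web service accepts. `build_query_xml` is a hypothetical helper written for illustration, not part of `apybiomart`, and the root attribute values (`formatter`, `header`, `uniqueRows`, `datasetConfigVersion`) are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_query_xml(dataset: str, attributes: List[str], filters: Dict[str, str]) -> str:
    """Build a BioMart-style XML query (illustrative helper, not apybiomart's API)."""
    # Root element; these attribute values are assumptions for the sketch
    root = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    # One dataset per query, e.g. "hsapiens_gene_ensembl"
    ds = ET.SubElement(root, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which records are returned
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes select which fields are returned
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_query_xml(
        dataset="hsapiens_gene_ensembl",
        attributes=["ensembl_gene_id", "chromosome_name"],
        filters={"chromosome_name": "X"},
    ))
```

The point of the sketch is only the nesting: a query targets one dataset, and attributes and filters are declared inside it.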

This tutorial will mostly follow the outline available in the official [biomaRt users guide](https://www.bioconductor.org/packages/devel/bioc/vignettes/biomaRt/inst/doc/biomaRt.html).  

#### Marts  

In order to view marts available on BioMart, use `find_marts()`; a dataframe with the available marts is returned, with their proper `name` and `display_name`:  

In [2]:
apybiomart.find_marts()

| | name | display_name |
|---|---|---|
| 0 | ENSEMBL_MART_ENSEMBL | Ensembl Genes 96 |
| 1 | ENSEMBL_MART_MOUSE | Mouse strains 96 |
| 2 | ENSEMBL_MART_SEQUENCE | Sequence |
| 3 | ENSEMBL_MART_ONTOLOGY | Ontology |
| 4 | ENSEMBL_MART_GENOMIC | Genomic features 96 |
| 5 | ENSEMBL_MART_SNP | Ensembl Variation 96 |
| 6 | ENSEMBL_MART_FUNCGEN | Ensembl Regulation 96 |


#### Datasets  

Each BioMart database (or mart) can contain several different datasets, each referring to a particular species. The `find_datasets()` function lists all the available datasets for a given mart, which by default is "ENSEMBL_MART_ENSEMBL".  

In [2]:
apybiomart.find_datasets()

| | name | display_name | mart |
|---|---|---|---|
| 0 | uamericanus_gene_ensembl | American black bear genes (ASM334442v1) | ENSEMBL_MART_ENSEMBL |
| 1 | ngalili_gene_ensembl | Upper Galilee mountains blind mole rat genes (... | ENSEMBL_MART_ENSEMBL |
| 2 | oprinceps_gene_ensembl | Pika genes (OchPri2.0-Ens) | ENSEMBL_MART_ENSEMBL |
| 3 | oanatinus_gene_ensembl | Platypus genes (OANA5) | ENSEMBL_MART_ENSEMBL |
| 4 | malbus_gene_ensembl | Swamp eel genes (M_albus_1.0) | ENSEMBL_MART_ENSEMBL |
| 5 | mnemestrina_gene_ensembl | Pig-tailed macaque genes (Mnem_1.0) | ENSEMBL_MART_ENSEMBL |
| 6 | catys_gene_ensembl | Sooty mangabey genes (Caty_1.0) | ENSEMBL_MART_ENSEMBL |
| 7 | elucius_gene_ensembl | Northern pike genes (Eluc_V3) | ENSEMBL_MART_ENSEMBL |
| 8 | drerio_gene_ensembl | Zebrafish genes (GRCz11) | ENSEMBL_MART_ENSEMBL |
| 9 | zalbicollis_gene_ensembl | White-throated sparrow genes (Zonotrichia_albi... | ENSEMBL_MART_ENSEMBL |
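
The dataset names in this listing appear to follow a pattern: the first letter of the genus, then the species epithet, then a `_gene_ensembl` suffix (for example, zebrafish, *Danio rerio*, is `drerio_gene_ensembl`). The sketch below turns that observation into a small hypothetical helper, `guess_dataset_name`. It is only a guess based on the names above, not a rule guaranteed by BioMart, so the result should always be checked against the output of `find_datasets()`.

```python
def guess_dataset_name(species: str, suffix: str = "gene_ensembl") -> str:
    """Guess a dataset name from a binomial species name (hypothetical helper)."""
    parts = species.lower().split()
    if len(parts) != 2:
        # Only plain "Genus species" names are handled in this sketch
        raise ValueError("expected a binomial name such as 'Homo sapiens'")
    genus, epithet = parts
    # First letter of the genus + full species epithet + suffix
    return f"{genus[0]}{epithet}_{suffix}"


if __name__ == "__main__":
    for name in ("Homo sapiens", "Danio rerio", "Esox lucius"):
        print(name, "->", guess_dataset_name(name))
```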


#### Attributes  

When querying one of BioMart's datasets, attributes define specific values we might be interested in retrieving, such as gene symbols or chromosomal coordinates. The `find_attributes()` function displays all available attributes in a given dataset, which by default is "hsapiens_gene_ensembl":  

In [3]:
apybiomart.find_attributes()

| | name | display_name | description | dataset |
|---|---|---|---|---|
| 0 | ensembl_gene_id | Gene stable ID | Stable ID of the Gene | hsapiens_gene_ensembl |
| 1 | ensembl_gene_id_version | Gene stable ID version | Versionned stable ID of the Gene | hsapiens_gene_ensembl |
| 2 | ensembl_transcript_id | Transcript stable ID | Stable ID of the Transcript | hsapiens_gene_ensembl |
| 3 | ensembl_transcript_id_version | Transcript stable ID version | Versionned stable ID of the Transcript | hsapiens_gene_ensembl |
| 4 | ensembl_peptide_id | Protein stable ID | | hsapiens_gene_ensembl |
| 5 | ensembl_peptide_id_version | Protein stable ID version | | hsapiens_gene_ensembl |
| 6 | ensembl_exon_id | Exon stable ID | | hsapiens_gene_ensembl |
| 7 | description | Gene description | | hsapiens_gene_ensembl |
| 8 | chromosome_name | Chromosome/scaffold name | Chromosome/scaffold name | hsapiens_gene_ensembl |
| 9 | start_position | Gene start (bp) | Start Coordinate of the gene in chromosomal co... | hsapiens_gene_ensembl |


#### Filters  

Filters, on the other hand, can be used to define a restriction on a query, such as retrieving only results from chromosome X. All available filters for a given dataset can be displayed using the `find_filters()` function, which by default will use the "hsapiens_gene_ensembl" dataset:  

In [4]:
apybiomart.find_filters()

| | name | type | description | dataset |
|---|---|---|---|---|
| 0 | link_so_mini_closure | list | | hsapiens_gene_ensembl |
| 1 | link_go_closure | text | | hsapiens_gene_ensembl |
| 2 | link_ensembl_transcript_stable_id | text | | hsapiens_gene_ensembl |
| 3 | gene_id | text | | hsapiens_gene_ensembl |
| 4 | transcript_id | text | | hsapiens_gene_ensembl |
| 5 | link_ensembl_gene_id | text | Filter to include genes with supplied list of ... | hsapiens_gene_ensembl |
| 6 | chromosome_name | text | | hsapiens_gene_ensembl |
| 7 | start | text | Determine which base pair on the specified chr... | hsapiens_gene_ensembl |
| 8 | end | text | Determine which base pair on the specified chr... | hsapiens_gene_ensembl |
| 9 | band_start | drop_down_basic_filter | | hsapiens_gene_ensembl |
