# Asynchronous queries on BioMart using `apybiomart` 


[BioMart](https://www.ensembl.org/biomart/martview) is a web-based tool for retrieving and exporting datasets from Ensembl. It is one of the most useful resources in bioinformatics, because it makes it easy to retrieve data and then analyse them further or integrate them into downstream analyses. For these reasons, researchers have developed packages for running BioMart queries from within R and Python programs, namely [biomaRt](https://bioconductor.org/packages/release/bioc/html/biomaRt.html) and [pybiomart](https://github.com/jrderuiter/pybiomart).

I've used both for different needs, and I was very happy with them. Lately, however, I've been working on a database that needs to access many different online resources, and therefore performs asynchronous requests (using Python's great `asyncio` library). Since the database is developed in Python, I tried to use `pybiomart` for these tasks, but it was not suitable for async calls, so I decided to modify it to better suit my needs, and that's when `apybiomart` was born!

This package simply modifies the original `pybiomart` behaviour: it retains its basic capabilities and extends them to handle asynchronous requests to the BioMart service. As a result, multiple queries can be performed at the same time.
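To give an idea of what "multiple queries at the same time" means in `asyncio` terms, here is a minimal sketch that uses only the standard library. It does not call `apybiomart` or the BioMart service: `fake_query()` is a hypothetical stand-in that just sleeps to simulate waiting for a server response, and the chromosome names are purely illustrative.

```python
import asyncio
import time


async def fake_query(chromosome: str, delay: float = 0.2) -> dict:
    """Hypothetical stand-in for one BioMart request (no network involved)."""
    await asyncio.sleep(delay)  # simulates waiting for the server's response
    return {"chromosome": chromosome, "status": "done"}


async def run_all(chromosomes: list) -> list:
    # gather() schedules all coroutines concurrently and returns the
    # results in the same order as the inputs
    return await asyncio.gather(*(fake_query(c) for c in chromosomes))


start = time.perf_counter()
results = asyncio.run(run_all(["1", "2", "3"]))
elapsed = time.perf_counter() - start

# The waits overlap, so the total time should be close to a single
# delay rather than the sum of the three delays
print(f"{len(results)} simulated queries finished in {elapsed:.2f} s")
```

Because the three simulated requests wait concurrently, the total running time is driven by the slowest request instead of by the sum of all of them, which is the benefit one would expect when sending many real queries.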

## Installation  
Apybiomart **only supports Python 3**, and can be installed through `pip`:  
```bash
pip install apybiomart
```

## Usage  
After installation, apybiomart can be easily imported into Python:  

In [1]:
import apybiomart

Now it is possible to use its functions:  

* explore the list of available marts, datasets, attributes and filters using the synchronous `find_*()` functions;  
* perform a single synchronous query using the `query()` function;  
* perform multiple asynchronous queries using the `aquery()` function.  

### Marts, datasets, attributes and filters  

BioMart contains different databases, called *marts*, each of which in turn contains several *datasets*, each related to a particular species. These datasets can be queried, and the data returned can be restricted in two ways: by selecting one or more particular types of information, namely *attributes*, and by applying *filters* that only retain data satisfying one or more specific criteria (more information on BioMart's [help page](https://www.ensembl.org/info/data/biomart/index.html)).
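To make the relationship between datasets, attributes and filters more concrete, the sketch below builds the kind of XML document that the BioMart web service accepts as a query, using only the standard library. It is an illustration under stated assumptions, not necessarily how `apybiomart` builds its requests internally: `build_query_xml()` is a hypothetical helper, and the dataset, attribute and filter names are just example values.

```python
import xml.etree.ElementTree as ET


def build_query_xml(dataset: str, attributes: list, filters: dict) -> str:
    """Hypothetical helper: assemble a BioMart-style XML query string."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    # The dataset selects which species-specific table is queried
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    # Filters restrict which rows are returned
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    # Attributes choose which columns are returned
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Example values only: gene IDs and names on chromosome 1
xml_query = build_query_xml(
    dataset="hsapiens_gene_ensembl",
    attributes=["ensembl_gene_id", "external_gene_name"],
    filters={"chromosome_name": "1"},
)
print(xml_query)
```

In this picture, a filter narrows down the rows (here, genes on one chromosome), while the attributes decide which columns appear in the result.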

#### Marts  

To view the marts available on BioMart, use `find_marts()`, which returns a dataframe listing each mart's `name` and `display_name`:

In [2]:
apybiomart.find_marts()

|   | name | display_name |
|---|------|--------------|
| 0 | ENSEMBL_MART_ENSEMBL | Ensembl Genes 96 |
| 1 | ENSEMBL_MART_MOUSE | Mouse strains 96 |
| 2 | ENSEMBL_MART_SEQUENCE | Sequence |
| 3 | ENSEMBL_MART_ONTOLOGY | Ontology |
| 4 | ENSEMBL_MART_GENOMIC | Genomic features 96 |
| 5 | ENSEMBL_MART_SNP | Ensembl Variation 96 |
| 6 | ENSEMBL_MART_FUNCGEN | Ensembl Regulation 96 |
