Question: BiomaRt? #11

Closed
zx8754 opened this issue Apr 19, 2017 · 1 comment

zx8754 commented Apr 19, 2017

I have been using Bioconductor's BiomaRt package recently.

How is this package different from BiomaRt?

HajkD commented Apr 19, 2017

Hi Tokhir,

Thank you very much for your question. I am always very happy to receive feedback on biomartr.

I have tried to describe the differences between the established BiomaRt package and my new biomartr package, and how biomartr extends the functionality of BiomaRt, in the Functional Annotation vignette: https://github.com/HajkD/biomartr/blob/master/vignettes/Functional_Annotation.Rmd

The main difference between BiomaRt and biomartr for functional annotation retrieval is that biomartr lets users screen the available marts using only the scientific name of an organism of interest. With BiomaRt, you first have to search for the marts and datasets that support that organism. Furthermore, biomartr allows you to restrict a search for attributes and filters to a particular topic. You can also find an example workflow using biomartr in #5.

To give you a short example:

Imagine you are interested in the plant model Arabidopsis thaliana or the fungal model Saccharomyces cerevisiae. I usually cannot recall off the top of my head which marts or datasets I have to specify to retrieve functional annotation data for those species. So whenever I used the BiomaRt package in the past, I had to search online for the marts or datasets to specify for those organisms.

While this manual search is feasible for a few species, it cannot be automated for dozens or hundreds of species. For this reason, I implemented the organism*() functions in biomartr, which allow users to automate the mart and dataset retrieval process for particular organisms of interest.

Here are the examples for Arabidopsis thaliana and Saccharomyces cerevisiae.

# retrieve available marts and datasets for Arabidopsis thaliana
Ath <- biomartr::organismBM(organism = "Arabidopsis thaliana")
# look at results
Ath[ , c("mart", "dataset", "version")] 
               mart                dataset version
              <chr>                  <chr>   <chr>
1       plants_mart      athaliana_eg_gene  TAIR10
2 plants_variations       athaliana_eg_snp  TAIR10
3 plants_variations athaliana_eg_structvar  TAIR10

Thus, thanks to the biomartr::organismBM() function we now know that for Arabidopsis thaliana there are 2 marts and 3 datasets available.
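The counts quoted above can be checked directly from the returned table. The following is only a minimal sketch: it rebuilds the three rows shown in the output as a plain data frame (so no network call is needed), and the object names `ath`, `n_marts`, `n_datasets` and `datasets_by_mart` are chosen here for illustration.

```r
# Rows copied from the organismBM() output shown above; rebuilt by hand
# so that this sketch runs without querying BioMart.
ath <- data.frame(
  mart    = c("plants_mart", "plants_variations", "plants_variations"),
  dataset = c("athaliana_eg_gene", "athaliana_eg_snp", "athaliana_eg_structvar"),
  version = "TAIR10",
  stringsAsFactors = FALSE
)

# Number of distinct marts and datasets in the table
n_marts    <- length(unique(ath$mart))
n_datasets <- length(unique(ath$dataset))

# Group the dataset names by the mart that hosts them
datasets_by_mart <- split(ath$dataset, ath$mart)
```

With a live result, the same three expressions could be applied to the object returned by biomartr::organismBM() instead of the hand-built data frame.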

For Saccharomyces cerevisiae:

# retrieve available marts and datasets for Saccharomyces cerevisiae
Scerevisiae <- biomartr::organismBM(organism = "Saccharomyces cerevisiae")
# look at results
Scerevisiae[ , c("mart", "dataset", "version")] 
                   mart                      dataset version
                  <chr>                        <chr>   <chr>
1  ENSEMBL_MART_ENSEMBL     scerevisiae_gene_ensembl R64-1-1
2 ENSEMBL_MART_SEQUENCE scerevisiae_genomic_sequence R64-1-1
3      ENSEMBL_MART_SNP              scerevisiae_snp R64-1-1
4           fungal_mart          scerevisiae_eg_gene R64-1-1
5     fungal_variations           scerevisiae_eg_snp R64-1-1

Here we see that there are 5 different marts and 5 different datasets available for Saccharomyces cerevisiae.

As you can see, thanks to the specification by scientific name, this process of mart and dataset retrieval can be automated for many species.
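As a rough illustration of that automation, the lookup can be wrapped in a loop over scientific names. This is a sketch under stated assumptions, not part of biomartr: `collect_marts` and its `lookup` argument are hypothetical names introduced here, and the only things taken from the examples above are the call `biomartr::organismBM(organism = ...)` and the `mart`, `dataset` and `version` columns.

```r
# Hypothetical helper (not part of biomartr): collect marts and datasets
# for several organisms into one data frame.
# `lookup` defaults to biomartr::organismBM, which needs the biomartr
# package and an internet connection; any function with the same
# `organism` argument can be substituted, e.g. for offline testing.
collect_marts <- function(organisms, lookup = biomartr::organismBM) {
  results <- lapply(organisms, function(org) {
    hits <- tryCatch(
      as.data.frame(lookup(organism = org)),
      error = function(e) NULL  # skip organisms whose lookup fails
    )
    if (is.null(hits) || nrow(hits) == 0) {
      return(NULL)
    }
    # Keep only the columns used in the examples above
    hits <- hits[, c("mart", "dataset", "version"), drop = FALSE]
    data.frame(organism = org, hits, stringsAsFactors = FALSE)
  })
  # Drop organisms without results, then stack the remaining tables
  results <- Filter(Negate(is.null), results)
  if (length(results) == 0) {
    return(NULL)
  }
  do.call(rbind, results)
}
```

Calling `collect_marts(c("Arabidopsis thaliana", "Saccharomyces cerevisiae"))` would then query both organisms in turn; organisms for which the lookup fails are silently skipped, which may or may not be the behaviour you want in a real pipeline.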

Analogously, the function biomartr::organismAttributes() allows you to retrieve all attributes that are available for e.g. Arabidopsis thaliana.

In a common scenario, users wish to map the identifiers of particular genes or proteins between databases. For this purpose, biomartr::organismAttributes() has the topic argument, which allows you to retrieve all ID-related attributes for e.g. Arabidopsis thaliana.

# retrieve all id related attributes for Arabidopsis thaliana
biomartr::organismAttributes("Arabidopsis thaliana", topic = "id")
                    name              description           dataset        mart
                   <chr>                    <chr>             <chr>       <chr>
1        ensembl_gene_id           Gene stable ID athaliana_eg_gene plants_mart
2  ensembl_transcript_id     Transcript stable ID athaliana_eg_gene plants_mart
3     ensembl_peptide_id        Protein stable ID athaliana_eg_gene plants_mart
4        ensembl_exon_id           Exon stable ID athaliana_eg_gene plants_mart
5      study_external_id Study external reference athaliana_eg_gene plants_mart
6                  go_id        GO term accession athaliana_eg_gene plants_mart
7                  po_id   PO term accession (bp) athaliana_eg_gene plants_mart
8             protein_id         INSDC protein ID athaliana_eg_gene plants_mart
9             mirbase_id               miRBase ID athaliana_eg_gene plants_mart
10          nasc_gene_id             NASC Gene ID athaliana_eg_gene plants_mart
# ... with 442 more rows

This way, users can retrieve all available ID mapping attributes by specifying only the scientific name.
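Because each row of that table also names its dataset and mart, the table can be used to look up where a given attribute lives before building a query. The sketch below copies three rows from the output above into a plain data frame so that it runs offline; `locate_attribute` is a hypothetical helper written for this illustration, not a biomartr function.

```r
# Three rows copied from the organismAttributes() output shown above.
attrs <- data.frame(
  name        = c("ensembl_gene_id", "go_id", "protein_id"),
  description = c("Gene stable ID", "GO term accession", "INSDC protein ID"),
  dataset     = "athaliana_eg_gene",
  mart        = "plants_mart",
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the mart/dataset pairs offering an attribute.
locate_attribute <- function(attrs, attribute) {
  hit <- attrs[attrs$name == attribute, c("mart", "dataset"), drop = FALSE]
  unique(hit)
}

go_location <- locate_attribute(attrs, "go_id")
```

The resulting mart and dataset names are the values one would otherwise have had to look up by hand before querying that attribute.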

Furthermore, beyond the functionality provided by the BiomaRt package, biomartr provides extensive functionality for retrieving genome, proteome, metagenome, GFF, and other files for thousands of organisms.

I hope this answers your question.

Best wishes,
Hajk
