ENSEMBLE 88 #107

Closed
AR-Shicheng opened this issue Jul 18, 2024 · 5 comments

@AR-Shicheng

Hi Mike,

I am wondering why only Ensembl 88 is missing from the biomaRt package?

Thanks,

Shicheng

@AR-Shicheng
Author

Interestingly, the most important release, Ensembl 88, is missing:

> library(biomaRt)
Warning: program compiled against libxml 210 using older 209
>
> listEnsembl()

        biomart                version
1         genes      Ensembl Genes 112
2 mouse_strains      Mouse strains 112
3          snps  Ensembl Variation 112
4    regulation Ensembl Regulation 112
>
> listEnsemblArchives()
             name     date                                 url version
1  Ensembl GRCh37 Feb 2014          https://grch37.ensembl.org  GRCh37
2     Ensembl 112 May 2024 https://may2024.archive.ensembl.org     112
3     Ensembl 111 Jan 2024 https://jan2024.archive.ensembl.org     111
4     Ensembl 110 Jul 2023 https://jul2023.archive.ensembl.org     110
5     Ensembl 109 Feb 2023 https://feb2023.archive.ensembl.org     109
6     Ensembl 108 Oct 2022 https://oct2022.archive.ensembl.org     108
7     Ensembl 107 Jul 2022 https://jul2022.archive.ensembl.org     107
8     Ensembl 106 Apr 2022 https://apr2022.archive.ensembl.org     106
9     Ensembl 105 Dec 2021 https://dec2021.archive.ensembl.org     105
10    Ensembl 104 May 2021 https://may2021.archive.ensembl.org     104
11    Ensembl 103 Feb 2021 https://feb2021.archive.ensembl.org     103
12    Ensembl 102 Nov 2020 https://nov2020.archive.ensembl.org     102
13    Ensembl 101 Aug 2020 https://aug2020.archive.ensembl.org     101
14    Ensembl 100 Apr 2020 https://apr2020.archive.ensembl.org     100
15     Ensembl 99 Jan 2020 https://jan2020.archive.ensembl.org      99
16     Ensembl 98 Sep 2019 https://sep2019.archive.ensembl.org      98
17     Ensembl 97 Jul 2019 https://jul2019.archive.ensembl.org      97
18     Ensembl 80 May 2015 https://may2015.archive.ensembl.org      80
19     Ensembl 77 Oct 2014 https://oct2014.archive.ensembl.org      77
20     Ensembl 75 Feb 2014 https://feb2014.archive.ensembl.org      75
21     Ensembl 54 May 2009 https://may2009.archive.ensembl.org      54
   current_release

@AR-Shicheng
Author

Is there a way for us to build Ensembl 88 ourselves? If so, how?

@grimbough
Owner

Ensembl keeps each release available for 5 years. A few selected releases are retained for longer, but in most cases, once 5 years have passed, a release is deemed out of date and removed. Ensembl 88 is from May 2017 and was removed about 2 years ago. This is why the listEnsemblArchives() output above jumps from release 97 straight to 80: only a handful of older releases (80, 77, 75, 54 and GRCh37) are kept. There are some more details on the archive policies at https://www.ensembl.org/info/website/archives/index.html

biomaRt is only an interface for querying the databases that Ensembl makes available, so you can't access release 88 through it.
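For releases that Ensembl still hosts, biomaRt can target an archive through the `version` argument of `useEnsembl()`. Below is a minimal sketch, assuming the archive list looks like the one posted above (release 97 still available). Release 88 has no archive host left, so there is nothing for `useEnsembl()` to connect to. The chromosome 21 query is an arbitrary illustration.

```r
library(biomaRt)

# Archives Ensembl currently hosts (columns: name, date, url, version, current_release)
archives <- listEnsemblArchives()

# Is a given release still served? This was FALSE for "88" when this thread was written
"88" %in% archives$version

# Connect to a release that is still archived (97 in the listing above)
mart_97 <- useEnsembl(biomart = "genes",
                      dataset = "hsapiens_gene_ensembl",
                      version = 97)

# Example query: gene IDs and symbols on chromosome 21
genes_97 <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                  filters = "chromosome_name",
                  values = "21",
                  mart = mart_97)
head(genes_97)
```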

In theory you could build your own version from the original source data, available from https://ftp.ensembl.org/pub/release-88/. However, I don't think Ensembl provides any instructions on how to do this, and it would be a very difficult task.

I would ask why using such an old version is important. If there's a really good reason, you may be able to get the information you need directly from the files on the FTP site rather than through BioMart. If not, a more recent version of the annotation data may be fine.
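As an illustration of working from the FTP files directly, the sketch below downloads a release 88 human GTF and loads it with rtracklayer. The exact file path is an assumption based on Ensembl's usual FTP naming (`gtf/homo_sapiens/Homo_sapiens.GRCh38.88.gtf.gz`). Check the directory listing under https://ftp.ensembl.org/pub/release-88/ before relying on it.

```r
library(rtracklayer)  # also attaches GenomicRanges

# ASSUMED path, following Ensembl's usual naming for human GTF files
gtf_url  <- "https://ftp.ensembl.org/pub/release-88/gtf/homo_sapiens/Homo_sapiens.GRCh38.88.gtf.gz"
gtf_file <- file.path(tempdir(), basename(gtf_url))

# The file is large, so raise R's default 60 s download timeout
options(timeout = 3600)
download.file(gtf_url, gtf_file, mode = "wb")

# import() infers the GTF format from the file extension and reads the gzip directly
gtf <- import(gtf_file)

# Keep only gene-level records and look at a few annotation columns
genes_88 <- gtf[gtf$type == "gene"]
length(genes_88)
head(mcols(genes_88)[, c("gene_id", "gene_name", "gene_biotype")])
```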

@AR-Shicheng
Author

AR-Shicheng commented Jul 23, 2024 via email

@grimbough
Owner

grimbough commented Jul 25, 2024

I'm not going to create my own instance of BioMart. I don't work for Ensembl, nor do I have the time or resources to maintain my own BioMart server.

However, you could use another source of annotation in Bioconductor. The ensembldb package (https://bioconductor.org/packages/release/bioc/html/ensembldb.html), together with AnnotationHub, lets you download snapshots of each Ensembl release and work with them locally.

BiocManager::install('AnnotationHub')
library(AnnotationHub)
ah <- AnnotationHub()

## search for the Human Ensembl 88 database
query(ah, pattern = c("Ensembl 88", "Sapiens"))

# AnnotationHub with 1 record
# snapshotDate(): 2024-04-30
# names(): AH53715
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# $rdatadateadded: 2017-04-05
# $title: Ensembl 88 EnsDb for Homo Sapiens
# $description: Gene and protein annotations for Homo Sapiens based on Ensembl version 88.
# $taxonomyid: 9606
# $genome: GRCh38
# $sourcetype: ensembl
# $sourceurl: http://www.ensembl.org
# $sourcesize: NA
# $tags: c("EnsDb", "Ensembl", "Gene", "Transcript", "Protein", "Annotation", "88", "AHEnsDbs") 
# retrieve record with 'object[["AH53715"]]' 

## This finds exactly one record and shows how to retrieve it
## Downloading might take quite a while
ens_88 <- ah[["AH53715"]]
ens_88
# EnsDb for Ensembl:
# |Backend: SQLite
# |Db type: EnsDb
# |Type of Gene ID: Ensembl Gene ID
# |Supporting package: ensembldb
# |Db created by: ensembldb package from Bioconductor
# |script_version: 0.3.1
# |Creation time: Thu Jun 15 08:50:24 2017
# |ensembl_version: 88
# |ensembl_host: localhost
# |Organism: homo_sapiens
# |taxonomy_id: 9606
# |genome_build: GRCh38
# |DBSCHEMAVERSION: 2.1
# | No. of genes: 64592.
# | No. of transcripts: 219063.
# |Protein data available.

You'll need to consult the ensembldb documentation to work out how to query that object and extract the data you want, but its contents should match the Ensembl release you want to work with.
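As a possible starting point, here are a few ensembldb calls that pull gene and transcript records out of the Ensembl 88 EnsDb. The block re-fetches the record so that it runs on its own. The gene symbol (BRCA1) is an arbitrary example, and the column and keytype names are the ones ensembldb exposes for EnsDb objects.

```r
library(AnnotationHub)
library(ensembldb)

# Fetch the Ensembl 88 human EnsDb (cached locally after the first download)
ah <- AnnotationHub()
ens_88 <- ah[["AH53715"]]

# Fields that can be requested from the gene table
listColumns(ens_88, "gene")

# All genes as a GRanges object with gene-level metadata columns
all_genes <- genes(ens_88)

# Genes and transcripts restricted by symbol, using a formula-style filter
brca1_gene <- genes(ens_88, filter = ~ gene_name == "BRCA1")
brca1_tx   <- transcripts(ens_88, filter = ~ gene_name == "BRCA1")

# AnnotationDbi-style lookup: gene IDs, transcript IDs and biotypes for a symbol
select(ens_88, keys = "BRCA1", keytype = "GENENAME",
       columns = c("GENEID", "TXID", "TXBIOTYPE"))
```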
